research-article

Learning to Un-Rank: Quantifying Search Exposure for Users in Online Communities

Authors:

Azin Ghazimatin,

Hakan Ferhatosmanoglu,

Krishna P. Gummadi,

Gerhard WeikumAuthors Info & Claims

CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management

Pages 267 - 276

https://doi.org/10.1145/3132847.3133040

Published: 06 November 2017 Publication History

Abstract

Search engines in online communities such as Twitter or Facebook not only return matching posts, but also provide links to the profiles of the authors. Thus, when a user appears in the top-k results for a sensitive keyword query, she becomes widely exposed in a sensitive context. The effects of such exposure can result in a serious privacy violation, ranging from embarrassment all the way to becoming a victim of organizational discrimination.

In this paper, we propose the first model for quantifying search exposure on the service provider side, casting it into a reverse k-nearest-neighbor problem. Moreover, since a single user can be exposed by a large number of queries, we also devise a learning-to-rank method for identifying the most critical queries and thus making the warnings user-friendly. We develop efficient algorithms, and present experiments with a large number of user profiles from Twitter that demonstrate the practical viability and effectiveness of our framework.

References

[1]

Fuat Basık, Buugra Gedik, Hakan Ferhatosmanouglu, and Mert Emin Kalender. 2015. S3-TM: scalable streaming short text matching. The VLDB Journal - The International Journal on Very Large Data Bases (2015).

Digital Library

[2]

Michael S Bernstein, Eytan Bakshy, Moira Burke, and Brian Karrer Quantifying the invisible audience in social networks CHI '13.

Digital Library

[3]

J Asia Biega, Krishna P Gummadi, Ida Mele, Dragan Milchevski, Christos Tryfonopoulos, and Gerhard Weikum. 2016. R-susceptibility: An IR-centric approach to assessing privacy risks for users in online communities. In SIGIR '16.

Digital Library

[4]

Joseph Bonneau, Elie Bursztein, Ilan Caron, Rob Jackson, and Mike Williamson Secrets, Lies, and Account Recovery: Lessons from the Use of Personal Knowledge Questions at Google. In WWW '15.

Digital Library

[5]

danah boyd. 2008. Facebook's privacy trainwreck: Exposure, invasion, and social convergence. Convergence (2008).

[6]

B Barla Cambazoglu, Enver Kayaaslan, Simon Jonassen, and Cevdet Aykanat. 2013. A term-based inverted index partitioning model for efficient distributed query processing. ACM Transactions on the Web (TWEB) (2013).

Digital Library

[7]

Ben Carterette and Rosie Jones Evaluating search engines by modeling the relationship between relevance and clicks NIPS '08.

Digital Library

[8]

Gang Chen, He Bai, Lidan Shou, Ke Chen, and Yunjun Gao UPS: efficient privacy protection in personalized web search SIGIR '11.

Digital Library

[9]

Lisi Chen and Gao Cong Diversity-Aware Top-k Publish/Subscribe for Text Stream SIGMOD '15.

Digital Library

[10]

Denzil Correa, Leandro Araújo Silva, Mainack Mondal, Fabr'ıcio Benevenuto, and Krishna P Gummadi The Many Shades of Anonymity: Characterizing Anonymous Social Media Content. ICWSM '15.

[11]

W Bruce Croft, Donald Metzler, and Trevor Strohmann. 2010. Search engines. Pearson Education.

[12]

Tobias Emrich, Hans-Peter Kriegel, Peer Kröger, Johannes Niedermayer, Matthias Renz, and Andreas Züfle. 2015. On reverse-k-nearest-neighbor joins. GeoInformatica (2015).

Digital Library

[13]

Michael Feldman, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian Certifying and Removing Disparate Impact. In KDD '15.

Digital Library

[14]

Arthur Gervais, Reza Shokri, Adish Singla, Srdjan Capkun, and Vincent Lenders Quantifying Web-Search Privacy. In CCS '14.

Digital Library

[15]

Michaela Gotz, Ashwin Machanavajjhala, Guozhang Wang, Xiaokui Xiao, and Johannes Gehrke. 2012. Publishing search logs - a comparative study of privacy guarantees. IEEE Transactions on Knowledge and Data Engineering (2012).

Digital Library

[16]

Yupeng Hu, Chong Yang, Cun Ji, Yang Xu, and Xueqing Li Efficient Snapshot KNN Join Processing for Large Data Using MapReduce ICPADS '16.

[17]

Danny Yuxing Huang, Doug Grundman, Kurt Thomas, Abhishek Kumar, Elie Bursztein, Kirill Levchenko, and Alex C. Snoeren Pinning Down Abuse on Google Maps. In WWW '17.

[18]

Ihab F Ilyas, George Beskales, and Mohamed A Soliman. 2008. A survey of top-k query processing techniques in relational database systems. ACM Computing Surveys (CSUR) (2008).

Digital Library

[19]

Thorsten Joachims Training linear SVMs in linear time. In KDD '06.

Digital Library

[20]

Ken CK Lee, Baihua Zheng, and Wang-Chien Lee. 2008. Ranked reverse nearest neighbor search. IEEE Transactions on Knowledge and Data Engineering (2008).

Digital Library

[21]

Jiaheng Lu, Ying Lu, and Gao Cong Reverse spatial and textual k nearest neighbor search SIGMOD '11.

Digital Library

[22]

Cheng Luo, Feng Yu, Wen-Chi Hou, Zhewei Jiang, Dunren Che, and Shan He. 2011. IRTA: An Improved Threshold Algorithm for Reverse Top-k Queries. ICEIS (1).

[23]

Rishabh Mehrotra, Ashton Anderson, Fernando Diaz, Amit Sharma, Hanna Wallach, and Emine Yilmaz Auditing Search Engines for Differential Satisfaction Across Demographics WWW '17.

Digital Library

[24]

Mainack Mondal, Peter Druschel, Krishna P Gummadi, and Alan Mislove Beyond access control: Managing online privacy via exposure USEC '14.

[25]

Mainack Mondal, Johnnatan Messias, Saptarshi Ghosh, Krishna P Gummadi, and Aniket Kate Forgetting in Social Media: Understanding and Controlling Longitudinal Exposure of Socially Shared Data. In SOUPS '16.

[26]

Parikshit Ram and Alexander G Gray. 2012. Maximum inner-product search using tree data-structures. arXiv preprint arXiv:1202.6101 (2012).

Digital Library

[27]

Roman Schlegel, Apu Kapadia, and Adam J Lee Eyeing your exposure: quantifying and controlling information sharing for improved privacy SOUPS '11.

Digital Library

[28]

Xuehua Shen, Bin Tan, and ChengXiang Zhai. 2007. Privacy protection in personalized search. In ACM SIGIR Forum.

Digital Library

[29]

Reza Shokri, George Theodorakopoulos, George Danezis, Jean-Pierre Hubaux, and Jean-Yves Le Boudec Quantifying location privacy: the case of sporadic location exposure PETS '11.

Digital Library

[30]

Yufei Tao, Dimitris Papadias, and Xiang Lian Reverse kNN search in arbitrary dimensionality. In VLDB '04.

Digital Library

[31]

Yla R Tausczik and James W Pennebaker. 2010. The psychological meaning of words: LIWC and computerized text analysis methods. Journal of language and social psychology (2010).

[32]

Jaime Teevan, Daniel Ramage, and Merredith Ringel Morris # TwitterSearch: a comparison of microblog search and web search WSDM '11.

Digital Library

[33]

Akrivi Vlachou, Christos Doulkeridis, Yannis Kotidis, and Kjetil Norvag. 2011. Monochromatic and bichromatic reverse top-k queries. IEEE Transactions on Knowledge and Data Engineering (2011).

Digital Library

[34]

Peng Wang and Chinya V Ravishankar On masking topical intent in keyword search. In ICDE '14.

[35]

Yabo Xu, Ke Wang, Benyu Zhang, and Zheng Chen Privacy-enhancing personalized web search. In WWW '07.

Digital Library

[36]

Man Lung Yiu and Nikos Mamoulis. 2007. Reverse nearest neighbors search in ad hoc subspaces. IEEE Transactions on Knowledge and Data Engineering (2007).

Digital Library

[37]

Oren Zamir, Oren Etzioni, Omid Madani, and Richard M Karp Fast and Intuitive Clustering of Web Documents. KDD '97.

Digital Library

[38]

Sicong Zhang, Hui Yang, and Lisa Singh Anonymizing Query Logs by Differential Privacy. In SIGIR '16.

Digital Library

[39]

Zhao Zhang, Cheqing Jin, and Qiangqiang Kang Reverse k-ranks query VLDB '14.

Digital Library

[40]

Yun Zhu, Li Xiong, and Christopher Verdery Anonymizing user profiles for personalized web search WWW '10.

Digital Library

Cited By

Li RLi JMitra BDiaz FBiega A(2022)Exposing Query Identification for Search TransparencyProceedings of the ACM Web Conference 202210.1145/3485447.3512262(3662-3672)Online publication date: 25-Apr-2022
https://dl.acm.org/doi/10.1145/3485447.3512262
Perna DInterdonato RTagarelli A(2018)Learning to lurker rank: an evaluation of learning-to-rank methods for lurking behavior analysisSocial Network Analysis and Mining10.1007/s13278-018-0516-z8:1Online publication date: 1-Jun-2018
https://doi.org/10.1007/s13278-018-0516-z

Index Terms

Learning to Un-Rank: Quantifying Search Exposure for Users in Online Communities
1. Information systems
  1. Information retrieval
    1. Specialized information retrieval
      1. Environment-specific retrieval
        Web and social media search
2. Security and privacy
  1. Human and societal aspects of security and privacy
    1. Privacy protections

Recommendations

Effective rank aggregation for metasearching

Nowadays, mashup services and especially metasearch engines play an increasingly important role on the Web. Most of users use them directly or indirectly to access and aggregate information from more than one data sources. Similarly to the rest of the ...
On Application of Learning to Rank for E-Commerce Search
SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval

E-Commerce (E-Com) search is an emerging important new application of information retrieval. Learning to Rank (LETOR) is a general effective strategy for optimizing search engines, and is thus also a key technology for E-Com search. While the use of ...
Exposing Query Identification for Search Transparency
WWW '22: Proceedings of the ACM Web Conference 2022

Search systems control the exposure of ranked content to searchers. In many cases, creators value not only the exposure of their content but, moreover, an understanding of the specific searches where the content is surfaced. The problem of identifying ...

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences

CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management

November 2017

2604 pages

ISBN:9781450349185

DOI:10.1145/3132847

General Chairs:
Ee-Peng Lim
Singapore Management University, Singapore
,
Marianne Winslett
University of Illinois at Urbana-Champaign, USA, and Advanced Digital Sciences Center, Singapore
,
Program Chairs:
Mark Sanderson
RMIT, Australia
,
Ada Fu
Chinese University of Hong Kong, Hong Kong
,
Jimeng Sun
Georgia Tech, USA
,
Shane Culpepper
RMIT, Australia
,
Eric Lo
Chinese University of Hong Kong, Hong Kong
,
Joyce Ho
Emory University, USA
,
Debora Donato
Mix Tech, Inc., USA
,
Rakesh Agrawal
Data Insights Laboratories, USA
,
Yu Zheng
Microsoft Research Asia, China
,
Carlos Castillo
Qatar Computing Research Institute, Qatar
,
Aixin Sun
Nanyang Technological University, Singapore
,
Vincent S. Tseng
National Cheng Kung University, Taiwan
,
Chenliang Li
Wuhan University, China

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 November 2017

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

Alexander von Humboldt Foundation
European Research Council

Conference

CIKM '17

Sponsor:

CIKM '17: ACM Conference on Information and Knowledge Management

November 6 - 10, 2017

Singapore, Singapore

Acceptance Rates

CIKM '17 Paper Acceptance Rate 171 of 855 submissions, 20%;

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Upcoming Conference

CIKM '25

Sponsor:
sigir
sigir

The 34th ACM International Conference on Information and Knowledge Management

November 10 - 14, 2025

Seoul , Republic of Korea

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

2
Total Citations
View Citations
266
Total Downloads

Downloads (Last 12 months)2
Downloads (Last 6 weeks)0

Reflects downloads up to 06 Jan 2025

Other Metrics

View Author Metrics

Citations

Cited By

Li RLi JMitra BDiaz FBiega A(2022)Exposing Query Identification for Search TransparencyProceedings of the ACM Web Conference 202210.1145/3485447.3512262(3662-3672)Online publication date: 25-Apr-2022
https://dl.acm.org/doi/10.1145/3485447.3512262
Perna DInterdonato RTagarelli A(2018)Learning to lurker rank: an evaluation of learning-to-rank methods for lurking behavior analysisSocial Network Analysis and Mining10.1007/s13278-018-0516-z8:1Online publication date: 1-Jun-2018
https://doi.org/10.1007/s13278-018-0516-z

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents