skip to main content
10.1145/3132847.3133040acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
research-article

Learning to Un-Rank: Quantifying Search Exposure for Users in Online Communities

Published: 06 November 2017 Publication History

Abstract

Search engines in online communities such as Twitter or Facebook not only return matching posts, but also provide links to the profiles of the authors. Thus, when a user appears in the top-k results for a sensitive keyword query, she becomes widely exposed in a sensitive context. The effects of such exposure can result in a serious privacy violation, ranging from embarrassment all the way to becoming a victim of organizational discrimination.
In this paper, we propose the first model for quantifying search exposure on the service provider side, casting it into a reverse k-nearest-neighbor problem. Moreover, since a single user can be exposed by a large number of queries, we also devise a learning-to-rank method for identifying the most critical queries and thus making the warnings user-friendly. We develop efficient algorithms, and present experiments with a large number of user profiles from Twitter that demonstrate the practical viability and effectiveness of our framework.

References

[1]
Fuat Basık, Buugra Gedik, Hakan Ferhatosmanouglu, and Mert Emin Kalender. 2015. S3-TM: scalable streaming short text matching. The VLDB Journal - The International Journal on Very Large Data Bases (2015).
[2]
Michael S Bernstein, Eytan Bakshy, Moira Burke, and Brian Karrer Quantifying the invisible audience in social networks CHI '13.
[3]
J Asia Biega, Krishna P Gummadi, Ida Mele, Dragan Milchevski, Christos Tryfonopoulos, and Gerhard Weikum. 2016. R-susceptibility: An IR-centric approach to assessing privacy risks for users in online communities. In SIGIR '16.
[4]
Joseph Bonneau, Elie Bursztein, Ilan Caron, Rob Jackson, and Mike Williamson Secrets, Lies, and Account Recovery: Lessons from the Use of Personal Knowledge Questions at Google. In WWW '15.
[5]
danah boyd. 2008. Facebook's privacy trainwreck: Exposure, invasion, and social convergence. Convergence (2008).
[6]
B Barla Cambazoglu, Enver Kayaaslan, Simon Jonassen, and Cevdet Aykanat. 2013. A term-based inverted index partitioning model for efficient distributed query processing. ACM Transactions on the Web (TWEB) (2013).
[7]
Ben Carterette and Rosie Jones Evaluating search engines by modeling the relationship between relevance and clicks NIPS '08.
[8]
Gang Chen, He Bai, Lidan Shou, Ke Chen, and Yunjun Gao UPS: efficient privacy protection in personalized web search SIGIR '11.
[9]
Lisi Chen and Gao Cong Diversity-Aware Top-k Publish/Subscribe for Text Stream SIGMOD '15.
[10]
Denzil Correa, Leandro Araújo Silva, Mainack Mondal, Fabr'ıcio Benevenuto, and Krishna P Gummadi The Many Shades of Anonymity: Characterizing Anonymous Social Media Content. ICWSM '15.
[11]
W Bruce Croft, Donald Metzler, and Trevor Strohmann. 2010. Search engines. Pearson Education.
[12]
Tobias Emrich, Hans-Peter Kriegel, Peer Kröger, Johannes Niedermayer, Matthias Renz, and Andreas Züfle. 2015. On reverse-k-nearest-neighbor joins. GeoInformatica (2015).
[13]
Michael Feldman, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian Certifying and Removing Disparate Impact. In KDD '15.
[14]
Arthur Gervais, Reza Shokri, Adish Singla, Srdjan Capkun, and Vincent Lenders Quantifying Web-Search Privacy. In CCS '14.
[15]
Michaela Gotz, Ashwin Machanavajjhala, Guozhang Wang, Xiaokui Xiao, and Johannes Gehrke. 2012. Publishing search logs - a comparative study of privacy guarantees. IEEE Transactions on Knowledge and Data Engineering (2012).
[16]
Yupeng Hu, Chong Yang, Cun Ji, Yang Xu, and Xueqing Li Efficient Snapshot KNN Join Processing for Large Data Using MapReduce ICPADS '16.
[17]
Danny Yuxing Huang, Doug Grundman, Kurt Thomas, Abhishek Kumar, Elie Bursztein, Kirill Levchenko, and Alex C. Snoeren Pinning Down Abuse on Google Maps. In WWW '17.
[18]
Ihab F Ilyas, George Beskales, and Mohamed A Soliman. 2008. A survey of top-k query processing techniques in relational database systems. ACM Computing Surveys (CSUR) (2008).
[19]
Thorsten Joachims Training linear SVMs in linear time. In KDD '06.
[20]
Ken CK Lee, Baihua Zheng, and Wang-Chien Lee. 2008. Ranked reverse nearest neighbor search. IEEE Transactions on Knowledge and Data Engineering (2008).
[21]
Jiaheng Lu, Ying Lu, and Gao Cong Reverse spatial and textual k nearest neighbor search SIGMOD '11.
[22]
Cheng Luo, Feng Yu, Wen-Chi Hou, Zhewei Jiang, Dunren Che, and Shan He. 2011. IRTA: An Improved Threshold Algorithm for Reverse Top-k Queries. ICEIS (1).
[23]
Rishabh Mehrotra, Ashton Anderson, Fernando Diaz, Amit Sharma, Hanna Wallach, and Emine Yilmaz Auditing Search Engines for Differential Satisfaction Across Demographics WWW '17.
[24]
Mainack Mondal, Peter Druschel, Krishna P Gummadi, and Alan Mislove Beyond access control: Managing online privacy via exposure USEC '14.
[25]
Mainack Mondal, Johnnatan Messias, Saptarshi Ghosh, Krishna P Gummadi, and Aniket Kate Forgetting in Social Media: Understanding and Controlling Longitudinal Exposure of Socially Shared Data. In SOUPS '16.
[26]
Parikshit Ram and Alexander G Gray. 2012. Maximum inner-product search using tree data-structures. arXiv preprint arXiv:1202.6101 (2012).
[27]
Roman Schlegel, Apu Kapadia, and Adam J Lee Eyeing your exposure: quantifying and controlling information sharing for improved privacy SOUPS '11.
[28]
Xuehua Shen, Bin Tan, and ChengXiang Zhai. 2007. Privacy protection in personalized search. In ACM SIGIR Forum.
[29]
Reza Shokri, George Theodorakopoulos, George Danezis, Jean-Pierre Hubaux, and Jean-Yves Le Boudec Quantifying location privacy: the case of sporadic location exposure PETS '11.
[30]
Yufei Tao, Dimitris Papadias, and Xiang Lian Reverse kNN search in arbitrary dimensionality. In VLDB '04.
[31]
Yla R Tausczik and James W Pennebaker. 2010. The psychological meaning of words: LIWC and computerized text analysis methods. Journal of language and social psychology (2010).
[32]
Jaime Teevan, Daniel Ramage, and Merredith Ringel Morris # TwitterSearch: a comparison of microblog search and web search WSDM '11.
[33]
Akrivi Vlachou, Christos Doulkeridis, Yannis Kotidis, and Kjetil Norvag. 2011. Monochromatic and bichromatic reverse top-k queries. IEEE Transactions on Knowledge and Data Engineering (2011).
[34]
Peng Wang and Chinya V Ravishankar On masking topical intent in keyword search. In ICDE '14.
[35]
Yabo Xu, Ke Wang, Benyu Zhang, and Zheng Chen Privacy-enhancing personalized web search. In WWW '07.
[36]
Man Lung Yiu and Nikos Mamoulis. 2007. Reverse nearest neighbors search in ad hoc subspaces. IEEE Transactions on Knowledge and Data Engineering (2007).
[37]
Oren Zamir, Oren Etzioni, Omid Madani, and Richard M Karp Fast and Intuitive Clustering of Web Documents. KDD '97.
[38]
Sicong Zhang, Hui Yang, and Lisa Singh Anonymizing Query Logs by Differential Privacy. In SIGIR '16.
[39]
Zhao Zhang, Cheqing Jin, and Qiangqiang Kang Reverse k-ranks query VLDB '14.
[40]
Yun Zhu, Li Xiong, and Christopher Verdery Anonymizing user profiles for personalized web search WWW '10.

Cited By

View all
  • (2022)Exposing Query Identification for Search TransparencyProceedings of the ACM Web Conference 202210.1145/3485447.3512262(3662-3672)Online publication date: 25-Apr-2022
  • (2018)Learning to lurker rank: an evaluation of learning-to-rank methods for lurking behavior analysisSocial Network Analysis and Mining10.1007/s13278-018-0516-z8:1Online publication date: 1-Jun-2018

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
November 2017
2604 pages
ISBN:9781450349185
DOI:10.1145/3132847
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 November 2017

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. information retrieval
  2. privacy
  3. ranking exposure
  4. search exposure
  5. social search

Qualifiers

  • Research-article

Funding Sources

  • Alexander von Humboldt Foundation
  • European Research Council

Conference

CIKM '17
Sponsor:

Acceptance Rates

CIKM '17 Paper Acceptance Rate 171 of 855 submissions, 20%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Upcoming Conference

CIKM '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)2
  • Downloads (Last 6 weeks)0
Reflects downloads up to 06 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2022)Exposing Query Identification for Search TransparencyProceedings of the ACM Web Conference 202210.1145/3485447.3512262(3662-3672)Online publication date: 25-Apr-2022
  • (2018)Learning to lurker rank: an evaluation of learning-to-rank methods for lurking behavior analysisSocial Network Analysis and Mining10.1007/s13278-018-0516-z8:1Online publication date: 1-Jun-2018

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media