DOI: 10.1145/1458082.1458195
research-article

Vanity fair: privacy in querylog bundles

Published: 26 October 2008

Abstract

A recently proposed approach to address privacy concerns in storing web search querylogs is bundling logs of multiple users together. In this work we investigate privacy leaks that are possible even when querylogs from multiple users are bundled together, without any user or session identifiers. We begin by quantifying users' propensity to issue own-name vanity queries and geographically revealing queries. We show that these propensities interact badly with two forms of vulnerabilities in the bundling scheme. First, structural vulnerabilities arise due to properties of the heavy tail of the user search frequency distribution, or the distribution of locations that appear within a user's queries. These heavy tails may cause a user to appear visibly different from other users in the same bundle. Second, we demonstrate analytical vulnerabilities based on the ability to separate the queries in a bundle into threads corresponding to individual users. These vulnerabilities raise privacy issues suggesting that bundling must be handled with great care.
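The structural vulnerability described above can be illustrated with a toy calculation. The following is a minimal sketch, not the authors' method: it assumes a hypothetical bundle given as a mapping from user id to that user's queries, with invented data, and reports what fraction of the bundle one heavy user contributes. The function name `dominant_share` and the user ids are assumptions made for illustration. In a real bundle the ids would be stripped, and they appear here only as the ground truth an attacker would try to recover.

```python
from collections import Counter


def dominant_share(bundle):
    """Return (user, share) for the user contributing the most queries.

    `bundle` is a hypothetical mapping from user id to that user's queries.
    """
    counts = Counter({user: len(queries) for user, queries in bundle.items()})
    total = sum(counts.values())
    if total == 0:
        raise ValueError("bundle contains no queries")
    user, top = counts.most_common(1)[0]
    return user, top / total


# Toy bundle (invented data): one heavy user and three light users.
bundle = {
    "u1": ["napa hotels", "napa wine tours", "jane doe", "jane doe napa"] * 5,
    "u2": ["weather"],
    "u3": ["news", "sports scores"],
    "u4": ["pizza near me"],
}

user, share = dominant_share(bundle)
print(f"{user} contributes {share:.0%} of the bundle")
```

When a single user accounts for most of a bundle, the bundle's location and name queries are likely to be that user's, which is the sense in which a heavy-tailed search frequency distribution can undermine bundling.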




      Published In

      CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management
      October 2008
      1562 pages
      ISBN:9781595939913
      DOI:10.1145/1458082


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. privacy
      2. querylogs

      Qualifiers

      • Research-article

      Conference

      CIKM08: Conference on Information and Knowledge Management
      October 26 - 30, 2008
      Napa Valley, California, USA

      Acceptance Rates

      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%


      Cited By

      • (2021) Efficient Query Obfuscation with Keyqueries. IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, pages 154-161. DOI: 10.1145/3486622.3493950. Online publication date: 14-Dec-2021
      • (2018) On the Interplay Between Search Behavior and Collections in Digital Libraries and Archives. Proceedings of the 2018 Conference on Human Information Interaction & Retrieval, pages 339-341. DOI: 10.1145/3176349.3176350. Online publication date: 1-Mar-2018
      • (2018) Metadata categorization for identifying search patterns in a digital library. Journal of Documentation. DOI: 10.1108/JD-06-2018-0087. Online publication date: 10-Dec-2018
      • (2017) Deriving Differentially Private Session Logs for Query Suggestion. Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval, pages 51-58. DOI: 10.1145/3121050.3121076. Online publication date: 1-Oct-2017
      • (2014) Web search query privacy. Journal of Computer Security 22:1, pages 155-199. DOI: 10.5555/2590636.2590640. Online publication date: 1-Jan-2014
      • (2014) Searching for myself. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 3675-3684. DOI: 10.1145/2556288.2557356. Online publication date: 26-Apr-2014
      • (2014) Social Listening. Trends and Challenges in Digital Business Innovation, pages 67-87. DOI: 10.1007/978-3-319-04307-4_4. Online publication date: 5-Feb-2014
      • (2014) Profiling social networks to provide useful and privacy-preserving web search. Journal of the Association for Information Science and Technology 65:12, pages 2444-2458. DOI: 10.1002/asi.23144. Online publication date: 1-Dec-2014
      • (2012) Considerations for recruiting contributions to anonymised data sets. International Journal of Technology Enhanced Learning 4:1/2, pages 85-98. DOI: 10.1504/IJTEL.2012.048315. Online publication date: 1-Jul-2012
      • (2012) Single-party private web search. Proceedings of the 2012 Tenth Annual International Conference on Privacy, Security and Trust (PST), pages 1-8. DOI: 10.1109/PST.2012.6297913. Online publication date: 16-Jul-2012
