DOI: 10.1145/1321440.1321573
poster

"I know what you did last summer": query logs and user privacy

Published: 06 November 2007

Abstract

We investigate the subtle cues to user identity that may be exploited in attacks on the privacy of users in web search query logs. We study the application of simple classifiers to map a sequence of queries into the gender, age, and location of the user issuing the queries. We then show how these classifiers may be carefully combined at multiple granularities to map a sequence of queries into a set of candidate users that is 300-600 times smaller than random chance would allow. We show that this approach remains accurate even after removing personally identifiable information such as names/numbers or limiting the size of the query log.
We also present a new attack in which a real-world acquaintance of a user attempts to identify that user in a large query log, using personal information. We show that combinations of small pieces of information about terms a user would probably search for can be highly effective in identifying the sessions of that user.
We conclude that known schemes to release even heavily scrubbed query logs that contain session information have significant privacy risks.
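
The two ideas in the abstract can be illustrated with a minimal sketch on toy data. It is not the authors' method: it assumes perfectly accurate attribute predictions, whereas the paper's classifiers are imperfect and are combined at multiple granularities. All names here (`User`, `narrow_candidates`, `reduction_factor`, `sessions_matching_terms`), the attribute buckets, and the sample queries are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: str
    gender: str      # hypothetical label, e.g. "F" or "M"
    age_bucket: str  # hypothetical bucket, e.g. "25-34"
    zip3: str        # hypothetical coarse location: first three ZIP digits


def narrow_candidates(users, predicted):
    """Keep only users whose attributes equal every predicted attribute."""
    return [
        u for u in users
        if all(getattr(u, name) == value for name, value in predicted.items())
    ]


def reduction_factor(users, candidates):
    """How many times smaller the candidate set is than the full population."""
    return len(users) / len(candidates) if candidates else float("inf")


def sessions_matching_terms(sessions, known_terms, min_hits):
    """Return ids of sessions containing at least min_hits of the known terms."""
    matches = []
    for session_id, queries in sessions.items():
        text = " ".join(queries).lower()
        hits = sum(1 for term in known_terms if term.lower() in text)
        if hits >= min_hits:
            matches.append(session_id)
    return matches


# Toy population: 2 genders x 3 age buckets x 4 regions, 5 users per cell.
GENDERS = ["F", "M"]
AGES = ["18-24", "25-34", "35-49"]
ZIPS = ["021", "100", "606", "941"]
users = [
    User(f"u-{g}-{a}-{z}-{i}", g, a, z)
    for g in GENDERS for a in AGES for z in ZIPS for i in range(5)
]

# Assumed classifier outputs for one query sequence (taken as correct here).
predicted = {"gender": "F", "age_bucket": "25-34", "zip3": "941"}
candidates = narrow_candidates(users, predicted)
print(len(users), len(candidates), reduction_factor(users, candidates))

# Acquaintance attack: a few terms the target would probably search for.
sessions = {
    "s1": ["cheap flights to lisbon", "golden retriever training"],
    "s2": ["golden retriever puppies", "marathon training plan", "lisbon hotels"],
    "s3": ["marathon results", "pizza near me"],
}
known_terms = ["golden retriever", "marathon", "lisbon"]
print(sessions_matching_terms(sessions, known_terms, min_hits=3))
```

Each attribute on its own is weak evidence, but intersecting three of them shrinks the toy population from 120 users to 5. In the same way, any single term matches several sessions, while requiring all three leaves only one.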

    Published In

    CIKM '07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
    November 2007
    1048 pages
    ISBN: 9781595938039
    DOI: 10.1145/1321440
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. k-anonymity
    2. privacy
    3. query log analysis

    Qualifiers

    • Poster

    Conference

    CIKM07

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Cited By

    • (2025) Contextual Inference From Sparse Shopping Transactions Based on Motif Patterns. IEEE Transactions on Knowledge and Data Engineering 37:2 (572-583). DOI: 10.1109/TKDE.2024.3452638. Online publication date: Feb-2025
    • (2023) Finding the Age and Education Level of Bulgarian-Speaking Internet Users Using Keystroke Dynamics. Eng 4:4 (2711-2721). DOI: 10.3390/eng4040154. Online publication date: 25-Oct-2023
    • (2023) Privacy-Preserving Redaction of Diagnosis Data through Source Code Analysis. Proceedings of the 35th International Conference on Scientific and Statistical Database Management (1-4). DOI: 10.1145/3603719.3603734. Online publication date: 10-Jul-2023
    • (2023) Web Privacy: A Formal Adversarial Model for Query Obfuscation. IEEE Transactions on Information Forensics and Security 18 (2132-2143). DOI: 10.1109/TIFS.2023.3262123. Online publication date: 2023
    • (2023) Private Web Search Using Proxy-Query Based Query Obfuscation Scheme. IEEE Access 11 (3607-3625). DOI: 10.1109/ACCESS.2023.3235000. Online publication date: 2023
    • (2023) Taming the round efficiency of cryptographic protocols for private web search schemes. Information Sciences 621 (1-21). DOI: 10.1016/j.ins.2022.11.003. Online publication date: Apr-2023
    • (2023) On the self-adjustment of privacy safeguards for query log streams. Computers and Security 134:C. DOI: 10.1016/j.cose.2023.103450. Online publication date: 1-Nov-2023
    • (2022) Who Knows I Like Jelly Beans? An Investigation Into Search Privacy. Proceedings on Privacy Enhancing Technologies 2022:2 (426-446). DOI: 10.2478/popets-2022-0053. Online publication date: 3-Mar-2022
    • (2022) MLI: A Multi-level Inference Mechanism for User Attributes in Social Networks. ACM Transactions on Information Systems 41:2 (1-30). DOI: 10.1145/3545797. Online publication date: 21-Dec-2022
    • (2022) Cryptography, Trust and Privacy: It's Complicated. Proceedings of the 2022 Symposium on Computer Science and Law (167-179). DOI: 10.1145/3511265.3550443. Online publication date: 1-Nov-2022
