DOI: 10.1145/1242572.1242657
Article

On anonymizing query logs via token-based hashing

Published: 08 May 2007

ABSTRACT

In this paper we study the privacy preservation properties of a specific technique for query log anonymization: token-based hashing. In this approach, each query is tokenized, and then a secure hash function is applied to each token. We show that statistical techniques may be applied to partially compromise the anonymization. We then analyze the specific risks that arise from these partial compromises, focusing on the revelation of identity from unambiguous names, addresses, and so forth, and on the revelation of facts associated with an identity that are deemed to be highly sensitive. Our goal in this work is twofold: to show that token-based hashing is unsuitable for anonymization, and to present a concrete analysis of specific techniques that may be effective in breaching privacy, against which other anonymization schemes should be measured.
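The scheme described in the abstract can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: whitespace tokenization, lowercasing, SHA-256, the `salt` parameter, and the names `hash_token` and `anonymize_query` are all assumptions made for the example. It also shows one property that statistical attacks could plausibly exploit: because hashing is deterministic, each token always maps to the same hash, so token frequencies and co-occurrences in the log survive anonymization.

```python
import hashlib
from collections import Counter
from typing import List


def hash_token(token: str, salt: bytes = b"") -> str:
    """Hash one token with SHA-256 (hash choice and salt are assumptions)."""
    return hashlib.sha256(salt + token.encode("utf-8")).hexdigest()


def anonymize_query(query: str, salt: bytes = b"") -> List[str]:
    """Tokenize on whitespace (an assumption) and hash each token separately."""
    return [hash_token(tok, salt) for tok in query.lower().split()]


# A toy query log, invented for illustration.
log = ["new york hotels", "cheap hotels", "new york weather"]
hashed_log = [anonymize_query(q, salt=b"secret") for q in log]

# Count token occurrences before and after hashing.
plain_counts = Counter(tok for q in log for tok in q.split())
hashed_counts = Counter(h for q in hashed_log for h in q)

# The multiset of frequencies is identical: hashing hides the token
# strings but not how often each token occurs or which tokens co-occur.
same_profile = sorted(plain_counts.values()) == sorted(hashed_counts.values())
```

In this toy log, the hash of "hotels" appears in both the first and second hashed queries, so anyone holding a reference corpus with similar token statistics has a starting point for guessing which hash corresponds to which word.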

Published in

WWW '07: Proceedings of the 16th international conference on World Wide Web
May 2007
1382 pages
ISBN: 9781595936547
DOI: 10.1145/1242572

Copyright © 2007 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall acceptance rate: 1,899 of 8,196 submissions, 23%
