DOI: 10.1145/1718487.1718489
research-article

Leveraging temporal dynamics of document content in relevance ranking

Published: 4 February 2010

ABSTRACT

Many web documents are dynamic, with content changing in varying amounts at varying frequencies. However, current document search algorithms have a static view of the document content, with only a single version of the document in the index at any point in time. In this paper, we present the first published analysis of using the temporal dynamics of document content to improve relevance ranking. We show that there is a strong relationship between the amount and frequency of content change and relevance. We develop a novel probabilistic document ranking algorithm that allows differential weighting of terms based on their temporal characteristics. By leveraging such content dynamics we show significant performance improvements for navigational queries.
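The abstract does not give the ranking formula, so the following is only an illustrative sketch of what "differential weighting of terms based on their temporal characteristics" could look like, not the authors' algorithm. It assumes a document is available as a list of tokenized snapshots, buckets each term by the fraction of snapshots it appears in, and scores a query with a weighted mixture of per-bucket language models using Dirichlet smoothing. The function names, the persistence thresholds `hi` and `lo`, the bucket weights, and the smoothing parameter `mu` are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def term_persistence(versions):
    """Fraction of snapshots in which each term appears at least once."""
    n = len(versions)
    seen = Counter()
    for tokens in versions:
        seen.update(set(tokens))
    return {term: count / n for term, count in seen.items()}


def split_by_persistence(versions, hi=0.9, lo=0.3):
    """Bucket term occurrences into long-, mid-, and short-lived vocabularies.

    The thresholds hi and lo are arbitrary values assumed for illustration.
    """
    persistence = term_persistence(versions)
    buckets = {"long": Counter(), "mid": Counter(), "short": Counter()}
    for tokens in versions:
        for term in tokens:
            p = persistence[term]
            name = "long" if p >= hi else "short" if p <= lo else "mid"
            buckets[name][term] += 1
    return buckets


def score(query, versions, collection, weights, mu=10.0):
    """Query log-likelihood under a weighted mixture of per-bucket models.

    collection is a Counter of term counts over the whole corpus, used as
    the background model; weights maps bucket name to mixture weight and
    is assumed to be positive and to sum to 1.
    """
    buckets = split_by_persistence(versions)
    coll_total = sum(collection.values())
    log_p = 0.0
    for q in query:
        # Add-one smoothed background probability, so it is never zero.
        p_coll = (collection.get(q, 0) + 1) / (coll_total + len(collection) + 1)
        p = 0.0
        for name, w in weights.items():
            tf = buckets[name]
            # Dirichlet-smoothed estimate within this temporal bucket.
            p += w * (tf[q] + mu * p_coll) / (sum(tf.values()) + mu)
        log_p += math.log(p)
    return log_p


# Toy data: "acme" is stable on page_a but appears in only one snapshot of page_b.
page_a = [["acme", "home", f"news{i}"] for i in range(1, 5)]
page_b = [["blog", "post", t] for t in ("acme", "x", "y", "z")]
corpus = Counter(t for page in (page_a, page_b) for snap in page for t in snap)
weights = {"long": 0.8, "mid": 0.15, "short": 0.05}

score_a = score(["acme"], page_a, corpus, weights)
score_b = score(["acme"], page_b, corpus, weights)
```

With weights that favor long-lived terms, the page on which the query term persists across snapshots scores higher than the page on which it appears only briefly. That is one plausible reading of why such a model might help navigational queries, though the abstract itself does not state the mechanism.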


Published in

WSDM '10: Proceedings of the third ACM international conference on Web search and data mining
February 2010
468 pages
ISBN: 9781605588896
DOI: 10.1145/1718487

      Copyright © 2010 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 498 of 2,863 submissions, 17%
