ABSTRACT
Many web documents are dynamic, with content changing in varying amounts at varying frequencies. However, current document search algorithms have a static view of the document content, with only a single version of the document in the index at any point in time. In this paper, we present the first published analysis of using the temporal dynamics of document content to improve relevance ranking. We show that there is a strong relationship between the amount and frequency of content change and relevance. We develop a novel probabilistic document ranking algorithm that allows differential weighting of terms based on their temporal characteristics. By leveraging such content dynamics we show significant performance improvements for navigational queries.
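As a rough illustration of what "differential weighting of terms based on their temporal characteristics" could look like, the sketch below scores a query against the latest version of a document with a smoothed query-likelihood model, where each term's count is scaled by how persistently the term has appeared across stored versions. This is not the ranking model from the paper, only a minimal sketch under stated assumptions: the function names, the three persistence classes, the thresholds (0.8 and 0.3), the class weights, and the Jelinek-Mercer smoothing parameter are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def temporal_term_weights(versions, long_w=0.6, mid_w=0.3, short_w=0.1):
    """Weight each term by the fraction of versions that contain it.

    The thresholds and weights are illustrative assumptions, not values
    taken from the paper.
    """
    if not versions:
        raise ValueError("at least one document version is required")
    presence = Counter()
    for tokens in versions:
        presence.update(set(tokens))  # count each term once per version
    weights = {}
    for term, count in presence.items():
        fraction = count / len(versions)
        if fraction >= 0.8:
            weights[term] = long_w    # persistent term
        elif fraction >= 0.3:
            weights[term] = mid_w     # intermittently present term
        else:
            weights[term] = short_w   # transient term
    return weights


def score(query, versions, collection_prob, smoothing=0.5, floor=1e-9):
    """Log query likelihood using temporally weighted term counts.

    versions: token lists ordered oldest to newest (hypothetical input format).
    collection_prob: background term probabilities for smoothing.
    """
    weights = temporal_term_weights(versions)
    latest = Counter(versions[-1])
    total = sum(weights[t] * tf for t, tf in latest.items())
    log_likelihood = 0.0
    for term in query:
        weighted_tf = weights.get(term, 0.0) * latest.get(term, 0)
        doc_prob = weighted_tf / total if total > 0 else 0.0
        background = collection_prob.get(term, floor)
        # Jelinek-Mercer interpolation between document and collection models
        log_likelihood += math.log(
            (1 - smoothing) * doc_prob + smoothing * background
        )
    return log_likelihood


# Page A has always contained "login"; page B only added it in the latest version.
page_a = [["bank", "login", f"news{i}"] for i in range(1, 5)]
page_b = [["bank", "home", f"news{i}"] for i in range(1, 4)] + [["bank", "login", "news4"]]
background = {"login": 0.01}

score_a = score(["login"], page_a, background)
score_b = score(["login"], page_b, background)
```

Under these assumed weights, page A ranks above page B for the query "login" because the term is persistent on A and transient on B, even though both latest versions contain it exactly once.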