DOI: 10.1145/1835449.1835466

Caching search engine results over incremental indices

Published: 19 July 2010

Abstract

A Web search engine must update its index periodically to incorporate changes to the Web. We argue in this paper that index updates fundamentally impact the design of search engine result caches, a performance-critical component of modern search engines. Index updates lead to the problem of cache invalidation: invalidating cached entries of queries whose results have changed. Naive approaches, such as flushing the entire cache upon every index update, lead to poor performance and, in fact, render caching futile when the frequency of updates is high. Solving the invalidation problem efficiently corresponds to predicting accurately which queries will produce different results if re-evaluated, given the actual changes to the index.
To this end, we propose a framework for developing invalidation predictors and define metrics to evaluate invalidation schemes. We describe concrete predictors using this framework and compare them against a baseline that uses a cache invalidation scheme based on time-to-live (TTL). Evaluation over Wikipedia documents using a query log from the Yahoo! search engine shows that selective invalidation of cached search results can lower the number of unnecessary query evaluations by as much as 30% compared to a baseline scheme, while returning results of similar freshness. In general, our predictors enable fewer unnecessary invalidations and fewer stale results compared to a TTL-only scheme for similar freshness of results.
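
To make the ideas in the abstract concrete, here is a minimal sketch of a result cache that combines a TTL check with selective invalidation, plus a helper that counts unnecessary invalidations and stale results. It is an illustration only and not the paper's method: the term-overlap predictor, the class and function names (`ResultCache`, `invalidate_for_update`, `score_predictions`), and the sample queries are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    results: list[str]   # cached top-k document ids
    cached_at: float     # time at which the results were computed


class ResultCache:
    """Query-result cache with a TTL check and selective invalidation."""

    def __init__(self, ttl: float):
        self.ttl = ttl
        self.entries: dict[str, CacheEntry] = {}

    def put(self, query: str, results: list[str], now: float) -> None:
        self.entries[query] = CacheEntry(list(results), now)

    def get(self, query: str, now: float):
        """Return cached results, or None on a miss or an expired entry."""
        entry = self.entries.get(query)
        if entry is None or now - entry.cached_at > self.ttl:
            self.entries.pop(query, None)  # TTL baseline: expire by age only
            return None
        return entry.results

    def invalidate_for_update(self, changed_terms: set[str]) -> set[str]:
        """Hypothetical predictor: drop every cached query that shares at
        least one term with the documents touched by an index update."""
        predicted = {q for q in self.entries if changed_terms & set(q.split())}
        for q in predicted:
            del self.entries[q]
        return predicted


def score_predictions(invalidated: set[str], truly_changed: set[str],
                      cached: set[str]) -> tuple[int, int]:
    """Count (unnecessary invalidations, stale results left in the cache)."""
    unnecessary = invalidated - truly_changed
    stale = (cached & truly_changed) - invalidated
    return len(unnecessary), len(stale)


# Usage: three cached queries, then an index update touching "tutorial".
cache = ResultCache(ttl=60.0)
cache.put("python tutorial", ["d1", "d2"], now=0.0)
cache.put("jaguar speed", ["d7"], now=0.0)
cache.put("tutorial videos", ["d2", "d9"], now=0.0)
cached_before = set(cache.entries)

invalidated = cache.invalidate_for_update({"tutorial"})
print(sorted(invalidated))  # ['python tutorial', 'tutorial videos']

# Assumed ground truth: only "python tutorial" would return new results.
print(score_predictions(invalidated, {"python tutorial"}, cached_before))  # (1, 0)
```

In this toy run the predictor leaves no stale entry behind but invalidates one query whose results did not change. That is the trade-off between unnecessary invalidations and stale results that the abstract refers to.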

    Published In

    SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
    July 2010
    944 pages
    ISBN: 9781450301534
    DOI: 10.1145/1835449

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. real-time indexing
    2. search engine caching

    Qualifiers

    • Research-article

    Conference

    SIGIR '10

    Acceptance Rates

    SIGIR '10 Paper Acceptance Rate 87 of 520 submissions, 17%;
    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Cited By

    • (2024) Caching in Forschung und Industrie. Schnelles und skalierbares Cloud-Datenmanagement. 10.1007/978-3-031-54388-3_5 (91-140). Online publication date: 3-May-2024
    • (2024) HTTP für global verteilte Anwendungen. Schnelles und skalierbares Cloud-Datenmanagement. 10.1007/978-3-031-54388-3_3 (35-60). Online publication date: 3-May-2024
    • (2022) Scalability Challenges in Web Search Engines. Online publication date: 10-Mar-2022
    • (2020) Caching in Research and Industry. Fast and Scalable Cloud Data Management. 10.1007/978-3-030-43506-6_5 (85-130). Online publication date: 15-May-2020
    • (2020) HTTP for Globally Distributed Applications. Fast and Scalable Cloud Data Management. 10.1007/978-3-030-43506-6_3 (33-55). Online publication date: 15-May-2020
    • (2018) Index Shard Replication Strategies for Improving Resource Utilization in Large Scale Search Engines. Proceedings of the 47th International Conference on Parallel Processing. 10.1145/3225058.3225102 (1-10). Online publication date: 13-Aug-2018
    • (2018) On the Volatility of Commercial Search Engines and its Impact on Information Retrieval Research. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. 10.1145/3209978.3210088 (1105-1108). Online publication date: 27-Jun-2018
    • (2017) Quaestor. Proceedings of the VLDB Endowment 10:12. 10.14778/3137765.3137773 (1670-1681). Online publication date: 1-Aug-2017
    • (2017) Search Result Prefetching on Desktop and Mobile. ACM Transactions on Information Systems 35:3. 10.1145/3015466 (1-34). Online publication date: 12-May-2017
    • (2016) Scalability and Efficiency Challenges in Large-Scale Web Search Engines. Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval. 10.1145/2911451.2914808 (1223-1226). Online publication date: 7-Jul-2016
