skip to main content
10.1145/1390334.1390359acmconferencesArticle/Chapter ViewAbstractPublication PagesirConference Proceedingsconference-collections
research-article

ResIn: a combination of results caching and index pruning for high-performance web search engines

Published: 20 July 2008 Publication History

Abstract

Results caching is an efficient technique for reducing the query processing load, hence it is commonly used in real search engines. This technique, however, bounds the maximum hit rate due to the large fraction of singleton queries, which is an important limitation. In this paper we propose ResIn - an architecture that uses a combination of results caching and index pruning to overcome this limitation.
We argue that results caching is an inexpensive and efficient way to reduce the query processing load and show that it is cheaper to implement compared to a pruned index. At the same time, we show that index pruning performance is fundamentally affected by the changes in the query traffic that the results cache induces. We experiment with real query logs and a large document collection, and show that the combination of both techniques enables efficient reduction of the query processing costs and thus is practical to use in Web search engines.

References

[1]
V. N. Anh and A. Moffat. Pruned Query Evaluation Using Pre-computed Impacts. In SIGIR, 2006.
[2]
R. Baeza-Yates, A. Gionis, F. Junqueira, V. Murdock, V. Plachouras, and F. Silvestri. The Impact of Caching on Search Engines. In SIGIR, 2007.
[3]
R. Baeza-Yates, F. Junqueira, V. Plachouras, and H. F. Witschel. Admission Policies for Caches of Search Engine Results. In SPIRE, 2007.
[4]
R. Blanco and A. Barreiro. Static Pruning of Terms in Inverted Files. In ECIR, 2007.
[5]
P. Boldi, B. Codenotti, M. Santini, and S. Vigna. Ubicrawler: A Scalable Fully Distributed Web Crawler. Software: Practice & Experience, 34(8), 2004.
[6]
P. Boldi and S. Vigna. The WebGraph framework I: Compression techniques. In WWW, 2004.
[7]
E. A. Brewer. Lessons from Giant-scale Services. IEEE Internet Computing, 5(4), 2001.
[8]
S. Buttcher and C. L. A. Clarke. Efficiency vs. Effectiveness in Terabyte-scale Information Retrieval. In TREC, 2005.
[9]
S. Buttcher and C. L. A. Clarke. A Document-centric Approach to Static Index Pruning in Text Retrieval Systems. In CIKM, 2006.
[10]
D. Carmel, D. Cohen, R. Fagin, E. Farchi, M. Herscovici, Y. S. Maarek, and A. Soffer. Static Index Pruning for Information Retrieval Systems. In SIGIR, 2001.
[11]
C. Castillo, D. Donato, L. Becchetti, P. Boldi, S. Leonardi, M. Santini, and S. Vigna. A Reference Collection for Web Spam. SIGIR Forum, 40(2), 2006.
[12]
N. Craswell, S. Robertson, H. Zaragoza, and M. Taylor. Relevance Weighting for Query Independent Evidence. In SIGIR, 2005.
[13]
E. S. de Moura, C. F. dos Santos, D. R. Fernandes, A. S. da Silva, P. Calado, and M. A. Nascimento. Improving Web Search Efficiency via a Locality Based Static Pruning Method. In WWW, 2005.
[14]
R. Fagin, A. Lotem, and M. Naor. Optimal Aggregation Algorithms for Middleware. In PODS, 2001.
[15]
T. Fagni, R. Perego, F. Silvestri, and S. Orlando. Boosting the Performance of Web Search Engines: Caching and Prefetching Query Results by Exploiting Historical Usage Data. ACM Trans. Inf. Syst., 24(1), 2006.
[16]
X. Long and T. Suel. Optimized Query Execution in Large Search Engines with Global Page Ordering. In VLDB, 2003.
[17]
X. Long and T. Suel. Three-level Caching for Efficient Query Processing in Large Web Search Engines. In WWW, 2005.
[18]
A. Ntoulas and J. Cho. Pruning Policies for Two-Tiered Inverted Index with Correctness Guarantee. In SIGIR, 2007.
[19]
T. Strohman and W. B. Croft. E±cient Document Retrieval in Main Memory. In SIGIR, 2007.
[20]
J. Teevan, E. Adar, R. Jones, and M. A. S. Potts. Information Re-retrieval: Repeat Queries in Yahoo's Logs. In SIGIR, 2007.
[21]
Y. Tsegay, A. Turpin, and J. Zobel. Dynamic Index Pruning for Effective Caching. In CIKM, 2007.
[22]
J. Zhang, X. Long, and T. Suel. Performance of Compressed Inverted List Caching in Search Engines. In WWW, 2008.

Cited By

View all
  • (2024)Neural Passage Quality Estimation for Static PruningProceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3626772.3657765(174-185)Online publication date: 10-Jul-2024
  • (2024)Diversity-aware strategies for static index pruningInformation Processing & Management10.1016/j.ipm.2024.10379561:5(103795)Online publication date: Sep-2024
  • (2023)Static Pruning for Multi-Representation Dense RetrievalProceedings of the ACM Symposium on Document Engineering 202310.1145/3573128.3604896(1-10)Online publication date: 22-Aug-2023
  • Show More Cited By

Index Terms

  1. ResIn: a combination of results caching and index pruning for high-performance web search engines

      Recommendations

      Comments

      Information & Contributors

      Information

      Published In

      cover image ACM Conferences
      SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
      July 2008
      934 pages
      ISBN:9781605581644
      DOI:10.1145/1390334
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Sponsors

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 20 July 2008

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. caching
      2. pruning
      3. query logs
      4. web search

      Qualifiers

      • Research-article

      Conference

      SIGIR '08
      Sponsor:

      Acceptance Rates

      Overall Acceptance Rate 792 of 3,983 submissions, 20%

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)4
      • Downloads (Last 6 weeks)2
      Reflects downloads up to 07 Mar 2025

      Other Metrics

      Citations

      Cited By

      View all
      • (2024)Neural Passage Quality Estimation for Static PruningProceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3626772.3657765(174-185)Online publication date: 10-Jul-2024
      • (2024)Diversity-aware strategies for static index pruningInformation Processing & Management10.1016/j.ipm.2024.10379561:5(103795)Online publication date: Sep-2024
      • (2023)Static Pruning for Multi-Representation Dense RetrievalProceedings of the ACM Symposium on Document Engineering 202310.1145/3573128.3604896(1-10)Online publication date: 22-Aug-2023
      • (2020)Pre-indexing Pruning StrategiesString Processing and Information Retrieval10.1007/978-3-030-59212-7_13(177-193)Online publication date: 17-Sep-2020
      • (2019)Cloudy Knapsack Algorithm for Offloading Tasks from Large Scale Distributed ApplicationsIEEE Transactions on Cloud Computing10.1109/TCC.2017.27137767:4(949-963)Online publication date: 1-Oct-2019
      • (2018)Exploring Size-Speed Trade-Offs in Static Index Pruning2018 IEEE International Conference on Big Data (Big Data)10.1109/BigData.2018.8622177(1093-1100)Online publication date: Dec-2018
      • (2018)Two-dimensional indexing to provide one-integrated-memory view of distributed memory for a massively-parallel search engineWorld Wide Web10.1007/s11280-018-0647-1Online publication date: 13-Nov-2018
      • (2017)LSH-Based Probabilistic Pruning of Inverted Indices for Sets and Ranked ListsProceedings of the 20th International Workshop on the Web and Databases10.1145/3068839.3068845(23-28)Online publication date: 14-May-2017
      • (2017)Performance improvements for search systems using an integrated cache of lists + intersectionsInformation Retrieval Journal10.1007/s10791-017-9299-520:3(172-198)Online publication date: 11-Mar-2017
      • (2017)Waves: a fast multi-tier top-k query processing algorithmInformation Retrieval Journal10.1007/s10791-017-9298-620:3(292-316)Online publication date: 13-Mar-2017
      • Show More Cited By

      View Options

      Login options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Figures

      Tables

      Media

      Share

      Share

      Share this Publication link

      Share on social media