DOI:10.1145/1835449.1835467

Query forwarding in geographically distributed search engines

Published: 19 July 2010

Abstract

Query forwarding is an important technique for preserving result quality in distributed search engines whose index is geographically partitioned over multiple search sites. The key component of query forwarding is the thresholding algorithm that makes the forwarding decisions. In this paper, we propose a linear-programming-based thresholding algorithm that significantly outperforms the current state of the art in terms of achieved search efficiency. Moreover, we evaluate a greedy heuristic for partial index replication and investigate the impact of result cache freshness on query forwarding performance. Finally, we present some optimizations that further improve performance under certain conditions. We evaluate the proposed techniques by simulations over a real-life setting, using a large query log and a document collection obtained from Yahoo!.
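
The role of a thresholding algorithm can be illustrated with a minimal sketch. It is not the linear-programming-based algorithm proposed in the paper. It assumes a simpler scheme in which the local site holds, for each remote site, an upper bound on the score any remote document can obtain for each query term, and forwards the query only to sites whose summed bound exceeds the local k-th best score. The function name `sites_to_forward`, the `remote_term_max` table, and the additive scoring model are all assumptions made for illustration.

```python
from typing import Dict, List


def sites_to_forward(
    query_terms: List[str],
    local_kth_score: float,
    remote_term_max: Dict[str, Dict[str, float]],
) -> List[str]:
    """Return the remote sites that might hold a better result than the local top k.

    Assumptions (not taken from the paper): document scores are sums of
    per-term scores, and remote_term_max[site][term] is an upper bound on
    the score that any document at `site` can receive for `term`.
    """
    targets = []
    for site, term_max in remote_term_max.items():
        # Upper bound on the score of any document at this site.
        # A term absent from the table is assumed to contribute nothing.
        bound = sum(term_max.get(term, 0.0) for term in query_terms)
        # Forward only if the remote site could beat the local k-th result.
        if bound > local_kth_score:
            targets.append(site)
    return targets


# Hypothetical thresholds for two remote sites.
thresholds = {
    "eu": {"cheap": 1.0, "flights": 2.5},
    "asia": {"cheap": 0.4, "flights": 0.7},
}
forward_to = sites_to_forward(["cheap", "flights"], 2.0, thresholds)
```

In this sketch a looser bound causes more unnecessary forwarding, while a tighter bound answers more queries locally without changing the results, which is why the choice of thresholds drives efficiency.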

    Published In

    SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
    July 2010
    944 pages
    ISBN:9781450301534
    DOI:10.1145/1835449

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. distributed ir
    2. index replication
    3. linear programming
    4. optimization
    5. query forwarding
    6. result caching
    7. search engines

    Qualifiers

    • Research-article

    Conference

    SIGIR '10

    Acceptance Rates

    SIGIR '10 Paper Acceptance Rate 87 of 520 submissions, 17%;
    Overall Acceptance Rate 792 of 3,983 submissions, 20%
