DOI: 10.1145/1835449.1835545
research-article

Exploring reductions for long web queries

Published: 19 July 2010

Abstract

Long queries form a difficult but increasingly important segment for web search engines. Query reduction, a technique for dropping unnecessary query terms from long queries, improves the performance of ad-hoc retrieval on TREC collections. It also has great potential for improving long web queries (up to 25% improvement in NDCG@5). However, query reduction on the web is hampered by the lack of accurate query performance predictors and by the constraints imposed by search engine architectures and ranking algorithms.
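For readers unfamiliar with the metric, the following is a minimal sketch of NDCG@5. It assumes the common exponential-gain variant with a log2 rank discount; the abstract does not say which variant the authors use. The function names and the graded labels are illustrative, and the ideal ranking is computed only from the labels passed in, which is a simplification.

```python
import math
from typing import Sequence


def dcg_at_k(gains: Sequence[int], k: int = 5) -> float:
    """Discounted cumulative gain over the top-k graded relevance labels.

    Assumption: exponential gain (2^rel - 1) and log2(rank + 1) discount,
    with ranks starting at 1.
    """
    return sum(
        (2 ** gain - 1) / math.log2(position + 2)
        for position, gain in enumerate(gains[:k])
    )


def ndcg_at_k(gains: Sequence[int], k: int = 5) -> float:
    """DCG@k normalized by the DCG@k of the same labels in ideal order."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


# Hypothetical labels for the top results of an original and a reduced query.
original = ndcg_at_k([0, 1, 0, 2, 0])
reduced = ndcg_at_k([2, 1, 0, 0, 0])
relative_improvement = (reduced - original) / original
```

A relative improvement figure such as the one quoted above compares this score for the reduced query against the score for the original query.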
In this paper, we present query reduction techniques for long web queries that leverage effective and efficient query performance predictors. We propose three learning formulations that combine these predictors to perform automatic query reduction. These formulations make it possible to trade average improvement against the number of queries impacted, and they integrate easily into the search engine's architecture for rank-time query reduction. Experiments on a large collection of long queries issued to a commercial search engine show that the proposed techniques significantly outperform baselines, with more than 12% improvement in NDCG@5 on the impacted set of queries. Extensions to the formulations, such as result interleaving, further improve results. We find that the proposed techniques deliver consistent retrieval gains where it matters most: poorly performing long web queries.
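The general idea of predictor-driven query reduction with a tunable threshold can be sketched as follows. This is an illustration under stated assumptions, not the authors' formulations: candidates are generated by dropping a single term, `predict` stands in for any query performance predictor, and `toy_predict` and its filler-word list are invented for the example.

```python
from typing import Callable, List


def single_term_reductions(query: str) -> List[str]:
    """All variants of the query with exactly one term removed (assumed candidate set)."""
    terms = query.split()
    if len(terms) < 2:
        return []
    return [" ".join(terms[:i] + terms[i + 1:]) for i in range(len(terms))]


def choose_query(query: str,
                 predict: Callable[[str], float],
                 threshold: float = 0.0) -> str:
    """Return the best-scoring reduction if its predicted gain over the
    original exceeds `threshold`; otherwise keep the original query."""
    base_score = predict(query)
    best_query, best_score = query, base_score
    for candidate in single_term_reductions(query):
        score = predict(candidate)
        if score - base_score > threshold and score > best_score:
            best_query, best_score = candidate, score
    return best_query


def toy_predict(query: str) -> float:
    """Hypothetical predictor: penalizes filler words. Illustration only."""
    filler = {"please", "find"}
    return 1.0 - 0.1 * sum(term in filler for term in query.split())


low_bar = choose_query("please cheap flights boston", toy_predict, threshold=0.05)
high_bar = choose_query("please cheap flights boston", toy_predict, threshold=0.5)
```

With the low threshold the filler word is dropped; with the high threshold the original query is kept. Raising the threshold changes fewer queries but only where the predicted gain is larger, which is one way to read the trade-off between average improvement and the number of queries impacted.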



    Published In

    SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
    July 2010
    944 pages
    ISBN:9781450301534
    DOI:10.1145/1835449
    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. combining searches
    2. learning to rank
    3. query reformulation


    Conference

    SIGIR '10

    Acceptance Rates

    SIGIR '10 Paper Acceptance Rate 87 of 520 submissions, 17%;
    Overall Acceptance Rate 792 of 3,983 submissions, 20%


    Cited By

    • (2021) Pre-training for Ad-hoc Retrieval. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pages 1212-1221. DOI: 10.1145/3459637.3482286. Online publication date: 26-Oct-2021
    • (2021) History-Aware Expansion and Fuzzy for Query Reformulation. Artificial Intelligence, pages 227-238. DOI: 10.1007/978-3-030-93049-3_19. Online publication date: 5-Jun-2021
    • (2019) Siamese. Empirical Software Engineering, 24:4, pages 2236-2284. DOI: 10.1007/s10664-019-09697-7. Online publication date: 1-Aug-2019
    • (2018) Generating Better Queries for Systematic Reviews. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pages 475-484. DOI: 10.1145/3209978.3210020. Online publication date: 27-Jun-2018
    • (2018) Linguistic Patterns and Cross Modality-based Image Retrieval for Complex Queries. Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, pages 257-265. DOI: 10.1145/3206025.3206050. Online publication date: 5-Jun-2018
    • (2018) Recommending frequently encountered bugs. Proceedings of the 26th Conference on Program Comprehension, pages 120-131. DOI: 10.1145/3196321.3196348. Online publication date: 28-May-2018
    • (2018) Ranking Methods for Query Relaxation in Book Search. 2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI), pages 466-473. DOI: 10.1109/WI.2018.00-51. Online publication date: Dec-2018
    • (2018) Key Terms Guided Expansion for Verbose Queries in Medical Domain. Information Retrieval Technology, pages 143-156. DOI: 10.1007/978-3-030-03520-4_14. Online publication date: 17-Nov-2018
    • (2017) Reply With. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pages 327-336. DOI: 10.1145/3132847.3132979. Online publication date: 6-Nov-2017
    • (2017) Can Short Queries Be Even Shorter? Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval, pages 43-50. DOI: 10.1145/3121050.3121056. Online publication date: 1-Oct-2017
