research-article

A unified and discriminative model for query refinement

Authors:

Xueqi Cheng

SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval

Pages 379 - 386

https://doi.org/10.1145/1390334.1390400

Published: 20 July 2008

Abstract

This paper addresses query refinement: reformulating ill-formed search queries to improve the relevance of search results. Query refinement typically involves several tasks, such as spelling error correction, word splitting, word merging, phrase segmentation, word stemming, and acronym expansion. Previous research addressed these tasks separately or with generative models. This paper instead proposes a unified, discriminative model for query refinement: a Conditional Random Field (CRF) model tailored to the problem, called Conditional Random Field for Query Refinement (CRF-QR). Given a sequence of query words, CRF-QR predicts a sequence of refined query words together with the corresponding refinement operations. In this respect CRF-QR differs substantially from conventional CRF models, which assign a label to each element of a fixed input sequence. Two variants of CRF-QR are introduced: a basic model and an extended model. Because CRF-QR performs the different refinement tasks simultaneously, it can improve refinement accuracy, and it fully leverages the advantages of discriminative models over generative ones. Experimental results show that CRF-QR significantly outperforms baseline methods, and that using CRF-QR in web search yields a significant improvement in relevance.
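The abstract's central idea, labeling each query word with both a refined word and a refinement operation, can be illustrated with a toy linear-chain model decoded by the Viterbi algorithm. The Python sketch below is a minimal illustration under stated assumptions, not the authors' CRF-QR: the candidate table `CANDIDATES`, the operation names, and the hand-set scores in `node_score`, `edge_score`, and `BIGRAM_BONUS` are all hypothetical, no training is shown, and the basic and extended CRF-QR variants are not reproduced. To stay short, it models only one-to-one operations such as spelling correction and stemming; word splitting and merging, which change the query length, are left out.

```python
from typing import Dict, List, Optional, Tuple

# A label pairs a refined word with the refinement operation that produced it.
Label = Tuple[str, str]  # (refined_word, operation)

# Hypothetical candidate table: possible (refined word, operation) labels per
# input word. Only one-to-one operations are modeled in this sketch.
CANDIDATES: Dict[str, List[Label]] = {
    "nwe": [("nwe", "keep"), ("new", "spell_correct")],
    "york": [("york", "keep")],
    "hotels": [("hotels", "keep"), ("hotel", "stem")],
}

# Hypothetical weights for adjacent refined words (pairwise log-potentials).
BIGRAM_BONUS: Dict[Tuple[str, str], float] = {
    ("new", "york"): 2.0,
    ("york", "hotels"): 0.5,
}


def candidates(word: str) -> List[Label]:
    # Words without listed candidates can only be kept unchanged.
    return CANDIDATES.get(word, [(word, "keep")])


def node_score(word: str, label: Label) -> float:
    # Hypothetical per-position log-potential: a fixed penalty for any change.
    # (`word` is unused here; a real model would use features of the input word.)
    _, op = label
    return 0.0 if op == "keep" else -0.5


def edge_score(prev: Label, cur: Label) -> float:
    # Hypothetical log-potential linking two adjacent refined words.
    return BIGRAM_BONUS.get((prev[0], cur[0]), 0.0)


def viterbi(words: List[str]) -> Tuple[List[Label], float]:
    """Return the label sequence maximizing
    sum_i node_score(x_i, y_i) + sum_{i>0} edge_score(y_{i-1}, y_i)."""
    if not words:
        return [], 0.0
    # best[i][y] = (best score of a path ending in label y at position i, backpointer)
    best: List[Dict[Label, Tuple[float, Optional[Label]]]] = [
        {y: (node_score(words[0], y), None) for y in candidates(words[0])}
    ]
    for i in range(1, len(words)):
        column: Dict[Label, Tuple[float, Optional[Label]]] = {}
        for y in candidates(words[i]):
            # Choose the predecessor label with the highest accumulated score.
            prev_label, prev_score = max(
                ((p, s + edge_score(p, y)) for p, (s, _) in best[i - 1].items()),
                key=lambda t: t[1],
            )
            column[y] = (prev_score + node_score(words[i], y), prev_label)
        best.append(column)
    # Pick the best final label, then follow backpointers to the start.
    last, (score, _) = max(best[-1].items(), key=lambda kv: kv[1][0])
    path: List[Label] = [last]
    for i in range(len(words) - 1, 0, -1):
        back = best[i][path[-1]][1]
        assert back is not None  # every position after the first has a backpointer
        path.append(back)
    path.reverse()
    return path, score


if __name__ == "__main__":
    path, score = viterbi(["nwe", "york", "hotels"])
    print(path)   # [('new', 'spell_correct'), ('york', 'keep'), ('hotels', 'keep')]
    print(score)  # 2.0
```

In this toy setup the node score alone would keep "nwe" unchanged (0.0 versus -0.5 for a correction); it is the pairwise bonus for "new york" that makes the correction win, which hints at why deciding several refinements jointly, rather than word by word, can help.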

Index Terms

A unified and discriminative model for query refinement
1. Information systems
  1. Information retrieval
    1. Information retrieval query processing

Recommendations

Disjunctive Sets of Phrase Queries for Diverse Query Suggestion
WI '19: IEEE/WIC/ACM International Conference on Web Intelligence

This paper proposes a method of suggesting expanded queries that disambiguate the original Web query which has multiple interpretations. In order to produce a diverse set of queries including those corresponding to infrequent query intents, our method ...
Mining anchor text for query refinement
WWW '04: Proceedings of the 13th international conference on World Wide Web

When searching large hypertext document collections, it is often possible that there are too many results available for ambiguous queries. Query refinement is an interactive process of query modification that can be used to narrow down the scope of ...
Information-need driven query refinement

In this paper we presented a framework for query refinement which is driven by user's information needs. Based on the analyses of the real web IR case studies we model the query refinement process as the process of decreasing query ambiguity with ...

Information

Published In

SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval

July 2008

934 pages

ISBN:9781605581644

DOI:10.1145/1390334

General Chairs:
Tat-Seng Chua, National University of Singapore
Mun-Kew Leong, National Library Board, Singapore

Program Chairs:
Syung Hyon Myaeng, Information and Communications University, Korea
Douglas W. Oard, University of Maryland, College Park, USA
Fabrizio Sebastiani, Consiglio Nazionale delle Ricerche, Italy

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SIGIR '08

Sponsor:

SIGIR '08: The 31st Annual International ACM SIGIR Conference

July 20 - 24, 2008

Singapore, Singapore

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 76
Total Downloads: 1,193

Downloads (last 12 months): 16
Downloads (last 6 weeks): 0

Reflects downloads up to 15 Feb 2025

Cited By

Chikkamath R, Rastogi D, Maan M, Endres M (2024). Is your search query well-formed? A natural query understanding for patent prior art search. World Patent Information 76 (102254). Online publication date: Mar-2024. https://doi.org/10.1016/j.wpi.2023.102254
Narayanan Y, Fani H, Frommholz I, Hopfgartner F, Lee M, Oakes M, Lalmas M, Zhang M, Santos R (2023). RePair: An Extensible Toolkit to Generate Large-Scale Datasets for Query Refinement via Transformers. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pages 5376-5380. Online publication date: 21-Oct-2023. https://dl.acm.org/doi/10.1145/3583780.3615129
Tamine L, Melgarejo J, Pinel-Sauvagnat K (2020). What Can Task Teach Us About Query Reformulations? Advances in Information Retrieval, pages 636-650. Online publication date: 8-Apr-2020. https://doi.org/10.1007/978-3-030-45439-5_42
Sa N, Yuan X (2020). Examining users' partial query modification patterns in voice search. Journal of the Association for Information Science and Technology 71(3), pages 251-263. Online publication date: 28-Jan-2020. https://dl.acm.org/doi/10.1002/asi.24238
Salehi B, Liu F, Baldwin T, Wong W, Song D, Liu T, Sun L, Bruza P, Melucci M, Sebastiani F, Yang G (2018). Multitask Learning for Query Segmentation in Job Search. Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval, pages 179-182. Online publication date: 10-Sep-2018. https://dl.acm.org/doi/10.1145/3234944.3234965
Wang S, Bao Z, Huang S, Zhang R, Chang Y, Zhai C, Liu Y, Maarek Y (2018). A Unified Processing Paradigm for Interactive Location-based Web Search. Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pages 601-609. Online publication date: 2-Feb-2018. https://dl.acm.org/doi/10.1145/3159652.3159667
Balog K (2018). Understanding Information Needs. Entity-Oriented Search, pages 225-267. Online publication date: 3-Oct-2018. https://doi.org/10.1007/978-3-319-93935-3_7
Fu H, Nordlie R, Pharo N, Freund L, Larsen B, Russel D (2017). Query Reformulation Patterns of Mixed Language Queries in Different Search Intents. Proceedings of the 2017 Conference on Conference Human Information Interaction and Retrieval, pages 249-252. Online publication date: 7-Mar-2017. https://dl.acm.org/doi/10.1145/3020165.3022126
Priya M, Kalpana R, Srisupriya T (2017). Hybrid optimization algorithm using N gram based edit distance. 2017 International Conference on Communication and Signal Processing (ICCSP), pages 0216-0221. Online publication date: Apr-2017. https://doi.org/10.1109/ICCSP.2017.8286823
Mangat K, Verma A (2017). An abstract model for e-content search using ontology. 2017 International Conference on Intelligent Computing and Control Systems (ICICCS), pages 221-225. Online publication date: Jun-2017. https://doi.org/10.1109/ICCONS.2017.8250714
