DOI: 10.1145/1076034.1076041
Article

Better than the real thing?: iterative pseudo-query processing using cluster-based language models

Published: 15 August 2005

Abstract

We present a novel approach to pseudo-feedback-based ad hoc retrieval that uses language models induced from both documents and clusters. First, we treat the pseudo-feedback documents produced in response to the original query as a set of pseudo-queries that themselves can serve as input to the retrieval process. Observing that the documents returned in response to the pseudo-queries can then act as pseudo-queries for subsequent rounds, we arrive at a formulation of pseudo-query-based retrieval as an iterative process. Experiments show that several concrete instantiations of this idea, when applied in conjunction with techniques designed to heighten precision, yield performance results rivaling those of a number of previously proposed algorithms, including the standard language-modeling approach. The use of cluster-based language models is a key contributing factor to our algorithms' success.
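The iterative process described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes Dirichlet-smoothed unigram document models, a length-normalized query-likelihood score, and simple summation of scores across pseudo-queries, and it omits the cluster-based language models and the precision-enhancing techniques the abstract mentions. All names (`build_models`, `log_likelihood`, `retrieve`, `iterative_pseudo_query`, `MU`, `TOY_DOCS`) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

MU = 10.0  # Dirichlet smoothing parameter (assumed value for a toy corpus)

# Hypothetical toy corpus, used only to exercise the sketch.
TOY_DOCS = [
    "language models for retrieval",
    "cluster based language models",
    "cooking pasta with tomato sauce",
    "tomato soup recipe",
]

def build_models(docs):
    """Return per-document term counts and collection-wide term probabilities."""
    counts = [Counter(d.lower().split()) for d in docs]
    total = Counter()
    for c in counts:
        total.update(c)
    n = sum(total.values())
    coll = {t: f / n for t, f in total.items()}
    return counts, coll

def log_likelihood(query_terms, doc_counts, coll):
    """Length-normalized log p(query | Dirichlet-smoothed document model)."""
    dlen = sum(doc_counts.values())
    ll, used = 0.0, 0
    for t in query_terms:
        if t not in coll:  # skip terms that never occur in the collection
            continue
        p = (doc_counts[t] + MU * coll[t]) / (dlen + MU)
        ll += math.log(p)
        used += 1
    return ll / used if used else float("-inf")

def retrieve(query_terms, counts, coll, k):
    """Rank all documents for one query; return indices of the top k."""
    scored = [(log_likelihood(query_terms, c, coll), i) for i, c in enumerate(counts)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # ties broken by document index
    return [i for _, i in scored[:k]]

def iterative_pseudo_query(query, docs, k=2, rounds=2):
    """Return the top-k list after the initial retrieval and after each round."""
    counts, coll = build_models(docs)
    current = retrieve(query.lower().split(), counts, coll, k)
    history = [current]
    for _ in range(rounds):
        # Each currently retrieved document acts as a pseudo-query.
        agg = [0.0] * len(docs)
        for pq in current:
            pq_terms = list(counts[pq].elements())
            for i, c in enumerate(counts):
                agg[i] += log_likelihood(pq_terms, c, coll)
        ranked = sorted(range(len(docs)), key=lambda i: (-agg[i], i))
        current = ranked[:k]
        history.append(current)
    return history
```

Calling `iterative_pseudo_query("language models", TOY_DOCS, k=2, rounds=2)` returns one top-k list per stage, so successive lists can be compared to see whether the ranking drifts away from the original query. Summing scores over pseudo-queries is only one possible aggregation; the abstract does not say how the paper's concrete instantiations combine them.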

Published In

SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
August 2005
708 pages
ISBN: 1595930345
DOI: 10.1145/1076034
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. aspect recall
  2. cluster-based language models
  3. clustering
  4. language modeling
  5. pseudo-feedback
  6. pseudo-queries
  7. query drift
  8. rendition

Qualifiers

  • Article

Conference

SIGIR05
Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Cited By

  • (2022) Analysis of QR Reordering Algorithm Based on Feedback Technology Optimization. Smart Communications, Intelligent Algorithms and Interactive Methods, pp. 215-221. DOI: 10.1007/978-981-16-5164-9_26. Online publication date: 4-Jan-2022
  • (2016) Iterative Search using Query Aspects. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp. 2037-2040. DOI: 10.1145/2983323.2983903. Online publication date: 24-Oct-2016
  • (2016) Pseudo-Query Reformulation. Advances in Information Retrieval, pp. 521-532. DOI: 10.1007/978-3-319-30671-1_38. Online publication date: 2016
  • (2015) Spoken content retrieval. IEEE/ACM Transactions on Audio, Speech and Language Processing, 23:9, pp. 1389-1420. DOI: 10.1109/TASLP.2015.2438543. Online publication date: 1-Sep-2015
  • (2015) Negative query generation: bridging the gap between query likelihood retrieval models and relevance. Information Retrieval Journal, 18:4, pp. 359-378. DOI: 10.1007/s10791-015-9257-z. Online publication date: 6-Jun-2015
  • (2013) Improving pseudo-relevance feedback via tweet selection. Proceedings of the 22nd ACM international conference on Information & Knowledge Management, pp. 439-448. DOI: 10.1145/2505515.2505701. Online publication date: 27-Oct-2013
  • (2013) Enhanced Spoken Term Detection Using Support Vector Machines and Weighted Pseudo Examples. IEEE Transactions on Audio, Speech, and Language Processing, 21:6, pp. 1272-1284. DOI: 10.1109/TASL.2013.2248721. Online publication date: 1-Jun-2013
  • (2013) A deterministic resampling method using overlapping document clusters for pseudo-relevance feedback. Information Processing and Management: an International Journal, 49:4, pp. 792-806. DOI: 10.1016/j.ipm.2013.01.001. Online publication date: 1-Jul-2013
  • (2012) Exploiting External Collections for Query Expansion. ACM Transactions on the Web, 6:4, pp. 1-29. DOI: 10.1145/2382616.2382621. Online publication date: 1-Nov-2012
  • (2012) Improving retrieval of short texts through document expansion. Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, pp. 911-920. DOI: 10.1145/2348283.2348405. Online publication date: 12-Aug-2012
