Research article · DOI: 10.1145/1571941.1571954

Query dependent pseudo-relevance feedback based on Wikipedia

Published: 19 July 2009

Abstract

Pseudo-relevance feedback (PRF) via query expansion has been proven effective in many information retrieval (IR) tasks. In most existing work, the top-ranked documents from an initial search are assumed to be relevant and used for PRF. One problem with this approach is that one or more of the top retrieved documents may be non-relevant, which can introduce noise into the feedback process. Moreover, existing methods generally do not take into account the significantly different types of queries that are often entered into an IR system. Intuitively, Wikipedia can be seen as a large, manually edited document collection that could be exploited to improve document retrieval effectiveness within PRF. It is not obvious how best to utilize information from Wikipedia in PRF, and to date its potential for this task has been largely unexplored. In our work, we present a systematic exploration of the utilization of Wikipedia in PRF for query dependent expansion. Specifically, we classify TREC topics into three categories based on Wikipedia: 1) entity queries, 2) ambiguous queries, and 3) broader queries. We propose and study the effectiveness of three methods for expansion term selection, each modeling the Wikipedia-based pseudo-relevance information from a different perspective. We incorporate the expansion terms into the original query and use language modeling IR to evaluate these methods. Experiments on four TREC test collections, including the large web collection GOV2, show that retrieval performance of each type of query can be improved. In addition, we demonstrate that the proposed method outperforms the baseline relevance model in terms of precision and robustness.
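The abstract describes incorporating expansion terms into the original query within a language modeling retrieval framework. As an illustration only, and not the paper's actual Wikipedia-based term-selection methods, the sketch below shows a minimal relevance-model-style PRF step: estimate a term distribution from pseudo-relevant documents, keep the top candidate terms, and interpolate them with the original query model. The function name `expand_query`, the toy documents, and the weight `alpha` are all hypothetical choices for the example.

```python
from collections import Counter

def expand_query(query_terms, feedback_docs, num_terms=5, alpha=0.6):
    """Toy PRF query expansion: build a feedback language model from the
    pseudo-relevant documents, select the top candidate terms, and
    interpolate them with the original query model."""
    # Feedback model: maximum-likelihood term distribution over the docs.
    counts = Counter()
    for doc in feedback_docs:
        counts.update(doc.lower().split())
    total = sum(counts.values())
    feedback_model = {t: c / total for t, c in counts.items()}

    # Keep only the highest-probability expansion terms.
    top = dict(sorted(feedback_model.items(), key=lambda kv: -kv[1])[:num_terms])

    # Interpolate: P'(t) = alpha * P(t | query) + (1 - alpha) * P(t | feedback)
    query_model = {t: 1 / len(query_terms) for t in query_terms}
    expanded = {
        t: alpha * query_model.get(t, 0.0) + (1 - alpha) * top.get(t, 0.0)
        for t in set(query_model) | set(top)
    }
    # Renormalize: the truncated feedback model does not sum to one.
    z = sum(expanded.values())
    return {t: p / z for t, p in expanded.items()}

docs = [
    "barack obama was a united states senator",
    "obama served as president of the united states",
]
model = expand_query(["obama"], docs, num_terms=3, alpha=0.6)
print(model)
```

In a real system the expanded model would then be issued as a weighted query (e.g. Indri's `#weight` operator), and the noise problem the abstract mentions is exactly why the choice of feedback documents, here naively taken as given, matters.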


Cited By

  • (2025) Knowledge graph based entity selection framework for ad-hoc retrieval. Web Semantics: Science, Services and Agents on the World Wide Web, 84:C. DOI: 10.1016/j.websem.2024.100848. Online publication date: 18-Feb-2025.
  • (2024) Event-Specific Document Ranking Through Multi-stage Query Expansion Using an Event Knowledge Graph. Advances in Information Retrieval, pages 333-348. DOI: 10.1007/978-3-031-56060-6_22. Online publication date: 16-Mar-2024.
  • (2023) Personalized Query Expansion with Contextual Word Embeddings. ACM Transactions on Information Systems, 42:2, pages 1-35. DOI: 10.1145/3624988. Online publication date: 20-Sep-2023.
  • (2023) Selective Query Processing: A Risk-Sensitive Selection of Search Configurations. ACM Transactions on Information Systems, 42:1, pages 1-35. DOI: 10.1145/3608474. Online publication date: 21-Aug-2023.
  • (2023) Entity-Based Relevance Feedback for Document Retrieval. Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval, pages 177-187. DOI: 10.1145/3578337.3605128. Online publication date: 9-Aug-2023.
  • (2023) Pseudo Relevance Feedback with Deep Language Models and Dense Retrievers: Successes and Pitfalls. ACM Transactions on Information Systems, 41:3, pages 1-40. DOI: 10.1145/3570724. Online publication date: 10-Apr-2023.
  • (2023) Generative Relevance Feedback with Large Language Models. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2026-2031. DOI: 10.1145/3539618.3591992. Online publication date: 19-Jul-2023.
  • (2023) SocialSift: Target Query Discovery on Online Social Media With Deep Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems, 34:9, pages 5654-5668. DOI: 10.1109/TNNLS.2021.3130587. Online publication date: Sep-2023.
  • (2023) ColBERT-FairPRF: Towards Fair Pseudo-Relevance Feedback in Dense Retrieval. Advances in Information Retrieval, pages 457-465. DOI: 10.1007/978-3-031-28238-6_36. Online publication date: 17-Mar-2023.
  • (2022) Improving zero-shot retrieval using dense external expansion. Information Processing and Management, 59:5. DOI: 10.1016/j.ipm.2022.103026. Online publication date: 1-Sep-2022.

    Published In

    SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
    July 2009, 896 pages
    ISBN: 9781605584836
    DOI: 10.1145/1571941

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. entity
    2. information retrieval
    3. pseudo-relevance feedback
    4. query expansion
    5. wikipedia

    Qualifiers

    • Research-article

    Conference

    SIGIR '09

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Article Metrics

    • Downloads (last 12 months): 15
    • Downloads (last 6 weeks): 2
    Reflects downloads up to 20 Feb 2025

