Research article · DOI: 10.1145/1571941.1571954

Query dependent pseudo-relevance feedback based on Wikipedia

Published: 19 July 2009

Abstract

Pseudo-relevance feedback (PRF) via query expansion has been proven effective in many information retrieval (IR) tasks. In most existing work, the top-ranked documents from an initial search are assumed to be relevant and used for PRF. One problem with this approach is that one or more of the top retrieved documents may be non-relevant, which can introduce noise into the feedback process. Moreover, existing methods generally do not take into account the significantly different types of queries that are often entered into an IR system. Intuitively, Wikipedia can be seen as a large, manually edited document collection that could be exploited to improve document retrieval effectiveness within PRF. It is not obvious how best to utilize information from Wikipedia in PRF, and to date its potential for this task has been largely unexplored. In our work, we present a systematic exploration of the utilization of Wikipedia in PRF for query dependent expansion. Specifically, we classify TREC topics into three categories based on Wikipedia: 1) entity queries, 2) ambiguous queries, and 3) broader queries. We propose and study the effectiveness of three methods for expansion term selection, each modeling the Wikipedia-based pseudo-relevance information from a different perspective. We incorporate the expansion terms into the original query and use language modeling IR to evaluate these methods. Experiments on four TREC test collections, including the large web collection GOV2, show that retrieval performance of each type of query can be improved. In addition, we demonstrate that the proposed method outperforms the baseline relevance model in terms of precision and robustness.
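The abstract describes incorporating expansion terms into the original query within a language modeling retrieval framework. As an illustration only, and not the paper's actual Wikipedia-based term-selection methods, the sketch below shows a minimal relevance-model-style PRF step: estimate a term distribution from pseudo-relevant documents, keep the top candidate terms, and interpolate them with the original query model. The function name `expand_query`, the toy documents, and the weight `alpha` are all hypothetical choices for the example.

```python
from collections import Counter

def expand_query(query_terms, feedback_docs, num_terms=5, alpha=0.6):
    """Toy PRF query expansion: build a feedback language model from the
    pseudo-relevant documents, select the top candidate terms, and
    interpolate them with the original query model."""
    # Feedback model: maximum-likelihood term distribution over the docs.
    counts = Counter()
    for doc in feedback_docs:
        counts.update(doc.lower().split())
    total = sum(counts.values())
    feedback_model = {t: c / total for t, c in counts.items()}

    # Keep only the highest-probability expansion terms.
    top = dict(sorted(feedback_model.items(), key=lambda kv: -kv[1])[:num_terms])

    # Interpolate: P'(t) = alpha * P(t | query) + (1 - alpha) * P(t | feedback)
    query_model = {t: 1 / len(query_terms) for t in query_terms}
    expanded = {
        t: alpha * query_model.get(t, 0.0) + (1 - alpha) * top.get(t, 0.0)
        for t in set(query_model) | set(top)
    }
    # Renormalize: the truncated feedback model does not sum to one.
    z = sum(expanded.values())
    return {t: p / z for t, p in expanded.items()}

docs = [
    "barack obama was a united states senator",
    "obama served as president of the united states",
]
model = expand_query(["obama"], docs, num_terms=3, alpha=0.6)
print(model)
```

In a real system the expanded model would then be issued as a weighted query (e.g. Indri's `#weight` operator), and the noise problem the abstract mentions is exactly why the choice of feedback documents, here naively taken as given, matters.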


Cited By

  • (2025) Knowledge graph based entity selection framework for ad-hoc retrieval. Web Semantics: Science, Services and Agents on the World Wide Web, 84:C. DOI: 10.1016/j.websem.2024.100848. Online publication date: 18-Feb-2025.
  • (2024) Event-Specific Document Ranking Through Multi-stage Query Expansion Using an Event Knowledge Graph. Advances in Information Retrieval, pages 333-348. DOI: 10.1007/978-3-031-56060-6_22. Online publication date: 16-Mar-2024.
  • (2023) Personalized Query Expansion with Contextual Word Embeddings. ACM Transactions on Information Systems, 42:2, pages 1-35. DOI: 10.1145/3624988. Online publication date: 20-Sep-2023.
  • (2023) Selective Query Processing: A Risk-Sensitive Selection of Search Configurations. ACM Transactions on Information Systems, 42:1, pages 1-35. DOI: 10.1145/3608474. Online publication date: 21-Aug-2023.
  • (2023) Entity-Based Relevance Feedback for Document Retrieval. Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval, pages 177-187. DOI: 10.1145/3578337.3605128. Online publication date: 9-Aug-2023.
  • (2023) Pseudo Relevance Feedback with Deep Language Models and Dense Retrievers: Successes and Pitfalls. ACM Transactions on Information Systems, 41:3, pages 1-40. DOI: 10.1145/3570724. Online publication date: 10-Apr-2023.
  • (2023) Generative Relevance Feedback with Large Language Models. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2026-2031. DOI: 10.1145/3539618.3591992. Online publication date: 19-Jul-2023.
  • (2023) SocialSift: Target Query Discovery on Online Social Media With Deep Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems, 34:9, pages 5654-5668. DOI: 10.1109/TNNLS.2021.3130587. Online publication date: Sep-2023.
  • (2023) ColBERT-FairPRF: Towards Fair Pseudo-Relevance Feedback in Dense Retrieval. Advances in Information Retrieval, pages 457-465. DOI: 10.1007/978-3-031-28238-6_36. Online publication date: 17-Mar-2023.
  • (2022) Improving zero-shot retrieval using dense external expansion. Information Processing and Management, 59:5. DOI: 10.1016/j.ipm.2022.103026. Online publication date: 1-Sep-2022.

    Published In

    SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
    July 2009, 896 pages
    ISBN: 9781605584836
    DOI: 10.1145/1571941

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. entity
    2. information retrieval
    3. pseudo-relevance feedback
    4. query expansion
    5. wikipedia

    Qualifiers

    • Research-article

    Conference

    SIGIR '09

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Article Metrics

    • Downloads (last 12 months): 15
    • Downloads (last 6 weeks): 2
    Reflects downloads up to 20 Feb 2025

