ABSTRACT
The query-performance prediction task is to estimate retrieval effectiveness with no relevance judgments. Pre-retrieval prediction methods operate prior to retrieval time. Hence, these predictors are often based on analyzing the query and the corpus upon which retrieval is performed. We propose a corpus-independent approach to pre-retrieval prediction which relies on information extracted from Wikipedia. Specifically, we present Wikipedia-based features that can attest to the effectiveness of retrieval performed in response to a query regardless of the corpus upon which search is performed. Empirical evaluation demonstrates the merits of our approach. As a case in point, integrating the Wikipedia-based features with state-of-the-art pre-retrieval predictors that analyze the corpus yields prediction quality that is consistently better than that of using the latter alone.
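To make the integration idea concrete, the following is a minimal sketch and not the method of the paper. It assumes average inverse document frequency as the corpus-based pre-retrieval predictor, a hypothetical Wikipedia feature (whether the query matches a Wikipedia page title), a simple weighted sum as the integration step, and Pearson correlation with per-query average precision as the measure of prediction quality. All function names, weights, and the choice of feature are illustrative assumptions.

```python
import math
from statistics import mean


def avg_idf(query_terms, doc_freq, num_docs):
    """Corpus-based predictor (assumed here): mean IDF of the query terms.

    doc_freq maps a term to the number of corpus documents containing it.
    """
    return mean(math.log(num_docs / (1 + doc_freq.get(t, 0))) for t in query_terms)


def wiki_title_match(query_terms, wiki_titles):
    """Hypothetical corpus-independent feature.

    Returns 1.0 if the query string equals a (lower-cased) Wikipedia page
    title in wiki_titles, else 0.0. The corpus being searched is never used.
    """
    return 1.0 if " ".join(query_terms).lower() in wiki_titles else 0.0


def combined_prediction(corpus_score, wiki_score, w_corpus=1.0, w_wiki=1.0):
    """Assumed integration step: a weighted sum of the two signals."""
    return w_corpus * corpus_score + w_wiki * wiki_score


def pearson(xs, ys):
    """Pearson correlation between predicted values and actual effectiveness."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Under these assumptions, one would compute `combined_prediction` for each query in a test set and pass the resulting values, together with the per-query average precision of the retrieval run, to `pearson`. Comparing that correlation with the one obtained from `avg_idf` alone mirrors the comparison described in the abstract.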
Index Terms
- Wikipedia-based query performance prediction