ABSTRACT
Pseudo-relevance feedback via query expansion has been widely studied from various perspectives in the past decades. Its effectiveness in improving retrieval effectiveness has been shown in many tasks. A variety of criteria were proposed to select additional terms for the original queries. However, most of the existing methods weight and select terms individually and do not consider the impact of term-to-term relationship. In this paper, we first examine the influence of combinations of terms through data analysis, which demonstrate the significant effect of term-to-term relationship on retrieval effectiveness. Then, to address this problem, we formalize the query expansion task as an integer linear programming (ILP) problem. The model combines the weights learned from a supervised method for individual terms, and integrates constraints to capture relations between terms. Finally, three standard TREC collections are used to evaluate the proposed method. Experimental results demonstrate that the proposed method can significantly improve the effectiveness of retrieval.
- D. Alevras and M. W. Padberg. Linear Optimization and Extensions: Problems and Solutions. Springer, 2001.Google ScholarCross Ref
- J. Bhogal, A. Macfarlane, and P. Smith. A review of ontology based query expansion. Information Processing & Management, 43(4):866--886, 2007. Google ScholarDigital Library
- C. Buckley. Automatic query expansion using smart : Trec 3. In Proceedings of The third Text REtrieval Conference (TREC-3), pages 69--80, 1994.Google Scholar
- G. Cao, J.-Y. Nie, J. Gao, and S. Robertson. Selecting good expansion terms for pseudo-relevance feedback. In SIGIR '08, pages 243--250, 2008. Google ScholarDigital Library
- C. Carpineto, R. de Mori, G. Romano, and B. Bigi. An information-theoretic approach to automatic query expansion. ACM Transactions on Information Systems, 19(1):1--27, 2001. Google ScholarDigital Library
- K. Collins-Thompson. Estimating robust query models with convex optimization. In Advances in Neural Information Processing Systems 21 (NIPS), pages 329--336, 2008.Google Scholar
- K. Collins-Thompson. Reducing the risk of query expansion via robust constrained optimization. In Proceedings of the Eighteenth International Conference on Information and Knowledge Management (CIKM 2009), pages 329--336, Hong Kong, China, 2009. ACM. Google ScholarDigital Library
- K. Collins-Thompson and J. Callan. Estimation and use of uncertainty in pseudo-relevance feedback. In SIGIR '07, pages 303--310, 2007. Google ScholarDigital Library
- B. Croft, D. Metzler, and T. Strohman. Search Engines: Information Retrieval in Practice. Addison-Wesley Publishing Company, USA, 1st edition, 2009. Google ScholarDigital Library
- H. Cui, J.-R. Wen, J.-Y. Nie, and W.-Y. Ma. Query expansion by mining user logs. IEEE Transactions on Knowledge and Data Engineering, 15(4):829--839, 2003. Google ScholarDigital Library
- X. Huang and W. B. Croft. A unified relevance model for opinion retrieval. In Proceedings of 16th Conference on Information and Knowledge Management(CIKM 2009), Hong Kong, China, 2009. Google ScholarDigital Library
- K. S. Lee, W. B. Croft, and J. Allan. A cluster-based resampling method for pseudo-relevance feedback. In SIGIR '08, pages 235--242, 2008. Google ScholarDigital Library
- D. I. Moldovan and R. Mihalcea. Using wordnet and lexical operators to improve internet searches. IEEE Internet Computing, 4(1):34--43, 2000. Google ScholarDigital Library
- S. E. Robertson. On term selection for query expansion. Journal of Documentation, 46(4):359--364, 1990. Google ScholarDigital Library
- J. Rocchio. Relevance Feedback in Information Retrieval, pages 313--323. 1971.Google Scholar
- R. Sun, C.-H. Ong, and T.-S. Chua. Mining dependency relations for query expansion in passage retrieval. In Proceedings of SIGIR 2006, pages 382--389, 2006. Google ScholarDigital Library
- R. Udupa, A. Bhole, and P. Bhattacharyya. "a term is known by the company it keeps": On selecting a good expansion set in pseudo-relevance feedback. In ICTIR '09: Proceedings of the 2nd International Conference on Theory of Information Retrieval, pages 104--115, Berlin, Heidelberg, 2009. Springer-Verlag. Google ScholarDigital Library
- Y. Wu, Q. Zhang, Y. Zhou, and X. Huang. Pseudo-relevance feedback based on mrmr criteria. In P.-J. Cheng, M.-Y. Kan, W. Lam, and P. Nakov, editors, Information Retrieval Technology, volume 6458 of Lecture Notes in Computer Science, pages 211--220. Springer Berlin / Heidelberg, 2010.Google Scholar
- Y. Xu, G. J. Jones, and B. Wang. Query dependent pseudo-relevance feedback based on wikipedia. In SIGIR '09, pages 59--66, 2009. Google ScholarDigital Library
- S. Yu, D. Cai, J.-R. Wen, and W.-Y. Ma. Improving pseudo-relevance feedback in web information retrieval using web page segmentation. In Proceedings of WWW 2003, pages 11--18, 2003. Google ScholarDigital Library
Index Terms
- Selecting expansion terms as a set via integer linear programming
Recommendations
A Heuristic Ceiling Point Algorithm for General Integer Linear Programming
This paper first examines the role of ceiling points in solving a pure, general integer linear programming problem P. Several kinds of ceiling points are defined and analyzed and one kind called "feasible 1-ceiling points" proves to be of special ...
Integer linear programming model for multidimensional two-way number partitioning problem
This paper introduces a multidimensional generalization of the two-way number partitioning problem, as well as an integer linear programming formulation of the problem. There are n binary variables and 2p constraints. The numerical experiments are made ...
Global solutions of nonconvex standard quadratic programs via mixed integer linear programming reformulations
AbstractA standard quadratic program is an optimization problem that consists of minimizing a (nonconvex) quadratic form over the unit simplex. We focus on reformulating a standard quadratic program as a mixed integer linear programming problem. We ...
Comments