ABSTRACT
Existing retrieval models generally offer no guarantee of optimal retrieval performance. Indeed, it is difficult, if not impossible, to predict a model's empirical performance analytically. This limitation is at least partly caused by the way existing retrieval models are developed: relevance is modeled only coarsely, at the level of documents and queries, rather than at the finer granularity of terms. In this paper, we present a new axiomatic approach to developing retrieval models, based on direct modeling of relevance with formalized retrieval constraints defined at the level of terms. The basic idea of this axiomatic approach is to search a space of candidate retrieval functions for one that satisfies a set of reasonable retrieval constraints. To constrain the search space, we propose to define a retrieval function inductively and to decompose it into three component functions. Inspired by an analysis of existing retrieval functions under this inductive definition, we derive several new retrieval functions using the axiomatic retrieval framework. Experimental results show that the derived retrieval functions are more robust and less sensitive to parameter settings than existing retrieval functions, while achieving comparable optimal performance.
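The idea of searching a candidate space for a function that satisfies term-level constraints can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the three candidate term-frequency weightings, the two constraints (`rewards_more_occurrences`, `has_diminishing_returns`), and the helper names are hypothetical. The constraints are checked numerically over a finite range of counts, which is a sanity check and not a proof.

```python
import math
from collections import Counter


def score(query, doc, tf_weight):
    """Sum a per-term weight over the query terms that occur in the document."""
    counts = Counter(doc)
    return sum(tf_weight(counts[t]) for t in query if counts[t] > 0)


# Hypothetical candidate space: three ways to weight a term that occurs c >= 1 times.
CANDIDATES = {
    "raw_tf": lambda c: float(c),
    "log_tf": lambda c: 1.0 + math.log(c),
    "binary": lambda c: 1.0,
}


def rewards_more_occurrences(tf_weight, max_count=20):
    """Illustrative constraint: one more occurrence of a query term raises the weight."""
    return all(tf_weight(c + 1) > tf_weight(c) for c in range(1, max_count))


def has_diminishing_returns(tf_weight, max_count=20):
    """Illustrative constraint: each extra occurrence adds less than the previous one."""
    return all(
        tf_weight(c + 2) - tf_weight(c + 1) < tf_weight(c + 1) - tf_weight(c)
        for c in range(1, max_count)
    )


CONSTRAINTS = [rewards_more_occurrences, has_diminishing_returns]


def satisfying_candidates(candidates, constraints):
    """Return the names of the candidates that pass every constraint check."""
    return sorted(
        name
        for name, tf_weight in candidates.items()
        if all(check(tf_weight) for check in constraints)
    )


if __name__ == "__main__":
    print(satisfying_candidates(CANDIDATES, CONSTRAINTS))
    print(score(["axiomatic", "retrieval"], ["retrieval", "retrieval", "model"],
                CANDIDATES["log_tf"]))
```

In this toy setting the raw count fails the diminishing-returns check and the binary weighting fails the first check, so only the logarithmic weighting survives. The sketch filters whole scoring functions directly; it does not reproduce the inductive definition or the three-component decomposition described above.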
Index Terms
- An exploration of axiomatic approaches to information retrieval