ABSTRACT
This paper proposes GBM (gravitation-based model), a physical model for information retrieval inspired by Newton's theory of gravitation. The model maps concepts of information retrieval (documents, queries, relevance, etc.) to concepts of physics (mass, distance, radius, attractive force, etc.), and so offers a new perspective on IR problems. A family of effective term weighting functions can be derived from it, including the well-known BM25 formula. The model has several advantages over most existing ones. First, because it is based directly on basic physical laws, the derived formulas and algorithms have an explicit physical interpretation. Second, the ranking formulas derived from it satisfy more intuitive heuristics than most existing ones, and thus have the potential to perform better empirically and to be used safely in a variety of settings. Finally, a new approach to structured document retrieval derived from the model is better motivated and performs better than existing ones.
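For readers unfamiliar with the BM25 formula mentioned above, the following is a minimal sketch of its commonly cited form. It is not taken from this paper and does not reproduce the gravitation-based derivation, which the abstract does not spell out. The function names, the choice of IDF variant, and the default parameter values `k1=1.2` and `b=0.75` are assumptions made for illustration only.

```python
import math


def bm25_term_weight(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Weight of one query term in one document (commonly cited BM25 form).

    tf          : occurrences of the term in the document
    doc_len     : document length in tokens
    avg_doc_len : average document length in the collection
    n_docs      : number of documents in the collection
    doc_freq    : number of documents containing the term
    k1, b       : assumed defaults; tuned per collection in practice
    """
    # Robertson/Sparck Jones style IDF (one of several variants in use).
    idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: long documents are penalized when b > 0.
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    # Term frequency saturates: the weight approaches idf * (k1 + 1).
    return idf * tf * (k1 + 1.0) / (tf + norm)


def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, n_docs, doc_freqs,
               k1=1.2, b=0.75):
    """Sum the term weights over the query terms that occur in the document."""
    return sum(
        bm25_term_weight(doc_tf[t], doc_len, avg_doc_len, n_docs, doc_freqs[t], k1, b)
        for t in query_terms
        if t in doc_tf
    )
```

In this form the weight grows with term frequency but saturates, and it shrinks as the document gets longer than average, which are two of the intuitive heuristics that ranking formulas are usually expected to satisfy.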
Index Terms
- Gravitation-based model for information retrieval