Relevance information: a loss of entropy but a gain for IDF?
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Salvador, Brazil
SESSION: Theory 2
Pages: 282 - 289  
Year of Publication: 2005
ISBN: 1-59593-034-5
Authors
Arjen P. de Vries  CWI, The Netherlands
Thomas Roelleke  Queen Mary University of London, London, United Kingdom
Sponsor
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1076034.1076084

ABSTRACT

When investigating alternative estimates for term discriminativeness, we discovered that relevance information and idf are much more closely related than formulated in classical literature. Therefore, we revisited the justification of idf as it follows from the binary independent retrieval (BIR) model. The main result is a formal framework uncovering the close relationship of a generalised idf and the BIR model. The framework makes explicit how to incorporate relevance information into any retrieval function that involves an idf-component. In addition to the idf-based formulation of the BIR model, we propose Poisson-based estimates as an alternative to the classical estimates, motivated by the superiority of Poisson-based estimates for the within-document term frequencies. The main experimental finding is that a Poisson-based idf is superior to the classical idf, where the superiority is particularly evident for long queries.
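
As a rough illustration of the idf/BIR relationship the abstract refers to, the sketch below compares the classical idf with the Robertson/Sparck Jones relevance weight commonly associated with the BIR model. It is not taken from the paper: the function names, the smoothing constant 0.5, and the example counts are assumptions made for illustration, and the paper's generalised idf and Poisson-based estimates are not reproduced here.

```python
import math


def idf(n_t: int, n_docs: int) -> float:
    """Classical idf: -log of the fraction of documents containing the term."""
    return -math.log(n_t / n_docs)


def bir_weight(n_t: int, n_docs: int, r_t: int = 0, n_rel: int = 0,
               k: float = 0.5) -> float:
    """Robertson/Sparck Jones term weight with smoothing constant k (assumed 0.5).

    n_t    : documents containing the term
    n_docs : documents in the collection
    r_t    : known relevant documents containing the term
    n_rel  : known relevant documents
    """
    # Odds of the term occurring in relevant documents.
    rel_odds = (r_t + k) / (n_rel - r_t + k)
    # Odds of the term occurring in non-relevant documents.
    nonrel_odds = (n_t - r_t + k) / (n_docs - n_t - n_rel + r_t + k)
    return math.log(rel_odds / nonrel_odds)


# Hypothetical counts: a term occurring in 100 of 1,000,000 documents.
N_DOCS, N_T = 1_000_000, 100
plain_idf = idf(N_T, N_DOCS)
no_relevance = bir_weight(N_T, N_DOCS)                    # no relevance information
with_relevance = bir_weight(N_T, N_DOCS, r_t=8, n_rel=10) # 8 of 10 relevant docs contain the term
```

Under these assumptions, the weight without relevance information stays close to the classical idf for a rare term, while relevance information that favours the term raises the weight above idf.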



