A parallel derivation of probabilistic information retrieval models
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Seattle, Washington, USA
SESSION: Semantics
Pages: 107-114
Year of Publication: 2006
ISBN: 1-59593-369-7
Authors
Thomas Roelleke  Queen Mary, University of London
Jun Wang  Queen Mary, University of London
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1148170.1148192

ABSTRACT

This paper investigates, in a stringent mathematical formalism, the parallel derivation of three grand probabilistic retrieval models: binary independent retrieval (BIR), the Poisson model (PM), and language modelling (LM). The investigation has been motivated by a number of questions. Firstly, though sharing the same origin, namely the probability of relevance, the models differ with respect to event spaces. How can this be captured in a consistent notation, and can we relate the event spaces? Secondly, BIR and PM are closely related, but how does LM fit in? Thirdly, how are tf-idf and probabilistic models related? The parallel investigation of the models leads to a number of formalised results:

  1. BIR and PM assume the collection to be a set of non-relevant documents, whereas LM assumes the collection to be a set of terms from relevant documents.
  2. PM can be viewed as a bridge connecting BIR and LM.
  3. A BIR-LM equivalence explains BIR as a special LM case.
  4. PM explains tf-idf, and both BIR and LM probabilities express tf-idf in a dual way.
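As a rough illustration of how a BIR-style term weight can relate to idf, the sketch below uses the textbook form of the BIR weight rather than the paper's own derivation. It assumes, in line with result 1, that the whole collection stands in for the non-relevant documents, and it further assumes a constant probability `p_rel` that a term occurs in a relevant document. The function names, the parameter `p_rel`, and the example counts are all hypothetical.

```python
import math


def idf(n_docs: int, doc_freq: int) -> float:
    """Plain inverse document frequency: log(N / n_t)."""
    return math.log(n_docs / doc_freq)


def bir_weight_no_relevance(n_docs: int, doc_freq: int, p_rel: float = 0.5) -> float:
    """Textbook BIR term weight when no relevance information is available.

    Assumption: P(t | non-relevant) is estimated from the whole collection,
    i.e. the collection is treated as a set of non-relevant documents.
    Assumption: P(t | relevant) is the constant p_rel (hypothetical default 0.5).
    """
    q = doc_freq / n_docs  # estimate of P(t | non-relevant)
    return math.log(p_rel / (1.0 - p_rel)) + math.log((1.0 - q) / q)


if __name__ == "__main__":
    n_docs = 1_000_000  # hypothetical collection size
    for doc_freq in (10, 1_000, 100_000):
        print(doc_freq, idf(n_docs, doc_freq), bir_weight_no_relevance(n_docs, doc_freq))
```

With `p_rel = 0.5` the first logarithm vanishes and the weight becomes log((N - n_t) / n_t), which is close to idf for rare terms and drifts away from it as the document frequency grows.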


