A parallel derivation of probabilistic information retrieval models
Source: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, Washington, USA
Series: Annual ACM Conference on Research and Development in Information Retrieval
Session: Semantics
Pages: 107-114
Year of Publication: 2006
ISBN: 1-59593-369-7
Bibliometrics: Downloads (6 weeks): 9, Downloads (12 months): 137, Citation count: 1
ABSTRACT
This paper investigates, in a stringent mathematical formalism, the parallel derivation of three grand probabilistic retrieval models: binary independence retrieval (BIR), the Poisson model (PM), and language modelling (LM). The investigation has been motivated by a number of questions. Firstly, though the models share the same origin, namely the probability of relevance, they differ with respect to their event spaces. How can this be captured in a consistent notation, and can we relate the event spaces? Secondly, BIR and PM are closely related, but how does LM fit in? Thirdly, how are tf-idf and probabilistic models related?
The parallel investigation of the models leads to a number of formalised results:
- BIR and PM assume the collection to be a set of non-relevant documents, whereas LM assumes the collection to be a set of terms from relevant documents.
- PM can be viewed as a bridge connecting BIR and LM.
- A BIR-LM equivalence explains BIR as a special case of LM.
- PM explains tf-idf, and both BIR and LM probabilities express tf-idf in a dual way.
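To make the contrast between the event spaces concrete, here is a minimal Python sketch of per-term weights under standard textbook estimates; it is not the paper's own derivation, and all function names and toy counts are hypothetical. The BIR weight assumes no relevance information (P(t|R) = 0.5) and takes the non-relevant statistics from the whole collection of N documents, in line with the first result above. The Poisson function predicts idf from the collection frequency cf with rate cf/N, as in Church and Gale's comparison of observed and Poisson-predicted idf [5]. The LM weight is a Jelinek-Mercer smoothed query-likelihood term with an assumed mixing weight lam = 0.5, estimating P(t|c) over term occurrences rather than documents.

```python
import math


def bir_weight(N: int, n: int) -> float:
    """BIR (Robertson/Sparck Jones) weight with no relevance information.

    Assumes P(t|R) = 0.5 and estimates P(t|not R) from all N documents,
    n of which contain t, with the usual +0.5 smoothing.
    Negative when n > N/2.
    """
    return math.log((N - n + 0.5) / (n + 0.5))


def observed_idf(N: int, n: int) -> float:
    """idf from document frequency: -log P(t occurs in a document) = -log(n/N)."""
    return -math.log(n / N)


def poisson_idf(N: int, cf: int) -> float:
    """idf predicted by a single Poisson model of within-document counts.

    With rate = cf/N (average occurrences per document), the predicted
    probability that a document contains t at least once is 1 - exp(-rate).
    """
    rate = cf / N
    return -math.log(1.0 - math.exp(-rate))


def lm_term_weight(tf: int, dl: int, cf: int, collection_len: int,
                   lam: float = 0.5) -> float:
    """Per-term weight of a Jelinek-Mercer smoothed query-likelihood LM.

    log((1-lam) P(t|c) + lam P(t|d)) minus the document-independent
    log((1-lam) P(t|c)) gives log(1 + lam P(t|d) / ((1-lam) P(t|c))),
    which is 0 when tf = 0. P(t|c) is estimated over tokens (the collection
    viewed as a set of terms), not over documents. lam = 0.5 is an assumption.
    """
    p_td = tf / dl
    p_tc = cf / collection_len
    return math.log(1.0 + (lam * p_td) / ((1.0 - lam) * p_tc))


def tfidf(tf: int, N: int, n: int) -> float:
    """Classic tf-idf: raw term frequency times log(N/n)."""
    return tf * math.log(N / n)


if __name__ == "__main__":
    # Made-up toy collection: N documents and L tokens in total.
    N, L = 1000, 200_000
    terms = {"rare": (10, 12), "common": (600, 1500)}  # term -> (df n, cf)
    for name, (n, cf) in terms.items():
        print(f"{name:7s} BIR={bir_weight(N, n):.3f} "
              f"idf={observed_idf(N, n):.3f} "
              f"poisson_idf={poisson_idf(N, cf):.3f} "
              f"LM(tf=2,dl=100)={lm_term_weight(2, 100, cf, L):.3f} "
              f"tfidf(tf=2)={tfidf(2, N, n):.3f}")
```

For these made-up counts every weight ranks the rare term above the common one, and the observed idf is larger than the Poisson prediction for both terms, the kind of deviation from Poisson that Church and Gale [5] analyse. BIR and the Poisson estimate count documents, while the LM weight counts tokens, which mirrors the different event spaces the abstract describes.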
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
[1] G. Amati. Probability Models for Information Retrieval based on Divergence from Randomness. PhD thesis, Glasgow University, June 2003.
[4] Marcia J. Bates. After the dot-bomb: Getting web information retrieval right this time. First Monday, 7(7), 2002.
[5] K. Church and W. Gale. Inverse document frequency (idf): A measure of deviation from Poisson. In Proceedings of the Third Workshop on Very Large Corpora, pages 121-130, 1995.
[8] Djoerd Hiemstra. A probabilistic justification for using tf.idf term weighting in information retrieval. International Journal on Digital Libraries, 3(2):131-139, 2000.
[9] K. Sparck Jones, S. E. Robertson, D. Hiemstra, and H. Zaragoza. Language modelling and relevance. In Language Modelling for Information Retrieval, pages 57-70, 2003.
[11] John Lafferty and ChengXiang Zhai. Probabilistic Relevance Models Based on Document and Query Generation, chapter 1. In Croft and Lafferty [6], 2002.
[16] S. E. Robertson. Understanding inverse document frequency: On theoretical arguments for idf. Journal of Documentation, 60:503-520, 2004.
[17] S. E. Robertson and K. Sparck Jones. Relevance weighting of search terms. Journal of the American Society for Information Science, 27:129-146, 1976.