ABSTRACT
Setting the term frequency normalization hyper-parameter suffers from query dependence and collection dependence, both of which markedly hurt the robustness of retrieval performance. This article investigates three term frequency normalization methods: normalization 2, BM25's normalization, and the Dirichlet Priors normalization. We tackle the query dependence problem by modifying the query term weight using a Divergence From Randomness term weighting model, and the collection dependence problem by measuring the correlation of the normalized term frequency with the document length. Our research hypotheses for the two problems, together with an automatic hyper-parameter setting methodology, are extensively validated and evaluated on four Text REtrieval Conference (TREC) collections.
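For orientation, the three normalization methods named above can be sketched in their commonly published forms, along with a Pearson correlation between normalized term frequency and document length. This is a minimal illustration, not the article's implementation: the function names, the default values for c, b and mu, and the use of Pearson's coefficient as the correlation measure are all assumptions made here.

```python
import math

# Illustrative sketch only. Formulas follow the commonly published forms;
# function names and default hyper-parameter values are assumptions.

def normalization_2(tf, doc_len, avg_len, c=1.0):
    """DFR normalization 2: tfn = tf * log2(1 + c * avg_len / doc_len)."""
    return tf * math.log2(1.0 + c * avg_len / doc_len)

def bm25_normalization(tf, doc_len, avg_len, b=0.75):
    """BM25 length normalization: tfn = tf / ((1 - b) + b * doc_len / avg_len)."""
    return tf / ((1.0 - b) + b * doc_len / avg_len)

def dirichlet_normalization(tf, doc_len, coll_prob, mu=1000.0):
    """Dirichlet Priors smoothing: (tf + mu * p(t|C)) / (doc_len + mu)."""
    return (tf + mu * coll_prob) / (doc_len + mu)

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def tfn_length_correlation(tfs, doc_lens, avg_len, c):
    """Correlation between normalization-2 tfn and document length for one setting of c."""
    tfns = [normalization_2(tf, dl, avg_len, c) for tf, dl in zip(tfs, doc_lens)]
    return pearson(tfns, doc_lens)
```

Sweeping c (or b, or mu) and observing how this correlation changes is one way to picture a collection-level signal for hyper-parameter setting; how the article actually uses the correlation is not stated in the abstract.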