On setting the hyper-parameters of term frequency normalization for information retrieval
Source
ACM Transactions on Information Systems (TOIS)
Volume 25, Issue 3 (July 2007)
Article No. 13
Year of Publication: 2007
ISSN: 1046-8188
Authors
Ben He, University of Glasgow, Glasgow, United Kingdom
Iadh Ounis, University of Glasgow, Glasgow, United Kingdom
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1247715.1247719

ABSTRACT

Setting the term frequency normalization hyper-parameter suffers from two problems, query dependence and collection dependence, which markedly hurt the robustness of retrieval performance. This article investigates three term frequency normalization methods: normalization 2, BM25's normalization, and the Dirichlet Priors normalization. We tackle the query dependence problem by modifying the query term weight using a Divergence From Randomness term weighting model, and the collection dependence problem by measuring the correlation of the normalized term frequency with the document length. Our research hypotheses for the two problems, as well as an automatic hyper-parameter setting methodology, are extensively validated and evaluated on four Text REtrieval Conference (TREC) collections.
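
As a rough illustration of the three normalization methods named in the abstract, the sketch below uses the forms in which they are commonly stated in the information retrieval literature; the article's own formulations may differ in detail. The hyper-parameters are c (normalization 2), b (BM25) and mu (Dirichlet Priors). The function names, the helper length_correlation and the toy data are assumptions made for this example only, and the Pearson coefficient is just one possible way to measure the correlation between normalized term frequency and document length.

```python
import math


def normalization_2(tf, doc_len, avg_doc_len, c):
    """Normalization 2 (common form): tfn = tf * log2(1 + c * avg_l / l)."""
    return tf * math.log2(1.0 + c * avg_doc_len / doc_len)


def bm25_normalization(tf, doc_len, avg_doc_len, b):
    """BM25's normalization (common form): tfn = tf / ((1 - b) + b * l / avg_l)."""
    return tf / ((1.0 - b) + b * doc_len / avg_doc_len)


def dirichlet_normalization(tf, doc_len, p_collection, mu):
    """Dirichlet Priors smoothing (common form): (tf + mu * p(t|C)) / (l + mu)."""
    return (tf + mu * p_collection) / (doc_len + mu)


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def length_correlation(tfs, doc_lens, normalize):
    """Correlation between normalized term frequency and document length.

    `normalize` is any function taking (tf, doc_len) and returning tfn,
    with the hyper-parameter already bound (hypothetical helper).
    """
    tfn = [normalize(tf, doc_len) for tf, doc_len in zip(tfs, doc_lens)]
    return pearson(tfn, doc_lens)


# Toy data (made up): raw term frequency grows in proportion to document length.
tfs = [1, 2, 4]
doc_lens = [50, 100, 200]
avg_len = sum(doc_lens) / len(doc_lens)

for b in (0.0, 0.75):
    corr = length_correlation(
        tfs, doc_lens, lambda tf, l: bm25_normalization(tf, l, avg_len, b)
    )
    print(f"b={b}: correlation(tfn, length) = {corr:.3f}")
```

On this toy data, a larger b weakens the dependence of the normalized term frequency on document length, which is the kind of signal a collection-dependent parameter setting could inspect.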

