| Word document density and relevance scoring (poster session) |
| Source
|
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Athens, Greece
Pages: 345 - 347
Year of Publication: 2000
ISBN:1-58113-226-3
|
|
Authors
|
|
Martin Franz
|
IBM T. J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY
|
|
J. Scott McCarley
|
IBM T. J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY
|
|
ABSTRACT
Previous work on the distribution of words in documents has shown that word repetitiveness is an important indicator of whether a word is content-bearing. In this paper we propose a simple method that uses a measure of the tendency of words to repeat within a document to separate words with similar document frequencies but different topic-discriminating characteristics. We describe the application of the new measure to query-document relevance scoring. Experiments on the TREC Ad Hoc and Spoken Document Retrieval tasks [7] show useful performance improvements.
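The abstract does not state the formula for the repetition measure. As a rough illustration of the idea only, the sketch below uses the ratio of collection frequency to document frequency, that is, the mean number of occurrences of a word in the documents that contain it. This ratio, the function name `repetition_ratio`, and the toy documents are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def repetition_ratio(docs):
    """Return {word: collection_frequency / document_frequency}.

    `docs` is a list of token lists. The ratio is the mean number of
    occurrences of a word in the documents that contain it. This is an
    assumed stand-in for the paper's measure, not the authors' formula.
    """
    cf = Counter()  # total occurrences across the collection
    df = Counter()  # number of documents containing the word
    for tokens in docs:
        counts = Counter(tokens)
        cf.update(counts)         # add per-document occurrence counts
        df.update(counts.keys())  # count each word once per document
    return {word: cf[word] / df[word] for word in cf}


# Hypothetical toy collection: "reactor" and "core" both occur in
# exactly one document, but only "reactor" repeats within it.
docs = [
    "the reactor core the reactor vessel reactor".split(),
    "also the weather report".split(),
    "also the market report".split(),
]
ratios = repetition_ratio(docs)
```

In this toy collection "reactor" and "core" have the same document frequency, yet their ratios differ, which is the kind of separation the abstract describes for words with similar document frequencies.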
REFERENCES
| |
1
|
|
| |
2
|
M. Franz, J. S. McCarley, S. Roukos, Ad hoc and Multilingual Information Retrieval at IBM, in Proceedings of the Seventh Text REtrieval Conference (TREC-7) ed. by E. M. Voorhees and D.K. Harman. NIST Special Publication 500-242: 157-168, 1999.
|
| |
3
|
M. Franz, J. S. McCarley, R. T. Ward, Ad hoc, Crosslanguage and Spoken Document Information Retrieval at IBM, to appear in Proceedings of the Eighth Text REtrieval Conference (TREC-8) ed. by E. M. Voorhees and D.K. Harman.
|
| |
4
|
|
| |
5
|
S. E. Robertson, S. Walker, S. Jones, M. M. Hancock-Beaulieu, M. Gatford, Okapi at TREC-3, in Proceedings of the Third Text REtrieval Conference (TREC-3) ed. by D.K. Harman. NIST Special Publication 500-225, 1995.
|
| |
6
|
R. Rosenfeld, A Maximum Entropy Approach to Adaptive Statistical Language Modeling, in Computer Speech and Language, 10: 187-228, 1996.
|
| |
7
|
E. M. Voorhees, D. Harman, Overview of the Seventh Text Retrieval Conference (TREC-7), in Proceedings of the Seventh Text REtrieval Conference (TREC-7) ed. by E. M. Voorhees and D.K. Harman. NIST Special Publication 500-242: 1-23, 1999.
|
 |
8
|
|