DOI:10.1145/1277741.1277844
Article

Enhancing relevance scoring with chronological term rank

Published: 23 July 2007

Abstract

We introduce a new relevance scoring technique that enhances existing relevance scoring schemes with term position information. This technique uses chronological term rank (CTR), which captures the positions of terms as they occur in the sequence of words in a document. CTR is both conceptually and computationally simple compared with other approaches that use document structure information, such as term proximity, term order, and document features. CTR works well when paired with Okapi BM25. We evaluate various combinations of CTR with Okapi BM25 to identify the most effective formula. We then compare the selected approach against existing methods such as Okapi BM25, pivoted length normalization, and language models. Significant improvements are seen consistently across a variety of TREC data and topic sets, as measured by the major retrieval performance metrics. This appears to be the first use of this statistic for relevance scoring. Greater retrieval improvements are likely to be possible with methods enhanced by chronological term rank in future work.
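
The abstract does not state the formula that combines CTR with Okapi BM25, so the following is only an illustrative sketch of the general idea. It assumes that a term's chronological rank is the 1-based position of its first occurrence in the document's word sequence, uses a common textbook variant of BM25, and applies a hypothetical multiplicative bonus for early occurrences. The function names, the `position_weight` parameter, and the bonus formula are all assumptions made for illustration, not the paper's method.

```python
import math
from collections import Counter


def chronological_term_rank(tokens, term):
    """Return the 1-based position of the first occurrence of `term`, or None.

    Assumption: "rank" means the earliest position of the term in the
    document's word sequence; the abstract does not define it further.
    """
    for position, token in enumerate(tokens, start=1):
        if token == term:
            return position
    return None


def bm25_term_score(tf, df, num_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Okapi BM25 contribution of one query term (a common variant)."""
    idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)
    length_norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + length_norm)


def score(query, tokens, doc_freqs, num_docs, avg_doc_len, position_weight=0.5):
    """BM25 plus a HYPOTHETICAL bonus for terms that first appear early."""
    counts = Counter(tokens)
    total = 0.0
    for term in query:
        tf = counts[term]
        if tf == 0:
            continue  # absent terms contribute nothing
        base = bm25_term_score(tf, doc_freqs[term], num_docs, len(tokens), avg_doc_len)
        rank = chronological_term_rank(tokens, term)
        # Hypothetical combination, not the paper's formula: the bonus shrinks
        # as the first occurrence moves later (rank >= 1, so log(1 + rank) > 0).
        bonus = position_weight / math.log(1.0 + rank)
        total += base * (1.0 + bonus)
    return total
```

With two documents of equal length and equal term frequency, this sketch scores the one in which the query term appears first more highly, which is the kind of position sensitivity the abstract describes.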

Published In

SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
July 2007
946 pages
ISBN:9781595935977
DOI:10.1145/1277741
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. chronological term rank
  2. document structure
  3. relevance ranking
  4. similarity scoring
  5. term position
  6. term weighting

Qualifiers

  • Article

Conference

SIGIR07
SIGIR07: The 30th Annual International SIGIR Conference
July 23 - 27, 2007
Amsterdam, The Netherlands

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Article Metrics

  • Downloads (Last 12 months): 2
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 05 Mar 2025

Cited By

  • (2021) Term position-based language model for information retrieval. Journal of the Association for Information Science and Technology 72:5 (627-642). DOI:10.1002/asi.24431. Online publication date: 10-Apr-2021
  • (2018) BM25-CTF: Improving TF and IDF factors in BM25 by using collection term frequencies. Journal of Intelligent & Fuzzy Systems 34:5 (2887-2899). DOI:10.3233/JIFS-169475. Online publication date: 24-May-2018
  • (2015) Anecdotes extraction from webpage context as image annotation. Emerging Trends in Image Processing, Computer Vision and Pattern Recognition (353-367). DOI:10.1016/B978-0-12-802045-6.00022-3. Online publication date: 2015
  • (2013) Applying a lightweight iterative merging Chinese segmentation in web image annotation. Proceedings of the 9th international conference on Machine Learning and Data Mining in Pattern Recognition (183-194). DOI:10.1007/978-3-642-39712-7_14. Online publication date: 19-Jul-2013
  • (2012) Topic Tracking Using Chronological Term Ranking. Computer and Information Sciences III (353-361). DOI:10.1007/978-1-4471-4594-3_36. Online publication date: 30-Oct-2012
  • (2011) Term weighting based on document revision history. Journal of the American Society for Information Science and Technology 62:12 (2471-2478). DOI:10.1002/asi.21597. Online publication date: 1-Dec-2011
  • (2011) Sentence-based relevance flow analysis for high accuracy retrieval. Journal of the American Society for Information Science and Technology 62:9 (1666-1675). DOI:10.1002/asi.21564. Online publication date: 1-Sep-2011
  • (2010) New event detection and topic tracking in Turkish. Journal of the American Society for Information Science and Technology 61:4 (802-819). DOI:10.5555/1753126.1753133. Online publication date: 1-Apr-2010
  • (2010) New event detection and topic tracking in Turkish. Journal of the American Society for Information Science and Technology 61:4 (802-819). DOI:10.1002/asi.21264. Online publication date: 12-Jan-2010
  • (2009) Ontology-based automatic query refinement. International Journal of Artificial Intelligence and Soft Computing 1:2/3/4 (316-337). DOI:10.1504/IJAISC.2009.027298. Online publication date: 1-Jul-2009
