Article

Enhancing relevance scoring with chronological term rank

Authors:

Guo-Qiang Zhang

SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval

Pages 599 - 606

https://doi.org/10.1145/1277741.1277844

Published: 23 July 2007

Abstract

We introduce a relevance scoring technique that enhances existing relevance scoring schemes with term position information. The technique uses chronological term rank (CTR), which captures the positions of terms as they occur in the sequence of words in a document. CTR is both conceptually and computationally simple compared with other approaches that use document structure information, such as term proximity, term order and document features. CTR works well when paired with Okapi BM25. We evaluate various combinations of CTR with Okapi BM25 to identify the most effective formula, and then compare the selected approach against existing methods such as Okapi BM25, pivoted length normalization and language models. Significant improvements are seen consistently across a variety of TREC data and topic sets, as measured by the major retrieval performance metrics. To our knowledge, this is the first use of this statistic for relevance scoring. Further retrieval improvements are likely to be possible with CTR-enhanced methods in future work.
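
The abstract does not state the scoring formula, so the following is only a minimal sketch of the general idea, not the formula selected in the paper. It assumes that a term's CTR is the 1-based position of its first occurrence in the document's word sequence, uses a common non-negative variant of the BM25 term weight, and adds a hypothetical position bonus (`weight`, and the linear decay it scales, are illustrative choices made here).

```python
import math


def chronological_term_rank(tokens, term):
    """Assumed definition: 1-based position of the first occurrence of term."""
    for position, token in enumerate(tokens, start=1):
        if token == term:
            return position
    return None


def bm25_term(tf, df, n_docs, doc_len, avg_len, k1=1.2, b=0.75):
    """BM25 weight for one term (variant with a non-negative IDF)."""
    idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
    return idf * tf * (k1 + 1.0) / (tf + k1 * (1.0 - b + b * doc_len / avg_len))


def score(query_terms, doc, corpus, weight=0.5):
    """BM25 plus a hypothetical bonus that is larger for earlier first occurrences."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    total = 0.0
    for term in query_terms:
        tf = doc.count(term)
        if tf == 0:
            continue  # absent terms contribute nothing
        df = sum(1 for d in corpus if term in d)
        ctr = chronological_term_rank(doc, term)
        # Illustrative bonus: `weight` at position 1, decaying linearly toward 0.
        bonus = weight * (1.0 - (ctr - 1) / len(doc))
        total += bm25_term(tf, df, n_docs, len(doc), avg_len) + bonus
    return total


# Toy corpus: the first two documents have equal length and equal term
# frequency for "retrieval", so only the term's position differs.
corpus = [
    "retrieval systems rank documents by score".split(),
    "systems rank documents by score retrieval".split(),
    "notes about cooking pasta at home".split(),
]
early = score(["retrieval"], corpus[0], corpus)
late = score(["retrieval"], corpus[1], corpus)
```

In this toy setup the two matching documents receive the same BM25 weight, so the document that mentions the query term first ranks higher purely because of the position bonus.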

Index Terms

Enhancing relevance scoring with chronological term rank
1. Information systems
  1. Information retrieval

Published In

SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval

July 2007

946 pages

ISBN: 9781595935977

DOI: 10.1145/1277741

General Chairs: Wessel Kraaij (TNO, The Netherlands), Arjen P. de Vries (CWI, The Netherlands)

Program Chairs: Charles L. A. Clarke (University of Waterloo, Canada), Norbert Fuhr (University of Duisburg-Essen, Germany), Noriko Kando (National Institute of Informatics, Japan)

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGIR07

SIGIR07: The 30th Annual International SIGIR Conference

July 23 - 27, 2007

Amsterdam, The Netherlands

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Cited By

Hammache A, Boughanem M (2021). Term position-based language model for information retrieval. Journal of the Association for Information Science and Technology 72:5 (627-642). DOI: 10.1002/asi.24431. Online publication date: 10-Apr-2021.
https://dl.acm.org/doi/10.1002/asi.24431
Jimenez S, Cucerzan S, Gonzalez F, Gelbukh A, Dueñas G (2018). BM25-CTF: Improving TF and IDF factors in BM25 by using collection term frequencies. Journal of Intelligent & Fuzzy Systems 34:5 (2887-2899). DOI: 10.3233/JIFS-169475. Online publication date: 24-May-2018.
https://doi.org/10.3233/JIFS-169475
Huang C (2015). Anecdotes extraction from webpage context as image annotation. Emerging Trends in Image Processing, Computer Vision and Pattern Recognition (353-367). DOI: 10.1016/B978-0-12-802045-6.00022-3. Online publication date: 2015.
https://doi.org/10.1016/B978-0-12-802045-6.00022-3
Huang C, Chang Y (2013). Applying a lightweight iterative merging chinese segmentation in web image annotation. Proceedings of the 9th international conference on Machine Learning and Data Mining in Pattern Recognition (183-194). DOI: 10.1007/978-3-642-39712-7_14. Online publication date: 19-Jul-2013.
https://dl.acm.org/doi/10.1007/978-3-642-39712-7_14
Acun B, Başpınar A, Oğuz E, Saraç M, Can F (2012). Topic Tracking Using Chronological Term Ranking. Computer and Information Sciences III (353-361). DOI: 10.1007/978-1-4471-4594-3_36. Online publication date: 30-Oct-2012.
https://doi.org/10.1007/978-1-4471-4594-3_36
Nunes S, Ribeiro C, David G (2011). Term weighting based on document revision history. Journal of the American Society for Information Science and Technology 62:12 (2471-2478). DOI: 10.1002/asi.21597. Online publication date: 1-Dec-2011.
https://dl.acm.org/doi/10.1002/asi.21597
Lee J, Seo J, Jeon J, Rim H (2011). Sentence-based relevance flow analysis for high accuracy retrieval. Journal of the American Society for Information Science and Technology 62:9 (1666-1675). DOI: 10.1002/asi.21564. Online publication date: 1-Sep-2011.
https://dl.acm.org/doi/10.1002/asi.21564
Can F, Kocberber S, Baglioglu O, Kardas S, Ocalan H, Uyar E (2010). New event detection and topic tracking in Turkish. Journal of the American Society for Information Science and Technology 61:4 (802-819). DOI: 10.5555/1753126.1753133. Online publication date: 1-Apr-2010.
https://dl.acm.org/doi/10.5555/1753126.1753133
Can F, Kocberber S, Baglioglu O, Kardas S, Ocalan H, Uyar E (2010). New event detection and topic tracking in Turkish. Journal of the American Society for Information Science and Technology 61:4 (802-819). DOI: 10.1002/asi.21264. Online publication date: 12-Jan-2010.
https://doi.org/10.1002/asi.21264
Sendhilkumar S, Mahalakshmi G, Rajasekar S (2009). Ontology-based automatic query refinement. International Journal of Artificial Intelligence and Soft Computing 1:2/3/4 (316-337). DOI: 10.1504/IJAISC.2009.027298. Online publication date: 1-Jul-2009.
https://dl.acm.org/doi/10.1504/IJAISC.2009.027298
