
An effective approach to document retrieval via utilizing WordNet and recognizing phrases

Published: 25 July 2004

Abstract

Noun phrases in queries are identified and classified into four types: proper names, dictionary phrases, simple phrases and complex phrases. A document has a phrase if all content words in the phrase are within a window of a certain size. The window sizes for different types of phrases are different and are determined using a decision tree. Phrases are more important than individual terms. Consequently, documents in response to a query are ranked with matching phrases given a higher priority. We utilize WordNet to disambiguate word senses of query terms. Whenever the sense of a query term is determined, its synonyms, hyponyms, words from its definition and its compound words are considered for possible additions to the query. Experimental results show that our approach yields between 23% and 31% improvements over the best-known results on the TREC 9, 10 and 12 collections for short (title only) queries, without using Web data.
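The window test and the phrase-first ranking described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `contains_phrase` and `rank`, the window sizes, the sample documents, and the use of raw term counts as a tie-breaker are all assumptions made for illustration. In the paper, window sizes depend on the phrase type and are chosen with a decision tree; here each phrase simply carries its own hypothetical window size.

```python
from collections import Counter


def contains_phrase(doc_tokens, phrase_words, window):
    """Return True if all words of the phrase occur together inside some
    span of `window` consecutive document tokens (word order is ignored)."""
    needed = Counter(phrase_words)
    if not needed:
        return False
    have = Counter()                 # phrase words inside the current window
    missing = sum(needed.values())   # phrase words still not covered
    for right, tok in enumerate(doc_tokens):
        if tok in needed:
            have[tok] += 1
            if have[tok] <= needed[tok]:
                missing -= 1
        left = right - window        # index of the token leaving the window
        if left >= 0:
            old = doc_tokens[left]
            if old in needed:
                if have[old] <= needed[old]:
                    missing += 1
                have[old] -= 1
        if missing == 0:
            return True
    return False


def rank(docs, phrases, terms):
    """Rank document ids so that phrase matches take priority over term matches.

    docs:    dict mapping doc id -> list of tokens
    phrases: list of (phrase_words, window) pairs; windows are hypothetical
    terms:   individual query terms, scored here by raw count (an assumption)
    """
    def key(item):
        _, tokens = item
        phrase_hits = sum(contains_phrase(tokens, words, w) for words, w in phrases)
        term_hits = sum(tokens.count(t) for t in terms)
        return (phrase_hits, term_hits)  # compared left to right

    return [doc_id for doc_id, _ in sorted(docs.items(), key=key, reverse=True)]


docs = {
    "a": "document retrieval systems".split(),
    "b": "document document document archive retrieval".split(),
}
phrases = [(["document", "retrieval"], 2)]
ranking = rank(docs, phrases, ["document", "retrieval"])
print(ranking)
```

Document "b" contains more occurrences of the individual query terms, but only "a" has both phrase words inside the two-token window, so the tuple sort key places "a" first.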


Published In

SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
July 2004
624 pages
ISBN:1581138814
DOI:10.1145/1008992
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. WordNet
  2. information retrieval
  3. phrase
  4. word sense disambiguation

Qualifiers

  • Article

Conference

SIGIR04

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
