skip to main content
10.1145/1390334.1390567acmconferencesArticle/Chapter ViewAbstractPublication PagesirConference Proceedingsconference-collections
research-article

Exploring and measuring dependency trees for informationretrieval

Published: 20 July 2008 Publication History

Abstract

Natural language processing techniques are believed to hold a tremendous potential to supplement the purely quantitative methods of text information retrieval. This has led to the emergence of a large number of NLP-based IR research projects over the last few years, even though the empirical evidence to support this has often been inadequate. Most contributions of NLP to IR mainly concentrate on document representation and compound term matching strategies. Researchers have noted that the simple term-based representation of document content such as vector representation is usually inadequate for accurate discrimination. The "bag of words" representation does not invoke linguistic considerations and allow modelling of relationships between subsets of words. However, even though a variety of content indicator such as syntactic phrase have been tried and investigated for representing documents rather than single terms in IR systems, the matching strategy over those representation still cannot go beyond traditional statistical techniques that measure term co-occurrence characteristics and proximity in analyzing text structure.
In this paper, we propose a novel IR strategy (SIR) with NLP techniques involved at the syntactic level. Within SIR, documents and query representation are built on the basis of a syntactic data structure of the natural language text - the dependency tree, in which syntactic relationships between words are identified and structured in the form of a tree. In order to capture the syntactic relations between words in their hierarchical structural representation, the matching strategy in SIR upgrades from the traditional statistical techniques by introducing a similarity measure method executing on the graph representation level as the key determiner. A basic IR experiment is designed and implemented on the TREC data to evaluate if this novel IR model is feasible. Experimental results indicate that this approach has the potential to outperform the standard bag of words IR model, especially in response to syntactical structured queries.

References

[1]
D. Lin. Minipar. http://www.cs.ualberta.ca/ lindek/minipar.htm.
[2]
T. Strzalkowski. Natural Language Information Retrieval. Springer, Germany, 1999.

Cited By

View all
  • (2010)Weighting common syntactic structures for natural language based information retrievalProceedings of the 19th ACM international conference on Information and knowledge management10.1145/1871437.1871653(1485-1488)Online publication date: 26-Oct-2010

Index Terms

  1. Exploring and measuring dependency trees for informationretrieval

      Recommendations

      Comments

      Information & Contributors

      Information

      Published In

      cover image ACM Conferences
      SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
      July 2008
      934 pages
      ISBN:9781605581644
      DOI:10.1145/1390334
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Sponsors

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 20 July 2008

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. dependency tree
      2. information retrieval
      3. tree matching

      Qualifiers

      • Research-article

      Conference

      SIGIR '08
      Sponsor:

      Acceptance Rates

      Overall Acceptance Rate 792 of 3,983 submissions, 20%

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)0
      • Downloads (Last 6 weeks)0
      Reflects downloads up to 08 Mar 2025

      Other Metrics

      Citations

      Cited By

      View all
      • (2010)Weighting common syntactic structures for natural language based information retrievalProceedings of the 19th ACM international conference on Information and knowledge management10.1145/1871437.1871653(1485-1488)Online publication date: 26-Oct-2010

      View Options

      Login options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Figures

      Tables

      Media

      Share

      Share

      Share this Publication link

      Share on social media