An approach to natural language for document retrieval

Author:
B. Croft

Computer and Information Science Department, University of Massachusetts, Amherst, MA


SIGIR '87: Proceedings of the 10th annual international ACM SIGIR conference on Research and development in information retrieval, November 1987, Pages 26–32. https://doi.org/10.1145/42005.42009

Published: 01 November 1987

ABSTRACT

Document retrieval systems have been restricted, by the nature of the task, to techniques that can be used with large numbers of documents and broad domains. The most effective techniques that have been developed are based on the statistics of word occurrences in text. In this paper, we describe an approach to using natural language processing (NLP) techniques for what is essentially a natural language problem - the comparison of a request text with the text of document titles and abstracts. The proposed NLP techniques are used to develop a request model based on “conceptual case frames” and to compare this model with the texts of candidate documents. The request model is also used to provide information to statistical search techniques that identify the candidate documents. As part of a preliminary evaluation of this approach, case frame representations of a set of requests from the CACM collection were constructed. Statistical searches carried out using dependency and relative importance information derived from the request models indicate that performance benefits can be obtained.
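
The abstract does not give the structure of the request model or the search formula, so the following is only a minimal sketch of the general idea: a request model supplies relative importance and dependency information to a simple statistical ranking. The names Slot, RequestModel, tokenize, and score, the role labels, the flat slot list, and the dependency_bonus value are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    """One slot of a hypothetical case frame: a role filled by request terms."""
    role: str                 # illustrative role name, e.g. "object" or "method"
    terms: tuple[str, ...]    # terms filling the slot; assumed mutually dependent
    importance: float = 1.0   # assumed relative importance of the slot


@dataclass
class RequestModel:
    """A request as a flat list of slots; a real case frame would be richer."""
    slots: list[Slot] = field(default_factory=list)


def tokenize(text: str) -> set[str]:
    """Lowercase a text, split on whitespace, and strip simple punctuation."""
    return {w.strip(".,;:()\"'").lower() for w in text.split()} - {""}


def score(model: RequestModel, document: str, dependency_bonus: float = 0.5) -> float:
    """Sum importance-weighted term matches, plus a bonus when every term
    of a multi-term slot occurs in the document (a crude dependency signal)."""
    words = tokenize(document)
    total = 0.0
    for slot in model.slots:
        matched = [t for t in slot.terms if t in words]
        total += slot.importance * len(matched)
        if len(slot.terms) > 1 and len(matched) == len(slot.terms):
            total += slot.importance * dependency_bonus
    return total


# Illustrative request and documents (invented for this sketch).
request = RequestModel([
    Slot("object", ("document", "retrieval"), importance=2.0),
    Slot("method", ("parsing",), importance=1.0),
])
docs = [
    "Parsing techniques for document retrieval.",
    "Document formatting and layout.",
]
ranked = sorted(docs, key=lambda d: score(request, d), reverse=True)
```

In this sketch the first document ranks above the second because it matches both slots and contains every term of the two-term slot. The weighting scheme is an assumption chosen for brevity, not the probabilistic model used in the paper.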

Index Terms

An approach to natural language for document retrieval
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
2. Information systems
  1. Information retrieval

Published in
SIGIR '87: Proceedings of the 10th annual international ACM SIGIR conference on Research and development in information retrieval
November 1987
317 pages
ISBN: 0897912322
DOI: 10.1145/42005
Editors: C. T. Yu, C. J. Van Rijsbergen
Copyright © 1987 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Overall Acceptance Rate: 792 of 3,983 submissions, 20%