ABSTRACT
Document retrieval systems have been restricted, by the nature of the task, to techniques that can be used with large numbers of documents and broad domains. The most effective techniques that have been developed are based on the statistics of word occurrences in text. In this paper, we describe an approach to using natural language processing (NLP) techniques for what is essentially a natural language problem: the comparison of a request text with the text of document titles and abstracts. The proposed NLP techniques are used to develop a request model based on “conceptual case frames” and to compare this model with the texts of candidate documents. The request model is also used to provide information to statistical search techniques that identify the candidate documents. As part of a preliminary evaluation of this approach, case frame representations of a set of requests from the CACM collection were constructed. Statistical searches carried out using dependency and relative importance information derived from the request models indicate that performance benefits can be obtained.