DOI: 10.1145/42005.42009

An approach to natural language for document retrieval

Published: 01 November 1987

ABSTRACT

Document retrieval systems have been restricted, by the nature of the task, to techniques that can be used with large numbers of documents and broad domains. The most effective techniques that have been developed are based on the statistics of word occurrences in text. In this paper, we describe an approach to using natural language processing (NLP) techniques for what is essentially a natural language problem - the comparison of a request text with the text of document titles and abstracts. The proposed NLP techniques are used to develop a request model based on “conceptual case frames” and to compare this model with the texts of candidate documents. The request model is also used to provide information to statistical search techniques that identify the candidate documents. As part of a preliminary evaluation of this approach, case frame representations of a set of requests from the CACM collection were constructed. Statistical searches carried out using dependency and relative importance information derived from the request models indicate that performance benefits can be obtained.
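
To make the last point of the abstract concrete, the following is a minimal sketch of how relative importance and dependency information from a request model might feed a statistical ranking. It is not the paper's method: the `RequestModel` structure, the `score` and `rank` functions, the `dep_bonus` parameter, and the additive scoring rule are all assumptions introduced here for illustration, and the example weights are invented.

```python
from dataclasses import dataclass, field


@dataclass
class RequestModel:
    # Hypothetical structure: request term -> relative importance weight.
    importance: dict[str, float]
    # Hypothetical dependency groups: sets of terms expected to co-occur.
    dependencies: list[frozenset[str]] = field(default_factory=list)


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split on non-alphanumeric characters (no stemming)."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return set(cleaned.split())


def score(request: RequestModel, document: str, dep_bonus: float = 1.0) -> float:
    """Sum the importance of matched terms, plus a bonus for each
    dependency group whose terms all appear in the document."""
    words = tokenize(document)
    total = sum(w for term, w in request.importance.items() if term in words)
    total += sum(dep_bonus for group in request.dependencies if group <= words)
    return total


def rank(request: RequestModel, documents: list[str]) -> list[str]:
    """Order candidate documents by descending score."""
    return sorted(documents, key=lambda d: score(request, d), reverse=True)


if __name__ == "__main__":
    # Invented example request: weights and the dependency group are illustrative.
    request = RequestModel(
        importance={"natural": 1.0, "language": 1.0, "retrieval": 2.0},
        dependencies=[frozenset({"natural", "language"})],
    )
    titles = [
        "Database systems",
        "Natural-language parsing",
        "Retrieval of natural language text",
    ]
    for title in rank(request, titles):
        print(score(request, title), title)
```

In this sketch a document title that contains every term of a dependency group scores higher than one that matches the same number of isolated terms, which is one simple way dependency information could influence a word-statistics search.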

Published in

SIGIR '87: Proceedings of the 10th annual international ACM SIGIR conference on Research and development in information retrieval
November 1987
317 pages
ISBN: 0897912322
DOI: 10.1145/42005

Copyright © 1987 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 792 of 3,983 submissions, 20%
