skip to main content
article
Free Access

Efficient string matching: an aid to bibliographic search

Published:01 June 1975Publication History
Skip Abstract Section

Abstract

This paper describes a simple, efficient algorithm to locate all occurrences of any of a finite number of keywords in a string of text. The algorithm consists of constructing a finite state pattern matching machine from the keywords and then using the pattern matching machine to process the text string in a single pass. Construction of the pattern matching machine takes time proportional to the sum of the lengths of the keywords. The number of state transitions made by the pattern matching machine in processing the text string is independent of the number of keywords. The algorithm has been used to improve the speed of a library bibliographic search program by a factor of 5 to 10.

References

  1. 1 Aho, A.V.Hopscroft, J,E. and Ullman, D. J., the Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, Mass., 1974. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. 2 Booth, T.U Sequential Machines and Automata Theory. Wiley, New York, 1967.Google ScholarGoogle Scholar
  3. 3 Brzozowski, J.A. Derivatives of regular expressions. J. ACM 11:4 (October 1964), 481-494. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. 4 Bullen, R.H., Jr., and Millen, J.K. Microtext - the design of a microprogrammed finite state search machine for full-text retrieval. Proc. Fall Joint Computer Conference, 1972, pp. 479-488.Google ScholarGoogle Scholar
  5. 5 Fischer, M.J., and Paterson, M.S. String matching and other products. Technical Report 41, Project MAC, M.I.T., 1974. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. 6 Gimpel, J.A. A theory of discrete.patterns and their implementation in SNOBOL4. Comm. ACM 16:2 (February 1973), 91-100. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. 7 Harrison, M.C. Implementation of the substring test by hashing. Comm. ACM14:12 (December 1971), 777-779. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. 8 Johnson, W.L., Porter, J.H., Ackley, S.I., and Ross, D.T. Automatic generation of efficient lexical processors using finite state techniques. Comm. ACMll:I2 (December 1968), 805-813. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. 9 Kernighan, B.W., and Cherry, L.L. A system for typesetting mathematics. Comm. ACM18:3 (March 1975), 151-156. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. 10 Kleene, S.C. Representation of events in nerve nets. In Automata Studies, C.E. Shannon and J. McCarthy (eds.), Princeton University Press, 1956, pp. 3-40.Google ScholarGoogle Scholar
  11. 11 Knuth, D.E. Fundamental Algorithms, second edition, The Art of Computer Programming 1, Addison-Wesley, Reading, Mass., 1973. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. 12 Knuth, D.E. Sorting and Searching, The Art of Computer Prograining 3, Addison-Wesley, Reading, Mass., 1973.Google ScholarGoogle Scholar
  13. 13 Knuth, D.E., Morris, J.H., Jr., and Pratt, V.R. Fast pattern matching in strings. TR CS-74-440, Stanford University, Stanford, California, 1974.Google ScholarGoogle Scholar
  14. 14 Kohavi, Z. Switching and Finite Automata Theory. McGraw- Hill, New York, 1970. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. 15 McNaughton, R., and Yamada, H. Regular expressions and state graphs for automata. IRE Trans. Electronic Computers 9:1 (1960), 39-47.Google ScholarGoogle ScholarCross RefCross Ref
  16. 16 Rabin, M.O., and Scott, D. Finite automata and their decision problems. IBM J. Research and Development 3, (1959), 114-125.Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. 17 Thompson, K. Regular search expression algorithm. Comm. ACM 11:6 (June 1968), 419-422. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Efficient string matching: an aid to bibliographic search

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image Communications of the ACM
          Communications of the ACM  Volume 18, Issue 6
          June 1975
          51 pages
          ISSN:0001-0782
          EISSN:1557-7317
          DOI:10.1145/360825
          Issue’s Table of Contents

          Copyright © 1975 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 1 June 1975

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • article

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader