Abstract
This paper describes a simple, efficient algorithm to locate all occurrences of any of a finite number of keywords in a string of text. The algorithm consists of constructing a finite state pattern matching machine from the keywords and then using the pattern matching machine to process the text string in a single pass. Construction of the pattern matching machine takes time proportional to the sum of the lengths of the keywords. The number of state transitions made by the pattern matching machine in processing the text string is independent of the number of keywords. The algorithm has been used to improve the speed of a library bibliographic search program by a factor of 5 to 10.
- 1 Aho, A.V.Hopscroft, J,E. and Ullman, D. J., the Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, Mass., 1974. Google ScholarDigital Library
- 2 Booth, T.U Sequential Machines and Automata Theory. Wiley, New York, 1967.Google Scholar
- 3 Brzozowski, J.A. Derivatives of regular expressions. J. ACM 11:4 (October 1964), 481-494. Google ScholarDigital Library
- 4 Bullen, R.H., Jr., and Millen, J.K. Microtext - the design of a microprogrammed finite state search machine for full-text retrieval. Proc. Fall Joint Computer Conference, 1972, pp. 479-488.Google Scholar
- 5 Fischer, M.J., and Paterson, M.S. String matching and other products. Technical Report 41, Project MAC, M.I.T., 1974. Google ScholarDigital Library
- 6 Gimpel, J.A. A theory of discrete.patterns and their implementation in SNOBOL4. Comm. ACM 16:2 (February 1973), 91-100. Google ScholarDigital Library
- 7 Harrison, M.C. Implementation of the substring test by hashing. Comm. ACM14:12 (December 1971), 777-779. Google ScholarDigital Library
- 8 Johnson, W.L., Porter, J.H., Ackley, S.I., and Ross, D.T. Automatic generation of efficient lexical processors using finite state techniques. Comm. ACMll:I2 (December 1968), 805-813. Google ScholarDigital Library
- 9 Kernighan, B.W., and Cherry, L.L. A system for typesetting mathematics. Comm. ACM18:3 (March 1975), 151-156. Google ScholarDigital Library
- 10 Kleene, S.C. Representation of events in nerve nets. In Automata Studies, C.E. Shannon and J. McCarthy (eds.), Princeton University Press, 1956, pp. 3-40.Google Scholar
- 11 Knuth, D.E. Fundamental Algorithms, second edition, The Art of Computer Programming 1, Addison-Wesley, Reading, Mass., 1973. Google ScholarDigital Library
- 12 Knuth, D.E. Sorting and Searching, The Art of Computer Prograining 3, Addison-Wesley, Reading, Mass., 1973.Google Scholar
- 13 Knuth, D.E., Morris, J.H., Jr., and Pratt, V.R. Fast pattern matching in strings. TR CS-74-440, Stanford University, Stanford, California, 1974.Google Scholar
- 14 Kohavi, Z. Switching and Finite Automata Theory. McGraw- Hill, New York, 1970. Google ScholarDigital Library
- 15 McNaughton, R., and Yamada, H. Regular expressions and state graphs for automata. IRE Trans. Electronic Computers 9:1 (1960), 39-47.Google ScholarCross Ref
- 16 Rabin, M.O., and Scott, D. Finite automata and their decision problems. IBM J. Research and Development 3, (1959), 114-125.Google ScholarDigital Library
- 17 Thompson, K. Regular search expression algorithm. Comm. ACM 11:6 (June 1968), 419-422. Google ScholarDigital Library
Index Terms
- Efficient string matching: an aid to bibliographic search
Recommendations
A Memory-Efficient Bit-Split Parallel String Matching Using Pattern Dividing for Intrusion Detection Systems
For the low-cost hardware-based intrusion detection systems, this paper proposes a memory-efficient parallel string matching scheme. In order to reduce the number of state transitions, the finite state machine tiles in a string matcher adopt bit-level ...
Fast Pattern Matching in Strings
An algorithm is presented which finds all occurrences of one given string within another, in running time proportional to the sum of the lengths of the strings. The constant of proportionality is low enough to make this algorithm of practical use, and the ...
A fast string searching algorithm
An algorithm is presented that searches for the location, “il” of the first occurrence of a character string, “pat,” in another string, “string.” During the search operation, the characters of pat are matched starting with the last character of pat. The ...
Comments