Abstract
For decades we have been using Chomsky's generative system of grammars, particularly context-free grammars (CFGs) and regular expressions (REs), to express the syntax of programming languages and protocols. The power of generative grammars to express ambiguity is crucial to their original purpose of modelling natural languages, but this very power makes it unnecessarily difficult both to express and to parse machine-oriented languages using CFGs. Parsing Expression Grammars (PEGs) provide an alternative, recognition-based formal foundation for describing machine-oriented syntax, which solves the ambiguity problem by not introducing ambiguity in the first place. Where CFGs express nondeterministic choice between alternatives, PEGs instead use prioritized choice. PEGs address frequently felt expressiveness limitations of CFGs and REs, simplifying syntax definitions and making it unnecessary to separate their lexical and hierarchical components. A linear-time parser can be built for any PEG, avoiding both the complexity and fickleness of LR parsers and the inefficiency of generalized CFG parsing. While PEGs provide a rich set of operators for constructing grammars, they are reducible to two minimal recognition schemas developed around 1970, TS/TDPL and gTS/GTDPL, which are here proven equivalent in effective recognition power.
- Stephen Robert Adams. Modular Grammars for Programming Language Prototyping. PhD thesis, University of Southampton, 1991.Google Scholar
- Alfred V. Aho. Indexed grammars---an extension of context-free grammars. Journal of the ACM, 15(4):647--671, October 1968. Google ScholarDigital Library
- Alfred V. Aho and Jeffrey D. Ullman. The Theory of Parsing, Translation and Compiling - Vol. I: Parsing. Prentice Hall, Englewood Cliffs, N.J., 1972. Google ScholarDigital Library
- Alexander Birman. The TMG Recognition Schema. PhD thesis, Princeton University, February 1970. Google ScholarDigital Library
- Alexander Birman and Jeffrey D. Ullman. Parsing algorithms with backtrack. Information and Control, 23(1):1--34, August 1973.Google ScholarCross Ref
- Claus Brabrand, Michael I. Schwartzbach, and Mads Vanggaard. The metafront system: Extensible parsing and transformation. In Third Workshop on Language Descriptions, Tools and Applications, Warsaw, Poland, April 2003.Google ScholarCross Ref
- Bryan Ford. Packrat parsing: a practical linear-time algorithm with backtracking. Master's thesis, Massachusetts Institute of Technology, Sep 2002.Google ScholarDigital Library
- Bryan Ford. Packrat parsing: Simple, powerful, lazy, linear time. In Proceedings of the 2002 International Conference on Functional Programming, Oct 2002. Google ScholarDigital Library
- Dick Grune and Ceriel J.H. Jacobs. Parsing Techniques--A Practical Guide. Ellis Horwood, Chichester, England, 1990. Google ScholarDigital Library
- J. Heering, P. R. H. Hendriks, P. Klint, and J. Rekers. The syntax definition formalism SDF--reference manual--. SIG-PLAN Notices, 24(11):43--75, 1989. Google ScholarDigital Library
- Simon Peyton Jones and John Hughes (editors). Haskell 98 Report, 1998. http://www.haskell.org.Google Scholar
- Aravind K. Joshi and Yves Schabes. Tree-adjoining grammars. Handbook of Formal Languages, 3:69--124, 1997. Google ScholarDigital Library
- C.H.A. Koster. Affix grammars. In J.E.L. Peck, editor, AL-GOL 68 Implementation, pages 95--109, Amsterdam, 1971. North-Holland Publ. Co.Google Scholar
- Lillian Lee. Fast context-free grammar parsing requires fast boolean matrix multiplication. Journal of the ACM, 49(1):1--15, 2002. Google ScholarDigital Library
- Daan Leijen. Parsec, a fast combinator parser. http://www.cs.uu.nl/daan.Google Scholar
- Sun Microsystems. Java compiler compiler (JavaCC). https://javacc.dev.java.net/.Google Scholar
- Sun Microsystems. JavaCC: LOOKAHEAD minitutorial. https://javacc.dev.java.net/doc/lookahead.html.Google Scholar
- Alexander Okhotin. Conjunctive grammars. Journal of Automata, Languages and Combinatorics, 6(4):519--535, 2001. Google ScholarDigital Library
- International Standards Organization. Syntactic metalan-guage -- Extended BNF, 1996. ISO/IEC 14977.Google Scholar
- Terence J. Parr and Russell W. Quong. Adding semantic and syntactic predicates to LL(k)--pred-LL(k).InProceedings of the International Conference on Compiler Construction, Edinburgh, Scotland, April 1994. Google ScholarDigital Library
- Terence J. Parr and Russell W. Quong. ANTLR: A Predicated-LL(k) parser generator. Software Practice and Experience, 25(7):789--810, 1995. Google ScholarDigital Library
- Daniel J. Salomon and Gordon V. Cormack. Scannerless NSLR(1) parsing of programming languages. In Proceedings of the ACM SIGPLAN'89 Conference on Programming Language Design and Implementation (PLDI), pages 170--178, Jul 1989. Google ScholarDigital Library
- Amir Shpilka. Lower bounds for matrix product. In IEEE Symposium on Foundations of Computer Science, pages 358--367, 2001. Google ScholarDigital Library
- Edward Stabler. Derivational minimalism. Logical Aspects of Computational Linguistics, pages 68--95, 1997. Google ScholarCross Ref
- Bjarne Stroustrup. The C++ Programming Language. Addison-Wesley, 3rd edition, June 1997. Google ScholarDigital Library
- Kuo-Chung Tai. Noncanonical SLR(1) grammars. ACM Transactions on Programming Languages and Systems, 1(2):295--320, Oct 1979. Google ScholarDigital Library
- M.G.J. van den Brand, J. Scheerder, J.J. Vinju, and E. Visser. Disambiguation filters for scannerless generalized LR parsers. In Compiler Construction, 2002. Google ScholarDigital Library
- A. van Wijngaarden, B.J. Mailloux, J.E.L. Peck, C.H.A. Koster, M. Sintzoff, C.H. Lindsey, L.G.L.T. Meertens, and R.G. Fisker. Report on the algorithmic language ALGOL 68. Numer. Math., 14:79--218, 1969. Google ScholarDigital Library
- Eelco Visser. A family of syntax definition formalisms. Technical Report P9706, Programming Research Group, University of Amsterdam, 1997.Google Scholar
- Niklaus Wirth. What can we do about the unnecessary diversity of notation for syntactic descriptions. Communications of the ACM, 20(11):822--823, November 1977. Google ScholarDigital Library
Index Terms
Parsing expression grammars: a recognition-based syntactic foundation
Recommendations
Parsing expression grammars: a recognition-based syntactic foundation
POPL '04: Proceedings of the 31st ACM SIGPLAN-SIGACT symposium on Principles of programming languagesFor decades we have been using Chomsky's generative system of grammars, particularly context-free grammars (CFGs) and regular expressions (REs), to express the syntax of programming languages and protocols. The power of generative grammars to express ...
Packrat parsing:: simple, powerful, lazy, linear time, functional pearl
Packrat parsing is a novel technique for implementing parsers in a lazy functional programming language. A packrat parser provides the power and flexibility of top-down parsing with backtracking and unlimited lookahead, but nevertheless guarantees ...
Left recursion in Parsing Expression Grammars
Parsing Expression Grammars (PEGs) are a formalism that can describe all deterministic context-free languages through a set of rules that specify a top-down parser for some language. PEGs are easy to use, and there are efficient implementations of PEG ...
Comments