skip to main content
article

Parsing expression grammars: a recognition-based syntactic foundation

Published:01 January 2004Publication History
Skip Abstract Section

Abstract

For decades we have been using Chomsky's generative system of grammars, particularly context-free grammars (CFGs) and regular expressions (REs), to express the syntax of programming languages and protocols. The power of generative grammars to express ambiguity is crucial to their original purpose of modelling natural languages, but this very power makes it unnecessarily difficult both to express and to parse machine-oriented languages using CFGs. Parsing Expression Grammars (PEGs) provide an alternative, recognition-based formal foundation for describing machine-oriented syntax, which solves the ambiguity problem by not introducing ambiguity in the first place. Where CFGs express nondeterministic choice between alternatives, PEGs instead use prioritized choice. PEGs address frequently felt expressiveness limitations of CFGs and REs, simplifying syntax definitions and making it unnecessary to separate their lexical and hierarchical components. A linear-time parser can be built for any PEG, avoiding both the complexity and fickleness of LR parsers and the inefficiency of generalized CFG parsing. While PEGs provide a rich set of operators for constructing grammars, they are reducible to two minimal recognition schemas developed around 1970, TS/TDPL and gTS/GTDPL, which are here proven equivalent in effective recognition power.

References

  1. Stephen Robert Adams. Modular Grammars for Programming Language Prototyping. PhD thesis, University of Southampton, 1991.Google ScholarGoogle Scholar
  2. Alfred V. Aho. Indexed grammars---an extension of context-free grammars. Journal of the ACM, 15(4):647--671, October 1968. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Alfred V. Aho and Jeffrey D. Ullman. The Theory of Parsing, Translation and Compiling - Vol. I: Parsing. Prentice Hall, Englewood Cliffs, N.J., 1972. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Alexander Birman. The TMG Recognition Schema. PhD thesis, Princeton University, February 1970. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Alexander Birman and Jeffrey D. Ullman. Parsing algorithms with backtrack. Information and Control, 23(1):1--34, August 1973.Google ScholarGoogle ScholarCross RefCross Ref
  6. Claus Brabrand, Michael I. Schwartzbach, and Mads Vanggaard. The metafront system: Extensible parsing and transformation. In Third Workshop on Language Descriptions, Tools and Applications, Warsaw, Poland, April 2003.Google ScholarGoogle ScholarCross RefCross Ref
  7. Bryan Ford. Packrat parsing: a practical linear-time algorithm with backtracking. Master's thesis, Massachusetts Institute of Technology, Sep 2002.Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Bryan Ford. Packrat parsing: Simple, powerful, lazy, linear time. In Proceedings of the 2002 International Conference on Functional Programming, Oct 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Dick Grune and Ceriel J.H. Jacobs. Parsing Techniques--A Practical Guide. Ellis Horwood, Chichester, England, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. J. Heering, P. R. H. Hendriks, P. Klint, and J. Rekers. The syntax definition formalism SDF--reference manual--. SIG-PLAN Notices, 24(11):43--75, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Simon Peyton Jones and John Hughes (editors). Haskell 98 Report, 1998. http://www.haskell.org.Google ScholarGoogle Scholar
  12. Aravind K. Joshi and Yves Schabes. Tree-adjoining grammars. Handbook of Formal Languages, 3:69--124, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. C.H.A. Koster. Affix grammars. In J.E.L. Peck, editor, AL-GOL 68 Implementation, pages 95--109, Amsterdam, 1971. North-Holland Publ. Co.Google ScholarGoogle Scholar
  14. Lillian Lee. Fast context-free grammar parsing requires fast boolean matrix multiplication. Journal of the ACM, 49(1):1--15, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Daan Leijen. Parsec, a fast combinator parser. http://www.cs.uu.nl/daan.Google ScholarGoogle Scholar
  16. Sun Microsystems. Java compiler compiler (JavaCC). https://javacc.dev.java.net/.Google ScholarGoogle Scholar
  17. Sun Microsystems. JavaCC: LOOKAHEAD minitutorial. https://javacc.dev.java.net/doc/lookahead.html.Google ScholarGoogle Scholar
  18. Alexander Okhotin. Conjunctive grammars. Journal of Automata, Languages and Combinatorics, 6(4):519--535, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. International Standards Organization. Syntactic metalan-guage -- Extended BNF, 1996. ISO/IEC 14977.Google ScholarGoogle Scholar
  20. Terence J. Parr and Russell W. Quong. Adding semantic and syntactic predicates to LL(k)--pred-LL(k).InProceedings of the International Conference on Compiler Construction, Edinburgh, Scotland, April 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Terence J. Parr and Russell W. Quong. ANTLR: A Predicated-LL(k) parser generator. Software Practice and Experience, 25(7):789--810, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Daniel J. Salomon and Gordon V. Cormack. Scannerless NSLR(1) parsing of programming languages. In Proceedings of the ACM SIGPLAN'89 Conference on Programming Language Design and Implementation (PLDI), pages 170--178, Jul 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Amir Shpilka. Lower bounds for matrix product. In IEEE Symposium on Foundations of Computer Science, pages 358--367, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Edward Stabler. Derivational minimalism. Logical Aspects of Computational Linguistics, pages 68--95, 1997. Google ScholarGoogle ScholarCross RefCross Ref
  25. Bjarne Stroustrup. The C++ Programming Language. Addison-Wesley, 3rd edition, June 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Kuo-Chung Tai. Noncanonical SLR(1) grammars. ACM Transactions on Programming Languages and Systems, 1(2):295--320, Oct 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. M.G.J. van den Brand, J. Scheerder, J.J. Vinju, and E. Visser. Disambiguation filters for scannerless generalized LR parsers. In Compiler Construction, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. A. van Wijngaarden, B.J. Mailloux, J.E.L. Peck, C.H.A. Koster, M. Sintzoff, C.H. Lindsey, L.G.L.T. Meertens, and R.G. Fisker. Report on the algorithmic language ALGOL 68. Numer. Math., 14:79--218, 1969. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Eelco Visser. A family of syntax definition formalisms. Technical Report P9706, Programming Research Group, University of Amsterdam, 1997.Google ScholarGoogle Scholar
  30. Niklaus Wirth. What can we do about the unnecessary diversity of notation for syntactic descriptions. Communications of the ACM, 20(11):822--823, November 1977. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Parsing expression grammars: a recognition-based syntactic foundation

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image ACM SIGPLAN Notices
          ACM SIGPLAN Notices  Volume 39, Issue 1
          POPL '04
          January 2004
          352 pages
          ISSN:0362-1340
          EISSN:1558-1160
          DOI:10.1145/982962
          Issue’s Table of Contents
          • cover image ACM Conferences
            POPL '04: Proceedings of the 31st ACM SIGPLAN-SIGACT symposium on Principles of programming languages
            January 2004
            364 pages
            ISBN:158113729X
            DOI:10.1145/964001

          Copyright © 2004 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 1 January 2004

          Check for updates

          Qualifiers

          • article

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader