Abstract
We explore how to make the benefits of modularity available for syntactic specifications and present Rats!, a parser generator for Java that supports easily extensible syntax. Our parser generator builds on recent research on parsing expression grammars (PEGs), which, by being closed under composition, prioritizing choices, supporting unlimited lookahead, and integrating lexing and parsing, offer an attractive alternative to context-free grammars. PEGs are implemented by so-called packrat parsers, which are recursive descent parsers that memoize all intermediate results (hence their name). Memoization ensures linear-time performance in the presence of unlimited lookahead, but also results in an essentially lazy, functional parsing technique. In this paper, we explore how to leverage PEGs and packrat parsers as the foundation for extensible syntax. In particular, we show how make packrat parsing more widely applicable by implementing this lazy, functional technique in a strict, imperative language, while also generating better performing parsers through aggressive optimizations. Next, we develop a module system for organizing, modifying, and composing large-scale syntactic specifications. Finally, we describe a new technique for managing (global) parsing state in functional parsers. Our experimental evaluation demonstrates that the resulting parser generator succeeds at providing extensible syntax. In particular, Rats! enables other grammar writers to realize real-world language extensions in little time and code, and it generates parsers that consistently out-perform parsers created by two GLR parser generators.
- J. Aycock and R. N. Horspool. Practical Earley parsing. The Computer Journal, 45(6):620--630, 2002.Google ScholarCross Ref
- J. Bender. Mini-bibliography on modules for functional programming languages. http://readscheme.org/modules/, Mar. 2004.Google Scholar
- A. Birman. The TMG Recognition Schema. PhD thesis, Princeton University, Feb. 1970. Google ScholarDigital Library
- A. Birman and J. D. Ullman. Parsing algorithms with backtrack. Information and Control, 23(1):1--34, Aug. 1973.Google ScholarCross Ref
- C. Brabrand and M. I. Schwartzbach. Growing languages with metamorphic syntax macros. In Proc. 2002 ACM SIGPLAN Workshop on Partial Evaluation and Semantics-Based Program Manipulation, pp. 31--40, Jan. 2002. Google ScholarDigital Library
- C. Brabrand, M. I. Schwartzbach, and M. Vanggaard. The \tt metafront system: Extensible parsing and transformation. Electronic Notes in Theoretical Computer Science, 82(3):592--611, Dec. 2003. Google ScholarDigital Library
- G. Bracha, M. Odersky, D. Stoutamire, and P. Wadler. Making the future safe for the past: Adding genericity to the Java programming language. In Proc. 1998 ACM Conference on Object-Oriented Programming Systems, Languages, and Applications, pp. 183--200, Oct. 1998. Google ScholarDigital Library
- M. Bravenboer, A. van Dam, K. Olmos, and E. Visser. Program transformation with scoped dynamic rewrite rules. Fundamenta Informaticae, 69(1--2):123--178, 2005. Google ScholarDigital Library
- M. Bravenboer and E. Visser. Concrete syntax for objects. In Proc. 2004 ACM Conference on Object-Oriented Programming Systems, Languages, and Applications, pp. 365--383, Oct. 2004. Google ScholarDigital Library
- M. Brukman and A. C. Myers. PPG: A parser generator for extensible grammars. http://www.cs.cornell.edu/Projects/polyglot/ppg.html.Google Scholar
- L. Cardelli, F. Matthes, and M. Abadi. Extensible syntax with lexical scoping. Tech. Report 121, DEC SRC, Feb. 1994.Google Scholar
- Cryptix Foundation. Cryptix JCE. http://www.cryptix.org/.Google Scholar
- A. Dikjstra and D. S. Swierstra. Lazy functional parser combinators in Java. In Proc. 1st Workshop on Multiparadigm Programming with Object-Oriented Languages, pp. 11--42. John von Neumann Institute for Computing, May 2001.Google Scholar
- J. C. Earley. An efficient context-free parsing algorithm. Communications of the ACM, 13(2):94--102, Feb. 1970. Google ScholarDigital Library
- T. Ekman and G. Hedin. Rewritable reference attributed grammars. In Proc. 18th European Conference on Object-Oriented Programming, vol. 3086 of LNCS, pp. 147--171. Springer, June 2004.Google ScholarCross Ref
- O. Enseling. Build your own languages with JavaCC. JavaWorld, Dec. 2000. http://www.javaworld.com/javaworld/jw-12-2000/jw-1229-cooltools.html.Google Scholar
- M. E. Fiuczynski, R. Grimm, Y. Coady, and D. Walker. patch (1) considered harmful. In Proc. 10th Workshop on Hot Topics in Operating Systems, pp. 91--96, June 2005. Google ScholarDigital Library
- B. Ford. Packrat parsing: A practical linear-time algorithm with backtracking. Master's thesis, MIT, Sept. 2002.Google Scholar
- B. Ford. Packrat parsing: Simple, powerful, lazy, linear time. In Proc. 2002 ACM International Conference on Functional Programming, pp. 36--47, Oct. 2002. Google ScholarDigital Library
- B. Ford. Parsing expression grammars: A recognition-based syntactic foundation. In Proc. 31st ACM Symposium on Principles of Programming Languages, pp. 111--122, Jan. 2004. Google ScholarDigital Library
- Free Software Foundation. Bison. http://www.gnu.org/software/bison/.Google Scholar
- R. Grimm. Systems need languages need systems! In Proc. 2nd ECOOP Workshop on Programming Languages and Operating Systems, July 2005.Google Scholar
- ISO. Information technology---syntactic metalanguage---extended BNF. ISO/IEC Standard 14977, Aug. 1996.Google Scholar
- E. Kohler, M. F. Kaashoek, and D. R. Montgomery. A readable TCP in the Prolac protocol language. In Proc. 1999 ACM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, pp. 3--13, Aug. 1999. Google ScholarDigital Library
- K. Lee, A. LaMarca, and C. Chambers. HydroJ: Object-oriented pattern matching for evolvable distributed systems. In Proc. 2003 ACM Conference on Object-Oriented Programming Systems, Languages, and Applications, pp. 205--223, Oct. 2003. Google ScholarDigital Library
- J. R. Levine. lex & yacc. O'Reilly, Oct. 1992.Google Scholar
- D. Mazières. A toolkit for user-level file systems. In Proc. 2001 USENIX Annual Technical Conference, pp. 261--274, June 2001. Google ScholarDigital Library
- S. McPeak and G. C. Necula. Elkhound: A fast, practical GLR parser generator. In Proc. 13th International Conference on Compiler Construction, vol. 2985 of LNCS, pp. 73--88. Springer, Mar. 2004.Google ScholarCross Ref
- F. Mérillon, L. Réveillère, C. Consel, R. Marlet, and G. Muller. Devil: An IDL for hardware programming. In Proc. 4th USENIX Symposium on Operating Systems Design and Implementation, pp. 17--30, Oct. 2000. Google ScholarDigital Library
- T. Millstein. Practical predicate dispatch. In Proc. 2004 ACM Conference on Object-Oriented Programming Systems, Languages, and Applications, pp. 345--364, Oct. 2004. Google ScholarDigital Library
- A. C. Myers. JFlow: Practical mostly-static information flow control. In Proc. 26th ACM Symposium on Principles of Programming Languages, pp. 228--241, Jan. 1999. Google ScholarDigital Library
- N. Nystrom, M. R. Clarkson, and A. C. Myers. Polyglot: An extensible compiler framework for Java. In Proc. 12th International Conference on Compiler Construction, vol. 2622 of LNCS, pp. 138--152. Springer, Apr. 2003. Google ScholarDigital Library
- J. Palsberg, K. Tao, and W. Wang. Java tree builder. http://compilers.cs.ucla.edu/jtb/.Google Scholar
- T. J. Parr and R. W. Quong. ANTLR: A predicated-LL(k) parser generator. Software---Practice and Experience, 25(7):789--810, July 1995. Google ScholarDigital Library
- A. Rodriguez, C. Killian, S. Bhat, D. Kostić, and A. Vahdat. MACEDON: Methodology for automatically creating, evaluating, and designing overlay networks. In Proc. 1st ACM/USENIX Symposium on Networked Systems Design and Implementation, pp. 267--280, Mar. 2004. Google ScholarDigital Library
- J. Roskind. Parsing C, the last word. http://groups-beta.google.com/group/comp.compilers/msg/c0797b5b668605b4, Jan. 1992.Google Scholar
- D. J. Salomon and G. V. Cormack. Scannerless NSLR(1) parsing of programming languages. In Proc. 1989 ACM Conference on Programming Language Design and Implementation, pp. 170--178, June 1989. Google ScholarDigital Library
- M. Tomita, ed. Generalized LR Parsing. Kluwer, 1991. Google ScholarDigital Library
- M. G. J. van den Brand, J. Scheerder, J. J. Vinju, and E. Visser. Disambiguation filters for scannerless generalized LR parsers. In Proc. 11th International Conference on Compiler Construction, vol. 2304 of LNCS, pp. 143--158. Springer, Apr. 2004. Google ScholarDigital Library
- E. Visser. Syntax Definition for Language Prototyping. PhD thesis, University of Amsterdam, Sept. 1997.Google Scholar
- P. Wadler. Monads for functional programming. In Advanced Functional Programming, vol. 925 of LNCS, pp. 24--52. Springer, 1995. Google ScholarCross Ref
- N. Wirth. What can we do about the unnecessary diversity of notation for syntactic definitions? Communications of the ACM, 20(11):822--823, Nov. 1977. Google ScholarDigital Library
Index Terms
Better extensibility through modular syntax
Recommendations
Parsing expression grammars: a recognition-based syntactic foundation
POPL '04: Proceedings of the 31st ACM SIGPLAN-SIGACT symposium on Principles of programming languagesFor decades we have been using Chomsky's generative system of grammars, particularly context-free grammars (CFGs) and regular expressions (REs), to express the syntax of programming languages and protocols. The power of generative grammars to express ...
Parsing expression grammars made practical
SLE 2015: Proceedings of the 2015 ACM SIGPLAN International Conference on Software Language EngineeringParsing Expression Grammars (PEGs) define languages by specifying a recursive-descent parser that recognises them. The PEG formalism exhibits desirable properties, such as closure under composition, built-in disambiguation, unification of syntactic and ...
Better extensibility through modular syntax
PLDI '06: Proceedings of the 27th ACM SIGPLAN Conference on Programming Language Design and ImplementationWe explore how to make the benefits of modularity available for syntactic specifications and present Rats!, a parser generator for Java that supports easily extensible syntax. Our parser generator builds on recent research on parsing expression grammars ...
Comments