skip to main content
article

Better extensibility through modular syntax

Published:11 June 2006Publication History
Skip Abstract Section

Abstract

We explore how to make the benefits of modularity available for syntactic specifications and present Rats!, a parser generator for Java that supports easily extensible syntax. Our parser generator builds on recent research on parsing expression grammars (PEGs), which, by being closed under composition, prioritizing choices, supporting unlimited lookahead, and integrating lexing and parsing, offer an attractive alternative to context-free grammars. PEGs are implemented by so-called packrat parsers, which are recursive descent parsers that memoize all intermediate results (hence their name). Memoization ensures linear-time performance in the presence of unlimited lookahead, but also results in an essentially lazy, functional parsing technique. In this paper, we explore how to leverage PEGs and packrat parsers as the foundation for extensible syntax. In particular, we show how make packrat parsing more widely applicable by implementing this lazy, functional technique in a strict, imperative language, while also generating better performing parsers through aggressive optimizations. Next, we develop a module system for organizing, modifying, and composing large-scale syntactic specifications. Finally, we describe a new technique for managing (global) parsing state in functional parsers. Our experimental evaluation demonstrates that the resulting parser generator succeeds at providing extensible syntax. In particular, Rats! enables other grammar writers to realize real-world language extensions in little time and code, and it generates parsers that consistently out-perform parsers created by two GLR parser generators.

References

  1. J. Aycock and R. N. Horspool. Practical Earley parsing. The Computer Journal, 45(6):620--630, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  2. J. Bender. Mini-bibliography on modules for functional programming languages. http://readscheme.org/modules/, Mar. 2004.Google ScholarGoogle Scholar
  3. A. Birman. The TMG Recognition Schema. PhD thesis, Princeton University, Feb. 1970. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. A. Birman and J. D. Ullman. Parsing algorithms with backtrack. Information and Control, 23(1):1--34, Aug. 1973.Google ScholarGoogle ScholarCross RefCross Ref
  5. C. Brabrand and M. I. Schwartzbach. Growing languages with metamorphic syntax macros. In Proc. 2002 ACM SIGPLAN Workshop on Partial Evaluation and Semantics-Based Program Manipulation, pp. 31--40, Jan. 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. C. Brabrand, M. I. Schwartzbach, and M. Vanggaard. The \tt metafront system: Extensible parsing and transformation. Electronic Notes in Theoretical Computer Science, 82(3):592--611, Dec. 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. G. Bracha, M. Odersky, D. Stoutamire, and P. Wadler. Making the future safe for the past: Adding genericity to the Java programming language. In Proc. 1998 ACM Conference on Object-Oriented Programming Systems, Languages, and Applications, pp. 183--200, Oct. 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. M. Bravenboer, A. van Dam, K. Olmos, and E. Visser. Program transformation with scoped dynamic rewrite rules. Fundamenta Informaticae, 69(1--2):123--178, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. M. Bravenboer and E. Visser. Concrete syntax for objects. In Proc. 2004 ACM Conference on Object-Oriented Programming Systems, Languages, and Applications, pp. 365--383, Oct. 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. M. Brukman and A. C. Myers. PPG: A parser generator for extensible grammars. http://www.cs.cornell.edu/Projects/polyglot/ppg.html.Google ScholarGoogle Scholar
  11. L. Cardelli, F. Matthes, and M. Abadi. Extensible syntax with lexical scoping. Tech. Report 121, DEC SRC, Feb. 1994.Google ScholarGoogle Scholar
  12. Cryptix Foundation. Cryptix JCE. http://www.cryptix.org/.Google ScholarGoogle Scholar
  13. A. Dikjstra and D. S. Swierstra. Lazy functional parser combinators in Java. In Proc. 1st Workshop on Multiparadigm Programming with Object-Oriented Languages, pp. 11--42. John von Neumann Institute for Computing, May 2001.Google ScholarGoogle Scholar
  14. J. C. Earley. An efficient context-free parsing algorithm. Communications of the ACM, 13(2):94--102, Feb. 1970. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. T. Ekman and G. Hedin. Rewritable reference attributed grammars. In Proc. 18th European Conference on Object-Oriented Programming, vol. 3086 of LNCS, pp. 147--171. Springer, June 2004.Google ScholarGoogle ScholarCross RefCross Ref
  16. O. Enseling. Build your own languages with JavaCC. JavaWorld, Dec. 2000. http://www.javaworld.com/javaworld/jw-12-2000/jw-1229-cooltools.html.Google ScholarGoogle Scholar
  17. M. E. Fiuczynski, R. Grimm, Y. Coady, and D. Walker. patch (1) considered harmful. In Proc. 10th Workshop on Hot Topics in Operating Systems, pp. 91--96, June 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. B. Ford. Packrat parsing: A practical linear-time algorithm with backtracking. Master's thesis, MIT, Sept. 2002.Google ScholarGoogle Scholar
  19. B. Ford. Packrat parsing: Simple, powerful, lazy, linear time. In Proc. 2002 ACM International Conference on Functional Programming, pp. 36--47, Oct. 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. B. Ford. Parsing expression grammars: A recognition-based syntactic foundation. In Proc. 31st ACM Symposium on Principles of Programming Languages, pp. 111--122, Jan. 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Free Software Foundation. Bison. http://www.gnu.org/software/bison/.Google ScholarGoogle Scholar
  22. R. Grimm. Systems need languages need systems! In Proc. 2nd ECOOP Workshop on Programming Languages and Operating Systems, July 2005.Google ScholarGoogle Scholar
  23. ISO. Information technology---syntactic metalanguage---extended BNF. ISO/IEC Standard 14977, Aug. 1996.Google ScholarGoogle Scholar
  24. E. Kohler, M. F. Kaashoek, and D. R. Montgomery. A readable TCP in the Prolac protocol language. In Proc. 1999 ACM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, pp. 3--13, Aug. 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. K. Lee, A. LaMarca, and C. Chambers. HydroJ: Object-oriented pattern matching for evolvable distributed systems. In Proc. 2003 ACM Conference on Object-Oriented Programming Systems, Languages, and Applications, pp. 205--223, Oct. 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. J. R. Levine. lex & yacc. O'Reilly, Oct. 1992.Google ScholarGoogle Scholar
  27. D. Mazières. A toolkit for user-level file systems. In Proc. 2001 USENIX Annual Technical Conference, pp. 261--274, June 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. S. McPeak and G. C. Necula. Elkhound: A fast, practical GLR parser generator. In Proc. 13th International Conference on Compiler Construction, vol. 2985 of LNCS, pp. 73--88. Springer, Mar. 2004.Google ScholarGoogle ScholarCross RefCross Ref
  29. F. Mérillon, L. Réveillère, C. Consel, R. Marlet, and G. Muller. Devil: An IDL for hardware programming. In Proc. 4th USENIX Symposium on Operating Systems Design and Implementation, pp. 17--30, Oct. 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. T. Millstein. Practical predicate dispatch. In Proc. 2004 ACM Conference on Object-Oriented Programming Systems, Languages, and Applications, pp. 345--364, Oct. 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. A. C. Myers. JFlow: Practical mostly-static information flow control. In Proc. 26th ACM Symposium on Principles of Programming Languages, pp. 228--241, Jan. 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. N. Nystrom, M. R. Clarkson, and A. C. Myers. Polyglot: An extensible compiler framework for Java. In Proc. 12th International Conference on Compiler Construction, vol. 2622 of LNCS, pp. 138--152. Springer, Apr. 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. J. Palsberg, K. Tao, and W. Wang. Java tree builder. http://compilers.cs.ucla.edu/jtb/.Google ScholarGoogle Scholar
  34. T. J. Parr and R. W. Quong. ANTLR: A predicated-LL(k) parser generator. Software---Practice and Experience, 25(7):789--810, July 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. A. Rodriguez, C. Killian, S. Bhat, D. Kostić, and A. Vahdat. MACEDON: Methodology for automatically creating, evaluating, and designing overlay networks. In Proc. 1st ACM/USENIX Symposium on Networked Systems Design and Implementation, pp. 267--280, Mar. 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. J. Roskind. Parsing C, the last word. http://groups-beta.google.com/group/comp.compilers/msg/c0797b5b668605b4, Jan. 1992.Google ScholarGoogle Scholar
  37. D. J. Salomon and G. V. Cormack. Scannerless NSLR(1) parsing of programming languages. In Proc. 1989 ACM Conference on Programming Language Design and Implementation, pp. 170--178, June 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. M. Tomita, ed. Generalized LR Parsing. Kluwer, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. M. G. J. van den Brand, J. Scheerder, J. J. Vinju, and E. Visser. Disambiguation filters for scannerless generalized LR parsers. In Proc. 11th International Conference on Compiler Construction, vol. 2304 of LNCS, pp. 143--158. Springer, Apr. 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. E. Visser. Syntax Definition for Language Prototyping. PhD thesis, University of Amsterdam, Sept. 1997.Google ScholarGoogle Scholar
  41. P. Wadler. Monads for functional programming. In Advanced Functional Programming, vol. 925 of LNCS, pp. 24--52. Springer, 1995. Google ScholarGoogle ScholarCross RefCross Ref
  42. N. Wirth. What can we do about the unnecessary diversity of notation for syntactic definitions? Communications of the ACM, 20(11):822--823, Nov. 1977. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Better extensibility through modular syntax

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM SIGPLAN Notices
        ACM SIGPLAN Notices  Volume 41, Issue 6
        Proceedings of the 2006 PLDI Conference
        June 2006
        426 pages
        ISSN:0362-1340
        EISSN:1558-1160
        DOI:10.1145/1133255
        Issue’s Table of Contents
        • cover image ACM Conferences
          PLDI '06: Proceedings of the 27th ACM SIGPLAN Conference on Programming Language Design and Implementation
          June 2006
          438 pages
          ISBN:1595933204
          DOI:10.1145/1133981

        Copyright © 2006 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 11 June 2006

        Check for updates

        Qualifiers

        • article

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader