ABSTRACT
We show how to achieve typed and unambiguous declarative pattern matching on strings using regular expressions extended with a simple recording operator.
We give a characterization of ambiguity of regular expressions that leads to a sound and complete static analysis. The analysis is capable of pinpointing all ambiguities in terms of the structure of the regular expression and report shortest ambiguous strings. We also show how pattern matching can be integrated into statically typed programming languages for deconstructing strings and reproducing typed and structured values.
We validate our approach by giving a full implementation of the approach presented in this paper. The resulting tool, reg-exp-rec, adds typed and unambiguous pattern matching to Java in a standalone and non-intrusive manner. We evaluate the approach using several realistic examples.
- Emilie Balland, Paul Brauner, Radu Kopetz, Pierre-Etienne Moreau, and Antoine Reilles. Tom: Piggybacking rewriting on java. In Proceedings of the 18th Conference on Rewriting Techniques and Applications, volume 4533 of Lecture Notes in Computer Science, pages 36--47. Springer-Verlag, 2007. Google ScholarDigital Library
- Véronique Benzaken, Giuseppe Castagna, and Alain Frisch. Cduce: an XML-centric general-purpose language. In ICFP, pages 51--63, 2003. Google ScholarDigital Library
- Ronald Book, Shimon Even, Sheila Greibach, and Gene Ott. Ambiguity in graphs and expressions. IEEE Transactions on Computers, 20(2):149--153, 1971. Google ScholarDigital Library
- Claus Brabrand, Robert Giegerich, and Anders Møller. Analyzing ambiguity of context-free grammars. Science of Computer Programming, 75(3), 2010. To appear. Earlier version in Proc. 12th International Conference on Implementation and Application of Automata, CIAA '07, Springer-Verlag LNCS vol. 4783. Google ScholarDigital Library
- Claus Brabrand and Jakob G. Thomsen. Typed and unambiguous pattern matching on strings using regular expressions. Technical report, IT University of Copenhagen, 2010.Google Scholar
- Niklas Broberg, Andreas Farre, and Josef Svenningsson. Regular expression patterns. In ICFP, pages 67--78, 2004. Google ScholarDigital Library
- C. Campeanu and N. Santean. On pattern expression languages. In Proceedings AutoMathA, 2007.Google Scholar
- Noam Chomsky. Three models for the description of language. IRE Transactions on Information Theory, 2:113--124, 1956.Google ScholarCross Ref
- Don Coppersmith and Shmuel Winograd. Matrix multiplication via arithmetic progressions. Journal of Symbolic Computation, 9(3):251--280, 1990. Google ScholarDigital Library
- James R. Cordy, Charles D. Halpern-Hamu, and Eric Promislow. TXL: a rapid prototyping system for programming language dialects. Comput. Lang., 16(1):97--107, 1991. Google ScholarDigital Library
- Russ Cox. Regular expression matching can be simple and fast (but is slow in java, perl, php, python, ruby, ...). http://swtch.com/ rsc/regexp/regexp1.html, January 2007.Google Scholar
- Jay Earley. An efficient context-free parsing algorithm. Communications of the ACM, 13(2):94--102, 1970. Google ScholarDigital Library
- Burak Emir. Compiling regular patterns to sequential machines. In SAC, pages 1385--1389, 2005. Google ScholarDigital Library
- Martin Odersky et al. An overview of the scala programming language. Technical Report IC/2004/64, EPFL Lausanne, Switzerland, 2004.Google Scholar
- T. Berners-Lee et. al. Uniform resource locators (URL). http://www.ietf.org/rfc/rfc1738.txt, 1994. Google ScholarDigital Library
- T. Berners-Lee et. al. Uniform resource identifier (URI). http://www.ietf.org/rfc/rfc3986.txt, 2005.Google Scholar
- Kathleen Fisher and Robert Gruber. Pads: a domain-specific language for processing ad hoc data. SIGPLAN Not., 40(6):295--304, 2005. Google ScholarDigital Library
- Bryan Ford. Packrat parsing: simple, powerful, lazy, linear time. In In Proc. International Conference on Functional Programming (ICFP'02), pages 36--47, 2002. Google ScholarDigital Library
- Bryan Ford. Parsing expression grammars: A recognition-based syntactic foundation. In Symposium on Principles of Programming Languages (POPL'04), pages 111--122. ACM Press, 2004. Google ScholarDigital Library
- Alain Frisch. Regular tree language recognition with static information. In IFIP TCS, pages 661--674, 2004.Google ScholarCross Ref
- Alain Frisch and Luca Cardelli. Greedy regular expression matching. In ICALP, pages 618--629, 2004.Google ScholarCross Ref
- Alain Frisch, Giuseppe Castagna, and Véronique Benzaken. Semantic subtyping. In LICS, pages 137--146, 2002. Google ScholarDigital Library
- EtienneM. Gagnon and Laurie J. Hendren. Sablecc, an object-oriented compiler framework. Technology of Object-Oriented Languages, International Conference on, 0:140, 1998. Google ScholarDigital Library
- Fritz Henglein and Lasse Nielsen. Declarative coinductive axiomatization of regular expression containment and its computational interpretation (preliminary version). Topps d-report, Department of Computer Science, University of Copenhagen (DIKU), February 2010.Google Scholar
- John E. Hopcroft and Jeffrey D. Ullman. Introduction to Automata Theory, Languages and Computation. Addison-Wesley, April 1979. Google ScholarDigital Library
- Haruo Hosoya. Regular expression pattern matching - a simpler design. Technical report, 2003.Google Scholar
- Haruo Hosoya and Benjamin C. Pierce. Regular expression pattern matching for XML. In POPL, pages 67--80, 2001. Google ScholarDigital Library
- Haruo Hosoya, Jerome Vouillon, and Benjamin C. Pierce. Regular expression types for XML. In Proc. 5th ACM SIGPLAN International Conference on Functional Programming, ICFP '00, September 2000. Google ScholarDigital Library
- J.R.Levine, T. Mason, and D. Brown. Lex and yacc. O'Reilly and Associates, Inc., Sebastopol, CA, 1992.Google Scholar
- Ville Laurikari. Nfas with tagged transitions, their conversion to deterministic automata and application to regular expressions. In SPIRE, pages 181--187, 2000. Google ScholarDigital Library
- Lillian Lee. Fast context-free grammar parsing requires fast boolean matrix multiplication. J. ACM, 49(1):1--15, 2002. Google ScholarDigital Library
- Robin Milner, Mads Tofte, and David Macqueen. The Definition of Standard ML. MIT Press, Cambridge, MA, USA, 1997. Google ScholarDigital Library
- Anders Møller. dk.brics.grammar: Context-free grammars for java. http://www.brics.dk/grammar/, 2006.Google Scholar
- Lasse Nielsen and Fritz Henglein. Parsing with dfas (preliminary version). Topps d-report, Department of Computer Science, University of Copenhagen (DIKU), May 2010.Google Scholar
- Terence Parr. The Definitive ANTLR Reference: Building Domain-Specific Languages. Pragmatic Bookshelf, 2007. Google ScholarDigital Library
- James F. Power and Brian A. Malloy. A metrics suite for grammarbased software: Research articles. Journal of Software Maintenance and Evolution, 16(6):405--426, 2004. Google ScholarDigital Library
- Stijn Vansummeren. Type inference for unique pattern matching. ACM Trans. Program. Lang. Syst., 28(3):389--428, 2006. Google ScholarDigital Library
- Burak Emir. Compiling regular patterns to sequential machines. InGoogle Scholar
- SAC, pages 1385---1389, 2005.Google Scholar
- Martin Odersky et al. An overview of the scala programming language.Google Scholar
- Technical Report IC/2004/64, EPFL Lausanne, Switzerland,Google Scholar
- 2004.Google Scholar
- T. Berners--Lee et. al. Uniform resource locators (URL).Google Scholar
- http://www.ietf.org/rfc/rfc1738.txt, 1994.Google Scholar
- T. Berners--Lee et. al. Uniform resource identifier (URI).Google Scholar
- http://www.ietf.org/rfc/rfc3986.txt, 2005.Google Scholar
- Kathleen Fisher and Robert Gruber. Pads: a domain--specific languageGoogle Scholar
- for processing ad hoc data. SIGPLAN Not., 40(6):295---304, 2005.Google Scholar
- Bryan Ford. Packrat parsing: simple, powerful, lazy, linear time.Google Scholar
- In In Proc. International Conference on Functional ProgrammingGoogle Scholar
- (ICFP'02), pages 36---47, 2002.Google Scholar
- Bryan Ford. Parsing expression grammars: A recognition--based syntacticGoogle Scholar
- foundation. In Symposium on Principles of Programming LanguagesGoogle Scholar
- (POPL'04), pages 111---122. ACM Press, 2004.Google Scholar
- Alain Frisch. Regular tree language recognition with static information.Google Scholar
- In IFIP TCS, pages 661---674, 2004.Google Scholar
- Alain Frisch and Luca Cardelli. Greedy regular expression matching.Google Scholar
- In ICALP, pages 618---629, 2004.Google Scholar
- Alain Frisch, Giuseppe Castagna, and Véronique Benzaken. SemanticGoogle Scholar
- subtyping. In LICS, pages 137---146, 2002.Google Scholar
- EtienneM. Gagnon and Laurie J. Hendren. Sablecc, an object--orientedGoogle Scholar
- compiler framework. Technology of Object--Oriented Languages, InternationalGoogle Scholar
- Conference on, 0:140, 1998.Google Scholar
- Fritz Henglein and Lasse Nielsen. Declarative coinductive axiomatizationGoogle Scholar
- of regular expression containment and its computational interpretationGoogle Scholar
- (preliminary version). Topps d--report, Department of ComputerGoogle Scholar
- Science, University of Copenhagen (DIKU), February 2010.Google Scholar
- John E. Hopcroft and Jeffrey D. Ullman. Introduction to AutomataGoogle Scholar
- Theory, Languages and Computation. Addison--Wesley, April 1979.Google Scholar
- Haruo Hosoya. Regular expression pattern matching -- a simplerGoogle Scholar
- design. Technical report, 2003.Google Scholar
- Haruo Hosoya and Benjamin C. Pierce. Regular expression patternGoogle Scholar
- matching for XML. In POPL, pages 67---80, 2001.Google Scholar
- Haruo Hosoya, Jerome Vouillon, and Benjamin C. Pierce. RegularGoogle Scholar
- expression types for XML. In Proc. 5th ACM SIGPLAN International Google ScholarDigital Library
- Conference on Functional Programming, ICFP '00, September 2000.Google Scholar
- J.R.Levine, T. Mason, and D. Brown. Lex and yacc. O'Reilly andGoogle Scholar
- Associates, Inc., Sebastopol, CA, 1992.Google Scholar
- Ville Laurikari. Nfas with tagged transitions, their conversion toGoogle Scholar
- deterministic automata and application to regular expressions. InGoogle Scholar
- SPIRE, pages 181---187, 2000.Google Scholar
- Lillian Lee. Fast context--free grammar parsing requires fast booleanGoogle Scholar
- matrix multiplication. J. ACM, 49(1):1---15, 2002.Google ScholarDigital Library
- Robin Milner, Mads Tofte, and David Macqueen. The Definition ofGoogle Scholar
- Standard ML. MIT Press, Cambridge, MA, USA, 1997.Google Scholar
- AndersMøller. dk.brics.grammar: Context--free grammars for java.Google Scholar
- http://www.brics.dk/grammar/, 2006.Google Scholar
- Lasse Nielsen and Fritz Henglein. Parsing with dfas (preliminaryGoogle Scholar
- version). Topps d--report, Department of Computer Science, UniversityGoogle Scholar
- of Copenhagen (DIKU), May 2010.Google Scholar
- Terence Parr. The Definitive ANTLR Reference: Building Domain-- Google ScholarDigital Library
- Specific Languages. Pragmatic Bookshelf, 2007.Google Scholar
- James F. Power and Brian A. Malloy. A metrics suite for grammarbasedGoogle Scholar
- software: Research articles. Journal of Software MaintenanceGoogle Scholar
- and Evolution, 16(6):405---426, 2004.Google Scholar
- Stijn Vansummeren. Type inference for unique pattern matching. ACMGoogle Scholar
- Trans. Program. Lang. Syst., 28(3):389---428, 2006. Google ScholarDigital Library
Index Terms
- Typed and unambiguous pattern matching on strings using regular expressions
Recommendations
Typed self-interpretation by pattern matching
ICFP '11: Proceedings of the 16th ACM SIGPLAN international conference on Functional programmingSelf-interpreters can be roughly divided into two sorts: self-recognisers that recover the input program from a canonical representation, and self-enactors that execute the input program. Major progress for statically-typed languages was achieved in ...
Construction of fuzzy automata from fuzzy regular expressions
Li and Pedrycz have proved fundamental results that provide different equivalent ways to represent fuzzy languages with membership values in a lattice-ordered monoid, and generalize the well-known results of the classical theory of formal languages. In ...
Typed self-interpretation by pattern matching
ICFP '11Self-interpreters can be roughly divided into two sorts: self-recognisers that recover the input program from a canonical representation, and self-enactors that execute the input program. Major progress for statically-typed languages was achieved in ...
Comments