skip to main content
10.1145/1836089.1836120acmotherconferencesArticle/Chapter ViewAbstractPublication PagesppdpConference Proceedingsconference-collections
research-article

Typed and unambiguous pattern matching on strings using regular expressions

Published:26 July 2010Publication History

ABSTRACT

We show how to achieve typed and unambiguous declarative pattern matching on strings using regular expressions extended with a simple recording operator.

We give a characterization of ambiguity of regular expressions that leads to a sound and complete static analysis. The analysis is capable of pinpointing all ambiguities in terms of the structure of the regular expression and report shortest ambiguous strings. We also show how pattern matching can be integrated into statically typed programming languages for deconstructing strings and reproducing typed and structured values.

We validate our approach by giving a full implementation of the approach presented in this paper. The resulting tool, reg-exp-rec, adds typed and unambiguous pattern matching to Java in a standalone and non-intrusive manner. We evaluate the approach using several realistic examples.

References

  1. Emilie Balland, Paul Brauner, Radu Kopetz, Pierre-Etienne Moreau, and Antoine Reilles. Tom: Piggybacking rewriting on java. In Proceedings of the 18th Conference on Rewriting Techniques and Applications, volume 4533 of Lecture Notes in Computer Science, pages 36--47. Springer-Verlag, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Véronique Benzaken, Giuseppe Castagna, and Alain Frisch. Cduce: an XML-centric general-purpose language. In ICFP, pages 51--63, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Ronald Book, Shimon Even, Sheila Greibach, and Gene Ott. Ambiguity in graphs and expressions. IEEE Transactions on Computers, 20(2):149--153, 1971. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Claus Brabrand, Robert Giegerich, and Anders Møller. Analyzing ambiguity of context-free grammars. Science of Computer Programming, 75(3), 2010. To appear. Earlier version in Proc. 12th International Conference on Implementation and Application of Automata, CIAA '07, Springer-Verlag LNCS vol. 4783. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Claus Brabrand and Jakob G. Thomsen. Typed and unambiguous pattern matching on strings using regular expressions. Technical report, IT University of Copenhagen, 2010.Google ScholarGoogle Scholar
  6. Niklas Broberg, Andreas Farre, and Josef Svenningsson. Regular expression patterns. In ICFP, pages 67--78, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. C. Campeanu and N. Santean. On pattern expression languages. In Proceedings AutoMathA, 2007.Google ScholarGoogle Scholar
  8. Noam Chomsky. Three models for the description of language. IRE Transactions on Information Theory, 2:113--124, 1956.Google ScholarGoogle ScholarCross RefCross Ref
  9. Don Coppersmith and Shmuel Winograd. Matrix multiplication via arithmetic progressions. Journal of Symbolic Computation, 9(3):251--280, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. James R. Cordy, Charles D. Halpern-Hamu, and Eric Promislow. TXL: a rapid prototyping system for programming language dialects. Comput. Lang., 16(1):97--107, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Russ Cox. Regular expression matching can be simple and fast (but is slow in java, perl, php, python, ruby, ...). http://swtch.com/ rsc/regexp/regexp1.html, January 2007.Google ScholarGoogle Scholar
  12. Jay Earley. An efficient context-free parsing algorithm. Communications of the ACM, 13(2):94--102, 1970. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Burak Emir. Compiling regular patterns to sequential machines. In SAC, pages 1385--1389, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Martin Odersky et al. An overview of the scala programming language. Technical Report IC/2004/64, EPFL Lausanne, Switzerland, 2004.Google ScholarGoogle Scholar
  15. T. Berners-Lee et. al. Uniform resource locators (URL). http://www.ietf.org/rfc/rfc1738.txt, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. T. Berners-Lee et. al. Uniform resource identifier (URI). http://www.ietf.org/rfc/rfc3986.txt, 2005.Google ScholarGoogle Scholar
  17. Kathleen Fisher and Robert Gruber. Pads: a domain-specific language for processing ad hoc data. SIGPLAN Not., 40(6):295--304, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Bryan Ford. Packrat parsing: simple, powerful, lazy, linear time. In In Proc. International Conference on Functional Programming (ICFP'02), pages 36--47, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Bryan Ford. Parsing expression grammars: A recognition-based syntactic foundation. In Symposium on Principles of Programming Languages (POPL'04), pages 111--122. ACM Press, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Alain Frisch. Regular tree language recognition with static information. In IFIP TCS, pages 661--674, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  21. Alain Frisch and Luca Cardelli. Greedy regular expression matching. In ICALP, pages 618--629, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  22. Alain Frisch, Giuseppe Castagna, and Véronique Benzaken. Semantic subtyping. In LICS, pages 137--146, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. EtienneM. Gagnon and Laurie J. Hendren. Sablecc, an object-oriented compiler framework. Technology of Object-Oriented Languages, International Conference on, 0:140, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Fritz Henglein and Lasse Nielsen. Declarative coinductive axiomatization of regular expression containment and its computational interpretation (preliminary version). Topps d-report, Department of Computer Science, University of Copenhagen (DIKU), February 2010.Google ScholarGoogle Scholar
  25. John E. Hopcroft and Jeffrey D. Ullman. Introduction to Automata Theory, Languages and Computation. Addison-Wesley, April 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Haruo Hosoya. Regular expression pattern matching - a simpler design. Technical report, 2003.Google ScholarGoogle Scholar
  27. Haruo Hosoya and Benjamin C. Pierce. Regular expression pattern matching for XML. In POPL, pages 67--80, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Haruo Hosoya, Jerome Vouillon, and Benjamin C. Pierce. Regular expression types for XML. In Proc. 5th ACM SIGPLAN International Conference on Functional Programming, ICFP '00, September 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. J.R.Levine, T. Mason, and D. Brown. Lex and yacc. O'Reilly and Associates, Inc., Sebastopol, CA, 1992.Google ScholarGoogle Scholar
  30. Ville Laurikari. Nfas with tagged transitions, their conversion to deterministic automata and application to regular expressions. In SPIRE, pages 181--187, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Lillian Lee. Fast context-free grammar parsing requires fast boolean matrix multiplication. J. ACM, 49(1):1--15, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Robin Milner, Mads Tofte, and David Macqueen. The Definition of Standard ML. MIT Press, Cambridge, MA, USA, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Anders Møller. dk.brics.grammar: Context-free grammars for java. http://www.brics.dk/grammar/, 2006.Google ScholarGoogle Scholar
  34. Lasse Nielsen and Fritz Henglein. Parsing with dfas (preliminary version). Topps d-report, Department of Computer Science, University of Copenhagen (DIKU), May 2010.Google ScholarGoogle Scholar
  35. Terence Parr. The Definitive ANTLR Reference: Building Domain-Specific Languages. Pragmatic Bookshelf, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. James F. Power and Brian A. Malloy. A metrics suite for grammarbased software: Research articles. Journal of Software Maintenance and Evolution, 16(6):405--426, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Stijn Vansummeren. Type inference for unique pattern matching. ACM Trans. Program. Lang. Syst., 28(3):389--428, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Burak Emir. Compiling regular patterns to sequential machines. InGoogle ScholarGoogle Scholar
  39. SAC, pages 1385---1389, 2005.Google ScholarGoogle Scholar
  40. Martin Odersky et al. An overview of the scala programming language.Google ScholarGoogle Scholar
  41. Technical Report IC/2004/64, EPFL Lausanne, Switzerland,Google ScholarGoogle Scholar
  42. 2004.Google ScholarGoogle Scholar
  43. T. Berners--Lee et. al. Uniform resource locators (URL).Google ScholarGoogle Scholar
  44. http://www.ietf.org/rfc/rfc1738.txt, 1994.Google ScholarGoogle Scholar
  45. T. Berners--Lee et. al. Uniform resource identifier (URI).Google ScholarGoogle Scholar
  46. http://www.ietf.org/rfc/rfc3986.txt, 2005.Google ScholarGoogle Scholar
  47. Kathleen Fisher and Robert Gruber. Pads: a domain--specific languageGoogle ScholarGoogle Scholar
  48. for processing ad hoc data. SIGPLAN Not., 40(6):295---304, 2005.Google ScholarGoogle Scholar
  49. Bryan Ford. Packrat parsing: simple, powerful, lazy, linear time.Google ScholarGoogle Scholar
  50. In In Proc. International Conference on Functional ProgrammingGoogle ScholarGoogle Scholar
  51. (ICFP'02), pages 36---47, 2002.Google ScholarGoogle Scholar
  52. Bryan Ford. Parsing expression grammars: A recognition--based syntacticGoogle ScholarGoogle Scholar
  53. foundation. In Symposium on Principles of Programming LanguagesGoogle ScholarGoogle Scholar
  54. (POPL'04), pages 111---122. ACM Press, 2004.Google ScholarGoogle Scholar
  55. Alain Frisch. Regular tree language recognition with static information.Google ScholarGoogle Scholar
  56. In IFIP TCS, pages 661---674, 2004.Google ScholarGoogle Scholar
  57. Alain Frisch and Luca Cardelli. Greedy regular expression matching.Google ScholarGoogle Scholar
  58. In ICALP, pages 618---629, 2004.Google ScholarGoogle Scholar
  59. Alain Frisch, Giuseppe Castagna, and Véronique Benzaken. SemanticGoogle ScholarGoogle Scholar
  60. subtyping. In LICS, pages 137---146, 2002.Google ScholarGoogle Scholar
  61. EtienneM. Gagnon and Laurie J. Hendren. Sablecc, an object--orientedGoogle ScholarGoogle Scholar
  62. compiler framework. Technology of Object--Oriented Languages, InternationalGoogle ScholarGoogle Scholar
  63. Conference on, 0:140, 1998.Google ScholarGoogle Scholar
  64. Fritz Henglein and Lasse Nielsen. Declarative coinductive axiomatizationGoogle ScholarGoogle Scholar
  65. of regular expression containment and its computational interpretationGoogle ScholarGoogle Scholar
  66. (preliminary version). Topps d--report, Department of ComputerGoogle ScholarGoogle Scholar
  67. Science, University of Copenhagen (DIKU), February 2010.Google ScholarGoogle Scholar
  68. John E. Hopcroft and Jeffrey D. Ullman. Introduction to AutomataGoogle ScholarGoogle Scholar
  69. Theory, Languages and Computation. Addison--Wesley, April 1979.Google ScholarGoogle Scholar
  70. Haruo Hosoya. Regular expression pattern matching -- a simplerGoogle ScholarGoogle Scholar
  71. design. Technical report, 2003.Google ScholarGoogle Scholar
  72. Haruo Hosoya and Benjamin C. Pierce. Regular expression patternGoogle ScholarGoogle Scholar
  73. matching for XML. In POPL, pages 67---80, 2001.Google ScholarGoogle Scholar
  74. Haruo Hosoya, Jerome Vouillon, and Benjamin C. Pierce. RegularGoogle ScholarGoogle Scholar
  75. expression types for XML. In Proc. 5th ACM SIGPLAN International Google ScholarGoogle ScholarDigital LibraryDigital Library
  76. Conference on Functional Programming, ICFP '00, September 2000.Google ScholarGoogle Scholar
  77. J.R.Levine, T. Mason, and D. Brown. Lex and yacc. O'Reilly andGoogle ScholarGoogle Scholar
  78. Associates, Inc., Sebastopol, CA, 1992.Google ScholarGoogle Scholar
  79. Ville Laurikari. Nfas with tagged transitions, their conversion toGoogle ScholarGoogle Scholar
  80. deterministic automata and application to regular expressions. InGoogle ScholarGoogle Scholar
  81. SPIRE, pages 181---187, 2000.Google ScholarGoogle Scholar
  82. Lillian Lee. Fast context--free grammar parsing requires fast booleanGoogle ScholarGoogle Scholar
  83. matrix multiplication. J. ACM, 49(1):1---15, 2002.Google ScholarGoogle ScholarDigital LibraryDigital Library
  84. Robin Milner, Mads Tofte, and David Macqueen. The Definition ofGoogle ScholarGoogle Scholar
  85. Standard ML. MIT Press, Cambridge, MA, USA, 1997.Google ScholarGoogle Scholar
  86. AndersMøller. dk.brics.grammar: Context--free grammars for java.Google ScholarGoogle Scholar
  87. http://www.brics.dk/grammar/, 2006.Google ScholarGoogle Scholar
  88. Lasse Nielsen and Fritz Henglein. Parsing with dfas (preliminaryGoogle ScholarGoogle Scholar
  89. version). Topps d--report, Department of Computer Science, UniversityGoogle ScholarGoogle Scholar
  90. of Copenhagen (DIKU), May 2010.Google ScholarGoogle Scholar
  91. Terence Parr. The Definitive ANTLR Reference: Building Domain-- Google ScholarGoogle ScholarDigital LibraryDigital Library
  92. Specific Languages. Pragmatic Bookshelf, 2007.Google ScholarGoogle Scholar
  93. James F. Power and Brian A. Malloy. A metrics suite for grammarbasedGoogle ScholarGoogle Scholar
  94. software: Research articles. Journal of Software MaintenanceGoogle ScholarGoogle Scholar
  95. and Evolution, 16(6):405---426, 2004.Google ScholarGoogle Scholar
  96. Stijn Vansummeren. Type inference for unique pattern matching. ACMGoogle ScholarGoogle Scholar
  97. Trans. Program. Lang. Syst., 28(3):389---428, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Typed and unambiguous pattern matching on strings using regular expressions

                Recommendations

                Comments

                Login options

                Check if you have access through your login credentials or your institution to get full access on this article.

                Sign in
                • Published in

                  cover image ACM Other conferences
                  PPDP '10: Proceedings of the 12th international ACM SIGPLAN symposium on Principles and practice of declarative programming
                  July 2010
                  266 pages
                  ISBN:9781450301329
                  DOI:10.1145/1836089

                  Copyright © 2010 ACM

                  Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                  Publisher

                  Association for Computing Machinery

                  New York, NY, United States

                  Publication History

                  • Published: 26 July 2010

                  Permissions

                  Request permissions about this article.

                  Request Permissions

                  Check for updates

                  Qualifiers

                  • research-article

                  Acceptance Rates

                  PPDP '10 Paper Acceptance Rate21of57submissions,37%Overall Acceptance Rate230of486submissions,47%

                PDF Format

                View or Download as a PDF file.

                PDF

                eReader

                View online with eReader.

                eReader