skip to main content
10.1145/3338906.3342509acmconferencesArticle/Chapter ViewAbstractPublication PagesfseConference Proceedingsconference-collections
extended-abstract

Rethinking Regex engines to address ReDoS

Published:12 August 2019Publication History

ABSTRACT

Regular expressions (regexes) are a powerful string manipulation tool. Unfortunately, in programming languages like Python, Java, and JavaScript, they are unnecessarily dangerous, implemented with worst-case exponential matching behavior. This high time complexity exposes software services to regular expression denial of service (ReDoS) attacks.

We propose a data-driven redesign of regex engines, to reflect how regexes are used and what they typically look like. We report that about 95% of regexes in popular programming languages can be evaluated in linear time. The regex engine is a fundamental component of a programming language, and any changes risk introducing compatibility problems. We believe a full redesign is therefore impractical, and so we describe how the vast majority of regex matches can be made linear-time with minor, not major, changes to existing algorithms. Our prototype shows that on a kernel of the regex language, we can trade space for time to make regex matches safe

References

  1. {n. d.}. re1: A simple regular expression engine, easy to read and study. https: //code.google.com/archive/p/re1.Google ScholarGoogle Scholar
  2. Alfred V Aho. 1990. Algorithms for finding patterns in strings. Elsevier, Chapter 5, 255–300.Google ScholarGoogle Scholar
  3. Carl Chapman and Kathryn T Stolee. 2016. Exploring regular expression usage and context in Python. International Symposium on Software Testing and Analysis (ISSTA) (2016). Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Russ Cox. 2007. Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...). https://swtch.com/~rsc/regexp/regexp1.htmlGoogle ScholarGoogle Scholar
  5. Scott Crosby. 2003. Denial of service through regular expressions. USENIX Security work in progress report (2003).Google ScholarGoogle Scholar
  6. Scott A Crosby and Dan S Wallach. 2003. Denial of Service via Algorithmic Complexity Attacks. In USENIX Security. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. James C Davis, Christy A Coghlan, Francisco Servant, and Dongyoon Lee. 2018. The Impact of Regular Expression Denial of Service (ReDoS) in Practice: an Empirical Study at the Ecosystem Scale. In The ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE). Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. James C Davis, Michael Michael, Christy A Coghlan, Francisco Servant, and Dongyoon Lee. 2019. Are Regular Expressions a Lingua Franca? An Empirical Study on the Re-use and Portability of Regular Expressions. In The ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE). Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. James C Davis, Eric R Williamson, and Dongyoon Lee. 2018. A Sense of Time for JavaScript and Node.js: First-Class Timeouts as a Cure for Event Handler Poisoning. In USENIX Security Symposium (USENIX Security). Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. A Ojamaa and K Duuna. 2012. Assessing the security of Node.js platform. In 7th International Conference for Internet Technology and Secured Transactions (ICITST).Google ScholarGoogle Scholar
  11. Asiri Rathnayake and Hayo Thielecke. 2014. Static Analysis for Regular Expression Exponential Runtime via Substructural Logics. Technical Report.Google ScholarGoogle Scholar
  12. Alex Roichman and Adar Weidman. 2009. VAC - ReDoS: Regular Expression Denial Of Service. Open Web Application Security Project (OWASP) (2009).Google ScholarGoogle Scholar
  13. Michael Sipser. 2006. Introduction to the Theory of Computation. Vol. 2. Thomson Course Technology Boston. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Henry Spencer. 1994. A regular-expression matcher. In Software solutions in C. 35–71. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Cristian-Alexandru Staicu and Michael Pradel. 2018. Freezing the Web: A Study of ReDoS Vulnerabilities in JavaScript-based Web Servers. In USENIX Security Symposium (USENIX Security). https://www.npmjs.com/package/safe-regexhttp: //mp.binaervarianz.de/ReDoS_TR_Dec2017.pdf Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Bryan Sullivan. 2010. New Tool: SDL Regex Fuzzer. https://blogs.microsoft.com/ microsoftsecure/2010/10/12/new-tool-sdl-regex-fuzzer/Google ScholarGoogle Scholar
  17. Ken Thompson. 1968. Regular Expression Search Algorithm. Communications of the ACM (CACM) (1968). Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Peipei Wang and Kathryn T Stolee. 2018. How well are regular expressions tested in the wild?. In Foundations of Software Engineering (FSE). Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Nicolaas Hendrik Weideman. 2017. Static Analysis of Regular Expressions. Ph.D. Dissertation. Stellenbosch University.Google ScholarGoogle Scholar
  20. Wikipedia contributors. 2018. Regular expression — Wikipedia, The Free Encyclopedia. https://web.archive.org/web/20180920152821/https://en.wikipedia.org/ w/index.php?title=Regular_expression.Google ScholarGoogle Scholar
  21. Valentin Wustholz, Oswaldo Olivo, Marijn J H Heule, and Isil Dillig. 2017. Static Detection of DoS Vulnerabilities in Programs that use Regular Expressions. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS). Abstract 1 Introduction 2 Background and Related Work 3 Empirical Motivation 4 Regex Engine Design 5 Results 6 Conclusions References Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Rethinking Regex engines to address ReDoS

      Recommendations

      Reviews

      Subash Tirupachur Comerica

      Regular expressions (regexes) are widely used across computer science applications and software components. However, not all programming languages are immune to regular expression denial-of-service (ReDoS) attacks. These algorithmic complexity attacks target the core of the regex engines by chewing up the resources with regex evaluations that take super-linear time to operate, thereby denying invaluable resources to legitimate clients. The paper discusses the addition of a caching mechanism to Spencer's regex engine implementation that practically reduces most super-linear regex evaluations to linear evaluations. This drastically reduces the attack surface area for ReDoS and also improves performance for most evaluations. Software designers delivering modules that rely heavily on regex evaluations should find these conclusions interesting. Popular programming languages like Python, JavaScript, and Ruby use regex engines that implement Spencer's algorithm. The worst-case time complexity for Spencer's is super-linear for 19 to 38 percent of regular expressions. Nevertheless, 95 percent of these super-linear regexes can be evaluated linearly. On the one hand, language developers prefer Spencer's for extensibility; software engineers, on the other hand, rely on heavyweight regexes. Hence there is a need to make these regex engines safe from ReDoS. The paper describes the method and prototype results for adding a cache "in a simple backtracking regex engine implementation published by Cox." While Spencer's method blows up with super-linear regexes, the cache-based implementation performs linearly in the worst case. Though a significant number of real regexes depict worst-case super-linear behavior, it is entirely possible to support these at linear time with a modest space cost.

      Access critical reviews of Computing literature here

      Become a reviewer for Computing Reviews.

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        ESEC/FSE 2019: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
        August 2019
        1264 pages
        ISBN:9781450355728
        DOI:10.1145/3338906

        Copyright © 2019 Owner/Author

        Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 12 August 2019

        Check for updates

        Qualifiers

        • extended-abstract

        Acceptance Rates

        Overall Acceptance Rate112of543submissions,21%

        Upcoming Conference

        FSE '24

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader