research-article

Grammar-based whitebox fuzzing

Authors:

Patrice Godefroid,

Michael Y. Levin

ACM SIGPLAN Notices, Volume 43, Issue 6

Pages 206 - 215

https://doi.org/10.1145/1379022.1375607

Published: 07 June 2008

Abstract

Whitebox fuzzing is a form of automatic dynamic test generation, based on symbolic execution and constraint solving, designed for security testing of large applications. Unfortunately, the current effectiveness of whitebox fuzzing is limited when testing applications with highly structured inputs, such as compilers and interpreters. These applications process their inputs in stages, such as lexing, parsing, and evaluation. Due to the enormous number of control paths in early processing stages, whitebox fuzzing rarely reaches parts of the application beyond those first stages.

In this paper, we study how to enhance whitebox fuzzing of complex structured-input applications with a grammar-based specification of their valid inputs. We present a novel dynamic test generation algorithm where symbolic execution directly generates grammar-based constraints whose satisfiability is checked using a custom grammar-based constraint solver. We have implemented this algorithm and evaluated it on a large security-critical application, the JavaScript interpreter of Internet Explorer 7 (IE7). Results of our experiments show that grammar-based whitebox fuzzing explores deeper program paths and avoids dead-ends due to non-parsable inputs. Compared to regular whitebox fuzzing, grammar-based whitebox fuzzing increased coverage of the code generation module of the IE7 JavaScript interpreter from 53% to 81% while using three times fewer tests.
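The idea of checking token-level constraints against a grammar can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy grammar, the `(index, token, must_equal)` constraint encoding, and the names `derive`, `satisfies`, and `solve` are hypothetical and do not come from the paper. The abstract does not describe the internals of the custom solver, so the sketch substitutes a plain bounded enumeration of the grammar's sentences for it.

```python
# Hypothetical toy grammar: each nonterminal maps to its alternatives.
# Any symbol that is not a key of GRAMMAR is treated as a terminal token.
GRAMMAR = {
    "Stmt": [["var", "id", "=", "Expr", ";"], ["return", "Expr", ";"]],
    "Expr": [["id"], ["num"], ["Expr", "+", "Expr"]],
}


def derive(form, max_len):
    """Yield all token sequences of length <= max_len derivable from form.

    Every symbol in this grammar produces at least one token, so a
    sentential form longer than max_len can be pruned safely.
    """
    if len(form) > max_len:
        return
    for i, sym in enumerate(form):
        if sym in GRAMMAR:
            # Expand the leftmost nonterminal with each alternative.
            for alt in GRAMMAR[sym]:
                yield from derive(form[:i] + tuple(alt) + form[i + 1:], max_len)
            return
    yield form  # no nonterminals left: a complete token sequence


def satisfies(tokens, constraint):
    """Check (index, token, must_equal) conditions on a token sequence."""
    for index, token, must_equal in constraint:
        if index >= len(tokens):
            return False
        if (tokens[index] == token) != must_equal:
            return False
    return True


def solve(constraint, start="Stmt", max_len=7):
    """Return a grammatical token sequence meeting the constraint, or None."""
    for tokens in derive((start,), max_len):
        if satisfies(tokens, constraint):
            return list(tokens)
    return None


# Suppose a previous run on "var id = id ;" recorded one condition per
# token; negating the last one asks for a different fourth token.
negated = [(0, "var", True), (1, "id", True), (2, "=", True), (3, "id", False)]
print(solve(negated))              # a parsable input that takes the other branch
print(solve([(0, "id", True)]))    # None: no statement starts with "id"
```

Because candidate inputs are drawn only from the grammar, a constraint that no sentence can satisfy is rejected up front rather than producing a test that would fail in the parser. A realistic solver would need something far more efficient than enumeration; the length bound here only keeps the toy search finite.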





Published In

ACM SIGPLAN Notices Volume 43, Issue 6

PLDI '08

June 2008

382 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/1379022

PLDI '08: Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2008
396 pages
ISBN:9781595938602
DOI:10.1145/1375581
General Chair: Rajiv Gupta, University of California, Riverside, USA
Program Chair: Saman Amarasinghe, Massachusetts Institute of Technology, USA

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

