Boosting Fuzzer Efficiency: An Information Theoretic Perspective

Authors:
Marcel Böhme (MPI-SP, Germany; Monash University, Australia)
Valentin J. M. Manès (CSRC, KAIST, Korea)
Sang Kil Cha (CSRC, KAIST, Korea)

Communications of the ACM, Volume 66, Issue 11, November 2023, pp. 89–97. https://doi.org/10.1145/3611019

Published: 20 October 2023

Abstract

In this paper, we take the fundamental perspective of fuzzing as a learning process. Suppose that, before fuzzing, we know nothing about the behaviors of a program P: What does it do? Executing the first test input, we learn how P behaves for this input. Executing the next input, we either observe the same or discover a new behavior. As such, each execution reveals "some amount" of information about P's behaviors. A classic measure of information is Shannon's entropy. Measuring entropy allows us to quantify how much is learned from each generated test input about the behaviors of the program. Within a probabilistic model of fuzzing, we show how entropy also measures fuzzer efficiency. Specifically, it measures the general rate at which the fuzzer discovers new behaviors. Intuitively, efficient fuzzers maximize information. From this information theoretic perspective, we develop ENTROPIC, an entropy-based power schedule for greybox fuzzing that assigns more energy to seeds that maximize information. We implemented ENTROPIC in the popular greybox fuzzer LIBFUZZER. Our experiments with more than 250 open-source programs (60 million LoC) demonstrate a substantially improved efficiency and confirm our hypothesis that an efficient fuzzer maximizes information. ENTROPIC has been independently evaluated and integrated into the main-line LIBFUZZER as the default power schedule. ENTROPIC now runs on more than 25,000 machines fuzzing hundreds of security-critical software systems simultaneously and continuously.
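
To make the idea of an entropy-based power schedule concrete, here is a minimal Python sketch. It is not the ENTROPIC implementation and does not reproduce the estimators used in the paper; it only illustrates the general principle stated in the abstract. It assumes that, for each seed, we have counted how often each program behavior was observed among the inputs generated from that seed. The names `shannon_entropy`, `assign_energy`, and the seed and behavior labels are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy (in nats) of the empirical distribution given by counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def assign_energy(seed_counts):
    """Map each seed to a selection probability proportional to its entropy.

    seed_counts: dict of seed name -> Counter of observed behaviors.
    Assumption for this sketch: energy is simply the normalized entropy.
    """
    entropies = {
        seed: shannon_entropy(list(counter.values()))
        for seed, counter in seed_counts.items()
    }
    total = sum(entropies.values())
    if not entropies:
        return {}
    if total == 0:
        # No seed is informative yet: fall back to a uniform schedule.
        return {seed: 1.0 / len(entropies) for seed in entropies}
    return {seed: h / total for seed, h in entropies.items()}


# Hypothetical observations: behaviors seen when mutating each seed.
observations = {
    "seed_a": Counter({"b1": 100}),                              # always the same behavior
    "seed_b": Counter({"b1": 25, "b2": 25, "b3": 25, "b4": 25}),  # four equally likely behaviors
    "seed_c": Counter({"b1": 50, "b2": 50}),                     # two equally likely behaviors
}
energy = assign_energy(observations)
```

In this sketch, `seed_a` reveals nothing new and receives no energy, while `seed_b` receives twice the energy of `seed_c`. A practical schedule would likely smooth the estimates so that no seed is starved entirely; that refinement is left out here.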

Index Terms

1. Security and privacy
  1. Software and application security
    1. Software security engineering
2. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis
        Software testing and debugging

Published in
Communications of the ACM, Volume 66, Issue 11, November 2023, 94 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/3629727
Editor: James Larus
Publisher: Association for Computing Machinery, New York, NY, United States
Copyright © 2023 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Qualifiers: research-article