research-article
Open Access

Boosting Fuzzer Efficiency: An Information Theoretic Perspective

Published: 20 October 2023

Abstract

In this paper, we take the fundamental perspective of fuzzing as a learning process. Suppose that, before fuzzing, we know nothing about the behaviors of a program P: What does it do? Executing the first test input, we learn how P behaves for this input. Executing the next input, we either observe the same behavior or discover a new one. As such, each execution reveals "some amount" of information about P's behaviors. A classic measure of information is Shannon's entropy. Measuring entropy allows us to quantify how much is learned from each generated test input about the behaviors of the program. Within a probabilistic model of fuzzing, we show how entropy also measures fuzzer efficiency. Specifically, it measures the general rate at which the fuzzer discovers new behaviors. Intuitively, efficient fuzzers maximize information. From this information theoretic perspective, we develop ENTROPIC, an entropy-based power schedule for greybox fuzzing that assigns more energy to seeds that maximize information. We implemented ENTROPIC in the popular greybox fuzzer LIBFUZZER. Our experiments with more than 250 open-source programs (60 million LoC) demonstrate substantially improved efficiency and confirm our hypothesis that an efficient fuzzer maximizes information. ENTROPIC has been independently evaluated and integrated into the main-line LIBFUZZER as the default power schedule. ENTROPIC now runs on more than 25,000 machines fuzzing hundreds of security-critical software systems simultaneously and continuously.
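
To make the idea of an entropy-based power schedule concrete, the following is a minimal sketch, not the ENTROPIC implementation in LIBFUZZER. It assumes that each seed comes with a list of behaviors observed when fuzzing it, computes Shannon's entropy of the empirical distribution of those behaviors, and assigns each seed an energy proportional to its entropy. The function names `shannon_entropy` and `assign_energy`, the seed names, and the behavior labels are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy (in nats) of the empirical distribution given by counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def assign_energy(seed_observations):
    """Map each seed to an energy in [0, 1], proportional to its entropy.

    seed_observations maps a (hypothetical) seed name to the list of
    behaviors observed while fuzzing that seed. This is a simplification
    made for illustration; it is not how LIBFUZZER tracks behaviors.
    """
    entropies = {
        seed: shannon_entropy(list(Counter(observed).values()))
        for seed, observed in seed_observations.items()
    }
    total = sum(entropies.values())
    if total == 0:
        # No seed has revealed more than one behavior: fall back to uniform.
        return {seed: 1.0 / len(entropies) for seed in entropies}
    return {seed: h / total for seed, h in entropies.items()}


if __name__ == "__main__":
    observations = {
        "seed_a": ["b1", "b1", "b1", "b1"],  # always the same behavior
        "seed_b": ["b1", "b2", "b3", "b4"],  # a new behavior every time
    }
    for seed, energy in assign_energy(observations).items():
        print(seed, round(energy, 3))
```

In this sketch, a seed whose executions always show the same behavior has zero entropy and receives no energy, while a seed whose executions keep revealing different behaviors receives the most. A practical schedule would also need to handle seeds that have not been fuzzed yet, which this sketch leaves out.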


Published in

Communications of the ACM, Volume 66, Issue 11
November 2023
94 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/3629727
Editor: James Larus

Copyright © 2023 Owner/Author

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery
New York, NY, United States