Research article · DOI: 10.1145/2635868.2635920

An empirical analysis of flaky tests

Published: 11 November 2014

ABSTRACT

Regression testing is a crucial part of software development. It checks that software changes do not break existing functionality. An important assumption of regression testing is that test outcomes are deterministic: an unmodified test is expected to either always pass or always fail for the same code under test. Unfortunately, in practice, some tests, often called flaky tests, have non-deterministic outcomes. Such tests undermine regression testing because they make it difficult to rely on test results. We present the first extensive study of flaky tests. We study in detail a total of 201 commits that likely fix flaky tests in 51 open-source projects. We classify the most common root causes of flaky tests, identify approaches that could manifest flaky behavior, and describe common strategies that developers use to fix flaky tests. We believe that our insights and implications can help guide future research on the important topic of (avoiding) flaky tests.
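
To make the notion of a non-deterministic test outcome concrete, the sketch below shows a hypothetical async-wait flaky test in JUnit 4, together with one common style of fix (waiting on the completion condition rather than sleeping for a fixed time). The example is illustrative only and is not taken from the paper or from the projects it studies; the class and method names are invented.

    // Hypothetical illustration, not code from the paper or its studied projects.
    // The first test races a fixed sleep against a background computation, so its
    // outcome depends on thread scheduling: the same code can pass or fail.
    // The second test waits on the completion condition instead, which is one
    // common style of fix for this kind of flakiness.
    import static org.junit.Assert.assertEquals;

    import java.util.concurrent.atomic.AtomicInteger;
    import org.junit.Test;

    public class AsyncWaitFlakinessExample {

        private final AtomicInteger result = new AtomicInteger(0);

        // Simulates an asynchronous operation (e.g., a remote call) that
        // finishes roughly 50 ms after it is started.
        private void startAsyncComputation() {
            new Thread(() -> {
                try {
                    Thread.sleep(50);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                result.set(42);
            }).start();
        }

        @Test
        public void flakyVersion() throws InterruptedException {
            startAsyncComputation();
            Thread.sleep(60);                 // usually long enough, but not always
            assertEquals(42, result.get());   // fails when the worker is delayed
        }

        @Test
        public void deterministicVersion() throws InterruptedException {
            startAsyncComputation();
            long deadline = System.currentTimeMillis() + 5_000;
            while (result.get() == 0 && System.currentTimeMillis() < deadline) {
                Thread.sleep(10);             // poll until the result is available
            }
            assertEquals(42, result.get());
        }
    }

The flaky version usually passes because 60 ms is normally enough for the worker thread to finish, but on a loaded machine the assertion can run first; the deterministic version bounds the total wait yet only asserts once the result is actually available.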

Published in
FSE 2014: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering
November 2014, 856 pages
ISBN: 9781450330565
DOI: 10.1145/2635868

      Copyright © 2014 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance rate: 17 of 128 submissions, 13%
