ABSTRACT
Automated test generation techniques can efficiently produce test data that systematically cover structural aspects of a program. In the absence of a specification, a common assumption is that these tests relieve a developer of most of the work, as the act of testing is reduced to checking the results of the tests. Although this assumption has persisted for decades, there is no conclusive evidence to date confirming it. Moreover, the limited uptake of the approach in industry suggests the contrary, and calls its practical usefulness into question. To investigate this issue, we performed a controlled experiment comparing a total of 49 subjects split between writing tests manually and writing tests with the aid of an automated unit test generation tool, EvoSuite. We found that, on the one hand, tool support leads to clear improvements in commonly applied quality metrics such as code coverage (up to 300% increase). On the other hand, there was no measurable improvement in the number of bugs actually found by developers. Our results not only cast doubt on how the research community evaluates test generation tools, but also point to improvements and future work necessary before automated test generation tools will be widely adopted by practitioners.
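The gap between high coverage and low bug detection comes down to the oracle problem: without a specification, a generator can only derive assertions from the program's *observed* behavior, so a test exercising a buggy branch simply encodes the bug as the expected result. The following sketch (a hypothetical class and values, not taken from the study) illustrates this:

```java
// Hypothetical illustration of the oracle problem in generated tests.
public class OracleProblemDemo {

    // Buggy implementation: intended to return the larger argument,
    // but mishandles the case where both arguments are negative.
    static int max(int a, int b) {
        if (a > 0 && a > b) return a;
        return b;
    }

    public static void main(String[] args) {
        // A coverage-driven generator records what the code *does*:
        // it would emit something like assertEquals(-5, max(-1, -5)),
        // which passes, because it encodes the buggy behavior.
        System.out.println(max(-1, -5)); // prints -5, though -1 was intended

        // Only a human reading the assertion can judge that the
        // expected value itself is wrong — coverage alone cannot.
        System.out.println(max(3, 2)); // prints 3 (correct case)
    }
}
```

In other words, generated test suites can be excellent regression oracles (they fail when behavior *changes*) while contributing nothing to finding faults that were present when the tests were generated, which is consistent with the experiment's result.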