ABSTRACT
Automated test generation techniques can efficiently produce test data that systematically cover structural aspects of a program. In the absence of a specification, a common assumption is that these tests relieve a developer of most of the work, as the act of testing is reduced to checking the results of the tests. Although this assumption has persisted for decades, there is no conclusive evidence to date confirming it. Moreover, the limited uptake of the approach in industry suggests the contrary, and calls its practical usefulness into question. To investigate this issue, we performed a controlled experiment comparing a total of 49 subjects split between writing tests manually and writing tests with the aid of an automated unit test generation tool, EvoSuite. We found that, on the one hand, tool support leads to clear improvements in commonly applied quality metrics such as code coverage (up to 300% increase). On the other hand, there was no measurable improvement in the number of bugs actually found by developers. Our results not only cast doubt on how the research community evaluates test generation tools, but also point to improvements and future work necessary before automated test generation tools will be widely adopted by practitioners.
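The gap between high coverage and low bug detection comes down to the oracle problem: without a specification, a generator can only derive assertions from the program's *observed* behavior, so a test exercising a buggy branch simply encodes the bug as the expected result. The following sketch (a hypothetical class and values, not taken from the study) illustrates this:

```java
// Hypothetical illustration of the oracle problem in generated tests.
public class OracleProblemDemo {

    // Buggy implementation: intended to return the larger argument,
    // but mishandles the case where both arguments are negative.
    static int max(int a, int b) {
        if (a > 0 && a > b) return a;
        return b;
    }

    public static void main(String[] args) {
        // A coverage-driven generator records what the code *does*:
        // it would emit something like assertEquals(-5, max(-1, -5)),
        // which passes, because it encodes the buggy behavior.
        System.out.println(max(-1, -5)); // prints -5, though -1 was intended

        // Only a human reading the assertion can judge that the
        // expected value itself is wrong — coverage alone cannot.
        System.out.println(max(3, 2)); // prints 3 (correct case)
    }
}
```

In other words, generated test suites can be excellent regression oracles (they fail when behavior *changes*) while contributing nothing to finding faults that were present when the tests were generated, which is consistent with the experiment's result.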