DOI: 10.1145/1007512.1007523

An experimental evaluation of continuous testing during development

Authors: David Saff and Michael D. Ernst
Published: 1 July 2004

ABSTRACT

Continuous testing uses excess cycles on a developer's workstation to continuously run regression tests in the background, providing rapid feedback about test failures as source code is edited. It is intended to reduce the time and energy required to keep code well-tested and to prevent regression errors from persisting uncaught for long periods of time. This paper reports on a controlled human experiment to evaluate whether students using continuous testing are more successful in completing programming assignments. We also summarize users' subjective impressions and discuss why the results may generalize.

The experiment indicates that the tool has a statistically significant effect on success in completing a programming task, but no such effect on time worked. Participants using continuous testing were three times more likely to complete the task before the deadline than those without it. Participants using continuous compilation were twice as likely to complete the task, providing empirical support for a common feature in modern development environments. Most participants found continuous testing useful and believed that it helped them write better code faster; 90% would recommend the tool to others. Participants did not find the tool distracting, and they intuitively developed ways of incorporating the feedback into their workflow.
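To make the idea concrete, here is a bare-bones sketch of a continuous test runner in Java. It polls a source tree for modifications and reruns a JUnit suite whenever something changes, printing failures as soon as they occur. This is an illustration of the concept only, assuming the JUnit 4 JUnitCore API and a command-line interface of our own invention; it is not the authors' implementation.

    import org.junit.runner.JUnitCore;
    import org.junit.runner.Result;
    import org.junit.runner.notification.Failure;
    import java.io.File;

    // Bare-bones continuous testing: poll for edits, rerun the suite.
    // Recompiling the edited sources is out of scope for this sketch;
    // assume an IDE or build daemon keeps the class files current.
    public class ContinuousTester {
        public static void main(String[] args) throws Exception {
            File srcDir = new File(args[0]);          // e.g., "src"
            Class<?> suite = Class.forName(args[1]);  // the project's test suite
            long lastRun = 0;
            while (true) {
                if (newestModification(srcDir) > lastRun) {
                    lastRun = System.currentTimeMillis();
                    Result result = JUnitCore.runClasses(suite);
                    for (Failure f : result.getFailures()) {
                        System.out.println("FAIL: " + f.getTestHeader());
                    }
                    System.out.printf("%d tests, %d failures%n",
                            result.getRunCount(), result.getFailureCount());
                }
                Thread.sleep(1000); // use excess cycles; stay unobtrusive
            }
        }

        // Most recent modification time anywhere under dir.
        static long newestModification(File dir) {
            long newest = dir.lastModified();
            File[] children = dir.listFiles();
            if (children == null) return newest;
            for (File child : children) {
                newest = Math.max(newest, child.isDirectory()
                        ? newestModification(child) : child.lastModified());
            }
            return newest;
        }
    }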



Reviews

Andrew Brooks

In the experiment discussed in this paper, students using a continuous testing tool, which provided rapid feedback on failing tests as code was edited, were found to be three times more likely to correctly complete a programming task than students lacking tool support. Similarly, students using a continuous compilation tool were found to be twice as likely to correctly complete the task. The experimental design, however, was not completely balanced: for each of two programming tasks, half of the 22 students were randomly assigned to continuous testing, and not all students were exposed to more than one condition. Few statistically significant effects were revealed, other than the main results in Figure 6 regarding correct completion. Rich qualitative feedback about the tools, both positive and negative, was obtained through debriefing questionnaires, staff interviews, and unsolicited emails.

As noted by the authors, there were perhaps too few participants to statistically detect many effects. Are the main results sound? At face value, the answer seems to be yes, but a binary outcome regarding correct completion has the potential to mislead when sample sizes are small. A sensitivity analysis, showing what happens to the main results when success is defined as being completely or nearly completely correct, with perhaps one or two failing tests, would have been a useful addition to guide future experimental work.

While this paper is far from the last word on continuous testing, the Saff and Ernst experiment represents a key milestone: continuous testing was made to work. As such, this paper is strongly recommended to the software engineering community.

Online Computing Reviews Service
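The sensitivity analysis the review calls for is easy to sketch. The fragment below, continuing in Java, applies a one-sided Fisher's exact test (a standard choice for small-sample two-by-two comparisons) to treatment-versus-control completion counts under both a strict and a relaxed definition of success; the conclusion is robust if the p-value stays small under both definitions. The counts used here are hypothetical placeholders, not data from the paper.

    import java.math.BigInteger;

    // Sensitivity check: does a "more likely to complete" conclusion
    // survive a relaxed definition of success? The 2x2 counts below
    // are HYPOTHETICAL placeholders, not taken from the paper.
    public class SensitivityCheck {
        public static void main(String[] args) {
            // Arguments: treatment successes/failures, control successes/failures.
            System.out.println(fisherOneSided(6, 5, 2, 9)); // strict: all tests pass
            System.out.println(fisherOneSided(8, 3, 4, 7)); // relaxed: <= 2 failing tests
        }

        // One-sided Fisher's exact test on the table [[a, b], [c, d]]:
        // the probability, with all margins fixed, of observing at
        // least `a` successes in the treatment row.
        static double fisherOneSided(int a, int b, int c, int d) {
            int row1 = a + b, col1 = a + c, n = a + b + c + d;
            double p = 0;
            for (int x = a; x <= Math.min(row1, col1); x++) {
                p += choose(row1, x).multiply(choose(n - row1, col1 - x))
                        .doubleValue() / choose(n, col1).doubleValue();
            }
            return p;
        }

        // Binomial coefficient; the running product is always integral.
        static BigInteger choose(int n, int k) {
            BigInteger r = BigInteger.ONE;
            for (int i = 0; i < k; i++) {
                r = r.multiply(BigInteger.valueOf(n - i))
                     .divide(BigInteger.valueOf(i + 1));
            }
            return r;
        }
    }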

Published in

ISSTA '04: Proceedings of the 2004 ACM SIGSOFT International Symposium on Software Testing and Analysis, July 2004, 294 pages. ISBN: 1581138202. DOI: 10.1145/1007512.

Also in ACM SIGSOFT Software Engineering Notes, Volume 29, Issue 4, July 2004, 284 pages. ISSN: 0163-5948. DOI: 10.1145/1013886.

Copyright © 2004 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 58 of 213 submissions, 27%

