DOI: 10.1145/1007512.1007523

An experimental evaluation of continuous testing during development

Authors: David Saff and Michael D. Ernst
Published: 1 July 2004

ABSTRACT

Continuous testing uses excess cycles on a developer's workstation to continuously run regression tests in the background, providing rapid feedback about test failures as source code is edited. It is intended to reduce the time and energy required to keep code well-tested and to prevent regression errors from persisting uncaught for long periods of time. This paper reports on a controlled human experiment to evaluate whether students using continuous testing are more successful in completing programming assignments. We also summarize users' subjective impressions and discuss why the results may generalize.

The experiment indicates that the tool has a statistically significant effect on success in completing a programming task, but no such effect on time worked. Participants using continuous testing were three times more likely to complete the task before the deadline than those without it. Participants using continuous compilation were twice as likely to complete the task, providing empirical support for a common feature in modern development environments. Most participants found continuous testing useful and believed that it helped them write better code faster; 90% would recommend the tool to others. Participants did not find the tool distracting, and they intuitively developed ways of incorporating the feedback into their workflow.
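To make the idea concrete, here is a bare-bones sketch of a continuous test runner in Java. It polls a source tree for modifications and reruns a JUnit suite whenever something changes, printing failures as soon as they occur. This is an illustration of the concept only, assuming the JUnit 4 JUnitCore API and a command-line interface of our own invention; it is not the authors' implementation.

    import org.junit.runner.JUnitCore;
    import org.junit.runner.Result;
    import org.junit.runner.notification.Failure;
    import java.io.File;

    // Bare-bones continuous testing: poll for edits, rerun the suite.
    // Recompiling the edited sources is out of scope for this sketch;
    // assume an IDE or build daemon keeps the class files current.
    public class ContinuousTester {
        public static void main(String[] args) throws Exception {
            File srcDir = new File(args[0]);          // e.g., "src"
            Class<?> suite = Class.forName(args[1]);  // the project's test suite
            long lastRun = 0;
            while (true) {
                if (newestModification(srcDir) > lastRun) {
                    lastRun = System.currentTimeMillis();
                    Result result = JUnitCore.runClasses(suite);
                    for (Failure f : result.getFailures()) {
                        System.out.println("FAIL: " + f.getTestHeader());
                    }
                    System.out.printf("%d tests, %d failures%n",
                            result.getRunCount(), result.getFailureCount());
                }
                Thread.sleep(1000); // use excess cycles; stay unobtrusive
            }
        }

        // Most recent modification time anywhere under dir.
        static long newestModification(File dir) {
            long newest = dir.lastModified();
            File[] children = dir.listFiles();
            if (children == null) return newest;
            for (File child : children) {
                newest = Math.max(newest, child.isDirectory()
                        ? newestModification(child) : child.lastModified());
            }
            return newest;
        }
    }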



Reviews

Andrew Brooks

In the experiment discussed in this paper, students using a continuous testing tool, which provided rapid feedback on failing tests as code was edited, were found to be three times more likely to correctly complete a programming task than students lacking tool support. Similarly, students using a continuous compilation tool were found to be twice as likely to correctly complete the task. The experimental design, however, was not completely balanced: for each of two programming tasks, half of the 22 students were randomly assigned to continuous testing, and not all students were exposed to more than one condition. Few statistically significant effects were revealed, other than the main results in Figure 6 regarding correct completion. Rich qualitative feedback about the tools, both positive and negative, was obtained through debriefing questionnaires, staff interviews, and unsolicited emails.

As noted by the authors, there were perhaps too few participants to statistically detect many effects. Are the main results sound? At face value, the answer seems to be yes, but a binary outcome regarding correct completion has the potential to mislead when sample sizes are small. A sensitivity analysis, showing what happens to the main results when success is defined as being completely or nearly completely correct, with perhaps one or two failing tests, would have been a useful addition to guide future experimental work.

While this paper is far from the last word on continuous testing, the Saff and Ernst experiment represents a key milestone: continuous testing was made to work. As such, this paper is strongly recommended to the software engineering community.

Online Computing Reviews Service
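The sensitivity analysis the review calls for is easy to sketch. The fragment below, continuing in Java, applies a one-sided Fisher's exact test (a standard choice for small-sample two-by-two comparisons) to treatment-versus-control completion counts under both a strict and a relaxed definition of success; the conclusion is robust if the p-value stays small under both definitions. The counts used here are hypothetical placeholders, not data from the paper.

    import java.math.BigInteger;

    // Sensitivity check: does a "more likely to complete" conclusion
    // survive a relaxed definition of success? The 2x2 counts below
    // are HYPOTHETICAL placeholders, not taken from the paper.
    public class SensitivityCheck {
        public static void main(String[] args) {
            // Arguments: treatment successes/failures, control successes/failures.
            System.out.println(fisherOneSided(6, 5, 2, 9)); // strict: all tests pass
            System.out.println(fisherOneSided(8, 3, 4, 7)); // relaxed: <= 2 failing tests
        }

        // One-sided Fisher's exact test on the table [[a, b], [c, d]]:
        // the probability, with all margins fixed, of observing at
        // least `a` successes in the treatment row.
        static double fisherOneSided(int a, int b, int c, int d) {
            int row1 = a + b, col1 = a + c, n = a + b + c + d;
            double p = 0;
            for (int x = a; x <= Math.min(row1, col1); x++) {
                p += choose(row1, x).multiply(choose(n - row1, col1 - x))
                        .doubleValue() / choose(n, col1).doubleValue();
            }
            return p;
        }

        // Binomial coefficient; the running product is always integral.
        static BigInteger choose(int n, int k) {
            BigInteger r = BigInteger.ONE;
            for (int i = 0; i < k; i++) {
                r = r.multiply(BigInteger.valueOf(n - i))
                     .divide(BigInteger.valueOf(i + 1));
            }
            return r;
        }
    }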

Published in

ISSTA '04: Proceedings of the 2004 ACM SIGSOFT International Symposium on Software Testing and Analysis, July 2004, 294 pages. ISBN: 1581138202. DOI: 10.1145/1007512.

Also in ACM SIGSOFT Software Engineering Notes, Volume 29, Issue 4, July 2004, 284 pages. ISSN: 0163-5948. DOI: 10.1145/1013886.

Copyright © 2004 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 58 of 213 submissions, 27%

