ABSTRACT
Many educators now include software testing activities in programming assignments, so there is a growing demand for appropriate methods of assessing the quality of student-written software tests. While tests can be hand-graded, some educators also use objective performance metrics to assess software tests. The most common measures used at present are code coverage measures—tracking how much of the student’s code (in terms of statements, branches, or some combination) is exercised by the corresponding software tests. Code coverage has limitations, however, and sometimes it overestimates the true quality of the tests. Some researchers have suggested that mutation analysis may provide a better indication of test quality, while some educators have experimented with simply running every student’s test suite against every other student’s program—an “all-pairs” strategy that gives more insight into the quality of the tests. However, it is still unknown which of these measures is most accurate, in terms of most closely predicting the true bug-revealing capability of a given test suite. This paper directly compares all three methods of measuring test quality in terms of how well they predict the observed bug-revealing capabilities of student-written tests when run against a naturally occurring collection of student-produced defects. Experimental results show that all-pairs testing—running each student’s tests against every other student’s solution—is the most effective predictor of the underlying bug-revealing capability of a test suite. Further, no strong correlation was found between bug-revealing capability and either code coverage or mutation analysis scores.
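To make the “all-pairs” idea concrete, the sketch below illustrates one simple way such a score could be computed. It is not taken from the paper, and the class, method, and variable names are illustrative: it assumes the results of running every suite against every other solution have already been collected into a boolean matrix, and it scores each suite by the fraction of other students’ solutions in which it reveals at least one defect.

```java
import java.util.Arrays;

/**
 * Illustrative sketch of an "all-pairs" test-quality score. Each student's
 * test suite is assumed to have been run against every other student's
 * solution beforehand; this class only aggregates those results.
 * The paper's actual scoring formula may differ.
 */
public class AllPairsScore {

    /**
     * @param revealsDefect revealsDefect[i][j] is true when suite i fails
     *                      (i.e., reveals a bug) when run against solution j.
     * @return for each suite i, the fraction of other solutions (j != i)
     *         in which it revealed at least one defect.
     */
    static double[] allPairsScores(boolean[][] revealsDefect) {
        int n = revealsDefect.length;
        double[] scores = new double[n];
        for (int i = 0; i < n; i++) {
            int revealed = 0;
            for (int j = 0; j < n; j++) {
                if (i != j && revealsDefect[i][j]) {
                    revealed++;
                }
            }
            scores[i] = n > 1 ? (double) revealed / (n - 1) : 0.0;
        }
        return scores;
    }

    public static void main(String[] args) {
        // Toy data for three students: suite 0 reveals defects in both other
        // solutions, suite 1 in one of them, suite 2 in none.
        boolean[][] reveals = {
            { false, true,  true  },
            { true,  false, false },
            { false, false, false }
        };
        // Prints [1.0, 0.5, 0.0]
        System.out.println(Arrays.toString(allPairsScores(reveals)));
    }
}
```

Under this scoring, a suite that detects defects in many classmates’ solutions earns a high score regardless of how much of its own program it covers, which is the sense in which all-pairs testing measures bug-revealing capability rather than code exercised.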