ABSTRACT
Many educators now include software testing activities in programming assignments, so there is a growing demand for appropriate methods of assessing the quality of student-written software tests. While tests can be hand-graded, some educators also use objective performance metrics to assess software tests. The most common measures used at present are code coverage measures—tracking how much of the student’s code (in terms of statements, branches, or some combination) is exercised by the corresponding software tests. Code coverage has limitations, however, and sometimes it overestimates the true quality of the tests. Some researchers have suggested that mutation analysis may provide a better indication of test quality, while some educators have experimented with simply running every student’s test suite against every other student’s program—an “all-pairs” strategy that gives more insight into the quality of the tests. However, it is still unknown which of these measures is most accurate, in terms of most closely predicting the true bug-revealing capability of a given test suite. This paper directly compares all three methods of measuring test quality in terms of how well they predict the observed bug-revealing capabilities of student-written tests when run against a naturally occurring collection of student-produced defects. Experimental results show that all-pairs testing—running each student’s tests against every other student’s solution—is the most effective predictor of the underlying bug-revealing capability of a test suite. Further, no strong correlation was found between bug-revealing capability and either code coverage or mutation analysis scores.
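To make the “all-pairs” idea concrete, the sketch below illustrates one simple way such a score could be computed. It is not taken from the paper, and the class, method, and variable names are illustrative: it assumes the results of running every suite against every other solution have already been collected into a boolean matrix, and it scores each suite by the fraction of other students’ solutions in which it reveals at least one defect.

```java
import java.util.Arrays;

/**
 * Illustrative sketch of an "all-pairs" test-quality score. Each student's
 * test suite is assumed to have been run against every other student's
 * solution beforehand; this class only aggregates those results.
 * The paper's actual scoring formula may differ.
 */
public class AllPairsScore {

    /**
     * @param revealsDefect revealsDefect[i][j] is true when suite i fails
     *                      (i.e., reveals a bug) when run against solution j.
     * @return for each suite i, the fraction of other solutions (j != i)
     *         in which it revealed at least one defect.
     */
    static double[] allPairsScores(boolean[][] revealsDefect) {
        int n = revealsDefect.length;
        double[] scores = new double[n];
        for (int i = 0; i < n; i++) {
            int revealed = 0;
            for (int j = 0; j < n; j++) {
                if (i != j && revealsDefect[i][j]) {
                    revealed++;
                }
            }
            scores[i] = n > 1 ? (double) revealed / (n - 1) : 0.0;
        }
        return scores;
    }

    public static void main(String[] args) {
        // Toy data for three students: suite 0 reveals defects in both other
        // solutions, suite 1 in one of them, suite 2 in none.
        boolean[][] reveals = {
            { false, true,  true  },
            { true,  false, false },
            { false, false, false }
        };
        // Prints [1.0, 0.5, 0.0]
        System.out.println(Arrays.toString(allPairsScores(reveals)));
    }
}
```

Under this scoring, a suite that detects defects in many classmates’ solutions earns a high score regardless of how much of its own program it covers, which is the sense in which all-pairs testing measures bug-revealing capability rather than code exercised.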