ABSTRACT
Mutation testing is a means to assess the effectiveness of a test suite, and its outcome is considered more meaningful than code coverage metrics. However, despite several optimizations, mutation testing requires significant computational effort and has not been widely adopted in industry. In this paper, we therefore study whether test effectiveness can be approximated with a more lightweight approach. We hypothesize that a test case is more likely to detect faults in methods that are close to it on the call stack than in methods that it reaches only indirectly through many other methods. Based on this hypothesis, we propose the minimal stack distance between test case and method as a new test measure, which expresses how close any test case comes to a given method, and we study its correlation with test effectiveness. In an empirical study with 21 open-source projects comprising 1.8 million LOC in total, we show that stack distance and test effectiveness are correlated, with a correlation strength of up to 0.58. We further show that a classifier using the minimal stack distance along with additional easily computable measures can predict the mutation testing result of a method with 92.9% precision and 93.4% recall. Hence, such a classifier can be considered a lightweight alternative to mutation testing or a less costly step preceding it.
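To make the proposed measure concrete, the sketch below computes a minimal stack distance per method from recorded call stacks. It is an illustrative sketch, not the authors' tool. The input layout (a hypothetical `stacks_per_test` mapping from each test name to the call stacks observed while it ran, with frame 0 being the test method) and the convention that a method called directly by the test has distance 1 are assumptions made for this example.

```python
import math
from typing import Dict, List, Sequence

# A call-stack snapshot ordered from the test method (index 0) to the innermost frame.
# This layout is an assumption for illustration only.
CallStack = Sequence[str]


def minimal_stack_distances(stacks_per_test: Dict[str, List[CallStack]]) -> Dict[str, int]:
    """Return, for every method reached by any test, the smallest number of frames
    separating it from a test method on the call stack.

    Assumed convention: a method invoked directly by the test has distance 1.
    Methods that never appear in any stack are absent from the result (not reached).
    """
    distances: Dict[str, int] = {}
    for test_name, stacks in stacks_per_test.items():
        for stack in stacks:
            # Reject empty stacks and stacks that do not start at the test method.
            if not stack or stack[0] != test_name:
                raise ValueError(f"stack for {test_name!r} must start with the test method")
            for depth, method in enumerate(stack[1:], start=1):
                # Keep the minimum over all tests, stacks, and (recursive) occurrences.
                if depth < distances.get(method, math.inf):
                    distances[method] = depth
    return distances


if __name__ == "__main__":
    stacks = {
        "testParse": [
            ["testParse", "Parser.parse", "Lexer.next"],
            ["testParse", "Parser.parse", "Parser.expr", "Lexer.next"],
        ],
        "testLexer": [["testLexer", "Lexer.next"]],
    }
    print(minimal_stack_distances(stacks))
    # → {'Parser.parse': 1, 'Lexer.next': 1, 'Parser.expr': 2}
```

In this example, `Lexer.next` is three frames below `testParse` in one stack, but `testLexer` calls it directly, so its minimal distance is 1. Taking the minimum over all test cases reflects the idea that a method counts as close as soon as any single test reaches it closely.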
Index Terms
- Is the Stack Distance Between Test Case and Method Correlated With Test Effectiveness?