research-article

Predictive mutation testing

Authors:
Jie Zhang (Peking University, China)
Ziyi Wang (Peking University, China)
Lingming Zhang (University of Texas at Dallas, USA)
Dan Hao (Peking University, China)
Lei Zang (Peking University, China)
Shiyang Cheng (University of Texas at Dallas, USA)
Lu Zhang (Peking University, China)

ISSTA 2016: Proceedings of the 25th International Symposium on Software Testing and Analysis, July 2016, Pages 342–353. https://doi.org/10.1145/2931037.2931038

Published: 18 July 2016

ABSTRACT

Mutation testing is a powerful methodology for evaluating test suite quality. In mutation testing, a large number of mutants are generated and executed against the test suite to determine the ratio of killed mutants. As a result, mutation testing is widely regarded as computationally expensive. To alleviate this efficiency concern, we propose predictive mutation testing (PMT), the first approach to predicting mutation testing results without mutant execution. In particular, PMT constructs a classification model based on a series of features related to mutants and tests, and uses the model to predict whether a mutant is killed or survives without executing it. PMT has been evaluated on 163 real-world projects under two application scenarios (cross-version and cross-project). The experimental results demonstrate that PMT improves the efficiency of mutation testing by up to 151.4X while incurring only a small accuracy loss when predicting mutant execution results, indicating a good tradeoff between the efficiency and effectiveness of mutation testing.
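
The idea of predicting mutant outcomes instead of executing mutants can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two features (how many tests cover the mutated statement, how many assertions those tests contain), the tiny training set, and the use of a hand-written logistic regression as a stand-in classifier. The abstract does not specify PMT's actual features or model, so this is not the authors' implementation.

```python
import math

# Hypothetical training data, invented for illustration only.
# Features per mutant: (tests covering the mutated statement,
#                       assertions in those tests).
# Label: 1 = killed, 0 = survived (as observed on already-executed mutants).
TRAIN = [
    ((0.0, 0.0), 0),
    ((1.0, 0.0), 0),
    ((1.0, 1.0), 0),
    ((3.0, 4.0), 1),
    ((5.0, 6.0), 1),
    ((4.0, 2.0), 1),
]


def sigmoid(z):
    """Logistic function mapping a raw score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, epochs=2000, lr=0.05):
    """Fit a logistic regression with plain stochastic gradient descent.

    This simple model is a stand-in for whatever classifier a real
    predictive mutation testing setup would use.
    """
    n_features = len(samples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for x, y in samples:
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss w.r.t. the raw score
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict_killed(model, x):
    """Predict whether a mutant would be killed, without executing it."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5


def predicted_mutation_score(model, mutants):
    """Ratio of mutants predicted as killed (the predicted mutation score)."""
    killed = sum(1 for x in mutants if predict_killed(model, x))
    return killed / len(mutants)


if __name__ == "__main__":
    model = train(TRAIN)
    new_mutants = [(0.0, 0.0), (1.0, 0.0), (5.0, 6.0), (3.0, 4.0)]
    print(predicted_mutation_score(model, new_mutants))
```

In the cross-version scenario mentioned above, the training samples would come from an earlier version of the same project; in the cross-project scenario, from other projects. The sketch does not distinguish the two, since only the source of the training data differs.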

Index Terms

Software and its engineering > Software creation and management > Software verification and validation > Software defect analysis > Software testing and debugging

Published in
ISSTA 2016: Proceedings of the 25th International Symposium on Software Testing and Analysis
July 2016
452 pages
ISBN: 9781450343909
DOI: 10.1145/2931037
General Chair: Andreas Zeller, Saarland University, Germany
Program Chair: Abhik Roychoudhury, National University of Singapore, Singapore
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
machine learning
mutation testing
software testing
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 58 of 213 submissions, 27%
