ABSTRACT
Mutation testing is a powerful methodology for evaluating test suite quality. In mutation testing, a large number of mutants are generated and executed against the test suite to determine the ratio of killed mutants. Because every mutant must be executed, mutation testing is widely regarded as a computationally expensive technique. To alleviate this efficiency concern, we propose predictive mutation testing (PMT), the first approach to predicting mutation testing results without mutant execution. In particular, PMT constructs a classification model based on a series of features related to mutants and tests, and uses the model to predict whether a mutant would be killed or would survive, without executing it. We evaluated PMT on 163 real-world projects under two application scenarios (cross-version and cross-project). The experimental results demonstrate that PMT improves the efficiency of mutation testing by up to 151.4X while incurring only a small accuracy loss when predicting mutant execution results, indicating a good tradeoff between the efficiency and effectiveness of mutation testing.
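The idea of predicting a mutant's outcome from features, rather than executing it, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three features (`covering_tests`, `assertions`, `nesting_depth`), the toy training data, and the nearest-centroid classifier, which stands in for whatever classification model is actually used. In a cross-version setting, the labelled examples would come from an earlier version whose mutants were really executed.

```python
from dataclasses import dataclass
from math import dist
from statistics import mean


@dataclass(frozen=True)
class Mutant:
    # Hypothetical features, chosen only for illustration.
    covering_tests: int  # tests that execute the mutated statement
    assertions: int      # assertions in those covering tests
    nesting_depth: int   # nesting depth of the mutated statement

    def features(self):
        return (self.covering_tests, self.assertions, self.nesting_depth)


def fit_centroids(training):
    """Average the feature vectors of each class ("killed" / "survived")."""
    by_label = {}
    for mutant, label in training:
        by_label.setdefault(label, []).append(mutant.features())
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(centroids, mutant):
    """Return the label of the closest centroid; the mutant is never run."""
    return min(centroids, key=lambda label: dist(centroids[label], mutant.features()))


def predicted_mutation_score(centroids, mutants):
    """Ratio of mutants predicted as killed to all mutants."""
    killed = sum(predict(centroids, m) == "killed" for m in mutants)
    return killed / len(mutants)


# Toy labels, as if obtained by executing mutants of an earlier version.
training = [
    (Mutant(12, 30, 1), "killed"),
    (Mutant(9, 22, 2), "killed"),
    (Mutant(1, 2, 4), "survived"),
    (Mutant(0, 0, 3), "survived"),
]
centroids = fit_centroids(training)

# Mutants of the new version: classified from features only.
new_mutants = [Mutant(10, 25, 1), Mutant(1, 1, 5), Mutant(8, 18, 2), Mutant(0, 3, 2)]
for m in new_mutants:
    print(m, "->", predict(centroids, m))
print("predicted mutation score:", predicted_mutation_score(centroids, new_mutants))
```

The cost of prediction here is one distance computation per class for each mutant, independent of how long the test suite takes to run, which is where the efficiency gain of a predictive approach would come from. The price is that some predictions will be wrong, so the predicted mutation score is an estimate rather than the exact ratio of killed mutants.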