Research article
DOI: 10.1145/2931037.2931038

Predictive mutation testing

Published: 18 July 2016

ABSTRACT

Mutation testing is a powerful methodology for evaluating test suite quality. In mutation testing, a large number of mutants are generated and executed against the test suite to check the ratio of killed mutants. As a result, mutation testing is widely regarded as computationally expensive. To alleviate this efficiency concern, this paper proposes predictive mutation testing (PMT), the first approach to predicting mutation testing results without executing the mutants. In particular, PMT constructs a classification model from a set of features related to the mutants and the tests, and uses that model to predict whether a mutant would be killed or would survive, without executing it. PMT has been evaluated on 163 real-world projects under two application scenarios (cross-version and cross-project). The experimental results show that PMT improves the efficiency of mutation testing by up to 151.4X while incurring only a small loss of accuracy when predicting mutant execution results, indicating a good tradeoff between the efficiency and effectiveness of mutation testing.
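
The abstract characterizes PMT as a binary classifier over features of each mutant and of the tests that cover it, trained where execution results are already known (an earlier version, or other projects) and then applied to unexecuted mutants. The snippet below is a minimal sketch of that setup, not the paper's implementation: the feature columns (number of covering tests, executions of the mutated statement, complexity of the mutated method, operator id) and the use of scikit-learn's random forest are illustrative assumptions.

```python
# Sketch of the predictive-mutation-testing idea: learn from mutants whose
# execution results are known, then predict "killed" vs. "survived" for new
# mutants without running the test suite against them.
# The feature layout is a hypothetical example, not the paper's exact set.
from sklearn.ensemble import RandomForestClassifier

# One row per mutant: [covering tests, executions of mutated statement,
# complexity of mutated method, mutation-operator id]; label 1 = killed.
train_X = [
    [12, 340, 4, 0],
    [ 0,   0, 2, 1],   # never covered: almost certainly survives
    [ 3,  25, 7, 2],
    [ 1,   2, 1, 1],
]
train_y = [1, 0, 1, 0]

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(train_X, train_y)

# Mutants of a new version or project, described by the same features but
# never executed against the test suite.
new_X = [
    [8, 120, 3, 0],
    [0,   0, 5, 2],
]
labels = model.predict(new_X)              # 1 = predicted killed
p_kill = model.predict_proba(new_X)[:, 1]  # confidence that each mutant dies
print(labels, p_kill)

# Summing the predicted labels gives an estimated mutation score with zero
# mutant executions; comparing predictions against a held-out labeled set
# (e.g., via AUC) is one way to measure the accuracy loss the abstract mentions.
```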


Published in
ISSTA 2016: Proceedings of the 25th International Symposium on Software Testing and Analysis
July 2016, 452 pages
ISBN: 9781450343909
DOI: 10.1145/2931037
Copyright © 2016 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
