Research article · DOI: 10.1145/2884781.2884791 · ICSE Conference Proceedings

Comparing white-box and black-box test prioritization

Published: 14 May 2016

ABSTRACT

Although white-box regression test prioritization has been well-studied, the more recently introduced black-box prioritization approaches have neither been compared against each other nor against more well-established white-box techniques. We present a comprehensive experimental comparison of several test prioritization techniques, including well-established white-box strategies and more recently introduced black-box approaches. We found that Combinatorial Interaction Testing and diversity-based techniques (Input Model Diversity and Input Test Set Diameter) perform best among the black-box approaches. Perhaps surprisingly, we found little difference between black-box and white-box performance (at most 4% fault detection rate difference). We also found the overlap between black- and white-box faults to be high: the first 10% of the prioritized test suites already agree on at least 60% of the faults found. These are positive findings for practicing regression testers who may not have source code available, thereby making white-box techniques inapplicable. We also found evidence that both black-box and white-box prioritization remain robust over multiple system releases.
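
To make the two families of strategies concrete, below is a minimal illustrative sketch, not the paper's own tooling: a greedy "additional" coverage-based prioritization (a common white-box strategy) and a greedy input-diversity prioritization using Jaccard distance (in the spirit of the black-box diversity techniques the abstract refers to). All test names, coverage sets, and input token sets are hypothetical.

```python
# Illustrative sketch only (not the authors' implementation).
# White-box: greedy "additional" coverage prioritization.
# Black-box: greedy max-min Jaccard-distance diversity prioritization.
# All test names and data below are hypothetical.


def prioritize_by_additional_coverage(coverage):
    """Repeatedly pick the test that covers the most not-yet-covered elements."""
    remaining = dict(coverage)
    covered, order = set(), []
    while remaining:
        best = max(remaining, key=lambda t: len(remaining[t] - covered))
        if covered and not (remaining[best] - covered):
            covered = set()  # everything reachable is covered: reset and continue
            continue
        order.append(best)
        covered |= remaining.pop(best)
    return order


def jaccard_distance(a, b):
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def prioritize_by_input_diversity(inputs):
    """Start from the largest input, then repeatedly add the test whose input is
    farthest (max of minimum Jaccard distance) from the tests already selected."""
    remaining = dict(inputs)
    order = [max(remaining, key=lambda t: len(remaining[t]))]
    del remaining[order[0]]
    while remaining:
        best = max(remaining,
                   key=lambda t: min(jaccard_distance(remaining[t], inputs[s])
                                     for s in order))
        order.append(best)
        del remaining[best]
    return order


if __name__ == "__main__":
    # Hypothetical data: branches covered by each test, and token sets of each test's input.
    coverage = {"t1": {1, 2, 3}, "t2": {3, 4}, "t3": {5}, "t4": {1, 5}}
    inputs = {"t1": {"-c", "file.txt"}, "t2": {"-d", "file.txt"},
              "t3": {"-c", "-v"}, "t4": {"--help"}}
    print(prioritize_by_additional_coverage(coverage))  # e.g. ['t1', 't2', 't3', 't4']
    print(prioritize_by_input_diversity(inputs))
```

The first ordering needs source-level coverage data (e.g. from an instrumented build), while the second needs only the test inputs themselves, which is why the black-box result in the abstract matters when source code is unavailable.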

Published in

ICSE '16: Proceedings of the 38th International Conference on Software Engineering
May 2016, 1235 pages
ISBN: 9781450339001
DOI: 10.1145/2884781
Copyright © 2016 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
