ABSTRACT
Although white-box regression test prioritization has been well studied, the more recently introduced black-box prioritization approaches have been compared neither against each other nor against the more established white-box techniques. We present a comprehensive experimental comparison of several test prioritization techniques, including well-established white-box strategies and more recently introduced black-box approaches. We found that Combinatorial Interaction Testing and diversity-based techniques (Input Model Diversity and Input Test Set Diameter) perform best among the black-box approaches. Perhaps surprisingly, we found little difference between black-box and white-box performance (at most a 4% difference in fault detection rate). We also found the overlap between the faults detected by black-box and white-box techniques to be high: the first 10% of the prioritized test suites already agree on at least 60% of the faults found. These are positive findings for practicing regression testers who may not have source code available, which makes white-box techniques inapplicable. We also found evidence that both black-box and white-box prioritization remain robust over multiple system releases.
Index Terms
- Comparing white-box and black-box test prioritization