Research article · DOI: 10.1145/2884781.2884791 · ICSE Conference Proceedings

Comparing white-box and black-box test prioritization

Published: 14 May 2016

ABSTRACT

Although white-box regression test prioritization has been well-studied, the more recently introduced black-box prioritization approaches have neither been compared against each other nor against more well-established white-box techniques. We present a comprehensive experimental comparison of several test prioritization techniques, including well-established white-box strategies and more recently introduced black-box approaches. We found that Combinatorial Interaction Testing and diversity-based techniques (Input Model Diversity and Input Test Set Diameter) perform best among the black-box approaches. Perhaps surprisingly, we found little difference between black-box and white-box performance (at most 4% fault detection rate difference). We also found the overlap between black- and white-box faults to be high: the first 10% of the prioritized test suites already agree on at least 60% of the faults found. These are positive findings for practicing regression testers who may not have source code available, thereby making white-box techniques inapplicable. We also found evidence that both black-box and white-box prioritization remain robust over multiple system releases.
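
To make the two families of strategies concrete, below is a minimal illustrative sketch, not the paper's own tooling: a greedy "additional" coverage-based prioritization (a common white-box strategy) and a greedy input-diversity prioritization using Jaccard distance (in the spirit of the black-box diversity techniques the abstract refers to). All test names, coverage sets, and input token sets are hypothetical.

```python
# Illustrative sketch only (not the authors' implementation).
# White-box: greedy "additional" coverage prioritization.
# Black-box: greedy max-min Jaccard-distance diversity prioritization.
# All test names and data below are hypothetical.


def prioritize_by_additional_coverage(coverage):
    """Repeatedly pick the test that covers the most not-yet-covered elements."""
    remaining = dict(coverage)
    covered, order = set(), []
    while remaining:
        best = max(remaining, key=lambda t: len(remaining[t] - covered))
        if covered and not (remaining[best] - covered):
            covered = set()  # everything reachable is covered: reset and continue
            continue
        order.append(best)
        covered |= remaining.pop(best)
    return order


def jaccard_distance(a, b):
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def prioritize_by_input_diversity(inputs):
    """Start from the largest input, then repeatedly add the test whose input is
    farthest (max of minimum Jaccard distance) from the tests already selected."""
    remaining = dict(inputs)
    order = [max(remaining, key=lambda t: len(remaining[t]))]
    del remaining[order[0]]
    while remaining:
        best = max(remaining,
                   key=lambda t: min(jaccard_distance(remaining[t], inputs[s])
                                     for s in order))
        order.append(best)
        del remaining[best]
    return order


if __name__ == "__main__":
    # Hypothetical data: branches covered by each test, and token sets of each test's input.
    coverage = {"t1": {1, 2, 3}, "t2": {3, 4}, "t3": {5}, "t4": {1, 5}}
    inputs = {"t1": {"-c", "file.txt"}, "t2": {"-d", "file.txt"},
              "t3": {"-c", "-v"}, "t4": {"--help"}}
    print(prioritize_by_additional_coverage(coverage))  # e.g. ['t1', 't2', 't3', 't4']
    print(prioritize_by_input_diversity(inputs))
```

The first ordering needs source-level coverage data (e.g. from an instrumented build), while the second needs only the test inputs themselves, which is why the black-box result in the abstract matters when source code is unavailable.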

Published in

ICSE '16: Proceedings of the 38th International Conference on Software Engineering
May 2016, 1235 pages
ISBN: 9781450339001
DOI: 10.1145/2884781
Copyright © 2016 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
