research-article

STADS: Software Testing as Species Discovery

Author:
Marcel Böhme

National University of Singapore and Monash University, Australia

National University of Singapore and Monash University, Australia

0000-0002-4470-1824
View Profile

ACM Transactions on Software Engineering and Methodology Volume 27 Issue 2Article No.: 7pp 1–52https://doi.org/10.1145/3210309

Published:27 June 2018Publication History

ACM Transactions on Software Engineering and Methodology

Abstract

A fundamental challenge of software testing is the statistically well-grounded extrapolation from program behaviors observed during testing. For instance, a security researcher who has run the fuzzer for a week has currently no means (1) to estimate the total number of feasible program branches, given that only a fraction has been covered so far; (2) to estimate the additional time required to cover 10% more branches (or to estimate the coverage achieved in one more day, respectively); or (3) to assess the residual risk that a vulnerability exists when no vulnerability has been discovered. Failing to discover a vulnerability does not mean that none exists—even if the fuzzer was run for a week (or a year). Hence, testing provides no formal correctness guarantees.

In this article, I establish an unexpected connection with the otherwise unrelated scientific field of ecology and introduce a statistical framework that models Software Testing and Analysis as Discovery of Species (STADS). For instance, in order to study the species diversity of arthropods in a tropical rain forest, ecologists would first sample a large number of individuals from that forest, determine their species, and extrapolate from the properties observed in the sample to properties of the whole forest. The estimations (1) of the total number of species, (2) of the additional sampling effort required to discover 10% more species, or (3) of the probability to discover a new species are classical problems in ecology. The STADS framework draws from over three decades of research in ecological biostatistics to address the fundamental extrapolation challenge for automated test generation. Our preliminary empirical study demonstrates a good estimator performance even for a fuzzer with adaptive sampling bias—AFL, a state-of-the-art vulnerability detection tool. The STADS framework provides statistical correctness guarantees with quantifiable accuracy.

References

Gail E. Austen, Markus Bindemann, Richard A. Griffiths, and David L. Roberts. 2016. Species identification by experts and non-experts: Comparing images from field guides. Nature - Scientific Reports 6 (2016), 1--7.Google Scholar
Greg Banks, Marco Cova, Viktoria Felmetsger, Kevin Almeroth, Richard Kemmerer, and Giovanni Vigna. 2006. SNOOZE: Toward a stateful network protocol fuzZEr. In Proceedings of the 9th International Conference on Information Security (ISC’06). 343--358. Google ScholarDigital Library
E. T. Barr, M. Harman, P. McMinn, M. Shahbaz, and S. Yoo. 2015. The oracle problem in software testing: A survey. IEEE Transactions on Software Engineering 41, 5 (May 2015), 507--525.Google ScholarDigital Library
Yves Basset, Lukas Cizek, Philippe Cuénoud, Raphael K. Didham, François Guilhaumon, Olivier Missa, Vojtech Novotny, Frode Ødegaard, Tomas Roslin, Jürgen Schmidl, Alexey K. Tishechkin, Neville N. Winchester, David W. Roubik, Henri-Pierre Aberlenc, Johannes Bail, Héctor Barrios, Jon R. Bridle, Gabriela Castaño-Meneses, Bruno Corbara, Gianfranco Curletti, Wesley Duarte da Rocha, Domir De Bakker, Jacques H. C. Delabie, Alain Dejean, Laura L. Fagan, Andreas Floren, Roger L.Kitching, Enrique Medianero, Scott E. Miller, Evandro Gama de Oliveira, Jérôme Orivel, Marc Pollet, Mathieu Rapp, Sérvio P. Ribeiro, Yves Roisin, Jesper B. Schmidt, Line Sørensen, and Maurice Leponce. 2012. Arthropod diversity in a tropical forest. Science 338, 6113 (2012), 1481--1484.Google Scholar
A. Bertolino. 2007. Software testing research: Achievements, challenges, dreams. In Future of Software Engineering (FOSE’07). 85--103. Google ScholarDigital Library
Marcel Böhme, Bruno C. d. S. Oliveira, and Abhik Roychoudhury. 2013. Partition-based regression verification. In Proceedings of the 2013 International Conference on Software Engineering (ICSE’13). 302--311. Google ScholarDigital Library
Marcel Böhme, Bruno C. d. S. Oliveira, and Abhik Roychoudhury. 2013. Regression tests to expose change interaction errors. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering (ESEC/FSE’13). 334--344. Google ScholarDigital Library
Marcel Böhme and Soumya Paul. 2016. A probabilistic analysis of the efficiency of automated software testing. IEEE Transactions on Software Engineering 42, 4 (April 2016), 345--360. Google ScholarDigital Library
Marcel Böhme, Van-Thuan Pham, Manh-Dung Nguyen, and Abhik Roychoudhury. 2017. Directed greybox fuzzing. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS’17). 2329--2344. Google ScholarDigital Library
Marcel Böhme, Van-Thuan Pham, and Abhik Roychoudhury. 2016. Coverage-based greybox fuzzing as Markov chain. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS’16). 1032--1043. Google ScholarDigital Library
Zdravko I. Botev and Dirk P. Kroese. 2008. An efficient algorithm for rare-event probability estimation, combinatorial optimization, and counting. Methodology and Computing in Applied Probability 10, 4 (2008), 471--505.Google ScholarCross Ref
Ulrich Brose, Neo D. Martinez, and Richard J. Williams. 2003. Estimating species richness: Sensitivity to sample coverage and insensitivity to spatial patterns. Ecology 84, 9 (2003), 2364--2377.Google ScholarCross Ref
J. Bunge and M. Fitzpatrick. 1993. Estimating the number of species: A review. Journal of the American Statistical Association 88, 421 (1993), 364--373.Google ScholarCross Ref
K. P. Burnham and W. S. Overton. 1979. Robust estimation of population size when capture probabilities vary among animals. Ecology 60, 5 (1979), 927--936.Google ScholarCross Ref
Cristian Cadar, Daniel Dunbar, and Dawson Engler. 2008. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation (OSDI’08). 209--224. Google ScholarDigital Library
Anne Chao. 1984. Nonparametric estimation of the number of classes in a population. Scandinavian Journal of Statistics 11, 4 (1984), 265--270.Google Scholar
Anne Chao. 1987. Estimating the population size for capture-recapture data with unequal catchability. Biometrics 43, 4 (1987), 783--791.Google ScholarCross Ref
Anne Chao and Chun-Huo Chiu. 2014. Species Richness: Estimation and Comparison. John Wiley 8 Sons.Google Scholar
Anne Chao and Chun-Huo Chiu. 2016. Nonparametric estimation and comparison of species richness. In Encyclopedia of Life Sciences (eLS).Google Scholar
Anne Chao, Chun-Huo Chiu, Robert K. Colwell, Luiz Fernando S. Magnago, Robin L. Chazdon, and Nicholas J. Gotelli. 2017. Deciphering the enigma of undetected species, phylogenetic, and functional diversity based on good-turing theory. Ecology 98, 11 (2017), 2914--2929.Google ScholarCross Ref
Anne Chao and Robert K. Colwell. 2017. Thirty years of progeny from Chao’s inequality: Estimating and comparing richness with incidence data and incomplete sampling. Statistics and Operations Research Transactions 41, 1 (2017), 3--54.Google Scholar
Anne Chao, Robert K. Colwell, Chih-Wei Lin, and Nicholas J. Gotelli. 2009. Sufficient sampling for asymptotic minimum species richness estimators. Ecology 90, 4 (2009), 1125--1133.Google ScholarCross Ref
Anne Chao, T. C. Hsieh, Robin L. Chazdon, Robert K. Colwell, and Nicholas J. Gotelli. 2015. Unveiling the species-rank abundance distribution by generalizing the good-turing sample coverage theory. Ecology 96, 5 (2015), 1189--1201.Google ScholarCross Ref
Anne Chao and Lou Jost. 2012. Coverage-based rarefaction and extrapolation: Standardizing samples by completeness rather than size. Ecology 93, 12 (2012), 2533--2547.Google ScholarCross Ref
Anne Chao and Shen-Ming Lee. 1992. Estimating the number of classes via sample coverage. Journal of the American Statistical Association 87, 417 (1992), 210--217.Google ScholarCross Ref
A. Chao, K. H. Ma, T. C. Hsieh, and C. H. Chiu. 2015. Online Program SpadeR (Species-richness Prediction And Diversity Estimation in R). Program and User’s Guide. Retrieved from http://chao.stat.nthu.edu.tw/wordpress/software_download.Google Scholar
Anne Chao and Tsung-Jen Shen. 2004. Nonparametric prediction in species sampling. Journal of Agricultural, Biological, and Environmental Statistics 9, 3 (2004), 253--269.Google ScholarCross Ref
R. L. Chazdon, R. K. Colwell, J. S. Denslow, and M. R. Guariguata. 1998. Statistical methods for estimating species richness of woody regeneration in primary and secondary rain forests of Northeastern Costa Rica. In Forest Biodiversity Research, Monitoring and Modeling: Conceptual Background and Old World Case Studies. Vol. 20. Man and the Biosphere Series. 285--309.Google Scholar
Thierry Titcheu Chekam, Mike Papadakis, Yves Le Traon, and Mark Harman. 2017. An empirical study on mutation, statement and branch coverage fault revelation that avoids the unreliable clean program assumption. In Proceedings of the 39th International Conference on Software Engineering (ICSE’17). 597--608. Google ScholarDigital Library
Tsong Yueh Chen and Yuen-Tak Yu. 1996. On the expected number of failures detected by subdomain testing and random testing. IEEE Transactions on Software Engineering 22, 2 (1996), 109--119. Google ScholarDigital Library
Vitaly Chipounov, Volodymyr Kuznetsov, and George Candea. 2011. S2E: A platform for in-vivo multi-path analysis of software systems. In Proceedings of the 2011 ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’11). 265--278. Google ScholarDigital Library
C. H. Chiu, Y. T. Wang, B. A. Walther, and A. Chao. 2014. An improved nonparametric lower bound of species richness via a modified Good-Turing frequency formula. Biometrics 70, 3 (2014), 671--682.Google ScholarCross Ref
Bernard D. Coleman. 1981. On random placement and species-area relations. Mathematical Biosciences 54, 3 (1981), 191--215.Google ScholarCross Ref
Robert K. Colwell, Anne Chao, Nicholas J. Gotelli, Shang-Yi Lin, Chang Xuan Mao, Robin L. Chazdon, and John T. Longino. 2012. Models and estimators linking individual-based and sample-based rarefaction, extrapolation and comparison of assemblages. Journal of Plant Ecology 5, 1 (2012), 3.Google ScholarCross Ref
Robert K. Colwell and Jonathan A. Coddington. 1994. Estimating terrestrial biodiversity through extrapolation. Philosophical Transactions of the Royal Society of London B: Biological Sciences 345, 1311 (1994), 101--118.Google Scholar
Robert K. Colwell and Johanna E. Elsensohn. 2014. EstimateS turns 20: Statistical estimation of species richness and shared species from samples, with non-parametric extrapolation. Ecography 37, 6 (2014), 609--613.Google ScholarCross Ref
Robert K. Colwell, Chang Xuan Mao, and Jing Chang. 2004. Interpolating, extrapolating, and comparing incidence-based species accumulation curves. Ecology 85, 10 (2004), 2717--2727.Google ScholarCross Ref
Edsger W. Dijkstra. 1970. Notes on Structured Programming.Google Scholar
Joe W. Duran and Simeon C. Ntafos. 1984. An evaluation of random testing. IEEE Transactions on Software Engineering 10, 4 (July 1984), 438--444. Google ScholarDigital Library
S. M. Durant, T. Wacher, S. Bashir, R. Woodroffe, P. De Ornellas, C. Ransom, J. Newby, T. Abáigar, M. Abdelgadir, H. El Alqamy, J. Baillie, M. Beddiaf, F. Belbachir, A. Belbachir-Bazi, A. A. Berbash, N. E. Bemadjim, R. Beudels-Jamar, L. Boitani, C. Breitenmoser, M. Cano, P. Chardonnet, B. Collen, W. A. Cornforth, F. Cuzin, P. Gerngross, B. Haddane, M. Hadjeloum, A. Jacobson, A. Jebali, F. Lamarque, D. Mallon, K. Minkowski, S. Monfort, B. Ndoassal, B. Niagate, G. Purchase, S. Samala, A. K. Samna, C. Sillero-Zubiri, A. E. Soultan, M. R. Stanley Price, and N. Pettorelli. 2014. Fiddling in biodiversity hotspots while deserts burn? Collapse of the Sahara’s megafauna. Diversity and Distributions 20, 1 (2014), 114--122.Google ScholarCross Ref
Antonio Filieri, Corina S. Păsăreanu, and Willem Visser. 2013. Reliability analysis in symbolic pathfinder. In Proceedings of the 2013 International Conference on Software Engineering (ICSE’13). 622--631. Google ScholarDigital Library
R. A. Fisher, A. S. Corbet, and C. B. Williams. 1943. The relation between the number of species and the number of individuals in a random sample of an animal population. Journal of Animal Ecology 12 (1943), 42--58.Google ScholarCross Ref
William A. Gale and Geoffrey Sampson. 1995. Good-Turing smoothing without tears. Journal of Quantitative Linguistics 2 (1995), 217--237.Google ScholarCross Ref
G. Gay, M. Staats, M. Whalen, and M. P. E. Heimdahl. 2015. The risks of coverage-directed test case generation. IEEE Transactions on Software Engineering 41, 8 (Aug. 2015), 803--819.Google ScholarDigital Library
Jaco Geldenhuys, Matthew B. Dwyer, and Willem Visser. 2012. Probabilistic symbolic execution. In Proceedings of the 2012 International Symposium on Software Testing and Analysis (ISSTA’12). 166--176. Google ScholarDigital Library
Patrice Godefroid, Adam Kiezun, and Michael Y. Levin. 2008. Grammar-based whitebox fuzzing. In Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’08). 206--215. Google ScholarDigital Library
Patrice Godefroid, Nils Klarlund, and Koushik Sen. 2005. DART: Directed automated random testing. In Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’05). 213--223. Google ScholarDigital Library
I. J. Good. 2000. Turing’s anticipation of empirical Bayes in connection with the cryptanalysis of the naval enigma. Journal of Statistical Computation and Simulation 66, 2 (2000), 101--111.Google ScholarCross Ref
Irving John Good. 1953. The population frequencies of species and the estimation of population parameters. Biometrika 40 (1953), 237--264.Google ScholarCross Ref
I. J. Good and G. H. Toulmin. 1956. The number of new species, and the increase in population coverage, when a sample is increased. Biometrika 43, 1/2 (1956), 45--63.Google ScholarCross Ref
Steven M. Goodman and Jonathan P. Benstead. 2005. Updated estimates of biotic diversity and endemism for Madagascar. Oryx 39, 1 (2005), 73--77.Google ScholarCross Ref
Nicholas J. Gotelli and Anne Chao. 2013. Measuring and estimating species richness, species diversity, and biotic similarity from sampling data. In Encyclopedia of Biodiversity (2nd ed.). Vol. 5. Academic Press.195--211.Google Scholar
Walter J. Gutjahr. 1999. Partition testing vs. random testing: The influence of uncertainty. IEEE Transactions on Software Engineering 25, 5 (Sept. 1999), 661--674. Google ScholarDigital Library
Andrew J. Hamilton, Yves Basset, Kurt K. Benke, Peter S. Grimbacher, Scott E. Miller, Vojtech Novotnỳ, G. Allan Samuelson, Nigel E. Stork, George D. Weiblen, and Jian D.L. Yen. 2010. Quantifying uncertainty in estimation of tropical arthropod species richness. American Naturalist 176, 1 (2010), 90--95.Google ScholarCross Ref
Andrew J. Hamilton, Yves Basset, Kurt K. Benke, Peter S. Grimbacher, Scott E. Miller, Vojtech Novotnỳ, G. Allan Samuelson, Nigel E. Stork, George D. Weiblen, and Jian D. L. Yen. 2011. Correction. American Naturalist 177, 4 (2011), 544--545.Google ScholarCross Ref
D. Hamlet and R. Taylor. 1990. Partition testing does not inspire confidence {program testing}. IEEE Transactions on Software Engineering 16, 12 (Dec. 1990), 1402--1411. Google ScholarDigital Library
Dick Hamlet and Jeff Voas. 1993. Faults on its sleeve: Amplifying software reliability testing. In Proceedings of the 1993 ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA’93). 89--98. Google ScholarDigital Library
Mary Jean Harrold. 2000. Testing: A roadmap. In Proceedings of the Conference on the Future of Software Engineering (ICSE’00). 61--72.Google ScholarDigital Library
Frank J. Hernandez and David G. Lindquist. 1999. A comparison of two light-trap designs for sampling larval and presettlement juvenile fish above a reef in Onslow Bay, North Carolina. Bulletin of Marine Science 64, 1 (1999), 173--184.Google Scholar
Joaquin Hortal, Paulo A. V. Borges, and Clara Gaspar. 2006. Evaluating the performance of species richness estimators: Sensitivity to sample grain size. Journal of Animal Ecology 75, 1 (2006), 274--287.Google ScholarCross Ref
Matthias Höschele and Andreas Zeller. 2016. Mining input grammars from dynamic taints. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (ASE’16). 720--725. Google ScholarDigital Library
T. C. Hsieh, K. H. Ma, and Anne Chao. 2016. iNEXT: An R package for rarefaction and extrapolation of species diversity (Hill numbers). Methods in Ecology and Evolution 7, 12 (2016), 1451--1456.Google ScholarCross Ref
Stuart H. Hurlbert. 1971. The nonconcept of species diversity: A critique and alternative parameters. Ecology 52, 4 (1971), 577--586.Google ScholarCross Ref
Laura Inozemtseva and Reid Holmes. 2014. Coverage is not strongly correlated with test suite effectiveness. In Proceedings of the 36th International Conference on Software Engineering (ICSE’14). 435--445. Google ScholarDigital Library
E. T. Jaynes. 2003. Probability Theory: The Logic of Science. Cambridge University Press.Google ScholarCross Ref
Y. Jia and M. Harman. 2011. An analysis and survey of the development of mutation testing. IEEE Transactions on Software Engineering 37, 5 (Sept. 2011), 649--678. Google ScholarDigital Library
Shen-Ming Lee and Anne Chao. 1994. Estimating population size via sample coverage for closed capture-recapture models. Biometrics 50, 1 (1994), 88--97.Google ScholarCross Ref
B. Littlewood and D. Wright. 1997. Some conservative stopping rules for the operational testing of safety critical software. IEEE Transactions on Software Engineering 23, 11 (Nov. 1997), 673--683. Google ScholarDigital Library
John T. Longino and Robert K. Colwell. 1997. Biodiversity assessment using structured inventory: Capturing the ant fauna of a tropical rain forest. Ecological Applications 7, 4 (1997), 1263--1277.Google ScholarCross Ref
Anne E. Magurran and Brian J. McGill. 2011. Biological Diversity: Frontiers in Measurement and Assessment. Oxford University Press.Google Scholar
Chang Xuan Mao and Robert K. Colwell. 2005. Estimation of species richness: Mixture models, the role of rare species, and inferential challenges. Ecology 86, 5 (2005), 1143--1153.Google ScholarCross Ref
Ke Mao, Mark Harman, and Yue Jia. 2016. Sapienz: Multi-objective automated testing for android applications. In Proceedings of the 25th International Symposium on Software Testing and Analysis (ISSTA’16). 94--105. Google ScholarDigital Library
Björn Matthis, Vitalii Avdiienko, Ezekiel Soremekun, Marcel Böhme, and Andreas Zeller. 2017. Detecting information flow by mutating input data. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE’17). 1--11.Google ScholarCross Ref
Brian A. Maurer, James H. Brown, and Renee D. Rusler. 1992. The micro and macro in body size evolution. Evolution 46, 4 (1992), 939--953.Google ScholarCross Ref
Phil McMinn. 2004. Search-based software test data generation: A survey: Research articles. Journal of Software Testing, Verification and Reliability 14, 2 (June 2004), 105--156. Google ScholarDigital Library
P. McMinn. 2011. Search-based software testing: Past, present and future. In Proceedings of the 4th IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW’11). 153--163. Google ScholarDigital Library
Barton P. Miller, Louis Fredriksen, and Bryan So. 1990. An empirical study of the reliability of UNIX utilities. Communications of the ACM 33, 12 (Dec. 1990), 32--44. Google ScholarDigital Library
K. W. Miller, L. J. Morell, R. E. Noonan, S. K. Park, D. M. Nicol, B. W. Murrill, and M. Voas. 1992. Estimating the probability of failure when testing reveals no failures. IEEE Transactions on Software Engineering 18, 1 (Jan. 1992), 33--43. Google ScholarDigital Library
Camilo Mora, Derek P. Tittensor, Sina Adl, Alastair G. B. Simpson, and Boris Worm. 2011. How many species are there on earth and in the ocean? PLOS Biology 9, 8 (2011), 1--8.Google ScholarCross Ref
Mesrob I. Ohannessian. 2012. On Inference about Rare Events. PhD dissertation. Massachusetts Institute of Technology. Google ScholarDigital Library
Alon Orlitsky and Ananda Theertha Suresh. 2015. Competitive distribution estimation: Why is Good-Turing good. In Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS’15). 2143--2151. Google ScholarDigital Library
Alon Orlitsky, Ananda Theertha Suresh, and Yihong Wu. 2016. Optimal prediction of the number of unseen species. Proceedings of the National Academy of Sciences 113, 47 (2016), 13283--13288.Google ScholarCross Ref
Michael W. Palmer. 1991. Estimating species richness: The second-order jackknife reconsidered. Ecology 72, 4 (1991), 1512--1513.Google ScholarCross Ref
Van-Thuan Pham, Marcel Böhme, and Abhik Roychoudhury. 2016. Model-based whitebox fuzzing for program binaries. In Proceedings of the 2016 31st IEEE/ACM International Conference on Automated Software Engineering (ASE’16). 543--553. Google ScholarDigital Library
E. C. Pielou. 1966. Species-diversity and pattern-diversity in the study of ecological succession. Journal of Theoretical Biology 10, 2 (1966), 370--383.Google ScholarCross Ref
F. W. Preston. 1948. The commonness, and rarity, of species. Ecology 29, 3 (1948), 254--283.Google ScholarCross Ref
Dawei Qi, Hoang D. T. Nguyen, and Abhik Roychoudhury. 2013. Path exploration based on symbolic output. ACM Transactions on Software Engineering and Methodology 22, 4 (Oct. 2013), 32:1--32:41. Google ScholarDigital Library
Sushma Reddy and Liliana M. Dávalos. 2003. Geographical sampling bias and its implications for conservation priorities in Africa. Journal of Biogeography 30, 11 (2003), 1719--1727.Google ScholarCross Ref
Herbert E. Robbins. 1968. Estimating the total probability of the unobserved outcomes of an experiment. Annals of Mathematical Statistics 39, 1 (1968), 256--257.Google ScholarCross Ref
Mark D. Robinson, Davis J. McCarthy, and Gordon K. Smyth. 2010. edgeR: A bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 1 (2010), 139--140. Google ScholarDigital Library
Konstantin Serebryany, Derek Bruening, Alexander Potapenko, and Dmitry Vyukov. 2012. AddressSanitizer: A fast address sanity checker. In Proceedings of the 2012 USENIX Conference on Annual Technical Conference (USENIX ATC’12). 28--28. Google ScholarDigital Library
Tsung-Jen Shen, Anne Chao, and Chih-Feng Lin. 2003. Predicting the number of new species in further taxonomic sampling. Ecology 84, 3 (2003), 798--804.Google ScholarCross Ref
Andrew R. Solow and Stephen Polasky. 1999. A quick estimator for taxonomic surveys. Ecology 80, 8 (1999), 2799--2803.Google ScholarCross Ref
Evgeniy Stepanov and Konstantin Serebryany. 2015. MemorySanitizer: Fast detector of uninitialized memory use in C++. In Proceedings of the 2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO’15). 46--55. Google ScholarCross Ref
A. B. Wagner, P. Viswanath, and S. R. Kulkarni. 2006. Strong consistency of the good-turing estimator. In Proceedings of the 2006 IEEE International Symposium on Information Theory. 2526--2530.Google Scholar
Bruno A. Walther and Joslin L. Moore. 2005. The concepts of bias, precision and accuracy, and their use in testing the performance of species richness estimators, with a literature review of estimator performance. Ecography 28, 6 (2005), 815--829.Google ScholarCross Ref
Bruno A. Walther and Joslin L. Moore. 2005. The concepts of bias, precision and accuracy, and their use in testing the performance of species richness estimators, with a literature review of estimator performance. Ecography 28, 6 (2005), 815--829.Google ScholarCross Ref
Ponemon Institute. 2017. Ponemon Cost of Cyber Crime Study. Retrieved from https://www.accenture.com/us-en/insight-cost-of-cybercrime-2017. Accessed 11-13-2017.Google Scholar
IEEE Computer Society. 2013. Lockheed Martin Webinar Series: Michael Whalen on the Future of Verification and Validation. Retrieved from https://www.computer.org/cms/Computer.org/webinars/lmco/012413Slides-Whalen.pdf. Accessed: 05-13-2017.Google Scholar
Michał Zalewski. 2017. AFL: American Fuzzy Lop Fuzzer. Retrieved from http://lcamtuf.coredump.cx/afl/technical_details.txt. Accessed: 05-13-2017.Google Scholar
Michał Zalewski. 2017. AFL: Pulling Jpegs Out of Thin Air, Michael Zalewski. Retrieved from https://lcamtuf.blogspot.com/2014/11/pulling-jpegs-out-of-thin-air.html. Accessed: 05-13-2017.Google Scholar
CERT Division. 2017. CERT Triage Tools. Retrieved from https://www.cert.org/vulnerability-analysis/tools/triage.cfm?. Accessed: 05-13-2017.Google Scholar
The LLVM Compiler Infrastructure Project. 2017. Clang Compiler Documentation. Retrieved from https://clang.llvm.org/docs/index.html. Accessed: 05-13-2017.Google Scholar
US Defense Advanced Research Projects Agency. 2017. DARPA Cyber Grand Challenge. Retrieved from http://www.darpa.mil/news-events/2016-08-04. Accessed: 05-13-2017.Google Scholar
Facebook. 2017. Facebook: Mark Harman on Software Engineering at Facebook Scale. Retrieved from https://research.fb.com/mark-harmon-on-software-engineering-at-facebook-scale/. Accessed: 05-13-2017.Google Scholar
FFMPEG. 2017. FFMPEG: A Complete, Cross-Platform Solution to Record, Convert and Stream Audio and Video. Retrieved from https://www.ffmpeg.org/. Accessed: 05-13-2017.Google Scholar
GNU. 2017. GCov: Coverage Testing Tool. Retrieved from https://linux.die.net/man/1/gcov. Accessed: 11-13-2017.Google Scholar
GNU. 2017. GNU GCC Sanitizer Options. Retrieved from https://gcc.gnu.org/onlinedocs/gcc-6.3.0/gcc/Instrumentation-Options.html#index-fsanitize_003daddress-947. Accessed: 05-13-2017.Google Scholar
Anne Chao, K. H. Ma, and T. C. Hsieh. 2017. iNext Online: Species iNterpolation and EXTrapolation. Retrieved from https://chao.shinyapps.io/iNEXTOnline/. Accessed: 05-13-2017.Google Scholar
Niels Lohmann. 2017. JSON for Modern C++. Retrieved from https://github.com/nlohmann/json. Accessed: 11-13-2017.Google Scholar
The LLVM Compiler Infrastructure Project. 2017. LibFuzzer: A Library for Coverage-Guided Fuzz Testing. Retrieved from http://llvm.org/docs/LibFuzzer.html. Accessed: 05-13-2017.Google Scholar
The Libjpeg-turbo Project. 2017. libjpeg-turbo Is a JPEG Image Codec to Accelerate Baseline JPEG Compression and Decompression. Retrieved from http://libjpeg-turbo.virtualgl.org/. Accessed: 05-13-2017.Google Scholar
The GNOME Project. 2017. LibXML2: The XML C Parser and Toolkit of Gnome. Retrieved from http://xmlsoft.org/. Accessed: 11-13-2017.Google Scholar
Microsoft. 2017. Microsoft: Project Springfield. Retrieved from https://www.microsoft.com/Springfield/. Accessed: 05-13-2017.Google Scholar
Google. 2017. Monkey: Android Random Testing. Retrieved from http://developer.android.com/tools/help/monkey.html. Accessed: 05-13-2017.Google Scholar
Mozilla. 2017. Mozilla: Fuzzing Firefox with Peach.Retrieved from https://wiki.mozilla.org/Security/Fuzzing/Peach. Accessed: 05-13-2017.Google Scholar
The OpenSSL Project. 2017. OpenSSL: A Toolkit for the Transport Layer Security (TLS) and Secure Sockets Layer (SSL) Protocols. Retrieved from https://www.openssl.org/. Accessed: 05-13-2017.Google Scholar
Google. 2017. OSS-Fuzz: Five Months Later. Retrieved from https://testing.googleblog.com/2017/05/oss-fuzz-five-months-later-and.html. Accessed: 05-13-2017.Google Scholar
Peach Fuzzer. 2017. Peach Fuzzer Platform. Retieved from http://www.peachfuzzer.com/products/peach-platform/. Accessed: 05-13-2017.Google Scholar
Anne Chao, K. H. Ma, T. C. Hsieh, and Chun-Huo Chiu. 2017. SpadeR Online: Species-Richness Prediction and Diversity Estimation in R. Retrieved from https://chao.shinyapps.io/SpadeR/. Accessed: 05-13-2017.Google Scholar
Eric Archer. 2017. sprex: Calculate Species Richness and Extrapolation Metrics. Retrieved from https://cran.r-project.org/web/packages/sprex/. Accessed: 05-13-2017.Google Scholar
Google. 2017. Syzkaller: Coverage-Guided Kernel Fuzzing. Retrieved from https://github.com/google/syzkaller. Accessed: 05-13-2017.Google Scholar
The Wireshark team. 2017. Wireshark Is the World’s Foremost and Widely-Used Network Protocol Analyzer. Retrieved from https://www.wireshark.org/. Accessed: 05-13-2017.Google Scholar
Sean Eron Anderson. 2017. Bithacks: Implementing Logarithm Efficiently. Retrieved from https://graphics.stanford.edu/ seander/bithacks.html#IntegerLogFloat. Accessed: 11-13-2017.Google Scholar
Cloudflare. 2017. Incident Report on Memory Leak Caused by Cloudflare Parser Bug. Retrieved from https://blog.cloudflare.com/incident-report-on-memory-leak-caused-by-cloudflare-parser-bug/. Accessed: 11-13-2017.Google Scholar
Haseeb Qureshi. 2017. Medium: A Hacker Stole USD 31M of Ether. Retrieved from https://medium.freecodecamp.org/a-hacker-stole-31m-of-ether-how-it-happened-and-what-it-means-for-ethereum-9e5dc29e33ce. Accessed: 05-13-2017.Google Scholar
Damien Gayle, Alexandra Topping, Ian Sample, Sarah Marsh, and Vikram Dodd. 2017. NHS Seeks to Recover from Global Cyber-Attack as Security Concerns Resurface. Retrieved from https://www.theguardian.com/society/2017/may/12/hospitals-across-england-hit-by-large-scale-cyber-attack. Accessed: 11-13-2017.Google Scholar
Elaine J. Weyuker. 1982. On testing non-testable programs. Computer Journal 25, 4 (1982), 465--470.Google ScholarCross Ref
E. J. Weyuker and B. Jeng. 1991. Analyzing partition testing strategies. IEEE Transactions on Software Engineering 17, 7 (July 1991), 703--711. Google ScholarDigital Library
B. Yang and M. Xie. 2000. A study of operational and testing reliability in software reliability analysis. Reliability Engineering 8 System Safety 70, 3 (2000), 323--329.Google Scholar
Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. 2011. Finding and understanding bugs in C compilers. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’11). 283--294. Google ScholarDigital Library
Cun-Hui Zhang and Zhiyi Zhang. 2009. Asymptotic normality of a nonparametric estimator of sample coverage. Annals of Statistics 37, 5A (10 2009), 2582--2595.Google ScholarCross Ref

Index Terms

STADS: Software Testing as Species Discovery
1. Security and privacy
  1. Formal methods and theory of security
    1. Formal security models
2. Software and its engineering
  1. Software creation and management
    1. Software verification and validation

Recommendations

Accelerating Fuzzing through Prefix-Guided Execution

Coverage-guided fuzzing is one of the most effective approaches for discovering software defects and vulnerabilities. It executes all mutated tests from seed inputs to expose coverage-increasing tests. However, executing all mutated tests incurs ...
Read More
Same Coverage, Less Bloat: Accelerating Binary-only Fuzzing with Coverage-preserving Coverage-guided Tracing
CCS '21: Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security

Coverage-guided fuzzing's aggressive, high-volume testing has helped reveal tens of thousands of software security flaws. While executing billions of test cases mandates fast code coverage tracing, the nature of binary-only targets leads to reduced ...
Read More
Combined Source Code Approach for Test Case Prioritization
ICISS '18: Proceedings of the 1st International Conference on Information Science and Systems

Regression testing is an activity in the software testing process to ensure the software is validated and verified after modification occurred on software. It is costly process procedure which has been expected to reach half cost of the software ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM Transactions on Software Engineering and Methodology Volume 27, Issue 2
April 2018
186 pages
ISSN:1049-331X
EISSN:1557-7392
DOI:10.1145/3234930
Editor:
David S. Rosenblum
National University of Singapore, Singapore
Issue’s Table of Contents
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 27 June 2018
- Revised: 1 April 2018
- Accepted: 1 April 2018
- Received: 1 August 2017
Published in tosem Volume 27, Issue 2

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Statistical guarantees
code coverage
discovery probability
extrapolation
fuzzing
measure of confidence
measure of progress
reliability
security
species coverage
stopping rule
Qualifiers
- research-article
- Research
- Refereed
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 33
  Total Citations
  View Citations
- 756
  Total Downloads
- Downloads (Last 12 months)62
- Downloads (Last 6 weeks)15
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

STADS: Software Testing as Species Discovery

ACM Transactions on Software Engineering and Methodology

Abstract

References

Cited By

Index Terms

Recommendations

Accelerating Fuzzing through Prefix-Guided Execution

Same Coverage, Less Bloat: Accelerating Binary-only Fuzzing with Coverage-preserving Coverage-guided Tracing

Combined Source Code Approach for Test Case Prioritization