ABSTRACT
The effectiveness of the software testing process is a key issue in meeting the increasing demand for quality without increasing the overall cost of software development. Estimating software fault-proneness is important for assessing cost and quality, and thus for better planning and tuning of the testing process. Unfortunately, no general techniques are available for estimating software fault-proneness and the distribution of faults, which would help identify the level of testing required for a given quality target. Although software complexity and testing thoroughness are intuitively related to the cost of quality assurance and the quality of the final product, individual software metrics and coverage criteria provide limited help in planning the testing process and assuring the required quality. Using logistic regression, this paper shows how to build models that relate software measures to software fault-proneness for classes of homogeneous software products. It also proposes the use of cross-validation for selecting valid models even for small data sets. Early results show that statistical models based on historical data can estimate the fault-proneness of software modules before testing, and thus support better planning and monitoring of testing activities.
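The approach described above, a logistic regression model fitted to module-level software measures and validated with cross-validation, can be sketched as follows. This is a minimal illustration, not the paper's actual models or data: the two metrics (normalized lines of code and cyclomatic complexity), the synthetic module data, and the gradient-descent training procedure are all assumptions made for the example. Leave-one-out cross-validation is used because it suits the small data sets the paper targets.

```python
import math

def sigmoid(z):
    """Logistic function: maps a linear score to a fault probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic-regression weights (intercept first) by stochastic gradient ascent
    on the log-likelihood. X: list of metric vectors, y: 0/1 fault labels."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = yi - sigmoid(z)          # gradient of the log-likelihood
            w[0] += lr * err
            for j, xj in enumerate(xi):
                w[j + 1] += lr * err * xj
    return w

def predict(w, xi):
    """Estimated probability that the module described by metrics xi is faulty."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    return sigmoid(z)

def leave_one_out_accuracy(X, y):
    """Leave-one-out cross-validation: hold out each module in turn,
    train on the rest, and score the held-out prediction at threshold 0.5."""
    correct = 0
    for i in range(len(X)):
        w = train_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += int((predict(w, X[i]) >= 0.5) == bool(y[i]))
    return correct / len(X)

# Hypothetical historical data: each module is [LOC / 1000, cyclomatic / 20],
# with label 1 if a fault was found in it. Purely illustrative values.
X = [[0.10, 0.10], [0.20, 0.15], [0.15, 0.20], [0.30, 0.25],
     [0.80, 0.90], [0.90, 0.85], [0.70, 0.80], [0.85, 0.95]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

acc = leave_one_out_accuracy(X, y)
print(f"leave-one-out accuracy: {acc:.2f}")
```

The cross-validated accuracy, rather than the fit on the training data, is what the paper proposes as the criterion for accepting a model, since with small data sets an apparently good fit can be an artifact of overfitting.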