ABSTRACT
Predicting faults in software modules can lead to higher-quality software and a more effective development process. However, the results of a fault prediction model have to be properly interpreted before they are incorporated into any decision making. Most earlier studies have used prediction accuracy as the main criterion for comparing competing fault prediction models. We show that, besides accuracy, other criteria such as the number of false positives and false negatives can be equally important when choosing a candidate model for fault prediction. We used five NASA software data sets in our experiment. Our results suggest that Simple Logistic performs better than the other models on the raw data sets, whereas Neural Network performed better when we applied a dimensionality reduction method to the raw data sets. When we used data pre-processing techniques, the prediction accuracy of Random Forest was better in both cases, i.e. with and without dimensionality reduction, but Simple Logistic was more reliable than Random Forest because it produced fewer false negatives.
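The point that accuracy alone can hide important differences between models can be illustrated with a minimal sketch. The counts below are hypothetical and are not taken from the NASA data sets or from the paper's results; the names `ConfusionCounts`, `model_a`, and `model_b` are likewise assumptions made only for illustration. Two classifiers are evaluated on the same 100 modules, 20 of which are faulty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome counts for a binary fault predictor (hypothetical helper)."""
    tp: int  # faulty modules correctly flagged as faulty
    fp: int  # fault-free modules wrongly flagged (false positives)
    tn: int  # fault-free modules correctly passed
    fn: int  # faulty modules missed by the model (false negatives)

    @property
    def accuracy(self) -> float:
        # Fraction of all modules that were classified correctly.
        total = self.tp + self.fp + self.tn + self.fn
        return (self.tp + self.tn) / total

    @property
    def recall(self) -> float:
        # Fraction of the truly faulty modules that the model caught.
        return self.tp / (self.tp + self.fn)


# Illustrative numbers only: 100 modules, 20 of them faulty.
model_a = ConfusionCounts(tp=10, fp=2, tn=78, fn=10)
model_b = ConfusionCounts(tp=16, fp=8, tn=72, fn=4)

for name, m in (("model_a", model_a), ("model_b", model_b)):
    print(f"{name}: accuracy={m.accuracy:.2f} "
          f"false_positives={m.fp} false_negatives={m.fn} "
          f"recall={m.recall:.2f}")
```

In this made-up case both models classify 88 of the 100 modules correctly, yet the first misses 10 faulty modules while the second misses only 4, at the cost of more false alarms. Which trade-off is preferable depends on how expensive a missed fault is relative to an unnecessary inspection.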