ABSTRACT
Open source development projects typically support an open bug repository to which both developers and users can report bugs. Each report that appears in this repository must be triaged to determine whether it requires attention and, if so, which developer will be assigned responsibility for resolving it. Large open source projects are burdened by the rate at which new bug reports appear in the bug repository. In this paper, we present a semi-automated approach intended to ease one part of this process: the assignment of reports to a developer. Our approach applies a machine learning algorithm to the open bug repository to learn the kinds of reports each developer resolves. When a new report arrives, the classifier produced by the machine learning technique suggests a small number of developers suitable to resolve the report. With this approach, we have reached precision levels of 57% and 64% on the Eclipse and Firefox development projects, respectively. We have also applied our approach to the gcc open source project, with less positive results. We describe the conditions under which the approach is applicable and also report on the lessons we learned about applying machine learning to repositories used in open source development.
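The recommendation step described above can be illustrated with a minimal sketch. The abstract does not name the learning algorithm, so the code below substitutes a small multinomial Naive Bayes text classifier with add-one smoothing as a stand-in. The class name `TriageRecommender`, the helper functions, the developer names, and the toy report history are all hypothetical and chosen only for illustration. Precision is computed here as the fraction of suggested developers who are acceptable resolvers, which is one common reading of the term and may differ from the paper's exact evaluation procedure.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class TriageRecommender:
    """Stand-in classifier: multinomial Naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # developer -> word -> count
        self.report_counts = Counter()           # developer -> reports resolved
        self.vocabulary = set()

    def train(self, resolved_reports):
        """Learn from (report_text, developer) pairs taken from past reports."""
        for text, developer in resolved_reports:
            tokens = tokenize(text)
            self.word_counts[developer].update(tokens)
            self.report_counts[developer] += 1
            self.vocabulary.update(tokens)

    def suggest(self, text, k=3):
        """Return up to k developers, ranked by log-posterior score."""
        tokens = tokenize(text)
        total_reports = sum(self.report_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for developer, n_reports in self.report_counts.items():
            counts = self.word_counts[developer]
            total_words = sum(counts.values())
            # Log prior: share of past reports this developer resolved.
            score = math.log(n_reports / total_reports)
            # Log likelihood of each token, smoothed so unseen words
            # do not drive the probability to zero.
            for token in tokens:
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
            scores[developer] = score
        return sorted(scores, key=scores.get, reverse=True)[:k]


def precision(suggested, acceptable_resolvers):
    """Fraction of suggested developers who are acceptable resolvers."""
    if not suggested:
        return 0.0
    hits = sum(1 for dev in suggested if dev in acceptable_resolvers)
    return hits / len(suggested)


# Hypothetical toy history of resolved reports (not data from the paper).
history = [
    ("NullPointerException in Java editor on code completion", "alice"),
    ("Editor crashes during code completion in Java file", "alice"),
    ("Toolbar icons render blurry on high DPI display", "bob"),
    ("Rendering glitch in toolbar after window resize", "bob"),
    ("CVS checkout fails with timeout on large repository", "carol"),
    ("Repository sync timeout during checkout", "carol"),
]

recommender = TriageRecommender()
recommender.train(history)
top = recommender.suggest("Crash in editor when invoking code completion", k=2)
print(top, precision(top, {"alice"}))
```

In a semi-automated workflow of this kind, the ranked list is only a suggestion: a human triager still makes the final assignment, which is why returning a small number of candidates rather than a single developer is a reasonable design choice.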