skip to main content
10.1145/1321631.1321660acmconferencesArticle/Chapter ViewAbstractPublication PagesaseConference Proceedingsconference-collections
research-article

Context-aware statistical debugging: from bug predictors to faulty control flow paths

Published: 05 November 2007 Publication History

Abstract

Effective bug localization is important for realizing automated debugging. One attractive approach is to apply statistical techniques on a collection of evaluation profiles of program properties to help localize bugs. Previous research has proposed various specialized techniques to isolate certain program predicates as bug predictors. However, because many bugs may not be directly associated with these predicates, these techniques are often ineffective in localizing bugs. Relevant control flow paths that may contain bug locations are more informative than stand-alone predicates for discovering and understanding bugs. In this paper, we propose an approach to automatically generate such faulty control flow paths that link many bug predictors together for revealing bugs. Our approach combines feature selection (to accurately select failure-related predicates as bug predictors), clustering (to group correlated predicates), and control flow graph traversal in a novel way to help generate the paths. We have evaluated our approach on code including the Siemens test suite and rhythmbox (a large music management application for GNOME). Our experiments show that the faulty control flow paths are accurate, useful for localizing many bugs, and helped to discover previously unknown errors in rhythmbox

References

[1]
H. Agrawal, J. R. Horgan, S. London, and W. E. Wong. Fault localization using execution slices and dataflow tests. In International Symposium on Software Reliability Engineering (ISSRE), pages 143--151, 1995.
[2]
P. Arumuga Nainar, T. Chen, J. Rosin, and B. Liblit. Statistical debugging using compound boolean predicates. In International Symposium on Software Testing and Analysis (ISSTA), pages 5--15, 2007.
[3]
T. Ball, M. Naik, and S. K. Rajamani. From symptom to cause: localizing errors in counterexample traces. In Symposium on Principles of Programming Languages (POPL), pages 97--105, 2003.
[4]
P. Berkhin. Survey of clustering data mining techniques. Technical report, Accrue Software, San Jose, CA, 2002.
[5]
J. F. Bowring, M. J. Harrold, and J. M. Rehg. Improving the classification of software behaviors using ensembles. Technical report, Georgia Institute of Technology, 2005.
[6]
J. F. Bowring, J. M. Rehg, and M. J. Harrold. Active learning for automatic classification of software behavior. In International Symposium on Software Testing and Analysis (ISSTA), pages 195--205, 2004.
[7]
L. Breiman. Random forests. Machine Learning, 45(1):5--32, Oct. 2001.
[8]
Y. Brun and M. D. Ernst. Finding latent code errors via machine learning over program executions. In ICSE, pages 480--490, 2004.
[9]
C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121--167, 1998.
[10]
C. Cadar, V. Ganesh, P. M. Pawlowski, D. L. Dill, and D. R. Engler. EXE: Automatically generating inputs of death. In Conference on Computer and Communications Security (CCS), 2006.
[11]
C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines. http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[12]
L. A. Clarke, A. Podgurski, D. J. Richardson, and S. J. Zeil. A formal evaluation of data flow path selection criteria. Trans. on Software Engineering (TSE), 15(11):1318--1332, 1989.
[13]
H. Cleve and A. Zeller. Locating causes of program failures. In ICSE, pages 342--351, 2005.
[14]
W. Dickinson, D. Leon, and A. Podgurski. Finding failures by cluster analysis of execution profiles. In ICSE, pages 339--348, 2001.
[15]
N. Dodoo, L. Lin, and M. D. Ernst. Selecting, refining, and evaluating predicates for program analysis. Technical Report MIT-LCS-TR-914, MIT Laboratory for Computer Science, Cambridge, MA, July 21, 2003.
[16]
M. D. Ernst, A. Czeisler,W. G. Griswold, and D. Notkin. Quickly detecting relevant program invariants. In ICSE, pages 449--458, 2000.
[17]
P. Godefroid, N. Klarlund, and K. Sen. DART: Directed automated random testing. In Conference on Programming Language Design and Implementation (PLDI), pages 213--223, 2005.
[18]
A. Gotlieb, B. Botella, and M. Rueher. Automatic test data generation using constraint solving techniques. In International Symposium on Software Testing and Analysis (ISSTA), pages 53--62, 1998.
[19]
T. L. Graves, M. J. Harrold, J.-M. Kim, A. Porter, and G. Rothermel. An empirical study of regression test selection techniques. Trans. on Software Engineering and Methodology (TOSEM), 10(2):184--208, 2001.
[20]
A. Groce and R. Joshi. Exploiting traces in program analysis. In Tools and Algorithms for Construction and Analysis of Systems (TACAS), volume 3920 of Lecture Notes in Computer Science (LNCS), pages 379--393. Springer, 2006.
[21]
N. Gupta, H. He, X. Zhang, and R. Gupta. Locating faulty code using failure-inducing chops. In ASE, pages 263--272, 2005.
[22]
T. Gyim2othy, 2A. Besz2edes, and I. Forg2acs. An efficient relevant slicing method for debugging. In joint meeting of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), pages 303--321, 1999.
[23]
M. Haran, A. F. Karr, A. Orso, A. A. Porter, and A. P. Sanil. Applying classification techniques to remotely-collected program execution data. In joint meeting of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), pages 146--155, 2005.
[24]
M. J. Harrold and G. Rothermel. Aristotle Analysis System - Siemens Programs, HR Variants. http://www.cc.gatech.edu/aristotle/Tools/subjects/.
[25]
M. Heiler, D. Cremers, and C. Schnörr. Efficient feature subset selection for support vector machines. Technical Report 21/2001, Computer Science Series, University of Mannheim.
[26]
M. Hutchins, H. Foster, T. Goradia, and T. Ostrand. Experiments of the effectiveness of dataflow-and controlflow-based test adequacy criteria. In ICSE, pages 191--200, 1994.
[27]
R. Jhala and R. Majumdar. Path slicing. In Conference on Programming Language Design and Implementation (PLDI), pages 38--47, 2005.
[28]
L. Jiang and Z. Su. Automatic isolation of cause-effect chains with machine learning. Technical Report CSE-2005-32, UC Davis, 2005.
[29]
J. A. Jones, J. F. Bowring, and M. J. Harrold. Debugging in parallel. In International Symposium on Software Testing and Analysis (ISSTA), pages 16--26, 2007.
[30]
J. A. Jones and M. J. Harrold. Empirical evaluation of the tarantula automatic fault-localization technique. In ASE, pages 273--282, 2005.
[31]
J. A. Jones, M. J. Harrold, and J. T. Stasko. Visualization of test information to assist fault localization. In ICSE, pages 467--477, 2002.
[32]
A. Lal, J. Lim, M. Polishchuk, and B. Liblit. Path optimization in programs and its application to debugging. In European Symposium on Programming (ESOP), volume 3924 of Lecture Notes in Computer Science (LNCS), pages 246--263. SpringerESE, 2006.
[33]
D. Leon, W. Masri, and A. Podgurski. An empirical evaluation of test case filtering techniques based on exercising complex information flows. In ICSE, pages 412--421, 2005.
[34]
B. Liblit. Bug 137460: dangling timeout event source ID causes crashes. http://bugzilla.gnome.org/show_bug.cgi?id=137460.
[35]
B. Liblit. Bug 137834: dangling RBShellPlayer callbacks cause crashes. http://bugzilla.gnome.org/show_bug.cgi?id=137834.
[36]
B. Liblit. The Cooperative Bug Isolation Project. http://www.cs.wisc.edu/cbi/.
[37]
B. Liblit and A. Aiken. Building a better backtrace: Techniques for postmortem program analysis. Technical Report CSD-02-1203, UC Berkeley, 2002.
[38]
B. Liblit, A. Aiken, A. X. Zheng, and M. I. Jordan. Bug isolation via remote program sampling. In Conference on Programming Language Design and Implementation (PLDI), pages 141--154, 2003.
[39]
B. Liblit, M. Naik, A. X. Zheng, A. Aiken, and M. I. Jordan. Scalable statistical bug isolation. In Conference on Programming Language Design and Implementation (PLDI), pages 15--26, 2005.
[40]
C. Liu and J. Han. Failure proximity: A fault localization-based approach. In Symposium on Foundations of Software Engineering (FSE), pages 46--56, 2006.
[41]
C. Liu, X. Yan, L. Fei, J. Han, and S. P. Midkiff. SOBER: statistical model-based bug localization. In joint meeting of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), pages 286--295, 2005.
[42]
R. Manevich, M. Sridharan, S. Adams, M. Das, and Z. Yang. PSE: Explaining program failures via postmortem static analysis. In Symposium on Foundations of Software Engineering (FSE), pages 46--56, 2004.
[43]
T. Mitchell. Machine Learning. McGraw-Hill, 1997.
[44]
G. C. Necula, J. Condit, M. Harren, S. McPeak, and W. Weimer. CCured: type-safe retrofitting of legacy software. Trans. on Programming Languages and Systems (TOPLAS), 27(3):477--526, 2005.
[45]
A. Orso, J. A. Jones, and M. J. Harrold. Visualization of program-execution data for deployed software. In Symposium on Software Visualization (SOFTVIS), pages 67--76, 2003.
[46]
H. Pan and E. H. Spafford. Heuristics for automatic localization of software faults. Technical Report SERC-TR-116-P, Purdue University, 1992.
[47]
S.-T. Pan, K. So, and J. T. Rahmeh. Improving the accuracy of dynamic branch prediction using branch correlation. In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 76--84, 1992.
[48]
A. Podgurski, D. Leon, P. Francis, W. Masri, M. Minch, J. Sun, and B. Wang. Automated support for classifying software failure reports. In ICSE, pages 465--477, 2003.
[49]
M. Renieris and S. P. Reiss. Fault localization with nearest neighbor queries. In ASE, 2003.
[50]
T. W. Reps, T. Ball, M. Das, and J. R. Larus. The use of program profiling for software maintenance with applications to the year 2000 problem. In joint meeting of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), pages 432--449, 1997.
[51]
Rhythmbox. Music management application for GNOME. http://www.rhythmbox.org.
[52]
Salford Systems Inc. RandomForestsTM. http://www.salford-systems.com/.
[53]
K. Sen, D. Marinov, and G. Agha. CUTE: A concolic unit testing engine for C. In joint meeting of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), pages 263--272. ACM, 2005.
[54]
W. E. Wong, S. S. Gokhale, and J. R. Horgan. Measuring distance between program features. In International Conference on Computer Software and Applications (COMPSAC), pages 307--312, 2002.
[55]
C. Young and M. D. Smith. Static correlated branch prediction. Trans. on Programming Languages and Systems (TOPLAS), 21(5):1028--1075, 1999.
[56]
A. Zeller. Isolating cause-effect chains from computer programs. In Symposium on Foundations of Software Engineering (FSE), pages 1--10, 2002.
[57]
A. Zeller and R. Hildebrandt. Simplifying and isolating failure-inducing input. Trans. on Software Engineering (TSE), 28(2):183--200, 2002.
[58]
X. Zhang, H. He, N. Gupta, and R. Gupta. Experimental evaluation of using dynamic slices for fault location. In Automated and Algorithmic Debugging (AADEBUG), pages 33--42, 2005.
[59]
A. X. Zheng, M. I. Jordan, B. Liblit, and A. Aiken. Statistical debugging of sampled programs. In Neural Information Processing Systems (NIPS). 2003.

Cited By

View all
  • (2024)Exploring the Software Quality Maze: Detecting Scattered and Tangled Crosscutting Quality Concerns in Source Code in Support of Maintenance Tasksundefined10.12794/metadc2332577Online publication date: May-2024
  • (2024)A Platform-Agnostic Framework for Automatically Identifying Performance Issue Reports with Heuristic Linguistic PatternsIEEE Transactions on Software Engineering10.1109/TSE.2024.3390623(1-22)Online publication date: 2024
  • (2023)Variable-based Fault Localization via Enhanced Decision TreeACM Transactions on Software Engineering and Methodology10.1145/362474133:2(1-32)Online publication date: 18-Sep-2023
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
ASE '07: Proceedings of the 22nd IEEE/ACM International Conference on Automated Software Engineering
November 2007
590 pages
ISBN:9781595938824
DOI:10.1145/1321631
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 November 2007

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. bug localization
  2. control flow analysis
  3. machine learning
  4. statistical debugging

Qualifiers

  • Research-article

Conference

ASE07

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)15
  • Downloads (Last 6 weeks)1
Reflects downloads up to 13 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Exploring the Software Quality Maze: Detecting Scattered and Tangled Crosscutting Quality Concerns in Source Code in Support of Maintenance Tasksundefined10.12794/metadc2332577Online publication date: May-2024
  • (2024)A Platform-Agnostic Framework for Automatically Identifying Performance Issue Reports with Heuristic Linguistic PatternsIEEE Transactions on Software Engineering10.1109/TSE.2024.3390623(1-22)Online publication date: 2024
  • (2023)Variable-based Fault Localization via Enhanced Decision TreeACM Transactions on Software Engineering and Methodology10.1145/362474133:2(1-32)Online publication date: 18-Sep-2023
  • (2023)Automatically Localizing Dynamic Code Generation Bugs in JIT Compiler Back-EndProceedings of the 32nd ACM SIGPLAN International Conference on Compiler Construction10.1145/3578360.3580260(145-155)Online publication date: 17-Feb-2023
  • (2023)Toward More Efficient Statistical Debugging with Abstraction RefinementACM Transactions on Software Engineering and Methodology10.1145/354479032:2(1-38)Online publication date: 30-Mar-2023
  • (2023)A Hierarchical Topical Modeling Approach for Recommending Repair of Quality Bugs2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)10.1109/SANER56733.2023.00014(37-48)Online publication date: Mar-2023
  • (2023)Towards Context-Aware Spectrum-Based Fault Localization2023 IEEE Conference on Software Testing, Verification and Validation (ICST)10.1109/ICST57152.2023.00060(496-498)Online publication date: Apr-2023
  • (2023)Data Mining‐Based Techniques for Software Fault LocalizationHandbook of Software Fault Localization10.1002/9781119880929.ch7(321-364)Online publication date: 21-Apr-2023
  • (2023)Spectrum‐Based Techniques for Software Fault LocalizationHandbook of Software Fault Localization10.1002/9781119880929.ch4(201-270)Online publication date: 21-Apr-2023
  • (2022)Modeling code manipulation in JIT compilersProceedings of the 11th ACM SIGPLAN International Workshop on the State Of the Art in Program Analysis10.1145/3520313.3534656(9-15)Online publication date: 14-Jun-2022
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media