research-article

Context-aware statistical debugging: from bug predictors to faulty control flow paths

Authors:

Lingxiao Jiang,

Zhendong SuAuthors Info & Claims

ASE '07: Proceedings of the 22nd IEEE/ACM International Conference on Automated Software Engineering

Pages 184 - 193

https://doi.org/10.1145/1321631.1321660

Published: 05 November 2007 Publication History

Abstract

Effective bug localization is important for realizing automated debugging. One attractive approach is to apply statistical techniques on a collection of evaluation profiles of program properties to help localize bugs. Previous research has proposed various specialized techniques to isolate certain program predicates as bug predictors. However, because many bugs may not be directly associated with these predicates, these techniques are often ineffective in localizing bugs. Relevant control flow paths that may contain bug locations are more informative than stand-alone predicates for discovering and understanding bugs. In this paper, we propose an approach to automatically generate such faulty control flow paths that link many bug predictors together for revealing bugs. Our approach combines feature selection (to accurately select failure-related predicates as bug predictors), clustering (to group correlated predicates), and control flow graph traversal in a novel way to help generate the paths. We have evaluated our approach on code including the Siemens test suite and rhythmbox (a large music management application for GNOME). Our experiments show that the faulty control flow paths are accurate, useful for localizing many bugs, and helped to discover previously unknown errors in rhythmbox

References

[1]

H. Agrawal, J. R. Horgan, S. London, and W. E. Wong. Fault localization using execution slices and dataflow tests. In International Symposium on Software Reliability Engineering (ISSRE), pages 143--151, 1995.

[2]

P. Arumuga Nainar, T. Chen, J. Rosin, and B. Liblit. Statistical debugging using compound boolean predicates. In International Symposium on Software Testing and Analysis (ISSTA), pages 5--15, 2007.

Digital Library

[3]

T. Ball, M. Naik, and S. K. Rajamani. From symptom to cause: localizing errors in counterexample traces. In Symposium on Principles of Programming Languages (POPL), pages 97--105, 2003.

Digital Library

[4]

P. Berkhin. Survey of clustering data mining techniques. Technical report, Accrue Software, San Jose, CA, 2002.

[5]

J. F. Bowring, M. J. Harrold, and J. M. Rehg. Improving the classification of software behaviors using ensembles. Technical report, Georgia Institute of Technology, 2005.

[6]

J. F. Bowring, J. M. Rehg, and M. J. Harrold. Active learning for automatic classification of software behavior. In International Symposium on Software Testing and Analysis (ISSTA), pages 195--205, 2004.

Digital Library

[7]

L. Breiman. Random forests. Machine Learning, 45(1):5--32, Oct. 2001.

Digital Library

[8]

Y. Brun and M. D. Ernst. Finding latent code errors via machine learning over program executions. In ICSE, pages 480--490, 2004.

Digital Library

[9]

C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121--167, 1998.

Digital Library

[10]

C. Cadar, V. Ganesh, P. M. Pawlowski, D. L. Dill, and D. R. Engler. EXE: Automatically generating inputs of death. In Conference on Computer and Communications Security (CCS), 2006.

Digital Library

[11]

C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines. http://www.csie.ntu.edu.tw/~cjlin/libsvm.

Digital Library

[12]

L. A. Clarke, A. Podgurski, D. J. Richardson, and S. J. Zeil. A formal evaluation of data flow path selection criteria. Trans. on Software Engineering (TSE), 15(11):1318--1332, 1989.

Digital Library

[13]

H. Cleve and A. Zeller. Locating causes of program failures. In ICSE, pages 342--351, 2005.

Digital Library

[14]

W. Dickinson, D. Leon, and A. Podgurski. Finding failures by cluster analysis of execution profiles. In ICSE, pages 339--348, 2001.

Digital Library

[15]

N. Dodoo, L. Lin, and M. D. Ernst. Selecting, refining, and evaluating predicates for program analysis. Technical Report MIT-LCS-TR-914, MIT Laboratory for Computer Science, Cambridge, MA, July 21, 2003.

[16]

M. D. Ernst, A. Czeisler,W. G. Griswold, and D. Notkin. Quickly detecting relevant program invariants. In ICSE, pages 449--458, 2000.

Digital Library

[17]

P. Godefroid, N. Klarlund, and K. Sen. DART: Directed automated random testing. In Conference on Programming Language Design and Implementation (PLDI), pages 213--223, 2005.

Digital Library

[18]

A. Gotlieb, B. Botella, and M. Rueher. Automatic test data generation using constraint solving techniques. In International Symposium on Software Testing and Analysis (ISSTA), pages 53--62, 1998.

Digital Library

[19]

T. L. Graves, M. J. Harrold, J.-M. Kim, A. Porter, and G. Rothermel. An empirical study of regression test selection techniques. Trans. on Software Engineering and Methodology (TOSEM), 10(2):184--208, 2001.

Digital Library

[20]

A. Groce and R. Joshi. Exploiting traces in program analysis. In Tools and Algorithms for Construction and Analysis of Systems (TACAS), volume 3920 of Lecture Notes in Computer Science (LNCS), pages 379--393. Springer, 2006.

Digital Library

[21]

N. Gupta, H. He, X. Zhang, and R. Gupta. Locating faulty code using failure-inducing chops. In ASE, pages 263--272, 2005.

Digital Library

[22]

T. Gyim2othy, 2A. Besz2edes, and I. Forg2acs. An efficient relevant slicing method for debugging. In joint meeting of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), pages 303--321, 1999.

Digital Library

[23]

M. Haran, A. F. Karr, A. Orso, A. A. Porter, and A. P. Sanil. Applying classification techniques to remotely-collected program execution data. In joint meeting of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), pages 146--155, 2005.

Digital Library

[24]

M. J. Harrold and G. Rothermel. Aristotle Analysis System - Siemens Programs, HR Variants. http://www.cc.gatech.edu/aristotle/Tools/subjects/.

[25]

M. Heiler, D. Cremers, and C. Schnörr. Efficient feature subset selection for support vector machines. Technical Report 21/2001, Computer Science Series, University of Mannheim.

[26]

M. Hutchins, H. Foster, T. Goradia, and T. Ostrand. Experiments of the effectiveness of dataflow-and controlflow-based test adequacy criteria. In ICSE, pages 191--200, 1994.

Digital Library

[27]

R. Jhala and R. Majumdar. Path slicing. In Conference on Programming Language Design and Implementation (PLDI), pages 38--47, 2005.

Digital Library

[28]

L. Jiang and Z. Su. Automatic isolation of cause-effect chains with machine learning. Technical Report CSE-2005-32, UC Davis, 2005.

[29]

J. A. Jones, J. F. Bowring, and M. J. Harrold. Debugging in parallel. In International Symposium on Software Testing and Analysis (ISSTA), pages 16--26, 2007.

Digital Library

[30]

J. A. Jones and M. J. Harrold. Empirical evaluation of the tarantula automatic fault-localization technique. In ASE, pages 273--282, 2005.

Digital Library

[31]

J. A. Jones, M. J. Harrold, and J. T. Stasko. Visualization of test information to assist fault localization. In ICSE, pages 467--477, 2002.

Digital Library

[32]

A. Lal, J. Lim, M. Polishchuk, and B. Liblit. Path optimization in programs and its application to debugging. In European Symposium on Programming (ESOP), volume 3924 of Lecture Notes in Computer Science (LNCS), pages 246--263. SpringerESE, 2006.

Digital Library

[33]

D. Leon, W. Masri, and A. Podgurski. An empirical evaluation of test case filtering techniques based on exercising complex information flows. In ICSE, pages 412--421, 2005.

Digital Library

[34]

B. Liblit. Bug 137460: dangling timeout event source ID causes crashes. http://bugzilla.gnome.org/show_bug.cgi?id=137460.

[35]

B. Liblit. Bug 137834: dangling RBShellPlayer callbacks cause crashes. http://bugzilla.gnome.org/show_bug.cgi?id=137834.

[36]

B. Liblit. The Cooperative Bug Isolation Project. http://www.cs.wisc.edu/cbi/.

[37]

B. Liblit and A. Aiken. Building a better backtrace: Techniques for postmortem program analysis. Technical Report CSD-02-1203, UC Berkeley, 2002.

[38]

B. Liblit, A. Aiken, A. X. Zheng, and M. I. Jordan. Bug isolation via remote program sampling. In Conference on Programming Language Design and Implementation (PLDI), pages 141--154, 2003.

Digital Library

[39]

B. Liblit, M. Naik, A. X. Zheng, A. Aiken, and M. I. Jordan. Scalable statistical bug isolation. In Conference on Programming Language Design and Implementation (PLDI), pages 15--26, 2005.

Digital Library

[40]

C. Liu and J. Han. Failure proximity: A fault localization-based approach. In Symposium on Foundations of Software Engineering (FSE), pages 46--56, 2006.

Digital Library

[41]

C. Liu, X. Yan, L. Fei, J. Han, and S. P. Midkiff. SOBER: statistical model-based bug localization. In joint meeting of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), pages 286--295, 2005.

Digital Library

[42]

R. Manevich, M. Sridharan, S. Adams, M. Das, and Z. Yang. PSE: Explaining program failures via postmortem static analysis. In Symposium on Foundations of Software Engineering (FSE), pages 46--56, 2004.

Digital Library

[43]

T. Mitchell. Machine Learning. McGraw-Hill, 1997.

Digital Library

[44]

G. C. Necula, J. Condit, M. Harren, S. McPeak, and W. Weimer. CCured: type-safe retrofitting of legacy software. Trans. on Programming Languages and Systems (TOPLAS), 27(3):477--526, 2005.

Digital Library

[45]

A. Orso, J. A. Jones, and M. J. Harrold. Visualization of program-execution data for deployed software. In Symposium on Software Visualization (SOFTVIS), pages 67--76, 2003.

Digital Library

[46]

H. Pan and E. H. Spafford. Heuristics for automatic localization of software faults. Technical Report SERC-TR-116-P, Purdue University, 1992.

[47]

S.-T. Pan, K. So, and J. T. Rahmeh. Improving the accuracy of dynamic branch prediction using branch correlation. In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 76--84, 1992.

Digital Library

[48]

A. Podgurski, D. Leon, P. Francis, W. Masri, M. Minch, J. Sun, and B. Wang. Automated support for classifying software failure reports. In ICSE, pages 465--477, 2003.

Digital Library

[49]

M. Renieris and S. P. Reiss. Fault localization with nearest neighbor queries. In ASE, 2003.

[50]

T. W. Reps, T. Ball, M. Das, and J. R. Larus. The use of program profiling for software maintenance with applications to the year 2000 problem. In joint meeting of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), pages 432--449, 1997.

Digital Library

[51]

Rhythmbox. Music management application for GNOME. http://www.rhythmbox.org.

[52]

Salford Systems Inc. RandomForestsTM. http://www.salford-systems.com/.

[53]

K. Sen, D. Marinov, and G. Agha. CUTE: A concolic unit testing engine for C. In joint meeting of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), pages 263--272. ACM, 2005.

Digital Library

[54]

W. E. Wong, S. S. Gokhale, and J. R. Horgan. Measuring distance between program features. In International Conference on Computer Software and Applications (COMPSAC), pages 307--312, 2002.

Digital Library

[55]

C. Young and M. D. Smith. Static correlated branch prediction. Trans. on Programming Languages and Systems (TOPLAS), 21(5):1028--1075, 1999.

Digital Library

[56]

A. Zeller. Isolating cause-effect chains from computer programs. In Symposium on Foundations of Software Engineering (FSE), pages 1--10, 2002.

Digital Library

[57]

A. Zeller and R. Hildebrandt. Simplifying and isolating failure-inducing input. Trans. on Software Engineering (TSE), 28(2):183--200, 2002.

Digital Library

[58]

X. Zhang, H. He, N. Gupta, and R. Gupta. Experimental evaluation of using dynamic slices for fault location. In Automated and Algorithmic Debugging (AADEBUG), pages 33--42, 2005.

Digital Library

[59]

A. X. Zheng, M. I. Jordan, B. Liblit, and A. Aiken. Statistical debugging of sampled programs. In Neural Information Processing Systems (NIPS). 2003.

Cited By

Krasniqi R(2024)Exploring the Software Quality Maze: Detecting Scattered and Tangled Crosscutting Quality Concerns in Source Code in Support of Maintenance Tasksundefined10.12794/metadc2332577Online publication date: May-2024
https://doi.org/10.12794/metadc2332577
Zhao YXiao LWong S(2024)A Platform-Agnostic Framework for Automatically Identifying Performance Issue Reports with Heuristic Linguistic PatternsIEEE Transactions on Software Engineering10.1109/TSE.2024.3390623(1-22)Online publication date: 2024
https://doi.org/10.1109/TSE.2024.3390623
Jiang JWang YChen JLv DLiu M(2023)Variable-based Fault Localization via Enhanced Decision TreeACM Transactions on Software Engineering and Methodology10.1145/362474133:2(1-32)Online publication date: 18-Sep-2023
https://dl.acm.org/doi/10.1145/3624741
Show More Cited By

Index Terms

Context-aware statistical debugging: from bug predictors to faulty control flow paths

Recommendations

Scalable statistical bug isolation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation

We present a statistical debugging algorithm that isolates bugs in programs containing multiple undiagnosed bugs. Earlier statistical algorithms that focus solely on identifying predictors that correlate with program failure perform poorly when there ...
Bug localization via searching crowd-contributed code
Internetware '14: Proceedings of the 6th Asia-Pacific Symposium on Internetware

Bug localization, i.e., locating bugs in code snippets, is a frequent task in software development. Although static bug-finding tools are available to reduce manual effort in bug localization, these tools typically detect bugs with known project-...
Toward More Efficient Statistical Debugging with Abstraction Refinement
Debugging is known to be a notoriously painstaking and time-consuming task. As one major family of automated debugging, statistical debugging approaches have been well investigated over the past decade, which collect failing and passing executions and ...

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences

ASE '07: Proceedings of the 22nd IEEE/ACM International Conference on Automated Software Engineering

November 2007

590 pages

ISBN:9781595938824

DOI:10.1145/1321631

General Chair:
Kurt Stirewalt
Michigan State University, USA
,
Program Chairs:
Alexander Egyed
Teknowledge Corporation, USA
,
Bernd Fischer
University of Southampton, UK

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 November 2007

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

ASE07

Sponsor:

ASE07: International Conference on Automated Software Engineering 2007

November 5 - 9, 2007

Georgia, Atlanta, USA

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

90
Total Citations
View Citations
769
Total Downloads

Downloads (Last 12 months)15
Downloads (Last 6 weeks)1

Reflects downloads up to 13 Feb 2025

Other Metrics

View Author Metrics

Citations

Cited By

Krasniqi R(2024)Exploring the Software Quality Maze: Detecting Scattered and Tangled Crosscutting Quality Concerns in Source Code in Support of Maintenance Tasksundefined10.12794/metadc2332577Online publication date: May-2024
https://doi.org/10.12794/metadc2332577
Zhao YXiao LWong S(2024)A Platform-Agnostic Framework for Automatically Identifying Performance Issue Reports with Heuristic Linguistic PatternsIEEE Transactions on Software Engineering10.1109/TSE.2024.3390623(1-22)Online publication date: 2024
https://doi.org/10.1109/TSE.2024.3390623
Jiang JWang YChen JLv DLiu M(2023)Variable-based Fault Localization via Enhanced Decision TreeACM Transactions on Software Engineering and Methodology10.1145/362474133:2(1-32)Online publication date: 18-Sep-2023
https://dl.acm.org/doi/10.1145/3624741
Lim HDebray SVerbrugge CLhoták OShen X(2023)Automatically Localizing Dynamic Code Generation Bugs in JIT Compiler Back-EndProceedings of the 32nd ACM SIGPLAN International Conference on Compiler Construction10.1145/3578360.3580260(145-155)Online publication date: 17-Feb-2023
https://dl.acm.org/doi/10.1145/3578360.3580260
Zuo ZNiu XZhang SFang LKhoo SLu SSun CXu G(2023)Toward More Efficient Statistical Debugging with Abstraction RefinementACM Transactions on Software Engineering and Methodology10.1145/354479032:2(1-38)Online publication date: 30-Mar-2023
https://dl.acm.org/doi/10.1145/3544790
Krasniqi RDo H(2023)A Hierarchical Topical Modeling Approach for Recommending Repair of Quality Bugs2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)10.1109/SANER56733.2023.00014(37-48)Online publication date: Mar-2023
https://doi.org/10.1109/SANER56733.2023.00014
Szatmári A(2023)Towards Context-Aware Spectrum-Based Fault Localization2023 IEEE Conference on Software Testing, Verification and Validation (ICST)10.1109/ICST57152.2023.00060(496-498)Online publication date: Apr-2023
https://doi.org/10.1109/ICST57152.2023.00060
Cellier PDucassé MFerré SRidoux OWong W(2023)Data Mining‐Based Techniques for Software Fault LocalizationHandbook of Software Fault Localization10.1002/9781119880929.ch7(321-364)Online publication date: 21-Apr-2023
https://doi.org/10.1002/9781119880929.ch7
Wong WLee HGao RNaish L(2023)Spectrum‐Based Techniques for Software Fault LocalizationHandbook of Software Fault Localization10.1002/9781119880929.ch4(201-270)Online publication date: 21-Apr-2023
https://doi.org/10.1002/9781119880929.ch4
Lim HKang XDebray SGonnord LTitolo L(2022)Modeling code manipulation in JIT compilersProceedings of the 11th ACM SIGPLAN International Workshop on the State Of the Art in Program Analysis10.1145/3520313.3534656(9-15)Online publication date: 14-Jun-2022
https://dl.acm.org/doi/10.1145/3520313.3534656
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Table of Conten