ACM Home Page
Please provide us with feedback. Feedback
PR-Miner: automatically extracting implicit programming rules and detecting violations in large software code
Full text PdfPdf (229 KB)
Source Foundations of Software Engineering archive
Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on Foundations of software engineering table of contents
Lisbon, Portugal
SESSION: Bug localization table of contents
Pages: 306 - 315  
Year of Publication: 2005
ISBN:1-59593-014-0
Also published in ...
Authors
Zhenmin Li  University of Illinois at Urbana-Champaign, Urbana, IL
Yuanyuan Zhou  University of Illinois at Urbana-Champaign, Urbana, IL
Sponsors
SIGSOFT: ACM Special Interest Group on Software Engineering
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 24,   Downloads (12 Months): 104,   Citation Count: 26
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
Save this Article to a Binder    Display Formats: BibTex  EndNote ACM Ref   
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1081706.1081755
What is a DOI?

ABSTRACT

Programs usually follow many implicit programming rules, most of which are too tedious to be documented by programmers. When these rules are violated by programmers who are unaware of or forget about them, defects can be easily introduced. Therefore, it is highly desirable to have tools to automatically extract such rules and also to automatically detect violations. Previous work in this direction focuses on simple function-pair based programming rules and additionally requires programmers to provide rule templates.This paper proposes a general method called PR-Miner that uses a data mining technique called frequent itemset mining to efficiently extract implicit programming rules from large software code written in an industrial programming language such as C, requiring little effort from programmers and no prior knowledge of the software. Benefiting from frequent itemset mining, PR-Miner can extract programming rules in general forms (without being constrained by any fixed rule templates) that can contain multiple program elements of various types such as functions, variables and data types. In addition, we also propose an efficient algorithm to automatically detect violations to the extracted programming rules, which are strong indications of bugs.Our evaluation with large software code, including Linux, PostgreSQL Server and the Apache HTTP Server, with 84K--3M lines of code each, shows that PR-Miner can efficiently extract thousands of general programming rules and detect violations within 2 minutes. Moreover, PR-Miner has detected many violations to the extracted rules. Among the top 60 violations reported by PR-Miner, 16 have been confirmed as bugs in the latest version of Linux, 6 in PostgreSQL and 1 in Apache. Most of them violate complex programming rules that contain more than 2 elements and are thereby difficult for previous tools to detect. We reported these bugs and they are currently being fixed by developers.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
3
 
4
 
5
6
7
8
 
9
10
 
11
12
 
13
A. Garrido and R. Johnson. Refactoring C with conditional compilation. In 18th IEEE Int. Conf. on Automated Software Engineering, 2003.
 
14
G. Grahne and J. Zhu. Efficiently using prefix-trees in mining frequent itemsets. In Proc. of the 1st IEEE ICDM Workshop on Frequent Itemset Mining Implementations, 2003.
15
16
17
 
18
Z. Li, S. Lu, S. Myagmar, and Y. Zhou. CP-Miner: A tool for finding copy-paste and related bugs in operating system code. In Sixth Symp. on Operating Systems Design and Implementation, 2004.
19
20
 
21
M. Ohira, R. Yokomoriz, M. Sakai, K. ichi Matsumotoy, K. Inouez, and K. Torii. Empirical project monitor: A tool for mining multiple project data. In Proc. of Int. Workshop on Mining Software Repositories, 2004.
 
22
T. J. Ostrand and E. J. Weyuker. A tool for mining defect-tracking systems to predict fault-prone files. In Proc. of Int. Workshop on Mining Software Repositories, 2004.
23
 
24
F. V. Rysselberghe and S. Demeyer. Mining version control systems for FACs (frequently applied changes). In Proc. of Int. Workshop on Mining Software Repositories, 2004.
 
25
R. M. Stallman and the GCC Developer Community. GNU compiler collection internals (GCC). available at http://gcc.gnu.org/onlinedocs/gccint.ps.gz, 2005.
 
26
27
 
28
T. Xie and D. Notkin. Tool-assisted unit test selection based on operational violations. In 18th IEEE Int. Conf. on Automated Software Engineering, 2003.
 
29
A. T. Ying, G. C. Murphy, R. Ng, and M. C. Chu-Carroll. Predicting source code changes by mining revision history. In Proc. of Int. Workshop on Mining Software Repositories, 2004.
 
30
Y. Yusof and O. F. Rana. Template mining in source-code digital libraries. In Proc. of Int. Workshop on Mining Software Repositories, 2004.
 
31

CITED BY  26
 
 
 
 
 
 

Collaborative Colleagues:
Zhenmin Li: colleagues
Yuanyuan Zhou: colleagues