|
ABSTRACT
Programs usually follow many implicit programming rules, most of which are too tedious to be documented by programmers. When these rules are violated by programmers who are unaware of or forget about them, defects can be easily introduced. Therefore, it is highly desirable to have tools to automatically extract such rules and also to automatically detect violations. Previous work in this direction focuses on simple function-pair based programming rules and additionally requires programmers to provide rule templates.This paper proposes a general method called PR-Miner that uses a data mining technique called frequent itemset mining to efficiently extract implicit programming rules from large software code written in an industrial programming language such as C, requiring little effort from programmers and no prior knowledge of the software. Benefiting from frequent itemset mining, PR-Miner can extract programming rules in general forms (without being constrained by any fixed rule templates) that can contain multiple program elements of various types such as functions, variables and data types. In addition, we also propose an efficient algorithm to automatically detect violations to the extracted programming rules, which are strong indications of bugs.Our evaluation with large software code, including Linux, PostgreSQL Server and the Apache HTTP Server, with 84K--3M lines of code each, shows that PR-Miner can efficiently extract thousands of general programming rules and detect violations within 2 minutes. Moreover, PR-Miner has detected many violations to the extracted rules. Among the top 60 violations reported by PR-Miner, 16 have been confirmed as bugs in the latest version of Linux, 6 in PostgreSQL and 1 in Apache. Most of them violate complex programming rules that contain more than 2 elements and are thereby difficult for previous tools to detect. We reported these bugs and they are currently being fixed by developers.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
Alfred V. Aho , Ravi Sethi , Jeffrey D. Ullman, Compilers: principles, techniques, and tools, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1986
|
 |
3
|
|
| |
4
|
|
| |
5
|
|
 |
6
|
|
 |
7
|
Andy Chou , Junfeng Yang , Benjamin Chelf , Seth Hallem , Dawson Engler, An empirical study of operating systems errors, Proceedings of the eighteenth ACM symposium on Operating systems principles, October 21-24, 2001, Banff, Alberta, Canada
|
 |
8
|
Dawson Engler , David Yu Chen , Seth Hallem , Andy Chou , Benjamin Chelf, Bugs as deviant behavior: a general approach to inferring errors in systems code, Proceedings of the eighteenth ACM symposium on Operating systems principles, October 21-24, 2001, Banff, Alberta, Canada
|
| |
9
|
|
 |
10
|
David Evans , John Guttag , James Horning , Yang Meng Tan, LCLint: a tool for using specifications to check code, Proceedings of the 2nd ACM SIGSOFT symposium on Foundations of software engineering, p.87-96, December 06-09, 1994, New Orleans, Louisiana, United States
|
| |
11
|
|
 |
12
|
Cormac Flanagan , K. Rustan M. Leino , Mark Lillibridge , Greg Nelson , James B. Saxe , Raymie Stata, Extended static checking for Java, Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation, June 17-19, 2002, Berlin, Germany
|
| |
13
|
A. Garrido and R. Johnson. Refactoring C with conditional compilation. In 18th IEEE Int. Conf. on Automated Software Engineering, 2003.
|
| |
14
|
G. Grahne and J. Zhu. Efficiently using prefix-trees in mining frequent itemsets. In Proc. of the 1st IEEE ICDM Workshop on Frequent Itemset Mining Implementations, 2003.
|
 |
15
|
|
 |
16
|
|
 |
17
|
Ted Kremenek , Ken Ashcraft , Junfeng Yang , Dawson Engler, Correlation exploitation in error ranking, Proceedings of the 12th ACM SIGSOFT twelfth international symposium on Foundations of software engineering, October 31-November 06, 2004, Newport Beach, CA, USA
|
| |
18
|
Z. Li, S. Lu, S. Myagmar, and Y. Zhou. CP-Miner: A tool for finding copy-paste and related bugs in operating system code. In Sixth Symp. on Operating Systems Design and Implementation, 2004.
|
 |
19
|
|
 |
20
|
|
| |
21
|
M. Ohira, R. Yokomoriz, M. Sakai, K. ichi Matsumotoy, K. Inouez, and K. Torii. Empirical project monitor: A tool for mining multiple project data. In Proc. of Int. Workshop on Mining Software Repositories, 2004.
|
| |
22
|
T. J. Ostrand and E. J. Weyuker. A tool for mining defect-tracking systems to predict fault-prone files. In Proc. of Int. Workshop on Mining Software Repositories, 2004.
|
 |
23
|
|
| |
24
|
F. V. Rysselberghe and S. Demeyer. Mining version control systems for FACs (frequently applied changes). In Proc. of Int. Workshop on Mining Software Repositories, 2004.
|
| |
25
|
R. M. Stallman and the GCC Developer Community. GNU compiler collection internals (GCC). available at http://gcc.gnu.org/onlinedocs/gccint.ps.gz, 2005.
|
| |
26
|
|
 |
27
|
|
| |
28
|
T. Xie and D. Notkin. Tool-assisted unit test selection based on operational violations. In 18th IEEE Int. Conf. on Automated Software Engineering, 2003.
|
| |
29
|
A. T. Ying, G. C. Murphy, R. Ng, and M. C. Chu-Carroll. Predicting source code changes by mining revision history. In Proc. of Int. Workshop on Mining Software Repositories, 2004.
|
| |
30
|
Y. Yusof and O. F. Rana. Template mining in source-code digital libraries. In Proc. of Int. Workshop on Mining Software Repositories, 2004.
|
| |
31
|
|
CITED BY 26
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Shan Lu , Soyeon Park , Chongfeng Hu , Xiao Ma , Weihang Jiang , Zhenmin Li , Raluca A. Popa , Yuanyuan Zhou, MUVI: automatically inferring multi-variable access correlations and detecting related semantic and concurrency bugs, ACM SIGOPS Operating Systems Review, v.41 n.6, December 2007
|
|
|
Ted Kremenek , Paul Twohey , Godmar Back , Andrew Ng , Dawson Engler, From uncertainty to belief: inferring the specification within, Proceedings of the 7th symposium on Operating systems design and implementation, November 06-08, 2006, Seattle, Washington
|
|
|
|
Zhenmin Li , Lin Tan , Xuanhui Wang , Shan Lu , Yuanyuan Zhou , Chengxiang Zhai, Have things changed now?: an empirical study of bug characteristics in modern open source software, Proceedings of the 1st workshop on Architectural and system support for improving software dependability, p.25-33, October 21-21, 2006, San Jose, California
|
|
|
|
|
|
|
|
|
|
|
|