DOI: 10.1145/1287624.1287634
Article

Context-based detection of clone-related bugs

Published: 07 September 2007

Abstract

Studies show that programs contain much similar code, commonly known as clones. One of the main reasons for introducing clones is programmers' tendency to copy and paste code to quickly duplicate functionality. It is commonly believed that clones can make programs difficult to maintain and can introduce subtle bugs. Although much research has proposed techniques for detecting and removing clones to improve software maintainability, little has considered how to detect latent bugs introduced by clones. In this paper, we introduce a general notion of context-based inconsistencies among clones and develop an efficient algorithm to detect such inconsistencies for locating bugs. We have implemented our algorithm and evaluated it on large open source projects, including the latest versions of the Linux kernel and Eclipse. We have discovered many previously unknown bugs and programming style issues in both projects (57 for the Linux kernel and 38 for Eclipse). We have also categorized the bugs and style issues and noticed that they exhibit diverse characteristics and are difficult to detect with any single existing bug detection technique. We believe that our approach complements these existing techniques well.
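
The abstract does not spell out the algorithm, so the following is only a toy illustration of the general idea, not the authors' method. It assumes that clone groups have already been found by some clone detector and that the "context" of a clone can be summarized as the kind of its innermost enclosing construct (for example `if`, `for`, or `none`). The names `CloneInstance` and `find_context_inconsistencies`, and the sample locations, are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CloneInstance:
    """One member of a clone group (all fields are hypothetical)."""
    location: str  # e.g. "a.c:10"
    context: str   # assumed summary of the enclosing construct: "if", "for", "none"


def find_context_inconsistencies(clone_groups):
    """Return (group_index, majority_context, outliers) for every clone group
    whose members do not all share the same enclosing context."""
    reports = []
    for index, group in enumerate(clone_groups):
        counts = Counter(instance.context for instance in group)
        if len(counts) <= 1:
            continue  # all members agree, or the group is empty
        # Treat the most frequent context as the expected one.
        majority_context, _ = counts.most_common(1)[0]
        outliers = [i for i in group if i.context != majority_context]
        reports.append((index, majority_context, outliers))
    return reports


# Two made-up clone groups: the first has one member missing its guard.
groups = [
    [
        CloneInstance("a.c:10", "if"),
        CloneInstance("b.c:42", "if"),
        CloneInstance("c.c:7", "none"),
    ],
    [
        CloneInstance("d.c:3", "for"),
        CloneInstance("e.c:9", "for"),
    ],
]

reports = find_context_inconsistencies(groups)
for index, majority, outliers in reports:
    print(index, majority, [o.location for o in outliers])
```

In this sketch, a clone that sits outside an `if` while its siblings sit inside one is reported as a candidate for inspection. A flagged clone is only a hint: as the abstract notes, some inconsistencies turn out to be style issues rather than bugs.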

Published In

ESEC-FSE '07: Proceedings of the 6th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering
September 2007
638 pages
ISBN: 9781595938114
DOI: 10.1145/1287624
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. code clone detection
  2. code clone-related bugs
  3. context-based bug detection
  4. inconsistencies

Qualifiers

  • Article

Conference

ESEC/FSE07

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%

Article Metrics

  • Downloads (Last 12 months): 18
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 13 Feb 2025

Cited By

  • (2024) Syntactic–Semantic Detection of Clone-Caused Vulnerabilities in the IoT Devices. Sensors 24:22 (7251). DOI: 10.3390/s24227251. Online publication date: 13-Nov-2024
  • (2024) SolaSim: Clone Detection for Solana Smart Contracts via Program Representation. Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension (258-269). DOI: 10.1145/3643916.3644406. Online publication date: 15-Apr-2024
  • (2024) Study and Analysis of Bug Propagation in Evolving Software through Micro-clones. 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT) (1-7). DOI: 10.1109/ICCCNT61001.2024.10726019. Online publication date: 24-Jun-2024
  • (2024) Measuring model alignment for code clone detection using causal interpretation. Empirical Software Engineering 30:2. DOI: 10.1007/s10664-024-10583-0. Online publication date: 19-Dec-2024
  • (2024) A framework for embedded software portability and verification: from formal models to low-level code. Software and Systems Modeling 23:2 (289-315). DOI: 10.1007/s10270-023-01144-y. Online publication date: 1-Feb-2024
  • (2023) Exploring the Impact of Code Clones on Deep Learning Software. ACM Transactions on Software Engineering and Methodology 32:6 (1-34). DOI: 10.1145/3607181. Online publication date: 3-Jul-2023
  • (2023) A Characterization Study of Merge Conflicts in Java Projects. ACM Transactions on Software Engineering and Methodology 32:2 (1-28). DOI: 10.1145/3546944. Online publication date: 31-Mar-2023
  • (2023) Granularity-Based Comparison of the Bug-Proneness of Code Clones. 2023 IEEE 17th International Workshop on Software Clones (IWSC) (8-14). DOI: 10.1109/IWSC60764.2023.00009. Online publication date: 1-Oct-2023
  • (2023) Learning Graph-based Code Representations for Source-level Functional Similarity Detection. 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) (345-357). DOI: 10.1109/ICSE48619.2023.00040. Online publication date: May-2023
  • (2023) OSSFP: Precise and Scalable C/C++ Third-Party Library Detection using Fingerprinting Functions. 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) (270-282). DOI: 10.1109/ICSE48619.2023.00034. Online publication date: May-2023
