ABSTRACT
Cloning in software systems is known to create problems during software maintenance. Several techniques have been proposed to detect the same or similar code fragments in software, so-called simple clones. While knowledge of simple clones is useful, detecting design-level similarities in software could ease maintenance even further and also help us identify reuse opportunities. We observed that recurring patterns of simple clones, so-called structural clones, often indicate the presence of interesting design-level similarities, for example patterns of collaborating classes or components. Finding structural clones that signify potentially useful design information requires efficient techniques to analyze the bulk of simple clone data and to make non-trivial inferences based on the abstracted information. In this paper, we describe a practical solution to the problem of detecting some basic, but useful, types of design-level similarities, such as groups of highly similar classes or files. First, we detect simple clones by applying conventional token-based techniques. Then we find patterns of co-occurring clones in different files using the Frequent Itemset Mining (FIM) technique. Finally, we perform file clustering to detect clusters of highly similar files that are likely to contribute to a design-level similarity pattern. The novelty of our approach is the application of data mining techniques to detect design-level similarities. Experiments confirmed that our method finds many useful structural clones and scales up to big programs. The paper describes our method for structural clone detection, a prototype tool called Clone Miner that implements the method, and experimental results.
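To make the Frequent Itemset Mining step more concrete, here is a minimal sketch under stated assumptions, not the Clone Miner implementation. It assumes simple clones have already been grouped into clone classes with integer IDs, and it treats each file as a "transaction" whose items are the clone classes occurring in that file. Mining uses plain level-wise Apriori, swapped in for illustration; this abstract does not say which FIM algorithm the authors used. The file names, clone-class IDs, and helper names (apriori, maximal, files_sharing) are hypothetical.

```python
from itertools import combinations

# Hypothetical input: file -> set of simple-clone class IDs found in that file.
file_clones = {
    "A.java": {1, 2, 3},
    "B.java": {1, 2, 3, 7},
    "C.java": {1, 2, 3},
    "D.java": {7, 9},
}


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset contained in at
    least min_support transactions (classic level-wise Apriori)."""
    # Level 1: count each single clone class.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets, then prune any
        # candidate that has an infrequent (k-1)-subset.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Support counting: in how many files does each candidate co-occur?
        frequent = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                frequent[cand] = support
        result.update(frequent)
        k += 1
    return result


def maximal(itemsets):
    """Keep only itemsets that are not a proper subset of another frequent one."""
    return [s for s in itemsets if not any(s < t for t in itemsets)]


def files_sharing(itemset, file_clones):
    """Files whose clone classes include every class in the itemset."""
    return sorted(f for f, clones in file_clones.items() if itemset <= clones)


if __name__ == "__main__":
    frequent = apriori(list(file_clones.values()), min_support=2)
    for s in maximal(frequent):
        if len(s) >= 2:  # single clone classes are just simple clones
            print(sorted(s), "->", files_sharing(s, file_clones))
    # prints: [1, 2, 3] -> ['A.java', 'B.java', 'C.java']
```

Files that share a large frequent itemset of clone classes, such as A.java, B.java and C.java in this toy data, are candidates for the groups of highly similar files described above; the actual clustering criteria used by Clone Miner are not given in this abstract.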
CITED BY 15
Stan Jarzabek , Ulf Pettersson, Project-driven university-industry collaboration: modes of collaboration, outcomes, benefits, success factors, Proceedings of the 2006 international workshop on Summit on software engineering education, May 20-20, 2006, Shanghai, China
Hamid Abdul Basit , Simon J. Puglisi , William F. Smyth , Andrew Turpin , Stan Jarzabek, Efficient token based clone detection with flexible tokenization, The 6th Joint Meeting on European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering: companion papers, September 03-07, 2007, Dubrovnik, Croatia