Evaluation of source code copy detection methods on FreeBSD
International Conference on Software Engineering
Proceedings of the 2008 International Working Conference on Mining Software Repositories
Leipzig, Germany
Session: Changes and clones
Pages 61-66
Year of Publication: 2008
ISBN: 978-1-60558-024-1
ABSTRACT
Studies have shown that substantial code reuse is common in both open source and commercial projects. However, the precise extent of reuse and its impact on productivity and quality are not well investigated in the open source context. We previously introduced a simple-to-use method that needs only a set of file pathnames to identify directories that share filenames, and we partially validated its performance on a set of closed-source projects. To evaluate this method and to improve reuse detection at the file level, we apply it, along with four additional file copy detection methods that use the underlying content of multiple versions of the source code, to the FreeBSD project. The evaluation quantified the unique advantages of each method and showed that the filename method detected roughly half of all reuse cases. Scaling the content-based methods to large repositories containing all versions of open source files remains a challenge.
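The filename method is described here only at a high level: it takes a set of file pathnames and identifies directories that share filenames. A minimal sketch of that idea follows. It is not the authors' implementation; the function name `shared_filename_dirs`, the `min_shared` threshold, and the sample paths are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
import posixpath


def shared_filename_dirs(paths, min_shared=2):
    """Map each pair of directories to the set of filenames they share.

    Only pairs sharing at least `min_shared` filenames are returned.
    The threshold is a hypothetical knob, not taken from the paper.
    """
    # Index: filename -> set of directories that contain a file with that name.
    dirs_by_name = defaultdict(set)
    for path in paths:
        directory, name = posixpath.split(path)
        dirs_by_name[name].add(directory)

    # For every filename, record it against each pair of directories holding it.
    shared = defaultdict(set)
    for name, dirs in dirs_by_name.items():
        for dir_a, dir_b in combinations(sorted(dirs), 2):
            shared[(dir_a, dir_b)].add(name)

    # Keep only pairs with enough filenames in common to suggest copying.
    return {pair: names for pair, names in shared.items()
            if len(names) >= min_shared}


if __name__ == "__main__":
    # Hypothetical pathnames, used only to exercise the sketch.
    sample = [
        "sys/a/foo.c", "sys/a/bar.c",
        "contrib/b/foo.c", "contrib/b/bar.c",
        "lib/c/foo.c",
    ]
    for pair, names in shared_filename_dirs(sample).items():
        print(pair, sorted(names))
```

Because the sketch reads only pathnames, it never opens a file. That makes it cheap to run, but it cannot see copies that were renamed, which is consistent with the abstract's finding that the filename method caught roughly half of the reuse cases.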
INDEX TERMS
Primary Classification: D. Software / D.2 SOFTWARE ENGINEERING / D.2.7 Distribution, Maintenance, and Enhancement
Subjects: Restructuring, reverse engineering, and reengineering
Additional Classification: D. Software / D.2 SOFTWARE ENGINEERING / D.2.7 Distribution, Maintenance, and Enhancement
Subjects: Version control
General Terms: Algorithms, Measurement
Keywords: clone detection, cloning, code copying, open source, version control