DOI: 10.1145/1370750.1370766
research-article

Evaluation of source code copy detection methods on FreeBSD

Published: 10 May 2008

ABSTRACT

Studies have shown that substantial code reuse is common in both open source and commercial projects. However, the precise extent of reuse and its impact on productivity and quality have not been well investigated in the open source context. Previously, we introduced a simple-to-use method that needs only a set of file pathnames to identify directories that share filenames, and partially validated its performance on a set of closed-source projects. To evaluate this method and to improve reuse detection at the file level, we apply it, together with four additional file copy detection methods that use the content of multiple versions of the source code, to the FreeBSD project. The evaluation quantified the unique advantages of each method and showed that the filename method detected roughly half of all reuse cases. Scaling the content-based methods to large repositories containing all versions of open source files remains a challenge.
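The filename method is described here only at a high level: it takes a set of file pathnames and reports directories that share filenames. The following Python sketch illustrates one way such a comparison could work. It is not the authors' implementation; the function name `shared_filename_dirs`, the thresholds `min_shared` and `min_fraction`, and the overlap measure (shared names divided by the size of the smaller directory) are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
import posixpath


def shared_filename_dirs(pathnames, min_shared=2, min_fraction=0.5):
    """Return (dir_a, dir_b, shared_count, fraction) for directory pairs
    that share filenames.

    min_shared and min_fraction are illustrative thresholds, not values
    taken from the paper.
    """
    # Collect the set of file basenames found in each directory.
    names_by_dir = defaultdict(set)
    for path in pathnames:
        directory, name = posixpath.split(path)
        if name:
            names_by_dir[directory].add(name)

    # Inverted index (filename -> directories), so that only directories
    # with at least one filename in common are ever compared.
    dirs_by_name = defaultdict(set)
    for directory, names in names_by_dir.items():
        for name in names:
            dirs_by_name[name].add(directory)

    # Count shared filenames for every directory pair that co-occurs.
    shared_counts = defaultdict(int)
    for dirs in dirs_by_name.values():
        for a, b in combinations(sorted(dirs), 2):
            shared_counts[(a, b)] += 1

    results = []
    for (a, b), shared in shared_counts.items():
        smaller = min(len(names_by_dir[a]), len(names_by_dir[b]))
        fraction = shared / smaller  # assumed overlap measure
        if shared >= min_shared and fraction >= min_fraction:
            results.append((a, b, shared, fraction))
    # Most shared filenames first, then alphabetical for stable output.
    return sorted(results, key=lambda r: (-r[2], r[0], r[1]))


# Hypothetical pathnames, invented for this example.
paths = [
    "projA/lib/zlib/inflate.c",
    "projA/lib/zlib/deflate.c",
    "projA/lib/zlib/crc32.c",
    "projB/third_party/zlib/inflate.c",
    "projB/third_party/zlib/deflate.c",
    "projB/src/main.c",
]
print(shared_filename_dirs(paths))
# → [('projA/lib/zlib', 'projB/third_party/zlib', 2, 1.0)]
```

A sketch like this never reads file contents, which is why it needs only pathnames. The same property means it cannot see copies that were renamed, a gap the content-based methods are meant to cover.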


Published in

MSR '08: Proceedings of the 2008 international working conference on Mining software repositories
May 2008
162 pages
ISBN: 9781605580241
DOI: 10.1145/1370750

Copyright © 2008 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States
