DOI: 10.1145/2824864.2824887
research-article

DCU@FIRE-2014: An Information Retrieval Approach for Source Code Plagiarism Detection

Published: 05 December 2014

ABSTRACT

We investigate an information retrieval (IR) based approach to source code plagiarism detection. The standard method of plagiarism detection, which exhaustively checks pairwise similarities between documents, does not scale to large collections of source code documents. To make source code plagiarism detection fast and scalable in practice, we propose an IR-based approach in which each document is treated as a pseudo-query. The pseudo-query retrieves a list of documents that are potential candidates for containing plagiarised material, ranked in decreasing order of their similarity to the query. A threshold is then applied to the relative similarity decrement ratios to select a set of documents as potential cases of source code reuse. Instead of treating a source code file as an unstructured text document, we explore term extraction from its annotated parse tree, and we use a field-based language model for indexing and retrieval of source code documents. Results confirm that source code parsing plays a vital role in improving the accuracy of plagiarism prediction.
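
The abstract does not spell out the scoring formulas, so the following is only a minimal Python sketch of the pipeline it describes, under stated assumptions. Each file is assumed to be already split into parse-tree fields; the field names `identifiers`, `keywords` and `literals`, and their weights, are hypothetical. The field-based language model is assumed to be a Jelinek-Mercer-smoothed mixture of per-field models, one common formulation that may differ from the paper's. The "relative similarity decrement ratio" is assumed to be (s_i - s_{i+1}) / s_i between consecutive scores in a ranked list, with the list cut at the first drop of at least θ. The names `FieldLMIndex`, `cut_by_relative_drop` and `detect_reuse`, and the values of `LAMBDA` and `THETA`, are illustrative. Because retrieval goes through an inverted index, each pseudo-query only scores files that share at least one term with it, rather than every pair in the collection.

```python
import math
from collections import Counter, defaultdict

# Hypothetical parse-tree fields and weights; the abstract names no fields.
FIELD_WEIGHTS = {"identifiers": 0.5, "keywords": 0.3, "literals": 0.2}
LAMBDA = 0.6  # assumed Jelinek-Mercer weight on the document model
THETA = 0.5   # assumed threshold on the relative score drop


class FieldLMIndex:
    """Inverted index over field-tokenised source files, scored with a
    smoothed mixture of per-field language models (assumed formulation)."""

    def __init__(self, docs):
        # docs maps doc_id -> {field: [term, ...]}
        self.tf = {d: {f: Counter(fields.get(f, [])) for f in FIELD_WEIGHTS}
                   for d, fields in docs.items()}
        self.dlen = {d: {f: sum(c.values()) for f, c in fs.items()}
                     for d, fs in self.tf.items()}
        self.cf = {f: Counter() for f in FIELD_WEIGHTS}
        self.postings = defaultdict(set)  # term -> ids of docs containing it
        for d, fs in self.tf.items():
            for f, counts in fs.items():
                self.cf[f].update(counts)
                for t in counts:
                    self.postings[t].add(d)
        self.clen = {f: sum(c.values()) for f, c in self.cf.items()}

    def p_doc(self, t, d):
        # P(t|d) = sum_f w_f * tf(t, d_f) / |d_f|
        return sum(w * self.tf[d][f][t] / self.dlen[d][f]
                   for f, w in FIELD_WEIGHTS.items() if self.dlen[d][f])

    def p_coll(self, t):
        # P(t|C) = sum_f w_f * cf(t, f) / |C_f|
        return sum(w * self.cf[f][t] / self.clen[f]
                   for f, w in FIELD_WEIGHTS.items() if self.clen[f])

    def search(self, qid):
        """Rank other documents against document qid used as a pseudo-query.
        Only documents sharing at least one term with qid are scored."""
        query = Counter()
        for counts in self.tf[qid].values():
            query.update(counts)
        candidates = set().union(*(self.postings[t] for t in query)) - {qid}
        scores = {}
        for d in candidates:
            s = 0.0
            for t, qtf in query.items():
                pc = self.p_coll(t)
                if pc > 0:
                    # Rank-equivalent form of log P(q|d) with JM smoothing;
                    # terms absent from d contribute log1p(0) = 0.
                    s += qtf * math.log1p(LAMBDA * self.p_doc(t, d)
                                          / ((1 - LAMBDA) * pc))
            scores[d] = s
        return sorted(scores.items(), key=lambda x: (-x[1], x[0]))


def cut_by_relative_drop(ranked, theta=THETA):
    """Keep the head of the ranking up to the first relative drop
    (s_i - s_{i+1}) / s_i >= theta; return [] if no such drop occurs."""
    for i in range(len(ranked) - 1):
        s, s_next = ranked[i][1], ranked[i + 1][1]
        if s > 0 and (s - s_next) / s >= theta:
            return [d for d, _ in ranked[: i + 1]]
    return []


def detect_reuse(docs):
    """Run every document as a pseudo-query and collect unordered pairs."""
    index = FieldLMIndex(docs)
    pairs = set()
    for qid in docs:
        for d in cut_by_relative_drop(index.search(qid)):
            pairs.add(tuple(sorted((qid, d))))
    return sorted(pairs)


# Toy collection: "b" is a verbatim copy of "a"; "c" and "d" are unrelated.
SAMPLE_DOCS = {
    "a": {"identifiers": ["total", "arr", "idx", "size"],
          "keywords": ["for", "int", "return"], "literals": ["0"]},
    "b": {"identifiers": ["total", "arr", "idx", "size"],
          "keywords": ["for", "int", "return"], "literals": ["0"]},
    "c": {"identifiers": ["name", "greet", "msg"],
          "keywords": ["def", "return"], "literals": ["hello"]},
    "d": {"identifiers": ["node", "left", "right"],
          "keywords": ["if", "return", "class"], "literals": ["1"]},
}

if __name__ == "__main__":
    print(detect_reuse(SAMPLE_DOCS))  # prints [('a', 'b')]
```

On the toy collection only the copied pair `("a", "b")` is reported: files `c` and `d` share just the keyword `return` with the rest of the collection, so their ranked lists have no sharp relative drop and the cut keeps nothing. Returning an empty set when no such drop exists is itself an assumption; it treats a smoothly decaying ranking as evidence that no file stands out.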

    • Published in

      ACM Other conferences
      FIRE '14: Proceedings of the 6th Annual Meeting of the Forum for Information Retrieval Evaluation
      December 2014
      151 pages
      ISBN:9781450337557
      DOI:10.1145/2824864
      • Editors: Prasenjit Majumder, Mandar Mitra, Sukomal Pal, Madhulika Agrawal, Parth Mehta

      Copyright © 2014 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 5 December 2014

      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      Overall Acceptance Rate: 19 of 64 submissions, 30%
