DCU@FIRE-2014: An Information Retrieval Approach for Source Code Plagiarism Detection

Authors:
Debasis Ganguly, ADAPT Centre, School of Computing, Dublin City University, Dublin 9, Ireland
Gareth J. F. Jones, ADAPT Centre, School of Computing, Dublin City University, Dublin 9, Ireland

FIRE '14: Proceedings of the 6th Annual Meeting of the Forum for Information Retrieval Evaluation, December 2014, Pages 39–42, https://doi.org/10.1145/2824864.2824887

Published: 05 December 2014

ABSTRACT

We investigate an information retrieval (IR) based approach to source code plagiarism detection. The standard method of plagiarism detection, which exhaustively checks pairwise similarities between documents, does not scale to large collections of source code documents. To make source code plagiarism detection fast and scalable in practice, we propose an IR based approach. In this method each document is treated as a pseudo-query which retrieves a list of documents that are potential candidates for containing plagiarised material, ranked in decreasing order of their similarity to the query. A threshold is then applied to the relative similarity decrement ratios to select a set of documents as potential cases of source code reuse. Instead of treating source code as an unstructured text document, we explore term extraction from the annotated parse tree of the source code and also make use of a field-based language model for indexing and retrieval of source code documents. Results confirm that source code parsing plays a vital role in improving the plagiarism prediction accuracy.
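
The pseudo-query retrieval and thresholding step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several choices are assumptions made purely for illustration: cosine similarity over raw term counts stands in for the field-based language model, a regular-expression tokenizer stands in for parse-tree term extraction, and the relative similarity decrement ratio is taken to be (s_i - s_{i+1}) / s_i for consecutive ranked scores, with the list cut at the first ratio exceeding a hypothetical threshold `tau`. The function names and the sample `DOCS` collection are likewise invented here.

```python
import math
import re
from collections import Counter

# Hypothetical toy collection; the identifiers and snippets are illustrative only.
DOCS = {
    "a": "int sum(int a, int b) { return a + b; }",
    "b": "int sum(int x, int b) { return x + b; }",
    "c": "void print(char *msg) { puts(msg); }",
    "d": "float area(float r) { return 3.14 * r * r; }",
}


def tokenize(source):
    """Extract lowercase identifier/keyword tokens (a crude stand-in for parse-tree terms)."""
    return re.findall(r"[a-z_][a-z0-9_]*", source.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counter objects)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


def rank(query_id, vectors):
    """Treat one document as a pseudo-query; rank all others by decreasing similarity."""
    scored = [(doc_id, cosine(vectors[query_id], vec))
              for doc_id, vec in vectors.items() if doc_id != query_id]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def cut_by_decrement(ranked, tau):
    """Keep the documents ranked above the first relative score drop larger than tau."""
    for i in range(len(ranked) - 1):
        current, following = ranked[i][1], ranked[i + 1][1]
        if current > 0 and (current - following) / current > tau:
            return ranked[: i + 1]
    # Assumption: without a sharp drop, no document stands out as a candidate.
    return []


def candidates(docs, tau=0.5):
    """Map each document to the documents flagged as potential reuse of it."""
    vectors = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    return {doc_id: [d for d, _ in cut_by_decrement(rank(doc_id, vectors), tau)]
            for doc_id in vectors}


if __name__ == "__main__":
    print(candidates(DOCS)["a"])
```

Each document issues one query against the collection rather than being compared explicitly with every other document, which is where an inverted index would provide the scalability the abstract refers to; this sketch scores exhaustively only to stay short. Because the cut is purely relative, a practical system would probably also need an absolute similarity floor, which is outside what the abstract describes.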

Index Terms

1. Information systems
  1. Information retrieval

Published in
FIRE '14: Proceedings of the 6th Annual Meeting of the Forum for Information Retrieval Evaluation
December 2014
151 pages
ISBN:9781450337557
DOI:10.1145/2824864
Editors:
Prasenjit Majumder, Dhirubhai Ambani Institute of Information and Communication Technology, Gujarat, India
Mandar Mitra, Indian Statistical Institute, Kolkata, India
Sukomal Pal, Indian School of Mines, Dhanbad
Madhulika Agrawal, Dhirubhai Ambani Institute of Information and Communication Technology, Gujarat, India
Parth Mehta, Dhirubhai Ambani Institute of Information and Communication Technology, Gujarat, India
Copyright © 2014 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Field Search
Source Code Plagiarism Detection
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 19 of 64 submissions, 30%