Effective identification of source code authors using byte-level information
Source: International Conference on Software Engineering
Proceedings of the 28th International Conference on Software Engineering
Shanghai, China
SESSION: Emerging results: program analysis
Pages: 893-896
Year of Publication: 2006
ISBN:1-59593-375-1
Authors
Georgia Frantzeskou  University of the Aegean, Karlovasi, Greece
Efstathios Stamatatos  University of the Aegean, Karlovasi, Greece
Stefanos Gritzalis  University of the Aegean, Karlovasi, Greece
Sokratis Katsikas  University of the Aegean, Karlovasi, Greece
Sponsors
ACM: Association for Computing Machinery
SIGSOFT: ACM Special Interest Group on Software Engineering
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1134285.1134445

ABSTRACT

Source code author identification is the task of identifying the most likely author of a computer program, given a set of predefined author candidates. This is usually based on the analysis of other program samples of undisputed authorship by the same programmer. There are several cases where the application of such a method could be of major benefit, such as authorship disputes, proof of authorship in court, and tracing the source of code left in a system after a cyber attack. We present a new approach, called the SCAP (Source Code Author Profiles) approach, which uses byte-level n-gram profiles to represent a source code author's style. Experiments on data sets of different programming languages (Java or C++) and varying difficulty (6 to 30 candidate authors) demonstrate the effectiveness of the proposed approach. A comparison with a previous source code authorship identification study based on more complicated information shows that the SCAP approach is language independent and that n-gram author profiles are better able to capture the idiosyncrasies of source code authors. Moreover, the SCAP approach deals surprisingly well with cases where only a limited number of very short programs per programmer is available for training. It is also demonstrated that the effectiveness of the proposed model is not affected by the absence of comments in the source code, a condition usually met in cyber-crime cases.
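
The abstract does not spell out how a byte-level n-gram profile is built or compared, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. It assumes a profile is the set of the most frequent byte n-grams in an author's known programs, and that an unknown program is attributed to the candidate whose profile shares the most n-grams with it. The function names, the n-gram length `n=3`, and the cutoff `profile_size=1500` are hypothetical choices made for this example.

```python
from collections import Counter


def byte_ngram_profile(source: bytes, n: int = 3, profile_size: int = 1500) -> set:
    """Return the set of the `profile_size` most frequent byte n-grams in `source`.

    Working on raw bytes keeps the sketch language independent: no parsing
    or tokenizing of the programming language is involved.
    """
    counts = Counter(source[i:i + n] for i in range(len(source) - n + 1))
    return {gram for gram, _ in counts.most_common(profile_size)}


def profile_overlap(a: set, b: set) -> int:
    """Similarity assumed for this sketch: number of n-grams two profiles share."""
    return len(a & b)


def most_likely_author(unknown: bytes, samples_by_author: dict,
                       n: int = 3, profile_size: int = 1500) -> str:
    """Pick the candidate whose profile overlaps most with the unknown program.

    `samples_by_author` maps an author name to a list of byte strings,
    each one a program of undisputed authorship.
    """
    target = byte_ngram_profile(unknown, n, profile_size)
    scores = {}
    for author, samples in samples_by_author.items():
        # Assumption: one profile per author, built from all samples joined together.
        profile = byte_ngram_profile(b"\n".join(samples), n, profile_size)
        scores[author] = profile_overlap(target, profile)
    return max(scores, key=scores.get)
```

Calling `most_likely_author(unknown_bytes, {"alice": [...], "bob": [...]})` returns the name of the closest candidate. Because the profile is computed over bytes, layout habits such as spacing and brace placement contribute to it alongside identifiers and keywords, which is one plausible reading of why such profiles need no language-specific features.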



