survey

Code Authorship Attribution: Methods and Challenges

Published: 13 February 2019

Abstract

Code authorship attribution is the process of identifying the author of a given piece of code. As malware grows in volume and mutation techniques become more advanced, malware authors are producing large numbers of variants. Dealing with this problem requires methods for examining the authorship of malicious code. Code authorship attribution techniques can be used to identify and categorize the authors of malware. This information can help predict the types of tools and techniques that the author of a specific piece of malware uses, as well as the manner in which the malware spreads and evolves. In this article, we present the first comprehensive review of research on code authorship attribution. The article summarizes various methods of authorship attribution and highlights challenges in the field.

          Published in

          ACM Computing Surveys, Volume 52, Issue 1
          January 2020
          758 pages
          ISSN: 0360-0300
          EISSN: 1557-7341
          DOI: 10.1145/3309872
          Editor: Sartaj Sahni

          Copyright © 2019 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 13 February 2019
          • Accepted: 1 September 2018
          • Revised: 1 August 2018
          • Received: 1 December 2017

          Qualifiers

          • survey
          • Research
          • Refereed