DOI: 10.1145/3106237.3121274
short-paper

Suggesting meaningful variable names for decompiled code: a machine translation approach

Published: 21 August 2017

ABSTRACT

Decompiled code lacks meaningful variable names. We used statistical machine translation to suggest variable names that are natural given the context. This technique has previously been successfully applied to obfuscated JavaScript code, but decompiled C code poses unique challenges in constructing an aligned corpus and selecting the best translation from among several candidates.
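The two challenges named in the abstract, building an aligned corpus and choosing among candidate names, can be illustrated with a toy sketch. This is not the authors' pipeline and not a phrase-based translation system: it swaps in a simple context-frequency table as a stand-in. The corpus, the `v1`/`a1` placeholder convention, and every function name below are hypothetical and invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy aligned corpus: each pair is (decompiled line, original line),
# aligned token for token. All of this data is invented for illustration.
ALIGNED_CORPUS = [
    ("v1 = strlen ( a1 )", "len = strlen ( str )"),
    ("v2 = strlen ( a1 )", "len = strlen ( s )"),
    ("v1 = strlen ( a2 )", "len = strlen ( str )"),
    ("v3 = malloc ( v1 )", "buf = malloc ( len )"),
    ("v1 = malloc ( a1 )", "buf = malloc ( size )"),
]


def is_placeholder(token):
    """Assumed convention: decompiler-style names look like v1, v2, a1, ..."""
    return len(token) > 1 and token[0] in "va" and token[1:].isdigit()


def context_of(tokens, i, width=2):
    """Return the `width` tokens on each side of position i.

    Placeholder names are abstracted to ID so that v1 and v7 share contexts.
    """
    normalized = ["ID" if is_placeholder(t) else t for t in tokens]
    padded = ["<s>"] * width + normalized + ["</s>"] * width
    j = i + width
    return tuple(padded[j - width:j] + padded[j + 1:j + 1 + width])


def train(corpus):
    """Count which original name appears in each placeholder context."""
    table = defaultdict(Counter)
    for decompiled, original in corpus:
        d_tokens, o_tokens = decompiled.split(), original.split()
        if len(d_tokens) != len(o_tokens):
            continue  # skip pairs that are not token-aligned
        for i, token in enumerate(d_tokens):
            if is_placeholder(token):
                table[context_of(d_tokens, i)][o_tokens[i]] += 1
    return table


def suggest(table, line):
    """Pool candidate names over every occurrence of a placeholder, pick the top one."""
    tokens = line.split()
    votes = defaultdict(Counter)
    for i, token in enumerate(tokens):
        if is_placeholder(token):
            votes[token].update(table.get(context_of(tokens, i), Counter()))
    return {name: c.most_common(1)[0][0] for name, c in votes.items() if c}


table = train(ALIGNED_CORPUS)
print(suggest(table, "v7 = strlen ( a3 )"))  # {'v7': 'len', 'a3': 'str'}
```

Pooling votes per placeholder, rather than per occurrence, means each variable receives a single name even when its individual contexts disagree. That is one simple way to select the best among several candidates.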

Published in

ESEC/FSE 2017: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering
August 2017
1073 pages
ISBN: 9781450351058
DOI: 10.1145/3106237

Copyright © 2017 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 112 of 543 submissions, 21%
