short-paper

Suggesting meaningful variable names for decompiled code: a machine translation approach

Author:
Alan Jaffe

Carnegie Mellon University, USA


ESEC/FSE 2017: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, August 2017, Pages 1050–1052. https://doi.org/10.1145/3106237.3121274

Published: 21 August 2017


ABSTRACT

Decompiled code lacks meaningful variable names. We used statistical machine translation to suggest variable names that are natural given the context. This technique has previously been successfully applied to obfuscated JavaScript code, but decompiled C code poses unique challenges in constructing an aligned corpus and selecting the best translation from among several candidates.
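The candidate-selection step mentioned in the abstract can be illustrated with a toy sketch. The abstract does not say how candidates are ranked, so everything below is an assumption made for illustration: the add-one smoothed bigram language model, the placeholder names `v1` and `v2`, the three-line training corpus, and the helper names `train_bigram_model`, `log_prob`, `rename`, and `best_renaming` are all hypothetical and are not taken from the paper. The sketch only shows the general idea of preferring the renaming that looks most natural in context.

```python
import math
from collections import Counter


def train_bigram_model(corpus):
    """Count unigrams and bigrams over tokenized lines of human-written code."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in corpus:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def log_prob(tokens, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a token sequence."""
    vocab_size = len(unigrams)
    padded = ["<s>"] + tokens + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


def rename(tokens, mapping):
    """Replace placeholder identifiers according to a candidate mapping."""
    return [mapping.get(tok, tok) for tok in tokens]


def best_renaming(tokens, candidates, unigrams, bigrams):
    """Pick the candidate whose renamed line scores highest under the model."""
    return max(candidates, key=lambda m: log_prob(rename(tokens, m), unigrams, bigrams))


# Hypothetical training corpus standing in for a large body of original source code.
corpus = [
    "for ( i = 0 ; i < len ; i ++ )".split(),
    "for ( i = 0 ; i < count ; i ++ )".split(),
    "sum += buf [ i ] ;".split(),
]

# Hypothetical decompiler output with meaningless placeholder names.
decompiled = "for ( v1 = 0 ; v1 < v2 ; v1 ++ )".split()

# Hypothetical candidate renamings, as a translation system might propose.
candidates = [
    {"v1": "i", "v2": "len"},
    {"v1": "len", "v2": "i"},
    {"v1": "buf", "v2": "sum"},
]

unigrams, bigrams = train_bigram_model(corpus)
best = best_renaming(decompiled, candidates, unigrams, bigrams)
print(" ".join(rename(decompiled, best)))  # prints: for ( i = 0 ; i < len ; i ++ )
```

A real system would score whole functions rather than single lines and would train on a far larger corpus; the bigram model here is only a stand-in for whatever language model the translation framework uses.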


Index Terms

1. Software and its engineering
  1. Software creation and management
    1. Software post-development issues
      1. Software reverse engineering


Published in
ESEC/FSE 2017: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering
August 2017
1073 pages
ISBN: 9781450351058
DOI: 10.1145/3106237
General Chairs: Eric Bodden (Paderborn University, Germany / Fraunhofer IEM, Germany), Wilhelm Schäfer (Paderborn University, Germany)
Program Chairs: Arie van Deursen (Delft University of Technology, Netherlands), Andrea Zisman (Open University, UK)
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Decompilation
Reverse engineering
Statistical machine translation
Qualifiers
- short-paper
Acceptance Rates
Overall Acceptance Rate: 112 of 543 submissions, 21%