ABSTRACT
Decompiled code lacks meaningful variable names. We use statistical machine translation to suggest variable names that are natural in context. This technique has previously been applied successfully to obfuscated JavaScript code, but decompiled C code poses unique challenges, both in constructing an aligned corpus and in selecting the best translation from among several candidates.
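To make the candidate-selection step concrete, the sketch below scores several candidate names for one placeholder and keeps the one whose renamed line looks most natural. It is only an illustration under stated assumptions, not the system described above: an add-one smoothed bigram model stands in for a real translation toolkit's language model, and the toy corpus, the placeholder `v1`, and all function names are hypothetical.

```python
import math
from collections import Counter

def train_bigram_model(corpus):
    """Count unigrams and bigrams over whitespace-tokenized lines of code."""
    unigrams, bigrams = Counter(), Counter()
    for line in corpus:
        tokens = ["<s>"] + line.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def log_prob(line, unigrams, bigrams):
    """Log-probability of a line under an add-one smoothed bigram model."""
    tokens = ["<s>"] + line.split() + ["</s>"]
    vocab_size = len(unigrams)
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )

def rename(line, placeholder, name):
    """Replace whole-token occurrences of the placeholder with a candidate name."""
    return " ".join(name if tok == placeholder else tok for tok in line.split())

def best_candidate(line, placeholder, candidates, unigrams, bigrams):
    """Pick the candidate whose renamed line scores highest under the model."""
    return max(
        candidates,
        key=lambda name: log_prob(rename(line, placeholder, name), unigrams, bigrams),
    )

# Hypothetical "natural" source corpus (assumption: already tokenized).
corpus = [
    "for ( i = 0 ; i < len ; i ++ )",
    "for ( i = 0 ; i < n ; i ++ )",
    "while ( len > 0 )",
    "return len ;",
]
unigrams, bigrams = train_bigram_model(corpus)

# Hypothetical decompiler output with a meaningless placeholder name.
decompiled = "for ( v1 = 0 ; v1 < len ; v1 ++ )"
candidates = ["i", "len", "buf"]

chosen = best_candidate(decompiled, "v1", candidates, unigrams, bigrams)
print(rename(decompiled, "v1", chosen))
```

A practical system would rank full translations proposed by the translation model rather than a hand-written candidate list. The toy model only shows why a name that frequently appears in similar contexts outranks one that does not.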
Index Terms
- Suggesting meaningful variable names for decompiled code: a machine translation approach