ABSTRACT
Binary code presents unique analysis challenges, particularly when debugging information has been stripped from the executable. Among the valuable information lost in stripping are the identities of standard library functions linked into the executable; knowing the identities of such functions can help to optimize automated analysis and is instrumental in understanding program behavior. Library fingerprinting attempts to restore the names of library functions in stripped binaries, using signatures extracted from reference libraries. Existing methods are brittle in the face of variations in the toolchain that produced the reference libraries and do not generalize well to new library versions. We introduce semantic descriptors, high-level representations of library functions that avoid the brittleness of existing approaches. We have extended a tool, unstrip, to apply this technique to fingerprint wrapper functions in the GNU C library. unstrip discovers functions in a stripped binary and outputs a new binary, with meaningful names added to the symbol table. Other tools can leverage these symbols to perform further analysis. We demonstrate that our semantic descriptors generalize well and substantially outperform existing library fingerprinting techniques.
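The contrast between a byte-level signature and a semantic descriptor can be illustrated with a small sketch. The abstract does not define what a descriptor contains, so the sketch assumes that a wrapper function is characterized by the number of the system call it invokes. The toy instruction tuples, the `Descriptor` class, the `REFERENCE` table, and the `describe` and `label` helpers are all hypothetical and are not taken from unstrip.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical semantic descriptor: the system call a wrapper invokes."""
    syscall_number: int


# Hypothetical reference table, as if built from a library that still has symbols.
# The numbers are Linux x86-64 system call numbers.
REFERENCE = {
    Descriptor(0): "read",
    Descriptor(1): "write",
    Descriptor(3): "close",
}


def describe(instructions):
    """Return the Descriptor of a toy instruction list, or None if unknown.

    Tracks constants written to registers and reports the value held in
    eax at the first syscall instruction.
    """
    regs = {}
    for insn in instructions:
        op = insn[0]
        if op == "mov":
            dst, src = insn[1], insn[2]
            if isinstance(src, int):
                regs[dst] = src          # constant load: value is known
            else:
                regs.pop(dst, None)      # register copy: value is unknown here
        elif op == "xor" and insn[1] == insn[2]:
            regs[insn[1]] = 0            # xor r, r zeroes the register
        elif op == "syscall":
            if "eax" in regs:
                return Descriptor(regs["eax"])
            return None
    return None


def label(instructions):
    """Look up a name for the function, or None if no descriptor matches."""
    return REFERENCE.get(describe(instructions))


# Two differently compiled variants of the same wrapper: the instruction
# sequences differ, but the descriptor (system call 1) is the same.
variant_a = [("mov", "eax", 1), ("syscall",), ("ret",)]
variant_b = [("push", "rbx"), ("mov", "eax", 1), ("nop",),
             ("syscall",), ("pop", "rbx"), ("ret",)]
zeroed = [("xor", "eax", "eax"), ("syscall",), ("ret",)]

print(label(variant_a), label(variant_b), label(zeroed))
```

Under these assumptions, a signature keyed on the exact instruction sequence would treat `variant_a` and `variant_b` as different functions, while the descriptor lookup assigns both the same name.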
Index Terms
- Labeling library functions in stripped binaries