Abstract
Software vulnerabilities pose serious reliability threats to the general public and to critical infrastructure, and many studies have aimed to detect and mitigate software defects at the binary level. Most standard practices leverage both static and dynamic analysis, which have drawbacks such as a heavy manual workload and high complexity. Existing deep learning-based solutions not only struggle to capture the complex relationships among variables in raw binary code but also lack the explainability that humans need to verify, evaluate, and patch the detected bugs.
We propose VulANalyzeR, a deep learning-based model for automated binary vulnerability detection, Common Weakness Enumeration (CWE) type classification, and root cause analysis, to enhance safety and security. VulANalyzeR combines sequential and topological learning through recurrent units and graph convolution to simulate how a program is executed. An attention mechanism is integrated throughout the model and shows how different instructions and their corresponding states contribute to the final classification. The model also classifies the specific vulnerability type through multi-task learning, which not only provides further explanation but also allows faster patching of zero-day vulnerabilities. We show that VulANalyzeR outperforms state-of-the-art baselines on vulnerability detection. Additionally, a Common Vulnerabilities and Exposures (CVE) dataset is used to evaluate real, complex vulnerabilities. Our case studies show that VulANalyzeR accurately identifies the instructions and basic blocks that cause a vulnerability, even without being given any prior knowledge of their locations during training.
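To make the combination of graph convolution, attention, and multi-task learning more concrete, the following is a minimal illustrative sketch in plain Python. It is not the VulANalyzeR implementation: the toy control flow graph, the feature vectors, the fixed attention query, and the weight matrices are all assumptions chosen for illustration, nothing is trained, and the recurrent per-instruction encoder is omitted (basic-block features are assumed to be given). It shows only the general pattern of aggregating over the graph, pooling blocks with attention weights that can be inspected afterwards, and feeding one shared representation to two task heads.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def graph_conv(features, adjacency):
    """One mean-aggregation step over each block and its successors.

    Assumption: no learned weights or nonlinearity, to keep the sketch short.
    adjacency[i][j] == 1 means there is an edge from block i to block j.
    """
    n = len(features)
    dim = len(features[0])
    out = []
    for i in range(n):
        group = [j for j in range(n) if adjacency[i][j] or j == i]
        out.append([sum(features[j][d] for j in group) / len(group)
                    for d in range(dim)])
    return out


def attention_pool(features, query):
    """Dot-product attention pooling; returns the pooled vector and weights."""
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * feat[d] for w, feat in zip(weights, features))
              for d in range(dim)]
    return pooled, weights


def linear(vec, weight_rows):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(v * w for v, w in zip(vec, row)) for row in weight_rows]


# Hypothetical control flow graph with three basic blocks: 0 -> 1 and 0 -> 2.
adjacency = [[0, 1, 1],
             [0, 0, 0],
             [0, 0, 0]]
# Hypothetical 2-dimensional block embeddings (stand-ins for encoder output).
block_features = [[1.0, 0.0],
                  [0.0, 1.0],
                  [0.0, 3.0]]

hidden = graph_conv(block_features, adjacency)
pooled, block_weights = attention_pool(hidden, query=[0.0, 1.0])

# Two task heads share the pooled representation (multi-task learning).
# Head 1: vulnerable vs. not vulnerable. Head 2: three made-up CWE classes.
detect_probs = softmax(linear(pooled, [[1.0, -1.0], [-1.0, 1.0]]))
cwe_probs = softmax(linear(pooled, [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]))

# The block with the largest attention weight is the one the sketch would
# highlight as contributing most to the prediction.
most_attended_block = max(range(len(block_weights)),
                          key=lambda i: block_weights[i])
```

In this sketch the attention weights double as the explanation signal: they form a distribution over basic blocks, so the highest-weighted block can be shown to an analyst alongside the detection and CWE-type outputs. How the actual model computes and aggregates attention across instructions and blocks is not specified in the abstract.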
Index Terms
- VulANalyzeR: Explainable Binary Vulnerability Detection with Multi-task Learning and Attentional Graph Convolution