research-article

VulANalyzeR: Explainable Binary Vulnerability Detection with Multi-task Learning and Attentional Graph Convolution

Authors:
- Litao Li, School of Computing, Queen’s University, Canada (ORCID 0000-0001-7925-764X)
- Steven H. H. Ding, School of Computing, Queen’s University, Canada (ORCID 0000-0003-4513-200X)
- Yuan Tian, School of Computing, Queen’s University, Canada (ORCID 0000-0002-2208-3893)
- Benjamin C. M. Fung, School of Information Studies, McGill University, Canada (ORCID 0000-0001-8423-2906)
- Philippe Charland, Mission Critical Cyber Security Section, Defence R&D Canada, Canada (ORCID 0000-0003-4051-9942)
- Weihan Ou, School of Computing, Queen’s University, Canada (ORCID 0000-0002-6911-6146)
- Leo Song, School of Computing, Queen’s University, Canada (ORCID 0000-0002-1195-0007)
- Congwei Chen, School of Computing, Queen’s University, Canada (ORCID 0000-0003-4387-4210)

ACM Transactions on Privacy and Security, Volume 26, Issue 3, Article No. 28, pp. 1–25
https://doi.org/10.1145/3585386

Published: 14 April 2023

Abstract

Software vulnerabilities pose serious reliability threats to the general public and to critical infrastructure, and many studies have aimed to detect and mitigate software defects at the binary level. Most standard practices rely on static and dynamic analysis, which carry drawbacks such as heavy manual workload and high complexity. Existing deep learning-based solutions not only struggle to capture the complex relationships among variables in raw binary code but also lack the explainability that humans need to verify, evaluate, and patch the detected bugs.

We propose VulANalyzeR, a deep learning-based model for automated binary vulnerability detection, Common Weakness Enumeration (CWE)-type classification, and root cause analysis to enhance safety and security. VulANalyzeR combines sequential and topological learning, through recurrent units and graph convolution, to simulate how a program is executed. An attention mechanism is integrated throughout the model and shows how different instructions and their corresponding states contribute to the final classification. The model also classifies the specific vulnerability type through multi-task learning, which not only provides further explanation but also allows faster patching of zero-day vulnerabilities. We show that VulANalyzeR outperforms state-of-the-art baselines on vulnerability detection. Additionally, a Common Vulnerabilities and Exposures (CVE) dataset is used to evaluate real, complex vulnerabilities. We conduct case studies to show that VulANalyzeR accurately identifies the instructions and basic blocks that cause a vulnerability, even without being given any prior knowledge of their locations during the training phase.
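To make the combination of graph convolution, attention, and multi-task learning more concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the recurrent encoding of instructions is skipped, basic-block embeddings are assumed to be given, and every name and number (`graph_conv`, `attention_pool`, `multitask_loss`, the toy graph, the weights, and `alpha`) is hypothetical and chosen only for illustration.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def graph_conv(features, edges):
    # One propagation step over a control-flow graph: each basic block
    # averages its own vector with those of its predecessors.
    n = len(features)
    incoming = {i: [i] for i in range(n)}  # self-loop for every block
    for src, dst in edges:
        incoming[dst].append(src)
    dim = len(features[0])
    return [
        [sum(features[j][d] for j in incoming[i]) / len(incoming[i])
         for d in range(dim)]
        for i in range(n)
    ]

def attention_pool(features, query):
    # Score each block against a query vector, normalise the scores, and
    # return the weighted sum plus the weights (the "explanation").
    weights = softmax([dot(f, query) for f in features])
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features))
              for d in range(dim)]
    return pooled, weights

def predict(features, edges, params):
    # Shared representation feeding two task heads.
    hidden = graph_conv(features, edges)
    pooled, weights = attention_pool(hidden, params["query"])
    p_vuln = 1.0 / (1.0 + math.exp(-dot(pooled, params["detect"])))
    p_type = softmax([dot(pooled, w) for w in params["types"]])
    return p_vuln, p_type, weights

def multitask_loss(p_vuln, p_type, y_vuln, y_type, alpha=0.5):
    # Binary cross-entropy for detection plus a weighted cross-entropy
    # for the vulnerability type; alpha is an assumed trade-off weight.
    bce = -(y_vuln * math.log(p_vuln) + (1 - y_vuln) * math.log(1 - p_vuln))
    ce = -math.log(p_type[y_type])
    return bce + alpha * ce

# Toy function with three basic blocks and made-up 2-d embeddings.
blocks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cfg_edges = [(0, 1), (1, 2), (0, 2)]
params = {
    "query": [0.5, 1.5],
    "detect": [1.0, -0.5],
    "types": [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
}

p_vuln, p_type, attn = predict(blocks, cfg_edges, params)
loss = multitask_loss(p_vuln, p_type, y_vuln=1, y_type=2)
print("P(vulnerable):", p_vuln)
print("type distribution:", p_type)
print("block attention:", attn)
print("joint loss:", loss)
```

In this sketch the attention weights sum to one across basic blocks, so the block with the largest weight can be read as the one that contributed most to the prediction. This is the kind of signal the abstract describes for locating the root cause, although the actual model applies attention at more than one level.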


Index Terms

VulANalyzeR: Explainable Binary Vulnerability Detection with Multi-task Learning and Attentional Graph Convolution
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. Security and privacy
  1. Software and application security
    1. Software reverse engineering


Published in
ACM Transactions on Privacy and Security Volume 26, Issue 3
August 2023
640 pages
ISSN: 2471-2566
EISSN: 2471-2574
DOI: 10.1145/3582895
Editor: Ninghui Li, Purdue University, USA
This article was authored by employees of the Government of Canada. As such, the Canadian government retains all interest in the copyright to this work and grants to ACM a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, provided that clear attribution is given both to the authors and the Canadian government agency employing them. Permission to make digital or hard copies for personal or classroom use is granted. Copies must bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the Canadian Government must be honored. To copy otherwise, distribute, republish, or post, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 14 April 2023
- Online AM: 3 March 2023
- Accepted: 15 February 2023
- Received: 16 March 2022
Author Tags
Binary vulnerability detection
multi-task deep learning
attentional GCNN
explainability
Qualifiers
- research-article
