DOI: 10.1145/2810103.2813604
Research article, CCS Conference Proceedings

VCCFinder: Finding Potential Vulnerabilities in Open-Source Projects to Assist Code Audits

Published: 12 October 2015

Abstract

Despite the security community's best efforts, the number of serious vulnerabilities discovered in software is increasing rapidly. In theory, security audits should find and remove the vulnerabilities before the code ever gets deployed. However, due to the enormous amount of code being produced, as well as the lack of manpower and expertise, not all code is sufficiently audited. Thus, many vulnerabilities slip into production systems. A best-practice approach is to use a code metric analysis tool, such as Flawfinder, to flag potentially dangerous code so that it can receive special attention. However, because these tools have a very high false-positive rate, the manual effort needed to find vulnerabilities remains overwhelming. In this paper, we present a new method of finding potentially dangerous code in code repositories with a significantly lower false-positive rate than comparable systems. We combine code-metric analysis with metadata gathered from code repositories to help code review teams prioritize their work. The paper makes three contributions. First, we conducted the first large-scale mapping of CVEs to GitHub commits in order to create a vulnerable commit database. Second, based on this database, we trained an SVM classifier to flag suspicious commits. Compared to Flawfinder, our approach reduces the amount of false alarms by over 99% at the same level of recall. Finally, we present a thorough quantitative and qualitative analysis of our approach and discuss lessons learned from the results. We will share the database as a benchmark for future research and will also provide our analysis tool as a web service.
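The core idea described above, representing each commit jointly by the tokens of its diff and by numeric repository metadata, then training a linear classifier over the combined feature space, can be sketched as follows. This is an illustrative stand-in, not the authors' implementation: the paper embeds features with Sally and trains an SVM with LIBLINEAR, whereas this sketch uses a dependency-free perceptron, and the feature names and toy commits are made up for illustration.

```python
# Sketch of the VCCFinder feature combination: bag-of-words over a commit's
# diff, joined with repository metadata in one sparse vector, fed to a
# linear classifier. A simple perceptron stands in for the paper's SVM so
# the example needs no external libraries.
from collections import Counter

def featurize(commit):
    """Map a commit (diff text + metadata dict) to a sparse feature vector."""
    feats = Counter(commit["diff"].split())       # code tokens from the diff
    for key, value in commit["meta"].items():     # numeric repo metadata
        feats[f"meta:{key}"] = value
    return feats

def train_perceptron(examples, epochs=20):
    """examples: list of (features, label) pairs with label in {+1, -1}."""
    w = Counter()
    for _ in range(epochs):
        for feats, label in examples:
            score = sum(w[k] * v for k, v in feats.items())
            if label * score <= 0:                # misclassified: update
                for k, v in feats.items():
                    w[k] += label * v
    return w

def predict(w, feats):
    return 1 if sum(w[k] * v for k, v in feats.items()) > 0 else -1

# Toy data: one "vulnerability-contributing" commit, one benign commit.
vcc = {"diff": "strcpy ( buf , input ) ;", "meta": {"additions": 120, "past_commits": 1}}
ok  = {"diff": 'printf ( "hello" ) ;',     "meta": {"additions": 3,   "past_commits": 200}}
data = [(featurize(vcc), 1), (featurize(ok), -1)]
w = train_perceptron(data)
print(predict(w, featurize(vcc)), predict(w, featurize(ok)))  # → 1 -1
```

The joint vector is what distinguishes the approach from a pure code-metric tool: a risky-looking token like `strcpy` and a metadata signal like a first-time contributor reinforce each other in a single linear decision.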



Published In

CCS '15: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security
October 2015
1750 pages
ISBN:9781450338325
DOI:10.1145/2810103
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. machine learning
  2. static analysis
  3. vulnerabilities

Conference

CCS '15

Acceptance Rates

CCS '15 paper acceptance rate: 128 of 660 submissions (19%)
Overall acceptance rate: 1,261 of 6,999 submissions (18%)

Article Metrics

  • Downloads (last 12 months): 247
  • Downloads (last 6 weeks): 23
Reflects downloads up to 19 Feb 2025

Cited By
  • (2025) PatchView: Multi-modality detection of security patches. Computers & Security, 151:104356. DOI: 10.1016/j.cose.2025.104356. Online publication date: Apr 2025.
  • (2024) Source Code Vulnerability Analysis Using GPT-2. In Redefining Security With Cyber AI, pages 161--182. DOI: 10.4018/979-8-3693-6517-5.ch009. Online publication date: 17 Jul 2024.
  • (2024) Towards a Block-Level Conformer-Based Python Vulnerability Detection. Software, 3(3):310--327. DOI: 10.3390/software3030016. Online publication date: 31 Jul 2024.
  • (2024) A Comparative Study of Commit Representations for JIT Vulnerability Prediction. Computers, 13(1):22. DOI: 10.3390/computers13010022. Online publication date: 11 Jan 2024.
  • (2024) Deep learning trends and future perspectives of web security and vulnerabilities. Journal of High Speed Networks, 30(1):115--146. DOI: 10.3233/JHS-230037. Online publication date: 1 Jan 2024.
  • (2024) VF-Detector. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, pages 5817--5825. DOI: 10.24963/ijcai.2024/643. Online publication date: 3 Aug 2024.
  • (2024) Codesentry: Revolutionizing Real-Time Software Vulnerability Detection With Optimized GPT Framework. Land Forces Academy Review, 29(1):98--107. DOI: 10.2478/raft-2024-0010. Online publication date: 28 Feb 2024.
  • (2024) A Systematic Literature Review on Automated Software Vulnerability Detection Using Machine Learning. ACM Computing Surveys, 57(3):1--36. DOI: 10.1145/3699711. Online publication date: 11 Nov 2024.
  • (2024) Toward Declarative Auditing of Java Software for Graceful Exception Handling. In Proceedings of the 21st ACM SIGPLAN International Conference on Managed Programming Languages and Runtimes, pages 90--97. DOI: 10.1145/3679007.3685057. Online publication date: 13 Sep 2024.
  • (2024) PatchFinder: A Two-Phase Approach to Security Patch Tracing for Disclosed Vulnerabilities in Open-Source Software. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 590--602. DOI: 10.1145/3650212.3680305. Online publication date: 11 Sep 2024.
