DOI: 10.1145/2810103.2813604
Research article, CCS Conference Proceedings

VCCFinder: Finding Potential Vulnerabilities in Open-Source Projects to Assist Code Audits

Published: 12 October 2015

Abstract

Despite the security community's best efforts, the number of serious vulnerabilities discovered in software is increasing rapidly. In theory, security audits should find and remove the vulnerabilities before the code ever gets deployed. However, due to the enormous amount of code being produced, as well as the lack of manpower and expertise, not all code is sufficiently audited. Thus, many vulnerabilities slip into production systems. A best-practice approach is to use a code metric analysis tool, such as Flawfinder, to flag potentially dangerous code so that it can receive special attention. However, because these tools have a very high false-positive rate, the manual effort needed to find vulnerabilities remains overwhelming. In this paper, we present a new method of finding potentially dangerous code in code repositories with a significantly lower false-positive rate than comparable systems. We combine code-metric analysis with metadata gathered from code repositories to help code review teams prioritize their work. The paper makes three contributions. First, we conducted the first large-scale mapping of CVEs to GitHub commits in order to create a vulnerable commit database. Second, based on this database, we trained an SVM classifier to flag suspicious commits. Compared to Flawfinder, our approach reduces the amount of false alarms by over 99% at the same level of recall. Finally, we present a thorough quantitative and qualitative analysis of our approach and discuss lessons learned from the results. We will share the database as a benchmark for future research and will also provide our analysis tool as a web service.
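The core idea described above, representing each commit jointly by the tokens of its diff and by numeric repository metadata, then training a linear classifier over the combined feature space, can be sketched as follows. This is an illustrative stand-in, not the authors' implementation: the paper embeds features with Sally and trains an SVM with LIBLINEAR, whereas this sketch uses a dependency-free perceptron, and the feature names and toy commits are made up for illustration.

```python
# Sketch of the VCCFinder feature combination: bag-of-words over a commit's
# diff, joined with repository metadata in one sparse vector, fed to a
# linear classifier. A simple perceptron stands in for the paper's SVM so
# the example needs no external libraries.
from collections import Counter

def featurize(commit):
    """Map a commit (diff text + metadata dict) to a sparse feature vector."""
    feats = Counter(commit["diff"].split())       # code tokens from the diff
    for key, value in commit["meta"].items():     # numeric repo metadata
        feats[f"meta:{key}"] = value
    return feats

def train_perceptron(examples, epochs=20):
    """examples: list of (features, label) pairs with label in {+1, -1}."""
    w = Counter()
    for _ in range(epochs):
        for feats, label in examples:
            score = sum(w[k] * v for k, v in feats.items())
            if label * score <= 0:                # misclassified: update
                for k, v in feats.items():
                    w[k] += label * v
    return w

def predict(w, feats):
    return 1 if sum(w[k] * v for k, v in feats.items()) > 0 else -1

# Toy data: one "vulnerability-contributing" commit, one benign commit.
vcc = {"diff": "strcpy ( buf , input ) ;", "meta": {"additions": 120, "past_commits": 1}}
ok  = {"diff": 'printf ( "hello" ) ;',     "meta": {"additions": 3,   "past_commits": 200}}
data = [(featurize(vcc), 1), (featurize(ok), -1)]
w = train_perceptron(data)
print(predict(w, featurize(vcc)), predict(w, featurize(ok)))  # → 1 -1
```

The joint vector is what distinguishes the approach from a pure code-metric tool: a risky-looking token like `strcpy` and a metadata signal like a first-time contributor reinforce each other in a single linear decision.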



Published In

CCS '15: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security
October 2015
1750 pages
ISBN:9781450338325
DOI:10.1145/2810103
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. machine learning
  2. static analysis
  3. vulnerabilities

Conference

CCS '15

Acceptance Rates

CCS '15 paper acceptance rate: 128 of 660 submissions (19%)
Overall acceptance rate: 1,261 of 6,999 submissions (18%)

Article Metrics

  • Downloads (last 12 months): 247
  • Downloads (last 6 weeks): 23
Reflects downloads up to 19 Feb 2025

Cited By
  • (2025) PatchView: Multi-modality detection of security patches. Computers & Security, 151:104356. DOI: 10.1016/j.cose.2025.104356. Online publication date: Apr 2025.
  • (2024) Source Code Vulnerability Analysis Using GPT-2. In Redefining Security With Cyber AI, pages 161--182. DOI: 10.4018/979-8-3693-6517-5.ch009. Online publication date: 17 Jul 2024.
  • (2024) Towards a Block-Level Conformer-Based Python Vulnerability Detection. Software, 3(3):310--327. DOI: 10.3390/software3030016. Online publication date: 31 Jul 2024.
  • (2024) A Comparative Study of Commit Representations for JIT Vulnerability Prediction. Computers, 13(1):22. DOI: 10.3390/computers13010022. Online publication date: 11 Jan 2024.
  • (2024) Deep learning trends and future perspectives of web security and vulnerabilities. Journal of High Speed Networks, 30(1):115--146. DOI: 10.3233/JHS-230037. Online publication date: 1 Jan 2024.
  • (2024) VF-Detector. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, pages 5817--5825. DOI: 10.24963/ijcai.2024/643. Online publication date: 3 Aug 2024.
  • (2024) Codesentry: Revolutionizing Real-Time Software Vulnerability Detection With Optimized GPT Framework. Land Forces Academy Review, 29(1):98--107. DOI: 10.2478/raft-2024-0010. Online publication date: 28 Feb 2024.
  • (2024) A Systematic Literature Review on Automated Software Vulnerability Detection Using Machine Learning. ACM Computing Surveys, 57(3):1--36. DOI: 10.1145/3699711. Online publication date: 11 Nov 2024.
  • (2024) Toward Declarative Auditing of Java Software for Graceful Exception Handling. In Proceedings of the 21st ACM SIGPLAN International Conference on Managed Programming Languages and Runtimes, pages 90--97. DOI: 10.1145/3679007.3685057. Online publication date: 13 Sep 2024.
  • (2024) PatchFinder: A Two-Phase Approach to Security Patch Tracing for Disclosed Vulnerabilities in Open-Source Software. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 590--602. DOI: 10.1145/3650212.3680305. Online publication date: 11 Sep 2024.
