DOI: 10.1145/3134600.3134646
research-article

TTPDrill: Automatic and Accurate Extraction of Threat Actions from Unstructured Text of CTI Sources

Published: 04 December 2017

Abstract

With the rapid growth of cyber attacks, sharing cyber threat intelligence (CTI) has become essential to identify and respond to cyber attacks in a timely and cost-effective manner. However, with the lack of standard languages and automated analytics for cyber threat information, analyzing the complex and unstructured text of CTI reports is extremely time- and labor-consuming. Without addressing this challenge, CTI sharing will be highly impractical, and attack uncertainty and time-to-defend will continue to increase.
Considering the high volume and speed of CTI sharing, our aim in this paper is to develop automated and context-aware analytics of cyber threat intelligence that accurately learn attack patterns (TTPs) from commonly available CTI sources, so that cyber defense actions can be implemented in a timely manner. Our paper has three key contributions. First, it presents a novel threat-action ontology that is sufficiently rich to understand the specifications and context of malicious actions. Second, we developed a novel text mining approach that combines enhanced techniques of Natural Language Processing (NLP) and Information Retrieval (IR) to extract threat actions based on semantic (rather than syntactic) relationships. Third, our CTI analysis can construct a complete attack pattern by mapping each threat action to the appropriate techniques, tactics, and kill chain phases, and translating it into any threat-sharing standard, such as STIX 2.1. Our CTI analytic techniques were implemented in a tool called TTPDrill and evaluated using a randomly selected set of Symantec Threat Reports. Our evaluation tests show that TTPDrill achieves more than 82% precision and recall in a variety of measures, which is very reasonable for this problem domain.
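
The mapping step described in the third contribution can be illustrated with a small sketch. This is not TTPDrill's implementation: the abstract does not name a scoring function, so BM25 is used here purely as an assumed stand-in for the IR component, and the ontology rows, field names (`OntologyEntry`, `kill_chain_phase`), and the helper `map_threat_action` are hypothetical. It assumes the NLP stage has already produced a (verb, object) pair for each candidate threat action.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class OntologyEntry:
    """One row of a toy threat-action ontology (illustrative values only)."""
    action: str            # verb describing the malicious action
    obj: str               # object the action is applied to
    technique: str
    tactic: str
    kill_chain_phase: str

    def tokens(self):
        return f"{self.action} {self.obj}".lower().split()


# Hypothetical mini-ontology; labels are examples, not an authoritative mapping.
ONTOLOGY = [
    OntologyEntry("modify", "registry run key", "Registry Run Keys",
                  "Persistence", "install"),
    OntologyEntry("send", "spearphishing email attachment",
                  "Spearphishing Attachment", "Initial Access", "deliver"),
    OntologyEntry("capture", "keystrokes", "Input Capture",
                  "Collection", "act on objectives"),
]


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized doc against the tokenized query with BM25."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: in how many docs does each term occur?
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def map_threat_action(verb, obj):
    """Return the best-matching ontology entry, or None if nothing matches."""
    query = f"{verb} {obj}".lower().split()
    docs = [entry.tokens() for entry in ONTOLOGY]
    scores = bm25_scores(query, docs)
    best = max(range(len(docs)), key=scores.__getitem__)
    return ONTOLOGY[best] if scores[best] > 0 else None


if __name__ == "__main__":
    match = map_threat_action("modify", "registry")
    if match is not None:
        print(match.technique, match.tactic, match.kill_chain_phase)
```

Because this sketch matches on exact tokens only, it is closer to the syntactic matching the abstract contrasts itself with; a semantic variant would at least expand the query with synonyms before scoring.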

    Published In

    ACSAC '17: Proceedings of the 33rd Annual Computer Security Applications Conference
    December 2017
    618 pages
    ISBN:9781450353458
    DOI:10.1145/3134600
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ACSAC 2017

    Acceptance Rates

    Overall Acceptance Rate 104 of 497 submissions, 21%

    Article Metrics

    • Downloads (Last 12 months): 422
    • Downloads (Last 6 weeks): 31
    Reflects downloads up to 22 Feb 2025

    Cited By

    • (2025) AT4CTIRE: Adversarial Training for Cyber Threat Intelligence Relation Extraction. Electronics 14:2 (324). DOI: 10.3390/electronics14020324. Online publication date: 15-Jan-2025
    • (2025) DeepOP: A Hybrid Framework for MITRE ATT&CK Sequence Prediction via Deep Learning and Ontology. Electronics 14:2 (257). DOI: 10.3390/electronics14020257. Online publication date: 9-Jan-2025
    • (2025) Labeling Network Intrusion Detection System (NIDS) Rules with MITRE ATT&CK Techniques: Machine Learning vs. Large Language Models. Big Data and Cognitive Computing 9:2 (23). DOI: 10.3390/bdcc9020023. Online publication date: 26-Jan-2025
    • (2025) A multi-source log semantic analysis-based attack investigation approach. Computers & Security 150 (104303). DOI: 10.1016/j.cose.2024.104303. Online publication date: Mar-2025
    • (2025) AECR: Automatic attack technique intelligence extraction based on fine-tuned large language model. Computers & Security 150 (104213). DOI: 10.1016/j.cose.2024.104213. Online publication date: Mar-2025
    • (2025) Hyper attack graph. Computers and Security 149:C. DOI: 10.1016/j.cose.2024.104194. Online publication date: 1-Feb-2025
    • (2025) TIMFuser: A multi-granular fusion framework for cyber threat intelligence. Computers & Security 148 (104141). DOI: 10.1016/j.cose.2024.104141. Online publication date: Jan-2025
    • (2025) RAF-AG. Computers and Security 148:C. DOI: 10.1016/j.cose.2024.104125. Online publication date: 1-Jan-2025
    • (2024) SMET: Semantic mapping of CTI reports and CVE to ATT&CK for advanced threat intelligence. Journal of Computer Security (1-20). DOI: 10.3233/JCS-230218. Online publication date: 28-Jun-2024
    • (2024) Unstructured Big Data Threat Intelligence Parallel Mining Algorithm. Big Data Mining and Analytics 7:2 (531-546). DOI: 10.26599/BDMA.2023.9020032. Online publication date: Jun-2024
