DeepDGA: Adversarially-Tuned Domain Generation and Detection

Authors:

Hyrum S. Anderson,

Jonathan Woodbridge,

Bobby Filar

AISec '16: Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security

Pages 13 - 21

https://doi.org/10.1145/2996758.2996767

Published: 28 October 2016

Abstract

Many malware families use domain generation algorithms (DGAs) to establish command and control (C&C) connections. While there are many methods to pseudorandomly generate domains, we focus in this paper on detecting (and generating) domains on a per-domain basis, which provides a simple and flexible means to detect known DGA families. Recent machine learning approaches to DGA detection have been successful on fairly simplistic DGAs, many of which produce names of fixed length. However, models trained on limited datasets are somewhat blind to new DGA variants. In this paper, we leverage the concept of generative adversarial networks to construct a deep-learning-based DGA that is designed to intentionally bypass a deep-learning-based detector. In a series of adversarial rounds, the generator learns to generate domain names that are increasingly difficult to detect. In turn, a detector model updates its parameters to compensate for the adversarially generated domains. We test the hypothesis that adversarially generated domains may be used to augment training sets in order to harden other machine learning models against yet-to-be-observed DGAs. We detail solutions to several challenges in training this character-based generative adversarial network. In particular, our deep learning architecture begins as a domain name auto-encoder (encoder + decoder) trained on domains in the Alexa one million. The encoder and decoder are then reassembled competitively in a generative adversarial network (detector + generator), with novel neural architectures and training strategies to improve convergence.
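To make the notions of a pseudorandom DGA and per-domain scoring concrete, the following is a minimal sketch under stated assumptions. The function names `toy_dga` and `char_entropy`, the seed string, the SHA-256 construction, and the `.com` suffix are all hypothetical choices made for illustration. None of them come from the paper, whose generator and detector are deep learning models rather than a hash-based generator and an entropy heuristic.

```python
import hashlib
import math
from collections import Counter
from datetime import date
from typing import List


def toy_dga(seed: str, day: date, count: int = 5, length: int = 12) -> List[str]:
    """Derive `count` pseudorandom domain names from a shared seed and a date.

    Hypothetical toy generator: both endpoints can recompute the same list
    because the output depends only on (seed, day, index).
    """
    domains = []
    for i in range(count):
        material = f"{seed}|{day.isoformat()}|{i}".encode()
        digest = hashlib.sha256(material).digest()  # 32 bytes
        # Map digest bytes to lowercase letters; names have a fixed length,
        # like the "fairly simplistic" DGAs mentioned in the abstract.
        label = "".join(chr(ord("a") + b % 26) for b in digest[:length])
        domains.append(label + ".com")
    return domains


def char_entropy(domain: str) -> float:
    """Shannon entropy, in bits per character, of the leftmost label.

    A crude per-domain score (an assumption for illustration): each name is
    scored on its own, with no traffic context or grouping of domains.
    """
    label = domain.split(".")[0]
    counts = Counter(label)
    total = len(label)
    return -sum(c / total * math.log2(c / total) for c in counts.values())
```

Calling `toy_dga("example-seed", date(2016, 10, 28))` yields the same five names on every run, and `char_entropy` can then be applied to each name independently. A score of this kind is only a stand-in for a learned detector; it illustrates why a generator tuned against the detector could produce names that a fixed heuristic would miss.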

Index Terms

DeepDGA: Adversarially-Tuned Domain Generation and Detection
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Natural language generation
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
2. Security and privacy
  1. Network security

Published In

AISec '16: Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security

October 2016

144 pages

ISBN:9781450345736

DOI:10.1145/2996758

Program Chairs:
David Mandell Freeman (LinkedIn Corporation, USA),
Aikaterini Mitrokotsa (Chalmers University of Technology, Sweden),
Arunesh Sinha (University of Michigan, USA)

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGSAC: ACM Special Interest Group on Security, Audit, and Control

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

CCS'16

Sponsor:

SIGSAC

CCS'16: 2016 ACM SIGSAC Conference on Computer and Communications Security

October 28, 2016

Vienna, Austria

Acceptance Rates

AISec '16 Paper Acceptance Rate 12 of 38 submissions, 32%;

Overall Acceptance Rate 94 of 231 submissions, 41%
