ABSTRACT
In this position paper, we argue that, to be of practical interest, a machine-learning-based security system must engage with its human operators beyond feature engineering and instance labeling in order to address the challenge of drift in adversarial environments. We propose that designers of such systems broaden the classification goal into an explanatory goal, which would deepen the interaction with the system's operators.
To provide guidance, we advocate an approach based on maintaining one classifier for each class of unwanted activity to be filtered. We also emphasize that the system must be responsive to the operators' constant curation of the training set. We show how this paradigm provides a property we call isolation, and how that property relates to classical causative attacks.
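The per-class design can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the authors' system. The abstract does not name a learning algorithm, so a plain perceptron stands in for each per-class classifier, and the names `PerClassFilter`, `curate`, and `explain`, as well as the toy features, are our own. Each class of unwanted activity gets its own binary model trained against a shared benign set. Replacing one class's training set retrains only that model, which is the isolation property in miniature. `explain` reports which classes fired instead of returning a single malicious/benign bit.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # sparse feature vector: feature name -> value
BIAS = "__bias__"  # reserved key holding the perceptron's bias term


def score(w: Vector, x: Vector) -> float:
    """Linear score w.x + b; features absent from w contribute nothing."""
    return sum(w.get(f, 0.0) * v for f, v in x.items()) + w.get(BIAS, 0.0)


def train_perceptron(examples: List[Tuple[Vector, int]], epochs: int = 10) -> Vector:
    """Binary perceptron (a stand-in learner); labels are +1 (this class) or -1 (benign)."""
    w: Dict[str, float] = defaultdict(float)
    for _ in range(epochs):
        for x, y in examples:
            if y * score(w, x) <= 0:  # misclassified or on the boundary: update
                for f, v in x.items():
                    w[f] += y * v
                w[BIAS] += y
    return dict(w)


class PerClassFilter:
    """Hypothetical sketch: one binary classifier per class of unwanted activity."""

    def __init__(self, benign: List[Vector]):
        self.benign = benign
        self.training: Dict[str, List[Vector]] = {}  # class -> operator-curated examples
        self.models: Dict[str, Vector] = {}

    def curate(self, cls: str, examples: List[Vector]) -> None:
        """Operator replaces one class's training set; only that class's model is retrained."""
        self.training[cls] = list(examples)
        data = [(x, +1) for x in examples] + [(x, -1) for x in self.benign]
        self.models[cls] = train_perceptron(data)

    def explain(self, x: Vector) -> List[str]:
        """Return every class whose classifier fires, not a single malicious/benign bit."""
        return sorted(c for c, w in self.models.items() if score(w, x) > 0)


# Usage with toy, made-up features.
detector = PerClassFilter(benign=[{"hello": 1.0}, {"meeting": 1.0}])
detector.curate("spam", [{"viagra": 1.0}])
detector.curate("phishing", [{"password": 1.0}])
phishing_before = dict(detector.models["phishing"])

# Curating (or poisoning) the spam training set cannot move the phishing model.
detector.curate("spam", [{"viagra": 1.0}, {"lottery": 1.0}])
assert detector.models["phishing"] == phishing_before
print(detector.explain({"password": 1.0}))  # ['phishing']
print(detector.explain({"lottery": 1.0}))   # ['spam']
```

In this sketch the benign set is shared, so edits to it still reach every classifier; isolation here covers only the per-class sets of unwanted activity.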
To demonstrate the effects of drift on a binary classification task, we also report on two experiments using a previously unpublished malware data set in which each instance is timestamped according to when it was seen.
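Timestamps are what make a drift experiment possible: they allow training only on the past and testing on the future. The sketch below is a hypothetical illustration, not the authors' experimental protocol, which the abstract does not describe. It assumes a `Sample` record holding a seen-at timestamp and a binary label, and it uses a toy synthetic timeline rather than the paper's data set. `temporal_split` trains only on instances seen before a cutoff, while `future_leakage` measures how much of a conventional shuffled split's training set postdates its test set.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Sample:
    seen_at: int     # time the instance was seen (hypothetical units)
    malicious: bool  # binary label: malware vs. benign


def temporal_split(samples: List[Sample], cutoff: int) -> Tuple[List[Sample], List[Sample]]:
    """Train on instances seen before `cutoff`; test on instances seen at or after it."""
    train = [s for s in samples if s.seen_at < cutoff]
    test = [s for s in samples if s.seen_at >= cutoff]
    return train, test


def random_split(samples: List[Sample], test_fraction: float,
                 seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Conventional shuffled split that ignores timestamps."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def future_leakage(train: List[Sample], test: List[Sample]) -> float:
    """Fraction of training instances seen after the oldest test instance."""
    if not train or not test:
        return 0.0
    oldest_test = min(s.seen_at for s in test)
    return sum(s.seen_at > oldest_test for s in train) / len(train)


# Toy, synthetic timeline (not the paper's data set): one instance per time step.
samples = [Sample(seen_at=t, malicious=(t % 3 == 0)) for t in range(100)]

t_train, t_test = temporal_split(samples, cutoff=80)
r_train, r_test = random_split(samples, test_fraction=0.2)
print(len(t_train), len(t_test), future_leakage(t_train, t_test))  # 80 20 0.0
print(future_leakage(r_train, r_test) > 0)  # the shuffled split trains on the "future"
```

Under a shuffled split the classifier effectively learns from instances it would not yet have seen in deployment, which tends to mask drift; the temporal split removes that leakage by construction.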
Index Terms
- Approaches to adversarial drift