ABSTRACT
In adversarial systems, the performance of a classifier degrades after deployment as the adversary learns to defeat it. Adversarial data mining was recently introduced to address this problem by viewing classification as a game between an adversary and an intelligent, adaptive classifier. In recent years, phishing fraud through malicious email messages has become a serious threat to global security and the economy, and traditional spam filtering techniques have proven ineffective against it. For this domain, a game-theoretic data mining framework based on dynamic games of incomplete information is proposed to build an adversary-aware classifier for phishing fraud detection. The classifier is an online version of Weighted Margin Support Vector Machines with a game-theoretic prior knowledge function. A new content-based feature extraction technique for phishing filtering is also described. Experiments show that the proposed classifier is highly competitive with previously proposed online classification algorithms in this adversarial environment, and that traditional machine learning techniques applied to the extracted features also yield promising results.
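As a rough illustration of what an online weighted-margin linear classifier might look like, the sketch below runs sub-gradient descent on a hinge loss in which each example carries a confidence value. It is not the paper's algorithm: the function names, the learning rate `lr`, the regularization constant `reg`, and the way the confidence `v` scales both the margin target and the update size are all assumptions made for this example. In the setting described above, `v` would come from the game-theoretic prior knowledge function; here the caller simply supplies it.

```python
from typing import List, Sequence, Tuple

# Hypothetical example type: (features, label in {-1, +1}, confidence in (0, 1]).
Example = Tuple[List[float], int, float]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def weighted_margin_online(stream: Sequence[Example],
                           lr: float = 0.1,
                           reg: float = 0.01) -> List[float]:
    """Single pass of sub-gradient descent on a confidence-weighted hinge loss.

    Assumption for this sketch: an example with confidence v must satisfy
    y * <w, x> >= v, and a violation is corrected with a step scaled by v.
    """
    if not stream:
        return []
    w = [0.0] * len(stream[0][0])
    for x, y, v in stream:
        score = dot(w, x)  # score under the weights before this step
        # L2 shrinkage, applied at every step.
        w = [wi * (1.0 - lr * reg) for wi in w]
        # Correct the weights only when the confidence-scaled margin is violated.
        if y * score < v:
            w = [wi + lr * v * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 (e.g. phishing) or -1 (e.g. legitimate) from the sign of the score."""
    return 1 if dot(w, x) >= 0 else -1


# Toy stream: feature 0 indicates the positive class, feature 1 the negative class.
toy_stream = [([1.0, 0.0], 1, 1.0), ([0.0, 1.0], -1, 0.5)] * 20
toy_weights = weighted_margin_online(toy_stream)
```

Low-confidence examples demand a smaller margin and move the weights less, which is one simple way for prior knowledge about the adversary to influence an online learner.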
Index Terms
Online phishing classification using adversarial data mining and signaling games