DOI: 10.1145/1599272.1599279 (ACM, KDD conference proceedings)
research-article

Online phishing classification using adversarial data mining and signaling games

Published: 28 June 2009

ABSTRACT

In adversarial systems, the performance of a classifier degrades after deployment as the adversary learns to defeat it. Adversarial data mining was recently introduced to address this problem: classification is viewed as a game between an adversary and an intelligent, adaptive classifier. In recent years, phishing fraud through malicious email messages has become a serious threat to global security and the economy, and traditional spam filtering techniques have proven ineffective against it. For this domain, a game-theoretic data mining framework based on dynamic games of incomplete information is proposed in order to build an adversary-aware classifier for phishing fraud detection. The classifier is an online version of the Weighted Margin Support Vector Machine with a game-theoretic prior knowledge function. A new content-based feature extraction technique for phishing filtering is also described. Experiments show that the proposed classifier is highly competitive with previously proposed online classification algorithms in this adversarial environment, and that promising results were obtained using traditional machine learning techniques over the extracted features.
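
To make the idea of an online weighted-margin classifier concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a linear model trained by Pegasos-style stochastic subgradient steps on a weighted hinge loss `v * max(0, v - y * (w . x))`, where the per-example confidence `v` in (0, 1] is a hypothetical stand-in for the game-theoretic prior knowledge function, which the abstract does not specify. The function names and the parameter `lam` are illustrative assumptions.

```python
def train_online_weighted_margin(stream, dim, lam=0.01):
    """Process (x, y, v) triples one at a time and return a weight vector.

    x: list of floats (features), y: +1 or -1 (label),
    v: confidence in (0, 1]; an assumed placeholder for a prior
       knowledge function, not taken from the paper.
    """
    w = [0.0] * dim
    for t, (x, y, v) in enumerate(stream, start=1):
        eta = 1.0 / (lam * t)  # decaying step size
        score = sum(wi * xi for wi, xi in zip(w, x))
        # Regularization step: shrink the current weights.
        w = [(1.0 - eta * lam) * wi for wi in w]
        # Weighted-margin hinge: update only when the margin y * score
        # falls below the confidence-scaled target v.
        if y * score < v:
            w = [wi + eta * v * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 according to the sign of the linear score."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score >= 0 else -1


# Toy stream: two separable classes, all examples fully trusted (v = 1.0).
toy_stream = [([1.0, 0.0], 1, 1.0), ([0.0, 1.0], -1, 1.0)] * 100
weights = train_online_weighted_margin(toy_stream, dim=2)
```

In this sketch a low-confidence example both demands a smaller margin and contributes a smaller update, which is one simple way a prior over message types could influence an online learner.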

