DOI: 10.1145/3175684.3175690
research-article

Advanced Phishing Filter Using Autoencoder and Denoising Autoencoder

Published: 20 December 2017

Abstract

Phishing refers to an attempt to obtain sensitive information, such as usernames, passwords, and credit card details (and, indirectly, money), for malicious purposes by disguising oneself as a trustworthy entity in an electronic communication [1]. Hackers and malicious users often use emails as phishing tools to obtain the personal data of legitimate users, sending emails that carry authentic identities and legitimate content but also malicious URLs, which help them steal consumers' data. The high-dimensional data in the phishing context contains a large number of redundant features that significantly increase the classification error. Additionally, the time required to perform classification grows with the number of features. Extracting complex features from phishing emails therefore requires us to determine which features are relevant and fundamental to phishing detection. The dominant approaches to phishing detection are based on machine learning techniques that rely on manual feature engineering, which is time-consuming. Deep learning, on the other hand, is a promising alternative to these traditional methods: its main idea is to learn complex features from data with minimal external contribution [2]. In this paper, we propose a new phishing detection and prevention approach. First, it builds on our previous spam filter [3] to classify the textual content of an email. Second, it uses an Autoencoder and a Denoising Autoencoder (DAE) to extract a relevant and robust feature set from the URL (the address to which the website actually directs), so that the feature space can be reduced considerably, thereby decreasing the phishing detection time.
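The abstract does not specify the network architecture, the corruption process, or the training procedure. Purely as an illustration, the following NumPy sketch trains a one-hidden-layer denoising autoencoder on binary URL features, assuming masking noise (each input zeroed with probability `noise`) and a cross-entropy reconstruction loss in the style of Vincent et al. [27]; setting `noise=0.0` reduces it to a plain autoencoder. The function names (`train_dae`, `encode`, `reconstruction_loss`), the hyperparameters, and the synthetic data are hypothetical and are not the authors' implementation. The hidden-layer activations returned by `encode` play the role of the reduced feature set that a downstream classifier would consume.

```python
import numpy as np


def sigmoid(z):
    """Logistic activation, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def encode(params, X):
    """Map d-dimensional feature vectors to the n_hidden-dimensional code."""
    W1, b1, _, _ = params
    return sigmoid(X @ W1 + b1)


def reconstruction_loss(params, X):
    """Mean binary cross-entropy between clean inputs X and their reconstructions."""
    _, _, W2, b2 = params
    R = sigmoid(encode(params, X) @ W2 + b2)
    eps = 1e-12  # guards against log(0)
    return -np.mean(np.sum(X * np.log(R + eps) + (1 - X) * np.log(1 - R + eps), axis=1))


def train_dae(X, n_hidden, noise=0.3, lr=0.5, epochs=2000, seed=0):
    """Full-batch gradient descent for a one-hidden-layer denoising autoencoder.

    `noise` is the probability of zeroing each input feature (masking noise);
    noise=0.0 gives a plain autoencoder. The target is always the clean X.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, size=(n_hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        X_tilde = X * (rng.random(X.shape) >= noise)  # corrupted copy of X
        H = sigmoid(X_tilde @ W1 + b1)                # encoder output (code)
        R = sigmoid(H @ W2 + b2)                      # decoder output (reconstruction)
        dZ2 = (R - X) / n                             # d(loss)/d(decoder pre-activation)
        dZ1 = (dZ2 @ W2.T) * H * (1.0 - H)            # backpropagate through the encoder
        W2 -= lr * (H.T @ dZ2)
        b2 -= lr * dZ2.sum(axis=0)
        W1 -= lr * (X_tilde.T @ dZ1)
        b1 -= lr * dZ1.sum(axis=0)
    return W1, b1, W2, b2


# Hypothetical data: 12 redundant binary "URL features", each a copy of one of 3 latent bits.
rng = np.random.default_rng(1)
latent = rng.integers(0, 2, size=(200, 3))
X = np.repeat(latent, 4, axis=1).astype(float)

before = reconstruction_loss(train_dae(X, n_hidden=3, epochs=0), X)  # untrained weights
params = train_dae(X, n_hidden=3, noise=0.3)
after = reconstruction_loss(params, X)
Z = encode(params, X)  # reduced 3-dimensional feature set, shape (200, 3)
print(f"loss before: {before:.3f}, after: {after:.3f}, code shape: {Z.shape}")
```

Because each of the 12 synthetic features is a copy of one of 3 latent bits, a 3-unit code is in principle sufficient to reconstruct them. The masking noise forces the encoder to recover a zeroed feature from its redundant copies, which is the kind of robustness the abstract attributes to the DAE.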

References

[1]
Z. Ramzan. Phishing attacks and countermeasures. In Handbook of Information and Communication Security, Springer, 2010. ISBN 9783642041174.
[2]
Y. Bengio. Learning Deep Architectures for AI. Foundations and Trends® in Machine Learning, Vol. 2, No. 1, pp. 1–127, 2009.
[3]
S. Douzi, M. Amar, B. El Ouahidi, H. Laanaya. Towards a New Spam Filter Based on PV-DM (Paragraph Vector-Distributed Memory Approach). Procedia Computer Science, Vol. 110, pp. 486–491, 2017.
[4]
E. El-Alfy, R. Abdel-Aal. Using GMDH-based networks for improved spam detection and e-mail feature analysis. Applied Soft Computing, Vol. 11, No. 1, pp. 477–488, 2011.
[5]
I. Fette, N. Sadeh, A. Tomasic. Learning to Detect Phishing Emails. In International World Wide Web Conference, pp. 649–656, 2007.
[6]
APWG. Phishing Activity Trends Report, 4th Quarter 2016. http://www.apwg.org/resources/apwg-reports/.
[7]
Z. Ramzan, C. Wüest. Phishing Attacks: Analyzing Trends in 2006. In Fourth Conference on Email and Anti-Spam, Mountain View, 2007.
[8]
G. Aaron. The state of phishing. Computer Fraud & Security, 2010(6), pp. 5–8, 2010.
[9]
S. Shivaji, E. J. Whitehead, R. Akella, S. Kim. Reducing features to improve bug prediction. In IEEE/ACM International Conference on Automated Software Engineering, pp. 600–604, 2009.
[10]
K. El-Khatib. Impact of feature reduction on the efficiency of wireless intrusion detection systems. IEEE Transactions on Parallel and Distributed Systems, Vol. 21, No. 8, pp. 1143–1149, 2010.
[11]
R. B. Basnet et al. Learning to Detect Phishing URLs. International Journal of Research in Engineering and Technology, Vol. 3, No. 6, June 2014.
[12]
S. Sheng, B. Wardman, G. Warner, L. Cranor, J. Hong, C. Zhang. An empirical analysis of phishing blacklists. In Proceedings of CEAS '09, 2009.
[13]
R. Basnet, S. Mukkamala, A. Sung. Detection of phishing attacks: a machine learning approach. In Soft Computing Applications in Industry, pp. 373–383, 2008.
[14]
C. K. Olivo et al. Obtaining the threat model for e-mail phishing. Applied Soft Computing, 2011.
[15]
J. Chen, C. Guo. Online detection and prevention of phishing attacks. In Communications and Networking in China, pp. 19–21, 2006.
[16]
J. Yearwood, M. Mammadov, A. Banerjee. Profiling phishing e-mails based on hyperlink information. In International Conference on Advances in Social Networks Analysis and Mining, pp. 120–127, 2010.
[17]
B. Issac, R. Chiong, S. M. Jacob. Analysis of Phishing Attacks and Countermeasures. arXiv (www.arxiv.org), 2006.
[18]
D. Ranganayakulu, C. Chellappan. Detecting Malicious URLs in E-mail: An Implementation. In 2013 AASRI Conference on Intelligent Systems and Control, AASRI Procedia, Vol. 4, pp. 125–131, 2013.
[19]
V. Santhana Lakshmi, M. S. Vijaya. Efficient prediction of phishing websites using supervised learning algorithms. In International Conference on Communication Technology and System Design 2011, Procedia Engineering, Vol. 30, pp. 798–805, 2012.
[20]
M. Aburrous, M. A. Hossain, K. Dahal, F. Thabtah. Experimental Case Studies for Investigating E-Banking Phishing Techniques and Attack Strategies. Cognitive Computation, Vol. 2, pp. 242–253, 2010.
[21]
D. Cook, V. Gurbani, M. Daniluk. Phishwish: a stateless phishing filter using minimal rules. Lecture Notes in Computer Science, pp. 182–186, 2008.
[22]
Y. Zhang, J. Hong, L. Cranor. CANTINA: a content-based approach to detecting phishing web sites. In Proc. 16th Int. Conf. World Wide Web (WWW '07), Banff, Alberta, Canada, pp. 639–648, 2007.
[23]
F. Zhuang et al. Representation Learning via Semi-supervised Autoencoder for Multi-task Learning. In IEEE International Conference on Data Mining, 2015.
[24]
O. İrsoy, E. Alpaydın. Unsupervised Feature Extraction with Autoencoder Trees. Neurocomputing, 2017.
[25]
H. Shao, H. Jiang, H. Zhao, F. Wang. A novel deep autoencoder feature learning method for rotating machinery fault diagnosis. Mechanical Systems and Signal Processing, Vol. 95, pp. 187–204, 2017.
[26]
J. Deng, Z. Zhang, F. Eyben, B. Schuller. Autoencoder-based Unsupervised Domain Adaptation for Speech Emotion Recognition. IEEE Signal Processing Letters, Vol. 21, No. 9, September 2014.
[27]
P. Vincent et al. Extracting and Composing Robust Features with Denoising Autoencoders. In Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 2008.

Published In

BDIOT '17: Proceedings of the International Conference on Big Data and Internet of Thing
December 2017
251 pages
ISBN:9781450354301
DOI:10.1145/3175684
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Autoencoder
  2. Denoising Autoencoder
  3. Spam-filter
  4. phishing

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

BDIOT2017

Acceptance Rates

Overall Acceptance Rate 75 of 136 submissions, 55%

Article Metrics

  • Downloads (last 12 months): 9
  • Downloads (last 6 weeks): 0
Reflects downloads up to 02 Mar 2025

Cited By

  • (2024) Toward a Hybrid Approach Combining Deep Learning and Case-Based Reasoning for Phishing Email Detection. International Journal on Artificial Intelligence Tools. DOI: 10.1142/S0218213024500155. Online publication date: 28-Jun-2024.
  • (2024) STFL: Utilizing a Semi-Supervised, Transfer-Learning, Federated-Learning Approach to Detect Phishing URL Attacks. 2024 International Joint Conference on Neural Networks (IJCNN), pp. 1-10. DOI: 10.1109/IJCNN60899.2024.10650184. Online publication date: 30-Jun-2024.
  • (2023) A Bibliometric Analysis of Phishing in the Big Data Era. Procedia Computer Science 219:C, pp. 91-98. DOI: 10.1016/j.procs.2023.01.268. Online publication date: 1-Jan-2023.
  • (2023) Autoencoder-Based Architecture for Identification and Mitigating Phishing URL Attack in IoT Using DNN. Journal of The Institution of Engineers (India): Series B 104:6, pp. 1227-1240. DOI: 10.1007/s40031-023-00934-8. Online publication date: 31-Oct-2023.
  • (2022) Phishing Email Detection Using Bi-GRU-CNN Model. Proceedings of the International Conference on Applied CyberSecurity (ACS) 2021, pp. 71-77. DOI: 10.1007/978-3-030-95918-0_8. Online publication date: 2-Feb-2022.
  • (2021) Updated Analysis of Detection Methods for Phishing Attacks. Futuristic Trends in Network and Communication Technologies, pp. 56-67. DOI: 10.1007/978-981-16-1480-4_5. Online publication date: 31-Mar-2021.
  • (2020) Web2Vec: Phishing Webpage Detection Method Based on Multidimensional Features Driven by Deep Learning. IEEE Access 8, pp. 221214-221224. DOI: 10.1109/ACCESS.2020.3043188. Online publication date: 2020.
  • (2020) Efficient Clustering of Emails Into Spam and Ham: The Foundational Study of a Comprehensive Unsupervised Framework. IEEE Access 8, pp. 154759-154788. DOI: 10.1109/ACCESS.2020.3017082. Online publication date: 2020.
  • (2020) A Weighted LSTM Deep Learning for Intrusion Detection. Advanced Communication Systems and Information Security, pp. 170-179. DOI: 10.1007/978-3-030-61143-9_14. Online publication date: 6-Nov-2020.
  • (2019) Using Genetic Algorithm to Improve Classification of Imbalanced Datasets for Credit Card Fraud Detection. Smart Data and Computational Intelligence, pp. 220-229. DOI: 10.1007/978-3-030-11914-0_24. Online publication date: 1-Mar-2019.
