DOI: 10.1145/1526709.1526786
research-article

A hybrid phish detection approach by identity discovery and keywords retrieval

Published: 20 April 2009

Abstract

Phishing is a significant security threat to the Internet, causing tremendous economic loss every year. In this paper, we propose a novel hybrid phish detection method based on information extraction (IE) and information retrieval (IR) techniques. The identity-based component of our method detects phishing webpages by directly discovering the inconsistency between their identity and the identity they are imitating. The keywords-retrieval component utilizes IR algorithms exploiting the power of search engines to identify phish. Our method requires no training data and no prior knowledge of phishing signatures or specific implementations, and is thus able to adapt quickly to constantly appearing new phishing patterns. Comprehensive experiments over a diverse spectrum of data sources with 11,449 pages show that both components have a low false positive rate and that the stacked approach achieves a true positive rate of 90.06% with a false positive rate of 1.95%.
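
As an illustration of the evaluation terms used in the abstract, the following is a minimal sketch of how two detector components might be stacked and how true and false positive rates are computed over labeled pages. It is not the authors' implementation. The combination rule (flag a page if either component flags it), the two toy detectors, the string page format, and the sample data are all assumptions made for illustration only.

```python
from typing import Callable, Dict, Iterable, Tuple

# A detector takes a page and returns True if it flags the page as phish.
# Pages are plain strings here; this toy format is purely hypothetical.
Detector = Callable[[str], bool]


def stack_detectors(first: Detector, second: Detector) -> Detector:
    """Combine two components. Assumed rule: flag if either one flags."""
    def combined(page: str) -> bool:
        return first(page) or second(page)
    return combined


def rates(detector: Detector,
          labeled_pages: Iterable[Tuple[str, bool]]) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate).

    TPR = TP / (TP + FN), FPR = FP / (FP + TN).
    """
    tp = fp = fn = tn = 0
    for page, is_phish in labeled_pages:
        flagged = detector(page)
        if is_phish and flagged:
            tp += 1
        elif is_phish:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


def toy_identity_check(page: str) -> bool:
    """Hypothetical stand-in: flag when claimed brand and host disagree."""
    fields: Dict[str, str] = dict(
        tok.split("=", 1) for tok in page.split() if "=" in tok
    )
    return fields.get("brand") != fields.get("host")


def toy_keyword_check(page: str) -> bool:
    """Hypothetical stand-in: flag pages marked as absent from search results."""
    return "unranked" in page.split()


# Made-up labeled sample: (page, is_phish)
SAMPLE = [
    ("brand=bank host=evil", True),
    ("brand=shop host=shop unranked", True),
    ("brand=mail host=mail", True),
    ("brand=bank host=bank", False),
    ("brand=news host=news", False),
    ("brand=blog host=blog unranked", False),
    ("brand=wiki host=wiki", False),
]

stacked = stack_detectors(toy_identity_check, toy_keyword_check)
tpr, fpr = rates(stacked, SAMPLE)
print(f"TPR={tpr:.2%} FPR={fpr:.2%}")
```

On this made-up sample the stacked detector catches more phish than either toy component alone, at the cost of inheriting the false positive of the keyword check. That is the general trade-off a stacked design has to balance.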



      Published In

      WWW '09: Proceedings of the 18th international conference on World wide web
      April 2009
      1280 pages
      ISBN:9781605584874
      DOI:10.1145/1526709

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. anti-phishing
      2. information retrieval
      3. named entity recognition

      Conference

      WWW '09
      Acceptance Rates

      Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

      Cited By

      • (2024) Phishing Website Detection: A Dataset-Centric Approach for Enhanced Security. Data and Metadata, 10.56294/dm2024.2233. Online publication date: 31-Dec-2024
      • (2024) PhiSN: Phishing URL Detection Using Segmentation and NLP Features. Journal of Information Processing, 10.2197/ipsjjip.32.973, 32 (973-989). Online publication date: 2024
      • (2024) Phishing Website Detection Using Hybrid Machine Learning Model. 2024 International Conference on Computer, Electronics, Electrical Engineering & their Applications (IC2E3), 10.1109/IC2E362166.2024.10826907 (1-6). Online publication date: 6-Jun-2024
      • (2024) Learning Hash Subspace from Large-Scale Multi-modal Pre-Training: A CLIP-Based Cross-modal Hashing Framework. Proceedings of 2023 11th China Conference on Command and Control, 10.1007/978-981-99-9021-4_48 (514-526). Online publication date: 4-Feb-2024
      • (2023) Detect Malicious Web Pages Using Naive Bayesian Algorithm to Detect Cyber Threats. Wireless Personal Communications, 10.1007/s11277-023-10713-9. Online publication date: 28-Aug-2023
      • (2022) An Extensive Study of Residential Proxies in China. Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, 10.1145/3548606.3559377 (3049-3062). Online publication date: 7-Nov-2022
      • (2022) PhishSim: Aiding Phishing Website Detection With a Feature-Free Tool. IEEE Transactions on Information Forensics and Security, 10.1109/TIFS.2022.3164212, 17 (1497-1512). Online publication date: 2022
      • (2022) Phishing websites detection using a novel multipurpose dataset and web technologies features. Expert Systems with Applications: An International Journal, 10.1016/j.eswa.2022.118010, 207:C. Online publication date: 30-Nov-2022
      • (2021) Generative Adverserial Analysis of Phishing Attacks on Static and Dynamic Content of Webpages. 2021 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), 10.1109/ISPA-BDCloud-SocialCom-SustainCom52081.2021.00222 (1657-1662). Online publication date: Sep-2021
      • (2021) A survey of phishing attack techniques, defence mechanisms and open research challenges. Enterprise Information Systems, 10.1080/17517575.2021.1896786, 16:4 (527-565). Online publication date: 15-Mar-2021
