DOI: 10.1145/1526709.1526920
poster

Link based small sample learning for web spam detection

Published: 20 April 2009

Abstract

Robust web spam detection systems based on statistical learning often require large amounts of labeled training data. However, labeled samples are more difficult, expensive, and time-consuming to obtain than unlabeled ones. This paper proposes link-based semi-supervised learning algorithms that boost the performance of a classifier by integrating traditional self-training with link learning based on topological dependency. Experiments with a few labeled samples on the standard WEBSPAM-UK2006 benchmark show that the algorithms are effective.
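The abstract does not spell out the algorithms, so the following is only a minimal sketch of the general idea: self-training in which each unlabeled host's classifier score is blended with the labels of its already-labeled link neighbors. The nearest-centroid base classifier, the blending weight `alpha`, the confidence `threshold`, and the toy hosts are all assumptions made for illustration, not the authors' method or the WEBSPAM-UK2006 features.

```python
from math import dist
from statistics import fmean


def centroid(rows):
    """Component-wise mean of a non-empty list of feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def spam_score(x, c_spam, c_ham):
    """Score in [0, 1]; closer to the spam centroid gives a higher score."""
    d_spam, d_ham = dist(x, c_spam), dist(x, c_ham)
    total = d_spam + d_ham
    return 0.5 if total == 0 else d_ham / total


def link_self_training(features, neighbors, seed_labels,
                       threshold=0.7, alpha=0.5, max_rounds=10):
    """Self-training with a link-based adjustment (illustrative only).

    features:    host -> feature vector
    neighbors:   host -> linked hosts (hypothetical toy graph)
    seed_labels: host -> 1 (spam) or 0 (non-spam); must contain both classes
    """
    labels = dict(seed_labels)
    for _ in range(max_rounds):
        # Retrain the base classifier on all labels collected so far.
        c_spam = centroid([features[h] for h, y in labels.items() if y == 1])
        c_ham = centroid([features[h] for h, y in labels.items() if y == 0])
        newly = {}
        for h in features:
            if h in labels:
                continue
            score = spam_score(features[h], c_spam, c_ham)
            # Topological dependency: blend in the labels of labeled neighbors.
            known = [labels[n] for n in neighbors.get(h, ()) if n in labels]
            if known:
                score = alpha * score + (1 - alpha) * fmean(known)
            # Only confident predictions become pseudo-labels.
            if score >= threshold:
                newly[h] = 1
            elif score <= 1 - threshold:
                newly[h] = 0
        if not newly:
            break
        labels.update(newly)
    return labels


# Toy data: two labeled seeds, three unlabeled hosts.
features = {
    "a": [0.9, 0.8], "b": [0.1, 0.2],
    "c": [0.8, 0.9], "d": [0.2, 0.1], "e": [0.5, 0.5],
}
neighbors = {"c": ["a"], "d": ["b"], "e": ["a", "c"]}
labels = link_self_training(features, neighbors, {"a": 1, "b": 0})
print(labels)
```

In this toy graph, host `e` has ambiguous content features, so the base classifier alone would leave it unlabeled; its link to the spam seed `a` is what pushes its blended score over the threshold.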


Published In

WWW '09: Proceedings of the 18th international conference on World wide web
April 2009
1280 pages
ISBN:9781605584874
DOI:10.1145/1526709

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. content spam
  2. link spam
  3. machine learning
  4. web spam

Qualifiers

  • Poster

Conference

WWW '09
Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%


Cited By

  • (2024) A Survey on the Applications of Semi-supervised Learning to Cyber-security. ACM Computing Surveys 56:10 (1-41). DOI: 10.1145/3657647. Online publication date: 22-Jun-2024
  • (2024) GraphSAGE-Based Spammer Detection Using Social Attribute Relationship. Technologies and Applications of Artificial Intelligence (300-313). DOI: 10.1007/978-981-97-1711-8_23. Online publication date: 28-Mar-2024
  • (2020) Classification of Spamming Attacks to Blogging Websites and Their Security Techniques. Encyclopedia of Criminal Activities and the Deep Web (864-880). DOI: 10.4018/978-1-5225-9715-5.ch058. Online publication date: 2020
  • (2017) Measuring and Visualizing the Scrappiness Level of a Website. 2017 19th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC) (304-311). DOI: 10.1109/SYNASC.2017.00057. Online publication date: Sep-2017
  • (2017) A Cross-Domain Hidden Spam Detection Method Based on Domain Name Resolution. Quality, Reliability, Security and Robustness in Heterogeneous Networks (3-11). DOI: 10.1007/978-3-319-60717-7_1. Online publication date: 9-Aug-2017
  • (2015) Collective Spammer Detection in Evolving Multi-Relational Social Networks. Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (1769-1778). DOI: 10.1145/2783258.2788606. Online publication date: 10-Aug-2015
  • (2015) Combination of multiple bipartite ranking for multipartite web content quality evaluation. Neurocomputing 149 (1305-1314). DOI: 10.1016/j.neucom.2014.08.067. Online publication date: Feb-2015
  • (2013) Co-training based semi-supervised Web spam detection. 2013 10th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD) (789-793). DOI: 10.1109/FSKD.2013.6816301. Online publication date: Jul-2013
  • (2013) Web Spam Detection Using MapReduce Approach to Collective Classification. International Joint Conference CISIS’12-ICEUTE´12-SOCO´12 Special Sessions (197-206). DOI: 10.1007/978-3-642-33018-6_20. Online publication date: 2013
  • (2012) Survey on web spam detection. ACM SIGKDD Explorations Newsletter 13:2 (50-64). DOI: 10.1145/2207243.2207252. Online publication date: 1-May-2012