Robust Spammer Detection in Microblogs: Leveraging User Carefulness

Authors:
Hao Fu, University of Science and Technology of China, Hefei, Anhui, PR China (ORCID: 0000-0003-1068-9465)
Xing Xie, Microsoft Research, Beijing, PR China
Yong Rui, Microsoft Research, Beijing, PR China
Neil Zhenqiang Gong, Iowa State University, Ames, USA
Guangzhong Sun, University of Science and Technology of China, Hefei, Anhui, PR China
Enhong Chen, University of Science and Technology of China, Hefei, Anhui, PR China

ACM Transactions on Intelligent Systems and Technology, Volume 8, Issue 6, Article No. 83, pp. 1–31. https://doi.org/10.1145/3086637

Published: 18 August 2017

Abstract

Microblogging Web sites, such as Twitter and Sina Weibo, have become popular platforms for socializing and sharing information in recent years. Spammers have also discovered this new opportunity to unfairly overpower normal users with unsolicited content, namely social spam. Although it is intuitive for everyone to follow legitimate users, recent studies show that both legitimate users and spammers follow spammers, for different reasons. Evidence of users seeking out spammers on purpose has also been observed. We regard this behavior as useful information for spammer detection. In this article, we approach the problem of spammer detection by leveraging the “carefulness” of users, which indicates how careful a user is when she is about to follow a potential spammer. We propose a framework to measure carefulness and develop a supervised learning algorithm to estimate it based on known spammers and legitimate users. We illustrate how the robustness of detection algorithms can be improved with the aid of the proposed measure. Evaluations on two real datasets from Sina Weibo and Twitter with millions of users are performed, as well as an online test on Sina Weibo. The results show that our approach indeed captures carefulness and is effective for detecting spammers. In addition, we find that our measure is also beneficial for other applications, such as link prediction.
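
To make the notion of carefulness concrete, the following is a toy sketch only and is not the algorithm proposed in the article, which the abstract does not specify. It assumes, purely for illustration, that a user's carefulness can be approximated by the smoothed fraction of her labeled followees that are known legitimate users. The function name `estimate_carefulness` and the parameters `prior` and `strength` are hypothetical.

```python
from collections import defaultdict


def estimate_carefulness(follows, labels, prior=0.5, strength=2.0):
    """Toy carefulness score per follower (illustrative assumption only).

    follows: iterable of (follower, followee) pairs.
    labels:  dict mapping some users to "spammer" or "legitimate".
    Returns a dict: follower -> smoothed fraction of that follower's
    labeled followees that are legitimate. Followers with no labeled
    followees are omitted because this sketch has no evidence for them.
    """
    legit = defaultdict(int)
    total = defaultdict(int)
    for follower, followee in follows:
        label = labels.get(followee)
        if label is None:
            continue  # unlabeled followees carry no evidence in this sketch
        total[follower] += 1
        if label == "legitimate":
            legit[follower] += 1
    # Additive smoothing pulls scores toward `prior` when evidence is scarce.
    return {
        user: (legit[user] + prior * strength) / (total[user] + strength)
        for user in total
    }
```

Under this assumption, a user who follows only known legitimate accounts scores close to 1, while one who often follows known spammers scores lower. A learned estimator, as the abstract describes, would replace this simple ratio.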

Index Terms

1. Information systems
  1. Information systems applications
    1. Data mining

Published in
ACM Transactions on Intelligent Systems and Technology Volume 8, Issue 6
Survey Paper, Regular Papers and Special Issue: Social Media Processing
November 2017
265 pages
ISSN: 2157-6904
EISSN: 2157-6912
DOI: 10.1145/3127339
Editor:
Yu Zheng
Microsoft Research, China
Copyright © 2017 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 18 August 2017
- Accepted: 1 March 2017
- Revised: 1 May 2016
- Received: 1 December 2015
Author Tags
Spammer detection
microblog
social network
Qualifiers
- research-article
- Research
- Refereed