research-article

Robust Spammer Detection in Microblogs: Leveraging User Carefulness

Published: 18 August 2017

Abstract

Microblogging Web sites, such as Twitter and Sina Weibo, have become popular platforms for socializing and sharing information in recent years. Spammers have also discovered this new opportunity to unfairly overpower normal users with unsolicited content, namely social spam. Although it is intuitive for everyone to follow legitimate users, recent studies show that both legitimate users and spammers follow spammers for different reasons. Evidence of users seeking spammers on purpose is also observed. We regard this behavior as useful information for spammer detection. In this article, we approach the problem of spammer detection by leveraging the “carefulness” of users, which indicates how careful a user is when she is about to follow a potential spammer. We propose a framework to measure the carefulness and develop a supervised learning algorithm to estimate it based on known spammers and legitimate users. We illustrate how the robustness of the detection algorithms can be improved with the aid of the proposed measure. Evaluations on two real datasets from Sina Weibo and Twitter with millions of users are performed, as well as an online test on Sina Weibo. The results show that our approach indeed captures the carefulness, and it is effective for detecting spammers. In addition, we find that our measure is also beneficial for other applications, such as link prediction.
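
The abstract does not spell out the framework or the learning algorithm, so the following is only a rough illustration of the intuition, not the article's method. It assumes carefulness can be crudely approximated by the smoothed share of a user's labeled followees that are legitimate. The function name `naive_carefulness`, the `smoothing` parameter, and the toy follow graph are all hypothetical.

```python
from typing import Dict, Set


def naive_carefulness(following: Dict[str, Set[str]],
                      known_spammers: Set[str],
                      known_legit: Set[str],
                      smoothing: float = 1.0) -> Dict[str, float]:
    """Hypothetical proxy for carefulness (an assumption, not the paper's measure).

    For each user, return the Laplace-smoothed fraction of labeled followees
    that are known legitimate users. Unlabeled followees are ignored.
    """
    scores = {}
    for user, followees in following.items():
        n_spam = len(followees & known_spammers)
        n_legit = len(followees & known_legit)
        # Smoothing keeps users with few labeled followees near 0.5.
        scores[user] = (n_legit + smoothing) / (n_legit + n_spam + 2 * smoothing)
    return scores


# Toy follow graph: user -> set of accounts the user follows (made-up data).
following = {
    "alice": {"news", "bob"},
    "carol": {"spam1", "spam2", "news"},
    "dave": set(),
}
scores = naive_carefulness(
    following,
    known_spammers={"spam1", "spam2"},
    known_legit={"news", "bob"},
)
for user, score in sorted(scores.items()):
    print(f"{user}: {score:.2f}")
```

In this toy graph, a user who follows only legitimate accounts scores higher than one who follows mostly known spammers, while a user with no labeled followees stays at the neutral value of 0.5. A score of this kind could then be fed to a detector as a per-user weight, which is the general role the abstract assigns to carefulness.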

    • Published in

      ACM Transactions on Intelligent Systems and Technology, Volume 8, Issue 6
      Survey Paper, Regular Papers and Special Issue: Social Media Processing
      November 2017
      265 pages
      ISSN: 2157-6904
      EISSN: 2157-6912
      DOI: 10.1145/3127339
      Editor: Yu Zheng

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 18 August 2017
      • Accepted: 1 March 2017
      • Revised: 1 May 2016
      • Received: 1 December 2015

      Qualifiers

      • research-article
      • Research
      • Refereed
