research-article
Open Access

Bot Detection in Wikidata Using Behavioral and Other Informal Cues

Published: 01 November 2018
Abstract

Bots have been important to peer production's success. Wikipedia, OpenStreetMap, and Wikidata have all taken advantage of automation to perform work at a rate and scale exceeding that of human contributors. Understanding the ways in which humans and bots behave in these communities is an important topic, and one that relies on accurate bot recognition. Yet in many cases bot activities are not explicitly flagged and could be mistaken for human contributions. We develop a machine classifier to detect previously unidentified bots using implicit behavioral and other informal editing characteristics. We show that this method yields a high level of fitness under both formal evaluation (PR-AUC: 0.845, ROC-AUC: 0.985) and a qualitative analysis of "anonymous" contributor edit sessions. We also show that, in some cases, unflagged bot activities can significantly misrepresent human behavior in analyses. Our model has the potential to support future research and community patrolling activities.
