ABSTRACT
Wikipedia is based on the idea that anyone can edit the website in order to create reliable, crowd-sourced content. Under the cover of internet anonymity, however, some users make changes that do not align with Wikipedia's intended uses. For this reason, Wikipedia allows some pages to be protected, so that only certain users can revise them. This lets administrators protect pages from vandalism, libel, and edit wars. However, with over five million pages on Wikipedia, it is impossible for administrators to monitor all pages and manually enforce page protection. In this paper, we consider for the first time the problem of deciding whether a page should be protected in a collaborative environment such as Wikipedia. We formulate the problem as a binary classification task and propose a novel set of features to decide which pages to protect based on (i) users' page revision behavior and (ii) page categories. We tested our system, called DePP, on a new dataset we built consisting of 13.6K pages (half protected and half unprotected) and 1.9M edits. Experimental results show that DePP reaches 93.24% classification accuracy and significantly improves over baselines.
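To make the binary classification framing concrete, the following is a minimal sketch of how a page might be turned into a feature vector and how classification accuracy is computed. It is not DePP's actual feature set or model: the `Revision` record, the four features, the `sensitive` category set, and the threshold rule are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Revision:
    user: str       # editor who made the revision
    reverted: bool  # hypothetical flag: was this edit later reverted?


def page_features(revisions: List[Revision], categories: List[str],
                  sensitive: Set[str]) -> List[float]:
    """Build an illustrative feature vector for one page.

    Features (all assumptions, not the paper's list):
      0: number of revisions
      1: fraction of revisions that were reverted
      2: distinct editors per revision
      3: 1.0 if the page belongs to any category in `sensitive`
    """
    n = len(revisions)
    if n == 0:
        return [0.0, 0.0, 0.0, 0.0]
    reverted_ratio = sum(r.reverted for r in revisions) / n
    editors_per_revision = len({r.user for r in revisions}) / n
    in_sensitive = float(any(c in sensitive for c in categories))
    return [float(n), reverted_ratio, editors_per_revision, in_sensitive]


def toy_classifier(features: List[float], max_reverted_ratio: float = 0.3) -> int:
    """Toy rule standing in for a learned model: 1 = protect, 0 = leave open."""
    return 1 if features[1] > max_reverted_ratio else 0


def accuracy(predicted: List[int], actual: List[int]) -> float:
    """Fraction of pages whose predicted label matches the true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

On a balanced dataset such as the one described above (half protected, half unprotected), always predicting a single class would score 50% accuracy, which gives a reference point for reading the reported figure.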
Index Terms
- DePP: A System for Detecting Pages to Protect in Wikipedia