DOI: 10.1145/2983323.2983914
short-paper

DePP: A System for Detecting Pages to Protect in Wikipedia

Published: 24 October 2016

ABSTRACT

Wikipedia is based on the idea that anyone can edit the website in order to create reliable, crowd-sourced content. Yet under the cover of internet anonymity, some users make changes that do not align with Wikipedia's intended uses. For this reason, Wikipedia allows some pages to be protected, so that only certain users can revise them. This allows administrators to protect pages from vandalism, libel, and edit wars. However, with over five million pages on Wikipedia, it is impossible for administrators to monitor all pages and manually enforce page protection. In this paper we consider for the first time the problem of deciding whether a page should be protected in a collaborative environment such as Wikipedia. We formulate the problem as a binary classification task and propose a novel set of features to decide which pages to protect based on (i) users' page revision behavior and (ii) page categories. We tested our system, called DePP, on a new dataset we built consisting of 13.6K pages (half protected and half unprotected) and 1.9M edits. Experimental results show that DePP reaches 93.24% classification accuracy and significantly improves over baselines.
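The binary classification framing described in the abstract can be illustrated with a minimal sketch. The abstract does not list DePP's actual features or name its classifier, so everything below is an assumption made for illustration: the three features (`revert_ratio`, `anon_ratio`, and a sensitive-category flag), the `SENSITIVE` category set, the toy pages, and the choice of a small logistic regression trained by stochastic gradient descent.

```python
import math
from dataclasses import dataclass

# Hypothetical set of categories treated as sensitive (not from the paper).
SENSITIVE = {"Living people", "Politics"}


@dataclass
class Revision:
    anonymous: bool  # edit made without a registered account
    reverted: bool   # edit was later undone


@dataclass
class Page:
    title: str
    categories: list
    revisions: list


def make_page(title, categories, n_edits, n_reverted, n_anon):
    """Build a toy page whose first edits are reverted and/or anonymous."""
    revs = [Revision(anonymous=i < n_anon, reverted=i < n_reverted)
            for i in range(n_edits)]
    return Page(title, categories, revs)


def features(page):
    """Illustrative features: (i) revision behavior, (ii) page categories."""
    n = max(len(page.revisions), 1)
    revert_ratio = sum(r.reverted for r in page.revisions) / n
    anon_ratio = sum(r.anonymous for r in page.revisions) / n
    sensitive = 1.0 if SENSITIVE & set(page.categories) else 0.0
    return [revert_ratio, anon_ratio, sensitive]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, lr=0.5, epochs=2000):
    """Logistic regression fitted with plain stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict(w, b, x):
    """Return 1 (protect) or 0 (leave unprotected)."""
    return 1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= 0.5 else 0


# Toy, hand-made data: label 1 = protected, 0 = unprotected.
pages = [
    make_page("A", ["Politics"], 10, 6, 7),
    make_page("B", ["Living people"], 10, 5, 5),
    make_page("C", ["Mathematics"], 10, 1, 2),
    make_page("D", ["Geology"], 10, 0, 1),
]
labels = [1, 1, 0, 0]

X = [features(p) for p in pages]
w, b = train(X, labels)
for page, x in zip(pages, X):
    print(page.title, predict(w, b, x))
```

The sketch only shows the shape of the task: each page is mapped to a feature vector and a classifier outputs a protect/do-not-protect decision. It says nothing about the accuracy reported in the abstract, which was measured on the authors' own dataset and features.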

    Published in

      CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management
      October 2016
      2566 pages
      ISBN: 9781450340731
      DOI: 10.1145/2983323

      Copyright © 2016 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      CIKM '16 paper acceptance rate: 160 of 701 submissions (23%)
      Overall acceptance rate: 1,861 of 8,427 submissions (22%)
