DOI: 10.1145/2835776.2835798
research-article
Open Access

Evolution of Privacy Loss in Wikipedia

Published: 08 February 2016

ABSTRACT

The cumulative effect of collective online participation has an important and adverse impact on individual privacy. As an online system evolves over time, new digital traces of individual behavior may uncover previously hidden statistical links between an individual's past actions and her private traits. To quantify this effect, we analyze the evolution of individual privacy loss by studying the edit history of Wikipedia over 13 years, including more than 117,523 different users performing 188,805,088 edits. We trace each Wikipedia contributor using apparently harmless features, such as the number of edits performed on predefined broad categories in a given time period (e.g., Mathematics, Culture or Nature). We show that even at this unspecific level of behavior description, it is possible to use off-the-shelf machine learning algorithms to uncover usually undisclosed personal traits, such as gender, religion or education. We provide empirical evidence that the prediction accuracy for almost all private traits consistently improves over time. Surprisingly, the prediction performance for users who stopped editing after a given time still improves. The activities performed by new users seem to have contributed more to this effect than additional activities from existing (but still active) users. Insights from this work should help users, system designers, and policy makers understand and make long-term design choices in online content creation systems.
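
To make the feature description above concrete, the following is a minimal sketch, not the authors' pipeline. It assumes each user is represented by counts of edits per (time period, category) pair, and it uses a hand-written nearest-centroid classifier as a stand-in for the unspecified "off-the-shelf" algorithms. The category list is limited to the three examples named in the abstract; the function names, the toy vectors, and the labels "a" and "b" are hypothetical.

```python
from collections import Counter

# Only the three categories named as examples in the abstract; the paper's
# full category set is not given here, so this list is an assumption.
CATEGORIES = ["Mathematics", "Culture", "Nature"]


def edit_count_features(edits, periods):
    """Flatten one user's edits into a vector of counts.

    `edits` is a list of (period, category) pairs (hypothetical input format).
    The vector is ordered period by period, then by CATEGORIES.
    """
    counts = Counter(edits)
    return [counts[(p, c)] for p in periods for c in CATEGORIES]


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest_centroid_predict(train, x):
    """Return the label whose class centroid lies closest to x.

    `train` is a list of (feature_vector, label) pairs.
    """
    by_label = {}
    for vector, label in train:
        by_label.setdefault(label, []).append(vector)
    centroids = {label: centroid(vs) for label, vs in by_label.items()}
    return min(centroids, key=lambda label: squared_distance(centroids[label], x))


# Toy, made-up data: two periods, three categories, two hypothetical trait values.
periods = [2004, 2005]
user_edits = [(2004, "Mathematics"), (2004, "Mathematics"), (2005, "Nature")]
features = edit_count_features(user_edits, periods)

train = [
    ([5, 0, 0, 4, 0, 0], "a"),
    ([4, 1, 0, 5, 0, 0], "a"),
    ([0, 3, 1, 0, 4, 0], "b"),
    ([0, 4, 0, 1, 5, 0], "b"),
]
prediction = nearest_centroid_predict(train, [3, 0, 0, 3, 1, 0])
```

The sketch illustrates why other users' activity matters: the centroids are estimated from everyone in the training set, so adding users changes the prediction for a fixed individual even if that individual contributes no further edits.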

Published in

WSDM '16: Proceedings of the Ninth ACM International Conference on Web Search and Data Mining
February 2016
746 pages
ISBN: 9781450337168
DOI: 10.1145/2835776

Copyright © 2016 Owner/Author

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

WSDM '16 Paper Acceptance Rate: 67 of 368 submissions, 18%
Overall Acceptance Rate: 498 of 2,863 submissions, 17%
