ABSTRACT
The cumulative effect of collective online participation has an important and adverse impact on individual privacy. As an online system evolves over time, new digital traces of individual behavior may uncover previously hidden statistical links between an individual's past actions and her private traits. To quantify this effect, we analyze the evolution of individual privacy loss by studying the edit history of Wikipedia over 13 years, including more than 117,523 different users performing 188,805,088 edits. We trace each Wikipedia contributor using apparently harmless features, such as the number of edits performed on predefined broad categories in a given time period (e.g., Mathematics, Culture, or Nature). We show that even at this unspecific level of behavior description, it is possible to use off-the-shelf machine learning algorithms to uncover usually undisclosed personal traits, such as gender, religion, or education. We provide empirical evidence that the prediction accuracy for almost all private traits consistently improves over time. Surprisingly, the prediction performance also improves for users who stopped editing after a given time. The activities performed by new users seem to have contributed more to this effect than additional activities from existing (but still active) users. Insights from this work should help users, system designers, and policy makers understand this effect and make long-term design choices in online content creation systems.
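The feature described above, the number of edits a contributor makes in each broad category during a given time period, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `edit_count_features`, the three-item category list (the abstract names these only as examples), the half-open time window, and the toy edit log. It is not the authors' pipeline.

```python
from collections import Counter
from datetime import date

# Hypothetical category list; the abstract gives these three only as examples.
CATEGORIES = ["Mathematics", "Culture", "Nature"]


def edit_count_features(edits, start, end, categories=CATEGORIES):
    """Count one user's edits per category within the window [start, end)."""
    counts = Counter(cat for day, cat in edits if start <= day < end)
    # Categories with no edits in the window get a count of zero.
    return [counts[cat] for cat in categories]


# Toy edit log for one hypothetical user: (date of edit, broad category).
edits = [
    (date(2010, 1, 5), "Mathematics"),
    (date(2010, 2, 9), "Mathematics"),
    (date(2010, 3, 1), "Nature"),
    (date(2011, 6, 2), "Culture"),
]

features_2010 = edit_count_features(edits, date(2010, 1, 1), date(2011, 1, 1))
print(features_2010)  # [2, 0, 1]
```

A vector like this, computed per user and per time period, is the kind of input an off-the-shelf classifier could take when predicting a trait; the choice of classifier is left open here because the abstract does not name one.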