DOI: 10.1145/2665943.2665958
research-article

Privacy Detective: Detecting Private Information and Collective Privacy Behavior in a Large Social Network

Published: 3 November 2014

ABSTRACT

Detecting the presence and amount of private information shared in online media is the first step towards analyzing the information-revealing habits of users in social networks, and a useful method for researchers studying aggregate privacy behavior. In this work, we aim to determine whether text contains private content using our novel learning-based approach, 'privacy detective', which combines topic modeling, named entity recognition, privacy ontology, sentiment analysis, and text normalization to represent privacy features. Privacy detective investigates a broader range of privacy concerns than previous approaches, which focus on keyword searching or profile-related properties. We collected 500,000 tweets from 100,000 Twitter users along with other information such as tweet linkages and follower relationships. We reach 95.45% accuracy in a two-class task that separates Twitter users who do not reveal much private information from Twitter users who share sensitive information. We score timelines according to three privacy levels after having Amazon Mechanical Turk (AMT) workers annotate collected tweets according to privacy categories. Supervised machine learning classification on these annotations reaches 69.63% accuracy on a three-class task. Inter-annotator agreement on timeline privacy scores between various AMT workers and our classifiers falls under the same positive agreement level. Additionally, we show that a user's privacy level is correlated with her friends' privacy scores and with the privacy scores of people mentioned in her text, but not with the number of her followers. As such, privacy in social networks appears to be socially constructed, which can have great implications for privacy-enhancing technologies and educational interventions.
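
The agreement comparison between annotators and classifiers can be illustrated with a small sketch. The abstract does not name the agreement statistic, so this assumes unweighted Cohen's kappa as one common choice; the function name `cohen_kappa`, the 1-3 privacy level encoding, and the example labels are all hypothetical and not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two raters labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical privacy levels (1 = low, 2 = medium, 3 = high) for six timelines.
worker = [1, 2, 3, 3, 2, 1]
classifier = [1, 2, 3, 2, 2, 1]
kappa = cohen_kappa(worker, classifier)
```

With these made-up labels the two raters agree on five of six timelines and chance agreement is one third, so `kappa` comes out at about 0.75. Because the three privacy levels are ordered, a weighted kappa that penalizes a low-versus-high disagreement more than a low-versus-medium one may be a better fit; the unweighted form is used here only to keep the sketch short.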


Published in

WPES '14: Proceedings of the 13th Workshop on Privacy in the Electronic Society
November 2014
218 pages
ISBN: 9781450331487
DOI: 10.1145/2665943

Copyright © 2014 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

WPES '14 paper acceptance rate: 26 of 67 submissions, 39%. Overall acceptance rate: 106 of 355 submissions, 30%.
