Privacy Detective: Detecting Private Information and Collective Privacy Behavior in a Large Social Network

Authors:
Aylin Caliskan Islam, Drexel University, Philadelphia, PA, USA
Jonathan Walsh, Drexel University, Philadelphia, PA, USA
Rachel Greenstadt, Drexel University, Philadelphia, PA, USA

WPES '14: Proceedings of the 13th Workshop on Privacy in the Electronic Society, November 2014, Pages 35–46
https://doi.org/10.1145/2665943.2665958

Published: 03 November 2014

ABSTRACT

Detecting the presence and amount of private information being shared in online media is the first step towards analyzing the information-revealing habits of users in social networks, and a useful method for researchers to study aggregate privacy behavior. In this work, we aim to determine whether text contains private content by using our novel learning-based approach, 'privacy detective', which combines topic modeling, named entity recognition, a privacy ontology, sentiment analysis, and text normalization to represent privacy features. Privacy detective investigates a broader range of privacy concerns than previous approaches, which focus on keyword searching or profile-related properties. We collected 500,000 tweets from 100,000 Twitter users, along with other information such as tweet linkages and follower relationships. We reach 95.45% accuracy on a two-class task that separates Twitter users who do not reveal much private information from Twitter users who share sensitive information. We score timelines according to three privacy levels after having Amazon Mechanical Turk (AMT) workers annotate collected tweets according to privacy categories. Supervised machine learning classification on these annotations reaches 69.63% accuracy on a three-class task. Inter-annotator agreement on timeline privacy scores between various AMT workers and our classifiers falls under the same positive agreement level. Additionally, we show that a user's privacy level is correlated with her friends' privacy scores and also with the privacy scores of people mentioned in her text, but not with the number of her followers. As such, privacy in social networks appears to be socially constructed, which can have great implications for privacy-enhancing technologies and educational interventions.
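
The two evaluation ideas in the abstract, agreement on three-level timeline privacy scores and correlation between a user's privacy score and her friends' scores, can be illustrated with a small sketch. This is not the authors' code: it assumes unweighted Cohen's kappa as the agreement statistic and Pearson's r as the correlation measure, neither of which the abstract specifies, and the function names and toy labels below are made up for illustration.

```python
from collections import Counter
from math import sqrt


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa between two equal-length label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Fraction of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used the same single label throughout
    return (observed - expected) / (1.0 - expected)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical timeline privacy levels (1 = low, 2 = medium, 3 = high).
    annotator = [1, 1, 2, 3, 3, 2, 1, 3]
    classifier = [1, 2, 2, 3, 3, 2, 1, 1]
    print("kappa:", round(cohen_kappa(annotator, classifier), 3))

    # Hypothetical per-user score vs. mean score of that user's friends.
    user_scores = [1.0, 2.0, 3.0, 2.0, 1.0]
    friend_means = [1.2, 2.1, 2.8, 1.9, 1.5]
    print("pearson r:", round(pearson_r(user_scores, friend_means), 3))
```

Kappa corrects raw agreement for the agreement two raters would reach by chance, which matters when one privacy level dominates the labels. Comparing kappa for worker-versus-worker and worker-versus-classifier pairs is one way to check whether both fall in the same agreement band.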

Published in
WPES '14: Proceedings of the 13th Workshop on Privacy in the Electronic Society
November 2014
218 pages
ISBN: 9781450331487
DOI: 10.1145/2665943
General Chair: Gail-Joon Ahn, Arizona State University, USA
Program Chair: Anupam Datta, Carnegie Mellon University, USA
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
detecting private information
privacy
privacy behavior
sensitive information
social network
text classification
Qualifiers
- research-article
Acceptance Rates
WPES '14 Paper Acceptance Rate: 26 of 67 submissions, 39%
Overall Acceptance Rate: 106 of 355 submissions, 30%