Fighting authorship linkability with crowdsourcing

Published: 01 October 2014

Abstract

Massive amounts of contributed content -- including traditional literature, blogs, music, videos, reviews and tweets -- are available on the Internet today, with authors numbering in many millions. Textual information, such as product or service reviews, is an important and increasingly popular type of content that is being used as a foundation of many trendy community-based reviewing sites, such as TripAdvisor and Yelp. Some recent results have shown that, due partly to their specialized/topical nature, sets of reviews authored by the same person are readily linkable based on simple stylometric features. In practice, this means that individuals who author more than a few reviews under different accounts (whether within one site or across multiple sites) can be linked, which represents a significant loss of privacy.
In this paper, we start by showing that the problem is actually worse than previously believed. We then explore ways to mitigate authorship linkability in community-based reviewing. We first attempt to harness the global power of crowdsourcing by engaging random strangers in the process of rewriting reviews. As our empirical results (obtained from Amazon Mechanical Turk) clearly demonstrate, crowdsourcing yields impressively sensible reviews whose stylometric characteristics differ enough that prior stylometric linkability techniques become largely ineffective. We also consider using machine translation to automatically rewrite reviews. Contrary to what was previously believed, our results show that translation decreases authorship linkability as the number of intermediate languages grows. Finally, we explore the combination of crowdsourcing and machine translation and report on the results.
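
As a rough illustration of why simple stylometric features can link reviews, the sketch below profiles each text by its letter-digram frequencies and attributes an anonymous text to the known author whose profile is closest. This is not the method evaluated in the paper: the choice of feature (letter digrams within words), the chi-square-style distance, and all function names are assumptions made for illustration only.

```python
import re
from collections import Counter


def digram_profile(text):
    """Relative frequencies of letter digrams taken inside words.

    Illustrative feature choice (an assumption, not the paper's feature set).
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w[i:i + 2] for w in words for i in range(len(w) - 1))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: n / total for gram, n in counts.items()}


def chi_square_distance(p, q):
    """Chi-square-style distance between two frequency profiles.

    0.0 means identical profiles; 2.0 means the profiles share no digram.
    """
    keys = set(p) | set(q)
    # Every key occurs in at least one profile, so the denominator is > 0.
    return sum(
        (p.get(k, 0.0) - q.get(k, 0.0)) ** 2 / (p.get(k, 0.0) + q.get(k, 0.0))
        for k in keys
    )


def most_likely_author(anonymous_text, known_texts):
    """Return the author in known_texts whose profile is closest.

    known_texts maps a (hypothetical) author name to that author's
    concatenated reviews.
    """
    profile = digram_profile(anonymous_text)
    return min(
        known_texts,
        key=lambda author: chi_square_distance(
            profile, digram_profile(known_texts[author])
        ),
    )
```

Under these assumptions, a rewriting step (by a crowd worker or by round-trip translation) helps only to the extent that it moves a review's profile away from its author's other reviews, which is the effect the abstract reports measuring.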

Supplementary Material

ZIP File (cosn109f.zip)
This archive includes all the source files and the pictures referenced in the source file. To compile the paper as a PDF file, run: pdflatex almishari-cosn109f.tex

Published In

COSN '14: Proceedings of the second ACM conference on Online social networks
October 2014
288 pages
ISBN:9781450331982
DOI:10.1145/2660460
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. author anonymization
  2. author identification
  3. author linkability
  4. authorship attribution
  5. crowdsourcing
  6. stylometry

Qualifiers

  • Research-article

Conference

COSN'14
COSN'14: Conference on Online Social Networks
October 1 - 2, 2014
Dublin, Ireland

Acceptance Rates

COSN '14 Paper Acceptance Rate 25 of 87 submissions, 29%;
Overall Acceptance Rate 69 of 307 submissions, 22%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 8
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 15 Feb 2025

Cited By

  • (2020) Effective writing style transfer via combinatorial paraphrasing. Proceedings on Privacy Enhancing Technologies 2020:4 (175-195). DOI: 10.2478/popets-2020-0068. Online publication date: 17-Aug-2020
  • (2019) A Girl Has No Name: Automated Authorship Obfuscation using Mutant-X. Proceedings on Privacy Enhancing Technologies 2019:4 (54-71). DOI: 10.2478/popets-2019-0058. Online publication date: 30-Jul-2019
  • (2019) Text Analysis in Adversarial Settings. ACM Computing Surveys 52:3 (1-36). DOI: 10.1145/3310331. Online publication date: 18-Jun-2019
  • (2019) Dynamic Ensemble Selection for Author Verification. Advances in Information Retrieval (102-115). DOI: 10.1007/978-3-030-15712-8_7. Online publication date: 7-Apr-2019
  • (2017) Toward sensitive document release with privacy guarantees. Engineering Applications of Artificial Intelligence 59:C (23-34). DOI: 10.1016/j.engappai.2016.12.013. Online publication date: 1-Mar-2017
  • (2017) Writer Profiling Without the Writer’s Text. Social Informatics (537-558). DOI: 10.1007/978-3-319-67256-4_43. Online publication date: 2-Sep-2017
  • (2017) The Case for Being Average: A Mediocrity Approach to Style Masking and Author Obfuscation. Experimental IR Meets Multilinguality, Multimodality, and Interaction (173-185). DOI: 10.1007/978-3-319-65813-1_18. Online publication date: 17-Aug-2017
  • (2017) Stylometric Authorship Attribution of Collaborative Documents. Cyber Security Cryptography and Machine Learning (115-135). DOI: 10.1007/978-3-319-60080-2_9. Online publication date: 2-Jun-2017
  • (2015) Timeprints for identifying social media users with multiple aliases. Security Informatics 4:1. DOI: 10.1186/s13388-015-0022-z. Online publication date: 24-Sep-2015
