research-article

Fighting authorship linkability with crowdsourcing

Authors: Mishari Almishari, Gene Tsudik

COSN '14: Proceedings of the second ACM conference on Online social networks

Pages 69 - 82

https://doi.org/10.1145/2660460.2660486

Published: 01 October 2014

Abstract

Massive amounts of contributed content -- including traditional literature, blogs, music, videos, reviews and tweets -- are available on the Internet today, with authors numbering in many millions. Textual information, such as product or service reviews, is an important and increasingly popular type of content that is being used as a foundation of many trendy community-based reviewing sites, such as TripAdvisor and Yelp. Some recent results have shown that, due partly to their specialized/topical nature, sets of reviews authored by the same person are readily linkable based on simple stylometric features. In practice, this means that individuals who author more than a few reviews under different accounts (whether within one site or across multiple sites) can be linked, which represents a significant loss of privacy.

In this paper, we start by showing that the problem is actually worse than previously believed. We then explore ways to mitigate authorship linkability in community-based reviewing. We first attempt to harness the global power of crowdsourcing by engaging random strangers in the process of re-writing reviews. As our empirical results (obtained from Amazon Mechanical Turk) clearly demonstrate, crowdsourcing yields impressively sensible reviews whose stylometric characteristics differ enough that prior stylometric linkability techniques become largely ineffective. We also consider using machine translation to automatically re-write reviews. Contrary to what was previously believed, our results show that translation decreases authorship linkability as the number of intermediate languages grows. Finally, we explore the combination of crowdsourcing and machine translation and report on results.
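
To make the two ideas in the abstract concrete, the sketch below shows (1) one simple way that texts can be linked by style and (2) how a review might be passed through a chain of intermediate languages. It is an illustration only and makes several assumptions: character-bigram profiles compared by cosine similarity stand in for "simple stylometric features" and are not necessarily the features or classifier used in the paper, and `translate` is a hypothetical caller-supplied function, not the API of any real translation service.

```python
from collections import Counter
from math import sqrt
from typing import Callable, Dict, List


def bigram_profile(text: str) -> Counter:
    """Count overlapping character bigrams in lowercased, whitespace-normalized text."""
    cleaned = " ".join(text.lower().split())
    return Counter(cleaned[i:i + 2] for i in range(len(cleaned) - 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def link(anonymous_text: str, candidates: Dict[str, str]) -> str:
    """Return the candidate author whose bigram profile is closest to the anonymous text."""
    target = bigram_profile(anonymous_text)
    return max(
        candidates,
        key=lambda name: cosine_similarity(target, bigram_profile(candidates[name])),
    )


def round_trip(text: str, languages: List[str],
               translate: Callable[[str, str, str], str]) -> str:
    """Translate English text through each intermediate language in order, then back to English.

    `translate(text, source, target)` is a hypothetical function supplied by the caller.
    """
    current, source = text, "en"
    for lang in languages:
        current = translate(current, source, lang)
        source = lang
    return translate(current, source, "en")
```

In an experiment of the kind the abstract describes, one would compare how often `link` picks the true author before and after the reviews are rewritten, for example by `round_trip` with progressively longer `languages` lists.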

Supplementary Material

ZIP File (cosn109f.zip)

This folder includes all the source files and pictures referenced in the source file. To compile the paper as a PDF file, run: pdflatex almishari-cosn109f.tex

Index Terms

Fighting authorship linkability with crowdsourcing
1. Security and privacy
  1. Human and societal aspects of security and privacy
2. Social and professional topics
  1. Computing / technology policy
    1. Privacy policies

Published In

COSN '14: Proceedings of the second ACM conference on Online social networks

October 2014

288 pages

ISBN:9781450331982

DOI:10.1145/2660460

General Chair: Alessandra Sala (Bell Laboratories, Ireland)

Program Chairs: Ashish Goel (Stanford / Twitter, USA), Krishna Gummadi (MPI-SWS, Germany)

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

ACM: Association for Computing Machinery

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

National Science Foundation

Conference

COSN'14

Sponsor:

ACM

COSN'14: Conference on Online Social Networks

October 1 - 2, 2014

Dublin, Ireland

Acceptance Rates

COSN '14 Paper Acceptance Rate 25 of 87 submissions, 29%;

Overall Acceptance Rate 69 of 307 submissions, 22%

Bibliometrics

Total Citations: 9

Total Downloads: 179

Downloads (Last 12 months): 8

Downloads (Last 6 weeks): 0

Reflects downloads up to 15 Feb 2025

Cited By

Gröndahl T, Asokan N (2020). Effective writing style transfer via combinatorial paraphrasing. Proceedings on Privacy Enhancing Technologies 2020:4 (175-195). Online publication date: 17-Aug-2020
https://doi.org/10.2478/popets-2020-0068
Mahmood A, Ahmad F, Shafiq Z, Srinivasan P, Zaffar F (2019). A Girl Has No Name: Automated Authorship Obfuscation using Mutant-X. Proceedings on Privacy Enhancing Technologies 2019:4 (54-71). Online publication date: 30-Jul-2019
https://doi.org/10.2478/popets-2019-0058
Gröndahl T, Asokan N (2019). Text Analysis in Adversarial Settings. ACM Computing Surveys 52:3 (1-36). Online publication date: 18-Jun-2019
https://dl.acm.org/doi/10.1145/3310331
Potha N, Stamatatos E (2019). Dynamic Ensemble Selection for Author Verification. Advances in Information Retrieval (102-115). Online publication date: 7-Apr-2019
https://doi.org/10.1007/978-3-030-15712-8_7
Sánchez D, Batet M (2017). Toward sensitive document release with privacy guarantees. Engineering Applications of Artificial Intelligence 59:C (23-34). Online publication date: 1-Mar-2017
https://dl.acm.org/doi/10.1016/j.engappai.2016.12.013
Jurgens D, Tsvetkov Y, Jurafsky D (2017). Writer Profiling Without the Writer's Text. Social Informatics (537-558). Online publication date: 2-Sep-2017
https://doi.org/10.1007/978-3-319-67256-4_43
Karadzhov G, Mihaylova T, Kiprov Y, Georgiev G, Koychev I, Nakov P (2017). The Case for Being Average: A Mediocrity Approach to Style Masking and Author Obfuscation. Experimental IR Meets Multilinguality, Multimodality, and Interaction (173-185). Online publication date: 17-Aug-2017
https://doi.org/10.1007/978-3-319-65813-1_18
Dauber E, Overdorf R, Greenstadt R (2017). Stylometric Authorship Attribution of Collaborative Documents. Cyber Security Cryptography and Machine Learning (115-135). Online publication date: 2-Jun-2017
https://doi.org/10.1007/978-3-319-60080-2_9
Johansson F, Kaati L, Shrestha A (2015). Timeprints for identifying social media users with multiple aliases. Security Informatics 4:1. Online publication date: 24-Sep-2015
https://doi.org/10.1186/s13388-015-0022-z
