ABSTRACT
Matching the profiles of a user across multiple online social networks brings opportunities for new services and applications as well as new insights into users' online behavior, yet it raises serious privacy concerns. Prior literature has shown that it is possible to accurately match profiles, but its evaluations focused only on sampled datasets. In this paper, we study the extent to which we can reliably match profiles in practice, across real-world social networks, by exploiting public attributes, i.e., information users publicly provide about themselves. Today's social networks have hundreds of millions of users, which brings completely new challenges, as a reliable matching scheme must identify the correct matching profile out of millions of possible profiles. We first define a set of properties for profile attributes--Availability, Consistency, non-Impersonability, and Discriminability (ACID)--that are both necessary and sufficient to determine the reliability of a matching scheme. Using these properties, we propose a method to evaluate the accuracy of matching schemes in realistic settings. Our results show that the accuracy in practice is significantly lower than that reported in prior literature. When considering entire social networks, there is a non-negligible number of profiles that belong to different users but have similar attributes, which leads to many false matches. Our paper sheds light on the limits of matching profiles in the real world and illustrates the correct methodology to evaluate matching schemes in realistic scenarios.
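The scale argument above can be illustrated with simple base-rate arithmetic. The following is a minimal sketch, not the paper's evaluation method: it assumes a hypothetical matcher with a fixed true-positive rate and a fixed per-pair false-positive rate, and exactly one true matching profile among the candidates. The function name and all numeric values are illustrative assumptions, not results from the paper.

```python
def expected_precision(tpr: float, fpr: float, candidates: int) -> float:
    """Approximate precision when one profile is compared against
    `candidates` profiles, exactly one of which is the true match.

    tpr: assumed probability that the true match is flagged.
    fpr: assumed probability that any single non-matching profile is flagged.
    The result is a ratio of expected counts, which is a simplification.
    """
    expected_true = tpr                      # at most one true match exists
    expected_false = fpr * (candidates - 1)  # every other profile can be a false match
    total = expected_true + expected_false
    return expected_true / total if total > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical rates, chosen only to show the trend as the network grows.
    for n in (1_000, 1_000_000, 100_000_000):
        print(f"{n:>11,d} candidates -> precision ~ {expected_precision(0.9, 1e-4, n):.4f}")
```

Under these assumptions, a false-positive rate that looks negligible on a small sample dominates the true matches once the candidate pool reaches the size of an entire social network, which is consistent with the abstract's observation that similar-looking profiles of different users lead to many false matches.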
Index Terms
- On the Reliability of Profile Matching Across Large Online Social Networks