ABSTRACT
A large amount of valuable information resides in decentralized social graphs, where no entity has access to the complete graph structure. Instead, each user maintains locally a limited view of the graph. For example, in a phone network, each user keeps a contact list locally in her phone, and does not have access to other users' contacts. The contact lists of all users form an implicit social graph that could be very useful to study the interaction patterns among different populations. However, due to privacy concerns, one could not simply collect the unfettered local views from users and reconstruct a decentralized social network.
In this paper, we investigate techniques to ensure local differential privacy of individuals while collecting structural information and generating representative synthetic social graphs. We show that existing local differential privacy and synthetic graph generation techniques are insufficient for preserving important graph properties, due to excessive noise injection, inability to retain important graph structure, or both. Motivated by this, we propose LDPGen, a novel multi-phase technique that incrementally clusters users based on their connections to different partitions of the whole population. Every time a user reports information, LDPGen carefully injects noise to ensure local differential privacy.We derive optimal parameters in this process to cluster structurally-similar users together. Once a good clustering of users is obtained, LDPGen adapts existing social graph generation models to construct a synthetic social graph.
We conduct comprehensive experiments over four real datasets to evaluate the quality of the obtained synthetic graphs, using a variety of metrics, including (i) important graph structural measures; (ii) quality of community discovery; and (iii) applicability in social recommendation. Our experiments show that the proposed technique produces high-quality synthetic graphs that well represent the original decentralized social graphs, and significantly outperform those from baseline approaches.
Supplemental Material
- William Aiello, Fan Chung, and Linyuan Lu. 2000. A random graph model for massive graphs. In Proceedings of the thirty-second annual ACM symposium on Theory of computing. Acm, 171--180. Google ScholarDigital Library
- Raef Bassily and Adam D. Smith. 2015. Local, Private, Efficient Protocols for Succinct Histograms. In STOC. 127--135.Google ScholarDigital Library
- Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. 2011. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011).Google Scholar
- Jeremiah Blocki, Avrim Blum, Anupam Datta, and Or Sheffet. 2012. The johnsonlindenstrauss transform itself preserves differential privacy. In Foundations of Computer Science (FOCS), 2012 IEEE 53rd Annual Symposium on. IEEE, 410--419.Google ScholarDigital Library
- Vincent D Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. 2008. Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment 2008, 10 (2008), P10008.Google ScholarCross Ref
- Christian Braune, Christian Borgelt, and Rudolf Kruse. 2013. Behavioral clustering for point processes. In International Symposium on Intelligent Data Analysis. Springer, 127--137. Google ScholarDigital Library
- L da F Costa, Francisco A Rodrigues, Gonzalo Travieso, and Paulino Ribeiro Villas Boas. 2007. Characterization of complex networks: A survey of measurements. Advances in physics 56, 1 (2007), 167--242. Google ScholarCross Ref
- Wei-Yen Day, Ninghui Li, and Min Lyu. 2016. Publishing graph degree distribution with node differential privacy. In Proceedings of the 2016 International Conference on Management of Data. ACM, 123--138. Google ScholarDigital Library
- John C Duchi, Michael I Jordan, and Martin J Wainwright. 2013. Local privacy and statistical minimax rates. In FOCS. 429--438.Google Scholar
- Cynthia Dwork. 2006. Differential privacy. In Automata, languages and programming. 1--12. Google ScholarDigital Library
- Cynthia Dwork and Jing Lei. 2009. Differential privacy and robust statistics. In Proceedings of the forty-first annual ACM symposium on Theory of computing. ACM, 371--380. Google ScholarDigital Library
- Cynthia Dwork, Moni Naor, Toniann Pitassi, Guy N Rothblum, and Sergey Yekhanin. 2010. Pan-Private Streaming Algorithms.. In ICS. 66--80.Google Scholar
- Cynthia Dwork, Aaron Roth, et al. 2014. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9, 3--4 (2014), 211--407.Google Scholar
- Paul Erdös and Alfréd Rényi. 1959. On random graphs, I. Publicationes Mathematicae (Debrecen) 6 (1959), 290--297.Google ScholarCross Ref
- Úlfar Erlingsson et al. 2014. Rappor: Randomized aggregatable privacy-preserving ordinal response. In CCS. 1054--1067.Google ScholarDigital Library
- Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. 2014. Rappor: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC conference on computer and communications security. ACM, 1054--1067. Google ScholarDigital Library
- Giulia Fanti et al. 2016. Building a RAPPOR with the Unknown: PrivacyPreserving Learning of Associations and Data Dictionaries. PoPETS issue 3, 2016 (2016).Google Scholar
- Neil Zhenqiang Gong, Wenchang Xu, Ling Huang, Prateek Mittal, Emil Stefanov, Vyas Sekar, and Dawn Song. 2012. Evolution of social-attribute networks: measurements, modeling, and implications using google+. In Proceedings of the 2012 ACM conference on Internet measurement conference. ACM, 131--144. Google ScholarDigital Library
- Roger Guimera, Marta Sales-Pardo, and Luís A Nunes Amaral. 2004. Modularity from fluctuations in random graphs and complex networks. Physical Review E 70, 2 (2004), 025101.Google ScholarCross Ref
- Justin Hsu, Sanjeev Khanna, and Aaron Roth. 2012. Distributed private heavy hitters. In Automata, Languages, and Programming. 461--472. Google ScholarDigital Library
- Mohsen Jamali and Martin Ester. 2010. A matrix factorization technique with trust propagation for recommendation in social networks. In Proceedings of the fourth ACM conference on Recommender systems. ACM, 135--142. Google ScholarDigital Library
- Kalervo Järvelin and Jaana Kekäläinen. 2002. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems (TOIS) 20, 4 (2002), 422--446. Google ScholarDigital Library
- Zach Jorgensen, Ting Yu, and Graham Cormode. 2016. Publishing Attributed Social Graphs with Formal Privacy Guarantees. In SIGMOD. 107--122. Google ScholarDigital Library
- Vishesh Karwa and Aleksandra B Slavković. 2012. Differentially private graphical degree sequences and synthetic graphs. In Privacy in Statistical Databases. Springer, 273--285.Google Scholar
- Shiva Prasad Kasiviswanathan et al. 2011. What can we learn privately? SICOMP 40, 3 (2011), 793--826. Google ScholarDigital Library
- Shiva Prasad Kasiviswanathan, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. 2013. Analyzing graphs with node differential privacy. In Theory of Cryptography. Springer, 457--476. Google ScholarDigital Library
- Dor Konforty, Yuval Adam, Daniel Estrada, and Lucius Gregory Meredith. 2015. Synereo: The Decentralized and Distributed Social Network. (2015).Google Scholar
- Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, and Zoubin Ghahramani. 2010. Kronecker graphs: An approach to modeling networks. Journal of Machine Learning Research 11, Feb (2010), 985--1042.Google ScholarDigital Library
- Jure Leskovec, Kevin J Lang, Anirban Dasgupta, and Michael W Mahoney. 2009. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics 6, 1 (2009), 29--123. Google ScholarCross Ref
- Jure Leskovec and Julian J Mcauley. 2012. Learning to discover social circles in ego networks. In Advances in neural information processing systems. 539--547.Google Scholar
- Linyuan Lü and Tao Zhou. 2011. Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and its Applications 390, 6 (2011), 1150--1170. Google ScholarCross Ref
- Wentian Lu and Gerome Miklau. 2014. Exponential Random Graph Estimation under Differential Privacy. In KDD. 921--930. Google ScholarDigital Library
- R Duncan Luce and Albert D Perry. 1949. A method of matrix analysis of group structure. Psychometrika 14, 2 (1949), 95--116. Google ScholarCross Ref
- Christopher D Manning, Prabhakar Raghavan, Hinrich Schütze, et al. 2008. Introduction to information retrieval. Vol. 1. Cambridge university press Cambridge.Google Scholar
- Marina Meilă. 2003. Comparing clusterings by the variation of information. In Learning theory and kernel machines. Springer, 173--187. Google ScholarCross Ref
- Darakhshan Mir and Rebecca N Wright. 2012. A differentially private estimator for the stochastic kronecker graph model. In Proceedings of the 2012 Joint EDBT/ICDT Workshops. ACM, 167--176. Google ScholarDigital Library
- Thông T. Nguyên, Xiaokui Xiao, Yin Yang, Siu Cheung Hui, Hyejin Shin, and Junbum Shin. 2016. Collecting and Analyzing Data from Smart Device Users with Local Differential Privacy. CoRR abs/1606.05053 (2016). http://arxiv.org/abs/ 1606.05053Google Scholar
- Derek de Solla Price. 1976. A general theory of bibliometric and other cumulative advantage processes. Journal of the American society for Information science 27, 5 (1976), 292--306. Google ScholarCross Ref
- Zhan Qin, Yin Yang, Ting Yu, Issa Khalil, Xiaokui Xiao, and Kui Ren. 2016. Heavy Hitter Estimation over Set-Valued Data with Local Differential Privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS '16). ACM, New York, NY, USA, 192--203. https://doi.org/10.1145/ 2976749.2978409Google ScholarDigital Library
- William M Rand. 1971. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical association 66, 336 (1971), 846--850. Google ScholarCross Ref
- Garry Robins, Pip Pattison, Yuval Kalish, and Dean Lusher. 2007. An introduction to exponential random graph (p*) models for social networks. Social networks 29, 2 (2007), 173--191. Google ScholarCross Ref
- Alessandra Sala, Xiaohan Zhao, Christo Wilson, Haitao Zheng, and Ben Y Zhao. 2011. Sharing graphs using differentially private graph models. In IMC. ACM, 81--98.Google Scholar
- Comandur Seshadhri, Tamara G Kolda, and Ali Pinar. 2012. Community structure and scale-free collections of Erd's-Rényi graphs. Physical Review E 85, 5 (2012), 056109.Google ScholarCross Ref
- Nguyen Xuan Vinh, Julien Epps, and James Bailey. 2010. Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. Journal of Machine Learning Research 11, Oct (2010), 2837--2854.Google ScholarDigital Library
- Yue Wang and Xintao Wu. 2013. Preserving Differential Privacy in DegreeCorrelation Based Graph Generation. Trans. Data Privacy 6, 2 (Aug. 2013).Google Scholar
- Yue Wang, Xintao Wu, and Leting Wu. 2013. Differential Privacy Preserving Spectral Graph Analysis. In PAKDD. 329--340. Google ScholarCross Ref
- Stanley L Warner. 1965. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the ASA 60, 309 (1965), 63--69.Google Scholar
- Duncan J Watts and Steven H Strogatz. 1998. Collective dynamics of "smallworld" networks. nature 393, 6684 (1998), 440--442.Google Scholar
- Sijie Xiong, Anand D. Sarwate, and Narayan B. Mandayam. 2016. Randomized requantization with local differential privacy. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20--25, 2016. 2189--2193. https://doi.org/10.1109/ICASSP.2016.7472065Google Scholar
- Jun Zhang, Graham Cormode, Cecilia M. Procopiuc, Divesh Srivastava, and Xiaokui Xiao. 2015. Private Release of Graph Statistics using Ladder Functions. In SIGMOD. 731--745. Google ScholarDigital Library
Index Terms
- Generating Synthetic Decentralized Social Graphs with Local Differential Privacy
Recommendations
A Two-Phase Algorithm for Generating Synthetic Graph Under Local Differential Privacy
ICCNS '18: Proceedings of the 8th International Conference on Communication and Network SecurityWith the rapid development of big data technology, the issue of preserving personal privacy has attracted more and more attention. It has been shown that protecting the published graph with the differential privacy can ensure not only the quantitative ...
PPDU: dynamic graph publication with local differential privacy
AbstractLocal differential privacy (LDP) is an emerging privacy-preserving data collection model that requires no trusted third party. Most privacy-preserving decentralized graph publishing studies adopt LDP technique to ensure individual privacy. However,...
Community-based social recommendation under local differential privacy protection
AbstractSocial recommendation refers to recommendation technology taking social relations as additional input to improve merchandise sales and user satisfaction. It has been widely used in various social networking services. Social recommendation ...
Comments