skip to main content
10.1145/3133956.3134086acmconferencesArticle/Chapter ViewAbstractPublication PagesccsConference Proceedingsconference-collections
research-article

Generating Synthetic Decentralized Social Graphs with Local Differential Privacy

Authors Info & Claims
Published:30 October 2017Publication History

ABSTRACT

A large amount of valuable information resides in decentralized social graphs, where no entity has access to the complete graph structure. Instead, each user maintains locally a limited view of the graph. For example, in a phone network, each user keeps a contact list locally in her phone, and does not have access to other users' contacts. The contact lists of all users form an implicit social graph that could be very useful to study the interaction patterns among different populations. However, due to privacy concerns, one could not simply collect the unfettered local views from users and reconstruct a decentralized social network.

In this paper, we investigate techniques to ensure local differential privacy of individuals while collecting structural information and generating representative synthetic social graphs. We show that existing local differential privacy and synthetic graph generation techniques are insufficient for preserving important graph properties, due to excessive noise injection, inability to retain important graph structure, or both. Motivated by this, we propose LDPGen, a novel multi-phase technique that incrementally clusters users based on their connections to different partitions of the whole population. Every time a user reports information, LDPGen carefully injects noise to ensure local differential privacy.We derive optimal parameters in this process to cluster structurally-similar users together. Once a good clustering of users is obtained, LDPGen adapts existing social graph generation models to construct a synthetic social graph.

We conduct comprehensive experiments over four real datasets to evaluate the quality of the obtained synthetic graphs, using a variety of metrics, including (i) important graph structural measures; (ii) quality of community discovery; and (iii) applicability in social recommendation. Our experiments show that the proposed technique produces high-quality synthetic graphs that well represent the original decentralized social graphs, and significantly outperform those from baseline approaches.

Skip Supplemental Material Section

Supplemental Material

References

  1. William Aiello, Fan Chung, and Linyuan Lu. 2000. A random graph model for massive graphs. In Proceedings of the thirty-second annual ACM symposium on Theory of computing. Acm, 171--180. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Raef Bassily and Adam D. Smith. 2015. Local, Private, Efficient Protocols for Succinct Histograms. In STOC. 127--135.Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. 2011. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011).Google ScholarGoogle Scholar
  4. Jeremiah Blocki, Avrim Blum, Anupam Datta, and Or Sheffet. 2012. The johnsonlindenstrauss transform itself preserves differential privacy. In Foundations of Computer Science (FOCS), 2012 IEEE 53rd Annual Symposium on. IEEE, 410--419.Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Vincent D Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. 2008. Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment 2008, 10 (2008), P10008.Google ScholarGoogle ScholarCross RefCross Ref
  6. Christian Braune, Christian Borgelt, and Rudolf Kruse. 2013. Behavioral clustering for point processes. In International Symposium on Intelligent Data Analysis. Springer, 127--137. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. L da F Costa, Francisco A Rodrigues, Gonzalo Travieso, and Paulino Ribeiro Villas Boas. 2007. Characterization of complex networks: A survey of measurements. Advances in physics 56, 1 (2007), 167--242. Google ScholarGoogle ScholarCross RefCross Ref
  8. Wei-Yen Day, Ninghui Li, and Min Lyu. 2016. Publishing graph degree distribution with node differential privacy. In Proceedings of the 2016 International Conference on Management of Data. ACM, 123--138. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. John C Duchi, Michael I Jordan, and Martin J Wainwright. 2013. Local privacy and statistical minimax rates. In FOCS. 429--438.Google ScholarGoogle Scholar
  10. Cynthia Dwork. 2006. Differential privacy. In Automata, languages and programming. 1--12. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Cynthia Dwork and Jing Lei. 2009. Differential privacy and robust statistics. In Proceedings of the forty-first annual ACM symposium on Theory of computing. ACM, 371--380. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Cynthia Dwork, Moni Naor, Toniann Pitassi, Guy N Rothblum, and Sergey Yekhanin. 2010. Pan-Private Streaming Algorithms.. In ICS. 66--80.Google ScholarGoogle Scholar
  13. Cynthia Dwork, Aaron Roth, et al. 2014. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9, 3--4 (2014), 211--407.Google ScholarGoogle Scholar
  14. Paul Erdös and Alfréd Rényi. 1959. On random graphs, I. Publicationes Mathematicae (Debrecen) 6 (1959), 290--297.Google ScholarGoogle ScholarCross RefCross Ref
  15. Úlfar Erlingsson et al. 2014. Rappor: Randomized aggregatable privacy-preserving ordinal response. In CCS. 1054--1067.Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. 2014. Rappor: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC conference on computer and communications security. ACM, 1054--1067. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Giulia Fanti et al. 2016. Building a RAPPOR with the Unknown: PrivacyPreserving Learning of Associations and Data Dictionaries. PoPETS issue 3, 2016 (2016).Google ScholarGoogle Scholar
  18. Neil Zhenqiang Gong, Wenchang Xu, Ling Huang, Prateek Mittal, Emil Stefanov, Vyas Sekar, and Dawn Song. 2012. Evolution of social-attribute networks: measurements, modeling, and implications using google+. In Proceedings of the 2012 ACM conference on Internet measurement conference. ACM, 131--144. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Roger Guimera, Marta Sales-Pardo, and Luís A Nunes Amaral. 2004. Modularity from fluctuations in random graphs and complex networks. Physical Review E 70, 2 (2004), 025101.Google ScholarGoogle ScholarCross RefCross Ref
  20. Justin Hsu, Sanjeev Khanna, and Aaron Roth. 2012. Distributed private heavy hitters. In Automata, Languages, and Programming. 461--472. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Mohsen Jamali and Martin Ester. 2010. A matrix factorization technique with trust propagation for recommendation in social networks. In Proceedings of the fourth ACM conference on Recommender systems. ACM, 135--142. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Kalervo Järvelin and Jaana Kekäläinen. 2002. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems (TOIS) 20, 4 (2002), 422--446. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Zach Jorgensen, Ting Yu, and Graham Cormode. 2016. Publishing Attributed Social Graphs with Formal Privacy Guarantees. In SIGMOD. 107--122. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Vishesh Karwa and Aleksandra B Slavković. 2012. Differentially private graphical degree sequences and synthetic graphs. In Privacy in Statistical Databases. Springer, 273--285.Google ScholarGoogle Scholar
  25. Shiva Prasad Kasiviswanathan et al. 2011. What can we learn privately? SICOMP 40, 3 (2011), 793--826. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Shiva Prasad Kasiviswanathan, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. 2013. Analyzing graphs with node differential privacy. In Theory of Cryptography. Springer, 457--476. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Dor Konforty, Yuval Adam, Daniel Estrada, and Lucius Gregory Meredith. 2015. Synereo: The Decentralized and Distributed Social Network. (2015).Google ScholarGoogle Scholar
  28. Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, and Zoubin Ghahramani. 2010. Kronecker graphs: An approach to modeling networks. Journal of Machine Learning Research 11, Feb (2010), 985--1042.Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Jure Leskovec, Kevin J Lang, Anirban Dasgupta, and Michael W Mahoney. 2009. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics 6, 1 (2009), 29--123. Google ScholarGoogle ScholarCross RefCross Ref
  30. Jure Leskovec and Julian J Mcauley. 2012. Learning to discover social circles in ego networks. In Advances in neural information processing systems. 539--547.Google ScholarGoogle Scholar
  31. Linyuan Lü and Tao Zhou. 2011. Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and its Applications 390, 6 (2011), 1150--1170. Google ScholarGoogle ScholarCross RefCross Ref
  32. Wentian Lu and Gerome Miklau. 2014. Exponential Random Graph Estimation under Differential Privacy. In KDD. 921--930. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. R Duncan Luce and Albert D Perry. 1949. A method of matrix analysis of group structure. Psychometrika 14, 2 (1949), 95--116. Google ScholarGoogle ScholarCross RefCross Ref
  34. Christopher D Manning, Prabhakar Raghavan, Hinrich Schütze, et al. 2008. Introduction to information retrieval. Vol. 1. Cambridge university press Cambridge.Google ScholarGoogle Scholar
  35. Marina Meilă. 2003. Comparing clusterings by the variation of information. In Learning theory and kernel machines. Springer, 173--187. Google ScholarGoogle ScholarCross RefCross Ref
  36. Darakhshan Mir and Rebecca N Wright. 2012. A differentially private estimator for the stochastic kronecker graph model. In Proceedings of the 2012 Joint EDBT/ICDT Workshops. ACM, 167--176. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Thông T. Nguyên, Xiaokui Xiao, Yin Yang, Siu Cheung Hui, Hyejin Shin, and Junbum Shin. 2016. Collecting and Analyzing Data from Smart Device Users with Local Differential Privacy. CoRR abs/1606.05053 (2016). http://arxiv.org/abs/ 1606.05053Google ScholarGoogle Scholar
  38. Derek de Solla Price. 1976. A general theory of bibliometric and other cumulative advantage processes. Journal of the American society for Information science 27, 5 (1976), 292--306. Google ScholarGoogle ScholarCross RefCross Ref
  39. Zhan Qin, Yin Yang, Ting Yu, Issa Khalil, Xiaokui Xiao, and Kui Ren. 2016. Heavy Hitter Estimation over Set-Valued Data with Local Differential Privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS '16). ACM, New York, NY, USA, 192--203. https://doi.org/10.1145/ 2976749.2978409Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. William M Rand. 1971. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical association 66, 336 (1971), 846--850. Google ScholarGoogle ScholarCross RefCross Ref
  41. Garry Robins, Pip Pattison, Yuval Kalish, and Dean Lusher. 2007. An introduction to exponential random graph (p*) models for social networks. Social networks 29, 2 (2007), 173--191. Google ScholarGoogle ScholarCross RefCross Ref
  42. Alessandra Sala, Xiaohan Zhao, Christo Wilson, Haitao Zheng, and Ben Y Zhao. 2011. Sharing graphs using differentially private graph models. In IMC. ACM, 81--98.Google ScholarGoogle Scholar
  43. Comandur Seshadhri, Tamara G Kolda, and Ali Pinar. 2012. Community structure and scale-free collections of Erd's-Rényi graphs. Physical Review E 85, 5 (2012), 056109.Google ScholarGoogle ScholarCross RefCross Ref
  44. Nguyen Xuan Vinh, Julien Epps, and James Bailey. 2010. Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. Journal of Machine Learning Research 11, Oct (2010), 2837--2854.Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Yue Wang and Xintao Wu. 2013. Preserving Differential Privacy in DegreeCorrelation Based Graph Generation. Trans. Data Privacy 6, 2 (Aug. 2013).Google ScholarGoogle Scholar
  46. Yue Wang, Xintao Wu, and Leting Wu. 2013. Differential Privacy Preserving Spectral Graph Analysis. In PAKDD. 329--340. Google ScholarGoogle ScholarCross RefCross Ref
  47. Stanley L Warner. 1965. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the ASA 60, 309 (1965), 63--69.Google ScholarGoogle Scholar
  48. Duncan J Watts and Steven H Strogatz. 1998. Collective dynamics of "smallworld" networks. nature 393, 6684 (1998), 440--442.Google ScholarGoogle Scholar
  49. Sijie Xiong, Anand D. Sarwate, and Narayan B. Mandayam. 2016. Randomized requantization with local differential privacy. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20--25, 2016. 2189--2193. https://doi.org/10.1109/ICASSP.2016.7472065Google ScholarGoogle Scholar
  50. Jun Zhang, Graham Cormode, Cecilia M. Procopiuc, Divesh Srivastava, and Xiaokui Xiao. 2015. Private Release of Graph Statistics using Ladder Functions. In SIGMOD. 731--745. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Generating Synthetic Decentralized Social Graphs with Local Differential Privacy

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      CCS '17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security
      October 2017
      2682 pages
      ISBN:9781450349468
      DOI:10.1145/3133956

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 30 October 2017

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      CCS '17 Paper Acceptance Rate151of836submissions,18%Overall Acceptance Rate1,261of6,999submissions,18%

      Upcoming Conference

      CCS '24
      ACM SIGSAC Conference on Computer and Communications Security
      October 14 - 18, 2024
      Salt Lake City , UT , USA

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader