research-article

A Framework for Privacy-Preserving Data Publishing with Enhanced Utility for Cyber-Physical Systems

Published: 27 November 2018

Abstract

Cyber-physical systems have enabled the collection of massive amounts of data at an unprecedented level of spatial and temporal granularity. Publishing these data can foster big data research, which, in turn, helps improve overall system efficiency and resiliency. The main challenge in data publishing is to ensure the usefulness of published data while providing necessary privacy protection. In our previous work (Jia et al. 2017a), we presented a privacy-preserving data publishing framework (referred to as PAD hereinafter), which can guarantee k-anonymity while achieving better data utility than traditional anonymization techniques. PAD learns the information of interest to data users, or features, from their interactions with the data publishing system and then customizes data publishing processes to the intended use of data. However, our previous work is only applicable to the case where the desired features are linear in the original data record. In this article, we extend PAD to nonlinear features. Our experiments demonstrate that for various data-driven applications, PAD can achieve enhanced utility while remaining highly resilient to privacy threats.
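
As a rough illustration of the k-anonymity guarantee mentioned in the abstract, the following minimal Python sketch checks whether every record in a published dataset is shared by at least k entries. It is not taken from the article: the function name is_k_anonymous and the sample occupancy profiles are hypothetical, and the sketch assumes that the whole published record acts as the quasi-identifier.

```python
from collections import Counter


def is_k_anonymous(published_records, k):
    """Return True if every distinct published record occurs at least k times.

    Assumption: the entire record is treated as the quasi-identifier.
    """
    counts = Counter(tuple(record) for record in published_records)
    return all(count >= k for count in counts.values())


# Hypothetical anonymized daily occupancy profiles (made-up values).
published = [
    (0, 1, 1, 0),
    (0, 1, 1, 0),
    (1, 1, 0, 0),
    (1, 1, 0, 0),
]

print(is_k_anonymous(published, 2))
print(is_k_anonymous(published, 3))
```

In this toy dataset each profile appears exactly twice, so the check passes for k = 2 and fails for k = 3.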

References

  1. Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Retrieved from https://www.tensorflow.org/ Software available from tensorflow.org.
  2. Bharathan Balaji. 2015. Zodiac dataset publication agreement. Retrieved June 15, 2017 from http://www.synergylabs.org/bharath/datasets.html.
  3. Vincent Bindschaedler, Reza Shokri, and Carl A. Gunter. 2017. Plausible deniability for privacy-preserving data synthesis. Proceedings of the VLDB Endowment 10, 5 (2017), 481--492.
  4. Andy Bloxham. 2011. Most burglars using Facebook and Twitter to target victims, survey suggests. Retrieved September 26, 2011 from http://www.telegraph.co.uk/technology/news/8789538/Most-burglars-using-Facebook-and-Twitter-to-target-victims-survey-suggests.html.
  5. François Chollet and others, 2015. Keras. Retrieved from https://github.com/keras-team/keras.
  6. Josep Domingo-Ferrer. 2006. Microaggregation for database and location privacy. In International Workshop on Next Generation Information Technologies and Systems. Springer, 106--116.
  7. Josep Domingo-Ferrer and Josep Maria Mateo-Sanz. 2002a. Practical data-oriented microaggregation for statistical disclosure control. IEEE Transactions on Knowledge and Data Engineering 14, 1 (2002), 189--201.
  8. J. Domingo-Ferrer and J. M. Mateo-Sanz. 2002b. Practical data-oriented microaggregation for statistical disclosure control. IEEE Transactions on Knowledge and Data Engineering 14, 1 (Jan. 2002), 189--201.
  9. Josep Domingo-Ferrer and Jordi Soria-Comas. 2016. Anonymization in the time of big data. In Privacy in Statistical Databases, Josep Domingo-Ferrer and Mirjana Pejić-Bach (Eds.). Springer International Publishing, Cham, 57--68.
  10. Simona D'Oca and Tianzhen Hong. 2015. Occupancy schedules learning process through a data mining framework. Energy and Buildings 88 (2015), 395--408.
  11. Flávio du Pin Calmon and Nadia Fawaz. 2012. Privacy against statistical inference. In Proceedings of the 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton). IEEE, 1401--1408.
  12. Cynthia Dwork. 2008. Differential privacy: A survey of results. In International Conference on Theory and Applications of Models of Computation. Springer, 1--19.
  13. Elevate Energy. 2013. Aggregated Data Access: The 15/15 Rule in Illinois and Beyond. Retrieved June 15, 2017 from http://www.elevateenergy.org/wp/wp-content/uploads/1515-Rule-Factsheet-FINAL.pdf.
  14. Khaled El Emam and Cecilia Álvarez. 2014. A critical appraisal of the Article 29 Working Party Opinion 05/2014 on data anonymization techniques. International Data Privacy Law 5, 1 (2014), 73--87.
  15. Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. 2014. Rappor: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security. ACM, 1054--1067.
  16. European Commission. 2012. Protection of personal data. Retrieved January 13, 2017 from http://ec.europa.eu/justice/data-protection/.
  17. Benjamin Fung, Ke Wang, Rui Chen, and Philip S. Yu. 2010. Privacy-preserving data publishing: A survey of recent developments. ACM Computing Surveys (CSUR) 42, 4 (2010), 14.
  18. Aris Gkoulalas-Divanis, Panos Kalnis, and Vassilios S. Verykios. 2010. Providing k-anonymity in location based services. ACM SIGKDD Explorations Newsletter 12, 1 (2010), 3--10.
  19. Marco Gruteser and Dirk Grunwald. 2003. Anonymous usage of location-based services through spatial and temporal cloaking. In Proceedings of the 1st International Conference on Mobile Systems, Applications and Services. ACM, 31--42.
  20. Mehreen S. Gul and Sandhya Patidar. 2015. Understanding the energy consumption and occupancy of a multi-purpose academic building. Energy and Buildings 87 (2015), 155--165.
  21. Raia Hadsell, Sumit Chopra, and Yann LeCun. 2006. Dimensionality reduction by learning an invariant mapping. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 2. IEEE, 1735--1742.
  22. Naoise Holohan, Spiros Antonatos, Stefano Braghin, and Pól Mac Aonghusa. 2017. (k, ε)-Anonymity: k-Anonymity with ε-differential privacy. arXiv Preprint arXiv:1710.01615 (2017).
  23. Tsan-sheng Hsu, Churn-Jung Liau, and Da-Wei Wang. 2014. A logical framework for privacy-preserving social network publication. Journal of Applied Logic 12, 2 (2014), 151--174.
  24. Junlin Hu, Jiwen Lu, and Yap-Peng Tan. 2014. Discriminative deep metric learning for face verification in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1875--1882.
  25. Ruoxi Jia, Fisayo Caleb Sangogboye, Tianzhen Hong, Costas Spanos, and Mikkel Baun Kjærgaard. 2017a. PAD: Protecting anonymity in publishing building related datasets. In Proceedings of the 4th ACM Conference on Embedded Systems for Energy-Efficient Buildings. ACM.
  26. Ruoxi Jia, Roy Dong, S. Shankar Sastry, and Costas J. Spanos. 2017b. Privacy-enhanced architecture for occupancy-based HVAC control. In Proceedings of the 8th International Conference on Cyber-Physical Systems. ACM, 177--186.
  27. Ruoxi Jia and Costas Spanos. 2017. Occupancy modelling in shared spaces of buildings: A queueing approach. Journal of Building Performance Simulation 10, 4 (2017), 406--421.
  28. Ming Jin, Ruoxi Jia, Zhaoyi Kang, Ioannis C. Konstantakopoulos, and Costas J. Spanos. 2014. Presencesense: Zero-training algorithm for individual presence detection based on power monitoring. In Proceedings of the 1st ACM Conference on Embedded Systems for Energy-Efficient Buildings. ACM, 1--10.
  29. Ming Jin, Ruoxi Jia, and Costas Spanos. 2017. Virtual occupancy sensing: Using smart meters to indicate your presence. IEEE Transactions on Mobile Computing 16, 11 (2017), 3264--3277.
  30. Eoghan McKenna, Ian Richardson, and Murray Thomson. 2012. Smart meter data: Balancing consumer privacy concerns with legitimate applications. Energy Policy 41 (2012), 807--814.
  31. Andrés Molina-Markham, Prashant Shenoy, Kevin Fu, Emmanuel Cecchet, and David Irwin. 2010. Private memoirs of a smart meter. In Proceedings of the 2nd ACM Workshop on Embedded Sensing Systems for Energy-Efficiency in Building. ACM, 61--66.
  32. S. Raj Rajagopalan, Lalitha Sankar, Soheil Mohajer, and H. Vincent Poor. 2011. Smart meter privacy: A utility-privacy framework. In Proceedings of the 2011 IEEE International Conference on Smart Grid Communications (SmartGridComm). IEEE, 190--195.
  33. Fisayo Caleb Sangogboye, Krzysztof Arendt, Ashok Singh, Christian T. Veje, Mikkel Baun Kjærgaard, and Bo Nørregaard Jørgensen. 2017. Performance comparison of occupancy count estimation and prediction with common versus dedicated sensors for building model predictive control. Building Simulation 10, 6 (Dec. 2017), 829--843.
  34. Lalitha Sankar, S. Raj Rajagopalan, and H. Vincent Poor. 2013. Utility-privacy tradeoffs in databases: An information-theoretic approach. IEEE Transactions on Information Forensics and Security 8, 6 (2013), 838--852.
  35. Latanya Sweeney. 2002. k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10, 5 (2002), 557--570.
  36. Ivor W. Tsang, James T. Kwok, C. Bay, and H. Kong. 2003. Distance metric learning with kernels. In Proceedings of the International Conference on Artificial Neural Networks. 126--129.
  37. Giridhari Venkatadri, Athanasios Andreou, Yabing Liu, Alan Mislove, Krishna P. Gummadi, Patrick Loiseau, and Oana Goga. 2018. Privacy Risks with Facebook's PII-based Targeting: Auditing a Data Broker's Advertising Interface.
  38. Kilian Q. Weinberger, John Blitzer, and Lawrence K. Saul. 2006. Distance metric learning for large margin nearest neighbor classification. In Advances in Neural Information Processing Systems. 1473--1480.
  39. Eric P. Xing, Michael I. Jordan, Stuart J. Russell, and Andrew Y. Ng. 2003. Distance metric learning with application to clustering with side-information. In Advances in Neural Information Processing Systems. 521--528.
  40. Dit-Yan Yeung and Hong Chang. 2007. A kernel approach for semisupervised metric learning. IEEE Transactions on Neural Networks 18, 1 (2007), 141--149.

          Reviews

          Fjodor J. Ruzic

          I recently did research on cyber-physical systems (CPSs) based on sensors that collect data about the humans who are wearing them. The actual problem that arises in such an environment is related to privacy, because CPS deployment may provide sensitive personal information. So, I was searching for resources focused on these issues. Plenty of titles deal with the Internet of Things (IoT), embedded systems, wearable sensors, and CPSs, but they rarely focus on a privacy-preserving architecture when data publishing is incorporated into the system. Fortunately, I found this paper to be a worthwhile study on privacy challenges related to publishing data collected within actual CPSs. It is clear that the convergence of computing and physical sensing creates a complex engineering ecosystem in which sensitive data exposure with privacy attacks could jeopardize the integrity and security of these systems. However, it should be noted that CPSs are not just two separate parts, but also the interaction of the physical and the cyber parts, thus the need for new concepts of design and frameworks that can deal with personal data and privacy issues. Security and privacy are the great concerns for CPSs in which privacy attacks target data collections that can be used to leak sensitive information. The problem arises with the need for data publishing, where a maximum form of anonymity must be provided and guaranteed. The authors start with the premise that the distributed sensing, processing, and storage of massive amounts of data that CPSs provide impact privacy breaches when personal data is in use. Privacy-preserving data publishing frameworks can be found in recent literature. The general goal is to prevent the linking of data records and sensitive information in the publishing process, with the highest data quality possible. 
The authors successfully confront the "current practice in publishing CPSs' datasets," that is, to provide agreements for regulating data use, sharing, and retention. Hence, they are conscious of how this approach is vulnerable, for example, "datasets are often anonymized by suppressing direct identifiers," and instead apply k-anonymity models. It is interesting how they connect k-anonymity with privacy-preserving data publishing in an elegant way through the PAD ecosystem (which is presented in their previous work [1]). An additional bonus that readers interested in PAD may find useful is an open-source project done in Python and developed through a GitHub repository: https://github.com/PAD-Protecting-Anonymity/PAD. The PAD framework is extended to data utility based on deep neural networks. Thus, the authors use distance metric learning in "the accurate estimation of arrival and departure time from a database containing daily occupancy profiles." The primary task in a CPS environment is "to publish the dataset with k-anonymity guarantee as well as high quality in support of the required data analysis." They successfully present improved data reliability "by learning how the data is intended to be used and then adjusting the data perturbation algorithm accordingly"; microaggregation as the perturbation technique (mostly used in Eurostat) provides an acceptable level of privacy protection. The overall value of this study lies in the evaluation and results: "using various datasets collected in real-world buildings," the authors look at, for example, "the utility of PAD with a generic distance metric" and "the utility of PAD with a customized distance metric." Everyone involved in big data and data mining research and utilization within CPS and IoT environments should consider the privacy issues in data publishing. This study could be a valuable resource for their work. It is also recommended as a supplementary reference for undergraduate and postgraduate courses.
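
          To make the idea of microaggregation under a generic versus a customized distance metric more concrete, here is a minimal Python sketch. It is not the authors' implementation and does not use the PAD repository's API: the function microaggregate, its greedy grouping heuristic, the sample records, and the hand-written first_feature_distance (standing in for a learned metric) are all assumptions made for illustration. Each record is replaced by the mean of a group of at least k records, so every published record is shared by at least k entries.

```python
import math


def microaggregate(records, k, distance=math.dist):
    """Greedy microaggregation sketch (illustrative, not the PAD algorithm).

    records:  list of equal-length numeric tuples, with len(records) >= k
    distance: function of two records; defaults to Euclidean distance
    Returns a list in which each record is replaced by its group mean.
    """
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")

    def centroid(indices):
        rows = [records[i] for i in indices]
        return tuple(sum(col) / len(rows) for col in zip(*rows))

    remaining = list(range(len(records)))
    published = [None] * len(records)
    while len(remaining) >= 2 * k:
        # Seed: the remaining record farthest from the remaining centroid.
        center = centroid(remaining)
        seed = max(remaining, key=lambda i: distance(records[i], center))
        # Group the seed with its k - 1 nearest remaining neighbours.
        group = sorted(remaining,
                       key=lambda i: distance(records[i], records[seed]))[:k]
        mean = centroid(group)
        for i in group:
            published[i] = mean
        remaining = [i for i in remaining if i not in group]
    # Between k and 2k - 1 records are left; they form the final group.
    mean = centroid(remaining)
    for i in remaining:
        published[i] = mean
    return published


# Made-up two-feature records, for illustration only.
records = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1)]
out = microaggregate(records, k=2)


def first_feature_distance(a, b):
    """Hypothetical customized metric that only compares the first feature."""
    return abs(a[0] - b[0])


out_custom = microaggregate(records, k=2, distance=first_feature_distance)
```

          Passing a different distance function changes which records are grouped together, which is the lever a customized metric uses to preserve the features that matter to data users.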

          • Published in

            ACM Transactions on Sensor Networks  Volume 14, Issue 3-4
            Special Issue on BuildSys'17
            November 2018
            392 pages
            ISSN:1550-4859
            EISSN:1550-4867
            DOI:10.1145/3294070
            Copyright © 2018 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 27 November 2018
            • Accepted: 1 September 2018
            • Revised: 1 April 2018
            • Received: 1 January 2018

            Qualifiers

            • research-article
            • Research
            • Refereed
