Abstract
Cyber-physical systems have enabled the collection of massive amounts of data at an unprecedented level of spatial and temporal granularity. Publishing these data can advance big data research, which in turn helps improve overall system efficiency and resiliency. The main challenge in data publishing is to keep the published data useful while providing the necessary privacy protection. In our previous work (Jia et al. 2017a), we presented a privacy-preserving data publishing framework (referred to as PAD hereinafter) that guarantees k-anonymity while achieving better data utility than traditional anonymization techniques. PAD learns the information of interest to data users, that is, the features they care about, from their interactions with the data publishing system and then customizes the data publishing process to the intended use of the data. However, our previous work applies only to the case where the desired features are linear in the original data record. In this article, we extend PAD to nonlinear features. Our experiments demonstrate that, for various data-driven applications, PAD can achieve enhanced utility while remaining highly resilient to privacy threats.
References
- Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Retrieved from https://www.tensorflow.org/ Software available from tensorflow.org.
- Bharathan Balaji. 2015. Zodiac dataset publication agreement. Retrieved June 15, 2017 from http://www.synergylabs.org/bharath/datasets.html.
- Vincent Bindschaedler, Reza Shokri, and Carl A. Gunter. 2017. Plausible deniability for privacy-preserving data synthesis. Proceedings of the VLDB Endowment 10, 5 (2017), 481--492.
- Andy Bloxham. 2011. Most burglars using Facebook and Twitter to target victims, survey suggests. Retrieved September 26, 2011 from http://www.telegraph.co.uk/technology/news/8789538/Most-burglars-using-Facebook-and-Twitter-to-target-victims-survey-suggests.html.
- François Chollet and others. 2015. Keras. Retrieved from https://github.com/keras-team/keras.
- Josep Domingo-Ferrer. 2006. Microaggregation for database and location privacy. In International Workshop on Next Generation Information Technologies and Systems. Springer, 106--116.
- Josep Domingo-Ferrer and Josep Maria Mateo-Sanz. 2002a. Practical data-oriented microaggregation for statistical disclosure control. IEEE Transactions on Knowledge and Data Engineering 14, 1 (2002), 189--201.
- J. Domingo-Ferrer and J. M. Mateo-Sanz. 2002b. Practical data-oriented microaggregation for statistical disclosure control. IEEE Transactions on Knowledge and Data Engineering 14, 1 (Jan. 2002), 189--201.
- Josep Domingo-Ferrer and Jordi Soria-Comas. 2016. Anonymization in the time of big data. In Privacy in Statistical Databases, Josep Domingo-Ferrer and Mirjana Pejić-Bach (Eds.). Springer International Publishing, Cham, 57--68.
- Simona D'Oca and Tianzhen Hong. 2015. Occupancy schedules learning process through a data mining framework. Energy and Buildings 88 (2015), 395--408.
- Flávio du Pin Calmon and Nadia Fawaz. 2012. Privacy against statistical inference. In Proceedings of the 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton). IEEE, 1401--1408.
- Cynthia Dwork. 2008. Differential privacy: A survey of results. In International Conference on Theory and Applications of Models of Computation. Springer, 1--19.
- Elevate Energy. 2013. Aggregated Data Access: The 15/15 Rule in Illinois and Beyond. Retrieved June 15, 2017 from http://www.elevateenergy.org/wp/wp-content/uploads/1515-Rule-Factsheet-FINAL.pdf.
- Khaled El Emam and Cecilia Álvarez. 2014. A critical appraisal of the Article 29 Working Party Opinion 05/2014 on data anonymization techniques. International Data Privacy Law 5, 1 (2014), 73--87.
- Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. 2014. Rappor: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security. ACM, 1054--1067.
- European Commission. 2012. Protection of personal data. Retrieved January 13, 2017 from http://ec.europa.eu/justice/data-protection/.
- Benjamin Fung, Ke Wang, Rui Chen, and Philip S. Yu. 2010. Privacy-preserving data publishing: A survey of recent developments. ACM Computing Surveys (CSUR) 42, 4 (2010), 14.
- Aris Gkoulalas-Divanis, Panos Kalnis, and Vassilios S. Verykios. 2010. Providing k-anonymity in location based services. ACM SIGKDD Explorations Newsletter 12, 1 (2010), 3--10.
- Marco Gruteser and Dirk Grunwald. 2003. Anonymous usage of location-based services through spatial and temporal cloaking. In Proceedings of the 1st International Conference on Mobile Systems, Applications and Services. ACM, 31--42.
- Mehreen S. Gul and Sandhya Patidar. 2015. Understanding the energy consumption and occupancy of a multi-purpose academic building. Energy and Buildings 87 (2015), 155--165.
- Raia Hadsell, Sumit Chopra, and Yann LeCun. 2006. Dimensionality reduction by learning an invariant mapping. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 2. IEEE, 1735--1742.
- Naoise Holohan, Spiros Antonatos, Stefano Braghin, and Pól Mac Aonghusa. 2017. (k, ε)-Anonymity: k-Anonymity with ε-differential privacy. arXiv Preprint arXiv:1710.01615 (2017).
- Tsan-sheng Hsu, Churn-Jung Liau, and Da-Wei Wang. 2014. A logical framework for privacy-preserving social network publication. Journal of Applied Logic 12, 2 (2014), 151--174.
- Junlin Hu, Jiwen Lu, and Yap-Peng Tan. 2014. Discriminative deep metric learning for face verification in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1875--1882.
- Ruoxi Jia, Fisayo Caleb Sangogboye, Tianzhen Hong, Costas Spanos, and Mikkel Baun Kjærgaard. 2017a. PAD: Protecting anonymity in publishing building related datasets. In Proceedings of the 4th ACM Conference on Embedded Systems for Energy-Efficient Buildings. ACM.
- Ruoxi Jia, Roy Dong, S. Shankar Sastry, and Costas J. Spanos. 2017b. Privacy-enhanced architecture for occupancy-based HVAC control. In Proceedings of the 8th International Conference on Cyber-Physical Systems. ACM, 177--186.
- Ruoxi Jia and Costas Spanos. 2017. Occupancy modelling in shared spaces of buildings: A queueing approach. Journal of Building Performance Simulation 10, 4 (2017), 406--421.
- Ming Jin, Ruoxi Jia, Zhaoyi Kang, Ioannis C. Konstantakopoulos, and Costas J. Spanos. 2014. Presencesense: Zero-training algorithm for individual presence detection based on power monitoring. In Proceedings of the 1st ACM Conference on Embedded Systems for Energy-Efficient Buildings. ACM, 1--10.
- Ming Jin, Ruoxi Jia, and Costas Spanos. 2017. Virtual occupancy sensing: Using smart meters to indicate your presence. IEEE Transactions on Mobile Computing 16, 11 (2017), 3264--3277.
- Eoghan McKenna, Ian Richardson, and Murray Thomson. 2012. Smart meter data: Balancing consumer privacy concerns with legitimate applications. Energy Policy 41 (2012), 807--814.
- Andrés Molina-Markham, Prashant Shenoy, Kevin Fu, Emmanuel Cecchet, and David Irwin. 2010. Private memoirs of a smart meter. In Proceedings of the 2nd ACM Workshop on Embedded Sensing Systems for Energy-Efficiency in Building. ACM, 61--66.
- S. Raj Rajagopalan, Lalitha Sankar, Soheil Mohajer, and H. Vincent Poor. 2011. Smart meter privacy: A utility-privacy framework. In Proceedings of the 2011 IEEE International Conference on Smart Grid Communications (SmartGridComm). IEEE, 190--195.
- Fisayo Caleb Sangogboye, Krzysztof Arendt, Ashok Singh, Christian T. Veje, Mikkel Baun Kjærgaard, and Bo Nørregaard Jørgensen. 2017. Performance comparison of occupancy count estimation and prediction with common versus dedicated sensors for building model predictive control. Building Simulation 10, 6 (Dec. 2017), 829--843.
- Lalitha Sankar, S. Raj Rajagopalan, and H. Vincent Poor. 2013. Utility-privacy tradeoffs in databases: An information-theoretic approach. IEEE Transactions on Information Forensics and Security 8, 6 (2013), 838--852.
- Latanya Sweeney. 2002. k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10, 5 (2002), 557--570.
- Ivor W. Tsang, James T. Kwok, C. Bay, and H. Kong. 2003. Distance metric learning with kernels. In Proceedings of the International Conference on Artificial Neural Networks. 126--129.
- Giridhari Venkatadri, Athanasios Andreou, Yabing Liu, Alan Mislove, Krishna P. Gummadi, Patrick Loiseau, and Oana Goga. 2018. Privacy Risks with Facebook's PII-based Targeting: Auditing a Data Broker's Advertising Interface.
- Kilian Q. Weinberger, John Blitzer, and Lawrence K. Saul. 2006. Distance metric learning for large margin nearest neighbor classification. In Advances in Neural Information Processing Systems. 1473--1480.
- Eric P. Xing, Michael I. Jordan, Stuart J. Russell, and Andrew Y. Ng. 2003. Distance metric learning with application to clustering with side-information. In Advances in Neural Information Processing Systems. 521--528.
- Dit-Yan Yeung and Hong Chang. 2007. A kernel approach for semisupervised metric learning. IEEE Transactions on Neural Networks 18, 1 (2007), 141--149.
Index Terms
- A Framework for Privacy-Preserving Data Publishing with Enhanced Utility for Cyber-Physical Systems