Abstract
Advances in distributed service-oriented computing and Internet technology have created a strong technology push for outsourcing and information sharing. Organizations increasingly need to share data across organizational boundaries, both within a country and with countries that may have weaker privacy and security standards. Ideally, the parties would share certain statistical data and extract knowledge from their private databases without revealing anything about any individual database beyond the permitted aggregate result. In this article, we describe two scenarios for outsourcing data aggregation services and present a set of decentralized peer-to-peer protocols that support data sharing across multiple private databases while minimizing data disclosure among the individual parties. Our basic protocols comprise a set of novel probabilistic computation mechanisms for important primitive data aggregation operations across multiple private databases, such as max, min, and top-k selection. We provide an analytical study of the basic protocols in terms of precision, efficiency, and privacy characteristics. Our advanced protocols implement an efficient algorithm for kNN classification across multiple private databases. We report a set of experiments that evaluate the proposed protocols in terms of correctness, efficiency, and privacy characteristics.
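To give a feel for what a probabilistic max computation across private databases might look like, the following is a minimal sketch and not the article's actual protocol. It assumes a ring topology in which a running value is passed from party to party, integer private values no smaller than a known lower bound, and a randomization probability that decays geometrically per round. The function name `ring_max` and the parameters `rounds`, `p0`, `decay`, and `floor` are all hypothetical and chosen for illustration only.

```python
import random


def ring_max(private_values, rounds=8, p0=1.0, decay=0.5, floor=0, seed=None):
    """Estimate max(private_values) by passing a running value around a ring.

    Illustrative assumptions (not taken from the abstract):
    - each entry of private_values is one party's private integer, >= floor
    - parties are visited in list order, once per round
    - in round r a party hides its value with probability p0 * decay**r
    """
    rng = random.Random(seed)
    running = floor  # public starting value, at or below every private value
    for r in range(rounds):
        p_randomize = p0 * decay ** r
        for own in private_values:
            if own <= running:
                # Nothing to contribute: forward the running value unchanged.
                continue
            if rng.random() < p_randomize:
                # Forward a random value below the true one, so the successor
                # cannot tell whether it has seen this party's real value.
                running = rng.randint(running, own - 1)
            else:
                # Forward the real value.
                running = own
    return running


if __name__ == "__main__":
    values = [12, 47, 30, 5]
    print("estimate:", ring_max(values, seed=7), "true max:", max(values))
```

In this sketch the running value never decreases and never exceeds the true maximum, so the estimate can only err on the low side, and the chance of error shrinks as `rounds` grows. Setting `p0=0.0` turns off randomization and yields the exact maximum in a single pass, at the cost of each party exposing its value whenever it exceeds the running one. This is the precision, efficiency, and privacy trade-off that the abstract says the article analyzes.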
Index Terms
- Preserving data privacy in outsourcing data aggregation services