DOI: 10.1145/1141277.1141415
Article

Privacy-preserving SVM using nonlinear kernels on horizontally partitioned data

Published: 23 April 2006

Abstract

Traditional Data Mining and Knowledge Discovery algorithms assume free access to data, either at a centralized location or in federated form. Increasingly, privacy and security concerns restrict this access, thus derailing data mining projects. What we need is distributed knowledge discovery that is sensitive to this problem. The key is to obtain valid results, while providing guarantees on the non-disclosure of data. Support vector machine classification is one of the most widely used classification methodologies in data mining and machine learning. It is based on solid theoretical foundations and has wide practical application. This paper proposes a privacy-preserving solution for support vector machine (SVM) classification, PP-SVM for short. Our solution constructs the global SVM classification model from the data distributed at multiple parties, without disclosing the data of each party to others. We assume that data is horizontally partitioned -- each party collects the same features of information for different data objects. We quantify the security and efficiency of the proposed method, and highlight future challenges.
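
To make the horizontally partitioned setting concrete, the following is a minimal sketch, not the paper's protocol. It assumes two hypothetical parties (`party_a`, `party_b`) that hold different rows over the same two features, and it computes everything in the clear, with no cryptography. It only illustrates that the global Gram matrix splits into blocks: each party can compute its own diagonal block locally, while the off-diagonal blocks need scalar products across parties, which is the part a privacy-preserving scheme would have to protect. It also shows that a nonlinear kernel such as the RBF kernel can be derived from those scalar products alone. The function names and the choice of `gamma` are assumptions made for illustration.

```python
import math


def dot(u, v):
    """Scalar product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def gram(rows_a, rows_b):
    """Matrix of scalar products between every row of rows_a and rows_b."""
    return [[dot(u, v) for v in rows_b] for u in rows_a]


def assemble_global_gram(party_a, party_b):
    """Build the global Gram matrix from its four blocks.

    k_aa and k_bb need only one party's data. k_ab and k_ba need data
    from both parties; here they are computed in the clear, which a
    privacy-preserving protocol would NOT do.
    """
    k_aa = gram(party_a, party_a)
    k_ab = gram(party_a, party_b)
    k_ba = gram(party_b, party_a)
    k_bb = gram(party_b, party_b)
    top = [ra + rb for ra, rb in zip(k_aa, k_ab)]
    bottom = [ra + rb for ra, rb in zip(k_ba, k_bb)]
    return top + bottom


def rbf_from_gram(g, gamma):
    """RBF kernel from scalar products only.

    Uses ||x - y||^2 = x.x - 2 x.y + y.y, so no raw feature vector is
    needed once the Gram matrix is known.
    """
    n = len(g)
    return [
        [math.exp(-gamma * (g[i][i] - 2.0 * g[i][j] + g[j][j])) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: same two features, different data objects.
party_a = [[1.0, 0.0], [0.0, 1.0]]
party_b = [[1.0, 1.0]]

global_gram = assemble_global_gram(party_a, party_b)
rbf_kernel = rbf_from_gram(global_gram, gamma=0.5)
```

A kernel matrix assembled this way could be handed to any SVM trainer that accepts a precomputed kernel; how the cross-party blocks are obtained without disclosing rows is the problem the paper addresses.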

Published In

SAC '06: Proceedings of the 2006 ACM symposium on Applied computing
April 2006
1967 pages
ISBN: 1595931082
DOI: 10.1145/1141277

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. privacy-preserving data mining
  2. support vector machine

Qualifiers

  • Article

Conference

SAC06

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
