Privacy preserving regression modelling via distributed computation
Source: Conference on Knowledge Discovery in Data
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Seattle, WA, USA
POSTER SESSION: Research track posters
Pages: 677 - 682
Year of Publication: 2004
ISBN: 1-58113-888-1
Authors
Ashish P. Sanil  National Institute of Statistical Sciences, Research Triangle Park, NC
Alan F. Karr  National Institute of Statistical Sciences, Research Triangle Park, NC
Xiaodong Lin  National Institute of Statistical Sciences, Research Triangle Park, NC
Jerome P. Reiter  Duke University, Durham, NC
Sponsors
SIGMOD: ACM Special Interest Group on Management of Data
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1014052.1014139

ABSTRACT

The reluctance of data owners to share their possibly confidential or proprietary data with others who own related databases is a serious impediment to conducting mutually beneficial data mining analyses. We address the case of vertically partitioned data, in which multiple data owners (agencies) each possess a few attributes of every data record. We focus on agencies that want to conduct a linear regression analysis on the complete records without disclosing the values of their own attributes. This paper describes an algorithm that enables such agencies to compute the exact regression coefficients of the global regression equation and to perform some basic goodness-of-fit diagnostics while protecting the confidentiality of their data. In more general settings beyond the privacy scenario, this algorithm can also be viewed as a method for the distributed computation of regression analyses.
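
The vertically partitioned setting can be illustrated with a small sketch. It is not the paper's algorithm, which the abstract does not spell out, and it provides no confidentiality protection. It only shows a structural fact such a method could build on: with column-partitioned data, the global fitted values are the sum of per-agency contributions, so the residual sum of squares can be evaluated from those contributions without pooling the attributes. The agency names, data values, and function names are hypothetical, and the response y is assumed to be available to whoever evaluates the sum.

```python
# Hypothetical toy data: two agencies hold different attributes of the
# same four records (vertical partitioning).
agency_a = [[1.0, 2.0], [1.0, 3.0], [1.0, 5.0], [1.0, 7.0]]  # intercept, x1
agency_b = [[1.0], [4.0], [2.0], [8.0]]                      # x2
y = [8.0, 19.0, 17.0, 39.0]  # constructed as 1 + 2*x1 + 3*x2


def partial_prediction(block, beta):
    """Contribution of one agency's attributes to each fitted value."""
    return [sum(x * b for x, b in zip(row, beta)) for row in block]


def rss_from_partials(response, partials):
    """Residual sum of squares computed from per-agency contributions only."""
    fitted = [sum(values) for values in zip(*partials)]
    return sum((yi - fi) ** 2 for yi, fi in zip(response, fitted))


def pooled_rss(rows, response, beta):
    """Reference computation on the pooled records, for comparison only."""
    fitted = [sum(x * b for x, b in zip(row, beta)) for row in rows]
    return sum((yi - fi) ** 2 for yi, fi in zip(response, fitted))


# Each agency applies its own block of coefficients to its own attributes.
beta_a = [1.0, 2.0]
beta_b = [3.0]
partials = [
    partial_prediction(agency_a, beta_a),
    partial_prediction(agency_b, beta_b),
]
rss_true = rss_from_partials(y, partials)

# The pooled records are built here only to check the decomposition.
pooled = [ra + rb for ra, rb in zip(agency_a, agency_b)]
rss_check = pooled_rss(pooled, y, beta_a + beta_b)
```

In this sketch only the per-record contributions leave each agency, not the attribute values themselves. A confidentiality-preserving method would additionally need to protect those contributions, which is outside the scope of this illustration.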



