|
ABSTRACT
Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called k-anonymity has gained popularity. In a k-anonymized dataset, each record is indistinguishable from at least k − 1 other records with respect to certain identifying attributes. In this article, we show using two simple attacks that a k-anonymized dataset has some subtle but severe privacy problems. First, an attacker can discover the values of sensitive attributes when there is little diversity in those sensitive attributes. This is a known problem. Second, attackers often have background knowledge, and we show that k-anonymity does not guarantee privacy against attackers using background knowledge. We give a detailed analysis of these two attacks, and we propose a novel and powerful privacy criterion called ℓ-diversity that can defend against such attacks. In addition to building a formal foundation for ℓ-diversity, we show in an experimental evaluation that ℓ-diversity is practical and can be implemented efficiently.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
Aggarwal, C. C. and Yu, P. S. 2004. A condensation approach to privacy preserving data mining. In Proceedings of the International Conference on Extending Database Technology (EDBT). 183--199.
|
| |
3
|
Aggarwal, G., Feder, T., Kenthapadi, K., Motwani, R., Panigrahy, R., Thomas, D., and Zhu, A. 2004. k-anonymity: Algorithms and hardness. Tech. rep., Stanford University.
|
 |
4
|
|
| |
5
|
Agrawal, R., Bayardo, R. J., Faloutsos, C., Kiernan, J., Rantzau, R., and Srikant, R. 2004. Auditing compliance with a hippocratic database. In Proceedings of the International Conference on Very Large Databases (VLDB). 516--527.
|
 |
6
|
|
| |
7
|
Agrawal, R., Kiernan, J., Srikant, R., and Xu, Y. 2002. Hippocratic databases. In Proceedings of the International Conference on Very Large Databases (VLDB). 143--154.
|
| |
8
|
|
 |
9
|
|
 |
10
|
|
| |
11
|
|
| |
12
|
|
 |
13
|
|
 |
14
|
|
 |
15
|
|
 |
16
|
Michael Ben-Or , Shafi Goldwasser , Avi Wigderson, Completeness theorems for non-cryptographic fault-tolerant distributed computation, Proceedings of the twentieth annual ACM symposium on Theory of computing, p.1-10, May 02-04, 1988, Chicago, Illinois, United States
[doi> 10.1145/62212.62213]
|
| |
17
|
Ben-Tal, A., Charnes, A., and Teboulle, M. 1989. Entropic means. J. Mathemat. Anal. Appl. 139, 2, 537--551.
|
 |
18
|
|
 |
19
|
David Chaum , Claude Crépeau , Ivan Damgard, Multiparty unconditionally secure protocols, Proceedings of the twentieth annual ACM symposium on Theory of computing, p.11-19, May 02-04, 1988, Chicago, Illinois, United States
[doi> 10.1145/62212.62214]
|
| |
20
|
Chawla, S., Dwork, C., McSherry, F., Smith, A., and Wee, H. 2005. Toward privacy in public databases. In Proceedings of the Tactical Communications Conference (TCC).
|
 |
21
|
|
 |
22
|
|
 |
23
|
|
| |
24
|
Cox, L. 1995. Network models for complementary cell suppression. J. Amer. Statis. Asso. 90, 1453--1462.
|
| |
25
|
Cox, L. H. 1980. Suppression, methodology and statistical disclosure control. J. Amer. Statis. Asso. 75.
|
| |
26
|
Cox, L. H. 1982. Solving confidentiality protection problems in tabulations using network optimization: A network model for cell suppression in the u.s. economic censuses. In Proceedings of the International Seminar on Statistical Confidentiality. Dublin International Statistical Institute, Dublin, Ireland. 229--245.
|
| |
27
|
Cox, L. H. 1987. New results in dislosure avoidance for tabulations. In Proceedings of the International Statistical Institute 46th Session. Tokyo, Japan. 83--84.
|
| |
28
|
Dalenius, T. 1981. A simple procedure for controlled rounding. Statistik Tidskrift.
|
| |
29
|
Dalenius, T. and Reiss, S. 1982. Data swapping: A technique for disclosure control. J. Statis. Plan. Infer. 6.
|
 |
30
|
|
 |
31
|
|
| |
32
|
Diaconis, P. and Sturmfels, B. 1998. Algebraic algorithms for sampling from conditional distributions. Annals of Statistics 1, 363--397.
|
 |
33
|
|
 |
34
|
|
| |
35
|
Dobra, A. 2002. Statistical tools for disclosure limitation in multiway contingency tables. Ph.D. thesis, Carnegie Mellon University.
|
| |
36
|
Dobra, A. and Feinberg, S. E. 2000. Assessing the risk of disclosure of confidential categorical data. In Bayesian Statistics 7. Oxford University Press, Oxford, UK.
|
| |
37
|
Dobra, A. and Feinberg, S. E. 2003. Bounding entries in multi-way contingency tables given a set of marginal totals. In Proceedings of the Shoresh Conference 2000: Foundations of Statistical Inference. Springer Verlag.
|
| |
38
|
|
 |
39
|
|
| |
40
|
Duncan, G. T. and Feinberg, S. E. 1997. Obtaining information while preserving privacy: A markov perturbation method for tabular data. Joint Statistical Meetings. Anaheim, CA.
|
 |
41
|
|
 |
42
|
Alexandre Evfimievski , Ramakrishnan Srikant , Rakesh Agrawal , Johannes Gehrke, Privacy preserving mining of association rules, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, July 23-26, 2002, Edmonton, Alberta, Canada
[doi> 10.1145/775047.775080]
|
| |
43
|
Fellegi, I. P. 1972. On the question of statistical confidentiality. J. Amer. Statis. Asso. 67:337, 7--18.
|
 |
44
|
|
 |
45
|
|
| |
46
|
Kantarcioglu, M. and Clifton, C. 2002. Privacy-preserving distributed mining of association rules on horizontally partitioned data. In Proceedings of the Conference on Data Mining and Knowledge Discovery (DMKD).
|
| |
47
|
|
 |
48
|
|
 |
49
|
Jon Kleinberg , Christos Papadimitriou , Prabhakar Raghavan, Auditing Boolean attributes, Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.86-91, May 15-18, 2000, Dallas, Texas, United States
[doi> 10.1145/335168.335210]
|
| |
50
|
LeFevre, K., Agrawal, R., Ercegovac, V., Ramakrishnan, R., Xu, Y., and DeWitt, D. J. 2004. Limiting disclosure in hippocratic databases. In Proceedings of the International Conference on Very Large Databases (VLDB). 108--119.
|
 |
51
|
|
| |
52
|
M. Langheinrich, E. 2001. A P3P preference exchange language 1.0 (appel1.0). W3C Working Draft.
|
| |
53
|
M. Marchiori, E. 2002. The platform for privacy preferences 1.0 (p3p1.0) specification. W3C Proposed Recommendation.
|
| |
54
|
|
| |
55
|
Martin, D., Kifer, D., Machanavajjhala, A., Gehrke, J., and Halpern, J. 2006. Worst-case background knowledge in privacy. Tech. rep., Cornell University.
|
| |
56
|
Matloff, N. S. 1986. Another look at the use of noise addition for database security. In Proceedings of IEEE Symposium on Security and Privacy. 173--180.
|
 |
57
|
|
| |
58
|
Miklau, G. and Suciu, D. 2003. Controlling access to published data using cryptography. In Proceedings of the International Conference on Very Large Databases (VLDB). 898--909.
|
 |
59
|
|
| |
60
|
Ohrn, A. and Ohno-Machado, L. 1999. Using boolean reasoning to anonymize databases. A. I. Medicine 15, 3, 235--254.
|
| |
61
|
|
| |
62
|
Samarati, P. and Sweeney, L. 1998. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. Tech. rep. SRI-CSL-98-04, SRI Computer Science Laboratory, Palo Alto, CA.
|
| |
63
|
Schlorer, J. 1975. Identification and retrieval of personal records from a statistical bank. Methods Inform. Medicine.
|
| |
64
|
Slavkovic, A. and Feinberg, S. E. 2004. Bounds for cell entries in two-way tables given conditional relative frequencies. In Lecture Notes in Computer Science, Vol. 3050. J. Domingo-Ferrer and V. Torra Eds. Springer-Verlag, 30--43.
|
| |
65
|
Snodgrass, R. T., Yao, S., and Collberg, C. S. 2004. Tamper detection in audit logs. In Proceedings of the International Conference on Very Large Databases (VLDB). 504--515.
|
| |
66
|
Sweeney, L. 2000. Uniqueness of simple demographics in the u.s. population. Tech. rep., Carnegie Mellon University.
|
| |
67
|
|
 |
68
|
|
| |
69
|
University of California Irvine Machine Learning Repository. http://www.ics.uci.edu/mlearn/mlrepository.html.
|
 |
70
|
|
| |
71
|
Warner, S. L. 1965. Randomized response: A survey technique for eliminating evasive answer bias. J. Amer. Statis. Ass.
|
| |
72
|
Warner, S. L. 1971. The linear randomized response model. J. Amer. Statis. Ass. 884--888.
|
| |
73
|
Yang, X. and Li, C. 2004. Secure XML publishing without information leakage in the presence of data inference. In Proceedings of the International Conference on Very Large Databases (VLDB). 96--107.
|
 |
74
|
|
|