|
ABSTRACT
A number of organizations publish microdata for purposes such as public health and demographic research. Although attributes of microdata that clearly identify individuals, such as name and medical care card number, are generally removed, these databases can sometimes be joined with other public databases on attributes such as Zip code, Gender and Age to re-identify individuals who were supposed to remain anonymous. "Linking" attacks are made easier by the availability of other complementary databases over the Internet. k-anonymity is a technique that prevents "linking" attacks by generalizing and/or suppressing portions of the released microdata so that no individual can be uniquely distinguished from a group of size k. In this paper, we investigate a practical model of k-anonymity, called full-domain generalization. We examine the issue of computing minimal k-anonymous table based on the definition of minimality described by Samarati. We introduce the hash-based technique previously used in mining associate rules and present an efficient hash-based algorithm to find the minimal k-anonymous table, which improves the previous binary search algorithm first proposed by Samarati.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Adam, N. R. and Wortman, J. C. Security-Control Methods for Statistical Databases: A Comparative Study, ACM Computing Surveys, vol. 21, no. 4, pp. 515--556, 1989.
|
| |
2
|
Aggarwal, G., Feder, T., Kenthapadi, K., Motwani, R., Panigrahy, R., Thomas, D. and Zhu, A. Anonymizing tables. In Proc. of the 10th International Conference on Database Theory (ICDT05), pp. 246--258, Edinburgh, Scotland.
|
| |
3
|
Aggarwal, G., Feder, T., Kenthapadi, K., Panigrahy, R., Thomas, D. and Zhu, A. Achieving anonymity via clustering in a metric space. In Proceedings of the 25th ACM SIGACTSIGMOD- SIGART Symposium on Principles of Database Systems (PODS), 2006.
|
| |
4
|
Agrawal, R and Srikant, R. Fast algorithms for mining association rules. In Proc. of the 20th Int'l Conference on Very Large Databases, August 1994.
|
| |
5
|
Bayardo, R. and Agrawal, R. Data privacy through optimal k-anonymity. In Proceedings of the 21st International Conference on Data Engineering (ICDE), 2005.
|
| |
6
|
Cormen, T. H., Leiserson, C. E., Rivest, R. L., Stein, C. Introduction to Algorithms, second edition, MIT Press and McGraw-Hill. ISBN 0-262-53196-8.
|
| |
7
|
Domingo-Ferrer, J and Mateo-Sanz, J. M. Practical data-oriented microaggregation for statistical disclosure control. IEEE Transactions on Knowledge and Data Engineering, 4(1), 2002.
|
| |
8
|
Federal Committee on Statistical Methodology, Statistical Policy Working Paper 22, Report on Statistical Disclosure Limitation Methodology, May 1994.
|
| |
9
|
Fung B, Wang K and Yu P. Top-down specialization for information and privacy preservation. In Proc. of the 21st International Conference on Data Engineering (ICDE'05), Tokyo, Japan.
|
| |
10
|
Hundepool, A. and Willenborg, L. μ and τ-ARGUS: Software for statistical disclosure control. In Proc. of the Third International Seminar on Statistical Confidentiality, 1996.
|
| |
11
|
Iyengar V. Transforming data to satisfy privacy constraints. In Proc. of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 279--288, Edmonton, Alberta, Canada, 2002.
|
| |
12
|
LeFevre K, DeWitt DJ, Ramakrishnan R. Incognito: Efficient fulldomain k-anonymity. In Proc. of the 24th ACM SIGMOD International Conference on Management of Data, pp. 49--60, Baltimore, Maryland, USA, 2005.
|
| |
13
|
LeFevre, K., DeWitt, D. and Ramakrishnan, R. Multidimensional k-anonymity. Technical Report 1521, University of Wisconsin, 2005.
|
| |
14
|
Meyerson A and Williams R. On the complexity of optimal k-anonymity. In Proc. of the 23rd ACM-SIGMOD-SIGACT-SIGART Symposium on the Principles of Database Systems, pp. 223--228, Paris, France, 2004.
|
| |
15
|
Park, J. S., Chen, M. S. and Yu, P. S. An Effective Hash-Based Algorithm for Mining Association Rules. Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data. PP. 175--186, 1995.
|
| |
16
|
Samarati, P and Sweeney, L. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. Technical Report SRI-CSL-98-04, SRI Computer Science Laboratory, 1998.
|
| |
17
|
Samarati P. Protecting respondents' identities in microdata release. IEEE Transactions on Knowledge and Data Engineering, 13(6):1010--1027. 2001
|
| |
18
|
Samarati P, Sweeney L. Generalizing Data to Provide Anonymity when Disclosing Information (Abstract). In Proc. of ACM Symposium on Principles of Database Systems, pp. 188, 1998.
|
| |
19
|
Srikant, R and Agrawal, R. Mining generalized association rules. In Proc. of the 21st Int'l Conference on Very Large Databases, August 1995.
|
| |
20
|
Sweeney L. Achieving k-anonynity privacy protection using generalization and suppression. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 2002, 10(5):571--588.
|
| |
21
|
Wang, K., Yu, P. S and Chakraborty, S. Bottom-up generalization: A data mining solution to privacy protection. In Proceedings of the 4th IEEE International Conference on Data Mining (ICDM), 2004.
|
| |
22
|
Willenborg, L and DeWaal, T. Statistical Disclosure Control in Practice. Springer-Verlag, 1996.
|
| |
23
|
Willenborg, L and DeWaal, T. Elements of Statistical Disclosure Control. Springer Verlag Lecture Notes in Statistics, 2000.
|
| |
24
|
Winkler, W. Using simulated annealing for k-anonymity. Research Report 2002--07, US Census Bureau Statistical Research Division, 2002.
|
|