Abstract
Privacy becomes a more and more serious concern in applications involving microdata. Recently, efficient anonymization has attracted much research work. Most of the previous methods use global recoding, which maps the domains of the quasi-identifier attributes to generalized or changed values. However, global recoding may not always achieve effective anonymization in terms of discernability and query answering accuracy using the anonymized data. Moreover, anonymized data is often used for analysis. As well accepted in many analytical applications, different attributes in a data set may have different utility in the analysis. The utility of attributes has not been considered in the previous methods.
In this paper, we study the problem of utility-based anonymization. First, we propose a simple framework to specify utility of attributes. The framework covers both numeric and categorical data. Second, we develop two simple yet efficient heuristic local recoding methods for utility-based anonymization. Our extensive performance study using both real data sets and synthetic data sets shows that our methods outperform the state-of-the-art multidimensional global recoding methods in both discernability and query answering accuracy. Furthermore, our utility-based method can boost the quality of analysis using the anonymized data.
- C. C. Aggarwal. On k-anonymity and the curse of dimensionality. In VLDB '05: Proceedings of the 31st international conference on Very large data bases, pages 901--909. VLDB Endowment, 2005. Google ScholarDigital Library
- G. Aggarwal, T. Feder, K. Kenthapadi, R. Motwani, R. Panigrahy, D. Thomas, and A. Zhu. Anonymizing tables. In ICDT, pages 246--258, 2005. Google ScholarDigital Library
- G. Aggarwal, T. Feder, K. Kenthapadi, R. Motwani, R. Panigrahy, D. Thomas, and A. Zhu. Approximation algorithms for k-anonymity. Journal of Privacy Technology, (2005112001), 2005.Google Scholar
- R. J. Bayardo and R. Agrawal. Data privacy through optimal k-anonymization. In Proceedings of the 21st International Conference on Data Engineering (ICDE'05), pages 217--228, Tokyo, Japan, April 2005. Google ScholarDigital Library
- J. L. Bentley. Multidimensional binary search trees used for associative searching. Commun. ACM, 18(9):509--517, 1975. Google ScholarDigital Library
- B. C. M. Fung, K. Wang, and P. S. Yu. Top-down specialization for information and privacy preservation. In Proceedings of the 21st International Conference on Data Engineering (ICDE'05), pages 205--216, Washington, DC, USA, 2005. IEEE Computer Society. Google ScholarDigital Library
- V. S. Iyengar. Transforming data to satisfy privacy constraints. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining (KDD'02), pages 279--288, New York, NY, USA, 2002. ACM Press. Google ScholarDigital Library
- K. LeFevre, D. J. DeWitt, and R. Ramakrishnan. Incognito: Efficient full-domain k-anonymity. In SIGMOD Conference, pages 49--60, 2005. Google ScholarDigital Library
- K. LeFevre, D. J. DeWitt, and R. Ramakrishnan. Mondrian multidimensional k-anonymity. In Proceedings of the 22nd International Conference on Data Engineering (ICDE'06), Atlanta, GA, USA, April 2006. IEEE. Google ScholarDigital Library
- A. Meyerson and R. Williams. On the complexity of optimal k-anonymity. In Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (PODS'04), pages 223--228, New York, NY, USA, 2004. ACM Press. Google ScholarDigital Library
- P. Samarati. Protecting respondents' identities in microdata release. IEEE Transactions on Knowledge and Data Engineering, 13(6): 1010--1027, 2001. Google ScholarDigital Library
- P. Samarati and L. Sweeney. Generalizing data to provide anonymity when disclosing information. In Proceedings of the 17th ACM Symposium on the Principle of Database Systems, Seattle, WA, June 1998. Google ScholarDigital Library
- P. Samarati and L. Sweeney. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. In Technical Report SRI-CSL-98-04, 1998.Google Scholar
- L. Sweeney. Achieving k-anonymity privacy protection using generalization and suppression. International Journal on Uncertainty, Fuzziness, and Knowledge-based Systems, 10(5):571--588, 2002. Google ScholarDigital Library
- L. Sweeney. K-anonymity: a model for protecting privacy. International Journal on Uncertainty, Fuzziness, and Knowledge-based Systems, 10(5):571--588, 2002. Google ScholarDigital Library
- K. Wang, P. S. Yu, and S. Chakraborty. Bottom-up generalization: A data mining solution to privacy protection. In Proceedings of the Fourth IEEE International Conference on Data Mining (ICDM'04), pages 249--256, 2004. Google ScholarDigital Library
- L. Willenborg and T. deWaal. Elements of Statistical Disclosure Control. Lecture Notes in Statistics. Springer Verlag, 2000.Google Scholar
- W. E. Winkler. Using simulated annealing for k-anonymity. In Technical Report Statistics 2002-7, U.S. Census Bureau, Statistical Research Division, 2002.Google Scholar
Index Terms
- Utility-based anonymization for privacy preservation with less information loss
Recommendations
Utility-based anonymization using local recoding
KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data miningPrivacy becomes a more and more serious concern in applications involving microdata. Recently, efficient anonymization has attracted much research work. Most of the previous methods use global recoding, which maps the domains of the quasi-identifier ...
IMR based Anonymization for Privacy Preservation in Data Mining
KMO '16: Proceedings of the The 11th International Knowledge Management in Organizations Conference on The changing face of Knowledge Management Impacting SocietyPrivacy Preserving Data Mining (PPDM) is a data mining research area that aims to protect individual's personal information from unsolicited or unauthorized disclosure. Privacy relates to personal information that a person would not wish others to know ...
t-Closeness through Microaggregation: Strict Privacy with Enhanced Utility Preservation
Microaggregation is a technique for disclosure limitation aimed at protecting the privacy of data subjects in microdata releases. It has been used as an alternative to generalization and suppression to generate k-anonymous data sets, where the identity of ...
Comments