article

Utility-based anonymization for privacy preservation with less information loss

Authors:
Jian Xu, Fudan University, China
Wei Wang, Fudan University, China
Jian Pei, Simon Fraser University, Canada
Xiaoyuan Wang, Fudan University, China
Baile Shi, Fudan University, China
Ada Wai-Chee Fu, The Chinese University of Hong Kong

ACM SIGKDD Explorations Newsletter, Volume 8, Issue 2, December 2006, pp 21–30. https://doi.org/10.1145/1233321.1233324

Published: 01 December 2006

Abstract

Privacy is an increasingly serious concern in applications involving microdata, and efficient anonymization has recently attracted much research. Most previous methods use global recoding, which maps the domains of the quasi-identifier attributes to generalized or changed values. However, global recoding may not always achieve effective anonymization in terms of discernability and the accuracy of queries answered using the anonymized data. Moreover, anonymized data is often used for analysis. As is well accepted in many analytical applications, different attributes in a data set may have different utility in the analysis. Previous methods have not considered the utility of attributes.

In this paper, we study the problem of utility-based anonymization. First, we propose a simple framework to specify utility of attributes. The framework covers both numeric and categorical data. Second, we develop two simple yet efficient heuristic local recoding methods for utility-based anonymization. Our extensive performance study using both real data sets and synthetic data sets shows that our methods outperform the state-of-the-art multidimensional global recoding methods in both discernability and query answering accuracy. Furthermore, our utility-based method can boost the quality of analysis using the anonymized data.
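
To make the idea of attribute utility concrete, the following is a minimal illustrative sketch and not the method from the paper. It assumes numeric quasi-identifiers, a local recoding in which each group of records is generalized to its own value range, and a hypothetical weighted range penalty in which a higher weight marks an attribute as more useful for analysis. The function names, the weights, and the toy records are all assumptions introduced for illustration.

```python
from typing import Dict, List, Sequence

Record = Dict[str, float]


def is_k_anonymous(groups: Sequence[Sequence[Record]], k: int) -> bool:
    """Every group (records sharing one generalized value) has at least k members."""
    return all(len(group) >= k for group in groups)


def weighted_range_penalty(
    groups: Sequence[Sequence[Record]], weights: Dict[str, float]
) -> float:
    """Hypothetical information-loss score (an assumption, not the paper's metric).

    For each attribute, each record is charged the width of its group's range
    divided by the width of the attribute's full range, scaled by the
    attribute's utility weight.
    """
    all_records: List[Record] = [r for group in groups for r in group]
    total = 0.0
    for attr, weight in weights.items():
        values = [r[attr] for r in all_records]
        domain = max(values) - min(values)
        if domain == 0:
            continue  # a constant attribute loses nothing when generalized
        for group in groups:
            group_values = [r[attr] for r in group]
            spread = max(group_values) - min(group_values)
            total += weight * len(group) * spread / domain
    return total


# Toy data: four records with two numeric quasi-identifiers.
r1 = {"age": 20, "income": 30}
r2 = {"age": 30, "income": 50}
r3 = {"age": 50, "income": 40}
r4 = {"age": 60, "income": 60}

by_age = [[r1, r2], [r3, r4]]     # groups that keep age ranges narrow
by_income = [[r1, r3], [r2, r4]]  # groups that keep income ranges narrow

equal = {"age": 1.0, "income": 1.0}
income_matters = {"age": 1.0, "income": 4.0}

print(is_k_anonymous(by_age, 2), is_k_anonymous(by_income, 2))
print(weighted_range_penalty(by_age, equal), weighted_range_penalty(by_income, equal))
print(
    weighted_range_penalty(by_age, income_matters),
    weighted_range_penalty(by_income, income_matters),
)
```

Both groupings are 2-anonymous in this toy setup. Under equal weights the age-based grouping scores a lower penalty, while raising the weight on income makes the income-based grouping the cheaper one, which is the kind of trade-off a utility specification is meant to expose.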

Index Terms

1. Information systems
  1. Information systems applications

Published in

ACM SIGKDD Explorations Newsletter, Volume 8, Issue 2
December 2006
106 pages
ISSN: 1931-0145
EISSN: 1931-0153
DOI: 10.1145/1233321
Copyright © 2006 Authors
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
data mining
k-anonymity
local recoding
privacy preservation
utility