DOI: 10.1145/1807167.1807248
research-article

Non-homogeneous generalization in privacy preserving data publishing

Published: 06 June 2010

Abstract

Most previous research on privacy-preserving data publishing based on the k-anonymity model has followed the simplistic approach of homogeneously assigning the same generalized quasi-identifier values to all tuples within a partition. We observe that the anonymization error can be reduced by following a non-homogeneous generalization approach for groups of size larger than k. Such an approach allows tuples within a partition to take different generalized quasi-identifier values. Anonymization under this model is not trivial, as its direct application can easily violate k-anonymity. In addition, non-homogeneous generalization admits additional types of attack, which must be considered in the process. We provide a methodology for verifying whether a non-homogeneous generalization violates k-anonymity. We then propose a technique that generates a non-homogeneous generalization for a partition and show that its result satisfies k-anonymity; however, if the technique is applied straightforwardly, privacy can be compromised when the attacker knows the anonymization algorithm. We therefore propose a randomization method that prevents this type of attack and show that it does not compromise k-anonymity. Non-homogeneous generalization can be applied on top of any existing partitioning approach to improve its utility. In addition, we show that a new partitioning technique tailored for non-homogeneous generalization can further improve quality. A thorough experimental evaluation demonstrates that our methodology greatly improves the utility of anonymized data in practice.
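The abstract does not say how the k-anonymity check for a non-homogeneous generalization works, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes the attacker knows the original quasi-identifier (QI) values of every tuple in a partition and sees the published generalized boxes, and treats as possible any one-to-one assignment of tuples to boxes in which each box covers its assigned tuple (a perfect matching in a tuple/box bipartite graph). It further assumes that k-anonymity means every tuple can be linked to at least k boxes, and every box to at least k tuples, across those assignments. The function names (`covers`, `matching_size`, `valid_edges`, `is_k_anonymous`, `total_width`), the interval-box representation, the sum-of-widths error metric, and the example values are all hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Interval = Tuple[float, float]
Box = List[Interval]       # hypothetical: one closed interval per QI attribute
Point = Sequence[float]    # original QI values of one tuple


def covers(box: Box, point: Point) -> bool:
    """True if each QI value of `point` lies inside the corresponding interval of `box`."""
    return all(lo <= x <= hi for (lo, hi), x in zip(box, point))


def matching_size(adj: List[List[int]], n_cols: int) -> int:
    """Size of a maximum bipartite matching (Kuhn's augmenting-path algorithm).

    adj[r] lists the columns (boxes) that row r (a tuple) may be matched to.
    """
    match_of_col = [-1] * n_cols

    def augment(row: int, seen: List[bool]) -> bool:
        for col in adj[row]:
            if not seen[col]:
                seen[col] = True
                if match_of_col[col] == -1 or augment(match_of_col[col], seen):
                    match_of_col[col] = row
                    return True
        return False

    return sum(augment(r, [False] * n_cols) for r in range(len(adj)))


def valid_edges(points: List[Point], boxes: List[Box]) -> Set[Tuple[int, int]]:
    """Pairs (tuple i, box j) that occur in at least one perfect matching.

    boxes[j] is assumed to be the generalization published for points[j],
    so the identity assignment is always one perfect matching.
    """
    n = len(points)
    adj = [[j for j in range(n) if covers(boxes[j], points[i])] for i in range(n)]
    valid: Set[Tuple[int, int]] = set()
    for i in range(n):
        for j in adj[i]:
            # Force tuple i onto box j, then check that the remaining n - 1
            # tuples can still be matched one-to-one with the remaining boxes.
            rest = [[c for c in adj[r] if c != j] for r in range(n) if r != i]
            if matching_size(rest, n) == n - 1:
                valid.add((i, j))
    return valid


def is_k_anonymous(points: List[Point], boxes: List[Box], k: int) -> bool:
    """Assumed criterion: every tuple and every box has at least k valid edges."""
    n = len(points)
    tuple_deg = [0] * n
    box_deg = [0] * n
    for i, j in valid_edges(points, boxes):
        tuple_deg[i] += 1
        box_deg[j] += 1
    return min(tuple_deg) >= k and min(box_deg) >= k


def total_width(boxes: List[Box]) -> float:
    """Hypothetical error metric: sum of interval widths over all boxes."""
    return sum(hi - lo for box in boxes for lo, hi in box)


if __name__ == "__main__":
    points = [(0,), (5,), (10,)]                    # one QI attribute, k = 2
    homogeneous = [[(0, 10)]] * 3                   # same box for every tuple
    non_homogeneous = [[(0, 5)], [(5, 10)], [(0, 10)]]
    naive = [[(0, 0)], [(0, 10)], [(0, 10)]]        # tuple 0 keeps its exact value
    print(is_k_anonymous(points, homogeneous, 2), total_width(homogeneous))          # True 30
    print(is_k_anonymous(points, non_homogeneous, 2), total_width(non_homogeneous))  # True 20
    print(is_k_anonymous(points, naive, 2))                                          # False
```

Under these assumptions, the non-homogeneous assignment still passes with k = 2 while lowering the total width from 30 to 20. The naive one fails even though tuple 0's raw coverage looks sufficient: since tuples 1 and 2 can only take boxes 1 and 2, tuple 0 is matched to its own exact-value box in every consistent assignment. The per-edge test re-runs a matching for clarity; a faster alternative fixes one perfect matching M and keeps an unmatched edge only if it lies on an M-alternating cycle, which a strongly-connected-components pass over a directed version of the graph can detect.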

Published In

SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
June 2010
1286 pages
ISBN:9781450300322
DOI:10.1145/1807167
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. anonymization
  2. non-homogeneous generalization
  3. privacy

Conference

SIGMOD/PODS '10: International Conference on Management of Data
June 6-10, 2010
Indianapolis, Indiana, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 9
  • Downloads (last 6 weeks): 1
Reflects downloads up to 08 Mar 2025

Cited By

  • (2023) PMDG: Privacy for Multi-perspective Process Mining Through Data Generalization. Advanced Information Systems Engineering, pp. 506-521. DOI: 10.1007/978-3-031-34560-9_30. Online publication date: 8-Jun-2023.
  • (2022) Anonymization of Quasi-Sensitive Attribute Sets in Aggregated Dataset. Security and Communication Networks, 2022. DOI: 10.1155/2022/9721817. Online publication date: 1-Jan-2022.
  • (2022) KAB. Journal of King Saud University - Computer and Information Sciences, 34(7), pp. 4075-4088. DOI: 10.1016/j.jksuci.2021.04.014. Online publication date: 1-Jul-2022.
  • (2022) Improved l-diversity. Journal of King Saud University - Computer and Information Sciences, 34(4), pp. 1423-1430. DOI: 10.1016/j.jksuci.2019.08.006. Online publication date: 1-Apr-2022.
  • (2021) Preserving the Privacy of COVID-19 Infected Patients Data Using a Divergent-Scale Supervised Learning for Publishing the Informative Data. Contactless Healthcare Facilitation and Commodity Delivery Management During COVID 19 Pandemic, pp. 35-47. DOI: 10.1007/978-981-16-5411-4_5. Online publication date: 3-Nov-2021.
  • (2021) A clustering‐based anonymization approach for privacy‐preserving in the healthcare cloud. Concurrency and Computation: Practice and Experience, 34(1). DOI: 10.1002/cpe.6487. Online publication date: 13-Jul-2021.
  • (2020) G-Model: A Novel Approach to Privacy-Preserving 1:M Microdata Publication. 2020 7th IEEE International Conference on Cyber Security and Cloud Computing (CSCloud)/2020 6th IEEE International Conference on Edge Computing and Scalable Cloud (EdgeCom), pp. 88-99. DOI: 10.1109/CSCloud-EdgeCom49738.2020.00024. Online publication date: Aug-2020.
  • (2020) A Survey on Privacy Properties for Data Publishing of Relational Data. IEEE Access, 8, pp. 51071-51099. DOI: 10.1109/ACCESS.2020.2980235. Online publication date: 2020.
  • (2019) Towards privacy preserving unstructured big data publishing. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology, 36(4), pp. 3471-3482. DOI: 10.3233/JIFS-181231. Online publication date: 1-Jan-2019.
  • (2019) Attribute Compartmentation and Greedy UCC Discovery for High-Dimensional Data Anonymization. Proceedings of the Ninth ACM Conference on Data and Application Security and Privacy, pp. 109-119. DOI: 10.1145/3292006.3300019. Online publication date: 13-Mar-2019.
