Non-homogeneous generalization in privacy preserving data publishing

Authors:

Nikos Mamoulis,

David Wai Lok Cheung

SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data

Pages 747 - 758

https://doi.org/10.1145/1807167.1807248

Published: 06 June 2010

Abstract

Most previous research on privacy-preserving data publishing, based on the k-anonymity model, has followed the simplistic approach of homogeneously assigning the same generalized quasi-identifier values to every tuple within a partition. We observe that the anonymization error can be reduced if we follow a non-homogeneous generalization approach for groups of size larger than k. Such an approach allows tuples within a partition to take different generalized quasi-identifier values. Anonymization following this model is not trivial, as its direct application can easily violate k-anonymity. In addition, non-homogeneous generalization allows for additional types of attack, which should be considered in the process. We provide a methodology for verifying whether a non-homogeneous generalization violates k-anonymity. Then, we propose a technique that generates a non-homogeneous generalization for a partition and show that its result satisfies k-anonymity. However, if the technique is applied straightforwardly, privacy can be compromised when the attacker knows the anonymization algorithm. We therefore propose a randomization method that prevents this type of attack and show that k-anonymity is not compromised by it. Non-homogeneous generalization can be used on top of any existing partitioning approach to improve its utility. In addition, we show that a new partitioning technique tailored for non-homogeneous generalization can further improve quality. A thorough experimental evaluation demonstrates that our methodology greatly improves the utility of anonymized data in practice.
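
The point that a direct application can violate k-anonymity can be illustrated with a small sketch. This is not the paper's algorithm. It assumes, for illustration only, that each published tuple is a list of numeric `(lo, hi)` ranges, that an attacker who knows the original quasi-identifier values considers every consistent one-to-one assignment of original tuples to published tuples, and that a tuple counts as protected when at least k published tuples remain possible for it. The names `covers`, `candidate_counts` and `is_k_anonymous` are hypothetical, and the brute-force enumeration is only practical for very small partitions.

```python
from itertools import permutations

def covers(generalized_tuple, original_tuple):
    """True if every original value lies inside the matching (lo, hi) range."""
    return all(lo <= value <= hi
               for (lo, hi), value in zip(generalized_tuple, original_tuple))

def candidate_counts(originals, generalized):
    """For each original tuple, count the published tuples it is mapped to
    in at least one consistent one-to-one assignment (brute force)."""
    n = len(originals)
    feasible = [set() for _ in range(n)]
    for assignment in permutations(range(n)):
        # assignment[i] is the published tuple proposed for original tuple i
        if all(covers(generalized[assignment[i]], originals[i]) for i in range(n)):
            for i in range(n):
                feasible[i].add(assignment[i])
    return [len(s) for s in feasible]

def is_k_anonymous(originals, generalized, k):
    """Assumed criterion: every tuple keeps at least k possible matches."""
    return all(c >= k for c in candidate_counts(originals, generalized))

# Toy partition with a single quasi-identifier (age).
ages = [(20,), (22,), (30,)]
homogeneous = [[(20, 30)]] * 3                           # same range for all
non_homogeneous = [[(20, 22)], [(20, 30)], [(25, 30)]]   # different ranges

print(candidate_counts(ages, homogeneous))      # [3, 3, 3]
print(candidate_counts(ages, non_homogeneous))  # [2, 2, 1]
```

In the second case every age is covered by two published ranges, yet the tuple with age 30 can only be matched to `(25, 30)` once the other two tuples are placed. A simple count of covering ranges would therefore overstate the protection, which is why a dedicated verification step is needed.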

Index Terms

Non-homogeneous generalization in privacy preserving data publishing
1. Security and privacy
  1. Database and storage security
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Theory of database privacy and security

Published In

SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data

June 2010

1286 pages

ISBN:9781450300322

DOI:10.1145/1807167

General Chair: Ahmed Elmagarmid, Purdue University, USA

Program Chair: Divyakant Agrawal, University of California at Santa Barbara, USA

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Badges

Results Reproduced / v1.1

Qualifiers

Research-article

Conference

SIGMOD/PODS '10

Sponsor:

SIGMOD

SIGMOD/PODS '10: International Conference on Management of Data

June 6 - 10, 2010

Indianapolis, Indiana, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Bibliometrics

Total Citations: 46
Total Downloads: 816
Downloads (Last 12 months): 9
Downloads (Last 6 weeks): 1

Reflects downloads up to 08 Mar 2025

Citations

Cited By

Hildebrant R, Fahrenkrog-Petersen S, Weidlich M, Ren S (2023). PMDG: Privacy for Multi-perspective Process Mining Through Data Generalization. Advanced Information Systems Engineering, pp. 506-521. Online publication date: 8-Jun-2023.
https://doi.org/10.1007/978-3-031-34560-9_30
Li Y, Yuan S, Yuan Y, Chen C, Yu J (2022). Anonymization of Quasi-Sensitive Attribute Sets in Aggregated Dataset. Security and Communication Networks, 2022. Online publication date: 1-Jan-2022.
https://dl.acm.org/doi/10.1155/2022/9721817
Kacha L, Zitouni A, Djoudi M (2022). KAB. Journal of King Saud University - Computer and Information Sciences, 34:7, pp. 4075-4088. Online publication date: 1-Jul-2022.
https://dl.acm.org/doi/10.1016/j.jksuci.2021.04.014
Mehta B, Rao U (2022). Improved l-diversity. Journal of King Saud University - Computer and Information Sciences, 34:4, pp. 1423-1430. Online publication date: 1-Apr-2022.
https://dl.acm.org/doi/10.1016/j.jksuci.2019.08.006
Riyazuddin M, Hajera Begum S, Jaffar Sadiq M (2021). Preserving the Privacy of COVID-19 Infected Patients Data Using a Divergent-Scale Supervised Learning for Publishing the Informative Data. Contactless Healthcare Facilitation and Commodity Delivery Management During COVID 19 Pandemic, pp. 35-47. Online publication date: 3-Nov-2021.
https://doi.org/10.1007/978-981-16-5411-4_5
Abbasi A, Mohammadi B (2021). A clustering-based anonymization approach for privacy-preserving in the healthcare cloud. Concurrency and Computation: Practice and Experience, 34:1. Online publication date: 13-Jul-2021.
https://doi.org/10.1002/cpe.6487
Albulayhi K, Tosic P, Sheldon F (2020). G-Model: A Novel Approach to Privacy-Preserving 1:M Microdata Publication. 2020 7th IEEE International Conference on Cyber Security and Cloud Computing (CSCloud)/2020 6th IEEE International Conference on Edge Computing and Scalable Cloud (EdgeCom), pp. 88-99. Online publication date: Aug-2020.
https://doi.org/10.1109/CSCloud-EdgeCom49738.2020.00024
Zigomitros A, Casino F, Solanas A, Patsakis C (2020). A Survey on Privacy Properties for Data Publishing of Relational Data. IEEE Access, 8, pp. 51071-51099. Online publication date: 2020.
https://doi.org/10.1109/ACCESS.2020.2980235
Mehta B, Rao U, Gupta R, Conti M (2019). Towards privacy preserving unstructured big data publishing. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology, 36:4, pp. 3471-3482. Online publication date: 1-Jan-2019.
https://dl.acm.org/doi/10.3233/JIFS-181231
Podlesny N, Kayem A, Meinel C, Ahn G, Thuraisingham B, Kantarcioglu M, Krishnan R (2019). Attribute Compartmentation and Greedy UCC Discovery for High-Dimensional Data Anonymization. Proceedings of the Ninth ACM Conference on Data and Application Security and Privacy, pp. 109-119. Online publication date: 13-Mar-2019.
https://dl.acm.org/doi/10.1145/3292006.3300019
