Transparent anonymization: Thwarting adversaries who know the algorithm

Published: 03 May 2010

Abstract

Numerous generalization techniques have been proposed for privacy-preserving data publishing. Most existing techniques, however, implicitly assume that the adversary knows little about the anonymization algorithm adopted by the data publisher. Consequently, they cannot guard against privacy attacks that exploit characteristics of the anonymization mechanism. This article provides a practical solution to this problem. First, we propose an analytical model for evaluating disclosure risks when an adversary knows everything about the anonymization process except the sensitive values. Based on this model, we develop a privacy principle, transparent l-diversity, which ensures privacy protection against such powerful adversaries. We identify three algorithms that achieve transparent l-diversity, and verify their effectiveness and efficiency through extensive experiments with real data.
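For readers unfamiliar with the baseline notion that the abstract builds on, the following is a minimal sketch of a check for ordinary, frequency-based l-diversity on a generalized table: in every group of rows sharing the same generalized quasi-identifier values, no sensitive value may account for more than 1/l of the rows. This is not the transparent l-diversity principle defined in the article, whose definition the abstract does not give. The function name `is_l_diverse`, the column names, and the toy table are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def is_l_diverse(rows, qi_columns, sensitive_column, l):
    """Check ordinary frequency-based l-diversity (illustrative only).

    rows: list of dicts, one per published (already generalized) tuple.
    qi_columns: names of the quasi-identifier columns (hypothetical names).
    sensitive_column: name of the sensitive attribute column.
    l: diversity parameter, a positive integer.
    """
    # Group rows by their generalized quasi-identifier values and
    # count how often each sensitive value occurs in each group.
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[c] for c in qi_columns)
        groups[key][row[sensitive_column]] += 1

    # A group violates l-diversity if its most frequent sensitive
    # value covers more than a 1/l fraction of the group.
    for counts in groups.values():
        group_size = sum(counts.values())
        most_frequent = max(counts.values())
        if most_frequent * l > group_size:
            return False
    return True


# Toy generalized table (made-up data, for illustration only).
table = [
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "20-29", "zip": "130**", "disease": "cancer"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
]

print(is_l_diverse(table, ["age", "zip"], "disease", 2))      # False
print(is_l_diverse(table[:2], ["age", "zip"], "disease", 2))  # True
```

The second group in the toy table contains only one sensitive value, so the full table fails the check for l = 2. A check of this kind looks only at the published table; per the abstract, the article's concern is an adversary who additionally knows the anonymization algorithm, which this sketch does not model.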

Supplementary Material

Xiao Appendix (a8-xiao-apndx.pdf)
Online appendix to "Transparent anonymization: Thwarting adversaries who know the algorithm" (article 8).

Published In

ACM Transactions on Database Systems  Volume 35, Issue 2
April 2010
336 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/1735886

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 May 2010
Accepted: 01 January 2010
Revised: 01 November 2009
Received: 01 November 2008
Published in TODS Volume 35, Issue 2

Author Tags

  1. l-diversity
  2. Privacy-preserving data publishing
  3. generalization

Qualifiers

  • Research-article
  • Research
  • Refereed
