research-article

Privacy-MaxEnt: integrating background knowledge in privacy quantification

Authors:

Zutao Zhu

SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data

Pages 459 - 472

https://doi.org/10.1145/1376616.1376665

Published: 09 June 2008

Abstract

Privacy-Preserving Data Publishing (PPDP) deals with the publication of microdata while preserving people's private information in the data. To measure how much private information can be preserved, privacy metrics are needed. An essential element of privacy metrics is the measure of how much adversaries can learn about an individual's sensitive attributes (SA) if they know the individual's quasi-identifiers (QI); that is, we need to measure P(SA|QI). Such a measure is hard to derive when adversaries' background knowledge has to be considered.

We propose a systematic approach, Privacy-MaxEnt, to integrate background knowledge in privacy quantification. Our approach is based on the maximum entropy principle. We treat all the conditional probabilities P(SA|QI) as unknown variables, treat the background knowledge as constraints on these variables, and formulate additional constraints from the published data. Our goal becomes finding an assignment to those variables (the probabilities) that satisfies all these constraints. Although many solutions may exist, the most unbiased estimate of P(SA|QI) is the one that achieves the maximum entropy.
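To make the formulation concrete, here is a minimal Python sketch of the idea on a toy example. It is an illustration under stated assumptions, not the solver used in the paper. It assumes a single published bucket with three QI values and three SA values, each equally frequent, and one piece of background knowledge: the first QI value never co-occurs with the first SA value. The function names (`maxent_joint`, `conditional`) and all of the data are hypothetical. The maximum-entropy joint distribution is found with iterative proportional fitting, a standard technique for matching fixed marginals while keeping forbidden cells at zero.

```python
def maxent_joint(qi_marginals, sa_marginals, allowed, iterations=200):
    """Maximum-entropy joint P(QI, SA) with fixed marginals and forbidden cells.

    qi_marginals, sa_marginals: constraints taken from the published data.
    allowed[i][j] is False when background knowledge says P(SA=j | QI=i) = 0.
    Uses iterative proportional fitting starting from a uniform table.
    """
    n_qi, n_sa = len(qi_marginals), len(sa_marginals)
    joint = [[1.0 if allowed[i][j] else 0.0 for j in range(n_sa)]
             for i in range(n_qi)]
    for _ in range(iterations):
        # Rescale each row so it matches the published QI marginal.
        for i in range(n_qi):
            total = sum(joint[i])
            if total > 0:
                scale = qi_marginals[i] / total
                joint[i] = [v * scale for v in joint[i]]
        # Rescale each column so it matches the published SA marginal.
        for j in range(n_sa):
            total = sum(joint[i][j] for i in range(n_qi))
            if total > 0:
                scale = sa_marginals[j] / total
                for i in range(n_qi):
                    joint[i][j] *= scale
    return joint


def conditional(joint):
    """Turn the joint table into P(SA | QI), one row per QI value."""
    return [[v / sum(row) for v in row] for row in joint]


# Hypothetical bucket: three QI values and three SA values, all equally likely.
qi = [1 / 3, 1 / 3, 1 / 3]
sa = [1 / 3, 1 / 3, 1 / 3]

# Assumed background knowledge: QI value 0 never has SA value 0.
allowed = [
    [False, True, True],
    [True, True, True],
    [True, True, True],
]

cond = conditional(maxent_joint(qi, sa, allowed))
print([round(p, 3) for p in cond[0]])  # [0.0, 0.5, 0.5]
print([round(p, 3) for p in cond[1]])  # [0.5, 0.25, 0.25]
```

Without the background knowledge, every entry of P(SA|QI) in this toy bucket would be 1/3. Ruling out a single combination raises the adversary's estimate for the remaining rows to 1/2 for the first SA value, which shows how even simple knowledge shifts the estimated disclosure risk.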

Index Terms

Privacy-MaxEnt: integrating background knowledge in privacy quantification
1. Security and privacy
  1. Database and storage security
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Theory of database privacy and security

Published In

SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data

June 2008

1396 pages

ISBN:9781605581026

DOI:10.1145/1376616

General Chairs:
Laks V. S. Lakshmanan, University of British Columbia, Canada
Raymond T. Ng, University of British Columbia, Canada
Dennis Shasha, New York University, USA

Copyright © 2008 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SIGMOD/PODS '08

SIGMOD/PODS '08: International Conference on Management of Data

June 9 - 12, 2008

Vancouver, Canada

Acceptance Rates

Overall Acceptance Rate 699 of 3,470 submissions, 20%


