DOI: 10.1145/1376616.1376665
research-article

Privacy-MaxEnt: integrating background knowledge in privacy quantification

Published: 09 June 2008

Abstract

Privacy-Preserving Data Publishing (PPDP) deals with the publication of microdata while preserving people's private information in the data. To measure how much private information can be preserved, privacy metrics are needed. An essential element of a privacy metric is a measure of how much adversaries can learn about an individual's sensitive attributes (SA) if they know the individual's quasi-identifiers (QI), i.e., we need to measure P(SA|QI). Such a measure is hard to derive when adversaries' background knowledge has to be considered.
We propose a systematic approach, Privacy-MaxEnt, to integrate background knowledge in privacy quantification. Our approach is based on the maximum entropy principle. We treat all the conditional probabilities P(SA|QI) as unknown variables; we treat the background knowledge as constraints on these variables; in addition, we formulate constraints from the published data. Our goal becomes finding a solution for those variables (the probabilities) that satisfies all these constraints. Although many solutions may exist, the most unbiased estimate of P(SA|QI) is the one that achieves the maximum entropy.
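
To make the idea concrete, the following is a minimal sketch of a maximum entropy estimate on a toy example. It is not the solver used in the paper, which the abstract does not specify. It assumes a single published group whose QI and SA frequencies are known, and a hypothetical piece of background knowledge of the form "individuals with QI value q1 never have SA value s1". The function names, the toy marginals, and the choice of iterative proportional fitting as the solver are all assumptions made for illustration. Started from a uniform table, iterative proportional fitting converges to the maximum entropy joint distribution that matches the given marginals, and cells fixed at zero stay at zero.

```python
def max_entropy_joint(qi_marginal, sa_marginal, forbidden=(), iterations=500):
    """Estimate the joint P(QI, SA) with maximum entropy.

    Constraints from the published data: the QI and SA marginals.
    Constraints from background knowledge (hypothetical form): the
    (qi, sa) pairs in `forbidden` have probability zero.
    Solver: iterative proportional fitting from a uniform start.
    """
    qis, sas = list(qi_marginal), list(sa_marginal)
    blocked = set(forbidden)
    # Uniform start, with zeros where background knowledge rules a pair out.
    joint = {(q, s): 0.0 if (q, s) in blocked else 1.0
             for q in qis for s in sas}
    for _ in range(iterations):
        # Rescale each QI row so that it matches the published QI marginal.
        for q in qis:
            total = sum(joint[q, s] for s in sas)
            if total > 0:
                for s in sas:
                    joint[q, s] *= qi_marginal[q] / total
        # Rescale each SA column so that it matches the published SA marginal.
        for s in sas:
            total = sum(joint[q, s] for q in qis)
            if total > 0:
                for q in qis:
                    joint[q, s] *= sa_marginal[s] / total
    return joint


def conditional(joint, qi_value):
    """Return P(SA | QI = qi_value) derived from the joint table."""
    total = sum(p for (q, _), p in joint.items() if q == qi_value)
    return {s: p / total for (q, s), p in joint.items() if q == qi_value}


# Toy marginals (assumed values, not taken from the paper).
qi = {"q1": 0.5, "q2": 0.5}
sa = {"s1": 0.4, "s2": 0.3, "s3": 0.3}

no_knowledge = max_entropy_joint(qi, sa)
with_knowledge = max_entropy_joint(qi, sa, forbidden={("q1", "s1")})

print(round(conditional(no_knowledge, "q2")["s1"], 3))    # about 0.4
print(round(conditional(with_knowledge, "q2")["s1"], 3))  # about 0.8
```

In this toy setting, the adversary's estimate of P(s1|q2) doubles once the background knowledge is added as a constraint, which is the kind of effect a privacy metric has to capture.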


Published In

SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data
June 2008
1396 pages
ISBN: 9781605581026
DOI: 10.1145/1376616

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data publishing
  2. privacy quantification

Conference

SIGMOD/PODS '08

Acceptance Rates

Overall Acceptance Rate 699 of 3,470 submissions, 20%
