skip to main content
10.1145/3269206.3271703acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
research-article

Randomized Bit Vector: Privacy-Preserving Encoding Mechanism

Published: 17 October 2018 Publication History

Abstract

Recently, many methods have been proposed to prevent privacy leakage in record linkage by encoding record pair data into another anonymous space. Nevertheless, they cannot perform well in some circumstances due to high computational complexities, low privacy guarantees or loss of data utility.
In this paper, we propose distance-aware encoding mechanisms to compare numerical values in the anonymous space. We first embed numerical values into Hamming space by a low-computational encoding algorithm with randomized bit vector. To provide rigorous privacy guarantees, we use the random response based on differential privacy to keep global indistinguishability of original data and use Laplace noises via pufferfish mechanism to provide local indistinguishability. Besides, we provide an approach for embedding and privacy-related parameters selection to improve data utility. Experiments on datasets from different data distributions and application contexts validate that our approaches can be used efficiently in privacy-preserving record linkage tasks compared with previous works and have excellent performance even under very small privacy budgets.

References

[1]
Martín Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. 2016. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, 308--318.
[2]
Andrei Z. Broder, Steven C. Glassman, Mark S. Manasse, and Geoffrey Zweig. 1997. Syntactic clustering of the web. Computer Networks and ISDN Systems, Vol. 29, 8--13 (1997), 1157--1166.
[3]
Rui Chen, Qian Xiao, Yu Zhang, and Jianliang Xu. 2015. Differentially private high-dimensional data publication via sampling-based inference. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 129--138.
[4]
Peter Christen. 2012. Data matching: concepts and techniques for record linkage, entity resolution, and duplicate detection. Springer Science & Business Media.
[5]
Tim Churches and Peter Christen. 2004. Some methods for blindfolded record linkage. BMC medical informatics and decision making, Vol. 4, 1 (2004), 9.
[6]
Paul Cuff and Lanqing Yu. 2016. Differential privacy as a mutual information constraint. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, 43--54.
[7]
Cynthia Dwork. 2006. Differential Privacy. In Proceedings of the 33rd International Conference on Automata, Languages and Programming - Volume Part II (ICALP'06). Springer-Verlag, Berlin, Heidelberg, 1--12.
[8]
Cynthia Dwork, Aaron Roth, et al. 2014. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, Vol. 9, 3-4 (2014), 211--407.
[9]
Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. 2014. Rappor: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC conference on computer and communications security. ACM, 1054--1067.
[10]
Michael J. Freedman, Kobbi Nissim, and Benny Pinkas. 2004. Efficient private matching and set intersection. In International conference on the theory and applications of cryptographic techniques. Springer, 1--19.
[11]
Alexandros Karakasidis, Georgia Koloniari, and Vassilios S. Verykios. 2015. Scalable blocking for privacy preserving record linkage. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 527--536.
[12]
Dimitrios Karapiperis, Aris Gkoulalas-Divanis, and Vassilios S. Verykios. 2017. Distance-Aware Encoding of Numerical Values for Privacy-Preserving Record Linkage. In Data Engineering (ICDE), 2017 IEEE 33rd International Conference on. IEEE, 135--138.
[13]
Daniel Kifer and Ashwin Machanavajjhala. 2011. No free lunch in data privacy. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data. ACM, 193--204.
[14]
Daniel Kifer and Ashwin Machanavajjhala. 2014. Pufferfish: A framework for mathematical privacy definitions. ACM Transactions on Database Systems (TODS), Vol. 39, 1 (2014), 3.
[15]
Frank D. McSherry. 2009. Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data. ACM, 19--30.
[16]
Noman Mohammed, Rui Chen, Benjamin Fung, and Philip S. Yu. 2011. Differentially private data release for data mining. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 493--501.
[17]
Thilina Ranbaduge, Peter Christen, Dinusha Vatsalan, et al. 2014. Tree based scalable indexing for multi-party privacy-preserving record linkage. AusDM, CRPIT, Vol. 158 (2014).
[18]
Rainer Schnell, Tobias Bachteler, and Jörg Reiher. 2009. Privacy-preserving record linkage using Bloom filters. BMC medical informatics and decision making, Vol. 9, 1 (2009), 41.
[19]
Rainer Schnell and Christian Borgs. 2016. Randomized response and balanced Bloom filters for privacy preserving record linkage. In Data Mining Workshops (ICDMW), 2016 IEEE 16th International Conference on. IEEE, 218--224.
[20]
Dinusha Vatsalan and Peter Christen. 2014. Scalable privacy-preserving record linkage for multiple databases. In Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management. ACM, 1795--1798.
[21]
Dinusha Vatsalan and Peter Christen. 2016. Privacy-preserving matching of similar patients. Journal of biomedical informatics, Vol. 59 (2016), 285--298.
[22]
Dinusha Vatsalan, Peter Christen, and Vassilios S. Verykios. 2013. A taxonomy of privacy-preserving record linkage techniques. Information Systems, Vol. 38, 6 (2013), 946--969.
[23]
Dinusha Vatsalan, Ziad Sehili, Peter Christen, and Erhard Rahm. 2017. Privacy-Preserving Record Linkage for Big Data: Current Approaches and Research Challenges. In Handbook of Big Data Technologies. Springer, 851--895.
[24]
Leye Wang, Daqing Zhang, Dingqi Yang, Brian Y. Lim, and Xiaojuan Ma. 2016. Differential location privacy for sparse mobile crowdsensing. In Data Mining (ICDM), 2016 IEEE 16th International Conference on. IEEE, 1257--1262.
[25]
Tianhao Wang, Jeremiah Blocki, Ninghui Li, and Somesh Jha. 2017. Locally differentially private protocols for frequency estimation. In Proc. of the 26th USENIX Security Symposium. 729--745.
[26]
Tianqing Zhu, Yongli Ren, Wanlei Zhou, Jia Rong, and Ping Xiong. 2014. An effective privacy preserving algorithm for neighborhood-based collaborative filtering. Future Generation Computer Systems, Vol. 36 (2014), 142--155.

Cited By

View all
  • (2023)Weakly supervised deep metric learning on discrete metric spaces for privacy-preserved clusteringInformation Processing & Management10.1016/j.ipm.2022.10310960:1(103109)Online publication date: Jan-2023
  • (2022)Trust management and data protection for online social networksIET Communications10.1049/cmu2.1240116:12(1355-1368)Online publication date: 3-May-2022
  • (2021)RPM: Local Differential Privacy based Inventory Forecasting2021 7th International Conference on Big Data Computing and Communications (BigCom)10.1109/BigCom53800.2021.00015(106-113)Online publication date: Aug-2021
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
CIKM '18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management
October 2018
2362 pages
ISBN:9781450360142
DOI:10.1145/3269206
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 October 2018

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. data anonymization
  2. differential privacy
  3. privacy-preserving record linkage
  4. pufferfish mechanism

Qualifiers

  • Research-article

Conference

CIKM '18
Sponsor:

Acceptance Rates

CIKM '18 Paper Acceptance Rate 147 of 826 submissions, 18%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Upcoming Conference

CIKM '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)19
  • Downloads (Last 6 weeks)2
Reflects downloads up to 07 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2023)Weakly supervised deep metric learning on discrete metric spaces for privacy-preserved clusteringInformation Processing & Management10.1016/j.ipm.2022.10310960:1(103109)Online publication date: Jan-2023
  • (2022)Trust management and data protection for online social networksIET Communications10.1049/cmu2.1240116:12(1355-1368)Online publication date: 3-May-2022
  • (2021)RPM: Local Differential Privacy based Inventory Forecasting2021 7th International Conference on Big Data Computing and Communications (BigCom)10.1109/BigCom53800.2021.00015(106-113)Online publication date: Aug-2021
  • (2020)Privacy-Preserving User Profile Matching in Social NetworksIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2019.291274832:8(1572-1585)Online publication date: 1-Aug-2020

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media