DOI: 10.1145/3173574.3173900
research-article
Honorable Mention

Balancing Privacy and Information Disclosure in Interactive Record Linkage with Visual Masking

Published: 21 April 2018

ABSTRACT

Effective use of data involving personal or sensitive information often requires different people to have access to personal information, which significantly reduces the personal privacy of those whose data is stored and increases the risk of identity theft, data leaks, or social engineering attacks. Our research studies the tradeoffs between privacy and utility of personal information for human decision making. Using a record-linkage scenario, this paper presents a controlled study of how varying degrees of information availability influence the ability to effectively use personal information. We compared the quality of human decision-making using a visual interface that controls the amount of personal information available and uses visual markup to highlight data discrepancies. With this interface, study participants who viewed only 30% of data content had decision quality similar to those who had full 100% access. The results demonstrate that it is possible to greatly limit the amount of personal information available to human decision makers without negatively affecting utility or human effectiveness. However, the findings also show there is a limit to how much data can be hidden before negatively influencing the quality of judgment in decisions involving person-level data. Despite the reduced accuracy with extreme data hiding, the study demonstrates that with proper interface designs, many correct decisions can be made even with legally de-identified data that is fully masked (74.5% accuracy with fully masked data compared to 84.1% with full access). Thus, when legal requirements only allow for de-identified data access, use of a well-designed interface can significantly improve data utility.
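The abstract does not specify how the interface masks values, so the following is only a minimal sketch of one way discrepancy-preserving masking could work, not the authors' implementation. It assumes a simple position-by-position comparison of two field values: characters that agree are hidden, and only differing characters stay visible. The names `MASK`, `mask_pair`, and `disclosed_fraction` are hypothetical and introduced purely for illustration.

```python
from typing import Tuple

MASK = "*"  # hypothetical placeholder symbol for hidden characters


def mask_pair(a: str, b: str) -> Tuple[str, str]:
    """Hide characters that agree in both values; keep differing positions visible."""
    width = max(len(a), len(b))
    # Pad the shorter value with spaces so positions line up (an assumption of this sketch).
    a_pad, b_pad = a.ljust(width), b.ljust(width)
    out_a, out_b = [], []
    for ca, cb in zip(a_pad, b_pad):
        if ca == cb:
            out_a.append(MASK)
            out_b.append(MASK)
        else:
            out_a.append(ca)
            out_b.append(cb)
    return "".join(out_a), "".join(out_b)


def disclosed_fraction(masked: str) -> float:
    """Share of characters in a masked value that remain visible."""
    if not masked:
        return 0.0
    return sum(ch != MASK for ch in masked) / len(masked)


if __name__ == "__main__":
    left, right = mask_pair("1985-03-12", "1985-03-21")
    print(left, right, disclosed_fraction(left))
```

In this toy pair of dates, only the two transposed digits stay visible, so a reviewer could still judge that the values differ by a likely transposition while seeing 20% of the characters. A real interface would need alignment-aware comparison (for insertions and deletions) and field-specific rules, which this sketch does not attempt.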

Supplemental Material

pn3007-file3.mp4 (mp4, 3.2 MB)

Published in

CHI '18: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems
April 2018
8489 pages
ISBN: 9781450356206
DOI: 10.1145/3173574

Copyright © 2018 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

CHI '18 Paper Acceptance Rate: 666 of 2,590 submissions, 26%
Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%
