DOI: 10.1145/3287560.3287577

Beyond Open vs. Closed: Balancing Individual Privacy and Public Accountability in Data Sharing

Published: 29 January 2019

Abstract

Data too sensitive to be "open" for analysis and re-purposing typically remains "closed" as proprietary information. This dichotomy undermines efforts to make algorithmic systems more fair, transparent, and accountable. Access to proprietary data in particular is needed by government agencies to enforce policy, researchers to evaluate methods, and the public to hold agencies accountable; all of these needs must be met while preserving individual privacy and firm competitiveness. In this paper, we describe an integrated legal-technical approach provided by a third-party public-private data trust designed to balance these competing interests. Basic membership allows firms and agencies to enable low-risk access to data for compliance reporting and core methods research, while modular data sharing agreements support a wide array of projects and use cases. Unless specifically stated otherwise in an agreement, all data access is initially provided to end users through customized synthetic datasets that offer a) strong privacy guarantees, b) removal of signals that could expose competitive advantage, and c) removal of biases that could reinforce discriminatory policies, all while maintaining fidelity to the original data. We find that using synthetic data in conjunction with strong legal protections over raw data strikes a balance between transparency, proprietorship, privacy, and research objectives. This legal-technical framework can form the basis for data trusts in a variety of contexts.
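
As an illustration of what "synthetic datasets that offer strong privacy guarantees" can mean in practice, the following is a minimal sketch under stated assumptions, not the method used by the data trust (the abstract does not specify one). It assumes a single categorical attribute with a fixed, publicly known domain, perturbs the histogram of that attribute with Laplace noise (a standard mechanism for ε-differential privacy on counts), and samples synthetic records from the noisy histogram. The names `laplace_noise` and `synthesize`, the parameter `epsilon`, and all example values are hypothetical.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponential variables with
    rate 1/scale follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def synthesize(records, domain, epsilon, n_out, seed=None):
    """Return n_out synthetic values drawn from a noisy histogram.

    records: list of raw categorical values (the sensitive data).
    domain:  fixed, publicly known list of allowed values; it must NOT
             be derived from the records (assumption of this sketch).
    epsilon: privacy parameter; smaller means more noise.
    """
    rng = random.Random(seed)
    counts = Counter(records)

    # Adding or removing one record changes exactly one count by 1,
    # so the L1 sensitivity is 1 and the Laplace scale is 1/epsilon.
    noisy = [counts[v] + laplace_noise(1.0 / epsilon, rng) for v in domain]

    # Post-processing (clamping, sampling) does not weaken the guarantee.
    weights = [max(0.0, c) for c in noisy]
    if sum(weights) == 0:
        weights = [1.0] * len(domain)  # fall back to a uniform draw

    return rng.choices(domain, weights=weights, k=n_out)
```

For example, `synthesize(["bike"] * 50 + ["car"] * 30, ["bike", "car", "bus"], epsilon=1.0, n_out=100)` returns 100 values from the public domain whose proportions roughly track the raw data. This sketch covers only the privacy aspect (a); removing competitive signals (b) and discriminatory bias (c) would require additional, separate processing.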

      Published In

      FAT* '19: Proceedings of the Conference on Fairness, Accountability, and Transparency
      January 2019
      388 pages
      ISBN: 9781450361255
      DOI: 10.1145/3287560
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. algorithmic bias
      2. data ethics
      3. data governance
      4. data sharing
      5. privacy

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      FAT* '19
