Abstract
The trustworthiness of data science systems in applied and real-world settings emerges from the resolution of specific tensions through situated, pragmatic, and ongoing forms of work. Drawing on research in CSCW, critical data studies, and the history and sociology of science, as well as six months of immersive ethnographic fieldwork with a corporate data science team, we describe four common tensions in applied data science work: (un)equivocal numbers, (counter)intuitive knowledge, (in)credible data, and (in)scrutable models. We show how organizational actors establish and renegotiate trust under messy and uncertain analytic conditions through practices of skepticism, assessment, and credibility. Highlighting the collaborative and heterogeneous nature of real-world data science, we show that managing trust in applied corporate data science settings depends not only on pre-processing and quantification but also on negotiation and translation. We conclude by discussing the implications of our findings for data science research and practice, both within and beyond CSCW.
Index Terms
- Trust in Data Science: Collaboration, Translation, and Accountability in Corporate Data Science Projects