ABSTRACT
We address a relatively under-explored aspect of human-computer interaction: people's ability to understand the relationship between a machine learning model's stated performance on held-out data and its expected post-deployment performance. We conduct large-scale, randomized human-subject experiments to examine whether laypeople's trust in a model, measured both by the frequency with which they revise their predictions to match the model's and by their self-reported levels of trust in it, varies with the model's stated accuracy on held-out data and with its observed accuracy in practice. We find that people's trust is affected by both stated and observed accuracy, and that the effect of stated accuracy can change depending on the observed accuracy. Our work relates to recent research on interpretable machine learning but moves beyond the typical focus on model internals, exploring a different component of the machine learning pipeline.
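The two behavioral measures named above can be made concrete with a small sketch. The snippet below is purely illustrative (the field names, trial format, and Likert scale are assumptions, not taken from the paper): it computes (1) the rate at which participants revise an initial prediction to match the model's after disagreeing with it, and (2) the mean self-reported trust rating across trials.

```python
def trust_metrics(trials):
    """Compute two hypothetical trust measures from experiment trials.

    trials: list of dicts with keys 'initial' (participant's first
    prediction), 'model' (the model's prediction), 'final' (the
    participant's prediction after seeing the model's), and
    'self_reported' (e.g. a 1-7 Likert trust rating).
    """
    # A revision "toward the model" counts only on trials where the
    # participant initially disagreed with the model.
    disagreements = [t for t in trials if t["initial"] != t["model"]]
    revised = sum(1 for t in disagreements if t["final"] == t["model"])
    revision_rate = revised / len(disagreements) if disagreements else 0.0
    mean_self_report = sum(t["self_reported"] for t in trials) / len(trials)
    return revision_rate, mean_self_report

# Toy data: three trials, two initial disagreements, one revision.
trials = [
    {"initial": "A", "model": "B", "final": "B", "self_reported": 5},
    {"initial": "A", "model": "B", "final": "A", "self_reported": 2},
    {"initial": "B", "model": "B", "final": "B", "self_reported": 6},
]
rate, rating = trust_metrics(trials)
# rate -> 0.5 (one of the two disagreements was revised to match the model)
```

In the experiments described above, these per-participant measures would then be compared across conditions that vary the model's stated and observed accuracy.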
Index Terms
- Understanding the Effect of Accuracy on Trust in Machine Learning Models