DOI: 10.1145/3290605.3300509
CHI Conference Proceedings · Research Article · Honorable Mention

Understanding the Effect of Accuracy on Trust in Machine Learning Models

Published: 02 May 2019

ABSTRACT

We address a relatively under-explored aspect of human-computer interaction: people's ability to understand the relationship between a machine learning model's stated performance on held-out data and its expected performance post-deployment. We conduct large-scale, randomized human-subject experiments to examine whether laypeople's trust in a model, measured in terms of both the frequency with which they revise their predictions to match those of the model and their self-reported levels of trust in the model, varies depending on the model's stated accuracy on held-out data and on its observed accuracy in practice. We find that people's trust in a model is affected by both its stated accuracy and its observed accuracy, and that the effect of stated accuracy can change depending on the observed accuracy. Our work relates to recent research on interpretable machine learning, but moves beyond the typical focus on model internals, exploring a different component of the machine learning pipeline.


Published in

CHI '19: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems
May 2019 · 9077 pages
ISBN: 9781450359702
DOI: 10.1145/3290605
Copyright © 2019 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

CHI '19 paper acceptance rate: 703 of 2,958 submissions (24%). Overall acceptance rate: 6,199 of 26,314 submissions (24%).
