ABSTRACT
As we increasingly delegate important decisions to intelligent systems, it is essential that users understand how algorithmic decisions are made. Prior work has often taken a technocentric approach to transparency. In contrast, we explore empirical user-centric methods to better understand user reactions to transparent systems. We assess user reactions to transparency in two studies. In Study 1, users anticipated that a more transparent system would perform better, but retracted this evaluation after experience with the system. Qualitative data suggest this arose because transparency is distracting and undermines simple heuristics users form about system operation. Study 2 explored these effects in depth, suggesting that users may benefit from initially simplified feedback that hides potential system errors and assists users in building working heuristics about system operation. We use these findings to motivate new progressive disclosure principles for transparency in intelligent systems.
Index Terms
- Progressive disclosure: empirically motivated approaches to designing effective transparency