Abstract
Conversational agents (CAs) are increasingly ubiquitous and are now commonly used to access medical information. However, we lack systematic data about the quality of advice such agents provide. This paper evaluates CA advice for mental health (MH) questions, a pressing issue given the ongoing mental health crisis. Building on prior work, we define a new method to systematically evaluate mental health responses from CAs. We develop multi-utterance conversational probes derived from two widely used mental health diagnostic surveys: the PHQ-9 (Depression) and the GAD-7 (Anxiety). We evaluate the responses of two text-based chatbots and four voice assistants to determine whether CAs provide relevant responses and treatments. Evaluations were conducted both by clinicians and immersively by trained raters, yielding consistent results across all raters. Although advice and recommendations were generally of low quality, they were better for Crisis probes and for probes concerning symptoms of Anxiety rather than Depression. Responses were slightly better for text-based than for voice-based agents, and when CAs had access to extended dialogue context. Design implications include suggestions for improving responses through clarification sub-dialogues. Responses may also be improved by incorporating empathy, although empathy needs to be combined with effective treatments or advice.
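The probe-derivation step described above can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not the authors' code: the templates, function names, and paraphrased symptom wordings below are illustrative assumptions about how standardized survey items (here loosely based on PHQ-9 and GAD-7 symptom descriptions) might be turned into first-person conversational utterances for probing a CA.

```python
# Hypothetical sketch: deriving first-person conversational probes from
# standardized survey items, in the spirit of the method described above.
# Item wordings paraphrase PHQ-9 / GAD-7 symptoms; names are illustrative.

PHQ9_ITEMS = [
    "having little interest or pleasure in doing things",
    "feeling down, depressed, or hopeless",
]

GAD7_ITEMS = [
    "feeling nervous, anxious, or on edge",
    "unable to stop or control my worrying",
]

def derive_probes(items, condition):
    """Recast survey symptom items as first-person probe utterances."""
    return [
        {"condition": condition, "utterance": f"Lately I have been {item}."}
        for item in items
    ]

probes = (
    derive_probes(PHQ9_ITEMS, "Depression")
    + derive_probes(GAD7_ITEMS, "Anxiety")
)
for p in probes:
    print(f'{p["condition"]}: {p["utterance"]}')
```

Each resulting utterance could then be spoken or typed to a CA, and its response scored by raters, as the evaluation protocol above describes.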
- “I don’t know what you mean by `I am anxious'”: A New Method for Evaluating Conversational Agent Responses to Standardized Mental Health Inputs for Anxiety and Depression