research-article

“I don’t know what you mean by ‘I am anxious’”: A New Method for Evaluating Conversational Agent Responses to Standardized Mental Health Inputs for Anxiety and Depression

Published: 20 July 2022

Abstract

Conversational agents (CAs) are increasingly ubiquitous and are now commonly used to access medical information. However, we lack systematic data about the quality of the advice such agents provide. This paper evaluates CA advice for mental health (MH) questions, a pressing issue given that we are undergoing a mental health crisis. Building on prior work, we define a new method to systematically evaluate mental health responses from CAs. We develop multi-utterance conversational probes derived from two widely used mental health diagnostic surveys, the PHQ-9 (Depression) and the GAD-7 (Anxiety). We evaluate the responses of two text-based chatbots and four voice assistants to determine whether CAs provide relevant responses and treatments. Evaluations were conducted both by clinicians and immersively by trained raters, yielding consistent results across all raters. Although advice and recommendations were generally of low quality, they were better for Crisis probes and for probes concerning symptoms of Anxiety rather than Depression. Responses were slightly better for text-based than for speech-based agents, and when CAs had access to extended dialogue context. Design implications include suggestions for improved responses through clarification sub-dialogues. Incorporating empathy may also improve responses, although this needs to be combined with effective treatments or advice.

Published in

ACM Transactions on Interactive Intelligent Systems, Volume 12, Issue 2
June 2022
216 pages
ISSN: 2160-6455
EISSN: 2160-6463
DOI: 10.1145/3543990

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 20 July 2022
• Online AM: 24 May 2022
• Revised: 1 September 2021
• Accepted: 1 September 2021
• Received: 1 January 2021

Qualifiers

• research-article
• Refereed
