Evaluating the Risks of AI Therapy Chatbots in Mental Health Care

A recent study conducted by researchers at Stanford University raises significant concerns regarding the use of AI therapy chatbots in mental health care. The study, led by Nick Haber, an assistant professor at the Stanford Graduate School of Education and an affiliate of the Stanford Institute for Human-Centered AI, highlights that while these chatbots offer a low-cost alternative for therapy, they may lack the effectiveness of human therapists and could perpetuate harmful stigmas regarding mental health conditions. This research will be presented at the upcoming ACM Conference on Fairness, Accountability, and Transparency.
As mental health issues become increasingly prevalent, with nearly 50 percent of individuals experiencing challenges unable to access therapeutic services, AI therapy chatbots powered by large language models (LLMs) have been touted as a potential solution to bridge this gap. However, the Stanford study suggests that these AI systems can introduce biases and failures that may have dangerous consequences.
The research team conducted a comprehensive mapping review of therapeutic guidelines to identify crucial characteristics of effective human therapists. These traits include empathy, equitable treatment of patients, and the ability to challenge harmful thoughts. Subsequently, they performed two experiments assessing the performance of five popular therapy chatbots, including 7cups’ 'Pi' and 'Noni', and the 'Therapist' from Character.ai.
In the first experiment, researchers presented the chatbots with vignettes of individuals exhibiting various mental health symptoms to evaluate the potential stigma elicited. The chatbots were asked questions based on standard measures of stigma, such as their willingness to work with individuals described in these vignettes. The findings revealed that the AI systems exhibited increased stigma towards conditions such as alcohol dependence and schizophrenia compared to depression, which could discourage patients from seeking necessary care. Jared Moore, a PhD candidate in computer science at Stanford and lead author of the paper, emphasized the consistent stigma across different AI models, stating, "Bigger models and newer models show as much stigma as older models."
In the second experiment, the research team tested how the chatbots responded to sensitive topics such as suicidal ideation. In one instance, when prompted with a suicidal question regarding bridges in New York City, the chatbot 'Noni' responded with factual information about bridge heights rather than addressing the underlying mental health issue. Moore pointed out that these chatbots, which have interacted with millions of users, failed to provide appropriate therapeutic responses, raising significant concerns about their safety in clinical contexts.
The implications of this research extend beyond immediate therapy applications. While the replacement of human therapists with AI may not be feasible in the near future, the study does suggest potential roles for AI in supporting human therapists. AI could assist with administrative tasks, such as billing, or serve as a training tool for developing therapists, helping them practice in a controlled environment before working with real patients. Haber noted, “Nuance is the issue — this isn’t simply ‘LLMs for therapy is bad,’ but it’s asking us to think critically about the role of LLMs in therapy.”
Looking forward, the landscape of AI in mental health care remains complex and requires careful consideration of both its advantages and limitations. The researchers advocate for a critical examination of the roles that LLMs can play, suggesting that while AI can enhance therapeutic practices, it cannot replace the human element essential to effective mental health care.
Advertisement
Tags
Advertisement