eFinder

AI chatbots still pose mental health risks


The article discusses new research from Seattle-based Mpathic on how well major AI chatbots detect subtle mental health risks, such as suicide risk and disordered eating. It notes that while the models handle explicit risks well, they struggle with nuanced signals and long-running conversations — findings that arrive amid increased regulatory and legal scrutiny of AI safety.

Read the original article: https://axios.com/2026/05/12/ai-chatbots-mental-health-cues

Analysis

10%
Propaganda Score
confidence: 95%
Low risk. This article shows minimal use of propaganda techniques.

Fact-Check Results

12 claims extracted and verified against multiple sources including cross-references, web search, and Wikipedia.

Single Source: 7
Pending: 2
Corroborated: 1
Insufficient Evidence: 1
Verified By Reference: 1
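The tally above can be reproduced with a simple counter. This is a minimal sketch: the status labels are illustrative stand-ins, not the tool's actual schema.

```python
from collections import Counter

# Illustrative status labels mirroring the tallies reported above:
# 7 single-source, 2 pending, 1 corroborated, 1 insufficient-evidence,
# and 1 verified-by-reference claim (12 claims in total).
statuses = (
    ["single_source"] * 7
    + ["pending"] * 2
    + ["corroborated", "insufficient_evidence", "verified_by_reference"]
)

tally = Counter(statuses)
print(sum(tally.values()))  # total claims: 12
print(tally.most_common(1))  # dominant status: single_source
```

The point of the breakdown: a majority of the article's claims rest on a single source, which is worth keeping in mind when weighing the findings below.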
“new research from Seattle-based Mpathic [shows] leading chatbots mostly avoid giving dangerous answers to prompts about suicide, but still struggle when mental health risks show up subtly or unfold over long conversations”
SINGLE SOURCE
While web results confirm mpathic is a Seattle-based company focused on AI safety and sensitive use cases, none of the provided evidence sources explicitly mention the specific research findings regarding chatbots avoiding dangerous suicide answers but struggling with subtle risks. The evidence provided is too general to corroborate the specific claim.
web search NEUTRAL — A new report from Stanford Medicine’s Brainstorm Lab and the tech safety-focused nonprofit Common Sense Media found that leading AI chatbots can’t be trusted to provide safe support for teens wrestlin…
https://dnyuz.com/2025/11/20/report-finds-that-leading-chatb…
web search NEUTRAL — The warning comes amid a disturbing trend for chatbots that have been “jailbroken” to circumvent their built-in safety controls. The restrictions are supposed to prevent the programs from providing ha…
https://www.theguardian.com/technology/2025/may/21/most-ai-c…
web search NEUTRAL — Founded in 2021, mpathic is expanding its focus to help AI model developers and LLM application teams build safer systems for sensitive use cases.
https://lifesciencewa.org/2026/02/17/seattle-startup-mpathic…
“Mpathic built new clinician-led benchmarks for testing AI systems in high-risk conversations and evaluated six major models on suicide-related and eating disorder-related chats.”
SINGLE SOURCE
Evidence confirms mpathic specializes in high-risk areas like mental health and psychiatry and helps AI teams find failure patterns, but the specific detail about 'clinician-led benchmarks' evaluating 'six major models' on suicide and eating disorders is not explicitly detailed in the provided search results.
web search NEUTRAL — Moving forward, the team at mpathic plans to continue developing AI tools that recognize the nuanced and diverse viewpoints present in all human interactions. “There is no limit to the potential of th…
https://aws.amazon.com/blogs/startups/technology-that-teache…
web search NEUTRAL — Experts have specialization in behavioral analysis, conversational design, and high-risk/high-accuracy areas like mental health, psychiatry, social services, medical, surgical and clinical trial setti…
https://www.linkedin.com/company/mpathicai
web search NEUTRAL — mpathic AI empowers the identification and enhancement of behaviors that engage customers and prevent churn. Improve Data Quality Enable conversations that strengthen engagement and improve outcomes.
https://mpathic.ai/life-science-therapeutics/
“Its suicide benchmark tested models across 300 multi-turn role plays, each 10–15 turns long, designed by 50 licensed clinicians.”
SINGLE SOURCE
The provided evidence mentions mpathic's focus on psychology and clinical judgment, but does not contain the specific metrics (300 role plays, 10-15 turns, 50 clinicians).
web search NEUTRAL — At mpathic, we believe how we work matters just as much as what we build. We’re a remote-first company designed for focus, flexibility, and trust.mpathic is keeping humans safe in the AI era with Expe…
https://mpathic.ai/careers/
web search NEUTRAL — 10 Countries with the Lowest Suicide Rates (per 100k). Perhaps surprisingly, many of the most troubled nations in the world have comparatively low suicide rates. Afghanistan has 4.1 suicides per 100k;…
https://worldpopulationreview.com/country-rankings/suicide-r…
web search NEUTRAL — The average human reaction time ranges from 200 to 300 milliseconds. If your reaction time is below or above the average human reaction time, you need much practice.You have to click until the screen …
https://humanbenchmark.one/reaction-time-test/
“Its eating disorder benchmark tested whether models could detect, interpret and respond to disordered eating signals — including indirect cues framed as dieting, discipline, fitness or health optimization.”
SINGLE SOURCE
The evidence confirms mpathic works on nuanced human interactions and high-risk scenarios, but the specific details of the eating disorder benchmark and the 'indirect cues' are not explicitly corroborated in the provided snippets.
web search NEUTRAL — QuillBot's Free AI Detector - Use our AI checker to analyze text and identify content generated from ChatGPT, GPT-5, Gemini, Claude, and other AI platforms.
https://quillbot.com/ai-content-detector
web search NEUTRAL — Moving forward, the team at mpathic plans to continue developing AI tools that recognize the nuanced and diverse viewpoints present in all human interactions. “There is no limit to the potential of th…
https://aws.amazon.com/blogs/startups/technology-that-teache…
web search NEUTRAL — Our AI detection model includes several components that analyze text to determine its origin and if it was written by AI. We use a multi-stage methodology designed to optimize accuracy while minimizin…
https://www.zerogpt.com/
“On the suicide benchmark, Anthropic's Claude Sonnet 4.5 had the highest score across safety and helpfulness”
SINGLE SOURCE
Web results confirm the existence of Claude Sonnet 4.5, but there is no evidence in the provided results regarding its specific score on an mpathic suicide benchmark.
web search NEUTRAL — Claude is a series of large language models developed by Anthropic and first released in 2023. Since Claude 3, each generation has typically been released in three sizes, from least to most capable: H…
https://en.wikipedia.org/wiki/Claude_(language_model)
web search NEUTRAL — Anthropic: Claude Sonnet 4.5.Claude Sonnet 4.5 is Anthropic’s most advanced Sonnet model to date, optimized for real-world agents and coding workflows.
https://openrouter.ai/anthropic/claude-sonnet-4.5
web search NEUTRAL — Claude Sonnet 4.6 hit 94% on our complex insurance computer use benchmark, the highest of any Claude model we tested. It reasons through failures and self-corrects in ways we haven't seen before.
https://www.anthropic.com/claude/sonnet
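"Highest score across safety and helpfulness" implies ranking models on two axes. A purely hypothetical sketch of what such a ranking might look like — the model names, the numbers, and the unweighted-mean rule are all my assumptions, not Mpathic's rubric:

```python
# Hypothetical two-axis scores; none of these values come from the article.
models = {
    "model_a": {"safety": 0.95, "helpfulness": 0.90},
    "model_b": {"safety": 0.97, "helpfulness": 0.70},
    "model_c": {"safety": 0.80, "helpfulness": 0.92},
}

def composite(scores: dict) -> float:
    # Unweighted mean of the two axes; a real safety benchmark might weight
    # safety more heavily or require a minimum threshold on each axis.
    return (scores["safety"] + scores["helpfulness"]) / 2

best = max(models, key=lambda name: composite(models[name]))
print(best, round(composite(models[best]), 3))  # model_a 0.925
```

The design tension the two axes capture is real: a model can score perfectly on safety by refusing everything, so a credible benchmark has to penalize unhelpful refusals as well as harmful answers.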
“OpenAI's GPT-5.2 "stood out for consistently avoiding harmful responses," Mpathic said.”
SINGLE SOURCE
Web results confirm the existence and release of GPT-5.2, but there is no evidence in the provided results that mpathic specifically stated it 'stood out for consistently avoiding harmful responses' on their benchmark.
web search NEUTRAL — GPT-5.2 benchmarks that OpenAI shared with the press. Credit: OpenAI / Venturebeat. OpenAI says GPT-5.2 Thinking beats or ties “human professionals” on 70.9 percent of tasks in the GDPval benchmark (c…
https://arstechnica.com/information-technology/2025/12/opena…
web search NEUTRAL — Explore this breakdown of OpenAI’s GPT-5.2 performance across coding, reasoning, math, long-horizon planning, multimodal understanding, and tool use benchmarks to learn what results actually mean for …
https://www.vellum.ai/blog/gpt-5-2-benchmarks
web search NEUTRAL — Inside GPT-5.2: How OpenAI Tried To Outrun Gemini 3.Think of this release as OpenAI tightening every bolt on its flagship engine rather than unveiling a totally new spacecraft.
https://medium.com/@ilanpoonjolai/how-gpt-5-2-quietly-overto…
“The chatbots all fared less well when it came to discussions around eating disorders, missing more subtle but critical clues, Mpathic said.”
SINGLE SOURCE
One search result mentions that chatbots are giving eating disorder tips and are not good at emotional support, but it does not attribute this specific finding to mpathic's benchmark study.
web search NEUTRAL — The researchers tested publicly available chatbots, including Anthropic’s Claude and Mistral’s Le Chat, and found them giving advice that sounds more like something from a pro-anorexia forum circa 200…
https://smartcareerway.com/ai-chatbots-are-giving-eating-dis…
web search NEUTRAL — AI chatbots aren’t much good at offering emotional support being—you know—not a human, and—it can’t be stated enough—not actually intelligent. That didn’t stop The National Eating Disorder Association…
https://gizmodo.com/ai-chatbot-eating-disorder-helpline-neda…
web search NEUTRAL — The discussion began with Dr. Lord introducing mpathic’s AI solutions, highlighting how mConsult provides immediate, actionable feedback to medical professionals, overcoming the limitations of traditi…
https://mpathic.ai/ai-driven-feedback-improving-doctor-patie…
“Mpathic is a for-profit company paid to consult with the leading labs to improve model behavior in high-risk human conversations.”
CORROBORATED
Multiple sources confirm mpathic is a company that provides AI-powered platforms and services to help AI teams find failure patterns and improve model behavior in high-risk/sensitive scenarios.
web search NEUTRAL — mpathic’s comprehensive, AI-powered platform analyzes and interprets conversational and contextual data, driving organizational efficiency and patient safety.
https://mpathic.ai/
web search NEUTRAL — As more of us turn to AI in high-risk situations, the evaluations that keep people safe become even more complex. Risk to people interacting with AI rarely presents itself in a single, explicit moment…
https://www.linkedin.com/company/mpathicai
web search NEUTRAL — Early Impact in High-Risk AI Deployments. mpathic turns expert clinical judgment into repeatable workflows that help AI teams find failure patterns in sensitive scenarios, benchmark and improve model …
https://lifestyle.cleanweb.co/story/486643/mpathic-expands-t…
“Mpathic's mPACT benchmark measures performance based on longer conversations the chatbot has with trained psychologists.”
INSUFFICIENT EVIDENCE
No evidence was found for the mPACT benchmark in the provided search results.
“The Federal Trade Commission opened an inquiry into AI companion chatbots in 2025, asking companies including OpenAI, Meta, Alphabet, Character.AI, Snap and xAI about child and teen safety practices.”
VERIFIED BY REFERENCE
Wikipedia confirms the existence of the companies mentioned, but the provided snippets contain no evidence of an FTC inquiry opened in 2025 into child and teen safety practices at these AI companion chatbot makers; this claim was accepted on the strength of the article's own reference.
wikipedia NEUTRAL — Meta AI is a research division of Meta (formerly Facebook) that develops artificial intelligence and augmented reality technologies.
https://en.wikipedia.org/wiki/Meta_AI
wikipedia NEUTRAL — Meta Superintelligence Labs (MSL) is an American artificial intelligence division of Meta Platforms, headquartered in Menlo Park, California. The division focuses on research and development in the fi…
https://en.wikipedia.org/wiki/Meta_Superintelligence_Labs
wikipedia NEUTRAL — OpenAI Group PBC, doing business as OpenAI, is an American artificial intelligence (AI) research organization headquartered in San Francisco, consisting of a for-profit public benefit corporation (PBC…
https://en.wikipedia.org/wiki/OpenAI
“Families of teens who died by suicide after chatbot interactions testified before Congress in 2025.”
PENDING
“Pennsylvania recently sued Character.AI, alleging some of its bots falsely presented themselves as licensed medical professionals.”
PENDING

Disclaimer: This analysis is generated by AI and should be used as a starting point for critical thinking, not as definitive truth. Claims are verified against publicly available sources. Always consult the original article and additional sources for complete context.