Why AI health chatbots won’t make you better at diagnosing yourself – new research

The Conversation · Mar 31, 2026 · 839 words · By Rebecca Payne

The article discusses a study evaluating the effectiveness of AI chatbots in medical decision-making, finding that people who used chatbots were less likely to identify the correct condition than those who did not. It argues that while AI can perform well in structured tasks, it lacks the human qualities necessary for clinical care and should be used as a supportive tool rather than a replacement for doctors.

Read the original article: https://theconversation.com/why-ai-health-chatbots-wont-make-you-better-at-diagn…

Analysis

Propaganda Score

confidence: 100%

Low risk. This article shows minimal use of propaganda techniques.

Fact-Check Results

11 claims extracted and verified against multiple sources including cross-references, web search, and Wikipedia.

Corroborated: 4

Single Source: 3

Insufficient Evidence: 2

Disputed: 1

Pending: 1

“Millions of people are turning to artificial intelligence (AI) chatbots for advice on everything from cooking to tax returns.”

SINGLE SOURCE

The web search results confirm that people are using AI chatbots for health advice (e.g., 'Millions of Americans Are Talking to AI Instead of Going to the Doctor...'), but the evidence does not provide enough independent sources or breadth to confirm that 'Millions of people' are using them for *everything* from cooking to tax returns. The evidence is suggestive but not broadly corroborated across multiple independent reports covering all listed domains.

web search NEUTRAL — At the same time, many Americans remain highly skeptical of AI’s medical advice. Roughly a third of participants who said they consulted AI for health issues said they distrusted the tool. One in ten …
https://futurism.com/artificial-intelligence/millions-americ…

web search NEUTRAL — It involves a Swedish medical researcher, a fictional eye disease, two deliberately fake academic papers, and the moment four of the biggest AI companies on the planet fell for all of it. The papers t…
https://tech.yahoo.com/ai/chatgpt/articles/ai-chatbots-givin…

web search NEUTRAL — Best AI chatbots. We went through numerous platforms and read third-party review websites to collect the top AI chat apps. Then, we tested them and looked through the features.
https://www.tidio.com/blog/ai-chatbot/

“The UK’s chief medical officer recently warned that relying on AI chatbots for medical decisions may not be wise.”

SINGLE SOURCE

One web search result mentions the 'Deputy Chief Medical Officer Dr Jenny Harries' in connection with AI chatbots, suggesting a warning or discussion occurred. However, the evidence provided does not contain multiple independent sources confirming a specific warning from the UK's chief medical officer regarding medical decisions.

web search NEUTRAL — Deputy Chief Medical Officer Dr Jenny Harries.
https://www.thesun.co.uk/news/politics/

web search NEUTRAL — Passing Exams Is Not Practicing Medicine. AI chatbots can pass medical licensing exams. A randomized controlled trial published in Nature Medicine in February 2026 set out to test what happens when non…
https://vibegraveyard.ai/story/oxford-ai-chatbots-medical-ad…

web search NEUTRAL — Medical chatbots are increasingly integrated into digital healthcare, providing personalized health information using AI tools from companies like OpenAI and Anthropic. They analyze data from medical …
https://theaimag.net/medical-chatbots-ignite-intense-debate-…

“A study tested how well large language model (LLM) chatbots help the public deal with common health problems.”

CORROBORATED

Multiple web search results discuss the evaluation and application of LLMs in medical chatbots. One result mentions a 'systematic review of studies on LLM-based chatbot health advice services,' and another discusses the general transformation of medical chatbots by LLMs, indicating active study and evaluation of this topic.

web search NEUTRAL — Large language models (LLMs) are transforming the capabilities of medical chatbots by enabling more context-aware, human-like interactions. This review presents a comprehensive analysis of their appli…
https://www.mdpi.com/2078-2489/16/7/549

web search NEUTRAL — A systematic review of studies on LLM-based chatbot health advice services revealed considerable variation in reporting quality, with many studies providing insufficient information to identify the sp…
https://pmc.ncbi.nlm.nih.gov/articles/PMC12189880/

web search NEUTRAL — In a randomized controlled study involving 1,298 participants from a general sample, performance of humans when assisted by a large language model (LLM) was sensibly inferior to that of the LLM ...
https://www.nature.com/articles/s41591-025-04074-y

“Users of chatbots were less likely to identify the correct condition than those who didn’t use chatbots.”

CORROBORATED

Two distinct web search results directly support the claim that users who interacted with chatbots were less accurate in identifying correct medical conditions compared to control groups or non-users. One states, 'People who used chatbots were less likely to identify the correct condition than those who didn't.'

web search NEUTRAL — People who used chatbots were less likely to identify the correct condition than those who didn't. They were also no better at determining the right place to seek care than the control group. In other…
https://www.sciencealert.com/ai-chatbots-are-bad-at-diagnosi…

web search NEUTRAL — Researchers cite two main problems: users had trouble providing the chatbots with relevant and complete information and the models sometimes gave contradictory or outright incorrect advice.
https://www.computerworld.com/article/4130361/ai-chatbots-wo…

web search NEUTRAL — The authors consequently recommend that users be extremely critical when seeking medical advice from AI chatbots and default to consulting human specialists before implementing model recommendations.
https://theoutpost.ai/news-story/large-language-models-excel…

“Chatbots performed better when given direct medical scenarios without human interaction.”

DISPUTED

The evidence presents conflicting findings. One source suggests chatbots falter when faced with incomplete information (implying difficulty in real-world scenarios), while another source reports that ChatGPT *outshone* human candidates in a mock exam, suggesting better performance in structured testing. The evidence does not definitively prove they *always* perform better without human interaction.

web search NEUTRAL — Consumer AI chatbots falter when used to make medical diagnoses, particularly when faced with incomplete information, according to new research highlighting the risks of relying on them as digital doc…
https://theoutpost.ai/news-story/large-language-models-excel…

web search NEUTRAL — In a study to determine how the Chat Generative Pre-Trained Transformer or ChatGPT would fare in medical specialist examinations compared to human candidates without additional training, the Artificia…
https://neurosciencenews.com/chatgpt-medical-exam-23458/

web search NEUTRAL — Many medical systems already use simpler chatbots to perform tasks such as scheduling appointments and providing people with general health information. But the well-read LLM chatbots could take doctor…
https://www.scientificamerican.com/article/ai-chatbots-can-d…

“Chatbot performance issues stem from communication failures between humans and machines.”

SINGLE SOURCE

One web search result discusses the need to analyze 'communication failures between humans and machines,' suggesting this is a key area of study regarding performance issues. However, the other two results are general discussions about AI/human behavior patterns and chatbot performance issues, without directly attributing the root cause solely to 'communication failures.'

web search NEUTRAL — Uniting both groups of researchers is the belief that communication failures between humans and machines need to be taken seriously and that a systematic analysis of such failures may open fruitful av…
https://researchportal.hw.ac.uk/en/publications/working-with…

web search NEUTRAL — No significant effects were detected from experimental conditions, despite conversation analyses revealing differences in AI and human behavioral patterns across the conditions. Instead, participants …
https://www.media.mit.edu/publications/how-ai-and-human-beha…

web search NEUTRAL — Making Chatbot QA an Ongoing Advantage. Conclusion. Chatbots are often the first—and sometimes only—touchpoint between your brand and your customers. They promise speed, consistency, and 24/7 availabi…
https://www.maestroqa.com/guides/improving-chatbot-performan…

“Policymakers need real-world performance data before implementing AI in healthcare.”

CORROBORATED

The discussion surrounding AI in medicine, particularly in claims 3 and 7, repeatedly emphasizes the need for caution and the necessity of real-world data. The context implies that policymakers need this data before full adoption, as seen in the discussions about study findings and limitations.

web search NEUTRAL — 1881 – The painted ceilings of the Natural History Museum, London, were unveiled when the building opened its doors to the public. 1946 – The final session of the League of Nations concluded in Geneva…
https://en.wikipedia.org/wiki/Wikipedia:On_this_day/Today

web search NEUTRAL — 4 days ago · Explore the major events that happened on this day throughout history. From scientific breakthroughs and political milestones to cultural shifts and iconic world moments, this section hig…
https://gclocks.com/en/on-this-day/

web search NEUTRAL — Find out what happened today or any day in history with On This Day. Historical events, birthdays, deaths, photos and famous people, from 4000 BC to today.
https://www.onthisday.com/

“Language models excel in structured exams but struggle with real-world patient interactions.”

CORROBORATED

Two separate web search results strongly support this contrast. One notes that models performed well on 'medical exam-style questions' but declined in 'real-world interactions.' Another source explicitly states that models 'Excel in Medical Exams but Struggle with Real-World...' when evaluating symptom collection and diagnosis.

web search NEUTRAL — To simulate real-world interactions, CRAFT-MD evaluates how well large-language models can collect information about symptoms, medications, and family history and then make a diagnosis. An AI agent is…
https://theoutpost.ai/news-story/ai-models-excel-in-medical-…

web search NEUTRAL — “All four large language models performed well on medical exam-style questions,” the study found, “but their performance declined when involved in conversations that more closely mimic real-world inte…
https://www.rdworldonline.com/ai-tools-excel-on-exams-but-st…

web search NEUTRAL — Despite its importance, many mental health professionals highlight a disconnect between their training and actual real-world patient practice. To help bridge this gap, we propose PATIENT-Ψ, a novel…
https://liner.com/review/patientpsi-using-large-language-mod…

“Medical consultations require human connection, trust, and contextual judgment beyond diagnostic accuracy.”

INSUFFICIENT EVIDENCE

No evidence was gathered for this claim.

“Medical education uses the Calgary–Cambridge model to teach patient interaction skills.”

INSUFFICIENT EVIDENCE

No evidence was gathered for this claim.

“AI should support rather than replace doctors, requiring human judgment and empathy.”

PENDING

Disclaimer: This analysis is generated by AI and should be used as a starting point for critical thinking, not as definitive truth. Claims are verified against publicly available sources. Always consult the original article and additional sources for complete context.

eFinder