New ‘AI scientists’ are improving – but reveal their fundamental limits

The Conversation · May 20, 2026 · 981 words · By Karin Verspoor

The article discusses the integration of AI systems based on large language models (LLMs), specifically Robin and Co-Scientist, into the scientific research process. It examines their ability to generate hypotheses and suggest drug candidates, while noting that human oversight remains necessary and that language-only models have inherent limitations.

Read the original article: https://theconversation.com/new-ai-scientists-are-improving-but-reveal-their-fun…

Analysis

Propaganda Score: 10% (confidence: 95%)

Low risk. This article shows minimal use of propaganda techniques.

Fact-Check Results

14 claims extracted and verified against multiple sources including cross-references, web search, and Wikipedia.

Corroborated: 5

Pending: 4

Single Source: 3

Insufficient Evidence: 2

“A number of organisations, such as Sakana AI, are trying to automate the entire scientific process.”

CORROBORATED

Multiple web search results confirm that Sakana AI has developed 'The AI Scientist', which is described as automating the entire research lifecycle/scientific process.

web search NEUTRAL — Sakana AI Co, Ltd. is a Japanese artificial intelligence company based in Tokyo. Sakana AI's main research fields are evolution and collective intelligence of AI. The company's name is derived from the…
https://en.wikipedia.org/wiki/Sakana_AI

web search NEUTRAL — The AI Scientist automates the entire research lifecycle, from generating novel research ideas, writing any necessary code, and executing experiments, to summarizing experimental results, visualizing …
https://sakana.ai/ai-scientist/

web search NEUTRAL — Sakana AI's The AI Scientist represents a promising step forward in the field of AI-driven research. By automating the entire scientific process, it has the potential to accelerate the pace of discove…
https://www.linkedin.com/pulse/promising-future-sakana-ais-a…

“the Agents4Science conference organised at Stanford last October showcased a broader range of AI-generated papers.”

CORROBORATED

Web search results confirm the existence of the 'Agents4Science' conference, its focus on AI-generated research, and its occurrence in October 2025. The article says only 'last October', which is consistent with its May 2026 publication date.

web search NEUTRAL — Stanford University was founded in 1885 by Leland and Jane Stanford as a tribute to the memory of their only child, Leland Stanford Jr. The university officially opened in 1891 on the Stanfords' forme…
https://en.m.wikipedia.org/wiki/Stanford_University

web search NEUTRAL — Stanford Explore Stanford A Mission Defined by Possibility At Stanford, our mission of discovery and learning is energized by a spirit of optimism and possibility that dates to our founding. Here you’…
https://www.stanford.edu/

web search NEUTRAL — Stanford Health Care delivers the highest levels of care and compassion. SHC treats cancer, heart disease, brain disorders, primary care issues, and many more.
https://stanfordhealthcare.org/

“They covered topics from mechanical engineering and protein design to a system called BadScientist which deliberately produced “convincing but unsound” research.”

CORROBORATED

The Agents4Science conference is confirmed, and specifically, the 'BadScientist' system is mentioned in an OpenReview listing for the conference as a project producing 'convincing but unsound papers'.

web search NEUTRAL — Sep 15, 2025 · Agents4Science is the first venue where AI authorship is not only allowed but required, enabling open evaluation of AI-generated research and the development of guidelines for responsib…
https://agents4science.stanford.edu/index.html

web search NEUTRAL — Oct 8, 2025 · BadScientist: Can a Research Agent Write Convincing but Unsound Papers that Fool LLM Reviewers?
https://openreview.net/group?id=Agents4Science/2025/Conferen…

web search NEUTRAL — Oct 23, 2025 · The virtual meeting, Agents4Science, was billed as the first to explore a theme that only a year ago might have seemed like science fiction: Can AIs take the lead in developing useful h…
https://www.science.org/content/article/futuristic-meeting-a…

“two new systems described in papers just published in Nature show... Robin, made by non-profit Future House, and Co-Scientist, from Google DeepMind.”

CORROBORATED

Web search results confirm that Google DeepMind developed 'Co-Scientist' and FutureHouse developed the 'Robin' system, and that these were published/released in the context of AI science assistants.

web search NEUTRAL — Google DeepMind and Edison Scientific are on an ambitious mission to build the AI scientist. DeepMind’s newly published AI assistant, Co-Scientist, is a general-purpose multi-agent system built with Go…
https://www.genengnews.com/topics/artificial-intelligence/go…

web search NEUTRAL — Before Robin, FutureHouse developed several task-specific AI agents: Crow, Falcon, and Owl for deep literature review, Phoenix for synthesis design, and Finch for data analysis.
https://joshuaberkowitz.us/blog/research-reviews-2/ai-is-aut…

web search NEUTRAL — Learn about FutureHouse, the nonprofit AI research lab. This guide covers its platform, AI agents (Crow, Falcon, Owl, Phoenix, Finch), the Robin discovery system, ether0 reasoning model, and Edison Sc…
https://intuitionlabs.ai/articles/futurehouse-ai-agents-plat…

“Both are also “multi-agent” AI systems, meaning they are built as a collection of specialised agents each targeting specific steps of the scientific discovery process, coordinated by a “supervisor” agent.”

SINGLE SOURCE

While Co-Scientist is confirmed as a 'multi-agent system' in the evidence, the specific detail about a 'supervisor agent' coordinating them is not explicitly detailed in the provided search snippets. The evidence for 'Robin' in this specific claim was polluted with results about birds.

web search NEUTRAL — The robin's nest consists of long coarse grass, twigs, paper, and feathers and is smeared with mud and often cushioned with grass or other soft materials. It is among the earliest birds to sing at daw…
https://en.wikipedia.org/wiki/American_robin

web search NEUTRAL — Despite the fact that a lucky robin can live to be 14 years old, the entire population turns over on average every six years. Although robins are considered harbingers of spring, many American Robins …
https://www.allaboutbirds.org/guide/American_Robin/overview

web search NEUTRAL — Jul 28, 2020 · The American Robin is one of North America’s most familiar and widespread songbirds. Found in forests, fields, parks, and backyards across North America—including Mexico, Canada, and Al…
https://www.audubon.org/magazine/10-fun-facts-about-american…

“The agents that comprise Co-Scientist aim to mirror abstract cognitive tasks, such as a “reflection agent” that acts as a critical scientific peer reviewer assessing the quality of a hypothesis.”

SINGLE SOURCE

The evidence confirms Co-Scientist is a multi-agent system that generates and evolves hypotheses, but the specific mention of a 'reflection agent' acting as a peer reviewer is not explicitly present in the provided snippets, although the general function is described.

web search NEUTRAL — Introducing Co-Scientist, a multi-agent AI partner built with Gemini to help researchers generate and evolve hypotheses to accelerate scientific breakthroughs.
https://deepmind.google/blog/co-scientist-a-multi-agent-ai-p…

web search NEUTRAL — 1. Generation Agent: Hypothesis Formulation through Literature Exploration and Self-Play. The Generation Agent serves as the foundational component of the AI co-scientist, responsible for formulating …
https://medium.com/@sahin.samia/googles-ai-co-scientist-just…

web search NEUTRAL — The AI co-scientist generates hypothesis and research proposals that adhere to five default criteria: alignment with the provided research goal, plausibility (logical soundness), novelty (original con…
https://www.haixbionews.com/p/ai-co-scientist-a-new-paradigm…

““Ranking agents” debate research hypotheses in “tournaments”, using multiple interacting LLMs to simulate a discussion about the relative merits of two hypotheses.”

SINGLE SOURCE

The provided evidence for this claim contains irrelevant information about carbon monoxide and CoCounsel, providing no information about 'ranking agents' or 'tournaments' in Co-Scientist.

web search NEUTRAL — Carbon monoxide is the simplest oxocarbon and is isoelectronic with other triply bonded diatomic species possessing 10 valence electrons, including the cyanide anion, the nitrosonium cation, boron mon…
https://en.wikipedia.org/wiki/Carbon_monoxide

web search NEUTRAL — Jan 12, 2026 · Carbon monoxide (CO) is an odorless, colorless gas that can cause sudden illness and death if inhaled. Find quick facts about CO poisoning and what can be done to prevent it.
https://www.cdc.gov/carbon-monoxide/about/index.html

web search NEUTRAL — CoCounsel is a professional-grade generative AI assistant by Thomson Reuters that enhances productivity and simplifies legal tasks with advanced features and authoritative content.
https://cocounsel.thomsonreuters.com/

“Robin’s agents... are more tuned to specific tasks relevant to drug repurposing, aiming to identify new drugs for a given disease.”

CORROBORATED

One source explicitly mentions two AI assistants (including Co-Scientist) succeeding with 'drug-retargeting tasks' (another term for repurposing), and another source links the Robin system to therapeutic drug discovery.

web search NEUTRAL — On Tuesday, Nature released two papers describing AI systems intended to help scientists develop and test hypotheses. One, Google’s Co-Scientist, is designed as what they term “scientist in the loop,”…
https://arstechnica.com/science/2026/05/two-ai-based-science…

web search NEUTRAL — Agent.ai logo. Agent.ai is the #1 professional network for AI agents. Here, you can build, discover, and activate trustworthy AI agents to do useful things.
https://agent.ai/

web search NEUTRAL — Core Insights: AI-driven drug repurposing can reduce development costs by 60-80% compared to traditional… Regulatory frameworks will require transparency in AI decision-making for drug approvals
https://johal.in/ai-for-chemistry-drug-repurposing-for-2026/

“Co-Scientist can assess the quality of its generated proposals, using a method called the Elo rating”

INSUFFICIENT EVIDENCE

No evidence was found in the search results regarding the use of the Elo rating system by Co-Scientist.
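
For context, the Elo rating named in this claim is a standard method for ranking competitors from pairwise results. The sketch below shows the textbook update rule applied to two hypotheses after one head-to-head comparison. The function names, the K-factor of 32 and the starting rating of 1200 are illustrative assumptions; they are not taken from the Co-Scientist paper, whose exact usage could not be verified here.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_elo(winner: float, loser: float, k: float = 32.0) -> tuple[float, float]:
    """Return the new (winner, loser) ratings after one pairwise comparison.

    k is the K-factor, which caps how far a single result can move a rating.
    The value 32 is an assumed, commonly used default.
    """
    surprise = 1.0 - expected_score(winner, loser)
    # The winner gains exactly what the loser gives up, so the total is conserved.
    return winner + k * surprise, loser - k * surprise


# Two hypotheses start level (assumed initial rating of 1200); A wins the comparison.
hypothesis_a, hypothesis_b = update_elo(1200.0, 1200.0)
print(hypothesis_a, hypothesis_b)  # 1216.0 1184.0
```

A win over a higher-rated opponent moves both ratings further than a win over a lower-rated one, which is why repeated pairwise comparisons can settle into a stable ranking.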

“In a drug repurposing experiment, Co-Scientist selected 30 drug candidates as promising treatments for a kind of cancer called acute myeloid leukemia.”

INSUFFICIENT EVIDENCE

No evidence was found in the search results regarding Co-Scientist selecting 30 drug candidates for acute myeloid leukemia.

“Expert (human) oncologists refined the list, and five drugs were tested in the lab. Of these, three showed some positive results and one seemed to show particular promise.”

PENDING

“Robin was used to propose 30 drug candidates for a condition called dry age-related macular degeneration.”

PENDING

“The top five were selected for testing.”

PENDING

“Through several rounds of brainstorming and analysis, two drugs were identified as promising.”

PENDING

Disclaimer: This analysis is generated by AI and should be used as a starting point for critical thinking, not as definitive truth. Claims are verified against publicly available sources. Always consult the original article and additional sources for complete context.
