‘State-of-the-art’ models can struggle with basic office work, says AI executive

South China Morning Post · Apr 19, 2026 · 265 words · By Eunice Xu

Read the original article: https://www.scmp.com/tech/big-tech/article/3350614/state-art-models-can-struggle…

Fact-Check Results

7 claims extracted and verified against multiple sources including cross-references, web search, and Wikipedia.

Corroborated: 4

Single Source: 3

“‘State-of-the-art’ (Sota) artificial intelligence models excel at solving complex Olympiad maths but still struggle with everyday enterprise tasks, according to an executive from a top AI unicorn in the US.”

CORROBORATED

Multiple web search results indicate that SOTA AI models excel at specific academic tasks like Olympiad math but struggle with broader, real-world reasoning or complex tasks, supporting the claim. The evidence points to an uneven pattern of intelligence.

wikipedia NEUTRAL — Artificial intelligence (AI) is the capability of computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and dec…
https://en.wikipedia.org/wiki/Artificial_intelligence

wikipedia NEUTRAL — Lê Viết Quốc (born 1982), or in romanized form Quoc Viet Le, is a Vietnamese-American computer scientist and artificial intelligence researcher. He is a Google Fellow at Google DeepMind and a founding…
https://en.wikipedia.org/wiki/Quoc_V._Le

wikipedia NEUTRAL — The 45th Chess Olympiad was an international team chess event organised by the International Chess Federation (FIDE) in Budapest, Hungary, from 10 to 23 September 2024. It consisted of two main tourna…
https://en.wikipedia.org/wiki/45th_Chess_Olympiad

+ 3 more evidence sources

“David Meyer, senior vice-president of product at US data processing and analysis company Databricks, told the South China Morning Post in a recent interview that the very traits making models state-of-the-art could cause issues in basic office work.”

SINGLE SOURCE

The claim attributes specific statements to David Meyer of Databricks, made in an interview with the South China Morning Post. The evidence confirms that Databricks exists as a company, but none of it confirms the interview or the statements attributed to Meyer.

web search NEUTRAL — Ask a Databricks technical expert live.
https://www.databricks.com/resources/webinar/officehours

web search NEUTRAL — Databricks One launches a code-free AI platform empowering all employees, from executives to frontline staff, to explore data using natural language queries.
https://channellife.co.uk/story/databricks-one-launches-to-b…

web search NEUTRAL — Databricks, Inc. is an American software company based in San Francisco. It was founded in 2013 by the original creators of Apache Spark. It offers a cloud-based platform for data analytics and artifi…
https://en.wikipedia.org/wiki/Databricks

“For instance, when tasked with identifying an erroneous number on an invoice, a Sota model ‘will oftentimes fix the mistake’ rather than simply extracting the error for downstream correction, he said.”

SINGLE SOURCE

The claim describes a specific failure mode (fixing an error instead of extracting it) when using SOTA models on invoices. While web search results discuss SOTA models and data tasks, none of the provided evidence directly corroborates this specific behavior described in the claim.

web search NEUTRAL — Related What is the difference between a mistake and an error?
https://www.quora.com/What-is-the-difference-between-a-mista…

web search NEUTRAL — Modern computer vision models can identify multiple objects in an image simultaneously. Source: Roboflow. The above image demonstrates how a SOTA vision model pinpoints different objects. This level o…
https://automatio.ai/blog/sota-models-llm-nlp/

web search NEUTRAL — When you encounter a 402 Error, it can be a significant roadblock in your web browsing or online transactions. This guide is designed to walk you through the steps to resolve this error efficiently.
https://apipark.com/techblog/en/how-to-fix-the-402-error-a-s…
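
The difference between extracting an invoice error and fixing it can be illustrated with a minimal Python sketch. It is not taken from the article or from Databricks: the `LineItem` and `Discrepancy` types, the `find_discrepancies` function, and the sample figures are all hypothetical. The function reports each line whose stated total disagrees with quantity times unit price and leaves the stated figure untouched, so that correction stays a downstream decision.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class LineItem:
    """One invoice line (hypothetical structure, for illustration only)."""
    description: str
    quantity: int
    unit_price: Decimal
    stated_total: Decimal  # the figure as printed on the invoice


@dataclass(frozen=True)
class Discrepancy:
    """A reported mismatch; carries both figures, corrects neither."""
    description: str
    stated_total: Decimal
    expected_total: Decimal


def find_discrepancies(items):
    """Return lines whose stated total differs from quantity * unit price.

    The input items are never modified. The stated figure is passed along
    unchanged so a downstream reviewer can decide how to correct it.
    """
    found = []
    for item in items:
        expected = item.quantity * item.unit_price
        if expected != item.stated_total:
            found.append(
                Discrepancy(item.description, item.stated_total, expected)
            )
    return found


# Made-up sample data: the second line carries an erroneous total.
invoice = [
    LineItem("Paper", 10, Decimal("4.50"), Decimal("45.00")),
    LineItem("Toner", 2, Decimal("30.00"), Decimal("66.00")),
]
issues = find_discrepancies(invoice)
```

Here `issues` holds a single entry for the "Toner" line, while the invoice itself still shows the original 66.00. The frozen dataclasses are a deliberate choice in this sketch: they make silent in-place "fixing" impossible.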

“While advanced models such as Anthropic’s Claude were powerful at coding, they could lag in tasks like data engineering compared with models with significantly more specialised training and data in this area, according to Meyer.”

SINGLE SOURCE

The claim, attributed to David Meyer, is that Claude can lag behind more specialized models in data engineering. Web search results discuss performance issues with Claude and AI model tuning, but none of the provided evidence directly corroborates this comparison or its attribution to Meyer.

wikipedia NEUTRAL — Intelligent design (ID) is a pseudoscientific argument for the existence of God, presented by its proponents as "an evidence-based scientific theory about life's origins". The leading proponents of I…
https://en.wikipedia.org/wiki/Intelligent_design

wikipedia NEUTRAL — David Hume (; born David Home; 7 May 1711 – 25 August 1776) was a Scottish philosopher, historian, economist and essayist who is known for his highly influential system of empiricism, philosophical sc…
https://en.wikipedia.org/wiki/David_Hume

web search NEUTRAL — Anthropic has actively been tuning these settings across different segments, which could plausibly affect user perceptions even if the core model weights are unchanged.
https://venturebeat.com/technology/is-anthropic-nerfing-clau…

+ 2 more evidence sources

“Data engineering involves transforming datasets at scale and performing cleaning tasks, such as handling null values and zeros.”

CORROBORATED

Multiple web search results describe data engineering tasks, including transforming datasets at scale and performing cleaning tasks such as handling null values and general data preparation.

web search NEUTRAL — It includes tasks like handling null values, fixing inconsistent data, formatting columns, and preparing raw data for analysis.
https://gist.github.com/Deepanshu-Bhardwaj7877/5c24c7c0582c0…

web search NEUTRAL — 4. Perform EDA(Exploratory Data Analysis) & Power BI visualization. 5. Automate data loading via Snowpipe (AWS S3 & Snowflake integration). By demonstrating these essential data engineering skills and…
https://medium.com/@mjkjagadishkumarofficial/covid-layoffs-d…

web search NEUTRAL — Reducing transformation layers: Cleaning data directly within the database and creating reusable cleaning processes. Filtering and conditioning: Filtering necessary data and transforming data based on…
https://readmedium.com/data-transforming-and-cleansing-with-…
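
The cleaning tasks named in this claim, handling null values and zeros, can be sketched in a few lines of plain Python. This is an illustrative assumption rather than anything described in the article: the `split_by_missing` and `mean_of` helpers, the `amount` field, and the sample rows are hypothetical. The point of the sketch is that a null (a missing value) and a zero (a real measurement) should be treated differently.

```python
def split_by_missing(rows, field="amount"):
    """Separate rows with a usable value from rows where it is missing.

    A value counts as missing when the key is absent or its value is None.
    A zero is a real value and stays in the usable set.
    """
    usable, missing = [], []
    for row in rows:
        if row.get(field) is None:
            missing.append(row)
        else:
            usable.append(row)
    return usable, missing


def mean_of(rows, field="amount"):
    """Average the field over usable rows; return None if there are none."""
    usable, _ = split_by_missing(rows, field)
    if not usable:
        return None
    return sum(row[field] for row in usable) / len(usable)


# Made-up sample data: one explicit null, one zero, one absent key.
rows = [
    {"id": 1, "amount": 12.0},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 0.0},
    {"id": 4},
]
usable, missing = split_by_missing(rows)
average = mean_of(rows)
```

Rows 1 and 3 end up in `usable` and rows 2 and 4 in `missing`, so the average is computed over 12.0 and 0.0 only. Testing with `is None` rather than truthiness is what keeps the zero from being discarded along with the nulls.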

“To solve these specific complexities more efficiently, Meyer pointed to the use of small open-source models refined with reinforcement learning.”

CORROBORATED

Web search results confirm that open-source models can be adapted using techniques like reinforcement learning (RL) for domain-specific tasks, supporting the suggestion that such refinement is possible.

web search NEUTRAL — Group Relative Policy Optimization, or GRPO, is a (rather new) Reinforcement Learning (RL) technique that researchers are using to fine-tune Large Language Models (LLMs) on logical and analytical task…
https://towardsdatascience.com/how-to-finetune-small-languag…

web search NEUTRAL — The domain- and task-specific adaptation processes, such as continued pretraining, SFT, and reinforcement learning, are uniquely feasible with open-source models.
https://arxiv.org/html/2405.00715v4

web search NEUTRAL — Deep reinforcement learning combines reinforcement learning with deep learning using a neural network to represent the value function, policy, or model considered in a classical RL setting [45].
https://www.sciencedirect.com/science/article/pii/S002199912…

“This allowed for a specific purpose at a level of training cost ‘orders of magnitude lower’ than Sota models, according to Meyer.”

CORROBORATED

Multiple web search results cite research (e.g., USC's Tina) demonstrating that lightweight models using techniques like LoRA and reinforcement learning can achieve strong performance at a significantly lower computational cost compared to larger SOTA models.

wikipedia NEUTRAL — Brood parasitism is a subclass of parasitism and phenomenon and behavioural pattern of animals that rely on others to raise their young. The strategy appears among birds, insects and fish. The brood …
https://en.wikipedia.org/wiki/Brood_parasitism

wikipedia NEUTRAL — David Saul Marshall (né Mashal; 12 March 1908 – 12 December 1995) was a Singaporean lawyer, politician, and diplomat who served as the first chief minister of Singapore from April 1955 to June 1956. H…
https://en.wikipedia.org/wiki/David_Marshall_(Singaporean_po…

wikipedia NEUTRAL — A language model benchmark is a standardized test designed to evaluate the performance of language model on various natural language processing tasks. These tests are intended for comparing different …
https://en.wikipedia.org/wiki/Language_model_benchmark

+ 3 more evidence sources

Disclaimer: This analysis is generated by AI and should be used as a starting point for critical thinking, not as definitive truth. Claims are verified against publicly available sources. Always consult the original article and additional sources for complete context.
