AI not yet good enough to grade university essays, rewarding 'style over substance'
Researchers from the University of Cambridge and other institutions tested generative AI models on undergraduate psychology essays to evaluate their grading accuracy. The study found that AI often fails to match human grading for the highest- and lowest-performing students, and tends to reward linguistic style over academic substance.
Read the original article: https://phys.org/news/2026-05-ai-good-grade-university-essays.html
Analysis
Propaganda Score: 10% (confidence: 95%)
Low risk. This article shows minimal use of propaganda techniques.
Detected Techniques
Loaded Language (70% confidence)
Using words with strong emotional connotations to influence an audience.
Fact-Check Results
11 claims extracted and verified against multiple sources including cross-references, web search, and Wikipedia.
Single Source: 8
Insufficient Evidence: 2
Pending: 1
“A University of Cambridge-led team of psychologists and AI experts tested three "frontier" systems including the latest versions (as of April 2026) of Claude and ChatGPT on over 750 student essays from three UK universities submitted as part of a psychology degree.”
SINGLE SOURCE
The provided web search results only define what Claude and ChatGPT are; they do not mention a University of Cambridge study involving 750 student essays.
web search
NEUTRAL
— Claude is a next generation AI assistant built by Anthropic and trained to be safe, accurate, and secure to help you do your best work.
https://claude.com/login
web search
NEUTRAL
— Chat with Claude AI by Anthropic for free. Thoughtful reasoning and analysis. No registration required.
https://chatgpt.org/claude/chat
web search
NEUTRAL
— ChatGPT A conversational AI system that listens, learns, and challenges.
https://chatgpt.com/
“it did manage to match the broad grading bands—a first, 2:1, 2:2 and so on—given out by human examiners between 35–65% of the time.”
SINGLE SOURCE
The evidence describes general UK grading systems but does not provide any data regarding AI's accuracy in matching these bands in a specific study.
web search
NEUTRAL
— Masters degree grades student So you’ve finished your bachelors and you're thinking of studying a masters program. You may find during this process that the UK masters grading system is slightly diffe…
https://www.postgrad.com/advice/masters_programs/masters_deg…
web search
NEUTRAL
— Masters grades in the UK are usually classified as Distinction, Merit or Pass with specific percentage thresholds for each category. Assessment typically includes coursework, exams, and a dissertation…
https://www.findamasters.com/guides/masters-degree-grades
web search
NEUTRAL
— University of St Andrews Ranking UK 2021 / 2022 - Complete.
https://www.thecompleteuniversityguide.co.uk/universities/un…
“all the AI systems were "oversensitive to linguistic features": giving out higher marks based on essay length, vocabulary range, and sentence complexity, regardless of the academic quality of the essay.”
SINGLE SOURCE
The evidence provides general definitions of AI and AGI but contains no information about AI being oversensitive to linguistic features in grading essays.
web search
NEUTRAL
— Artificial intelligence (AI) is the capability of computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and dec…
https://en.m.wikipedia.org/wiki/Artificial_intelligence
web search
NEUTRAL
— We believe our research will eventually lead to artificial general intelligence, a system that can solve human-level problems. Building safe and beneficial AGI is our mission.
https://openai.com/
web search
NEUTRAL
— Meet Gemini, Google’s AI assistant. Get help with writing, planning, brainstorming, and more. Experience the power of generative AI.
https://gemini.google.com/
“The report is titled "AI in University Assessment: Evaluating the Opportunities and Risks of Automated Marking."”
SINGLE SOURCE
The search results mention various AI reports (Stanford AI Index) and general research platforms, but none confirm the existence or title of the specific report mentioned.
web search
NEUTRAL
— The 2021 AI Index Report. This year we significantly expanded the amount of data available in the report, worked with a broader set of external organizations to calibrate our data, and deepened our co…
https://hai.stanford.edu/ai-index/2025-ai-index-report
web search
NEUTRAL
— In this issue: Ethiopia launches a massive AI university, a $23M coalition trains 400,000 teachers to reclaim their time, and new tools target the "marking backlog." We also examine critical evidence …
https://www.linkedin.com/pulse/workload-relief-wave-national…
web search
NEUTRAL
— Discover research. Access over 160 million publication pages and stay up to date with what's happening in your field. Connect with your scientific community. Share your research, collaborate with your…
https://www.researchgate.net/
“AI was also asked to provide student feedback, and it churned out reflections between three to eight times longer than those provided by the original assessors.”
SINGLE SOURCE
The evidence lists AI tools like Copilot and general Wikipedia definitions of AI, but does not mention a study comparing the length of AI feedback versus human feedback.
web search
NEUTRAL
— Artificial intelligence (AI) is the capability of computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and dec…
https://en.m.wikipedia.org/wiki/Artificial_intelligence
web search
NEUTRAL
— We believe our research will eventually lead to artificial general intelligence, a system that can solve human-level problems. Building safe and beneficial AGI is our mission.
https://openai.com/
web search
NEUTRAL
— Microsoft Copilot is your companion to inform, entertain and inspire. Get advice, feedback and straightforward answers. Try Copilot now.
https://copilot.microsoft.com/
“when AI responses were kept to a word count comparable to those from humans, focus groups of staff and students found it difficult to distinguish between human and AI feedback.”
SINGLE SOURCE
The evidence discusses 'AI Humanizer' tools designed to make text sound human, but does not mention a specific focus group study regarding the indistinguishability of AI feedback when word counts are matched.
web search
NEUTRAL
— AI Humanizer helps you humanize AI text online for free. Turn ChatGPT, Claude, and Gemini content into natural, clear, human-like writing—no sign-up required.
https://notegpt.io/ai-humanizer
web search
NEUTRAL
— When to Humanize AI Text? You are a student? Finish your homework in minutes. Teachers will not find out if a LLM did the work for you.
https://ai-text-humanizer.com/
web search
NEUTRAL
— Humanize AI and make AI writing sound natural and human. Use our free AI Humanizer to remove robotic tone from ChatGPT and other AI-generated text.
https://www.grammarly.com/ai-humanizer
“The study used 761 undergraduate essays in psychology submitted and marked between 2022 and 2025 from a total of 125 students from the universities of Cambridge, Manchester Metropolitan and Nottingham.”
SINGLE SOURCE
The evidence provides general information about the University of Manchester and psychology degrees, but does not corroborate the specific dataset of 761 essays from the three named universities.
web search
NEUTRAL
— The University of Manchester. Student life. Your city guide to Manchester. Discover what it’s like to live and study in Manchester, from getting around to things to do and days out.
https://www.manchester.ac.uk/
web search
NEUTRAL
— The online MSc Psychology and MSc Psychology of Mental Health and Wellbeing programmes have been fully accredited by the British Psychological Society (BPS), which confers eligibility for the Graduate…
https://online.wlv.ac.uk/online-psychology-degrees-at-the-un…
web search
NEUTRAL
— The study shows a psychological impact of the Covid-19 emergency on college students. Stress significantly decreases learning and negatively affects psychological well-being of students. Resilience sk…
https://pubmed.ncbi.nlm.nih.gov/33602027/
“Researchers tested AI systems with the same essays at different times, and found AI gave the same or similar marks each time.”
SINGLE SOURCE
The evidence discusses why AI prompts can give different answers, but does not provide evidence for a study showing AI gave consistent marks for the same essays over time.
web search
NEUTRAL
— Sam Altman is the CEO of OpenAI, the company behind GPT-4, ChatGPT, DALL-E, Codex, and many other state-of-the-art AI technologies. Please support this podca...
https://www.youtube.com/watch?v=L_Guz73e6fw
web search
NEUTRAL
— “If I use the exact same prompt, why does the AI give different answers on different days, environments, or even time zones?” In reality, enterprise AI systems behave more like distributed runtime…
https://blog.gopenai.com/why-does-the-same-prompt-give-diffe…
web search
NEUTRAL
— Human-realistic AI systems could be used to impersonate people for fraudulent or deceptive purposes, especially when combined with voice cloning techniques. Humans are liable for their actions. As AI …
https://www.aisi.gov.uk/blog/should-ai-systems-behave-like-p…
“The AI managed to match the right UK degree classification band of the five available (first, 2:1, 2:2, third, fail) some 63% of the time for Cambridge essays, while for Nottingham it was 53% and for Manchester Metropolitan it was 35%.”
INSUFFICIENT EVIDENCE
No evidence was found in the search results to support these specific percentage figures for Cambridge, Nottingham, and Manchester Metropolitan.
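For readers unsure what "matching the right band" means in the claim above, here is a minimal Python sketch of how a band-agreement rate could be computed. The thresholds (70, 60, 50, 40) are the conventional UK undergraduate cut-offs and are an assumption here, since the article does not state the exact boundaries the study used. The function names and the sample marks are hypothetical and are not data from the study.

```python
# Conventional UK undergraduate classification thresholds.
# ASSUMPTION: the study's exact cut-offs are not given in the article.
BANDS = [(70, "first"), (60, "2:1"), (50, "2:2"), (40, "third"), (0, "fail")]


def band(mark: float) -> str:
    """Map a numeric mark (0-100) to a degree classification band."""
    for floor, name in BANDS:
        if mark >= floor:
            return name
    return "fail"  # negative marks are treated as a fail


def band_agreement(human_marks, ai_marks) -> float:
    """Share of essays where the AI mark falls in the same band as the human mark."""
    if len(human_marks) != len(ai_marks) or not human_marks:
        raise ValueError("mark lists must be non-empty and of equal length")
    matches = sum(band(h) == band(a) for h, a in zip(human_marks, ai_marks))
    return matches / len(human_marks)


# Made-up marks for illustration only; NOT data from the study.
human = [75, 62, 58, 50, 38]
ai = [71, 64, 61, 55, 45]
print(f"{band_agreement(human, ai):.0%}")  # prints "60%"
```

In this toy example the AI mark of 61 lands in the 2:1 band while the human mark of 58 is a 2:2, so a three-point difference near a boundary counts as a mismatch. Band agreement can therefore differ between institutions even when the average gap in marks is similar.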
“An essay marked 75—a solid first—by a human is, on average, scored several points lower by every AI system. While an essay marked 50—a low 2:2—is scored several points higher.”
INSUFFICIENT EVIDENCE
No evidence was found in the search results to support the claim regarding the scoring patterns of high-mark vs low-mark essays by AI.
“The range on the marking scale where AI and humans most frequently align across institutions lies in the upper-50s to low-60s, so around a low 2:1, near the center of the grade distribution.”
PENDING
Disclaimer: This analysis is generated by AI and should be used as a starting point for critical thinking, not as definitive truth. Claims are verified against publicly available sources. Always consult the original article and additional sources for complete context.