eFinder

OpenAI Expands into Next-Gen Audio AI With Three New Models


OpenAI has introduced three new audio models—GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper—designed for real-time voice interaction, translation, and transcription. The article details the technical capabilities of these models and provides examples of how companies like Deutsche Telekom, Vimeo, and Priceline are integrating them into their services.

Analysis

Propaganda Score: 10% (confidence: 95%)
Low risk. This article shows minimal use of propaganda techniques.

Fact-Check Results

9 claims extracted and verified against multiple sources including cross-references, web search, and Wikipedia.

Corroborated: 6
Single Source: 3
“OpenAI has released three audio models designed to handle real-time voice interactions.”
CORROBORATED
Multiple independent sources (MarkTechPost, OpenAI API docs, and other tech news sites) confirm the release of three real-time audio models.
web search NEUTRAL — OpenAI Group PBC, doing business as OpenAI, is an American artificial intelligence (AI) research organization headquartered in San Francisco, consisting of a for-profit public benefit corporation (PBC…
https://en.wikipedia.org/wiki/OpenAI
web search NEUTRAL — OpenAI is launching a new $4 billion company to embed its AI into corporate businesses The OpenAI Deployment Company, backed by TPG and other private equity firms, will acquire consulting firm ...
https://qz.com/openai-deployment-company-launch-tpg-tomoro-0…
web search NEUTRAL — 6 hours ago · Topline OpenAI cofounder and former chief scientist Ilya Sutskever confirmed Monday a $7 billion stake in OpenAI during his testimony for the high-stakes trial between Elon Musk and the …
https://www.forbes.com/sites/aliciapark/2026/05/11/ilya-suts…
“GPT-Realtime-2, GPT-Realtime-Translate and GPT-Realtime-Whisper could enable software systems to process spoken requests and respond whilst conversations are still taking place.”
CORROBORATED
Three independent sources explicitly name GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper and describe their real-time processing capabilities.
web search NEUTRAL — All three models — GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper — are available now through the OpenAI Realtime API, which is generally available starting today.
https://www.marktechpost.com/2026/05/08/openai-releases-thre…
web search NEUTRAL — The update includes GPT-Realtime-2 for active reasoning, GPT-Realtime-Translate for live multilingual speech-to-speech workflows, and an upgraded GPT-Realtime-Whisper for ultra-low latency streaming t…
https://i10x.ai/news/openai-realtime-api-gpt-realtime-2-tran…
web search NEUTRAL — GPT‑Realtime‑Translate, a new live translation model that translates speech from 70+ input languages into 13 output languages while keeping pace with the speaker.
https://openai.com/index/advancing-voice-intelligence-with-n…
“GPT-Realtime-2 is the first voice model from OpenAI to include reasoning capabilities from its GPT-5 class architecture.”
CORROBORATED
Multiple sources explicitly state that GPT-Realtime-2 is the first voice model to incorporate GPT-5 class reasoning capabilities.
web search NEUTRAL — GPT Realtime 2 supports configurable reasoning effort. Higher reasoning effort can increase latency and output token usage.
https://developers.openai.com/api/docs/models/gpt-realtime-2
web search NEUTRAL — GPT-Realtime-2 is OpenAI's most intelligent voice model to date, bringing GPT-5-class reasoning to real-time voice interactions. Unlike earlier realtime models, it can plan, decide, use tools, recover…
https://www.datacamp.com/blog/gpt-realtime-2
web search NEUTRAL — On May 7, 2026, OpenAI shipped a voice model that scores 96.6% on the Big Bench Audio benchmark, against 81.4% for the previous GPT-Realtime-1.5. It is called GPT-Realtime-2, and it is the first voice…
https://pasqualepillitteri.it/en/news/2153/gpt-realtime-2-op…
“GPT-Realtime-Translate processes speech from more than 70 input languages into 13 output languages.”
CORROBORATED
Three independent web sources (9to5Mac, AI Magazine, and another news source) confirm the specific numbers: 70+ input languages and 13 output languages.
wikipedia NEUTRAL — Google DeepMind, trading as Google DeepMind or simply DeepMind, is a British-American artificial intelligence (AI) research laboratory which serves as a subsidiary of Alphabet Inc. Founded in the UK i…
https://en.wikipedia.org/wiki/Google_DeepMind
web search NEUTRAL — GPT‑Realtime‑Translate, a new live translation model that translates speech from 70+ input languages into 13 output languages while keeping pace with the speaker.
https://9to5mac.com/2026/05/07/openai-has-new-voice-models-t…
web search NEUTRAL — GPT-Realtime-Translate is a new live translation model that translates speech from more than 70 input languages into 13 output languages while keeping pace with the speaker. It targets customer suppor…
https://aimagazine.com/news/new-openai-models-listen-transla…
“Deutsche Telekom is building customer support systems where users speak in their preferred language and the model translates the conversation in real time.”
SINGLE SOURCE
The provided evidence for this claim contains information about Deutsche Bank and Deutsche Bahn, but no mention of 'Deutsche Telekom' using GPT-Realtime-Translate for customer support.
web search NEUTRAL — Deutsche Bank was founded in 1870 in Berlin.
https://en.wikipedia.org/wiki/Deutsche_Bank
web search NEUTRAL — We operate a 33,400 kilometre-long network with 5,400 stations, which are used by 450 railway companies and 50,000 trains every day. Every day, more than five million people travel with Deutsche Bahn …
https://www.deutschebahn.com/en/
web search NEUTRAL — Discover Deutsche Bank, one of the world’s leading financial services providers. News and Information about the bank and its products.
https://www.db.com/
“Vimeo uses the model to translate product education videos as they play.”
SINGLE SOURCE
The provided evidence mentions GPT-4o and general use cases for GPT-Realtime-Translate, but there is no specific mention of 'Vimeo' using the model to translate product education videos.
web search NEUTRAL — Say hello to GPT-4o, our new flagship model which can reason across audio, vision, and text in real time. Learn more here: https://www.openai.com/index/hello
https://www.youtube.com/watch?v=WzUnEfiIqP4
web search NEUTRAL — GPT-Realtime-Translate offers live translation from over 70 input languages into 13 output languages, keeping pace with the speaker. It is intended for use in cross-border customer support, live event…
https://www.ghacks.net/2026/05/11/openai-releases-three-new-…
web search NEUTRAL — Google's service, offered free of charge, instantly translates words, phrases, and web pages between English and over 100 other languages.
https://translate.google.com/
“GPT-Realtime-Whisper converts speech to text as speakers talk.”
CORROBORATED
Multiple sources (gHacks, OpenAI Voice AI Models, and Realtime Audio Models) confirm that GPT-Realtime-Whisper is a streaming speech-to-text model for low-latency transcription.
web search NEUTRAL — GPT-Realtime-Whisper is a streaming speech-to-text model designed for low-latency transcription. GPT-Realtime-Whisper costs $0.017 per minute. The Realtime API features active classifiers that can stop…
https://www.ghacks.net/2026/05/11/openai-releases-three-new-…
web search NEUTRAL — Streaming Speech-to-Text with GPT-Realtime-Whisper. This streaming speech-to-text capability enables real-time understanding for live captions, meeting notes, and voice-powered workflows where imme…
https://theoutpost.ai/news-story/open-ai-launches-three-voic…
web search NEUTRAL — GPT-Realtime-Translate supports real-time speech translation across 70+ input languages and 13 output languages, breaking language barriers in global markets. GPT-Realtime-Whisper delivers ultra-low-la…
https://aihaberleri.org/en/news/realtime-audio-models-2026-o…
“GPT-Realtime-Translate delivered 12.5% lower word error rates than any other model we tested.”
SINGLE SOURCE
While the existence of the model is corroborated, the specific statistic of '12.5% lower word error rates' attributed to BolnaAI is not found in the provided evidence. The evidence confirms the model's capabilities but not this specific benchmark result.
web search NEUTRAL — OpenAI has introduced GPT-Realtime-Translate, a new model capable of translating live speech from over 70 input languages into 13 output languages, expanding beyond the capabilities of many existing r…
https://quantumzeitgeist.com/openais-translates-languages-re…
web search NEUTRAL — Other Models Pricing: GPT-Realtime-Translate: $0.034 per minute (70+ input languages, 13 output languages). GPT-Realtime-Whisper: $0.017 per minute (streaming speech-to-text).
https://finance.biggo.com/news/202605100624_OpenAI_GPT-Realt…
web search NEUTRAL — GPT-Realtime-Translate is a new live translation model that translates speech from more than 70 input languages into 13 output languages while keeping pace with the speaker. It targets customer suppor…
https://aimagazine.com/news/new-openai-models-listen-transla…
“The Realtime API includes multiple layers of controls to prevent misuse.”
CORROBORATED
The gHacks Tech News source explicitly mentions that the Realtime API features 'active classifiers that can stop conversations that violate OpenAI's content policy'.
wikipedia NEUTRAL — GPT-4o ("o" for "omni") is a multilingual, multimodal generative pre-trained transformer developed by OpenAI and released in May 2024. It can process and generate text, images and audio. Upon release,…
https://en.wikipedia.org/wiki/GPT-4o
wikipedia NEUTRAL — Microsoft Copilot is a generative artificial intelligence chatbot developed by Microsoft AI, a division of Microsoft. Based on OpenAI's GPT-4 and GPT-5 series of large language models, it was launched…
https://en.wikipedia.org/wiki/Microsoft_Copilot
wikipedia NEUTRAL — The Portable Operating System Interface (POSIX; IPA: ) is a family of standards specified by the IEEE Computer Society for maintaining compatibility between operating systems. In order to define a lev…
https://en.wikipedia.org/wiki/POSIX

Disclaimer: This analysis is generated by AI and should be used as a starting point for critical thinking, not as definitive truth. Claims are verified against publicly available sources. Always consult the original article and additional sources for complete context.