TechBooky

OpenAI Launches New Audio Models for Agentic Workflows

by Akinola Ajibola
March 22, 2025
in Artificial Intelligence

“With releases like Operator, Deep Research, Computer-Using Agents, and the Responses API with built-in tools, we’ve invested in advancing the intelligence, capabilities, and usefulness of text-based agents (systems that independently accomplish tasks on behalf of users) over the past few months,” OpenAI said in its announcement. But for agents to be truly useful, the company added, people must be able to have deeper, more intuitive interactions with them beyond just text, using natural spoken language to communicate effectively.

OpenAI released new audio models with improved accuracy and reliability on Thursday through its application programming interface (API). The San Francisco-based AI company launched three new artificial intelligence (AI) models covering speech-to-text transcription and text-to-speech (TTS). According to the company, these models will let developers build apps with agentic workflows, and businesses can use the API to automate tasks such as customer service. The new models are built on the company’s GPT-4o and GPT-4o mini AI models.

OpenAI says the new speech-to-text and text-to-speech models will enable developers to build more powerful, customizable, and intelligent voice agents. According to the company, its latest speech-to-text models set a new state of the art in accuracy and reliability, outperforming existing solutions, particularly in difficult conditions involving accents, noisy environments, and varying speech speeds. OpenAI says these improvements make the models especially well suited to use cases such as customer call centres and meeting note transcription.

In a blog post, the AI firm detailed the new API-specific AI models. The company noted that over the past few months it has released agent-focused products such as Operator, Deep Research, Computer-Using Agents, and the Responses API with built-in tools. It added, however, that agents will not reach their full potential until people can interact with them intuitively beyond text.

Three new audio models are available. GPT-4o-transcribe and GPT-4o-mini-transcribe are speech-to-text models, while GPT-4o-mini-tts is, as the name implies, a TTS model. According to OpenAI, these models outperform the company’s existing Whisper models, which were introduced in 2022. Unlike Whisper, however, the new models are not open-source.

The AI company claimed that GPT-4o-transcribe achieves a lower “word error rate” (WER) on the Few-shot Learning Evaluation of Universal Representations of Speech (FLEURS) benchmark, which evaluates AI models on multilingual speech across more than 100 languages. According to OpenAI, the improvements come from targeted training methods, including reinforcement learning (RL) and extensive mid-training on high-quality audio datasets.
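For readers unfamiliar with the metric, WER is conventionally the number of word-level substitutions, deletions, and insertions needed to turn a transcript into the reference, divided by the number of words in the reference. The Python sketch below illustrates that standard definition only. The function name and the simple lowercase-and-split normalization are assumptions made for illustration, and this is not OpenAI's or FLEURS's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("fox" -> "box") and one deletion ("jumps"):
    # 2 errors over 5 reference words.
    print(word_error_rate("the quick brown fox jumps", "the quick brown box"))
```

A lower value is better, and a WER of 0 means the transcript matches the reference word for word.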

According to OpenAI, these speech-to-text models can transcribe audio reliably even in difficult conditions, including noisy environments, strong accents, and varying speech speeds.

The GPT-4o-mini-tts model also brings significant improvements, although it notably offers only artificial, preset voices. According to the AI company, the model can speak with customizable inflection, intonation, and emotional expressiveness, letting developers build applications for tasks ranging from customer service to creative storytelling.

According to OpenAI’s API pricing page, the GPT-4o-based audio model costs $40 per million input tokens and $80 per million output tokens. The GPT-4o mini-based audio models cost $10 per million input tokens and $20 per million output tokens.
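To make the per-million-token pricing concrete, the short Python sketch below estimates the cost of a workload from the rates quoted above. The class, constant, and function names are hypothetical, the token counts are made-up examples, and the rates are simply the figures reported in this article, which may not match OpenAI's current pricing page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRates:
    """Price in US dollars per one million tokens."""
    input_per_million: float
    output_per_million: float


# Rates as quoted in this article (assumption: they may have changed since).
GPT_4O_AUDIO = TokenRates(input_per_million=40.0, output_per_million=80.0)
GPT_4O_MINI_AUDIO = TokenRates(input_per_million=10.0, output_per_million=20.0)


def estimate_cost(rates: TokenRates, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in US dollars for the given token counts."""
    total = (input_tokens * rates.input_per_million
             + output_tokens * rates.output_per_million)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 250,000 input tokens and 100,000 output tokens.
    print(estimate_cost(GPT_4O_AUDIO, 250_000, 100_000))       # 18.0
    print(estimate_cost(GPT_4O_MINI_AUDIO, 250_000, 100_000))  # 4.5
```

At these quoted rates, the same hypothetical workload costs a quarter as much on the mini-based models.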

All of the audio models are now available to developers through the API. To help users build voice agents, OpenAI is also releasing an integration with its Agents software development kit (SDK).
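As a rough illustration of what calling these models might look like, here is a minimal Python sketch using the official `openai` package. It assumes the package is installed, that an `OPENAI_API_KEY` environment variable is set, and that the lowercase model identifiers `gpt-4o-transcribe` and `gpt-4o-mini-tts` and the `alloy` voice are available to your account. The helper function names and file paths are hypothetical, and OpenAI's API reference remains the authority on parameters.

```python
def build_speech_request(text: str, instructions: str,
                         model: str = "gpt-4o-mini-tts",
                         voice: str = "alloy") -> dict:
    """Assemble keyword arguments for a text-to-speech request.

    `instructions` steers tone and delivery, e.g. "Speak in a calm,
    reassuring tone." Model and voice names are assumptions.
    """
    return {"model": model, "voice": voice,
            "input": text, "instructions": instructions}


def transcribe(audio_path: str, model: str = "gpt-4o-transcribe") -> str:
    """Send an audio file for transcription and return the text."""
    from openai import OpenAI  # imported lazily; requires `pip install openai`

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    return result.text


def synthesize(text: str, instructions: str, out_path: str) -> None:
    """Generate speech for `text` and stream it to `out_path`."""
    from openai import OpenAI

    client = OpenAI()
    params = build_speech_request(text, instructions)
    with client.audio.speech.with_streaming_response.create(**params) as response:
        response.stream_to_file(out_path)


if __name__ == "__main__":
    # Builds the request locally; no network call is made here.
    print(build_speech_request("Your order has shipped.",
                               "Speak like a friendly support agent."))
```

Calling `transcribe("meeting.wav")` or `synthesize(...)` performs real API requests and incurs usage charges, so the example above only prints the request parameters.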

OpenAI shares more about the technical innovations behind the models

  • Utilizing Real Audio Datasets for Pretraining

Building on the GPT‑4o and GPT‑4o-mini architectures, our new audio models are extensively pretrained on specialized audio-centric datasets in order to maximize model performance. This focused approach gives the models a deeper understanding of speech nuances and enables strong performance across a variety of audio-related tasks.

  • Sophisticated Techniques for Distillation

By improving our distillation methods, we are able to transfer knowledge from our largest audio models to smaller, more efficient models. Using sophisticated self-play techniques, our distillation datasets capture realistic conversational dynamics and replicate authentic user-assistant interactions. This enables our smaller models to deliver outstanding responsiveness and conversational quality.
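OpenAI does not publish the details of its distillation pipeline, and the description above is about distillation datasets built through self-play. For general intuition about how a small model can learn from a large one, the Python sketch below shows the classic soft-label formulation of knowledge distillation instead, in which the student is penalized for diverging from the teacher's temperature-softened output distribution. The function names, logits, and temperature are illustrative assumptions, not OpenAI's method.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Convert logits to probabilities, softened by `temperature`."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: list[float],
                      student_logits: list[float],
                      temperature: float = 2.0) -> float:
    """KL divergence from the teacher's softened distribution to the student's."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0)


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]   # hypothetical teacher logits over 3 tokens
    close = [3.5, 1.2, 0.4]     # student that roughly agrees
    far = [0.5, 4.0, 1.0]       # student that disagrees
    print(distillation_loss(teacher, close))
    print(distillation_loss(teacher, far))
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, which is the signal a training loop would minimize.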

  • Reinforcement Learning Paradigm

For our speech-to-text models, we’ve adopted a reinforcement learning (RL)-heavy paradigm that pushes transcription accuracy to state-of-the-art levels. This approach significantly improves precision and reduces hallucination, making our speech-to-text solutions highly competitive in difficult speech recognition scenarios.

These advancements mark a step forward in the field of audio modelling, fusing cutting-edge techniques with useful improvements to improve voice application performance.
