
Just days into March, GPT-5.3 was released with the intention of improving ChatGPT’s accuracy and flow. Now, however, GPT-5.4 is here. OpenAI launched GPT-5.4 earlier today on a new foundation model it dubbed “our most capable and efficient frontier model for professional work”. GPT-5.4 is offered in two versions: GPT-5.4 Thinking, a reasoning-focused model, and GPT-5.4 Pro, optimised for high performance. The update also focuses on enterprise-grade reliability and high-performance execution.
The model’s API version supports a context window of up to one million tokens, by far OpenAI’s largest.
OpenAI also highlighted improved token efficiency: according to the company, GPT-5.4 handled the same challenges with far fewer tokens than its predecessor.
The upgraded model posted significantly better test results, including record scores on the computer-use benchmarks OSWorld-Verified and WebArena Verified. It also achieved a record 83% on GDPval, OpenAI’s benchmark for knowledge-work tasks.
GPT-5.4 claimed the top spot on Mercor’s APEX-Agents benchmark for law and finance skills, according to CEO Brendan Foody.
Foody noted that GPT-5.4 “excels” at producing long-form work like slide decks, financial models, and legal analysis, adding that it outperforms rivals while running faster and cheaper.
The company is still working to reduce factual inaccuracies and hallucinations with GPT-5.4. According to OpenAI, compared with GPT-5.2 the new model was 18% less likely to contain errors overall and 33% less likely to make mistakes in individual claims.
As part of the launch, OpenAI introduced a new mechanism called Tool Search and revised the way the GPT-5.4 API version handles tool calling. Previously, system prompts listed definitions for every available tool when invoking the model, a procedure that can consume a large number of tokens as the tool count grows. The new method instead lets the model look up tool definitions as needed, making requests faster and cheaper in systems with many available tools.
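The difference between the two approaches can be sketched in a few lines. This is a minimal, hypothetical illustration of the on-demand lookup pattern, not OpenAI’s actual API: the registry, function names, and tool specs below are all invented for the example.

```python
# Hypothetical sketch of the "Tool Search" idea: instead of inlining every
# tool definition into each prompt, the model queries a registry on demand.
# All names and structures here are illustrative, not OpenAI's real API.

TOOL_REGISTRY = {
    "get_weather": {
        "description": "Fetch the current weather for a city",
        "parameters": {"city": "string"},
    },
    "send_email": {
        "description": "Send an email to a recipient",
        "parameters": {"to": "string", "subject": "string", "body": "string"},
    },
    "create_invoice": {
        "description": "Create an invoice for a customer",
        "parameters": {"customer_id": "string", "amount": "number"},
    },
}


def eager_prompt() -> str:
    """Old approach: every tool definition is inlined into the system
    prompt, so prompt size grows linearly with the number of tools."""
    lines = ["You may call the following tools:"]
    for name, spec in TOOL_REGISTRY.items():
        lines.append(f"- {name}: {spec['description']} (params: {spec['parameters']})")
    return "\n".join(lines)


def search_tools(query: str) -> dict:
    """New approach: the model asks for definitions on demand, and only
    the matching tools are returned and spent as tokens."""
    q = query.lower()
    return {
        name: spec
        for name, spec in TOOL_REGISTRY.items()
        if q in name or q in spec["description"].lower()
    }


# The eager prompt carries all three tools; a search returns only what's needed.
print(len(TOOL_REGISTRY), len(search_tools("weather")))  # prints "3 1"
```

With three tools the savings are trivial, but the point of the design is that the eager prompt grows with every tool added, while a search result stays roughly constant in size.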
OpenAI has also added a new safety evaluation to assess its models’ chain-of-thought, the running commentary the models produce to show their reasoning across multi-step tasks. AI safety researchers have long been concerned that reasoning models can falsify their line of reasoning, and tests indicate that this is possible in some situations.
According to OpenAI’s most recent evaluation, deception is less likely to occur in the Thinking version of GPT-5.4, “suggesting that the model lacks the ability to hide its reasoning and that CoT monitoring remains an effective safety tool.”
The Variants of the Core Model
- GPT-5.4 Pro: Optimised for responsiveness and high throughput. It is intended for high-volume jobs such as data pipelines, coding helpers, and customer operations.
- GPT-5.4 Thinking: Designed for sophisticated thinking and difficult tasks that call for careful consideration, such as strategic research or legal analysis.
Compared to GPT-5.2, GPT-5.4 is said to be 33% less likely to cause individual factual errors and 18% less likely to produce overall response errors.
On token efficiency, GPT-5.4 uses far fewer tokens than earlier models to complete the same tasks, which lowers operating costs.
Tool Search, the new approach that enables the model to look up tool definitions on demand instead of loading them all into each prompt, also reduces latency to an extent.