OpenAI Launches GPT-5.4: A Major Leap Towards AI Agents and Professional Work
AI News

5 min read
3/6/2026
artificial intelligence, openai, gpt-5, ai agents

OpenAI’s Latest Model Aims to Redefine AI-Powered Work

OpenAI has officially launched GPT-5.4, positioning it as its most capable and efficient frontier model for professional knowledge work. Released on March 5, 2026, the model integrates advanced reasoning, state-of-the-art coding from GPT-5.3-Codex, and a groundbreaking new capability: native computer use. This release signals a clear strategic pivot from conversational chatbots toward autonomous, workflow-oriented AI agents.

The model is available in three main forms: GPT-5.4 Thinking in ChatGPT for consumers and teams, GPT-5.4 in the API and Codex for developers, and a high-performance GPT-5.4 Pro variant for complex enterprise tasks. According to OpenAI, the goal is to deliver "complex real work" with greater accuracy and less back-and-forth, fundamentally changing how AI integrates into professional environments.

Benchmark Dominance in Professional and Agentic Tasks

GPT-5.4's performance claims are substantiated by significant improvements across a suite of new benchmarks focused on real-world output. On GDPval, a test of knowledge work across 44 occupations, GPT-5.4 achieved an 83.0% win rate against industry professionals, a substantial jump from GPT-5.2's 70.9%.

Perhaps more tellingly, it now leads the competitive Mercor APEX-Agents benchmark, which measures performance on professional services work like creating slide decks and financial models. Brendan Foody, CEO of Mercor, noted that while previous models performed like "an intern that gets it right a quarter of the time," GPT-5.4 now tops the leaderboard.

Specific internal scores highlight its prowess: an 87.3% mean score on junior investment banking spreadsheet tasks (vs. 68.4% for GPT-5.2) and a 68% human preference rate for its generated presentations due to stronger aesthetics and visual variety.

The Native Computer Use Breakthrough

The most technically significant advancement in GPT-5.4 is its native computer-use capability. This allows the model, particularly via the API, to operate computers by writing code (e.g., using Playwright) or by directly issuing mouse and keyboard commands in response to screenshots.
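The announcement describes this only at a high level, but the pattern it implies is a screenshot-observe, model-decide, keyboard/mouse-act loop. The sketch below is a hypothetical illustration of that loop with the model call stubbed out; the action names, planner function, and desktop callbacks are assumptions for this article, not OpenAI's actual API.

```python
# Minimal observe -> decide -> act loop for a computer-use agent.
# plan_next_action stands in for the model; a real agent would send the
# screenshot (and the goal) to the model and parse its chosen action.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str            # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""

def plan_next_action(screenshot: bytes, goal: str) -> Action:
    """Stub planner: map (screenshot, goal) to a single UI action."""
    if b"search_box" in screenshot:      # toy heuristic in place of the model
        return Action("type", text=goal)
    return Action("done")

def run_agent(goal: str, take_screenshot, do_click, do_type, max_steps=10) -> bool:
    """Drive the desktop through raw mouse/keyboard primitives until done."""
    for _ in range(max_steps):
        action = plan_next_action(take_screenshot(), goal)
        if action.kind == "done":
            return True
        if action.kind == "click":
            do_click(action.x, action.y)
        elif action.kind == "type":
            do_type(action.text)
    return False                         # step budget exhausted
```

In practice the three callbacks could be backed by a browser-automation library such as Playwright (`page.screenshot()`, `page.mouse.click()`, `page.keyboard.type()`), which matches the article's mention of the model either writing Playwright code or issuing raw input events.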

Benchmarks bear out this leap. On OSWorld-Verified, which tests desktop navigation through screenshots and inputs, GPT-5.4 scored a 75.0% success rate, surpassing reported human performance (72.4%) and far exceeding GPT-5.2's 47.3%. On WebArena-Verified for browser use, it reached 67.3% success.

This capability is built upon improved visual perception. GPT-5.4 scores 81.2% on MMMU-Pro (without tools) and shows enhanced document parsing on OmniDocBench. OpenAI is also introducing an `original` image detail level supporting up to 10.24 million pixels, crucial for precise UI interaction and localization.


Engineering Efficiency: Tool Search and Token Savings

Beyond raw capability, OpenAI has engineered GPT-5.4 for efficiency in large-scale, tool-heavy agentic systems. The new Tool Search feature is a key innovation. Instead of loading all tool definitions into every prompt, the model receives a lightweight list and can search for definitions only when needed.

In tests with 36 Model Context Protocol (MCP) servers, this configuration reduced total token usage by 47% while maintaining accuracy. For developers building complex agents, this translates directly to lower costs and faster response times. The model also shows improved tool-calling accuracy and better parallelization of tool calls, cutting the latency incurred each time it yields control to a tool.
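The savings come from a simple trade: ship only tool names upfront and pay for a full definition only when the agent actually needs it. The sketch below models that accounting with character counts as a rough stand-in for tokens; the registry contents and function names are illustrative, not OpenAI's API.

```python
# Toy model of the "tool search" pattern: eager prompts carry every full tool
# definition; lazy prompts carry only names plus definitions fetched on demand.
TOOL_REGISTRY = {
    "get_weather": {"description": "Fetch current weather for a city.",
                    "parameters": {"city": "string"}},
    "query_crm":   {"description": "Look up a customer record by email.",
                    "parameters": {"email": "string"}},
    # ...imagine hundreds more definitions spread across many MCP servers...
}

def eager_prompt_size() -> int:
    """Approximate prompt cost when all definitions ship with every request."""
    return sum(len(str(d)) for d in TOOL_REGISTRY.values())

def lazy_prompt_size(needed: list[str]) -> int:
    """Cost when only names ship upfront, plus definitions actually fetched."""
    name_list = sum(len(n) for n in TOOL_REGISTRY)
    fetched = sum(len(str(TOOL_REGISTRY[n])) for n in needed)
    return name_list + fetched

print(eager_prompt_size(), lazy_prompt_size(["get_weather"]))
```

With a large registry and a task that touches only a few tools, the lazy figure stays far below the eager one, which is the effect the 47% reduction quantifies.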

OpenAI claims GPT-5.4 is its "most token-efficient reasoning model yet," using significantly fewer tokens than GPT-5.2 to solve problems. This efficiency is critical for making multi-step, agentic workflows economically viable.

Steerability, Safety, and the Competitive Landscape

For ChatGPT users, GPT-5.4 Thinking introduces a new level of control. The model will now provide an upfront plan for complex queries, and users can adjust instructions mid-response to steer the output without starting over. This feature is available now on the web and Android, with iOS coming soon.

On safety, OpenAI is treating GPT-5.4 as a "High" cyber capability model under its Preparedness Framework. It includes an expanded safety stack with monitoring and access controls. The company also released an open-source Chain-of-Thought (CoT) controllability evaluation, finding GPT-5.4 has a low ability to obfuscate its reasoning, which aids in safety monitoring.

The launch arrives during intense competition. Anthropic's Claude Opus 4.6 still leads on some coding benchmarks, while Google's Gemini 3.1 Pro offers a larger context window. GPT-5.4's claim to leadership rests on professional knowledge work and computer control—areas central to the emerging "AI agent" market.

Pricing, Availability, and Strategic Implications

GPT-5.4 carries a premium. API input pricing is set at $2.50 per million tokens, compared to $1.75 for GPT-5.2. GPT-5.4 Pro is $30 per million tokens. OpenAI argues that greater token efficiency offsets the higher per-token cost for many tasks.

The model is rolling out now. GPT-5.2 Thinking will remain available for three months before being retired on June 5, 2026. Enterprise users gain access to GPT-5.4 Pro, and Codex gets experimental support for a 1-million-token context window.

This release is more than an incremental update. By integrating coding, computer control, tool orchestration, and professional output generation into a single model, OpenAI is building the core engine for the next phase of AI: persistent, autonomous agents that complete complex workflows. As the benchmarks show, no model is yet flawless, but GPT-5.4 represents a substantial step toward making that agentic future a practical reality.