OpenAI Launches GPT-5.3-Codex-Spark: Ultra-Fast AI Coding Model
On February 12, 2026, OpenAI announced the research preview of GPT-5.3-Codex-Spark, a specialized model designed for instantaneous, interactive coding. The release marks a significant strategic pivot for the company, emphasizing raw speed and low-latency interaction as critical frontiers for AI-assisted development.
Positioned as a "smaller version" of the recently launched GPT-5.3-Codex, Spark is the first tangible milestone in OpenAI's partnership with chipmaker Cerebras, announced in January. Its core promise is to deliver "more than 1000 tokens per second" while maintaining high capability for real-world coding tasks.
The Speed Imperative: 15x Faster Than Predecessors
Where previous models like GPT-5.3-Codex excelled at long-horizon, autonomous tasks, Spark is engineered for the tight feedback loop of real-time collaboration. OpenAI states it is optimized to feel "near-instant," allowing developers to interrupt, redirect, and iterate with the model fluidly.
Independent benchmarks provide context for this leap. According to Ars Technica, Spark's 1000+ tokens/second throughput is roughly 15 times faster than its predecessor. For comparison, OpenAI's own GPT-4o maxes out at around 147 tokens/sec on Nvidia hardware, while Anthropic's premium Claude Opus 4.6 fast mode reaches roughly 2.5 times its standard rate of 68.2 tokens/sec, or about 170 tokens/sec.
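To make these throughput figures concrete, here is an illustrative calculation of how long each model would take to generate a 2,000-token response at the rates cited above. The token count is an arbitrary example, and real-world latency also depends on time-to-first-token and network overhead, which this sketch ignores.

```python
# Illustrative only: wall-clock generation time for a 2,000-token response
# at the throughput figures cited in this article.
throughput_tps = {
    "GPT-5.3-Codex-Spark": 1000,        # "more than 1000 tokens per second"
    "GPT-4o (Nvidia)": 147,             # per Ars Technica
    "Claude Opus 4.6 fast mode": 68.2 * 2.5,  # ~2.5x its standard rate
}

OUTPUT_TOKENS = 2000
for model, tps in throughput_tps.items():
    print(f"{model}: {OUTPUT_TOKENS / tps:.1f} s")
```

At these rates, Spark finishes the response in about 2 seconds, versus roughly 12 to 14 seconds for the comparison models, which is the gap between a conversational pause and a coffee-break wait.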
This speed is not just for show. It fundamentally changes the interaction model. As ZDNET notes, it moves AI coding from a "batch-style" process to a fluid, conversational experience, eliminating frustrating wait times for simple queries.
Powered by Cerebras: A Strategic Hardware Shift
The engine behind this performance is Cerebras's Wafer Scale Engine 3 (WSE-3), a purpose-built AI accelerator. As Ars Technica highlighted, this is a notable shift: Spark is OpenAI's first production model to run on non-Nvidia hardware.
Cerebras's wafer-scale architecture, with its 4 trillion transistors on a single processor the size of a dinner plate, is designed for high-speed inference. OpenAI's head of compute, Sachin Katti, called Cerebras "a great engineering partner" in a statement, emphasizing the addition of "fast inference as a new platform capability."
It's a complementary strategy. OpenAI clarified that GPUs remain foundational for cost-effective, broad-scale tokens, while Cerebras excels at ultra-low-latency workflows. The two can even be combined within single workloads for optimal performance.
Capabilities, Limitations, and the Coding Arms Race
Spark is currently a text-only model with a 128k context window. It is tuned specifically for coding, not as a general-purpose model. Its default style is lightweight: it makes minimal, targeted edits and runs tests only when instructed.
On benchmarks like SWE-Bench Pro and Terminal-Bench 2.0, it shows "strong performance" while completing tasks in "a fraction of the time" compared to GPT-5.3-Codex. However, it is not intended to replace the larger model. Instead, as TechCrunch reports, Spark is framed as a "daily productivity driver" for rapid prototyping, while the full Codex handles heavier, longer-running agentic tasks.
This release intensifies the competitive landscape. As noted in a Substack analysis, this launch came the same week Anthropic debuted Claude Opus 4.6, highlighting the fierce pace of the "agentic AI" arms race. Spark represents OpenAI's bid to dominate not just in capability, but in developer experience and speed.
Infrastructure Overhaul and Availability
OpenAI discovered that model speed alone wasn't enough. The company implemented sweeping latency improvements across its entire request-response pipeline to enable true real-time collaboration.
These optimizations, including a persistent WebSocket connection and rewritten inference stack components, reduced client-server roundtrip overhead by 80%, per-token overhead by 30%, and time-to-first-token by 50%. These benefits will soon extend to all models.
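Since the article reports only percentage reductions, not absolute numbers, the following sketch applies those percentages to hypothetical baseline figures to show how they would compound in practice. All baseline values are invented for illustration.

```python
# Hypothetical baseline latencies in milliseconds (the article does not
# publish absolute figures, only the percentage reductions below).
baseline_ms = {
    "roundtrip_overhead": 100.0,
    "per_token_overhead": 1.0,
    "time_to_first_token": 500.0,
}

# Reductions reported by OpenAI: 80% roundtrip, 30% per-token, 50% TTFT.
reductions = {
    "roundtrip_overhead": 0.80,
    "per_token_overhead": 0.30,
    "time_to_first_token": 0.50,
}

after_ms = {name: ms * (1 - reductions[name]) for name, ms in baseline_ms.items()}
for name, ms in after_ms.items():
    print(f"{name}: {baseline_ms[name]:.0f} ms -> {ms:.0f} ms")
```

Under these assumed baselines, the roundtrip overhead drops from 100 ms to 20 ms and time-to-first-token from 500 ms to 250 ms, which is the kind of change that makes an interactive loop feel instant rather than merely fast.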
Availability is initially limited. Spark is rolling out as a research preview for ChatGPT Pro users ($200/month) in the latest Codex app, CLI, and VS Code extension. It has separate rate limits and does not count against standard API quotas. A small set of design partners also have API access. OpenAI plans to expand availability as it tunes the integration under real workloads.
Safety, Strategy, and What's Next
OpenAI states that Codex-Spark underwent the same safety training as its mainline models, including cyber-relevant training. Evaluations determined it does not reach the company's "Preparedness Framework threshold for high capability in cybersecurity or biology."
The long-term vision, as outlined by OpenAI, is a Codex platform with two complementary modes: Spark for real-time collaboration and the larger models for long-horizon reasoning. Eventually, these modes may blend, with Codex managing interactive loops while delegating background tasks to sub-agents.
Sean Lie, CTO and Co-Founder of Cerebras, captured the experimental spirit of the launch: "What excites us most... is partnering with OpenAI and the developer community to discover what fast inference makes possible—new interaction patterns, new use cases, and a fundamentally different model experience."
GPT-5.3-Codex-Spark is more than a faster model; it's a statement of direction. By prioritizing instantaneous interaction and partnering with specialized hardware, OpenAI is betting that the future of AI-assisted development is not just smarter, but significantly faster.