Just now, OpenAI released a new programming model that runs on a chip the size of a dinner plate and can output over 1000 tokens per second.
This model is called GPT-5.3-Codex-Spark, a lightweight version of GPT-5.3-Codex, designed specifically for real-time programming.
And Sam Altman himself gave a preview before the release: "It sparks joy for me".
The secret to its speed is not Nvidia's GPU, but Cerebras's wafer-scale chip.
Fast
Traditional AI inference spreads a model across a stack of GPUs, which requires frequent communication between chips, and that communication adds latency.
Cerebras's Wafer Scale Engine 3 (WSE-3) takes a completely different approach: it turns an entire silicon wafer into a single chip.
How big is this chip?
About the size of a dinner plate.
And it has 4 trillion transistors.
It has the largest on-chip memory of any current AI processor, which eliminates the communication overhead of multi-chip setups: the whole model runs on one chip, with no need to shuttle data back and forth.
The result: inference throughput tops 1,000 tokens per second, roughly 15 times faster than traditional GPU inference.
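Taken at face value, those numbers translate into per-token latency as follows. This is a back-of-the-envelope sketch: the 1,000 tokens/second figure comes from the announcement, but the GPU baseline is an assumption derived from the "15 times faster" claim.

```python
def per_token_ms(tokens_per_second: float) -> float:
    """Convert throughput (tokens/s) into per-token latency in milliseconds."""
    return 1000.0 / tokens_per_second

# Quoted Spark throughput: ~1,000 tok/s -> ~1 ms per token.
spark_ms = per_token_ms(1000)

# Assumed GPU baseline: 15x slower, i.e. ~66.7 tok/s -> ~15 ms per token.
gpu_ms = per_token_ms(1000 / 15)

print(f"Spark: {spark_ms:.1f} ms/token, GPU baseline: {gpu_ms:.1f} ms/token")

# For a 500-token completion, pure generation time would be:
print(f"Spark: {500 * spark_ms / 1000:.1f} s, GPU baseline: {500 * gpu_ms / 1000:.1f} s")
```

At these (assumed) rates, a 500-token response drops from several seconds of waiting to roughly half a second, which is what makes the "code appears as you type" experience plausible.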
For programming scenarios, this means the model writes code in step with your typing.
Near real-time feedback, with code flowing out from under your fingertips.
Not Just Fast
Codex-Spark is not just a "small model that runs fast".
On the two mainstream software-engineering agent benchmarks, SWE-Bench Pro and Terminal-Bench 2.0, Codex-Spark outperformed GPT-5.1-Codex-mini while completing tasks in a fraction of the time.
Fast and powerful!
OpenAI positions it as a productivity tool for everyday programming: rapid prototyping, real-time collaboration, and instant iteration.
You can interrupt and redirect it at any point while it is writing code, and it responds almost instantly.
The larger and more powerful GPT-5.3-Codex is responsible for handling complex tasks that require deep reasoning and long execution times.
OpenAI's vision is to let the two models complement each other: Spark handles speed, Codex handles depth.
OpenAI's Chip Ambition
This is the first milestone in the collaboration between OpenAI and Cerebras.
In January this year, OpenAI announced a multi-year collaboration plan with Cerebras, worth over $10 billion.
And Cerebras has just completed a funding round of over $1 billion, with a valuation of about $23 billion, and is considering an IPO.
The significance of this collaboration is not just a new model.
This is the first time OpenAI has moved away from the NVIDIA ecosystem at scale for inference.
In the past, almost all large-model inference ran on Nvidia GPUs. Codex-Spark proves one thing: for specific scenarios like programming, specialized chips can lift the experience to a completely different level.
Cerebras CTO and co-founder Sean Lie said:
What excites us most is exploring what fast inference can bring with OpenAI and the developer community—new interaction modes, new usage scenarios, fundamentally different model experiences. This preview version is just the beginning.
Cerebras stated that by 2026, it will extend this ultra-fast inference capability to the largest frontier models.
How to Use
Currently, GPT-5.3-Codex-Spark is released as a research preview, available to ChatGPT Pro users through the following channels:
Codex Application
CLI Command Line Tool
VS Code Plugin
Sam Altman admitted that there are still some limitations at release, but the team will "rapidly improve".
Related Links:
OpenAI Official Blog: https://openai.com/index/introducing-gpt-5-3-codex-spark/
Cerebras Blog: https://www.cerebras.ai/blog/openai-codexspark
GPT-5.3-Codex Introduction: https://openai.com/index/introducing-gpt-5-3-codex/