By Henry | QuantumBit (QbitAI)
OpenAI's newly released GPT-5.4 mini is already facing pushback on day one.
According to the public large language model benchmark Vals, the newly launched GPT-5.4 mini ranks only 13th today, merely surpassing GPT-5, which OpenAI released half a year ago.
Notably, the model ranking 12th is Kimi 2.5, released in late January. Kimi 2.5 costs less than half as much as the new 5.4 mini and boasts even lower latency.
In benchmark results published at the same time, the newly released mini and nano models turned in only middling performance globally, ranking 9th and 10th respectively and falling short of earlier models like Kimi, Qwen, and DeepSeek.
(It seems OpenAI is falling behind in this regard.)
Some observers also pointed out that this time, GPT-5.4 mini's baseline comparison was against the old GPT-5 mini from over half a year ago (the new model runs about twice as fast as it), rather than against new models from other manufacturers.
Many netizens even bluntly said that switching to the new GPT-5.4 mini is "really unnecessary."
Although OpenAI's blog stated that for output tokens, the performance-similar mini version is three times cheaper than GPT-5.4, and the nano version is nearly twelve times cheaper...
...if you compare GPT-5.4 mini with the old GPT-5 mini, you will find that within the same mini tier, the price has actually roughly tripled.
It can be said that amidst the "Lobster" craze, all global model manufacturers are raising prices, and Sam Altman, being the shrewd operator he is, naturally didn't miss out either.
So, is this just a small model specifically optimized for programming and agents?
New Mini and Nano Models
Today, OpenAI launched the GPT-5.4 mini and nano models, focusing on speed and cost-effectiveness. They are specifically optimized for programming, computer operation, multimodal understanding, and sub-agents.
Compared to the previous GPT-5 mini, the new mini and nano versions show solid performance improvements, with running speeds more than doubled.
Notably, across multiple evaluations the gap between the mini/nano models and the full-size GPT-5.4 is no longer significant, and their performance is roughly on par with the lightweight models from Google and Anthropic.
According to OpenAI's official blog, the new models focus on programming and sub-agents.
Specifically, GPT-5.4 mini has been optimized for programming, reasoning, multimodal understanding, and tool usage. Its running speed has more than doubled, and it performs close to the full-size GPT-5.4 on benchmarks like SWE-Bench Pro and OSWorld-Verified.
GPT-5.4 nano is the smallest and most economical version in the GPT-5.4 series, suitable for tasks sensitive to speed and cost, such as classification, data extraction, sorting, and handling simpler auxiliary programming tasks.
In summary, these two new models are ideal for workloads where latency directly impacts product experience, such as coding assistants, sub-agents, screenshot parsing, and multimodal applications.
Simply put, for agents like "Lobster" that have already abstracted skills, deploying them on mini/nano small models that react quickly and have sufficient capability is more cost-effective.
In terms of specific usage, GPT-5.4 mini can be called via API, Codex, and ChatGPT, while nano is only available through the API.
Regarding pricing, the mini version costs $0.75 per million input tokens and $4.50 per million output tokens. The nano version is even cheaper in the API, costing $0.20 per million input tokens and $1.25 per million output tokens.
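Those per-million-token rates are easy to turn into a per-request cost. The sketch below is a hypothetical helper (the function and price table are illustrative, not an official SDK API), using only the mini and nano prices quoted above:

```python
# Hypothetical cost helper reflecting the per-million-token prices quoted
# above. Model names and function are illustrative, not an official SDK API.
PRICES = {  # USD per 1M tokens: (input, output)
    "gpt-5.4-mini": (0.75, 4.50),
    "gpt-5.4-nano": (0.20, 1.25),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the quoted rates."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# One million tokens in and one million out on each model:
print(round(request_cost("gpt-5.4-mini", 1_000_000, 1_000_000), 2))  # 5.25
print(round(request_cost("gpt-5.4-nano", 1_000_000, 1_000_000), 2))  # 1.45
```

At these rates, a full million tokens each way comes to $5.25 on mini versus $1.45 on nano.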
However, looking horizontally, some netizens pointed out that Gemini Flash 3 lite is smarter and overall more than six times cheaper.
Evaluation Results
In actual evaluations, mini and nano were mainly optimized for programming and Agent tasks.
In programming tasks, they can complete code modifications, debugging loops, and library navigation with low latency, allowing for rapid iteration and efficient handling of workflows that require a balance of speed and cost.
Mini's pass rate is close to GPT-5.4, while being faster.
In sub-agent scenarios, developers can let the large model handle decision-making and planning, while delegating smaller tasks in parallel to mini sub-agents, such as searching codebases, processing documentation, or assisting with operations.
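The planner/sub-agent split described above can be sketched in a few lines. This is a minimal illustration, not a real integration: `call_model` is a stub standing in for an actual API call, and all names are assumptions.

```python
# Minimal sketch of the planner/sub-agent pattern: a large model plans,
# while cheap mini sub-agents execute the pieces in parallel.
# call_model is a stub standing in for a real chat-API call.
from concurrent.futures import ThreadPoolExecutor

def call_model(model: str, task: str) -> str:
    """Stub: in practice this would hit the provider's API."""
    return f"[{model}] done: {task}"

def run_plan(subtasks: list[str]) -> list[str]:
    # Fan the planner's subtasks out to parallel mini sub-agents.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(lambda t: call_model("gpt-5.4-mini", t), subtasks))

results = run_plan(["search codebase", "summarize docs"])
print(results[0])  # [gpt-5.4-mini] done: search codebase
```

The design point is simply that latency-sensitive fan-out work goes to the fast, cheap tier while the expensive model handles planning.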
As small models get faster and faster, the value of this pattern becomes increasingly clear.
In computer operation and multimodal tasks, mini also performed excellently, able to quickly parse complex user interface screenshots and complete operational tasks efficiently.
In the OSWorld-Verified test, its performance was close to the full-size GPT-5.4, while significantly outperforming GPT-5 mini.
Some netizens' actual tests also corroborate these highlights.
Reddit user Rent_South systematically tested the new mini's performance on classification, summarization, translation, and other tasks, giving high praise:
"I ran some benchmarks on them and found that—in some real-world scenario tasks—they are cheaper, faster, and better to use. Since about a year ago, when I was building a RAG pipeline and gpt-4.1-mini beat the original GPT-4 on certain agent task steps, my enthusiasm for mini/nano versions has been higher than for the original versions."
In classification tasks, nano achieved 70% accuracy at roughly one-twelfth the cost of GPT-5.4.
When called over 10,000 times, GPT-5.4 cost approximately $20.30, while GPT-5.4-nano cost only $1.64, a savings of about 91.9%.
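The quoted savings rate follows directly from those two dollar figures:

```python
# Sanity-check the quoted savings: ~$20.30 for GPT-5.4 vs ~$1.64 for
# GPT-5.4-nano over the same ~10,000 calls.
full_cost, nano_cost = 20.30, 1.64
savings = 1 - nano_cost / full_cost
print(f"{savings:.1%}")  # 91.9%
```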
In translation tasks, nano scored slightly lower at 55 points, but the gap with GPT-5.4's 63 points is not large, making it a viable alternative. After over 10,000 calls, the savings rate still reached 91.3%!
In writing tasks, the mini version actually scored the highest, with costs about 60% lower than the full version. With over 10,000 calls, mini cost $29.61, while nano cost only $10.30.
Additionally, some netizens tested the model's ability to describe images. Nano produced detailed, accurate descriptions of museum photos with very few tokens, at a total cost of a fraction of a cent.
For instance, a netizen asked the nano model to describe a photo they casually took at a museum:
The nano model output the following description:
"The image shows the interior of a museum gallery with a long exhibition wall. Many framed portraits are neatly arranged on the white brick wall. Below the portraits are multiple glass display cases with dark wooden frames and glass tops/fronts, exhibiting various historical artifacts and equipment. The room has polished wooden floors, hanging lights and wires on the ceiling, and some pipes visible near the top of the wall. In the foreground, the glass cases arranged along the length of the room reflect items from other exhibition areas."
Generating this description used 2,751 input tokens and 112 output tokens, costing 0.069 cents (less than one-tenth of a penny).
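That figure checks out against the nano prices quoted earlier ($0.20 per million input tokens, $1.25 per million output tokens):

```python
# Reproduce the quoted cost from nano's per-million-token prices:
# $0.20 / 1M input tokens, $1.25 / 1M output tokens.
input_tokens, output_tokens = 2_751, 112
cost_usd = (input_tokens * 0.20 + output_tokens * 1.25) / 1_000_000
print(f"{cost_usd * 100:.3f} cents")  # 0.069 cents
```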
Even in creative tasks, such as generating an SVG of a pelican riding a bicycle, nano and mini still lag behind the full-size GPT-5.4, but they are entirely capable of basic creative tasks.
At the very least, as reasoning effort increases, the output image stays broadly correct.
Overall, compared to OpenAI's own products, this model is indeed commendable.
But whether this is the best and most economical small model on the market remains to be discussed.
One More Thing
Interestingly, in the comments section of OpenAI President Greg Brockman's post announcing the new model, the hottest discussion wasn't about the new model's capabilities, nor its price, and had almost nothing to do with the new model itself.
The comments section was flooded with posts tagged #keep4o: "Bring back 4o!"
References:
[1] https://x.com/gdb/status/2034003374627049909
[2] https://simonwillison.net/2026/Mar/17/mini-and-nano/
[3] https://www.reddit.com/r/OpenAI/comments/1rwd9hd/breaking_openai_just_dropped_gpt54_mini_and_nano/
[4] https://x.com/scaling01/status/2033958931874099560
— End —
🦞 Have you raised your lobster today?