By Henry | QuantumBit (QbitAI)
OpenAI's newly released GPT-5.4 mini is already facing pushback on day one.
According to the public large language model benchmark Vals, the newly launched GPT-5.4 mini ranks only 13th today, merely surpassing GPT-5, which OpenAI released half a year ago.
Notably, the model ranking 12th is Kimi 2.5, released in late January. Kimi 2.5 costs less than half as much as the new 5.4 mini and boasts even lower latency.
In benchmark results published at the same time, the newly released mini and nano models turned in only middling performance globally, ranking 9th and 10th respectively and falling short of earlier models like Kimi, Qwen, and DeepSeek.
(It seems OpenAI is falling behind in this regard.)
Some observers also pointed out that this time, GPT-5.4 mini's baseline comparison was against the old GPT-5 mini from over half a year ago (the new model runs about twice as fast as it), rather than against new models from other manufacturers.
Many netizens even bluntly said that switching to the new GPT-5.4 mini is "really unnecessary."
Although OpenAI's blog stated that for output tokens, the performance-similar mini version is three times cheaper than GPT-5.4, and the nano version is nearly twelve times cheaper...
...if you compare GPT-5.4 mini with the old GPT-5 mini, you will find that within the same mini tier, the price has actually roughly tripled.
It can be said that amidst the "Lobster" craze, all global model manufacturers are raising prices, and Sam Altman, being the shrewd operator he is, naturally didn't miss out either.
So, is this just a small model specifically optimized for programming and agents?
New Mini and Nano Models
Today, OpenAI launched the GPT-5.4 mini and nano models, focusing on speed and cost-effectiveness. They are specifically optimized for programming, computer operation, multimodal understanding, and sub-agents.
Compared to the previous GPT-5 mini, the new mini and nano versions show solid performance improvements, with running speeds more than doubled.
Notably, across multiple evaluations the gap between the mini/nano models and the full-size GPT-5.4 is no longer significant, and their performance is roughly on par with the lightweight models from Google and Anthropic.
According to OpenAI's official blog, the new models focus on programming and sub-agents.
Specifically, GPT-5.4 mini has been optimized for programming, reasoning, multimodal understanding, and tool usage. Its running speed has more than doubled, and it performs close to the full-size GPT-5.4 on benchmarks like SWE-Bench Pro and OSWorld-Verified.
GPT-5.4 nano is the smallest and most economical version in the GPT-5.4 series, suitable for tasks sensitive to speed and cost, such as classification, data extraction, sorting, and handling simpler auxiliary programming tasks.
In summary, these two new models are ideal for workloads where latency directly impacts product experience, such as coding assistants, sub-agents, screenshot parsing, and multimodal applications.
Simply put, for agents like "Lobster" that have already abstracted skills, deploying them on mini/nano small models that react quickly and have sufficient capability is more cost-effective.
In terms of specific usage, GPT-5.4 mini can be called via API, Codex, and ChatGPT, while nano is only available through the API.
Regarding pricing, the mini version costs $0.75 per million input tokens and $4.50 per million output tokens. The nano version is even cheaper in the API, costing $0.20 per million input tokens and $1.25 per million output tokens.
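Those per-million-token rates are easy to turn into a per-request cost. The sketch below is a hypothetical helper (the function and price table are illustrative, not an official SDK API), using only the mini and nano prices quoted above:

```python
# Hypothetical cost helper reflecting the per-million-token prices quoted
# above. Model names and function are illustrative, not an official SDK API.
PRICES = {  # USD per 1M tokens: (input, output)
    "gpt-5.4-mini": (0.75, 4.50),
    "gpt-5.4-nano": (0.20, 1.25),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the quoted rates."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# One million tokens in and one million out on each model:
print(round(request_cost("gpt-5.4-mini", 1_000_000, 1_000_000), 2))  # 5.25
print(round(request_cost("gpt-5.4-nano", 1_000_000, 1_000_000), 2))  # 1.45
```

At these rates, a full million tokens each way comes to $5.25 on mini versus $1.45 on nano.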
However, looking horizontally, some netizens pointed out that Gemini Flash 3 lite is smarter and overall more than six times cheaper.
Evaluation Results
In actual evaluations, mini and nano were mainly optimized for programming and Agent tasks.
In programming tasks, they can complete code modifications, debugging loops, and library navigation with low latency, allowing for rapid iteration and efficient handling of workflows that require a balance of speed and cost.
Mini's pass rate is close to GPT-5.4, while being faster.
In sub-agent scenarios, developers can let the large model handle decision-making and planning, while delegating smaller tasks in parallel to mini sub-agents, such as searching codebases, processing documentation, or assisting with operations.
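The planner/sub-agent split described above can be sketched in a few lines. This is a minimal illustration, not a real integration: `call_model` is a stub standing in for an actual API call, and all names are assumptions.

```python
# Minimal sketch of the planner/sub-agent pattern: a large model plans,
# while cheap mini sub-agents execute the pieces in parallel.
# call_model is a stub standing in for a real chat-API call.
from concurrent.futures import ThreadPoolExecutor

def call_model(model: str, task: str) -> str:
    """Stub: in practice this would hit the provider's API."""
    return f"[{model}] done: {task}"

def run_plan(subtasks: list[str]) -> list[str]:
    # Fan the planner's subtasks out to parallel mini sub-agents.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(lambda t: call_model("gpt-5.4-mini", t), subtasks))

results = run_plan(["search codebase", "summarize docs"])
print(results[0])  # [gpt-5.4-mini] done: search codebase
```

The design point is simply that latency-sensitive fan-out work goes to the fast, cheap tier while the expensive model handles planning.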
As small models get faster and faster, the value of this pattern becomes increasingly clear.
In computer operation and multimodal tasks, mini also performed excellently, able to quickly parse complex user interface screenshots and complete operational tasks efficiently.
In the OSWorld-Verified test, its performance was close to the full-size GPT-5.4, while significantly outperforming GPT-5 mini.
Some netizens' actual tests also corroborate these highlights.
Reddit user Rent_South systematically tested the new mini's performance on classification, summarization, translation, and other tasks, giving high praise:
"I ran some benchmarks on them and found that—in some real-world scenario tasks—they are cheaper, faster, and better to use. Since about a year ago, when I was building a RAG pipeline and gpt-4.1-mini beat the original GPT-4 on certain agent task steps, my enthusiasm for mini/nano versions has been higher than for the original versions."
In classification tasks, nano achieved 70% accuracy at roughly one-twelfth the cost of GPT-5.4.
When called over 10,000 times, GPT-5.4 cost approximately $20.30, while GPT-5.4-nano cost only $1.64, a savings of about 91.9%.
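The quoted savings rate follows directly from those two dollar figures:

```python
# Sanity-check the quoted savings: ~$20.30 for GPT-5.4 vs ~$1.64 for
# GPT-5.4-nano over the same ~10,000 calls.
full_cost, nano_cost = 20.30, 1.64
savings = 1 - nano_cost / full_cost
print(f"{savings:.1%}")  # 91.9%
```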
In translation tasks, nano scored slightly lower at 55 points, but the gap with GPT-5.4's 63 points is not large, making it a viable alternative. After over 10,000 calls, the savings rate still reached 91.3%!
In writing tasks, the mini version actually scored the highest, with costs about 60% lower than the full version. With over 10,000 calls, mini cost $29.61, while nano cost only $10.30.
Additionally, some netizens tested the model's ability to describe images. Nano produced detailed, accurate descriptions of museum photos with very few tokens, at a total cost of a fraction of a cent.
For instance, a netizen asked the nano model to describe a photo they casually took at a museum:
The nano model output the following description:
"The image shows the interior of a museum gallery with a long exhibition wall. Many framed portraits are neatly arranged on the white brick wall. Below the portraits are multiple glass display cases with dark wooden frames and glass tops/fronts, exhibiting various historical artifacts and equipment. The room has polished wooden floors, hanging lights and wires on the ceiling, and some pipes visible near the top of the wall. In the foreground, the glass cases arranged along the length of the room reflect items from other exhibition areas."
Generating this description used 2,751 input tokens and 112 output tokens, costing 0.069 cents (less than one-tenth of a penny).
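That figure checks out against the nano prices quoted earlier ($0.20 per million input tokens, $1.25 per million output tokens):

```python
# Reproduce the quoted cost from nano's per-million-token prices:
# $0.20 / 1M input tokens, $1.25 / 1M output tokens.
input_tokens, output_tokens = 2_751, 112
cost_usd = (input_tokens * 0.20 + output_tokens * 1.25) / 1_000_000
print(f"{cost_usd * 100:.3f} cents")  # 0.069 cents
```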
Even in creative tasks, such as generating an SVG of a pelican riding a bicycle, nano and mini still lag behind the full-size GPT-5.4, but they are entirely capable of basic creative tasks.
At the very least, as reasoning effort increases, the output image stays broadly correct.
Overall, compared to OpenAI's own products, this model is indeed commendable.
But whether this is the best and most economical small model on the market remains to be discussed.
One More Thing
Interestingly, in the comments section of OpenAI President Greg Brockman's post announcing the new model, the hottest discussion wasn't about the new model's capabilities, nor its price, and had almost nothing to do with the new model itself.
The comments section was flooded with posts tagged #keep4o: "Bring back 4o!"
References:
[1] https://x.com/gdb/status/2034003374627049909
[2] https://simonwillison.net/2026/Mar/17/mini-and-nano/
[3] https://www.reddit.com/r/OpenAI/comments/1rwd9hd/breaking_openai_just_dropped_gpt54_mini_and_nano/
[4] https://x.com/scaling01/status/2033958931874099560
— End —
🦞 Have you raised your lobster today?