Genius Move: A Tiny 7B Model Hired GPT-5, and Then Won the Test

A 7B model went and hired GPT-5.

GPT-5 accepted.

Then, on a benchmark, that small model beat GPT-5 working alone.

I stared blankly for about three seconds when I came across this paper, then scrolled back to read it again, just to make sure I hadn't misread.

I hadn't.

Let's be clear about what happened.

A paper published this week describes how researchers took a 7B language model and used reinforcement learning to teach it one thing: to break complex problems into sub-tasks and delegate them to more powerful models. GPT-5, Claude Sonnet 4, Gemini 2.5 Pro: any of them could be its "subordinates."

This 7B model isn't an executor; it's a dispatcher.

Its job isn't to solve problems, but to decide who solves what, how to cut up a problem, and which piece to hand to which model.

They then put it to work on a test called GPQA Diamond. This is a benchmark specifically designed for extremely difficult, multi-step reasoning, requiring integrated knowledge from physics, chemistry, and biology. Frontier large models already perform quite well on it individually.

The result: this 7B dispatcher outperformed the solo scores of GPT-5, and also surpassed the individual results of Claude Sonnet 4 and Gemini 2.5 Pro.

I kept mulling over why this stopped me in my tracks.

Logically, this result shouldn't be surprising. We've always known that teamwork is more efficient than going it alone, and that good division of labor allows the whole to surpass any single individual. These aren't new ideas; my mom told me this when I was a kid.

So why was I stunned?

Because it proved this with numbers for the first time.

Not a fuzzy feeling that "teamwork is good," but a quantifiable gap showing "the result orchestrated by a 7B model is better than using GPT-5 alone."

And the fact that this happened with AI makes the implication exceptionally clear, because AI has no feelings, no chemistry, no "shared purpose." All it has is a pure allocation logic: "Which task goes to which model, with what context, and is the result good enough?"

So what this experiment is saying is: getting the "allocation logic" right matters more than having the "most parameters."

Let's talk about how this 7B model was trained. This is the most fascinating detail in the whole story.

It used reinforcement learning, and this is key.

Reinforcement learning doesn't primarily teach knowledge; it teaches a strategy. In ordinary pretraining, a language model learns from vast amounts of text, figuring out "what word is most likely to appear in this situation." In reinforcement learning, it learns from a multitude of attempts, figuring out "after making this decision, was the final result good or bad?"

One is static statistics, the other is dynamic judgment.

Through reinforcement learning, this 7B model formed an intuition for "how to break down a task, how to delegate, and how to verify results." No one told it this; it accumulated it through countless trials and feedback.
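To make "learning a strategy from trials and feedback" a bit more tangible, here is a toy sketch. It is not the paper's training method: it is a simple epsilon-greedy bandit, about the smallest reinforcement-learning setup there is. The worker names, task types, and success rates are all invented for illustration. The router is never told which worker is good at what; it only sees whether each attempt succeeded.

```python
import random

# Hypothetical success rates, invented for illustration (not from the paper):
# how often each made-up worker answers each task type correctly.
TRUE_SUCCESS = {
    "chemistry": {"worker_a": 0.9, "worker_b": 0.5},
    "math": {"worker_a": 0.4, "worker_b": 0.85},
}


def train_router(episodes=5000, epsilon=0.1, seed=0):
    """Learn, from reward alone, an estimated success rate per (task, worker)."""
    rng = random.Random(seed)
    value = {t: {w: 0.0 for w in ws} for t, ws in TRUE_SUCCESS.items()}
    count = {t: {w: 0 for w in ws} for t, ws in TRUE_SUCCESS.items()}
    for _ in range(episodes):
        task = rng.choice(list(TRUE_SUCCESS))
        workers = list(TRUE_SUCCESS[task])
        if rng.random() < epsilon:
            # Explore: occasionally try a worker at random.
            worker = rng.choice(workers)
        else:
            # Exploit: pick the worker that currently looks best for this task.
            worker = max(workers, key=lambda w: value[task][w])
        # The only feedback is whether the delegated attempt succeeded.
        reward = 1.0 if rng.random() < TRUE_SUCCESS[task][worker] else 0.0
        count[task][worker] += 1
        # Incremental average of the rewards observed for this choice.
        value[task][worker] += (reward - value[task][worker]) / count[task][worker]
    return value


def best_route(value):
    """For each task type, return the worker with the highest learned estimate."""
    return {task: max(ws, key=ws.get) for task, ws in value.items()}


learned_route = best_route(train_router())
```

After enough attempts, `learned_route` should send chemistry to `worker_a` and math to `worker_b`, even though no one wrote that rule down. The real dispatcher presumably learns something far richer (how to split a problem, what context to pass along), but the feedback loop has the same shape.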

This makes me think of one thing: we often say "management is a skill," but we rarely articulate exactly how that skill is acquired.

It's not learned from books. It's formed by doing things many times, failing many times, and absorbing the feedback from the results.

Reinforcement learning and human "experience" are structurally the same thing.

Let me paint this scenario more concretely.

When this 7B model tackles a problem, what it writes isn't code; it's natural language.

For example, it might write: "Give the first step of this chemistry problem to Claude, because it's steadier on molecular structure reasoning. Give the second step's math calculation to GPT-5, because it's more accurate on symbolic derivation." Then it sends this allocation plan out, waits for the results, and judges whether they're good enough. If not, it breaks the problem down again and asks from a different angle.
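The delegate, wait, judge, and retry loop described above can be sketched roughly like this. Everything here is an assumption made for illustration: `SubTask`, `orchestrate`, the stub workers, and the "good enough" check are hypothetical names, and the stubs stand in for real model calls. In the actual system, the 7B model itself would write the plan and do the judging; here those pieces are passed in as plain functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SubTask:
    description: str  # natural-language instruction for the worker
    worker: str       # name of the worker this piece is handed to


def orchestrate(
    plan: List[SubTask],
    workers: Dict[str, Callable[[str], str]],
    is_good_enough: Callable[[str], bool],
    rephrase: Callable[[str], str],
    max_retries: int = 2,
) -> List[str]:
    """Send each sub-task to its worker, judge the answer, and re-ask if needed."""
    results = []
    for sub in plan:
        prompt = sub.description
        answer = workers[sub.worker](prompt)
        retries = 0
        # If the result isn't good enough, ask again from a different angle.
        while not is_good_enough(answer) and retries < max_retries:
            prompt = rephrase(prompt)
            answer = workers[sub.worker](prompt)
            retries += 1
        results.append(answer)
    return results


# Stub workers standing in for larger models (hypothetical behavior).
def chem_worker(prompt: str) -> str:
    # Pretend this worker only answers when asked to reason step by step.
    return "C6H6 is aromatic" if "step by step" in prompt else ""


def math_worker(prompt: str) -> str:
    return "x = 4"


plan = [
    SubTask("Classify the molecule", "chem"),
    SubTask("Solve 2x = 8", "math"),
]
answers = orchestrate(
    plan,
    workers={"chem": chem_worker, "math": math_worker},
    is_good_enough=lambda a: bool(a),  # toy check: any non-empty answer passes
    rephrase=lambda p: p + " Explain step by step.",
)
```

In this toy run, the chemistry worker returns nothing on the first ask, the dispatcher rephrases, and the second ask succeeds. Notice that `orchestrate` never needs to know any chemistry or math itself.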

In this workflow, the 7B model isn't smarter than any of the larger models.

It might not even know the correct answer to that chemistry problem.

What it knows is "how to use these tools, which are more powerful than itself, to solve this problem."

Okay, here's what struck a chord with me.

The mainstream discussion about AI right now is about which model is the smartest, which has the most parameters, and which has the largest training set. These conversations default to a premise: the AI race is a capability race, and whoever has the strongest raw ability wins.

This paper proposes another possibility.

Perhaps the real differentiator isn't capability, but "how to use these capabilities."

This logic holds true for humans, too. We've actually known it for a long time, we just never had such a clean, quantifiable experiment to prove it.

In the workplace, everyone has met that person with average technical skills whose team consistently delivers. And we've all seen the person with strong individual abilities who, when placed in a management role, causes everything to fall into chaos.

The former isn't better because of superior technical skills; it's because they know "who is better than me at this specific thing, and how to get them to do the right thing at the right time."

That ability, and what the 7B model learned, are one and the same.

So what is this experiment really saying?

I think it's suggesting that the next competitive dimension in AI isn't parameters, but architecture. It's not about who is the smartest individually, but who can best coordinate multiple models to work together.

For startups, this is great news, because it means "you don't need the most expensive model." You need a good orchestration logic.

For big companies, this is a reminder. Brute-forcing with compute and parameters isn't enough anymore; you also need to figure out how to make different models play their roles in the right scenarios.

For the average user, this says something very concrete. You might think using AI is about finding the smartest model, but the real task is figuring out which part of the job is best done by whom.

I suddenly remembered a line often attributed to Peter Drucker, roughly: "The task of management is to make ordinary people do extraordinary things."

He said this about corporate management. But now, it seems to perfectly describe what this 7B model is doing.

An ordinary, small model, through correct allocation and scheduling, enabled tools far more powerful than itself to produce results that exceeded expectations.

This isn't "the small model beat the large model." This is "the one who knows how to manage people beat the smartest lone wolf."

This has always been happening. It just used to happen in the human world, where we describe it with the word "management."

Now it has happened again in a quantified AI experiment, measured with a GPQA Diamond score.

The difference is, this time, there are numbers.

That's it. If you've read this far and found it worthwhile, feel free to like or share. If you want to catch the next post as soon as it's out, you can also star and follow me. Thanks for reading. We'll chat again next time.
