Xinzhi Yuan Report
[Xinzhi Yuan Introduction] Mythos 5, hidden for two months as "too dangerous," is finally unsealed! ASI-class "Mythic" arrives tonight.
Anthropic drops a "double release" tonight!
Just now, Claude Fable 5 and Claude Mythos 5 launched simultaneously.
That Mythos-tier model Anthropic hid for two months, calling it "too dangerous to release," has been put into everyone's hands for the first time.
From Opus 4.7 to 4.8 took 43 days; 4.8 to Fable 5 took only 11 days.
The name Fable comes from Latin fabula, sharing the same root as Greek mythos.
Same story, same foundation. The public version is called Fable; the full version is called Mythos.
Software engineering, knowledge work, vision, scientific research, long context — on almost every benchmark, Fable 5 is #1.
And the longer and more complex the task, the wider the gap.
In Every CEO Dan Shipper's words, this is simply a "performance beast"!
Fable 5 and Mythos 5 scores are nearly identical, gaps typically within 1-3 percentage points.
Their biggest difference: the former has a built-in "safety classifier"; the latter has zero restrictions.
Once a cybersecurity task query is triggered, Fable 5 gets directly "downgraded" to Opus 4.8 for the response.
Fable 5 scoring 0 on all safety tasks says it best.
Pricing: Fable 5 matches Opus 4.8 Fast Mode — $10/million input tokens, $50/million output tokens.
That's 2x Opus standard, but less than half of Mythos Preview, and only 1/6 of GPT-5.5 Pro.
Pro, Max, and Team subscribers get free access until June 22; after that it consumes credits. API available today — developers just call claude-fable-5.
Claude 5 "Dual Model" Debut: Coding Global #1
Post-launch, the vibe on 𝕏 has completely shifted.
To newly-joined Anthropic researcher Karpathy, this is a generational leap worthy of a major version bump.
Working software is becoming like tap water — on-demand, instant. You can summon anything: interpreters, visualization tools, dashboards, throwaway custom apps.
He closed with a Matrix quote: "Free your mind."
Researcher Alex Albert, who's witnessed every Claude release, says this is the first model that feels "not like a tool, but a partner."
Claude Code lead Felix Rieseberg declares: "The Third AI Era" officially begins today!
With it comes an epochal shift — we'll no longer just dispatch "tasks" to AI, but formally assign them "responsibilities."
Scroll up/down to view more
The true weight behind these words, many have yet to fully grasp.
Actions speak louder: across major benchmarks and rigorous internal/external evaluations, Claude 5 has already demonstrated "crushing" dominance.
Crushing — Not by a Little
On Humanity's Last Exam (HLE), Mythos 5 without tools matches the Preview version.
Versus GPT-5.5 and Gemini 3.1 Pro, Mythos 5 leads by a structural margin.
Now look at Claude's signature: agentic coding tasks.
SWE-Bench Pro — the core benchmark for real-world agent coding, the bloodiest arena for all frontier models.
Fable 5 scores 80.3%.
For reference: Opus 4.8 (11 days ago) 69.2%, GPT-5.5 58.6%, Gemini 3.1 Pro 54.2%.
Fable 5 beats Opus 4.8 by 11 points, GPT-5.5 by 21.7 points.
The previous king, barely 11 days on the throne, gets kicked off by its own successor.
FrontierCode Diamond (Cognition's agent code quality benchmark): Fable 5 29.3%, GPT-5.5 5.7%. A 5x gap.
And Fable 5 hits ceiling at medium compute — no need to max out thinking; casual reasoning already wins.
Stripe, with early Fable 5 access, pulled off a massive feat.
In a 50-million-line Ruby codebase, they executed a full global code migration — work that normally takes a whole team 2+ months.
Fable 5 did it in one day. One day, 50 million lines, the whole team stunned.
Physical Superintelligence CEO was equally floored after testing.
On frontier physics research tasks, Fable 5 consumed only 1/3 of GPT-5.5's reasoning tokens, and reached in 36 hours what took GPT-5.5 four days.
Fable 5 Composes & Creates — Terrifyingly Strong
Every Anthropic test shows the same thing: Fable 5 can work autonomously for long periods, with output quality that's absurdly high.
It autonomously plays Factorio — the engineer's bible of factory-building games.
Conveyor belts humming, inserters frantically swinging between furnaces and assemblers. AI plans resource flows, builds automated production lines, imposes order from zero in a chaotic environment that consumes resources every second.
It designed a complete 3D-printable model in the browser.
First, a few lines of code — then a 3D CAD editor with UI panels and toolbars rendered into existence.
Then Fable 5 switched roles: inside its own editor, rotating views, extruding meshes, chamfering edges — turning an abstract concept into a printable physical model.
It simulated solar system planetary motion, deriving orbital equations from first principles of physics, then using those derivations to predict eclipses.
The most surreal: dark canvas titled "FIFTH SYMPHONY FABLE," Beethoven's 5th as an EDM remix, high-precision particle fluids exploding center-screen.
Bass drops summon deep purple nebulae; violins rise tearing darkness with icy blue aurora-like fluids — every fluid collision and diffusion locked to the beat.
The remix itself? Also code-generated by Fable 5.
An AI that never "heard" music wrote a Beethoven remix in code, then wrote fluid simulation code that dances to the beat.
Also: Fable 5 playing Slay the Spire with persistent file memory — performance 3x Opus 4.8, final-boss reach rate 3x.
Memory makes Fable 5 stronger by a margin far exceeding the same memory's effect on the previous generation.
The model has crossed a new threshold in "learning from its own experience."
Zero Code, Beat Pokémon with Naked Eyes
Not just that — Fable 5's "visual capabilities" have leapfrogged, on par with coding gains.
Previous Claude models playing Pokémon Red needed a full complex toolchain: map navigation, game state parsing, extra tool interfaces — all fed in, still got stuck.
Fable 5 uses only a minimal visual interface.
No map, no navigation aids, no extra game state info.
Just screen screenshots — played through all of Pokémon Red from start to finish.
Anthropic released a full timelapse recording the entire process. AI watches pixel frames, makes decisions, walks tall grass, encounters enemies, picks moves, beats gyms, navigates mazes — all the way to credits.
This means Fable 5 can extract precise values from complex charts in scientific journals, and rebuild a web app's full source code from just a few screenshots.
Visual understanding has reached a new level — not just "see and describe," but "comprehend, then act."
AI as Scientist: Science-Grade Results in One Week
If coding and vision are still in the "efficiency" realm, what Fable 5 and Mythos 5 do in life sciences forces a rethink of "what can AI do?"
In protein design, Mythos 5 achieves full autonomous R&D.
Target selection, design execution, failure auto-correction — end-to-end. 14 disease targets yielded 9 strong candidates, precisely covering immune, neurodegenerative, and muscular diseases.
The real shocker: genomics.
Mythos 5 spent ~1 week, nearly unsupervised, gathering 138 species and millions of cells of data, designing and training its own ML model.
The result surpasses recent Science-published peer work.
100x smaller model, better performance. Anthropic plans to publish in coming months.
Rejecting "Distillation," Swapping Brain to Opus 4.8
Now it's clear why Anthropic wrapped Fable 5 in a "safety classifier."
Especially for cybersecurity, biochemistry, or model distillation requests — system auto-routes to Opus 4.8.
Jailbreak resistance comparison (400-round red-team test)
Many developers complain: even simple tasks trigger Fable 5's "red line," forcing downgrade.
Notably, "distilling" Fable 5 isn't easy.
Unlike security tasks, distillation triggers don't notify — they directly limit capability via prompt modification, control vectors, and PEET methods.
Anthropic estimates ~0.03% of traffic affected.
Battle for the Throne: "Mythos" Opens the Game
GPT-5.5 launched just 1.5 months ago; only two benchmarks remain where it sees Fable 5's taillights.
Blueprint-Bench 2: down 2.4 points. Terminal-Bench: GPT-5.5 via Codex CLI scores 83.4% — the closest on the whole table.
Below that: pure slaughter.
Anthropic hides a layer of meaning in the names.
Mythos: civilization's sacred narrative explaining its own destiny. Fable: humanity's oldest moral teaching.
Greek philosophy's birth was once framed as Logos's victory over Mythos — humanity learning to explain the world through reason.
Now a company stands at ASI's threshold, naming its strongest models "Mythos" and "Fable."
The speed at which machines conquer Logos — everyone sees it.
The next question: can meaning-making and moral judgment stay in human hands?
References:
https://www.anthropic.com/news/claude-fable-5-mythos-5
https://x.com/claudeai/status/2064394146916229443
https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c342ee809620.pdf
Editors: Moses, Taizi