Claude's New Model 4.6 is Here! More Jobs at Risk: Wall Street Finance, Compilers, Security White Hats, PPT… All Under Threat

As soon as I woke up, Anthropic released a new model, bringing Claude Opus 4.6 to greet you for the New Year!

The news sent financial data service provider FactSet crashing 10% intraday, with S&P Global, Moody's, and Nasdaq all falling, and major indices plummeting across the board.

This is already the second time this week Anthropic has shaken the market.

A few days ago, one of its automated legal work plugins quietly went live, directly triggering a trillion-dollar software stock crash.

Investors' panic focused on one question: Who can guarantee they won't be disrupted by AI in the next few years? If not, sell.

Little did they know that today's Anthropic is even more ruthless.

Before today, everyone's impression of Claude was its overwhelmingly strong programming capabilities.

Claude Opus 4.6 sneered and shattered that impression with a punch: I'm strong in many more fields!

At least according to official statements, Claude Opus 4.6 can handle financial analysis, research, and the Office suite with exceptional skill.

The official website directly states:

On GDPval-AA (a benchmark evaluating economically valuable knowledge-work tasks in finance, law, and other fields), Opus 4.6 outperforms the industry's next-best model, OpenAI GPT-5.2, by 144 Elo points.

(This means Claude Opus 4.6 scores higher than GPT-5.2 in approximately 70% of cases in this evaluation, with 50% meaning equivalent scores.)
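That roughly-70% figure is just the standard Elo expected-score formula applied to a 144-point gap, assuming GDPval-AA uses conventional Elo scaling (base 10, divisor 400):

```python
# Standard Elo expected score: P(win) = 1 / (1 + 10^(-diff / 400)).
# Assumes GDPval-AA uses the conventional base-10, divisor-400 scaling.
def elo_win_probability(diff: float) -> float:
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))

print(round(elo_win_probability(144), 2))  # 0.7: a ~70% win rate
print(elo_win_probability(0))              # 0.5: evenly matched
```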

Of course, in programming, it still reigns supreme.

It achieved the highest score in the Agent programming evaluation Terminal-Bench 2.0 and led all other frontier models in the "Humanity's Last Exam."

The good news is that you get more for the same price: Opus 4.6 keeps the original pricing of $5 per million input tokens and $25 per million output tokens.

(For easier reading, the new model will be referred to as Opus 4.6 from now on.)

Returning to the Peak with 1M Context and Adaptive Thinking

The most immediately visible advance in Opus 4.6 is its new 1M-token context window, the first time Claude has offered a window this long at the Opus tier.

This greatly alleviates the "context decay" that earlier Claude models suffered when processing long texts.

In the MRCR v2 8-needle 1M benchmark test—finding a needle in a haystack—Opus 4.6 scored 76%, while Claude Sonnet 4.5 only scored 18.5%.

Along with this comes an improvement in search capabilities.

In the BrowseComp evaluation (which measures the ability to retrieve hard-to-find information online), Opus 4.6 ranked first in the industry. It shows the strongest performance in deep, multi-step agentic search and can precisely locate key information scattered across long documents.

Opus 4.6 also introduces the Adaptive Thinking function.

Previously, developers using Claude models faced a binary choice: extended thinking mode was either on or off.

Now, Claude can judge for itself when deep reasoning is needed.

(To be honest, Anthropic is a step behind ChatGPT here; next time, ship features this good sooner.)

The accompanying effort parameter provides four levels of choice—low, medium, high, max—with high as the default. You can manually lower it if the model overthinks.
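As a rough sketch, a request payload selecting an effort level might look like the following. The `effort` field name, its placement at the top level, and the four level strings are taken from the article's description; they are assumptions, not a confirmed API shape:

```python
# Sketch of a Messages API payload with the effort control described above.
# "effort" and its values are assumptions based on the article, not
# confirmed API fields; check the official API reference before relying on it.
EFFORT_LEVELS = ("low", "medium", "high", "max")

def build_request(prompt: str, effort: str = "high") -> dict:
    """Build a request payload; "high" is the default effort level."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"effort must be one of {EFFORT_LEVELS}")
    return {
        "model": "claude-opus-4-6",
        "max_tokens": 1024,
        "effort": effort,  # lower this manually if the model overthinks
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_request("Summarize this 10-K filing.", effort="medium")
```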

Another practical feature is context compaction.

When a conversation approaches the context window limit, it automatically summarizes and replaces old content, making long conversations and Agent tasks easier.
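The idea can be sketched in a few lines: once the running conversation nears the window limit, the oldest messages are collapsed into a summary. The threshold, the number of messages kept verbatim, and the `summarize` stub are all illustrative, not Anthropic's actual implementation:

```python
# Minimal sketch of context compaction: near the window limit, replace the
# oldest messages with a summary. summarize() is a stub standing in for a
# model call; the limit and keep_recent values are illustrative only.
def summarize(messages: list[dict]) -> str:
    # Placeholder: in practice this would itself be a model call.
    return f"[summary of {len(messages)} earlier messages]"

def compact(messages: list[dict], token_count, limit: int = 200_000,
            keep_recent: int = 4) -> list[dict]:
    if token_count(messages) < limit or len(messages) <= keep_recent:
        return messages  # still comfortably inside the window
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = {"role": "user", "content": summarize(old)}
    return [summary] + recent
```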

Dominating Core Scenarios: Coding, Knowledge Work, Search, and Reasoning

The official blog shows that with the release of Opus 4.6, almost no model can compete with it.

In core scenarios like coding, knowledge work, search, and reasoning, Opus 4.6 has achieved significant breakthroughs.

Multiple evaluation scores surpass previous generations and industry competitors, such as:

After getting a general impression, let's break it down one by one.

First, programming capabilities.

Opus 4.6 achieved the highest score in Terminal-Bench 2.0.

Looking at the actual capabilities behind the scores, Opus 4.6 can plan tasks more meticulously, run stably in large codebases, and improve the precision of code review and debugging.

And it can autonomously discover its own errors.

Another point is that Opus 4.6 supports multi-language coding and can handle cross-language software engineering problems.

It can complete the migration of millions of lines of code like a senior engineer, and the time taken is genuinely halved.

As I write this, I can't help but wonder:

When engineers see this news, will they be so happy that they stop losing hair, or will they lose it even faster...(Lost in thought.jpg)

Secondly, Opus 4.6 is actively invading traditional office territory.

This time, it has taken a hard hit at the Office suite.

  • It can directly ingest messy, unstructured data in Excel, infer reasonable table structures on its own, and handle multiple complex steps in a single operation;

  • It can remember your company's PPT templates, including font and layout styles, ensuring the generated PPT has no AI flavor, making the boss think you stayed up late working hard on it.

In the Cowork environment, Opus 4.6 can run multiple tasks autonomously on the user's behalf, for example performing financial analysis while organizing research findings into documents.

It feels like Anthropic wants to pull Claude from the chat box into more spaces?

Third, let's talk about its progress in reasoning capabilities.

First, a summary:

Opus 4.6 is stronger in cross-domain reasoning.

In the multi-disciplinary complex reasoning test "Humanity's Last Exam," Opus led all frontier models.

In the legal field, Opus 4.6 scored 90.2% on BigLaw Bench, earning full marks on 40% of the questions.

In the GDPval-AA evaluation for economic value-oriented tasks in finance and law, Opus 4.6 surpassed "industry competitor" OpenAI GPT-5.2 by 144 Elo points.

Whether it's complex legal and financial expertise or tricky academic research, its reasoning and understanding depth have reached the pinnacle of current frontier models.

What's rare is that this leap in intelligence has not come at the cost of sacrificing safety.

In the automated behavior audits that Anthropic values most, Opus 4.6 has an extremely high alignment level, while negative behaviors like deception and sycophancy are extremely low.

Opus 4.6 has even made real headway on the widespread AI problem of "over-refusal"—

When faced with normal, harmless requests, it shows much less of that rigid refusal than any previous model.

Currently, Opus 4.6 is available on the official website, through the API, and on all major cloud platforms.

Pricing, as noted above, is unchanged.

However, in the 1M-token long-context beta, prompts exceeding 200K tokens incur an additional charge.

Key point!

If you want to use Opus 4.6, you need to explicitly pass the model identifier "claude-opus-4-6" when calling the API (Anthropic model IDs are lowercase).

More Jobs at Risk: 16 Agents Wrote a C Compiler in Two Weeks, and It Runs Doom

One of the core capability upgrades brought by Opus 4.6 is Agent Teams, meaning multiple Claude instances can work in parallel without real-time human supervision.

Nicholas Carlini, a researcher on Anthropic's security team, ran a stress test: have 16 Agents write, from scratch in Rust, a C compiler capable of compiling the Linux kernel.

Over two weeks, it took nearly 2,000 Claude Code sessions, consuming 2 billion input tokens and 140 million output tokens, at a total cost of under $20,000.
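Those numbers check out against the list prices quoted earlier ($5 per million input tokens, $25 per million output tokens), ignoring any cache or batch discounts:

```python
# Cross-check the quoted cost against Opus 4.6's list prices from the
# pricing section above; cache discounts and batching are ignored.
input_tokens = 2_000_000_000    # 2 billion input tokens
output_tokens = 140_000_000     # 140 million output tokens
cost = input_tokens / 1e6 * 5 + output_tokens / 1e6 * 25
print(cost)  # 13500.0, consistent with "under $20,000"
```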

The final output is a 100,000-line compiler that can compile Linux 6.9 on x86, ARM, and RISC-V architectures, and can also run Doom.

[Video: 量子位 (QbitAI), 00:42]
This parallel mechanism allows each Agent to run in an independent Docker container, sharing a single git repository.

To prevent multiple Agents from colliding and all rushing to solve the same problem, the system uses a simple locking mechanism.

Agents "claim" tasks by writing files to the current_tasks/ directory, and git's synchronization mechanism automatically handles conflicts. There is no dedicated communication protocol between Agents, nor an orchestrator; each Claude decides what to do next on its own.

Carlini wrote in his blog:

"When the Agents started compiling the Linux kernel, they got stuck for a while because it was a huge monolithic task, and all 16 Agents collided on the same bug, overwriting each other."

The fix was to use GCC as an "oracle" control: each Agent compiled only a random subset of the kernel with the new compiler and used binary search to locate the problematic file, which finally let the parallelism pay off.

500 Zero-Day Vulnerabilities, Dug Out of the Box

Opus 4.6's performance in the cybersecurity field surprised even Anthropic itself.

During pre-release testing, Anthropic's frontier red team dropped Opus 4.6 into a sandbox, gave it Python and standard vulnerability-analysis tools (a fuzzer, a debugger, and so on) with no specific instructions or domain knowledge, and let it hunt for vulnerabilities in open-source code on its own.

The result was that it dug out over 500 previously unknown high-risk zero-day vulnerabilities.

Each one was verified by Anthropic team members or external security researchers.

Specific cases include:

  • In GhostScript (a common tool for processing PDF and PostScript files), it found a crash-inducing vulnerability after traditional fuzzing and manual analysis had both failed to surface the problem; Claude dug it out on its own by reviewing the project's git commit history;

  • Buffer overflow vulnerabilities were found in OpenSC (a tool for processing smart card data) and CGIF (a tool for processing GIF files); in the CGIF case, Claude even proactively wrote a PoC (proof-of-concept code) to prove the vulnerability's existence.

Logan Graham, head of Anthropic's frontier red team, said he wouldn't be surprised if this becomes one of the main ways for future open-source software security audits.

However, Anthropic also acknowledges that this capability could be misused.

For this reason, the team added six new cybersecurity detection mechanisms, and a real-time interception system may be launched in the future to block malicious traffic.

One More Thing

The official website shows that Anthropic is now "using Claude to build Claude."

Its own engineers use Claude Code to write code every day, and each new model is first tested in its own work environment.

Reference links:
[1] https://www.anthropic.com/news/Claude-opus-4-6
[2] https://www.anthropic.com/engineering/building-c-compiler
[3] https://x.com/i/trending/2019496145987232014
[4] https://www.axios.com/2026/02/05/anthropic-Claude-opus-46-software-hunting
[5] https://techcrunch.com/2026/02/05/anthropic-releases-opus-4-6-with-new-agent-teams/
[6] https://www.reddit.com/r/singularity/comments/1qwrrn7/Claude_opus_46_is_out/


AINews · AI News Aggregation Platform
© 2026 AINews. All rights reserved.