Developers Outraged! Disabling Claude Code Telemetry Slashes Cache from 1 Hour to 5 Minutes—Is Anthropic Charging a 'Privacy Tax'?

Introduction
A single GitHub issue has sent shockwaves through the developer community: users discovered that disabling telemetry in Claude Code quietly drops the prompt cache TTL (time to live) from its usual 1 hour to 5 minutes. One tweet about it drew roughly 340,000 views and 4,500 likes, and developers feel betrayed, claiming that anyone who wants privacy must pay a 12-fold price. An Anthropic engineer later acknowledged the behavior, insisting it was not a "punishment" but a side effect of the coupling between experiment gates and default values. The anger, however, continues to spread.


One Switch, Quietly Changing Your Costs

The story begins with a single environment variable.

While using Claude Code, some developers habitually set DISABLE_TELEMETRY=1, a routine step for many privacy-conscious engineers.

But they soon noticed their bills were changing.

It wasn't a dramatic price hike but a subtler shift: with the same workflow and the same codebase, sessions were slower to resume after interruptions, quota drained faster, and long sessions became harder to sustain.

Eventually, someone started digging into the local JSONL logs.

Claude Code records details of every API call in JSONL files under the ~/.claude/projects/ directory. Two fields stood out:

  • usage.cache_creation.ephemeral_1h_input_tokens
  • usage.cache_creation.ephemeral_5m_input_tokens
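
The two counters above can be totalled with a few lines of Python. This is a minimal sketch, not the script the community used: it assumes the logs are `.jsonl` files with one JSON object per line, and because the article does not say where the `usage` object sits inside each record, the hypothetical helper `find_usage` simply searches each record for it. It does not deduplicate records, so repeated entries for the same call would be counted more than once.

```python
import json
from pathlib import Path

FIELDS = ("ephemeral_1h_input_tokens", "ephemeral_5m_input_tokens")


def find_usage(node):
    """Yield every dict stored under a "usage" key, at any nesting depth."""
    if isinstance(node, dict):
        usage = node.get("usage")
        if isinstance(usage, dict):
            yield usage
        for value in node.values():
            yield from find_usage(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_usage(item)


def sum_cache_tokens(log_dir):
    """Total the 1h and 5m cache-creation tokens across all JSONL files."""
    totals = dict.fromkeys(FIELDS, 0)
    for path in Path(log_dir).expanduser().rglob("*.jsonl"):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip truncated or partially written lines
                for usage in find_usage(record):
                    creation = usage.get("cache_creation")
                    if not isinstance(creation, dict):
                        continue
                    for field in FIELDS:
                        value = creation.get(field, 0)
                        if isinstance(value, int):
                            totals[field] += value
    return totals
```

Calling `sum_cache_tokens("~/.claude/projects")` returns a dict with one total per field; a non-zero 5m total alongside a zero 1h total is the pattern the reports describe.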

A comparison revealed the truth.

▲ Can Vardar (343k views, 4,500 likes): "Is Claude Code actually punishing you for turning off telemetry??"


The Data Doesn't Lie: Baseline vs. Telemetry-Off

Community members wrote Python scripts to reproduce the issue. The method was simple: run three different configurations and read the token fields in the JSONL logs.

The results were as follows:

baseline: ttl=60m 1h=8215 5m=0
disable_telemetry: ttl=5m 1h=0 5m=8094
disable_nonessential_traffic: ttl=5m 1h=0 5m=8099
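
Result lines in this style can be produced from per-configuration token totals by a small formatter. This is a sketch under the assumption that the `ttl` label simply reflects which counter is non-zero; the function name `summarize` and the `mixed` and `none` labels are hypothetical, and the original scripts may have derived the label differently.

```python
def summarize(label, tokens_1h, tokens_5m):
    """Format one result line as 'label: ttl=... 1h=... 5m=...'."""
    if tokens_1h and tokens_5m:
        ttl = "mixed"   # both buckets were written during the run
    elif tokens_1h:
        ttl = "60m"
    elif tokens_5m:
        ttl = "5m"
    else:
        ttl = "none"    # no cache writes were recorded
    return f"{label}: ttl={ttl} 1h={tokens_1h} 5m={tokens_5m}"


# Token totals taken from the three runs reported above.
runs = [
    ("baseline", 8215, 0),
    ("disable_telemetry", 0, 8094),
    ("disable_nonessential_traffic", 0, 8099),
]
for run in runs:
    print(summarize(*run))
```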

Same Claude Code, same model, same snippet of code. The only difference was the telemetry switch.

The 1h token field dropped to zero, while the 5m token field jumped from 0 to 8094.

This wasn't a feeling; it was hard data readable from local files.

Subsequently, independent reproductions appeared across platforms:

  • macOS + Claude Code 2.1.104 + Max plan: Reproduced
  • macOS + Opus 4.5: Reproduced
  • Windows/WSL2 + Max plan: After removing DISABLE_TELEMETRY=1, new sessions immediately switched back to 1h

▲ Carlos Villuendas (245k views, 2,700 likes): "Disable telemetry, and Anthropic changes your cache from 1 hour to 5 minutes. Your privacy comes at a 12x cost."


GitHub Issue #45381: This is a Bug Report, Not Metaphysics

The real tipping point was GitHub issue #45381.

The title was blunt:

[BUG] Disabling telemetry also disables 1-hour prompt cache TTL

▲ anthropics/claude-code#45381: Reproduced by multiple developers, then closed by Anthropic staff with a promise of a fix.

The value of this issue wasn't just that someone complained, but that it upgraded the situation from "I feel it's more expensive" to:

A real, cross-platform behavior observable by multiple people and independently verifiable via local JSONL fields.


Anthropic Engineers Step In, But the Answer is Unsettling

After the issue was filed, Anthropic engineer bcherny appeared.

His explanation can be summarized as follows:

"The 1-hour prompt cache is subtle; not all requests should have a 1h TTL. We've been testing different heuristic strategies for subscribers to improve cache hit rates, reduce average token usage, and lower latency. These strategies are controlled via experiment gates, and the gate results are cached on the client. When you turn off telemetry, we also turn off experiment gates because that means 'no calling home.' Once the gate is closed, Claude Code reverts to the default—and that default is 5m."

He added:

"Fix going out in the next release!"

▲ Anthropic engineer bcherny acknowledging the behavior in the issue, explaining the coupling between experiment gates and telemetry, and promising a fix in the next release.

Wait.

So the official stance is:

  1. We are running experiments; the results determine if you get 1h or 5m.
  2. Experiments require telemetry to run.
  3. You turned off telemetry, so the experimental gate closed.
  4. Gate closed, you get the 5m default.
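
That chain of cause and effect can be made concrete with a toy model. The sketch below is purely illustrative and is not Anthropic's code: the names `resolve_cache_ttl`, `fetch_gate`, and `prompt_cache_1h` are hypothetical, and it models only the coupling bcherny described, where opting out of telemetry skips the gate lookup so the built-in default applies.

```python
DEFAULT_TTL = "5m"  # per bcherny, the default the client reverts to


def resolve_cache_ttl(env, fetch_gate):
    """Return the cache TTL a request would get in this toy model.

    env        -- mapping of environment variables
    fetch_gate -- hypothetical callable reporting whether a named gate is on
    """
    if env.get("DISABLE_TELEMETRY") == "1":
        # "No calling home": the gate is never consulted, so the default wins.
        return DEFAULT_TTL
    return "1h" if fetch_gate("prompt_cache_1h") else DEFAULT_TTL


def gate_on(name):
    """Stand-in for a server that has enabled every gate."""
    return True


print(resolve_cache_ttl({}, gate_on))                          # 1h
print(resolve_cache_ttl({"DISABLE_TELEMETRY": "1"}, gate_on))  # 5m
```

In this model the user never chose 5m; it is what is left once the experiment lookup is skipped.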

This explanation is technically consistent. However, it also implies that the 1h cache was never a guaranteed benefit, but a byproduct of an experiment from the start.


'Silent Rug Pull': It's Not Just About the 5 Minutes

Daniel Nguyen traced TTL distributions from January to April 2026 using 119,866 local API call records.

He found that from February to early March, 1h TTL was dominant. Around March 6-8, 5m began to significantly overtake it.
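
A trend like this can be checked by bucketing cache-creation tokens per day and looking at the 1h share. This is a minimal sketch, assuming each log record has already been reduced to a `(date, tokens_1h, tokens_5m)` tuple; the helper name and the sample rows are invented for illustration and are not Daniel Nguyen's data.

```python
from collections import defaultdict
from datetime import date


def daily_1h_share(rows):
    """Map each day to the fraction of cache-creation tokens written with a 1h TTL."""
    totals = defaultdict(lambda: [0, 0])  # day -> [tokens_1h, tokens_5m]
    for day, tokens_1h, tokens_5m in rows:
        totals[day][0] += tokens_1h
        totals[day][1] += tokens_5m
    shares = {}
    for day in sorted(totals):
        one_hour, five_min = totals[day]
        written = one_hour + five_min
        if written:  # skip days with no cache writes at all
            shares[day] = one_hour / written
    return shares


sample = [  # invented numbers, for illustration only
    (date(2026, 3, 1), 9000, 1000),
    (date(2026, 3, 1), 1000, 0),
    (date(2026, 3, 9), 500, 4500),
]
for day, share in daily_1h_share(sample).items():
    print(day.isoformat(), f"{share:.0%}")
```

A share that falls from near 100% to near 0% within a few days is the kind of shift the analysis describes.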

▲ Daniel Nguyen (129k views, 1,300 likes): Actual cost $78.99; it would have been $37.54 had the 1h TTL been maintained, an overpayment of $41.45, or 52.5% of the bill.
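
The figures in the caption are internally consistent, as a quick check confirms. The snippet only re-derives the overpayment and the percentage from the two dollar amounts quoted above; it says nothing about how those amounts were computed.

```python
from decimal import Decimal

actual = Decimal("78.99")        # what was actually paid
with_1h_ttl = Decimal("37.54")   # estimated cost had the 1h TTL been kept

overpaid = actual - with_1h_ttl
waste_share = overpaid / actual  # overpayment as a fraction of the actual bill

print(overpaid)                        # 41.45
print(round(waste_share * 100, 1))     # 52.5
```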

Sigrid Jin used the term: "silent rug pull."

▲ Sigrid Jin: "Quietly dropping Claude Code cache TTL from 1h to 5m is a crazy rug pull."

This phrase resonated with many.

Because what truly angered developers wasn't just "I spent more money," but rather:

This was discovered by developers digging through JSONL logs, not from a changelog or advance documentation.

You subscribed to the Max plan, thinking you knew what you bought. Then one day, someone tells you: actually, you've been part of an experiment, the results decided what you received, and you were never told.


Is the '12x' Cost Real? Official Response: Not That Exaggerated

The most viral claim on X was that "privacy costs 12x."

An Anthropic engineer responded directly:

"the token savings is nowhere near 12x"

From a technical perspective, this claim is indeed exaggerated. The difference between 1h and 5m depends on the specific workflow:

  • For power users with long sessions and frequent interruptions: The impact is significant.
  • For subagents or single short requests: There is almost no difference; in fact, 1h can be more expensive (because cache write costs are higher).
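
Both cases fall out of a simple cost model. The sketch below is a rough approximation, not Anthropic's billing logic: it assumes the multipliers from Anthropic's public API pricing (a 5m cache write at 1.25x the base input price, a 1h write at 2x, a cache read at 0.1x), that every cache hit refreshes the TTL, and that a miss rewrites the whole prefix. The function name and the sample gaps are hypothetical.

```python
WRITE_MULT = {5: 1.25, 60: 2.0}  # TTL in minutes -> cache write multiplier (assumed)
READ_MULT = 0.1                  # cache read multiplier (assumed)


def prefix_cost(prefix_tokens, gaps_min, ttl_min):
    """Cost of a cached prefix over one session, in base-input-token units.

    gaps_min lists the idle minutes before each follow-up request.
    """
    write = prefix_tokens * WRITE_MULT[ttl_min]
    read = prefix_tokens * READ_MULT
    cost = write  # the first request always writes the cache
    for gap in gaps_min:
        # A follow-up inside the TTL is a cheap read; otherwise the cache is rewritten.
        cost += read if gap <= ttl_min else write
    return cost


one_shot = []                # a single request, e.g. a short subagent call
interrupted = [10, 20, 30]   # long session with pauses longer than 5 minutes

for name, gaps in [("one_shot", one_shot), ("interrupted", interrupted)]:
    c5 = prefix_cost(1000, gaps, 5)
    c60 = prefix_cost(1000, gaps, 60)
    print(f"{name}: 5m={c5:.0f} 1h={c60:.0f}")
```

In this toy example the one-shot request is cheaper with a 5m TTL (1250 versus 2000 units), while the interrupted session costs 5000 units with 5m against 2300 with 1h, a gap of roughly 2.2x rather than 12x.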

But the fact that "12x" went viral proves one thing: developers have accumulated a significant amount of distrust regarding the transparency of this caching mechanism.

▲ The claim "privacy costs 12x token" spread rapidly on X.


The Bigger Picture: Claude Code's Cache Strategy is Always Shifting

Issue #46829 provided another clue.

Anthropic engineer Jarred-Sumner explained that changes around March 6 were "ongoing optimization work," not a regression bug. The team selects TTL based on request type because:

  • 1h writes are more expensive than 5m writes.
  • Many subagent requests have extremely short intervals and don't need 1h.
  • For these requests, 5m is actually cheaper.

▲ Anthropic engineer Jarred-Sumner: The March changes were active optimizations, not an accident.

This explanation is logically sound.

But it also means: Anthropic is dynamically adjusting your caching strategy without your knowledge.

For users paying per token, this might actually save money. But for Max subscribers, the issue isn't unit price, but rather: quota, experience, and the fundamental question of "what did I think I was buying?"


Conclusion: Issue Closed, Fix Promised

The good news is that #45381 has been closed, and Anthropic has promised a fix in the next release.

Specific directions include:

  • Changing client-side defaults for certain queries to 1h.
  • Providing environment variables in the future to allow users to force a specific TTL (1h or 5m).

The bad news is that this event exposed a problem deeper than a single bug:

As AI coding tools become increasingly "black box," and as caching strategies, experimental gates, and subscription benefits are dynamically adjusted behind the scenes, "don't trust, verify" is becoming the new norm of developer culture.

Some are already writing scripts to monitor their JSONL logs.

Some are already asking: "What else don't I know?"

This isn't a story about a 5-minute cache. This is a story about trust.

▲ The story continues to spread, with major accounts like Mario Nawfal picking it up.


— END —

AINews · AI news aggregation platform
© 2026 AINews. All rights reserved.