Forcing AI to Speak Like a Caveman! Claude 'Anti-Chatter' Plugin Goes Viral as Users Tire of AI Verbosity

Reported by Xinzhiyuan

[Editor's Note] A plugin that makes AI speak like a primitive human has exploded overnight on Hacker News, surpassing 20k stars. Its core is a simple, blunt prompt: remove articles, pleasantries, and all fluff, claiming to save 75% of output tokens. Its popularity proves that developers have had enough of AI's long-windedness.

Recently, a Claude Code plugin called "caveman" took Hacker News by storm.

GitHub star growth curve

Looking at the GitHub star growth curve, "JuliusBrussee/caveman" climbed slowly for a long time before suddenly spiking. In about half a day, stars jumped from a few dozen to 500, and it has now surged past 20k!

Star growth graph

The viral success of caveman's token-saving skill is a textbook case of community resonance: "AI yapping," a seemingly small pain point that has frustrated countless users, has been precisely targeted.

Some netizens have already called caveman the "best prompting technique of 2026," claiming it cuts out tokens wasted on polite fillers like "I'd be happy to help you."

User feedback screenshot

The plugin's function is simple: it makes the AI agent speak like a caveman.

Caveman mode example

It deletes "the," "please," "thank you," and all "human pleasantries" that do not affect technical meaning but continuously consume tokens.

Before and after comparison

https://github.com/JuliusBrussee/caveman

Created by developer Julius Brussee, the project's README poses a direct question: Why use so many tokens to say something that could be explained with a few?

README screenshot

This is a skill/plugin compatible with both "Claude Code" and "Codex." Its core logic is to make the agent speak like a "primitive man," compressing output to the extreme without sacrificing technical accuracy, claiming a token reduction of approximately 75%.

Token reduction claim

The question arises: does removing articles and politeness really save users 75% of their costs?


Analyzing SKILL.md: Netizens Shocked by Its Simplicity

How does caveman actually "save" tokens? Opening the core file, SKILL.md, reveals the content is quite short.

SKILL.md content

https://raw.githubusercontent.com/JuliusBrussee/caveman/main/skills/caveman/SKILL.md

The file frontmatter defines it as an "Ultra-compressed communication mode." It specifies that the goal is to push token usage lower by speaking like a caveman while maintaining technical accuracy.

It is activated when a user says "caveman mode," "talk like caveman," "use caveman," "less tokens," "be brief," or uses the /caveman command. It can also trigger automatically when higher token efficiency is requested.

The rules for saving tokens are blunt: no articles, no fluff, no politeness. Technical terms and code blocks are preserved; everything else is cut. Specifically:

  • Delete: Articles, filler words, pleasantries, and hedging expressions.
  • Allow: Short sentences and fragmented phrasing.
  • Prefer: Shorter synonyms (e.g., "big" instead of "massive," "fix" instead of "implement a solution").
  • Preserve: Exact technical terms, code blocks, and original error messages.
  • Recommended structure: [Problem][Action][Reason]. [Next Step].

For example, instead of writing: "Certainly! I'd be happy to help. The issue you're encountering is likely caused by...", it would write: "Bug in auth middleware. Token expiry check used <, not <=. Fix here:"

It supports three intensity levels: lite, full (default), and ultra.

  • lite: Removes fillers and hedging. Maintains full sentences and professional feel. Concise.
  • full: Further compresses expression. Omits some function words, allows fragments, and uses short words. Typical caveman style.
  • ultra: Heavy use of abbreviations (DB, auth, config, req, res, fn, impl). Removes conjunctions. Uses arrows for causality (X→Y). Uses one word instead of two wherever possible.

Example comparison:

  • lite: "The connection pool reuses open database connections instead of creating a new one for each request, avoiding repeated handshake overhead."
  • full: "Connection pool reuses open DB connections. Not new every request. Saves handshake overhead."
  • ultra: "Conn pool=reuse DB conn. Skip handshake→high concurrency faster."
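The compression at each level can be roughly quantified. The sketch below is an illustration, not the plugin's own benchmark: it uses a naive whitespace word count as a crude proxy for tokens (a real tokenizer would give different numbers) to compare the three example sentences above:

```python
# Rough illustration of caveman's intensity levels, using whitespace word
# count as a crude proxy for tokens (real tokenizers will differ).
lite = ("The connection pool reuses open database connections instead of "
        "creating a new one for each request, avoiding repeated handshake overhead.")
full = ("Connection pool reuses open DB connections. Not new every request. "
        "Saves handshake overhead.")
ultra = "Conn pool=reuse DB conn. Skip handshake\u2192high concurrency faster."

def words(s: str) -> int:
    """Naive token proxy: count whitespace-separated words."""
    return len(s.split())

for name, text in [("lite", lite), ("full", full), ("ultra", ultra)]:
    saved = 1 - words(text) / words(lite)
    print(f"{name}: {words(text)} words, {saved:.0%} shorter than lite")
```

Even by this crude measure, each step down cuts the sentence substantially, which is consistent with the README's claim that savings vary widely by content rather than being a fixed percentage.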

Exceptions are made for security warnings, irreversible operation confirmations, multi-step processes, or when the user is clearly confused. This is explicitly noted in SKILL.md.

There are no model architecture changes or inference-level compressions. Caveman is essentially a well-crafted system prompt that constrains the AI's output style.

Crucially, Julius Brussee clarified on Hacker News that this skill does not target hidden reasoning tokens or thinking tokens.

Clarification screenshot

The model's internal "thinking" process isn't shortened; it only compresses the final output. Since loading the skill itself consumes context budget, the end-to-end cost savings may not equal the "75%" mentioned in the README.
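A back-of-envelope calculation makes this concrete. All numbers below are assumptions for illustration (the token counts, and the simplification that reasoning tokens are billed at the same rate as output, are not from the repository):

```python
# Illustrative cost model: caveman compresses only the visible answer, not
# hidden reasoning tokens, and loading the skill adds input-context overhead.
# All numbers are assumed for illustration.
visible_out = 1000      # visible answer tokens without caveman
reasoning = 3000        # hidden reasoning tokens (often the bulk of the bill)
skill_overhead = 500    # extra context tokens to load the skill's instructions

compressed_out = visible_out * 0.25          # README's claimed 75% cut
baseline_billed = visible_out + reasoning
caveman_billed = compressed_out + reasoning + skill_overhead

savings = 1 - caveman_billed / baseline_billed
print(f"end-to-end savings: {savings:.1%}")  # far below the headline 75%
```

Under these assumed numbers, a 75% cut to visible output shrinks the total bill by only a single-digit percentage, because the untouched reasoning tokens dominate and the skill itself costs context to load.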


Is the 75% Claim Reliable?

The author provided benchmark scripts and token comparisons in the README, showing reductions ranging from 22% to 87%, averaging 65%. However, it is difficult for outsiders to fully replicate every result based on the current public repository.

Benchmark table

The author noted on HN that these were preliminary tests, not rigorous benchmarks. However, the question of whether brevity harms performance has been studied academically.

Paper reference

https://arxiv.org/pdf/2401.05618

A 2024 paper, "The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models," showed that when models were asked to use concise reasoning, average response lengths for GPT-3.5 and GPT-4 dropped by 48.70% with almost no significant drop in overall problem-solving ability, though GPT-3.5's math performance dropped by 27.69%.

A 2026 paper, "Brevity Constraints Reverse Performance Hierarchies in Language Models," further pointed out that adding brevity constraints to large models could increase accuracy by 26 percentage points on some benchmarks and even change the performance ranking between different model sizes.

Paper reference

https://arxiv.org/pdf/2604.00025

These papers provide background suggesting that brevity does not necessarily harm performance, but they study general prompting strategies, not the "caveman" repository specifically.


The Claude Code Plugin Ecosystem is Taking Off

Another reason for caveman's success is that Anthropic has provided a relatively complete skill and plugin mechanism for Claude Code.

Claude Code docs

https://code.claude.com/docs/en/skills

According to official documentation, developers only need to create a SKILL.md file for Claude to recognize it as a skill. The description determines when it loads automatically, and the name becomes a triggerable slash command.
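Based on that mechanism, a minimal skill can be sketched as a single SKILL.md file. The field names (`name`, `description`) follow the official docs; the skill name and body text here are a hypothetical example, not caveman's actual file:

```markdown
---
name: terse-mode
description: Ultra-brief replies. Use when the user asks for fewer tokens or shorter answers.
---

Answer in as few words as possible. Drop articles, pleasantries, and hedging.
Keep exact technical terms, code blocks, and error messages unchanged.
```

The `description` is what lets Claude decide to load the skill automatically, and the `name` doubles as the slash command, so a thoughtful one-line description does most of the work.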

Skill structure

The caveman repository includes directories like .claude-plugin and skills/caveman, showing it is a packaged extension following Claude Code's architecture, not just a few prompt snippets.

File structure

This is reminiscent of the early VS Code extension ecosystem: lightweight, almost joking extensions appear first, eventually growing into serious, specialized workflow tools.


Developers are Tired of AI Verbosity

Is caveman actually useful? As a pure cost-saving tool, it warrants caution: it compresses visible text, not the hidden reasoning tokens, which often drive the bulk of the cost. Real token optimization lies in model tiering, context window management, prompt engineering, and caching strategies.

However, caveman is a signal. When a developer turns "stopping AI from yapping" into a plugin that goes viral, the focus shifts. It shows that AI verbosity is no longer a tolerable quirk, but a problem severe enough that users are taking matters into their own hands.

Developers are emotionally exhausted. Community forums are filled with complaints:

  • "I just need two lines of regex, and it gives me five paragraphs of regex history essays."
  • "Please stop saying 'Certainly! Here is the...' Just give me the error or the code."

On Hacker News, these complaints are tied to cost:

  • "I'm paying $15 per million tokens to read the AI's apologies and small talk."
  • "Because I wanted to change one punctuation mark, it output the entire 800-line file again. Watching my API balance drop is making me go broke."

When users prefer an AI that speaks like a "caveman" over paying for redundant output, the major AI companies should reflect. Why hasn't "restraint" become a core capability? Instead of focusing solely on the compute business, they must consider why users are increasingly unable to tolerate unnecessary output.
