9.7K Stars: Slashing AI Coding Token Consumption by 16x

Use Tree-sitter to build a graph of your codebase, track the "blast radius" of any changed file, and let the AI read only the affected files. On average, this cuts token consumption by 8.2x.

I've recently noticed a phenomenon: when using Claude Code or Cursor for code reviews, token consumption is often ridiculously high. For a tiny change involving just a few lines of code, these tools read through the entire project's files.

At first, I thought I was using them incorrectly. Then I realized—this is the default behavior of all current AI coding tools. They lack a concept of code structure; they don't know which files are related to your changes and which are completely irrelevant. So, the safest strategy for them is: read everything.

It's like fixing a kitchen faucet, and the plumber insists on re-examining the water pipe blueprints for the entire building. Safe? Yes. Reasonable? No.

This time, I'm sharing a project called code-review-graph (9,700 stars) that specifically solves this problem. I believe its core philosophy is even more worth discussing than its specific implementation because it reveals an overlooked fact: The bottleneck for AI coding tools isn't model capability; it's the quality of information input.

"Blast Radius"—The Soul of the Entire Project

The first thing code-review-graph does is use Tree-sitter to parse your codebase into a graph.

This isn't full-text indexing or vector embedding; it's a true structural graph: every function is a node, every class is a node. A function calling another creates an edge; class inheritance creates an edge. Import relationships and test coverage relationships are also edges. Your codebase becomes a network of "who depends on whom."

Then comes the key part—when you modify a file, the graph doesn't re-analyze all the code. Instead, starting from that file, it traces all nodes that could potentially be affected along the edges. These nodes constitute the "blast radius" of that change.

Changed a utility function? The graph tells you which 8 files call it, noting that 3 have test coverage while 5 do not. The AI only needs to read these 8 files and glance at the test coverage gaps to produce a high-quality review. What about the remaining hundreds of files? They are completely irrelevant to this change, so not a single token needs to be spent on them.
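The mechanics above can be sketched in a few lines of Python. This is not the project's actual code, and the node/edge shapes are simplified assumptions, but it shows the core idea: store dependency edges in reverse ("depended-on" pointing at "dependent"), then walk outward from the changed file.

```python
from collections import deque

# Toy reverse-dependency graph (hypothetical structure, not
# code-review-graph's real schema): each file maps to the files
# that depend on it, so a walk from a changed file reaches
# everything that could break.
reverse_deps = {
    "utils/format.py": ["api/users.py", "api/orders.py", "cli/report.py"],
    "api/users.py": ["tests/test_users.py"],
    "api/orders.py": [],
    "cli/report.py": [],
    "tests/test_users.py": [],
}

def blast_radius(changed_file: str) -> set[str]:
    """Collect every file reachable along reverse dependency edges."""
    affected, queue = set(), deque([changed_file])
    while queue:
        node = queue.popleft()
        for dependent in reverse_deps.get(node, []):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected

print(sorted(blast_radius("utils/format.py")))
# -> ['api/orders.py', 'api/users.py', 'cli/report.py', 'tests/test_users.py']
```

Everything outside that set is, by construction, irrelevant to the change and never reaches the model.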

To be honest, this approach isn't very "AI"—it's essentially static analysis, an old trick from compiler theory classes. But it's precisely this "using old tools to solve new problems" approach that impresses me. While everyone is obsessed with scaling models and engineering prompts, few step back to realize that the information fed to the model itself is problematic.

Real-World Data: 6 Open-Source Projects, Saving 8.2x on Average

These aren't made-up numbers. The project comes with its own benchmark suite, tested across 13 commits in 6 real open-source repositories (express, fastapi, flask, gin, httpx, nextjs):

| Project | Without Graph (Original Tokens) | With Graph (Graph Tokens) | Savings Factor |
|---------|--------------------------------:|--------------------------:|---------------:|
| flask   | 44,751 | 4,252 | 9.1x  |
| gin     | 21,972 | 1,153 | 16.4x |
| fastapi | 4,944  | 614   | 8.1x  |
| nextjs  | 9,882  | 1,249 | 8.0x  |
| httpx   | 12,044 | 1,728 | 6.9x  |
| express | 693    | 983   | 0.7x  |

For gin, it saved 16.4x—code that originally required reading 22,000 tokens now only needs 1,153.

The row for express actually shows an increase. The author admits this openly: for small changes in single-file projects, the graph's structural metadata (edges, node types, review guidelines) can outweigh the original file itself. That candid acknowledgment of the limitation makes me trust the rest of the dataset even more.

What impressed me even more was the recall of the impact analysis: **100%**. In these benchmarks, the "blast radius" traced by the graph never missed a truly affected file. Precision is only 0.38, meaning it over-reports some files, but in a code review scenario, missing a bug is far more serious than reading a few extra files. That trade-off is exactly the right call.
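The trade-off is easy to see with toy numbers (mine, not the project's benchmark data): if a change truly affects 3 files and the graph reports 8, with all 3 included, recall is perfect while precision takes the hit.

```python
# Hypothetical example: 3 truly affected files, 8 reported in total.
truly_affected = {"api/users.py", "api/orders.py", "cli/report.py"}
reported = truly_affected | {"docs/readme.md", "cli/main.py",
                             "api/auth.py", "tests/test_users.py",
                             "utils/paths.py"}

true_positives = len(truly_affected & reported)
recall = true_positives / len(truly_affected)   # missed nothing -> 1.0
precision = true_positives / len(reported)      # only 3 of 8 were relevant

print(f"recall={recall:.2f} precision={precision:.3f}")
# -> recall=1.00 precision=0.375
```

For a reviewer, the cost of the 5 extra files is a few thousand tokens; the cost of a missed file is a shipped bug.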

The Tech Stack Is Surprisingly Lightweight

I originally assumed a project like this would need at least a vector database, or a model pass to understand the code. It turns out the core tech stack is:

  • Tree-sitter: Parses ASTs, supporting 19 languages + Jupyter notebooks.
  • SQLite: The graph is stored in a file within a local .code-review-graph/ directory.
  • SHA-256 Hashing: Determines if a file has changed, re-parsing only the modified parts.
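The hash-based incremental update is straightforward to sketch with the standard library. This is a simplified assumption of the mechanism, not the project's actual code: hash every file, compare against the hashes stored from the previous run, and hand only the changed files back to the parser.

```python
import hashlib
from pathlib import Path

def file_sha256(path: Path) -> str:
    """Hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def changed_files(paths: list[Path], stored: dict[str, str]) -> list[Path]:
    """Return files whose current hash differs from the stored one."""
    return [p for p in paths if stored.get(str(p)) != file_sha256(p)]

# Usage sketch: write two files, record their hashes, edit one, detect it.
a, b = Path("a.py"), Path("b.py")
a.write_text("x = 1\n")
b.write_text("y = 2\n")
stored = {str(p): file_sha256(p) for p in (a, b)}

a.write_text("x = 42\n")  # simulate an edit
print([str(p) for p in changed_files([a, b], stored)])  # -> ['a.py']
```

Because unchanged files are skipped entirely, the cost of an update scales with the size of the edit, not the size of the repository, which is why incremental updates stay fast even on large projects.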

No GPU, no cloud services, no external databases. Initial build for a 500-file project takes ~10 seconds; subsequent incremental updates take less than 2 seconds. Even for a 2,900-file project, incremental updates remain under 2 seconds.

*[Diagram: code-review-graph architecture]*

It connects to AI tools via MCP (Model Context Protocol) and currently supports Claude Code, Cursor, Windsurf, Codex, Zed, and others. Installation takes just a few commands:

```shell
git clone https://github.com/tirth8205/code-review-graph.git
cd code-review-graph
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
```
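Wiring it into an MCP-capable client then means registering the server in the client's MCP config. As a hedged sketch only — the actual entry-point module name (`code_review_graph.mcp_server` below) is my assumption, not taken from the project's docs — a Claude Code `.mcp.json` entry would look roughly like:

```json
{
  "mcpServers": {
    "code-review-graph": {
      "command": "python",
      "args": ["-m", "code_review_graph.mcp_server"]
    }
  }
}
```

Check the project's README for the exact command; the surrounding `mcpServers` structure is the standard shape Claude Code expects.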

It offers 22 MCP tools and 5 workflow templates (review, architecture analysis, debug, onboarding for newcomers, pre-merge checks). For a project with 9,700 stars, the feature coverage is already quite comprehensive.

An interesting detail: it can also perform community detection—using the Leiden algorithm to cluster code nodes, identifying which modules are highly coupled and which are isolated silos, then automatically generating an architecture overview and a Markdown wiki. It's no longer just a code review tool; it's more like a "health report generator" for your codebase.

A Bold Prediction

code-review-graph brings to mind a larger issue: Context management in all current AI coding tools is too crude.

Models are getting stronger and context windows larger, but no one has seriously asked what the model should see and what it shouldn't. Stuffing in the entire codebase wastes money (tokens aren't free) and lowers quality (the "lost in the middle" effect, where key information drowns in masses of irrelevant code).

The approach of code-review-graph is a signal: The next round of competition for AI coding tools may not be about model capability, but about context engineering. Whoever can feed information to the model more precisely will achieve better results.

Recommended Reading

Anthropic "Leak": How Powerful is the Strongest Claude Mythos Model?

Memento-Skills VS OpenClaw: Evolution Without Model Changes

What if Agents Forget? Anthropic Teaches Effective Handovers

Anthropic Uses GAN Concepts to Solve AI Output Quality Issues

100K Stars Counter-Intuition: The Bottleneck of AI Coding Was Never the Code Itself
