At 19, Dropping Out of the Ivy League: How This Group of Young Chinese Innovators is Reconstructing AI Memory

By Wen Le, from Aofeisi | QbitAI | Official Account: QbitAI

The aftershocks of the Claude Code source code leak continue to ripple through the AI community!

It's quite paradoxical: while Claude has contributed to nearly all RAG-based memory projects, the leaked code reveals something surprising—

It doesn't actually use mainstream RAG technology itself?

Diagram showing Claude's memory architecture discrepancy

This is indeed contradictory. Anthropic's official documentation and technical blogs have consistently stated support for RAG retrieval.

Screenshot of Anthropic's official documentation mentioning RAG

Their "abandonment" of traditional RAG approaches actually highlights a critical issue: existing RAG solutions simply aren't meeting performance standards.

Since 2023, hybrid retrieval has become the standard logic for memory engines—vector plus keyword matching, weighted ranking... these patterns have continuously iterated.

However, as AI memory scenarios grow increasingly complex, the bottlenecks of traditional RAG have become fully exposed. Despite being called "memory engines," they're still performing the work of search engines—merely matching similar text without achieving true understanding, let alone associative reasoning.

So what's the solution? The answer is simple—

Tear it down and start over.

Looking back at the evolution of AI memory, the trajectory is remarkably clear:

The first generation forcibly stuffed in full context, like reading an entire diary; the second generation relied on vector plus keyword matching, similar to looking up words in a dictionary—but could only find similar content, failing to capture genuine connections.

Now, the third generation of memory models has arrived.

These are cognitive models capable of autonomous association, reasoning, and establishing cross-structural connections.

Animation showing evolution from first to third generation AI memory

Chinese Team's Proprietary Architecture Leads Benchmarks

To enable AI to perform reasoning and association, everyone agrees that effective organization of multi-granularity memory is key.

Simply put, this means allowing AI to process fine-grained facts and coarse-grained context simultaneously, while switching freely between them by association.

Yet this very issue has been the core bottleneck that the entire memory-engine industry struggled to break through between 2023 and 2026.

Recently, however, a Chinese team with an average age of 19, Flow Element, has offered a viable solution—

M-FLOW, leveraging its proprietary graph-routing Bundle Search architecture, has achieved phenomenal leadership in benchmarks.

Compared to mainstream methods like Mem0, Graphiti, and Cognee, M-FLOW demonstrates significant performance advantages across three core scenarios: multi-turn conversations, long-term memory, and multi-hop reasoning.

  • On Mem0's official benchmark (LoCoMo), M-FLOW leads Mem0 by 36%;
  • On Graphiti's official benchmark (LongMemEval), M-FLOW leads Graphiti by 16%;
  • In long-term event-evolution testing (EvolvingEvents), M-FLOW leads Cognee by 7% and Graphiti by 20%.
Benchmark comparison chart showing M-FLOW's performance
(Tests conducted without any filtering, using industry-standard benchmarks)

An in-depth evaluation makes it even clearer: across 29 capability dimensions covering writing, retrieval, preprocessing, knowledge organization, and more, M-FLOW achieves comprehensive support in the vast majority of critical dimensions.

Comprehensive capability comparison table across 29 dimensions

It particularly excels in core capabilities determining memory quality, such as graph-enhanced retrieval, referential resolution, and multi-granularity indexing.

Behind these achievements lies the systematic advantage brought by M-FLOW's architecture:

  • The retrieval process doesn't rely on LLMs, achieving millisecond-level response times;
  • It maintains stable performance close to standard benchmarks even under ultra-large memory volume scenarios;
  • It's the industry's first memory engine to support referential resolution, making AI's understanding of information more aligned with human thinking (referential resolution means working out who "he" or what "it" refers to in an event).
Animation demonstrating referential resolution capability

Moreover, there's virtually no barrier to entry—the deployment process is extremely simple, requiring just one line of code when a Docker environment is available.

Screenshot of one-line Docker deployment command

Of course, while it's easy to get started, let's first address the question everyone is curious about:

How did M-FLOW achieve this?

The answer actually returns to that opening statement: Tear it down and start over.

Unlike the many homogeneous memory solutions prevalent in the industry today, M-FLOW isn't using LLMs to assist retrieval for higher benchmark scores, nor is it simply stacking features.

Accurately speaking, it fundamentally reconstructs the organization and utilization system of AI memory.

Enabling Memory to Associate and Reason

In fact, all RAG systems face one common problem: given a user query, how to precisely locate the relevant stored knowledge?

The logic of mainstream approaches is straightforward: chunk documents, vectorize them, store in a vector database, and rank by cosine similarity during retrieval.
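The mainstream pipeline just described can be sketched in a few lines of Python. The toy 3-d vectors stand in for real embeddings, and `cosine_rank` is an illustrative helper, not any library's API:

```python
import numpy as np

def cosine_rank(query_vec, chunk_vecs):
    """Rank stored chunks by cosine similarity to the query (flat retrieval)."""
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    sims = c @ q                      # cosine similarity per chunk
    return np.argsort(-sims)          # best match first

# Toy 3-d "embeddings" for three chunks and a query.
chunks = np.array([[1.0, 0.1, 0.0],   # chunk 0: close to the query
                   [0.0, 1.0, 0.0],   # chunk 1: unrelated topic
                   [0.5, 0.5, 0.5]])  # chunk 2: mixed content
query = np.array([1.0, 0.0, 0.0])

print(cosine_rank(query, chunks))     # → [0 2 1]: chunk 0 ranks first
```

The entire notion of "relevance" here is a single similarity number per chunk, which is exactly the limitation the next paragraphs dissect.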

This approach essentially answers only one level of question: "Which text segment is semantically closest to the query?" It works well for simple fact-finding but completely fails in complex scenarios because:

  • Answers are distributed across documents: Chunked documents lack structural connections, unable to integrate related information scattered across different documents;
  • Mismatch between query and storage granularity: Macro-level questions retrieve trivial fragments, while micro-level questions match vague summaries;
  • Same entity, different context separation: When two documents discuss the same entity but in different contexts, they remain distant in vector space, unable to establish connections.

The root cause is that flat vector retrieval discards the intrinsic structure of knowledge.

It can determine similarity between text and query, but has no understanding of where that text sits topologically within the entire knowledge system.

On this point, M-FLOW replaces traditional flat retrieval with graph-routing retrieval, with core logic revolving around hierarchical knowledge topology. Its key insight is:

Not just finding "matching text," but locating the complete knowledge structure to which the matching point belongs, then scoring the entire structure.

Inverted Cone Structure Design

M-FLOW organizes all ingested knowledge into a four-layer directed graph, forming an inverted cone:

Diagram of M-FLOW's inverted cone four-layer structure
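For illustration, here is one way such a four-layer directed graph might be represented. The layer names follow those used in this article (Entity, FacetPoint, Facet, Episode); the field names and sample data are assumptions, not M-FLOW's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Node:
    id: str
    layer: str   # "entity" | "facet_point" | "facet" | "episode" (cone tip -> base)
    text: str    # natural-language content, vectorized for search

@dataclass
class Edge:
    src: str
    dst: str
    text: str    # edges also carry searchable descriptive text

# Hypothetical knowledge from one ingested document.
nodes = {
    "e1":  Node("e1",  "entity",      "MIT"),
    "fp1": Node("fp1", "facet_point", "P99 latency target is below 500ms"),
    "f1":  Node("f1",  "facet",       "performance requirements"),
    "ep1": Node("ep1", "episode",     "database migration planning meeting"),
}
# Directed edges flow from the cone tip down toward the Episode base.
edges = [Edge("fp1", "f1",  "asserted as part of"),
         Edge("f1",  "ep1", "dimension of the event"),
         Edge("e1",  "ep1", "organization involved in the event")]
```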

This structure's directionality is counter-intuitive: in traditional knowledge graphs or classification trees, things become more specific as you go deeper.

But in M-FLOW, the search entry point is at the cone's tip (fine-grained Entity and FacetPoint nodes are the easiest for vector search to hit precisely), while the search target is at the cone's base (the Episode is the final knowledge unit returned to users).

Information flow converges from sharp matching points downward to broad semantic landing points.

This breaks the traditional retrieval paradigm of "browsing from top to bottom."

Users aren't narrowing scope layer by layer in a hierarchy; instead, the system captures signals at the sharpest points, then propagates downward along the graph structure to the complete semantic unit to which they belong.

Animation showing information flow from cone tip to base

This is, in short, a retrieval process that runs from fine to coarse.

How Graph-Routing Bundle Search Works

When a query arrives, the system doesn't simply find the nearest node.

It evaluates every path in the graph that can reach each Episode, then selects the best-scoring Episodes.

Phase One: Casting a Wide Net at the Cone Tip

After the query is vectorized, it is searched simultaneously across seven vector sets covering every layer from cone tip to cone base. Each set returns up to 100 candidates.

The nodes most easily hit with precision are at the cone tip—an Entity name or a FacetPoint assertion.

These fine-grained anchors have extremely focused semantics and small vector distances.

Episode summaries at the cone base might also be hit, but because their semantics are broader, matches are typically less precise than at the cone tip.

Phase Two: Projection into the Graph

These anchors serve as entry nodes into the knowledge graph.

The system extracts their surrounding subgraph—edges, neighbors, connectivity relationships—then expands one-hop neighbors.

This transforms a set of isolated vector hit points into a connected topological structure.
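Phase Two can be sketched as a plain one-hop expansion over an adjacency list. The node IDs and data structures below are illustrative, not M-FLOW's internals:

```python
from collections import defaultdict

def one_hop_subgraph(anchors, edge_list):
    """Expand vector-hit anchor nodes into a connected subgraph (one hop out)."""
    adj = defaultdict(set)
    for src, dst in edge_list:       # treat edges as undirected for expansion
        adj[src].add(dst)
        adj[dst].add(src)
    nodes = set(anchors)
    for a in anchors:
        nodes |= adj[a]              # pull in each anchor's neighbors
    sub_edges = [(s, d) for s, d in edge_list if s in nodes and d in nodes]
    return nodes, sub_edges

anchors = {"fp1", "e1"}              # fine-grained nodes hit by vector search
edge_list = [("fp1", "f1"), ("f1", "ep1"), ("e1", "ep1")]
nodes, sub = one_hop_subgraph(anchors, edge_list)
print(sorted(nodes))                 # → ['e1', 'ep1', 'f1', 'fp1']
```

Two isolated vector hits (`fp1`, `e1`) become one connected structure that already contains the Episode they both lead to.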

Phase Three: Propagating Cost from Cone Tip to Base

This is the core step, and the essence of graph-routing Bundle Search—

Capture signals at the cone tip, propagate along graph edges to the base, converging scoring at Episodes.

For each Episode in the subgraph, the system evaluates all possible paths from anchors to reach it:

Diagram showing path cost calculation in graph routing

The cost of each path consists of three parts:

  • Starting cost: vector distance of the anchor (sharpness of the signal);
  • Edge cost: vector distance of each edge along the path (relevance of connection relationship to query) plus jump penalty;
  • Miss penalty: default high cost when an edge isn't hit by vector search.

The final score of an Episode is the minimum cost among all paths.
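Under this three-part cost model, Episode scoring can be sketched as a minimum over path costs. The penalty constants and distances below are made-up illustrative values, not M-FLOW's actual parameters:

```python
JUMP_PENALTY = 0.1   # fixed cost per hop (illustrative value)
MISS_PENALTY = 0.8   # default cost for an edge not hit by vector search

def path_cost(anchor_dist, edge_dists):
    """Starting cost + per-edge cost + jump penalty; missed edges pay a default."""
    cost = anchor_dist
    for d in edge_dists:
        cost += (d if d is not None else MISS_PENALTY) + JUMP_PENALTY
    return cost

def episode_score(paths):
    """An Episode's final score is the minimum cost among all paths reaching it."""
    return min(path_cost(anchor, edges) for anchor, edges in paths)

# Two paths to the same Episode:
#  1. a sharp FacetPoint anchor (dist 0.05) via two query-relevant edges
#  2. a vaguer anchor (dist 0.4) via one edge missed by vector search (None)
paths = [(0.05, [0.1, 0.15]),
         (0.4, [None])]
print(round(episode_score(paths), 2))   # → 0.5 (the sharp path wins)
```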

Three Unconventional Design Choices

1. Edges Also Carry Semantics, Becoming Active Filters

In traditional knowledge graphs, edges (the connections between nodes) serve only as type labels, such as 'works_at' or 'located_in', and don't participate in semantic retrieval.

When querying a graph, you either traverse edges or ignore them, because edges themselves don't carry searchable semantics.

In M-FLOW, however, every edge carries natural language descriptive text. These texts are vectorized and also participate in search.

This means edges are no longer passive connectors but active semantic filters.

Animation showing edges carrying semantic meaning

During the cost propagation phase, the system not only knows a connection exists between two nodes but also understands how relevant that connection relationship itself is to the current query.

Thus, even if both nodes of an edge are hit by search, as long as the edge's own semantics are irrelevant to the query, it will be judged as high-cost, directly cutting off this unreasonable association path.

2. Taking Minimum Path Cost, Not Average Cost

Why take the minimum? The team's choice reflects a retrieval philosophy: a single strong evidence chain is sufficient to prove relevance.

An Episode might associate with 10 Facets, but 9 are irrelevant to the query.

Traditional methods would average all path costs, letting irrelevant paths inflate the score;

M-FLOW only looks at the best path.

As long as one Facet connects to the query via a low-cost path, this Episode should be retrieved.

This also corresponds to how human memory works: for instance, when you recall something, it's usually because one clue was strong enough, not because all clues pointed to it.
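A toy calculation shows the difference: with nine irrelevant paths and one strong one, averaging buries the evidence while the minimum preserves it. The cost values are arbitrary illustrations:

```python
# Path costs from 10 Facets to one Episode: 9 irrelevant, 1 highly relevant.
costs = [2.0] * 9 + [0.2]

mean_score = sum(costs) / len(costs)   # averaging lets noise inflate the score
min_score = min(costs)                 # min keeps the strong evidence chain

print(round(mean_score, 2), min_score)   # → 1.82 0.2
```

Under averaging, the Episode looks weakly relevant; under the minimum, the one tight path correctly dominates.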

3. Penalizing Direct Hits, Preferring Precise Anchor Paths

This is the most counter-intuitive design: when a query directly matches an Episode summary, the system instead imposes additional penalty on this path.

The reason for penalizing the most direct hits is that they appear relevant to many queries.

An Episode summary about project management may be a decent vector match for any query mentioning "project" or "management."

But this match is broad and lacks focus—this actually reflects the root cause of retrieval noise in many RAG systems.

M-FLOW's design preference is to prioritize precise paths starting from the cone tip (FacetPoint, Entity).

Even if it takes a few more hops, it prioritizes these; direct Episode hits only win when no better alternative paths exist.

This ensures the precision of retrieval results—not broad summaries that touch on everything, but Episodes supported by concrete evidence chains.
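A minimal sketch of this preference, with an assumed flat penalty constant (the real mechanism and values are not published):

```python
DIRECT_HIT_PENALTY = 0.5   # extra cost on query -> Episode-summary matches (assumed)

def scored(path_cost, direct_hit):
    """Direct Episode-summary hits pay an extra penalty; anchor paths don't."""
    return path_cost + (DIRECT_HIT_PENALTY if direct_hit else 0.0)

# Precise question: a cone-tip path exists, so it wins despite extra hops.
print(min(scored(0.5, direct_hit=False),    # FacetPoint -> ... -> Episode
          scored(0.45, direct_hit=True)))   # → 0.5 (direct hit loses at 0.95)

# Macro question: no cone-tip path matched, so the penalized direct hit
# is still the best available result.
print(round(scored(0.6, direct_hit=True), 2))   # → 1.1
```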

Topological Reasoning

Why does this mechanism work? The fundamental advantage is that graph topology encodes knowledge-organization structure that vectors alone cannot capture.

Anchors can be found at multiple granularities. For example, when asking a macro question like "What happened during the database migration?", the system directly matches the Episode summary.

Although penalized for direct hits, since there's no more precise cone-tip path, this result still wins out.

For precise questions like "Is the P99 target below 500ms?", it strongly matches a FacetPoint, reaching the Episode via two hops from the cone tip—the extremely small starting distance makes the overall cost very low.

The system doesn't require manual granularity selection; the inverted cone topology automatically finds anchors at the most appropriate level.

Cross-document entity bridging: When "Dr. Zhang works at MIT" appears in Document A and "MIT published a quantum computing breakthrough" appears in Document B, both Episodes share the same Entity node: MIT.

When users query MIT, the cone tip hits this entity, and cost propagates downward simultaneously to both Episodes, thus retrieving associated results from two independent documents without requiring additional LLM reasoning—the graph structure itself completes the bridging.

Animation showing cross-document entity bridging via MIT
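The bridging step can be sketched as a shared-entity index. Node names follow the article's MIT example; the structures are illustrative, not M-FLOW's internals:

```python
from collections import defaultdict

# Each Episode links to the Entity nodes it mentions (illustrative data).
episode_entities = {
    "doc_a_ep": ["Dr. Zhang", "MIT"],                        # Document A
    "doc_b_ep": ["MIT", "quantum computing breakthrough"],   # Document B
}

# Invert to an entity -> episodes index: the shared cone-tip entry points.
entity_index = defaultdict(list)
for ep, ents in episode_entities.items():
    for ent in ents:
        entity_index[ent].append(ep)

# A query hitting the "MIT" entity reaches both Episodes in one step,
# bridging two independent documents without any LLM reasoning.
print(entity_index["MIT"])   # → ['doc_a_ep', 'doc_b_ep']
```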

Structural noise filtering: In traditional flat retrieval, many semantically similar but thematically irrelevant text fragments rank higher.

In Bundle Search, any fragment must trace back along edges to some Episode.

If edges along the way are semantically irrelevant to the query, path costs rise rapidly, naturally pushing irrelevant results downward.

The graph structure itself acts as a powerful semantic noise filter.

Cost propagation is reasoning: Every path in the graph is essentially a reasoning chain—

Query matches this fact → Fact belongs to this dimension → Dimension belongs to this event.

Path cost quantifies the tightness of this reasoning chain; the system completes lightweight multi-hop reasoning within 2–3 hops, with no LLM calls during the retrieval phase.

Adaptive Confidence

Not every layer of vector sets is equally reliable for every query.

The system calculates two metrics for each set: absolute matching strength and discriminability, then classifies sets into "node-type" and "edge-type," dynamically allocating weights based on confidence.

For example, in one query, if the Entity set's confidence is significantly higher than the Facet set, the system automatically increases the influence of Entity paths.

It doesn't use fixed weights but adjusts retrieval strategy in real-time based on which granularity's hits are more credible in the current search.
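A rough sketch of such confidence-based weighting, assuming simple definitions of strength (top-1 similarity) and discriminability (top-1 vs top-2 gap); the actual formulas aren't disclosed:

```python
def set_confidence(sims):
    """Illustrative confidence: absolute strength scaled by discriminability."""
    top = sorted(sims, reverse=True)
    strength = top[0]                               # best similarity in the set
    gap = top[0] - top[1] if len(top) > 1 else top[0]   # top-1 vs top-2 gap
    return strength * (1.0 + gap)

def dynamic_weights(set_sims):
    """Normalize per-set confidences into per-query weights."""
    conf = {name: set_confidence(s) for name, s in set_sims.items()}
    total = sum(conf.values())
    return {name: c / total for name, c in conf.items()}

# Entity hits are sharp and well separated; Facet hits are flat and vague.
weights = dynamic_weights({
    "entity": [0.92, 0.40, 0.35],
    "facet":  [0.55, 0.53, 0.52],
})
print(weights["entity"] > weights["facet"])   # → True
```

For this query, Entity paths would carry more influence; a different query could flip the balance.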

An Additional Adjustment Mechanism

There's also an additional adjustment mechanism: when a Facet's vector distance to the query is extremely small and highly aligned, the system significantly reduces edge cost and jump cost on this path.

The logic is intuitive: if a Facet already almost perfectly matches the query, then its connection to the Episode is basically reliable, no need for repeated verification through edge semantics.
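A minimal sketch of this shortcut, with an assumed threshold and discount (illustrative values only):

```python
SHORTCUT_THRESHOLD = 0.1   # what counts as a "near-perfect" Facet match (assumed)
DISCOUNT = 0.2             # fraction of edge/jump cost retained on shortcut paths

def adjusted_path_cost(facet_dist, edge_cost):
    """Near-perfect Facet matches pay heavily discounted edge and jump costs."""
    if facet_dist < SHORTCUT_THRESHOLD:
        edge_cost *= DISCOUNT
    return facet_dist + edge_cost

print(round(adjusted_path_cost(0.05, 0.5), 2))   # → 0.15 (shortcut applies)
print(round(adjusted_path_cost(0.30, 0.5), 2))   # → 0.8  (normal costing)
```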

Beyond this, the system also includes mechanisms like query preprocessing, parallel multi-mode scheduling, result pruning, and more...

So in summary, M-FLOW's retrieval isn't a simple stacking of vector search plus graph database; the graph itself is the retrieval mechanism.

Chinese Memory Engine: Late Starter, Early Leader?

In China, external memory hasn't received as much attention as it has abroad. The M-FLOW team nevertheless avoided homogeneous feature stacking, achieved a from-scratch domestic breakthrough in this field with world-leading performance, and has insisted on staying open source.

Many first-time users of memory engines share an intuitive confusion: isn't human recollection about finding relevant information? Why does AI memory always seem to be searching for textually similar information?

This most common question precisely hits the core crux of AI memory solutions.

From the first generation's brute-force full-context memory to the second generation's vector-plus-keyword retrieval memory, AI has remained stuck at textual form matching, far from true understanding and association.

M-FLOW reconstructs the underlying logic of AI memory using graph structures, solving the granularity and connection issues of memory graphs, enabling AI memory to leap from form-similarity matching to association and reasoning.

Notably, this project was independently developed by a team with an average age of 19, who dropped out of Ivy League institutions.

In the AI community, stories of teenage prodigies always attract significant attention. After this technical breakthrough, we also want to know:

How far can these young people go in the future...

Project address: https://github.com/FlowElement-ai/m_flow
Product website: https://m-flow.ai
Company website: https://flowelement.ai
