Reported by New Intelligence Element
Editors: Taozi, Hao Kun
[New Intelligence Element Briefing] AI finally has "permanent memory"! Today, the super memory system ASMR made a grand entrance, achieving a staggering 99% score on LongMemEval, the industry's most notoriously difficult AI memory exam. The entire internet is calling it insane.
Has the AI memory dilemma been completely solved?
Today, the Supermemory team exploded onto the scene, dropping a nuclear bomb on the world:
The super memory system "ASMR" has been unveiled, achieving a 99% accuracy rate on LongMemEval, the toughest exam in the field of AI memory.
Billions of agents globally require memory, and now, AI "amnesia" has almost been conquered.
Yes, you heard that right!
Smashing State-of-the-Art (SOTA) records with near-invincible ease, ASMR instantly topped today's trending charts on X.
It abandons traditional "vector databases" and discards embedding patterns, operating entirely within memory.
This time, ASMR employs a pipeline of "multi-agent parallel reasoning," with specific divisions of labor as follows:
Three "Observer Agents" read raw data in parallel, extracting information across six dimensions: personal info, preferences, timelines, etc.
When a user asks a question, three "Search Agents" are dispatched for active reasoning and retrieval.
Nowadays, the entire internet is flooded with comments of "This is insane."
Notably, ASMR will open-source its entire codebase in early April, officially kicking off the "Age of Discovery" for AI memory!
Overnight, AI Gained "Permanent Memory"
First, let's highlight the very first sentence of the blog post:
The memory problem for AI Agents may now be completely solved.
A few months ago, Supermemory released its inaugural research report, securing an 85% score on the LongMemEval-s test.
This score was already far ahead of all publicly available memory systems at the time.
Today, the birth of the super memory system "ASMR" (Agent Search and Memory Retrieval) has once again rewritten that record.
Its technical implementation is remarkably simple.
It requires no vector databases or embeddings, running entirely in memory.
This means it can be embedded into other systems, or even hardware like robots.
So, how exactly was ASMR built?
ASMR: Multiple Agents Working in Parallel
It is important to know that LongMemEval is currently one of the most rigorous long-term memory benchmark tests publicly available.
Many benchmarks only test simple retrieval within short contexts, but LongMemEval is different; it aims to simulate various chaotic situations in real production environments:
Dialogue histories exceeding 115,000 tokens, contradictory information, scattered events spanning multiple sessions, and complex questions requiring temporal reasoning.
Most memory systems perform poorly, often due to issues with "retrieval" rather than reasoning.
Even with high recall rates, if the retrieval process is accompanied by significant noise, LLMs still struggle to utilize this information.
The primary challenge lies in how to place only the correct information into the context window; an even more difficult task is—how to determine that a retrieved fact is outdated and has been superseded by an updated version.
Furthermore, while standard vector search works well in most cases, it falls short when handling high-density information and temporal details spanning multiple sessions. Semantic similarity matching cannot reliably distinguish whether a fact is "old information" or a "new correction."
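This failure mode is easy to see with a toy example (this is an illustration of the point, not Supermemory code): a bag-of-words "similarity" scores a stale fact and its correction identically, because nothing in the vector encodes which statement is current.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Bag-of-words cosine similarity between two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = sqrt(sum(v * v for v in va.values())) * sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

# Two facts from different sessions; the second supersedes the first.
facts = [
    {"text": "user lives in Beijing", "session": 3},
    {"text": "user lives in Shanghai", "session": 9},
]
query = "where does the user live"

# Pure similarity scores the stale fact exactly the same as the fresh one.
scores = [cosine(query, f["text"]) for f in facts]
assert abs(scores[0] - scores[1]) < 1e-9

# Only an extra reasoning step (here: prefer the latest session) resolves it.
current = max(facts, key=lambda f: f["session"])
print(current["text"])  # -> user lives in Shanghai
```

A real embedding model would not score them exactly equal, but the two sentences stay so close in embedding space that recency, not similarity, has to break the tie.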
To cope with the complexity of LongMemEval, one must rethink the information ingestion and retrieval pipeline from scratch, replacing vector mathematics with active Agent reasoning.
Thus, the team stepped outside the traditional RAG framework to construct a pipeline of "multi-agent collaborative orchestration."
3+3 Agents, Each with a Specific Role
Just like ASMR itself, this technology is straightforward and extremely satisfying.
Observer Agents: Parallel Ingestion
First, an agent orchestrator composed of three parallel readers—the Observer Agents—is deployed.
Powered by Gemini 2.0 Flash, they can execute tasks without needing to chunk or embed user conversations.
These Agents concurrently read raw sessions; for instance, Agent 1 handles sessions 1, 3, 5, while Agent 2 handles 2, 4, 6.
The goal of the Observer Agents is targeted knowledge extraction around "six dimensions": personal information, preferences, events, temporal data, information updates, and assistant information.
Then, these structured findings are "natively stored" and mapped back to their source sessions.
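The observer stage described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions: `extract_findings` is a stand-in for the real Gemini-powered extraction call, and the round-robin assignment is one plausible reading of "parallel readers."

```python
# The six extraction dimensions named in the blog post.
DIMENSIONS = [
    "personal_info", "preferences", "events",
    "temporal_data", "info_updates", "assistant_info",
]

def extract_findings(session_id, text):
    """Stand-in for one LLM extraction pass: returns structured findings
    keyed by dimension, each mapped back to its source session."""
    return [{"dimension": d, "finding": f"...from session {session_id}", "source": session_id}
            for d in DIMENSIONS]

def observe(sessions, n_agents=3):
    """Round-robin the raw sessions across n_agents parallel readers.
    No chunking, no embeddings: each agent reads whole sessions."""
    store = []
    for agent in range(n_agents):
        for sid in range(agent, len(sessions), n_agents):
            store.extend(extract_findings(sid + 1, sessions[sid]))
    return store

store = observe([f"session text {i}" for i in range(6)])
print(len(store))  # 6 sessions x 6 dimensions = 36 findings
```

In production the three readers would run concurrently; the loop here only shows how the work is partitioned and how every finding keeps a pointer to its source session.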
Search Agents: Active Retrieval
When a question is received, ASMR does not query a vector database.
By deploying three parallel "Search Agents," these AIs actively read and reason over the stored findings, with each Agent having a specific focus:
Agent 1: Searches for direct facts and explicit statements.
Agent 2: Looks for relevant context, social cues, and implications.
Agent 3: Reconstructs timelines and relationship graphs.
The orchestrator aggregates the findings of all three "Search Agents" and extracts verbatim snippets from the original sessions for detailed verification.
This mechanism allows the system to perform intelligent retrieval based on genuine cognitive understanding, rather than relying solely on keywords or mathematical similarity.
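The search stage above can be sketched like this. Again a hedged sketch: `search` stands in for an LLM reasoning pass, and the store/session shapes are illustrative, not Supermemory's schema.

```python
# The three foci assigned to the parallel Search Agents.
FOCI = {
    1: "direct facts and explicit statements",
    2: "context, social cues, and implications",
    3: "timelines and relationship graphs",
}

def search(focus, question, store):
    """Stand-in for one Search Agent: returns the findings it judges
    relevant to the question under its assigned focus."""
    return [f for f in store if question.lower() in f["finding"].lower()]

def answer_context(question, store, sessions):
    """Orchestrator: fan out to the 3 agents, merge their findings, then
    pull verbatim snippets from the original sessions for verification."""
    merged = []
    for focus in FOCI.values():
        merged.extend(search(focus, question, store))
    seen, context = set(), []
    for f in merged:
        key = (f["finding"], f["source"])
        if key not in seen:  # de-duplicate across agents
            seen.add(key)
            context.append({**f, "verbatim": sessions[f["source"]]})
    return context

sessions = {1: "I moved to Shanghai last week.", 2: "My cat is named Mochi."}
store = [
    {"finding": "user moved to shanghai", "source": 1},
    {"finding": "user has a cat named mochi", "source": 2},
]
ctx = answer_context("shanghai", store, sessions)
print(ctx[0]["verbatim"])  # -> I moved to Shanghai last week.
```

The key structural point survives even in this toy form: the agents reason over extracted findings, but the final context is grounded in verbatim source text, not paraphrase.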
Even once the context is integrated, no single prompt can handle the wide variety of questions in LongMemEval.
Some questions require inferring details, while others demand extremely specific answers.
Next, Supermemory attempted two distinctly different AI Agent answer workflows.
8-Variant Cluster (98.6% Accuracy)
Route the retrieved context to 8 highly specialized prompt variants running in parallel.
For example, Precision Counter, Time Expert, Context Deep Dive, etc. Each variant independently evaluates the context and generates an answer.
If any one of these 8 distinct reasoning paths successfully derives the correct answer (Ground Truth), the question is marked as correct.
This parallel multi-judgment method allowed ASMR to achieve an astonishing overall accuracy of 98.60%, with the variants covering one another's blind spots.
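The any-of-8 scoring scheme reduces to a few lines. The variant names beyond the three the post mentions, and `run_variant` itself, are stand-ins for the real prompt variants and LLM calls.

```python
VARIANTS = [
    "precision_counter", "time_expert", "context_deep_dive",
    "fact_checker", "implication_reader", "timeline_builder",
    "detail_extractor", "generalist",
]

def run_variant(variant, context, question):
    """Toy stand-in for one specialized prompt answering independently;
    here only the 'time_expert' happens to get the answer right."""
    return "Shanghai" if variant == "time_expert" else "Beijing"

def cluster_correct(context, question, ground_truth):
    """The question counts as solved if ANY of the 8 independent
    reasoning paths reaches the ground truth."""
    answers = (run_variant(v, context, question) for v in VARIANTS)
    return any(a == ground_truth for a in answers)

print(cluster_correct(None, "where does the user live?", "Shanghai"))  # -> True
```

Note the design choice this implies: the 98.6% figure measures whether the correct answer is reachable by at least one path, not whether a single deployed system would emit it.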
12-Variant Decision Forest (97.2% Accuracy)
To test a system designed to produce a single, authoritative answer rather than relying on multiple independent attempts, the team further expanded ASMR into a decision forest containing 12 variants.
Here, 12 highly specialized AI Agents (powered by GPT-4o-mini) answer the prompt independently.
Furthermore, an "Aggregator LLM" was introduced as the final judge.
The aggregator synthesizes these 12 answers through majority voting, domain trust, and conflict resolution mechanisms.
This single consensus model also achieved a stunning accuracy rate of up to 97.2%.
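The consensus path can be sketched as a simple reduction. In the real system the aggregator is itself an LLM judge; here a weighted majority vote stands in for "majority voting, domain trust, and conflict resolution," and the trust weights are illustrative.

```python
from collections import Counter

def aggregate(answers, trust=None):
    """Weighted majority vote over independent answers. `trust` maps an
    agent index to a vote weight (a crude stand-in for 'domain trust');
    unlisted agents get weight 1."""
    trust = trust or {}
    votes = Counter()
    for i, a in enumerate(answers):
        votes[a] += trust.get(i, 1)
    return votes.most_common(1)[0][0]

# 12 specialized agents answered the same question independently.
answers = ["Shanghai"] * 7 + ["Beijing"] * 4 + ["Tokyo"]
print(aggregate(answers))  # -> Shanghai
```

Unlike the 8-variant cluster, this mode always commits to one answer, which is why its 97.2% is the more production-relevant of the two numbers.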
It should be noted that ASMR is not yet used in Supermemory's core production environment.
This experiment not only refreshed the data but also validated several key concepts:
Agent Retrieval Outperforms Vector Search: Active search eliminates semantic similarity traps and solves the problem of information invalidation caused by temporal changes.
Parallel Processing is the Core of Efficiency: Distributing the load among multiple specialized Agents significantly improves extraction speed and granularity.
Specialized Division of Labor Beats General Models: Specialized experts (such as detail extractors) perform far better than a single all-purpose prompt.
Supermemory is the Real Ambition
But if you think ASMR is just a benchmark-chasing experiment, you underestimate this team.
Behind ASMR lies a complete memory engine called Supermemory—a memory and context infrastructure designed for all AI applications.
Your AI forgets everything between conversations. Supermemory fixes that.
GitHub Address: https://github.com/supermemoryai/supermemory
Memory ≠ RAG, These Are Two Different Things
The ASMR mentioned earlier solves "how to precisely find the correct information from massive amounts of dialogue."
But the problem Supermemory aims to solve is bigger: giving AI true memory, not just retrieval.
The difference is that RAG doesn't recognize individuals; the result it returns to Zhang San today is exactly the same as what it returns to Li Si tomorrow. In contrast, Supermemory actively extracts facts from conversations, tracks changes, handles contradictions, and even automatically forgets.
For example, if you told the AI last month "I live in Beijing," and this month you say "I just moved to Shanghai," RAG would feed both pieces of information to the large model and let it guess. Supermemory knows the latter overrides the former and only returns "Shanghai."
Even more impressive is the "automatic forgetting" mechanism. If you say "I have an exam tomorrow," once the date passes, this memory automatically expires. Temporary facts do not become permanent noise.
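The two behaviors just described, newer facts overriding older ones and dated facts expiring, can be captured in a toy memory store. This illustrates the idea only; it is not Supermemory's implementation, and the key/value schema is invented for the example.

```python
from datetime import date

class MemoryStore:
    def __init__(self):
        self._facts = {}  # key -> (value, recorded_on, expires_on)

    def remember(self, key, value, recorded_on, expires_on=None):
        """A later write to the same key supersedes the earlier one."""
        current = self._facts.get(key)
        if current is None or recorded_on >= current[1]:
            self._facts[key] = (value, recorded_on, expires_on)

    def recall(self, key, today):
        """Expired facts are forgotten instead of becoming noise."""
        fact = self._facts.get(key)
        if fact is None:
            return None
        value, _, expires_on = fact
        if expires_on is not None and today > expires_on:
            del self._facts[key]  # automatic forgetting
            return None
        return value

m = MemoryStore()
m.remember("home_city", "Beijing", date(2026, 1, 5))
m.remember("home_city", "Shanghai", date(2026, 2, 10))  # supersedes Beijing
m.remember("exam", "tomorrow", date(2026, 2, 10), expires_on=date(2026, 2, 11))
print(m.recall("home_city", date(2026, 2, 12)))  # -> Shanghai
print(m.recall("exam", date(2026, 2, 12)))       # -> None (expired)
```

The contrast with RAG is visible in the last two lines: a retrieval system would happily return both "Beijing" and the stale exam note and leave the model to guess.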
By default, Supermemory merges RAG and memory to run in the same query, returning knowledge base retrieval and personalized context in one go.
50 Milliseconds, One API Call to Handle User Profiles
Beyond memory, Supermemory also takes over user profiling.
In traditional solutions, to make an AI "know" a user, you need to build a user profile system yourself, manually maintaining tags, preferences, and historical behavior. Supermemory automates all of this.
It splits user information into two layers:
Static facts ("Senior Engineer," "Uses Vim," "Prefers Dark Mode")
Dynamic context ("Currently migrating authentication modules," "Debugging rate-limiting issues").
With one API call and a latency of about 50 milliseconds, your Agent knows exactly who is sitting on the other side.
Injecting this profile into the system prompt instantly switches the Agent from "Stranger Mode" to "Old Friend Mode."
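The two-layer profile and its injection into the system prompt might look like this. The field names and prompt wording are illustrative assumptions, not Supermemory's API.

```python
# Profile as returned by a single (hypothetical) profile call:
profile = {
    "static": ["Senior Engineer", "Uses Vim", "Prefers dark mode"],
    "dynamic": ["Migrating authentication modules", "Debugging rate-limiting"],
}

def system_prompt(profile):
    """Fold both layers into the system prompt so the agent starts in
    'old friend mode' instead of 'stranger mode'."""
    return (
        "You are assisting a returning user.\n"
        f"Stable facts: {'; '.join(profile['static'])}.\n"
        f"Current context: {'; '.join(profile['dynamic'])}."
    )

print(system_prompt(profile))
```

Splitting static facts from dynamic context matters for prompt hygiene: the stable layer rarely changes and can be cached, while the dynamic layer is refreshed on each call.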
"All-in-One" Connectivity
Memory relying solely on dialogue isn't enough; Supermemory also connects to a whole suite of external data sources.
Google Drive, Gmail, Notion, OneDrive, GitHub—all automatically synchronized via real-time webhooks.
Documents are automatically processed upon upload: PDF parsing, image OCR, video transcription, and code AST-level chunking. Upload and search immediately, zero configuration.
For developers, integration costs are minimized.
Install one npm package, a few lines of code, and your Agent gains complete memory capabilities. Mainstream AI development frameworks like Vercel AI SDK, LangChain, LangGraph, OpenAI Agents SDK, and Mastra all have ready-made wrappers.
Built-in plugins for Claude Code, OpenCode, and OpenClaw.
You don't even need to write code.
Supermemory provides an MCP server; install with a single command, and use it directly with Claude Desktop, Cursor, Windsurf, and VS Code.
The Battle for Memory Has Just Begun
From experiment to product, what the Supermemory team is doing can be summarized in one sentence: transforming AI's "working memory" from an add-on feature into a layer of infrastructure.
In the past few years, competition among large models has focused on parameter scale, inference speed, and context window length.
But no matter how large a 128K context window is, it clears once the conversation ends, and the next meeting is still like meeting a stranger.
Memory is the final piece of the puzzle that turns AI from a "tool" into a "partner."
When every Agent can remember who you are, what you are doing, and where you left off last time, the human-computer interaction experience will undergo a quiet but qualitative change.
It's not that AI has become smarter; it's that it finally isn't amnesiac anymore.
References:
https://x.com/DhravyaShah/status/2035517012647272689?s=20