Open-Source Framework Enables Code AI to Learn from GitHub! Bug Fix Rate Soars to 69.8%, Performance Sets New Records

The MemGovern team contributed this article.

When human programmers encounter a tricky bug, they typically search online for how others have solved the same problem.

Although current AI systems have begun to gain online search capabilities, they still struggle to distill bug-fixing ability from these online experiences.

Teaching AI to follow this human workflow may improve its bug-fixing ability. The project team behind MemGovern pursued this idea and recently reported strong results.

In the field of automated software engineering (SWE), large language model-driven code agents (Code Agents) have brought about paradigm shifts in programming, but they currently face a common "closed-world" cognitive limitation: existing agents often attempt to fix bugs from scratch or rely only on local context within the repository, ignoring the vast historical human experience accumulated on platforms like GitHub.

In fact, when human engineers solve complex problems, they often search open-source communities and draw on historical solutions to similar problems.

However, directly allowing agents to utilize these "open-world" experiences is highly challenging because real Issue and Pull Request (PR) data are filled with unstructured social noise, ambiguous descriptions, and fragmented information.

To break through this barrier, the cutting-edge open-source academic community QuantaAlpha, in collaboration with teams from the University of Chinese Academy of Sciences (UCAS), National University of Singapore (NUS), Peking University (PKU), and East China Normal University (ECNU), proposed the MemGovern framework.

This framework does not adopt a simple retrieval-augmented generation (RAG) path. Instead, it proposes a complete "experience refinement" mechanism that transforms chaotic GitHub data into structured memory friendly to agents, and combines the idea of Deep Research to propose an "Experiential Memory Search" strategy, achieving a closed loop of extracting reusable repair logic from historical experiences.


Core Pain Point: Massive Data ≠ Usable Knowledge

Existing Code Agents (such as SWE-Agent) often flounder on complex bugs because they lack historical memory. GitHub is a huge treasure trove, but directly feeding raw Issues and PRs to AI works poorly, for three reasons:

1. High Noise: Original discussions are filled with irrelevant social phrases like "thanks" and "merge requests."

2. Unstructured: Logs, error messages, and repair logic from different projects are mixed together, lacking a unified format.

3. Difficult to Retrieve: Simple semantic matching is easily misled by surface keywords and cannot reach deep repair logic.

The emergence of MemGovern is precisely to transform these "raw data" into "experience cards" that AI can truly use.
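To make the noise problem concrete, here is a minimal sketch of the kind of social-noise filtering such a pipeline might start with. The phrase patterns and the length threshold are illustrative assumptions, not MemGovern's actual rules:

```python
import re

# Hypothetical noise filter: drop low-content social comments from an
# issue thread before any further processing. The patterns and the
# length cutoff below are invented for illustration.
SOCIAL_PATTERNS = [
    r"^\s*(thanks|thank you|\+1|lgtm)\b",
    r"^\s*(merged|closing|bump)\b",
]

def strip_social_noise(comments: list[str]) -> list[str]:
    kept = []
    for c in comments:
        text = c.strip().lower()
        if any(re.match(p, text) for p in SOCIAL_PATTERNS):
            continue  # pure social chatter carries no repair signal
        if len(text) < 15:
            continue  # too short to carry diagnostic content
        kept.append(c)
    return kept
```

A real pipeline would of course go far beyond pattern matching, but even this toy filter shows why "raw data" and "usable knowledge" are different things.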

Experience Refinement Mechanism

MemGovern does not directly throw raw GitHub Issues and PRs to the agent. Instead, it builds a hierarchical screening and content purification pipeline.

  • Hierarchical Selection: First, by comprehensively considering the number of Stars and maintenance activity (Issue/PR frequency), high-quality repository sources are screened out. Subsequently, strict cleaning is performed at the instance level, retaining only "closed-loop" repair records that contain a complete evidence chain (problem-code-verification).

  • Standardized Experience Cards: a design original to MemGovern. Raw records are reconstructed into standardized experience cards, each explicitly decoupled into two layers:

    • Index Layer: Contains standardized problem summaries and key diagnostic signals (e.g., exception types, error signatures) for efficient symptom-based retrieval.
    • Resolution Layer: Encapsulates root cause analysis, fix strategy, patch digest, and verification methods.

This structural design effectively solves the problem of confusion between retrieval signals and reasoning logic, significantly improving the usability of knowledge. Currently, the team has successfully built a knowledge base containing 135,000 high-fidelity experience cards.
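The two-layer card described above can be sketched as a simple schema. The field names here are assumptions inferred from the article's description, not MemGovern's actual data model:

```python
from dataclasses import dataclass

# Illustrative schema for the two-layer experience card: the index layer
# carries only retrieval signals, while the resolution layer carries the
# reasoning content read after a card is selected.

@dataclass
class IndexLayer:
    problem_summary: str           # standardized one-line problem statement
    diagnostic_signals: list[str]  # e.g. exception types, error signatures

@dataclass
class ResolutionLayer:
    root_cause: str      # why the bug occurred
    fix_strategy: str    # the abstract repair approach
    patch_digest: str    # condensed summary of the actual patch
    verification: str    # how the fix was confirmed

@dataclass
class ExperienceCard:
    card_id: str
    index: IndexLayer          # what retrieval matches against
    resolution: ResolutionLayer  # what the agent reads after selecting
```

Keeping retrieval signals and reasoning content in separate layers is what lets the search stage stay cheap while the browse stage stays informative.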


Agentic Experience Search: "Search-Browse" Documents Like Humans

Traditional RAG (Retrieval-Augmented Generation) often feeds retrieval results to the model in one go, which can easily lead to overly long contexts filled with noise. MemGovern adopts a more intuitive Search-then-Browse mode:

  • Searching
    The agent first performs a broad search in the Index Layer based on the current bug's symptoms (e.g., error stack) to quickly locate potentially relevant candidate cases.
  • Browsing
    The agent autonomously selects the most promising candidate and reads its detailed Resolution Layer. This lets the agent understand the repair logic in depth while excluding irrelevant interference.
  • Transfer and Application
    The agent maps the abstract repair strategies from historical cases (e.g., "add boundary checks") to the current codebase, achieving knowledge transfer.
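The three steps above can be sketched as a small search-then-browse loop. Everything here is illustrative: the token-overlap scorer stands in for whatever matching the real system uses, and in MemGovern the agent itself chooses which candidate to browse rather than always taking the top hit:

```python
# Toy search-then-browse loop over experience cards represented as dicts.
# Search touches only the index layer; the resolution layer is loaded
# on demand, keeping the agent's context short and low-noise.

def search(symptoms: str, cards: list[dict], top_k: int = 3) -> list[dict]:
    # Toy scorer: count symptom tokens that appear in a card's signals.
    def score(card):
        sig = " ".join(card["index"]["diagnostic_signals"]).lower()
        return sum(1 for tok in symptoms.lower().split() if tok in sig)
    return sorted(cards, key=score, reverse=True)[:top_k]

def browse(card: dict) -> dict:
    # Only now is the (potentially long) resolution layer brought
    # into the agent's context.
    return card["resolution"]

def search_then_browse(symptoms: str, cards: list[dict]):
    candidates = search(symptoms, cards)
    if not candidates:
        return None
    best = candidates[0]  # the real agent deliberates; we take the top hit
    return browse(best)
```

The design point is the staging itself: a broad, cheap pass over index layers first, then a deep read of a single resolution layer, instead of dumping all retrieval results into the context at once.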

Experimental Evaluation: Comprehensive Surpassing of Mainstream Baselines

The research team conducted a detailed evaluation on SWE-bench Verified. The results show that MemGovern achieved significant improvements across all tested models.

Main Experimental Results (Pass@1 Fix Rate):

  • Claude-4-Sonnet + MemGovern
    The fix rate reached 69.8%, 3.2 percentage points above the SWE-Agent baseline.
  • GPT-4o + MemGovern
    The fix rate surged from 23.2% to 32.6%, a gain of 9.4 percentage points.
  • DeepSeek-V3 + MemGovern
    The fix rate rose to 65.8%.


The experimental data clearly shows that MemGovern's improvement is robust and model-agnostic. For models with weaker foundational capabilities, the external experience provided by MemGovern can lead to more significant performance leaps.

Ablation Study Validation:


  • Impact of Memory Scale
    As the number of experience cards increased from 10% to 100%, the agent's fix rate showed a monotonic increasing trend, proving the effectiveness of large-scale experience memory.
  • Importance of Refinement
    Compared to directly using raw Issue/PR data (Raw Experience), the "refined" experience cards brought more stable and higher performance improvements, proving the necessity of structured governance.

Case Study: How Experience Changes Outcomes?

In a real bug in the Django framework (a crash triggered by `order_by`), the value of MemGovern is clear.


Traditional Agent (No Experience):

The agent without experience sees only the surface of the error.

It adopts a "defensive programming" strategy, crudely adding a type check to bypass the error. But this violates the function's API contract: it returns the unprocessed original object instead of the expected result.

This head-in-the-sand repair silences the runtime error, but downstream core functions then fail on the mismatched data type, so the test cases still do not pass.

MemGovern Agent:

The agent retrieves a similar historical experience.

The "Fix Strategy" in the experience card clearly states: "Do not just bypass the object; instead, perform explicit type checking and extract the field name."

Guided by this, the agent produced a correct fix that eliminated the crash while preserving the original functionality.
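The contrast between the two strategies can be illustrated with a toy sketch. This is not Django's actual code: the `OrderExpr` class and both `resolve_*` functions are invented here purely to show the difference between bypassing an unexpected object and extracting the field name from it:

```python
# Hypothetical illustration of the two repair strategies in the case study.

class OrderExpr:
    """Stand-in for an ordering expression object (not a real Django type)."""
    def __init__(self, field_name: str):
        self.field_name = field_name

def resolve_defensive(term):
    # "Defensive" fix: bypass unexpected objects, returning them unchanged.
    # The crash disappears, but callers expecting a string now break.
    if not isinstance(term, str):
        return term
    return term.lstrip("-")

def resolve_explicit(term):
    # Experience-guided fix: type-check and extract the field name, so the
    # function still honors its contract of returning a plain string.
    if isinstance(term, OrderExpr):
        term = term.field_name
    return term.lstrip("-")
```

Both versions stop the crash; only the second keeps the function's output type consistent, which is why the defensive patch fails downstream tests while the experience-guided one passes.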

Experience Reshaping

MemGovern is not only a breakthrough on performance metrics; more importantly, it charts a clear, feasible path for AI agents to exploit massive amounts of unstructured human debugging experience.

It shows that, once properly processed, the chaotic raw Issues and PRs on GitHub can serve as retrievable, verifiable, and transferable "experience memories" rather than noise-ridden "interference data." This is a powerful paradigm for breaking the agent's closed-world limitation and solving complex real-world bugs.

In the future, the experience reshaping paradigm pioneered by MemGovern has potential far beyond the code domain.

This method of transforming unstructured human professional experience into machine-readable memory generalizes broadly. It offers a standardized template for vertical fields that likewise depend heavily on historical cases and expert judgment, such as legal consultation and medical diagnosis.

The team hopes MemGovern's approach can reach beyond the code repository to more complex intellectual tasks that "learn from history," laying a foundation for cross-domain, general-purpose agent memory infrastructure.

Paper Title:

MemGovern: Enhancing Code Agents through Learning from Governed Human Experiences

Paper Link:

https://arxiv.org/abs/2601.06789

Open Source Code:

https://github.com/QuantaAlpha/MemGovern

About QuantaAlpha

QuantaAlpha was established in April 2025, composed of professors, postdocs, PhDs, and master's students from prestigious universities such as Tsinghua, Peking University, Chinese Academy of Sciences, CMU, and HKUST. Our mission is to explore the "quantum" of intelligence and lead the "alpha" frontier of agent research—from Code Agent to self-evolving intelligence, and to financial and cross-domain specialized agents—committed to reshaping the boundaries of artificial intelligence.

In 2026, we will continue to produce high-quality research results in directions such as Code Agent (end-to-end autonomous execution of real-world tasks), DeepResearch, Agentic Reasoning/Agentic RL, self-evolution, and collaborative learning. Students interested in our directions are welcome to join us!

Team Homepage:

https://quantaalpha.github.io/


