Hi, I'm PaperAGI, focusing on cutting-edge AI technologies like LLMs, RAG, and Agents. I share the latest industry achievements and practical case studies daily.
Retrieval-Augmented Generation (RAG) over Knowledge Graphs (KGs) faces a core challenge in question answering: triple indexing loses contextual semantics. When text is compressed into (head, relation, tail) triples, much of the implicit background information is stripped away, which hurts performance in multi-hop QA in particular. Multi-hop question answering must combine facts about multiple entities and relations, so it demands deep contextual understanding, and that is exactly where the semantic loss of traditional triple indexing bites hardest.
MDER-DR: A Two-Stage Framework
The authors propose a domain-agnostic KG-QA framework covering both the indexing and the retrieval-reasoning stages, built around two complementary components:
MDER: Intelligent Indexing Strategy
While traditional methods directly store raw triples, MDER employs a four-step strategy to generate context-aware entity summaries:
- Map: Identify entities and relations within the text.
- Disambiguate: Resolve ambiguities in entity references.
- Enrich: Generate natural language descriptions of triples based on context.
- Reduce: Fuse entity-level summaries while retaining key semantics.
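The four steps above can be sketched as a small data-flow pipeline. This is a hypothetical illustration, not the paper's implementation: in MDER each step is performed by an LLM, whereas here simple deterministic stand-ins (string splitting, alias lookup, template text) show how triples turn into context-aware entity summaries. All function names and the toy data are my own.

```python
from collections import defaultdict

def map_step(sentences):
    # Map: extract (head, relation, tail) triples from text.
    # Stand-in: assumes each sentence is pre-segmented with " | ";
    # a real system would prompt an LLM to do the extraction.
    return [tuple(s.split(" | ")) for s in sentences]

def disambiguate(triples, aliases):
    # Disambiguate: resolve entity mentions to canonical names.
    return [(aliases.get(h, h), r, aliases.get(t, t)) for h, r, t in triples]

def enrich(triple, context):
    # Enrich: turn a bare triple into a natural-language description
    # grounded in its source context (an LLM call in the paper).
    h, r, t = triple
    return f"{h} {r} {t} (context: {context})."

def reduce_step(described_triples):
    # Reduce: fuse all descriptions mentioning an entity into one
    # entity-level summary, preserving the key semantics.
    summaries = defaultdict(list)
    for (h, r, t), desc in described_triples:
        summaries[h].append(desc)
        summaries[t].append(desc)
    return {entity: " ".join(descs) for entity, descs in summaries.items()}

# Toy corpus with an alias to resolve.
sentences = ["M. Curie | won | Nobel Prize", "Marie Curie | born in | Warsaw"]
triples = disambiguate(map_step(sentences), {"M. Curie": "Marie Curie"})
described = [(t, enrich(t, "biography article")) for t in triples]
index = reduce_step(described)  # entity -> context-aware summary
```

After the Reduce step, every entity carries one free-text summary that aggregates all of its triples, which is what later makes retrieval a plain text-similarity problem.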
Key Advantage: This approach avoids explicit graph edge traversal during the retrieval phase, significantly boosting retrieval efficiency.
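Because indexing already produced free-text entity summaries, retrieval can be ordinary similarity search over those summaries rather than edge-by-edge graph traversal. A minimal sketch, assuming a dict of entity summaries; token-overlap scoring stands in for the embedding similarity a real system would use:

```python
def retrieve(query, summaries, k=2):
    # Rank entity summaries by lexical overlap with the query.
    # No graph edges are touched: the summaries alone carry the context.
    q_tokens = set(query.lower().split())
    scored = sorted(
        summaries.items(),
        key=lambda kv: len(q_tokens & set(kv[1].lower().split())),
        reverse=True,
    )
    return [entity for entity, _ in scored[:k]]

# Toy summaries of the kind the Reduce step would emit.
summaries = {
    "Marie Curie": "Marie Curie won the Nobel Prize and was born in Warsaw",
    "Niels Bohr": "Niels Bohr developed a model of the atom in Copenhagen",
}
top = retrieve("where was marie curie born", summaries, k=1)
```

Swapping the overlap score for dense-vector similarity keeps the same structure: one index lookup per query instead of a multi-hop walk over the graph.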
DR: Iterative Retrieval Mechanism
For user queries, DR adopts an iterative reasoning strategy of decomposition and resolution:
- Decomposition: Break down complex queries into multiple analyzable triple-based sub-questions.
- Resolution: Anchor these triples in the knowledge graph and narrow down the answer scope through iterative reasoning.
- LLM-Driven: The entire process is driven by Large Language Models, offering robustness in handling sparse, incomplete, and complex relational data.
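The decompose-and-resolve loop can be sketched as follows. In the paper an LLM drives both steps; here the decomposition is hard-coded for one example question and the knowledge graph is a plain lookup table, so this only illustrates how answers to earlier sub-questions feed into later ones. The variable-binding scheme (`?x`, `?y`) is my own assumption.

```python
def decompose(question):
    # Decomposition: break a multi-hop question into triple-shaped
    # sub-questions with placeholder variables. Stand-in: hard-coded
    # output; the paper prompts an LLM for this.
    return [
        ("Nobel Prize in Physics 1903", "won_by", "?x"),
        ("?x", "born_in", "?y"),
    ]

def resolve(sub_questions, kb):
    # Resolution: anchor each triple in the KG, substituting answers
    # from earlier hops, so the answer scope narrows each iteration.
    bindings = {}
    for head, relation, tail in sub_questions:
        head = bindings.get(head, head)   # plug in earlier answers
        bindings[tail] = kb.get((head, relation))
    return bindings

# Toy knowledge graph keyed by (entity, relation).
kb = {
    ("Nobel Prize in Physics 1903", "won_by"): "Marie Curie",
    ("Marie Curie", "born_in"): "Warsaw",
}
answer = resolve(
    decompose("Where was the winner of the 1903 Physics Nobel born?"), kb
)
```

The key property is that each iteration's answer becomes the anchor of the next sub-question, which is what lets the loop handle sparse or incomplete graphs one verified hop at a time.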
Experimental Results and Highlights
On standard benchmarks and domain-specific datasets, MDER-DR improves performance by up to 66% over traditional RAG baselines while remaining robust across languages. The method is therefore not tied to any one domain and generalizes well.
Summary of Advantages
- Semantic Preservation: Retains contextual details through entity-level summaries, solving the semantic loss problem of triple indexing.
- Efficiency Optimization: Avoids explicit graph traversal, making the retrieval stage more efficient.
- Strong Robustness: Demonstrates good adaptability to the sparsity and incompleteness of knowledge graphs.
- Domain Agnostic: The framework design is universal and can be quickly adapted to different fields.
- End-to-End LLM Driven: Fully leverages the reasoning capabilities of large language models without the need for cumbersome rule engineering.
Conclusion
MDER-DR offers a new perspective on knowledge graph question answering: rather than traversing complex graph structures at retrieval time, generate semantically rich entity summaries at indexing time. This "heavy indexing, light retrieval" design, combined with iterative query decomposition, effectively bridges the semantic gap in multi-hop question answering.
Multi-Hop Question Answering with Entity-Centric Summaries
https://arxiv.org/pdf/2603.11223

One large model paper every day to exercise our minds~ Since you've read this far, don't forget to like and follow!
Recommended Reading
Defeating Hallucinations with "Sources": A Review of Latest Research on RAG Attribution
Enhanced RAG or Agentic RAG? No More Getting Lost in the Choice
Costs Cut, Recall Unchanged: SCOUT-RAG is on the Right Path
AI Analysis of Million-Level Enterprise Tables: BRTR Overturns the Multimodal RAG Approach
Curing the Pain Points of Long-Term Memory: Microsoft's Plug-and-Play Framework PlugMem
Free Resource Sharing: MCP, RAG, Agent and More - The 384-Page Ultimate LLM Implementation Guide