Category: AI Evaluation
- Is Agentic RAG Worth It? A Four-Dimensional Real-World Test Reveals the Answer!
- From 'LLM-as-a-Judge' to 'Agent-as-a-Judge': A Review of the Three-Stage Evolution of AI Evaluation Paradigms
- 0% Pass Rate! The Code Myth Debunked! LiveCodeBench Pro Released!
- Comprehensive Evaluation of 12 Latest GraphRAG Techniques
- ICML 2025 | Bursting the AI Bubble with 'Human Testing Methods': Building a New Paradigm of Capability-Oriented Adaptive Assessment
- Can LLMs Understand Math? Latest Research Reveals Fatal Flaws in Large Models' Mathematical Reasoning
- AI's Second Half: From Algorithms to Utility