Category: Model Evaluation
- How Much GDP Corresponds to the Tokens Burned by AI Models? AI's Economic Contribution Now Has a Number
- Meituan Quietly Launches New Model! Real-Test of First Open-Source "Heavy Thinking" Model: 8-Way Parallel, Agent Hard-Clashes with Claude
- Google's Challenge: DeepSeek, Kimi and More to Compete in First Large Model Showdown Starting Tomorrow
- The More Reasoning, The More Hallucinations? The "Hallucination Paradox" of Multimodal Reasoning Models
- Apple's 'Illusion of Thinking' Paper Criticized Again, Claude and Human Co-authored Paper Points Out Its Three Key Flaws
- Google | Tracing RAG System Errors: Proposing a Selective Generation Framework to Boost RAG Accuracy by 10%