Category: Benchmarks

GPT-5.5 Achieves a World-First Breakthrough: Programming from Scratch Without Source Code, as Coding AI Enters a New Era
Static Benchmarks ‘Outdated’? OpenKG Continues to Update Its Knowledge-Enhanced Dynamic Evaluation Leaderboard for LLMs, Dynamic OneEval-202605
Matches Claude 3.7 at 1/8th the Cost: "European OpenAI" Mistral AI Releases New Multimodal Model
AI Self-Replication Risk: AISI Launches RepliBench Benchmark