Category: Multimodal Models

Remove the Vision Encoder, and Multimodal Models Actually Get Stronger?
Frozen Weights Are the Enemy of AI Progress! DeepMind's Top Researcher: The Key to AI Self-Improvement Lies in Evaluation, Drawing from Formal Verification! Expert Models Are Stepping Stones to Generalized AGI!
Xiaohongshu's "Everything is OCR": A 3B Small Model Outperforms Giants, Parsing Charts into Code
Reconstructing Native Multimodality! Meituan Releases Purely Discrete Base Model, Truly Achieving "Everything Is a Token"
Are Top Multimodal Models Crushed by Humans in Real Web Search? GPT-5.2 Achieves Only 36% Win Rate; Peking University, Huawei, and Others Jointly Open-Source New Deep Search Benchmark BrowseComp-V3
Alibaba Just Open-Sourced Qwen-Image: A Free GPT-4o-Style Ghibli Image Model, Best at Chinese
35% Accuracy Evaporates! ByteDance & HUST's WildDoc Reveals Robustness Shortcomings in Multimodal Document Understanding
Interpreting the Seed1.5-VL Technical Report