Category: Multimodal Models
- Remove the Vision Encoder, and Multimodal Models Actually Get Stronger?
- Frozen Weights Are the Enemy of AI Progress! DeepMind's Top Researcher: The Key to AI Self-Improvement Lies in Evaluation, Drawing from Formal Verification! Expert Models Are Stepping Stones to Generalized AGI!
- Xiaohongshu's "Everything is OCR": A 3B Small Model Outperforms Giants, Parsing Charts into Code
- Reconstructing Native Multimodality! Meituan Releases a Purely Discrete Base Model, Truly Achieving "Everything is Token"
- Are Top Multimodal Models Crushed by Humans in Real Web Search? GPT-5.2 Achieves Only 36% Win Rate; Peking University, Huawei, and Others Jointly Open-Source New Deep Search Benchmark BrowseComp-V3
- Alibaba Just Open-Sourced Qwen-Image: A Free Model for GPT-4o-Style Ghibli Images, Best in Chinese
- 35% Accuracy Evaporates! ByteDance & HUST's WildDoc Reveals Robustness Shortcomings in Multimodal Document Understanding
- An Interpretation of the Seed1.5-VL Technical Report