Category: Vision-Language Models

"Removing One Layer" Makes the Model Better at Tasks? HIT(SZ) | Yang Shuo's Team Discovers Task-Interfering Layers in VLMs
Abandoning Manual Annotation! Chinese Team Proposes Self-Evolution Algorithm for Multimodal Large Models
Interpretation of Seed1.5-VL Technical Report
Multimodal Large Models Collectively Fail, GPT-4o Has Only a 50% Safety Pass Rate: SIUO Reveals Cross-Modal Safety Blind Spots