Qwen3.5-Flash Arrives! Three Medium-Scale Models Open-Sourced

Today, we are officially open-sourcing the latest medium-scale models of Qwen3.5: Qwen3.5-35B-A3B, Qwen3.5-122B-A10B, and Qwen3.5-27B.

The performance of Qwen3.5-35B-A3B surpasses that of the previous, larger-scale Qwen3-235B-A22B-2507 and Qwen3-VL-235B-A22B models, while the Qwen3.5-122B-A10B and 27B versions further narrow the gap between medium-scale models and frontier models, excelling particularly in complex agent scenarios. This shows that performance is no longer tied to scale: gains come not from stacking parameters but from architectural optimization, improved data quality, and reinforcement learning.

🚀 Architectural Evolution, Performance Breakthrough:

Qwen3.5 adopts a hybrid attention mechanism combined with a high-sparsity MoE architecture, and is trained on a larger corpus of mixed text and visual tokens. With fewer total and activated parameters, Qwen3.5-122B-A10B and Qwen3.5-35B-A3B deliver larger performance gains.

On multiple authoritative benchmarks covering instruction following (IFBench), PhD-level reasoning (GPQA), mathematical reasoning (HMMT 25), multilingual knowledge (MMMLU), agent tool calling (BFCL v4), and agentic coding (SWE-bench Verified), the new models surpass the much larger Qwen3-235B-A22B and Qwen3-VL models, as well as GPT-5 mini and gpt-oss-120b.

🛠️ Developer-Friendly, Suitable for Local Deployment:

The first dense model of Qwen3.5, Qwen3.5-27B, makes a stunning debut with stronger agent capabilities and native multimodal abilities.

In various agent evaluations such as tool calling, search, and programming, Qwen3.5-27B has exceeded GPT-5 mini. In multiple visual understanding benchmarks like visual reasoning, text recognition and understanding, and video reasoning, it has outperformed the flagship Qwen3-VL model and Claude Sonnet 4.5. Qwen3.5-27B can run on a single GPU, making it extremely friendly for local deployment.

🔧 Qwen3.5-Flash (Production Version of Qwen3.5-35B-A3B) API Service:

✨ Qwen3.5-Flash is now available on Alibaba Cloud Bailian. Priced as low as 0.2 yuan per million tokens, it combines strong performance with high speed, making it suitable for developers and enterprises with scalable, production-grade model needs.

✨ It supports an ultra-long 1M-token context window by default, meeting the needs of long-document and complex-task processing.

✨ Official built-in tool support reduces integration costs and accelerates application rollout.

Currently, all three models are open-sourced on ModelScope and Hugging Face. Additionally, we have also open-sourced the Qwen3.5-35B-A3B-Base base model. Developers can experience the new models for free on Qwen Chat or obtain the Qwen3.5-Flash model API service through Alibaba Cloud Bailian.
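For developers trying the API route, a minimal sketch of a chat-completion call is shown below. It assumes Bailian's OpenAI-compatible endpoint and a model identifier of `qwen3.5-flash`; both are assumptions based on the service's existing conventions, so check the official Bailian documentation for the exact values.

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible endpoint on Alibaba Cloud Bailian (DashScope);
# verify against the official docs before use.
API_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"


def build_request(prompt: str, model: str = "qwen3.5-flash") -> dict:
    """Build an OpenAI-style chat-completion payload.

    The model name "qwen3.5-flash" is a hypothetical identifier for
    illustration only.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


payload = build_request("Summarize the Qwen3.5 release in one sentence.")

api_key = os.environ.get("DASHSCOPE_API_KEY")
if api_key:  # only send the request when an API key is configured
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

The same payload shape works with any OpenAI-compatible client library, so existing integrations should need only a base-URL and model-name change.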
