Edited by +0, Du Wei
Just a month later, Alibaba is back with its most powerful flagship model!
Early yesterday morning, Alibaba gave global developers a huge surprise by quietly launching the Qwen3.7 Preview, including two versions: Qwen3.7 Max Preview and Qwen3.7 Plus Preview.
Arena, an independent third-party evaluation platform, released benchmark scores for both models. Qwen3.7 Max Preview and Qwen3.7 Plus Preview ranked first among domestic models in the text and vision categories, respectively.
Left: Qwen3.7 Max Preview, Right: Qwen3.7 Plus Preview
The performance of the Qwen3.7 preview versions is already so impressive that expectations for the official version are sky-high.
This morning, at the 2026 Alibaba Cloud Summit, Alibaba's next-generation Qianwen flagship model, Qwen3.7-Max, made its debut!
Zhou Jingren, head of Alibaba Group's Tongyi large model division
It did not disappoint. Qwen3.7-Max came out swinging, delivering a knockout blow.
In the latest global large model blind test leaderboard released by Arena, Qwen3.7-Max's total score ranked first among domestic models: it outperformed a host of domestic large models, including Kimi-K2.6, DeepSeek-v4 Pro, and GLM-5.1, with performance nipping at the heels of the world's strongest models like GPT, Claude, and Gemini.
Beyond its dazzling overall ranking, Qwen3.7-Max has been systematically optimized for agents, an area many companies are betting on and deploying. With continued gains in agentic, reasoning, and general capabilities, it raises the performance ceiling for a next-generation general-purpose agent foundation.
In terms of coding agents, Qwen3.7-Max achieved SOTA performance in several authoritative evaluations like SWE-Pro and SWE-Multilingual, securing a top score of 69.7 points on Terminal Bench 2.0-Terminus, surpassing models like DeepSeek-v4-pro-Max and Claude-Opus 4.6.
In terms of general-purpose agents, Qwen3.7-Max showed significant improvement, performing excellently in real-world capability evaluations such as MCP-Atlas, MCP-Mark, and Skillbench. It surpassed GLM-5.1 and Kimi-K2.6, setting a new domestic record, and demonstrated powerful GPU kernel optimization abilities on Kernel Bench L3.
In reasoning, Qwen3.7-Max also performed brilliantly, surpassing Claude-Opus 4.6 and all domestic models in core reasoning evaluations like GPQA Diamond, HLE, HMMT 2026 Feb, and IMOAnswerBench.
In general capabilities and multilingual tasks, Qwen3.7-Max achieved a record-breaking score of 79.1 points on the instruction-following benchmark IFBench, maintaining a leading position in multilingual understanding and translation evaluations like WMT24++ and MAXIFE.
This across-the-board leap in core agent capabilities lets Qwen3.7-Max tackle ultra-long-duration coding tasks in real operating environments. At the launch event, Alibaba demonstrated a feat of autonomous, iterative AI engineering:
Qwen3.7-Max was placed on a new hardware platform (the T-Head integrated training and inference AI chip, Zhenwu M890). The workspace only contained a task description, an SGLang Triton reference implementation, and evaluation scripts, without any other prompts or intervention. As a result, the model coded continuously for 35 hours, autonomously completing the optimization of a production-grade attention kernel operator. Moreover, the model-optimized inference kernel achieved a 10x speedup compared to the official SGLang Triton reference implementation. Who wouldn't love an AI replacement this efficient?
Acting as a senior engineer, the model ran 432 kernel evaluations and made 1,158 tool calls, handling writing, compilation, performance analysis, and iterative improvement entirely on its own.
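For a rough sense of scale, the figures quoted for the demo can be turned into per-step averages. The short Python calculation below uses only the three numbers reported above (35 hours, 432 kernel evaluations, 1,158 tool calls); the averages are back-of-the-envelope values derived here, not figures published by Alibaba, and they assume the work was spread evenly over the run.

```python
# Figures quoted in the demo description.
HOURS = 35
KERNEL_EVALS = 432
TOOL_CALLS = 1158

# Simple averages, assuming activity was evenly distributed over the run.
minutes_per_eval = HOURS * 60 / KERNEL_EVALS
tool_calls_per_eval = TOOL_CALLS / KERNEL_EVALS
tool_calls_per_hour = TOOL_CALLS / HOURS

print(f"{minutes_per_eval:.1f} minutes per kernel evaluation")      # 4.9
print(f"{tool_calls_per_eval:.1f} tool calls per kernel evaluation")  # 2.7
print(f"{tool_calls_per_hour:.1f} tool calls per hour")              # 33.1
```

In other words, the model was completing a write-compile-measure cycle roughly every five minutes for a day and a half.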
In the subsequent hands-on testing, Qwen3.7-Max's speed and accuracy in handling complex agent tasks were truly eye-opening.
Hands-On Test: From Zero-Code Development to Complex Tool Orchestration
If you have zero programming knowledge and want to create a small tool for your computer (like a minimalist desktop Pomodoro timer), you previously had to learn coding from scratch, configure environments, debug errors, and finally learn how to package the code into a double-clickable .exe application.
Now, however, driven by the powerful native agent reasoning capabilities of the Qwen3.7-Max model and paired with execution tools like Claude Code, you only need to give a simple command like 'Make me a desktop Pomodoro timer app,' and it will smooth everything out behind the scenes for you.
Faced with a vague requirement, Qwen3.7-Max demonstrated extremely strong product architecture capabilities. It didn't just start piling up code blindly; instead, it strategized before acting. After confirming the tech stack (Python + PyQt) and feature scope, the model quickly output a structured Markdown architecture plan and directed the tool to begin execution.
Missing tools? No problem: the model directs the system to install them automatically, so you don't need to worry about a thing. At run time, a mismatched system path triggered a stream of red error messages. When Claude Code captured the errors and fed them back, Qwen3.7-Max pinpointed the root cause at once and showed an impressive ability to self-correct, quickly proposing several alternative commands to try in turn. Within a few seconds it had worked around the environment problem, and the sleek Pomodoro timer popped up on the desktop.
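The trial-and-error pattern described here, trying alternative commands until one works, can be sketched in a few lines of Python. This is a minimal illustration of the general idea, not the commands the model actually issued: the helper `run_first_working` and the deliberately missing launcher name are hypothetical.

```python
import subprocess
import sys

def run_first_working(candidates):
    """Try each candidate command in order; return the first that exits with code 0."""
    errors = []
    for cmd in candidates:
        try:
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Covers "command not found" (a bad path) and commands that hang.
            errors.append((cmd, str(exc)))
            continue
        if result.returncode == 0:
            return cmd, result.stdout
        errors.append((cmd, result.stderr.strip()))
    raise RuntimeError(f"all candidates failed: {errors}")

candidates = [
    # Hypothetical launcher that is not on PATH, standing in for the mismatched path.
    ["hypothetical-missing-launcher", "-c", "print('ok')"],
    # Fallback: the interpreter that is running this script.
    [sys.executable, "-c", "print('ok')"],
]
cmd, output = run_first_working(candidates)
print("succeeded with:", cmd[0])
```

Collecting the error from each failed attempt matters as much as the fallback itself: it is that captured output, fed back to the model, that lets it reason about the next command to try.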
If something doesn't look right, say you want a Morandi color palette, one sentence will do. The model understands the aesthetic request precisely and modifies the code, resolving it in minutes.
If you want to send this handy Pomodoro timer to a friend, just issue the ultimate delivery command: 'Package it into an exe for me.' Qwen3.7-Max once again unleashes its agent instinct. After identifying the missing environment, it generates the corresponding instructions, enabling the tool to automatically install the packaging dependencies. Following a smooth orchestration sequence behind the scenes, the originally complex and obscure code is transformed into a clean .exe file, sitting quietly in your folder, ready to run on a double-click.
If you think having AI write a desktop app is just basic operation within a large model's coding 'comfort zone,' you should see how it demonstrates true agent capability in real internet environments, CLIs, and skill invocations.
We started with a popular CLI tool, instructing the agent to download opencli, which gives it the ability to access and retrieve information directly from across the internet.
After receiving the command 'Use the opencli tool to search for must-eat Cantonese cuisine in Beijing on Xiaohongshu, with pictures attached,' Qwen3.7-Max quickly 'read' the tool's documentation from scratch and worked out the correct invocation syntax on its own. During crawling, the program crashed on a network timeout. The model then devised a workaround: modifying the underlying configuration to extend the wait time.
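The fix described here, extending the wait time after a timeout, is a common pattern. The Python sketch below shows one way to express it as a retry with progressively longer timeouts. It does not reproduce opencli's actual configuration, which the article does not show; `call_with_growing_timeout` and `fake_fetch` are hypothetical stand-ins.

```python
import time

def call_with_growing_timeout(fetch, timeouts=(5, 15, 45), pause=0.0):
    """Call fetch(timeout=...) with each timeout in turn until one succeeds."""
    last_error = None
    for timeout in timeouts:
        try:
            return fetch(timeout=timeout)
        except TimeoutError as exc:
            last_error = exc
            time.sleep(pause)  # optional breather before the next, longer attempt
    raise last_error

def fake_fetch(timeout):
    # Stand-in for a slow network call that needs about 10 seconds to answer.
    if timeout < 10:
        raise TimeoutError(f"timed out after {timeout}s")
    return f"ok with timeout={timeout}s"

result = call_with_growing_timeout(fake_fetch)
print(result)  # ok with timeout=15s
```

Starting with a short timeout keeps the common case fast, while the longer retries absorb slow responses instead of crashing the whole crawl.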
You don't need to understand how many bugs it fixed in the background. Just a few minutes later, it had stably downloaded a screen full of food images into your local folder. After gathering the materials, you can naturally ask the agent to quickly convert the research results into a PPT and an online document, completing the workflow loop.
Another core capability of the agent lies in flexibly invoking skills for specific scenarios.
Faced with a travelogue full of formulaic phrases like 'Firstly, secondly, lastly' and 'a hymn transcending time and space,' a simple 'Remove AI flavor' command is all it takes. Qwen3.7-Max accurately identifies the core requirement for text revision and proactively orchestrates a Skill within the system.
After completing the rewrite, the model outputs a structured Markdown review table. It clearly lists which 'filler phrases' and 'promotional language' it removed, and provides a quantitative score for the revision based on dimensions like 'directness' and 'authenticity.'
From building desktop software with no coding background, to exploring unfamiliar terminal tools on its own, to orchestrating and reviewing a text-editing skill, Qwen3.7-Max shows across these three scenarios not just text generation ability but a mature, independent capacity for agent execution.
Three Updates in Three Months: Alibaba Hits the AI 'Accelerator'
This string of eye-catching benchmark results and practical real-world effects is a microcosm of the recent rapid progress of the Qianwen large model.
The iteration cycle of the Qianwen flagship model has been compressed to 'monthly updates': On March 20th, Qwen3.5-Max-Preview was released; on April 20th, Qwen3.6-Max-Preview was released. Today, Qwen3.7-Max has arrived. For users, this is a 'happy dilemma.'
Image source: @LotusDecoder
Delivering a new flagship generation every month takes more than the model team 'burning the midnight oil.' Since Alibaba formed the ATH (Alibaba Token Hub) organization in March this year, it has steadily built up full-stack capabilities spanning chips, cloud, models, and applications, and these have been a major factor in the current pace.
Among these, T-Head's custom chips provide ultimate training and inference efficiency, Alibaba Cloud's elastic computing power enables seamless large-scale pre-training and deployment, and rapid iteration at the model layer can directly feed back into upper-layer applications. This vertical integration compresses communication costs and engineering loss, allowing the Qianwen R&D cadence to roll forward as rapidly as an internet product.
In short, the acceleration of the Qianwen flagship model stems from Alibaba's full-stack AI system coming into its own. This systemic moat is much harder to replicate than a single model topping a benchmark once.
While pursuing a high-frequency iteration path, Qianwen has not abandoned its deep cultivation of the open-source community. It is no exaggeration to say that Qianwen has become a benchmark for domestic and even global open-source models, attracting enormous attention with almost every new release.
The open-source Qwen3.6-27B and Qwen3.6-35B-A3B released last month have become representative works of 'outperforming larger models with smaller ones,' topping the HuggingFace global open-source leaderboard. They comprehensively surpassed the previous generation, larger-scale Qwen3.5-397B-A17B on major programming benchmarks and significantly outperformed dense models of the same size.
With their extremely low deployment costs, these small-to-medium Qianwen models outperform peers of the same size and even challenge larger models, which makes them a good fit for teams with hard requirements for local deployment and customization. As a result, developers worldwide have come to adopt them as the default base almost by habit. As one user put it, 'Alibaba is racing forward with Qianwen. The open-source track is insanely competitive, but in the end, it's a victory for everyone.'
The word-of-mouth reputation within the open-source community has formed a powerful gravitational field, causing developers to 'vote with their feet' and willingly pay for Qianwen model API calls.
Last month, Qwen3.6-Plus topped both the daily and weekly leaderboards on OpenRouter, a widely used platform for calling large model APIs. It set a new global record for single-day call volume by a single model, exceeding 1.4 trillion tokens. The standing of the Qianwen models among global developers is evident.
While capturing the mindshare of global developers, Qianwen has also quietly secured a gateway position in the global token economy. Tokens are rapidly becoming the universal input for solving problems, and Alibaba, through Qianwen, has seized this moment.
Laying the Foundation for Agentic Software
Monthly flagship updates, seemingly a 'flex,' are actually about seizing the initiative for the Agent era.
It's easy to see that the Qianwen models released in the past six months all point to the same theme — Agents. Qwen3.5 created native multimodal agents, Qwen3.6-Plus stepped towards real-world agents, and Qwen3.7-Max blazes a new frontier for agents. Each new release comes with enhanced capabilities in autonomous planning, tool use, and long-duration task execution.
This time, great hopes are pinned on Qwen3.7-Max. Alibaba wants to build it into a next-generation all-around agent foundation. It is therefore not content for the model to act merely as a brain to be called upon; it also wants the model to reach down to the hardware layer for system-level programming and optimization. The 35-hour agent task that Qwen3.7-Max completed on T-Head's new AI chip is strong evidence of this shift.
Furthermore, Qwen3.7-Max has also exhibited emergent generalization ability across agent frameworks. Without any special training, it smoothly supports frameworks like Claude Code, OpenClaw, and Hermes Agent. This strongly resembles the rise of operating systems in the past—like Windows in the PC era and Android in the mobile era—which, through unified standards and interfaces, enabled developers to build a rich ecosystem upon them.
Qianwen is striving to build the 'standard interfaces' of the Agent era, positioning itself as the preferred foundation for different agent frameworks. This forward-looking positioning is a deliberate strategic move.
Ultimately, competition in the Agent era hinges on whether a model's capabilities hold up. Alibaba is keenly aware of this and has acted on it consistently, building its influence in the global developer community through open source. As more and more developers get used to building agents and running tasks on Qianwen, Alibaba will gain a stronger voice in shaping the next-generation agentic software ecosystem.
Currently, Anthropic and OpenAI are winning over users and enterprises through a 'product-driven' route (Claude Code, Codex) and seeking trillion-dollar valuations in the commercial market. Alibaba, as one of the leading domestic large model developers, has chosen a harder and more ambitious path: 'wanting it all,' from technology to ecosystem to a say in how the industry develops.
In this crucial battle for position, Alibaba Qianwen's ambition is greater than it might appear: it aims to become indispensable underlying infrastructure for developers building agent systems.