
Alibaba’s Qwen AI team has introduced a new Qwen3.5 Medium model series, adding fresh competition to the large language model landscape with a focus on open source access and efficient deployment on local hardware.
The release, made public a little over a day ago, includes four models with support for agentic tool calling. Three of these are open source and available for commercial use under the Apache 2.0 license, targeting both enterprises and independent developers.
Three open models, one cloud-only option
The Qwen3.5 Medium series consists of the following open source models:
- Qwen3.5-35B-A3B
- Qwen3.5-122B-A10B
- Qwen3.5-27B
These three can be used commercially and are distributed under the standard open source Apache 2.0 license. Developers can download them now from major model hubs, including Hugging Face and ModelScope.
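For readers who want to try the open weights, a minimal sketch of pulling a checkpoint from Hugging Face follows. The exact repository id `Qwen/Qwen3.5-35B-A3B` is an assumption inferred from the model name, so check the Qwen organization page on Hugging Face for the actual listing:

```shell
# Install the Hugging Face Hub CLI, then download the checkpoint
# to a local directory. Repo id is assumed, not confirmed.
pip install -U "huggingface_hub[cli]"
huggingface-cli download Qwen/Qwen3.5-35B-A3B --local-dir ./qwen3.5-35b-a3b
```

The same command works for the other two open models by swapping the repository id; ModelScope offers an equivalent CLI for users closer to its mirrors.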
A fourth model in the series, Qwen3.5-Flash, is not open source. It appears to be proprietary and is currently accessible only through the Alibaba Cloud Model Studio API. Alibaba positions Qwen3.5-Flash as a cost-effective alternative to comparable models from Western providers, with its pricing comparison claiming a “strong advantage in cost.” Specific pricing figures have not been published.
The key talking point around the new Qwen3.5 Medium models is performance. On third-party benchmark tests, the open source models are described as matching similarly sized proprietary LLMs from major U.S. labs such as OpenAI and Anthropic.
On the benchmarks cited, the Qwen3.5 Medium models are reported to outperform OpenAI’s GPT-5-mini and Anthropic’s Claude Sonnet 4.5. Claude Sonnet 4.5 was released roughly five months ago, underscoring how quickly Alibaba’s Qwen team is moving to close the gap with recent Western offerings. The specific benchmark suites and numeric scores behind these comparisons have not been detailed; the claim so far is that Qwen3.5 Medium achieves “comparably high performance” and beats those particular models on the cited tests.
Another technical focus for Alibaba’s team is how the models behave when “quantized.” Quantization is a common technique for shrinking AI models so they can run more efficiently, including on less powerful or local machines. It works by reducing the precision of the numerical values that represent the model’s parameters, compressing the model size and typically speeding up inference at the cost of some accuracy.
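As a rough illustration of the precision reduction described above, here is a minimal sketch of symmetric int8 weight quantization in plain Python. Real deployments rely on optimized library kernels and per-channel or group-wise schemes; this toy version only shows the round-trip and its bounded error:

```python
def quantize_int8(weights):
    """Map float weights onto signed 8-bit integers with a single scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.42, -1.3, 0.007, 0.95]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)

# Rounding to the nearest integer step means each reconstructed weight
# sits within half a quantization step of the original, so the accuracy
# cost is bounded by the scale factor.
max_err = max(abs(a - w) for a, w in zip(approx, weights))
assert max_err <= scale / 2 + 1e-12
```

The model shrinks because each weight now needs one byte instead of four, which is why quantized variants fit on local machines; the open question Alibaba is addressing is how little accuracy that byte costs.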
The Qwen team states that it has engineered these Medium models to stay “highly accurate” even after quantization. In practical terms, developers could adopt reduced-precision variants on more modest hardware such as local servers or high-end workstations while retaining much of the original capability. Maintaining accuracy under quantization appears to be a deliberate design goal for Qwen3.5 Medium, though Alibaba has not yet shared detailed metrics or recommended quantization configurations.
Taken together, the open Apache 2.0 licensing, availability on popular model repositories, benchmark positioning against GPT-5-mini and Claude Sonnet 4.5, and an emphasis on quantization-friendly design indicate that Alibaba is pushing Qwen3.5 Medium as a flexible option for developers who want strong performance without being locked into a single proprietary cloud stack.