Verify critical details (pricing, licensing, availability) with the model's source before making business decisions.

Qwen3.5-397B-A17B

Model family: qwen3-5

Size
frontier (397.0B params)
Context
262,144 tokens
Released
2026-02-15
Openness
open-weight
License
Apache License 2.0 · commercial: yes
Cost tier
mixed
Rating
4.5 — The most capable open-weight model you can legally download and self-host with no strings — frontier-adjacent, natively multimodal, 201 languages, clean Apache 2.0. Held off 5 only by the serious hardware needed to run it yourself.
Modalities
image-input, text, video-input
Capabilities
chat, coding, function-calling, instruction-following, long-context, multilingual, reasoning, tool-use, vision
Access
api-first-party, api-third-party, local-runtime-ollama, local-runtime-vllm, weights-download-hf
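The 262,144-token context window listed above is easier to reason about in words. The sketch below converts a word count into a rough token estimate and checks whether it fits. It assumes the common rule of thumb of about 0.75 English words per token; Qwen's actual tokenizer, code, and non-English text can differ substantially. The `reserved_output_tokens` budget is a hypothetical parameter for illustration.

```python
import math

# Context window from the spec list above (262,144 = 2**18 tokens).
CONTEXT_WINDOW = 262_144


def estimate_tokens(word_count: int, words_per_token: float = 0.75) -> int:
    """Rough token estimate for English prose.

    Assumes ~0.75 words per token, a rule of thumb rather than Qwen's
    real tokenizer; code and non-English text can differ substantially.
    """
    return math.ceil(word_count / words_per_token)


def fits_in_context(word_count: int, reserved_output_tokens: int = 8_192) -> bool:
    """True if the estimated prompt plus a reserved reply budget fits the window.

    `reserved_output_tokens` is a hypothetical budget for the model's answer.
    """
    return estimate_tokens(word_count) + reserved_output_tokens <= CONTEXT_WINDOW


print(fits_in_context(150_000))  # ~200,000 tokens plus 8,192 reserved: fits
print(fits_in_context(200_000))  # ~266,667 tokens: over the window on its own
```

Reserving room for the reply matters because the window covers input and output together; a prompt that fills the window leaves nothing for the answer.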

Quick Take

The most capable model you can legally download and self-host with no strings: a 397B multimodal, Apache-2.0 flagship that rivals the frontier and speaks 201 languages.

Plain-English Description

Qwen3.5-397B-A17B, released in February 2026, is Qwen's largest open-weight model and arguably the strongest model in the world that you can simply download and run yourself. It's a mixture-of-experts design: 397 billion total parameters, but only about 17 billion are active for any given token. That lets it keep a large model's breadth of knowledge while running far more efficiently than a dense model of its size, although all 397 billion parameters still have to be held in memory.
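The "only 17 billion active" idea comes from top-k routing: a small router scores every expert, and only the best few actually run for each token. Below is a minimal, generic top-k mixture-of-experts sketch. The source does not say how many experts Qwen3.5 has or how many each token uses, so the toy sizes in the usage (8 experts, k=2) are hypothetical, and this is not Qwen's implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(token, experts, router_weights, k=2):
    """Generic top-k mixture-of-experts forward pass for a single token.

    token: list of floats; experts: callables mapping a vector to a vector;
    router_weights: one router vector per expert (a toy linear router).
    Returns the mixed output and the indices of the experts that ran.
    """
    # Score every expert cheaply with the router (one dot product each).
    scores = [sum(t * w for t, w in zip(token, wv)) for wv in router_weights]
    # Keep only the k best-scoring experts; the others are never evaluated.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Renormalize the chosen experts' scores into mixing weights.
    gates = softmax([scores[i] for i in chosen])
    out = [0.0] * len(token)
    for gate, i in zip(gates, chosen):
        y = experts[i](token)  # only these k experts do any real work
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen


# The headline ratio: share of parameters active per token.
active_share = 17 / 397
print(f"{active_share:.1%} of parameters active per token")
```

The trade-off is visible in the code: every expert must stay loaded even though only k of them run, which is why the active-parameter count cuts per-token compute but not the memory needed to host the full model.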

Two qualities make it stand out. It's natively multimodal, built from pretraining to handle text, images (up to 1344 pixels), and video clips up to a minute long. It's also extraordinarily multilingual, covering 201 languages, the widest of any frontier model. On benchmarks it sits just below the absolute top tier: roughly 83.6 on LiveCodeBench v6, 91.3 on the AIME 2026 math exam, and 88.4 on GPQA Diamond, with strong "visual agent" scores for controlling desktop and mobile apps from screenshots. It's not quite the best at any single thing, but it's close to the frontier across the board, and it's open.

The cost story is the punchline. Under Apache 2.0 you can self-host it for nothing beyond compute, and the hosted version runs roughly 10–17x cheaper than comparable Western models. The catch is hardware: at this size, self-hosting means an 8-GPU server, so in practice many teams use the hosted API and reserve self-hosting for the smaller open Qwen models.

Best For

  • Teams that need near-frontier capability and the ability to self-host — this is the open model that gets closest to the closed flagships.
  • Multilingual products: 201-language coverage is unmatched.
  • Multimodal workloads (image and short-video understanding, screenshot-driven agents) in a single open model.
  • Cost-sensitive production at scale, where the hosted pricing or self-hosting economics beat Western APIs by a wide margin.
  • Fine-tuning projects that want a powerful, unrestricted Apache 2.0 base.

Not For

  • Small teams hoping to self-host on modest hardware: at ~794GB in full BF16 precision, this needs a multi-GPU server. Look at Qwen3.6-27B or the smaller Qwen3 dense models instead.
  • Squeezing out the single best agentic-coding score — the closed Qwen3.7-Max edges it there.
  • Regulated workloads that can't use the hosted API and lack the hardware to self-host the open weights.
  • Buyers who want a Western-jurisdiction vendor with enterprise support out of the box.

License — Plain-English Summary

Qwen3.5 is Apache 2.0, the permissive gold standard. You can use it commercially, modify it, fine-tune it, redistribute it, and build closed products on top, with no royalties and no user-count carve-out like Llama's. The only obligations are the usual Apache ones: keep the copyright and license notices, and mark files you've changed if you redistribute. This is genuinely one of the cleanest licenses in the catalog. What the license doesn't address is data routing: that depends on how you run it (self-hosted = your infrastructure; hosted DashScope = Alibaba Cloud under Chinese jurisdiction), not on licensing.

How It Compares

Against the closed Qwen3.7-Max, the 397B is a bit behind on peak agentic performance but open and self-hostable, making it the right pick whenever ownership or data control matters. Against DeepSeek's open V4 family, the two are the leading open-weight options: DeepSeek often edges ahead on raw frontier coding/reasoning, while Qwen3.5 counters with native multimodality and far broader language coverage, both under similarly clean licenses (Apache here, MIT there). Against Meta's Llama models, Qwen3.5 is more capable and more permissively licensed for the largest users, with the China-jurisdiction consideration as the trade-off.

Cost

Self-hosted cost
$0.00 beyond compute
API providers
alibaba-dashscope, openrouter, together
Notes
Free to self-host under Apache 2.0 (beyond your own compute). The hosted Qwen3.5-Plus variant on DashScope runs roughly $0.26-0.40 per million input tokens and ~$1.56 per million output tokens, depending on tier and provider: about 10-17x cheaper than comparable Western models of similar quality.
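To turn per-million-token rates into a budget, multiply each side of the traffic by its own rate and add them. A minimal sketch, assuming the Qwen3.5-Plus price range quoted in the notes and a hypothetical monthly volume; it ignores caching discounts, batch pricing, and provider minimums, and the `Price` and `workload_cost` names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Hosted price in USD per million tokens."""
    input_per_m: float
    output_per_m: float


# Ends of the range quoted in the notes; real rates vary by tier and provider.
LOW = Price(input_per_m=0.26, output_per_m=1.56)
HIGH = Price(input_per_m=0.40, output_per_m=1.56)


def workload_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a given volume of input and output tokens."""
    input_cost = (input_tokens / 1_000_000) * price.input_per_m
    output_cost = (output_tokens / 1_000_000) * price.output_per_m
    return input_cost + output_cost


# Hypothetical month: 500M input tokens, 100M output tokens.
for label, price in (("low", LOW), ("high", HIGH)):
    print(label, round(workload_cost(price, 500_000_000, 100_000_000), 2))
```

At these rates output tokens cost four to six times more than input tokens, so verbose or reasoning-heavy workloads shift the total toward the output side.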

Hardware requirements

Min VRAM
160 GB
Recommended VRAM
640 GB
Runs on laptop
No
Notes
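Full BF16 weights are ~794GB (397B parameters × 2 bytes), more than the 640 GB on an 8x H100 (80 GB) node, so full precision needs 8x H200-class GPUs or multiple nodes, while FP8 (~397GB) fits on 8x H100. Quantized versions fit on 2-3 H100-class GPUs depending on bit width (4-bit weights alone are ~200GB). For most teams the hosted API is more practical than self-hosting this particular model; the smaller open Qwen models are the self-host-friendly options.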
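The memory figures follow from simple arithmetic: parameter count times bytes per parameter. A minimal sketch, assuming 1 GB = 1e9 bytes, 80 GB cards by default, and a hypothetical 20% overhead factor for KV cache and runtime; real requirements depend on the serving stack, context length, and batch size.

```python
import math

# Bytes per parameter at common precisions (weights only).
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_gb(params_billion: float, precision: str) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes).

    Excludes KV cache, activations, and runtime overhead.
    """
    return params_billion * BYTES_PER_PARAM[precision]


def gpus_needed(params_billion: float, precision: str,
                gpu_gb: float = 80.0, overhead: float = 1.2) -> int:
    """Minimum GPU count to hold the weights times a headroom factor.

    `overhead=1.2` (20% headroom) is an illustrative assumption, not a
    measured figure for any particular serving stack.
    """
    return math.ceil(weight_gb(params_billion, precision) * overhead / gpu_gb)


for precision in BYTES_PER_PARAM:
    print(precision, weight_gb(397, precision), "GB,",
          gpus_needed(397, precision), "x 80 GB GPUs")
```

Even with zero headroom, BF16 weights alone need ten 80 GB cards, which is why a single 8x H100 node cannot hold the full-precision model; FP8 or lower-bit quantization is what brings it within one 8-GPU H100 server or fewer cards.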
