Verify critical details (pricing, licensing, availability) with the model's source before business decisions.
Qwen3-235B-A22B
Model family: qwen3
- llm
- open-weight
- commercial-friendly
- frontier
- long-context
- multilingual
- reasoning
- china-based
- apache-2-0
- mixture-of-experts
Quick Take
The open generalist that defined Qwen3: a 235B Apache-2.0 mixture-of-experts model that went toe-to-toe with the closed frontier and became one of the most-deployed open models anywhere.
Plain-English Description
Qwen3-235B-A22B, launched in April 2025, was the headline model of the Qwen3 generation and the one that cemented Qwen as the default open-weight choice for serious work. It's a mixture-of-experts model, with 235 billion parameters in total and 22 billion active per token, and at launch it traded blows in benchmarks with the strongest models of its moment, including DeepSeek-R1, OpenAI's o1 and o3-mini, Grok-3, and Gemini 2.5 Pro. The mid-2025 "2507" refresh sharpened it further and extended its context window to 256,000 tokens.
What made it matter for businesses wasn't just the scores; it was that all of this shipped under Apache 2.0. A model competitive with the closed frontier that you could download, self-host, fine-tune, and build a commercial product on with no strings was a genuinely new option in 2025, and the ecosystem responded. It became one of the most-deployed open models, with broad support across inference frameworks, hosting providers, and tools.
It's worth being honest about where it sits now: the newer Qwen3.5-397B-A17B has succeeded it at the top of Qwen's open range, and the dense Qwen3.6-27B beats it on some coding tasks at a fraction of the size. Qwen3-235B remains a strong, proven, widely-supported model — but it's a generation behind the current lead.
Best For
- Teams that want a proven, heavily-supported open generalist with a large ecosystem of tooling and hosting options.
- Self-hosted or third-party-hosted deployments where Apache 2.0 freedom matters and the model's maturity is a plus.
- Multilingual, reasoning, and coding workloads that don't specifically need the newest generation's gains.
- Anyone already running Qwen3-235B who wants to understand it before deciding whether to move to Qwen3.5.
Not For
- New deployments chasing maximum open capability: Qwen3.5-397B-A17B is the current top of the open range.
- Single-GPU self-hosting: at 235B this needs multiple high-end GPUs; use Qwen3.6-27B or the smaller dense models for that.
- Multimodal tasks: it's text-only; the 397B is the open multimodal option.
- Teams wanting the absolute strongest agentic-coding scores, where the newer models and the closed Max flagship lead.
License — Plain-English Summary
Apache 2.0 — the permissive standard, and a big part of why this model spread so widely. Commercial use, modification, fine-tuning, and redistribution are all allowed with no royalties and no user-count carve-out; just keep the copyright and license notices. For a model this capable, that combination — frontier-adjacent quality plus a no-strings license — is exactly what made it a default open choice. The only non-licensing consideration is the usual one: self-host to keep data in-house, or accept the hosting provider's data policy if you use a hosted endpoint.
How It Compares
Against its successor Qwen3.5-397B-A17B, the 235B is a generation behind and text-only, but more battle-tested and more widely supported across tools. Against the closed Qwen3.7-Max, it's open and self-hostable where Max is not. Against DeepSeek's and Meta's open flagships, Qwen3-235B holds up as a capable, permissively licensed generalist. Its main edge has historically been Qwen's breadth and multilingual strength; its main caveats are the China-jurisdiction question for hosted use and the fact that newer models have since moved the frontier.
Cost
- Self-hosted cost: $0.00 beyond compute
- API input (per 1M tokens): $0.15
- API output (per 1M tokens): $0.60
- API providers: alibaba-dashscope, deepinfra, openrouter, together
- Notes: Free to self-host under Apache 2.0. Third-party hosting (e.g. DeepInfra) runs around $0.15 input / $0.60 output per million tokens; DashScope and other providers vary.
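As a rough illustration of what the per-million-token prices above mean for a workload, here is a minimal sketch. It assumes the approximate $0.15 input / $0.60 output figures quoted for third-party hosting; the function name `estimate_api_cost` and the example token volumes are hypothetical, and real provider pricing varies.

```python
def estimate_api_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float = 0.15,   # USD per 1M input tokens (approximate, from the listing)
    output_price_per_m: float = 0.60,  # USD per 1M output tokens (approximate, from the listing)
) -> float:
    """Return the estimated API cost in USD for a given token volume."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_m
    output_cost = (output_tokens / 1_000_000) * output_price_per_m
    return input_cost + output_cost


# Hypothetical monthly workload: 50M input tokens and 10M output tokens.
monthly_cost = estimate_api_cost(50_000_000, 10_000_000)
print(f"Estimated monthly cost: ${monthly_cost:.2f}")
```

Output tokens cost four times as much as input tokens at these rates, so workloads that generate long responses (for example, extended reasoning) are priced mostly by their output volume.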
Hardware requirements
- Min VRAM: 140 GB
- Recommended VRAM: 470 GB
- Runs on laptop: No
- Notes: Full BF16 weights are ~470GB; quantized versions fit on a few high-end GPUs. Self-hosting is a multi-GPU job; the smaller Qwen3 dense models are the single-GPU options.
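The ~470GB figure follows from simple arithmetic: parameter count times bytes per parameter. The sketch below is a minimal illustration that assumes weights-only memory (no KV cache, activations, or runtime overhead) and 1 GB = 10^9 bytes. The function name is hypothetical, and the bit widths are common choices rather than specific published Qwen quantizations.

```python
def weight_memory_gb(total_params_billions: float, bits_per_param: float) -> float:
    """Approximate memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    # billions of parameters * (bits / 8) bytes each = billions of bytes = GB
    return total_params_billions * bits_per_param / 8


# Use the total parameter count (235B), not the 22B active per token.
for label, bits in [("BF16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(235, bits):.1f} GB")
```

In a mixture-of-experts model, all 235B parameters generally need to be loaded even though only 22B are active per token, which is why the total count drives the estimate. The 4-bit result of about 117.5 GB covers weights only, so a real deployment needs headroom on top, which is consistent with the 140 GB minimum listed above.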