Verify critical details — pricing, licensing, availability — with the model's source before business decisions. Full methodology →

Models · Qwen · Qwen3-8B

Feature-frozen. The creator has frozen feature development on this model (critical fixes only).

DeepSeek-R1-0528-Qwen3-8B

distillation derivative of Qwen3-8B by DeepSeek

Distilled from the upgraded DeepSeek-R1-0528 by continuing post-training on the Qwen3-8B base using chain-of-thought generated by R1-0528 — transferring the newer R1's stronger reasoning into an 8B model. This is the second-wave R1 distill (May 2025), distinct from the January 2025 Qwen2.5- and Llama-based set.

Size

small (8.0B params)

Context

131,072 tokens

Released

2025-05-27

Openness

open-weight

License

MIT License (over Apache 2.0 base) · commercial: yes

Cost tier

mixed

Rating

4.0 ★ — Exceptional benchmark reasoning for an 8B model — matching far larger models on AIME — under a clean MIT-over-Apache license. Rated 4.0 because real-world reports are more mixed than the benchmarks (it tends toward very long reasoning traces), and it's a narrow reasoning specialist.

Modalities

text

Capabilities

chat, coding, math, reasoning

Access

local-runtime-llama-cpp, local-runtime-lm-studio, local-runtime-ollama, local-runtime-vllm, weights-download-hf

llm
open-weight
commercial-friendly
small
reasoning
math
on-device
distillation
china-based
apache-2-0

Quick Take

A second-wave R1 distill: the upgraded R1-0528's reasoning compressed into an 8B Qwen3 model that posts benchmark scores rivaling models many times its size, under a clean MIT-over-Apache license.

Plain-English Description

When DeepSeek upgraded R1 to R1-0528 in May 2025, it released just one distillation alongside it — this 8B model, built by teaching Qwen3-8B the chain-of-thought of the much larger upgraded R1. It's the successor to the January 2025 distill family, and notably DeepSeek dropped the Llama-based variants this time, distilling only onto Qwen3 (whose clean Apache 2.0 license is friendlier than Llama's terms).

On paper it's remarkable: it improves on standard Qwen3-8B by about 10 points on the AIME 2024 math exam and reportedly matches Qwen3's 235-billion-parameter "thinking" model on that benchmark — extraordinary for a model you can run on a laptop. That makes it one of the most capable small reasoning models on benchmarks.

The honest caveat is that real-world reception has been more mixed than the headline numbers. Like many reasoning distills it generates very long chains of thought, and some users find its general usefulness narrower than the benchmarks suggest. It's a reasoning specialist — strong on structured math and logic problems, less so as an all-purpose assistant.

Best For

A small, self-hostable reasoning model for math and logic where benchmark-grade chain-of-thought matters.
On-device or laptop reasoning at no per-token cost (runs in ~20GB, less when quantized).
Research and experimentation with distilled reasoning at small scale.
Fine-tuning a cleanly-licensed small reasoning model on your own data.

Not For

General-purpose chat or writing — it's a narrow reasoning specialist that tends to over-explain.
Workloads sensitive to long, verbose reasoning traces (and the tokens they consume).
The strongest open reasoning overall — the larger distills and the full DeepSeek-R1 go further.
Multimodal tasks — text only.

License — Plain-English Summary

Clean and permissive. DeepSeek released the distilled weights under MIT, and the Qwen3-8B base is Apache 2.0 — both layers allow commercial use, modification, fine-tuning, and redistribution with no royalties or user-count carve-outs; keep the respective notices. This is exactly the licensing advantage DeepSeek leaned into by distilling onto Qwen3 rather than Llama for this release: no community-license strings, just two permissive layers.

How It Compares

Against the January-wave DeepSeek-R1-Distill-Qwen-32B, this 8B model is far smaller and lighter, and on some math benchmarks punches surprisingly close — but the 32B is the more well-rounded performer. Against its base Qwen3-8B, it trades general versatility for a large jump in structured reasoning. Against the full DeepSeek-R1, it's the extreme-accessibility distill: a fraction of the size, runnable anywhere, carrying a slice of R1-0528's reasoning.

Cost

Self-hosted cost: $0.00 beyond compute
Notes: Free to self-host under Apache 2.0; widely available in GGUF/MLX quants for local runtimes. Also served by third-party hosts.

Comparable models

Commercial-use conditions

DeepSeek released the distilled weights under MIT; the base model (Qwen3-8B) is Apache 2.0. Both layers are permissive and allow commercial use, with no carve-outs — a contrast with the January 2025 Llama-based R1 distills.