Verify critical details — pricing, licensing, availability — with the model's source before business decisions. Full methodology →

Models · Qwen · Qwen2.5-Math-1.5B

Feature-frozen. The creator has frozen feature development on this model (critical fixes only).

DeepSeek-R1-Distill-Qwen-1.5B

distillation derivative of Qwen2.5-Math-1.5B by DeepSeek

Fine-tuned (distilled) from Qwen2.5-Math-1.5B on 800K reasoning samples generated by DeepSeek-R1, transferring R1's step-by-step chain-of-thought reasoning into a smaller dense model.

Size

small (1.5B params)

Context

131,072 tokens

Released

2025-01-19

Openness

open-weight

License

MIT License (over Apache 2.0 base) · commercial: yes

Cost tier

mixed

Rating

3.5 ★ — Remarkable reasoning for a 1.5B model and trivially easy to run anywhere, under a clean MIT-over-Apache license — held to 3.5 because its small size naturally caps general capability.

Modalities

text

Capabilities

coding, math, reasoning

Access

local-runtime-llama-cpp, local-runtime-lm-studio, local-runtime-ollama, local-runtime-vllm, weights-download-hf

llm
open-weight
commercial-friendly
small
reasoning
math
on-device
distillation
china-based
apache-2-0

Quick Take

The smallest R1 distill: chain-of-thought reasoning compressed into a 1.5B model that runs on practically anything, under a clean MIT-over-Apache license.

Plain-English Description

This is the smallest of DeepSeek's six R1 distillations — the full R1 model's reasoning taught to a 1.5-billion-parameter version of Qwen2.5-Math. Distilling is like a brilliant professor tutoring a much smaller student until the student picks up the professor's way of working through problems; here the student is tiny enough to run on a phone.

Despite its size, it shows genuine step-by-step reasoning and does surprisingly well on focused math tasks, outperforming much larger conventional models on specific reasoning benchmarks. It's a reasoning specialist, not a general chatbot — best on problems with a clear chain of steps.

The appeal is reach: you can embed real reasoning capability in edge devices, browser apps, or anywhere a full-size model would never fit, entirely offline.

Best For

On-device and edge reasoning where footprint is the hard constraint.
Embedding a small reasoning helper into apps or pipelines at near-zero marginal cost.
Math-focused tasks where the distilled reasoning shines despite the small size.
Experimentation and fine-tuning on minimal hardware.

Not For

General chat, writing, or broad knowledge — it's a small reasoning specialist.
Hard problems that need depth — step up to DeepSeek-R1-Distill-Qwen-7B or larger.
Multimodal tasks — text only.
Anyone expecting frontier quality from a 1.5B model.

License — Plain-English Summary

This distill is unusually clean on licensing. DeepSeek released its fine-tuned weights under the permissive MIT license, and the base it was built on — Qwen2.5-Math-1.5B — is Apache 2.0. Both layers allow commercial use, modification, fine-tuning, and redistribution with no royalties and no user-count carve-outs; you just keep the respective notices. That's a meaningful contrast with the Llama-based R1 distills, which inherit Meta's community license and its 700M-monthly-user clause. If clean commercial terms matter, the Qwen-based distills like this one are the easier choice.

How It Compares

Against DeepSeek-R1-Distill-Qwen-7B, the 1.5B is far more portable but clearly weaker — the 7B is the better pick once you have a bit more hardware. Against the same-tier DeepSeek-R1-Distill-Llama-8B, this Qwen-based distill carries a cleaner license (MIT-over-Apache vs Llama's community terms). Against its parent DeepSeek-R1, it's the extreme-accessibility end of the family — a sliver of the capability, runnable anywhere.

Cost

Self-hosted cost: $0.00 beyond compute
Notes: Free to self-host under Apache 2.0; also served by third-party hosts.

Comparable models

Commercial-use conditions

DeepSeek released the distilled weights under MIT; the base model (Qwen2.5) is Apache 2.0. Both layers are permissive and allow commercial use, so there are no carve-outs to worry about here.