
Verify critical details such as pricing, licensing, and availability with the model's source before making business decisions.


Feature-frozen. The creator has frozen feature development on this model (critical fixes only).

DeepSeek-R1-Distill-Qwen-1.5B

distillation derivative of Qwen2.5-Math-1.5B by DeepSeek

Fine-tuned (distilled) from Qwen2.5-Math-1.5B on about 800K samples curated with DeepSeek-R1 (roughly 600K reasoning and 200K non-reasoning), transferring R1's step-by-step chain-of-thought reasoning into a smaller dense model.
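As DeepSeek's R1 report describes it, the distilled models were trained with supervised fine-tuning only, with no reinforcement-learning stage. The student learns to reproduce R1-written text token by token rather than matching the teacher's output probabilities. Below is a minimal sketch of that objective. It assumes the common practice of scoring only the completion tokens; whether DeepSeek masked prompt tokens this way is an assumption, and `sft_loss` is a hypothetical helper, not DeepSeek's code.

```python
import math


def sft_loss(token_probs, loss_mask):
    """Mean negative log-likelihood over the tokens where loss_mask is 1.

    token_probs: probability the student assigned to each actual next token
                 (the targets are teacher-written tokens, not teacher logits).
    loss_mask:   1 for completion tokens written by the teacher, 0 for prompt
                 tokens (masking the prompt is an assumption in this sketch).
    """
    if len(token_probs) != len(loss_mask):
        raise ValueError("token_probs and loss_mask must have equal length")
    nll = [-math.log(p) for p, m in zip(token_probs, loss_mask) if m]
    if not nll:
        raise ValueError("no completion tokens to train on")
    return sum(nll) / len(nll)


# Hypothetical toy example: two prompt tokens, then two teacher-written tokens.
print(sft_loss([0.1, 0.2, 0.5, 0.25], [0, 0, 1, 1]))
```

Here the two prompt tokens are ignored, so the loss averages only the last two terms, -ln 0.5 and -ln 0.25, giving 1.5 ln 2 (about 1.04).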

Size: small (1.5B params)
Context: 131,072 tokens
Released: 2025-01-19
Openness: open-weight
License: MIT (base model Qwen2.5-Math-1.5B: Apache 2.0)
Cost tier: mixed
Rating: 3.5 (remarkable reasoning for a 1.5B model and trivially easy to run anywhere, under a clean MIT-over-Apache license; held to 3.5 because its small size naturally caps general capability)
Modalities: text
Capabilities: coding, math, reasoning
Access: local runtimes (llama.cpp, LM Studio, Ollama, vLLM); weights download from Hugging Face
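The 131,072-token context is a ceiling, and filling it costs memory beyond the weights, because every cached token stores a key vector and a value vector per layer. The sketch below is a back-of-the-envelope estimate. It assumes the Qwen2.5-1.5B layout of 28 layers, 2 key-value heads (grouped-query attention), and a head dimension of 128. Check these values against the checkpoint's `config.json` before relying on them.

```python
# Assumed architecture values for the Qwen2.5-1.5B family; verify against
# the released checkpoint's config.json.
NUM_LAYERS = 28
NUM_KV_HEADS = 2   # grouped-query attention: far fewer KV heads than query heads
HEAD_DIM = 128

GIB = 1024 ** 3


def kv_cache_bytes(seq_len, bytes_per_value=2, layers=NUM_LAYERS,
                   kv_heads=NUM_KV_HEADS, head_dim=HEAD_DIM):
    """Bytes needed to cache keys and values for a single sequence.

    The leading factor of 2 counts one key and one value vector per KV head,
    per layer, per token. bytes_per_value=2 corresponds to fp16/bf16.
    """
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value


for tokens in (4_096, 32_768, 131_072):
    print(f"{tokens:>7} tokens: {kv_cache_bytes(tokens) / GIB:.2f} GiB")
```

Under those assumptions the cache costs 28 KiB per token, or 3.5 GiB for one sequence at the full 131,072 tokens in fp16. That is in the same range as the 16-bit weights themselves, so long-context use on small devices is bounded by the cache as much as by the model.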

Quick Take

The smallest R1 distill: chain-of-thought reasoning compressed into a 1.5B model that runs on practically anything, under a clean MIT-over-Apache license.

Plain-English Description

This is the smallest of DeepSeek's six R1 distillations — the full R1 model's reasoning taught to a 1.5-billion-parameter version of Qwen2.5-Math. Distilling is like a brilliant professor tutoring a much smaller student until the student picks up the professor's way of working through problems; here the student is tiny enough to run on a phone.

Despite its size, it shows genuine step-by-step reasoning and does surprisingly well on focused math tasks. DeepSeek reports it beating much larger general-purpose models such as GPT-4o and Claude 3.5 Sonnet on math benchmarks like AIME 2024 and MATH-500. It's a reasoning specialist, not a general chatbot, and it works best on problems with a clear chain of steps.
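In practice, that step-by-step reasoning arrives as text that you have to separate from the answer. The minimal sketch below makes two assumptions. First, the output follows the R1 convention of a `<think>...</think>` block before the final answer; some chat templates pre-fill the opening tag, so only the closing tag may appear. Second, a math prompt asked for the result inside `\boxed{}`, as DeepSeek's usage notes recommend. `split_reasoning` and `boxed_answer` are hypothetical helpers.

```python
def split_reasoning(text):
    """Split model output into (reasoning, answer).

    Assumes the reasoning ends at the first "</think>"; the opening "<think>"
    may be absent if the chat template pre-filled it in the prompt.
    """
    head, sep, tail = text.partition("</think>")
    if not sep:  # no reasoning block: treat everything as the answer
        return "", text.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


def boxed_answer(text):
    """Return the contents of the last \\boxed{...}, handling nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    for j in range(i, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[i:j]
    return None  # braces never closed


sample = "<think>\nHalf of 1 is 1/2.\n</think>\nThe answer is \\boxed{\\frac{1}{2}}."
reasoning, answer = split_reasoning(sample)
print(reasoning)             # prints: Half of 1 is 1/2.
print(boxed_answer(answer))  # prints: \frac{1}{2}
```

The brace counter is there because math answers such as `\frac{1}{2}` contain nested braces that a simple `[^{}]*` regex would miss.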

The appeal is reach: you can embed real reasoning capability in edge devices, browser apps, or anywhere a full-size model would never fit, entirely offline.

Best For

  • On-device and edge reasoning where footprint is the hard constraint.
  • Embedding a small reasoning helper into apps or pipelines at near-zero marginal cost.
  • Math-focused tasks where the distilled reasoning shines despite the small size.
  • Experimentation and fine-tuning on minimal hardware.
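Estimating the footprint is mostly arithmetic: parameter count times bits per weight. The rough sketch below assumes the nominal 1.5B parameter count; the released checkpoint's exact count may differ. It also uses idealized bit widths that ignore the per-block scale overhead that real quantization formats add. It counts the weights only, not the KV cache or runtime overhead.

```python
# Nominal parameter count (assumption); the exact count may differ.
PARAMS = 1.5e9

# Idealized bits per weight; real 4-bit formats add a little overhead for scales.
BITS_PER_WEIGHT = {"fp16/bf16": 16, "int8": 8, "4-bit": 4}


def weight_gib(params, bits):
    """Memory for the weights alone, in GiB (no KV cache or runtime overhead)."""
    return params * bits / 8 / 1024 ** 3


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name:>9}: {weight_gib(PARAMS, bits):.2f} GiB")
```

Under those assumptions, the weights need about 2.8 GiB at 16-bit and about 0.7 GiB at 4-bit. That is why the model fits comfortably on laptops and many phones.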

Not For

  • General chat, writing, or broad knowledge — it's a small reasoning specialist.
  • Hard problems that need depth — step up to DeepSeek-R1-Distill-Qwen-7B or larger.
  • Multimodal tasks (it is text only).
  • Anyone expecting frontier quality from a 1.5B model.

License — Plain-English Summary

This distill is unusually clean on licensing. DeepSeek released its fine-tuned weights under the permissive MIT license, and the base it was built on, Qwen2.5-Math-1.5B, is Apache 2.0. Both layers allow commercial use, modification, fine-tuning, and redistribution with no royalties and no user-count carve-outs; you just keep the respective notices. That's a meaningful contrast with the Llama-based R1 distills, which inherit Meta's community license and its 700M-monthly-active-user clause. If clean commercial terms matter, Qwen-based distills like this one are the easier choice.
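The reasoning above is a layered check: a derivative model is only as permissive as the most restrictive license in its stack. The toy model below assumes each license can be reduced to two fields: whether commercial use is allowed, and an optional monthly-active-user cap. It does not capture notice or attribution duties or the exact wording of Meta's clause. `LicenseLayer` and `unrestricted_for` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LicenseLayer:
    name: str
    commercial_use: bool
    # Monthly-active-user threshold above which a separate license is needed;
    # None means the license has no such threshold. (Simplification.)
    mau_cap: Optional[int] = None


MIT = LicenseLayer("MIT", commercial_use=True)
APACHE_2 = LicenseLayer("Apache-2.0", commercial_use=True)
LLAMA_COMMUNITY = LicenseLayer("Llama community license", commercial_use=True,
                               mau_cap=700_000_000)


def unrestricted_for(layers, monthly_users):
    """True if every layer in the stack permits commercial use at this scale."""
    return all(
        layer.commercial_use
        and (layer.mau_cap is None or monthly_users <= layer.mau_cap)
        for layer in layers
    )


qwen_distill = [MIT, APACHE_2]          # this model: DeepSeek weights over a Qwen base
llama_distill = [MIT, LLAMA_COMMUNITY]  # Llama-based R1 distills (simplified stack)
```

With these definitions, `unrestricted_for(qwen_distill, 10**9)` is `True`. The same check on `llama_distill` is `False`, because that stack includes the 700M cap.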

How It Compares

Against DeepSeek-R1-Distill-Qwen-7B, the 1.5B is far more portable but clearly weaker; the 7B is the better pick once you have a bit more hardware. Against the larger DeepSeek-R1-Distill-Llama-8B, this Qwen-based distill carries a cleaner license (MIT-over-Apache vs Llama's community terms). Against its parent DeepSeek-R1, it sits at the extreme-accessibility end of the family: a sliver of the capability, runnable anywhere.

Cost

Self-hosted cost: $0.00 beyond compute
Notes: Free to self-host under MIT (base model Apache 2.0); also served by third-party hosts.
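"$0.00 beyond compute" still leaves the compute to pay for. The minimal sketch below converts an hourly hardware price into a per-token cost. The $0.50/hour and 100 tokens/second inputs are hypothetical placeholders, not measurements of this model. Reasoning models spend many tokens thinking before they answer, so it is worth costing whole answers, not just tokens.

```python
def cost_per_million_tokens(hourly_cost_usd, tokens_per_second):
    """Compute-only cost (USD) to generate one million tokens.

    tokens_per_second is aggregate throughput across all concurrent requests,
    so batching many requests lowers the per-token cost.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


def cost_per_answer(hourly_cost_usd, tokens_per_second, tokens_per_answer):
    """Cost of one answer, counting reasoning tokens as well as the final reply."""
    per_token = cost_per_million_tokens(hourly_cost_usd, tokens_per_second) / 1_000_000
    return per_token * tokens_per_answer


# Hypothetical inputs: a $0.50/hour machine generating 100 tokens/second.
print(f"${cost_per_million_tokens(0.50, 100):.2f} per million tokens")
print(f"${cost_per_answer(0.50, 100, 4_000):.4f} per 4,000-token answer")
```

With those placeholder inputs, the machine produces 360,000 tokens per hour, which works out to about $1.39 per million tokens.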


Commercial-use conditions

DeepSeek released the distilled weights under MIT; the base model (Qwen2.5-Math-1.5B) is Apache 2.0. Both layers are permissive and allow commercial use, so there are no user-count carve-outs to worry about here.

Sources