Verify critical details — pricing, licensing, availability — with the model's source before business decisions. Full methodology →

Catalog entry last reviewed 92 days ago.

Ministral 3 14B Instruct

Model family: ministral-3

Size

mid (14.0B params)

Context

262,144 tokens

Released

2025-12-01

Openness

open-weight

License

Apache 2.0 · commercial: yes

Cost tier

mixed

Rating

4.5 ★ — Best balance in the Mistral lineup of capability, hardware accessibility, and license clarity. Multimodal, long context, Apache 2.0, and genuinely runs on consumer hardware — this is the Mistral model most small businesses should start with.

Modalities

image-input, text

Capabilities

chat, function-calling, instruction-following, long-context, multilingual, tool-use, vision

Access

api-first-party, api-third-party, local-runtime-llama-cpp, local-runtime-lm-studio, local-runtime-ollama, local-runtime-vllm, weights-download-direct, weights-download-hf

llm
open-weight
commercial-friendly
small-to-mid
long-context
multimodal
multilingual
laptop-friendly
edge
apache-licensed
eu-based
vision

Quick Take

Mistral's biggest edge-class model — 14B parameters, vision-capable, 256K context, runs on a single consumer GPU, and performs like a 24B model. Apache 2.0.

Plain-English Description

Ministral 3 is Mistral's family of edge-deployable models, released December 2, 2025 as part of the broader Mistral 3 generation launch. The family ships in three parameter sizes (3B, 8B, 14B), and each size comes in three post-training variants (Base for fine-tuning, Instruct for chat, Reasoning for step-by-step problem solving). Every Ministral 3 model is multimodal — text plus image input — which is notable for models in this size class; most competing edge models (Llama 3.2 1B/3B, Gemma 2B/7B) don't have vision capability built in.

The 14B Instruct variant is the largest and most capable of the family, and the one most teams will actually deploy. It's a dense model (every parameter activates on every token), which makes it straightforward to reason about for self-hosting — there's no expert-routing complexity like you'd find in Mistral's MoE models. The 256K-token context window matches what Mistral Large 3 and Mistral Small 4 offer. At FP8 quantization the model fits comfortably in 24GB of VRAM (a single RTX 4090 or 3090), and Q4 GGUF quantization brings it down to 16GB with modest quality loss. On Apple Silicon, a 32GB MacBook Pro runs it via llama.cpp or LM Studio at useful speeds.

The performance story is interesting. Mistral positions Ministral 3 14B as offering "frontier capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart" — which is a real claim backed by benchmarks. The Ministral 3 14B Reasoning variant hits 85% on AIME 2025, a math-reasoning benchmark that's genuinely state-of-the-art for a model this size. The Instruct variant doesn't benchmark quite that high but remains the strongest open-weight 14B vision-capable model available. For most non-reasoning-heavy workloads, you'd reach for the Instruct variant; for reasoning-heavy workloads where the extra latency of a chain-of-thought is acceptable, the Reasoning variant is a separate listing.

Best For

Small-business-class local deployments. This is the Mistral model that fits the "one consumer GPU, private inference, real capability" profile most cleanly. For a law firm, a consultancy, an internal business-tools team — Ministral 3 14B on a single 4090 is a practical starting point.
Edge and on-device vision+text applications. Document understanding, image-Q&A, visual inspection workflows. The built-in vision capability means one model handles both modalities without routing.
Multilingual applications in small-to-mid markets. Broad multilingual coverage (13+ languages including European, Asian, and Arabic-script) with a model light enough to run in many environments.
Fine-tuning for narrow domains. The 14B parameter count is large enough to hold serious domain knowledge and small enough to fine-tune affordably. Good starting point for a domain-specific deployment.
Teams who want the best local-deployment Mistral. If you've decided on Mistral and you can't commit to the 8×H100 footprint of Mistral Large 3 or the single-H100 of Mistral Small 4, this is where you land.

Not For

Frontier-capability workloads. Ministral 3 14B is the best in its class, but it's still a 14B model. For maximum capability, Mistral Large 3 or Mistral Small 4 are meaningfully more capable.
Extreme-latency reasoning tasks. The Reasoning variant exists for a reason; Instruct's outputs are faster but lack the extended chain-of-thought. If AIME-class reasoning matters, use the Reasoning variant (separate listing).
Agentic coding as a primary use case. Ministral 3 14B can code but it isn't post-trained for agentic coding the way Devstral Small 2 is. For coding-agent workloads, Devstral Small 2 is the better specialist.
Very constrained hardware (<16GB VRAM). At aggressive quantization the 14B can technically squeeze into 12GB, but quality degrades noticeably. For truly small hardware targets, Ministral 3 8B or 3B are purpose-built.

License — Plain-English Summary

Apache 2.0. Commercial use unrestricted, modifications and redistribution allowed, fine-tuning allowed without special terms. Include the license file. This is the standard Mistral open-weight posture — no user caps, no revenue thresholds.

How It Compares

vs. Mistral Small 4 — Small 4 is more capable but requires an H100 to self-host; Ministral 3 14B runs on a single consumer GPU. If you can afford the infrastructure for Small 4, take it; if you can't, Ministral 3 14B is the next best Mistral.
vs. Devstral Small 2 24B — Devstral is a coding specialist; Ministral 3 14B is a generalist with vision. Both are Apache 2.0 and both laptop-class. For mixed workloads, Ministral; for agentic coding specifically, Devstral.
vs. Ministral 3 14B Reasoning — Same base, different post-training. Reasoning variant hits 85% on AIME 2025 but produces longer responses through extended chain-of-thought. Instruct is faster; Reasoning is more accurate on math/logic. Separate listing.
vs. Meta Llama 3.2 11B Vision Instruct — Similar size tier with vision capability. Ministral 3 14B has a longer context window (256K vs 128K), higher general benchmarks, and cleaner licensing (Apache 2.0 vs Llama Community License's 700M MAU clause). Llama 3.2 has a more mature ecosystem and larger community for tooling.

Under the Hood

Ministral 3 14B is a dense decoder-only transformer with a vision encoder fused for native multimodal input. Architecture includes rope-scaling (inspired by Llama 4) and scalable-softmax attention mechanisms to support the 256K context window efficiently. Supports function calling in Mistral's native format, structured outputs (JSON), and tool-use orchestration out of the box.

The default Hugging Face release is FP8-quantized for efficient deployment. Mistral also releases a companion BF16 ("no-loss FP8") version, a GGUF quantization family (Q4 / Q5 / Q8 / Q2 variants), and an ONNX export for certain deployment scenarios. The Ministral 3 - Additional Checkpoints collection on Hugging Face catalogs all of these.

Reported benchmarks at launch: comparable to Mistral Small 3.2 24B on most non-reasoning tasks. The Reasoning variant of the same 14B base achieves 85% on AIME 2025, beating Qwen3-14B's 73.7% — notable for a reasoning specialist in the small-model class. Ministral 3 14B Instruct (this model) focuses on fast, single-pass instruction following rather than extended chain-of-thought.

Cost

Self-hosted cost: $0.00 beyond compute
API input (per 1M tokens): $0.10
API output (per 1M tokens): $0.30
API providers: mistral, openrouter, fireworks, together
Notes: Self-hosting is free beyond compute. FP8 weights fit on a single consumer GPU with 24GB VRAM; GGUF quantizations run on 16GB or less. This is the "big model on a single prosumer GPU" tier.

Pricing data is 92 days old. Verify with the source before relying on it.

Hardware requirements

Min VRAM: 16 GB
Recommended VRAM: 24 GB
Runs on laptop: Yes
Notes: Q4-quantized GGUF runs on 16GB VRAM (RTX 4080 / RTX 3090). FP8 native fits in 24GB (RTX 4090 / RTX 3090 24GB variants). Full BF16 precision needs ~32GB. Practical laptop deployment via Apple Silicon Macs with 32GB+ unified memory.