Verify critical details — pricing, licensing, availability — with the model's source before business decisions. Full methodology →

Catalog entry last reviewed 92 days ago.

Mistral Large 3 Instruct

Model family: mistral-large-3

Size

frontier (675.0B params)

Context

262,144 tokens

Released

2025-12-01

Openness

open-weight

License

Apache 2.0 · commercial: yes

Cost tier

mixed

Rating

4.5 ★ — Frontier-tier open-weight capability with a genuinely permissive license; loses half a star only because the hardware requirements put self-hosting out of reach for anyone without serious GPU infrastructure.

Modalities

image-input, text

Capabilities

chat, long-context, multilingual, reasoning, tool-use, vision

Access

api-first-party, api-third-party, local-runtime-vllm, weights-download-direct, weights-download-hf

llm
open-weight
commercial-friendly
frontier
long-context
multimodal
multilingual
eu-based
moe
apache-licensed

Quick Take

Mistral's open-weight frontier model — 675B total parameters, Apache 2.0, 256K context, multimodal — the strongest permissive license you'll find on a model this capable.

Plain-English Description

Mistral Large 3 is the company's top-of-the-line general-purpose model and the first Mistral flagship to use a mixture-of-experts (MoE) architecture since the original Mixtral series from 2024. The MoE design is what makes the math work: the model has 675 billion parameters total, but for any given token only 41 billion of them activate. You get the knowledge capacity of a very large model with inference costs closer to a mid-sized dense model. The catch is memory — you still have to load all 675B parameters into GPU memory to serve the model, which is why self-hosting requires an 8-GPU node at minimum.

The model was announced December 2, 2025, released both as a base (pretrained) variant and the instruction-tuned variant cataloged here. It supports a 256,000-token context window, natively handles text and image input, and covers a broad swath of languages including European and major Asian languages. Mistral published it under the Apache 2.0 license — the same license that covers Linux, Kubernetes, and most of the modern open-source infrastructure stack — which means no user-count caps, no revenue thresholds, no industry restrictions beyond what the license itself specifies. For a model of this capability class, that level of license permissiveness is genuinely unusual; Meta's Llama 4 comes with a 700M monthly-active-user clause, and most other frontier open-weight releases carry custom licenses with at least some business-use caveats. Mistral Large 3 has none.

On capability, independent benchmarks place Mistral Large 3 in the top tier of open-weight models without quite matching the frontier proprietary tier. On the Artificial Analysis Intelligence Index it sits below DeepSeek-v3.2, Kimi K2-Thinking, and GLM-4.6 but above Meta's Llama 4 Maverick, OLMo 3, and most other Western open-weight releases. It's the current top open-source coding model on the LMArena leaderboard. Mistral has announced that a reasoning variant is coming soon but as of this writing it hasn't shipped — which is why we have "Mistral Large 3 Reasoning" on the Watchlist rather than as a separate catalog entry.

Best For

Enterprise deployments that need frontier capability and permissive licensing. If you're building a product that embeds a frontier-class LLM and the license terms matter (regulated industry, redistribution, international deployment), Mistral Large 3 is the strongest Apache 2.0 option available.
Long-context work on proprietary data. 256K tokens is ~200K words — enough for whole-repository code analysis, multi-document legal review, or book-length content synthesis.
European data-sovereignty deployments. Mistral is a French company subject to GDPR and the EU AI Act by default. For regulated EU workloads, this posture is often the deciding factor.
Research and experimentation that requires model weights. The full weights are on Hugging Face. You can probe, fine-tune, distill, or modify without negotiating a license.
Multilingual applications. Strong performance across European, major Asian, and Arabic-script languages.

Not For

Running on your own machine. Full precision requires ~1.35TB of memory across GPUs. Even the heavily quantized NVFP4 checkpoint needs a Blackwell NVL72 or 8×H100 node. If you want an open-weight Mistral you can actually self-host on consumer hardware, Mistral Small 4, Devstral Small 2, or the Ministral 3 family are your options.
Cost-optimized workloads where Claude or GPT-class reasoning is overkill. For everyday chat, summarization, and instruction-following, Mistral Small 4 at $0.15/M input tokens delivers much of the capability at a fraction of the cost.
The absolute bleeding-edge of reasoning benchmarks. Closed proprietary frontier models (GPT-5.1, Claude Opus 4.5, Gemini 3 Pro) still hold an edge on the hardest reasoning tasks, and the open-weight reasoning-specialist models (DeepSeek-v3.2, Kimi K2-Thinking) outscore Large 3 on the Intelligence Index.
Use cases where the reasoning variant would matter. Mistral has announced a reasoning variant as coming but it hasn't shipped. If reasoning is your bottleneck, the Ministral 3 14B Reasoning variant or Mistral Small 4 (with reasoning_effort: high) are available today.

License — Plain-English Summary

Mistral Large 3 is Apache 2.0 — the same license that covers Linux, Kubernetes, and most of the modern open-source stack. You can use it commercially without restrictions, modify it, fine-tune it, redistribute modified versions, and build products on top of it. The only obligations are: include the license file with redistribution, and don't use Mistral trademarks without permission. No user-count thresholds, no revenue caps, no industry bans. For a frontier-tier model, this is about as permissive as licensing gets.

How It Compares

vs. Meta Llama 4 Scout — Similar capability tier; Scout has a larger 10M context window but Llama 4's license restricts EU multimodal use and has a 700M MAU clause. Mistral Large 3 wins on licensing, Scout wins on extreme context.
vs. Meta Llama 3.3 70B Instruct — Llama 3.3 is dense (easier to run), smaller, and has the same 700M MAU conditional license. For teams without H100-class infrastructure, Llama 3.3 is more practical; for teams who want frontier capability and fully permissive licensing, Large 3 is the pick.
vs. Mistral Small 4 — Small 4 is 119B MoE with 6B active; dramatically cheaper and more deployable, with configurable reasoning that covers most real-world workloads. Use Large 3 when Small 4's quality ceiling becomes a bottleneck.

Under the Hood

Mistral Large 3 is a sparse mixture-of-experts model with granular expert routing — 41B parameters activate per token out of 675B total. Training was done on 3,000 NVIDIA H200 GPUs from scratch (not a continued-pretraining variant of an earlier Mistral model). The base model was released alongside the Instruct variant; both are Apache 2.0. An NVFP4-quantized checkpoint is co-released for Blackwell NVL72 systems, and an FP8 checkpoint fits on a single 8×H100 node using vLLM.

Architecturally, Mistral Large 3 uses a mix of sparse attention and implementation-level optimizations to sustain the 256K context window without quadratic compute blow-up. It supports function calling, structured outputs, multi-tool orchestration, fill-in-the-middle code editing, and both OCR and audio transcription endpoints through Mistral's API — though the audio and OCR pipeline integrations ship as separate closed-API products (Voxtral, Mistral OCR) rather than being native Large 3 capabilities.

On independent benchmarks, Mistral Large 3 scores 73.11% on MMLU-Pro and 93.60% on MATH-500, placing it in the top tier of open-weight models. It sits below DeepSeek-v3.2, Kimi K2-Thinking, and GLM-4.6 on the Artificial Analysis Intelligence Index (a composite benchmark) but above Llama 4 Maverick and OLMo 3. On the LMArena leaderboard, it's the current top open-source model in the non-reasoning category and the top open-source coding model.

Cost

Self-hosted cost: $0.00 beyond compute
API input (per 1M tokens): $0.50
API output (per 1M tokens): $1.50
API providers: mistral, openrouter, fireworks, together, bedrock, azure
Notes: Self-hosted deployment requires an 8×A100 or 8×H100 node minimum, or a Blackwell NVL72 system for the NVFP4-quantized checkpoint. Mistral's own API is the reference implementation; third-party providers host the open weights.

Pricing data is 92 days old. Verify with the source before relying on it.

Hardware requirements

Min VRAM: 320 GB
Recommended VRAM: 640 GB
Runs on laptop: No
Notes: BF16 weights require roughly 1.35TB (you can't actually run this on a laptop or a single workstation). NVFP4-quantized checkpoint fits on a Blackwell NVL72; the FP8 checkpoint fits on a single 8×H100 node.