← Back to hard AIs

Verify critical details — pricing, licensing, availability — with the model's source before business decisions. Full methodology →

Models

Mistral AI

4.5 ★ — Consistent open-weight releases under Apache 2.0, rapid release cadence, and European jurisdiction that appeals to GDPR-sensitive deployments — tempered by license diversity across the lineup (Apache 2.0 for most, custom for Devstral 2 flagship with revenue restrictions, CC BY-NC 4.0 for Voxtral TTS) that requires reading carefully before you commit.

Type
ai-native-company
Country
FR
Founded
2023
Website

Quick Take

Mistral is France's frontier AI lab and the most prolific open-weightA model where the trained weights are freely downloadable — you can run it yourself without contacting the creator. Llama, Mistral, Qwen, and Gemma are open-weight. Open-weight does not mean open-source: the training data and code often stay private. The license still governs what you can do with the weights, including whether you can use them commercially. ship­per in Europe — their Apache 2.0 model releases dominate the EU AI ecosystem and compete with U.S. labs on both capability and price.

Who They Are

Mistral AI is a Paris-based AI research and product company founded in 2023 by three researchers from Meta and Google DeepMind: Arthur Mensch (CEO), Guillaume Lample, and Timothée Lacroix. Within two and a half years, the company went from a €105M seed round to a €830M Series C in March 2026, a $13.8 billion valuation, and an annual recurring revenue of roughly $400 million. Headquartered in Paris with an expanding engineering presence in Sweden, Mistral is the clearest European counterweight to the U.S. concentration of frontier AI labs.

Mistral's early bet — ship foundation-model weightsThe numerical values inside a trained model that encode everything it has learned. A model is, functionally, a giant list of weights — tens of billions of numbers for a mid-sized model, hundreds of billions for a frontier model. "Open-weight" means those numbers are published. "Downloading the weights" means getting the actual file you'd need to run the model yourself. under a genuinely permissive license and let the ecosystem develop on top of them — has aged well. Where OpenAI, Anthropic, and most of Google's tier-one work lives behind closed APIs, Mistral's default is to publish the weights. Llama-scale models, code specialists, speech models, edge models: most of it goes out to Hugging Face under Apache 2.0 and is simultaneously available as a paid API on api.mistral.ai for teams who don't want to self-host. The resulting dynamic — commercial-grade models freely available for self-deployment and metered access for everyone else — makes Mistral the closest thing the AI industry has to a "best of both worlds" vendor for business users.

What makes the company distinctive beyond licensing is the release cadence. In a single 15-day stretch in March 2026, Mistral shipped Mistral Small 4 (a new flagship-small unified reasoning/coding/vision modelA multimodal model that accepts images as input alongside text. Useful for describing images, extracting text from photos, analyzing charts or screenshots, and identifying objects. Vision models don't generate images — they read them. For generating images, you want an image-generation model, which is a separate category.), Voxtral TTS (their first text-to-speech model), Leanstral (a formal-proof coding agent), Forge (an enterprise training platform), Spaces CLI (a developer tooling product), and an NVIDIA partnership announcement. That pace is hard to sustain, and the quality hasn't suffered — Mistral Large 3 sits in the top tier of open-weightA model where the trained weights are freely downloadable — you can run it yourself without contacting the creator. Llama, Mistral, Qwen, and Gemma are open-weight. Open-weight does not mean open-source: the training data and code often stay private. The license still governs what you can do with the weights, including whether you can use them commercially. models on LMArena, and Ministral 3 14B's reasoning variant hits 85% on AIME 2025, which is genuinely state-of-the-art for its size class.

Model Philosophy

Mistral's positioning is roughly: frontier quality, permissive license, European jurisdiction, and pricing aggressive enough to pressure the U.S. closed-APIA model that's only accessible through the creator's own API or product — you can't download it, run it yourself, or inspect its weights. GPT-4, Claude, and Gemini Pro are closed-API models. The tradeoff is convenience and often capability (closed-API models are frequently the strongest) versus loss of control over data, pricing, and availability. incumbents. The pitch to enterprises is usually some version of "you can fine-tuneA model that has been further trained on additional data to specialize it for a particular task, domain, or style. Fine-tuning a general model on medical literature produces a medical specialist; fine-tuning on your company's support tickets produces a support assistant that sounds like your team. Fine-tunes are much cheaper to create than training a model from scratch. and self-host on your own GPUs without legal friction, or you can use our hosted APIAccessing a model by sending requests to the creator's (or a provider's) servers, typically pay-per-use. Hosted APIs handle all the operational work — scaling, hardware, uptime — in exchange for a per-token or per-request fee. Every closed-API model is hosted; many open-weight models are also available via hosted APIs from providers like Together, Fireworks, or Groq. for what we charge, which is generally a fraction of what Claude or GPT-4-class models cost." For European companies specifically, the fact that Mistral is French — subject to GDPR and the EU AI Act by default rather than by bolt-on — is a real procurement advantage that shows up in contracts with ASML, Ericsson, BNP Paribas, and similar names.

The product naming follows a convention worth understanding. "Mistral" models are general-purpose language models (Small, Medium, Large). "Ministral" models are edge-size (3B, 8B, 14B). The -stral suffix marks specialists: Devstral for code, Voxtral for audio, Leanstral for formal proofs, Codestral for code embeddings. Numbered generations (3, 4) replace the older dated-version scheme (2501, 2506) for new flagships, though dated checkpoints are still how Mistral identifies individual releases on Hugging Face — you'll see Mistral-Small-4-119B-2603, where "2603" encodes March 2026 as the release date.

What To Know Before You Commit

Mistral's licensing is more varied than it first appears, and the variation matters for business use.

Apache 2.0 covers most of the lineup. Mistral Large 3, Mistral Small 4, all Ministral 3 variants, Devstral Small 2 (24B), and Voxtral speech-recognition models are Apache 2.0 — genuinely permissive, commercial-use allowed, no user-count thresholds, modification and redistribution fine. For these models, self-hosting is unambiguously free and fine-tuning for commercial product development is unambiguously fine.

Devstral 2 (the 123B flagship) is not Apache 2.0. Mistral calls the license "modified MIT" in their materials, but it contains commercial restrictions that developers on X correctly flagged as functionally proprietary. Commercial use is conditional and revenue-capped; exceeding the threshold requires a separate commercial agreement. If you're planning to deploy Devstral 2 in a revenue-generating product, read the license carefully or consult counsel before self-hosting. The 24B Devstral Small 2 has none of these restrictions and is the cleaner default for most commercial use.

Voxtral TTS is CC BY-NC 4.0. Creative Commons Attribution-NonCommercial 4.0 International means: free to use, modify, and redistribute with attribution, but commercial use is prohibited without a separate license from Mistral. The hosted APIAccessing a model by sending requests to the creator's (or a provider's) servers, typically pay-per-use. Hosted APIs handle all the operational work — scaling, hardware, uptime — in exchange for a per-token or per-request fee. Every closed-API model is hosted; many open-weight models are also available via hosted APIs from providers like Together, Fireworks, or Groq. at $0.016/1K characters is the commercial license. If you want to self-host Voxtral TTS weightsThe numerical values inside a trained model that encode everything it has learned. A model is, functionally, a giant list of weights — tens of billions of numbers for a mid-sized model, hundreds of billions for a frontier model. "Open-weight" means those numbers are published. "Downloading the weights" means getting the actual file you'd need to run the model yourself. in a revenue-generating product, you need to contact Mistral for a commercial agreement. Research, evaluation, and personal use are unrestricted.

The closed-APIA model that's only accessible through the creator's own API or product — you can't download it, run it yourself, or inspect its weights. GPT-4, Claude, and Gemini Pro are closed-API models. The tradeoff is convenience and often capability (closed-API models are frequently the strongest) versus loss of control over data, pricing, and availability. tier is growing. Mistral Medium 3, Mistral Embed, Codestral Embed, Mistral Moderation, Mistral OCR, and Mistral Saba (Middle Eastern / South Asian languages) are proprietary API-only products — no open weights, no self-hosting. For some of these (notably Codestral Embed), Mistral's offering is the only first-party option in the category, so if the task matches the specialty, the closed API is often the path of least resistance.

Beyond licensing, one practical note for EU-based deployments: Mistral is the only frontier AI lab where "my data will not leave the EU" is a default configuration rather than a paid enterprise tier. For regulated industries in Europe, this alone is often the deciding factor.

Original Models

Mistral Small

Eagle speculative-decoding head for Mistral Small 4 — pair it with the base modelA model straight out of pretraining, before any fine-tuning for chat or specific tasks. Base models predict the next token but don't follow instructions well — they'll continue your prompt rather than respond to it. Most people never use base models directly; they use the instruct-tuned or chat versions built on top. Useful mostly for researchers and people doing their own fine-tuning. for faster inferenceRunning a model to get outputs — as opposed to training it. When you send a prompt to ChatGPT, that's inference. Inference is much cheaper than training per operation but adds up quickly at scale. Pricing pages almost always refer to inference costs (per million tokens, per request, etc.), not training costs. throughput. Architectural extension, not standalone.

Mistral's specialist code agent for Lean 4 formal proof engineering — derived from Small 4, Apache 2.0, beats Claude Sonnet on formal proof benchmarks at ~1/15th the cost. The first genuinely open-sourceA stricter standard than open-weight: the weights, the training code, and the training data are all released publicly. Very few large language models meet the full open-source bar — most "open" models in the AI world are actually open-weight. When in doubt, check the license file and the creator's documentation. formal-proof agent.

Mistral's unified mid-tier workhorse — 119B MoEA model architecture that splits the model into many smaller specialized "expert" networks, only activating a handful per input rather than running the whole model every time. The practical effect: you get the knowledge capacity of a big model with the compute cost of a much smaller one. Mistral Large 3 and Mistral Small 4 are both MoE models. with 6B active, configurable reasoning depth, multimodalA model that can handle more than one type of input or output — typically text plus images, sometimes plus audio or video. "GPT-4 Vision" and "Llama 3.2 11B Vision" are multimodal models that accept both text and images. A text-only model is called "unimodal" but nobody uses that term; text-only is the assumed default., Apache 2.0, and among the cheapest capable API options available.

Mistral Small 3 24B — an Apache 2.0 24B base, used as a fine-tuning foundation (e.g. for DeepHermes).

Voxtral

Mistral's first text-to-speech model — 9 languages, zero-shot voice cloning from 3 seconds of audio, and roughly 27% of ElevenLabs' per-character cost through Mistral's API.

Streaming-ASR Voxtral — processes live audio incrementally for real-time transcription and voice agents. Apache 2.0.

Transcription-optimized Voxtral — $0.003/min via Mistral's API, batch ASR for meetings, podcasts, and recordings. Apache 2.0.

Edge-deployable Voxtral — 3B sibling of Voxtral Small 24B with the same speech-understanding architecture at a smaller scale. Apache 2.0.

Mistral's speech-understanding flagship — a 24B audio-text model that transcribes, translates, and directly answers questions from audio input.

Devstral

Mistral's 123B agentic coding flagship — 72.2% on SWE-Bench Verified — with a custom "modified MIT" license that has commercial-use restrictions tied to revenue. Powerful but legally non-trivial to deploy commercially.

Mistral's laptop-class coding specialist — 24B parameters, Apache 2.0, runs on a single consumer GPUA GPU designed for desktop PCs and gaming — typically Nvidia RTX 3090, 4090, 5090 or similar. Consumer GPUs have 8-32GB of VRAM and cost a few thousand dollars each. Capable of running small and medium models, especially when quantized. The boundary between "runs on a consumer GPU" and "needs a datacenter GPU" roughly separates small from large models in the catalog., and beats 70B-class competitors on software-engineering benchmarks.

Ministral 3

Pretrained base variant of Ministral 3 14B — largest edge-model base for custom fine-tuning and domain adaptation. Apache 2.0.

Mistral's biggest edge-class model — 14B parameters, vision-capable, 256K context, runs on a single consumer GPUA GPU designed for desktop PCs and gaming — typically Nvidia RTX 3090, 4090, 5090 or similar. Consumer GPUs have 8-32GB of VRAM and cost a few thousand dollars each. Capable of running small and medium models, especially when quantized. The boundary between "runs on a consumer GPU" and "needs a datacenter GPU" roughly separates small from large models in the catalog., and performs like a 24B model.

Largest reasoning-tuned Ministral 3 — hits 85% on AIME 2025, state-of-the-art for 14B models. Laptop-class deployment, Apache 2.0.

Pretrained base variant of Ministral 3 3B — smallest Ministral 3, for teams doing their own instruction-tuning or domain adaptation at the edge scale.

Smallest Ministral 3 instruct variant — 3B parameters, multimodalA model that can handle more than one type of input or output — typically text plus images, sometimes plus audio or video. "GPT-4 Vision" and "Llama 3.2 11B Vision" are multimodal models that accept both text and images. A text-only model is called "unimodal" but nobody uses that term; text-only is the assumed default., fits in 8GB VRAMThe memory built into a GPU. VRAM size determines what models you can load and run — a model's weights must fit in VRAM (or be cleverly swapped in and out). A 7B model in 4-bit quantization needs about 6GB of VRAM; a 70B model in 4-bit needs about 40GB; full-precision frontier models need multiple high-end GPUs. When people talk about a model "fitting" on a GPU, they mean VRAM. in FP8. Apache 2.0, edge- and smartphone-class deployment.

Reasoning-tuned variant of Ministral 3 3B — extended chain-of-thought in an edge-deployable 3B model. Apache 2.0.

Pretrained base variant of Ministral 3 8B — mid-size edge model for teams doing custom instruction-tuning or domain adaptation. Apache 2.0.

Balanced mid-size Ministral 3 — 8B parameters with vision, multilingual, 256K context. Fits in 12GB VRAMThe memory built into a GPU. VRAM size determines what models you can load and run — a model's weights must fit in VRAM (or be cleverly swapped in and out). A 7B model in 4-bit quantization needs about 6GB of VRAM; a 70B model in 4-bit needs about 40GB; full-precision frontier models need multiple high-end GPUs. When people talk about a model "fitting" on a GPU, they mean VRAM. in FP8. Apache 2.0.

Reasoning-tuned Ministral 3 8B — extended chain-of-thought at mid-size, laptop-deployable, Apache 2.0.

Mistral Large 3

Pretrained base variant of Mistral's Large 3 flagship — 675B MoEA model architecture that splits the model into many smaller specialized "expert" networks, only activating a handful per input rather than running the whole model every time. The practical effect: you get the knowledge capacity of a big model with the compute cost of a much smaller one. Mistral Large 3 and Mistral Small 4 are both MoE models., Apache 2.0, for teams that want to run their own instruction-tuning or domain adaptation.

Mistral's open-weightA model where the trained weights are freely downloadable — you can run it yourself without contacting the creator. Llama, Mistral, Qwen, and Gemma are open-weight. Open-weight does not mean open-source: the training data and code often stay private. The license still governs what you can do with the weights, including whether you can use them commercially. frontier model — 675B total parameters, Apache 2.0, 256K context, multimodalA model that can handle more than one type of input or output — typically text plus images, sometimes plus audio or video. "GPT-4 Vision" and "Llama 3.2 11B Vision" are multimodal models that accept both text and images. A text-only model is called "unimodal" but nobody uses that term; text-only is the assumed default. — the strongest permissive license you'll find on a model this capable.

Mistral Medium

Mistral's closed-APIA model that's only accessible through the creator's own API or product — you can't download it, run it yourself, or inspect its weights. GPT-4, Claude, and Gemini Pro are closed-API models. The tradeoff is convenience and often capability (closed-API models are frequently the strongest) versus loss of control over data, pricing, and availability. enterprise tier — sits between Small 4 and Large 3 on capability and cost, with hybrid and on-prem deployment available but no published weightsThe numerical values inside a trained model that encode everything it has learned. A model is, functionally, a giant list of weights — tens of billions of numbers for a mid-sized model, hundreds of billions for a frontier model. "Open-weight" means those numbers are published. "Downloading the weights" means getting the actual file you'd need to run the model yourself..

Embeddings

Mistral's code-specialized embedding model — purpose-built for code retrieval, outperforms Voyage Code 3 and OpenAI's embeddings on code benchmarks, and lets you pick your output dimensions to trade q

Mistral's general-purpose text embedding model — competitive with OpenAI and Cohere for standard RAG and semantic-search workloads, at the usual Mistral advantage of EU jurisdiction and aggressive pri

Mistral Other

Mistral's document-understanding API. Extracts markdown + HTML tables from PDFs, images, and handwriting at $2/1,000 pages ($1 batch). 74% win rate over OCR 2 as of December 2025, undercuts AWS/Google/Azure on price.

Regional

Mistral's Middle Eastern and South Asian language specialist — 24B model with strong Arabic, Farsi, Urdu, Hebrew, and Hindi performance. Closed-APIA model that's only accessible through the creator's own API or product — you can't download it, run it yourself, or inspect its weights. GPT-4, Claude, and Gemini Pro are closed-API models. The tradeoff is convenience and often capability (closed-API models are frequently the strongest) versus loss of control over data, pricing, and availability..

Safety

Mistral's content-moderation classifier — nine harm categories, multilingual, closed-APIA model that's only accessible through the creator's own API or product — you can't download it, run it yourself, or inspect its weights. GPT-4, Claude, and Gemini Pro are closed-API models. The tradeoff is convenience and often capability (closed-API models are frequently the strongest) versus loss of control over data, pricing, and availability. at $0.10 per million tokens.