← Back to hard AIs

Verify critical details — pricing, licensing, availability — with the model's source before business decisions. Full methodology →

Models · Mistral AI · Mistral Small 3 24B

Feature-frozen. The creator has frozen feature development on this model (critical fixes only).

DeepHermes 3 Mistral 24B

fine-tune derivative of Mistral Small 3 24B by Nous Research

Nous Research's reasoning-focused fine-tune of Mistral Small 3 24B — unified intuitive + toggleable chain-of-thought reasoning, on a non-Llama base with a permissive license.

Size
mid (24.0B params)
Context
32,768 tokens
Released
2025-03-10
Openness
open-weight
License
Cost tier
mixed
Rating
4.0 — Toggleable reasoning at a useful size with a clean Apache license (no Llama strings) — a genuinely appealing 4.0.
Modalities
text
Capabilities
chat, coding, instruction-following, reasoning, tool-use
Access
local-runtime-llama-cpp, local-runtime-ollama, local-runtime-vllm, weights-download-hf

Quick Take

DeepHermes on a Mistral base: a 24B toggleable-reasoning fine-tuneA model that has been further trained on additional data to specialize it for a particular task, domain, or style. Fine-tuning a general model on medical literature produces a medical specialist; fine-tuning on your company's support tickets produces a support assistant that sounds like your team. Fine-tunes are much cheaper to create than training a model from scratch. of Mistral Small 3 24B — single-GPUThe specialized chip that runs most AI models. Originally designed for 3D graphics, GPUs turned out to be excellent at the math AI requires. Nvidia dominates the AI GPU market; common datacenter models include the H100, H200, and B200. Running an AI model without a GPU is possible but painfully slow for anything but the smallest models., with a clean Apache 2.0 license (no Llama strings).

Plain-English Description

DeepHermes 3 Mistral 24B applies Nous's unified intuitive/reasoning tuning to Mistral's Small 3 24B rather than a Llama base. That's significant for two reasons: it's a more capable size than the 8B DeepHermes, and because the base is Apache 2.0, the whole model carries a clean, unrestricted license — no Llama community-license carve-out.

It offers the same toggleable reasoning (direct answers or explicit chain-of-thought via system prompt) at a single-GPUThe specialized chip that runs most AI models. Originally designed for 3D graphics, GPUs turned out to be excellent at the math AI requires. Nvidia dominates the AI GPU market; common datacenter models include the H100, H200, and B200. Running an AI model without a GPU is possible but painfully slow for anything but the smallest models. 24B scale, with Hermes's steerability. For teams that want a reasoning-toggle model they can self-host commercially with minimal license friction, it's an appealing option.

License details below.

Best For

  • Single-GPUThe specialized chip that runs most AI models. Originally designed for 3D graphics, GPUs turned out to be excellent at the math AI requires. Nvidia dominates the AI GPU market; common datacenter models include the H100, H200, and B200. Running an AI model without a GPU is possible but painfully slow for anything but the smallest models. reasoning with a mode toggle and a clean Apache license.
  • Teams that want DeepHermes behavior without Llama's license terms.
  • Steerable, self-hostable reasoning for commercial use.
  • Math, logic, and structured problems at mid scale.

Not For

  • The absolute strongest reasoning — larger models go further.
  • Laptop-only setups — the 24B wants a real GPUThe specialized chip that runs most AI models. Originally designed for 3D graphics, GPUs turned out to be excellent at the math AI requires. Nvidia dominates the AI GPU market; common datacenter models include the H100, H200, and B200. Running an AI model without a GPU is possible but painfully slow for anything but the smallest models.; use DeepHermes 3 8B.
  • MultimodalA model that can handle more than one type of input or output — typically text plus images, sometimes plus audio or video. "GPT-4 Vision" and "Llama 3.2 11B Vision" are multimodal models that accept both text and images. A text-only model is called "unimodal" but nobody uses that term; text-only is the assumed default. tasks — text only.

License — Plain-English Summary

Clean and permissive — the contrast with the Llama-based Hermes models. Nous releases the DeepHermes weightsThe numerical values inside a trained model that encode everything it has learned. A model is, functionally, a giant list of weights — tens of billions of numbers for a mid-sized model, hundreds of billions for a frontier model. "Open-weight" means those numbers are published. "Downloading the weights" means getting the actual file you'd need to run the model yourself. openly, and the base (Mistral Small 3 24B) is Apache 2.0, so both layers allow unrestricted commercial use, modification, and redistribution with no royalties and no user-count carve-out. Just retain the Apache notices.

How It Compares

Against DeepHermes 3 8B, the Mistral 24B is more capable and, crucially, cleaner on licensing (Apache vs Llama). Against its base Mistral Small 3 24B, it's Nous's reasoning-toggle tuning. Against the much larger Hermes 4 70B, it's lighter and Apache-clean but without Hermes 4's frontier-scale capability.

Cost

Self-hosted cost
$0.00 beyond compute
Notes
Free to self-host; the base model's license governs commercial use (see License).

Comparable models

Commercial-use conditions

Nous releases the Hermes weights openly and the base (Mistral Small 3 24B) is Apache 2.0 — both layers are permissive, with unrestricted commercial use and no carve-outs.

Sources