
Verify critical details (pricing, licensing, availability) with the model's source before business decisions.


Hermes 4 405B

fine-tune derivative of Llama 3.1 405B by Nous Research

Nous Research's post-training of Llama 3.1 405B into Hermes 4 — adding hybrid reasoning (toggleable <think> tags), stronger schema-adherent output, and steerable, low-refusal instruction following.

Size
frontier (405.0B params)
Context
131,072 tokens
Released
2025-08-25
Openness
open-weight
License
Cost tier
mixed
Rating
4.0 — SOTA-class open reasoning with hybrid think tags and strong steerability, but cluster-scale hardware and the inherited Llama license keep it at 4.0.
Modalities
text
Capabilities
chat, coding, function-calling, instruction-following, long-context, math, reasoning, tool-use
Access
local-runtime-llama-cpp, local-runtime-ollama, local-runtime-vllm, weights-download-hf

Quick Take

Nous Research's flagship: a 405B hybrid-reasoning fine-tune of Llama 3.1 405B, state-of-the-art among open-weight models on reasoning, with toggleable chain-of-thought.

Plain-English Description

Hermes 4 405B is the top of Nous Research's lineup: their post-training applied to Meta's largest Llama 3.1 model. The headline feature is hybrid reasoning: a system prompt toggles <think> tags on or off, so the same model can answer directly for simple queries or deliberate step-by-step for hard ones. It reaches frontier-level open-weight performance through post-training alone, with no new pretraining.
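When reasoning is switched on, an application usually wants to show the final answer and keep the deliberation separate. The sketch below is a minimal illustration, assuming the reasoning arrives wrapped in `<think>...</think>` inside the raw output text. The function name `split_reasoning` is hypothetical, and the exact system prompt that enables reasoning is not covered here.

```python
import re

# Assumption: when reasoning is enabled, the model wraps it in <think>...</think>.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Split raw model output into (reasoning, answer).

    If the output has no <think> block, reasoning is an empty string
    and the answer is the whole text.
    """
    # Collect the contents of every think block, in order.
    reasoning = "\n".join(part.strip() for part in THINK_BLOCK.findall(text))
    # Whatever remains after removing the blocks is the user-facing answer.
    answer = THINK_BLOCK.sub("", text).strip()
    return reasoning, answer
```

The same function works whether the toggle is on or off, so calling code does not need to know which mode produced the text.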

Beyond reasoning, Hermes is known for steerability: it adopts strong personas, follows detailed system prompts closely, and refuses less than Meta's own Instruct release, which makes it a favorite for builders who want control over voice and behavior. It's a serious model that needs serious hardware: at 405B it's a multi-GPU or cloud-inference deployment.
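To see why 405B means multi-GPU, a rough back-of-the-envelope estimate helps. The sketch below counts memory for the weights only, assuming 80 GB accelerators (the H100's capacity) and ignoring the KV cache, activations, and runtime overhead, so real deployments need more than this floor. The helper names are illustrative.

```python
import math

PARAMS = 405e9       # parameter count from the listing
GPU_MEMORY_GB = 80   # assumption: 80 GB accelerators such as an H100


def weight_memory_gb(params: float, bits_per_param: int) -> float:
    """Memory needed to hold the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


def min_gpus(params: float, bits_per_param: int,
             gpu_memory_gb: float = GPU_MEMORY_GB) -> int:
    """Lower bound on GPU count: weights only, no KV cache or activations."""
    return math.ceil(weight_memory_gb(params, bits_per_param) / gpu_memory_gb)


if __name__ == "__main__":
    for bits in (16, 8, 4):
        gb = weight_memory_gb(PARAMS, bits)
        print(f"{bits}-bit weights: {gb:.1f} GB, at least {min_gpus(PARAMS, bits)} GPUs")
```

Under these assumptions the weights alone take about 810 GB at 16-bit precision, which is at least 11 such GPUs. Even 4-bit quantization leaves about 202.5 GB, or at least 3 GPUs, before any working memory is counted.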

Like all Llama-based Hermes models, its license is inherited from Llama (see below).

Best For

  • Top-tier open reasoning you can self-host (with cluster-scale hardware).
  • Applications wanting strong steerability and low-refusal instruction following at the frontier.
  • Agentic and schema-adherent output (structured tool calls) at maximum capability.
  • Research and high-end deployments where owning the weights matters.
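Schema-adherent output still deserves a check on the receiving side before a tool is actually invoked. The sketch below is a minimal illustration using only the standard library. The JSON shape (`name` plus `arguments`), the `get_weather` tool, and the `WEATHER_ARGS` schema are all hypothetical stand-ins, not the model's documented format.

```python
import json

# Hypothetical tool schema: required argument names mapped to expected types.
WEATHER_ARGS = {"city": str, "days": int}


def parse_tool_call(raw: str, tool_name: str, arg_types: dict[str, type]) -> dict:
    """Parse a JSON tool call and check it against a minimal schema.

    Returns the arguments dict, or raises ValueError describing the problem.
    """
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc

    if not isinstance(call, dict) or call.get("name") != tool_name:
        raise ValueError(f"expected a call to {tool_name!r}")

    args = call.get("arguments")
    if not isinstance(args, dict):
        raise ValueError("'arguments' must be a JSON object")

    for key, expected in arg_types.items():
        if key not in args:
            raise ValueError(f"missing argument {key!r}")
        # Exact type match, so a JSON boolean is not accepted where an int is expected.
        if type(args[key]) is not expected:
            raise ValueError(f"argument {key!r} should be {expected.__name__}")
    return args
```

For example, `parse_tool_call('{"name": "get_weather", "arguments": {"city": "Oslo", "days": 3}}', "get_weather", WEATHER_ARGS)` returns the arguments dict, while a call missing `days` raises `ValueError`. A production system would more likely use a full JSON Schema validator. This hand-rolled check only shows the idea.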

Not For

  • Anyone without multi-GPU/cluster capacity; use Hermes 4 70B instead.
  • Products near the 700M-MAU mark, which trip Llama's license carve-out.
  • Teams wanting a clean, unrestricted license; the Apache-based Hermes 4.3 36B avoids Llama's terms.
  • Multimodal tasks; the model is text only.

License — Plain-English Summary

Two layers. Nous releases the Hermes 4 weights openly, but the base is Meta's Llama 3.1 405B, so Meta's Llama 3.1 Community License governs the model and travels with it. Commercial use is allowed, but you must display "Built with Llama," observe the acceptable-use terms, and, if your product exceeds 700 million monthly active users, secure a separate license from Meta. That threshold is irrelevant for nearly all businesses, hence "conditional." For a similar model without Llama's strings, the Apache-licensed Hermes 4.3 36B is the alternative.

How It Compares

Against Hermes 4 70B, the 405B is more capable but far heavier; the 70B is what Nous recommends for hosted use. Against the prior Hermes 3 405B, Hermes 4 adds hybrid reasoning and sharper outputs. Against its base Llama 3.1 405B, Hermes is the steerable, lower-refusal, reasoning-toggle alternative to Meta's own Instruct tuning.

Cost

Self-hosted cost
$0.00 beyond compute
Notes
Free to self-host; the base model's license governs commercial use (see License).

Commercial-use conditions

Nous releases the Hermes weights openly, but the base is Meta's Llama 3.1, so Meta's Llama 3.1 Community License governs the model — including the clause requiring a separate Meta license if your product exceeds 700 million monthly active users.
