
Verify critical details (pricing, licensing, availability) with the model's source before making business decisions.


Hermes 4 70B

fine-tune derivative of Llama 3.1 70B by Nous Research

Nous Research's post-training of Llama 3.1 70B into Hermes 4 — hybrid reasoning with toggleable <think> tags, balanced for capability and deployment cost.

Size: large (70.0B params)
Context: 131,072 tokens
Released: 2025-08-25
Openness: open-weight
License: Llama 3.1 Community License
Cost tier: mixed
Rating: 4.0. Nous's recommended hosted Hermes 4: most of the 405B's behavior at a practical size; held to 4.0 by the Llama license and text-only scope.
Modalities: text
Capabilities: chat, coding, function-calling, instruction-following, long-context, math, reasoning, tool-use
Access: local-runtime-llama-cpp, local-runtime-ollama, local-runtime-vllm, weights-download-hf

Quick Take

The practical Hermes 4: a 70B hybrid-reasoning fine-tune of Llama 3.1 70B that Nous recommends for hosted use, with most of the 405B's behavior at a fraction of the footprint.

Plain-English Description

Hermes 4 70B is the size most people should use from the Hermes 4 line. It carries the same hybrid-reasoning design as the 405B (toggleable <think> tags, steerable instruction following, schema-adherent output), but at 70 billion parameters it runs on hardware a single team can realistically operate, and Nous explicitly recommends it over the 405B for its hosted API.

It's a strong open generalist with a reasoning mode you control, well suited to agentic workflows and applications that need a model willing to follow detailed system prompts. As with the 405B, capability is bounded by the Llama 3.1 70B base: Hermes changes how the model behaves, not its fundamental ceiling.
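Because the reasoning mode emits its chain-of-thought inside <think> tags, an application usually wants to separate that reasoning from the answer it shows to users. The following is a minimal sketch, assuming the raw completion wraps reasoning in literal `<think>...</think>` pairs; the names `ParsedReply` and `split_reasoning` are hypothetical helpers, not part of any Nous or Llama tooling.

```python
import re
from typing import NamedTuple


class ParsedReply(NamedTuple):
    """Hypothetical container: hidden reasoning plus the visible answer."""
    reasoning: str
    answer: str


# Assumption: reasoning appears as literal <think>...</think> blocks.
_THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw: str) -> ParsedReply:
    """Separate <think> content from the text meant for the end user."""
    # Collect every reasoning block, in order, one per line.
    reasoning = "\n".join(m.strip() for m in _THINK_BLOCK.findall(raw))
    # Whatever remains after removing the blocks is the visible answer.
    answer = _THINK_BLOCK.sub("", raw).strip()
    return ParsedReply(reasoning, answer)


reply = split_reasoning("<think>2 + 2 is 4.</think>The answer is 4.")
print(reply.answer)
```

When reasoning is toggled off and no tags are present, the sketch returns an empty `reasoning` string and the full text as `answer`, so the same code path can serve both modes.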

License is inherited from Llama (see below).

Best For

  • The default Hermes 4 for self-hosting or hosted API use.
  • Agentic and reasoning workloads needing toggleable chain-of-thought at a manageable size.
  • Applications wanting steerable, low-refusal behavior with detailed system-prompt control.
  • Teams with a single multi-GPU node rather than a cluster.
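A rough back-of-the-envelope calculation shows why a 70B model fits a single multi-GPU node. The sketch below is an estimate under stated assumptions, not a sizing guide: the bytes-per-parameter values are nominal (real quantized formats carry some overhead), and the layer count, KV-head count, and head dimension are assumed values for the Llama 3.1 70B architecture that do not come from this page.

```python
PARAMS = 70.0e9           # parameter count from the model card
CONTEXT_TOKENS = 131_072  # context length from the model card

# Assumption: nominal bytes per parameter for common weight formats.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(params: float, bytes_per_param: float) -> float:
    """Weights-only footprint in decimal gigabytes (no cache, no activations)."""
    return params * bytes_per_param / 1e9


def kv_cache_gib(tokens: int, layers: int = 80, kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_value: int = 2) -> float:
    """KV-cache size in GiB for one sequence.

    The default architecture numbers are assumptions for Llama 3.1 70B.
    """
    # Keys and values (factor of 2) are stored per layer and per KV head.
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value
    return tokens * per_token_bytes / 2**30


for name, width in BYTES_PER_PARAM.items():
    print(f"{name}: {weight_gb(PARAMS, width):.0f} GB of weights")
print(f"KV cache, one full-context sequence: {kv_cache_gib(CONTEXT_TOKENS):.0f} GiB")
```

Under these assumptions the 16-bit weights alone exceed any single current GPU's memory, while lower-precision formats shrink the weights considerably; a long-context request adds a sizable cache on top of either.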

Not For

  • Maximum capability: Hermes 4 405B goes higher.
  • Laptop/single-GPU setups: drop to DeepHermes 3 8B.
  • Products near 700M MAU (Llama carve-out).
  • Multimodal tasks: text only.

License — Plain-English Summary

Two layers, like the 405B. Nous's open Hermes weights sit on Meta's Llama 3.1 70B, so the Llama 3.1 Community License governs: commercial use allowed, "Built with Llama" attribution required, and a separate Meta license only above 700M monthly active users (irrelevant for almost everyone). For an unrestricted license, compare the Apache-based Hermes 4.3 36B.

How It Compares

Against Hermes 4 405B, the 70B trades peak capability for practicality and is Nous's own hosted recommendation. Against Hermes 3 70B, Hermes 4 adds hybrid reasoning. Against its base Llama 3.1 70B, it's the steerable, reasoning-toggle Nous tuning rather than Meta's Instruct.

Cost

Self-hosted cost: $0.00 beyond compute
Notes: Free to self-host; the base model's license governs commercial use (see License).


Commercial-use conditions

Nous releases the Hermes weights openly, but the base is Meta's Llama 3.1, so Meta's Llama 3.1 Community License governs the model — including the clause requiring a separate Meta license if your product exceeds 700 million monthly active users.
