Verify critical details (pricing, licensing, availability) with the model's source before making business decisions.
Hermes 4 70B
fine-tune derivative of Llama 3.1 70B by Nous Research
Nous Research's post-training of Llama 3.1 70B into Hermes 4 — hybrid reasoning with toggleable <think> tags, balanced for capability and deployment cost.
- llm
- open-weight
- large
- reasoning
- agentic
- self-hostable
- fine-tune
- us-based
- llama-derivative
Quick Take
The practical Hermes 4: a 70B hybrid-reasoning fine-tune of Llama 3.1 70B that Nous recommends for hosted use, with most of the 405B's behavior at a fraction of the footprint.
Plain-English Description
Hermes 4 70B is the size most people should use from the Hermes 4 line. It carries the same hybrid-reasoning design as the 405B: toggleable <think> tags.
It's a strong open generalist with a reasoning mode you control, well suited to agentic workflows and applications that need a model willing to follow detailed system prompts. As with the 405B, capability is bounded by the Llama 3.1 70B base — Hermes changes how the model behaves, not its fundamental ceiling.
License is inherited from Llama (see below).
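As a rough illustration of what the toggleable reasoning mode means for application code, the sketch below separates a <think> block from the final answer. It assumes the reasoning arrives inline in the raw output, wrapped in literal <think>...</think> tags as described above. The helper name split_reasoning is hypothetical and not part of any Nous or Meta tooling, and a given serving stack may already strip or expose the reasoning separately.

```python
import re

# Assumption: reasoning appears inline as <think>...</think> in the raw text.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw: str) -> tuple[str, str]:
    """Return (reasoning, answer) from raw model output.

    Hypothetical helper for illustration. If no <think> block is
    present (reasoning toggled off), reasoning is an empty string.
    """
    # Collect every reasoning block, trimmed, joined by newlines.
    reasoning = "\n".join(block.strip() for block in THINK_BLOCK.findall(raw))
    # Whatever remains after removing the blocks is the user-facing answer.
    answer = THINK_BLOCK.sub("", raw).strip()
    return reasoning, answer


sample = "<think>2 + 2 is 4.</think>The answer is 4."
reasoning, answer = split_reasoning(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

Keeping the two parts separate lets an application log or hide the reasoning while showing only the answer to end users.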
Best For
- The default Hermes 4 for self-hosting or hosted API use.
- Agentic and reasoning workloads needing toggleable chain-of-thought at a manageable size.
- Applications wanting steerable, low-refusal behavior with detailed system-prompt control.
- Teams with a single multi-GPU node rather than a cluster.
Not For
- Maximum capability — Hermes 4 405B goes higher.
- Laptop/single-GPU setups: drop to DeepHermes 3 8B.
- Products approaching 700M monthly active users (above that threshold the Llama license requires a separate license from Meta).
- Multimodal tasks: text only.
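The sizing guidance in the two lists above (a single multi-GPU node, not a laptop or a single GPU) follows from simple arithmetic on weight memory. The sketch below is a back-of-the-envelope estimate only. It assumes a nominal 70 billion parameters, counts weights alone (no KV cache, activations, or runtime overhead), and uses standard byte widths for each precision; none of these figures come from Nous's documentation.

```python
# Assumption: nominal "70B" parameter count; the exact figure differs slightly.
NOMINAL_PARAMS = 70e9

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


for precision, width in BYTES_PER_PARAM.items():
    gb = weight_memory_gb(NOMINAL_PARAMS, width)
    print(f"{precision}: ~{gb:.0f} GB of weights")
```

Under these assumptions, 16-bit weights alone come to roughly 140 GB, more than a single 80 GB accelerator holds, which is why the model is split across several GPUs on one node. Quantized variants shrink the footprint but still leave a laptop-class GPU well short.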
License — Plain-English Summary
Two layers, like the 405B. Nous's open Hermes weights sit on Meta's Llama 3.1 70B, so the Llama 3.1 Community License governs: commercial use allowed, "Built with Llama" attribution required, and a separate Meta license only above 700M monthly active users (irrelevant for almost everyone). For an unrestricted license, compare the Apache-based Hermes 4.3 36B.
How It Compares
Against Hermes 4 405B, the 70B trades peak capability for practicality, and it is the size Nous itself recommends for hosted use. Against Hermes 3 70B, Hermes 4 adds hybrid reasoning. Against its base, Llama 3.1 70B, it is Nous's steerable tuning with a reasoning toggle rather than Meta's Instruct tuning.
Cost
- Self-hosted cost: $0.00 beyond compute
- Notes: Free to self-host; the base model's license governs commercial use (see License).
Commercial-use conditions
Nous releases the Hermes weights openly, but the base is Meta's Llama 3.1, so Meta's Llama 3.1 Community License governs the model — including the clause requiring a separate Meta license if your product exceeds 700 million monthly active users.