Verify critical details (pricing, licensing, availability) with the model's source before making business decisions.


Hermes 4 70B

fine-tune derivative of Llama 3.1 70B by Nous Research

Nous Research's post-training of Llama 3.1 70B into Hermes 4 — hybrid reasoning with toggleable <think> tags, balanced for capability and deployment cost.

Size: large (70.0B params)
Context: 131,072 tokens
Released: 2025-08-25
Openness: open-weight
License: Llama 3.1 Community License (Nous fine-tune) · commercial: conditional
Cost tier: mixed
Rating: 4.0 ★ (Nous's recommended hosted Hermes 4: most of the 405B's behavior at a practical size; held to 4.0 by the Llama license and text-only scope)
Modalities: text
Capabilities: chat, coding, function-calling, instruction-following, long-context, math, reasoning, tool-use
Access: local-runtime-llama-cpp, local-runtime-ollama, local-runtime-vllm, weights-download-hf

Tags: llm, open-weight, large, reasoning, agentic, self-hostable, fine-tune, us-based, llama-derivative

Quick Take

The practical Hermes 4: a 70B hybrid-reasoning fine-tune of Llama 3.1 70B that Nous recommends for hosted use — most of the 405B's behavior at a fraction of the footprint.

Plain-English Description

Hermes 4 70B is the size most people should use from the Hermes 4 line. It carries the same hybrid-reasoning design as the 405B (toggleable <think> tags, steerable instruction following, schema-adherent output), but at 70 billion parameters it runs on hardware a single team can realistically operate, and Nous explicitly recommends it over the 405B for its hosted API.
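Because reasoning output is wrapped in <think> tags, an application typically separates the chain of thought from the answer it shows users. The sketch below assumes a single <think>...</think> block ahead of the final answer; `split_reasoning` is a hypothetical helper, not part of any Nous or Hermes API, and the exact output format should be checked against the model card's chat template.

```python
import re
from typing import Optional, Tuple

# Assumption: with reasoning mode on, the model wraps its chain of thought in a
# single <think>...</think> block that precedes the final answer.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> Tuple[Optional[str], str]:
    """Hypothetical helper: return (reasoning, answer); reasoning is None if absent."""
    match = THINK_RE.search(text)
    if match is None:
        # Reasoning mode off (or no closed tag): treat the whole output as the answer.
        return None, text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the think block is treated as the user-facing answer.
    answer = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, answer


if __name__ == "__main__":
    raw = "<think>\n17 * 3 = 51, plus 4 is 55.\n</think>\nThe result is 55."
    reasoning, answer = split_reasoning(raw)
    print(reasoning)  # 17 * 3 = 51, plus 4 is 55.
    print(answer)     # The result is 55.
```

Returning `None` rather than an empty string lets callers distinguish "reasoning was off" from "the model reasoned but said nothing".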

It's a strong open generalist with a reasoning mode you control, well suited to agentic workflows and applications that need a model willing to follow detailed system prompts. As with the 405B, capability is bounded by the Llama 3.1 70B base — Hermes changes how the model behaves, not its fundamental ceiling.
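For agentic workflows, the host application has to pull structured tool calls out of the model's text. A minimal sketch, assuming the <tool_call>{...}</tool_call> JSON convention used by earlier Hermes releases carries over to Hermes 4 (verify against the model's chat template); `parse_tool_calls` and the `get_weather` tool are hypothetical.

```python
import json
import re
from typing import Any, Dict, List

# Assumption: each tool call is a JSON object with "name" and "arguments" keys,
# wrapped in <tool_call>...</tool_call> tags (the earlier Hermes convention).
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: extract well-formed tool calls, skipping malformed ones."""
    calls = []
    for payload in TOOL_CALL_RE.findall(text):
        try:
            obj = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip blocks that are not valid JSON
        if isinstance(obj, dict) and "name" in obj:
            calls.append({"name": obj["name"], "arguments": obj.get("arguments", {})})
    return calls


if __name__ == "__main__":
    output = '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Paris"}}\n</tool_call>'
    print(parse_tool_calls(output))  # [{'name': 'get_weather', 'arguments': {'city': 'Paris'}}]
```

Skipping malformed blocks keeps an agent loop running; a stricter application might instead send the parse error back to the model so it can retry.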

The license is inherited from Llama 3.1 (see below).

Best For

The default Hermes 4 for self-hosting or hosted API use.
Agentic and reasoning workloads needing toggleable chain-of-thought at a manageable size.
Applications wanting steerable, low-refusal behavior with detailed system-prompt control.
Teams with a single multi-GPU node rather than a cluster.
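The single-node guidance follows from simple memory arithmetic. A rough sketch, assuming weight memory is parameters times bytes per parameter and taking the KV-cache shape from the Llama 3.1 70B base config (80 layers, 8 key/value heads, head dimension 128); activations, runtime overhead, and quantization scales are ignored, so real deployments need headroom.

```python
# Rough serving-memory estimate for a dense 70B model (assumptions in lead-in).
GIB = 1024 ** 3


def weight_gib(params: float, bytes_per_param: float) -> float:
    """Weight memory in GiB: parameter count times bytes per parameter."""
    return params * bytes_per_param / GIB


def kv_cache_gib(tokens: int, layers: int = 80, kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_value: int = 2) -> float:
    """KV-cache memory in GiB for one sequence of `tokens` tokens."""
    # Factor 2 = one key tensor plus one value tensor per layer.
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return tokens * per_token / GIB


if __name__ == "__main__":
    for label, bpp in [("bf16", 2), ("8-bit", 1), ("4-bit", 0.5)]:
        print(f"{label} weights: {weight_gib(70e9, bpp):.1f} GiB")
    print(f"KV cache @131,072 tokens: {kv_cache_gib(131_072):.1f} GiB")
```

Run as written, it prints 130.4 GiB for bf16 weights, 65.2 GiB at 8-bit, 32.6 GiB at 4-bit, and 40.0 GiB of bf16 KV cache for one full 131,072-token sequence. Under these assumptions, bf16 serving at full context (about 170 GiB) outgrows two 80 GB GPUs but fits within a typical four- or eight-GPU node.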

Not For

Maximum capability — Hermes 4 405B goes higher.
Laptop/single-GPU setups — drop to DeepHermes 3 8B.
Organizations whose products exceeded 700M monthly active users at the Llama 3.1 release date (Llama carve-out).
Multimodal tasks — text only.

License — Plain-English Summary

Two layers, like the 405B. Nous's open Hermes weights sit on Meta's Llama 3.1 70B, so the Llama 3.1 Community License governs: commercial use is allowed, "Built with Llama" attribution is required, and a separate license from Meta is needed only by licensees whose products had more than 700M monthly active users as of the Llama 3.1 release date (irrelevant for almost everyone). For a more permissive license, compare the Apache-based Hermes 4.3 36B.

How It Compares

Against Hermes 4 405B, the 70B trades peak capability for practicality, which is why Nous recommends it for hosted use. Against Hermes 3 70B, Hermes 4 adds hybrid reasoning. Against its base Llama 3.1 70B, it replaces Meta's Instruct tuning with Nous's steerable tuning and reasoning toggle.

Cost

Self-hosted cost: $0.00 beyond compute
Notes: Free to self-host; the base model's license governs commercial use (see License).


Commercial-use conditions

Nous releases the Hermes weights openly, but the base is Meta's Llama 3.1, so Meta's Llama 3.1 Community License governs the model. That includes the clause requiring a separate license from Meta if your products or services had more than 700 million monthly active users as of the Llama 3.1 release date.