Verify critical details (pricing, licensing, availability) with the model's source before making business decisions.

Hermes 4.3 36B

A Nous Research fine-tune of ByteDance's Seed-OSS-36B-Instruct

Nous Research's fine-tune of ByteDance's Seed-OSS-36B — the most recent Hermes and the first trained on Nous's Psyche decentralized network rather than a centralized cluster.

Size: mid (36.0B params)
Context: 524,288 tokens (512K)
Released: 2025-12-01
Openness: open-weight
License: see the license summary below
Cost tier: mixed
Rating: 4.0. The newest Hermes, with a 512K context window and clean Apache licensing from its Seed-OSS base. The caveat is that it is recent and was trained by a novel decentralized method.
Modalities: text
Capabilities: chat, coding, function-calling, instruction-following, long-context, math, reasoning, tool-use
Access: local-runtime-llama-cpp, local-runtime-ollama, local-runtime-vllm, weights-download-hf

Quick Take

The newest Hermes: a 36B fine-tune of ByteDance's Seed-OSS-36B with a 512K context window, Apache-clean, and the first Hermes trained on Nous's decentralized Psyche network.

Plain-English Description

Hermes 4.3 36B (December 2025) is Nous Research's most recent model and a notable departure: instead of a Llama base, it's built on ByteDance's Apache 2.0 Seed-OSS-36B, which means it inherits both a 512K-token context window and a clean, permissive license. It's also the first Hermes model trained on Nous's Psyche decentralized training network rather than a centralized GPU cluster, a milestone for the lab's distributed-training ambitions.

The result is a mid-size open model with an unusually large context window and Hermes's steerable, reasoning-capable tuning, all under Apache 2.0. For document-heavy or long-session work where you also want to self-host commercially with minimal license friction, the 512K window plus the clean license is a strong combination.
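
To get a feel for what the 512K window holds, the sketch below converts the listed 524,288-token limit into an approximate word budget. It assumes the common rule of thumb of about 0.75 English words per token; the real ratio depends on the tokenizer and the text. The function names and the reserved-output allowance are illustrative and not part of any Hermes or Seed-OSS API.

```python
import math

# Rough context-budget arithmetic for a 524,288-token window.
# Assumption: ~0.75 English words per token (a rule of thumb, not a
# measured property of the Hermes / Seed-OSS tokenizer).
CONTEXT_TOKENS = 524_288   # window size listed for Hermes 4.3 36B
WORDS_PER_TOKEN = 0.75     # assumed average; varies by text and tokenizer


def approx_word_capacity(context_tokens: int = CONTEXT_TOKENS) -> int:
    """Approximate number of English words that fit in the window."""
    return int(context_tokens * WORDS_PER_TOKEN)


def fits_in_context(word_count: int, reserved_output_tokens: int = 4_096) -> bool:
    """Estimate whether a document of `word_count` words leaves room for a reply.

    `reserved_output_tokens` is a hypothetical allowance for the model's answer.
    """
    estimated_tokens = math.ceil(word_count / WORDS_PER_TOKEN)
    return estimated_tokens + reserved_output_tokens <= CONTEXT_TOKENS


print(approx_word_capacity())    # about 393,000 words under this assumption
print(fits_in_context(300_000))  # a 300,000-word corpus
print(fits_in_context(400_000))  # a 400,000-word corpus
```

Under this assumption the window holds on the order of 390,000 words, so a 300,000-word corpus fits with room for a reply while a 400,000-word one does not. For real planning, count tokens with the model's own tokenizer rather than a word ratio.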

License details below.

Best For

  • Long-context work — the 512K window handles very large documents or long sessions.
  • Self-hosted, commercially-clean (Apache) deployments wanting Hermes steerability.
  • Mid-size reasoning and agentic tasks on a single high-end GPU.
  • Teams wanting the newest Hermes without Llama's license terms.

Not For

  • Maximum capability — the frontier-scale Hermes 4 405B goes higher.
  • Laptop-only setups — a 36B with huge context wants real hardware.
  • Buyers who need a fully proven track record — it's recent and trained via a novel decentralized method.
  • Multimodal tasks: text only.
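
As a rough guide to what "real hardware" means here, the sketch below estimates the memory needed just to hold 36.0B parameters at a few common weight precisions. The bytes-per-parameter values are idealized (real quantized formats carry some extra metadata), and the estimate ignores the KV cache, activations, and runtime overhead, all of which grow with context length, so treat the results as lower bounds. The function name is illustrative.

```python
# Weights-only memory estimate for a 36.0B-parameter model.
# Assumption: idealized bytes per parameter; KV cache, activations, and
# runtime overhead are NOT included, so these are lower bounds.
PARAMS = 36.0e9  # parameter count listed for Hermes 4.3 36B

BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Weights-only footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


for name, width in BYTES_PER_PARAM.items():
    print(f"{name}: ~{weight_memory_gb(PARAMS, width):.0f} GB for weights alone")
```

Under these assumptions, bf16 weights alone come to about 72 GB, which leaves little headroom on an 80 GB card before any context is loaded; 8-bit or 4-bit quantization lowers that to roughly 36 GB or 18 GB.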

License — Plain-English Summary

Clean and permissive. Nous releases the Hermes 4.3 weights openly, and the base, ByteDance's Seed-OSS-36B, is Apache 2.0, so both layers allow commercial use, modification, and redistribution with no carve-outs, provided you retain the Apache notices. This is the licensing upside of moving off a Llama base: no "Built with Llama" attribution requirement and no 700M-MAU clause.

How It Compares

Against Hermes 4 70B, the 4.3 36B is smaller but carries a far larger context window (512K vs 128K) and a cleaner Apache license. Against its base, Seed-OSS-36B-Instruct, it adds Nous's steerable Hermes tuning on top of ByteDance's foundation. Against Hermes 4 405B, it gives up frontier capability for portability, long context, and clean licensing.

Cost

Self-hosted cost: $0.00 beyond compute
Notes: Free to self-host; the base model's license governs commercial use (see License).

Commercial-use conditions

Nous releases the Hermes weights openly and the base (Seed-OSS-36B) is Apache 2.0 — both layers are permissive, with unrestricted commercial use and no carve-outs.
