Verify critical details — pricing, licensing, availability — with the model's source before business decisions. Full methodology →

Models · Google

Gemma 4 26B-A4B

Model family: gemma-4

Size

mid (26.0B params)

Context

262,144 tokens

Released

2026-04-01

Openness

open-weight

License

Apache License 2.0 · commercial: yes

Cost tier

mixed

Rating

4.5 ★ — The Gemma 4 size most teams should start with — strong quality, fast and cheap to run thanks to the 4B active-parameter MoE design, native multimodality, and clean Apache 2.0. Excellent value for self-hosting.

Modalities

audio-input, image-input, text

Capabilities

chat, coding, function-calling, instruction-following, long-context, multilingual, reasoning, tool-use, vision

Access

api-first-party, api-third-party, local-runtime-llama-cpp, local-runtime-ollama, local-runtime-vllm, weights-download-hf

llm
open-weight
commercial-friendly
mid-size
multimodal
long-context
self-hostable
us-based
apache-2-0
mixture-of-experts

Quick Take

The Gemma 4 to start with: a mixture-of-experts model that gives 26B-class quality at roughly 4B-class cost, multimodal and Apache 2.0, comfortable on a single consumer GPU.

Plain-English Description

Gemma 4 26B-A4B is the size Google itself points most people to, and the logic is easy to see. It's a mixture-of-experts model: 26 billion parameters total, but only about 4 billion active for any given token. That means it carries the knowledge and quality of a 26B model while running with the speed and memory footprint closer to a 4-billion-parameter one — fast, cheap, and easy to fit on hardware most teams already have.

Like the rest of Gemma 4 it's natively multimodal (text, images, and audio in), covers a wide span of languages, and ships under clean Apache 2.0. The practical effect is a model you can self-host on a single consumer GPU, get genuinely strong results from, and deploy commercially with no licensing friction. For a business standing up its first in-house AI capability, this is a very sensible default — capable enough for real work, light enough to run affordably, and open enough to own.

If you later find you need more raw quality, Gemma 4 31B is the step up; if you need to go smaller for the edge, Gemma 4 E2B is the step down. This one sits in the middle on purpose.

Best For

Teams standing up their first self-hosted model who want the best quality-per-dollar default.
Production workloads that need to be fast and cheap to run while staying capable.
Multimodal applications (image and audio understanding) on a budget, self-hosted.
Anyone who wants Gemma 4's capability without the 31B's heavier footprint.

Not For

Squeezing out the absolute best scores — Gemma 4 31B and the closed Gemini 3.5 Flash go higher.
The smallest edge devices — phones and IoT want Gemma 4 E2B.
Teams that specifically prefer a dense model's predictable per-token behavior over MoE routing.

License — Plain-English Summary

Apache 2.0 — unrestricted commercial use, modification, fine-tuning, and redistribution, no royalties or carve-outs; keep the notices and flag significant changes. As with all of Gemma 4, this is a clean break from the older custom Gemma terms, so confirm you're on version 4. Self-hosted, it keeps all your data in-house.

How It Compares

Against Gemma 4 31B, the 26B-A4B is faster and cheaper to run for a small quality trade — the better everyday choice for most. Against Gemma 4 E2B, it's far more capable but needs a real GPU rather than a phone. Against open MoE models from Qwen (like the 30B-A3B) and others, Gemma 4 26B-A4B competes on efficiency and multimodality under a similarly clean Apache 2.0 license, with US jurisdiction as a differentiator for data-governance-sensitive buyers.

Cost

Self-hosted cost: $0.00 beyond compute
Notes: Free to self-host under Apache 2.0; also hosted on Google AI Studio / Vertex and third parties. The ~4B active-parameter design makes it cheap and fast to run.

Hardware requirements

Min VRAM: 16 GB
Recommended VRAM: 24 GB
Runs on laptop: Yes
Notes: With only ~4B active parameters it runs fast on a single consumer GPU and is comfortable on high-end laptops — Google's recommended pick for the best balance of quality, speed, and consumer-hardware deployment.