Verify critical details — pricing, licensing, availability — with the model's source before business decisions. Full methodology →

Models · Google

Gemma 4 31B

Model family: gemma-4

Size

mid (31.0B params)

Context

262,144 tokens

Released

2026-04-01

Openness

open-weight

License

Apache License 2.0 · commercial: yes

Cost tier

mixed

Rating

4.5 ★ — Frontier-adjacent capability — strong math and coding, native multimodality, 140 languages — under clean Apache 2.0 and runnable on a single consumer GPU. One of the best open models for businesses that want US-jurisdiction weights.

Modalities

audio-input, image-input, text

Capabilities

chat, coding, function-calling, instruction-following, long-context, math, multilingual, reasoning, tool-use, vision

Access

api-first-party, api-third-party, local-runtime-llama-cpp, local-runtime-ollama, local-runtime-vllm, weights-download-hf

llm
open-weight
commercial-friendly
mid-size
multimodal
long-context
multilingual
self-hostable
us-based
apache-2-0

Quick Take

Google's open flagship: a 31B multimodal model under clean Apache 2.0 that beats far larger models on math and coding — and runs on a single consumer GPU.

Plain-English Description

Gemma 4 31B, released in April 2026, is the largest model in Google's open Gemma 4 family and the headline of a genuinely important release: it's the first Gemma built directly from Gemini 3 research and the first to ship under Apache 2.0 rather than Google's older custom license. In plain terms, Google took the techniques behind its closed frontier models and put a slice of them into something you can download and use commercially with no strings.

The standout trait is efficiency — capability per parameter. The 31B model posts scores you'd expect from something much larger: around 89.2% on the AIME 2026 math exam (where many models struggle to clear 60%) and about 80% on LiveCodeBench v6, and it landed near the top of public leaderboards at launch despite being a fraction of the size of the models around it. That comes from architectural work — a hybrid attention design and per-layer embeddings — aimed squarely at getting more out of fewer parameters. It's also natively multimodal (text, images, and audio in, text out) and covers 140 languages.

For a business, the combination is the appeal: near-frontier quality, an unrestricted Apache 2.0 license, US-jurisdiction weights, and small enough to self-host on a single high-end GPU. It's Google's answer to the open models from Meta, Alibaba, and DeepSeek — and on licensing it's now among the cleanest of the bunch.

Best For

Businesses that want a capable, self-hostable open model with clean Apache 2.0 licensing and US-based provenance.
Multimodal work (image and audio understanding) in an open model you control.
Multilingual products — 140-language coverage.
Math, coding, and reasoning tasks where the efficiency-per-parameter pays off on modest hardware.
The "cloud + edge" pattern: Gemma 4 self-hosted for private/local work, Gemini API for the heaviest cloud jobs.

Not For

Absolute maximum capability — the closed Gemini 3.5 Flash and the larger open models from other labs go higher on some tasks.
The lightest edge/on-device targets — for phones and IoT, drop to Gemma 4 E2B.
Workloads needing a mixture-of-experts model's throughput profile — see Gemma 4 26B-A4B.
Teams that specifically need the very longest context windows (multi-million token) for single-pass retrieval.

License — Plain-English Summary

Apache 2.0 — and that's news worth noting, because earlier Gemma versions (2 and 3) used Google's custom "Gemma Terms of Use," which allowed commercial use but layered on a prohibited-use policy and wasn't true open source. Gemma 4 drops all that: unrestricted commercial use, modification, fine-tuning, and redistribution, no royalties, no carve-outs — just keep the notices and flag significant changes. If you're downloading Gemma, confirm you're getting version 4 (Apache 2.0) rather than an older one (custom terms), because the license genuinely differs between generations.

How It Compares

Against the smaller Gemma 4 sizes, the 31B is the most capable but heaviest — Gemma 4 26B-A4B trades a little quality for mixture-of-experts efficiency, and Gemma 4 E2B goes all the way down to phone-class. Against Google's own closed Gemini 3.5 Flash, Gemma 4 31B is the open, self-hostable option — less peak capability, full ownership. Against the other open flagships — Meta's Llama, Alibaba's Qwen, and DeepSeek — Gemma 4 competes on efficiency-per-parameter and now matches or beats them on license cleanliness (Apache 2.0), with US jurisdiction as a point in its favor for data-governance-sensitive buyers.

Cost

Self-hosted cost: $0.00 beyond compute
Notes: Free to self-host under Apache 2.0. Also hosted on Google AI Studio / Vertex AI and third-party providers for per-token pricing. 140-language support.

Hardware requirements

Min VRAM: 20 GB
Recommended VRAM: 32 GB
Runs on laptop: Yes
Notes: Runs quantized on a single 24GB consumer GPU; high-end laptops can manage it. Far more accessible than its benchmark scores would suggest.