One Architecture, Multiple Scales

The Gemma 4 family shares a unified transformer foundation with optimized routing, quantization pipelines, and deployment tooling. Whether you're building on-device assistants, enterprise RAG systems, or research-grade agents, there's a Gemma 4 variant engineered for your constraints and performance targets.

Available Model Variants

| Model | Tier | Architecture | Context | VRAM (FP16) | Best For |
|-------|------|--------------|---------|-------------|----------|
| 2B | Edge | Dense transformer | 8K tokens | ~4.2 GB | On-device, low latency |
| 9B | Balanced | Dense transformer | 32K tokens | ~18.5 GB | RAG, agents, APIs |
| 27B | Advanced | Dense transformer | 128K tokens | ~54 GB | Enterprise, research |
| 270B | Flagship | MoE (8 experts) | 256K tokens | ~540 GB (sharded) | Research, complex agents |

- **2B (Edge):** Ultra-lightweight variant optimized for mobile, IoT, and browser-based inference.
- **9B (Balanced):** Optimal trade-off between capability and efficiency for desktop and cloud workloads.
- **27B (Advanced):** High-capability model for complex reasoning, coding, and enterprise-grade deployments.
- **270B (Flagship):** Preview MoE variant delivering frontier-level capabilities for specialized and research use.

Quantization & Deployment Options

⚠️ Hardware Recommendation

INT4 variants enable deployment on consumer hardware (RTX 3090/4090, Apple M-series), but complex reasoning and long-context tasks may benefit from INT8 or FP16 on data-center GPUs (A100/H100/MI300).
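As a back-of-the-envelope rule, weight memory scales with parameter count times bytes per parameter: 2 for FP16, 1 for INT8, 0.5 for INT4. The sketch below works through that arithmetic; the 5% overhead factor for runtime buffers is an illustrative assumption, not a measured or published figure, and real usage also depends on KV-cache size and batch settings.

```python
# Bytes per parameter at each quantization level.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def estimate_vram_gb(params_billions: float, quant: str, overhead: float = 0.05) -> float:
    """Rough weight-memory estimate in GB: params * bytes/param,
    plus a small fudge factor for runtime buffers (illustrative only)."""
    weight_bytes = params_billions * 1e9 * BYTES_PER_PARAM[quant]
    return weight_bytes * (1 + overhead) / 1e9

# 9B at FP16 lands near the ~18.5 GB quoted in the variants table,
# while the same model at INT4 fits comfortably on a 24 GB consumer GPU.
print(round(estimate_vram_gb(9, "fp16"), 1))  # 18.9
print(round(estimate_vram_gb(9, "int4"), 1))  # 4.7
```

The same arithmetic explains the hardware note above: 27B at INT4 is roughly 14 GB (consumer-card territory), while FP16 pushes past 50 GB and into data-center GPUs.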

How to Choose the Right Model

1. **Define latency & throughput requirements.** Real-time interactive apps → 2B/9B. Batch processing and complex reasoning → 27B/270B.
2. **Assess hardware constraints.** Mobile/edge → 2B INT4. Desktop → 9B INT8/INT4. Cloud/enterprise → 27B FP16/INT8. Research clusters → 270B MoE.
3. **Match task complexity.** Classification, summarization, and light chat → 2B/9B. Coding, math, multi-step planning, and RAG → 27B+. Frontier research → 270B.
4. **Plan for fine-tuning.** QLoRA works efficiently on 9B/27B. Full fine-tuning is recommended only for 27B+ on multi-GPU setups.
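The steps above can be collapsed into a crude first-pass heuristic. This sketch simply encodes the decision order (hardware budget, then latency, then task complexity); the thresholds and return labels are illustrative assumptions drawn from the steps, not an official sizing tool.

```python
def pick_gemma_variant(vram_gb: float, realtime: bool, complex_task: bool) -> str:
    """First-pass variant choice following the selection steps:
    hardware budget first, then latency, then task complexity."""
    if vram_gb < 8:                    # mobile / edge devices
        return "2B (INT4)"
    if realtime or not complex_task:   # interactive or light workloads
        return "9B (INT8/INT4)"
    if vram_gb < 500:                  # single cloud/enterprise node
        return "27B (FP16/INT8)"
    return "270B MoE (preview)"        # sharded research clusters

print(pick_gemma_variant(vram_gb=6, realtime=True, complex_task=False))   # 2B (INT4)
print(pick_gemma_variant(vram_gb=80, realtime=False, complex_task=True))  # 27B (FP16/INT8)
```

Treat the output as a starting point for evaluation, not a final answer: a desktop GPU can still run 27B at INT4 for offline work, and latency targets vary with serving stack and batch size.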

Access Models & Integration Tools

Download weights, run inference, or integrate via API:

⚠️ Licensing & Usage Notice

Gemma 4 weights are released under a permissive open-weight license allowing commercial use, modification, and distribution. Usage must comply with Google's AI Principles and acceptable use policy. The 270B variant is currently in limited preview and requires approval for commercial deployment.