On the edge, our E2B and E4B models redefine on-device utility, prioritizing multimodal performance, low-latency processing, and seamless ecosystem integration over raw parameter counts.
Powerful, Accessible, and Open
To power the next generation of pioneering research and products, we specifically designed Gemma 4's model sizes to run and fine-tune efficiently on hardware ranging from the world's billions of Android devices to laptop GPUs, developer workstations, and accelerators.
These highly optimized models let you fine-tune Gemma 4 to achieve state-of-the-art performance on specific tasks. We have already seen remarkable success with this approach. For example, INSAIT created the pioneering Bulgarian first-language model (BgGPT), and a collaboration with Yale University scaled Cell2Sentence to discover, among other things, new pathways for cancer treatment.
Here's why Gemma 4 is the most capable open model family to date.
- Advanced reasoning: Gemma 4 is capable of multi-step planning and deep logic, with significant improvements on math and instruction-following benchmarks that demand it.
- Agentic workflows: With native support for function calling, structured JSON output, and system instructions, you can build autonomous agents that interact with a variety of tools and APIs to reliably execute your workflows.
- Code generation: Gemma 4 delivers high-quality code offline, turning your workstation into a local-first AI coding assistant.
- Vision and audio: All models natively process video and images, support variable resolutions, and excel at visual tasks like OCR and chart understanding. In addition, the E2B and E4B models feature native speech input for speech recognition and understanding.
- Longer context: Handle long-form content seamlessly. Edge models have a 128K context window, and the larger models offer up to 256K, letting you pass entire repositories and long documents in a single prompt.
- 140+ languages: Natively trained on over 140 languages, Gemma 4 helps developers build comprehensive, high-performance applications for users around the world.
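Function calling, as described above, works by having the model emit a structured JSON description of the call instead of prose; the application parses it and runs the matching tool. Here is a minimal Python sketch of that dispatch loop — the `get_weather` tool, its schema, and the reply format are illustrative assumptions, not the Gemma API:

```python
import json

# Hypothetical tool exposed to the model; the name and argument
# schema are illustrative only, not part of any Gemma interface.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stub result for illustration

TOOLS = {"get_weather": get_weather}

def dispatch(model_output: str) -> str:
    """Parse a structured JSON function call emitted by the model
    and execute the matching registered tool."""
    call = json.loads(model_output)
    return TOOLS[call["name"]](**call["arguments"])

# The model replies with structured JSON instead of free text:
reply = '{"name": "get_weather", "arguments": {"city": "Sofia"}}'
print(dispatch(reply))  # Sunny in Sofia
```

In a real agent loop, the tool's return value would be fed back to the model as the next turn so it can compose a final answer or decide on the next call.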
Flexible models compatible with diverse hardware
We release Gemma 4 model weights sized for specific hardware and use cases, ensuring you get frontier-class inference wherever you need it.
26B and 31B models: Frontier intelligence, offline on a personal computer
Optimized to give researchers and developers state-of-the-art inference on accessible hardware, the unquantized bfloat16 weights fit comfortably on a single 80GB NVIDIA H100 GPU. For local setups, the quantized versions run natively on consumer GPUs, powering IDEs, coding assistants, and agentic workflows. Our 26-billion-parameter Mixture of Experts (MoE) model is built for latency, activating only 3.8 billion of its total parameters during inference and delivering extremely fast tokens per second. Meanwhile, the 31-billion-parameter dense model maximizes raw quality and provides a strong foundation for fine-tuning.
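The latency advantage of the MoE design comes from routing: each token is sent to only a few experts, so per-token compute tracks the ~3.8 billion active parameters rather than the full 26 billion. A toy sketch of top-k expert routing follows — a pure illustration of the general technique, not Gemma's actual architecture:

```python
import math

def topk_moe(x, gate_scores, experts, k=2):
    """Toy top-k Mixture-of-Experts routing.

    Only the k selected experts run, so compute per token scales
    with k rather than with the total expert count -- the idea
    behind a large model activating only a fraction of its weights.
    """
    # Pick the k experts with the highest router scores.
    top = sorted(range(len(experts)), key=lambda i: gate_scores[i])[-k:]
    # Softmax over just the selected experts' scores.
    z = [math.exp(gate_scores[i]) for i in top]
    total = sum(z)
    weights = [v / total for v in z]
    # Weighted combination of only the active experts' outputs;
    # the other experts are never evaluated.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Eight tiny "experts"; only two run per call.
experts = [lambda x, s=s: s * x for s in range(8)]
gate_scores = [0.1, 2.0, 0.3, 1.5, 0.0, 0.2, 0.4, 0.9]
y = topk_moe(3.0, gate_scores, experts, k=2)
```

Here the router picks experts 1 and 3 (the two highest scores), so `y` is a convex combination of their outputs; the remaining six experts cost nothing.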