Google's DiffusionGemma: Text Generation Reimagined
11 Jun
Summary
- DiffusionGemma generates blocks of text in parallel, unlike traditional autoregressive models, which work token by token.
- It offers significant speed increases on local hardware.
- The model is experimental and available under the Apache 2.0 license.

Google DeepMind's latest model, DiffusionGemma, departs from conventional AI text generation. Unlike autoregressive models, which produce text one token at a time, DiffusionGemma can generate entire blocks of text simultaneously. This non-sequential approach, akin to the diffusion models used for image generation, significantly boosts speed and efficiency, especially on local hardware such as gaming GPUs.
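The difference in the number of model calls can be illustrated with a toy sketch. This is not DiffusionGemma's actual algorithm: the "model" here is a stand-in oracle that already knows the target sentence, and the function names and the `tokens_per_step` parameter are hypothetical. It only shows why filling several positions per step needs fewer passes than emitting one token per step.

```python
MASK = "_"  # placeholder for a position that has not been generated yet


def autoregressive_decode(target):
    """Emit one token per (simulated) model call, left to right."""
    out, calls = [], 0
    for tok in target:
        out.append(tok)  # stand-in for one forward pass predicting the next token
        calls += 1
    return out, calls


def block_parallel_decode(target, tokens_per_step):
    """Start from a fully masked block and fill several positions per call."""
    block = [MASK] * len(target)
    calls = 0
    while MASK in block:
        masked = [i for i, tok in enumerate(block) if tok == MASK]
        # Assumption: a real model would pick positions by confidence;
        # this toy simply takes the first few masked positions.
        for i in masked[:tokens_per_step]:
            block[i] = target[i]
        calls += 1
    return block, calls


target = "the quick brown fox jumps over the lazy dog".split()
ar_out, ar_calls = autoregressive_decode(target)
bp_out, bp_calls = block_parallel_decode(target, tokens_per_step=3)
print(ar_calls, bp_calls)  # 9 calls versus 3 calls for the same 9 tokens
```

In this toy, both routines produce the same sentence; only the number of simulated model calls differs. Real speedups depend on how many refinement steps the model needs and how expensive each step is.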
This experimental model, part of the Gemma 4 family, has 26 billion parameters but activates only 3.8 billion during inference, making it suitable for high-end GPUs. DiffusionGemma reportedly generates over 1,000 tokens per second on an Nvidia H100. Generating tokens in parallel also improves performance in tasks such as in-line editing and in solving complex problems such as Sudoku puzzles.
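For a sense of scale, the figures above can be turned into rough numbers. The sketch below is back-of-the-envelope arithmetic only: the parameter counts come from the article, while the precisions listed and the assumption that all weights stay resident in memory are illustrative and not stated by Google. Activations and other runtime buffers are ignored.

```python
TOTAL_PARAMS = 26e9    # total parameters, as reported in the article
ACTIVE_PARAMS = 3.8e9  # parameters activated during inference, as reported

# Share of the model that is active during inference.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS


def weight_gigabytes(params, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params * (bits_per_weight / 8) / 1e9


# Assumption: these precisions are illustrative, not official release formats.
footprints = {bits: weight_gigabytes(TOTAL_PARAMS, bits) for bits in (16, 8, 4)}

print(f"active share: {active_fraction:.1%}")
for bits, gb in footprints.items():
    print(f"{bits}-bit weights: about {gb:.0f} GB")
```

Under these assumptions, roughly 15% of the parameters do work during inference, while the full set of weights still determines how much memory is needed to hold the model.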
While DiffusionGemma offers efficiency gains on local hardware, Google acknowledges drawbacks, such as a potentially higher error rate when generating discrete text than when generating continuous image data. Autoregressive models also remain efficient in cloud environments, thanks to batching and high-bandwidth memory. Even so, DiffusionGemma's parallel generation is a promising avenue for making better use of local AI compute cycles. The model is available under the Apache 2.0 license and has been optimized with Nvidia for various hardware setups.