Google’s latest DiffusionGemma open AI model comes with a 4x speed boost

Multiple paths to local efficiency

If diffusion is so much faster, why isn’t Google using it in big cloud-based Gemini models? Google has experimented with this, but there are a few drawbacks to text diffusion, including a higher error rate. In image diffusion models, a single badly predicted pixel doesn’t make the image useless, but language is discrete. An equivalent error in text can make a block of tokens meaningless and force you to start over to get a better output. Diffusion models also waste resources when the desired output is only a few tokens long. They have to do a lot more parallel work to whittle a block down to the few tokens that an autoregressive model would generate from beginning to end in just five steps.
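The short-output penalty can be made concrete with a rough count of forward passes. The sketch below is only an illustration: the block size of 256 tokens and the 16 denoising steps per block are hypothetical values, not published DiffusionGemma settings, and it ignores the fact that one pass over a whole block costs more compute than one single-token pass.

```python
import math


def autoregressive_passes(num_tokens: int) -> int:
    """An autoregressive model runs one forward pass per generated token."""
    return num_tokens


def diffusion_passes(num_tokens: int, block_size: int, steps_per_block: int) -> int:
    """A block diffusion model refines each whole block for a fixed number
    of denoising steps, no matter how few tokens are actually needed.
    block_size and steps_per_block are hypothetical parameters."""
    blocks = math.ceil(num_tokens / block_size)
    return blocks * steps_per_block


# Hypothetical settings chosen for illustration only.
BLOCK_SIZE = 256
STEPS_PER_BLOCK = 16

for n in (5, 1024):
    ar = autoregressive_passes(n)
    diff = diffusion_passes(n, BLOCK_SIZE, STEPS_PER_BLOCK)
    print(f"{n} tokens: autoregressive={ar} passes, diffusion={diff} passes")
```

Under these assumed numbers, a five-token answer takes fewer passes autoregressively, while a 1,024-token answer takes far fewer passes with diffusion.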


The efficiency gain for local processing makes this an appealing avenue of experimentation, though. In the cloud, autoregressive models can batch large numbers of compute jobs from multiple users so they’re always churning out tokens, and the high-bandwidth memory (HBM) used in these systems can move data around much more efficiently.

Conversely, local AI wastes compute cycles because of lower memory bandwidth and idle time. Diffusion models can make more efficient use of available compute, but this isn’t the only way. Google also recently began implementing Multi-Token Prediction (MTP) drafters, which use otherwise wasted compute cycles to predict likely upcoming tokens and increase speed. But diffusion is even faster than the MTP versions of Gemma.
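A back-of-the-envelope model shows why memory bandwidth matters so much for a single local user. The sketch assumes generation is purely bandwidth-bound, meaning every forward pass has to read the full set of weights from memory once, and it ignores caches, compute limits, and rejected draft tokens. All of the figures are hypothetical and are not measurements of any Gemma model.

```python
def tokens_per_second(bandwidth_gb_s: float, model_size_gb: float,
                      tokens_per_pass: float = 1.0) -> float:
    """Simplified bandwidth-bound estimate: each forward pass reads all
    weights once, so passes per second is bandwidth divided by model size.
    Emitting more tokens per pass (via drafting or diffusion) multiplies
    throughput without reading the weights more often."""
    passes_per_second = bandwidth_gb_s / model_size_gb
    return passes_per_second * tokens_per_pass


# Hypothetical device: 100 GB/s of memory bandwidth, 10 GB of weights.
baseline = tokens_per_second(100, 10)                      # one token per pass
multi = tokens_per_second(100, 10, tokens_per_pass=4)      # assumed 4 tokens per pass
print(baseline, multi)
```

In this simplified picture, the only ways to go faster on fixed hardware are to shrink the weights (quantization) or to get more tokens out of each pass, which is the shared goal of MTP drafters and diffusion.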

Google stresses that DiffusionGemma is experimental, but it’s available under the same Apache 2.0 license as all the other fourth-generation Gemma models. You can download the model weights today from Hugging Face. Google says it worked with Nvidia to ensure DiffusionGemma was optimized for a variety of setups, including high-end RTX GPUs (quantized) and enterprise systems like the H100 or DGX Spark platform.


By Laily UPN

Digital Trends and Tech
