Google open-sources Diffusion Gemma, hitting 1000 tps on an H100

Editorial team · 2026-06-11T01:34:02+08:00 · tech

Blazing fast inference: By shifting the decode bottleneck from memory-bandwidth to compute, DiffusionGemma generates up to 4x faster token output on dedicated GPUs. (1000+ tokens per second on a single NVIDIA H100, 700+ tokens per second on NVIDIA GeForce RTX 5090).
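To see why moving the decode bottleneck from memory bandwidth to compute can raise throughput, here is a simplified back-of-envelope sketch (a roofline-style estimate, not a benchmark). It assumes batch size 1, that every forward pass streams all weights from memory, and about 2 FLOPs per parameter per token processed. It ignores attention and KV-cache traffic. The helper `decode_tokens_per_second`, the 4B-parameter model size, and the 1e15 FLOP/s compute figure are hypothetical. The 3.35 TB/s bandwidth is roughly H100 SXM class. None of these values describe DiffusionGemma itself.

```python
# Simplified roofline estimate of single-request decode throughput.
# Per forward pass: memory time = streaming all weights once,
# compute time ~ 2 FLOPs per parameter per token position processed.
# The slower of the two sets the pass time. Attention, KV-cache traffic and
# kernel overheads are ignored, so results are optimistic upper bounds.

def decode_tokens_per_second(params: float, bytes_per_param: float,
                             bandwidth: float, peak_flops: float,
                             block_len: int = 1, denoise_steps: int = 1) -> float:
    """block_len=1, denoise_steps=1 models next-token prediction.
    A block-diffusion decoder finalizes block_len tokens after denoise_steps passes."""
    memory_time = params * bytes_per_param / bandwidth   # seconds per pass
    compute_time = 2 * params * block_len / peak_flops   # seconds per pass
    pass_time = max(memory_time, compute_time)
    return block_len / (denoise_steps * pass_time)


# Hypothetical inputs: 4B parameters in bf16 (2 bytes each), 3.35 TB/s of
# memory bandwidth, and an optimistic 1e15 FLOP/s of usable compute.
P, B, BW, FLOPS = 4e9, 2, 3.35e12, 1e15

ntp = decode_tokens_per_second(P, B, BW, FLOPS)
block = decode_tokens_per_second(P, B, BW, FLOPS, block_len=32, denoise_steps=8)
big_block = decode_tokens_per_second(P, B, BW, FLOPS, block_len=512, denoise_steps=64)

print(f"next-token:          {ntp:8.1f} tok/s")        # about 419, memory bound
print(f"block 32, 8 steps:   {block:8.1f} tok/s")      # about 1675, still memory bound
print(f"block 512, 64 steps: {big_block:8.1f} tok/s")  # about 1953, now compute bound
```

The gain comes from reusing one read of the weights across many positions per pass. It stops scaling once `compute_time` exceeds `memory_time`. In the last case, the 8 tokens per pass would give about 3350 tok/s if decoding stayed memory bound, but compute caps it near 1953.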

Some additional notes

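Diffusion is a generation approach different from the next-token prediction (NTP) used by mainstream text LLMs, and it is best known from image generation. An NTP model generates tokens one at a time, from left to right. A diffusion model instead starts from a blank region, predicts likely content for every position in that region at once, and then corrects its predictions over repeated passes until the complete content is produced.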
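The fill-then-refine loop described above can be sketched as a toy example of confidence-based unmasking, a style of sampler used by several masked discrete diffusion language models. The post does not say which sampler Diffusion Gemma uses, so treat this as an illustration of the general idea only. `diffusion_decode`, `toy_predict` and `TARGET` are hypothetical. The toy "model" already knows the target sentence and is simply more confident at positions next to already-filled tokens. Real samplers may also re-mask and revise committed tokens, which is the error correction mentioned above. This sketch keeps committed tokens fixed for brevity.

```python
from typing import Callable, List, Optional, Tuple

MASK: Optional[str] = None  # marks a position whose token is not decided yet

# A predictor takes the current (partially masked) block and returns, for every
# position, a best-guess token and a confidence score, in one parallel pass.
Predictor = Callable[[List[Optional[str]]], List[Tuple[str, float]]]


def diffusion_decode(predict: Predictor, length: int, steps: int):
    """Fill a fully masked block of `length` positions in at most `steps` passes.

    Each pass scores all positions at once. The most confident still-masked
    positions are committed and the rest stay masked for the next pass.
    """
    seq: List[Optional[str]] = [MASK] * length
    history: List[List[Optional[str]]] = []
    for step in range(steps):
        masked = [i for i, tok in enumerate(seq) if tok is MASK]
        if not masked:
            break
        proposals = predict(seq)  # one forward pass over the whole block
        # Commit an even share of the remaining masked positions per pass,
        # so the block is guaranteed to be complete after the last pass.
        k = -(-len(masked) // (steps - step))  # ceiling division
        best = sorted(masked, key=lambda i: proposals[i][1], reverse=True)[:k]
        for i in best:
            seq[i] = proposals[i][0]
        history.append(list(seq))
    return seq, history


# Hypothetical toy "model": it knows the target sentence and is more confident
# where neighbouring positions are filled (with a small left-to-right tie-break).
TARGET = "diffusion models refine every position in parallel".split()


def toy_predict(seq: List[Optional[str]]) -> List[Tuple[str, float]]:
    out = []
    for i in range(len(seq)):
        filled = sum(1 for j in (i - 1, i + 1) if 0 <= j < len(seq) and seq[j] is not MASK)
        out.append((TARGET[i], 0.5 + 0.2 * filled - 0.01 * i))
    return out


final, history = diffusion_decode(toy_predict, length=len(TARGET), steps=3)
for n, row in enumerate(history, 1):
    print(f"pass {n}:", " ".join("_" if tok is MASK else tok for tok in row))
# pass 1: diffusion models refine _ _ _ _
# pass 2: diffusion models refine every position _ _
# pass 3: diffusion models refine every position in parallel
```

Here the 7-token block is finished after 3 forward passes. An NTP decoder would need 7 sequential passes. Fewer sequential passes per token is what block-parallel decoding relies on.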

Source: LinuxDo (Latest topics)
