Introducing Gemma 4 12B: a unified, encoder-free multimodal model
An overview of Gemma 4 12B, a model designed to bring high-performance multimodal intelligence directly to your laptop.
Below are the officially published benchmarks.

It is also a multimodal model, trained with an encoder-free architecture, and supports text, images, and audio.
See the related technical report:
Gemma 4 12B: The Developer Guide - Google Developers Blog
Meet Gemma 4 12B: the first medium-sized, encoder-free multimodal model capable of natively ingesting audio and video. Ideal for local AI development with 16GB VRAM, Hugging Face integrations, and drop-in local API servers.
It uses sliding window attention with a window size of 1024 and a context length of 256k.
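For anyone unfamiliar with sliding window attention, here is a rough sketch of the masking rule in plain Python. It is only an illustration of the general technique, not Gemma's actual implementation: the function names are made up for this post, the choice that the window includes the current token is an assumption, and so is reading "256k" as 262,144 tokens. Whether every layer uses the window is not stated in the post, so the sketch describes a single windowed layer only.

```python
# Illustrative sketch of causal sliding-window attention masking.
# All names here are hypothetical; this is not Gemma's code.

WINDOW = 1024            # sliding window size quoted in the post
CONTEXT = 256 * 1024     # assumption: "256k" read as 262,144 tokens


def can_attend(query_pos: int, key_pos: int, window: int = WINDOW) -> bool:
    """Return True if the query token may attend to the key token.

    Assumed rule: a token sees itself and the (window - 1) tokens
    immediately before it, and never sees future tokens.
    """
    return 0 <= query_pos - key_pos < window


def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Build the full boolean mask; rows are queries, columns are keys."""
    return [
        [can_attend(q, k, window) for k in range(seq_len)]
        for q in range(seq_len)
    ]


def attended_count(query_pos: int, window: int = WINDOW) -> int:
    """Number of keys visible to the token at query_pos in one windowed layer."""
    return min(query_pos + 1, window)


if __name__ == "__main__":
    # Print a tiny mask (6 tokens, window of 3) to make the band visible.
    for row in sliding_window_mask(6, 3):
        print(" ".join("x" if allowed else "." for allowed in row))
    # The last token of the assumed 256k context still sees only WINDOW keys.
    print(attended_count(CONTEXT - 1))
```

The point of the design is that in a windowed layer the attention cost per token stops growing once the sequence is longer than the window, which is what makes a very long context practical on limited memory.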
According to the Google blog, its performance is close to that of the Gemma 4 26B model.