Google recently introduced Gemini Omni Flash, the first model in the new Gemini Omni family, built to create and edit video from multimodal inputs.
Unlike traditional text-to-video tools, Omni Flash accepts text, images, audio, and video as inputs, then generates high-quality video with native audio in a single workflow.
Highlights:
- Create videos from different types of references, not just text prompts
- Generate video and audio together, including dialogue, ambience, and sound effects
- Edit videos through natural conversation instead of restarting from scratch
- Use it for short-form video, creative prototyping, marketing assets, and rapid iteration
One of the most interesting parts is conversational editing: you can refine a video with follow-up instructions, such as changing the scene, adjusting the style, or modifying details, without rebuilding the whole concept from scratch.
Fast, multimodal, and much easier to iterate with, Gemini Omni Flash feels like a meaningful step toward more controllable AI video creation.