Google's Gemini Omni Flash: Video Editing by Chat

30 Jun

Summary

Omni Flash allows conversational editing of video clips.
It integrates multiple AI tools into a single model.
Video clips are limited to 10 seconds and 720p resolution.

Google is targeting enterprise video production with the rollout of Gemini Omni Flash, the first model in its 'Omni' family, which is accessible to developers and businesses via an API as of June 30, 2026. The model was previously available only to consumers. It lets users edit finished video clips through conversational commands, a marked change from traditional, time-consuming video production workflows.

The new API lets marketing and learning-and-development teams skip the work of integrating multiple AI tools. Until now, teams have cobbled together separate models for scripts, image generation, video conversion, lip-syncing, and voiceovers, each with its own management and billing. Omni replaces that chain with a single model that accepts text, image, and video inputs and generates a complete clip with synchronized audio.

The model supports conversational editing: marketers can relight, reframe, or alter wardrobe in a product shot, and each request builds on the previous edits instead of regenerating the clip from scratch. It also accepts multimodal references, including multiple images and video clips, so that specific objects, brand logos, or locations are reproduced accurately in the final output. Omni Flash includes a world model for realistic physics and scene behavior, and it can rewrite text on signs within a scene or insert logos.

Gemini Omni Flash currently has limits: clips are capped at 10 seconds and 720p resolution, so longer content has to be stitched together from multiple clips. Google is still developing the model, with consistency across edits and accurate text rendering as ongoing areas of focus. All generated clips carry SynthID watermarking and C2PA credentials for provenance and can be detected by Google's AI Content Detection API. The model also has guardrails against deepfakes: it does not create lip-synced speech from a still photo and audio.
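
To illustrate what the 10-second cap means for longer content, here is a minimal Python sketch that splits a target runtime into consecutive windows of at most 10 seconds each. The function name `plan_segments` and its parameters are hypothetical and are not part of any Google API; the sketch only does the arithmetic of planning clips and assumes they would be generated and stitched by some separate step.

```python
import math

# Per-clip cap reported in the article; treated here as a fixed constant.
MAX_CLIP_SECONDS = 10.0


def plan_segments(total_seconds, max_clip=MAX_CLIP_SECONDS):
    """Split a target runtime into consecutive (start, end) windows.

    Hypothetical helper: each window is at most `max_clip` seconds long,
    and the last window is shortened to end at `total_seconds`.
    """
    total_seconds = float(total_seconds)
    max_clip = float(max_clip)
    if max_clip <= 0:
        raise ValueError("max_clip must be positive")
    if total_seconds <= 0:
        return []

    count = math.ceil(total_seconds / max_clip)
    segments = []
    for i in range(count):
        start = i * max_clip
        end = min(start + max_clip, total_seconds)
        segments.append((start, end))
    return segments


if __name__ == "__main__":
    # A 25-second video needs three clips: two full ones and one of 5 seconds.
    print(plan_segments(25))
```

Under these assumptions, a 25-second piece would need three clips, and a one-minute piece would need six.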

Pricing is $0.10 per second for 720p video, so a 10-second clip costs $1.00. The model is limited in resolution compared with Google's Veo 3.1, but its selling point is editing flexibility. Omni Flash's ability to treat video as an editable document, rather than a static render, distinguishes it in a market with emerging rivals.
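
The per-second price makes budgeting straightforward. The Python sketch below estimates the cost of a given amount of generated footage at the quoted 720p rate. The name `estimate_cost` is hypothetical, and the sketch assumes billing is strictly linear in output seconds, with no minimum charge, rounding rule, or extra fee for edits or retries, none of which the article specifies.

```python
from decimal import Decimal

# 720p rate quoted in the article, in US dollars per second of video.
PRICE_PER_SECOND_USD = Decimal("0.10")


def estimate_cost(seconds, rate=PRICE_PER_SECOND_USD):
    """Estimate the cost in USD of `seconds` of generated video.

    Assumption: cost = seconds * rate, rounded to whole cents.
    Decimal is used so that cent amounts are not distorted by float error.
    """
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return (Decimal(str(seconds)) * rate).quantize(Decimal("0.01"))


if __name__ == "__main__":
    print(estimate_cost(10))  # one maximum-length clip: 1.00
    print(estimate_cost(60))  # one minute, stitched from six clips: 6.00
```

On that assumption, a one-minute video assembled from six 10-second clips would cost about $6.00 in generation fees, before any regenerated or discarded takes.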

Disclaimer: This story has been auto-aggregated and auto-summarised by a computer program. This story has not been edited or created by the Feedzop team.