Introduction

AutoVio is an open-source AI video generation pipeline that connects multiple AI providers (Google Gemini, OpenAI, Anthropic Claude) to automate video production. It handles the full workflow: scenario writing, image generation, image-to-video conversion, and timeline editing.

  • Text to video — Create videos from text descriptions. Describe your product or idea; AutoVio generates scenes, images, and video clips.
  • Reference video analysis — Upload an existing video. Vision AI extracts style, tone, colors, and structure so you can replicate or remix it.
  • Scene-by-scene control — Each scene has an image prompt and video prompt. Edit them, regenerate per scene, and approve before moving on.
  • Timeline editing — Arrange clips, add text and image overlays, set transitions, and export a final MP4.
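The scene-by-scene control described above could be modeled roughly as follows. This is a minimal sketch, not AutoVio's actual schema: the `Scene` type, the `status` field, and the `editPrompts`, `approve`, and `readyForTimeline` helpers are all hypothetical names chosen for illustration. It also assumes that editing a prompt sends a scene back to draft and that the timeline step requires every scene to be approved.

```typescript
// Hypothetical scene model; names are illustrative, not AutoVio's real types.
type SceneStatus = "draft" | "approved";

interface Scene {
  id: number;
  imagePrompt: string; // prompt used to generate the scene's image
  videoPrompt: string; // prompt used for image-to-video conversion
  status: SceneStatus;
}

// Assumption: editing either prompt resets the scene to draft,
// so it has to be regenerated and approved again.
function editPrompts(
  scene: Scene,
  changes: Partial<Pick<Scene, "imagePrompt" | "videoPrompt">>
): Scene {
  return { ...scene, ...changes, status: "draft" };
}

// Mark a single scene as approved without mutating the input.
function approve(scene: Scene): Scene {
  return { ...scene, status: "approved" };
}

// Assumption: the timeline step opens only when all scenes are approved.
function readyForTimeline(scenes: Scene[]): boolean {
  return scenes.length > 0 && scenes.every((s) => s.status === "approved");
}

// Usage: approve one scene, edit the other, then check the gate.
const draftScenes: Scene[] = [
  { id: 1, imagePrompt: "Product on a desk", videoPrompt: "Slow zoom in", status: "draft" },
  { id: 2, imagePrompt: "Logo on white", videoPrompt: "Fade in", status: "draft" },
];
const reviewed: Scene[] = [
  approve(draftScenes[0]),
  editPrompts(draftScenes[1], { videoPrompt: "Fade in, then hold" }),
];
console.log(readyForTimeline(reviewed));
```

Returning new objects instead of mutating keeps each scene's history easy to compare in a UI, which fits a per-scene edit and regenerate loop.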

AutoVio is a monorepo with three packages:

Package  | Role
-------- | ----
Backend  | Express.js REST API. Orchestrates AI providers, MongoDB, file storage, and video export (FFmpeg).
Frontend | React SPA. Step-by-step pipeline UI and timeline editor.
Shared   | TypeScript types and Zod schemas used by both backend and frontend.
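To illustrate why a shared package is useful, here is a dependency-free sketch of one definition serving both sides. AutoVio's shared package uses Zod schemas; this example swaps in a hand-written type guard so that it runs without any install. The `ScenePrompts` shape and both function names are hypothetical.

```typescript
// Hypothetical shared shape; the real shared package defines its own types with Zod.
interface ScenePrompts {
  imagePrompt: string;
  videoPrompt: string;
}

// Runtime check that either the frontend or the backend could import.
// It narrows `unknown` to ScenePrompts when both fields are strings.
function isScenePrompts(value: unknown): value is ScenePrompts {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.imagePrompt === "string" && typeof v.videoPrompt === "string";
}

// Backend-style use: validate an untrusted JSON body before trusting its type.
// Returns null for malformed JSON or a payload with the wrong shape.
function parseScenePrompts(json: string): ScenePrompts | null {
  try {
    const data: unknown = JSON.parse(json);
    return isScenePrompts(data) ? data : null;
  } catch {
    return null;
  }
}

// Usage: a well-formed payload parses, a malformed one is rejected.
const parsed = parseScenePrompts('{"imagePrompt":"Logo on white","videoPrompt":"Fade in"}');
const rejected = parseScenePrompts('{"imagePrompt":42}');
console.log(parsed !== null, rejected === null);
```

Keeping the type and its runtime check in one place means the frontend form and the backend route cannot drift apart silently, which is the same benefit a shared Zod schema provides.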

You can use the web UI, the REST API, or the MCP server (for Claude Desktop, Cursor, etc.) to create videos.