TongFlow Team · Announcements · 3 min read

TongFlow v0.1.0 is open source

An open-source multi-modal AIGC studio with text, image, video, audio, and 3D on one infinite canvas. AGPL-3.0, runs anywhere Docker runs.

TongFlow v0.1.0 is now public on GitHub at tong-io/tongflow, under AGPL-3.0. One Docker command, your own machine, every modality on one canvas.

What it is

TongFlow is a multi-modal AIGC studio built around a single idea: every AI model is a modality transform. A text-to-image model is text → image. A speech recognizer is audio → text. A 3D generator is image → 3D. Wrap each one as a node with typed inputs and outputs, drop them onto an infinite canvas, and you have a creative pipeline you can see, edit, and share.
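To make the "every model is a modality transform" idea concrete, here is a minimal sketch in Python. It is not TongFlow's code: the `Modality`, `Node`, and `chain` names are invented for illustration, and the lambdas are stand-ins that pass strings around where real nodes would pass media.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Modality(Enum):
    TEXT = "text"
    IMAGE = "image"
    AUDIO = "audio"
    VIDEO = "video"
    MODEL_3D = "3d"


@dataclass(frozen=True)
class Node:
    """A model wrapped as a transform from one modality to another (hypothetical shape)."""
    name: str
    source: Modality
    target: Modality
    run: Callable[[str], str]  # stand-in: strings instead of real media


def chain(first: Node, second: Node) -> Node:
    """Compose two nodes, refusing the connection if the types do not line up."""
    if first.target is not second.source:
        raise TypeError(
            f"{first.name} outputs {first.target.value}, "
            f"but {second.name} expects {second.source.value}"
        )
    return Node(
        name=f"{first.name} -> {second.name}",
        source=first.source,
        target=second.target,
        run=lambda payload: second.run(first.run(payload)),
    )


# Stub nodes named after the examples in the text.
text_to_image = Node("text-to-image", Modality.TEXT, Modality.IMAGE, lambda p: f"image({p})")
image_to_3d = Node("image-to-3D", Modality.IMAGE, Modality.MODEL_3D, lambda p: f"mesh({p})")
speech_to_text = Node("speech-recognition", Modality.AUDIO, Modality.TEXT, lambda p: f"transcript({p})")

pipeline = chain(text_to_image, image_to_3d)
print(pipeline.name, "=>", pipeline.run("a red teapot"))
# text-to-image -> image-to-3D => mesh(image(a red teapot))
```

Because each node declares what it consumes and produces, a mismatched connection such as `chain(text_to_image, speech_to_text)` fails immediately with a `TypeError` instead of at generation time.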

Three verbs cover the whole interface:

  • Add materials onto the canvas: text, image (including photos and sketches), audio, video, document, URL, or 3D model.
  • Transform them between modalities: text-to-image, image-to-video, audio-to-text, image-to-3D, and so on.
  • Combine the results: image fusion, lip sync, voice cloning, character swap, motion transfer.

No complex parameter panels, no manual node wiring. Drop something on the canvas, pick the next step from the Smart Island, and the connection is made for you.
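One way to picture the automatic wiring: if every transform declares its input modality, the list of valid next steps for a dropped material is just a filter. The sketch below assumes that model. How the Smart Island actually ranks or chooses suggestions is not described here, and the `CATALOG` structure and `next_steps` function are hypothetical.

```python
# Hypothetical catalog of transforms: (label, input modality, output modality).
# The labels come from the examples in this post; the structure is an assumption.
CATALOG = [
    ("text-to-image", "text", "image"),
    ("image-to-video", "image", "video"),
    ("audio-to-text", "audio", "text"),
    ("image-to-3D", "image", "3d"),
]


def next_steps(material_modality: str) -> list[str]:
    """Return the transforms that accept the given modality as input."""
    return [label for label, source, _ in CATALOG if source == material_modality]


print(next_steps("image"))  # ['image-to-video', 'image-to-3D']
```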

What you can build

A few patterns the v0.1.0 graph already supports end-to-end:

  • Talking-head videos — script → speech → image → lip-synced video, all on one canvas.
  • Short films from a paragraph — text → scene images → image-to-video → concatenated cut.
  • E-commerce visuals at scale — drop a product photo and a reference, run image fusion across a batch, get clean variants.
  • Original music from a prompt — ACE-Step turns a text description into a finished track.
  • AI comics and shorts — story prompt → panel images → arrangement → optional voiceover.
  • Character animation — bring a still character into motion using motion-transfer or character-swap nodes.
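Under the hood, a pattern like the talking-head video is a small dependency graph. The sketch below runs one in dependency order using Python's standard `graphlib`. The step names and the `run_step` stub are illustrative assumptions, not TongFlow's scheduler. The lip-sync step takes audio plus an image, matching the audio+image→video variant listed in this post.

```python
from graphlib import TopologicalSorter

# Each step lists the steps it consumes. Names are illustrative only.
STEPS = {
    "script": [],                        # text dropped on the canvas
    "speech": ["script"],                # text -> audio
    "portrait": [],                      # image dropped on the canvas
    "lip_sync": ["speech", "portrait"],  # audio + image -> video
}


def run_step(name, inputs):
    """Stand-in for a model call: record which inputs the step consumed."""
    return f"{name}({', '.join(inputs)})"


def run_pipeline(steps):
    """Execute every step after the steps it depends on."""
    results = {}
    for name in TopologicalSorter(steps).static_order():
        results[name] = run_step(name, [results[dep] for dep in steps[name]])
    return results


results = run_pipeline(STEPS)
print(results["lip_sync"])  # lip_sync(speech(script()), portrait())
```

The other patterns in the list fit the same shape: only the contents of the step table change.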

What’s in v0.1.0

  • 7 input types on the canvas: text, image, audio, video, document, URL, 3D model.
  • Image transforms: generation, edit, captioning / Q&A, upscale, image-to-3D.
  • Video transforms: text-to-video, image-to-video, first/last-frame interpolation, description, upscale, frame extraction, subtitle removal, watermark removal.
  • Audio transforms: music generation, speech synthesis (preset / voice clone / instruction), speech recognition, noise reduction, speaker diarization, voice replacement.
  • Combine nodes: image fusion, lip sync (audio+image→video, audio+video→video, audio+text→video), voice cloning, character swap, motion transfer, text merging.
  • Backend models, named, not hidden: Z-Image, FLUX.2 Klein 9B, LTX-2, SeedVR2, Gemma 4, Qwen3, ACE-Step.
  • Extensible by design: new transforms plug in via the ABI (config/tongflow.abi.json) and the plugin scanner. Add your own model, your own slot, your own workflow.
  • Self-host with Docker: clone the repo, then run docker compose up.
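For a feel of what "plug in your own transform" can look like, here is a generic registry sketch. It is not the TongFlow ABI: this post does not show the schema of config/tongflow.abi.json or how the plugin scanner works, so the `transform` decorator, the `REGISTRY` dict, and the `grayscale` example are all hypothetical.

```python
# Generic plugin-registry pattern; every name here is invented for illustration.
REGISTRY = {}


def transform(name, inputs, output):
    """Register a function as a transform with declared input and output types."""
    def register(fn):
        if name in REGISTRY:
            raise ValueError(f"transform {name!r} is already registered")
        REGISTRY[name] = {"inputs": tuple(inputs), "output": output, "fn": fn}
        return fn
    return register


@transform("grayscale", inputs=["image"], output="image")
def grayscale(image):
    # Stand-in for a real model call.
    return f"gray({image})"


entry = REGISTRY["grayscale"]
print(entry["inputs"], "->", entry["output"], ":", entry["fn"]("photo.png"))
# ('image',) -> image : gray(photo.png)
```

The point of the pattern is that the canvas only needs the declared types to offer a new transform as a next step. The real contract is whatever the ABI file in the repo specifies.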

Privacy by construction

No accounts. No central CDN. No telemetry. Your workflows and uploaded files live in a local SQLite database and on local disk — under your control, on your machine. The studio talks to exactly two outside services: Modal for GPU workers and one LLM provider of your choice (OpenRouter, Gemini, OpenAI, or DeepSeek). You bring the API keys; nothing routes through us.

How to try it

git clone https://github.com/tong-io/tongflow
cd tongflow
docker compose up

You’ll need:

  • Docker (Compose v2)
  • A Modal account and token — the free tier ($30/month credit) includes plenty of GPU time for everyday work
  • One LLM API key: OpenRouter, Gemini, OpenAI, or DeepSeek

Set the env vars from .env.example, then open http://localhost:3000. The Getting Started doc walks through the first workflow.
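If the stack fails to start, an unset variable is a common cause. The helper below lists keys that appear in .env.example but are missing or empty in a local .env file. It makes two assumptions: that your values live in a `.env` file next to `.env.example`, and that every key in the example file is required. Either may not hold for this repo, so treat the output as a hint. No variable names are hard-coded.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(example_text: str, env_text: str) -> list[str]:
    """Keys named in the example file that are absent or empty in the env file."""
    actual = parse_env(env_text)
    return [key for key in parse_env(example_text) if not actual.get(key)]


# Run from the repo root; does nothing if .env.example is not present.
example, env = Path(".env.example"), Path(".env")
if example.exists():
    env_text = env.read_text() if env.exists() else ""
    for key in missing_keys(example.read_text(), env_text):
        print(f"unset: {key}")
```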

Or skip the install and use the hosted studio at https://app.tongflow.com — same canvas, same nodes, ready in the browser.

Join the community

If you build something with TongFlow, we’d love to see it. And if the project is useful to you, a star on the repo makes a real difference.

