An open-source Multi-Modal AIGC Studio
Add materials, transform between modalities, and combine the results on an infinite canvas. Free, open source, and runs on your own computer. Built around one idea: every AI model is a modality transform.

Add → Transform → Combine
One canvas. All modalities.
No complex parameter panels and no manual node connecting — just add, transform, and combine.
Add
Text, images, photos, sketches, audio recordings, video, documents, URLs, even 3D models — drop any material onto the canvas as a node.
Transform
Text → text, text → image, text → video, image → video, audio → text, image → 3D — every model is a modality transform, exposed as a node.
Combine
Image fusion, lip sync, voice cloning, character swap, motion transfer, text merging — wire nodes together to build anything.
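To make the "every model is a modality transform" idea concrete, here is a minimal sketch in Python. It is not TongFlow's code: the `Transform` class, the `chain` helper, and the three stand-in "models" are hypothetical names invented for illustration. It only shows how a transform can carry a source and target modality, and how a chain can be checked before anything runs.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical modality names, borrowed from the examples on this page.
MODALITIES = {"text", "image", "audio", "video", "3d"}


@dataclass(frozen=True)
class Transform:
    """A model viewed as a function from one modality to another."""
    name: str
    source: str
    target: str
    run: Callable[[object], object]

    def __post_init__(self) -> None:
        if self.source not in MODALITIES or self.target not in MODALITIES:
            raise ValueError(f"unknown modality in {self.name}")


def chain(*steps: Transform) -> Transform:
    """Compose transforms left to right, checking that modalities line up."""
    if not steps:
        raise ValueError("chain needs at least one transform")
    for prev, nxt in zip(steps, steps[1:]):
        if prev.target != nxt.source:
            raise TypeError(
                f"{prev.name} outputs {prev.target}, "
                f"but {nxt.name} expects {nxt.source}"
            )

    def run(value: object) -> object:
        for step in steps:
            value = step.run(value)
        return value

    return Transform(
        name=" -> ".join(s.name for s in steps),
        source=steps[0].source,
        target=steps[-1].target,
        run=run,
    )


# Stand-in "models": a real node would call an inference backend instead.
transcribe = Transform("transcribe", "audio", "text", lambda a: f"transcript of {a}")
illustrate = Transform("illustrate", "text", "image", lambda t: f"image for '{t}'")
animate = Transform("animate", "image", "video", lambda i: f"video from {i}")

pipeline = chain(transcribe, illustrate, animate)
print(pipeline.source, "->", pipeline.target)  # audio -> video
print(pipeline.run("memo.wav"))
```

Chaining in the wrong order, for example `chain(illustrate, transcribe)`, raises a `TypeError` because an image output cannot feed a node that expects audio.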
Your data stays on your machine
Workflows and uploaded files are saved locally on your own computer. No account to create, no cloud sync, no telemetry.
Use any AI service you want
Heavy AI tasks run on Modal, whose free tier includes H100 GPU time. For text, plug in your own OpenRouter, Gemini, OpenAI, or DeepSeek key.
Real models, named
Z-Image, FLUX.2 Klein 9B, LTX-2, SeedVR2, Gemma 4, Qwen3, ACE-Step — the models doing the work are listed in the README, not hidden behind marketing.
Capabilities currently shipped
Pulled directly from the README. If a row is here, it works today.
Text, image, photo, sketch, audio file, audio recording, video file, video recording, document, URL, 3D model — drop any material onto the canvas.
Image generation, image editing (inpaint/redraw), image understanding (captions/Q&A), image upscaling.
Text-to-video, image-to-video, first/last-frame interpolation, video understanding, video upscaling, frame extraction, subtitle removal, watermark removal.
Music generation, speech synthesis (preset / voice clone / instruction), speech recognition, noise reduction, speaker diarization, voice replacement.
Generate or rewrite copy from a prompt — routed through OpenRouter, Gemini, OpenAI, or DeepSeek depending on the node's model slot.
Image fusion (multi-reference blending), lip sync (audio+video / audio+image / audio+text → video), voice cloning, character swap, motion transfer, text merging.
Concatenate clips, mux audio+video, split by shots, demux, extract audio track, split long text, merge text blocks, filter clips, batch arrange groups.
Image → 3D model, document → text, URL → text — bring outside material into the canvas.
FFmpeg for media pipelines, scene detection for shot splitting, Modal for GPU workers. Models: Z-Image, FLUX.2, LTX-2, SeedVR2, Gemma 4, Qwen3, ACE-Step.
FAQ
Frequently asked
Straight answers about what TongFlow is — and what it isn't.
Is this really open source?
Yes. The full source is on GitHub at tong-io/tongflow under AGPL-3.0. You can read, modify, and self-host it.
Do I need a GPU?
Not locally. Heavy inference runs on Modal — their free tier includes H100 time. You bring your own Modal tokens and LLM API keys; TongFlow itself runs fine on a laptop.
How is this different from using AI tools separately?
Each AI model is wrapped as a modality-transform node on the canvas. You arrange nodes — add inputs, transform between text/image/audio/video/3D, combine the results — instead of copy-pasting between five separate apps.
What's the difference between self-hosting and app.tongflow.com?
Self-hosting (one Docker command) keeps everything on your own computer — your API keys, your files, no account, no cloud. app.tongflow.com runs the same studio for you if you'd rather not deal with running it yourself.
How do I install it?
Clone the repository and start it with Docker Compose:
git clone https://github.com/tong-io/tongflow && cd tongflow && docker compose up
You'll need Docker, a Modal token (the free tier works), and at least one LLM API key (OpenRouter, Gemini, OpenAI, or DeepSeek). The README walks through the rest.
Can I extend it with my own models?
Yes. Model slots and handler routing are configured via the ABI (config/tongflow.abi.json) and a plugin scanner. See docs/feature-registry.md in the repo for how to register new capabilities.
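As a rough mental model of slot-based handler routing, here is a small Python sketch. Everything in it is an assumption made for illustration: the `SlotRegistry` class, the `register` decorator, and the slot name `text.rewrite` are hypothetical. It does not reflect the actual schema of config/tongflow.abi.json or the plugin scanner's API, so treat docs/feature-registry.md as the authority.

```python
from typing import Callable, Dict

# Hypothetical handler signature: takes a text payload, returns text.
Handler = Callable[[str], str]


class SlotRegistry:
    """Maps a named model slot to the handler that serves it."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, slot: str) -> Callable[[Handler], Handler]:
        """Decorator that binds a handler function to a slot name."""
        def decorator(fn: Handler) -> Handler:
            if slot in self._handlers:
                raise ValueError(f"slot already registered: {slot}")
            self._handlers[slot] = fn
            return fn
        return decorator

    def route(self, slot: str, payload: str) -> str:
        """Look up the handler for a slot and run it on the payload."""
        try:
            handler = self._handlers[slot]
        except KeyError:
            raise LookupError(f"no handler for slot: {slot}") from None
        return handler(payload)


registry = SlotRegistry()


@registry.register("text.rewrite")  # hypothetical slot name
def rewrite(payload: str) -> str:
    # Stand-in for a call to an LLM provider.
    return payload.strip().capitalize()


print(registry.route("text.rewrite", "  hello canvas "))  # Hello canvas
```

The point of the pattern is that a new capability only needs to register itself under a slot name. The routing code does not change when handlers are added.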
What's the project's current stage?
Early days — v0.1.0. Contributions, bug reports, and model integrations are very welcome. Join the Discord or open an issue on GitHub.
Two ways to start
Run it yourself with a single Docker command, or try the hosted version at app.tongflow.com.
