An open-source multi-modal AIGC studio

Open source. Multi-modal. Runs on your machine. Built around one idea: every AI model is a modality transform.


Add → Transform → Combine

One canvas. All modalities.

No complex parameter panels and no tedious manual wiring: just add, transform, and combine.

Add

Text, images, photos, sketches, audio recordings, video, documents, URLs, even 3D models — drop any material onto the canvas as a node.

Transform

Text → text, text → image, text → video, image → video, audio → text, image → 3D — every model is a modality transform, exposed as a node.

Combine

Image fusion, lip sync, voice cloning, character swap, motion transfer, text merging — wire nodes together to build anything.
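To make the "every model is a modality transform" idea concrete, here is a minimal Python sketch of the Add → Transform → Combine model. Everything in it is hypothetical: `Modality`, `Asset`, and `Transform` are illustrative names, not TongFlow's actual code, and the lambdas are stubs standing in for real models. The point is that a single-input transform and a multi-input combine are the same shape: a typed function from input modalities to one output modality.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Modality(Enum):
    TEXT = "text"
    IMAGE = "image"
    AUDIO = "audio"
    VIDEO = "video"
    MODEL_3D = "3d"


@dataclass(frozen=True)
class Asset:
    """A node's material: its modality plus a payload (a stub string here)."""
    modality: Modality
    payload: str


@dataclass(frozen=True)
class Transform:
    """Hypothetical node type: a model wrapped as a typed modality transform."""
    name: str
    inputs: tuple[Modality, ...]
    output: Modality
    run: Callable[..., str]  # stand-in for the real model call

    def __call__(self, *assets: Asset) -> Asset:
        got = tuple(a.modality for a in assets)
        if got != self.inputs:  # refuse to wire incompatible modalities
            raise TypeError(f"{self.name} expects {self.inputs}, got {got}")
        return Asset(self.output, self.run(*(a.payload for a in assets)))


# Transform: one input modality in, one out (stub models).
text_to_image = Transform("text_to_image", (Modality.TEXT,), Modality.IMAGE,
                          lambda p: f"image<{p}>")
image_to_video = Transform("image_to_video", (Modality.IMAGE,), Modality.VIDEO,
                           lambda p: f"video<{p}>")
# Combine: same shape, just more than one input.
lip_sync = Transform("lip_sync", (Modality.AUDIO, Modality.VIDEO), Modality.VIDEO,
                     lambda a, v: f"synced<{v}|{a}>")

# Add: drop raw material onto the canvas.
prompt = Asset(Modality.TEXT, "a lighthouse at dusk")
voice = Asset(Modality.AUDIO, "narration.wav")

clip = image_to_video(text_to_image(prompt))
final = lip_sync(voice, clip)
print(final.payload)  # synced<video<image<a lighthouse at dusk>>|narration.wav>
```

Typing each node by its input and output modalities is what lets a canvas decide which connections are valid: `lip_sync(clip, voice)` raises `TypeError` because the arguments arrive in the wrong modality order.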

Your data stays on your machine

Workflows and uploaded files are saved locally on your own computer. No account to create, no cloud sync, no telemetry.

Use any AI service you want

Heavy AI tasks run on Modal, whose free tier includes monthly GPU credits. For text, plug in your own OpenRouter, Gemini, OpenAI, or DeepSeek API key.

Real models, named

Z-Image, FLUX.2 Klein 9B, LTX-2, SeedVR2, Gemma 4, Qwen3, ACE-Step — the models doing the work are listed in the README, not hidden behind marketing.

Capabilities currently shipped

Pulled directly from the README. If a row is here, it works today.

Add: 11 input types

Text, image, photo, sketch, audio file, audio recording, video file, video recording, document, URL, 3D model — drop any material onto the canvas.

Transform: Image

Image generation, image editing (inpaint/redraw), image understanding (captions/Q&A), image upscaling.

Transform: Video

Text-to-video, image-to-video, first/last-frame interpolation, video understanding, video upscaling, frame extraction, subtitle removal, watermark removal.

Transform: Audio

Music generation, speech synthesis (preset / voice clone / instruction), speech recognition, noise reduction, speaker diarization, voice replacement.

Transform: Text

Generate or rewrite copy from a prompt — routed through OpenRouter, Gemini, OpenAI, or DeepSeek depending on the node's model slot.

Combine

Image fusion (multi-reference blending), lip sync (audio+video / audio+image / audio+text → video), voice cloning, character swap, motion transfer, text merging.

Helpers

Concatenate clips, mux audio+video, split by shots, demux, extract audio track, split long text, merge text blocks, filter clips, batch arrange groups.
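The backend listing names FFmpeg for media pipelines. As a rough illustration of what helpers like "mux audio+video" and "concatenate clips" boil down to, this Python sketch builds standard FFmpeg command lines. It is not TongFlow's actual invocation: the function names are hypothetical and the flag choices (copy the video stream, encode audio to AAC, stop at the shorter stream) are assumptions made for the example.

```python
import shlex


def mux_command(video: str, audio: str, out: str) -> list[str]:
    """Hypothetical helper: argv that puts an audio track onto a video."""
    return [
        "ffmpeg", "-y",
        "-i", video,
        "-i", audio,
        "-map", "0:v:0",  # first video stream of input 0
        "-map", "1:a:0",  # first audio stream of input 1
        "-c:v", "copy",   # keep the video stream as-is, no re-encode
        "-c:a", "aac",    # encode audio to AAC for MP4 compatibility
        "-shortest",      # end output at the shorter of the two streams
        out,
    ]


def concat_list(clips: list[str]) -> str:
    """Hypothetical helper: contents of a list file for FFmpeg's concat demuxer."""
    # Single quotes inside a path are written as '\'' in the list-file syntax.
    return "".join("file '" + c.replace("'", "'\\''") + "'\n" for c in clips)


def concat_command(list_file: str, out: str) -> list[str]:
    """Hypothetical helper: argv that joins the clips named in list_file."""
    # -c copy only works when all clips share codecs and parameters;
    # -safe 0 permits arbitrary paths inside the list file.
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", out]


cmd = mux_command("clip.mp4", "voice.wav", "muxed.mp4")
print(shlex.join(cmd))
# ffmpeg -y -i clip.mp4 -i voice.wav -map 0:v:0 -map 1:a:0 -c:v copy -c:a aac -shortest muxed.mp4
# To execute: subprocess.run(cmd, check=True)
```

Building an argument list rather than a shell string avoids quoting bugs when file names contain spaces or quotes.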

Bridges

Image → 3D model, document → text, URL → text — bring outside material into the canvas.

Backend

FFmpeg for media pipelines, scene detection for shot splitting, Modal for GPU workers. Models: Z-Image, FLUX.2, LTX-2, SeedVR2, Gemma 4, Qwen3, ACE-Step.

FAQ

Frequently asked

Straight answers about what TongFlow is — and what it isn't.

Is this really open source?

Yes. The full source is on GitHub at tong-io/tongflow under AGPL-3.0. You can read, modify, and self-host it.

Do I need a GPU?

Not locally. Heavy inference runs on Modal — their free tier includes H100 time. You bring your own Modal tokens and LLM API keys; TongFlow itself runs fine on a laptop.

How is this different from using AI tools separately?

Each AI model is wrapped as a modality-transform node on the canvas. You arrange nodes — add inputs, transform between text/image/audio/video/3D, combine the results — instead of copy-pasting between five separate apps.

What's the difference between self-hosting and app.tongflow.com?

Self-hosting (one Docker command) keeps your API keys, workflows, and files on your own computer, with no account and no cloud sync. app.tongflow.com runs the same studio for you if you'd rather not run it yourself.

How do I install it?

Clone the repo, then start it with one Docker command: git clone https://github.com/tong-io/tongflow && cd tongflow && docker compose up. You'll need Docker, a Modal token (the free tier works), and at least one LLM API key (OpenRouter, Gemini, OpenAI, or DeepSeek, your choice). The README walks through the rest.

Can I extend it with my own models?

Yes. Model slots and handler routing are configured via the ABI (config/tongflow.abi.json) and a plugin scanner. See docs/feature-registry.md in the repo for how to register new capabilities.

What's the project's current stage?

Early days — v0.1.0. Contributions, bug reports, and model integrations are very welcome. Join the Discord or open an issue on GitHub.

Two ways to start

Run it yourself with a single Docker command, or try the hosted version at app.tongflow.com.