TongFlow Team

Your Multi-Modal AI Studio

Every model. One infinite canvas.

Text, image, video, audio, 3D: every AI model is a node you can wire, swap, and combine. Sign in, bring your own keys, and create. Every model. One canvas. Your keys. Your first 7 days are free with no card required, and adding a card starts a further 7-day free trial. The Modal and OpenRouter free tiers let you create before spending a cent on compute.

Start creating

See pricing

no credits · no markup
bring your own keys
7 days free, no card
open-source core · AGPL-3.0
text · image · video · audio · 3D

Add → Transform → Combine

One canvas. Every modality.

No parameter panels. No manual wiring. Three operations carry every workflow — and the models are yours to choose.

Add

Drop anything on the canvas: text, images, audio, video, documents, URLs, 3D models. Everything becomes a node.

Transform

Every model is a modality transform. Text→image, image→video, audio→text — swap the model without rewiring the flow.

Combine

Lip sync, image fusion, character swap, motion transfer. Wire many inputs into one output — all built in.
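To make the three operations concrete, here is a minimal sketch of the idea in Python. It is not TongFlow's code: the `Asset`, `Transform`, and `combine` names and the two stand-in "models" are hypothetical, chosen only to show how a transform keyed on modalities lets you swap the model while the wiring stays the same.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Asset:
    """Hypothetical canvas node: a piece of material tagged with its modality."""
    modality: str  # e.g. "text", "image", "video", "audio"
    payload: str   # stand-in for the real media data


@dataclass
class Transform:
    """Hypothetical transform node: fixed source/target modality, swappable model."""
    source: str
    target: str
    model: Callable[[str], str]

    def run(self, asset: Asset) -> Asset:
        if asset.modality != self.source:
            raise ValueError(f"expected {self.source}, got {asset.modality}")
        return Asset(self.target, self.model(asset.payload))


def combine(target: str, inputs: List[Asset], merge: Callable[[List[str]], str]) -> Asset:
    """Hypothetical combine node: many inputs, one output."""
    return Asset(target, merge([a.payload for a in inputs]))


# Two made-up stand-ins for real text-to-image models.
def model_a(prompt: str) -> str:
    return f"image<{prompt}> via model-a"


def model_b(prompt: str) -> str:
    return f"image<{prompt}> via model-b"


prompt = Asset("text", "a red kite")             # Add
to_image = Transform("text", "image", model_a)   # Transform
first = to_image.run(prompt)

to_image.model = model_b                         # swap the model, wiring unchanged
second = to_image.run(prompt)

fused = combine("image", [first, second], " + ".join)  # Combine
print(fused.payload)
```

Because the transform only checks modalities, replacing `model_a` with `model_b` needs no change to the nodes before or after it.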

Keys that stay yours

Plug in your OpenAI, Gemini, OpenRouter, or Hugging Face keys. Model usage is billed by your providers at cost — we never mark it up.

Your own GPU lane

Heavy models deploy into your own Modal account and run there. Modal's free tier covers real GPU time every month.

Private by architecture

Your canvases live in a database that belongs to you alone; your files in your own private storage. Keys are encrypted at rest. Isolation is the default, not a setting.

What's shipped today

Pulled directly from the README. If a row is here, it works today.

Add: 11 input types

Text, image, photo, sketch, audio file, audio recording, video file, video recording, document, URL, 3D model — drop any material onto the canvas.

Transform: Image

Image generation, image editing (inpaint/redraw), image understanding (captions/Q&A), image upscaling.

Transform: Video

Text-to-video, image-to-video, first/last-frame extraction, video understanding, video upscaling.

Transform: Audio

Music generation, speech synthesis (preset / voice clone / instruction), speech recognition.

Transform: Text

Generate or rewrite copy from a prompt — routed through OpenRouter, Gemini, OpenAI, or DeepSeek depending on the node's model slot.

Combine

Image fusion (multi-reference blending), lip sync (audio+video / audio+image / audio+text → video), voice cloning, character swap, motion transfer, text merging.

Helpers

Concatenate clips, mux audio+video, split by shots, demux, extract audio track, split long text, merge text blocks, filter clips, batch arrange groups.
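The page names FFmpeg as the media backend but does not show how the helpers call it. As a rough illustration only, the sketch below builds command lines for two of the helpers (mux audio+video, extract audio track) using standard FFmpeg options. The function names and file names are hypothetical, and the commands are printed rather than executed.

```python
import shlex
from typing import List


def mux_args(video: str, audio: str, output: str) -> List[str]:
    """Build an ffmpeg command that pairs one video stream with one audio stream."""
    return [
        "ffmpeg", "-i", video, "-i", audio,
        "-map", "0:v:0",   # first video stream of the first input
        "-map", "1:a:0",   # first audio stream of the second input
        "-c:v", "copy",    # keep the video as-is, no re-encode
        "-c:a", "aac",     # encode the audio to AAC
        "-shortest",       # stop at the end of the shorter input
        output,
    ]


def extract_audio_args(video: str, output: str) -> List[str]:
    """Build an ffmpeg command that drops the video and keeps the audio."""
    return ["ffmpeg", "-i", video, "-vn", output]


# Hypothetical file names, for illustration only.
print(shlex.join(mux_args("clip.mp4", "voice.wav", "muxed.mp4")))
print(shlex.join(extract_audio_args("clip.mp4", "track.wav")))
```

Returning argument lists instead of shell strings keeps file names with spaces safe if the commands are later passed to `subprocess.run`.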

Bridges

Document → text, URL → text — bring outside material into the canvas.

Backend & Models

FFmpeg for media pipelines, Modal for GPU workers. Models shipping today: Z-Image, FLUX.2 Klein 9B, LTX-2, SeedVR2, InfiniteTalk, Wan-Animate, ACE-Step, Qwen3, Whisper, Gemini, OpenAI, OpenRouter.

$2.99 a month. That's the price list.

No credits. No markup. The studio is one flat subscription — compute runs on your own keys at cost, and Modal's free GPU quota plus OpenRouter's free routes mean you can start without paying providers at all. Self-hosting the open-source core is free forever.

See pricing →

FAQ

Straight answers

What TongFlow is. What it isn't.

What is TongFlow Cloud?

A hosted studio at app.tongflow.com: an infinite canvas where every AI model — text, image, video, audio, speech, music, 3D — is a node. Plugins are managed for you; you sign in and create. Every account starts with 7 free days (no card), plus a 7-day free trial when you add a card.

What does "bring your own keys" mean?

You add your provider keys once in Settings. API models (OpenAI, Gemini, OpenRouter…) bill your provider account directly; GPU models deploy into your own Modal account and run there. TongFlow never resells compute and never marks it up — the subscription is the whole price.

Do I need a GPU?

No. Heavy inference runs on Modal under your own token — their free tier includes real GPU time every month. Your laptop just runs a browser.

What do I actually pay?

$2.99/month for the studio after your free week and 7-day trial (cancel anytime), plus whatever your providers charge for what you generate — at their prices, on your accounts. Compute often starts at $0: Modal's free tier includes real GPU time every month, and OpenRouter has free text routes. Self-hosting the open-source core is free.
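The arithmetic behind that answer is simple enough to sketch. The function below is illustrative, not a billing API: the name `monthly_total` and the provider amounts in the examples are made up, and only the $2.99 flat fee comes from this page.

```python
from typing import Dict

STUDIO_MONTHLY_USD = 2.99  # flat studio subscription stated on this page


def monthly_total(provider_charges: Dict[str, float], on_trial: bool = False) -> float:
    """Studio fee (zero during the free period) plus whatever providers bill directly."""
    studio = 0.0 if on_trial else STUDIO_MONTHLY_USD
    return round(studio + sum(provider_charges.values()), 2)


# Hypothetical month spent entirely on free tiers: only the subscription is due.
print(monthly_total({"Modal": 0.0, "OpenRouter": 0.0}))

# Hypothetical month with some paid API usage, billed by the provider at its own price.
print(monthly_total({"OpenAI": 4.10, "Modal": 0.0}))
```

The provider charges never pass through TongFlow in this model; they are added here only to show the total a user would see across their own accounts.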

Is my work private?

Each user gets a structurally isolated database and a private storage prefix — not rows in a shared table. Your keys are encrypted at rest (AES-256-GCM) and are only ever used to call your providers. Your Modal token is used once to set up your executor and never stored by our deployer.

Is this open source?

The core is AGPL-3.0 at github.com/tong-io/tongflow. The cloud runs the same canvas. If you ever want to leave, self-host it — your data and workflows come with you.

How is this different from credit-based tools?

Credit systems resell compute with a margin, so every generation has a hidden exchange rate. TongFlow doesn't sell compute at all: you pay providers directly and pay us a flat $2.99 for the studio. The bill is legible.

How is this different from ComfyUI or n8n?

ComfyUI is built for image generation, n8n for API orchestration. TongFlow treats all seven modalities as first-class, and the combine nodes — lip sync, image fusion, motion transfer — are built in, not third-party extensions.

Ready when you are.

Sign in and your first canvas is waiting — plugins managed, keys yours, results yours.

Cloud

Sign in and create. 7 days free, then $2.99/month — cancel anytime.

Start creating →

Desktop

The cloud studio in a 10 MB native shell, for macOS & Windows.

Download →

Self-Host

The open-source core, free forever, on your machines.

View on GitHub →

Prefer a window of its own? The desktop app is the cloud studio in a 10 MB native shell.

macOS (Universal) · Windows (x64) · All versions · Latest: v0.2.0

First-time open

macOS: if you see “Apple could not verify…”, open System Settings → Privacy & Security and click “Open Anyway”. (This open-source cloud shell isn’t Apple-notarized yet.)

Windows: if SmartScreen warns, click “More info → Run anyway”.
