AI capabilities

TongFlow’s transforms run on a small, named set of backend models — not on a vague “thousands of models” promise. Here’s the actual list, where each model is used, and how to configure access.

Backend models (run on Modal)

These models execute inside Modal worker containers. Set MODAL_TOKEN_ID and MODAL_TOKEN_SECRET, and the runtime calls the models through Modal:

Model	Used for	Nodes
Z-Image	Text → image	`image-gen-text`, `image-gen`
FLUX.2 Klein 9B	Multi-reference fusion, image editing	`image-fusion`, `image-edit`
LTX-2	Text/image → video, talking-head	`text-gen-video`, `image-gen-video`, `image-image-gen-video`, `audio-image-gen-video`
SeedVR2	Image and video super-resolution	`image-upscale`, `video-upscale`
Gemma 4	Multimodal text understanding (image / video)	`image-describe`, `video-describe`, `video-gen-text`
Qwen3	Speech recognition and text-to-speech	`transcribe`, `transcribe-timestamp`, `text-gen-speech-preset`, `text-gen-speech-clone`, `text-gen-speech-instruct`, `convert_voice`
ACE-Step	Text → music	`gen-music`

For animation / character swap / motion transfer (wan-animate-mix, video-image-move-animal, video-image-gen-video-mix), TongFlow uses WAN-Animate variants — see the ABI for exact slot wiring.
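
As a quick reference, the table above can be written as a lookup from node name to backing model. This is only an illustrative sketch: `MODEL_NODES` and `model_for_node` are hypothetical names, not part of TongFlow, and the WAN-Animate nodes are left out because their exact variants are defined in the ABI.

```python
from typing import Optional

# Transcribed from the table above. Hypothetical helper, not a TongFlow API.
MODEL_NODES = {
    "Z-Image": ["image-gen-text", "image-gen"],
    "FLUX.2 Klein 9B": ["image-fusion", "image-edit"],
    "LTX-2": [
        "text-gen-video",
        "image-gen-video",
        "image-image-gen-video",
        "audio-image-gen-video",
    ],
    "SeedVR2": ["image-upscale", "video-upscale"],
    "Gemma 4": ["image-describe", "video-describe", "video-gen-text"],
    "Qwen3": [
        "transcribe",
        "transcribe-timestamp",
        "text-gen-speech-preset",
        "text-gen-speech-clone",
        "text-gen-speech-instruct",
        "convert_voice",
    ],
    "ACE-Step": ["gen-music"],
}


def model_for_node(node: str) -> Optional[str]:
    """Return the model listed for a node, or None if the node is not in the table."""
    for model, nodes in MODEL_NODES.items():
        if node in nodes:
            return model
    return None
```

For example, `model_for_node("video-upscale")` resolves to the SeedVR2 row, while a media-plumbing node such as `extract-audio` returns `None` because no model is listed for it.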

Local media plumbing

Some operations don’t need a model — just media tooling. These run on the Modal worker but are local in the sense that no learned model is invoked:

FFmpeg — transcoding, muxing, demuxing, frame extraction (merge-video-audio, separate-video-audio, extract-audio, get-first-frame, get-last-frame)
Scene detection — shot boundary detection for split-video
Subtitle / watermark removal — handled by dedicated workers (subtitle_remove, remove_watermark)

LLM providers (for text generation and routing)

Text generation (gen-text, combine-text grouping) routes through one of four LLM providers. You choose which by setting environment variables:

Provider	Env var	Notes
OpenRouter	`OPENROUTER_API_KEY`	Default for `gen-text`. Has a free routing tier. Optional `OPENROUTER_FREE_MODEL` to pin a specific route.
Google Gemini	`GEMINI_API_KEY` or `GOOGLE_API_KEY`	Used when the node’s model slot is set to a Gemini variant. Also powers some multimodal handlers.
OpenAI	`OPENAI_API_KEY`	Used when the node’s model slot is OpenAI. Default chat model is `gpt-4o-mini` (override with `OPENAI_CHAT_MODEL`).
DeepSeek	`DEEPSEEK_API_KEY`	Only used by a few specific code paths (e.g. batch text grouping). Not in the main `gen-text` dropdown.

You only need to configure the providers you actually plan to use. Set at least one — the studio refuses to run text-generation transforms without any LLM key configured.
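
The "at least one key" rule can be sketched as a small startup check. The env var names come from the table above; the function names, the provider labels, and the error message are assumptions made for illustration and may not match how the studio actually performs this check.

```python
from typing import List, Mapping

# Env var names are from the provider table; everything else is hypothetical.
PROVIDER_ENV_VARS = {
    "openrouter": ("OPENROUTER_API_KEY",),
    "gemini": ("GEMINI_API_KEY", "GOOGLE_API_KEY"),
    "openai": ("OPENAI_API_KEY",),
    "deepseek": ("DEEPSEEK_API_KEY",),
}


def configured_providers(env: Mapping[str, str]) -> List[str]:
    """List providers that have at least one non-empty key in env."""
    return [
        name
        for name, keys in PROVIDER_ENV_VARS.items()
        if any(env.get(key, "").strip() for key in keys)
    ]


def require_llm_provider(env: Mapping[str, str]) -> List[str]:
    """Raise if no LLM key is set, mirroring the rule described above."""
    providers = configured_providers(env)
    if not providers:
        raise RuntimeError("No LLM provider key configured; set at least one.")
    return providers
```

Passing `os.environ` as `env` checks the current process; passing a plain dict makes the check easy to exercise in tests.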

How a transform call happens

For an image-gen-text (“text → image with Z-Image”) node:

The canvas hands the workflow exporter the node’s input (the upstream text node’s output).
The exporter calls the Next.js task API: POST /api/task/create with {feature: "image-gen-text", pluginId, prompt: {text}, nodeId}.
The server enqueues a Modal call to the Z-Image worker, passing the input prompt.
The worker generates the image and returns it as base64; the server post-processes it into a file stored under data/uploads/ and records a file reference (file_key).
The image node on the canvas updates with the result.

The same call pattern works for any transform — only the slot name and input shape change.
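
The request in step two can be sketched with the standard library. The endpoint path and body fields are taken from the steps above; the base URL, the plugin ID, the node ID, and the helper name `build_task_request` are placeholders, and the response format is not documented here, so the sketch stops at building the request.

```python
import json
import urllib.request


def build_task_request(base_url, feature, plugin_id, text, node_id):
    """Build (but do not send) a POST to /api/task/create."""
    payload = {
        "feature": feature,
        "pluginId": plugin_id,
        "prompt": {"text": text},
        "nodeId": node_id,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/task/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values: adjust to your own deployment and canvas.
req = build_task_request(
    "http://localhost:3000",
    feature="image-gen-text",
    plugin_id="example-plugin",
    text="a red fox in fresh snow",
    node_id="node-1",
)
```

Sending it is a matter of `urllib.request.urlopen(req)` against a running studio. For another transform, the `feature` value and the shape of `prompt` would change accordingly.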

Modal cost notes

Modal offers $30 / month free credit. For most TongFlow workflows that’s a generous quota.
Image generation and TTS are cheap. Video generation (especially long clips) and 4K upscale are expensive — you’ll see the dent in your Modal usage dashboard.
Set spending limits in Modal’s settings if you want a hard ceiling.

LLM cost notes

OpenRouter free tier handles light text-generation use.
For heavier text use, paid OpenRouter routes, Gemini, or OpenAI's mini-tier models (such as the default gpt-4o-mini) typically cost pennies per call.
DeepSeek is cheap but not the default; only enable it if you specifically need it.

Extending the model list

If you want to wire in a new model (your own LoRA, a different upscaler, an open-source TTS), see docs/feature-registry.md in the tongflow repo. The flow:

Define a new slot in config/tongflow.abi.json with typed inputs and outputs.
Run pnpm gen:abi to regenerate the TypeScript types.
Implement the slot as a plugin under plugins/ using the Python SDK (@node_slot decorator + Pydantic models).
Run pnpm tongflow:publish to push a new SDK version, then deploy the plugin to Modal.

Related

Getting started: env var setup for Modal + LLM
Node types: which node uses which model
