AI capabilities

TongFlow’s transforms run on a small, named set of backend models — not on a vague “thousands of models” promise. Here’s the actual list, where each model is used, and how to configure access.

Backend models (run on Modal)

These models execute inside Modal worker containers. Set MODAL_TOKEN_ID and MODAL_TOKEN_SECRET in the environment, and the runtime calls the models through Modal:

| Model | Used for | Nodes |
| --- | --- | --- |
| Z-Image | Text → image | image-gen-text, image-gen |
| FLUX.2 Klein 9B | Multi-reference fusion, image editing | image-fusion, image-edit |
| LTX-2 | Text/image → video, talking-head | text-gen-video, image-gen-video, image-image-gen-video, audio-image-gen-video |
| SeedVR2 | Image and video super-resolution | image-upscale, video-upscale |
| Gemma 4 | Multimodal text understanding (image / video) | image-describe, video-describe, video-gen-text |
| Qwen3 | Speech recognition and text-to-speech | transcribe, transcribe-timestamp, text-gen-speech-preset, text-gen-speech-clone, text-gen-speech-instruct, convert_voice |
| ACE-Step | Text → music | gen-music |
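
The two Modal credentials named above can be checked before any transform is attempted. The following is a minimal sketch, not TongFlow's own startup code: the helper name `missing_modal_credentials` is hypothetical, and only the two environment variable names come from this page.

```python
import os

# The two variable names come from this page; everything else is illustrative.
MODAL_ENV_VARS = ("MODAL_TOKEN_ID", "MODAL_TOKEN_SECRET")


def missing_modal_credentials(env=None):
    """Return the Modal credential variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in MODAL_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_modal_credentials()
    if missing:
        print("Missing Modal credentials:", ", ".join(missing))
    else:
        print("Modal credentials are set.")
```

Passing a plain dictionary instead of relying on `os.environ` keeps the check easy to exercise in isolation.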

For animation, character swap, and motion transfer (wan-animate-mix, video-image-move-animal, video-image-gen-video-mix), TongFlow uses WAN-Animate variants; see the ABI (config/tongflow.abi.json) for exact slot wiring.

Local media plumbing

Some operations don’t need a model — just media tooling. These run on the Modal worker but are local in the sense that no learned model is invoked:

  • FFmpeg — transcoding, muxing, demuxing, frame extraction (merge-video-audio, separate-video-audio, extract-audio, get-first-frame, get-last-frame)
  • Scene detection — shot boundary detection for split-video
  • Subtitle / watermark removal — handled by dedicated workers (subtitle_remove, remove_watermark)

LLM providers (for text generation and routing)

Text generation (gen-text, combine-text grouping) routes through one of four LLM providers. You choose which by setting environment variables:

| Provider | Env var | Notes |
| --- | --- | --- |
| OpenRouter | OPENROUTER_API_KEY | Default for gen-text. Has a free routing tier. Optional OPENROUTER_FREE_MODEL to pin a specific route. |
| Google Gemini | GEMINI_API_KEY or GOOGLE_API_KEY | Used when the node’s model slot is set to a Gemini variant. Also powers some multimodal handlers. |
| OpenAI | OPENAI_API_KEY | Used when the node’s model slot is OpenAI. Default chat model is gpt-4o-mini (override with OPENAI_CHAT_MODEL). |
| DeepSeek | DEEPSEEK_API_KEY | Only used by a few specific code paths (e.g. batch text grouping). Not in the main gen-text dropdown. |

You only need to configure the providers you actually plan to use. Set at least one — the studio refuses to run text-generation transforms without any LLM key configured.
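
To see which providers a given environment would enable, the key names from the provider list can be checked directly. This is a sketch under stated assumptions: the function names and the error message are hypothetical, and TongFlow's actual check may differ. Only the environment variable names are taken from this page.

```python
import os

# Provider -> accepted key variables, as listed on this page.
PROVIDER_KEYS = {
    "OpenRouter": ("OPENROUTER_API_KEY",),
    "Google Gemini": ("GEMINI_API_KEY", "GOOGLE_API_KEY"),
    "OpenAI": ("OPENAI_API_KEY",),
    "DeepSeek": ("DEEPSEEK_API_KEY",),
}


def configured_providers(env=None):
    """Return the providers that have at least one non-empty key set."""
    env = os.environ if env is None else env
    return [
        provider
        for provider, names in PROVIDER_KEYS.items()
        if any(env.get(name) for name in names)
    ]


def require_llm_provider(env=None):
    """Raise if no LLM key is configured (illustrative, not the studio's code)."""
    providers = configured_providers(env)
    if not providers:
        raise RuntimeError("No LLM provider key configured; set at least one.")
    return providers
```

For example, `configured_providers({"GOOGLE_API_KEY": "..."})` reports only Google Gemini, since either of its two variable names is accepted.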

How a transform call happens

For an image-gen-text (“text → image with Z-Image”) node:

  1. The canvas hands the workflow exporter the node’s input (the upstream text node’s output).
  2. The exporter calls the Next.js task API: POST /api/task/create with {feature: "image-gen-text", pluginId, prompt: {text}, nodeId}.
  3. The server enqueues a Modal call to the Z-Image worker, passing the input prompt.
  4. The worker generates the image and returns it as base64; the server post-processes the result into a stored file reference (file_key) under data/uploads/.
  5. The image node on the canvas updates with the result.

The same call pattern works for any transform — only the slot name and input shape change.
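
Step 2 of the flow above can be sketched as a plain HTTP request. The endpoint path and the field names (feature, pluginId, prompt.text, nodeId) come from this page; the base URL, the plugin and node identifiers, and the JSON content type are assumptions for illustration. The response shape is not documented here, so the sketch only builds the request.

```python
import json
import urllib.request


def build_task_request(base_url, feature, plugin_id, text, node_id):
    """Build (but do not send) a POST /api/task/create request."""
    payload = {
        "feature": feature,
        "pluginId": plugin_id,
        "prompt": {"text": text},
        "nodeId": node_id,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/task/create",
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the task API accepts a JSON body.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values: a local dev server, a made-up plugin id and node id.
request = build_task_request(
    "http://localhost:3000",
    "image-gen-text",
    "example-plugin",
    "a red fox in snow",
    "node-1",
)
# urllib.request.urlopen(request) would send it to a running server.
```

Swapping the `feature` value and the contents of `prompt` is all that changes for another transform, which matches the call pattern described above.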

Modal cost notes

  • Modal offers $30 / month in free credit. For most TongFlow workflows that’s a generous quota.
  • Image generation and TTS are cheap. Video generation (especially long clips) and 4K upscale are expensive — you’ll see the dent in your Modal usage dashboard.
  • Set spending limits in Modal’s settings if you want a hard ceiling.

LLM cost notes

  • OpenRouter free tier handles light text-generation use.
  • For heavier text use, paid OpenRouter routes, Gemini, or OpenAI’s mini models (gpt-4o-mini by default) typically cost pennies per call.
  • DeepSeek is cheap but not the default; only enable it if you specifically need it.

Extending the model list

If you want to wire in a new model (your own LoRA, a different upscaler, an open-source TTS), see docs/feature-registry.md in the tongflow repo. The flow:

  1. Define a new slot in config/tongflow.abi.json with typed inputs and outputs.
  2. Run pnpm gen:abi to regenerate the TS types.
  3. Implement the slot as a plugin under plugins/ using the Python SDK (@node_slot decorator + Pydantic models).
  4. Run pnpm tongflow:publish to push a new SDK version, then deploy the plugin to Modal.
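
The shape of step 3 can be illustrated with a decorator that registers a typed function under a slot name. This is only a stand-in: the real SDK's `@node_slot` signature is not shown on this page, so the decorator, the `SLOTS` registry, the `my-upscale` slot, and both models below are hypothetical. Standard-library dataclasses are used in place of Pydantic models to keep the sketch self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Stand-in registry; the real SDK's internals are not documented here.
SLOTS: Dict[str, Callable] = {}


def node_slot(name: str):
    """Hypothetical stand-in for the SDK's @node_slot decorator."""
    def register(fn: Callable) -> Callable:
        SLOTS[name] = fn
        return fn
    return register


@dataclass
class UpscaleInput:
    file_key: str   # reference to the stored input file
    scale: int = 2  # illustrative parameter


@dataclass
class UpscaleOutput:
    file_key: str   # reference to the stored result


@node_slot("my-upscale")
def my_upscale(inp: UpscaleInput) -> UpscaleOutput:
    # Placeholder logic: a real plugin would run the model here.
    return UpscaleOutput(file_key=f"{inp.file_key}@x{inp.scale}")
```

Keeping inputs and outputs as typed models mirrors the typed slot definition in step 1, so a mismatch between the ABI and the plugin is easier to spot.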