Node types
TongFlow’s nodes fall into six groups. The Add and Modality nodes hold materials on the canvas; everything else operates on them.
The authoritative list lives in config/tongflow.abi.json in the tongflow repo — this page reflects v0.1.0.
Add (7 nodes)
Add nodes drop a new material onto the canvas. You pick them from the Smart Island Add toolbar in Create mode:
| Node | Icon | What it does |
|---|---|---|
addTextNode | Type | Type text directly into the node body |
addImageNode | Image | Upload a file, capture from camera, or draw a sketch — outputs one image |
addAudioNode | Music | Upload an audio file or record from the mic |
addVideoNode | Video | Upload a video file or record from the camera |
addFileNode | FileText | Upload a document (PDF / DOCX / TXT / MD) |
addLinkNode | Link | Paste a URL — fetches the page content into text |
addModelNode | Box | Upload a 3D model file (GLB / GLTF) |
There are seven Add types, not eleven. Earlier docs counted “Add image” and “Record image (from camera)” as separate nodes; they’re modes inside the same addImageNode.
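One way to picture the consolidated Add set is as a small registry in which each Add type carries its input modes. In that view, camera capture and sketching are modes of addImageNode rather than separate nodes. This is a hypothetical sketch:

- The `AddNode` dataclass, the modality labels and `accepts_upload` are illustrative names, not TongFlow's ABI schema.
- The extension lists come from the table (PDF / DOCX / TXT / MD and GLB / GLTF).
- For the other nodes the docs list no formats, so none are enforced.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class AddNode:
    name: str
    output: str  # modality label placed on the canvas (labels are illustrative)
    modes: tuple[str, ...] = ("upload",)  # ways the user can supply the material
    extensions: frozenset[str] = frozenset()  # empty = the docs list no formats


# Hypothetical registry mirroring the seven Add types in the table.
ADD_NODES = {
    node.name: node
    for node in (
        AddNode("addTextNode", "text", modes=("type",)),
        AddNode("addImageNode", "image", modes=("upload", "camera", "sketch")),
        AddNode("addAudioNode", "audio", modes=("upload", "record")),
        AddNode("addVideoNode", "video", modes=("upload", "record")),
        AddNode("addFileNode", "document", extensions=frozenset({".pdf", ".docx", ".txt", ".md"})),
        AddNode("addLinkNode", "text", modes=("paste-url",)),
        AddNode("addModelNode", "model3d", extensions=frozenset({".glb", ".gltf"})),
    )
}


def accepts_upload(node_name: str, filename: str) -> bool:
    """Return True if the named Add node would take this file as an upload."""
    node = ADD_NODES[node_name]
    if "upload" not in node.modes:
        return False  # e.g. addTextNode is typed, addLinkNode takes a URL
    if not node.extensions:
        return True  # no format restriction documented for this node
    return PurePosixPath(filename).suffix.lower() in node.extensions
```

Modelling camera and sketch as values in `modes` keeps the node count at seven while still letting the UI offer three ways to produce one image.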
Transform
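Each transform takes one or more typed inputs and produces one output, usually in a different modality. Depending on the slot, it runs on a backend model, an external LLM, or locally, as the Backend column in each table shows.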
Text transforms
| Node slot | Description | Backend |
|---|---|---|
gen-text | Generate or rewrite text from a prompt | OpenRouter / Gemini / OpenAI / DeepSeek (configurable) |
combine-text | Merge multiple text inputs into one | Local |
split-text | Split a long passage into chunks | Local |
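split-text and combine-text run locally, but this page doesn't say how split-text picks its boundaries. The following is a minimal sketch under one assumption: paragraphs are packed greedily up to a character budget, and any single paragraph longer than the budget is hard-split. The function names and the `max_chars` parameter are hypothetical.

```python
def split_text(text: str, max_chars: int = 1000) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into chunks of <= max_chars."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        # Hard-split oversized paragraphs so no chunk exceeds the budget.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


def combine_text(parts: list[str], sep: str = "\n\n") -> str:
    """Join non-empty text segments back into one passage."""
    return sep.join(p for p in parts if p.strip())
```

For paragraphs that already fit the budget, `combine_text(split_text(s))` restores `s` when `s` uses single blank lines between paragraphs.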
Image transforms
| Node slot | Description | Backend model |
|---|---|---|
image-gen-text | Text → image | Z-Image |
image-gen | Image → edited image (full-frame) | Z-Image |
image-gen-model | Model-conditioned image generation | Configurable |
image-edit | Inpaint / instruction-driven edit | FLUX.2 Klein 9B |
image-fusion | Multi-reference blend | FLUX.2 Klein 9B |
image-describe | Image → text (caption / Q&A) | Gemma 4 (multimodal) |
image-upscale | Upscale image | SeedVR2 |
Video transforms
| Node slot | Description | Backend |
|---|---|---|
gen-video, text-gen-video | Text → video | LTX-2 |
image-gen-video | Image → video | LTX-2 |
image-image-gen-video | First + last frame → video (interpolation) | LTX-2 |
video-image-gen-video-mix, wan-animate-mix | Image + video → video with character swap / scene mix | WAN Animate |
video-image-gen-video-move, video-image-move-animal | Motion transfer (subject from one input, motion from the other) | WAN Animate (move variant)
audio-image-gen-video | Audio + image → talking-head / animated portrait | LTX-2 / WAN |
video-describe, video-gen-text | Video → text (summary / caption) | Gemma 4 |
video-upscale | Upscale video | SeedVR2 |
get-first-frame, get-last-frame | Extract a single frame | Local (FFmpeg) |
subtitle_remove | Remove burned-in subtitles | Backend |
remove_watermark | Remove watermark | Backend |
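The table names FFmpeg as the local backend for get-first-frame and get-last-frame, but not the exact flags TongFlow passes. The following is a hedged sketch that builds the two commands from standard FFmpeg options. The helper names are hypothetical, and the commands are returned as argument lists so they can be inspected before running.

```python
import subprocess


def first_frame_cmd(src: str, dst: str) -> list[str]:
    # -frames:v 1 stops output after the first decoded video frame.
    return ["ffmpeg", "-y", "-i", src, "-frames:v", "1", dst]


def last_frame_cmd(src: str, dst: str) -> list[str]:
    # -sseof -1 (an input option, so it precedes -i) starts reading one second
    # before the end of the file; -update 1 makes the image2 muxer overwrite
    # dst with every frame, so the file left on disk is the final frame.
    return ["ffmpeg", "-y", "-sseof", "-1", "-i", src, "-update", "1", dst]


def run(cmd: list[str]) -> None:
    # Raises CalledProcessError if ffmpeg exits non-zero.
    subprocess.run(cmd, check=True)
```

For example, `run(first_frame_cmd("clip.mp4", "first.png"))` writes the opening frame as a PNG, assuming `ffmpeg` is on `PATH`.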
Audio transforms
| Node slot | Description | Backend |
|---|---|---|
gen-music | Text → music | ACE-Step |
text-gen-speech-preset | TTS with a preset voice | Qwen3 |
text-gen-speech-clone | TTS with a reference voice (clone) | Qwen3 |
text-gen-speech-instruct | TTS with style instructions | Qwen3 |
text-audio-gen-speech | TTS using both text and a reference audio | Qwen3 |
transcribe, transcribe-timestamp | Audio / video → text (with optional timestamps) | Qwen3 |
denoise_audio | Noise reduction | Backend |
separate_speaker | Speaker diarization | Backend |
separate_audio_track, separate-video-audio | Demux audio from video | Local (FFmpeg) |
convert_voice | Voice / timbre replacement | Qwen3 |
Cross-modal bridges
| Node slot | Description |
|---|---|
parse-document | Document → text |
link | URL → text |
| Image → 3D (in pipeline) | Image → 3D model |
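The link slot (and addLinkNode) turns a URL into text. This page doesn't describe the extractor, so the sketch below makes an assumption: it uses a simple stdlib approach that fetches the page and keeps visible text while skipping `script`, `style` and `noscript` content. A production extractor would likely do more, such as boilerplate removal. All names here, including the User-Agent string, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class _TextExtractor(HTMLParser):
    SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; -> &
        self._skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text outside skipped elements.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def fetch_link_text(url: str, timeout: float = 10.0) -> str:
    req = Request(url, headers={"User-Agent": "tongflow-link-sketch/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return html_to_text(resp.read().decode(charset, errors="replace"))
```

Separating `html_to_text` from the network call keeps the parsing testable offline.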
Combine
Combine nodes take multiple inputs and produce one output.
| Node slot | Inputs | Output |
|---|---|---|
image-fusion | N images | One blended image |
speech-video-gen-video, lip-sync variants | Audio + video / audio + image / audio + text / audio + image + video | Lip-synced video
speech-image-video-gen-video | Speech + image + video | Composite video
speech-text-gen-video | Speech + text | Video
convert_voice (combine flavor) | Text + reference audio | Speech in the cloned voice
combine-text | N text nodes | One merged text
Helpers
| Node slot | Description |
|---|---|
concat-videos | Join multiple clips end-to-end |
merge-video-audio | Mux audio + video into one file |
split-video | Cut into clips at detected scene boundaries
separate-video-audio | Demux into separate tracks |
extract-audio | Pull audio track from a video |
split-text | Break long text into chunks |
combine-text | Merge text segments |
drop-video | Filter / drop clips by rule |
arrange-group | Group and arrange clips/text for batch downstream |
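Several of these helpers map naturally onto stock FFmpeg invocations, but this page doesn't document the exact commands TongFlow runs. The following is a hedged sketch of concat-videos, merge-video-audio and extract-audio using standard FFmpeg options; the function names are hypothetical.

- Concatenation uses FFmpeg's concat demuxer with stream copy. This only works when all clips share codecs and encoding parameters.
- Audio extraction stream-copies the first audio track. This assumes the output container can hold the source codec, for example AAC into `.m4a`.

```python
def concat_list(paths: list[str]) -> str:
    # Concat-demuxer list file: one `file '...'` line per clip. A literal '
    # in a path is written as '\'' (close quote, escaped quote, reopen).
    lines = []
    for p in paths:
        escaped = p.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_cmd(list_file: str, dst: str) -> list[str]:
    # -safe 0 permits absolute paths in the list file; -c copy avoids re-encoding.
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", list_file, "-c", "copy", dst]


def merge_cmd(video: str, audio: str, dst: str) -> list[str]:
    # Take video from input 0 and audio from input 1; stop at the shorter one.
    return ["ffmpeg", "-y", "-i", video, "-i", audio,
            "-map", "0:v:0", "-map", "1:a:0",
            "-c:v", "copy", "-c:a", "aac", "-shortest", dst]


def extract_audio_cmd(src: str, dst: str) -> list[str]:
    # Select only the first audio stream and copy it without re-encoding.
    return ["ffmpeg", "-y", "-i", src, "-map", "0:a:0", "-c:a", "copy", dst]
```

Returning argument lists rather than shell strings sidesteps shell quoting; only the concat list file needs FFmpeg's own escaping.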
How types are checked
Connection validation is driven by the ABI. When you drag an output handle to an input handle, the system checks that the modality and shape match — if you try to feed a video into an input that wants text, the edge won’t connect. The generated TypeScript types in src/generated/abi/index.ts keep the canvas and the workflow exporter honest at compile time.
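The ABI schema itself isn't reproduced on this page. As a hedged illustration of what "modality and shape" checking means, here is a Python model in which each port declares a modality and whether it carries one value or a list. Its assumptions are:

- The `Modality`, `Port` and `Slot` names, the port names and the `many` flag are all hypothetical.
- Only a handful of slots from the tables are modelled.
- The real check runs in the TypeScript canvas against src/generated/abi/index.ts.

```python
from dataclasses import dataclass
from enum import Enum


class Modality(Enum):
    TEXT = "text"
    IMAGE = "image"
    AUDIO = "audio"
    VIDEO = "video"


@dataclass(frozen=True)
class Port:
    modality: Modality
    many: bool = False  # True = the port carries a list (e.g. N images)


@dataclass
class Slot:
    name: str
    inputs: dict[str, Port]
    output: Port


SLOTS = {s.name: s for s in [
    Slot("gen-text", {"prompt": Port(Modality.TEXT)}, Port(Modality.TEXT)),
    Slot("split-text", {"text": Port(Modality.TEXT)}, Port(Modality.TEXT, many=True)),
    Slot("combine-text", {"parts": Port(Modality.TEXT, many=True)}, Port(Modality.TEXT)),
    Slot("image-describe", {"image": Port(Modality.IMAGE)}, Port(Modality.TEXT)),
    Slot("image-fusion", {"images": Port(Modality.IMAGE, many=True)}, Port(Modality.IMAGE)),
    Slot("text-gen-video", {"prompt": Port(Modality.TEXT)}, Port(Modality.VIDEO)),
    Slot("audio-image-gen-video",
         {"audio": Port(Modality.AUDIO), "image": Port(Modality.IMAGE)},
         Port(Modality.VIDEO)),
]}


def can_connect(src: Slot, dst: Slot, port: str) -> bool:
    """Would an edge from src's output to dst's named input be accepted?"""
    target = dst.inputs.get(port)
    if target is None:
        return False  # no such input handle
    if target.modality is not src.output.modality:
        return False  # e.g. video into a text input
    # Shape rule: a single value can feed a list input,
    # but a list cannot feed a single-value input.
    return target.many or not src.output.many
```

For instance, dragging text-gen-video's output onto gen-text's `prompt` is rejected (video into text). Dragging split-text onto combine-text's `parts` is accepted.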
Adding your own node
If a transform you need isn’t listed, you can plug it in. See docs/feature-registry.md and docs/plugins.md in the tongflow repo. The flow is:
- Add the slot definition to config/tongflow.abi.json.
- Regenerate types: pnpm gen:abi.
- Implement the plugin under plugins/ with the @node_slot decorator and matching Pydantic models.
- Bump the Python SDK pin, publish, and redeploy Modal.
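For the plugin step, the real @node_slot decorator comes from TongFlow's Python SDK, whose import path and signature aren't shown on this page. The sketch below therefore makes these assumptions:

- It defines a hypothetical stand-in `node_slot(name)` that only records the function in a local registry.
- It uses dataclasses in place of the Pydantic models so it runs without dependencies.
- The slot name `text-uppercase` is invented for illustration and would need a matching entry in config/tongflow.abi.json.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical registry; the SDK's real registration mechanism is not documented here.
REGISTRY: dict[str, Callable] = {}


def node_slot(name: str) -> Callable[[Callable], Callable]:
    """Hypothetical stand-in for the SDK's @node_slot (signature assumed)."""
    def register(fn: Callable) -> Callable:
        if name in REGISTRY:
            raise ValueError(f"slot {name!r} already registered")
        REGISTRY[name] = fn
        return fn
    return register


# In a real plugin these would be Pydantic models whose fields
# mirror the slot's ports in the ABI.
@dataclass
class UppercaseInput:
    text: str


@dataclass
class UppercaseOutput:
    text: str


@node_slot("text-uppercase")  # hypothetical slot name; must match the ABI entry
def uppercase(inp: UppercaseInput) -> UppercaseOutput:
    return UppercaseOutput(text=inp.text.upper())
```

Rejecting duplicate names at registration time surfaces a slot collision when the plugin loads rather than when a workflow runs.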
Related
- Smart Island — how to surface these nodes from the dock
- Workflow studio — connecting nodes and running
- AI capabilities — the named backend models
