HumeAI TADA 3B ML (tada-3b-ml) is a multilingual text-to-speech model served on PolarGrid edge nodes via Triton’s Python backend. Unlike preset-voice models, TADA clones a speaker from a short reference clip and can carry that voice across languages: for example, it can synthesize French in a voice you only ever recorded speaking English.
  • HF repo: HumeAI/tada-3b-ml
  • Modality: Text-to-Speech (streaming)
  • Backend: Triton python (isolated TADA pod)
  • Streaming availability: Pre-GA validation — not yet announced for a production region.

Headline benchmark

TADA exposes a chunked-HTTP streaming transport for /v1/audio/speech.
| Metric | p50 | p95 |
| --- | --- | --- |
| TTFA (time to first audio byte) | 499 ms | 523 ms |
| Full-utterance latency | 1417 ms | 1886 ms |
| Real-time factor (RTF) | 0.25 | 0.46 |
Measured on streaming /v1/audio/speech requests (stream: true, pcm), 100 runs against a PolarGrid edge endpoint. TTFA is end-to-end wall-clock time from request sent to first audio byte, network included. RTF = synthesis wall-clock ÷ audio duration; values below 1.0 are faster than real time. Every run returned a positive streaming verdict: audio arrived in chunks during synthesis. Source: bench/tada-3b-ml/.
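
The RTF definition above can be made concrete with a little arithmetic. The sketch below assumes the output format documented under Capabilities (24 kHz, 16-bit, mono) to turn a PCM byte count into an audio duration; the function names and the example numbers are illustrative, not part of the benchmark harness.

```python
SAMPLE_RATE_HZ = 24_000   # documented output sample rate
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 1              # mono


def pcm_duration_seconds(num_bytes: int) -> float:
    """Playback duration of raw PCM in the documented output format."""
    return num_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = synthesis wall-clock / audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds


# Hypothetical run: 96,000 bytes of PCM (2 s of audio) synthesized in 0.5 s.
audio_s = pcm_duration_seconds(96_000)
rtf = real_time_factor(0.5, audio_s)
print(f"audio={audio_s:.2f}s rtf={rtf:.2f}")  # audio=2.00s rtf=0.25
```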

Quickstart

Edge endpoints accept your raw pg_* API key as a bearer token — no token exchange. See Authentication. Replace <region> with your edge region, or discover the nearest one via the autorouter.
curl -X POST https://api.<region>.edge.polargrid.ai/v1/audio/speech \
  -H "Authorization: Bearer $POLARGRID_API_KEY" \
  -H "Content-Type: application/json" \
  --no-buffer \
  -d '{
    "model": "tada-3b-ml",
    "input": "Hello from PolarGrid.",
    "voice": "default",
    "response_format": "pcm",
    "stream": true
  }' \
  --output speech.pcm
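
For callers who prefer Python over curl, the same request might look like the sketch below. It uses only the standard library; the helper names build_speech_request and stream_to_file are hypothetical, and the first-bytes timing is only an approximation of TTFA as measured from the client.

```python
import json
import time
import urllib.request


def build_speech_request(region: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build the POST request for /v1/audio/speech on an edge endpoint."""
    url = f"https://api.{region}.edge.polargrid.ai/v1/audio/speech"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # raw pg_* key, no token exchange
            "Content-Type": "application/json",
        },
        method="POST",
    )


def stream_to_file(request: urllib.request.Request, path: str, chunk_size: int = 4096):
    """Write the streamed body to `path`; return seconds until the first bytes arrived."""
    start = time.monotonic()
    first_audio = None
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        while True:
            # read1 returns whatever is available instead of waiting for a full buffer.
            chunk = response.read1(chunk_size)
            if not chunk:
                break
            if first_audio is None:
                first_audio = time.monotonic() - start
            out.write(chunk)
    return first_audio


# Usage (performs a network call, so it is left commented out):
# payload = {"model": "tada-3b-ml", "input": "Hello from PolarGrid.",
#            "voice": "default", "response_format": "pcm", "stream": True}
# request = build_speech_request("<region>", "<your pg_* key>", payload)
# print(stream_to_file(request, "speech.pcm"))
```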

Capabilities

| Field | Value |
| --- | --- |
| Endpoint | POST /v1/audio/speech |
| Audio output | 24 kHz, 16-bit, mono |
| Streaming | Yes: chunked HTTP, pcm and opus (stream: true); audio delivered incrementally in ~4-token windows during synthesis |
| Batch formats | pcm, wav, mp3 |
| Voice model | Cross-lingual voice cloning from a reference clip (no preset voice catalog) |
| Languages | English, French, German, Spanish, Italian, Portuguese, Polish, Japanese, Arabic, Chinese |
| speed control | Batch only; streaming requires speed = 1.0 |
| Max batch size | 1 |
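
Raw pcm output has no header, so most players cannot open it directly. A minimal sketch of wrapping it in a WAV container with the standard-library wave module follows; it uses the documented 24 kHz, 16-bit, mono format and additionally assumes the samples are signed little-endian, which the table above does not state.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 24_000,
                     sample_width: int = 2, channels: int = 1) -> bytes:
    """Wrap raw PCM in a WAV container (assumes signed little-endian samples)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)   # bytes per sample: 2 for 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()


# Usage: convert the file written by the Quickstart request.
# with open("speech.pcm", "rb") as src, open("speech.wav", "wb") as dst:
#     dst.write(pcm_to_wav_bytes(src.read()))
```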

Voices — cross-lingual cloning

TADA does not expose preset voice IDs. The voice parameter selects a reference speaker:
| voice value | Meaning |
| --- | --- |
| default | The bundled reference clip, a neutral English speaker. Use this when you just want speech and don’t care about the timbre. |
| A URL | A WAV file (24 kHz mono) fetched and used as the reference. Pair it with voice_transcript, the exact text spoken in the clip. |
| A base64 WAV | The reference clip inlined as a base64-encoded WAV string. Also pair with voice_transcript. |
voice_transcript is required whenever voice is a URL or base64 clip — TADA conditions on both the reference audio and its transcript. It is not needed for voice: "default". The cloned voice carries across languages: provide an English reference clip and set the language field (or write the input in the target language) to synthesize that speaker in French, Japanese, and so on.
curl -X POST https://api.<region>.edge.polargrid.ai/v1/audio/speech \
  -H "Authorization: Bearer $POLARGRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tada-3b-ml",
    "input": "Bonjour, ceci est une voix clonée.",
    "voice": "https://example.com/my-reference.wav",
    "voice_transcript": "This is the exact text spoken in the reference clip.",
    "language": "fr",
    "response_format": "wav"
  }' \
  --output cloned.wav
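
The pairing rule between voice and voice_transcript can be checked on the client before sending anything. The sketch below is a hypothetical helper, not part of any PolarGrid SDK; it also assumes that an inlined clip is sent as a bare base64 string rather than a data URI, which this page does not specify.

```python
import base64


def wav_bytes_to_voice(wav_bytes: bytes) -> str:
    """Encode a reference WAV as a base64 string (assumed: bare base64, no data-URI prefix)."""
    return base64.b64encode(wav_bytes).decode("ascii")


def build_clone_payload(text, voice="default", voice_transcript=None,
                        language=None, response_format="wav"):
    """Build a request body, enforcing that cloned voices carry a transcript."""
    payload = {
        "model": "tada-3b-ml",
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    if voice != "default":
        # URL and base64 references both need the exact text spoken in the clip.
        if not voice_transcript:
            raise ValueError("voice_transcript is required when voice is a URL or base64 clip")
        payload["voice_transcript"] = voice_transcript
    if language is not None:
        payload["language"] = language
    return payload


payload = build_clone_payload(
    "Bonjour, ceci est une voix clonée.",
    voice="https://example.com/my-reference.wav",
    voice_transcript="This is the exact text spoken in the reference clip.",
    language="fr",
)
```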

Streaming

Pass stream: true for chunked audio over a single HTTP response. Streaming formats are pcm (the default for raw HTTP callers) and opus; the PolarGrid SDKs default streaming requests to opus. Requesting wav or mp3 with stream: true returns 400 Bad Request.

Audio is delivered incrementally. TADA’s synthesis loop runs one step per text token, and the handler emits a chunk every couple of tokens as synthesis proceeds: each chunk covers a window of about 4 tokens, 2 of which overlap for crossfade context. The first audio therefore arrives well before the utterance finishes. Synthesis is also faster than real time: the headline benchmark measured an RTF of 0.25 at p50 and 0.46 at p95. The streaming_verdict field in bench/tada-3b-ml/ reports, per run, whether the edge delivered bytes incrementally.

TADA streaming does not honor the speed parameter, because speed control needs a full second synthesis pass, which is incompatible with per-chunk streaming. Pass speed: 1.0 (or omit it) for streaming requests; use batch mode if you need to change the rate.

See the Text-to-Speech API reference for the full streaming contract: response headers, truncated-stream detection, and the per-format table.
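
The streaming constraints described in this section can be mirrored in a small client-side check that catches a bad request before it reaches the edge. This is a sketch of the rules as stated on this page, with a hypothetical function name; the server remains the authority on what it accepts.

```python
STREAMING_FORMATS = {"pcm", "opus"}  # formats documented as streamable


def streaming_problems(payload: dict) -> list[str]:
    """Return human-readable problems with a streaming request; empty if none found."""
    problems = []
    if not payload.get("stream"):
        return problems  # batch requests are not subject to these rules

    # Raw HTTP callers default to pcm when response_format is omitted.
    fmt = payload.get("response_format", "pcm")
    if fmt not in STREAMING_FORMATS:
        problems.append(f"response_format {fmt!r} is batch-only; expect 400 Bad Request")

    # Streaming does not honor speed, so anything other than 1.0 is a mistake.
    if payload.get("speed", 1.0) != 1.0:
        problems.append("speed must be 1.0 (or omitted) when streaming; use batch mode")
    return problems


for issue in streaming_problems({"stream": True, "response_format": "wav", "speed": 1.2}):
    print(issue)
```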

Aliases

The following caller-facing aliases resolve to tada-3b-ml:
| Alias | Resolves to |
| --- | --- |
| humane-tada | tada-3b-ml |
| humane/tada-tts | tada-3b-ml |

Model identifier

Call this model with the canonical id tada-3b-ml (or an alias above) at /v1/audio/speech. The Hugging Face repo id HumeAI/tada-3b-ml is accepted at /v1/models/load for hot-loading but does not resolve at inference time.
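
A client that accepts model names from configuration may want to normalize them before calling /v1/audio/speech. The sketch below hard-codes the ids and aliases listed on this page in a hypothetical helper; the real resolution happens server-side and may cover more names.

```python
CANONICAL_ID = "tada-3b-ml"
HF_REPO_ID = "HumeAI/tada-3b-ml"  # accepted by /v1/models/load only
ALIASES = {
    "humane-tada": CANONICAL_ID,
    "humane/tada-tts": CANONICAL_ID,
}


def resolve_model_id(name: str) -> str:
    """Map a caller-facing name to the canonical inference id."""
    if name == CANONICAL_ID:
        return name
    if name in ALIASES:
        return ALIASES[name]
    if name == HF_REPO_ID:
        # The repo id is for hot-loading and does not resolve at inference time.
        raise ValueError(f"{name!r} is a repo id; use {CANONICAL_ID!r} for inference")
    raise ValueError(f"unknown model name: {name!r}")


print(resolve_model_id("humane/tada-tts"))  # tada-3b-ml
```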

Notes

  • TADA runs in its own Triton pod, isolated from the voice pod: hume-tada pins transformers < 5 and torch < 2.8, while the voice pod’s cohere-transcribe needs transformers >= 5.4. See backend/edge-production-setup/CLAUDE.md for the pod layout.
  • Streaming synthesis is per-token through a decoupled Triton transaction policy — the handler pushes PCM windows as they are decoded rather than buffering the full utterance.
  • For preset-voice English TTS (including British English) with a fixed catalog, use kokoro-82m instead. TADA is the choice when you need a specific cloned voice or a non-English language.
