Text-to-Speech

Generate audio from text with a catalog of Kokoro-82M voices. The endpoint returns raw PCM audio.
Edge endpoints require a JWT. See Authentication for how to obtain one.

Create Speech

POST /v1/audio/speech
Generate audio from input text.

Request Body

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| model | string | Yes | | TTS model: kokoro-82m |
| input | string | Yes | | Text to convert (max 4096 chars) |
| voice | string | Yes | | Voice to use (see below) |
| response_format | string | No | pcm | Accepted for forward compatibility; currently ignored (see Audio Format below) |
| speed | number | No | 1.0 | Speed multiplier (0.25-4.0) |
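
The constraints in the table above can be checked client-side before a request is sent. The following is a minimal sketch: the field names and limits come from the table, while `SpeechRequest` and `validateSpeechRequest` are hypothetical names and not part of any PolarGrid SDK. It also assumes the 4096-character limit can be approximated with JavaScript's string length, which may differ from how the server counts characters.

```typescript
// Hypothetical request shape; fields and limits mirror the parameter table.
interface SpeechRequest {
  model: "kokoro-82m";
  input: string;
  voice: string;
  response_format?: string; // accepted by the API but currently ignored
  speed?: number;           // the API default is 1.0
}

const MAX_INPUT_CHARS = 4096;
const MIN_SPEED = 0.25;
const MAX_SPEED = 4.0;

// Returns a list of problems; an empty list means the body passed these checks.
// Assumption: string length (UTF-16 code units) approximates the server's count.
function validateSpeechRequest(req: SpeechRequest): string[] {
  const errors: string[] = [];
  if (req.input.length > MAX_INPUT_CHARS) {
    errors.push(`input exceeds ${MAX_INPUT_CHARS} characters`);
  }
  if (req.voice.length === 0) {
    errors.push("voice is required");
  }
  const speed = req.speed ?? 1.0;
  if (!(speed >= MIN_SPEED && speed <= MAX_SPEED)) {
    errors.push(`speed must be between ${MIN_SPEED} and ${MAX_SPEED}`);
  }
  return errors;
}
```

Returning a list rather than throwing on the first problem lets a caller report every issue at once.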

Voices

The kokoro-82m model exposes eight voices through the PolarGrid SDKs:
| Voice ID | Accent / gender |
| --- | --- |
| af_bella | American English, female |
| af_sarah | American English, female |
| am_adam | American English, male |
| am_michael | American English, male |
| bf_emma | British English, female |
| bf_isabella | British English, female |
| bm_george | British English, male |
| bm_lewis | British English, male |
See the Voice AI guide for how these map to Kokoro-82M and a link to the full upstream voice list.
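
For client code that accepts a voice name from user input, the eight IDs listed above can be captured in a lookup table so that typos are caught before a request is made. This is an illustrative sketch: `VOICES`, `VoiceId`, and `isVoiceId` are hypothetical names, and the table would need updating if the voice catalog changes.

```typescript
// The eight voice IDs and their descriptions, copied from the table above.
const VOICES = {
  af_bella: { accent: "American English", gender: "female" },
  af_sarah: { accent: "American English", gender: "female" },
  am_adam: { accent: "American English", gender: "male" },
  am_michael: { accent: "American English", gender: "male" },
  bf_emma: { accent: "British English", gender: "female" },
  bf_isabella: { accent: "British English", gender: "female" },
  bm_george: { accent: "British English", gender: "male" },
  bm_lewis: { accent: "British English", gender: "male" },
} as const;

type VoiceId = keyof typeof VOICES;

// Type guard: narrows an arbitrary string to a known voice ID.
// hasOwnProperty avoids false positives for inherited keys such as "toString".
function isVoiceId(value: string): value is VoiceId {
  return Object.prototype.hasOwnProperty.call(VOICES, value);
}
```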

Audio Format

The endpoint returns raw 24 kHz, 16-bit, mono PCM audio. The response_format parameter is accepted for forward compatibility but is currently ignored: the response is always PCM, regardless of the value you pass. Backend encoder support for mp3, opus, aac, flac, and wav is planned but not yet implemented. To produce one of those formats today, transcode the PCM bytes client-side, for example:
ffmpeg -f s16le -ar 24000 -ac 1 -i speech.pcm speech.mp3
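
When only a WAV file is needed, an alternative to ffmpeg is to prepend a standard 44-byte RIFF/WAVE header to the PCM bytes. The sketch below assumes the samples are signed little-endian, as the `s16le` flag in the command above implies; `pcmToWav` is a hypothetical helper name, not an SDK function.

```typescript
// Wrap raw PCM bytes in a minimal 44-byte RIFF/WAVE header.
// Defaults match the format described above: 24 kHz, mono, 16-bit.
function pcmToWav(
  pcm: Uint8Array,
  sampleRate = 24000,
  channels = 1,
  bitsPerSample = 16,
): Uint8Array {
  const blockAlign = channels * (bitsPerSample / 8); // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second
  const wav = new Uint8Array(44 + pcm.length);
  const view = new DataView(wav.buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // total file size minus 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);             // fmt chunk size for linear PCM
  view.setUint16(20, 1, true);              // audio format 1 = linear PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  view.setUint32(40, pcm.length, true);     // size of the audio payload
  wav.set(pcm, 44);                         // PCM bytes follow the header
  return wav;
}
```

Because WAV stores PCM uncompressed, this step is lossless and needs no encoder; compressed formats such as mp3 or opus still require a tool like ffmpeg.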

Example Request

# Edge endpoints require a JWT — see Authentication
curl -X POST https://api.yto-01.edge.polargrid.ai/v1/audio/speech \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro-82m",
    "input": "Hello from PolarGrid!",
    "voice": "af_bella",
    "response_format": "pcm"
  }' \
  --output speech.pcm
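
The same request can be assembled in TypeScript. The sketch below only builds the request pieces, so it can be inspected or tested without network access; the URL, headers, and body fields are taken from the curl command above, while `buildSpeechCall` and `SpeechCall` are hypothetical names.

```typescript
// Endpoint URL copied from the curl example above.
const SPEECH_URL = "https://api.yto-01.edge.polargrid.ai/v1/audio/speech";

// Hypothetical container for the parts of the HTTP call.
interface SpeechCall {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the request without sending it. `token` is the JWT described
// under Authentication.
function buildSpeechCall(token: string, input: string, voice: string): SpeechCall {
  return {
    url: SPEECH_URL,
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "kokoro-82m",
      input,
      voice,
      response_format: "pcm",
    }),
  };
}

// Usage, assuming a runtime with a global fetch (a browser or Node 18+):
//   const call = buildSpeechCall(token, "Hello from PolarGrid!", "af_bella");
//   const res = await fetch(call.url, call);
//   const pcm = new Uint8Array(await res.arrayBuffer());
```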

Response

Returns raw 24 kHz, 16-bit, mono PCM audio bytes.
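
Because the response carries no header, the clip length and sample values have to be derived from the byte count and the fixed format. A minimal sketch, assuming signed little-endian samples (the helper names are illustrative):

```typescript
const SAMPLE_RATE_HZ = 24000; // from the format described above
const BYTES_PER_SAMPLE = 2;   // 16-bit mono

// Duration in seconds of a mono 16-bit PCM buffer of the given size.
function pcmDurationSeconds(byteLength: number): number {
  return byteLength / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE);
}

// Convert signed 16-bit little-endian PCM to floats in [-1, 1),
// the range that many audio APIs expect. A trailing odd byte is ignored.
function pcm16ToFloat32(pcm: Uint8Array): Float32Array {
  const view = new DataView(pcm.buffer, pcm.byteOffset, pcm.byteLength);
  const sampleCount = Math.floor(pcm.byteLength / BYTES_PER_SAMPLE);
  const out = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    out[i] = view.getInt16(i * BYTES_PER_SAMPLE, true) / 32768;
  }
  return out;
}
```

At this format, one second of audio occupies 48,000 bytes.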

Streaming TTS

For real-time audio playback, use streaming:
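// Assumes `client` is an initialized PolarGrid SDK client and `audioStream`
// is a writable sink for PCM bytes (for example, a file or audio device stream).
for await (const chunk of client.textToSpeechStream({
  model: 'kokoro-82m',
  input: 'Long text to convert to speech...',
  voice: 'af_bella',
})) {
  // Write each audio chunk as soon as it arrives
  audioStream.write(chunk);
}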
Streaming TTS is currently available in mock mode only. Production streaming is coming soon.
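
When streamed audio is decoded chunk by chunk, a chunk boundary can fall in the middle of a 16-bit sample. The sketch below re-aligns chunks to whole samples before they are processed. It assumes each chunk is a `Uint8Array` of raw PCM bytes, which this page does not specify, and `SampleAligner` is a hypothetical helper rather than part of the SDK.

```typescript
// Re-aligns arbitrary byte chunks to whole 16-bit samples.
// Assumption: chunks are Uint8Array views of raw PCM bytes.
class SampleAligner {
  // A single byte held over from the previous chunk, if any.
  private leftover: number | null = null;

  // Returns the largest even-length run of bytes available so far and
  // keeps a trailing odd byte for the next call.
  push(chunk: Uint8Array): Uint8Array {
    const carry = this.leftover === null ? 0 : 1;
    const total = carry + chunk.length;
    const usable = total - (total % 2);

    const joined = new Uint8Array(total);
    if (this.leftover !== null) {
      joined[0] = this.leftover;
    }
    joined.set(chunk, carry);

    const last = joined[total - 1];
    this.leftover = total % 2 === 1 && last !== undefined ? last : null;
    return joined.subarray(0, usable);
  }
}
```

Each aligned chunk can then be decoded or written without splitting a sample across two writes.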