Text-to-Speech

Generate audio from text with a catalog of Kokoro-82M voices. The endpoint returns raw PCM audio.

Edge endpoints require a JWT. See Authentication for how to obtain one.

Create Speech

POST /v1/audio/speech

Generate audio from input text.

Request Body

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `model` | string | Yes | n/a | TTS model: `kokoro-82m` |
| `input` | string | Yes | n/a | Text to convert (max 4096 chars) |
| `voice` | string | Yes | n/a | Voice to use (see Voices below) |
| `response_format` | string | No | `pcm` | Accepted for forward compatibility; currently ignored (see Audio Format below) |
| `speed` | number | No | 1.0 | Speed multiplier (0.25 to 4.0) |
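The limits in the table can be checked client-side before a request is sent. The following is a minimal sketch, not part of any PolarGrid SDK: `validateSpeechRequest` is a hypothetical helper, and it assumes the 4096-character limit can be approximated by JavaScript string length (the API may count characters differently).

```javascript
// Hypothetical client-side check against the documented request limits.
const MAX_INPUT_CHARS = 4096; // assumption: measured as JavaScript string length
const MIN_SPEED = 0.25;
const MAX_SPEED = 4.0;

function validateSpeechRequest(body) {
  const errors = [];
  // model, input, and voice are all required strings.
  for (const field of ["model", "input", "voice"]) {
    if (typeof body[field] !== "string" || body[field].length === 0) {
      errors.push(`${field} is required and must be a non-empty string`);
    }
  }
  if (typeof body.input === "string" && body.input.length > MAX_INPUT_CHARS) {
    errors.push(`input exceeds ${MAX_INPUT_CHARS} characters`);
  }
  // speed is optional; the negated range test also rejects NaN.
  if (body.speed !== undefined) {
    const inRange = body.speed >= MIN_SPEED && body.speed <= MAX_SPEED;
    if (typeof body.speed !== "number" || !inRange) {
      errors.push(`speed must be a number between ${MIN_SPEED} and ${MAX_SPEED}`);
    }
  }
  return errors; // an empty array means the body passed every check
}
```

An empty result means the body satisfies the documented constraints; the server remains the authority on what it accepts.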

Voices

The kokoro-82m model exposes eight voices through the PolarGrid SDKs:

| Voice ID | Accent / gender |
| --- | --- |
| `af_bella` | American English, female |
| `af_sarah` | American English, female |
| `am_adam` | American English, male |
| `am_michael` | American English, male |
| `bf_emma` | British English, female |
| `bf_isabella` | British English, female |
| `bm_george` | British English, male |
| `bm_lewis` | British English, male |

See the Voice AI guide for how these map to Kokoro-82M and a link to the full upstream voice list.

Audio Format

The endpoint returns raw 24 kHz, 16-bit, mono PCM audio. The API accepts the `response_format` parameter for forward compatibility but currently ignores it: the response is always PCM, whatever value you pass. Backend encoders for mp3, opus, aac, flac, and wav are planned but not yet implemented. To get one of those formats today, transcode the PCM bytes client-side, for example with ffmpeg:

```shell
ffmpeg -f s16le -ar 24000 -ac 1 -i speech.pcm speech.mp3
```
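When ffmpeg is not available, a playable WAV file can also be produced by prepending a standard 44-byte RIFF header to the PCM bytes, since WAV stores uncompressed PCM as-is. The sketch below assumes the bytes are signed 16-bit little-endian samples (as the `s16le` flag above implies) held in a `Uint8Array`; `pcmToWav` is an illustrative name, not an SDK function.

```javascript
// Wraps raw PCM bytes in a canonical 44-byte WAV (RIFF) header.
// Defaults match the documented output: 24 kHz, mono, 16-bit.
function pcmToWav(pcm, sampleRate = 24000, channels = 1, bitsPerSample = 16) {
  const blockAlign = channels * (bitsPerSample / 8); // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second
  const header = new DataView(new ArrayBuffer(44));
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      header.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  header.setUint32(4, 36 + pcm.byteLength, true); // file size minus 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  header.setUint32(16, 16, true);                 // fmt chunk size for PCM
  header.setUint16(20, 1, true);                  // audio format 1 = linear PCM
  header.setUint16(22, channels, true);
  header.setUint32(24, sampleRate, true);
  header.setUint32(28, byteRate, true);
  header.setUint16(32, blockAlign, true);
  header.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  header.setUint32(40, pcm.byteLength, true);     // size of the sample data

  // Concatenate header and samples into one buffer.
  const wav = new Uint8Array(44 + pcm.byteLength);
  wav.set(new Uint8Array(header.buffer), 0);
  wav.set(pcm, 44);
  return wav;
}
```

In Node.js, for instance, `fs.writeFileSync("speech.wav", pcmToWav(fs.readFileSync("speech.pcm")))` would convert the file saved by the example request. Compressed formats such as mp3 or opus still need a real encoder.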

Example Request

```shell
# Edge endpoints require a JWT (see Authentication)
curl -X POST https://api.yto-01.edge.polargrid.ai/v1/audio/speech \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro-82m",
    "input": "Hello from PolarGrid!",
    "voice": "af_bella",
    "response_format": "pcm"
  }' \
  --output speech.pcm
```
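The same request can be made from JavaScript with the standard `fetch` API. This is a sketch under assumptions rather than an official client: `buildSpeechRequest` and `createSpeech` are hypothetical helper names, the token is assumed to be a JWT obtained as described under Authentication, and the error handling is deliberately minimal.

```javascript
// Endpoint URL taken from the curl example above.
const SPEECH_URL = "https://api.yto-01.edge.polargrid.ai/v1/audio/speech";

// Builds the URL and fetch options for a Create Speech call.
function buildSpeechRequest(token, body) {
  return {
    url: SPEECH_URL,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}

// Sends the request and resolves to the raw PCM bytes.
async function createSpeech(token, body) {
  const { url, init } = buildSpeechRequest(token, body);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`TTS request failed with status ${response.status}`);
  }
  return new Uint8Array(await response.arrayBuffer());
}
```

Keeping request construction separate from the network call makes the headers and body easy to inspect or unit-test without hitting the endpoint.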

Response

Returns raw 24 kHz, 16-bit, mono PCM audio bytes.
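Because the format is fixed, the duration of a response follows directly from its size: 24,000 samples per second at 2 bytes per sample on one channel is 48,000 bytes per second. A small sketch of that arithmetic (the function name is illustrative):

```javascript
// Documented output format: 24 kHz, 16-bit (2 bytes per sample), mono.
const SAMPLE_RATE_HZ = 24000;
const BYTES_PER_SAMPLE = 2;
const CHANNELS = 1;

// Returns the playback duration, in seconds, of a raw PCM payload.
function pcmDurationSeconds(byteLength) {
  return byteLength / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS);
}
```

For example, a 120,000-byte response corresponds to 2.5 seconds of audio.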

Streaming TTS

For real-time audio playback, use streaming:

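```javascript
// Assumes `client` is an initialized PolarGrid SDK client and `audioStream`
// is a writable audio sink; both are created elsewhere.
for await (const chunk of client.textToSpeechStream({
  model: 'kokoro-82m',
  input: 'Long text to convert to speech...',
  voice: 'af_bella',
})) {
  // Process audio chunks as they arrive
  audioStream.write(chunk);
}
```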

Streaming TTS is currently available in mock mode only; production streaming is coming soon.
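Players such as the Web Audio API expect floating-point samples rather than raw 16-bit integers, so streamed chunks usually need converting before playback. The sketch below rests on assumptions the documentation does not confirm: that each chunk is a `Uint8Array` of signed 16-bit little-endian PCM, and that a chunk boundary may fall in the middle of a sample. `createPcmDecoder` is a hypothetical helper.

```javascript
// Returns a stateful decoder that turns PCM byte chunks into Float32 samples
// in the range [-1, 1). A trailing odd byte is carried into the next chunk.
function createPcmDecoder() {
  let leftover = null; // first byte of a sample split across two chunks

  return function decode(chunk) {
    let bytes = chunk;
    if (leftover !== null) {
      // Prepend the carried byte so the split sample is reassembled.
      bytes = new Uint8Array(chunk.length + 1);
      bytes[0] = leftover;
      bytes.set(chunk, 1);
      leftover = null;
    }

    const sampleCount = Math.floor(bytes.length / 2);
    if (bytes.length % 2 === 1) {
      leftover = bytes[bytes.length - 1];
    }

    const view = new DataView(bytes.buffer, bytes.byteOffset, sampleCount * 2);
    const samples = new Float32Array(sampleCount);
    for (let i = 0; i < sampleCount; i++) {
      // Little-endian signed 16-bit, scaled by 32768.
      samples[i] = view.getInt16(i * 2, true) / 32768;
    }
    return samples;
  };
}
```

Create one decoder per stream and call it on each chunk inside the `for await` loop; the returned `Float32Array` can then be copied into an audio buffer created at 24,000 Hz.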
