
Text-to-Speech

Generate audio from text with multiple voice options and formats.

Create Speech

POST /v1/audio/speech
Generate audio from input text.

Request Body

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | - | TTS model: tts-1, tts-1-hd, kokoro-82m |
| input | string | Yes | - | Text to convert (max 4096 chars) |
| voice | string | Yes | - | Voice to use (see below) |
| response_format | string | No | mp3 | Audio format |
| speed | number | No | 1.0 | Speed multiplier (0.25-4.0) |
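The constraints in the table can be checked on the client before sending a request. The following is a minimal sketch, not part of any official SDK: the `SpeechRequest` interface and the `validateSpeechRequest` and `withDefaults` helpers are hypothetical names introduced here, and the length check counts UTF-16 code units, which may differ from how the server counts characters.

```typescript
// Hypothetical request shape mirroring the parameter table above.
interface SpeechRequest {
  model: string;
  input: string;
  voice: string;
  response_format?: string;
  speed?: number;
}

const MODELS = ["tts-1", "tts-1-hd", "kokoro-82m"];
const MAX_INPUT_CHARS = 4096;

// Returns a list of problems; an empty list means the body passes these checks.
function validateSpeechRequest(req: SpeechRequest): string[] {
  const errors: string[] = [];
  if (MODELS.indexOf(req.model) === -1) {
    errors.push(`unknown model: ${req.model}`);
  }
  // Assumption: .length (UTF-16 code units) approximates the documented limit.
  if (req.input.length > MAX_INPUT_CHARS) {
    errors.push(`input exceeds ${MAX_INPUT_CHARS} characters`);
  }
  const speed = req.speed ?? 1.0;
  if (speed < 0.25 || speed > 4.0) {
    errors.push("speed must be between 0.25 and 4.0");
  }
  return errors;
}

// Fills in the documented defaults for the optional fields.
function withDefaults(req: SpeechRequest): Required<SpeechRequest> {
  return {
    model: req.model,
    input: req.input,
    voice: req.voice,
    response_format: req.response_format ?? "mp3",
    speed: req.speed ?? 1.0,
  };
}
```

Returning a list of messages rather than throwing on the first problem lets a caller report every issue at once.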

Available Voices

| Voice | Description |
|---|---|
| alloy | Neutral, balanced |
| echo | Warm, conversational |
| fable | Expressive, storytelling |
| onyx | Deep, authoritative |
| nova | Friendly, upbeat |
| shimmer | Soft, gentle |

Kokoro voices (for kokoro-82m model):
  • af_bella, af_sarah — Female American
  • am_adam, am_michael — Male American
  • bf_emma, bf_isabella — Female British
  • bm_george, bm_lewis — Male British
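The Kokoro voice IDs listed above appear to follow a two-letter prefix pattern (accent, then gender). The sketch below decodes that pattern. The prefix meanings are inferred only from the eight names in the list, and `describeKokoroVoice` is a hypothetical helper, so treat the mapping as an assumption rather than a documented scheme.

```typescript
// Prefix meanings inferred from the voice list above (assumption, not a documented scheme).
const ACCENTS: Record<string, string> = { a: "American", b: "British" };
const GENDERS: Record<string, string> = { f: "Female", m: "Male" };

interface KokoroVoiceInfo {
  name: string;
  gender: string;
  accent: string;
}

// Splits an ID such as "af_bella" into its parts.
// Returns null when the ID does not match the inferred pattern.
function describeKokoroVoice(id: string): KokoroVoiceInfo | null {
  const match = /^([ab])([fm])_([a-z]+)$/.exec(id);
  if (match === null) {
    return null;
  }
  return {
    name: match[3],
    gender: GENDERS[match[2]],
    accent: ACCENTS[match[1]],
  };
}
```

For example, `describeKokoroVoice("bm_george")` yields a British male voice named `george`, while a standard voice such as `alloy` returns `null`.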

Audio Formats

| Format | Description |
|---|---|
| mp3 | Compressed, widely compatible |
| opus | Efficient, good for streaming |
| aac | Apple-friendly |
| flac | Lossless |
| wav | Uncompressed |
| pcm | Raw audio |
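When saving the result, it can help to reject an unsupported `response_format` early and keep the output file name in step with the format. The sketch below does both. Using the format name as the file extension is a naming convention assumed here, and `isAudioFormat` and `outputFileName` are hypothetical helpers.

```typescript
// The six formats from the table above.
const FORMATS = ["mp3", "opus", "aac", "flac", "wav", "pcm"] as const;
type AudioFormat = typeof FORMATS[number];

// Type guard: narrows an arbitrary string to one of the listed formats.
function isAudioFormat(value: string): value is AudioFormat {
  return (FORMATS as readonly string[]).indexOf(value) !== -1;
}

// Builds an output file name, defaulting to mp3 like the API does.
// Assumption: the format name doubles as the file extension.
function outputFileName(baseName: string, format: string = "mp3"): string {
  if (!isAudioFormat(format)) {
    throw new Error(`unsupported response_format: ${format}`);
  }
  return `${baseName}.${format}`;
}
```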

Example Request

curl -X POST https://api.ymq-01.edge.polargrid.ai:55111/v1/audio/speech \
  -H "Authorization: Bearer pg_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello from PolarGrid!",
    "voice": "alloy",
    "response_format": "mp3"
  }' \
  --output speech.mp3

Response

Returns raw audio bytes in the requested format.
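The same request can be made from TypeScript. The sketch below mirrors the curl example and returns the response body as bytes. `createSpeech` and `FetchLike` are hypothetical names, the fetch function is passed in so the code runs without network access in tests, and treating a non-2xx status as an error is an assumption about how the API reports failures.

```typescript
// Minimal structural type so the sketch works with the built-in fetch or a test double.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; arrayBuffer(): Promise<ArrayBuffer> }>;

// Endpoint taken from the curl example above.
const SPEECH_URL = "https://api.ymq-01.edge.polargrid.ai:55111/v1/audio/speech";

async function createSpeech(
  fetchImpl: FetchLike,
  apiKey: string,
  body: { model: string; input: string; voice: string; response_format?: string; speed?: number },
): Promise<Uint8Array> {
  const response = await fetchImpl(SPEECH_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  // Assumption: failures are signalled with a non-2xx HTTP status.
  if (!response.ok) {
    throw new Error(`speech request failed with HTTP ${response.status}`);
  }
  // The success body is raw audio bytes in the requested format.
  return new Uint8Array(await response.arrayBuffer());
}
```

In a runtime with a global `fetch` (for example Node.js 18 or later), calling `createSpeech(fetch, "pg_your_api_key", { model: "tts-1", input: "Hello from PolarGrid!", voice: "alloy" })` should resolve to the bytes that the curl example writes to speech.mp3.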

Streaming TTS

For real-time audio playback, use streaming:

// Assumes `client` is an initialized PolarGrid SDK client and
// `audioStream` is a writable stream (for example a file or audio sink).
for await (const chunk of client.textToSpeechStream({
  model: 'tts-1',
  input: 'Long text to convert to speech...',
  voice: 'nova',
})) {
  // Process audio chunks as they arrive
  audioStream.write(chunk);
}

Streaming TTS is currently available in mock mode only. Production streaming is coming soon.
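If the whole clip is needed in memory rather than played as it arrives, the chunks can be collected and joined. The sketch below works on any async iterable of byte chunks. It assumes chunks arrive as `Uint8Array` values, which the source does not confirm, and `mockSpeechStream` is a hypothetical stand-in for the streaming call, not an SDK function.

```typescript
// Hypothetical stand-in for a streaming call: yields audio chunks one at a time.
async function* mockSpeechStream(chunks: Uint8Array[]): AsyncGenerator<Uint8Array> {
  for (const chunk of chunks) {
    yield chunk;
  }
}

// Drains an async iterable of chunks and joins them into one contiguous buffer.
async function collectAudio(stream: AsyncIterable<Uint8Array>): Promise<Uint8Array> {
  const parts: Uint8Array[] = [];
  let total = 0;
  for await (const chunk of stream) {
    parts.push(chunk);
    total += chunk.length;
  }
  // Copy each chunk into its position in the final buffer.
  const audio = new Uint8Array(total);
  let offset = 0;
  for (const part of parts) {
    audio.set(part, offset);
    offset += part.length;
  }
  return audio;
}
```

Collecting the lengths first and allocating once avoids repeatedly reallocating the output buffer as chunks arrive.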