
Voice AI

PolarGrid provides low-latency voice capabilities at the edge: text-to-speech (TTS), speech-to-text (STT), and speech translation.

Text-to-Speech (TTS)

Convert text to natural-sounding speech.

Basic Usage

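const audioBuffer = await client.textToSpeech({
  model: 'tts-1',
  input: 'Hello from PolarGrid!',
  voice: 'nova',
  responseFormat: 'mp3',
});

// Node.js: save to file
const { writeFile } = await import('fs/promises');
await writeFile('speech.mp3', Buffer.from(audioBuffer));

// Browser: play the audio
const blob = new Blob([audioBuffer], { type: 'audio/mpeg' }); // MIME type matches responseFormat 'mp3'
const url = URL.createObjectURL(blob);
const audio = new Audio(url);
audio.addEventListener('ended', () => URL.revokeObjectURL(url)); // free the blob URL after playback
await audio.play(); // may reject if the browser blocks autoplay without a user gesture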

Available Voices

Standard voices (all models):

| Voice | Style |
|---|---|
| alloy | Neutral, balanced |
| echo | Warm, conversational |
| fable | Expressive, storytelling |
| onyx | Deep, authoritative |
| nova | Friendly, upbeat |
| shimmer | Soft, gentle |

Kokoro voices (for kokoro-82m model):

| Voice | Description |
|---|---|
| af_bella, af_sarah | Female American |
| am_adam, am_michael | Male American |
| bf_emma, bf_isabella | Female British |
| bm_george, bm_lewis | Male British |
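To catch a mismatched voice and model before making a request, a client-side check can mirror the two voice tables above. This is a minimal sketch: `isVoiceListed` is a hypothetical helper, not part of the PolarGrid SDK, and it only knows the voices listed in this guide, so extend the sets if the service offers more.

```javascript
// Hypothetical helper (not part of the PolarGrid SDK): checks a voice
// against the voice tables in this guide before calling textToSpeech.
const STANDARD_VOICES = new Set(['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer']);

// Kokoro voices are listed only for the kokoro-82m model.
const KOKORO_VOICES = new Set([
  'af_bella', 'af_sarah',   // Female American
  'am_adam', 'am_michael',  // Male American
  'bf_emma', 'bf_isabella', // Female British
  'bm_george', 'bm_lewis',  // Male British
]);

function isVoiceListed(model, voice) {
  // Standard voices are documented as available on all models.
  if (STANDARD_VOICES.has(voice)) return true;
  // Kokoro voices additionally require the kokoro-82m model.
  return model === 'kokoro-82m' && KOKORO_VOICES.has(voice);
}
```

For example, `isVoiceListed('tts-1', 'af_bella')` is `false`, while `isVoiceListed('kokoro-82m', 'af_bella')` is `true`.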

Speed Control

Set the speaking rate of the generated audio with `speed`, from 0.25x to 4.0x (1.0 is normal speed):
const audioBuffer = await client.textToSpeech({
  model: 'tts-1',
  input: 'This will be spoken slowly.',
  voice: 'nova',
  speed: 0.75,  // Slower
});
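To fail fast on bad input (for example, a value from a UI slider), you can validate `speed` against the documented 0.25 to 4.0 range before sending the request. A minimal sketch; `checkSpeed` is a hypothetical helper, not an SDK function.

```javascript
// Hypothetical helper: the bounds are the documented speed range.
const MIN_SPEED = 0.25;
const MAX_SPEED = 4.0;

// Returns `speed` unchanged if it is a finite number within the range,
// otherwise throws, so a bad value fails locally instead of in the API call.
function checkSpeed(speed) {
  if (!Number.isFinite(speed) || speed < MIN_SPEED || speed > MAX_SPEED) {
    throw new RangeError(`speed must be between ${MIN_SPEED} and ${MAX_SPEED}, got ${speed}`);
  }
  return speed;
}
```

Use it inline, e.g. `speed: checkSpeed(userSpeed)`.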

Audio Formats

| Format | Use Case |
|---|---|
| mp3 | General purpose, widely compatible |
| opus | Efficient streaming, low bandwidth |
| aac | Apple ecosystem |
| flac | Lossless, archival |
| wav | Uncompressed, editing |
| pcm | Raw audio, processing |
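When playing the result in a browser, the Blob's MIME type should match the requested `responseFormat` (the Basic Usage example pairs `mp3` with `audio/mpeg`). A sketch of that mapping follows; `mimeTypeFor` is hypothetical, and the `opus` and `pcm` entries rest on assumptions noted in the comments, so confirm them against actual responses.

```javascript
// Hypothetical mapping from responseFormat to a MIME type for Blob/Audio.
// Assumptions: 'opus' output is wrapped in an Ogg container, and 'pcm' is
// headerless raw audio with no standard playable MIME type (null).
const MIME_TYPES = {
  mp3: 'audio/mpeg',
  opus: 'audio/ogg; codecs=opus',
  aac: 'audio/aac',
  flac: 'audio/flac',
  wav: 'audio/wav',
  pcm: null, // raw samples: decode them yourself instead of using <audio>
};

function mimeTypeFor(format) {
  // Own-property check so names like 'toString' are rejected too.
  if (!Object.hasOwn(MIME_TYPES, format)) {
    throw new Error(`Unknown audio format: ${format}`);
  }
  return MIME_TYPES[format];
}
```

For example, `new Blob([audioBuffer], { type: mimeTypeFor('opus') })`. Check for `null` before using the result with `pcm`.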

Speech-to-Text (STT)

Transcribe audio to text.

Basic Transcription

// audioData: the recording's bytes (e.g. an ArrayBuffer or Blob)
const file = new File([audioData], 'recording.mp3', { type: 'audio/mpeg' });

const result = await client.transcribe({
  file,
  model: 'whisper-1',
  language: 'en',  // Optional: hint the language
});

console.log(result.text);

Verbose Output with Timestamps

Request `verbose_json` to get segment-level timestamps along with the detected language and total duration:
const result = await client.transcribe({
  file,
  model: 'whisper-1',
  responseFormat: 'verbose_json',
});

console.log(`Duration: ${result.duration}s`);
console.log(`Language: ${result.language}`);

result.segments.forEach(segment => {
  console.log(`[${segment.start.toFixed(2)} - ${segment.end.toFixed(2)}] ${segment.text}`);
});
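Segment timestamps make it possible to sync a transcript with playback, for example highlighting the line being spoken. A minimal sketch; `segmentAt` is a hypothetical helper, and it assumes the segments are sorted by `start` and do not overlap, with `start` and `end` in seconds as in the example above.

```javascript
// Hypothetical helper: binary-searches for the segment whose [start, end)
// range contains `time` (seconds). Assumes sorted, non-overlapping segments.
function segmentAt(segments, time) {
  let lo = 0;
  let hi = segments.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const seg = segments[mid];
    if (time < seg.start) hi = mid - 1;      // target is earlier
    else if (time >= seg.end) lo = mid + 1;  // target is later
    else return seg;                         // start <= time < end
  }
  return null; // time falls in a gap or outside the audio
}
```

In a browser you could call `segmentAt(result.segments, audio.currentTime)` from a `timeupdate` listener.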

Subtitle Formats

Generate subtitles directly by setting `responseFormat` to `srt` or `vtt`:
// SRT format
const srt = await client.transcribe({
  file,
  model: 'whisper-1',
  responseFormat: 'srt',
});

// WebVTT format
const vtt = await client.transcribe({
  file,
  model: 'whisper-1',
  responseFormat: 'vtt',
});
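If you need to edit segment text before producing subtitles (for example, to correct a name), you can request `verbose_json` instead and build the SRT yourself. A minimal sketch, assuming the `segments` shape from the verbose example (`start` and `end` in seconds, plus `text`); `segmentsToSrt` and `srtTimestamp` are hypothetical helpers, not SDK functions.

```javascript
// Formats seconds as an SRT timestamp: HH:MM:SS,mmm
function srtTimestamp(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n, width = 2) => String(n).padStart(width, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// Builds an SRT document: a 1-based cue number, a "start --> end" line,
// and the cue text, with a blank line between cues.
function segmentsToSrt(segments) {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${srtTimestamp(seg.start)} --> ${srtTimestamp(seg.end)}\n${seg.text.trim()}\n`)
    .join('\n');
}
```

Apply your text fixes to `result.segments`, then pass them to `segmentsToSrt` and save the string with a `.srt` extension.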

Translation

Translate speech in other supported languages into English text:
// Spanish audio → English text
const result = await client.translate({
  file: spanishAudioFile,
  model: 'whisper-1',
});

console.log(result.text);  // English transcription

Real-Time Voice Chat

Chain speech-to-text, a chat completion, and text-to-speech to handle one turn of a voice conversation:
async function voiceChat(audioInput) {
  // 1. Transcribe user speech
  const transcription = await client.transcribe({
    file: audioInput,
    model: 'whisper-1',
  });
  
  // 2. Generate AI response
  const response = await client.chatCompletion({
    model: 'llama-3.1-8b',
    messages: [
      { role: 'user', content: transcription.text }
    ],
  });
  
  // 3. Convert response to speech
  const audio = await client.textToSpeech({
    model: 'tts-1',
    input: response.choices[0].message.content,
    voice: 'nova',
  });
  
  return audio;
}
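The function above sends only the latest utterance, so the model has no memory of earlier turns. One way to keep context is to accumulate the conversation and send it on every turn. This is a sketch under assumptions: `createVoiceChat` is hypothetical, it assumes `chatCompletion` accepts `assistant` messages for earlier replies (as OpenAI-style chat APIs do), and it takes `client` as a parameter so it can be exercised with a stub.

```javascript
// Hypothetical wrapper around the voiceChat flow that keeps conversation history.
// Assumes chatCompletion accepts { role: 'assistant' } messages for earlier replies.
function createVoiceChat(client, { chatModel = 'llama-3.1-8b', voice = 'nova' } = {}) {
  const history = []; // completed turns, oldest first

  return async function voiceTurn(audioInput) {
    // 1. Transcribe the user's speech
    const transcription = await client.transcribe({ file: audioInput, model: 'whisper-1' });
    const userMessage = { role: 'user', content: transcription.text };

    // 2. Send earlier turns plus the new message (a fresh array per request)
    const response = await client.chatCompletion({
      model: chatModel,
      messages: [...history, userMessage],
    });
    const reply = response.choices[0].message.content;

    // Record the turn only after the model has answered
    history.push(userMessage, { role: 'assistant', content: reply });

    // 3. Speak the reply
    return client.textToSpeech({ model: 'tts-1', input: reply, voice });
  };
}
```

Create one `voiceTurn` per conversation, e.g. `const voiceTurn = createVoiceChat(client);`, and call it once per recorded utterance. Long conversations will eventually need trimming to fit the model's context window.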

Supported Audio Formats

For transcription and translation:
  • MP3
  • WAV
  • M4A
  • OGG
  • FLAC
  • WebM
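In Node.js you typically start from a file on disk rather than raw bytes. The sketch below wraps a local recording in a `File` for `client.transcribe` or `client.translate` and infers the MIME type from the extension, accepting only the formats listed above. `audioMimeType` and `loadAudioFile` are hypothetical helpers; the code assumes Node.js 20 or later, where `File` is a global.

```javascript
// Hypothetical MIME types for the supported input formats above.
const AUDIO_MIME_TYPES = {
  mp3: 'audio/mpeg',
  wav: 'audio/wav',
  m4a: 'audio/mp4',
  ogg: 'audio/ogg',
  flac: 'audio/flac',
  webm: 'audio/webm',
};

// Infers the MIME type from a file name, rejecting unsupported formats.
function audioMimeType(filename) {
  const ext = filename.split('.').pop().toLowerCase();
  if (!Object.hasOwn(AUDIO_MIME_TYPES, ext)) {
    throw new Error(`Unsupported audio format: .${ext}`);
  }
  return AUDIO_MIME_TYPES[ext];
}

// Reads a local recording into a File (Node.js 20+, where File is global).
async function loadAudioFile(path) {
  const { readFile } = await import('node:fs/promises');
  const { basename } = await import('node:path');
  const name = basename(path);
  const data = await readFile(path);
  return new File([data], name, { type: audioMimeType(name) });
}
```

For example: `const file = await loadAudioFile('./meeting.m4a');` followed by `await client.transcribe({ file, model: 'whisper-1' })`.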