Speech-to-Text

Transcribe audio files to text. A single endpoint serves three modes — default async, opt-in streaming, opt-in sync — controlled by query parameters.
Edge endpoints accept your pg_* API key as a bearer token. See Authentication for details. The cURL examples below pin Toronto (yto-01) for concreteness — substitute another region or discover the fastest one via GET https://autorouter.polargrid.ai/v1/route.

Transcribe Audio

POST /v1/audio/transcriptions
The multipart body carries only the file. Everything else is a query parameter.

Query Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | | STT model (e.g. whisper-large-v3-turbo, cohere-transcribe-03-2026) |
| language | string | No | | ISO-639-1 language code |
| prompt | string | No | | Context hint to guide transcription |
| response_format | string | No | json | json, text, srt, vtt, or verbose_json |
| temperature | number | No | 0 | Sampling temperature (0.0-1.0) |
| punctuation | boolean | No | | Force/forbid punctuation in the output |
| stream | boolean | No | false | If true, return Server-Sent Events |
| sync | boolean | No | false | If true, block until completion and return the result inline |
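Because every option travels in the query string, a small URL builder can keep client code tidy. The following is a minimal sketch, not an official client: `build_transcription_url` is a hypothetical helper, and the host pattern `api.<region>.edge.polargrid.ai` is inferred from the cURL examples on this page.

```python
from urllib.parse import urlencode


def build_transcription_url(region: str, model: str, **params) -> str:
    """Return the POST URL with all options encoded as query parameters.

    Hypothetical helper; the host pattern is an assumption taken from
    the cURL examples, not a documented contract.
    """
    query = {"model": model}
    for name, value in params.items():
        if value is None:
            continue  # omit unset options so the server default applies
        # Booleans are written as lowercase true/false, matching ?stream=true.
        query[name] = str(value).lower() if isinstance(value, bool) else value
    base = f"https://api.{region}.edge.polargrid.ai/v1/audio/transcriptions"
    return f"{base}?{urlencode(query)}"


url = build_transcription_url(
    "yto-01", "whisper-large-v3-turbo", language="en", sync=True
)
print(url)
```

The audio file itself is still sent as the multipart `file` field; only the options go into the URL.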

Three Modes

| Query flags | Response | Use when |
|---|---|---|
| (none) | 202 Accepted with { job_id, status, poll_url } | Default. Best for long files; poll with GET /v1/audio/transcriptions?job_id=... |
| ?stream=true | 200 SSE of transcript.text.delta + transcript.text.done | Best UX for real-time display |
| ?sync=true | 200 with formatted body (JSON / text / SRT / VTT / verbose JSON) | Small files when you can wait inline |
stream=true and sync=true are mutually exclusive.
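A client can reject the invalid flag combination before sending anything. This sketch mirrors the three modes described here; `resolve_mode` and the mode names it returns are illustrative, not part of the API.

```python
def resolve_mode(stream: bool = False, sync: bool = False) -> str:
    """Map the two query flags to the mode the endpoint will use."""
    if stream and sync:
        # The endpoint treats these flags as mutually exclusive.
        raise ValueError("stream=true and sync=true are mutually exclusive")
    if stream:
        return "stream"  # 200 with Server-Sent Events
    if sync:
        return "sync"    # 200 with the formatted body inline
    return "async"       # 202 Accepted, then poll with job_id


print(resolve_mode())             # no flags set
print(resolve_mode(stream=True))  # streaming requested
```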

Available Models

| Model | Description |
|---|---|
| whisper-large-v3-turbo | OpenAI Whisper, fast multilingual transcription |
| cohere-transcribe-03-2026 | Cohere transcription, 14 languages |

Examples

Default — async job

# 1. Submit
curl -X POST "https://api.yto-01.edge.polargrid.ai/v1/audio/transcriptions?model=whisper-large-v3-turbo&language=en" \
  -H "Authorization: Bearer pg_your_api_key" \
  -F "file=@recording.mp3"

# {
#   "job_id": "job_transcription_a1b2c3d4e5f6",
#   "status": "accepted",
#   "poll_url": "/v1/audio/transcriptions?job_id=job_transcription_a1b2c3d4e5f6"
# }

# 2. Poll
curl "https://api.yto-01.edge.polargrid.ai/v1/audio/transcriptions?job_id=job_transcription_a1b2c3d4e5f6" \
  -H "Authorization: Bearer pg_your_api_key"

Streaming — SSE

curl -N -X POST "https://api.yto-01.edge.polargrid.ai/v1/audio/transcriptions?stream=true&model=whisper-large-v3-turbo&language=en" \
  -H "Authorization: Bearer pg_your_api_key" \
  -F "file=@recording.mp3"

Sync — blocking request

curl -X POST "https://api.yto-01.edge.polargrid.ai/v1/audio/transcriptions?sync=true&model=whisper-large-v3-turbo&language=en" \
  -H "Authorization: Bearer pg_your_api_key" \
  -F "file=@recording.mp3"

Cohere model — same surface, different model id

cohere-transcribe-03-2026 supports the same sync, stream, and async modes. Swap the model query parameter — everything else is identical:
curl -N -X POST "https://api.yto-01.edge.polargrid.ai/v1/audio/transcriptions?stream=true&model=cohere-transcribe-03-2026" \
  -H "Authorization: Bearer pg_your_api_key" \
  -F "file=@recording.mp3"
Cohere covers 14 languages and auto-detects when language is omitted.

SSE Event Types

| type | Fields | When |
|---|---|---|
| transcript.text.delta | delta (string) | Zero or more, each carrying the transcript hypothesis so far |
| transcript.text.done | text, duration (s), language | Exactly once, before the stream closes |
| error | error (string) | On failure; the stream ends afterwards |

The stream is terminated by data: [DONE]\n\n. Each transcript.text.delta event carries the full transcript hypothesis so far, and later events may revise earlier text. Render by replacing your display with the latest delta, not by appending. The final authoritative transcript is done.text.
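The replace-not-append rule is easy to get wrong, so here is a minimal parsing sketch. It assumes each event arrives as a single `data: <json>` line whose JSON object includes a `type` field; the exact wire framing is not specified on this page, so treat that as an assumption. `iter_transcripts` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_transcripts(lines: Iterable[str]) -> Iterator[Tuple[str, bool]]:
    """Yield (text_to_display, is_final) for each transcript event.

    Assumes one JSON object per `data:` line with a `type` field.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        event = json.loads(payload)
        kind = event.get("type")
        if kind == "transcript.text.delta":
            yield event["delta"], False
        elif kind == "transcript.text.done":
            yield event["text"], True
        elif kind == "error":
            raise RuntimeError(event["error"])


# Made-up sample stream for illustration only.
sample = [
    'data: {"type": "transcript.text.delta", "delta": "Hello"}',
    "",
    'data: {"type": "transcript.text.delta", "delta": "Hello world"}',
    "",
    'data: {"type": "transcript.text.done", "text": "Hello, world.", '
    '"duration": 1.2, "language": "en"}',
    "",
    "data: [DONE]",
]

display = ""
for text, is_final in iter_transcripts(sample):
    display = text  # replace the display; never append deltas
print(display)
```

In a real client, `lines` would come from the HTTP response body read line by line; the loop body stays the same.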

Polling Responses

GET /v1/audio/transcriptions?job_id=... returns:
  • 202 while the job is accepted / processing: { job_id, status, poll_interval_ms }
  • 200 when completed — formatted body, with job_id and status: "completed"
  • 400 on failed / cancelled
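The three status codes above suggest a simple polling loop. The sketch below keeps the HTTP transport out of the picture: `fetch` is a hypothetical callable you supply that performs the GET and returns `(status_code, parsed_json_body)`. The 1000 ms fallback interval and the `max_polls` limit are assumptions, as is the completed body being JSON.

```python
import time
from typing import Callable, Tuple

# fetch(job_id) -> (status_code, parsed_json_body); hypothetical transport hook.
Fetch = Callable[[str], Tuple[int, dict]]


def wait_for_job(
    job_id: str,
    fetch: Fetch,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job completes, honouring poll_interval_ms."""
    for _ in range(max_polls):
        status, body = fetch(job_id)
        if status == 200:
            return body  # completed: formatted body with job_id and status
        if status == 202:
            # Assumed fallback of 1000 ms if poll_interval_ms is absent.
            sleep(body.get("poll_interval_ms", 1000) / 1000)
            continue
        # 400 (failed / cancelled) or anything unexpected.
        raise RuntimeError(f"job {job_id} ended with HTTP {status}: {body}")
    raise TimeoutError(f"job {job_id} not finished after {max_polls} polls")


# Offline demonstration with canned responses instead of real HTTP calls.
canned = iter([
    (202, {"job_id": "job_x", "status": "processing", "poll_interval_ms": 500}),
    (200, {"job_id": "job_x", "status": "completed"}),
])
slept = []
result = wait_for_job("job_x", lambda _job_id: next(canned), sleep=slept.append)
print(result["status"])
```

Injecting `fetch` and `sleep` keeps the loop testable without network access and lets you plug in whichever HTTP library you already use.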

Supported Audio Formats

MP3, WAV, M4A, OGG, FLAC, WebM
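A quick client-side check can catch obviously unsupported files before uploading. This is only a sketch: it assumes the file extension reflects the actual format, and the server may apply its own validation. `is_supported_audio` is a hypothetical helper.

```python
from pathlib import Path

# Extensions derived from the formats listed above (an assumed mapping).
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".webm"}


def is_supported_audio(filename: str) -> bool:
    """Return True if the file extension matches a listed audio format."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported_audio("recording.mp3"))
print(is_supported_audio("notes.txt"))
```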