Speech-to-Text

Transcribe audio files to text. A single endpoint serves three modes, selected with query parameters: asynchronous (the default), streaming (opt-in), and synchronous (opt-in).

Edge endpoints accept your pg_* API key as a bearer token; see Authentication for details. The cURL examples below use the Toronto region (yto-01) for concreteness. Substitute another region, or find the fastest one with GET https://autorouter.polargrid.ai/v1/route.

Transcribe Audio

POST /v1/audio/transcriptions

The multipart/form-data body carries only the audio file, in the file field. Every other option is a query parameter.

Query Parameters

Parameter	Type	Required	Default	Description
`model`	string	Yes	—	STT model (e.g. `whisper-large-v3-turbo`, `cohere-transcribe-03-2026`)
`language`	string	No	—	ISO 639-1 language code (e.g. `en`)
`prompt`	string	No	—	Context hint to guide transcription
`response_format`	string	No	`json`	`json`, `text`, `srt`, `vtt`, or `verbose_json`
`temperature`	number	No	0	Sampling temperature (0.0-1.0)
`punctuation`	boolean	No	—	`true` forces punctuation in the output, `false` suppresses it; omit to use the model's default
`stream`	boolean	No	`false`	If `true`, return Server-Sent Events
`sync`	boolean	No	`false`	If `true`, block until completion and return the result inline
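Because every option travels in the query string, free-text values such as prompt must be percent-encoded before they go into the URL. Below is a minimal POSIX shell sketch of a hypothetical urlencode helper. It dumps the input bytes with od and uses awk to keep only the RFC 3986 unreserved characters, so UTF-8 text is encoded byte by byte.

```shell
# Hypothetical helper: percent-encode one query-string value.
# Keeps RFC 3986 unreserved characters (A-Z a-z 0-9 - . _ ~) and encodes
# every other byte as %XX, so multibyte UTF-8 input is handled correctly.
urlencode() {
  printf '%s' "$1" |
    od -An -v -tu1 |                 # one unsigned decimal number per byte
    awk '{
      for (i = 1; i <= NF; i++) {
        b = $i + 0
        if ((b >= 48 && b <= 57) || (b >= 65 && b <= 90) ||
            (b >= 97 && b <= 122) || b == 45 || b == 46 || b == 95 || b == 126) {
          printf "%c", b             # unreserved: emit as-is
        } else {
          printf "%%%02X", b         # everything else: %XX in uppercase hex
        }
      }
    } END { printf "\n" }'
}
```

For example, you could append `&prompt=$(urlencode 'Kubernetes, gRPC')` to the request URL. Recent curl releases can also encode query values themselves with `--url-query`.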

Three Modes

Query flags	Response	Use when
(none)	`202 Accepted` with `{ job_id, status, poll_url }`	Default. Best for long files; poll with `GET /v1/audio/transcriptions?job_id=...`
`?stream=true`	`200` SSE of `transcript.text.delta` + `transcript.text.done`	Best UX for real-time display
`?sync=true`	`200` with formatted body (JSON / text / SRT / VTT / verbose JSON)	Small files when you can wait inline

stream=true and sync=true are mutually exclusive.
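In scripts, one way to keep the mode flags straight is to derive them from a single mode name. That way the invalid stream=true plus sync=true combination cannot be expressed at all. The sketch below uses a hypothetical transcription_url helper. The api.<region>.edge.polargrid.ai host pattern is an assumption, generalized from the yto-01 examples on this page.

```shell
# Hypothetical helper: print the transcription URL for one of the three modes.
# A single mode argument means stream=true and sync=true can never be combined.
# Assumes regional hosts follow the api.<region>.edge.polargrid.ai pattern.
transcription_url() {
  local region="$1" model="$2" mode="${3:-async}" language="${4:-}"
  local url="https://api.${region}.edge.polargrid.ai/v1/audio/transcriptions?model=${model}"
  case "$mode" in
    async) ;;                              # default: expect 202 + job_id
    stream) url="${url}&stream=true" ;;    # expect 200 + Server-Sent Events
    sync) url="${url}&sync=true" ;;        # expect 200 + formatted body
    *)
      echo "unknown mode: $mode (expected async, stream, or sync)" >&2
      return 1 ;;
  esac
  if [ -n "$language" ]; then
    url="${url}&language=${language}"
  fi
  printf '%s\n' "$url"
}
```

For example: `curl -X POST "$(transcription_url yto-01 whisper-large-v3-turbo sync en)" -H "Authorization: Bearer pg_your_api_key" -F "file=@recording.mp3"`.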

Available Models

Model	Description
`whisper-large-v3-turbo`	OpenAI Whisper, fast multilingual transcription
`cohere-transcribe-03-2026`	Cohere transcription, 14 languages

Examples

Default — async job

```shell
# 1. Submit
curl -X POST "https://api.yto-01.edge.polargrid.ai/v1/audio/transcriptions?model=whisper-large-v3-turbo&language=en" \
  -H "Authorization: Bearer pg_your_api_key" \
  -F "file=@recording.mp3"

# {
#   "job_id": "job_transcription_a1b2c3d4e5f6",
#   "status": "accepted",
#   "poll_url": "/v1/audio/transcriptions?job_id=job_transcription_a1b2c3d4e5f6"
# }

# 2. Poll
curl "https://api.yto-01.edge.polargrid.ai/v1/audio/transcriptions?job_id=job_transcription_a1b2c3d4e5f6" \
  -H "Authorization: Bearer pg_your_api_key"
```
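The submit step can be wrapped so that a script captures the job_id directly. Below is a minimal sketch of a hypothetical submit_transcription helper. It assumes the 202 body has the shape shown above. The sed extraction is a stand-in for a real JSON parser such as jq.

```shell
# Hypothetical helper: submit a file in the default async mode and print
# the job_id from the 202 Accepted body. The sed pattern assumes the JSON
# shape shown above; prefer a real JSON parser in production.
submit_transcription() {
  local url="$1" key="$2" file="$3" job_id
  job_id=$(curl -sS -f -X POST "$url" \
      -H "Authorization: Bearer $key" \
      -F "file=@${file}" |
    sed -n 's/.*"job_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
  if [ -z "$job_id" ]; then
    echo "no job_id in response" >&2
    return 1
  fi
  printf '%s\n' "$job_id"
}
```

For example: `job_id=$(submit_transcription "https://api.yto-01.edge.polargrid.ai/v1/audio/transcriptions?model=whisper-large-v3-turbo&language=en" pg_your_api_key recording.mp3)`.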

Streaming — SSE

```shell
curl -N -X POST "https://api.yto-01.edge.polargrid.ai/v1/audio/transcriptions?stream=true&model=whisper-large-v3-turbo&language=en" \
  -H "Authorization: Bearer pg_your_api_key" \
  -F "file=@recording.mp3"
```

Sync — blocking request

```shell
curl -X POST "https://api.yto-01.edge.polargrid.ai/v1/audio/transcriptions?sync=true&model=whisper-large-v3-turbo&language=en" \
  -H "Authorization: Bearer pg_your_api_key" \
  -F "file=@recording.mp3"
```

Cohere model — same surface, different model id

cohere-transcribe-03-2026 supports the same sync, stream, and async modes. Swap the model query parameter and keep everything else the same:

```shell
curl -N -X POST "https://api.yto-01.edge.polargrid.ai/v1/audio/transcriptions?stream=true&model=cohere-transcribe-03-2026" \
  -H "Authorization: Bearer pg_your_api_key" \
  -F "file=@recording.mp3"
```

Cohere covers 14 languages and auto-detects when language is omitted.

SSE Event Types

`type`	Fields	When
`transcript.text.delta`	`delta` (string)	Zero or more; each carries the full transcript hypothesis so far
`transcript.text.done`	`text`, `duration` (seconds), `language`	Exactly once on success, before the stream closes
`error`	`error` (string)	On failure; the stream ends after this event

The stream ends with a final data: [DONE] line followed by a blank line. Each transcript.text.delta event carries the full transcript hypothesis so far, and later events may revise earlier text. Render by replacing your display with the latest delta rather than appending to it. The authoritative final transcript is the text field of the transcript.text.done event.
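A stream consumer has to do three things: split the stream into data: lines, stop at [DONE], and treat each delta as a replacement rather than an append. Below is a minimal POSIX shell sketch of a hypothetical read_transcript_stream helper. It assumes every event arrives on a single data: line holding a JSON object whose type value appears as a quoted string. On success it prints the payload of the transcript.text.done event. It returns non-zero on an error event, or if the stream ends without a done event.

```shell
# Hypothetical helper: consume an SSE transcription stream on stdin.
# Prints the JSON payload of the transcript.text.done event on success;
# returns non-zero on an error event or if no done event arrived.
read_transcript_stream() {
  local cr line payload final
  cr=$(printf '\r')
  final=''
  while IFS= read -r line; do
    line=${line%"$cr"}               # tolerate CRLF line endings
    case "$line" in
      data:*) ;;                     # event payloads live in data: fields
      *) continue ;;                 # skip blank separators and other fields
    esac
    payload=${line#data:}
    payload=${payload#" "}           # SSE allows one optional space after the colon
    case "$payload" in
      '[DONE]')
        break ;;
      *'"transcript.text.delta"'*)
        # Each delta is the full hypothesis so far: a live UI would replace
        # its display with this event's delta field, never append to it.
        printf 'partial: %s\n' "$payload" >&2 ;;
      *'"transcript.text.done"'*)
        final=$payload ;;            # authoritative transcript is in its text field
      *'"error"'*)
        printf 'stream error: %s\n' "$payload" >&2
        return 1 ;;
    esac
  done
  [ -n "$final" ] || return 1
  printf '%s\n' "$final"
}
```

For example, pipe the streaming cURL command into it: `curl -N -X POST "...?stream=true&model=whisper-large-v3-turbo" -H "Authorization: Bearer pg_your_api_key" -F "file=@recording.mp3" | read_transcript_stream`.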

Polling Responses

GET /v1/audio/transcriptions?job_id=... returns:

202 while the job is accepted or processing, with body { job_id, status, poll_interval_ms }
200 when completed, with the formatted body plus job_id and status: "completed"
400 if the job failed or was cancelled
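The status codes above map directly onto a polling loop. Below is a minimal sketch of a hypothetical poll_transcription helper. It assumes the poll goes to the same regional base URL that accepted the job, and that poll_interval_ms appears as a plain number in the 202 body. It writes each response body to a file and returns non-zero on 400 or any unexpected status. The fractional sleep values it uses work with GNU and BSD sleep, but not necessarily with every POSIX sleep.

```shell
# Hypothetical helper: poll a transcription job until it completes.
# Usage: poll_transcription BASE_URL JOB_ID API_KEY OUT_FILE
# 202 -> wait poll_interval_ms and retry; 200 -> done (body in OUT_FILE);
# 400 -> failed or cancelled; anything else -> unexpected.
poll_transcription() {
  local base="$1" job_id="$2" key="$3" out="$4" code ms
  while :; do
    code=$(curl -sS -o "$out" -w '%{http_code}' \
      -H "Authorization: Bearer $key" \
      "${base}/v1/audio/transcriptions?job_id=${job_id}") || return 1
    case "$code" in
      200)
        return 0 ;;
      202)
        # Honor the server's poll_interval_ms hint (leading zeros stripped so
        # shell arithmetic never reads it as octal); default to 2000 ms.
        ms=$(sed -n 's/.*"poll_interval_ms"[[:space:]]*:[[:space:]]*0*\([0-9][0-9]*\).*/\1/p' "$out")
        ms=${ms:-2000}
        sleep "$((ms / 1000)).$(printf '%03d' "$((ms % 1000))")" ;;
      400)
        echo "job ${job_id} failed or was cancelled; see ${out}" >&2
        return 1 ;;
      *)
        echo "unexpected HTTP status ${code}" >&2
        return 1 ;;
    esac
  done
}
```

For example: `poll_transcription https://api.yto-01.edge.polargrid.ai "$job_id" pg_your_api_key transcript.json`.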

Supported Audio Formats

MP3, WAV, M4A, OGG, FLAC, WebM
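Because the accepted formats are a fixed list, a client can reject obviously wrong files before uploading them. Below is a minimal sketch of a hypothetical is_supported_audio helper. It only checks the file extension, case-insensitively. It is therefore a convenience filter, not a guarantee that the server will decode the file.

```shell
# Hypothetical helper: client-side pre-check by file extension, mirroring the
# list above (MP3, WAV, M4A, OGG, FLAC, WebM). Does not inspect the audio.
is_supported_audio() {
  case "$1" in
    *.[Mm][Pp]3|*.[Ww][Aa][Vv]|*.[Mm]4[Aa]|*.[Oo][Gg][Gg]|*.[Ff][Ll][Aa][Cc]|*.[Ww][Ee][Bb][Mm])
      return 0 ;;
    *)
      return 1 ;;
  esac
}
```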
