Models

PolarGrid serves open-weight models on GPU-accelerated edge infrastructure. All models are available via our OpenAI-compatible API.

PolarGrid runs open-source models optimized for low-latency edge inference. These models are selected for speed and efficiency in real-time applications like voice AI. For workloads that require large cloud-hosted reasoning models (e.g., GPT-4, Gemini, Claude), use those providers directly — PolarGrid does not proxy requests to third-party APIs.

LLM Models

Qwen 3.5 27B


Parameters	27B
Quantization	FP8 (native)
Max context	8,192 tokens
License	Apache 2.0
Pricing	$0.20 / 1M input tokens,$ 0.75 / 1M output tokens

General-purpose large language model with strong performance across reasoning, coding, and multilingual tasks. Our largest deployed LLM, suitable for complex workloads where reply quality is the priority. Available fleet-wide on every edge node. Endpoint: POST /v1/chat/completions with "model": "qwen-3.5-27b"

Speech-to-Text Models

Whisper Large V3 Turbo


Parameters	809M
License	Apache 2.0
Pricing	$0.004 / min

OpenAI’s Whisper model optimized for speed. Supports multilingual transcription with high accuracy. Endpoint: POST /v1/audio/transcriptions with "model": "whisper-large-v3-turbo" Full model card →

Cohere Transcribe


Parameters	2B
License	Apache 2.0
Supported languages	English, French, German, Italian, Spanish, Portuguese, Greek, Dutch, Polish, Chinese, Japanese, Korean, Vietnamese, Arabic
Pricing	$0.004 / min

High-accuracy multilingual transcription with support for 14 languages. Supports punctuation toggling. Available on yvr-02 (Blackwell production). Endpoint: POST /v1/audio/transcriptions?sync=true with "model": "cohere-transcribe-03-2026" (use the full model ID; the short alias cohere-transcribe is not routable) Latency (yvr-02 Blackwell, external bench from a Vancouver-area laptop, 100 sync runs, 2026-05-27): server inference p50 238 ms / p95 305 ms, server-only RTF p50 0.041 (24× faster than real time). End-to-end TTFB p50 1068 ms is upload-dominated — each request ships a ~300 KB WAV body before inference can start; the network leg is the upload time, not POP-to-client RTT. See the Cohere Transcribe model page for the full breakdown.

Text-to-Speech Models

Hume AI TADA


Parameters	~4B (Llama 3.2 3B text base + audio components)
License	Llama 3.2 Community License
Output	24 kHz mono — caller picks the container via `response_format` (`pcm`, `wav`, or `mp3`)
Supported languages	English, French, German, Spanish, Italian, Portuguese, Polish, Japanese, Arabic, Chinese
Capabilities	Cross-lingual voice cloning, speed control (batch only)
Streaming	✓ chunked HTTP, `pcm` + `opus` (per-token via decoupled Triton handler; `speed` not honored in streaming mode)
Pricing	$0.009 / min

Expressive text-to-speech with cross-lingual voice cloning. Generates natural-sounding speech across 10 languages from a short reference clip. Endpoint: POST /v1/audio/speech with "model": "tada-3b-ml"

Kokoro 82M


Parameters	82M
License	Apache 2.0
Streaming	✓ chunked HTTP, `pcm` + `opus`
Pricing	$0.006 / min

Lightweight, fast text-to-speech model. Ideal for low-latency voice applications where speed is critical. Endpoint: POST /v1/audio/speech with "model": "kokoro-82m" Full model card →

Voice Pipeline

PersonaPlex


Parameters	7B
Pipeline	STT + LLM + TTS (end-to-end)
Pricing	$0.070 / min

Integrated voice-to-voice pipeline that combines speech recognition, language model reasoning, and speech synthesis into a single low-latency stream. Billed by wall-clock duration. See the PersonaPlex guide for setup details.

Performance

Latency benchmarks per model and region are actively being measured. Performance depends on:

Client proximity to the nearest edge region
Model size — smaller models have lower TTFT and higher throughput
Request complexity — token count, audio length, streaming vs. batch

The autorouter optimizes for the lowest-latency region automatically. For detailed performance data, contact us.

Custom Models

Enterprise customers can deploy custom fine-tuned models on PolarGrid infrastructure. PolarGrid handles provisioning and loading — contact us to discuss your model requirements. For custom model deployments, contact hello@polargrid.ai.

Pricing

See full pricing details and volume discounts

API Reference

Full endpoint documentation

​Models

​LLM Models

​Qwen 3.5 27B

​Speech-to-Text Models

​Whisper Large V3 Turbo

​Cohere Transcribe

​Text-to-Speech Models

​Hume AI TADA

​Kokoro 82M

​Voice Pipeline

​PersonaPlex

​Performance

​Custom Models

Pricing

API Reference

Models

LLM Models

Qwen 3.5 27B

Speech-to-Text Models

Whisper Large V3 Turbo

Cohere Transcribe

Text-to-Speech Models

Hume AI TADA

Kokoro 82M

Voice Pipeline

PersonaPlex

Performance

Custom Models