PolarGrid serves open-weight models on GPU-accelerated edge infrastructure. All models are available via our OpenAI-compatible API.
PolarGrid runs open-source models optimized for low-latency edge inference. These models are selected for speed and efficiency in real-time applications like voice AI. For workloads that require large cloud-hosted reasoning models (e.g., GPT-4, Gemini, Claude), use those providers directly — PolarGrid does not proxy requests to third-party APIs.
Default LLM on every edge node. Lower TTFT and higher throughput than 27B, multilingual + multimodal (image input supported). Use this for latency-sensitive voice paths.Endpoint:POST /v1/chat/completions with "model": "qwen-3.5-9b"
General-purpose large language model with strong performance across reasoning, coding, and multilingual tasks. Our largest deployed LLM, suitable for complex workloads where reply quality is the priority. Available fleet-wide on every edge node.Endpoint:POST /v1/chat/completions with "model": "qwen-3.5-27b"
OpenAI’s Whisper model optimized for speed. Supports multilingual transcription with high accuracy.Endpoint:POST /v1/audio/transcriptions with "model": "whisper-large-v3-turbo"Full model card →
High-accuracy multilingual transcription with support for 14 languages. Supports punctuation toggling. Available on yvr-02 (Blackwell production).Endpoint:POST /v1/audio/transcriptions?sync=true with "model": "cohere-transcribe-03-2026" (also accepts the alias cohere-transcribe)Latency (yvr-02 Blackwell, external bench from a Vancouver-area laptop, 100 sync runs, 2026-05-27): server inference p50 238 ms / p95 305 ms, server-only RTF p50 0.041 (24× faster than real time). End-to-end TTFB p50 1068 ms is upload-dominated — each request ships a ~300 KB WAV body before inference can start; the network leg is the upload time, not POP-to-client RTT. See the Cohere Transcribe model page for the full breakdown.
24 kHz mono — caller picks the container via response_format (pcm, wav, or mp3)
Supported languages
English, French, German, Spanish, Italian, Portuguese, Polish, Japanese, Arabic, Chinese
Capabilities
Cross-lingual voice cloning, speed control (batch only)
Streaming
✓ chunked HTTP, pcm + opus (per-token via decoupled Triton handler; speed not honored in streaming mode)
Pricing
$0.009 / min
Expressive text-to-speech with cross-lingual voice cloning. Generates natural-sounding speech across 10 languages from a short reference clip.Endpoint:POST /v1/audio/speech with "model": "tada-3b-ml"
Lightweight, fast text-to-speech model. Ideal for low-latency voice applications where speed is critical.Endpoint:POST /v1/audio/speech with "model": "kokoro-82m"Full model card →
Integrated voice-to-voice pipeline that combines speech recognition, language model reasoning, and speech synthesis into a single low-latency stream. Billed by wall-clock duration.See the PersonaPlex guide for setup details.
Enterprise customers can deploy custom fine-tuned models on PolarGrid infrastructure. PolarGrid handles provisioning and loading — contact us to discuss your model requirements.For custom model deployments, contact hello@polargrid.ai.