PolarGrid serves open-weight models on GPU-accelerated edge infrastructure. All models are available via our OpenAI-compatible API.
PolarGrid runs open-source models optimized for low-latency edge inference. These models are selected for speed and efficiency in real-time applications like voice AI. For workloads that require large cloud-hosted reasoning models (e.g., GPT-4, Gemini, Claude), use those providers directly — PolarGrid does not proxy requests to third-party APIs.
General-purpose large language model with strong performance across reasoning, coding, and multilingual tasks. Our largest LLM offering, suitable for complex workloads where quality is the priority.Endpoint:POST /v1/chat/completions with "model": "qwen-3.5-27b"
Fast, cost-efficient model for tasks where latency and throughput matter more than peak quality. Strong for classification, extraction, summarization, and simple generation.Endpoint:POST /v1/chat/completions with "model": "qwen-3.5-9b"
OpenAI’s Whisper model optimized for speed. Supports multilingual transcription with high accuracy.Endpoint:POST /v1/audio/transcriptions with "model": "whisper-large-v3-turbo"
High-accuracy multilingual transcription with support for 14 languages. Supports punctuation toggling.Endpoint:POST /v1/audio/transcriptions with "model": "cohere-transcribe-03-2026"
Lightweight, fast text-to-speech model. Ideal for low-latency voice applications where speed is critical.Endpoint:POST /v1/audio/speech with "model": "kokoro-82m"
Integrated voice-to-voice pipeline that combines speech recognition, language model reasoning, and speech synthesis into a single low-latency stream. Billed by wall-clock duration.See the PersonaPlex guide for setup details.
Enterprise customers can deploy custom fine-tuned models on PolarGrid infrastructure. PolarGrid handles provisioning and loading — contact us to discuss your model requirements.For custom model deployments, contact hello@polargrid.ai.