Skip to main content

Documentation Index

Fetch the complete documentation index at: https://polargrid.mintlify.app/llms.txt

Use this file to discover all available pages before exploring further.

Models

PolarGrid serves open-weight models on GPU-accelerated edge infrastructure. All models are available via our OpenAI-compatible API.
PolarGrid runs open-source models optimized for low-latency edge inference. These models are selected for speed and efficiency in real-time applications like voice AI. For workloads that require large cloud-hosted reasoning models (e.g., GPT-4, Gemini, Claude), use those providers directly — PolarGrid does not proxy requests to third-party APIs.

LLM Models

Qwen 3.5 27B

Parameters27B
QuantizationFP8
Max context8,192 tokens
LicenseApache 2.0
Pricing0.165/1Minputtokens,0.165 / 1M input tokens, 0.255 / 1M output tokens
General-purpose large language model with strong performance across reasoning, coding, and multilingual tasks. Our largest LLM offering, suitable for complex workloads where quality is the priority. Endpoint: POST /v1/chat/completions with "model": "qwen-3.5-27b"

Qwen 3.5 9B

Parameters9B
QuantizationFP8
Max context8,192 tokens
LicenseApache 2.0
Pricing0.055/1Minputtokens,0.055 / 1M input tokens, 0.085 / 1M output tokens
Fast, cost-efficient model for tasks where latency and throughput matter more than peak quality. Strong for classification, extraction, summarization, and simple generation. Endpoint: POST /v1/chat/completions with "model": "qwen-3.5-9b"

Speech-to-Text Models

Whisper Large V3 Turbo

Parameters809M
LicenseApache 2.0
Pricing$0.004 / min
OpenAI’s Whisper model optimized for speed. Supports multilingual transcription with high accuracy. Endpoint: POST /v1/audio/transcriptions with "model": "whisper-large-v3-turbo"

Cohere Transcribe

Parameters2B
LicenseApache 2.0
Supported languagesEnglish, French, German, Italian, Spanish, Portuguese, Greek, Dutch, Polish, Chinese, Japanese, Korean, Vietnamese, Arabic
Pricing$0.004 / min
High-accuracy multilingual transcription with support for 14 languages. Supports punctuation toggling. Endpoint: POST /v1/audio/transcriptions with "model": "cohere-transcribe-03-2026"

Text-to-Speech Models

Hume AI TADA

Parameters3B
LicenseCC-BY-NC-4.0
Output24 kHz mono — caller picks the container via response_format (pcm, wav, or mp3)
Supported languagesEnglish, French, German, Spanish, Italian, Portuguese, Polish, Japanese, Arabic, Chinese
CapabilitiesVoice cloning, speed control
Streaming✓ chunked HTTP, pcm + opus (per-token via decoupled Triton handler; speed not honored in streaming mode)
Pricing$0.008 / min
Expressive text-to-speech with voice cloning support. Generates natural-sounding speech across 10 languages. Endpoint: POST /v1/audio/speech with "model": "tada-3b-ml"

Kokoro 82M

Parameters82M
LicenseApache 2.0
Streaming✓ chunked HTTP, pcm + opus
Pricing$0.008 / min
Lightweight, fast text-to-speech model. Ideal for low-latency voice applications where speed is critical. Endpoint: POST /v1/audio/speech with "model": "kokoro-82m"

Voice Pipeline

PersonaPlex

Parameters7B
PipelineSTT + LLM + TTS (end-to-end)
Pricing$0.070 / min
Integrated voice-to-voice pipeline that combines speech recognition, language model reasoning, and speech synthesis into a single low-latency stream. Billed by wall-clock duration. See the PersonaPlex guide for setup details.

Performance

Latency benchmarks per model and region are actively being measured. Performance depends on:
  • Client proximity to the nearest edge region
  • Model size — smaller models have lower TTFT and higher throughput
  • Request complexity — token count, audio length, streaming vs. batch
The autorouter optimizes for the lowest-latency region automatically. For detailed performance data, contact us.

Custom Models

Enterprise customers can deploy custom fine-tuned models on PolarGrid infrastructure. PolarGrid handles provisioning and loading — contact us to discuss your model requirements. For custom model deployments, contact hello@polargrid.ai.

Pricing

See full pricing details and volume discounts

API Reference

Full endpoint documentation