
API Overview

PolarGrid provides an OpenAI-compatible REST API, making it easy to migrate existing applications or use familiar patterns.

Base URL

All inference traffic targets a regional edge gateway directly:
https://api.{region}.edge.polargrid.ai
Available regions (10):
  • yto-01 — Toronto
  • yto-02 — Toronto 02
  • yul-01 — Montreal
  • yul-02 — Montreal 02
  • yvr-02 — Vancouver
  • nyc-01 — New York
  • nyc-02 — New York 02
  • dal-01 — Dallas
  • dal-02 — Dallas 02
  • sfo-01 — San Francisco
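The URL template and region list above can be combined into a small helper. This is a minimal sketch: the region codes are copied from the list, while the names `REGIONS` and `edge_base_url` are hypothetical and not part of any PolarGrid SDK.

```python
# Region codes copied from the list above. REGIONS and edge_base_url are
# hypothetical names used only for this illustration.
REGIONS = {
    "yto-01": "Toronto",
    "yto-02": "Toronto 02",
    "yul-01": "Montreal",
    "yul-02": "Montreal 02",
    "yvr-02": "Vancouver",
    "nyc-01": "New York",
    "nyc-02": "New York 02",
    "dal-01": "Dallas",
    "dal-02": "Dallas 02",
    "sfo-01": "San Francisco",
}


def edge_base_url(region: str) -> str:
    """Return the edge gateway base URL for a listed region code."""
    if region not in REGIONS:
        raise ValueError(
            f"unknown region {region!r}; expected one of {sorted(REGIONS)}"
        )
    return f"https://api.{region}.edge.polargrid.ai"


print(edge_base_url("yto-01"))
```

Rejecting unlisted codes up front turns a typo such as `yvr-01` into an immediate error instead of a DNS failure at request time.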

Picking a region

There are two patterns:
  1. Auto-route via the autorouter (recommended). Make a single GET https://autorouter.polargrid.ai/v1/route request to discover the best edge for the caller: the nearest edge in the caller’s country, or the globally nearest edge if none is available there. Then use the returned endpoint as the base URL for all subsequent requests. The SDKs do this internally; see Regions.
  2. Pin a specific region. Skip the autorouter and hit https://api.{region}.edge.polargrid.ai directly when you need predictable routing.
autorouter.polargrid.ai is a discovery service, not an inference proxy. It serves GET /v1/route only. Sending POST /v1/chat/completions (or any other inference verb) to the autorouter returns 403 — CloudFront on that distribution only allows cacheable requests.
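The discover-then-pin flow could look roughly like the sketch below. The shape of the `/v1/route` response is not documented on this page, so the `endpoint` field name is an assumption, as is calling the autorouter without an Authorization header. The function names and the fallback behaviour are hypothetical.

```python
import json
import urllib.request
from typing import Callable

AUTOROUTER_URL = "https://autorouter.polargrid.ai/v1/route"


def http_get_json(url: str) -> dict:
    """Plain GET that decodes a JSON body (no auth header; an assumption)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def discover_base_url(
    fetch: Callable[[str], dict] = http_get_json,
    fallback_region: str = "yto-01",
) -> str:
    """Ask the autorouter once, then return a base URL for later requests.

    Falls back to a pinned region if discovery fails or the response does
    not contain a usable endpoint.
    """
    fallback = f"https://api.{fallback_region}.edge.polargrid.ai"
    try:
        route = fetch(AUTOROUTER_URL)
    except (OSError, ValueError):  # network failure or invalid JSON
        return fallback
    # ASSUMPTION: the response is a JSON object with an "endpoint" field.
    endpoint = route.get("endpoint") if isinstance(route, dict) else None
    if isinstance(endpoint, str) and endpoint.startswith("https://"):
        return endpoint.rstrip("/")
    return fallback


# Typical use: call once at startup and reuse the result.
# base_url = discover_base_url()
```

Taking `fetch` as a parameter keeps the network call replaceable, so the selection logic can be exercised without reaching the autorouter.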

Authentication

Edge endpoints accept your pg_* API key directly — attach it as a bearer token on every request. See Authentication for full details.
# Send the API key directly to any edge endpoint.
curl -s "https://api.yto-01.edge.polargrid.ai/v1/models" \
  -H "Authorization: Bearer pg_your_api_key"
Get your API key from the Console.
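For callers not using curl, the same bearer-token header can be attached with Python's standard library. This is a sketch only: `models_request` is a hypothetical helper, and the `pg_` prefix check simply reflects the key format described above.

```python
import urllib.request


def models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET /v1/models request."""
    if not api_key.startswith("pg_"):
        raise ValueError("expected a PolarGrid API key starting with 'pg_'")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


req = models_request("https://api.yto-01.edge.polargrid.ai", "pg_your_api_key")
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req, timeout=10)
```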

Endpoints

Text Inference

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /v1/chat/completions | Chat completions (recommended) |
| POST | /v1/completions | Text completions |

Audio

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /v1/audio/speech | Text-to-speech |
| POST | /v1/audio/transcriptions | Speech-to-text |

Models

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /v1/models | List available models |
| GET | /v1/models/status | Get model loading status |

Operator endpoints (superadmin only)

The model-lifecycle and GPU endpoints below act on node-global, multi-tenant state. They require a superadmin-scoped credential issued only to PolarGrid operators — standard pg_* API keys receive 403 Forbidden. You never need these for inference; models are pre-deployed per region (see Model Loading and GPU).
| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /v1/models/load | Load a model into GPU memory |
| POST | /v1/models/unload | Unload a model |
| POST | /v1/models/unload-all | Unload all models |
| GET | /v1/gpu/status | Detailed GPU status |
| GET | /v1/gpu/memory | GPU memory usage |
| POST | /v1/gpu/purge | Clear GPU memory |

Health

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /health | Service health check |

Request Format

All POST requests accept JSON:
curl -X POST https://api.yto-01.edge.polargrid.ai/v1/chat/completions \
  -H "Authorization: Bearer pg_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-3.5-27b",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 100
  }'
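The same request can be assembled in Python with only the standard library. This is a sketch that mirrors the curl call above; `chat_request` is a hypothetical helper, not an SDK function, and it builds the request without sending it.

```python
import json
import urllib.request


def chat_request(base_url, api_key, model, messages, max_tokens=100):
    """Build a POST /v1/chat/completions request with a JSON body."""
    body = json.dumps(
        {"model": model, "messages": messages, "max_tokens": max_tokens}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = chat_request(
    "https://api.yto-01.edge.polargrid.ai",
    "pg_your_api_key",
    "qwen-3.5-27b",
    [{"role": "user", "content": "Hello"}],
)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req, timeout=30)
```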

Response Format

Responses are JSON with this structure:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "qwen-3.5-27b",
  "choices": [...],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30
  }
}
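A response of this shape might be unpacked as in the sketch below. The `choices` array is elided in the example above, so the entry used here (`message.content`, `finish_reason`) is an assumption based on OpenAI's chat completion format, which this API states it follows. `summarize_completion` is a hypothetical name.

```python
import json

# Sample body. The "choices" entry is an ASSUMPTION modelled on OpenAI's
# chat completion format; the other fields are copied from the example above.
SAMPLE = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "qwen-3.5-27b",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Hi there"},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 10, "completion_tokens": 20, "total_tokens": 30}
}
"""


def summarize_completion(resp: dict) -> dict:
    """Pull out the first choice's text and the token accounting."""
    choices = resp.get("choices") or []
    text = None
    if choices:
        text = (choices[0].get("message") or {}).get("content")
    usage = resp.get("usage") or {}
    return {
        "model": resp.get("model"),
        "text": text,
        "total_tokens": usage.get("total_tokens"),
    }


print(summarize_completion(json.loads(SAMPLE)))
```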

Errors

Errors return a standard HTTP status code and a JSON body with a detail string (FastAPI format):
{
  "detail": "Invalid API key"
}
| Status | Description |
|--------|-------------|
| 400 | Bad request (validation error) |
| 401 | Unauthorized (invalid API key) |
| 404 | Not found |
| 429 | Rate limit exceeded |
| 500 | Server error |
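A client could turn these responses into exceptions along the lines of the sketch below. `PolarGridError` and `error_from_response` are hypothetical names, not part of any SDK. Treating 429 and 500 as retryable is an assumption; this page does not describe a retry policy.

```python
import json
from typing import Optional

# Descriptions copied from the status table above.
STATUS_MEANINGS = {
    400: "Bad request (validation error)",
    401: "Unauthorized (invalid API key)",
    404: "Not found",
    429: "Rate limit exceeded",
    500: "Server error",
}
# ASSUMPTION: only rate limits and server errors are worth retrying.
RETRYABLE = {429, 500}


class PolarGridError(Exception):
    """Hypothetical exception type used only in this example."""

    def __init__(self, status: int, detail: str):
        super().__init__(f"HTTP {status}: {detail}")
        self.status = status
        self.detail = detail
        self.retryable = status in RETRYABLE


def error_from_response(status: int, body: bytes) -> Optional[PolarGridError]:
    """Return an error for 4xx/5xx responses, or None on success."""
    if status < 400:
        return None
    detail = STATUS_MEANINGS.get(status, "Unknown error")
    try:
        parsed = json.loads(body)
    except ValueError:  # body was not JSON, keep the generic description
        parsed = None
    if isinstance(parsed, dict) and isinstance(parsed.get("detail"), str):
        detail = parsed["detail"]
    return PolarGridError(status, detail)


print(error_from_response(401, b'{"detail": "Invalid API key"}'))
```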

Streaming

For streaming responses, set stream: true:
curl -X POST https://api.yto-01.edge.polargrid.ai/v1/chat/completions \
  -H "Authorization: Bearer pg_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-3.5-27b",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'
Streaming responses use Server-Sent Events (SSE). See the Streaming Guide for details.
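The sketch below shows one way to consume such a stream once the response body has been split into text lines. It makes three assumptions not confirmed on this page: each event carries its JSON on a single `data:` line, the stream ends with the OpenAI-style `data: [DONE]` sentinel, and chunks carry text under `choices[].delta.content`. The function names are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each `data:` line.

    Simplification: assumes one `data:` line per event, although SSE
    permits multi-line data fields.
    """
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # ASSUMPTION: OpenAI-style end marker
            return
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate the text deltas from a streamed chat completion."""
    parts = []
    for event in iter_sse_events(lines):
        for choice in event.get("choices", []):
            piece = (choice.get("delta") or {}).get("content")
            if piece:
                parts.append(piece)
    return "".join(parts)


sample = [
    'data: {"choices": [{"delta": {"content": "Once"}}]}\n',
    "\n",
    'data: {"choices": [{"delta": {"content": " upon"}}]}\n',
    "\n",
    "data: [DONE]\n",
]
print(collect_text(sample))
```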

OpenAI Compatibility

PolarGrid’s REST API follows OpenAI’s endpoint structure and request/response formats, so tools that speak the OpenAI wire protocol (e.g., curl, LangChain, LiteLLM) can target PolarGrid with a base URL change.
The PolarGrid Python and JavaScript SDKs use their own method signatures (e.g., client.chat_completion({...}) instead of client.chat.completions.create(...)) and are not drop-in replacements for the OpenAI SDK. See the SDK docs for details.
Known limitations vs. OpenAI:
| Feature | Status | Notes |
|---------|--------|-------|
| Tool/function calling (tools, tool_choice, functions, function_call) | Limited | The gateway processes tools and parses tool_calls from model output, but currently deployed models (qwen-3.5-9b, qwen-3.5-27b) have limited tool-calling reliability. See Chat Completions. |
| response_format default for TTS | Differs | REST defaults to pcm; OpenAI defaults to mp3. The PolarGrid SDKs default to mp3 (batch) / opus (streaming) for parity. Pass response_format explicitly to avoid surprises. |
| Streaming TTS formats | Partial | Only pcm and opus are streamable. wav and mp3 are batch-only. |
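The two TTS rows suggest always passing response_format explicitly and checking it against the streaming constraint before sending a request. The sketch below encodes only what the table states: pcm and opus are streamable, wav and mp3 are batch-only. `check_tts_format` is a hypothetical helper, and how streaming is requested on the TTS endpoint is not covered here.

```python
# Formats named in the table above.
STREAMABLE_FORMATS = {"pcm", "opus"}
BATCH_ONLY_FORMATS = {"wav", "mp3"}


def check_tts_format(response_format: str, streaming: bool) -> str:
    """Normalize an explicit response_format and reject batch-only
    formats when the caller intends to stream."""
    fmt = response_format.strip().lower()
    if not fmt:
        raise ValueError("pass response_format explicitly; REST defaults to pcm")
    if streaming and fmt not in STREAMABLE_FORMATS:
        raise ValueError(
            f"{fmt!r} is not streamable; use one of {sorted(STREAMABLE_FORMATS)}"
        )
    return fmt


print(check_tts_format("opus", streaming=True))
```

Failing fast on the client keeps a request for streamed mp3 from reaching the gateway, where the outcome is not described on this page.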