API Overview

PolarGrid provides an OpenAI-compatible REST API, making it easy to migrate existing applications or use familiar patterns.

Base URL

All inference traffic targets a regional edge gateway directly:

https://api.{region}.edge.polargrid.ai

Available regions (10):

yto-01 — Toronto
yto-02 — Toronto 02
yul-01 — Montreal
yul-02 — Montreal 02
yvr-02 — Vancouver
nyc-01 — New York
nyc-02 — New York 02
dal-01 — Dallas
dal-02 — Dallas 02
sfo-01 — San Francisco
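As a minimal sketch of the URL template above, the helper below builds a base URL from a region code and rejects codes that are not in the list. The names `REGIONS` and `edge_base_url` are illustrative and not part of any PolarGrid SDK.

```python
# Region codes copied from the list above.
REGIONS = {
    "yto-01", "yto-02", "yul-01", "yul-02", "yvr-02",
    "nyc-01", "nyc-02", "dal-01", "dal-02", "sfo-01",
}


def edge_base_url(region: str) -> str:
    """Return the edge gateway base URL for a known region code."""
    if region not in REGIONS:
        raise ValueError(f"unknown region: {region!r}")
    return f"https://api.{region}.edge.polargrid.ai"


print(edge_base_url("yto-01"))  # https://api.yto-01.edge.polargrid.ai
```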

Picking a region

There are two patterns:

Auto-route via the autorouter (recommended). Make a single GET https://autorouter.polargrid.ai/v1/route request to discover the best edge for the caller: the nearest edge in the caller’s country, or the globally nearest edge if none is available there. Then use the returned endpoint as the base URL for all subsequent requests. The SDKs do this internally; see Regions.
Pin a specific region. Skip the autorouter and call https://api.{region}.edge.polargrid.ai directly when you need predictable routing.
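The two patterns can be sketched as one resolver function. This is an illustration under stated assumptions, not the SDKs' actual implementation: the autorouter's response shape is not documented here, so the `"endpoint"` key is a guess, and `fetch_route` and `resolve_base_url` are hypothetical names.

```python
import json
import urllib.request
from typing import Callable, Optional

AUTOROUTER_URL = "https://autorouter.polargrid.ai/v1/route"


def fetch_route() -> dict:
    """GET the autorouter once and return its parsed JSON body."""
    with urllib.request.urlopen(AUTOROUTER_URL, timeout=10) as resp:
        return json.load(resp)


def resolve_base_url(
    pinned_region: Optional[str] = None,
    fetch: Callable[[], dict] = fetch_route,
) -> str:
    """Pick the base URL: a pinned region if given, else ask the autorouter."""
    if pinned_region is not None:
        # Pin a region: skip discovery entirely for predictable routing.
        return f"https://api.{pinned_region}.edge.polargrid.ai"
    # Auto-route: one discovery call, then reuse the result for every request.
    route = fetch()
    # ASSUMPTION: the response exposes the edge URL under an "endpoint" key.
    return route["endpoint"].rstrip("/")
```

Call `resolve_base_url()` once at startup and cache the result. Passing `fetch` as a parameter keeps the discovery call replaceable in tests.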

autorouter.polargrid.ai is a discovery service, not an inference proxy. It serves GET /v1/route only. Sending POST /v1/chat/completions (or any other inference request) to the autorouter returns 403, because CloudFront on that distribution allows only cacheable requests.

Authentication

Edge endpoints accept your pg_* API key directly — attach it as a bearer token on every request. See Authentication for full details.

# Send the API key directly to any edge endpoint.
curl -s "https://api.yto-01.edge.polargrid.ai/v1/models" \
  -H "Authorization: Bearer pg_your_api_key"

Get your API key from the Console.
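For reference, the same authenticated call can be sketched with the Python standard library. The environment variable name `POLARGRID_API_KEY` and the helper `models_request` are assumptions made for illustration. The request is built but not sent.

```python
import os
import urllib.request


def models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET /v1/models request."""
    return urllib.request.Request(
        f"{base_url}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


# POLARGRID_API_KEY is a hypothetical variable name; the fallback is a placeholder.
api_key = os.environ.get("POLARGRID_API_KEY", "pg_your_api_key")
req = models_request("https://api.yto-01.edge.polargrid.ai", api_key)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req, timeout=10)
```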

Endpoints

Text Inference

Method	Endpoint	Description
POST	`/v1/chat/completions`	Chat completions (recommended)
POST	`/v1/completions`	Text completions

Audio

Method	Endpoint	Description
POST	`/v1/audio/speech`	Text-to-speech
POST	`/v1/audio/transcriptions`	Speech-to-text

Models

Method	Endpoint	Description
GET	`/v1/models`	List available models
GET	`/v1/models/status`	Get model loading status

Operator endpoints (superadmin only)

The model-lifecycle and GPU endpoints below act on node-global, multi-tenant state. They require a superadmin-scoped credential issued only to PolarGrid operators — standard pg_* API keys receive 403 Forbidden. You never need these for inference; models are pre-deployed per region (see Model Loading and GPU).

Method	Endpoint	Description
POST	`/v1/models/load`	Load a model into GPU memory
POST	`/v1/models/unload`	Unload a model
POST	`/v1/models/unload-all`	Unload all models
GET	`/v1/gpu/status`	Detailed GPU status
GET	`/v1/gpu/memory`	GPU memory usage
POST	`/v1/gpu/purge`	Clear GPU memory
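A client that wants to fail fast, rather than wait for a 403 from the gateway, could check requests against the table above before sending them. This is only a sketch; `OPERATOR_ENDPOINTS` and `requires_superadmin` are hypothetical names.

```python
# Method/path pairs copied from the operator endpoint table above.
OPERATOR_ENDPOINTS = {
    ("POST", "/v1/models/load"),
    ("POST", "/v1/models/unload"),
    ("POST", "/v1/models/unload-all"),
    ("GET", "/v1/gpu/status"),
    ("GET", "/v1/gpu/memory"),
    ("POST", "/v1/gpu/purge"),
}


def requires_superadmin(method: str, path: str) -> bool:
    """Return True if a standard pg_* key would receive 403 on this call."""
    return (method.upper(), path) in OPERATOR_ENDPOINTS


print(requires_superadmin("POST", "/v1/models/load"))  # True
print(requires_superadmin("GET", "/v1/models/status"))  # False
```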

Health

Method	Endpoint	Description
GET	`/health`	Service health check

Request Format

All POST requests accept JSON:

curl -X POST https://api.yto-01.edge.polargrid.ai/v1/chat/completions \
  -H "Authorization: Bearer pg_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-3.5-27b",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 100
  }'
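The same request can be sketched in Python with only the standard library. The helper name `chat_request` is illustrative. It builds the request without sending it, so the body and headers can be inspected first.

```python
import json
import urllib.request


def chat_request(base_url: str, api_key: str, model: str, prompt: str,
                 max_tokens: int = 100) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/chat/completions request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = chat_request("https://api.yto-01.edge.polargrid.ai",
                   "pg_your_api_key", "qwen-3.5-27b", "Hello")
print(req.data.decode("utf-8"))
# To send it: urllib.request.urlopen(req, timeout=60)
```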

Response Format

Responses are JSON with this structure:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "qwen-3.5-27b",
  "choices": [...],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30
  }
}
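As a small illustration of consuming this structure, the sketch below reads the usage block into a typed record. The contents of `choices` are elided above, so the sample leaves that list empty rather than guessing its shape. `Usage` and `parse_usage` are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


def parse_usage(raw: str) -> Usage:
    """Extract the usage block from a chat completion response body."""
    usage = json.loads(raw)["usage"]
    return Usage(
        prompt_tokens=usage["prompt_tokens"],
        completion_tokens=usage["completion_tokens"],
        total_tokens=usage["total_tokens"],
    )


# Sample body mirroring the structure above; "choices" is left empty on purpose.
sample = """{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "qwen-3.5-27b",
  "choices": [],
  "usage": {"prompt_tokens": 10, "completion_tokens": 20, "total_tokens": 30}
}"""
usage = parse_usage(sample)
print(usage.total_tokens)  # 30
```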

Errors

Errors return an HTTP status code and a JSON body containing a detail string (FastAPI format):

{
  "detail": "Invalid API key"
}

Status	Description
400	Bad request (validation error)
401	Unauthorized (invalid API key)
403	Forbidden (e.g., a standard API key used on an operator endpoint)
404	Not found
429	Rate limit exceeded
500	Server error
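One way a client might turn these responses into exceptions is sketched below. `PolarGridError` and `error_from_response` are hypothetical names, and the retry rule (retry on 429 and 5xx only) is an assumption, not documented PolarGrid behavior.

```python
import json


class PolarGridError(Exception):
    """Hypothetical exception type carrying the HTTP status and detail string."""

    def __init__(self, status: int, detail: str) -> None:
        super().__init__(f"HTTP {status}: {detail}")
        self.status = status
        self.detail = detail

    @property
    def retryable(self) -> bool:
        # Assumption: rate limits and server errors are worth retrying; others are not.
        return self.status == 429 or self.status >= 500


def error_from_response(status: int, body: str) -> PolarGridError:
    """Build an error from a non-2xx response, tolerating non-JSON bodies."""
    try:
        parsed = json.loads(body)
        detail = parsed.get("detail", body) if isinstance(parsed, dict) else body
    except json.JSONDecodeError:
        detail = body
    return PolarGridError(status, str(detail))


err = error_from_response(401, '{"detail": "Invalid API key"}')
print(err)  # HTTP 401: Invalid API key
```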

Streaming

To stream a response, set "stream": true in the request body:

curl -X POST https://api.yto-01.edge.polargrid.ai/v1/chat/completions \
  -H "Authorization: Bearer pg_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-3.5-27b",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'

Streaming responses use Server-Sent Events (SSE). See the Streaming Guide for details.
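A minimal sketch of reading an SSE stream line by line follows. It implements only the `data:` field of the generic SSE format and assumes an OpenAI-style `[DONE]` sentinel marks the end of the stream, which this page does not confirm. The sample payloads are placeholders and do not reflect PolarGrid's chunk format. `iter_sse_events` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each SSE event from decoded text lines."""
    data_parts: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            # Per the SSE format, one optional space after the colon is dropped.
            value = line[5:]
            data_parts.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and data_parts:
            # A blank line terminates the event; multi-line data joins with "\n".
            payload = "\n".join(data_parts)
            data_parts = []
            # ASSUMPTION: an OpenAI-style "[DONE]" sentinel ends the stream.
            if payload == "[DONE]":
                return
            yield json.loads(payload)
        # Comment lines (":...") and other fields (event:, id:) are ignored.


# Placeholder events, not real PolarGrid chunks.
sample = [
    'data: {"n": 1}\n', "\n",
    'data: {"n": 2}\n', "\n",
    "data: [DONE]\n", "\n",
]
print([event["n"] for event in iter_sse_events(sample)])  # [1, 2]
```

In practice the lines would come from the HTTP response body, decoded as UTF-8.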

OpenAI Compatibility

PolarGrid’s REST API follows OpenAI’s endpoint structure and request/response formats, so tools that speak the OpenAI wire protocol (e.g., curl, LangChain, LiteLLM) can target PolarGrid with a base URL change.

The PolarGrid Python and JavaScript SDKs use their own method signatures (e.g., client.chat_completion({...}) instead of client.chat.completions.create(...)) and are not drop-in replacements for the OpenAI SDK. See the SDK docs for details.

Known limitations vs. OpenAI:

Feature	Status	Notes
Tool/function calling (`tools`, `tool_choice`, `functions`, `function_call`)	Limited	The gateway processes tools and parses `tool_calls` from model output, but currently deployed models (`qwen-3.5-9b`, `qwen-3.5-27b`) have limited tool-calling reliability. See Chat Completions.
`response_format` default for TTS	Differs	REST defaults to `pcm`; OpenAI defaults to `mp3`. The PolarGrid SDKs default to `mp3` (batch) / `opus` (streaming) for parity. Pass `response_format` explicitly to avoid surprises.
Streaming TTS formats	Partial	Only `pcm` and `opus` are streamable. `wav` and `mp3` are batch-only.
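The TTS format rules in the table can be sketched as a client-side check that always makes `response_format` explicit. `STREAMABLE_TTS_FORMATS` and `check_tts_format` are hypothetical names. The check rejects only the streaming case the table rules out and leaves all other validation to the server.

```python
# Streamable formats taken from the table above; wav and mp3 are batch-only.
STREAMABLE_TTS_FORMATS = {"pcm", "opus"}


def check_tts_format(response_format: str, streaming: bool) -> str:
    """Return the format unchanged, or raise if it cannot be streamed."""
    if streaming and response_format not in STREAMABLE_TTS_FORMATS:
        allowed = ", ".join(sorted(STREAMABLE_TTS_FORMATS))
        raise ValueError(
            f"{response_format!r} is not streamable; use one of: {allowed}"
        )
    return response_format


# Always pass response_format explicitly: REST defaults to pcm, OpenAI to mp3.
print(check_tts_format("mp3", streaming=False))  # mp3
print(check_tts_format("opus", streaming=True))  # opus
```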
