API Overview
PolarGrid provides an OpenAI-compatible REST API, making it easy to migrate existing applications or use familiar patterns.
Base URL
All inference traffic targets a regional edge gateway directly:
https://api.{region}.edge.polargrid.ai
Available regions (10):
yto-01 — Toronto
yto-02 — Toronto 02
yul-01 — Montreal
yul-02 — Montreal 02
yvr-02 — Vancouver
nyc-01 — New York
nyc-02 — New York 02
dal-01 — Dallas
dal-02 — Dallas 02
sfo-01 — San Francisco
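As an illustration, the URL pattern and region list above can be folded into a small lookup helper. The names `REGIONS` and `edge_base_url` are hypothetical and not part of any PolarGrid SDK; treat this as a sketch rather than an official client.

```python
# Region codes and city labels copied from the list above.
REGIONS = {
    "yto-01": "Toronto",
    "yto-02": "Toronto 02",
    "yul-01": "Montreal",
    "yul-02": "Montreal 02",
    "yvr-02": "Vancouver",
    "nyc-01": "New York",
    "nyc-02": "New York 02",
    "dal-01": "Dallas",
    "dal-02": "Dallas 02",
    "sfo-01": "San Francisco",
}


def edge_base_url(region: str) -> str:
    """Return the edge gateway base URL for a known region code."""
    if region not in REGIONS:
        raise ValueError(f"unknown region: {region!r}")
    return f"https://api.{region}.edge.polargrid.ai"
```

Rejecting unknown codes up front surfaces a typo as a clear error instead of a DNS failure later.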
Picking a region
There are two patterns:
- Auto-route via the autorouter (recommended). Make a single GET https://autorouter.polargrid.ai/v1/route to discover the best edge for the caller: the nearest edge in the caller’s country, falling back to the globally nearest edge if none is available. Then use the returned endpoint as the base URL for all subsequent requests. The SDKs do this internally; see Regions.
- Pin a specific region. Skip the autorouter and hit https://api.{region}.edge.polargrid.ai directly when you need predictable routing.
autorouter.polargrid.ai is a discovery service, not an inference proxy. It serves GET /v1/route only. Sending POST /v1/chat/completions (or any other inference request) to the autorouter returns 403, because CloudFront on that distribution only allows cacheable requests.
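The discover-then-reuse pattern might look like the sketch below. The shape of the /v1/route response is not documented on this page, so the `"endpoint"` field name is an assumption, as are the helper names. Falling back to a pinned region on failure is a design choice of this sketch, and whether the route call needs authentication is not stated here.

```python
import json
import urllib.request

AUTOROUTER_URL = "https://autorouter.polargrid.ai/v1/route"


def base_url_from_route(route, fallback_region: str = "yto-01") -> str:
    """Pick the base URL out of a parsed /v1/route response.

    ASSUMPTION: the response is a JSON object whose "endpoint" field holds
    the edge URL. Check the Regions page for the real response shape.
    """
    endpoint = route.get("endpoint") if isinstance(route, dict) else None
    if isinstance(endpoint, str) and endpoint.startswith("https://"):
        return endpoint.rstrip("/")
    # Nothing usable in the response: pin a region instead.
    return f"https://api.{fallback_region}.edge.polargrid.ai"


def discover_base_url(timeout: float = 5.0) -> str:
    """Call the autorouter once (GET only) and return an edge base URL."""
    try:
        with urllib.request.urlopen(AUTOROUTER_URL, timeout=timeout) as resp:
            return base_url_from_route(json.load(resp))
    except (OSError, ValueError):
        # Network failure or a non-JSON body: use the pinned fallback.
        return base_url_from_route({})
```

Call `discover_base_url()` once at startup and reuse the result. Inference requests go to the returned edge, never to the autorouter itself.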
Authentication
Edge endpoints accept your pg_* API key directly — attach it as a bearer token on every request. See Authentication for full details.
# Send the API key directly to any edge endpoint.
curl -s "https://api.yto-01.edge.polargrid.ai/v1/models" \
-H "Authorization: Bearer pg_your_api_key"
Get your API key from the Console.
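For readers working outside curl, the same bearer-token header can be attached with Python's standard library. This is a minimal sketch: `authed_request` is a hypothetical helper, and the `pg_` prefix check is a convenience based on the key format described above, not a documented validation rule.

```python
import urllib.request


def authed_request(url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request carrying the API key as a bearer token."""
    if not api_key.startswith("pg_"):
        raise ValueError("expected an API key with the pg_ prefix")
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request. Load the key from an environment variable or secret store rather than hard-coding it.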
Endpoints
Text Inference
| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/chat/completions | Chat completions (recommended) |
| POST | /v1/completions | Text completions |
Audio
| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/audio/speech | Text-to-speech |
| POST | /v1/audio/transcriptions | Speech-to-text |
Models
| Method | Endpoint | Description |
|---|---|---|
| GET | /v1/models | List available models |
| GET | /v1/models/status | Get model loading status |
Operator endpoints (superadmin only)
The model-lifecycle and GPU endpoints below act on node-global, multi-tenant state. They require a superadmin-scoped credential issued only to PolarGrid operators — standard pg_* API keys receive 403 Forbidden. You never need these for inference; models are pre-deployed per region (see Model Loading and GPU).
| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/models/load | Load a model into GPU memory |
| POST | /v1/models/unload | Unload a model |
| POST | /v1/models/unload-all | Unload all models |
| GET | /v1/gpu/status | Detailed GPU status |
| GET | /v1/gpu/memory | GPU memory usage |
| POST | /v1/gpu/purge | Clear GPU memory |
Health
| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Service health check |
All POST requests accept JSON:
curl -X POST https://api.yto-01.edge.polargrid.ai/v1/chat/completions \
-H "Authorization: Bearer pg_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen-3.5-27b",
"messages": [{"role": "user", "content": "Hello"}],
"max_tokens": 100
}'
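A rough Python equivalent of the curl call above, using only the standard library, could be written as follows. The helper name `chat_request` and its parameters are illustrative, not a PolarGrid SDK API; the payload fields mirror the example above.

```python
import json
import urllib.request


def chat_request(base_url: str, api_key: str, prompt: str,
                 model: str = "qwen-3.5-27b",
                 max_tokens: int = 100) -> urllib.request.Request:
    """Build a POST /v1/chat/completions request with a JSON body."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` and decoding the body with `json.load` returns the parsed response.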
Responses are JSON with this structure:
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1234567890,
"model": "qwen-3.5-27b",
"choices": [...],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 20,
"total_tokens": 30
}
}
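To show how this structure is typically consumed, the sketch below pulls out the generated text and token counts. The contents of `choices` are elided above, so the `choices[0].message.content` path is an assumption based on the OpenAI chat completion format. `summarize_completion` is a hypothetical helper.

```python
def summarize_completion(response: dict) -> dict:
    """Extract the fields most callers need from a chat completion.

    ASSUMPTION: each entry in "choices" follows the OpenAI shape,
    {"message": {"role": ..., "content": ...}}.
    """
    choices = response.get("choices") or []
    text = ""
    if choices:
        text = (choices[0].get("message") or {}).get("content") or ""
    usage = response.get("usage") or {}
    return {
        "id": response.get("id"),
        "model": response.get("model"),
        "text": text,
        "total_tokens": usage.get("total_tokens"),
    }
```

Reading every field with `.get` keeps the helper tolerant of missing keys, which is useful while the exact response shape is still being confirmed.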
Errors
Errors return a standard HTTP status code and a JSON body containing a detail string (FastAPI format):
{
"detail": "Invalid API key"
}
| Status | Description |
|---|---|
| 400 | Bad request (validation error) |
| 401 | Unauthorized (invalid API key) |
| 403 | Forbidden (operator-only endpoint called with a standard API key) |
| 404 | Not found |
| 429 | Rate limit exceeded |
| 500 | Server error |
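One way a client might handle these responses is sketched below: read the `detail` string and decide whether a retry is worthwhile. Treating 429 and 500 as retryable is an assumption of this sketch, not documented PolarGrid guidance, and `parse_error` is a hypothetical helper.

```python
import json
from typing import Tuple

# ASSUMPTION: rate limits and server errors are worth retrying with backoff;
# the other statuses in the table indicate a problem with the request itself.
RETRYABLE_STATUSES = {429, 500}


def parse_error(status: int, body: bytes) -> Tuple[str, bool]:
    """Return (detail message, retryable?) for an error response."""
    try:
        payload = json.loads(body)
    except ValueError:
        payload = None  # not JSON, e.g. an upstream proxy error page
    detail = payload.get("detail") if isinstance(payload, dict) else None
    if not isinstance(detail, str):
        # Fall back to the raw body when there is no detail string.
        detail = body.decode("utf-8", errors="replace")
    return detail, status in RETRYABLE_STATUSES
```

The fallback to the raw body means a non-JSON error still produces a readable message.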
Streaming
For streaming responses, set stream: true:
curl -X POST https://api.yto-01.edge.polargrid.ai/v1/chat/completions \
-H "Authorization: Bearer pg_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen-3.5-27b",
"messages": [{"role": "user", "content": "Tell me a story"}],
"stream": true
}'
Streaming responses use Server-Sent Events (SSE). See the Streaming Guide for details.
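As a rough illustration of consuming the stream, the generator below extracts text from SSE lines. The chunk shape (`choices[].delta.content`) and the `data: [DONE]` terminator are assumptions taken from the OpenAI streaming format; the Streaming Guide is the authority on what PolarGrid actually emits. `iter_stream_text` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from an SSE stream of chat completion chunks.

    ASSUMPTION: OpenAI-style events, i.e. 'data: {json}' lines whose
    choices carry a delta.content string, ending with 'data: [DONE]'.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            piece = (choice.get("delta") or {}).get("content")
            if piece:
                yield piece
```

The function accepts any iterable of decoded lines, so it can be fed from an HTTP response read line by line or from a list in a test.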
OpenAI Compatibility
PolarGrid’s REST API follows OpenAI’s endpoint structure and request/response formats, so tools that speak the OpenAI wire protocol (e.g., curl, LangChain, LiteLLM) can target PolarGrid with a base URL change.
The PolarGrid Python and JavaScript SDKs use their own method signatures (e.g., client.chat_completion({...}) instead of client.chat.completions.create(...)) and are not drop-in replacements for the OpenAI SDK. See the SDK docs for details.
Known limitations vs. OpenAI:
| Feature | Status | Notes |
|---|---|---|
| Tool/function calling (tools, tool_choice, functions, function_call) | Limited | The gateway processes tools and parses tool_calls from model output, but currently deployed models (qwen-3.5-9b, qwen-3.5-27b) have limited tool-calling reliability. See Chat Completions. |
| response_format default for TTS | Differs | REST defaults to pcm; OpenAI defaults to mp3. The PolarGrid SDKs default to mp3 (batch) / opus (streaming) for parity. Pass response_format explicitly to avoid surprises. |
| Streaming TTS formats | Partial | Only pcm and opus are streamable. wav and mp3 are batch-only. |
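The TTS rows above suggest a simple client-side guard: always pass `response_format` explicitly and reject combinations the table marks as unsupported. The sketch below encodes only the four formats named in the table, and `check_tts_format` is a hypothetical helper rather than an SDK function.

```python
# Formats taken from the table above.
STREAMABLE_FORMATS = {"pcm", "opus"}
BATCH_ONLY_FORMATS = {"wav", "mp3"}


def check_tts_format(response_format: str, streaming: bool) -> str:
    """Return response_format if it is valid for the requested mode."""
    if streaming and response_format not in STREAMABLE_FORMATS:
        raise ValueError(
            f"{response_format!r} cannot be streamed; "
            f"use one of {sorted(STREAMABLE_FORMATS)}"
        )
    return response_format
```

Calling this before building a /v1/audio/speech request makes the chosen format visible in code, so the pcm versus mp3 default difference cannot change the output unnoticed.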