Rate Limits
PolarGrid enforces rate limits to ensure fair usage and platform stability. These limits apply uniformly across all inference endpoints (LLM, STT, TTS) and all models.
Current Limits
| Limit | Value | Scope |
|---|---|---|
| Requests per minute | 100 | Per API key (or client IP if unauthenticated) |
- Request rate limit: Each API key may make up to 100 requests per one-minute window. The window resets at each minute boundary. If no API key is provided, the limit is applied per client IP address instead.
This limit applies identically to all endpoints: /v1/chat/completions, /v1/completions, /v1/audio/speech, and /v1/audio/transcriptions.
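Because the window resets at each minute boundary, a client that has used up its allowance only needs to wait until the next boundary. The sketch below shows that calculation. It assumes the gateway's windows align with wall-clock (UTC) minutes and that the local clock is reasonably close to the server's; neither is confirmed here. The helper name `msUntilNextWindow` is hypothetical.

```javascript
// Hypothetical helper: milliseconds remaining until the next window boundary.
// Assumption: windows are aligned to wall-clock minutes, so boundaries fall
// on multiples of windowMs since the Unix epoch.
function msUntilNextWindow(nowMs = Date.now(), windowMs = 60_000) {
  // Exactly on a boundary, a fresh window has just begun, so the full
  // window length remains.
  return windowMs - (nowMs % windowMs);
}

// 15 seconds into a minute leaves 45 seconds until the reset.
console.log(msUntilNextWindow(5 * 60_000 + 15_000)); // 45000
```

If the clocks drift, this value can be off by the amount of drift, so treat it as an estimate and still handle 429 responses.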
What Happens When You Hit a Limit
When the limit is exceeded, the gateway returns an HTTP 429 Too Many Requests response:
{
  "error": {
    "message": "Rate limit exceeded",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}
The SDK surfaces this as a RateLimitError with a retryAfter property indicating how many seconds to wait before retrying. See Error Handling for the full error type reference.
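One way to use `retryAfter` is to turn it into a wait time before the next attempt. The sketch below assumes the error object exposes `status` (429) and `retryAfter` in seconds, as described above. The helper name `retryDelayMs` and the one-second fallback are illustrative choices, not part of the SDK.

```javascript
// Hypothetical helper: decide how long to wait after a failed request.
// Assumption: rate-limit errors carry `status === 429` and `retryAfter`
// expressed in seconds.
function retryDelayMs(error, fallbackMs = 1000) {
  // Not a rate-limit error: signal that this helper does not apply.
  if (error?.status !== 429) return null;

  const seconds = Number(error.retryAfter);
  // Fall back when retryAfter is missing, non-numeric, or not positive.
  return Number.isFinite(seconds) && seconds > 0 ? seconds * 1000 : fallbackMs;
}

console.log(retryDelayMs({ status: 429, retryAfter: 7 })); // 7000
```

A caller would sleep for the returned number of milliseconds before retrying, and rethrow the error when the helper returns `null`.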
Retry Strategy
Use exponential backoff with jitter to avoid thundering-herd problems when retrying after a 429:
async function withBackoff(fn, maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Rethrow anything that is not a 429, and give up after the final attempt
      if (error.status !== 429 || attempt === maxRetries - 1) {
        throw error;
      }
      // Base delay before the next attempt: 1s, 2s, 4s, 8s (with the default maxRetries = 5)
      const baseDelay = Math.pow(2, attempt) * 1000;
      const jitter = Math.random() * 1000; // up to 1s of random jitter
      await new Promise(r => setTimeout(r, baseDelay + jitter));
    }
  }
}

const response = await withBackoff(() =>
  client.chatCompletion({
    model: 'qwen-3.5-9b',
    messages: [{ role: 'user', content: 'Hello' }],
  })
);
The PolarGrid SDKs include built-in retry with exponential backoff for transient errors (including 429s). Configure this via the maxRetries option when initializing the client. The example above is for custom retry logic beyond the defaults.
Best Practices for Production
Distribute requests across edges
Use the autorouter to spread traffic across multiple edge nodes. Each edge node maintains its own rate limit counters, so distributing requests reduces the chance of hitting limits on any single node.
const client = new PolarGrid({
  apiKey: 'pg_...',
  // Autorouter selects the nearest available edge
  baseUrl: 'https://autorouter.polargrid.ai',
});
Implement client-side throttling
Rather than relying on server-side 429 responses, proactively throttle requests in your application:
// Simple fixed-window rate limiter
class RateLimiter {
  constructor(maxRequests = 90, windowMs = 60_000) {
    this.tokens = maxRequests;
    this.maxTokens = maxRequests;
    this.windowMs = windowMs;
    this.lastRefill = Date.now();
  }

  // Wait until a request slot is available, then consume it
  async acquire() {
    this.refill();
    // Loop rather than check once: several callers may wake together,
    // and a timer can fire marginally early
    while (this.tokens <= 0) {
      const waitMs = Math.max(1, this.windowMs - (Date.now() - this.lastRefill));
      await new Promise(r => setTimeout(r, waitMs));
      this.refill();
    }
    this.tokens--;
  }

  // Reset the allowance once the current window has elapsed
  refill() {
    const now = Date.now();
    if (now - this.lastRefill >= this.windowMs) {
      this.tokens = this.maxTokens;
      this.lastRefill = now;
    }
  }
}

const limiter = new RateLimiter(90); // Stay under the 100/min limit

async function safeRequest(client, params) {
  await limiter.acquire();
  return client.chatCompletion(params);
}
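The limiter above restores its whole allowance at once, and its window starts when the limiter is created rather than on the server's minute boundary, so two client windows can overlap a single server window. One possible alternative is a token bucket that refills continuously. The sketch below is illustrative only: the class name `TokenBucket`, its options, and the injectable `now` clock are assumptions made for this example. With a burst capacity of 10 and a refill of 90 tokens per minute, the bucket admits at most 10 + 90 = 100 requests in any 60-second span, measured at the client.

```javascript
// Hypothetical continuously refilling token bucket (not part of the SDK).
class TokenBucket {
  constructor({ capacity = 10, refillPerMinute = 90, now = Date.now } = {}) {
    this.capacity = capacity;                  // maximum burst size
    this.ratePerMs = refillPerMinute / 60_000; // tokens earned per millisecond
    this.now = now;                            // injectable clock, useful in tests
    this.tokens = capacity;
    this.last = now();
  }

  // Credit the tokens earned since the last call, capped at capacity.
  refill() {
    const t = this.now();
    this.tokens = Math.min(this.capacity, this.tokens + (t - this.last) * this.ratePerMs);
    this.last = t;
  }

  // Returns 0 if a token was taken, otherwise the milliseconds to wait.
  tryAcquire() {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return 0;
    }
    return Math.ceil((1 - this.tokens) / this.ratePerMs);
  }

  // Resolve once a token has been taken, sleeping as needed.
  async acquire() {
    let waitMs;
    while ((waitMs = this.tryAcquire()) > 0) {
      await new Promise(r => setTimeout(r, waitMs));
    }
  }
}
```

Network latency can shift when requests actually reach the gateway, so a client-side bound like this lowers the chance of a 429 but does not remove the need for retry handling.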
Use streaming to reduce request count
A single streaming request holds one connection open while tokens are generated, rather than making multiple polling requests. This is especially effective for long responses.
// One request, streamed over a single connection
for await (const chunk of client.chatCompletionStream({
  model: 'qwen-3.5-9b',
  messages: [{ role: 'user', content: 'Write a detailed analysis...' }],
})) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
See the Streaming guide for full details.
Batch where possible
If you have multiple independent prompts, send them as separate requests, but pace them so the total stays within your rate limit rather than firing them all at once.
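A minimal way to pace a batch is to run the requests one after another with a fixed pause between them. In the sketch below, the helper name `runPaced`, the 700 ms default, and the injectable `sleep` function are assumptions made for illustration. A 700 ms pause caps the rate at roughly 85 requests per minute, which stays under the 100/min limit.

```javascript
// Hypothetical helper: run async tasks sequentially, pausing between them.
// `tasks` is an array of functions that each return a promise.
async function runPaced(
  tasks,
  intervalMs = 700,
  sleep = ms => new Promise(r => setTimeout(r, ms))
) {
  const results = [];
  for (let i = 0; i < tasks.length; i++) {
    // Pause before every task except the first.
    if (i > 0) await sleep(intervalMs);
    results.push(await tasks[i]());
  }
  return results; // same order as the input tasks
}

// Usage sketch (assumes a configured `client` and a `prompts` array):
// const replies = await runPaced(
//   prompts.map(content => () =>
//     client.chatCompletion({
//       model: 'qwen-3.5-9b',
//       messages: [{ role: 'user', content }],
//     })
//   )
// );
```

Running sequentially keeps the example simple but is slow for long generations; combining limited concurrency with a client-side limiter is a reasonable next step when throughput matters.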
Custom Limits
Enterprise customers can request custom rate limits tailored to their workload. Contact support@polargrid.ai or reach out to your account representative to discuss higher limits.