# Chat Completions
The chat completions endpoint is the recommended way to generate text. It supports multi-turn conversations with system, user, and assistant messages.
This endpoint is OpenAI-compatible. If you’re migrating from OpenAI, most code will work with minimal changes.
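Because the endpoint follows the OpenAI wire format, the official OpenAI Node.js SDK can typically be pointed at it by overriding the constructor options. A minimal sketch; the base URL and key below are placeholders taken from the examples on this page:

```typescript
// Sketch: migrating from OpenAI usually amounts to changing the
// client constructor options. Pass these to `new OpenAI(...)` from
// the `openai` package; both values below are placeholders.
const clientOptions = {
  baseURL: "https://api.ymq-01.edge.polargrid.ai:55111/v1",
  apiKey: "pg_your_api_key", // placeholder key
};
```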
## Create Chat Completion

`POST /v1/chat/completions`

Generate a chat completion for the given messages.
### Request Body
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `model` | string | Yes | — | Model ID (e.g., `llama-3.1-8b`) |
| `messages` | array | Yes | — | Array of message objects |
| `max_tokens` | integer | No | 150 | Maximum tokens to generate (1-4096) |
| `temperature` | number | No | 0.7 | Sampling temperature (0.0-2.0) |
| `top_p` | number | No | 0.9 | Nucleus sampling (0.0-1.0) |
| `top_k` | integer | No | — | Top-k sampling |
| `frequency_penalty` | number | No | 0.0 | Frequency penalty (-2.0 to 2.0) |
| `presence_penalty` | number | No | 0.0 | Presence penalty (-2.0 to 2.0) |
| `stop` | array | No | — | Up to 4 stop sequences |
| `stream` | boolean | No | false | Enable streaming |
| `user` | string | No | — | End-user identifier |
Each element of `messages` is an object with a role and content:

```
{
  "role": "system" | "user" | "assistant",
  "content": "message text"
}
```
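In TypeScript, the message and request shapes can be modeled roughly as follows. The field names and defaults come from the table above; the type names themselves are our own:

```typescript
// Illustrative TypeScript model of the request body; type names are ours.
type Role = "system" | "user" | "assistant";

interface Message {
  role: Role;
  content: string;
}

interface ChatCompletionRequest {
  model: string;
  messages: Message[];
  max_tokens?: number;        // default 150, range 1-4096
  temperature?: number;       // default 0.7, range 0.0-2.0
  top_p?: number;             // default 0.9, range 0.0-1.0
  top_k?: number;
  frequency_penalty?: number; // default 0.0, range -2.0 to 2.0
  presence_penalty?: number;  // default 0.0, range -2.0 to 2.0
  stop?: string[];            // up to 4 stop sequences
  stream?: boolean;           // default false
  user?: string;
}

const request: ChatCompletionRequest = {
  model: "llama-3.1-8b",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is the capital of France?" },
  ],
  max_tokens: 100,
};
```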
### Example Request
```bash
curl -X POST https://api.ymq-01.edge.polargrid.ai:55111/v1/chat/completions \
  -H "Authorization: Bearer pg_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "max_tokens": 100,
    "temperature": 0.7
  }'
```
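The same request can be issued from Node.js with the built-in `fetch`. A sketch; `buildRequest` is our own helper, not part of any SDK, and the key is a placeholder:

```typescript
// Sketch: assembling the request for Node's built-in fetch.
// buildRequest is an illustrative helper; URL and key are placeholders.
function buildRequest(baseUrl: string, apiKey: string, body: object) {
  return {
    url: `${baseUrl}/v1/chat/completions`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}

const { url, init } = buildRequest(
  "https://api.ymq-01.edge.polargrid.ai:55111",
  "pg_your_api_key",
  {
    model: "llama-3.1-8b",
    messages: [{ role: "user", content: "What is the capital of France?" }],
    max_tokens: 100,
  },
);
// const res = await fetch(url, init);
// const data = await res.json();
```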
### Response
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "llama-3.1-8b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}
```
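Reading the reply and token usage out of this object is straightforward. A sketch, where the `response` literal mirrors the example above:

```typescript
// Sketch: extracting the assistant's reply and token usage
// from a chat completion response (mirrors the example above).
const response = {
  choices: [
    {
      index: 0,
      message: { role: "assistant", content: "The capital of France is Paris." },
      finish_reason: "stop",
    },
  ],
  usage: { prompt_tokens: 25, completion_tokens: 8, total_tokens: 33 },
};

const reply = response.choices[0].message.content;
// total_tokens is the sum of prompt and completion tokens.
const billed = response.usage.total_tokens;
```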
## Streaming
For real-time responses, enable streaming (`stream: true` in the request body) and consume the chunks as they arrive:

```typescript
for await (const chunk of client.chatCompletionStream({
  model: 'llama-3.1-8b',
  messages: [{ role: 'user', content: 'Tell me a story' }],
})) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) {
    process.stdout.write(content);
  }
}
```
Each chunk is delivered as a Server-Sent Event; the stream ends with a literal `[DONE]` sentinel:

```text
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":"The"}}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" capital"}}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{},"finish_reason":"stop"}]}
data: [DONE]
```
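A minimal decoder for this wire format might look like the following sketch. `parseSSELine` is our own helper, not a library function:

```typescript
// Sketch: decoding the "data:" lines of a streaming response.
// parseSSELine is an illustrative helper, not part of any SDK.
interface StreamChunk {
  choices: { delta: { content?: string }; finish_reason?: string }[];
}

function parseSSELine(line: string): StreamChunk | "done" | null {
  if (!line.startsWith("data: ")) return null; // skip blank/comment lines
  const payload = line.slice("data: ".length);
  if (payload === "[DONE]") return "done";     // end-of-stream sentinel
  return JSON.parse(payload) as StreamChunk;
}

// Accumulate the delta fragments into the full completion text.
const lines = [
  'data: {"choices":[{"delta":{"content":"The"}}]}',
  'data: {"choices":[{"delta":{"content":" capital"}}]}',
  'data: {"choices":[{"delta":{},"finish_reason":"stop"}]}',
  "data: [DONE]",
];

let text = "";
for (const line of lines) {
  const chunk = parseSSELine(line);
  if (chunk === "done" || chunk === null) continue;
  text += chunk.choices[0]?.delta?.content ?? "";
}
// text === "The capital"
```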
## Finish Reasons
| Reason | Description |
|---|---|
| `stop` | Natural completion or stop sequence hit |
| `length` | Max tokens reached |
| `content_filter` | Content filtered |
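Callers typically branch on `finish_reason`, since a `length` finish usually means the reply was truncated. A sketch; the handler and its messages are our own:

```typescript
// Sketch: reacting to the finish_reason of a completed choice.
// The descriptions here are illustrative, not API output.
type FinishReason = "stop" | "length" | "content_filter";

function describeFinish(reason: FinishReason): string {
  switch (reason) {
    case "stop":
      return "completed normally";
    case "length":
      return "truncated at max_tokens; consider raising the limit";
    case "content_filter":
      return "output was filtered";
  }
}
```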