# Chat Completions
The chat completions endpoint is the recommended way to generate text. It supports multi-turn conversations with system, user, and assistant messages.

Edge endpoints require a JWT. See Authentication for how to obtain one.
## Create Chat Completion
### Request Body
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | — | Model ID (e.g., Meta-Llama-3.1-8B-Instruct) |
| messages | array | Yes | — | Array of message objects |
| max_tokens | integer | No | 150 | Maximum tokens to generate (1-4096) |
| temperature | number | No | 0.7 | Sampling temperature (0.0-2.0) |
| top_p | number | No | 0.9 | Nucleus sampling (0.0-1.0) |
| top_k | integer | No | — | Top-k sampling |
| frequency_penalty | number | No | 0.0 | Frequency penalty (-2.0 to 2.0) |
| presence_penalty | number | No | 0.0 | Presence penalty (-2.0 to 2.0) |
| stop | array | No | — | Up to 4 stop sequences |
| stream | boolean | No | false | Enable streaming |
| user | string | No | — | End-user identifier |
Function / tool calling is not supported. The tools, functions, and function_call parameters are silently ignored by the chat completions endpoint. To use tools, parse the model's text output and dispatch tool calls in your application code.

### Message Format
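Each message object pairs a role (system, user, or assistant) with its text content. A minimal sketch of a multi-turn messages array, with illustrative content:

```python
# A multi-turn conversation: each message object pairs a role
# ("system", "user", or "assistant") with its text content.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is a JWT?"},
    {"role": "assistant", "content": "A JWT is a signed JSON Web Token."},
    {"role": "user", "content": "How long is it valid?"},
]
```

The system message, if present, conventionally comes first and steers the model's behavior for the rest of the conversation.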
### Example Request
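A minimal request sketch using only the Python standard library. The base URL and JWT below are placeholders, not real values; substitute your deployment's endpoint and a token obtained as described in Authentication:

```python
import json
import urllib.request

# Placeholder endpoint and token -- substitute your deployment's
# base URL and a valid JWT (see Authentication).
BASE_URL = "https://example.com/v1/chat/completions"
JWT = "YOUR_JWT_HERE"

payload = {
    "model": "Meta-Llama-3.1-8B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize JWTs in one sentence."},
    ],
    "max_tokens": 150,
    "temperature": 0.7,
}

def create_chat_completion(payload: dict) -> dict:
    """POST the payload with a Bearer JWT and return the parsed JSON body."""
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {JWT}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```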
### Response
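A sketch of extracting the generated text, assuming an OpenAI-compatible response shape (the field names and values below are illustrative assumptions, not actual API output):

```python
# Illustrative response body (assumed OpenAI-compatible shape, not
# captured from the live API). The generated text lives under
# choices[0].message.content; finish_reason is covered below.
response = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "model": "Meta-Llama-3.1-8B-Instruct",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "A JWT is a signed token."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 24, "completion_tokens": 9, "total_tokens": 33},
}

text = response["choices"][0]["message"]["content"]
finish = response["choices"][0]["finish_reason"]
```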
## Streaming
For real-time responses, enable streaming by setting stream to true in the request body.

### Stream Response Format

Each chunk is a Server-Sent Event.

### Finish Reasons
| Reason | Description |
|---|---|
| stop | Natural completion or stop sequence hit |
| length | Max tokens reached |
| content_filter | Content filtered |
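Putting the streaming pieces together, here is a sketch of consuming Server-Sent Event lines into text deltas. The data: framing, [DONE] sentinel, and choices[0].delta.content field names are assumptions based on common OpenAI-compatible streams, not confirmed by this page:

```python
import json

def parse_sse_chunks(lines):
    """Yield content deltas from SSE 'data:' lines.

    Assumes OpenAI-compatible chunks (choices[0].delta.content) and a
    '[DONE]' end-of-stream sentinel; adjust to the actual stream format.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip comments, blank keep-alives, other fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# Example SSE lines as they might arrive over the wire (illustrative).
sample = [
    'data: {"choices":[{"delta":{"content":"Hello"}}]}',
    'data: {"choices":[{"delta":{"content":" world"}}]}',
    "data: [DONE]",
]
text = "".join(parse_sse_chunks(sample))
```

Accumulating deltas this way lets you render partial output immediately while the final chunk's finish reason (one of the values in the table above) tells you why generation ended.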
