Completions

The completions endpoint generates text from a single prompt. For conversational use cases, prefer Chat Completions.

Create Completion

POST /v1/completions
Generate a completion for the given prompt.

Request Body

Parameter          Type     Required  Default  Description
prompt             string   Yes       —        The prompt to complete
model              string   No        gpt2     Model ID
max_tokens         integer  No        100      Maximum tokens (1-4096)
temperature        number   No        0.7      Sampling temperature (0.0-2.0)
top_p              number   No        0.9      Nucleus sampling (0.0-1.0)
top_k              integer  No        50       Top-k sampling
frequency_penalty  number   No        0.0      Frequency penalty (-2.0 to 2.0)
presence_penalty   number   No        0.0      Presence penalty (-2.0 to 2.0)
stop               array    No        —        Up to 4 stop sequences
user               string   No        —        End-user identifier
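
The optional sampling parameters above can be combined in a single request body. A sketch in TypeScript, with the field shapes taken from the table (the interface name is my own, and the values are illustrative, not recommendations):

```typescript
// Shape of the request body described in the table above.
// All fields except `prompt` are optional; defaults come from the table.
interface CompletionRequest {
  prompt: string;
  model?: string;
  max_tokens?: number;        // 1-4096, default 100
  temperature?: number;       // 0.0-2.0, default 0.7
  top_p?: number;             // 0.0-1.0, default 0.9
  top_k?: number;             // default 50
  frequency_penalty?: number; // -2.0 to 2.0, default 0.0
  presence_penalty?: number;  // -2.0 to 2.0, default 0.0
  stop?: string[];            // up to 4 stop sequences
  user?: string;
}

// Illustrative body exercising several sampling parameters at once.
const body: CompletionRequest = {
  prompt: 'Once upon a time',
  model: 'llama-3.1-8b',
  max_tokens: 100,
  temperature: 0.8,
  top_p: 0.95,
  stop: ['\n\n'],
};
```

Unset fields fall back to the defaults in the table, so only the parameters you want to override need to be sent.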

Example Request

curl -X POST https://api.ymq-01.edge.polargrid.ai:55111/v1/completions \
  -H "Authorization: Bearer pg_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Once upon a time",
    "model": "llama-3.1-8b",
    "max_tokens": 100,
    "temperature": 0.8
  }'

Response

{
  "id": "cmpl-abc123",
  "object": "text_completion",
  "created": 1234567890,
  "model": "llama-3.1-8b",
  "choices": [
    {
      "text": " in a land far away, there lived a young princess...",
      "index": 0,
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 4,
    "completion_tokens": 100,
    "total_tokens": 104
  }
}
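
A minimal sketch of consuming the response shape above in TypeScript. The type and function names are my own, not part of the SDK; the field names come from the example, where finish_reason "length" indicates the completion stopped at max_tokens:

```typescript
// Mirrors the response JSON shown above.
interface CompletionChoice {
  text: string;
  index: number;
  logprobs: unknown | null;
  finish_reason: string; // "length" in the example above: stopped at max_tokens
}

interface CompletionResponse {
  id: string;
  object: string;
  created: number;
  model: string;
  choices: CompletionChoice[];
  usage: { prompt_tokens: number; completion_tokens: number; total_tokens: number };
}

// Pull out the first completion's text and flag whether it was truncated.
function firstText(res: CompletionResponse): { text: string; truncated: boolean } {
  const choice = res.choices[0];
  return { text: choice.text, truncated: choice.finish_reason === 'length' };
}
```

A truncated completion can be detected this way and retried with a larger max_tokens if the full continuation is needed.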

Streaming

Enable streaming for real-time token generation:
for await (const chunk of client.completionStream({
  prompt: 'Once upon a time',
  model: 'llama-3.1-8b',
})) {
  process.stdout.write(chunk.choices[0].text);
}
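
If you are consuming the stream without the SDK, streaming APIs of this shape typically deliver server-sent events: one JSON chunk per `data:` line, terminated by a `data: [DONE]` sentinel. That framing is an assumption here (the docs above only show the SDK path), so verify it against the raw API before relying on it. A parsing sketch:

```typescript
// Parse an SSE payload into completion chunks. Assumes OpenAI-style
// framing (one JSON object per "data:" line, "data: [DONE]" terminator);
// this framing is NOT confirmed by the docs above.
function parseSseChunks(payload: string): Array<{ choices: { text: string }[] }> {
  const chunks: Array<{ choices: { text: string }[] }> = [];
  for (const line of payload.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed.startsWith('data:')) continue;
    const data = trimmed.slice('data:'.length).trim();
    if (data === '[DONE]') break; // end-of-stream sentinel
    chunks.push(JSON.parse(data));
  }
  return chunks;
}
```

Concatenating choices[0].text across chunks reproduces what the SDK loop above writes to stdout.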

Legacy Generate Method

The SDKs also provide a generate() method for backward compatibility. It wraps chatCompletion() internally:
const response = await client.generate({
  model: 'llama-3.1-8b',
  prompt: 'Hello, how are you?',
  maxTokens: 100,
});

console.log(response.content);
console.log(`Processing time: ${response.processingTimeMs}ms`);