Streaming

Stream response tokens in real time for a better user experience.

Why Stream?

Without streaming, users wait for the entire response before seeing anything. With streaming, tokens appear as they’re generated — critical for interactive applications.

Chat Completion Streaming

for await (const chunk of client.chatCompletionStream({
  model: 'llama-3.1-8b',
  messages: [{ role: 'user', content: 'Tell me a story about a robot' }],
  maxTokens: 500,
})) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) {
    process.stdout.write(content);
  }
  
  // Check for completion
  if (chunk.choices[0]?.finishReason) {
    console.log('\n\nFinished:', chunk.choices[0].finishReason);
  }
}
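A network error mid-stream leaves you with a partial response; it is often worth keeping what arrived rather than discarding it (see also tip 4 below). A minimal sketch, where `collectStream` is an illustrative helper (not part of the SDK) that accepts any async iterable of chunks, such as the stream returned above:

```typescript
// Collect streamed content, keeping whatever arrived if the
// connection drops mid-stream. `stream` stands in for any async
// iterable of chunks, e.g. client.chatCompletionStream(...).
// Illustrative helper, not part of the SDK.
async function collectStream(
  stream: AsyncIterable<{ choices: { delta?: { content?: string } }[] }>,
): Promise<{ content: string; error: Error | null }> {
  let content = '';
  try {
    for await (const chunk of stream) {
      content += chunk.choices[0]?.delta?.content ?? '';
    }
    return { content, error: null };
  } catch (err) {
    // Return the partial content alongside the error instead of losing it.
    return { content, error: err as Error };
  }
}
```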

Text Completion Streaming

for await (const chunk of client.completionStream({
  prompt: 'Once upon a time',
  model: 'llama-3.1-8b',
  maxTokens: 200,
})) {
  process.stdout.write(chunk.choices[0]?.text ?? '');
}

Chunk Format

Each streaming chunk contains a delta (incremental change):
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "created": 1234567890,
  "model": "llama-3.1-8b",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": "assistant",  // Only in first chunk
        "content": "Once"     // Token content
      },
      "finish_reason": null   // null until final chunk
    }
  ]
}
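Because each chunk carries only an incremental delta, the full message is the concatenation of every `delta.content`, with the final chunk supplying the finish reason. A minimal sketch of that reassembly (`accumulateChunks` and the `Chunk` interface are illustrative, not part of the SDK):

```typescript
// Minimal shape of a streaming chunk, mirroring the format above.
interface Chunk {
  choices: {
    delta: { role?: string; content?: string };
    finish_reason: string | null;
  }[];
}

// Reassemble the complete assistant message from a chunk sequence.
// Illustrative helper, not part of the SDK.
function accumulateChunks(chunks: Chunk[]): { content: string; finishReason: string | null } {
  let content = '';
  let finishReason: string | null = null;
  for (const chunk of chunks) {
    const choice = chunk.choices[0];
    if (choice?.delta?.content) content += choice.delta.content;
    if (choice?.finish_reason) finishReason = choice.finish_reason;
  }
  return { content, finishReason };
}
```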

SSE Format (Raw API)

When using the API directly with stream: true, responses use Server-Sent Events:
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant"}}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":"Once"}}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" upon"}}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{},"finish_reason":"stop"}]}

data: [DONE]
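If you consume the raw SSE stream yourself (for example via `fetch`), each event line must be handled as shown above: strip the `data:` prefix, stop at the `[DONE]` sentinel, and JSON-parse the payload. A minimal sketch of that per-line step (`parseSSELine` is an illustrative helper, not part of the SDK):

```typescript
// Parse one line of a raw SSE stream. Returns the decoded chunk
// object, or null for blank lines, comments, and the terminal
// [DONE] sentinel. Illustrative helper, not part of the SDK.
function parseSSELine(line: string): unknown | null {
  const trimmed = line.trim();
  if (!trimmed.startsWith('data:')) return null;  // blank line or SSE comment
  const payload = trimmed.slice('data:'.length).trim();
  if (payload === '[DONE]') return null;          // end-of-stream sentinel
  return JSON.parse(payload);
}
```

Note that a real consumer also has to buffer partial lines, since network reads are not guaranteed to align with line boundaries.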

Browser Example

async function streamChat(prompt) {
  const outputElement = document.getElementById('output');
  outputElement.textContent = '';

  for await (const chunk of client.chatCompletionStream({
    model: 'llama-3.1-8b',
    messages: [{ role: 'user', content: prompt }],
  })) {
    const content = chunk.choices[0]?.delta?.content;
    if (content) {
      outputElement.textContent += content;
    }
  }
}

React Hook Example

import { useState, useCallback } from 'react';

function useChatStream(client: PolarGrid) {
  const [response, setResponse] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);

  const stream = useCallback(async (messages: Message[]) => {
    setResponse('');
    setIsStreaming(true);

    try {
      for await (const chunk of client.chatCompletionStream({
        model: 'llama-3.1-8b',
        messages,
      })) {
        const content = chunk.choices[0]?.delta?.content;
        if (content) {
          setResponse(prev => prev + content);
        }
      }
    } finally {
      setIsStreaming(false);
    }
  }, [client]);

  return { response, isStreaming, stream };
}

Finish Reasons

Reason          Description
stop            Natural completion or stop sequence hit
length          Max tokens reached
content_filter  Content was filtered

Tips

  1. Always check for content: Some chunks may have empty content
  2. Handle the finish reason: Know why generation stopped
  3. Buffer if needed: For sentence-by-sentence display, buffer until punctuation
  4. Error handling: Wrap in try/catch for network errors mid-stream
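Tip 3 can be sketched as a small buffer that only releases complete sentences, holding the remainder until more tokens arrive (`flushSentences` is an illustrative helper, not part of the SDK):

```typescript
// Split the buffered text into display-ready sentences plus the
// leftover fragment to keep buffering. A sentence ends at ., !, or ?
// followed by whitespace. Illustrative helper, not part of the SDK.
function flushSentences(buffer: string): { sentences: string[]; rest: string } {
  const sentences: string[] = [];
  let rest = buffer;
  let match: RegExpMatchArray | null;
  while ((match = rest.match(/[.!?]\s+/)) !== null && match.index !== undefined) {
    const end = match.index + match[0].length;
    sentences.push(rest.slice(0, end).trimEnd());
    rest = rest.slice(end);
  }
  return { sentences, rest };
}
```

In the streaming loop, append each delta to the buffer, call `flushSentences`, display the returned sentences, and carry `rest` forward as the new buffer.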