
PersonaPlex

PersonaPlex is PolarGrid’s real-time voice conversation endpoint. It streams Opus audio in both directions over a single WebSocket, with a persona prompt and voice selected at connect time.

PersonaPlex vs. the Modular Pipeline Agent

| Dimension | PersonaPlex (multi-modal) | Modular Pipeline Agent |
| --- | --- | --- |
| Model architecture | Single multi-modal model (audio in / audio out) | Three models: STT → LLM → TTS |
| Latency profile | One model inference per turn | Three model calls per turn; LLM output is streamed into TTS at sentence boundaries rather than after generation completes |
| Persona / prompt | Text persona query parameter at connect time | System prompt passed to the LLM |
| Voice selection | One of the PersonaPlex voice IDs (NATF0…) | Any voice supported by the selected TTS model |
| Model substitution | Not supported (models are bundled) | Any STT / LLM / TTS from GET /v1/models |
| Barge-in | Handled inside the multi-modal model | User speech cancels in-flight LLM and TTS |
| Structured events | Audio frames + transcript text frames | JSON event stream (see Modular Pipeline Agent) |
| Function calling | Not applicable (audio-native model) | Via the LLM step, once native tool-use is available |

PersonaPlex is not listed in GET /v1/models; that endpoint only returns the request/response models (Whisper, Llama, Kokoro). PersonaPlex is a separate streaming service with its own endpoint and wire protocol.

Connecting

Connecting is a two-step flow: exchange your API key for a short-lived JWT, then open a WebSocket to the voice endpoint with that JWT.

1. Exchange your API key for a JWT

TOKEN=$(curl -s -X POST https://autorouter.edge.polargrid.ai/v1/auth/token \
  -H "Authorization: Bearer $API_KEY" | jq -r .token)

2. Open the WebSocket

wss://api.<region>.edge.polargrid.ai/v1/voice/personaplex
  ?voice=NATF0
  &persona=<url-encoded persona prompt>
  &token=<JWT>

Default region is yto-01 (Toronto) if you omit it. As an alternative to the token query parameter, you can pass the JWT via the WebSocket subprotocol header:

Sec-WebSocket-Protocol: bearer.<JWT>
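
The connection parameters above can be assembled programmatically. The following is a minimal sketch, not an official SDK: the helper names `buildPersonaPlexUrl` and `bearerSubprotocol` are hypothetical, and the sketch assumes the region always appears in the hostname, falling back to yto-01 when the caller does not supply one.

```javascript
// Hypothetical helpers for illustration; these are not part of any PolarGrid SDK.

// Builds the WebSocket URL from the documented query parameters.
// Assumption: an omitted region falls back to 'yto-01', the documented default.
function buildPersonaPlexUrl({ region = 'yto-01', voice, persona, token }) {
  const url = new URL(`wss://api.${region}.edge.polargrid.ai/v1/voice/personaplex`);
  url.searchParams.set('voice', voice);
  url.searchParams.set('persona', persona); // searchParams URL-encodes the prompt
  if (token) url.searchParams.set('token', token); // omit when using the subprotocol
  return url.toString();
}

// Formats the JWT for the Sec-WebSocket-Protocol alternative.
function bearerSubprotocol(token) {
  return `bearer.${token}`;
}

// Usage with the subprotocol instead of the token query parameter:
// const ws = new WebSocket(
//   buildPersonaPlexUrl({ voice: 'NATF0', persona: 'A friendly tour guide.' }),
//   [bearerSubprotocol(token)],
// );
```

Passing the JWT as a subprotocol keeps it out of the URL, which can matter if URLs end up in proxy or access logs.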

Voices

Pass one of the following as the voice query parameter. Do not include the .pt suffix.

| Group | Voices |
| --- | --- |
| NATF | NATF0 to NATF3 |
| NATM | NATM0 to NATM3 |
| VARF | VARF0 to VARF4 |
| VARM | VARM0 to VARM4 |
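
A client can check the voice ID before connecting. The sketch below is illustrative only: `normalizeVoiceId` and `VOICE_GROUPS` are hypothetical names, and it assumes each group runs contiguously from index 0 up to the highest index listed in the table, which the page does not state explicitly.

```javascript
// Assumption: each group spans index 0 through the highest index in the table.
const VOICE_GROUPS = { NATF: 3, NATM: 3, VARF: 4, VARM: 4 };

// Strips a trailing ".pt" and checks the ID against the assumed ranges.
// Returns the cleaned ID, or throws if it does not match a known group.
function normalizeVoiceId(input) {
  const id = String(input).trim().replace(/\.pt$/i, '');
  const match = /^([A-Z]{4})(\d+)$/.exec(id);
  if (!match) throw new Error(`Unrecognized voice ID: ${input}`);
  const [, group, index] = match;
  const maxIndex = VOICE_GROUPS[group];
  if (maxIndex === undefined || Number(index) > maxIndex) {
    throw new Error(`Unknown voice ID: ${input}`);
  }
  return id;
}
```

Validating on the client gives a clearer error than a rejected connection, but the server remains the source of truth for which voices exist.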

Wire protocol

All frames are binary. The first byte is a type tag; the remaining bytes are the payload.

Client → Server

| Tag | Payload | Notes |
| --- | --- | --- |
| 0x01 | Opus audio | Ogg container, mono, 24 kHz |
| 0x02 | UTF-8 text | Inject text into the conversation |
| 0x03 | Control frame (bos or eos) | Stream boundary markers |

Server → Client

| Tag | Payload | Notes |
| --- | --- | --- |
| 0x00 | Handshake | Sent once, immediately after connect |
| 0x01 | Opus audio | Generated speech |
| 0x02 | UTF-8 text | Transcript tokens |
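
The tag-plus-payload framing can be sketched as a pair of small helpers. This is an illustration of the layout described above, not a reference implementation: the names `ClientTag`, `ServerTag`, `encodeFrame`, `encodeTextFrame`, and `decodeFrame` are hypothetical, and the payload encoding of the 0x03 control frame is not specified on this page, so the sketch leaves it to the caller.

```javascript
// Tag values taken from the tables above; the constant names are illustrative.
const ClientTag = Object.freeze({ AUDIO: 0x01, TEXT: 0x02, CONTROL: 0x03 });
const ServerTag = Object.freeze({ HANDSHAKE: 0x00, AUDIO: 0x01, TEXT: 0x02 });

// Prepends the one-byte type tag to a binary payload.
function encodeFrame(tag, payload = new Uint8Array(0)) {
  const frame = new Uint8Array(1 + payload.length);
  frame[0] = tag;
  frame.set(payload, 1);
  return frame;
}

// Convenience wrapper for injecting UTF-8 text into the conversation.
function encodeTextFrame(text) {
  return encodeFrame(ClientTag.TEXT, new TextEncoder().encode(text));
}

// Splits an incoming binary message into its tag and payload.
// Accepts either an ArrayBuffer (browser 'arraybuffer' mode) or a Uint8Array.
function decodeFrame(data) {
  const bytes = data instanceof Uint8Array ? data : new Uint8Array(data);
  if (bytes.length === 0) throw new Error('Empty frame');
  return { tag: bytes[0], payload: bytes.subarray(1) };
}
```

For example, `ws.send(encodeTextFrame('Please speak more slowly.'))` would send a 0x02 frame, and `decodeFrame(ev.data)` can replace the manual slicing in a message handler.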

Gotchas

Wait for the 0x00 handshake before sending any audio. The first audio bytes you send must be the Opus Ogg BOS (beginning-of-stream) page. If audio arrives before the handshake — or without a valid BOS page — the server closes the connection with code 1000.
Disable your WebSocket library’s heartbeat / ping. The upstream moshi runtime does not answer RFC 6455 ping frames with pongs, so a client-side ping timer will time out and tear down an otherwise healthy session. Most libraries expose this as a ping_interval or heartbeat option; set it to 0 or None.
Sessions are billed by wall-clock duration. Close the socket as soon as the conversation is idle; an open connection keeps accruing cost even with no audio flowing.
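
The handshake and BOS requirements can be enforced on the client with a small gate in front of the socket. The following is a sketch under stated assumptions: `isOggBosPage` and `AudioGate` are hypothetical names, each call to `push` is assumed to receive one or more whole Ogg pages from your encoder, and audio produced before the handshake is buffered rather than dropped. The BOS check relies on the Ogg page layout, where the page starts with the bytes "OggS" and bit 0x02 of the header-type byte (offset 5) marks the beginning of a stream.

```javascript
// True if the bytes start with an Ogg page that has the BOS flag set.
function isOggBosPage(bytes) {
  return bytes.length >= 6 &&
    bytes[0] === 0x4f && bytes[1] === 0x67 && // 'O' 'g'
    bytes[2] === 0x67 && bytes[3] === 0x53 && // 'g' 'S'
    (bytes[5] & 0x02) !== 0;                  // header_type BOS bit
}

// Holds audio back until the 0x00 handshake arrives, and refuses to start
// the stream with anything other than a BOS page.
class AudioGate {
  constructor(send) {
    this.send = send;     // function that takes a Uint8Array frame, e.g. (f) => ws.send(f)
    this.ready = false;   // set once the handshake has been seen
    this.started = false; // set once the BOS page has been forwarded
    this.pending = [];    // pages produced before the handshake (unbounded in this sketch)
  }

  // Call when a server frame with tag 0x00 is received.
  onHandshake() {
    this.ready = true;
    const queued = this.pending;
    this.pending = [];
    for (const page of queued) this.forward(page);
  }

  // Call with each chunk of Ogg pages produced by the encoder.
  push(page) {
    if (this.ready) this.forward(page);
    else this.pending.push(page);
  }

  forward(page) {
    if (!this.started) {
      if (!isOggBosPage(page)) throw new Error('First audio payload must be an Ogg BOS page');
      this.started = true;
    }
    const frame = new Uint8Array(1 + page.length);
    frame[0] = 0x01; // client audio tag
    frame.set(page, 1);
    this.send(frame);
  }
}
```

Failing loudly on a missing BOS page is a design choice for the sketch; it surfaces encoder misconfiguration locally instead of as a closed connection.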

Minimal example

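// API_KEY is assumed to be defined by your application.
const res = await fetch('https://autorouter.edge.polargrid.ai/v1/auth/token', {
  method: 'POST',
  headers: { Authorization: `Bearer ${API_KEY}` },
});
if (!res.ok) throw new Error(`Token exchange failed: HTTP ${res.status}`);
const { token } = await res.json();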

const url = new URL('wss://api.yvr-02.edge.polargrid.ai/v1/voice/personaplex');
url.searchParams.set('voice', 'NATF0');
url.searchParams.set('persona', 'A friendly tour guide for Vancouver.');
url.searchParams.set('token', token);

const ws = new WebSocket(url);
ws.binaryType = 'arraybuffer';

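// startMicEncoderAndSendOggBosPage, playOpusAudio, and appendTranscript are
// application-provided helpers, not defined here.
ws.onmessage = (ev) => {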
  const buf = new Uint8Array(ev.data);
  const tag = buf[0];
  const payload = buf.subarray(1);

  if (tag === 0x00) {
    // Handshake received — now it's safe to start streaming audio.
    startMicEncoderAndSendOggBosPage();
  } else if (tag === 0x01) {
    playOpusAudio(payload);
  } else if (tag === 0x02) {
    appendTranscript(new TextDecoder().decode(payload));
  }
};
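
Because sessions are billed by wall-clock duration, a client may want to close the socket automatically after a period of inactivity. The sketch below is one possible approach, not a documented feature: `closeWhenIdle` is a hypothetical helper, the 30-second default is an arbitrary assumption, and what counts as "activity" is left to the caller. The scheduler arguments exist only so the timer can be swapped out in tests.

```javascript
// Closes the socket with a normal closure (1000) if touch() has not been
// called for idleMs milliseconds. Returns handles to reset or cancel the timer.
function closeWhenIdle(ws, idleMs = 30000, schedule = setTimeout, cancel = clearTimeout) {
  let timer = null;

  const stop = () => {
    if (timer !== null) cancel(timer);
    timer = null;
  };

  const touch = () => {
    stop();
    timer = schedule(() => {
      timer = null;
      ws.close(1000, 'idle');
    }, idleMs);
  };

  touch(); // start counting as soon as the helper is installed
  return { touch, stop };
}

// Usage: call idle.touch() whenever audio is sent or received, and
// idle.stop() from ws.onclose so no timer outlives the connection.
// const idle = closeWhenIdle(ws, 15000);
```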