PersonaPlex

PersonaPlex is PolarGrid’s real-time voice conversation endpoint. It streams Opus audio in both directions over a single WebSocket, with a persona prompt and voice selected at connect time.

PersonaPlex vs. the Modular Pipeline Agent

Dimension	PersonaPlex (multi-modal)	Modular Pipeline Agent
Model architecture	Single multi-modal model (audio in / audio out)	Three models: STT → LLM → TTS
Latency profile	One model inference per turn	Three model calls per turn; LLM output is streamed into TTS at sentence boundaries rather than after generation completes
Persona / prompt	Text `persona` query parameter at connect time	System prompt passed to the LLM
Voice selection	One of the PersonaPlex voice IDs (`NATF0`…)	Any voice supported by the selected TTS model
Model substitution	Not supported (models are bundled)	Any STT / LLM / TTS from `GET /v1/models`
Barge-in	Handled inside the multi-modal model	User speech cancels in-flight LLM and TTS
Structured events	Audio frames + transcript text frames	JSON event stream (see Modular Pipeline Agent)
Function calling	Not applicable (audio-native model)	Via the LLM step, once native tool-use is available

PersonaPlex is not listed in GET /v1/models — that endpoint only returns the request/response models (Whisper, Llama, Kokoro). PersonaPlex is a separate streaming service with its own endpoint and wire protocol.

Connecting

Connecting is a two-step flow: exchange your API key for a short-lived JWT, then open a WebSocket to the voice endpoint with that JWT.

1. Exchange your API key for a JWT

TOKEN=$(curl -s -X POST https://autorouter.edge.polargrid.ai/v1/auth/token \
  -H "Authorization: Bearer $API_KEY" | jq -r .token)

2. Open the WebSocket

wss://api.<region>.edge.polargrid.ai/v1/voice/personaplex
  ?voice=NATF0
  &persona=<url-encoded persona prompt>
  &token=<JWT>

If you omit the region, it defaults to yto-01 (Toronto). As an alternative to the token query parameter, you can pass the JWT in the WebSocket subprotocol header:

Sec-WebSocket-Protocol: bearer.<JWT>
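As a rough sketch of the two ways to present the JWT, the helper below builds the connection URL and either appends the token query parameter or returns a bearer.<JWT> subprotocol value instead. The function name buildPersonaPlexConnection and its option names are illustrative and not part of the PolarGrid API; filling in yto-01 when no region is given is an assumption based on the default described above.

```javascript
// Hypothetical helper: the name and option shape are illustrative only.
function buildPersonaPlexConnection({
  region = 'yto-01', // assumption: mirror the documented default region
  voice,
  persona,
  token,
  useSubprotocol = false,
}) {
  const url = new URL(`wss://api.${region}.edge.polargrid.ai/v1/voice/personaplex`);
  url.searchParams.set('voice', voice);
  // URLSearchParams encodes the prompt for us (form-style, so spaces become "+").
  url.searchParams.set('persona', persona);

  if (useSubprotocol) {
    // The JWT travels in Sec-WebSocket-Protocol instead of the query string.
    return { url: url.toString(), protocols: [`bearer.${token}`] };
  }
  url.searchParams.set('token', token);
  return { url: url.toString(), protocols: [] };
}
```

The result can be passed straight to the standard constructor, for example new WebSocket(conn.url, conn.protocols). Using the subprotocol keeps the token out of the URL, which may matter if URLs end up in logs.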

Voices

Pass one of the following as the voice query parameter. Do not include the .pt suffix.

Group	Voices
NATF	`NATF0`–`NATF3`
NATM	`NATM0`–`NATM3`
VARF	`VARF0`–`VARF4`
VARM	`VARM0`–`VARM4`
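A client may want to validate the voice ID before connecting. The sketch below expands the ranges in the table into a flat list and strips a stray .pt suffix; normalizeVoice is a hypothetical helper name, and the group sizes are read directly from the table above.

```javascript
// Number of voices per group, taken from the table above.
const VOICE_GROUPS = { NATF: 4, NATM: 4, VARF: 5, VARM: 5 };

// Expand each group into its IDs, e.g. NATF -> NATF0..NATF3.
const VOICE_IDS = Object.entries(VOICE_GROUPS).flatMap(([group, count]) =>
  Array.from({ length: count }, (_, i) => `${group}${i}`)
);

// Hypothetical helper: drop a trailing ".pt" and reject unknown IDs.
function normalizeVoice(voice) {
  const id = voice.replace(/\.pt$/, '');
  if (!VOICE_IDS.includes(id)) {
    throw new RangeError(`Unknown PersonaPlex voice: ${voice}`);
  }
  return id;
}
```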

Wire protocol

All frames are binary. The first byte is a type tag; the remaining bytes are the payload.

Client → Server

Tag	Payload	Notes
`0x01`	Opus audio	Ogg container, mono, 24 kHz
`0x02`	UTF-8 text	Inject text into the conversation
`0x03`	Control frame (`bos` or `eos`)	Stream boundary markers

Server → Client

Tag	Payload	Notes
`0x00`	Handshake	Sent once, immediately after connect
`0x01`	Opus audio	Generated speech
`0x02`	UTF-8 text	Transcript tokens
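The tag-plus-payload framing described above can be sketched with a pair of small helpers. The names TAG, encodeFrame, encodeTextFrame, and decodeFrame are illustrative. Control frames are left out because the exact payload encoding of bos and eos is not specified here.

```javascript
// Tag values from the tables above (0x00 is server-to-client only,
// 0x03 is client-to-server only).
const TAG = { HANDSHAKE: 0x00, AUDIO: 0x01, TEXT: 0x02, CONTROL: 0x03 };

// Prefix a payload (Uint8Array) with its one-byte type tag.
function encodeFrame(tag, payload) {
  const frame = new Uint8Array(1 + payload.length);
  frame[0] = tag;
  frame.set(payload, 1);
  return frame;
}

// Build a 0x02 frame that injects UTF-8 text into the conversation.
function encodeTextFrame(text) {
  return encodeFrame(TAG.TEXT, new TextEncoder().encode(text));
}

// Split an incoming binary message (ArrayBuffer or Uint8Array)
// into its tag and payload.
function decodeFrame(data) {
  const bytes = new Uint8Array(data);
  if (bytes.length === 0) throw new Error('Empty frame');
  return { tag: bytes[0], payload: bytes.subarray(1) };
}
```

With a socket whose binaryType is 'arraybuffer', sending text would look like ws.send(encodeTextFrame('Hello')), and incoming messages can be routed on decodeFrame(ev.data).tag.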

Gotchas

Wait for the 0x00 handshake before sending any audio. The first audio bytes you send must be the Opus Ogg BOS (beginning-of-stream) page. If audio arrives before the handshake — or without a valid BOS page — the server closes the connection with code 1000.
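One way to enforce this ordering on the client is a small gate in front of the socket, sketched below. isOggBosPage relies on the standard Ogg page layout (the OggS capture pattern followed by a header-type byte whose 0x02 bit marks beginning-of-stream); createAudioGate and its send callback are hypothetical names, not part of any PolarGrid SDK.

```javascript
// True if the bytes start with an Ogg page whose header flags mark it as BOS.
// Ogg page header: "OggS" (4 bytes), version (1), header-type flags (1), ...
function isOggBosPage(bytes) {
  return (
    bytes.length >= 27 && // minimum Ogg page header size
    bytes[0] === 0x4f && bytes[1] === 0x67 && // "Og"
    bytes[2] === 0x67 && bytes[3] === 0x53 && // "gS"
    (bytes[5] & 0x02) !== 0 // beginning-of-stream flag
  );
}

// Hypothetical gate: refuses audio until the handshake has been seen and
// insists that the first audio payload begins with the BOS page.
function createAudioGate(send) {
  let handshakeSeen = false;
  let bosSent = false;
  return {
    onHandshake() { handshakeSeen = true; },
    sendAudio(oggBytes) {
      if (!handshakeSeen) throw new Error('Handshake (0x00) not received yet');
      if (!bosSent) {
        if (!isOggBosPage(oggBytes)) {
          throw new Error('First audio frame must start with the Ogg BOS page');
        }
        bosSent = true;
      }
      const frame = new Uint8Array(1 + oggBytes.length);
      frame[0] = 0x01; // client-to-server audio tag
      frame.set(oggBytes, 1);
      send(frame);
    },
  };
}
```

In practice the gate would be created with (frame) => ws.send(frame), onHandshake() called when a 0x00 frame arrives, and sendAudio() fed by the Opus encoder. Failing locally gives a clearer error than a silent close from the server.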

Disable your WebSocket library’s heartbeat / ping. The upstream moshi runtime does not answer RFC 6455 ping frames with pongs, so a client-side ping timer will time out and tear down an otherwise healthy session. Most libraries expose this as a ping_interval or heartbeat option; set it to 0 or None.

Sessions are billed by wall-clock duration. Close the socket as soon as the conversation is idle; an open connection keeps accruing cost even with no audio flowing.
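To avoid paying for a forgotten connection, a client can track the last time audio moved in either direction and close the socket after a quiet period. The sketch below is one possible shape; createIdleWatch is a hypothetical name, and the idle threshold is an application choice rather than a PolarGrid setting. The clock is injectable so the logic can be tested without real timers.

```javascript
// Hypothetical idle watchdog. `ws` only needs a close(code, reason) method,
// which the standard WebSocket object provides.
function createIdleWatch(ws, idleMs, now = Date.now) {
  let lastActivity = now();
  return {
    // Call whenever an audio frame is sent or received.
    touch() { lastActivity = now(); },
    // Call periodically; closes the socket once the idle threshold is reached.
    check() {
      if (now() - lastActivity >= idleMs) {
        ws.close(1000, 'idle');
        return true;
      }
      return false;
    },
  };
}
```

A typical wiring would call watch.touch() from the message handler and the microphone callback, then run setInterval(() => watch.check(), 1000) and clear the interval when the socket closes.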

Minimal example

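// API_KEY and the three helpers called below (startMicEncoderAndSendOggBosPage,
// playOpusAudio, appendTranscript) are supplied by your application.

// 1. Exchange the API key for a short-lived JWT.
const token = await fetch('https://autorouter.edge.polargrid.ai/v1/auth/token', {
  method: 'POST',
  headers: { Authorization: `Bearer ${API_KEY}` },
}).then(r => {
  if (!r.ok) throw new Error(`Token exchange failed: HTTP ${r.status}`);
  return r.json();
}).then(j => j.token);

// 2. Open the WebSocket with voice, persona, and token as query parameters.
const url = new URL('wss://api.yvr-02.edge.polargrid.ai/v1/voice/personaplex');
url.searchParams.set('voice', 'NATF0');
url.searchParams.set('persona', 'A friendly tour guide for Vancouver.');
url.searchParams.set('token', token);

const ws = new WebSocket(url);
ws.binaryType = 'arraybuffer'; // every frame is binary

// 3. Route each frame on its one-byte type tag.
ws.onmessage = (ev) => {
  const buf = new Uint8Array(ev.data);
  const tag = buf[0];
  const payload = buf.subarray(1);

  if (tag === 0x00) {
    // Handshake received: it is now safe to start streaming audio.
    startMicEncoderAndSendOggBosPage();
  } else if (tag === 0x01) {
    // Generated speech (Opus audio).
    playOpusAudio(payload);
  } else if (tag === 0x02) {
    // Transcript tokens (UTF-8 text).
    appendTranscript(new TextDecoder().decode(payload));
  }
};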
