PersonaPlex

PersonaPlex is PolarGrid’s real-time voice conversation endpoint. It streams Opus audio in both directions over a single WebSocket, with a persona prompt and voice selected at connect time.

PersonaPlex vs. the Modular Pipeline Agent

| Dimension | PersonaPlex (multi-modal) | Modular Pipeline Agent |
| --- | --- | --- |
| Model architecture | Single multi-modal model (audio in / audio out) | Three models: STT → LLM → TTS |
| Latency profile | One model inference per turn | Three model calls per turn; LLM output is streamed into TTS at sentence boundaries rather than after generation completes |
| Persona / prompt | Text persona query parameter at connect time | System prompt passed to the LLM |
| Voice selection | One of the PersonaPlex voice IDs (NATF0…) | Any voice supported by the selected TTS model |
| Model substitution | Not supported (models are bundled) | Any STT / LLM / TTS from GET /v1/models |
| Barge-in | Handled inside the multi-modal model | User speech cancels in-flight LLM and TTS |
| Structured events | Audio frames + transcript text frames | JSON event stream (see Modular Pipeline Agent) |
| Function calling | Not applicable (audio-native model) | Via the LLM step, once native tool-use is available |

PersonaPlex is not listed in GET /v1/models. That endpoint only returns the Triton-served request/response models (qwen-3.5-9b, qwen-3.5-27b, whisper-large-v3-turbo, cohere-transcribe-03-2026, kokoro-82m, tada-3b-ml). PersonaPlex runs as a separate moshi-backend LiveKit agent pod with its own WebSocket endpoint and wire protocol.

Connecting

Open a WebSocket to the voice endpoint with your pg_* API key as the token query parameter.

Open the WebSocket

wss://api.<region>.edge.polargrid.ai/v1/voice/personaplex
  ?voice=NATF0
  &persona=<url-encoded persona prompt>
  &token=pg_your_api_key

The default region is yto-01 (Toronto) if you omit it. As an alternative to the token query parameter, you can pass the credential in the WebSocket subprotocol header:

Sec-WebSocket-Protocol: bearer.pg_your_api_key
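
The connect URL can be assembled programmatically so that the persona prompt is always URL-encoded. The following is a minimal sketch, not an official client: buildPersonaPlexUrl is a hypothetical helper name, and it fills in yto-01 explicitly rather than relying on how the server treats an omitted region.

```javascript
// Hypothetical helper (not part of any PolarGrid SDK): builds the connect URL
// from the query parameters described above.
function buildPersonaPlexUrl({ region = 'yto-01', voice, persona, token }) {
  const url = new URL(`wss://api.${region}.edge.polargrid.ai/v1/voice/personaplex`);
  url.searchParams.set('voice', voice);
  // searchParams.set() takes care of URL-encoding the free-text prompt.
  url.searchParams.set('persona', persona);
  // Leave the token out when the key is sent via the subprotocol header instead.
  if (token) url.searchParams.set('token', token);
  return url.toString();
}

// Usage with the query-parameter credential:
//   const ws = new WebSocket(buildPersonaPlexUrl({ voice: 'NATF0', persona: '...', token: apiKey }));
// Usage with the subprotocol credential (the second constructor argument
// becomes the Sec-WebSocket-Protocol request header):
//   const ws = new WebSocket(buildPersonaPlexUrl({ voice: 'NATF0', persona: '...' }), ['bearer.' + apiKey]);
```

Keeping the key out of the URL, as in the second usage line, avoids it appearing wherever full request URLs are logged.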

Voices

Pass one of the following as the voice query parameter. Do not include the .pt suffix.

| Group | Voices |
| --- | --- |
| NATF | NATF0 to NATF3 |
| NATM | NATM0 to NATM3 |
| VARF | VARF0 to VARF4 |
| VARM | VARM0 to VARM4 |
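
A client can normalize a voice name before connecting so that a stray .pt suffix never reaches the server. This is a sketch only: normalizeVoiceId is a hypothetical helper, and the pattern assumes each group in the table above is a contiguous range between the first and last ID shown.

```javascript
// Assumption: NATF/NATM run from 0 to 3 and VARF/VARM from 0 to 4,
// read off the voice table as inclusive ranges.
const VOICE_ID = /^(NAT[FM][0-3]|VAR[FM][0-4])$/;

// Hypothetical helper: strips a trailing ".pt" and rejects unknown IDs.
function normalizeVoiceId(name) {
  const id = name.trim().replace(/\.pt$/i, '');
  if (!VOICE_ID.test(id)) {
    throw new RangeError(`Unknown PersonaPlex voice: ${name}`);
  }
  return id;
}
```

Failing locally with a RangeError gives a clearer message than a rejected WebSocket handshake.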

Wire protocol

All frames are binary. The first byte is a type tag; the remaining bytes are the payload.

Client → Server

| Tag | Payload | Notes |
| --- | --- | --- |
| 0x01 | Opus audio | Ogg container, mono, 24 kHz |
| 0x02 | UTF-8 text | Inject text into the conversation |
| 0x03 | Control frame (bos or eos) | Stream boundary markers |

Server → Client

| Tag | Payload | Notes |
| --- | --- | --- |
| 0x00 | Handshake | Sent once, immediately after connect |
| 0x01 | Opus audio | Generated speech |
| 0x02 | UTF-8 text | Transcript tokens |
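
The one-byte tag plus payload framing described above can be sketched with two small helpers. The names TAG, encodeFrame, encodeTextFrame, and decodeFrame are illustrative and not part of any PolarGrid SDK. Control frames (0x03) are left out because their payload encoding is not specified here.

```javascript
// Type tags from the wire protocol tables.
const TAG = { HANDSHAKE: 0x00, AUDIO: 0x01, TEXT: 0x02, CONTROL: 0x03 };

// Prepend the type tag to a Uint8Array payload.
function encodeFrame(tag, payload) {
  const frame = new Uint8Array(1 + payload.length);
  frame[0] = tag;
  frame.set(payload, 1);
  return frame;
}

// Client → Server text injection: tag 0x02 followed by UTF-8 bytes.
function encodeTextFrame(text) {
  return encodeFrame(TAG.TEXT, new TextEncoder().encode(text));
}

// Split a received binary message (ArrayBuffer or Uint8Array) into tag and payload.
function decodeFrame(data) {
  const bytes = data instanceof Uint8Array ? data : new Uint8Array(data);
  if (bytes.length === 0) throw new Error('Empty frame: missing type tag');
  return { tag: bytes[0], payload: bytes.subarray(1) };
}
```

With ws.binaryType set to 'arraybuffer', decodeFrame(ev.data) can be called directly inside an onmessage handler, and ws.send(encodeTextFrame('Hello')) injects text into the conversation.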

Gotchas

Wait for the 0x00 handshake before sending any audio. The first audio bytes you send must be the Opus Ogg BOS (beginning-of-stream) page. If audio arrives before the handshake — or without a valid BOS page — the server closes the connection with code 1000.
Disable your WebSocket library’s heartbeat / ping. The upstream moshi runtime does not answer RFC 6455 ping frames with a pong, so a client-side ping timer will tear down an otherwise healthy session. Most libraries expose this as a ping_interval or heartbeat option; set it to 0 or None to turn it off.
Sessions are billed by wall-clock duration. Close the socket as soon as the conversation is idle; an open connection keeps accruing cost even with no audio flowing.
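
The ordering rule in the first gotcha can be enforced on the client with a small gate in front of the socket. This is a sketch under assumptions: isOggBosPage and createAudioGate are hypothetical names, each call to sendAudio is assumed to carry exactly one Ogg page from your encoder, and sendFrame stands in for ws.send. The BOS check relies on the Ogg page layout: the "OggS" capture pattern followed by a header-type byte at offset 5 whose 0x02 bit marks a beginning-of-stream page.

```javascript
// True if the bytes start an Ogg page flagged as beginning-of-stream.
function isOggBosPage(page) {
  return page.length > 5 &&
    page[0] === 0x4f && page[1] === 0x67 && // "O" "g"
    page[2] === 0x67 && page[3] === 0x53 && // "g" "S"
    (page[5] & 0x02) !== 0;                 // header-type BOS flag
}

// Hypothetical gate: refuses audio until the 0x00 handshake has been seen
// and until the first page offered is a BOS page.
function createAudioGate(sendFrame) {
  let handshakeSeen = false;
  let streamStarted = false;
  return {
    onHandshake() { handshakeSeen = true; },
    // Returns true if the page was sent, false if it was held back.
    sendAudio(page) {
      if (!handshakeSeen) return false;
      if (!streamStarted) {
        if (!isOggBosPage(page)) return false;
        streamStarted = true;
      }
      const frame = new Uint8Array(1 + page.length);
      frame[0] = 0x01; // Client → Server Opus audio tag
      frame.set(page, 1);
      sendFrame(frame);
      return true;
    },
  };
}
```

Because the gate discards pages rather than queueing them, start the encoder only after the handshake arrives so the BOS page itself is not lost.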

Minimal example

// API_KEY holds your pg_* API key. The three helpers called below
// (startMicEncoderAndSendOggBosPage, playOpusAudio, appendTranscript)
// are application code that you supply.
const url = new URL('wss://api.yvr-02.edge.polargrid.ai/v1/voice/personaplex');
url.searchParams.set('voice', 'NATF0');
url.searchParams.set('persona', 'A friendly tour guide for Vancouver.');
url.searchParams.set('token', API_KEY);

const ws = new WebSocket(url);
ws.binaryType = 'arraybuffer'; // every frame is binary: 1 tag byte + payload

const decoder = new TextDecoder(); // reused for all transcript frames

ws.onmessage = (ev) => {
  const buf = new Uint8Array(ev.data);
  const tag = buf[0];
  const payload = buf.subarray(1);

  if (tag === 0x00) {
    // Handshake received: now it's safe to start streaming audio.
    startMicEncoderAndSendOggBosPage();
  } else if (tag === 0x01) {
    // Generated speech (Opus audio).
    playOpusAudio(payload);
  } else if (tag === 0x02) {
    // Transcript tokens (UTF-8 text).
    appendTranscript(decoder.decode(payload));
  }
};