PersonaPlex
PersonaPlex is PolarGrid’s real-time voice conversation endpoint. It streams Opus audio in both directions over a single WebSocket, with a persona prompt and voice selected at connect time.
PersonaPlex vs. the Modular Pipeline Agent
| Dimension | PersonaPlex (multi-modal) | Modular Pipeline Agent |
|---|---|---|
| Model architecture | Single multi-modal model (audio in / audio out) | Three models: STT → LLM → TTS |
| Latency profile | One model inference per turn | Three model calls per turn; LLM output is streamed into TTS at sentence boundaries rather than after generation completes |
| Persona / prompt | Text persona query parameter at connect time | System prompt passed to the LLM |
| Voice selection | One of the PersonaPlex voice IDs (NATF0…) | Any voice supported by the selected TTS model |
| Model substitution | Not supported (models are bundled) | Any STT / LLM / TTS from GET /v1/models |
| Barge-in | Handled inside the multi-modal model | User speech cancels in-flight LLM and TTS |
| Structured events | Audio frames + transcript text frames | JSON event stream (see Modular Pipeline Agent) |
| Function calling | Not applicable (audio-native model) | Via the LLM step, once native tool-use is available |
PersonaPlex is not listed in GET /v1/models — that endpoint only returns the request/response models (Whisper, Llama, Kokoro). PersonaPlex is a separate streaming service with its own endpoint and wire protocol.
Connecting
Connecting is a two-step flow: exchange your API key for a short-lived JWT, then open a WebSocket to the voice endpoint with that JWT.
1. Exchange your API key for a JWT
TOKEN=$(curl -s -X POST https://autorouter.edge.polargrid.ai/v1/auth/token \
  -H "Authorization: Bearer $API_KEY" | jq -r .token)
2. Open the WebSocket
wss://api.<region>.edge.polargrid.ai/v1/voice/personaplex
?voice=NATF0
&persona=<url-encoded persona prompt>
&token=<JWT>
Default region is yto-01 (Toronto) if you omit it.
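A small helper can keep the query string correctly encoded. The sketch below assumes the hostname pattern shown above. `buildPersonaPlexUrl` and its option names are illustrative and not part of any PolarGrid SDK, and defaulting `region` to `yto-01` on the client side is an assumption that simply mirrors the documented default.

```javascript
// Hypothetical helper: builds the PersonaPlex WebSocket URL from its parts.
// The function and option names are illustrative, not part of a PolarGrid SDK.
function buildPersonaPlexUrl({ region = 'yto-01', voice, persona, token }) {
  const url = new URL(`wss://api.${region}.edge.polargrid.ai/v1/voice/personaplex`);
  url.searchParams.set('voice', voice);
  url.searchParams.set('persona', persona); // URLSearchParams handles the URL encoding
  if (token) url.searchParams.set('token', token); // leave out when the JWT travels as a subprotocol
  return url.toString();
}
```

For example, `buildPersonaPlexUrl({ voice: 'NATF0', persona: 'A friendly tour guide.', token })` targets the Toronto region, while passing `region: 'yvr-02'` targets a different host.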
As an alternative to the token query parameter, you can pass the JWT via the WebSocket subprotocol header:
Sec-WebSocket-Protocol: bearer.<JWT>
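In a browser, the `Sec-WebSocket-Protocol` header can only be set through the second argument of the `WebSocket` constructor. A minimal sketch, assuming the server accepts the `bearer.<JWT>` value exactly as shown above: `openWithBearerSubprotocol` is a hypothetical helper, and its `WebSocketImpl` parameter exists only so the function can be exercised outside a browser. Some browsers fail the connection if the server does not echo back one of the offered subprotocols; this page does not say whether PersonaPlex does, so treat that as something to verify.

```javascript
// Hypothetical helper: passes the JWT as a WebSocket subprotocol instead of a query parameter.
function openWithBearerSubprotocol(url, token, WebSocketImpl = globalThis.WebSocket) {
  // The second constructor argument is sent as the Sec-WebSocket-Protocol request header.
  const ws = new WebSocketImpl(url, [`bearer.${token}`]);
  ws.binaryType = 'arraybuffer'; // all PersonaPlex frames are binary
  return ws;
}
```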
Voices
Pass one of the following as the voice query parameter. Do not include the .pt suffix.
| Group | Voices |
|---|---|
| NATF | NATF0–NATF3 |
| NATM | NATM0–NATM3 |
| VARF | VARF0–VARF4 |
| VARM | VARM0–VARM4 |
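Checking the voice ID on the client before connecting avoids spending a token exchange on a request that cannot succeed. The sketch below expands the table above into a flat list. `VOICE_GROUPS`, `VOICE_IDS`, and `isValidVoice` are illustrative names, and the server remains the authority on which voices exist.

```javascript
// Number of voices per group, taken from the table above.
const VOICE_GROUPS = { NATF: 4, NATM: 4, VARF: 5, VARM: 5 };

// Expands the groups into NATF0..NATF3, NATM0..NATM3, VARF0..VARF4, VARM0..VARM4.
const VOICE_IDS = Object.entries(VOICE_GROUPS).flatMap(([group, count]) =>
  Array.from({ length: count }, (_, i) => `${group}${i}`)
);

// Hypothetical client-side check. IDs that still carry the .pt suffix are rejected.
function isValidVoice(id) {
  return VOICE_IDS.includes(id);
}
```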
Wire protocol
All frames are binary. The first byte is a type tag; the remaining bytes are the payload.
Client → Server
| Tag | Payload | Notes |
|---|---|---|
| 0x01 | Opus audio | Ogg container, mono, 24 kHz |
| 0x02 | UTF-8 text | Inject text into the conversation |
| 0x03 | Control frame (bos or eos) | Stream boundary markers |
Server → Client
| Tag | Payload | Notes |
|---|---|---|
| 0x00 | Handshake | Sent once, immediately after connect |
| 0x01 | Opus audio | Generated speech |
| 0x02 | UTF-8 text | Transcript tokens |
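Because every frame is one tag byte followed by the payload, framing reduces to a pair of small functions. The following is a sketch with illustrative names (`TAG`, `encodeFrame`, `encodeTextFrame`, `decodeFrame`). The byte encoding of the `bos` and `eos` control payloads is not specified on this page, so the sketch does not attempt to build 0x03 frames.

```javascript
// Tag values from the tables above. The constant names are illustrative.
const TAG = { HANDSHAKE: 0x00, AUDIO: 0x01, TEXT: 0x02, CONTROL: 0x03 };

// Prepends the one-byte type tag to a payload (Uint8Array).
function encodeFrame(tag, payload = new Uint8Array(0)) {
  const frame = new Uint8Array(1 + payload.length);
  frame[0] = tag;
  frame.set(payload, 1);
  return frame;
}

// Convenience wrapper for 0x02 text frames.
function encodeTextFrame(text) {
  return encodeFrame(TAG.TEXT, new TextEncoder().encode(text));
}

// Splits a received binary message (ArrayBuffer or Uint8Array) into tag and payload.
function decodeFrame(data) {
  const bytes = data instanceof Uint8Array ? data : new Uint8Array(data);
  if (bytes.length === 0) throw new Error('empty frame');
  return { tag: bytes[0], payload: bytes.subarray(1) };
}
```

With these in place, injecting text into a conversation would look like `ws.send(encodeTextFrame('Please switch to French.'))`.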
Gotchas
Wait for the 0x00 handshake before sending any audio. The first audio bytes you send must be the Opus Ogg BOS (beginning-of-stream) page. If audio arrives before the handshake — or without a valid BOS page — the server closes the connection with code 1000.
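One way to enforce both rules on the client is to buffer outgoing audio until the handshake arrives and to check that the stream opens with a BOS page. The sketch below relies on the Ogg page header layout (the `OggS` capture pattern in bytes 0 to 3, with the beginning-of-stream bit `0x02` in the header-type byte at offset 5). `isOggBosPage` and `createAudioGate` are hypothetical helpers, and the sketch assumes each chunk passed to `sendAudio` is a run of Ogg bytes in stream order.

```javascript
// Returns true if `bytes` starts with an Ogg page whose header has the BOS flag set.
// Ogg page header: bytes 0-3 are "OggS", byte 4 is the version, byte 5 holds the flags.
function isOggBosPage(bytes) {
  return (
    bytes.length >= 6 &&
    bytes[0] === 0x4f && bytes[1] === 0x67 && bytes[2] === 0x67 && bytes[3] === 0x53 &&
    (bytes[5] & 0x02) !== 0
  );
}

// Hypothetical gate: holds outgoing audio until the 0x00 handshake has arrived,
// and refuses to start the stream with anything other than a BOS page.
function createAudioGate(send) {
  let ready = false;   // set once the handshake frame has been seen
  let started = false; // set once a valid BOS page has been accepted
  const pending = [];

  function sendAudio(oggBytes) {
    if (!started && !isOggBosPage(oggBytes)) {
      throw new Error('first audio chunk must begin with an Ogg BOS page');
    }
    started = true;
    const frame = new Uint8Array(1 + oggBytes.length);
    frame[0] = 0x01; // client -> server audio tag
    frame.set(oggBytes, 1);
    if (ready) send(frame);
    else pending.push(frame);
  }

  function onHandshake() {
    ready = true;
    while (pending.length > 0) send(pending.shift()); // flush in original order
  }

  return { sendAudio, onHandshake };
}
```

A caller would create the gate with `createAudioGate((frame) => ws.send(frame))`, call `onHandshake()` when a frame tagged 0x00 arrives, and route every encoder chunk through `sendAudio`.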
Disable your WebSocket library’s heartbeat / ping. The upstream moshi runtime does not answer RFC 6455 ping frames with pongs, so a client-side ping timer will time out and tear down an otherwise healthy session. Most libraries expose this as a ping_interval or heartbeat option; set it to 0 or None.
Sessions are billed by wall-clock duration. Close the socket as soon as the conversation is idle; an open connection keeps accruing cost even with no audio flowing.
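A simple idle watchdog can guard against forgotten sockets. The sketch below is one possible approach, not a PolarGrid feature: `createIdleCloser` is a hypothetical helper, the timeout value is up to you, and what counts as activity (microphone speech, received audio, UI interaction) is an application decision.

```javascript
// Hypothetical idle watchdog: calls `close` once no activity has been reported for `idleMs`.
// `timers` is injectable so the logic can be tested without real delays.
function createIdleCloser(close, idleMs, timers = {
  setTimeout: (fn, ms) => setTimeout(fn, ms),
  clearTimeout: (handle) => clearTimeout(handle),
}) {
  let handle = null;
  let stopped = false;

  // Report activity: restarts the idle countdown.
  function touch() {
    if (stopped) return;
    if (handle !== null) timers.clearTimeout(handle);
    handle = timers.setTimeout(() => {
      stopped = true;
      close();
    }, idleMs);
  }

  // Cancel the watchdog, for example when the user ends the call manually.
  function stop() {
    stopped = true;
    if (handle !== null) timers.clearTimeout(handle);
    handle = null;
  }

  touch(); // start counting from creation
  return { touch, stop };
}
```

In use, `const idle = createIdleCloser(() => ws.close(1000, 'idle'), 30000)` would close the socket after 30 seconds without a call to `idle.touch()`.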
Minimal example
// Step 1: exchange the API key for a short-lived JWT.
// API_KEY is assumed to be defined by your application.
const token = await fetch('https://autorouter.edge.polargrid.ai/v1/auth/token', {
  method: 'POST',
  headers: { Authorization: `Bearer ${API_KEY}` },
}).then(r => r.json()).then(j => j.token);

// Step 2: open the WebSocket with voice, persona, and token as query parameters.
const url = new URL('wss://api.yvr-02.edge.polargrid.ai/v1/voice/personaplex');
url.searchParams.set('voice', 'NATF0');
url.searchParams.set('persona', 'A friendly tour guide for Vancouver.');
url.searchParams.set('token', token);

const ws = new WebSocket(url);
ws.binaryType = 'arraybuffer';

// Every frame is binary: byte 0 is the type tag, the rest is the payload.
// startMicEncoderAndSendOggBosPage, playOpusAudio, and appendTranscript are
// placeholders for functions your application provides.
ws.onmessage = (ev) => {
  const buf = new Uint8Array(ev.data);
  const tag = buf[0];
  const payload = buf.subarray(1);
  if (tag === 0x00) {
    // Handshake received, so it is now safe to start streaming audio.
    startMicEncoderAndSendOggBosPage();
  } else if (tag === 0x01) {
    playOpusAudio(payload);
  } else if (tag === 0x02) {
    appendTranscript(new TextDecoder().decode(payload));
  }
};