
Model Loading

PolarGrid supports dynamic model loading: models can be loaded into GPU memory on demand and unloaded when they are no longer needed.

Load Model

POST /v1/models/load
Load a model into GPU memory.

Request Body

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `model_name` | string | Yes | - | Model ID to load |
| `force_reload` | boolean | No | `false` | Force reload even if already loaded |

Example

curl -X POST https://api.ymq-01.edge.polargrid.ai:55111/v1/models/load \
  -H "Authorization: Bearer pg_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "llama-3.1-70b",
    "force_reload": false
  }'

Response

{
  "status": "success",
  "model": "llama-3.1-70b",
  "force_reload": false,
  "message": "Model llama-3.1-70b loaded successfully"
}
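The same request can be issued without curl. The sketch below uses `fetch` (available in Node 18+); the endpoint path and field names come from the reference above, while `buildLoadRequest` and `loadModel` are illustrative helpers introduced here, not part of the PolarGrid SDK:

```typescript
// Illustrative helper that builds the load request payload.
// Field names (model_name, force_reload) match the reference above.
function buildLoadRequest(modelName: string, forceReload = false) {
  return {
    path: '/v1/models/load',
    body: JSON.stringify({ model_name: modelName, force_reload: forceReload }),
  };
}

// Sketch of the call itself (assumes a runtime with global fetch).
async function loadModel(baseUrl: string, apiKey: string, modelName: string) {
  const req = buildLoadRequest(modelName);
  const res = await fetch(`${baseUrl}${req.path}`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: req.body,
  });
  return res.json(); // { status, model, force_reload, message }
}
```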

Unload Model

POST /v1/models/unload
Unload a model from GPU memory.

Example

const result = await client.unloadModel({
  modelName: 'gpt2',
});

console.log(result.message);

Response

{
  "status": "success",
  "model": "gpt2",
  "message": "Model gpt2 unloaded successfully"
}

Unload All Models

POST /v1/models/unload-all
Unload all models from GPU memory.

Example

const result = await client.unloadAllModels();

console.log(`Unloaded ${result.totalUnloaded} models`);
console.log('Models:', result.unloadedModels);

Response

{
  "status": "success",
  "unloaded_models": ["llama-3.1-8b", "whisper-1"],
  "errors": [],
  "total_unloaded": 2
}
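Because unload-all reports per-model failures in the `errors` array rather than failing the whole call, callers should check that array instead of relying on `status` alone. The `summarizeUnload` helper below is an illustrative sketch over the response shape shown above:

```typescript
// Shape of the /v1/models/unload-all response shown above.
interface UnloadAllResponse {
  status: string;
  unloaded_models: string[];
  errors: string[];
  total_unloaded: number;
}

// Illustrative helper: summarize the result and surface partial
// failures instead of silently ignoring them.
function summarizeUnload(res: UnloadAllResponse): string {
  if (res.errors.length > 0) {
    return `Unloaded ${res.total_unloaded}, with errors: ${res.errors.join('; ')}`;
  }
  return `Unloaded ${res.total_unloaded}: ${res.unloaded_models.join(', ')}`;
}
```

For the example response above, `summarizeUnload` returns `"Unloaded 2: llama-3.1-8b, whisper-1"`.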

Get Model Status

GET /v1/models/status
Get the loading status of all models.

Example

const status = await client.getModelStatus();

console.log('Loaded models:', status.loaded);
console.log('Status:', status.loadingStatus);

Response

{
  "loaded": ["llama-3.1-8b", "whisper-1"],
  "loading_status": {
    "llama-3.1-8b": "loaded",
    "llama-3.1-70b": "unloaded",
    "whisper-1": "loaded",
    "gpt2": "unloaded"
  },
  "repository": "/models"
}

Status Values

| Status | Description |
| --- | --- |
| `loaded` | Model is in GPU memory and ready |
| `loading` | Model is currently being loaded |
| `unloaded` | Model is not in memory |
| `failed` | Model failed to load |
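A common pattern is to trigger a load and then poll `/v1/models/status` until the model reaches `loaded` or `failed`. The loop below is a sketch: `fetchStatus` is injected (in practice it would wrap the `client.getModelStatus()` call shown earlier and read its `loading_status` map), and the timing values are arbitrary:

```typescript
type ModelStatus = 'loaded' | 'loading' | 'unloaded' | 'failed';

// Terminal states from the table above: polling can stop here.
function isTerminal(status: ModelStatus): boolean {
  return status === 'loaded' || status === 'failed';
}

// Poll until the model reaches a terminal state or we give up.
// fetchStatus is injected; in practice it would wrap client.getModelStatus().
async function waitForModel(
  model: string,
  fetchStatus: () => Promise<Record<string, ModelStatus>>,
  intervalMs = 1000,
  maxAttempts = 60,
): Promise<ModelStatus> {
  for (let i = 0; i < maxAttempts; i++) {
    const statuses = await fetchStatus();
    // A model missing from the map is treated as not loaded.
    const status = statuses[model] ?? 'unloaded';
    if (isTerminal(status)) return status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Timed out waiting for ${model}`);
}
```

Note that `unloaded` is not treated as terminal here, since a model that was just requested may briefly report `unloaded` before transitioning to `loading`.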