Edge Runtime

The EdgeRuntime manages an agent on constrained hardware with automatic watchdog restarts, resource monitoring, health endpoints, and graceful degradation.

Quick Start

import { Agent, ollama } from "@radaros/core";
import { EdgeRuntime, SystemToolkit, edgePreset } from "@radaros/edge";

const preset = edgePreset("pi5-8gb");

const agent = new Agent({
  name: "pi-agent",
  model: ollama(preset.recommendedModel),
  instructions: "You are a Raspberry Pi assistant.",
  tools: [...new SystemToolkit().getTools()],
});

const runtime = new EdgeRuntime({
  preset,
  agent,
  healthPort: 9090,
});

await runtime.start();

// Signal agent activity to prevent watchdog restarts
runtime.heartbeat();

// Check status
const status = runtime.getStatus();
console.log(status.state); // "running" | "degraded" | "stopped"

// Shutdown
await runtime.stop();

Presets

Use edgePreset(id) to get optimized defaults for your device:
import { edgePreset, listEdgePresets, customEdgePreset } from "@radaros/edge";

const presets = listEdgePresets();
// [{ id: "pi4-2gb", label: "..." }, { id: "pi4-4gb", label: "..." }, ...]

const preset = edgePreset("pi5-8gb");
// { recommendedModel: "phi3:mini", maxTokens: 2048, contextWindow: 16384, ... }

// Customize a preset
const custom = customEdgePreset("pi5-8gb", { maxTokens: 4096 });
| Preset | Model | Max Tokens | Context | Memory Limit |
| --- | --- | --- | --- | --- |
| pi4-2gb | tinyllama:1.1b | 256 | 2048 | 512 MB |
| pi4-4gb | tinyllama:1.1b | 512 | 4096 | 1024 MB |
| pi4-8gb | llama3.2:1b | 1024 | 8192 | 2048 MB |
| pi5-4gb | llama3.2:1b | 1024 | 8192 | 1536 MB |
| pi5-8gb | phi3:mini | 2048 | 16384 | 3072 MB |
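
The memory limits above can drive a simple preset chooser at startup. A sketch, using limits copied from the table; `MEMORY_LIMITS_MB` and `pickPreset` are hypothetical helpers, not exports of `@radaros/edge`:

```typescript
// Memory limits (MB) copied from the preset table above.
// Hypothetical helper data, not exported by @radaros/edge.
const MEMORY_LIMITS_MB: Record<string, number> = {
  "pi4-2gb": 512,
  "pi4-4gb": 1024,
  "pi4-8gb": 2048,
  "pi5-4gb": 1536,
  "pi5-8gb": 3072,
};

// Pick the largest preset whose memory limit fits the given budget.
function pickPreset(availableMb: number): string | undefined {
  const fits = Object.entries(MEMORY_LIMITS_MB)
    .filter(([, limitMb]) => limitMb <= availableMb)
    .sort(([, a], [, b]) => b - a);
  return fits[0]?.[0];
}
```

The chosen ID can then be passed straight to `edgePreset(id)`.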

Features

Watchdog

Automatically detects unresponsive agents. If no heartbeat() call is received within the timeout window, the runtime emits a watchdog-restart event.
runtime.on("watchdog-restart", ({ reason, restarts }) => {
  console.log(`Watchdog triggered: ${reason} (${restarts} total)`);
  // Recreate or restart your agent here
});
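
Conceptually, the restart decision is a staleness check on the last heartbeat timestamp. A minimal sketch of that logic (illustrative only, not the library's internals; the actual timeout value comes from the preset):

```typescript
// Illustrative staleness check: the runtime records the time of each
// heartbeat() call and triggers a restart once that timestamp is older
// than the timeout window.
function watchdogExpired(
  lastHeartbeatMs: number,
  nowMs: number,
  timeoutMs: number,
): boolean {
  return nowMs - lastHeartbeatMs > timeoutMs;
}
```

In practice this means long-running tool calls should call `runtime.heartbeat()` periodically, or the watchdog will treat the agent as hung.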

Resource Monitor

Periodically checks CPU temperature, memory, and disk usage. Emits warnings when thresholds are exceeded.
runtime.on("thermal-warning", ({ temperature, threshold }) => {
  console.log(`CPU at ${temperature}°C (threshold: ${threshold}°C)`);
});

runtime.on("memory-warning", ({ usage_percent, threshold }) => {
  console.log(`Memory at ${usage_percent}% (threshold: ${threshold}%)`);
});

runtime.on("recovered", () => {
  console.log("Resources back to normal");
});

Health Endpoint

A lightweight HTTP server on port 9090 (configurable) responds to GET /health:
{
  "state": "running",
  "uptime_ms": 3600000,
  "watchdog_restarts": 0,
  "resources": { "cpu": { ... }, "memory": { ... }, "disk": { ... } },
  "degraded_reason": null
}
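
For a liveness probe you typically only need the state field. A sketch of interpreting the payload above (the type and helper are ours; the field names follow the response shown):

```typescript
// Shape of the /health response shown above (degraded_reason is null
// unless the runtime has entered the "degraded" state).
type HealthResponse = {
  state: "running" | "degraded" | "stopped";
  uptime_ms: number;
  watchdog_restarts: number;
  degraded_reason: string | null;
};

// Treat "running" as healthy and "degraded" as alive-but-flagged; a
// Kubernetes-style liveness probe would fail only on "stopped".
function isLive(body: string): boolean {
  const health = JSON.parse(body) as HealthResponse;
  return health.state !== "stopped";
}
```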

Config

preset (string | EdgePreset, required)
Device preset ID or a custom preset object.

agent (Agent, required)
The agent instance to manage.

healthPort (number, default: 9090)
Port for the health check HTTP server.

disableHealthCheck (boolean, default: false)
Disable the health endpoint entirely.

GPU Monitoring

The ResourceMonitor automatically detects NVIDIA GPUs via nvidia-smi and includes GPU metrics in every snapshot:
import { ResourceMonitor } from "@radaros/edge";

const monitor = new ResourceMonitor({ intervalMs: 5000 });

monitor.on("snapshot", (snap) => {
  if (snap.gpu) {
    console.log(`GPU: ${snap.gpu.name}`);
    console.log(`Memory: ${snap.gpu.memoryUsedGb.toFixed(1)}/${snap.gpu.memoryTotalGb.toFixed(1)} GB`);
    console.log(`Utilization: ${snap.gpu.utilizationPercent}%`);
    console.log(`Temperature: ${snap.gpu.temperatureC}°C`);
  }
});

monitor.on("gpu-warning", (data) => {
  console.log(`GPU HBM pressure: ${data.memoryUsedGb.toFixed(1)}/${data.memoryTotalGb.toFixed(1)} GB`);
});

monitor.start();

GPU Snapshot Fields

| Field | Type | Description |
| --- | --- | --- |
| gpu.name | string | GPU model name (e.g. “NVIDIA H100 SXM”) |
| gpu.memoryUsedGb | number | Used HBM in GB |
| gpu.memoryTotalGb | number | Total HBM in GB |
| gpu.utilizationPercent | number | GPU compute utilization (0–100) |
| gpu.temperatureC | number | GPU temperature in Celsius |
The gpu-warning event fires when GPU memory usage exceeds the memoryThreshold (default 85%). GPU monitoring is automatic — if nvidia-smi is not available (e.g. on CPU-only machines), the gpu field is simply omitted from snapshots.
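
The check behind gpu-warning amounts to a percentage comparison. A sketch (our helper, not the monitor's internals; the default of 85 mirrors the stated memoryThreshold default):

```typescript
// Returns true when used HBM exceeds thresholdPercent of total (default
// 85%, mirroring memoryThreshold). Illustrative helper, not part of the
// @radaros/edge package.
function gpuMemoryPressure(
  memoryUsedGb: number,
  memoryTotalGb: number,
  thresholdPercent = 85,
): boolean {
  if (memoryTotalGb <= 0) return false;
  return (memoryUsedGb / memoryTotalGb) * 100 > thresholdPercent;
}
```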

Connecting to Capacity Planning

The GPU snapshot data can be combined with the Capacity Planning module to compare actual GPU usage against theoretical capacity:
import { ResourceMonitor } from "@radaros/edge";
import {
  planCapacity, SessionProfiler,
  DEFAULT_ARCHITECTURES, DEFAULT_GPU_SPECS,
} from "@radaros/core";

const monitor = new ResourceMonitor({ intervalMs: 10_000 });

monitor.on("snapshot", (snap) => {
  if (!snap.gpu) return;

  // Real GPU data
  const freeGpuGb = snap.gpu.memoryTotalGb - snap.gpu.memoryUsedGb;

  // Theoretical capacity for this hardware
  const plan = planCapacity(
    DEFAULT_ARCHITECTURES["llama-3.1-70b"],
    {
      gpu: DEFAULT_GPU_SPECS["rtx-a5000"],
      gpuCount: 8,
      nandPerGpuGb: 0,
      nandBandwidthGBs: 7,
    },
    { extreme: 1, heavy: 2, medium: 3, light: 4 },
    "fp8", "int4",
  );

  console.log(`Actual free GPU memory: ${freeGpuGb.toFixed(1)} GB`);
  console.log(`Theoretical free for KV: ${plan.freeHbmForKvGb} GB`);
  console.log(`Utilization: ${snap.gpu.utilizationPercent}%`);
});

monitor.start();
When paired with the Session Profiler on the same EventBus, you get a complete picture: real GPU usage from nvidia-smi, real token counts from the LLM API, and theoretical capacity limits — all feeding into the same Prometheus metrics.