# Edge Runtime

The `EdgeRuntime` manages an agent on constrained hardware with automatic watchdog restarts, resource monitoring, health endpoints, and graceful degradation.

## Quick Start
```typescript
import { Agent, ollama } from "@radaros/core";
import { EdgeRuntime, SystemToolkit, edgePreset } from "@radaros/edge";

const preset = edgePreset("pi5-8gb");

const agent = new Agent({
  name: "pi-agent",
  model: ollama(preset.recommendedModel),
  instructions: "You are a Raspberry Pi assistant.",
  tools: [...new SystemToolkit().getTools()],
});

const runtime = new EdgeRuntime({
  preset,
  agent,
  healthPort: 9090,
});

await runtime.start();

// Signal agent activity to prevent watchdog restarts
runtime.heartbeat();

// Check status
const status = runtime.getStatus();
console.log(status.state); // "running" | "degraded" | "stopped"

// Shutdown
await runtime.stop();
```
## Presets

Use `edgePreset(id)` to get optimized defaults for your device:
```typescript
import { edgePreset, listEdgePresets, customEdgePreset } from "@radaros/edge";

const presets = listEdgePresets();
// [{ id: "pi4-2gb", label: "..." }, { id: "pi4-4gb", label: "..." }, ...]

const preset = edgePreset("pi5-8gb");
// { recommendedModel: "phi3:mini", maxTokens: 2048, contextWindow: 16384, ... }

// Customize a preset
const custom = customEdgePreset("pi5-8gb", { maxTokens: 4096 });
```
| Preset | Model | Max Tokens | Context | Memory Limit |
|---|---|---|---|---|
| `pi4-2gb` | tinyllama:1.1b | 256 | 2048 | 512 MB |
| `pi4-4gb` | tinyllama:1.1b | 512 | 4096 | 1024 MB |
| `pi4-8gb` | llama3.2:1b | 1024 | 8192 | 2048 MB |
| `pi5-4gb` | llama3.2:1b | 1024 | 8192 | 1536 MB |
| `pi5-8gb` | phi3:mini | 2048 | 16384 | 3072 MB |
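The tiers above follow total RAM. As a rough illustration of choosing a preset ID from detected memory, here is a small sketch; the helper `pickPreset` is hypothetical and not part of `@radaros/edge`:

```typescript
// Hypothetical helper: map a device's board model and total RAM to the
// preset IDs in the table above. Illustrative only, not a library API.
type PresetId = "pi4-2gb" | "pi4-4gb" | "pi4-8gb" | "pi5-4gb" | "pi5-8gb";

function pickPreset(model: "pi4" | "pi5", totalRamGb: number): PresetId {
  if (model === "pi4") {
    if (totalRamGb <= 2) return "pi4-2gb";
    if (totalRamGb <= 4) return "pi4-4gb";
    return "pi4-8gb";
  }
  return totalRamGb <= 4 ? "pi5-4gb" : "pi5-8gb";
}

console.log(pickPreset("pi5", 8)); // "pi5-8gb"
```

The result can be passed straight to `edgePreset(id)` or used as the base for `customEdgePreset`.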
## Features

### Watchdog

Automatically detects unresponsive agents. If no `heartbeat()` call is received within the timeout window, the runtime emits a `watchdog-restart` event.
```typescript
runtime.on("watchdog-restart", ({ reason, restarts }) => {
  console.log(`Watchdog triggered: ${reason} (${restarts} total)`);
  // Recreate or restart your agent here
});
```
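The mechanism behind this can be sketched in plain TypeScript: a countdown timer that fires unless a heartbeat re-arms it first. This is a simplified stand-in, not the actual `EdgeRuntime` internals:

```typescript
// Minimal watchdog sketch: if no heartbeat arrives within `timeoutMs`,
// invoke the restart callback and keep watching. Illustrative only.
class Watchdog {
  private timer?: ReturnType<typeof setTimeout>;
  restarts = 0;

  constructor(
    private timeoutMs: number,
    private onRestart: (reason: string) => void,
  ) {}

  start() {
    this.arm();
  }

  // Agent activity resets the countdown
  heartbeat() {
    this.arm();
  }

  stop() {
    if (this.timer) clearTimeout(this.timer);
  }

  private arm() {
    if (this.timer) clearTimeout(this.timer);
    this.timer = setTimeout(() => {
      this.restarts += 1;
      this.onRestart(`no heartbeat within ${this.timeoutMs} ms`);
      this.arm(); // continue monitoring after a restart
    }, this.timeoutMs);
  }
}
```

In practice you call `runtime.heartbeat()` from your agent's main loop (e.g. after each completed turn) so that normal activity keeps the timer armed.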
### Resource Monitor

Periodically checks CPU temperature, memory, and disk usage. Emits warnings when thresholds are exceeded.
```typescript
runtime.on("thermal-warning", ({ temperature, threshold }) => {
  console.log(`CPU at ${temperature}°C (threshold: ${threshold}°C)`);
});

runtime.on("memory-warning", ({ usage_percent, threshold }) => {
  console.log(`Memory at ${usage_percent}% (threshold: ${threshold}%)`);
});

runtime.on("recovered", () => {
  console.log("Resources back to normal");
});
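The warning/recovered pairing implies simple hysteresis: a warning fires while a reading is above its threshold, and a single `recovered` event fires once it drops back. A self-contained sketch of that state machine (illustrative; not the `ResourceMonitor` source):

```typescript
// Tiny hysteresis sketch for the memory-warning / recovered pattern above.
// Illustrative only, not the actual ResourceMonitor implementation.
type MonitorEvent = {
  type: "memory-warning" | "recovered";
  usage_percent?: number;
};

function makeMemoryChecker(threshold: number) {
  let warned = false;
  return (usagePercent: number): MonitorEvent | null => {
    if (usagePercent > threshold) {
      warned = true;
      return { type: "memory-warning", usage_percent: usagePercent };
    }
    if (warned) {
      // First reading back under the threshold: emit recovered once
      warned = false;
      return { type: "recovered" };
    }
    return null; // normal reading, nothing to report
  };
}

const check = makeMemoryChecker(85);
check(60); // null
check(92); // memory-warning
check(70); // recovered
```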
### Health Endpoint

A lightweight HTTP server on port 9090 (configurable) responds to `GET /health`:
```json
{
  "state": "running",
  "uptime_ms": 3600000,
  "watchdog_restarts": 0,
  "resources": { "cpu": { ... }, "memory": { ... }, "disk": { ... } },
  "degraded_reason": null
}
```
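An external supervisor (systemd, Kubernetes, a cron job) can poll this endpoint and decide whether to intervene. A minimal sketch of interpreting the payload; the field names follow the example above, but the types and the decision policy are assumptions:

```typescript
// Interpret a /health payload for an external supervisor.
// Field names follow the example payload; the policy itself is illustrative.
interface HealthStatus {
  state: "running" | "degraded" | "stopped";
  uptime_ms: number;
  watchdog_restarts: number;
  degraded_reason: string | null;
}

function isHealthy(h: HealthStatus): boolean {
  // Treat "degraded" as alive-but-unwell; only "stopped" means dead
  return h.state !== "stopped";
}

function summarize(h: HealthStatus): string {
  const up = (h.uptime_ms / 3_600_000).toFixed(1);
  const base = `${h.state}, up ${up} h, ${h.watchdog_restarts} watchdog restarts`;
  return h.degraded_reason ? `${base} (${h.degraded_reason})` : base;
}

summarize({
  state: "running",
  uptime_ms: 3_600_000,
  watchdog_restarts: 0,
  degraded_reason: null,
});
// returns "running, up 1.0 h, 0 watchdog restarts"
```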
## Config

- `preset` (`string | EdgePreset`, required): Device preset ID or custom preset object.
- `agent` (`Agent`, required): The agent instance to manage.
- `healthPort` (`number`): Port for the health check HTTP server. A separate option disables the health endpoint entirely.
## GPU Monitoring

The `ResourceMonitor` automatically detects NVIDIA GPUs via `nvidia-smi` and includes GPU metrics in every snapshot:
```typescript
import { ResourceMonitor } from "@radaros/edge";

const monitor = new ResourceMonitor({ intervalMs: 5000 });

monitor.on("snapshot", (snap) => {
  if (snap.gpu) {
    console.log(`GPU: ${snap.gpu.name}`);
    console.log(`Memory: ${snap.gpu.memoryUsedGb.toFixed(1)}/${snap.gpu.memoryTotalGb.toFixed(1)} GB`);
    console.log(`Utilization: ${snap.gpu.utilizationPercent}%`);
    console.log(`Temperature: ${snap.gpu.temperatureC}°C`);
  }
});

monitor.on("gpu-warning", (data) => {
  console.log(`GPU HBM pressure: ${data.memoryUsedGb.toFixed(1)}/${data.memoryTotalGb.toFixed(1)} GB`);
});

monitor.start();
```
### GPU Snapshot Fields

| Field | Type | Description |
|---|---|---|
| `gpu.name` | string | GPU model name (e.g. "NVIDIA H100 SXM") |
| `gpu.memoryUsedGb` | number | Used HBM in GB |
| `gpu.memoryTotalGb` | number | Total HBM in GB |
| `gpu.utilizationPercent` | number | GPU compute utilization (0–100) |
| `gpu.temperatureC` | number | GPU temperature in Celsius |
The `gpu-warning` event fires when GPU memory usage exceeds the `memoryThreshold` (default 85%). GPU monitoring is automatic: if `nvidia-smi` is not available (e.g. on CPU-only machines), the `gpu` field is simply omitted from snapshots.
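The 85% default means, for example, that an 80 GB card triggers the warning once more than 68 GB is in use. The condition reduces to a one-line check; the helper name below is illustrative, not a library API:

```typescript
// Sketch of the gpu-warning condition: used HBM as a percentage of total,
// compared against the threshold (default 85%). Name is hypothetical.
function gpuMemoryPressure(
  memoryUsedGb: number,
  memoryTotalGb: number,
  memoryThreshold = 85,
): boolean {
  return (memoryUsedGb / memoryTotalGb) * 100 > memoryThreshold;
}

gpuMemoryPressure(68.5, 80); // true  (85.6% > 85%)
gpuMemoryPressure(60, 80);   // false (75%)
```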
## Connecting to Capacity Planning

The GPU snapshot data can be combined with the Capacity Planning module to compare actual GPU usage against theoretical capacity:
```typescript
import { ResourceMonitor } from "@radaros/edge";
import {
  planCapacity, SessionProfiler,
  DEFAULT_ARCHITECTURES, DEFAULT_GPU_SPECS,
} from "@radaros/core";

const monitor = new ResourceMonitor({ intervalMs: 10_000 });

monitor.on("snapshot", (snap) => {
  if (!snap.gpu) return;

  // Real GPU data
  const freeGpuGb = snap.gpu.memoryTotalGb - snap.gpu.memoryUsedGb;

  // Theoretical capacity for this hardware
  const plan = planCapacity(
    DEFAULT_ARCHITECTURES["llama-3.1-70b"],
    {
      gpu: DEFAULT_GPU_SPECS["rtx-a5000"],
      gpuCount: 8,
      nandPerGpuGb: 0,
      nandBandwidthGBs: 7,
    },
    { extreme: 1, heavy: 2, medium: 3, light: 4 },
    "fp8", "int4",
  );

  console.log(`Actual free GPU memory: ${freeGpuGb.toFixed(1)} GB`);
  console.log(`Theoretical free for KV: ${plan.freeHbmForKvGb} GB`);
  console.log(`Utilization: ${snap.gpu.utilizationPercent}%`);
});

monitor.start();
```
When paired with the Session Profiler on the same `EventBus`, you get a complete picture: real GPU usage from `nvidia-smi`, real token counts from the LLM API, and theoretical capacity limits, all feeding into the same Prometheus metrics.