Observability
@radaros/observability is a separate, opt-in package that adds tracing, metrics, and structured logging to any RadarOS agent. It listens to the agent’s EventBus from the outside — zero changes to core, zero overhead when not installed.
Quick Start
Exporter Shorthands
Pass exporter names as strings; credentials are read from environment variables automatically.
How It Works
The instrument() function attaches three listeners to the agent’s EventBus:
- Tracer: builds a span tree from events (run.start → tool.call → tool.result → run.complete)
- MetricsCollector: counts runs, tool calls, errors, and cache hits, and tracks latency histograms
- StructuredLogger: emits JSON log entries correlated with trace IDs
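
The listener model above can be illustrated with a small, self-contained sketch. It does not use the real RadarOS API: `MiniBus`, the event payload fields (`runId`, `tool`, `at`), and the helper names are all assumptions made for illustration. It shows how a tracer and a counter can each subscribe to the same bus from the outside and build their own view of a run.

```typescript
// Hypothetical event shapes; the real RadarOS payloads may differ.
type BusEvent =
  | { type: "run.start"; runId: string; at: number }
  | { type: "tool.call"; runId: string; tool: string; at: number }
  | { type: "tool.result"; runId: string; tool: string; at: number }
  | { type: "run.complete"; runId: string; at: number };

type Listener = (event: BusEvent) => void;

// Minimal stand-in for an EventBus: listeners subscribe from the outside.
class MiniBus {
  private listeners: Listener[] = [];
  on(listener: Listener): void {
    this.listeners.push(listener);
  }
  emit(event: BusEvent): void {
    for (const listener of this.listeners) listener(event);
  }
}

interface Span {
  name: string;
  start: number;
  end?: number;
  children: Span[];
}

// Tracer: one root span per run, one child span per tool call.
function attachTracer(bus: MiniBus): Map<string, Span> {
  const roots = new Map<string, Span>();
  bus.on((e) => {
    if (e.type === "run.start") {
      roots.set(e.runId, { name: "run", start: e.at, children: [] });
      return;
    }
    const root = roots.get(e.runId);
    if (!root) return; // event for a run whose start we never saw
    if (e.type === "tool.call") {
      root.children.push({ name: `tool:${e.tool}`, start: e.at, children: [] });
    } else if (e.type === "tool.result") {
      // Close the oldest still-open span for this tool.
      const toolName = `tool:${e.tool}`;
      const open = root.children.find((s) => s.name === toolName && s.end === undefined);
      if (open) open.end = e.at;
    } else {
      root.end = e.at; // run.complete closes the root span
    }
  });
  return roots;
}

// Counter: a second, independent listener on the same bus.
function attachCounters(bus: MiniBus): { runs: number; toolCalls: number } {
  const counters = { runs: 0, toolCalls: 0 };
  bus.on((e) => {
    if (e.type === "run.complete") counters.runs += 1;
    if (e.type === "tool.call") counters.toolCalls += 1;
  });
  return counters;
}

const demoBus = new MiniBus();
const demoTraces = attachTracer(demoBus);
const demoCounters = attachCounters(demoBus);

demoBus.emit({ type: "run.start", runId: "r1", at: 0 });
demoBus.emit({ type: "tool.call", runId: "r1", tool: "search", at: 5 });
demoBus.emit({ type: "tool.result", runId: "r1", tool: "search", at: 12 });
demoBus.emit({ type: "run.complete", runId: "r1", at: 20 });
```

Because neither listener touches the code that emits events, removing them leaves the emitter unchanged, which is the property the package description relies on.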
Provider Metrics in Traces & Logs
When a run completes, the run.complete event includes providerMetrics, the raw usage object from the underlying model API. This is automatically captured by:
- Tracer: stored as a span attribute (providerMetrics) on the root run span
- StructuredLogger: included in the JSON log payload for run.complete events
- MetricsExporter: stored in the RunRecord for export and dashboard consumption
- LangfuseExporter: forwarded as generation metadata in Langfuse
As a result, provider-specific fields (such as thoughtsTokenCount, prompt_tokens_details, cache_read_input_tokens) are available in your observability pipeline without any extra configuration.
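
As a rough illustration of the pass-through behavior described above, the sketch below copies a raw usage object onto a root span attribute and into a JSON log line without interpreting it. Only the name `providerMetrics` comes from the text; the event fields `runId` and `traceId`, the `RootSpan` type, and both helper functions are hypothetical, and the sample usage values are made up.

```typescript
// Hypothetical shape of a run.complete event; only `providerMetrics` is named in the docs.
interface RunCompleteEvent {
  type: "run.complete";
  runId: string;
  traceId: string;
  providerMetrics?: Record<string, unknown>; // raw usage object, passed through untouched
}

interface RootSpan {
  traceId: string;
  attributes: Record<string, unknown>;
}

// Tracer side: store the raw usage object as a span attribute.
function recordOnSpan(span: RootSpan, event: RunCompleteEvent): void {
  if (event.providerMetrics !== undefined) {
    span.attributes["providerMetrics"] = event.providerMetrics;
  }
}

// Logger side: include the same object in the JSON log payload,
// keyed by traceId so the log line can be joined with the trace.
function toLogLine(event: RunCompleteEvent): string {
  return JSON.stringify({
    level: "info",
    event: event.type,
    traceId: event.traceId,
    runId: event.runId,
    providerMetrics: event.providerMetrics ?? null,
  });
}

// Sample event with invented values.
const completedRun: RunCompleteEvent = {
  type: "run.complete",
  runId: "r1",
  traceId: "t1",
  providerMetrics: { prompt_tokens_details: { cached_tokens: 128 } },
};

const sampleRootSpan: RootSpan = { traceId: "t1", attributes: {} };
recordOnSpan(sampleRootSpan, completedRun);
const sampleLogLine = toLogLine(completedRun);
```

Keeping the usage object opaque (`Record<string, unknown>`) is what lets provider-specific fields survive without the pipeline needing to know each provider's schema.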
Trace Tree
Every agent.run() produces a trace like:
Exporters
| Shorthand | Env Vars | Description |
|---|---|---|
| "console" | (none) | Pretty-print trace tree to terminal |
| "langfuse" | LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY | Native Langfuse format with generations and spans |
| "otel" | OTEL_EXPORTER_OTLP_ENDPOINT | OTLP/HTTP JSON to any OpenTelemetry collector |
| "json-file" | (none) | Append traces to a JSON file |
Use CallbackExporter for custom integrations:
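
The general idea of a callback-based exporter can be sketched as follows. This is not the real CallbackExporter signature, which is not shown here: `FinishedTrace`, `TraceExporter`, `callbackExporter`, and `exportAll` are hypothetical names used only to show how a plain function can stand in for an exporter, and how one failing exporter can be kept from blocking the others.

```typescript
// Hypothetical trace shape; the real exporter payload may differ.
interface FinishedTrace {
  traceId: string;
  name: string;
  durationMs: number;
}

// An exporter is modeled here as anything with a send() method.
interface TraceExporter {
  send(trace: FinishedTrace): void;
}

// Wraps a plain callback so it can be used wherever an exporter is expected.
function callbackExporter(fn: (trace: FinishedTrace) => void): TraceExporter {
  return { send: (trace) => fn(trace) };
}

// Fan a trace out to several exporters and return the number of failures.
// Each exporter is isolated so a failing backend does not block the rest.
function exportAll(trace: FinishedTrace, exporters: TraceExporter[]): number {
  let failures = 0;
  for (const exporter of exporters) {
    try {
      exporter.send(trace);
    } catch {
      failures += 1;
    }
  }
  return failures;
}

// A custom integration that records a one-line summary per trace.
const receivedSummaries: string[] = [];
const summaryExporter = callbackExporter((t) => {
  receivedSummaries.push(`${t.name}:${t.durationMs}ms`);
});

// A deliberately broken exporter to show failure isolation.
const brokenExporter = callbackExporter(() => {
  throw new Error("backend down");
});

const failedExports = exportAll(
  { traceId: "t1", name: "run", durationMs: 20 },
  [brokenExporter, summaryExporter],
);
```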
Metrics
Structured Logging
Three drain modes are available. Each log entry carries a traceId for correlation with traces.
Works With Teams & Workflows
Use instrumentBus() to attach to any EventBus:
Langfuse Integration Example
Langfuse provides an open-source LLM observability dashboard. Setup takes 3 steps. Once connected, the dashboard shows:
- Traces for each agent.run() with duration, token usage, and cost
- Generations for each LLM call within a run
- Spans for tool calls, handoffs, and other operations
- Sessions grouping traces by sessionId
OpenTelemetry Export
Send traces to any OTLP-compatible backend (Jaeger, Grafana Tempo, Honeycomb, etc.):
Building a Custom Dashboard
Combine metrics and events to build a real-time dashboard:
Capacity Metrics
When the Session Profiler is attached to the same EventBus, MetricsExporter automatically captures capacity-related metrics:
AgentMetrics fields
| Field | Type | Description |
|---|---|---|
| estimatedKvCacheGb | number? | Estimated total KV cache memory across all sessions |
| avgContextLength | number? | Average prompt tokens per run |
| sessionCategories | Record<string, number>? | Session counts by category (light/medium/heavy/extreme) |
Prometheus output
The toPrometheus() method includes three new capacity counters:
These are populated whenever capacity.session.classified and capacity.warning events are emitted on the EventBus; no additional configuration is needed.
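
To make the capacity fields concrete, here is a minimal sketch that renders them in the Prometheus text exposition format. The field names and types come from the AgentMetrics table; the metric names (`example_...`), the `category` label, and the function itself are assumptions and will not match the real toPrometheus() output. The sketch declares the values as gauges because they are point-in-time readings that can go down; the real method may type them differently.

```typescript
// Capacity fields from the AgentMetrics table (all optional).
interface CapacityMetrics {
  estimatedKvCacheGb?: number;
  avgContextLength?: number;
  sessionCategories?: Record<string, number>;
}

// Hypothetical metric names; the real toPrometheus() output may use different ones.
function capacityToPrometheus(m: CapacityMetrics): string {
  const lines: string[] = [];
  if (m.estimatedKvCacheGb !== undefined) {
    lines.push("# TYPE example_estimated_kv_cache_gb gauge");
    lines.push(`example_estimated_kv_cache_gb ${m.estimatedKvCacheGb}`);
  }
  if (m.avgContextLength !== undefined) {
    lines.push("# TYPE example_avg_context_length gauge");
    lines.push(`example_avg_context_length ${m.avgContextLength}`);
  }
  if (m.sessionCategories !== undefined) {
    lines.push("# TYPE example_sessions_by_category gauge");
    // One sample per category, distinguished by a label.
    for (const [category, count] of Object.entries(m.sessionCategories)) {
      lines.push(`example_sessions_by_category{category="${category}"} ${count}`);
    }
  }
  // Fields that are absent are skipped entirely rather than reported as zero.
  return lines.length > 0 ? lines.join("\n") + "\n" : "";
}

const sampleCapacityText = capacityToPrometheus({
  estimatedKvCacheGb: 1.5,
  sessionCategories: { light: 7, heavy: 2 },
});
```

Skipping undefined fields mirrors the optional (`?`) types in the table: when the Session Profiler is not attached, nothing capacity-related is emitted.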