
Token-aware Context Compaction

Agents with long conversations or large tool results can exceed the model’s context window. The ContextCompactor automatically manages context size before each LLM call.

Configuration

import { Agent, openai } from "@radaros/core";

const agent = new Agent({
  name: "long-conversation-bot",
  model: openai("gpt-4o"),
  contextCompactor: {
    maxContextTokens: 120_000, // for 128k context window
    reserveTokens: 4096,      // leave room for the response
    strategy: "hybrid",       // "trim" | "summarize" | "hybrid"
    summarizeModel: openai("gpt-4o-mini"), // cheap model for summaries
  },
});

Strategies

Trim

Drops the oldest non-system messages first, keeping the system prompt and the most recent exchanges intact.
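A minimal sketch of the trim strategy (illustrative only, not the library's actual implementation). The ~4-characters-per-token estimate and the `trimCompact` helper name are assumptions; a real compactor would use a proper tokenizer.

```typescript
type Msg = { role: "system" | "user" | "assistant"; content: string };

// Rough token estimate: ~4 characters per token (assumption).
const estimateTokens = (msgs: Msg[]) =>
  msgs.reduce((n, m) => n + Math.ceil(m.content.length / 4), 0);

function trimCompact(messages: Msg[], budget: number): Msg[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  // Drop from the front (oldest) until we fit, keeping at least one message.
  while (rest.length > 1 && estimateTokens([...system, ...rest]) > budget) {
    rest.shift();
  }
  return [...system, ...rest];
}
```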

Summarize

Uses a cheap model to summarize older messages into a single compact summary, preserving key context while reducing token count.
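A minimal sketch of the summarize strategy, not the library's actual internals: older messages are folded into a single summary message. The injected `summarizeFn` stands in for a call to the configured `summarizeModel` (e.g. gpt-4o-mini), and `keepRecent` is a hypothetical knob for how many recent messages stay verbatim.

```typescript
type Msg = { role: "system" | "user" | "assistant"; content: string };

async function summarizeCompact(
  messages: Msg[],
  keepRecent: number, // how many recent messages to keep verbatim
  summarizeFn: (text: string) => Promise<string>, // e.g. a gpt-4o-mini call
): Promise<Msg[]> {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  if (rest.length <= keepRecent) return messages; // nothing old enough to fold
  const older = rest.slice(0, rest.length - keepRecent);
  const recent = rest.slice(rest.length - keepRecent);
  const summary = await summarizeFn(
    older.map((m) => `${m.role}: ${m.content}`).join("\n"),
  );
  // One compact assistant message replaces all the older turns.
  return [
    ...system,
    { role: "assistant", content: `Summary of earlier conversation: ${summary}` },
    ...recent,
  ];
}
```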

Hybrid

Trims first, then summarizes if still over budget. Best balance of speed and context preservation.
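The two phases can be sketched together (again illustrative: the token estimate, the keep-at-least-four-messages floor, and the helper names are all assumptions, not the library's code):

```typescript
type Msg = { role: "system" | "user" | "assistant"; content: string };

const tokens = (msgs: Msg[]) =>
  msgs.reduce((n, m) => n + Math.ceil(m.content.length / 4), 0);

async function hybridCompact(
  messages: Msg[],
  budget: number,
  summarizeFn: (text: string) => Promise<string>,
): Promise<Msg[]> {
  const system = messages.filter((m) => m.role === "system");
  let rest = messages.filter((m) => m.role !== "system");
  // Phase 1: trim oldest, but keep at least the last two exchanges (4 messages).
  while (rest.length > 4 && tokens([...system, ...rest]) > budget) {
    rest = rest.slice(1);
  }
  // Phase 2: still over budget, so summarize everything but the last message.
  if (tokens([...system, ...rest]) > budget && rest.length > 1) {
    const summary = await summarizeFn(
      rest.slice(0, -1).map((m) => `${m.role}: ${m.content}`).join("\n"),
    );
    rest = [
      { role: "assistant", content: `Summary: ${summary}` },
      ...rest.slice(-1),
    ];
  }
  return [...system, ...rest];
}
```

Trimming is free, so it runs first; the model call for summarization only happens when trimming alone cannot fit the budget.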

How It Works

The compactor registers on the beforeLLMCall loop hook, so it runs before every LLM API call:
  1. Estimates token count of all messages
  2. If under budget, passes through unchanged
  3. If over budget, applies the configured strategy
  4. Returns the compacted messages to the LLM

System messages are always preserved, and the most recent user/assistant exchanges are prioritized.
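The four steps above can be sketched as a single pass-through-or-compact decision (a sketch under stated assumptions: the ~4-chars-per-token estimate and the injected `compact` callback are placeholders for the compactor's real tokenizer and strategy):

```typescript
type Msg = { role: string; content: string };

const estimateTokens = (msgs: Msg[]) =>
  msgs.reduce((n, m) => n + Math.ceil(m.content.length / 4), 0);

function beforeLLMCall(
  messages: Msg[],
  opts: { maxContextTokens: number; reserveTokens: number },
  compact: (msgs: Msg[], budget: number) => Msg[], // configured strategy
): Msg[] {
  // reserveTokens leaves room for the model's response.
  const budget = opts.maxContextTokens - opts.reserveTokens;
  // Steps 1-2: estimate; pass through unchanged when under budget.
  if (estimateTokens(messages) <= budget) return messages;
  // Steps 3-4: apply the configured strategy and return compacted messages.
  return compact(messages, budget);
}
```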