Overview
One user action can trigger dozens of LLM calls. Token-unaware rate limits block 41% of legitimate traffic. RadarOS provides token-aware rate limiting with sliding windows, concurrency control, and graceful degradation.
Quick Start
Token Rate Limiter
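As a rough illustration of the idea, a sliding-window token counter can be sketched in TypeScript as follows. This is not the RadarOS API: the class `SlidingWindowTokenLimiter`, its methods, and scope strings such as `"user:alice"` are hypothetical names chosen for the example. Time is passed in explicitly so the window behavior is easy to follow.

```typescript
// Hypothetical sketch, not the RadarOS API. All names are illustrative.
interface UsageEntry {
  timestampMs: number;
  tokens: number;
}

class SlidingWindowTokenLimiter {
  private usage = new Map<string, UsageEntry[]>();
  private readonly maxTokens: number;
  private readonly windowMs: number;

  constructor(maxTokens: number, windowMs: number) {
    this.maxTokens = maxTokens;
    this.windowMs = windowMs;
  }

  /** Tokens consumed by `scope` within the window ending at `nowMs`. */
  used(scope: string, nowMs: number): number {
    const cutoff = nowMs - this.windowMs;
    const recent = (this.usage.get(scope) ?? []).filter((e) => e.timestampMs > cutoff);
    this.usage.set(scope, recent); // drop entries that slid out of the window
    return recent.reduce((sum, e) => sum + e.tokens, 0);
  }

  /** Records the usage and returns true if it fits the budget, else returns false. */
  tryConsume(scope: string, tokens: number, nowMs: number = Date.now()): boolean {
    if (this.used(scope, nowMs) + tokens > this.maxTokens) return false;
    this.usage.get(scope)!.push({ timestampMs: nowMs, tokens });
    return true;
  }
}

// Assumed budget: 1000 tokens per 60-second window, tracked per scope.
const limiter = new SlidingWindowTokenLimiter(1000, 60000);
limiter.tryConsume("user:alice", 800, 0);     // true
limiter.tryConsume("user:alice", 300, 1000);  // false: 800 + 300 exceeds 1000
limiter.tryConsume("user:bob", 300, 1000);    // true: separate scope
limiter.tryConsume("user:alice", 300, 60001); // true: the first entry has expired
```

Because each scope keeps its own list of entries, one heavy user exhausts only their own budget.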
Sliding-window token counting with per-scope tracking.
Concurrency Limiter
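One common way to cap in-flight calls is a small promise-based semaphore. The sketch below assumes that approach; `ConcurrencyLimiter` and `run` are hypothetical names for illustration and should not be read as the RadarOS interface.

```typescript
// Hypothetical sketch, not the RadarOS API. Names are illustrative.
class ConcurrencyLimiter {
  private active = 0;
  private waiters: Array<() => void> = [];
  private readonly maxConcurrent: number;

  constructor(maxConcurrent: number) {
    this.maxConcurrent = maxConcurrent;
  }

  /** Runs `task` once a slot is free; at most `maxConcurrent` tasks run at a time. */
  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.maxConcurrent) {
      // All slots are busy: wait until a finishing task hands its slot over.
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    } else {
      this.active++;
    }
    try {
      return await task();
    } finally {
      const next = this.waiters.shift();
      if (next) {
        next(); // pass the slot directly to the oldest waiter
      } else {
        this.active--; // nobody is waiting, so free the slot
      }
    }
  }
}
```

Handing the slot straight to the next waiter, instead of decrementing and re-incrementing the counter, keeps a newly arriving call from jumping ahead of calls that are already waiting.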
Control the maximum number of concurrent LLM calls.
Limit Reached Strategies
| Strategy | Behavior |
|---|---|
| `"queue"` | Queue requests until capacity is available |
| `"reject"` | Immediately reject with error |
| `"degrade"` | Switch to cheaper model and reduce token limits |
Events
| Event | Payload |
|---|---|
| `rateLimit.throttled` | `{ scope, limitType, resetMs }` |
| `rateLimit.degraded` | `{ scope, originalModel, degradedModel }` |
| `rateLimit.rejected` | `{ scope, reason }` |
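The payloads above can be modeled as a typed event map, which lets the compiler check that a handler reads only fields its event carries. In the sketch below the event names and field names come from the table. The field types (strings, and a number for `resetMs`) and the `RateLimitEventBus` class are assumptions, since the source does not specify how RadarOS exposes these events.

```typescript
// Hypothetical sketch, not the RadarOS API. Field types are assumed.
interface RateLimitEvents {
  "rateLimit.throttled": { scope: string; limitType: string; resetMs: number };
  "rateLimit.degraded": { scope: string; originalModel: string; degradedModel: string };
  "rateLimit.rejected": { scope: string; reason: string };
}

class RateLimitEventBus {
  private handlers = new Map<string, Array<(payload: unknown) => void>>();

  /** Registers a handler whose payload type is tied to the event name. */
  on<K extends keyof RateLimitEvents>(
    event: K,
    handler: (payload: RateLimitEvents[K]) => void,
  ): void {
    const list = this.handlers.get(event) ?? [];
    list.push(handler as (payload: unknown) => void);
    this.handlers.set(event, list);
  }

  /** Calls every handler registered for `event` with the given payload. */
  emit<K extends keyof RateLimitEvents>(event: K, payload: RateLimitEvents[K]): void {
    for (const handler of this.handlers.get(event) ?? []) {
      handler(payload);
    }
  }
}

const bus = new RateLimitEventBus();
const messages: string[] = [];
bus.on("rateLimit.throttled", ({ scope, limitType, resetMs }) => {
  messages.push(`${scope} throttled on ${limitType}; resets in ${resetMs} ms`);
});
bus.on("rateLimit.degraded", ({ scope, originalModel, degradedModel }) => {
  messages.push(`${scope} degraded from ${originalModel} to ${degradedModel}`);
});
bus.emit("rateLimit.throttled", { scope: "user:alice", limitType: "tokens", resetMs: 1200 });
```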