# Semantic Cache
Semantic caching stores LLM responses indexed by the semantic meaning of the input. When a similar query arrives, the cached response is returned without calling the LLM, reducing costs and latency.

## Quick Start
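As a quick sketch of the idea: embed the input, compare against stored embeddings, and return the cached response on a close match. The class, the toy letter-count embedder, and the threshold below are illustrative stand-ins, not this library's API.

```typescript
// Toy semantic cache: illustrative only, not this library's actual API.
type Entry = { embedding: number[]; response: string };

// Stand-in embedder: a real system would call an embedding model instead.
function embed(text: string): number[] {
  const v = new Array(26).fill(0);
  for (const ch of text.toLowerCase()) {
    const i = ch.charCodeAt(0) - 97;
    if (i >= 0 && i < 26) v[i] += 1;
  }
  return v;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

class SemanticCache {
  private entries: Entry[] = [];
  constructor(private similarityThreshold = 0.9) {}

  // Returns the cached response if the best match clears the threshold, else null.
  lookup(input: string): string | null {
    const q = embed(input);
    let best: Entry | null = null;
    let bestScore = -1;
    for (const e of this.entries) {
      const s = cosine(q, e.embedding);
      if (s > bestScore) { bestScore = s; best = e; }
    }
    return best !== null && bestScore >= this.similarityThreshold ? best.response : null;
  }

  store(input: string, response: string): void {
    this.entries.push({ embedding: embed(input), response });
  }
}

const cache = new SemanticCache(0.95);
cache.store("What is the capital of France?", "Paris");
console.log(cache.lookup("what is the capital of france")); // hit: near-identical wording
console.log(cache.lookup("Explain quantum entanglement"));  // miss: returns null
```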
## Configuration
### Scope
| Scope | Behavior |
|---|---|
| `global` | All agents share one cache |
| `agent` | Each agent has its own cache partition |
| `session` | Each session has its own cache partition |
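One common way to realize these scopes is to derive a partition key per scope. The key format and function below are a hypothetical sketch, not this library's code:

```typescript
// Hypothetical partition-key scheme for the three scopes (illustrative only).
type CacheScope = "global" | "agent" | "session";

function partitionKey(scope: CacheScope, agentName: string, sessionId: string): string {
  switch (scope) {
    case "global":  return "cache:global";               // one shared partition
    case "agent":   return `cache:agent:${agentName}`;   // per-agent partition
    case "session": return `cache:session:${sessionId}`; // per-session partition
  }
}

console.log(partitionKey("agent", "support-bot", "s-123")); // "cache:agent:support-bot"
```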
## How It Works
- Before calling the LLM, the input is embedded and searched against the vector store
- If a result exceeds the `similarityThreshold`, it’s returned as a cache hit
- Output guardrails still run on cached responses
- After an LLM call, the input + output are stored in the vector store (fire-and-forget)
- TTL is enforced on lookup — expired entries are evicted lazily
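The lazy TTL eviction from the last step can be sketched as follows; the class and method names are assumptions for illustration, not the actual implementation:

```typescript
// Lazy TTL eviction: expired entries are only removed when a lookup touches them.
type TimedEntry = { response: string; storedAt: number };

class TtlStore {
  private entries = new Map<string, TimedEntry>();

  // `now` is injectable so the expiry behavior can be demonstrated deterministically.
  constructor(private ttlMs: number, private now: () => number = Date.now) {}

  set(key: string, response: string): void {
    this.entries.set(key, { response, storedAt: this.now() });
  }

  // Returns null and evicts if the entry has outlived its TTL.
  get(key: string): string | null {
    const e = this.entries.get(key);
    if (!e) return null;
    if (this.now() - e.storedAt > this.ttlMs) {
      this.entries.delete(key); // lazy eviction on lookup
      return null;
    }
    return e.response;
  }
}

// Deterministic clock for demonstration.
let t = 0;
const store = new TtlStore(1000, () => t);
store.set("q1", "answer");
t = 500;
console.log(store.get("q1")); // "answer" (still fresh)
t = 2000;
console.log(store.get("q1")); // null (expired and evicted)
```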
## Events
| Event | Payload |
|---|---|
| `cache.hit` | `{ agentName, input, cachedId }` |
| `cache.miss` | `{ agentName, input }` |
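A sketch of consuming these events with the payload shapes above; the emitter and subscription API shown here are hypothetical, not this library's interface:

```typescript
// Minimal event-bus sketch for the cache.hit / cache.miss payloads above.
type CacheHit = { agentName: string; input: string; cachedId: string };
type CacheMiss = { agentName: string; input: string };

class Emitter {
  private handlers = new Map<string, Array<(payload: any) => void>>();

  on(event: string, fn: (payload: any) => void): void {
    const list = this.handlers.get(event) ?? [];
    list.push(fn);
    this.handlers.set(event, list);
  }

  emit(event: string, payload: unknown): void {
    for (const fn of this.handlers.get(event) ?? []) fn(payload);
  }
}

const events = new Emitter();
let hits = 0;
events.on("cache.hit", (p: CacheHit) => { hits += 1; console.log(`hit: ${p.cachedId}`); });
events.on("cache.miss", (p: CacheMiss) => console.log(`miss: ${p.input}`));

events.emit("cache.hit", { agentName: "support-bot", input: "hi", cachedId: "abc" });
events.emit("cache.miss", { agentName: "support-bot", input: "new question" });
```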
## Supported Backends
Any `VectorStore` implementation works: `InMemoryVectorStore`, `QdrantVectorStore`, `MongoDBVectorStore`, `PgVectorStore`.
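A minimal shape such a pluggable store might satisfy; this interface is an assumption for illustration, not the library's actual contract:

```typescript
// Hypothetical minimal vector-store contract a pluggable backend might satisfy.
interface VectorStore {
  upsert(id: string, embedding: number[], text: string): Promise<void>;
  search(
    embedding: number[],
    topK: number
  ): Promise<Array<{ id: string; score: number; text: string }>>;
}

// Tiny in-memory implementation using dot-product scoring.
class TinyMemoryStore implements VectorStore {
  private rows: Array<{ id: string; embedding: number[]; text: string }> = [];

  async upsert(id: string, embedding: number[], text: string): Promise<void> {
    this.rows.push({ id, embedding, text });
  }

  async search(embedding: number[], topK: number) {
    return this.rows
      .map((r) => ({
        id: r.id,
        score: r.embedding.reduce((s, x, i) => s + x * (embedding[i] ?? 0), 0),
        text: r.text,
      }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);
  }
}

(async () => {
  const demo = new TinyMemoryStore();
  await demo.upsert("a", [1, 0], "cached answer A");
  await demo.upsert("b", [0, 1], "cached answer B");
  const [best] = await demo.search([0.9, 0.1], 1);
  console.log(best.id); // "a" scores highest against the query vector
})();
```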