Eval Framework
The `@radaros/eval` package provides automated quality testing for agents: define test cases, run them against your agent, and score the outputs using built-in or custom scorers.
Quick Start
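The quick-start snippet does not survive in this copy, so here is a self-contained sketch of the flow (define cases, run the agent, score the outputs). The names `runEval`, `TestCase`, and the scorer shape are assumptions, implemented locally so the sketch runs without the package installed:

```typescript
// Hypothetical stand-ins for the package's exports (names are assumptions);
// implemented locally so this sketch runs without @radaros/eval.
type TestCase = { input: string; expected: string };
type Scorer = { name: string; score: (output: string, tc: TestCase) => number };

// Local version of the contains(text) scorer from the table below.
const contains = (text: string): Scorer => ({
  name: `contains(${text})`,
  score: (output) => (output.includes(text) ? 1 : 0),
});

async function runEval(
  agent: (input: string) => Promise<string>,
  cases: TestCase[],
  scorers: Scorer[],
) {
  const results = [];
  for (const tc of cases) {
    const output = await agent(tc.input);
    for (const s of scorers) {
      results.push({ input: tc.input, scorer: s.name, score: s.score(output, tc) });
    }
  }
  return results;
}

// Toy echo agent, standing in for a real agent under test.
const agent = async (input: string) => `You asked about: ${input}`;

runEval(agent, [{ input: "refunds", expected: "refunds" }], [contains("refunds")])
  .then((r) => console.log(r));
```

Each (test case, scorer) pair produces one row of results, which a reporter can then render.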
Built-in Scorers
| Scorer | Description |
|---|---|
| `contains(text)` | Output contains the expected string |
| `regexMatch(pattern)` | Output matches a regex pattern |
| `semanticSimilarity({ expected, embedding })` | Cosine similarity above threshold |
| `llmJudge({ model, criteria })` | LLM rates output on criteria (relevance, helpfulness, etc.) |
| `jsonMatch(fields)` | Structured output fields match expected values |
| `custom(name, fn)` | User-defined scoring function |
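Only the `custom(name, fn)` signature above comes from the docs; the factory below is a local sketch of what it plausibly constructs, with a hypothetical brevity scorer as the user-defined function:

```typescript
// Local sketch of a custom(name, fn) factory — the real one lives in
// @radaros/eval; only the (name, fn) signature is taken from the table.
type Scorer = { name: string; score: (output: string) => number };

const custom = (name: string, fn: (output: string) => number): Scorer => ({
  name,
  score: fn,
});

// Example: penalize overly long answers — 1.0 up to 100 chars,
// tapering linearly to 0 at 500 chars.
const brevity = custom("brevity", (output) =>
  Math.max(0, Math.min(1, 1 - (output.length - 100) / 400)),
);

console.log(brevity.score("Short answer.")); // 1
```

A custom scorer just has to return a number; everything else (naming, aggregation) is handled by the harness.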
LLM-as-Judge
Use another model to evaluate the output.

Reporters
| Reporter | Output |
|---|---|
| `ConsoleReporter` | Pretty-printed table in the terminal |
| `JsonReporter` | JSON file with detailed results |
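The LLM-as-Judge section above loses its example in this copy. Since a real judge requires a model API, here is a self-contained sketch of the pattern — build a rubric prompt, ask a judge model, parse a numeric score — with the judge stubbed out; the prompt wording and 1–5 scale are assumptions, not the package's actual rubric:

```typescript
// Sketch of the LLM-as-judge pattern. The judge is a stub (a real one
// would call a model API); prompt wording and the 1-5 scale are assumptions.
type JudgeModel = (prompt: string) => Promise<string>;

function judgePrompt(criteria: string[], input: string, output: string): string {
  return [
    `Rate the response on: ${criteria.join(", ")}.`,
    `Reply with a single integer from 1 (poor) to 5 (excellent).`,
    `Question: ${input}`,
    `Response: ${output}`,
  ].join("\n");
}

async function llmJudge(
  model: JudgeModel,
  criteria: string[],
  input: string,
  output: string,
): Promise<number> {
  const reply = await model(judgePrompt(criteria, input, output));
  const match = reply.match(/[1-5]/); // first in-range digit in the reply
  if (!match) throw new Error(`unparseable judge reply: ${reply}`);
  return Number(match[0]) / 5; // normalize to 0..1
}

// Stub judge that always answers "4".
const stubModel: JudgeModel = async () => "4";

llmJudge(stubModel, ["relevance", "helpfulness"], "How do refunds work?", "Refunds take 5 days.")
  .then((score) => console.log(score)); // 0.8
```

Parsing the reply defensively matters in practice: judge models often wrap the score in prose, so real implementations constrain the output format or retry on unparseable replies.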