# Evaluation
Synaptic provides an evaluation framework for measuring the quality of AI outputs. The `Evaluator` trait defines a standard interface for scoring predictions against references, and the `Dataset` + `evaluate()` pipeline makes it easy to run batch evaluations across many test cases.
## The Evaluator Trait
All evaluators implement the `Evaluator` trait from `synaptic_eval`:
```rust
#[async_trait]
pub trait Evaluator: Send + Sync {
    async fn evaluate(
        &self,
        prediction: &str,
        reference: &str,
        input: &str,
    ) -> Result<EvalResult, SynapticError>;
}
```
- `prediction` -- the AI's output to evaluate.
- `reference` -- the expected or ground-truth answer.
- `input` -- the original input that produced the prediction.
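Implementing the trait yourself is straightforward. Below is a minimal sketch of a custom evaluator that passes when the prediction contains the reference as a substring; it assumes `Evaluator`, `EvalResult`, and `SynapticError` are importable from `synaptic_eval` (the exact paths are assumptions) and that the `async_trait` crate is in scope:

```rust
use async_trait::async_trait;
// Import paths are assumptions; adjust to match synaptic_eval's re-exports.
use synaptic_eval::{EvalResult, Evaluator, SynapticError};

/// Passes when the prediction contains the reference as a substring.
/// Purely illustrative -- this is not one of the built-in evaluators.
struct ContainsEvaluator;

#[async_trait]
impl Evaluator for ContainsEvaluator {
    async fn evaluate(
        &self,
        prediction: &str,
        reference: &str,
        _input: &str,
    ) -> Result<EvalResult, SynapticError> {
        if prediction.contains(reference) {
            Ok(EvalResult::pass())
        } else {
            Ok(EvalResult::fail())
        }
    }
}
```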
## EvalResult
Every evaluator returns an `EvalResult`:
```rust
pub struct EvalResult {
    pub score: f64,                // Between 0.0 and 1.0
    pub passed: bool,              // true if score >= 0.5
    pub reasoning: Option<String>, // Optional explanation
}
```
Helper constructors:
| Method | Score | Passed |
|---|---|---|
| `EvalResult::pass()` | 1.0 | true |
| `EvalResult::fail()` | 0.0 | false |
| `EvalResult::with_score(0.75)` | 0.75 | true (>= 0.5) |
You can attach reasoning with `.with_reasoning("explanation")`.
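For example, a graded result with an explanation might be built like this (assuming `with_reasoning` is a builder-style method that returns `Self`):

```rust
// with_reasoning is assumed to consume and return the result (builder style).
let result = EvalResult::with_score(0.75)
    .with_reasoning("Answer is correct but omits the requested units.");
assert!(result.passed); // 0.75 >= 0.5
```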
## Built-in Evaluators
Synaptic provides five evaluators out of the box:
| Evaluator | What It Checks |
|---|---|
| `ExactMatchEvaluator` | Exact string equality (with an optional case-insensitive mode) |
| `JsonValidityEvaluator` | Whether the prediction is valid JSON |
| `RegexMatchEvaluator` | Whether the prediction matches a regex pattern |
| `EmbeddingDistanceEvaluator` | Cosine similarity between prediction and reference embeddings |
| `LLMJudgeEvaluator` | Uses an LLM to score prediction quality on a 0-10 scale |
See Evaluators for detailed usage of each.
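As a quick illustration, a single exact-match check could look like the sketch below. The constructor and the case-insensitive option name are assumptions, so defer to the Evaluators guide for the actual API:

```rust
// Hypothetical constructor and option name -- see the Evaluators guide.
let evaluator = ExactMatchEvaluator::new().case_insensitive(true);

// Arguments follow the trait's order: prediction, reference, input.
let result = evaluator
    .evaluate("Paris", "paris", "What is the capital of France?")
    .await?;
assert!(result.passed);
```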
## Batch Evaluation
The `evaluate()` function runs an evaluator across a `Dataset` of test cases, producing an `EvalReport` with aggregate statistics. See Datasets for details.
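As a rough sketch of what a batch run might look like -- the `Dataset` constructor, the case tuple shape, `evaluate()`'s argument order, and the `EvalReport` field shown are all assumptions, documented properly in the Datasets guide:

```rust
// All names below except Evaluator/EvalResult semantics are assumptions.
use synaptic_eval::{evaluate, Dataset, ExactMatchEvaluator, SynapticError};

async fn run_suite() -> Result<(), SynapticError> {
    // Each case supplies the three strings the Evaluator trait expects.
    // The (input, prediction, reference) tuple shape is an assumption.
    let dataset = Dataset::from_cases(vec![
        ("What is 2 + 2?", "4", "4"),
        ("What is the capital of France?", "Paris", "Paris"),
    ]);

    // evaluate() runs the evaluator over every case and aggregates the results.
    let report = evaluate(&ExactMatchEvaluator::default(), &dataset).await?;
    println!("mean score: {:.2}", report.mean_score); // field name is an assumption
    Ok(())
}
```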
## Guides
- Evaluators -- usage and configuration for each built-in evaluator
- Datasets -- batch evaluation with `Dataset` and `evaluate()`