Datasets
The Dataset type and evaluate() function provide a batch evaluation pipeline. You define a dataset of input-reference pairs, generate predictions, and score them all at once to produce an EvalReport.
Creating a Dataset
A Dataset is a collection of DatasetItem values, each with an input and a reference (expected answer):
use synaptic::eval::{Dataset, DatasetItem};
// From DatasetItem structs
let dataset = Dataset::new(vec![
    DatasetItem {
        input: "What is 2+2?".to_string(),
        reference: "4".to_string(),
    },
    DatasetItem {
        input: "Capital of France?".to_string(),
        reference: "Paris".to_string(),
    },
]);
// From string pairs (convenience method)
let dataset = Dataset::from_pairs(vec![
    ("What is 2+2?", "4"),
    ("Capital of France?", "Paris"),
]);
Running Batch Evaluation
The evaluate() function takes an evaluator, a dataset, and a slice of predictions. It evaluates each prediction against the corresponding dataset item and returns an EvalReport:
use synaptic::eval::{evaluate, Dataset, ExactMatchEvaluator};
let dataset = Dataset::from_pairs(vec![
    ("What is 2+2?", "4"),
    ("Capital of France?", "Paris"),
    ("Largest ocean?", "Pacific"),
]);
let evaluator = ExactMatchEvaluator::new();
// Your model's predictions (one per dataset item)
let predictions = vec![
    "4".to_string(),
    "Paris".to_string(),
    "Atlantic".to_string(), // Wrong!
];
let report = evaluate(&evaluator, &dataset, &predictions).await?;
println!("Total: {}", report.total); // 3
println!("Passed: {}", report.passed); // 2
println!("Accuracy: {:.0}%", report.accuracy * 100.0); // 67%
The number of predictions must match the number of dataset items. If they differ, evaluate() returns a SynapticError::Validation.
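If you want to surface that failure explicitly, the mismatch can be caught like any other error. A minimal sketch (the short_predictions binding is illustrative; it assumes only that the error type implements Display):
// Two predictions for a three-item dataset: evaluate() rejects the call.
let short_predictions = vec!["4".to_string(), "Paris".to_string()];
if let Err(err) = evaluate(&evaluator, &dataset, &short_predictions).await {
    // Expect the validation error describing the length mismatch.
    eprintln!("evaluation failed: {err}");
}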
EvalReport
The report contains aggregate statistics and per-item results:
pub struct EvalReport {
    pub total: usize,
    pub passed: usize,
    pub accuracy: f32,
    pub results: Vec<EvalResult>,
}
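The exact definition of EvalResult is not reproduced here, but from the fields used in the loop below it looks roughly like this sketch (illustrative, not the crate's literal declaration):
pub struct EvalResult {
    pub passed: bool,              // whether the item met the evaluator's criteria
    pub score: f32,                // numeric score assigned by the evaluator
    pub reasoning: Option<String>, // optional explanation from the evaluator
}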
You can inspect individual results for detailed feedback:
for (i, result) in report.results.iter().enumerate() {
    let status = if result.passed { "PASS" } else { "FAIL" };
    let reason = result.reasoning.as_deref().unwrap_or("--");
    println!("[{status}] Item {i}: score={:.2}, reason={reason}", result.score);
}
End-to-End Example
A typical evaluation workflow:
- Build a dataset of test cases.
- Run your model/chain on each input to produce predictions.
- Score predictions with an evaluator.
- Inspect the report.
use synaptic::eval::{evaluate, Dataset, ExactMatchEvaluator};
// 1. Dataset
let dataset = Dataset::from_pairs(vec![
    ("2+2", "4"),
    ("3*5", "15"),
    ("10/2", "5"),
]);
// 2. Generate predictions (in practice, run your model)
let predictions: Vec<String> = dataset.items.iter()
    .map(|item| {
        // Simulated model output
        match item.input.as_str() {
            "2+2" => "4",
            "3*5" => "15",
            "10/2" => "5",
            _ => "unknown",
        }.to_string()
    })
    .collect();
// 3. Evaluate
let evaluator = ExactMatchEvaluator::new();
let report = evaluate(&evaluator, &dataset, &predictions).await?;
// 4. Report
println!("Accuracy: {:.0}% ({}/{})",
report.accuracy * 100.0, report.passed, report.total);
Using Different Evaluators
The evaluate() function works with any Evaluator. Swap in a different evaluator to change the scoring criteria without modifying the dataset or prediction pipeline:
use synaptic::eval::{evaluate, RegexMatchEvaluator};
// Check that predictions contain a date
let evaluator = RegexMatchEvaluator::new(r"\d{4}-\d{2}-\d{2}")?;
let report = evaluate(&evaluator, &dataset, &predictions).await?;
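For a self-contained run, pair the regex evaluator with a dataset whose answers are dates (illustrative data; this assumes, as the comment above describes, that a prediction passes when it matches the pattern):
use synaptic::eval::{evaluate, Dataset, RegexMatchEvaluator};

let dataset = Dataset::from_pairs(vec![
    ("When was version 1.0 released?", "2024-01-15"),
    ("When does support end?", "2026-06-30"),
]);
let predictions = vec![
    "It shipped on 2024-01-15.".to_string(),
    "Sometime next year".to_string(), // no date, so the regex won't match
];

let evaluator = RegexMatchEvaluator::new(r"\d{4}-\d{2}-\d{2}")?;
let report = evaluate(&evaluator, &dataset, &predictions).await?;
println!("Passed {}/{}", report.passed, report.total);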