Watch Mode

Usage

Watch mode re-runs evals automatically when files change:

agent-eval-kit run --watch --mode=replay --suite=smoke

Short form:

agent-eval-kit run -w --mode=replay -s smoke

What it watches

Watch mode monitors files matching these extensions in your project directory:

.ts, .js — source and config files
.jsonl — case files
.yaml, .yml — case files

Changes are debounced by 300 ms, so saving several files in quick succession triggers a single re-run rather than one run per file.
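The filtering and debouncing described above can be sketched roughly as follows. This is an illustrative TypeScript sketch, not the tool's actual implementation: the names WATCHED_EXTENSIONS, isWatchedFile, and debounce are hypothetical, and only the extension list and the 300 ms delay come from this page.

```typescript
import * as path from "node:path";

// Extensions listed in "What it watches". The constant name is hypothetical.
const WATCHED_EXTENSIONS = new Set([".ts", ".js", ".jsonl", ".yaml", ".yml"]);

// Returns true when a changed path has one of the watched extensions.
function isWatchedFile(filename: string): boolean {
  return WATCHED_EXTENSIONS.has(path.extname(filename).toLowerCase());
}

// Collapses a burst of calls into a single call made after `delayMs`
// have passed with no further calls.
function debounce(fn: () => void, delayMs: number): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return () => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(fn, delayMs);
  };
}

// Usage sketch: a re-run is scheduled at most once per quiet period.
// const scheduleRun = debounce(() => console.log("re-running evals"), 300);
// if (isWatchedFile("cases/smoke.jsonl")) scheduleRun();
```

With this shape, each new change event resets the timer, so a batch of saves from an editor or formatter results in one run once the file system has been quiet for the full delay.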

Recommended workflow

1. Record fixtures once: agent-eval-kit record --suite=smoke
2. Start watch mode: agent-eval-kit run -w --mode=replay -s smoke
3. Edit your graders or cases; results update automatically

This gives you a tight feedback loop while developing graders or tuning thresholds, with zero API cost.

Combining with other flags

Watch mode works with all run flags:

# Watch with verbose output
agent-eval-kit run -w --mode=replay -s smoke -v

# Watch with specific reporter
agent-eval-kit run -w --mode=replay -s smoke -r json -o results.json

# Watch with multiple trials
agent-eval-kit run -w --mode=replay -s smoke -t 3

Platform notes

Watch mode uses the native fs.watch API with the { recursive: true } option on all platforms. It requires Node.js >= 20.16.0.
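For readers unfamiliar with the recursive option, here is a minimal TypeScript sketch of a recursive watcher built on Node's fs.watch. It only illustrates the underlying API. The function name watchProject and its callback shape are assumptions for this example and do not describe agent-eval-kit internals.

```typescript
import * as fs from "node:fs";

// Watches `rootDir` and all of its subdirectories, calling `onChange`
// with the path of each changed file, relative to `rootDir`.
// `watchProject` is a hypothetical helper name.
function watchProject(
  rootDir: string,
  onChange: (relativePath: string) => void,
): fs.FSWatcher {
  return fs.watch(rootDir, { recursive: true }, (_eventType, filename) => {
    // The filename may be missing on some platforms, so guard before use.
    if (filename) onChange(filename);
  });
}

// Usage sketch: log changes until the watcher is closed.
// const watcher = watchProject(process.cwd(), (file) => console.log("changed:", file));
// watcher.close();
```

Calling close() on the returned watcher stops the notifications and lets the process exit.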