Watch Mode
Watch mode re-runs evals automatically when files change:
```sh
agent-eval-kit run --watch --mode=replay --suite=smoke
```

Short form:

```sh
agent-eval-kit run -w --mode=replay -s smoke
```

What it watches
Watch mode monitors files with these extensions in your project directory:

- `.ts`, `.js`: source and config files
- `.jsonl`: case files
- `.yaml`, `.yml`: case files
Changes are debounced by 300 ms, so saving several files at once triggers a single run instead of one run per file.
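The behavior described above can be approximated in a few lines of TypeScript. This is a minimal sketch, not agent-eval-kit's actual watcher: `isWatchedFile`, `debounce`, and `watchProject` are hypothetical names, and the sketch only mirrors the documented extension list, the recursive `fs.watch` call, and a 300 ms debounce.

```typescript
import { watch, type FSWatcher } from "node:fs";
import { extname } from "node:path";

// Extensions taken from "What it watches"; this set mirrors the docs, not the kit's source.
const WATCHED_EXTENSIONS = new Set([".ts", ".js", ".jsonl", ".yaml", ".yml"]);

// Hypothetical helper: true if a changed path has one of the watched extensions.
// fs.watch may report a null filename, which is treated as "not watched".
function isWatchedFile(filename: string | null): boolean {
  if (filename === null) return false;
  return WATCHED_EXTENSIONS.has(extname(filename));
}

// Hypothetical trailing-edge debounce: every call restarts the timer, so `fn`
// runs once, `ms` milliseconds after the last call in a burst.
function debounce(fn: () => void, ms: number): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return () => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => {
      timer = undefined;
      fn();
    }, ms);
  };
}

// Hypothetical helper: recursively watch `dir` and call `onChange` at most once
// per 300 ms burst of changes to watched files.
function watchProject(dir: string, onChange: () => void): FSWatcher {
  const trigger = debounce(onChange, 300);
  return watch(dir, { recursive: true }, (_event, filename) => {
    if (isWatchedFile(filename)) trigger();
  });
}
```

For example, `watchProject(process.cwd(), () => console.log("re-running evals"))` would log once per burst of saves rather than once per file. A trailing-edge debounce fits this use case because the run should start only after the last file in a multi-file save has been written.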
Recommended workflow
- Record fixtures once:
  ```sh
  agent-eval-kit record --suite=smoke
  ```
- Start watch mode:
  ```sh
  agent-eval-kit run -w --mode=replay -s smoke
  ```
- Edit your graders or cases; results update automatically.
Because replay mode reuses the recorded fixtures, this gives you a tight feedback loop while developing graders or tuning thresholds, at zero API cost.
Combining with other flags
Watch mode works with all `run` flags:

```sh
# Watch with verbose output
agent-eval-kit run -w --mode=replay -s smoke -v

# Watch with a specific reporter
agent-eval-kit run -w --mode=replay -s smoke -r json -o results.json

# Watch with multiple trials
agent-eval-kit run -w --mode=replay -s smoke -t 3
```

Platform notes
Watch mode uses Node's native `fs.watch()` with `{ recursive: true }` on all platforms. It requires Node.js >= 20.16.0.
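If you want to fail fast on an unsupported runtime, a small preflight check can compare `process.versions.node` against the documented minimum. This is a hypothetical helper (`parseVersion` and `meetsMinimum` are illustrative names), not something agent-eval-kit ships, and it assumes plain `major.minor.patch` version strings.

```typescript
// Hypothetical preflight check against the 20.16.0 minimum stated in these docs.
// Not part of agent-eval-kit itself.
const MINIMUM_NODE: [number, number, number] = [20, 16, 0];

// Accepts "20.16.0" or "v20.16.0"; missing parts default to 0.
function parseVersion(version: string): [number, number, number] {
  const [major = 0, minor = 0, patch = 0] = version
    .replace(/^v/, "")
    .split(".")
    .map((part) => Number.parseInt(part, 10));
  return [major, minor, patch];
}

// True if `version` is greater than or equal to `minimum`, compared field by field.
function meetsMinimum(version: string, minimum = MINIMUM_NODE): boolean {
  const [a1, a2, a3] = parseVersion(version);
  const [b1, b2, b3] = minimum;
  if (a1 !== b1) return a1 > b1;
  if (a2 !== b2) return a2 > b2;
  return a3 >= b3;
}

// Only warns; a real preflight script might exit with a non-zero code instead.
if (!meetsMinimum(process.versions.node)) {
  console.warn(
    `Node.js ${process.versions.node} is older than ${MINIMUM_NODE.join(".")}; watch mode may not work.`,
  );
}
```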