
Watch Mode

Watch mode re-runs evals automatically when files change:

agent-eval-kit run --watch --mode=replay --suite=smoke

Short form:

agent-eval-kit run -w --mode=replay -s smoke

Watch mode monitors files matching these extensions in your project directory:

  • .ts, .js — source and config files
  • .jsonl — case files
  • .yaml, .yml — case files
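As an illustration of the extension check, here is a minimal sketch that assumes the watcher compares each changed file's extension against the list above. WATCHED_EXTENSIONS and isWatched are hypothetical names chosen for this example. They are not taken from agent-eval-kit's source.

```typescript
import { extname } from "node:path";

// Hypothetical constant mirroring the documented list; not agent-eval-kit's own code.
const WATCHED_EXTENSIONS = new Set([".ts", ".js", ".jsonl", ".yaml", ".yml"]);

// Returns true when a changed file's extension is one that should trigger a re-run.
function isWatched(file: string): boolean {
  return WATCHED_EXTENSIONS.has(extname(file));
}

console.log(isWatched("cases/smoke.jsonl")); // true
console.log(isWatched("graders/contains.ts")); // true
console.log(isWatched("README.md")); // false
```

In this sketch, a change to README.md is ignored. An edit to a case file or grader schedules a re-run.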

Changes are debounced by 300 ms, so saving several files at once triggers a single run rather than one run per file.
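The debounce behaves like a trailing-edge debounce. Each change restarts a 300 ms timer, and the run starts only after the timer expires with no further change. The sketch below assumes agent-eval-kit works this way. The names debounce and scheduleRun are illustrative only.

```typescript
// Trailing-edge debounce: every call cancels the pending timer and starts a new one,
// so fn runs once, waitMs after the last call in a burst.
function debounce<A extends unknown[]>(fn: (...args: A) => void, waitMs: number): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}

let runs = 0;
const scheduleRun = debounce(() => {
  runs++;
}, 300);

// Three change events arriving together, e.g. from "Save All" in an editor.
scheduleRun();
scheduleRun();
scheduleRun();

setTimeout(() => console.log(`runs: ${runs}`), 400); // prints "runs: 1"
```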

A typical workflow:

  1. Record fixtures once: agent-eval-kit record --suite=smoke
  2. Start watch mode: agent-eval-kit run -w --mode=replay -s smoke
  3. Edit your graders or cases — results update automatically

Because replay mode reuses the recorded fixtures instead of making API calls, this gives you a tight feedback loop while developing graders or tuning thresholds, with zero API cost.

Watch mode works with all run flags:

# Watch with verbose output
agent-eval-kit run -w --mode=replay -s smoke -v
# Watch with specific reporter
agent-eval-kit run -w --mode=replay -s smoke -r json -o results.json
# Watch with multiple trials
agent-eval-kit run -w --mode=replay -s smoke -t 3

Watch mode uses Node's native fs.watch({ recursive: true }) on all platforms and requires Node.js >= 20.16.0.
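The following sketch combines the documented extension list and the 300 ms debounce into a watch loop built on fs.watch({ recursive: true }). It is an assumption about how such a loop could work, not agent-eval-kit's implementation. The function watchProject, its parameters, and the choice to skip events that arrive without a filename are all hypothetical.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical copy of the documented extension list.
const WATCHED = new Set([".ts", ".js", ".jsonl", ".yaml", ".yml"]);

// Recursively watches root and calls rerun once per burst of relevant changes.
// Returns a function that cancels any pending run and stops the watcher.
function watchProject(root: string, rerun: () => void, waitMs = 300): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const watcher = fs.watch(root, { recursive: true }, (_event, filename) => {
    // fs.watch may report a null filename; this sketch ignores such events.
    if (!filename || !WATCHED.has(path.extname(filename))) return;
    clearTimeout(timer); // restart the debounce window on every relevant change
    timer = setTimeout(rerun, waitMs);
  });
  return () => {
    clearTimeout(timer);
    watcher.close();
  };
}
```

For example, calling watchProject(".", runEvals), where runEvals is a hypothetical function that re-runs the suite, would watch the current directory. The returned function stops watching and cancels any run still waiting on the timer.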