Installation
Prerequisites
- Node.js 20.16+ (22 LTS recommended for native TypeScript execution)
- pnpm, npm, yarn, or bun package manager
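
The Node.js minimum above can be checked from a script before installing. The following is a minimal sketch, not part of agent-eval-kit: the helper name `meetsMinimum` and the `MIN_NODE` constant are hypothetical, and only the major and minor components of the version string are compared.

```typescript
// Minimum Node.js version stated in the prerequisites (20.16).
// MIN_NODE and meetsMinimum are illustrative names, not part of agent-eval-kit.
const MIN_NODE: [number, number] = [20, 16];

/** Returns true when `version` (e.g. "20.16.0" or "v22.3.0") is at least `min`. */
function meetsMinimum(version: string, min: [number, number] = MIN_NODE): boolean {
  // Drop an optional leading "v", then read major and minor as numbers.
  const [major = 0, minor = 0] = version.replace(/^v/, "").split(".").map(Number);
  if (major !== min[0]) return major > min[0];
  return minor >= min[1];
}

console.log(meetsMinimum("20.15.1")); // false
console.log(meetsMinimum("22.3.0")); // true
```

In a real project, pass `process.versions.node` to check the running interpreter.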
Install

    # pnpm (recommended)
    pnpm add -D agent-eval-kit
    # npm
    npm install --save-dev agent-eval-kit
    # yarn
    yarn add --dev agent-eval-kit
    # bun
    bun add -D agent-eval-kit

Verify installation

    npx agent-eval-kit --help

You should see the available commands listed.
Initialize a project
The init wizard creates a starter config, case files, and optionally a GitHub Actions workflow:

    npx agent-eval-kit init

This creates:
- eval.config.ts: main configuration file with a framework-specific target stub
- cases/smoke.jsonl: 3 starter test cases
- .eval-fixtures/.gitkeep: fixture directory
- .github/workflows/evals.yml: CI workflow (optional)
- AGENTS.md: AI agent boundaries file (optional)
The wizard auto-detects your framework (Vercel AI SDK, LangChain, Mastra, or custom) and package manager.
For non-interactive setup: npx agent-eval-kit init --yes
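
How the wizard detects the package manager is not described here. One common approach, sketched below as an assumption rather than the tool's actual logic, is to look for each manager's conventional lockfile in the project root. The names `detectPackageManager` and `installCommand` are hypothetical; the install commands are the ones listed in the Install section.

```typescript
type PackageManager = "pnpm" | "npm" | "yarn" | "bun";

// Lockfile names conventionally written by each package manager.
// Checking lockfiles is an assumed heuristic, not confirmed wizard behavior.
const LOCKFILES: Array<[string, PackageManager]> = [
  ["pnpm-lock.yaml", "pnpm"],
  ["yarn.lock", "yarn"],
  ["bun.lockb", "bun"],
  ["bun.lock", "bun"],
  ["package-lock.json", "npm"],
];

// Install commands copied from the Install section.
const INSTALL_COMMANDS: Record<PackageManager, string> = {
  pnpm: "pnpm add -D agent-eval-kit",
  npm: "npm install --save-dev agent-eval-kit",
  yarn: "yarn add --dev agent-eval-kit",
  bun: "bun add -D agent-eval-kit",
};

/** Guesses the package manager from file names in a project root; falls back to npm. */
function detectPackageManager(rootFiles: string[]): PackageManager {
  const present = new Set(rootFiles);
  for (const [lockfile, manager] of LOCKFILES) {
    if (present.has(lockfile)) return manager;
  }
  return "npm";
}

/** Returns the install command matching the detected package manager. */
function installCommand(rootFiles: string[]): string {
  return INSTALL_COMMANDS[detectPackageManager(rootFiles)];
}

console.log(installCommand(["package.json", "pnpm-lock.yaml"]));
// prints: pnpm add -D agent-eval-kit
```

In practice the file list could come from `fs.readdirSync(process.cwd())`.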
Package exports
agent-eval-kit provides several subpath exports for targeted imports:
| Import | Description |
|---|---|
| agent-eval-kit | Main entry: defineConfig, runner, storage, comparison, caching utilities |
| agent-eval-kit/graders | All 20 graders, composition operators, scoring, grader types |
| agent-eval-kit/plugin | Plugin interface types (EvalPlugin, PluginHooks) |
| agent-eval-kit/reporters | resolveReporter, for custom reporter plugin integration |
| agent-eval-kit/comparison | compareRuns, formatComparisonReport |
| agent-eval-kit/fixtures | Fixture loading and management |
| agent-eval-kit/watcher | File watcher for watch mode |
Next steps
Continue to Quick Start to run your first eval.