Reporters
Overview
Reporters format eval results for different contexts. agent-eval-kit ships four built-in reporters.
Console (default)
Human-readable terminal output with color-coded pass/fail indicators.
    agent-eval-kit run --suite=smoke
    agent-eval-kit run --suite=smoke -r console
Features:
- Per-case status with pass/fail icon, latency, and cost
- Multi-trial display: N/M passed with Wilson confidence interval
- LLM reasoning (truncated to 200 chars in verbose mode)
- Judge cost aggregation
- Gate result with inline failure reasons
- Color-coded pass rates: green (≥90%), yellow (≥70%), red (below 70%)
Use --verbose for full grader details and LLM reasoning.
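To make the multi-trial display and the color thresholds concrete, here is a minimal sketch of how a Wilson score interval and the color bands listed above could be computed. The names `wilsonInterval` and `rateColor` are hypothetical, and the 95% confidence level (z = 1.96) is an assumption; the kit's own implementation may differ.

```typescript
// Illustrative only: these names are not agent-eval-kit exports.
interface Interval {
  lower: number;
  upper: number;
}

// Wilson score interval for a pass rate. z = 1.96 (about 95%) is an assumed default.
function wilsonInterval(passed: number, total: number, z = 1.96): Interval {
  if (total <= 0) {
    return { lower: 0, upper: 1 }; // no trials: the rate is unconstrained
  }
  const p = passed / total;
  const z2 = z * z;
  const denom = 1 + z2 / total;
  const center = (p + z2 / (2 * total)) / denom;
  const margin =
    (z / denom) * Math.sqrt((p * (1 - p)) / total + z2 / (4 * total * total));
  return {
    lower: Math.max(0, center - margin),
    upper: Math.min(1, center + margin),
  };
}

type RateColor = "green" | "yellow" | "red";

// Thresholds from the feature list: >=90% green, >=70% yellow, otherwise red.
function rateColor(passRate: number): RateColor {
  if (passRate >= 0.9) return "green";
  if (passRate >= 0.7) return "yellow";
  return "red";
}
```

With few trials the interval is wide, which is why a reporter would show it next to the raw N/M count instead of the pass rate alone.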
JSON
Outputs the complete Run object as pretty-printed JSON. This is the full data model: every trial, grade, score, and metadata field.
    agent-eval-kit run --suite=smoke -r json
    agent-eval-kit run --suite=smoke -r json -o results.json
The JSON structure matches the Run type:
    {
      schemaVersion: "1.0.0",
      id: string,
      suiteId: string,
      mode: "live" | "replay" | "judge-only",
      trials: Trial[],
      summary: RunSummary,
      timestamp: string,
      configHash: string,
      frameworkVersion: string,
    }
JUnit XML
Standard JUnit XML format for CI tool integration (Jenkins, GitHub Actions, etc.).
    agent-eval-kit run --suite=smoke -r junit -o results.xml
- Groups test cases by case ID
- Distinguishes between failures (grader failed) and errors (runtime exception)
- Strips XML 1.0 illegal control characters
- Writes to file when --output is specified
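The control-character stripping mentioned above matters because grader output or LLM text can contain bytes that make an XML file unparseable. The following is a rough sketch of how such sanitizing might look, not the kit's actual code; `stripIllegalXmlChars` and `escapeXml` are hypothetical helper names.

```typescript
// C0 control characters that XML 1.0 forbids. Tab (0x09), line feed (0x0A),
// and carriage return (0x0D) are legal and are left untouched.
const XML10_ILLEGAL_CONTROL = /[\u0000-\u0008\u000B\u000C\u000E-\u001F]/g;

// Hypothetical helper: drop characters a strict XML 1.0 parser would reject.
function stripIllegalXmlChars(text: string): string {
  return text.replace(XML10_ILLEGAL_CONTROL, "");
}

// Hypothetical helper: strip illegal characters, then escape the five
// predefined XML entities so the text is safe inside elements and attributes.
function escapeXml(text: string): string {
  return stripIllegalXmlChars(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}
```

Escaping `&` first avoids double-escaping the entities produced by the later replacements.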
Markdown
Formatted markdown tables suitable for PR comments or documentation.
    agent-eval-kit run --suite=smoke -r markdown
    agent-eval-kit run --suite=smoke -r markdown -o results.md
Includes a summary table, per-case result table, and gate status section.
Using multiple reporters
Configure multiple reporters in your config file:
    export default defineConfig({
      reporters: [
        "console",
        { reporter: "json", output: "results.json" },
        { reporter: "junit", output: "results.xml" },
      ],
      suites: [/* ... */],
    });
Or override via CLI:
    agent-eval-kit run --suite=smoke -r json -o results.json
The --reporter flag replaces the default console output on stdout. Config-level reporters always run in addition.
GitHub Actions integration
In GitHub Actions, run results are automatically written to $GITHUB_STEP_SUMMARY when the environment variable is set, giving you formatted results directly in the PR checks UI.
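For readers unfamiliar with the mechanism: GitHub Actions exposes a file path in GITHUB_STEP_SUMMARY, and any markdown appended to that file is rendered on the job summary page. The sketch below shows the general pattern under that assumption. `writeStepSummary` is a hypothetical function, not part of agent-eval-kit, and the kit's internal behavior may differ.

```typescript
import { appendFileSync } from "node:fs";

// Hypothetical helper: append markdown to the job summary file when the
// GITHUB_STEP_SUMMARY variable is set. Returns true if something was written.
function writeStepSummary(
  markdown: string,
  env: Record<string, string | undefined> = process.env,
): boolean {
  const summaryPath = env["GITHUB_STEP_SUMMARY"];
  if (!summaryPath) {
    return false; // not running under GitHub Actions, so do nothing
  }
  // Append instead of overwrite so earlier steps' summaries are preserved.
  appendFileSync(summaryPath, markdown + "\n", "utf8");
  return true;
}
```

Outside of GitHub Actions the variable is unset, so the function is a no-op and local runs are unaffected.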