Skip to content

Reporters

Reporters format eval results for different contexts. agent-eval-kit ships four built-in reporters.

Human-readable terminal output with color-coded pass/fail indicators.

Terminal window
agent-eval-kit run --suite=smoke
agent-eval-kit run --suite=smoke -r console

Features:

  • Per-case status with pass/fail icon, latency, and cost
  • Multi-trial display: N/M passed with Wilson confidence interval
  • LLM reasoning (truncated to 200 chars in verbose mode)
  • Judge cost aggregation
  • Gate result with inline failure reasons
  • Color-coded pass rates: green (≥90%), yellow (≥70%), red (below 70%)

Use --verbose for full grader details and LLM reasoning.

Outputs the complete Run object as pretty-printed JSON. This is the full data model — every trial, grade, score, and metadata field.

Terminal window
agent-eval-kit run --suite=smoke -r json
agent-eval-kit run --suite=smoke -r json -o results.json

The JSON structure matches the Run type:

{
schemaVersion: "1.0.0",
id: string,
suiteId: string,
mode: "live" | "replay" | "judge-only",
trials: Trial[],
summary: RunSummary,
timestamp: string,
configHash: string,
frameworkVersion: string,
}

Standard JUnit XML format for CI tool integration (Jenkins, GitHub Actions, etc.).

Terminal window
agent-eval-kit run --suite=smoke -r junit -o results.xml
  • Groups test cases by case ID
  • Distinguishes between failures (grader failed) and errors (runtime exception)
  • Strips XML 1.0 illegal control characters
  • Writes to file when --output is specified

Formatted markdown tables suitable for PR comments or documentation.

Terminal window
agent-eval-kit run --suite=smoke -r markdown
agent-eval-kit run --suite=smoke -r markdown -o results.md

Includes a summary table, per-case result table, and gate status section.

Configure multiple reporters in your config file:

export default defineConfig({
reporters: [
"console",
{ reporter: "json", output: "results.json" },
{ reporter: "junit", output: "results.xml" },
],
suites: [/* ... */],
});

Or override via CLI:

Terminal window
agent-eval-kit run --suite=smoke -r json -o results.json

The --reporter flag replaces the default console output on stdout. Config-level reporters always run in addition.

In GitHub Actions, run results are automatically written to $GITHUB_STEP_SUMMARY when the environment variable is set, giving you formatted results directly in the PR checks UI.