
Plugin API

A plugin is an object that implements the EvalPlugin interface:

import type { EvalPlugin } from "agent-eval-kit/plugin";

// Shape of EvalPlugin, for reference:
interface EvalPlugin {
  readonly name: string; // required, non-empty
  readonly version: string; // required, non-empty
  readonly graders?: Readonly<Record<string, GraderFn>>;
  readonly hooks?: PluginHooks;
}
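Only `name` and `version` are required. Below is a minimal sketch of the smallest valid plugin. It uses a local copy of the interface, with `GraderFn` and `PluginHooks` reduced to placeholders, so it compiles without the package installed. In a project you would import `EvalPlugin` from `agent-eval-kit/plugin` instead. The plugin name `noop` is hypothetical.

```typescript
// Local stand-ins so the sketch runs on its own; in a real project use
//   import type { EvalPlugin } from "agent-eval-kit/plugin";
type GraderFn = (...args: never[]) => Promise<unknown>; // placeholder for the real GraderFn
type PluginHooks = object; // placeholder for the real PluginHooks

interface EvalPlugin {
  readonly name: string;
  readonly version: string;
  readonly graders?: Readonly<Record<string, GraderFn>>;
  readonly hooks?: PluginHooks;
}

// Smallest valid plugin: both required fields are present and non-empty,
// and it registers no graders and no hooks.
const noopPlugin: EvalPlugin = {
  name: "noop", // hypothetical plugin name
  version: "0.1.0",
};

console.log(noopPlugin.graders === undefined, noopPlugin.hooks === undefined); // prints "true true"
```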

Plugin graders follow the same GraderFn signature as built-in graders:

type GraderFn = (
  output: TargetOutput,
  expected: CaseExpected | undefined,
  context: GraderContext,
) => Promise<GradeResult>;

interface GraderContext {
  caseId: string;
  suiteId: string;
  mode: "live" | "replay" | "judge-only";
  graderName: string;
  judge?: JudgeCallFn; // available if judge is configured
}

When a judge is configured, the judge function is injected from the config, so plugin graders can use LLM judging without managing their own LLM client.
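Here is a sketch of a grader that relies on the injected judge and degrades gracefully when none is configured. The `JudgeCallFn` signature is not documented on this page. The one below, which takes a prompt string and returns a `{ score, reason }` object, is an assumption. So are the `TargetOutput` and `CaseExpected` shapes and the 0.7 pass threshold. Check the real types before adapting it.

```typescript
// Local stand-ins so the sketch runs without the package. TargetOutput,
// CaseExpected and especially JudgeCallFn are ASSUMED shapes; the real
// agent-eval-kit types may differ.
interface TargetOutput { text: string }    // assumption
interface CaseExpected { rubric?: string } // assumption
type JudgeCallFn = (prompt: string) => Promise<{ score: number; reason: string }>; // assumption

interface GraderContext {
  caseId: string;
  suiteId: string;
  mode: "live" | "replay" | "judge-only";
  graderName: string;
  judge?: JudgeCallFn;
}

interface GradeResult {
  pass: boolean;
  score: number;
  reason: string;
  graderName: string;
  metadata?: Record<string, unknown>;
}

// Hypothetical rubric grader: delegates to the injected judge when present,
// otherwise fails with an explanatory reason instead of creating its own client.
async function rubricGrader(
  output: TargetOutput,
  expected: CaseExpected | undefined,
  context: GraderContext,
): Promise<GradeResult> {
  if (!context.judge) {
    return { pass: false, score: 0, reason: "No judge configured", graderName: context.graderName };
  }
  const rubric = expected?.rubric ?? "Is the response helpful and correct?";
  const verdict = await context.judge(`Rubric: ${rubric}\n\nResponse:\n${output.text}`);
  const score = Math.min(1, Math.max(0, verdict.score)); // clamp into [0, 1]
  return {
    pass: score >= 0.7, // hypothetical threshold
    score,
    reason: verdict.reason,
    graderName: context.graderName,
  };
}
```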

A grader resolves to a GradeResult:

interface GradeResult {
  pass: boolean;
  score: number; // 0–1
  reason: string;
  graderName: string;
  metadata?: Record<string, unknown>; // arbitrary data (e.g., judgeCost)
}
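A deterministic grader only has to return a `GradeResult` with a score between 0 and 1. This hypothetical keyword-coverage grader scores the fraction of expected keywords found in the output. The `TargetOutput` and `CaseExpected` shapes (a `text` field and a `keywords` array) are assumptions, because this page doesn't define them. The plugin and grader names are also made up.

```typescript
// Local stand-ins; TargetOutput and CaseExpected shapes are ASSUMED here.
interface TargetOutput { text: string }
interface CaseExpected { keywords?: string[] }
interface GraderContext {
  caseId: string;
  suiteId: string;
  mode: "live" | "replay" | "judge-only";
  graderName: string; // judge omitted: unused by this grader
}
interface GradeResult {
  pass: boolean;
  score: number;
  reason: string;
  graderName: string;
  metadata?: Record<string, unknown>;
}

// score = fraction of expected keywords present (case-insensitive), so it
// always lands in [0, 1]; pass requires every keyword.
async function keywordCoverage(
  output: TargetOutput,
  expected: CaseExpected | undefined,
  context: GraderContext,
): Promise<GradeResult> {
  const keywords = expected?.keywords ?? [];
  if (keywords.length === 0) {
    return { pass: true, score: 1, reason: "No keywords expected", graderName: context.graderName };
  }
  const text = output.text.toLowerCase();
  const missing = keywords.filter((k) => !text.includes(k.toLowerCase()));
  const score = (keywords.length - missing.length) / keywords.length;
  return {
    pass: missing.length === 0,
    score,
    reason: missing.length === 0 ? "All keywords present" : `Missing: ${missing.join(", ")}`,
    graderName: context.graderName,
    metadata: { missing }, // surfaced for reports
  };
}

// Registering it under a (hypothetical) name in a plugin's graders record.
const keywordPlugin = {
  name: "keywords",
  version: "1.0.0",
  graders: { "keyword-coverage": keywordCoverage },
};
```

Scoring by coverage keeps partial credit visible in `score`, while `pass` stays all-or-nothing.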
Plugins can also register lifecycle hooks. All three are optional:

interface PluginHooks {
  readonly beforeRun?: (context: BeforeRunContext) => Promise<void>;
  readonly afterTrial?: (trial: Trial, context: AfterTrialContext) => Promise<void>;
  readonly afterRun?: (run: Run) => Promise<void>;
}

interface BeforeRunContext {
  suiteId: string;
  mode: "live" | "replay" | "judge-only";
  caseCount: number;
  trialCount: number;
}
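A failing beforeRun hook stops the run, which makes beforeRun a natural place for pre-flight checks. The sketch below refuses live runs that exceed a hypothetical execution budget. `MAX_LIVE_EXECUTIONS`, `guardLiveBudget`, and the plugin name are made up for illustration. The sketch also assumes `trialCount` is the number of trials per case. If it is already a total, drop the multiplication.

```typescript
// Local stand-in for the documented BeforeRunContext.
interface BeforeRunContext {
  suiteId: string;
  mode: "live" | "replay" | "judge-only";
  caseCount: number;
  trialCount: number;
}

// Hypothetical budget for live executions.
const MAX_LIVE_EXECUTIONS = 500;

// Throwing from beforeRun aborts the run before any case executes.
// ASSUMPTION: trialCount is trials per case, so the total is the product.
async function guardLiveBudget(ctx: BeforeRunContext): Promise<void> {
  if (ctx.mode !== "live") return; // replay and judge-only runs are not checked
  const executions = ctx.caseCount * ctx.trialCount;
  if (executions > MAX_LIVE_EXECUTIONS) {
    throw new Error(
      `Suite '${ctx.suiteId}' would make ${executions} live executions (limit ${MAX_LIVE_EXECUTIONS})`,
    );
  }
}

const liveBudgetPlugin = {
  name: "live-budget",
  version: "1.0.0",
  hooks: { beforeRun: guardLiveBudget },
};
```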

interface AfterTrialContext {
  suiteId: string;
  completedCount: number;
  totalCount: number;
}
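The counters on AfterTrialContext are enough for a simple progress reporter. In this sketch, `formatProgress` and the `progress` plugin are hypothetical. `Trial` is left as `unknown` because its shape is not documented here.

```typescript
// Local stand-ins for the documented context; Trial is treated as opaque.
interface AfterTrialContext {
  suiteId: string;
  completedCount: number;
  totalCount: number;
}
type Trial = unknown;

// Formats progress from the documented counters; an empty suite reports 100%.
function formatProgress(ctx: AfterTrialContext): string {
  const pct = ctx.totalCount === 0 ? 100 : Math.round((ctx.completedCount / ctx.totalCount) * 100);
  return `[${ctx.suiteId}] ${ctx.completedCount}/${ctx.totalCount} trials (${pct}%)`;
}

const progressPlugin = {
  name: "progress",
  version: "1.0.0",
  hooks: {
    // The trial itself is ignored; only the counters are used.
    afterTrial: async (_trial: Trial, ctx: AfterTrialContext): Promise<void> => {
      console.log(formatProgress(ctx));
    },
  },
};
```

For example, `formatProgress({ suiteId: "qa", completedCount: 3, totalCount: 12 })` returns `"[qa] 3/12 trials (25%)"`.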
Hook execution rules:
  • Hooks execute sequentially, in plugin registration order.
  • beforeRun errors propagate: a failing hook stops the run.
  • afterTrial and afterRun errors are caught and logged, so they don’t interrupt the run.
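These rules can be sketched as a small runner. This is an illustration of the documented behavior, not agent-eval-kit's actual implementation. `runBeforeRun`, `runAfterTrial`, `runAfterRun`, and `PluginLike` are hypothetical names. `Trial` and `Run` are treated as opaque because this page doesn't show their shapes.

```typescript
// Minimal local stand-ins; Trial and Run are opaque here.
type Trial = unknown;
type Run = unknown;
interface BeforeRunContext { suiteId: string; mode: "live" | "replay" | "judge-only"; caseCount: number; trialCount: number }
interface AfterTrialContext { suiteId: string; completedCount: number; totalCount: number }
interface PluginHooks {
  readonly beforeRun?: (context: BeforeRunContext) => Promise<void>;
  readonly afterTrial?: (trial: Trial, context: AfterTrialContext) => Promise<void>;
  readonly afterRun?: (run: Run) => Promise<void>;
}
interface PluginLike { readonly name: string; readonly hooks?: PluginHooks }

// Sequential, in registration order. No try/catch: the first error rejects
// this promise, so the caller can abort the run.
async function runBeforeRun(plugins: readonly PluginLike[], ctx: BeforeRunContext): Promise<void> {
  for (const p of plugins) {
    await p.hooks?.beforeRun?.(ctx);
  }
}

// Also sequential, but each failure is logged and the loop continues.
async function runAfterTrial(plugins: readonly PluginLike[], trial: Trial, ctx: AfterTrialContext): Promise<void> {
  for (const p of plugins) {
    try {
      await p.hooks?.afterTrial?.(trial, ctx);
    } catch (err) {
      console.error(`afterTrial hook of plugin '${p.name}' failed:`, err);
    }
  }
}

// Same log-and-continue policy for afterRun.
async function runAfterRun(plugins: readonly PluginLike[], run: Run): Promise<void> {
  for (const p of plugins) {
    try {
      await p.hooks?.afterRun?.(run);
    } catch (err) {
      console.error(`afterRun hook of plugin '${p.name}' failed:`, err);
    }
  }
}
```

Awaiting each hook inside a `for...of` loop, rather than using `Promise.all`, is what gives the registration-order guarantee.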

The config loader validates plugins at load time:

| Rule | Error |
| --- | --- |
| Empty name | "Plugin missing required 'name' field" |
| Empty version | "Plugin '<name>' missing required 'version' field" |
| Duplicate grader name | "Duplicate grader name '<name>' from plugin '<plugin>' (already registered by '<other>')" |
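To make the three rules concrete, here is a hypothetical re-implementation of the checks that reuses the documented messages. It is not the config loader's code. It treats only the empty string as "empty" and compares plugin graders only against each other. The real loader may behave differently on both points, for example by also checking built-in grader names. `validatePlugins` and `PluginLike` are made-up names.

```typescript
// Local stand-in with just the fields the checks need.
interface PluginLike {
  readonly name: string;
  readonly version: string;
  readonly graders?: Readonly<Record<string, unknown>>;
}

// Applies the documented checks in registration order and returns a map of
// grader name -> owning plugin name. Throws on the first violation.
function validatePlugins(plugins: readonly PluginLike[]): Map<string, string> {
  const owners = new Map<string, string>();
  for (const plugin of plugins) {
    if (!plugin.name) throw new Error("Plugin missing required 'name' field");
    if (!plugin.version) throw new Error(`Plugin '${plugin.name}' missing required 'version' field`);
    for (const grader of Object.keys(plugin.graders ?? {})) {
      const other = owners.get(grader);
      if (other !== undefined) {
        throw new Error(
          `Duplicate grader name '${grader}' from plugin '${plugin.name}' (already registered by '${other}')`,
        );
      }
      owners.set(grader, plugin.name);
    }
  }
  return owners;
}
```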

Plugin types are available from the agent-eval-kit/plugin subpath:

import type {
  EvalPlugin,
  PluginHooks,
  BeforeRunContext,
  AfterTrialContext,
} from "agent-eval-kit/plugin";
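
For example, here is a plugin that times each suite run and logs the run's total cost:

import type { EvalPlugin } from "agent-eval-kit/plugin";

export const timingPlugin: EvalPlugin = {
  name: "timing",
  version: "1.0.0",
  hooks: {
    // Start a timer labelled with the suite ID before the run begins.
    beforeRun: async (ctx) => {
      console.time(`suite:${ctx.suiteId}`);
    },
    // Stop the same timer (the label must match) and report the cost.
    afterRun: async (run) => {
      console.timeEnd(`suite:${run.suiteId}`);
      console.log(`Total cost: $${run.summary.totalCost.toFixed(4)}`);
    },
  },
};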
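Hooks are plain async functions, so a plugin can be exercised without a real suite by calling its hooks directly with hand-built arguments. The sketch below re-declares the example with local stand-in types so it runs on its own. `simulateRun` is a hypothetical helper. The `Run` stand-in keeps only the two fields the example reads (`suiteId` and `summary.totalCost`), so the real type is larger.

```typescript
// Local stand-ins; Run is reduced to the fields timingPlugin reads.
interface BeforeRunContext { suiteId: string; mode: "live" | "replay" | "judge-only"; caseCount: number; trialCount: number }
interface Run { suiteId: string; summary: { totalCost: number } }
interface PluginHooks {
  readonly beforeRun?: (context: BeforeRunContext) => Promise<void>;
  readonly afterRun?: (run: Run) => Promise<void>;
}
interface EvalPlugin { readonly name: string; readonly version: string; readonly hooks?: PluginHooks }

const timingPlugin: EvalPlugin = {
  name: "timing",
  version: "1.0.0",
  hooks: {
    beforeRun: async (ctx) => {
      console.time(`suite:${ctx.suiteId}`);
    },
    afterRun: async (run) => {
      console.timeEnd(`suite:${run.suiteId}`);
      console.log(`Total cost: $${run.summary.totalCost.toFixed(4)}`);
    },
  },
};

// Calls the hooks in the order the runner would: beforeRun first, afterRun last.
async function simulateRun(plugin: EvalPlugin): Promise<void> {
  await plugin.hooks?.beforeRun?.({ suiteId: "smoke", mode: "replay", caseCount: 3, trialCount: 1 });
  await plugin.hooks?.afterRun?.({ suiteId: "smoke", summary: { totalCost: 0.01234 } });
}

void simulateRun(timingPlugin);
```

With these values, the afterRun hook logs the elapsed time for `suite:smoke` and then `Total cost: $0.0123`.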