Plugin API
EvalPlugin interface
```ts
import type { EvalPlugin } from "agent-eval-kit/plugin";

interface EvalPlugin {
  readonly name: string; // required, non-empty
  readonly version: string; // required, non-empty
  readonly graders?: Readonly<Record<string, GraderFn>>;
  readonly hooks?: PluginHooks;
}
```

Grader functions
Plugin graders follow the same `GraderFn` signature as built-in graders:

```ts
type GraderFn = (
  output: TargetOutput,
  expected: CaseExpected | undefined,
  context: GraderContext,
) => Promise<GradeResult>;
```

GraderContext
```ts
interface GraderContext {
  caseId: string;
  suiteId: string;
  mode: "live" | "replay" | "judge-only";
  graderName: string;
  judge?: JudgeCallFn; // available if judge is configured
}
```

The `judge` function is injected from the config, so plugin graders can use LLM judging without managing their own LLM client.
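As a rough illustration of how a plugin grader might use the injected judge, here is a minimal sketch. The types are re-declared locally so the block compiles on its own. The shapes of `TargetOutput`, `CaseExpected`, and `JudgeCallFn` are not shown on this page, so the ones below are placeholder assumptions, and `semanticMatch` is a hypothetical grader name. Check the package's real type definitions before relying on any of them.

```typescript
// Local stand-ins so the sketch compiles without the package installed.
// ASSUMPTION: these three shapes are placeholders, not the library's real types.
type TargetOutput = { text?: string };
type CaseExpected = { text?: string };
type JudgeCallFn = (prompt: string) => Promise<{ score: number; reason: string }>;

// Copied from the interfaces documented on this page.
interface GraderContext {
  caseId: string;
  suiteId: string;
  mode: "live" | "replay" | "judge-only";
  graderName: string;
  judge?: JudgeCallFn;
}

interface GradeResult {
  pass: boolean;
  score: number; // 0–1
  reason: string;
  graderName: string;
  metadata?: Record<string, unknown>;
}

// Hypothetical grader: asks the judge when one is configured,
// otherwise falls back to an exact string comparison.
const semanticMatch = async (
  output: TargetOutput,
  expected: CaseExpected | undefined,
  context: GraderContext,
): Promise<GradeResult> => {
  const text = output.text ?? "";
  const reference = expected?.text ?? "";

  if (context.judge === undefined) {
    const pass = expected !== undefined && text === expected.text;
    return {
      pass,
      score: pass ? 1 : 0,
      reason: "No judge configured; used exact match",
      graderName: context.graderName,
    };
  }

  const verdict = await context.judge(
    `Does "${text}" convey the same meaning as "${reference}"?`,
  );
  // Clamp to the documented 0–1 score range.
  const score = Math.min(1, Math.max(0, verdict.score));
  return {
    pass: score >= 0.5, // threshold is an arbitrary choice for this sketch
    score,
    reason: verdict.reason,
    graderName: context.graderName,
    metadata: { usedJudge: true },
  };
};
```

Checking `context.judge` before calling it keeps the grader usable in configurations that have no judge set up, since the field is optional.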
GradeResult
```ts
interface GradeResult {
  pass: boolean;
  score: number; // 0–1
  reason: string;
  graderName: string;
  metadata?: Record<string, unknown>; // arbitrary data (e.g., judgeCost)
}
```

Lifecycle hooks
```ts
interface PluginHooks {
  readonly beforeRun?: (context: BeforeRunContext) => Promise<void>;
  readonly afterTrial?: (trial: Trial, context: AfterTrialContext) => Promise<void>;
  readonly afterRun?: (run: Run) => Promise<void>;
}
```

BeforeRunContext
```ts
interface BeforeRunContext {
  suiteId: string;
  mode: "live" | "replay" | "judge-only";
  caseCount: number;
  trialCount: number;
}
```

AfterTrialContext
```ts
interface AfterTrialContext {
  suiteId: string;
  completedCount: number;
  totalCount: number;
}
```

Hook execution order
- Hooks execute sequentially in plugin registration order
- `beforeRun` errors propagate: a failing hook stops the run
- `afterTrial` and `afterRun` errors are caught and logged; they don't interrupt the run
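The ordering and error-handling rules above can be sketched roughly as follows. This is not the library's actual runner. `HookPlugin`, `runBeforeRunHooks`, and `runAfterRunHooks` are hypothetical names, and the hooks take no arguments here to keep the sketch short.

```typescript
// Reduced local type: only what is needed to show ordering and error handling.
// ASSUMPTION: the real hooks receive context arguments, omitted here.
interface HookPlugin {
  readonly name: string;
  readonly hooks?: {
    readonly beforeRun?: () => Promise<void>;
    readonly afterRun?: () => Promise<void>;
  };
}

// Runs beforeRun hooks one at a time, in registration order.
// A rejection is not caught, so it reaches the caller and stops the run.
async function runBeforeRunHooks(plugins: readonly HookPlugin[]): Promise<void> {
  for (const plugin of plugins) {
    await plugin.hooks?.beforeRun?.();
  }
}

// Same ordering, but each failure is caught and logged,
// so later plugins still get their turn.
async function runAfterRunHooks(
  plugins: readonly HookPlugin[],
  log: (message: string) => void,
): Promise<void> {
  for (const plugin of plugins) {
    try {
      await plugin.hooks?.afterRun?.();
    } catch (error) {
      log(`afterRun hook in plugin '${plugin.name}' failed: ${String(error)}`);
    }
  }
}
```

An `afterTrial` runner would presumably look like `runAfterRunHooks`, since the page describes the same catch-and-log behavior for both.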
Validation rules
The config loader validates plugins at load time:
| Rule | Error |
|---|---|
| Empty name | "Plugin missing required 'name' field" |
| Empty version | "Plugin '<name>' missing required 'version' field" |
| Duplicate grader name | "Duplicate grader name '<name>' from plugin '<plugin>' (already registered by '<other>')" |
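To make the three rules concrete, here is a minimal sketch of a validator that produces the same error messages. It is an illustration, not the config loader's actual code. `PluginLike` and `validatePlugins` are hypothetical names, and whether the real loader also checks plugin graders against built-in grader names is not stated on this page, so that case is not modeled.

```typescript
// Reduced local type covering only the fields the rules inspect.
interface PluginLike {
  readonly name: string;
  readonly version: string;
  readonly graders?: Readonly<Record<string, unknown>>;
}

// Hypothetical validator: throws on the first violated rule.
function validatePlugins(plugins: readonly PluginLike[]): void {
  // Maps each grader name to the plugin that registered it first.
  const graderOwners = new Map<string, string>();

  for (const plugin of plugins) {
    if (!plugin.name) {
      throw new Error("Plugin missing required 'name' field");
    }
    if (!plugin.version) {
      throw new Error(`Plugin '${plugin.name}' missing required 'version' field`);
    }
    for (const graderName of Object.keys(plugin.graders ?? {})) {
      const owner = graderOwners.get(graderName);
      if (owner !== undefined) {
        throw new Error(
          `Duplicate grader name '${graderName}' from plugin '${plugin.name}' (already registered by '${owner}')`,
        );
      }
      graderOwners.set(graderName, plugin.name);
    }
  }
}
```

Because plugins are visited in registration order, the plugin registered later is the one reported as the duplicate, matching the wording of the third message.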
Package exports
Plugin types are available from the `agent-eval-kit/plugin` subpath:

```ts
import type {
  EvalPlugin,
  PluginHooks,
  BeforeRunContext,
  AfterTrialContext,
} from "agent-eval-kit/plugin";
```

Example: timing plugin
```ts
import type { EvalPlugin } from "agent-eval-kit/plugin";

export const timingPlugin: EvalPlugin = {
  name: "timing",
  version: "1.0.0",
  hooks: {
    // Start a console timer labeled with the suite ID.
    beforeRun: async (ctx) => {
      console.time(`suite:${ctx.suiteId}`);
    },
    // Stop the timer with the same label and print the run's total cost.
    afterRun: async (run) => {
      console.timeEnd(`suite:${run.suiteId}`);
      console.log(`Total cost: $${run.summary.totalCost.toFixed(4)}`);
    },
  },
};
```
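In the same spirit, the `afterTrial` hook could report progress using the `completedCount` and `totalCount` fields of `AfterTrialContext`. The sketch below is one possible shape, not part of the package. `formatProgress` and `progressPlugin` are hypothetical names, and the context type is re-declared locally so the block compiles without the package installed.

```typescript
// Local copy of the AfterTrialContext shape documented on this page.
interface AfterTrialContext {
  suiteId: string;
  completedCount: number;
  totalCount: number;
}

// Hypothetical helper: formats one progress line.
// An empty run is reported as 100% to avoid dividing by zero.
function formatProgress(ctx: AfterTrialContext): string {
  const percent =
    ctx.totalCount === 0 ? 100 : Math.round((ctx.completedCount / ctx.totalCount) * 100);
  return `[${ctx.suiteId}] ${ctx.completedCount}/${ctx.totalCount} trials (${percent}%)`;
}

// In a real project this object would be typed as EvalPlugin.
const progressPlugin = {
  name: "progress",
  version: "1.0.0",
  hooks: {
    // The Trial argument is unused in this sketch, so it is typed as unknown.
    afterTrial: async (_trial: unknown, ctx: AfterTrialContext): Promise<void> => {
      console.log(formatProgress(ctx));
    },
  },
};
```

Keeping the formatting in a separate pure function makes it easy to unit test without running a suite.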