Plugins

Plugins extend agent-eval-kit with custom graders and lifecycle hooks. They are registered in your eval config and apply to all suites.

import { defineConfig } from "agent-eval-kit";
import { myPlugin } from "./my-plugin";

export default defineConfig({
  plugins: [myPlugin],
  suites: [/* ... */],
});

Plugin graders are used like any other grader:

import { myCustomGrader } from "./my-plugin";

defaultGraders: [
  { grader: myCustomGrader("some-config") },
]

A plugin is an object implementing the EvalPlugin interface:

import type { EvalPlugin } from "agent-eval-kit/plugin";

export const myPlugin: EvalPlugin = {
  name: "my-plugin", // required, non-empty
  version: "1.0.0", // required, non-empty

  // Optional: custom graders
  graders: {
    myGrader: async (output, expected, context) => {
      // Evaluate the check once and reuse it for both pass and score
      const hasGreeting = output.text?.includes("hello") ?? false;
      return {
        pass: hasGreeting,
        score: hasGreeting ? 1 : 0,
        reason: "Checked for greeting",
        graderName: "my-plugin/myGrader",
      };
    },
  },

  // Optional: lifecycle hooks
  hooks: {
    beforeRun: async (context) => {
      console.log(`Starting suite: ${context.suiteId}`);
    },
    afterTrial: async (trial, context) => {
      console.log(`Trial ${context.completedCount}/${context.totalCount}`);
    },
    afterRun: async (run) => {
      // Round to one decimal place to avoid floating-point noise in the log
      console.log(`Run complete: ${(run.summary.passRate * 100).toFixed(1)}%`);
    },
  },
};
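
The earlier usage snippet calls myCustomGrader("some-config"), which suggests a factory that takes configuration and returns a grader. The following is a minimal sketch of that pattern, not the library's implementation. The types GraderOutput, GradeResult, and Grader are local stand-ins modeled on the grader shown above, and containsPhrase is a hypothetical name; the real types exported by agent-eval-kit may differ.

```typescript
// Hypothetical local types mirroring the grader shape used in this page.
// The real agent-eval-kit types may differ.
interface GraderOutput {
  text?: string;
}

interface GradeResult {
  pass: boolean;
  score: number;
  reason: string;
  graderName: string;
}

type Grader = (
  output: GraderOutput,
  expected?: unknown,
  context?: unknown,
) => Promise<GradeResult>;

// Factory: captures its configuration and returns a grader function.
function containsPhrase(phrase: string): Grader {
  return async (output) => {
    const found = output.text?.includes(phrase) ?? false;
    return {
      pass: found,
      score: found ? 1 : 0,
      reason: found
        ? `Output contains "${phrase}"`
        : `Output is missing "${phrase}"`,
      graderName: "my-plugin/containsPhrase", // hypothetical name
    };
  };
}
```

Keeping the configuration in a closure means each call to the factory yields an independent grader, so the same plugin can supply several differently configured checks.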

See the Plugin API Reference for the complete interface.

In the grader registry and MCP tools, plugin graders are namespaced as <plugin-name>/<grader-name> (e.g., my-plugin/myGrader).
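
To illustrate the naming scheme, here is a small sketch that builds namespaced keys from a plugin's graders. The helper namespacedGraders and the PluginLike type are hypothetical and exist only for this example; they are not part of agent-eval-kit.

```typescript
// Hypothetical helper for illustration only; not part of agent-eval-kit.
type GraderFn = (...args: unknown[]) => unknown;

interface PluginLike {
  name: string;
  graders?: Record<string, GraderFn>;
}

// Maps each grader to a "<plugin-name>/<grader-name>" key.
function namespacedGraders(plugin: PluginLike): Map<string, GraderFn> {
  const registry = new Map<string, GraderFn>();
  for (const [graderName, grader] of Object.entries(plugin.graders ?? {})) {
    registry.set(`${plugin.name}/${graderName}`, grader);
  }
  return registry;
}
```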

The config loader validates plugins at load time:

  • name must be non-empty
  • version must be non-empty
  • Grader names must not conflict across plugins

Duplicate grader names across plugins cause a load error.
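
The checks above can be sketched as a standalone function. This is an illustration under stated assumptions, not the loader's actual code: validatePlugins and PluginShape are hypothetical names, whitespace-only strings are treated as empty, and conflicts are detected on the bare grader name.

```typescript
// Hypothetical sketch of the load-time checks; the real loader may differ.
interface PluginShape {
  name: string;
  version: string;
  graders?: Record<string, unknown>;
}

// Returns a list of human-readable errors; an empty list means valid.
function validatePlugins(plugins: PluginShape[]): string[] {
  const errors: string[] = [];
  // Grader name -> label of the first plugin that declared it
  const owners = new Map<string, string>();

  plugins.forEach((plugin, index) => {
    // Assumption: whitespace-only values count as empty
    const label = plugin.name.trim() || `plugin at index ${index}`;
    if (plugin.name.trim() === "") {
      errors.push(`${label}: name must be non-empty`);
    }
    if (plugin.version.trim() === "") {
      errors.push(`${label}: version must be non-empty`);
    }
    for (const graderName of Object.keys(plugin.graders ?? {})) {
      const owner = owners.get(graderName);
      if (owner !== undefined) {
        errors.push(`${label}: grader "${graderName}" is already defined by ${owner}`);
      } else {
        owners.set(graderName, label);
      }
    }
  });

  return errors;
}
```

Collecting every error before reporting, rather than stopping at the first, lets a user fix a misconfigured plugin list in one pass.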