Custom Graders

A grader is any async function matching the GraderFn signature. You can write custom graders for domain-specific checks that the 20 built-in graders don’t cover.

import type { GraderFn } from "agent-eval-kit/graders";

const myGrader: GraderFn = async (output, expected, context) => {
  const hasGreeting = output.text?.toLowerCase().includes("hello") ?? false;
  return {
    pass: hasGreeting,
    score: hasGreeting ? 1 : 0,
    reason: hasGreeting ? "Contains greeting" : "Missing greeting",
    graderName: "myGrader",
  };
};

For configurable graders, create a factory function:

import type { GraderFn } from "agent-eval-kit/graders";

function minWordCount(min: number): GraderFn {
  return async (output, _expected, _context) => {
    const count = (output.text ?? "").split(/\s+/).filter(Boolean).length;
    const pass = count >= min;
    return {
      pass,
      score: pass ? 1 : Math.min(count / min, 1),
      reason: pass ? `${count} words (>= ${min})` : `Only ${count} words (need ${min})`,
      graderName: "minWordCount",
    };
  };
}

Use it like any built-in grader:

defaultGraders: [
  { grader: minWordCount(50) },
]
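The partial-credit rule in minWordCount is easier to check in isolation if the arithmetic is pulled into a plain synchronous function. The sketch below is one way to do that; `wordCountScore` is a hypothetical helper name used for illustration and is not part of agent-eval-kit.

```typescript
// Hypothetical helper (not part of agent-eval-kit): the same scoring
// arithmetic as minWordCount, factored out so it can be checked directly.
function wordCountScore(text: string, min: number): { pass: boolean; score: number } {
  // Split on runs of whitespace and drop the empty strings that
  // leading or trailing whitespace would otherwise produce.
  const count = text.split(/\s+/).filter(Boolean).length;
  const pass = count >= min;
  // Below the threshold, award partial credit proportional to progress.
  return { pass, score: pass ? 1 : count / min };
}

console.log(wordCountScore("one two three", 6)); // { pass: false, score: 0.5 }
console.log(wordCountScore("one two three", 3)); // { pass: true, score: 1 }
```

Keeping the arithmetic in a pure function also makes the grader trivially deterministic, since the result depends only on the text and the threshold.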

If you need LLM judging, access it via context.judge:

import type { GraderFn } from "agent-eval-kit/graders";

const domainExpert: GraderFn = async (output, expected, context) => {
  // Without a configured judge there is nothing to ask, so fail explicitly.
  if (!context.judge) {
    return {
      pass: false,
      score: 0,
      reason: "No judge configured",
      graderName: "domainExpert",
    };
  }
  const response = await context.judge([
    { role: "system", content: "You are a medical accuracy expert..." },
    { role: "user", content: `Evaluate: ${output.text}` },
  ], { temperature: 0 });
  // Parse the response...
  // Placeholder values: derive pass and score from the parsed response
  // instead of hard-coding them.
  return {
    pass: true,
    score: 1,
    reason: response.text,
    graderName: "domainExpert",
  };
};
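How the judge's reply is parsed depends entirely on the output format you ask the judge for. As one possible sketch, suppose the system prompt instructs the judge to end its reply with a line such as `SCORE: 0.8`. That reply format, the `parseJudgeScore` helper, and the 0.7 pass threshold are all assumptions made for illustration; none of them come from agent-eval-kit.

```typescript
// Hypothetical helper: extracts a score from a judge reply that contains
// a line like "SCORE: 0.8". The reply format is an assumption of this sketch.
function parseJudgeScore(
  reply: string,
  threshold = 0.7,
): { pass: boolean; score: number; reason: string } {
  // The "m" flag lets ^ and $ match at line boundaries inside the reply.
  const match = reply.match(/^SCORE:\s*([0-9]*\.?[0-9]+)\s*$/m);
  if (!match) {
    // Fail closed: an unparseable reply should not count as a pass.
    return { pass: false, score: 0, reason: "Could not parse judge response" };
  }
  // Clamp so a reply like "SCORE: 7" cannot push the score outside 0–1.
  const score = Math.min(Math.max(Number(match[1]), 0), 1);
  return { pass: score >= threshold, score, reason: reply.trim() };
}
```

Inside a judge-based grader, the parsed fields could then be spread into the returned result, for example `{ ...parseJudgeScore(response.text), graderName: "domainExpert" }`.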

Built-in deterministic graders follow these strict rules, and your custom graders should too:

  • Pure functions: No I/O, no side effects, no external state
  • Deterministic: Same input always produces same output
  • Score range: Always 0–1
  • graderName: Always set — identifies the grader in reports
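The score-range and graderName rules above are mechanical enough to check in a unit test. The following is a minimal sketch, assuming the result shape shown in the examples on this page; `GradeResultLike` and `checkGradeResult` are hypothetical names, and the real result type exported by agent-eval-kit may carry additional fields.

```typescript
// Local stand-in for the result shape used on this page (an assumption;
// the library's own type may differ).
interface GradeResultLike {
  pass: boolean;
  score: number;
  reason: string;
  graderName: string;
}

// Hypothetical helper: returns a list of rule violations,
// which is empty when the result satisfies both rules.
function checkGradeResult(result: GradeResultLike): string[] {
  const problems: string[] = [];
  // Rejects NaN and Infinity as well as values outside the 0–1 range.
  if (!Number.isFinite(result.score) || result.score < 0 || result.score > 1) {
    problems.push(`score ${result.score} is outside 0–1`);
  }
  if (result.graderName.trim() === "") {
    problems.push("graderName is empty");
  }
  return problems;
}
```

Running a custom grader against a few representative inputs and passing each result through a check like this catches out-of-range scores before they reach a report.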

To share graders across projects, package them as a plugin:

import type { EvalPlugin } from "agent-eval-kit/plugin";

export const myGraders: EvalPlugin = {
  name: "my-graders",
  version: "1.0.0",
  graders: {
    minWordCount: minWordCount(50),
    domainExpert,
  },
};

See the Plugin API for the full plugin interface.