# User Guide Overview
Welcome to the Evalcraft user guide. Use the links below to navigate to specific topics.
## Core workflow
The typical Evalcraft workflow has three phases:
### 1. Capture (once)
Run your agent with a `CaptureContext` active. Every LLM call, tool invocation, and agent decision is recorded into a cassette (a plain JSON file).
```python
from evalcraft import CaptureContext

with CaptureContext(name="my_test", save_path="tests/cassettes/my_test.json") as ctx:
    ctx.record_input("user prompt")
    result = my_agent.run("user prompt")
    ctx.record_output(result)
```
Commit the cassette to git. This is your ground truth.
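Because a cassette is plain JSON, you can inspect and diff it with standard tooling. A minimal sketch of that idea — the field names below are made up for illustration and may differ from Evalcraft's actual cassette schema:

```python
import json
import os
import tempfile

# A cassette is a plain JSON file, so any JSON tooling can read or diff it.
# These field names are illustrative only, not Evalcraft's actual schema.
cassette = {
    "name": "my_test",
    "input": "user prompt",
    "spans": [],  # would hold recorded LLM calls and tool invocations
}

path = os.path.join(tempfile.mkdtemp(), "my_test.json")
with open(path, "w") as f:
    json.dump(cassette, f, indent=2)

# Round-trip: the file you commit to git is exactly what replay reads back.
with open(path) as f:
    loaded = json.load(f)
assert loaded == cassette
```

Committing the file means your test baseline is versioned alongside your code, and changes to recorded behavior show up in code review as ordinary diffs.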
### 2. Replay (every test run)
Load the cassette and replay it. No API calls. No cost. 200ms.
```python
from evalcraft import replay

run = replay("tests/cassettes/my_test.json")
assert run.replayed is True
```
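To make the replay idea concrete, here is a self-contained sketch of the underlying mechanism — illustrative only, not Evalcraft's implementation: capture appends each call to a recording, and replay serves those recordings back in order instead of hitting a live API.

```python
import json

# Illustrative sketch of capture-then-replay (not Evalcraft's internals).
recorded = []

def captured_llm(prompt):
    # Capture phase: a real LLM call would happen here; its result is recorded.
    response = "recorded answer"
    recorded.append({"prompt": prompt, "response": response})
    return response

captured_llm("user prompt")
cassette = json.dumps(recorded)  # persisted to disk in the real workflow

# Replay phase: responses come from the cassette, so no API is called.
spans = iter(json.loads(cassette))

def replayed_llm(prompt):
    span = next(spans)
    assert span["prompt"] == prompt  # each replayed call must match its recording
    return span["response"]

assert replayed_llm("user prompt") == "recorded answer"
```

This is why replay is fast and free: every response is a local lookup rather than a network round trip.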
### 3. Assert (in CI)
Use the built-in scorers to assert on the replayed run's behavior.
```python
from evalcraft import assert_tool_called, assert_cost_under

assert assert_tool_called(run, "web_search").passed
assert assert_cost_under(run, max_usd=0.05).passed
```
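The scorers return result objects rather than raising, which is why each call ends with `.passed`. A minimal mental model of that shape — using a made-up `ScorerResult` type, not Evalcraft's actual classes:

```python
from dataclasses import dataclass

# Hypothetical stand-in for the result object that assert_* scorers return;
# the real Evalcraft types may carry more detail than this.
@dataclass
class ScorerResult:
    passed: bool
    reason: str = ""

def cost_under_sketch(total_usd: float, max_usd: float) -> ScorerResult:
    # Mirrors the shape of assert_cost_under: check, then report via .passed.
    ok = total_usd <= max_usd
    reason = "" if ok else f"cost ${total_usd} exceeds budget ${max_usd}"
    return ScorerResult(ok, reason)

result = cost_under_sketch(0.03, max_usd=0.05)
assert result.passed
assert not cost_under_sketch(0.10, max_usd=0.05).passed
```

Returning a result object instead of raising lets you collect every failure reason in one run before deciding whether the test passes.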
## Guide sections
| Section | What you'll learn |
|---|---|
| Quickstart | Full working example in 5 minutes |
| Concepts | What cassettes, spans, and fingerprints are |
| Capture API | `CaptureContext`, `record_llm_call`, `record_tool_call` |
| Replay Engine | `ReplayEngine`, overrides, diffs |
| Mock LLM & Tools | `MockLLM`, `MockTool` |
| Scorers | All `assert_*` functions and `Evaluator` |
| pytest Plugin | Fixtures and markers for pytest integration |
| CLI Reference | `capture`, `replay`, `diff`, `eval`, `info`, `mock` commands |
| Adapters | Auto-capture for OpenAI, Anthropic, LangGraph, CrewAI |
| CI/CD | GitHub Actions workflows |
| Changelog | What's new in each release |