Anthropic Adapter¶

The AnthropicAdapter monkey-patches the Anthropic SDK so that every call to client.messages.create(), sync or async, is automatically recorded into the active CaptureContext.

Install¶

pip install "evalcraft[anthropic]"

Quick start¶

from evalcraft.adapters import AnthropicAdapter
from evalcraft import CaptureContext
import anthropic

client = anthropic.Anthropic()

with CaptureContext(name="anthropic_test", save_path="tests/cassettes/anthropic.json") as ctx:
    with AnthropicAdapter():
        ctx.record_input("What's the weather in Paris?")
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            messages=[{"role": "user", "content": "What's the weather in Paris?"}],
        )
        ctx.record_output(response.content[0].text)

cassette = ctx.cassette
print(cassette.total_tokens)    # actual token count from API
print(cassette.total_cost_usd)  # estimated cost

How it works¶

AnthropicAdapter patches the SDK's Messages and AsyncMessages classes rather than individual client objects, so calls from every client instance are captured while the adapter is active. On exit, the original methods are restored.
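The class-level patching described above can be illustrated with a small stand-alone sketch. It does not use evalcraft or the Anthropic SDK: FakeMessages, RecordingPatch, and the calls list are hypothetical names invented for this illustration, and the real adapter's internals may differ.

```python
import functools


class FakeMessages:
    """Hypothetical stand-in for an SDK resource class."""

    def create(self, **kwargs):
        return {"echo": kwargs}


class RecordingPatch:
    """Context manager that swaps FakeMessages.create for a recording wrapper."""

    def __init__(self):
        self.calls = []
        self._original = None

    def __enter__(self):
        self._original = FakeMessages.create
        original = self._original
        calls = self.calls

        @functools.wraps(original)
        def wrapper(instance, **kwargs):
            result = original(instance, **kwargs)
            calls.append((kwargs, result))  # record the request and response
            return result

        FakeMessages.create = wrapper  # patch the class, not one instance
        return self

    def __exit__(self, exc_type, exc, tb):
        FakeMessages.create = self._original  # restore the original on exit
        return False


# Both instances exist before the patch is applied, yet both are captured.
first, second = FakeMessages(), FakeMessages()
with RecordingPatch() as patch:
    first.create(model="a")
    second.create(model="b")
after = first.create(model="c")  # not recorded: the patch has been removed

recorded_models = [kwargs["model"] for kwargs, _ in patch.calls]
```

Because the wrapper is installed on the class, instances created before the context was entered are affected too, which is the behaviour listed under Limitations.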

Async usage¶

import asyncio
import anthropic
from evalcraft.adapters import AnthropicAdapter
from evalcraft import CaptureContext

client = anthropic.AsyncAnthropic()

async def main():
    async with CaptureContext(name="async_anthropic") as ctx:
        async with AnthropicAdapter():
            ctx.record_input("Summarize the French Revolution")
            response = await client.messages.create(
                model="claude-3-5-haiku-20241022",
                max_tokens=512,
                messages=[{"role": "user", "content": "Summarize the French Revolution"}],
            )
            ctx.record_output(response.content[0].text)

asyncio.run(main())

Tool use¶

When Claude returns tool use blocks, they are included in the recorded span output:

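import anthropic
from evalcraft.adapters import AnthropicAdapter
from evalcraft import CaptureContext

client = anthropic.Anthropic()

with CaptureContext(name="tool_use_test") as ctx: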
    with AnthropicAdapter():
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            tools=[{
                "name": "get_weather",
                "description": "Get weather for a city",
                "input_schema": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}}
                }
            }],
            messages=[{"role": "user", "content": "What's the weather in Paris?"}],
        )

# The span output includes: "[tool_use:get_weather({'city': 'Paris'})]"
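The comment above suggests that tool use blocks are flattened into a bracketed string. A minimal sketch of how such a rendering could be produced, assuming the content blocks are available as plain dicts with type, name, input, and text keys. render_blocks is a hypothetical helper, not part of evalcraft, and the adapter's actual formatting code may differ.

```python
def render_blocks(blocks):
    """Flatten a list of content blocks into one output string."""
    parts = []
    for block in blocks:
        if block["type"] == "text":
            parts.append(block["text"])
        elif block["type"] == "tool_use":
            # The dict repr yields the {'city': 'Paris'} form shown above.
            parts.append(f"[tool_use:{block['name']}({block['input']})]")
    return "\n".join(parts)


blocks = [
    {"type": "text", "text": "Let me check."},
    {"type": "tool_use", "name": "get_weather", "input": {"city": "Paris"}},
]
rendered = render_blocks(blocks)
```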

Cost estimation¶

The adapter uses a built-in pricing table for Anthropic models:

Model	Input (per 1M tokens)	Output (per 1M tokens)
claude-opus-4-6	$15.00	$75.00
claude-sonnet-4-6	$3.00	$15.00
claude-haiku-4-5-20251001	$0.80	$4.00
claude-3-5-sonnet-20241022	$3.00	$15.00
claude-3-5-haiku-20241022	$0.80	$4.00
claude-3-opus-20240229	$15.00	$75.00
claude-3-haiku-20240307	$0.25	$1.25

For models not in the table, cost_usd is None.
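As a worked example of how a per-million-token table translates into a per-call estimate, the sketch below assumes the cost is simply tokens times rate, summed over input and output. PRICING copies two rows from the table above. estimate_cost_usd is a hypothetical helper and may not match the adapter's actual calculation.

```python
PRICING = {
    # model: (input USD per 1M tokens, output USD per 1M tokens)
    "claude-3-5-sonnet-20241022": (3.00, 15.00),
    "claude-3-5-haiku-20241022": (0.80, 4.00),
}


def estimate_cost_usd(model, input_tokens, output_tokens):
    """Return an estimated cost, or None for models missing from the table."""
    rates = PRICING.get(model)
    if rates is None:
        return None
    input_rate, output_rate = rates
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


haiku_cost = estimate_cost_usd("claude-3-5-haiku-20241022", 1_000, 500)
unknown_cost = estimate_cost_usd("some-unlisted-model", 1_000, 500)
```

Under these assumptions, a claude-3-5-haiku-20241022 call with 1,000 input tokens and 500 output tokens comes to about $0.0028, while the unlisted model yields None.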

Limitations¶

Not reentrant: do not nest one AnthropicAdapter context inside another.
Patches the class, not a specific instance — all client instances are affected.
Streaming responses are not currently intercepted at the span level.
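The source does not say what goes wrong when adapters are nested. One plausible failure mode for a naive class-level patcher is double recording, sketched below with hypothetical names (Service, NaivePatch, log) that are unrelated to evalcraft's real implementation.

```python
class Service:
    """Hypothetical class whose method gets patched."""

    def call(self):
        return "ok"


log = []


class NaivePatch:
    """Wraps Service.call on entry and restores the saved method on exit."""

    def __enter__(self):
        self._saved = Service.call
        saved = self._saved

        def wrapper(instance):
            log.append("recorded")
            return saved(instance)  # may itself be another wrapper

        Service.call = wrapper
        return self

    def __exit__(self, *exc_info):
        Service.call = self._saved
        return False


with NaivePatch():
    with NaivePatch():
        Service().call()  # passes through both wrappers

nested_records = len(log)
```

In this sketch a single call is logged twice because the inner patch wraps the outer wrapper rather than the original method. Entering the adapter once, around all the calls that need capturing, avoids the situation.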

Combining with the pytest plugin¶

import pytest
import anthropic
from evalcraft.adapters import AnthropicAdapter
from evalcraft import assert_cost_under, assert_token_count_under
from evalcraft.eval.scorers import Evaluator

@pytest.mark.evalcraft_capture(name="anthropic_haiku_test")
def test_anthropic_haiku(capture_context):
    client = anthropic.Anthropic()

    with AnthropicAdapter():
        capture_context.record_input("What is 2+2?")
        response = client.messages.create(
            model="claude-3-5-haiku-20241022",
            max_tokens=64,
            messages=[{"role": "user", "content": "What is 2+2?"}],
        )
        capture_context.record_output(response.content[0].text)

    cassette = capture_context.cassette
    assert cassette.llm_call_count == 1

    evaluator = Evaluator()
    evaluator.add(assert_cost_under, cassette, max_usd=0.001)
    evaluator.add(assert_token_count_under, cassette, max_tokens=100)
    result = evaluator.run()
    assert result.passed, str(result.failed_assertions)

Import paths¶

# Preferred
from evalcraft.adapters import AnthropicAdapter

# Direct
from evalcraft.adapters.anthropic_adapter import AnthropicAdapter