Pydantic AI Stripe Integration: Restricted API Keys, Spend Caps, and Agent Governance

By Keybrake · June 14, 2026 · 9 min read

PydanticAI makes it straightforward to give an agent a Stripe tool and get typed, validated results back from a language model. It also makes it easy to accidentally double-charge a customer the moment you add a ModelRetry — because retrying a tool in PydanticAI means calling it again, and Stripe has no way to know it's a retry without an idempotency key.

This post covers three failure modes specific to PydanticAI's architecture — ModelRetry replay without idempotency keys, shared stripe_client across concurrent RunContext dependencies, and structured result validation loops that re-execute the tool call sequence — and shows the governance pattern that closes all three: restricted Stripe API keys as a first layer, per-run vault keys via a proxy as a second layer.

The standard PydanticAI Stripe pattern

PydanticAI agents are defined with a result type and a set of tools. A Stripe billing agent looks like this:

from dataclasses import dataclass
from pydantic import BaseModel
from pydantic_ai import Agent, RunContext
import stripe
import os

stripe.api_key = os.environ["STRIPE_SECRET_KEY"]

@dataclass
class StripeDeps:
    stripe_key: str  # injected at run time

class ChargeResult(BaseModel):
    charge_id: str
    amount_cents: int
    status: str

billing_agent = Agent(
    "openai:gpt-4o",
    deps_type=StripeDeps,
    result_type=ChargeResult,
    system_prompt=(
        "You are a billing agent. Use the stripe_charge tool to create Stripe charges. "
        "Return a ChargeResult with the charge_id, amount_cents, and status."
    ),
)

@billing_agent.tool
async def stripe_charge(
    ctx: RunContext[StripeDeps],
    customer_id: str,
    amount_cents: int,
    description: str,
) -> str:
    """Create a Stripe charge. Returns the charge ID on success."""
    charge = stripe.Charge.create(
        customer=customer_id,
        amount=amount_cents,
        currency="usd",
        description=description,
        api_key=ctx.deps.stripe_key,
    )
    return charge.id

# usage
import asyncio

async def run():
    deps = StripeDeps(stripe_key=os.environ["STRIPE_SECRET_KEY"])
    result = await billing_agent.run(
        "Charge customer cus_ABC123 $29.99 for Pro plan",
        deps=deps,
    )
    print(result.data.charge_id)  # ch_3R4...

asyncio.run(run())

This works for simple cases. But PydanticAI's retry and validation mechanisms — each designed to make agents more reliable — introduce specific failure modes when the tools involved move real money.

Failure mode 1: ModelRetry replay without idempotency keys

Risk: When a tool raises ModelRetry, PydanticAI sends the error back to the LLM and asks it to call the tool again with corrected arguments. If the tool already executed a Stripe charge before raising the retry, the LLM's second attempt creates a duplicate charge. Without an idempotency key, Stripe bills the customer twice.

ModelRetry is PydanticAI's mechanism for giving the LLM a second chance when a tool call produces unusable output. You raise it from inside a tool when the LLM passed bad arguments — a customer ID that doesn't exist, an amount below Stripe's minimum, a currency that isn't supported. The LLM sees the error message and is prompted to try again.

from pydantic_ai import ModelRetry

@billing_agent.tool
async def stripe_charge(
    ctx: RunContext[StripeDeps],
    customer_id: str,
    amount_cents: int,
    description: str,
) -> str:
    """Create a Stripe charge. Returns the charge ID on success."""
    # Validate first — raises ModelRetry before the Stripe call
    if amount_cents < 50:
        raise ModelRetry(f"amount_cents must be >= 50, got {amount_cents}")

    charge = stripe.Charge.create(  # ← Stripe is called HERE
        customer=customer_id,
        amount=amount_cents,
        currency="usd",
        description=description,
        api_key=ctx.deps.stripe_key,
    )

    # What if the charge succeeded but the response parsing fails?
    # Or if the LLM re-calls this tool because it misread the return value?
    return charge.id

The problem appears at the seam between tool execution and LLM interpretation. Stripe's API can succeed (charge created, money moved) while the step that follows fails: the tool may hit a network timeout after Stripe has already confirmed the charge, or the LLM may misread the result because it expected a JSON object and got a bare charge ID string. In either case, the tool can be invoked a second time, either through PydanticAI's retry mechanism or because the LLM decides to call it again.

The fix is to bind a stable idempotency key at the start of each agent run, before any tool is ever called. The key must survive across all retries within a single agent.run() invocation:

import uuid
from pydantic_ai import Agent, RunContext, ModelRetry

@dataclass
class StripeDeps:
    stripe_key: str
    run_idempotency_key: str  # generated once per agent.run() call

@billing_agent.tool
async def stripe_charge(
    ctx: RunContext[StripeDeps],
    customer_id: str,
    amount_cents: int,
    description: str,
) -> str:
    """Create a Stripe charge. Returns the charge ID on success."""
    if amount_cents < 50:
        raise ModelRetry(f"amount_cents must be >= 50, got {amount_cents}")

    # Idempotency key is stable for the entire run — ModelRetry retries get the same key
    idem_key = f"{ctx.deps.run_idempotency_key}-charge"

    try:
        charge = stripe.Charge.create(
            customer=customer_id,
            amount=amount_cents,
            currency="usd",
            description=description,
            api_key=ctx.deps.stripe_key,
            idempotency_key=idem_key,
        )
    except stripe.error.IdempotencyError:
        # Same key, different parameters — safe to surface as ModelRetry
        raise ModelRetry("A charge with different parameters was already created for this run")

    return charge.id

# Caller generates the run key — one UUID per agent.run() call
async def charge_customer(customer_id: str, amount_cents: int, description: str):
    deps = StripeDeps(
        stripe_key=os.environ["STRIPE_SECRET_KEY"],
        run_idempotency_key=str(uuid.uuid4()),
    )
    result = await billing_agent.run(
        f"Charge {customer_id} {amount_cents} cents for {description}",
        deps=deps,
    )
    return result.data

Pattern: Generate one UUID per agent.run() call and inject it into deps. The tool reads it from ctx.deps.run_idempotency_key, making every retry within the same run idempotent. Stripe returns the original charge object on duplicate requests — no double billing.
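To make the retry behaviour concrete, here is a minimal in-memory sketch of the idempotency semantics this pattern relies on. `FakeStripe`, `IdempotencyMismatch`, and `charge_key` are hypothetical stand-ins, not part of the Stripe SDK. The model assumes that a repeated key with identical parameters replays the stored response and that a repeated key with different parameters is rejected.

```python
import uuid


class IdempotencyMismatch(Exception):
    """Hypothetical stand-in for the error raised on key reuse with new parameters."""


class FakeStripe:
    """Toy in-memory model of idempotent charge creation (not the real Stripe API)."""

    def __init__(self) -> None:
        self.charges: list[dict] = []
        self._seen: dict[str, tuple[dict, dict]] = {}  # key -> (params, response)

    def create_charge(self, idempotency_key: str, **params) -> dict:
        if idempotency_key in self._seen:
            stored_params, stored_response = self._seen[idempotency_key]
            if stored_params != params:
                raise IdempotencyMismatch("same key, different parameters")
            return stored_response  # replay: no new charge is created
        response = {"id": f"ch_fake{len(self.charges) + 1}", **params}
        self.charges.append(response)
        self._seen[idempotency_key] = (params, response)
        return response


def charge_key(run_id: str) -> str:
    """Build the per-run key the same way the tool above does."""
    return f"{run_id}-charge"


fake = FakeStripe()
run_id = str(uuid.uuid4())

# The original tool call and a retry within the same run share one key.
first = fake.create_charge(charge_key(run_id), customer="cus_ABC123", amount=2999)
retry = fake.create_charge(charge_key(run_id), customer="cus_ABC123", amount=2999)
```

In this model the retry returns the first charge and `fake.charges` still holds a single entry. A retry that changes the amount would raise `IdempotencyMismatch`, which corresponds to the branch the tool above surfaces as a ModelRetry.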

For a deeper look at idempotency patterns across frameworks, see our post on Stripe idempotency keys for AI agents.

Failure mode 2: Shared stripe_client across concurrent RunContext deps

Risk: If the StripeDeps object is initialized once at module load time and reused across agent.run() calls, all concurrent runs share the same Stripe API key. A billing agent and a refund agent running concurrently share one key — meaning each agent has access to the other's Stripe permissions.

PydanticAI's dependency injection system is intentionally flexible: you can create a deps object once and pass it to every agent.run() call. In a FastAPI or async web service, it's tempting to initialize the Stripe client once at startup for performance:

# worker.py — initialized once at module load
import stripe
from dataclasses import dataclass

@dataclass
class SharedStripeDeps:
    stripe_key: str

# One key for the whole worker — set once at startup
STRIPE_DEPS = SharedStripeDeps(
    stripe_key=os.environ["STRIPE_SECRET_KEY"]  # sk_live_... with all permissions
)

billing_agent = Agent("openai:gpt-4o", deps_type=SharedStripeDeps, result_type=ChargeResult)
refund_agent  = Agent("openai:gpt-4o", deps_type=SharedStripeDeps, result_type=RefundResult)

@app.post("/charge")
async def charge(req: ChargeRequest):
    return await billing_agent.run(req.prompt, deps=STRIPE_DEPS)  # ← shared key

@app.post("/refund")
async def refund(req: RefundRequest):
    return await refund_agent.run(req.prompt, deps=STRIPE_DEPS)  # ← same key

The issue: STRIPE_DEPS is a single object shared across all concurrent requests. The billing agent needs Charges:Write. The refund agent needs Refunds:Write. The shared key must have both — which means the billing agent is running with Refunds:Write it should never need, and the refund agent has Charges:Write. A prompt injection in either agent can cross into the other's domain.

Stripe restricted keys help at the first layer — you can create separate restricted keys per agent type. But restricted keys alone don't enforce per-run spend caps or give you an audit trail that correlates a specific agent.run() invocation to a specific Stripe charge. A vault proxy adds that second layer:

import os
import uuid

import httpx
from pydantic_ai import Agent, ModelRetry, RunContext
from dataclasses import dataclass

@dataclass
class VaultDeps:
    vault_key: str        # scoped per agent type + run
    run_id: str           # UUID for this agent.run() call

# Tool factory — builds a tool that uses the vault key from deps
def make_stripe_charge_tool(agent: Agent):
    @agent.tool
    async def stripe_charge(
        ctx: RunContext[VaultDeps],
        customer_id: str,
        amount_cents: int,
        description: str,
    ) -> str:
        """Create a Stripe charge via the Keybrake proxy."""
        idem_key = f"{ctx.deps.run_id}-charge"
        async with httpx.AsyncClient() as client:
            resp = await client.post(
                "https://proxy.keybrake.com/stripe/v1/charges",
                headers={"Authorization": f"Bearer {ctx.deps.vault_key}"},
                data={
                    "customer": customer_id,
                    "amount": str(amount_cents),
                    "currency": "usd",
                    "description": description,
                    "idempotency_key": idem_key,
                },
                timeout=10.0,
            )
        if resp.status_code == 429:
            raise ModelRetry("Spend cap reached — cannot create charge")
        resp.raise_for_status()
        return resp.json()["id"]

billing_agent = Agent("openai:gpt-4o", deps_type=VaultDeps, result_type=ChargeResult)
make_stripe_charge_tool(billing_agent)

refund_agent = Agent("openai:gpt-4o", deps_type=VaultDeps, result_type=RefundResult)
# ... make_stripe_refund_tool(refund_agent) with separate vault key policy

# At request time, each agent.run() gets its own scoped vault key
BILLING_VAULT_KEY = os.environ["KEYBRAKE_BILLING_VAULT_KEY"]  # policy: Charges:Write, $500/day cap
REFUND_VAULT_KEY  = os.environ["KEYBRAKE_REFUND_VAULT_KEY"]   # policy: Refunds:Write, $200/day cap

@app.post("/charge")
async def charge(req: ChargeRequest):
    deps = VaultDeps(vault_key=BILLING_VAULT_KEY, run_id=str(uuid.uuid4()))
    return await billing_agent.run(req.prompt, deps=deps)

@app.post("/refund")
async def refund(req: RefundRequest):
    deps = VaultDeps(vault_key=REFUND_VAULT_KEY, run_id=str(uuid.uuid4()))
    return await refund_agent.run(req.prompt, deps=deps)

Pattern: Issue one Keybrake vault key per agent type, each scoped to the exact endpoints that agent type needs. Inject the vault key into VaultDeps at request time (not at module load). The run_id in deps doubles as both the idempotency key prefix and the audit log correlation ID — every Stripe call in the same agent.run() appears under the same run_id in Keybrake's audit log.
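The daily cap behaviour can be pictured with a toy in-memory model. This is an illustration only, not Keybrake's implementation: `SpendCap`, its fields, and the choice to reject any request that would push the total over the cap are all assumptions. The only detail taken from the examples above is that an over-cap request is answered with HTTP 429.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class SpendCap:
    """Hypothetical per-key daily cap tracker, in cents."""

    daily_cap_cents: int
    spent_by_day: dict[date, int] = field(default_factory=dict)

    def authorize(self, amount_cents: int, today: date) -> int:
        """Return an HTTP-style status: 200 if allowed, 429 if the cap would be exceeded."""
        spent = self.spent_by_day.get(today, 0)
        if spent + amount_cents > self.daily_cap_cents:
            return 429  # rejected requests do not count toward the total
        self.spent_by_day[today] = spent + amount_cents
        return 200


cap = SpendCap(daily_cap_cents=50_000)  # $500/day, as in the billing key policy above
day = date(2026, 6, 14)

# Three $200 charges on the same day: the third would exceed the cap.
statuses = [cap.authorize(20_000, day) for _ in range(3)]
```

Here `statuses` ends up as `[200, 200, 429]`, and the 429 is what the tool turns into a ModelRetry so the agent can report the cap instead of retrying blindly.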

Failure mode 3: Structured result validation loops

Risk: When a PydanticAI agent is configured with a Pydantic result_type, the framework validates the LLM's structured output against that schema. If validation fails (wrong field types, missing required fields, failed validators), PydanticAI sends the error back and asks the LLM to try again, and the LLM is free to call tools again while producing its new answer. Each validation failure is therefore another opportunity for a duplicate Stripe call.

PydanticAI's typed result system is one of its most useful features: you define a Pydantic model, and the framework ensures the LLM's output conforms to it before returning. When the output doesn't conform, PydanticAI feeds the validation error back to the LLM and asks it to try again. The LLM may call tools as part of rethinking its response.

from pydantic import BaseModel, field_validator

class ChargeResult(BaseModel):
    charge_id: str
    amount_cents: int
    status: str

    @field_validator("charge_id")
    @classmethod
    def validate_charge_id(cls, v: str) -> str:
        if not v.startswith("ch_"):
            raise ValueError(f"charge_id must start with 'ch_', got: {v!r}")
        return v

    @field_validator("amount_cents")
    @classmethod
    def validate_amount(cls, v: int) -> int:
        if v <= 0:
            raise ValueError(f"amount_cents must be positive, got {v}")
        return v

# If the LLM returns:
# {"charge_id": "charge_3R4xxx", "amount_cents": 2999, "status": "succeeded"}
# ↑ "charge_3R4xxx" fails the ch_ validator.
# PydanticAI retries the LLM call. The LLM may re-call stripe_charge to get a "real" ID.
# Second stripe_charge call = second Stripe charge (without idempotency key).

billing_agent = Agent(
    "openai:gpt-4o",
    deps_type=VaultDeps,
    result_type=ChargeResult,  # ← validation enforced by PydanticAI
)

The failure chain: the LLM calls stripe_charge, which succeeds and returns ch_3R4xxxxx. The LLM assembles a ChargeResult but writes the charge_id as "charge_3R4xxxxx" (expanding the ch_ prefix to charge_). PydanticAI's validator rejects it. The framework retries with the validation error in context. The LLM, now uncertain whether the first charge succeeded, calls stripe_charge again to "make sure", and without an idempotency key Stripe creates a second charge.

The idempotency key pattern from failure mode 1 closes most of this gap: if the second stripe_charge call uses the same run_id-prefixed key with the same parameters, Stripe returns the original charge object. But there's a subtlety: PydanticAI's validation retry is a new LLM call, which generates a new tool call message. If your tool builds the idempotency key from a per-call counter rather than the stable run_id, the validation-triggered retry is not guaranteed to produce the same key as the original call, and Stripe treats a different key as a new charge. Always use the run_id as the key prefix, not a local counter.
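The difference between the two key strategies is easy to see in isolation. In this sketch `counter_key` and `run_key` are hypothetical helpers, and the two calls in each list stand in for the original tool call and a validation-triggered retry.

```python
import itertools
import uuid

_counter = itertools.count(1)


def counter_key() -> str:
    """Anti-pattern: the key depends on call order, so a retry gets a different key."""
    return f"charge-{next(_counter)}"


def run_key(run_id: str, operation: str = "charge") -> str:
    """Stable key: depends only on the run and the logical operation."""
    return f"{run_id}-{operation}"


run_id = str(uuid.uuid4())

counter_keys = [counter_key(), counter_key()]     # original call, then retry
stable_keys = [run_key(run_id), run_key(run_id)]  # original call, then retry
```

The two counter-derived keys differ, so nothing ties the retry to the original request. The two run-derived keys are identical, which is the property deduplication depends on.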

@billing_agent.tool
async def stripe_charge(
    ctx: RunContext[VaultDeps],
    customer_id: str,
    amount_cents: int,
    description: str,
) -> str:
    """Create a Stripe charge via the Keybrake proxy."""
    # ✅ Stable across ModelRetry AND validation loop retries —
    #    ctx.deps.run_id never changes within one agent.run() call.
    idem_key = f"{ctx.deps.run_id}-charge"

    # ❌ Avoid: counter-based keys are not stable across validation retries
    # idem_key = f"charge-{next(counter)}"

    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "https://proxy.keybrake.com/stripe/v1/charges",
            headers={"Authorization": f"Bearer {ctx.deps.vault_key}"},
            data={
                "customer": customer_id,
                "amount": str(amount_cents),
                "currency": "usd",
                "description": description,
                "idempotency_key": idem_key,
            },
            timeout=10.0,
        )
    if resp.status_code == 429:
        raise ModelRetry("Spend cap reached — cannot create charge")
    resp.raise_for_status()
    data = resp.json()
    # Return structured data the LLM can directly use in ChargeResult
    return f"charge_id={data['id']} status={data.get('status', 'succeeded')}"

Returning a structured string from the tool (rather than a bare charge ID) also reduces the likelihood of validation failures: the LLM can extract charge_id and status directly from the tool's return value without having to infer them from a short opaque string.
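As a sketch of why this format is easy to consume, the `key=value` string can be split mechanically. `format_tool_result` and `parse_tool_result` are hypothetical helpers for illustration, and they assume values contain no spaces. The agent itself relies on the LLM, not on a parser like this, to extract the fields.

```python
def format_tool_result(charge_id: str, status: str) -> str:
    """Produce the same 'charge_id=... status=...' string the tool returns."""
    return f"charge_id={charge_id} status={status}"


def parse_tool_result(text: str) -> dict[str, str]:
    """Split space-separated key=value pairs into a dict."""
    pairs = (item.split("=", 1) for item in text.split())
    return {key: value for key, value in pairs}


raw = format_tool_result("ch_3R4test001", "succeeded")
fields = parse_tool_result(raw)
```

Every field ChargeResult needs from the tool is labeled explicitly in the string, which leaves the LLM less room to guess at or rewrite the charge ID.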

Six-control comparison table

Control	Raw `sk_live_` key	Restricted key	Vault key (Keybrake)
Endpoint allowlist	All endpoints	Resource-level (e.g. Charges:Write)	Endpoint-level (e.g. `POST /v1/charges` only)
Daily spend cap	None	None	Per-key USD cap (proxy enforces, returns 429)
Per-agent isolation	No — shared global or shared deps object	Possible if separate keys per agent type	Yes — separate vault key per agent type, enforced at proxy
ModelRetry dedup	Duplicate charges without idempotency key	Duplicate charges without idempotency key	run_id-keyed idempotency prevents duplicates; proxy logs each attempt
Validation loop dedup	Duplicate charges on result validation retry	Duplicate charges on result validation retry	Same run_id key used across validation retries — Stripe deduplicates
Audit trail	Stripe dashboard only	Stripe dashboard only	Keybrake audit log: vault key, run_id, timestamp, amount parsed from response

Putting it together: the governed PydanticAI billing agent

Here's the complete pattern combining all three fixes — per-run idempotency key in deps, per-agent-type vault keys, and structured tool return values that reduce validation loop retries:

import os
import uuid
import httpx
from dataclasses import dataclass
from pydantic import BaseModel, field_validator
from pydantic_ai import Agent, RunContext, ModelRetry

# --- Deps ---

@dataclass
class VaultDeps:
    vault_key: str   # scoped per agent type (injected at request time, not module load)
    run_id: str      # UUID per agent.run() call — idempotency key prefix + audit correlation

# --- Result type ---

class ChargeResult(BaseModel):
    charge_id: str
    amount_cents: int
    status: str

    @field_validator("charge_id")
    @classmethod
    def validate_charge_id(cls, v: str) -> str:
        if not v.startswith("ch_"):
            raise ValueError(f"Invalid charge_id format: {v!r}")
        return v

# --- Agent ---

billing_agent = Agent(
    "openai:gpt-4o",
    deps_type=VaultDeps,
    result_type=ChargeResult,
    system_prompt=(
        "You are a billing agent. Use the stripe_charge tool to create Stripe charges. "
        "The tool returns 'charge_id=ch_xxx status=succeeded'. Extract both fields for ChargeResult."
    ),
)

@billing_agent.tool
async def stripe_charge(
    ctx: RunContext[VaultDeps],
    customer_id: str,
    amount_cents: int,
    description: str,
) -> str:
    """Create a Stripe charge via the Keybrake proxy. Returns charge_id and status."""
    if amount_cents < 50:
        raise ModelRetry(f"amount_cents must be >= 50 (Stripe minimum), got {amount_cents}")

    idem_key = f"{ctx.deps.run_id}-charge"

    async with httpx.AsyncClient(timeout=10.0) as client:
        resp = await client.post(
            "https://proxy.keybrake.com/stripe/v1/charges",
            headers={"Authorization": f"Bearer {ctx.deps.vault_key}"},
            data={
                "customer": customer_id,
                "amount": str(amount_cents),
                "currency": "usd",
                "description": description,
                "idempotency_key": idem_key,
            },
        )

    if resp.status_code == 429:
        raise ModelRetry("Daily spend cap reached — no further charges allowed today")
    if resp.status_code == 403:
        raise ModelRetry("Unauthorized endpoint — vault key policy does not allow this operation")

    resp.raise_for_status()
    data = resp.json()
    return f"charge_id={data['id']} status={data.get('status', 'succeeded')}"

# --- Request handler ---

BILLING_VAULT_KEY = os.environ["KEYBRAKE_BILLING_VAULT_KEY"]

async def charge_customer(customer_id: str, amount_cents: int, description: str) -> ChargeResult:
    deps = VaultDeps(
        vault_key=BILLING_VAULT_KEY,
        run_id=str(uuid.uuid4()),
    )
    result = await billing_agent.run(
        f"Charge customer {customer_id} {amount_cents} cents for: {description}",
        deps=deps,
    )
    return result.data

pytest enforcement suite

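# Requires pytest-asyncio for the @pytest.mark.asyncio marker.
import os
import uuid
from unittest.mock import MagicMock, patch

import httpx
import pytest
from pydantic_ai import ModelRetry

# VaultDeps and stripe_charge are the definitions from the governed agent above.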

MOCK_VAULT_KEY = "vault_key_test_pydantic_xxx"
os.environ.setdefault("KEYBRAKE_BILLING_VAULT_KEY", MOCK_VAULT_KEY)

@pytest.mark.asyncio
async def test_model_retry_uses_stable_idempotency_key():
    """ModelRetry retries must reuse the same idempotency key to prevent double-charging."""
    captured_idem_keys = []

    async def mock_post(url, **kwargs):
        data = kwargs.get("data", {})
        captured_idem_keys.append(data.get("idempotency_key", ""))
        mock_resp = MagicMock()
        mock_resp.status_code = 200
        mock_resp.json.return_value = {"id": "ch_3R4test001", "status": "succeeded"}
        mock_resp.raise_for_status = MagicMock()
        return mock_resp

    run_id = str(uuid.uuid4())
    deps = VaultDeps(vault_key=MOCK_VAULT_KEY, run_id=run_id)

    with patch("httpx.AsyncClient") as MockClient:
        MockClient.return_value.__aenter__.return_value.post = mock_post
        # Simulate the tool being called twice (ModelRetry scenario).
        # @billing_agent.tool returns the original function, so call it directly.
        await stripe_charge(
            MagicMock(deps=deps), "cus_test", 2999, "Pro plan"
        )
        await stripe_charge(
            MagicMock(deps=deps), "cus_test", 2999, "Pro plan"
        )

    # Both calls must use the same idempotency key
    assert len(set(captured_idem_keys)) == 1, (
        "All retries within one run must share the same idempotency key"
    )
    assert captured_idem_keys[0] == f"{run_id}-charge"

@pytest.mark.asyncio
async def test_spend_cap_raises_model_retry():
    """Proxy 429 must raise ModelRetry so the agent handles it gracefully."""
    async def mock_post_429(url, **kwargs):
        mock_resp = MagicMock()
        mock_resp.status_code = 429
        mock_resp.raise_for_status = MagicMock()
        return mock_resp

    deps = VaultDeps(vault_key=MOCK_VAULT_KEY, run_id=str(uuid.uuid4()))

    with patch("httpx.AsyncClient") as MockClient:
        MockClient.return_value.__aenter__.return_value.post = mock_post_429
        with pytest.raises(ModelRetry, match="spend cap"):
            await stripe_charge(
                MagicMock(deps=deps), "cus_test", 2999, "Pro plan"
            )

@pytest.mark.asyncio
async def test_billing_vault_key_rejected_for_refunds():
    """Billing vault key must return 403 on refund endpoints."""
    async def mock_post_403(url, **kwargs):
        mock_resp = MagicMock()
        mock_resp.status_code = 403
        mock_resp.raise_for_status.side_effect = Exception("403 Forbidden")
        return mock_resp

    async with httpx.AsyncClient() as client:
        with patch.object(client, "post", mock_post_403):
            resp = await client.post(
                "https://proxy.keybrake.com/stripe/v1/refunds",
                headers={"Authorization": f"Bearer {MOCK_VAULT_KEY}"},
                data={"charge": "ch_3R4test001", "amount": "999"},
            )
    assert resp.status_code == 403

@pytest.mark.asyncio
async def test_no_sk_live_key_in_proxy_headers():
    """Vault key sent to proxy must not be a raw sk_live_ Stripe key."""
    captured_headers = []

    async def mock_post(url, **kwargs):
        captured_headers.append(kwargs.get("headers", {}))
        mock_resp = MagicMock()
        mock_resp.status_code = 200
        mock_resp.json.return_value = {"id": "ch_3R4test001", "status": "succeeded"}
        mock_resp.raise_for_status = MagicMock()
        return mock_resp

    deps = VaultDeps(vault_key=MOCK_VAULT_KEY, run_id=str(uuid.uuid4()))

    with patch("httpx.AsyncClient") as MockClient:
        MockClient.return_value.__aenter__.return_value.post = mock_post
        await stripe_charge(
            MagicMock(deps=deps), "cus_test", 2999, "Pro plan"
        )

    for headers in captured_headers:
        auth = headers.get("Authorization", "")
        assert "sk_live_" not in auth, "Raw Stripe live key must not appear in proxy headers"

Gap analysis

Concurrent agent.run() calls and shared run_id risk. If you run multiple agent.run() calls in the same asyncio task group and accidentally pass the same VaultDeps object to all of them, they share the same run_id and therefore the same idempotency key. Two concurrent runs charging the same customer with identical parameters collapse into one charge in Stripe (idempotency deduplication). If they charge different customers or amounts, Stripe rejects the second request with an idempotency error because the key was already used with different parameters, so only the first charge is created. Always generate a fresh uuid.uuid4() per agent.run() call, not per request handler invocation.
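One way to make the shared-deps mistake harder is to build the deps object inside the expression that starts each run. The sketch below uses `fake_run` as a hypothetical stand-in for billing_agent.run() so it can execute without a model or network access; `fresh_deps` and `charge_many` are likewise illustrative names.

```python
import asyncio
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class VaultDeps:
    vault_key: str
    run_id: str


def fresh_deps(vault_key: str) -> VaultDeps:
    """Build a new deps object, with a new run_id, for exactly one run."""
    return VaultDeps(vault_key=vault_key, run_id=str(uuid.uuid4()))


async def fake_run(prompt: str, deps: VaultDeps) -> str:
    """Hypothetical stand-in for billing_agent.run(); returns the key a tool would use."""
    await asyncio.sleep(0)  # yield control, as a real run would while awaiting I/O
    return f"{deps.run_id}-charge"


async def charge_many(prompts: list[str], vault_key: str) -> list[str]:
    # fresh_deps() is called once per prompt, so no two runs share a run_id.
    return await asyncio.gather(
        *(fake_run(p, fresh_deps(vault_key)) for p in prompts)
    )


keys = asyncio.run(charge_many(["charge A", "charge B", "charge C"], "vault_key_test"))
```

Marking the dataclass as frozen is a design choice in this sketch, not something PydanticAI requires. It prevents a run_id from being mutated after the run has started.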

PydanticAI streaming and mid-stream tool calls. PydanticAI's agent.run_stream() can call tools during streaming. If the stream is cancelled (client disconnect, timeout) after stripe_charge fires but before the final result is returned, the charge exists in Stripe but the caller never received confirmation. The run_id-keyed idempotency pattern handles the retry case — if the caller retries the same logical operation with the same run_id, Stripe deduplicates — but you need to persist the run_id on the caller's side across the cancel/retry cycle.
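Persisting the run_id can be as simple as recording it against an identifier for the logical operation before the run starts. In this sketch a plain dict stands in for durable storage, and `run_id_for` and the `invoice-1001` style operation IDs are hypothetical names chosen for illustration.

```python
import uuid


def run_id_for(store: dict[str, str], operation_id: str) -> str:
    """Return the run_id already recorded for this operation, or record a new one."""
    if operation_id not in store:
        store[operation_id] = str(uuid.uuid4())
    return store[operation_id]


# A plain dict stands in for durable storage (a database row, for example).
store: dict[str, str] = {}

first_attempt = run_id_for(store, "invoice-1001")   # stream is cancelled after the charge fires
second_attempt = run_id_for(store, "invoice-1001")  # caller retries the same logical operation
other_operation = run_id_for(store, "invoice-1002")
```

Because the retry looks up the same run_id, the tool derives the same idempotency key and the existing charge is returned instead of a new one being created. A real implementation would need the lookup-or-insert step to be atomic in whatever store it uses.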

Result validator side effects. PydanticAI supports @agent.result_validator functions that run after the LLM returns a result. If you add a validator that calls Stripe to "verify" the charge (e.g., fetching the charge object to confirm its status), that validation call counts against your Stripe rate limits and appears in the audit log. Keep validators pure: validate shape and format only, not external state.

Tool call message history and PII. PydanticAI stores the full message history for each agent.run(), including tool call arguments and return values, in result.all_messages(). If your tool call arguments include real customer IDs, charge amounts, or card details, these appear in plaintext in the message history. If you're logging or persisting message histories for debugging, ensure they're treated with the same access controls as Stripe webhook payloads.
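If message histories do need to be logged, one option is to mask identifiers before they are written. The following is a minimal sketch: `redact` is a hypothetical helper, the sample line is made up, and the pattern only covers the `cus_` and `ch_` prefixes that appear in this post.

```python
import re

# Stripe-style identifiers. The prefixes covered here are an assumption and
# should be extended to whatever identifiers your tools actually handle.
_ID_PATTERN = re.compile(r"\b(cus|ch)_[A-Za-z0-9]+")


def redact(text: str) -> str:
    """Mask the opaque part of each identifier, keeping the prefix for readability."""
    return _ID_PATTERN.sub(lambda m: f"{m.group(1)}_[REDACTED]", text)


line = "stripe_charge(customer_id='cus_ABC123') -> charge_id=ch_3R4test001 status=succeeded"
safe_line = redact(line)
```

Keeping the prefix preserves enough context to tell a customer ID from a charge ID during debugging. Redaction of this kind complements access controls on the stored histories and does not replace them.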

For a broader look at agent payment governance patterns, the LangChain Stripe integration post covers the same progression from bare key to vault key for tool-calling chains, and the CrewAI Stripe post covers per-run vault key governance for multi-agent crews.

FAQ

Is PydanticAI the same as Pydantic? Are there docs on using the regular Pydantic library for governance?

PydanticAI (from the same Pydantic team) is an agent framework built on top of Pydantic's validation primitives. The validation library (Pydantic v2) and the agent framework (PydanticAI) are separate packages: pip install pydantic vs pip install pydantic-ai. For using the Pydantic library to define spend policies and governance models independent of the agent framework, see our post on AI agent API governance in Python.

How does ModelRetry interact with PydanticAI's retries setting?

Agent(retries=N) sets the default number of retries PydanticAI allows for a failing tool call, including ModelRetry-triggered retries. The default is 1. With retries=0, the first ModelRetry fails the run with an error instead of prompting the LLM again, which is useful in billing contexts where you'd rather fail loudly than risk a double charge. The idempotency key pattern is still the correct primary defense; retries=0 is a belt-and-suspenders addition for billing tools.

Can I set retries per tool instead of per agent?

Yes. The tool decorator accepts its own retries argument, so @billing_agent.tool(retries=0) on the billing tool overrides the agent-level default for that tool only. If your agent has both a billing tool (where you want zero retries) and a lookup tool (where retries are safe), keep the agent default for the lookup tool and pin the billing tool to zero. Check the signature against the PydanticAI version you have installed, since the API has changed between releases.

What's the right vault key granularity — per agent type or per request?

Per agent type is sufficient for spend cap enforcement and endpoint isolation. Per request (a fresh vault key per agent.run()) is useful if you need attribution in the audit log at the individual-request level — Keybrake logs the vault key used in each proxied call, so per-request keys give you per-request traceability without needing to correlate by run_id. The cost is vault key provisioning overhead at request time; most deployments use per-agent-type keys and rely on the run_id in the idempotency key for per-request audit correlation.

Does this pattern work with PydanticAI's Gemini or Anthropic backends?

Yes — the vault key and idempotency key patterns are entirely in the tool implementation layer, which is independent of the LLM backend. Whether the agent calls tools via an OpenAI function-calling API, Anthropic's tool use API, or Gemini's function declarations, the underlying tool function receives the same RunContext with the same VaultDeps. The LLM backend only affects how tool call arguments are formatted and parsed, not what happens inside the tool function.

How do I correlate a PydanticAI run_id with a Keybrake audit log entry?

The run_id appears in the idempotency key field of every Stripe call made during that run (e.g., run_id-charge). Keybrake logs the idempotency key passed in the idempotency_key form field for each proxied request. To find all Stripe calls from a specific agent.run() invocation: filter the Keybrake audit log where idempotency_key starts with {run_id}. This gives you the exact Stripe calls, their amounts, their timestamps, and whether they were deduplicated.
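The prefix filter described above can be sketched over exported log entries. The entry shape here (`idempotency_key`, `amount`, `deduplicated`) is an assumption made for illustration, not Keybrake's actual export format, and `calls_for_run` is a hypothetical helper.

```python
def calls_for_run(entries: list[dict], run_id: str) -> list[dict]:
    """Return audit entries whose idempotency key belongs to the given run."""
    prefix = f"{run_id}-"  # trailing hyphen avoids matching a longer run_id
    return [e for e in entries if e.get("idempotency_key", "").startswith(prefix)]


# Hypothetical exported audit entries; the field names are assumptions.
audit_log = [
    {"idempotency_key": "run-aaa-charge", "amount": 2999, "deduplicated": False},
    {"idempotency_key": "run-aaa-charge", "amount": 2999, "deduplicated": True},
    {"idempotency_key": "run-bbb-charge", "amount": 4900, "deduplicated": False},
]

matches = calls_for_run(audit_log, "run-aaa")
```

In this sample, `matches` contains the original call and its deduplicated retry for `run-aaa`, and excludes the unrelated `run-bbb` entry.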

Keybrake is the proxy in the examples above — a scoped API-key vault for the non-LLM SaaS APIs your agent calls. Per-agent vault keys, per-vendor spend caps, and a full audit log. The proxy is live.