AI agent circuit breaker: stopping runaway vendor API spend before it compounds

Traditional circuit breakers track error rate — if more than 50% of requests to a service fail, stop sending traffic for 30 seconds. This protects against vendor outages. AI agents introduce a different failure mode: a loop that succeeds on every Stripe call while creating hundreds of payment intents the user never asked for. Failure rate is 0%. Damage is catastrophic. A spend-aware circuit breaker tracks cumulative cost per agent run, not just error rate, and trips when spending velocity exceeds a safe threshold — halting the loop before the total compounds to the point of a chargeback dispute or regulatory flag.

TL;DR

A complete AI agent circuit breaker has two trip conditions: (1) error rate exceeds threshold (standard circuit breaker), and (2) cumulative spend for the current agent run exceeds a per-run cap (spend circuit breaker). Implement spend tracking by parsing vendor API response bodies for cost signals — Stripe's amount field, Twilio's price field, Resend's flat-rate-per-call. Use a per-run atomic counter to track spend; trip the breaker and abort the agent loop when the counter exceeds the cap. Keybrake's vault keys enforce this at the proxy layer — the circuit trips server-side before bad requests reach Stripe.

Why standard circuit breakers don't protect AI agents

The classic circuit breaker pattern from Michael Nygard's Release It! tracks failure rate over a time window and stops sending traffic once that rate crosses a threshold.
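A minimal sketch of a failure-rate breaker, for illustration only. The class name `FailureRateBreaker`, its parameters, and the `min_calls` guard are assumptions made for this example and are not taken from the book or from any library.

```python
import time
from collections import deque


class FailureRateBreaker:
    """Opens when the failure rate over a sliding time window crosses a threshold."""

    def __init__(self, threshold=0.5, window_seconds=10.0, cooldown_seconds=30.0,
                 min_calls=10, clock=time.monotonic):
        self.threshold = threshold              # e.g. 0.5 means "more than 50% failed"
        self.window_seconds = window_seconds
        self.cooldown_seconds = cooldown_seconds
        self.min_calls = min_calls              # avoid tripping on a tiny sample
        self.clock = clock                      # injectable so tests can fake time
        self.calls = deque()                    # (timestamp, succeeded) pairs in the window
        self.opened_at = None                   # None while the circuit is closed

    def _prune(self, now):
        # Forget calls that have aged out of the window.
        while self.calls and now - self.calls[0][0] > self.window_seconds:
            self.calls.popleft()

    def record(self, succeeded):
        now = self.clock()
        self.calls.append((now, succeeded))
        self._prune(now)
        failures = sum(1 for _, ok in self.calls if not ok)
        if len(self.calls) >= self.min_calls and failures / len(self.calls) > self.threshold:
            self.opened_at = now

    def allow_request(self):
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown_seconds:
            # Cooldown elapsed: close again and start with a clean window.
            self.opened_at = None
            self.calls.clear()
            return True
        return False
```

Note that nothing in this breaker looks at what each call costs: any number of successful calls leaves it closed.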

This protects against vendor downtime. It doesn't protect against an AI agent that correctly calls Stripe 400 times — each call succeeds, failure rate is 0%, but the agent is creating duplicate payment intents in a loop because its planning state got corrupted.

AI agent failure modes that defeat standard circuit breakers:

| Agent failure mode | Failure rate | Spend impact | Standard CB protects? |
| --- | --- | --- | --- |
| Retry loop on 402 payment declined (agent misinterprets as transient) | High (402 errors) | Medium (each retry is a new declined charge attempt) | Yes: error rate trips it |
| Planning loop creates duplicate payment intents | 0% (all succeed) | High ($X × loop_count) | No: error rate never trips |
| Stuck LLM reasoning loop retries the same tool call | 0% (all succeed) | High (linear with loop count) | No |
| Agent mishandles pagination, calls Stripe 1,000 times for "all charges" | 0% (all succeed) | Moderate (API calls only, no direct charges, but bandwidth and Stripe overage) | No |
| Vendor 429 rate limit triggers aggressive backoff + retry loop | High (429s) | Grows with retry duration | Partial: trips on error rate but may open before spend accrues |

Spend-aware circuit breaker: implementation

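A spend-aware circuit breaker adds a cost accumulator to the standard state machine. The key design decision is where to track spend: in process, in a shared store such as Redis, or at a proxy in front of the vendor. Each option trades latency against coverage across workers.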

Python implementation with an in-process spend tracker:

import time
from dataclasses import dataclass, field
from enum import Enum
import httpx

class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

@dataclass
class SpendAwareCircuitBreaker:
    # Standard circuit breaker config
    failure_threshold: int = 5       # Errors before opening
    recovery_timeout: float = 30.0   # Seconds before half-open probe
    half_open_max: int = 1           # Probes allowed in half-open (not enforced in this sketch)

    # Spend circuit breaker config
    spend_cap_usd: float = 100.0     # Max spend per run
    spend_window_seconds: float = 3600.0  # Rolling window for spend tracking

    # State
    state: CircuitState = field(default=CircuitState.CLOSED, init=False)
    failure_count: int = field(default=0, init=False)
    last_failure_time: float = field(default=0.0, init=False)
    cumulative_spend_usd: float = field(default=0.0, init=False)
    spend_window_start: float = field(default_factory=time.monotonic, init=False)

    def _reset_spend_if_expired(self):
        now = time.monotonic()
        if now - self.spend_window_start > self.spend_window_seconds:
            self.cumulative_spend_usd = 0.0
            self.spend_window_start = now

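    def record_success(self, cost_usd: float = 0.0):
        self._reset_spend_if_expired()
        self.cumulative_spend_usd += cost_usd
        self.failure_count = 0
        if self.state == CircuitState.HALF_OPEN:
            # The probe succeeded, so resume normal traffic.
            self.state = CircuitState.CLOSED

        if self.cumulative_spend_usd >= self.spend_cap_usd:
            self.state = CircuitState.OPEN
            self.last_failure_time = time.monotonic()
            # The call that crossed the cap has already been charged;
            # raising here stops the next one.
            raise SpendCapExceeded(
                f"Circuit opened: cumulative spend ${self.cumulative_spend_usd:.2f} "
                f"reached cap ${self.spend_cap_usd:.2f}"
            )

    def record_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.monotonic()
        # A failed half-open probe reopens the circuit immediately.
        if (self.state == CircuitState.HALF_OPEN
                or self.failure_count >= self.failure_threshold):
            self.state = CircuitState.OPEN

    def allow_request(self) -> bool:
        # A spend trip must not recover on the error timer: stay blocked
        # until the spend window rolls over and the total resets.
        self._reset_spend_if_expired()
        if self.cumulative_spend_usd >= self.spend_cap_usd:
            return False
        if self.state == CircuitState.OPEN:
            if time.monotonic() - self.last_failure_time > self.recovery_timeout:
                self.state = CircuitState.HALF_OPEN
                return True
            return False
        # CLOSED, or HALF_OPEN waiting on a probe result.
        return True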

class SpendCapExceeded(Exception):
    pass
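The breaker above caps cumulative spend inside a fixed window. If you also want to trip on spending velocity, one possible approach is a sliding window over recent costs. The sketch below is a hypothetical helper (`SpendVelocityGuard` and its parameters are names invented for this example), and it tracks integer cents to avoid floating-point drift when old events are subtracted.

```python
import time
from collections import deque


class SpendVelocityGuard:
    """Reports when spend inside a sliding time window exceeds a limit."""

    def __init__(self, max_cents_per_window=2000, window_seconds=60.0,
                 clock=time.monotonic):
        self.max_cents_per_window = max_cents_per_window
        self.window_seconds = window_seconds
        self.clock = clock           # injectable so tests can fake time
        self.events = deque()        # (timestamp, cost_cents) for recent successful calls
        self.window_total = 0        # running sum of cost_cents in the window

    def record(self, cost_cents):
        """Record one successful call; return True if the velocity limit is exceeded."""
        now = self.clock()
        self.events.append((now, cost_cents))
        self.window_total += cost_cents
        # Drop events that have aged out of the window.
        while self.events and now - self.events[0][0] > self.window_seconds:
            _, old_cost = self.events.popleft()
            self.window_total -= old_cost
        return self.window_total > self.max_cents_per_window
```

A caller could check the return value after each successful vendor call and abort the agent loop when it is True, alongside the cumulative cap.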

Parsing spend from vendor API responses

To track spend accurately, parse cost signals from each vendor's API response before calling record_success(cost_usd=...):

def parse_stripe_cost(response_body: dict) -> float:
    """Stripe: cost = amount / 100 (amount is in cents for USD)"""
    amount_cents = response_body.get("amount", 0)
    return amount_cents / 100.0

def parse_twilio_cost(response_body: dict) -> float:
    """Twilio: price is a negative string like '-0.0075', or null until the message is billed"""
    price = response_body.get("price") or "0"
    return abs(float(price))

def parse_resend_cost(_response_body: dict) -> float:
    """Resend: flat rate per email send — check your plan's per-email cost"""
    return 0.001  # ~$1 per 1,000 emails on Resend's base plan

async def guarded_stripe_call(
    circuit_breaker: SpendAwareCircuitBreaker,
    client: httpx.AsyncClient,
    endpoint: str,
    payload: dict
) -> dict:
    if not circuit_breaker.allow_request():
        raise RuntimeError("Circuit open — agent spend cap reached or vendor unavailable")

    try:
        resp = await client.post(
            f"https://proxy.keybrake.com/stripe{endpoint}",
            json=payload
        )
        resp.raise_for_status()
        body = resp.json()
        cost = parse_stripe_cost(body)
        circuit_breaker.record_success(cost_usd=cost)
        return body
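    except httpx.HTTPError:
        # Covers non-2xx responses (HTTPStatusError) and transport errors
        # such as timeouts and connection failures (RequestError).
        circuit_breaker.record_failure()
        raise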

Proxy-enforced spend caps: the reliable alternative

In-process spend tracking has a fundamental weakness: it only works within one process. A distributed agent system running across multiple workers, or an agent framework that spawns subprocesses, has no shared spend state. Each worker's circuit breaker starts from zero.

Keybrake's vault keys enforce spend caps at the proxy layer — server-side, before the request reaches Stripe. The proxy tracks cumulative spend across all requests made with the vault key, regardless of how many processes or workers are making those calls:

# Issue a vault key with a hard spend cap — enforced server-side
import os
import httpx

async def issue_vault_key_for_run(run_id: str, max_spend_usd: float) -> dict:
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "https://api.keybrake.com/v1/keys",
            headers={"Authorization": f"Bearer {os.environ['KEYBRAKE_TOKEN']}"},
            json={
                "label": f"agent-run-{run_id}",
                "vendor": "stripe",
                "daily_usd_cap": max_spend_usd,
                "allowed_endpoints": ["/v1/payment_intents"],
                "expires_in": "1h"
            }
        )
        resp.raise_for_status()
        return resp.json()

# When vault key cap is hit, Keybrake returns 429 with code=cap_exhausted
# The agent receives this and can halt the loop — the damage is bounded

Layer both approaches: a local circuit breaker for fast in-process protection, and a vault key cap as the authoritative server-side bound. The local breaker catches runaway loops within a process before they even hit the network; the vault key cap protects against distributed loops and implementation errors in the local breaker.

| Protection layer | Latency overhead | Distributed? | Reliable without network? |
| --- | --- | --- | --- |
| In-process circuit breaker | ~0ms (atomic op) | No, per-process only | Yes |
| Redis-backed spend tracker | ~1ms (Redis round-trip) | Yes | No, Redis must be available |
| Keybrake vault key cap | ~30-80ms (proxy overhead) | Yes, server-side | No, requires Keybrake API |
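To make the middle row concrete, here is a rough sketch of a shared spend tracker. It assumes a store that offers an atomic increment returning the new total; with Redis, that role could be played by `INCRBY` on an integer-cents key. The `SpendStore` protocol, `InMemorySpendStore`, `RunSpendCapExceeded`, and `record_shared_spend` are all hypothetical names for this example, and the in-memory store only stands in for a real shared backend so the sketch runs on its own.

```python
import threading
from typing import Protocol


class SpendStore(Protocol):
    def incr(self, key: str, amount_cents: int) -> int:
        """Atomically add amount_cents to key and return the new total."""
        ...


class InMemorySpendStore:
    """Stand-in for a shared store; a lock makes the increment atomic."""

    def __init__(self) -> None:
        self._totals: dict[str, int] = {}
        self._lock = threading.Lock()

    def incr(self, key: str, amount_cents: int) -> int:
        with self._lock:
            self._totals[key] = self._totals.get(key, 0) + amount_cents
            return self._totals[key]


class RunSpendCapExceeded(Exception):
    pass


def record_shared_spend(store: SpendStore, run_id: str,
                        cost_cents: int, cap_cents: int) -> int:
    """Add one call's cost to the run total; raise once the shared total passes the cap."""
    total = store.incr(f"spend:{run_id}", cost_cents)
    if total > cap_cents:
        raise RunSpendCapExceeded(
            f"run {run_id}: {total} cents exceeds cap of {cap_cents}"
        )
    return total
```

Because every worker increments the same key, the cap applies to the run as a whole rather than to each process separately.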

Related questions

How does a spend circuit breaker interact with Temporal's retry policy?

Temporal's retry policy retries failed activities automatically. If a SpendCapExceeded exception propagates out of a Temporal activity, Temporal will retry the activity — which will immediately hit the breaker again. To prevent this, mark spend cap exceptions as non-retryable in the activity's retry policy: retry_policy = RetryPolicy(non_retryable_error_types=["SpendCapExceeded"]). This surfaces the exception to the workflow, which can then decide to compensate (cancel other in-flight activity branches) or fail the workflow gracefully. The spend circuit breaker is the last line of defense — the workflow should be designed to handle it as a normal terminal condition, not an unexpected error.

What's the right spend cap per agent run?

The right cap depends on your agent's intended action space: calculate the maximum legitimate single-run spend (e.g., an agent that creates one Stripe payment intent per user action should never spend more than the max single payment amount), then add a 20% buffer. For agent runs with variable action counts — agents that might create 1 or 100 payment intents depending on user instructions — use a per-run cap set at the point of agent invocation by the calling code, passed in as a parameter rather than hardcoded. This lets callers set the cap based on user tier or explicit user-specified limits.
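As a worked example of that arithmetic, the hypothetical helper below computes a per-run cap from the largest legitimate payment, the number of actions the caller expects, and a percentage buffer. The function name and parameters are assumptions for illustration; amounts are in integer cents so the result rounds up cleanly.

```python
def per_run_cap_cents(max_amount_cents: int, max_actions: int = 1,
                      buffer_percent: int = 20) -> int:
    """Largest legitimate spend for a run plus a buffer, rounded up to a whole cent."""
    if max_amount_cents < 0 or max_actions < 1 or buffer_percent < 0:
        raise ValueError("amount and buffer must be non-negative, max_actions at least 1")
    base = max_amount_cents * max_actions
    scaled = base * (100 + buffer_percent)
    return (scaled + 99) // 100  # integer ceiling division by 100
```

For a single $50.00 payment, `per_run_cap_cents(5000)` gives 6000 cents ($60.00). For a run the caller expects to create up to 100 payment intents of $25.00 each, `per_run_cap_cents(2500, max_actions=100)` gives 300000 cents ($3,000.00).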

Should the circuit breaker halt the entire agent or just the vendor call?

It depends on whether the vendor call is on the critical path of the agent's task. For agents where Stripe is the output (billing agents, payment processing agents), a spend cap trip should halt the entire agent run — there's nothing meaningful to continue without the ability to make payments. For agents where Stripe is optional (agents that attempt billing but can continue with other tasks on failure), the circuit breaker should surface a SpendCapExceeded exception to the agent's tool call layer, which the LLM or orchestrator can then decide whether to continue without that capability. Keybrake's 429 response has a structured {"code": "cap_exhausted"} body that the agent can inspect to distinguish spend cap from transient rate limits.
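One way an agent's tool layer might make that distinction is sketched below. Only the 429 status and the `{"code": "cap_exhausted"}` body come from the description above; the function name, the returned labels, and the handling of other statuses are assumptions for this example.

```python
import json


def classify_response(status_code: int, body_text: str) -> str:
    """Return 'ok', 'cap_exhausted', 'rate_limited', or 'other_error'."""
    if status_code < 400:
        return "ok"
    if status_code != 429:
        return "other_error"
    try:
        body = json.loads(body_text)
    except ValueError:
        # Not JSON (or empty): treat as an ordinary rate limit.
        body = None
    if isinstance(body, dict) and body.get("code") == "cap_exhausted":
        return "cap_exhausted"   # terminal: retrying cannot succeed
    return "rate_limited"        # transient: back off and retry
```

An orchestrator could halt the run on `cap_exhausted` and apply its normal backoff on `rate_limited`.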

Further reading