
AI agent circuit breaker: stopping runaway vendor API spend before it compounds

Traditional circuit breakers track error rate — if more than 50% of requests to a service fail, stop sending traffic for 30 seconds. This protects against vendor outages. AI agents introduce a different failure mode: a loop that succeeds on every Stripe call while creating hundreds of payment intents the user never asked for. Failure rate is 0%. Damage is catastrophic. A spend-aware circuit breaker tracks cumulative cost per agent run, not just error rate, and trips when spending velocity exceeds a safe threshold — halting the loop before the total compounds to the point of a chargeback dispute or regulatory flag.
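Spending velocity is the rate at which money leaves during a run, and it can be tracked separately from the run total. The sketch below assumes an in-memory sliding window. The class name, its parameters, and the injectable `clock` are hypothetical; the clock is injectable so the window can be tested without sleeping.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class SpendVelocityExceeded(Exception):
    """Raised when spend inside the sliding window reaches the limit."""


class SpendVelocityTracker:
    """Hypothetical sliding-window tracker: trips when the USD spent in the
    last `window_seconds` reaches `max_usd_per_window`."""

    def __init__(self, max_usd_per_window: float, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_usd_per_window = max_usd_per_window
        self.window_seconds = window_seconds
        self._clock = clock
        self._events: Deque[Tuple[float, float]] = deque()  # (timestamp, cost_usd)

    def record(self, cost_usd: float) -> float:
        """Record one charge and return the spend inside the current window."""
        now = self._clock()
        # Evict charges that have slid out of the window
        while self._events and now - self._events[0][0] >= self.window_seconds:
            self._events.popleft()
        self._events.append((now, cost_usd))
        window_total = sum(cost for _, cost in self._events)
        if window_total >= self.max_usd_per_window:
            raise SpendVelocityExceeded(
                f"${window_total:.2f} spent in the last {self.window_seconds:.0f}s"
            )
        return window_total
```

A cumulative per-run cap bounds the total. A velocity limit, such as $100 per minute, can catch a loop early even when the run cap is much higher.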

TL;DR

A complete AI agent circuit breaker has two trip conditions: (1) the error rate exceeds a threshold (the standard circuit breaker), and (2) cumulative spend for the current agent run exceeds a per-run cap (the spend circuit breaker). Track spend by reading cost signals from vendor API responses: Stripe's amount field and Twilio's price field. Resend responses carry no cost field, so use a fixed per-email estimate based on your plan. Keep spend in a per-run atomic counter. When the counter reaches the cap, trip the breaker and abort the agent loop. Keybrake's vault keys enforce the same cap at the proxy layer, so the circuit trips server-side before further requests reach Stripe.

Why standard circuit breakers don't protect AI agents

The classic circuit breaker pattern from Michael Nygard's Release It! tracks failure rate over a time window:

Closed: requests flow normally; failures counted
Open: circuit trips when failure rate exceeds threshold; requests rejected for a cooldown period
Half-open: after cooldown, one probe request tests if the service recovered

This protects against vendor downtime. It doesn't protect against an AI agent that correctly calls Stripe 400 times — each call succeeds, failure rate is 0%, but the agent is creating duplicate payment intents in a loop because its planning state got corrupted.
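A short simulation makes the gap concrete. It assumes a minimal breaker that counts only errors and opens after consecutive failures, with the half-open state omitted for brevity. It also assumes a hypothetical agent that re-creates the same $50 payment intent 400 times. Both numbers are illustrative.

```python
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"


class ErrorRateBreaker:
    """Hypothetical error-rate-only breaker: opens after `threshold`
    consecutive failures and knows nothing about cost."""

    def __init__(self, threshold: int = 5) -> None:
        self.threshold = threshold
        self.consecutive_failures = 0
        self.state = State.CLOSED

    def record(self, ok: bool) -> None:
        if ok:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.threshold:
                self.state = State.OPEN


def simulate_duplicate_intent_loop(iterations: int, amount_usd: float):
    """An agent re-creates the same payment intent; every call succeeds."""
    breaker = ErrorRateBreaker()
    spent = 0.0
    for _ in range(iterations):
        if breaker.state is State.OPEN:
            break
        spent += amount_usd      # the vendor call succeeded and moved money
        breaker.record(ok=True)  # a success resets the failure count
    return breaker.state, spent


state, spent = simulate_duplicate_intent_loop(iterations=400, amount_usd=50.0)
print(state, spent)  # State.CLOSED 20000.0
```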

AI agent failure modes that defeat standard circuit breakers:

Agent failure mode	Failure rate	Spend impact	Standard CB protects?
Retry loop on 402 payment declined (agent misinterprets as transient)	High (402 errors)	Medium (each retry is a new declined charge attempt)	Yes — error rate trips it
Planning loop creates duplicate payment intents	0% (all succeed)	High ($X × loop_count)	No — error rate never trips
Stuck LLM reasoning loop retries the same tool call	0% (all succeed)	High (linear with loop count)	No
Agent mishandles pagination, calls Stripe 1,000 times for "all charges"	0% (all succeed)	Low direct spend (list calls move no money, and Stripe does not bill per API request), but burns rate-limit headroom and agent compute	No
Vendor 429 rate limit triggers aggressive backoff + retry loop	High (429s)	Grows with retry duration	Partial: trips on the 429 error rate, but the loop resumes after each cooldown, so spend from the calls that succeed keeps accruing

Spend-aware circuit breaker: implementation

A spend-aware circuit breaker adds a cost accumulator to the standard state machine. The key design decision: where to track spend. Options:

In-process atomic counter — fast, resets on restart, only tracks spend within the current process lifetime
Redis atomic counter — survives restarts, works across distributed agents, adds a network hop
Proxy-enforced (Keybrake) — enforced server-side at the credential layer, works regardless of agent implementation
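The Redis option can be sketched as follows. It assumes a redis-py style client (for example `redis.Redis()`) that provides `set(name, value, ex=..., nx=...)` and `incrby(name, amount)`. The `agent-spend:{run_id}` key scheme and the micro-dollar unit are hypothetical choices. Integers avoid float drift, which matters because Twilio prices are fractions of a cent.

```python
from typing import Optional, Protocol


class SpendCapExceeded(Exception):
    pass


class CounterClient(Protocol):
    # The subset of a redis-py client used below
    def set(self, name: str, value: int, ex: Optional[int] = None,
            nx: bool = False) -> Optional[bool]: ...
    def incrby(self, name: str, amount: int = 1) -> int: ...


MICRO_USD = 1_000_000  # store integer micro-dollars instead of floats


class SharedSpendTracker:
    """Per-run spend counter shared by every worker that talks to the same Redis."""

    def __init__(self, client: CounterClient, cap_usd: float, ttl_seconds: int = 3600) -> None:
        self.client = client
        self.cap_micro = round(cap_usd * MICRO_USD)
        self.ttl_seconds = ttl_seconds

    def add_spend(self, run_id: str, cost_usd: float) -> float:
        key = f"agent-spend:{run_id}"  # hypothetical key scheme
        # SET NX EX: create the key with a TTL only if it does not exist yet
        self.client.set(key, 0, ex=self.ttl_seconds, nx=True)
        # INCRBY is atomic on the server, so concurrent workers never lose updates
        total_micro = self.client.incrby(key, round(cost_usd * MICRO_USD))
        if total_micro >= self.cap_micro:
            raise SpendCapExceeded(f"run {run_id} reached ${total_micro / MICRO_USD:.2f}")
        return total_micro / MICRO_USD
```

SET and INCRBY are two round trips. A pipeline or Lua script could merge them. If the key expires between the two calls, INCRBY recreates it without a TTL.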

Python implementation with an in-process spend tracker:

import time
from dataclasses import dataclass, field
from enum import Enum
import httpx  # used by guarded_stripe_call below

class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class SpendCapExceeded(Exception):
    pass

@dataclass
class SpendAwareCircuitBreaker:
    """Not thread-safe: plain attribute updates are safe on a single asyncio
    event loop because no await separates a read from its update."""

    # Standard circuit breaker config
    failure_threshold: int = 5       # Consecutive errors before opening
    recovery_timeout: float = 30.0   # Seconds before a half-open probe
    half_open_max: int = 1           # Probes allowed in half-open

    # Spend circuit breaker config
    spend_cap_usd: float = 100.0     # Max spend per window (size it to one agent run)
    spend_window_seconds: float = 3600.0  # Fixed window: spend resets when it expires

    # State
    state: CircuitState = field(default=CircuitState.CLOSED, init=False)
    failure_count: int = field(default=0, init=False)
    last_failure_time: float = field(default=0.0, init=False)
    half_open_probes: int = field(default=0, init=False)
    cumulative_spend_usd: float = field(default=0.0, init=False)
    spend_window_start: float = field(default_factory=time.monotonic, init=False)

    def _reset_spend_if_expired(self):
        now = time.monotonic()
        if now - self.spend_window_start > self.spend_window_seconds:
            self.cumulative_spend_usd = 0.0
            self.spend_window_start = now

    def _open(self):
        self.state = CircuitState.OPEN
        self.last_failure_time = time.monotonic()
        self.half_open_probes = 0

    def record_success(self, cost_usd: float = 0.0):
        self._reset_spend_if_expired()
        self.cumulative_spend_usd += cost_usd
        self.failure_count = 0
        if self.state == CircuitState.HALF_OPEN:
            # The probe succeeded, so the vendor has recovered
            self.state = CircuitState.CLOSED
            self.half_open_probes = 0

        if self.cumulative_spend_usd >= self.spend_cap_usd:
            self._open()
            raise SpendCapExceeded(
                f"Circuit opened: cumulative spend ${self.cumulative_spend_usd:.2f} "
                f"reached cap ${self.spend_cap_usd:.2f}"
            )

    def record_failure(self):
        self.failure_count += 1
        # A failed half-open probe reopens immediately
        if self.state == CircuitState.HALF_OPEN or self.failure_count >= self.failure_threshold:
            self._open()

    def allow_request(self) -> bool:
        # A spend trip must not self-heal after recovery_timeout:
        # keep rejecting until the spend window rolls over
        self._reset_spend_if_expired()
        if self.cumulative_spend_usd >= self.spend_cap_usd:
            return False
        if self.state == CircuitState.OPEN:
            if time.monotonic() - self.last_failure_time <= self.recovery_timeout:
                return False
            self.state = CircuitState.HALF_OPEN
            self.half_open_probes = 0
        if self.state == CircuitState.HALF_OPEN:
            # Admit at most half_open_max probes until one reports back
            if self.half_open_probes >= self.half_open_max:
                return False
            self.half_open_probes += 1
        return True
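The class above updates plain attributes. That is safe on one asyncio event loop, but not when worker threads share a breaker. Its spend window is also time-based rather than tied to a run. A hypothetical lock-protected ledger keyed by run ID addresses both. `RunSpendLedger` and its methods are illustrative names, not part of any library.

```python
import threading
from collections import defaultdict


class RunSpendLedger:
    """Hypothetical in-process ledger: one spend total per agent run,
    safe to call from multiple threads."""

    def __init__(self, cap_usd: float) -> None:
        self.cap_usd = cap_usd
        self._lock = threading.Lock()
        self._spend = defaultdict(float)  # run_id -> cumulative USD

    def try_add(self, run_id: str, cost_usd: float) -> bool:
        """Add cost; return True while the run is under the cap, False once it reaches it."""
        with self._lock:  # the read-modify-write is atomic with respect to other threads
            self._spend[run_id] += cost_usd
            return self._spend[run_id] < self.cap_usd

    def total(self, run_id: str) -> float:
        with self._lock:
            return self._spend.get(run_id, 0.0)

    def end_run(self, run_id: str) -> None:
        # Drop a finished run's state so the ledger does not grow without bound
        with self._lock:
            self._spend.pop(run_id, None)
```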

Parsing spend from vendor API responses

To track spend accurately, parse cost signals from each vendor's API response before calling record_success(cost_usd=...):

def parse_stripe_cost(response_body: dict) -> float:
    """Stripe: amount is in the currency's smallest unit, so cents for USD.

    Assumes USD; zero-decimal currencies such as JPY must not be divided by 100.
    Responses without an amount (e.g. list calls) count as $0.
    """
    amount_cents = response_body.get("amount") or 0
    return amount_cents / 100.0

def parse_twilio_cost(response_body: dict) -> float:
    """Twilio: price is a negative decimal string like '-0.0075', and it can be
    null until the message is finalized, so null is treated as 0."""
    price = response_body.get("price") or "0"
    return abs(float(price))

def parse_resend_cost(_response_body: dict) -> float:
    """Resend: responses carry no cost field, so use a flat per-email estimate."""
    return 0.001  # Assumes ~$1 per 1,000 emails; replace with your plan's effective rate

async def guarded_stripe_call(
    circuit_breaker: SpendAwareCircuitBreaker,
    client: httpx.AsyncClient,
    endpoint: str,
    payload: dict
) -> dict:
    if not circuit_breaker.allow_request():
        raise RuntimeError("Circuit open — agent spend cap reached or vendor unavailable")

    try:
        resp = await client.post(
            f"https://proxy.keybrake.com/stripe{endpoint}",
            json=payload
        )
        resp.raise_for_status()
        body = resp.json()
        cost = parse_stripe_cost(body)
        circuit_breaker.record_success(cost_usd=cost)
        return body
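    except httpx.HTTPError:
        # Covers error statuses (402, 429, 5xx) and transport failures such as
        # timeouts, so network errors also count toward the failure threshold
        circuit_breaker.record_failure()
        raise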
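`guarded_stripe_call` learns the cost only from the response, so the call that crosses the cap has already gone through. A hypothetical pre-flight variant reads `amount` from the outgoing PaymentIntent payload and refuses the request before it is sent. This sketch is synchronous for brevity and assumes USD amounts in cents. It takes the transport as an injected `send` callable rather than assuming any particular client.

```python
from typing import Any, Callable, Dict


class SpendCapExceeded(Exception):
    pass


class ReservingSpendGuard:
    """Hypothetical guard that reserves a request's amount before sending it."""

    def __init__(self, cap_usd: float) -> None:
        self.cap_usd = cap_usd
        self.committed_usd = 0.0

    def call(self, send: Callable[[Dict[str, Any]], Dict[str, Any]],
             payload: Dict[str, Any]) -> Dict[str, Any]:
        # PaymentIntent amounts are in the smallest currency unit (cents for USD)
        requested_usd = payload.get("amount", 0) / 100.0
        if self.committed_usd + requested_usd > self.cap_usd:
            # Refuse before any money moves: the request never leaves the process
            raise SpendCapExceeded(
                f"${requested_usd:.2f} would push run spend past ${self.cap_usd:.2f}"
            )
        # Reserve first. The reservation is not released on errors: a timed-out
        # request may still have created the intent, so over-counting is safer.
        self.committed_usd += requested_usd
        return send(payload)
```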

Proxy-enforced spend caps: the reliable alternative

In-process spend tracking has a fundamental weakness: it only works within one process. A distributed agent system running across multiple workers, or an agent framework that spawns subprocesses, has no shared spend state. Each worker's circuit breaker starts from zero.
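A small simulation shows how large the gap can get. It assumes four runaway workers, a $100 cap, and $10 per call, all illustrative. Each worker checks its counter before every call.

```python
def simulate(workers: int, cap_usd: float, cost_per_call_usd: float,
             shared_counter: bool) -> float:
    """Total USD spent before every worker has stopped.

    Each worker loops forever (a runaway agent) and stops only when the
    spend counter it consults reaches cap_usd.
    """
    local = [0.0] * workers  # one counter per process
    shared = 0.0             # one counter for all processes
    active = set(range(workers))
    total = 0.0
    while active:
        for w in sorted(active):  # sorted() copies, so discarding below is safe
            spent = shared if shared_counter else local[w]
            if spent >= cap_usd:
                active.discard(w)
                continue
            total += cost_per_call_usd
            if shared_counter:
                shared += cost_per_call_usd
            else:
                local[w] += cost_per_call_usd
    return total


print(simulate(4, 100.0, 10.0, shared_counter=False))  # 400.0
print(simulate(4, 100.0, 10.0, shared_counter=True))   # 100.0
```

With per-process counters, the effective cap scales with the worker count.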

Keybrake's vault keys enforce spend caps at the proxy layer — server-side, before the request reaches Stripe. The proxy tracks cumulative spend across all requests made with the vault key, regardless of how many processes or workers are making those calls:

# Issue a vault key with a hard spend cap, enforced server-side
import os
import httpx

async def issue_vault_key_for_run(run_id: str, max_spend_usd: float) -> dict:
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "https://api.keybrake.com/v1/keys",
            headers={"Authorization": f"Bearer {os.environ['KEYBRAKE_TOKEN']}"},
            json={
                "label": f"agent-run-{run_id}",
                "vendor": "stripe",
                "daily_usd_cap": max_spend_usd,  # with a 1h expiry, effectively a per-run cap
                "allowed_endpoints": ["/v1/payment_intents"],
                "expires_in": "1h"
            }
        )
        resp.raise_for_status()  # fail loudly instead of handing the agent an error body
        return resp.json()

# When vault key cap is hit, Keybrake returns 429 with code=cap_exhausted
# The agent receives this and can halt the loop — the damage is bounded

Layer both approaches: a local circuit breaker for fast in-process protection, and a vault key cap as the authoritative server-side bound. Once its recorded spend reaches the cap, the local breaker stops a runaway loop inside one process without a network round-trip. The vault key cap still holds when a loop spans multiple workers or when the local breaker has a bug.

Protection layer	Latency overhead	Distributed?	Reliable without network?
In-process circuit breaker	~0ms (atomic op)	No — per-process only	Yes
Redis-backed spend tracker	~1ms (Redis round-trip)	Yes	No — Redis must be available
Keybrake vault key cap	~30-80ms (proxy overhead)	Yes — server-side	No — requires Keybrake API
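The layering can be sketched as a single agent step. The vault key example notes that Keybrake answers a spent key with HTTP 429 and `code=cap_exhausted`. The body shape is not shown, so this sketch assumes a JSON body with a top-level `code` field. `ProxyResponse`, `classify`, and `run_step` are hypothetical names, and the transport is an injected callable.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


class AgentHalted(Exception):
    """Raised when the run must stop; retrying cannot succeed."""


@dataclass
class ProxyResponse:
    status_code: int
    body: Dict[str, Any]


def classify(resp: ProxyResponse) -> str:
    """Map a proxied response to an action (assumed body: {"code": "cap_exhausted"})."""
    if resp.status_code == 429 and resp.body.get("code") == "cap_exhausted":
        return "halt"     # server-side cap hit: terminal for this run
    if resp.status_code == 429:
        return "backoff"  # ordinary vendor rate limit: worth retrying
    if resp.status_code >= 400:
        return "fail"
    return "ok"


def run_step(send: Callable[[], ProxyResponse],
             local_cap_reached: Callable[[], bool],
             max_retries: int = 3) -> Dict[str, Any]:
    # Layer 1: the in-process check costs no network round-trip
    if local_cap_reached():
        raise AgentHalted("local spend cap reached")
    for _ in range(max_retries + 1):
        resp = send()
        action = classify(resp)
        if action == "ok":
            return resp.body
        if action == "halt":
            # Layer 2: the proxy's cap is authoritative across all workers
            raise AgentHalted("vault key spend cap exhausted")
        if action == "fail":
            raise RuntimeError(f"vendor error {resp.status_code}")
        # "backoff": a real loop would sleep with jitter here before retrying
    raise RuntimeError("rate limited; retries exhausted")
```

Separating the two kinds of 429 is the point of the sketch. An ordinary rate limit is worth retrying, but retrying against an exhausted cap only repeats the rejected call.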
