AI agent circuit breaker: stopping runaway vendor API spend before it compounds
Traditional circuit breakers track error rate: if, say, more than 50% of requests to a service fail, they stop sending traffic for 30 seconds. This protects against vendor outages. AI agents introduce a different failure mode: a loop that succeeds on every Stripe call while creating hundreds of payment intents the user never asked for. The failure rate is 0%, and the damage is catastrophic. A spend-aware circuit breaker tracks cumulative cost per agent run, not just error rate, and trips when that spend exceeds a safe cap, halting the loop before the total compounds to the point of a chargeback dispute or regulatory flag.
TL;DR
A complete AI agent circuit breaker has two trip conditions: (1) error rate exceeds threshold (standard circuit breaker), and (2) cumulative spend for the current agent run exceeds a per-run cap (spend circuit breaker). Implement spend tracking by parsing vendor API response bodies for cost signals — Stripe's amount field, Twilio's price field, Resend's flat-rate-per-call. Use a per-run atomic counter to track spend; trip the breaker and abort the agent loop when the counter exceeds the cap. Keybrake's vault keys enforce this at the proxy layer — the circuit trips server-side before bad requests reach Stripe.
Why standard circuit breakers don't protect AI agents
The classic circuit breaker pattern from Michael Nygard's Release It! tracks failure rate over a time window:
- Closed: requests flow normally; failures counted
- Open: circuit trips when failure rate exceeds threshold; requests rejected for a cooldown period
- Half-open: after cooldown, one probe request tests if the service recovered
This protects against vendor downtime. It doesn't protect against an AI agent that correctly calls Stripe 400 times — each call succeeds, failure rate is 0%, but the agent is creating duplicate payment intents in a loop because its planning state got corrupted.
AI agent failure modes that defeat standard circuit breakers:
| Agent failure mode | Failure rate | Spend impact | Standard CB protects? |
|---|---|---|---|
| Retry loop on 402 payment declined (agent misinterprets as transient) | High (402 errors) | Medium (each retry is a new declined charge attempt) | Yes — error rate trips it |
| Planning loop creates duplicate payment intents | 0% (all succeed) | High ($X × loop_count) | No — error rate never trips |
| Stuck LLM reasoning loop retries the same tool call | 0% (all succeed) | High (linear with loop count) | No |
| Agent mishandles pagination, calls Stripe 1,000 times for "all charges" | 0% (all succeed) | Moderate (API calls only, no direct charges, but wasted bandwidth and pressure on Stripe rate limits) | No |
| Vendor 429 rate limit triggers aggressive backoff + retry loop | High (429s) | Grows with retry duration | Partial — trips on error rate but may open before spend accrues |
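To make the duplicate-payment-intent row concrete, here is a minimal simulation. It assumes a hypothetical agent loop that creates a $25 payment intent on every iteration and never receives an error; `simulate_loop` and its parameters are illustrative names, not part of any library.

```python
def simulate_loop(calls: int, amount_cents: int, failure_threshold: int, spend_cap_cents: int):
    """Simulate an agent loop in which every vendor call succeeds.

    Returns (error_breaker_tripped, call_where_spend_cap_trips, unguarded_total_cents).
    """
    failures = 0
    spend_cents = 0
    spend_trip_call = None
    for call_number in range(1, calls + 1):
        succeeded = True  # assumption: the vendor accepts every request
        if not succeeded:
            failures += 1
        spend_cents += amount_cents
        # Remember the first call at which a spend cap would have tripped
        if spend_trip_call is None and spend_cents >= spend_cap_cents:
            spend_trip_call = call_number
    error_tripped = failures >= failure_threshold
    return error_tripped, spend_trip_call, spend_cents


error_tripped, spend_trip_call, unguarded_total = simulate_loop(
    calls=400, amount_cents=2500, failure_threshold=5, spend_cap_cents=10_000
)
print(error_tripped)    # False: the error-rate breaker never opens
print(spend_trip_call)  # 4: a $100 cap would have halted the loop here
print(unguarded_total)  # 1000000 cents ($10,000) if nothing stops the loop
```

With these assumed numbers, the error-rate condition never fires across 400 calls, while a $100 spend cap would stop the run after the fourth.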
Spend-aware circuit breaker: implementation
A spend-aware circuit breaker adds a cost accumulator to the standard state machine. The key design decision: where to track spend. Options:
- In-process atomic counter — fast, resets on restart, only tracks spend within the current process lifetime
- Redis atomic counter — survives restarts, works across distributed agents, adds a network hop
- Proxy-enforced (Keybrake) — enforced server-side at the credential layer, works regardless of agent implementation
Python implementation with an in-process spend tracker:
```python
import time
from dataclasses import dataclass, field
from enum import Enum

import httpx  # used by guarded_stripe_call


class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class SpendCapExceeded(Exception):
    """Raised when cumulative spend reaches the configured cap."""


@dataclass
class SpendAwareCircuitBreaker:
    # Standard circuit breaker config
    failure_threshold: int = 5            # Consecutive errors before opening
    recovery_timeout: float = 30.0        # Seconds before half-open probe
    half_open_max: int = 1                # Probes allowed in half-open

    # Spend circuit breaker config
    spend_cap_usd: float = 100.0          # Max spend per run
    spend_window_seconds: float = 3600.0  # Spend counter resets after this window

    # State
    state: CircuitState = field(default=CircuitState.CLOSED, init=False)
    failure_count: int = field(default=0, init=False)
    half_open_probes: int = field(default=0, init=False)
    last_failure_time: float = field(default=0.0, init=False)
    cumulative_spend_usd: float = field(default=0.0, init=False)
    spend_window_start: float = field(default_factory=time.monotonic, init=False)

    def _reset_spend_if_expired(self):
        now = time.monotonic()
        if now - self.spend_window_start > self.spend_window_seconds:
            self.cumulative_spend_usd = 0.0
            self.spend_window_start = now

    def record_success(self, cost_usd: float = 0.0):
        self._reset_spend_if_expired()
        self.cumulative_spend_usd += cost_usd
        self.failure_count = 0
        if self.state == CircuitState.HALF_OPEN:
            self.state = CircuitState.CLOSED  # Probe succeeded: vendor recovered
        if self.cumulative_spend_usd >= self.spend_cap_usd:
            self.state = CircuitState.OPEN
            self.last_failure_time = time.monotonic()
            raise SpendCapExceeded(
                f"Circuit opened: cumulative spend ${self.cumulative_spend_usd:.2f} "
                f"reached cap ${self.spend_cap_usd:.2f}"
            )

    def record_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.monotonic()
        # A failed half-open probe reopens the circuit immediately
        if self.state == CircuitState.HALF_OPEN or self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN

    def allow_request(self) -> bool:
        self._reset_spend_if_expired()
        # A spend trip must not recover on the error cooldown, only when the window resets
        if self.cumulative_spend_usd >= self.spend_cap_usd:
            return False
        if self.state == CircuitState.OPEN:
            if time.monotonic() - self.last_failure_time <= self.recovery_timeout:
                return False
            self.state = CircuitState.HALF_OPEN
            self.half_open_probes = 0
        if self.state == CircuitState.HALF_OPEN:
            if self.half_open_probes >= self.half_open_max:
                return False
            self.half_open_probes += 1
        return True
```
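The breaker above accumulates spend as a float and does no locking, so concurrent tool calls in one process could interleave updates, and float sums can drift by fractions of a cent. As a rough sketch of the "in-process atomic counter" option, assuming amounts are tracked in integer cents, a lock-protected counter might look like this (`SpendCounter` is a hypothetical helper, not part of the code above):

```python
import threading


class SpendCounter:
    """Thread-safe cumulative spend counter in integer cents (hypothetical helper)."""

    def __init__(self, cap_cents: int):
        self.cap_cents = cap_cents
        self._spent_cents = 0
        self._lock = threading.Lock()

    def add(self, cost_cents: int) -> bool:
        """Record a charge; return True while the total is still under the cap."""
        with self._lock:
            self._spent_cents += cost_cents
            return self._spent_cents < self.cap_cents

    @property
    def spent_cents(self) -> int:
        with self._lock:
            return self._spent_cents


def worker(counter: SpendCounter) -> None:
    # Each worker records 100 charges of 25 cents
    for _ in range(100):
        counter.add(25)


counter = SpendCounter(cap_cents=10_000)
threads = [threading.Thread(target=worker, args=(counter,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.spent_cents)  # 10000
```

Like any in-process state, the counter still resets on restart and is invisible to other workers.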
Parsing spend from vendor API responses
To track spend accurately, parse cost signals from each vendor's API response before calling record_success(cost_usd=...):
```python
def parse_stripe_cost(response_body: dict) -> float:
    """Stripe: cost = amount / 100 (amount is in cents for USD)."""
    amount_cents = response_body.get("amount") or 0
    return amount_cents / 100.0


def parse_twilio_cost(response_body: dict) -> float:
    """Twilio: price is a negative string like '-0.0075'; it can be null until billed."""
    price = response_body.get("price") or "0"
    return abs(float(price))


def parse_resend_cost(_response_body: dict) -> float:
    """Resend: flat rate per email send; check your plan's per-email cost."""
    return 0.001  # ~$1 per 1,000 emails on Resend's base plan


async def guarded_stripe_call(
    circuit_breaker: SpendAwareCircuitBreaker,
    client: httpx.AsyncClient,
    endpoint: str,
    payload: dict,
) -> dict:
    if not circuit_breaker.allow_request():
        raise RuntimeError("Circuit open: agent spend cap reached or vendor unavailable")
    try:
        resp = await client.post(
            f"https://proxy.keybrake.com/stripe{endpoint}",
            json=payload,
        )
        resp.raise_for_status()
    except httpx.HTTPError:
        # Covers HTTP error statuses and transport failures (timeouts, connection errors)
        circuit_breaker.record_failure()
        raise
    body = resp.json()
    # Raises SpendCapExceeded if this call pushes the run over its cap
    circuit_breaker.record_success(cost_usd=parse_stripe_cost(body))
    return body
```
Proxy-enforced spend caps: the reliable alternative
In-process spend tracking has a fundamental weakness: it only works within one process. A distributed agent system running across multiple workers, or an agent framework that spawns subprocesses, has no shared spend state. Each worker's circuit breaker starts from zero.
Keybrake's vault keys enforce spend caps at the proxy layer — server-side, before the request reaches Stripe. The proxy tracks cumulative spend across all requests made with the vault key, regardless of how many processes or workers are making those calls:
```python
# Issue a vault key with a hard spend cap, enforced server-side
import os

import httpx


async def issue_vault_key_for_run(run_id: str, max_spend_usd: float) -> dict:
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "https://api.keybrake.com/v1/keys",
            headers={"Authorization": f"Bearer {os.environ['KEYBRAKE_TOKEN']}"},
            json={
                "label": f"agent-run-{run_id}",
                "vendor": "stripe",
                "daily_usd_cap": max_spend_usd,
                "allowed_endpoints": ["/v1/payment_intents"],
                "expires_in": "1h",
            },
        )
        resp.raise_for_status()  # Fail loudly instead of treating an error body as a key
        return resp.json()

# When the vault key cap is hit, Keybrake returns 429 with code=cap_exhausted.
# The agent receives this and can halt the loop, so the damage is bounded.
```
Layer both approaches: a local circuit breaker for fast in-process protection, and a vault key cap as the authoritative server-side bound. The local breaker catches runaway loops within a process before they even hit the network; the vault key cap protects against distributed loops and implementation errors in the local breaker.
| Protection layer | Latency overhead | Distributed? | Reliable without network? |
|---|---|---|---|
| In-process circuit breaker | ~0ms (atomic op) | No — per-process only | Yes |
| Redis-backed spend tracker | ~1ms (Redis round-trip) | Yes | No — Redis must be available |
| Keybrake vault key cap | ~30-80ms (proxy overhead) | Yes — server-side | No — requires Keybrake API |
Related questions
How does a spend circuit breaker interact with Temporal's retry policy?
Temporal's retry policy retries failed activities automatically. If a SpendCapExceeded exception propagates out of a Temporal activity, Temporal will retry the activity — which will immediately hit the breaker again. To prevent this, mark spend cap exceptions as non-retryable in the activity's retry policy: retry_policy = RetryPolicy(non_retryable_error_types=["SpendCapExceeded"]). This surfaces the exception to the workflow, which can then decide to compensate (cancel other in-flight activity branches) or fail the workflow gracefully. The spend circuit breaker is the last line of defense — the workflow should be designed to handle it as a normal terminal condition, not an unexpected error.
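The same rule applies to a hand-rolled retry wrapper outside Temporal. The sketch below is plain Python, not Temporal's API; `call_with_retries` and `NON_RETRYABLE` are hypothetical names, and `SpendCapExceeded` is redefined only so the example runs on its own.

```python
import time


class SpendCapExceeded(Exception):
    """Stand-in for the breaker's exception so this sketch is self-contained."""


NON_RETRYABLE = (SpendCapExceeded,)


def call_with_retries(fn, max_attempts: int = 3, backoff_seconds: float = 0.0):
    """Retry fn on ordinary errors, but re-raise non-retryable ones immediately."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except NON_RETRYABLE:
            raise  # Terminal condition: a retry would only hit the breaker again
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(backoff_seconds * attempt)


attempts = []


def capped_activity():
    attempts.append(1)
    raise SpendCapExceeded("cap reached")


try:
    call_with_retries(capped_activity)
except SpendCapExceeded:
    pass
print(len(attempts))  # 1: the spend cap error was not retried
```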
What's the right spend cap per agent run?
The right cap depends on your agent's intended action space: calculate the maximum legitimate single-run spend (e.g., an agent that creates one Stripe payment intent per user action should never spend more than the max single payment amount), then add a 20% buffer. For agent runs with variable action counts — agents that might create 1 or 100 payment intents depending on user instructions — use a per-run cap set at the point of agent invocation by the calling code, passed in as a parameter rather than hardcoded. This lets callers set the cap based on user tier or explicit user-specified limits.
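As an illustration of that calculation, here is a small helper that the calling code could use to derive the cap at invocation time. The function name and parameters are assumptions for this sketch; it works in integer cents and rounds up so the buffer never shrinks the cap.

```python
def per_run_cap_cents(max_single_action_cents: int, max_actions: int, buffer_percent: int = 20) -> int:
    """Largest legitimate spend for one run plus a safety buffer, rounded up to whole cents."""
    if max_single_action_cents < 0 or max_actions < 0 or buffer_percent < 0:
        raise ValueError("amounts, counts, and buffer must be non-negative")
    base_cents = max_single_action_cents * max_actions
    # Integer ceiling division avoids float rounding on money
    return (base_cents * (100 + buffer_percent) + 99) // 100


# One $50 payment intent per run
print(per_run_cap_cents(5_000, 1))    # 6000
# Up to 100 payment intents of $50 each, as instructed by the user
print(per_run_cap_cents(5_000, 100))  # 600000
```

The caller would pass the result into the breaker or the vault key request rather than hardcoding a cap.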
Should the circuit breaker halt the entire agent or just the vendor call?
It depends on whether the vendor call is on the critical path of the agent's task. For agents where Stripe is the output (billing agents, payment processing agents), a spend cap trip should halt the entire agent run, since there's nothing meaningful to continue without the ability to make payments. For agents where Stripe is optional (agents that attempt billing but can continue with other tasks on failure), the circuit breaker should surface a SpendCapExceeded exception to the agent's tool call layer, where the LLM or orchestrator can decide whether to continue without that capability. Keybrake's 429 response has a structured {"code": "cap_exhausted"} body that the agent can inspect to distinguish a spend cap from a transient rate limit.
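A minimal sketch of that inspection follows. It assumes only what is stated above, namely a 429 status whose JSON body carries `"code": "cap_exhausted"`; the function name and the returned labels are hypothetical.

```python
import json


def classify_response(status_code: int, body_text: str) -> str:
    """Return 'halt' for an exhausted spend cap, 'backoff' for any other 429, else 'other'."""
    if status_code != 429:
        return "other"
    try:
        body = json.loads(body_text)
    except json.JSONDecodeError:
        return "backoff"  # Unparseable 429: treat as an ordinary rate limit
    if isinstance(body, dict) and body.get("code") == "cap_exhausted":
        return "halt"
    return "backoff"


print(classify_response(429, '{"code": "cap_exhausted"}'))  # halt
print(classify_response(429, '{"code": "rate_limited"}'))   # backoff
print(classify_response(200, "{}"))                         # other
```

An orchestrator could map "halt" to ending the run (or disabling the billing tool) and "backoff" to its normal retry path.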
Further reading
- AI agent error handling — retry logic, error classification, and backoff patterns that interact with circuit breaker state.
- AI agent rate limiting — per-agent and per-user rate limits at the API gateway layer, complementing spend-aware circuit breakers.
- AI agent spend reporting — observing actual spend per agent run to calibrate circuit breaker thresholds from real-world data.
- AI agent idempotency — idempotency keys that prevent duplicate charges when the circuit breaker interacts with retry logic.
- AI agent policy enforcement — policy-layer controls that work alongside circuit breakers to enforce spend governance at scale.