AI agent error handling for vendor API calls: retries, spend caps, and safe failure modes
Web application retry logic is designed for human-initiated requests: a user clicks "submit," the request fails, the user decides whether to retry. AI agents apply retry logic mechanically — if the retry condition is met, the agent retries, as many times as the loop allows. On vendor APIs that charge money, a retry loop on a transient Stripe 500 or network timeout can turn a $50 transaction attempt into thousands of dollars of charges before the agent is stopped. Safe error handling for agent vendor API calls requires four things: bounded retries with idempotency keys, spend-cap-triggered hard stops, distinguishing retryable from non-retryable errors, and a proxy-enforced ceiling that holds even when agent retry logic malfunctions.
TL;DR
Thread a stable idempotency key (generated once per logical operation, not per retry attempt) through every vendor API call. Bound retries to 3 attempts with exponential backoff and jitter. Treat 402 spend-cap responses as non-retryable hard stops — surface them to the agent as a terminal error, not a transient one. Use a proxy-enforced per-run spend cap as the outer safety net that holds regardless of what the agent's retry logic does.
Why agent retry logic is different from web app retry logic
Standard retry guidance for web services says: retry on 5xx errors and network timeouts; don't retry on 4xx client errors. This is correct for human-initiated requests because the retry count is bounded by human patience. For autonomous agents, the retry count is bounded only by the agent's tool loop configuration — often 10, 20, or unlimited iterations.
The interaction between retry logic and money-spending APIs creates a specific failure mode:
# What an agent's billing tool might look like (Python pseudocode)
def charge_customer(customer_id: str, amount_cents: int) -> dict:
    """Tool: charge a customer for their subscription."""
    response = stripe.PaymentIntent.create(
        amount=amount_cents,
        currency="usd",
        customer=customer_id
        # No idempotency key: generates a new intent on every call
    )
    return response

# What happens when Stripe returns a 500:
# Agent sees tool_call_error, decides to retry
# Creates a NEW PaymentIntent each time (no idempotency key)
# 10 retries × $50 = $500 charged before the agent gives up
Two bugs compound: no idempotency key (each retry creates a new charge) and no bounded spend cap (the agent retries until it hits its tool loop limit or succeeds). Either bug alone is bad. Together, they are catastrophic.
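To make the compounding concrete, here is a minimal in-memory simulation. `FakeVendor` and `retry_charge` are hypothetical stand-ins invented for this sketch, not Stripe's API: the fake vendor records a charge and then "loses" the response, which is what a timeout after a successful write looks like to the caller.

```python
import uuid


class FakeVendor:
    """Hypothetical in-memory stand-in for a payment API (illustration only)."""

    def __init__(self):
        self.charges = {}  # idempotency key -> amount in cents

    def create_charge(self, amount_cents, idempotency_key=None):
        # No key supplied: every call is treated as a brand-new charge.
        key = idempotency_key or str(uuid.uuid4())
        # With a repeated key, the earlier charge is kept and nothing new is created.
        self.charges.setdefault(key, amount_cents)
        # Simulate the response being lost after the write succeeded.
        raise TimeoutError("response lost in transit")


def retry_charge(vendor, amount_cents, attempts, idempotency_key=None):
    """Naive retry loop: retries on every timeout, up to `attempts` times."""
    for _ in range(attempts):
        try:
            vendor.create_charge(amount_cents, idempotency_key)
        except TimeoutError:
            continue
    return sum(vendor.charges.values())


without_key = retry_charge(FakeVendor(), 5000, attempts=10)
with_key = retry_charge(
    FakeVendor(), 5000, attempts=10, idempotency_key="charge-cus_123-run_1"
)

print(without_key)  # 50000 cents: ten separate $50 charges
print(with_key)     # 5000 cents: one $50 charge, however many retries
```

The stable key removes the duplicate charges, but the loop still runs ten times. Bounding the attempts and capping spend are separate fixes.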
Pattern 1: Idempotency keys that survive retries
An idempotency key must be generated once per logical operation, before the first attempt, and reused across all retry attempts for that operation. It must not be regenerated on retry:
import os
import random
import time
import uuid
from typing import Optional

import requests

# Assumption: the proxy-issued vault key comes from the environment; the variable name is illustrative
vault_key = os.environ.get("KEYBRAKE_VAULT_KEY", "")


def charge_customer(
    customer_id: str,
    amount_cents: int,
    idempotency_key: Optional[str] = None,  # caller generates this ONCE
) -> dict:
    """Charge a customer. Pass a stable idempotency_key across all retries."""
    if idempotency_key is None:
        # Never generate the key inside the tool: it would change on every call.
        # The SAME key must be used for all retries of this logical charge.
        raise ValueError("idempotency_key is required: generate it before the retry loop")

    max_retries = 3
    base_delay = 1.0  # seconds
    last_error = ""

    for attempt in range(max_retries):
        try:
            response = requests.post(
                "https://proxy.keybrake.com/stripe/v1/payment_intents",
                headers={
                    "Authorization": f"Bearer {vault_key}",
                    "Idempotency-Key": idempotency_key,  # same key every retry
                },
                # Stripe's v1 API expects form-encoded bodies, so use data=, not json=
                data={
                    "amount": amount_cents,
                    "currency": "usd",
                    "customer": customer_id,
                },
                timeout=30,
            )
        except (requests.Timeout, requests.ConnectionError) as e:
            # Outcome unknown: the charge may have succeeded, so retry with the same key
            last_error = str(e)
        else:
            # Hard stop: do NOT retry spend cap exhaustion
            if response.status_code == 402:
                body = response.json()
                if body.get("code") == "cap_exhausted":
                    return {"error": "spend_cap_exceeded", "retryable": False}
            # Non-retryable 4xx (bad card, invalid customer, etc.); 429 is retried below
            if 400 <= response.status_code < 500 and response.status_code != 429:
                return {"error": response.json(), "retryable": False}
            if response.ok:
                return response.json()
            # 429 or 5xx: transient, so back off and retry with the same key
            last_error = f"HTTP {response.status_code}"

        if attempt < max_retries - 1:
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)

    return {"error": "max_retries_exceeded", "detail": last_error, "retryable": False}


def new_charge_key(customer_id: str, run_id: str) -> str:
    """Call ONCE per logical charge, BEFORE the tool is registered with the agent."""
    return f"charge-{customer_id}-{run_id}-{uuid.uuid4()}"

# Pass the resulting key with the tool call; the agent doesn't regenerate it on retry
Pattern 2: Distinguishing retryable from non-retryable errors
Agents need clear error signals to decide whether to retry, escalate, or stop. A poorly structured error response causes the agent to guess — and agents guess "retry" by default. Return structured error responses with an explicit retryable field:
| HTTP status | Vendor error | Retryable? | Agent action |
|---|---|---|---|
| 402 + cap_exhausted | Keybrake spend cap hit | No: hard stop | Stop, return error to user, do not retry under any circumstances |
| 402 + card_declined | Stripe card declined | No: card state won't change | Return to user asking for different payment method |
| 429 + rate_limit | Stripe / Twilio rate limit | Yes: with backoff | Wait for Retry-After header value, then retry with same idempotency key |
| 500 | Stripe / Twilio server error | Yes: max 3 attempts | Exponential backoff 1s → 2s → 4s; same idempotency key; fail after 3 attempts |
| timeout | Network timeout | Yes: max 2 attempts | Retry once with same idempotency key (the original request may have succeeded); fail after 2 attempts |
| 400 + invalid_request | Malformed request | No: same request will fail again | Return error to LLM; the model can decide whether to fix the arguments and try again |
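The table can be encoded as a small classifier that the tool consults before deciding what to do. This is a sketch under assumptions: `RetryDecision` and `classify` are hypothetical names, a network timeout is represented as `status=None`, and the 3-attempt bound for 429 is an assumption because the table gives no explicit limit for rate limits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetryDecision:
    retryable: bool
    max_attempts: int  # total attempts allowed, including the first
    action: str        # short instruction surfaced to the agent


def classify(status: Optional[int], code: Optional[str] = None) -> RetryDecision:
    """Map an HTTP status (None for a network timeout) and error code to a decision."""
    if status is None:
        return RetryDecision(True, 2, "retry once with the same idempotency key")
    if status == 402 and code == "cap_exhausted":
        return RetryDecision(False, 1, "stop: spend cap exhausted")
    if status == 429:
        # Assumed bound of 3; the table only says to honor Retry-After.
        return RetryDecision(True, 3, "wait for Retry-After, then retry with the same key")
    if 500 <= status < 600:
        return RetryDecision(True, 3, "exponential backoff, same idempotency key")
    if 400 <= status < 500:
        # Covers card_declined, invalid_request, and any other client error.
        return RetryDecision(False, 1, "do not retry: return the error to the caller")
    return RetryDecision(False, 1, "success: nothing to retry")


print(classify(402, "cap_exhausted").action)  # stop: spend cap exhausted
print(classify(500).max_attempts)             # 3
```

Keeping the mapping in one function means the `retryable` field returned to the agent and the behavior of the retry loop cannot drift apart.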
Pattern 3: Proxy-enforced spend cap as the outer safety net
Even with correct idempotency keys and bounded retries in your tool implementation, there's a second failure mode: the agent itself retries the tool call. LangChain's max_iterations, OpenAI function-calling loops, and CrewAI task retries can all cause the tool to be called multiple times. Each of those is a fresh call to charge_customer() with its own 3-attempt retry loop, and unless the orchestration layer passes in the same idempotency key every time (rather than letting the model fill in the argument or generating a key per call), each call also carries a new key. Your per-tool retry bounds don't protect against agent-level retries of the entire tool call.
A proxy-enforced spend cap is the outer safety net that holds regardless of how many times the agent calls the tool:
POST https://api.keybrake.com/v1/keys
{
  "label": "billing-agent-session-abc123",
  "vendor": "stripe",
  "daily_usd_cap": 100,          # stops after $100 total for this run
  "allowed_endpoints": [
    "/v1/payment_intents",       # allowlist prevents tool-call scope creep
    "/v1/payment_intents/*"
  ],
  "expires_in": "30m"            # TTL bounds the session regardless of agent state
}
When the cap is hit, the proxy returns 402 cap_exhausted for all subsequent calls — the agent's tool call fails with a non-retryable error, the LLM sees the error, and the agent loop terminates. This works even if the agent's retry logic is misconfigured, because the enforcement is at the network layer, not in the agent's code.
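The enforcement logic can be pictured with a toy ledger. This is not Keybrake's implementation, which the article does not show. `SpendCap` and `authorize` are hypothetical names, and the sketch assumes a pre-charge check that rejects any call that would push the running total past the cap.

```python
class SpendCap:
    """Hypothetical per-key spend ledger; amounts are in cents to avoid float drift."""

    def __init__(self, cap_cents: int):
        self.cap_cents = cap_cents
        self.spent_cents = 0

    def authorize(self, amount_cents: int) -> dict:
        # Pre-charge check: reject before forwarding if this call would exceed the cap.
        if self.spent_cents + amount_cents > self.cap_cents:
            return {"status": 402, "code": "cap_exhausted"}
        self.spent_cents += amount_cents
        return {"status": 200}


cap = SpendCap(cap_cents=10_000)  # $100 cap
results = [cap.authorize(5_000)["status"] for _ in range(5)]
print(results)  # [200, 200, 402, 402, 402]
```

Two $50 charges fit under the $100 cap, and every call after that is rejected no matter how often the agent retries. A real proxy also has to decide how to count idempotent replays and failed charges, which this sketch ignores.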
Pattern 4: Surfacing spend-cap errors to the LLM correctly
How you return the spend-cap error to the LLM determines whether the agent loops on it. Return a string that explicitly instructs the model to stop:
import json


def charge_customer(customer_id: str, amount_cents: int, idempotency_key: str) -> str:
    """..."""
    # call_proxy and vault_key are defined elsewhere; call_proxy is assumed to wrap
    # the bounded-retry request from Pattern 1 and return its dict result
    response = call_proxy(customer_id, amount_cents, idempotency_key, vault_key)
    if response.get("error") == "spend_cap_exceeded":
        # Return a string the LLM understands as terminal, not an exception
        # that might be caught and retried by the tool-calling framework
        return (
            "STOP: The spend cap for this agent session has been exhausted. "
            "Do not retry this tool. Inform the user that billing could not "
            "be completed and that a new session is required."
        )
    return json.dumps(response)
Returning a clearly instructive string (rather than raising an exception) prevents tool-calling frameworks from treating the spend-cap event as a transient error eligible for automatic retry. The LLM reads the STOP signal in its tool result and terminates the billing loop — which is the desired behavior.
Related questions
Should I use a different idempotency key for each retry or the same one?
Always use the same idempotency key for all retries of the same logical operation. Stripe's idempotency system is designed exactly for this: if the first request succeeded but the response was lost in transit (timeout), retrying with the same idempotency key returns the original result rather than creating a duplicate charge. If you generate a new idempotency key on each retry attempt, each retry creates a new charge — you've defeated the idempotency mechanism. Generate the key once before the retry loop starts (typically derived from a stable logical identifier like f"charge-{customer_id}-{invoice_id}-{agent_run_id}"), store it, and reuse it for every retry attempt in that loop. Never generate it inside the retry loop body.
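One way to guarantee the key is stable is to derive it deterministically from the logical identifiers, so that even an agent-level re-invocation of the tool produces the same key. The sketch below is an assumption-laden illustration: `make_idempotency_key` is a hypothetical helper, and hashing the identifiers is a design choice made here rather than something Stripe requires. Stripe accepts any unique string of up to 255 characters as a key.

```python
import hashlib


def make_idempotency_key(customer_id: str, invoice_id: str, agent_run_id: str) -> str:
    """Derive the same key every time from the same logical identifiers."""
    # Join with a separator that won't appear in the IDs, so distinct inputs stay distinct.
    raw = "\x1f".join(("charge", customer_id, invoice_id, agent_run_id))
    # Hashing keeps the key a fixed length and keeps raw identifiers out of logs.
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    return f"charge-{digest[:32]}"


first = make_idempotency_key("cus_123", "inv_456", "run_789")
again = make_idempotency_key("cus_123", "inv_456", "run_789")
other = make_idempotency_key("cus_123", "inv_457", "run_789")

print(first == again)  # True: same logical charge, same key
print(first == other)  # False: different invoice, different key
```

Because the key depends only on its inputs, nothing needs to be stored between attempts. The trade-off is that a restarted run with a new `agent_run_id` produces a new key, so leave the run ID out if a charge must stay unique per invoice across runs.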
What's the right spend cap value for a billing agent that charges legitimate amounts?
Set the per-session spend cap to 1.5–2× the maximum expected total spend for a single agent run. For an agent that processes one invoice per session (max $500), set the cap at $750–$1,000. The cap should be high enough that a legitimate single-run maximum doesn't trigger it, but low enough that a runaway loop (10× the normal spend) hits it and stops. If your agent processes batches of invoices, set the cap to max_invoices_per_batch × max_invoice_amount × 1.5. Monitor your audit log for sessions approaching 80% of cap — that's your signal to either raise the cap for that tier or investigate unusual activity.
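The sizing rule reduces to a few lines of arithmetic. The function names below are hypothetical, the 1.5 headroom factor is the low end of the range suggested above, and the 0.8 threshold mirrors the 80% monitoring signal.

```python
def session_cap_usd(
    max_invoice_usd: float,
    max_invoices_per_batch: int = 1,
    headroom: float = 1.5,
) -> float:
    """Cap = invoices per batch × max invoice amount × headroom factor."""
    return max_invoices_per_batch * max_invoice_usd * headroom


def near_cap(spent_usd: float, cap_usd: float, threshold: float = 0.8) -> bool:
    """True when a session has used at least `threshold` of its cap."""
    return spent_usd >= threshold * cap_usd


single = session_cap_usd(500)                            # one $500 invoice per session
batch = session_cap_usd(500, max_invoices_per_batch=20)  # batches of up to 20 invoices

print(single)                 # 750.0
print(batch)                  # 15000.0
print(near_cap(650, single))  # True: 650 is above 80% of 750
```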
How should I handle Stripe timeouts specifically — was the charge created or not?
On a Stripe timeout, you don't know whether the charge was created or not. This is exactly the scenario idempotency keys solve: retry with the same idempotency key and Stripe will return the original result if the intent was created, or create it now if it wasn't. Never assume a timeout means failure. Before deciding the charge didn't happen, check the Stripe Dashboard, or list the customer's recent PaymentIntents with stripe.paymentIntents.list() (filtered by customer) and match on metadata in your own code; the list endpoint does not filter by metadata itself, and the Search API can lag behind recent writes. For agents specifically: retry the tool call exactly once with the same idempotency key; if that also times out, return a non-retryable error to the agent with instructions to verify payment status manually before retrying the billing workflow. A successful verification (intent exists) is more valuable than an aggressive retry loop.
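The retry-once rule for timeouts can be sketched as follows. `charge_with_one_retry` and `flaky_send` are hypothetical. `send` stands for whatever function performs the actual request, and it is assumed to raise `TimeoutError` when no response arrives.

```python
from typing import Callable


def charge_with_one_retry(send: Callable[[str], dict], idempotency_key: str) -> dict:
    """Call `send` at most twice with the same key; a timeout is never treated as failure."""
    for _ in range(2):
        try:
            return send(idempotency_key)
        except TimeoutError:
            continue
    # Two timeouts: the outcome is unknown, so hand off to verification instead of retrying.
    return {
        "error": "payment_status_unknown",
        "retryable": False,
        "instruction": "Verify payment status manually before retrying the billing workflow.",
    }


calls = []


def flaky_send(key: str) -> dict:
    """Times out on the first call, then answers (simulating a lost first response)."""
    calls.append(key)
    if len(calls) == 1:
        raise TimeoutError("no response")
    return {"id": "pi_example", "status": "succeeded"}


result = charge_with_one_retry(flaky_send, "charge-abc")
print(result["status"])  # succeeded
print(calls)             # ['charge-abc', 'charge-abc']
```

Both attempts carry the same key, so if the first request did create the intent, the second returns that intent instead of creating another.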
Further reading
- Advanced idempotency for AI agent payment calls — deep dive into idempotency key design, multi-step agent workflows, and recovery patterns when the agent restarts mid-billing-sequence.
- AI agent Stripe spend cap — how proxy-layer spend caps work mechanically, including pre-charge enforcement vs. post-charge aggregation and per-agent vs. per-account cap granularity.
- AI agent API key lifecycle — how vault key TTLs interact with retry loops and why short TTLs (5 minutes) are the right default for request-scoped agent calls.
- LangGraph AI agent API key — error handling in LangGraph's cyclic graph structure, where the 402 cap-exhausted response routes through a conditional edge to a budget_exceeded node rather than the default retry behavior.
- Temporal AI agent API key — idempotency and retry semantics in Temporal workflows, where activity retries are automatic and idempotency keys must be passed through the workflow context.