AI agent error handling for vendor API calls: retries, spend caps, and safe failure modes

Web application retry logic is designed for human-initiated requests: a user clicks "submit," the request fails, the user decides whether to retry. AI agents apply retry logic mechanically — if the retry condition is met, the agent retries, as many times as the loop allows. On vendor APIs that charge money, a retry loop on a transient Stripe 500 or network timeout can turn a $50 transaction attempt into thousands of dollars of charges before the agent is stopped. Safe error handling for agent vendor API calls requires four things: bounded retries with idempotency keys, spend-cap-triggered hard stops, distinguishing retryable from non-retryable errors, and a proxy-enforced ceiling that holds even when agent retry logic malfunctions.

TL;DR

Thread a stable idempotency key (generated once per logical operation, not per retry attempt) through every vendor API call. Bound retries to 3 attempts with exponential backoff and jitter. Treat 402 spend-cap responses as non-retryable hard stops — surface them to the agent as a terminal error, not a transient one. Use a proxy-enforced per-run spend cap as the outer safety net that holds regardless of what the agent's retry logic does.

Why agent retry logic is different from web app retry logic

Standard retry guidance for web services says: retry on 5xx errors and network timeouts; don't retry on 4xx client errors. This is correct for human-initiated requests because the retry count is bounded by human patience. For autonomous agents, the retry count is bounded only by the agent's tool loop configuration — often 10, 20, or unlimited iterations.

The interaction between retry logic and money-spending APIs creates a specific failure mode:

# What an agent's billing tool might look like (Python pseudocode)
import stripe

def charge_customer(customer_id: str, amount_cents: int) -> dict:
    """Tool: charge a customer for their subscription."""
    response = stripe.PaymentIntent.create(
        amount=amount_cents,
        currency="usd",
        customer=customer_id
        # No idempotency key — generates a new intent on every call
    )
    return response

# What happens when Stripe returns a 500:
# Agent sees tool_call_error, decides to retry
# Creates a NEW PaymentIntent each time (no idempotency key)
# 10 retries × $50 = $500 charged before the agent gives up

Two bugs compound: no idempotency key (each retry creates a new charge) and no bounded spend cap (the agent retries until it hits its tool loop limit or succeeds). Either bug alone is bad. Together, they are catastrophic.

Pattern 1: Idempotency keys that survive retries

An idempotency key must be generated once per logical operation, before the first attempt, and reused across all retry attempts for that operation. It must not be regenerated on retry:

import os
import random
import time
import uuid

import requests

# Assumption: the proxy credential is stored in an environment variable with this name
vault_key = os.environ["KEYBRAKE_VAULT_KEY"]

def charge_customer(
    customer_id: str,
    amount_cents: int,
    idempotency_key: str = None  # caller generates this ONCE
) -> dict:
    """Charge a customer. Pass a stable idempotency_key across all retries."""
    if idempotency_key is None:
        # Never generate the key inside the tool: a key created here would
        # change on every call and defeat idempotency
        raise ValueError("idempotency_key is required: generate it before the retry loop")

    max_retries = 3
    base_delay = 1.0  # seconds
    last_error = "unknown"

    for attempt in range(max_retries):
        retry_after = 0.0
        try:
            response = requests.post(
                "https://proxy.keybrake.com/stripe/v1/payment_intents",
                headers={
                    "Authorization": f"Bearer {vault_key}",
                    "Idempotency-Key": idempotency_key,  # same key every retry
                },
                # Stripe's v1 API expects form-encoded bodies, so use data=, not json=
                data={
                    "amount": amount_cents,
                    "currency": "usd",
                    "customer": customer_id,
                },
                timeout=30,
            )
        except (requests.Timeout, requests.ConnectionError) as e:
            # The request may still have succeeded; the shared key makes the retry safe
            last_error = str(e)
        else:
            if response.ok:
                return response.json()

            # Hard stop: do NOT retry spend cap exhaustion
            if response.status_code == 402 and response.json().get("code") == "cap_exhausted":
                return {"error": "spend_cap_exceeded", "retryable": False}

            # Non-retryable 4xx (bad card, invalid customer, etc.); 429 is handled below
            if 400 <= response.status_code < 500 and response.status_code != 429:
                return {"error": response.json(), "retryable": False}

            # 429 and 5xx are transient: retry with the same key
            last_error = f"HTTP {response.status_code}"
            header = response.headers.get("Retry-After", "")
            if header.isdigit():
                retry_after = float(header)

        if attempt < max_retries - 1:
            # Exponential backoff with jitter, honoring Retry-After when it is longer
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(max(delay, retry_after))

    return {"error": f"max_retries_exceeded ({last_error})", "retryable": False}

# Correct usage: generate the idempotency key ONCE per logical charge, in
# orchestration code, before the tool is registered with the agent
# (customer_id and run_id come from your own run context)
run_idempotency_key = f"charge-{customer_id}-{run_id}-{uuid.uuid4()}"
# Pass this key with the tool call; the agent doesn't regenerate it on retry
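
A key built from a random UUID is only stable if it is generated once and stored. One alternative is to derive it from identifiers that are fixed for the logical operation, so that any re-invocation of the tool arrives at the same key. The sketch below assumes each charge is tied to an invoice; `make_idempotency_key` and its parameters are illustrative names, not part of Stripe's or the proxy's API.

```python
import hashlib


def make_idempotency_key(customer_id: str, invoice_id: str, run_id: str) -> str:
    """Derive a stable key from identifiers that do not change across retries."""
    raw = f"charge|{customer_id}|{invoice_id}|{run_id}"
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    # Truncate the hex digest to keep the key short and header-safe
    return f"charge-{digest[:32]}"


if __name__ == "__main__":
    first = make_idempotency_key("cus_123", "inv_456", "run_789")
    retry = make_idempotency_key("cus_123", "inv_456", "run_789")
    other = make_idempotency_key("cus_123", "inv_999", "run_789")
    print(first == retry)  # True: a retry of the same charge reuses the key
    print(first == other)  # False: a different invoice gets a different key
```

Because the key is a pure function of its inputs, it does not need to be stored between attempts, which helps when the retry happens in a different process or after a restart.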

Pattern 2: Distinguishing retryable from non-retryable errors

Agents need clear error signals to decide whether to retry, escalate, or stop. A poorly structured error response causes the agent to guess — and agents guess "retry" by default. Return structured error responses with an explicit retryable field:

HTTP status	Vendor error	Retryable?	Agent action
402 + `cap_exhausted`	Keybrake spend cap hit	No — hard stop	Stop, return error to user, do not retry under any circumstances
402 + `card_declined`	Stripe card declined	No — card state won't change	Return to user asking for different payment method
429 + `rate_limit`	Stripe / Twilio rate limit	Yes — with backoff	Wait for `Retry-After` header value, then retry with same idempotency key
500	Stripe / Twilio server error	Yes — max 3 attempts	Exponential backoff 1s → 2s → 4s; same idempotency key; fail after 3 attempts
timeout	Network timeout	Yes — max 2 attempts	Retry once with same idempotency key (the original request may have succeeded); fail after 2 attempts
400 + `invalid_request`	Malformed request	No — same request will fail again	Return error to LLM — the model can decide whether to fix the arguments and try again
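
The table above can be expressed as a small classifier that a tool calls before deciding what to return to the agent. This is a minimal sketch under stated assumptions: `classify_error` and the action labels are hypothetical names, a status of `None` stands for a network timeout, all 5xx codes are treated like a 500, and the attempt limit for 429 is assumed to be 3 because the table does not give one.

```python
from typing import Optional


def classify_error(status: Optional[int], code: Optional[str] = None) -> dict:
    """Map an HTTP status (None = network timeout) and vendor code to a decision."""
    if status is None:
        # The original request may have succeeded, so retry once with the same key
        return {"retryable": True, "max_attempts": 2, "action": "retry_same_key"}
    if status == 402 and code == "cap_exhausted":
        return {"retryable": False, "max_attempts": 1, "action": "hard_stop"}
    if status == 429:
        return {"retryable": True, "max_attempts": 3, "action": "wait_retry_after"}
    if status >= 500:
        return {"retryable": True, "max_attempts": 3, "action": "backoff_same_key"}
    if status == 400:
        # The same request would fail again; the model may fix the arguments
        return {"retryable": False, "max_attempts": 1, "action": "return_to_model"}
    # Remaining 4xx, including 402 card_declined: the user has to act
    return {"retryable": False, "max_attempts": 1, "action": "return_to_user"}


if __name__ == "__main__":
    print(classify_error(402, "cap_exhausted"))
    print(classify_error(500))
    print(classify_error(None))
```

Here `max_attempts` counts total attempts including the first, so a value of 1 means the call is never repeated.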

Pattern 3: Proxy-enforced spend cap as the outer safety net

Even with correct idempotency keys and bounded retries in your tool implementation, there's a second failure mode: the agent itself retries the tool call. LangChain's max_iterations, OpenAI function-calling retry loops, and CrewAI task retries can all cause the tool to be called multiple times. Each is a fresh call to charge_customer(), and if the idempotency key is generated per tool call (or supplied by the model) rather than fixed per logical operation, every call carries a new key and creates a new charge. Your per-tool retry bounds don't protect against agent-level retries of the entire tool call.

A proxy-enforced spend cap is the outer safety net that holds regardless of how many times the agent calls the tool:

POST https://api.keybrake.com/v1/keys
{
  "label": "billing-agent-session-abc123",
  "vendor": "stripe",
  "daily_usd_cap": 100,        # $100 ceiling; the key is scoped to one run, so this bounds the run
  "allowed_endpoints": [
    "/v1/payment_intents",     # allowlist prevents tool-call scope creep
    "/v1/payment_intents/*"
  ],
  "expires_in": "30m"          # TTL bounds the session regardless of agent state
}

When the cap is hit, the proxy returns 402 cap_exhausted for all subsequent calls — the agent's tool call fails with a non-retryable error, the LLM sees the error, and the agent loop terminates. This works even if the agent's retry logic is misconfigured, because the enforcement is at the network layer, not in the agent's code.
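
To see why the cap holds even when the agent misbehaves, it can help to model it locally. The sketch below is an in-memory stand-in, not the proxy's actual implementation: `SpendCapSimulator` and `runaway_agent` are hypothetical names, and it assumes the proxy rejects any call that would push total spend above the cap.

```python
class SpendCapSimulator:
    """In-memory stand-in for a proxy-side spend cap (not the real proxy logic)."""

    def __init__(self, cap_cents: int) -> None:
        self.cap_cents = cap_cents
        self.spent_cents = 0

    def charge(self, amount_cents: int) -> dict:
        # Assumption: a call is rejected if it would push spend above the cap
        if self.spent_cents + amount_cents > self.cap_cents:
            return {"status": 402, "code": "cap_exhausted"}
        self.spent_cents += amount_cents
        return {"status": 200, "amount": amount_cents}


def runaway_agent(proxy: SpendCapSimulator, amount_cents: int, calls: int) -> int:
    """Call the tool `calls` times, ignoring every error, and count rejections."""
    rejected = 0
    for _ in range(calls):
        if proxy.charge(amount_cents)["status"] == 402:
            rejected += 1
    return rejected


if __name__ == "__main__":
    proxy = SpendCapSimulator(cap_cents=10_000)  # $100 cap
    rejected = runaway_agent(proxy, amount_cents=5_000, calls=20)
    print(proxy.spent_cents, rejected)  # 10000 18
```

The simulated agent ignores every error and keeps calling, yet total spend stops at the cap: two $50 charges succeed and the remaining eighteen calls are rejected.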

Pattern 4: Surfacing spend-cap errors to the LLM correctly

How you return the spend-cap error to the LLM determines whether the agent loops on it. Return a string that explicitly instructs the model to stop:

import json

def charge_customer(customer_id: str, amount_cents: int, idempotency_key: str) -> str:
    """..."""
    # call_proxy stands for the bounded-retry request logic from Pattern 1
    response = call_proxy(customer_id, amount_cents, idempotency_key, vault_key)

    if response.get("error") == "spend_cap_exceeded":
        # Return a string the LLM understands as terminal, rather than raising
        # an exception that the tool-calling framework might catch and retry
        return (
            "STOP: The spend cap for this agent session has been exhausted. "
            "Do not retry this tool. Inform the user that billing could not "
            "be completed and that a new session is required."
        )

    return json.dumps(response)

Returning a clearly instructive string (rather than raising an exception) prevents tool-calling frameworks from treating the spend-cap event as a transient error eligible for automatic retry. The LLM reads the STOP signal in its tool result and terminates the billing loop — which is the desired behavior.

Related questions

Should I use a different idempotency key for each retry or the same one?

Always use the same idempotency key for all retries of the same logical operation. Stripe's idempotency system is designed exactly for this: if the first request succeeded but the response was lost in transit (timeout), retrying with the same idempotency key returns the original result rather than creating a duplicate charge. If you generate a new idempotency key on each retry attempt, each retry creates a new charge — you've defeated the idempotency mechanism. Generate the key once before the retry loop starts (typically derived from a stable logical identifier like f"charge-{customer_id}-{invoice_id}-{agent_run_id}"), store it, and reuse it for every retry attempt in that loop. Never generate it inside the retry loop body.

What's the right spend cap value for a billing agent that charges legitimate amounts?

Set the per-session spend cap to 1.5–2× the maximum expected total spend for a single agent run. For an agent that processes one invoice per session (max $500), set the cap at $750–$1,000. The cap should be high enough that a legitimate single-run maximum doesn't trigger it, but low enough that a runaway loop (10× the normal spend) hits it and stops. If your agent processes batches of invoices, set the cap to max_invoices_per_batch × max_invoice_amount × 1.5. Monitor your audit log for sessions approaching 80% of cap — that's your signal to either raise the cap for that tier or investigate unusual activity.
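
The sizing rule above is simple arithmetic, sketched here for concreteness. The function names are illustrative, and the 1.5 headroom and 80% alert threshold are the figures from the paragraph above rather than values any API requires.

```python
def session_cap_usd(max_invoices: int, max_invoice_usd: float, headroom: float = 1.5) -> float:
    """Cap = max invoices per batch x max invoice amount x headroom factor."""
    return max_invoices * max_invoice_usd * headroom


def approaching_cap(spent_usd: float, cap_usd: float, threshold: float = 0.8) -> bool:
    """True once a session has used at least `threshold` of its cap."""
    return spent_usd >= cap_usd * threshold


if __name__ == "__main__":
    print(session_cap_usd(1, 500))                # 750.0
    print(session_cap_usd(1, 500, headroom=2.0))  # 1000.0
    print(session_cap_usd(10, 200))               # 3000.0
    print(approaching_cap(650, 750))              # True
```

A single $500 invoice gives a cap between $750 and $1,000, matching the range above, while a runaway loop at 10× normal spend would reach the cap well before it completes.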

How should I handle Stripe timeouts specifically — was the charge created or not?

On a Stripe timeout, you don't know whether the charge was created. This is exactly the scenario idempotency keys solve: retry with the same idempotency key and Stripe will return the original result if the intent was created, or create it now if it wasn't. Never assume a timeout means failure: check the Stripe Dashboard, or list the customer's recent PaymentIntents (for example stripe.PaymentIntent.list(customer=customer_id) in the Python library) and match on metadata, before deciding the charge didn't happen. For agents specifically: retry the tool call exactly once with the same idempotency key; if that also times out, return a non-retryable error to the agent with instructions to verify payment status manually before retrying the billing workflow. A successful verification (intent exists) is more valuable than an aggressive retry loop.
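
The timeout policy described above (one retry with the same key, then a terminal error) might be sketched as follows. The HTTP call is injected as a `send` callable so the policy can be exercised without a network; `ChargeTimeout`, `charge_with_timeout_policy`, and the `payment_status_unknown` error code are hypothetical names chosen for illustration.

```python
from typing import Callable


class ChargeTimeout(Exception):
    """Stand-in for a network timeout raised by the HTTP client."""


def charge_with_timeout_policy(send: Callable[[str], dict], idempotency_key: str) -> dict:
    """Try once, retry once on timeout with the SAME key, then stop."""
    for _ in range(2):
        try:
            return send(idempotency_key)
        except ChargeTimeout:
            continue  # outcome unknown; the shared key makes one retry safe
    return {
        "error": "payment_status_unknown",
        "retryable": False,
        "instructions": "Verify payment status manually before retrying the billing workflow.",
    }


if __name__ == "__main__":
    def always_times_out(key: str) -> dict:
        raise ChargeTimeout()

    print(charge_with_timeout_policy(always_times_out, "charge-abc")["retryable"])  # False
```

In real use, `send` would wrap the proxy request and translate the client's timeout exception into `ChargeTimeout`.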