AI agent idempotency: preventing duplicate vendor charges on retries
Idempotency is the property that running the same operation multiple times produces the same result as running it once. For a human-driven web form, idempotency is a nice-to-have — the user usually doesn't double-submit. For an AI agent, idempotency is mandatory: workflow engines retry failed steps automatically, process crashes resume mid-pipeline, serverless cold starts re-execute in-flight tasks, and stuck loops trigger the same action hundreds of times before a human notices. Each retry without an idempotency key creates a duplicate Stripe charge, duplicate Twilio SMS, or duplicate Resend email — real money out the door, real customer-facing harm. This page covers how to build stable idempotency keys from workflow run IDs, what each vendor's deduplication guarantee actually covers, and how per-run spend caps add a second layer of protection when idempotency keys expire or fail.
TL;DR
Build your idempotency key from immutable, context-stable identifiers: workflow_run_id + "-" + item_id is the standard pattern. The workflow run ID is stable across all retries of the same run. The item ID scopes it to the specific record being processed. Set your idempotency key before making the vendor call, not after: the common mistake is computing the key from the API response, which doesn't exist yet on the first call. Then add a spend cap as a backstop: vendor idempotency keys have expiry windows (24 hours for Stripe), some vendors such as Twilio have no native key at all, and a stuck agent retrying the same action over multiple days will eventually outlast the deduplication window and produce duplicate charges regardless.
Why agent retries are different from human retries
When a human double-clicks "Submit Payment", the browser typically blocks the second click with a disabled button, and the worst case is one duplicate charge. Agent retries are structurally different in three ways:
- Automatic and silent. Workflow engines like Temporal, Inngest, Hatchet, and Trigger.dev retry failed steps automatically. There's no human in the loop to notice that a step is retrying — it just does. The agent code logs the retry but the engineer is asleep.
- Potentially unbounded. A stuck agent loop can retry the same vendor call thousands of times. Temporal's default retry policy allows unlimited attempts with exponential backoff, so a broken Stripe call keeps retrying until a timeout or attempt limit stops it, which in practice means indefinitely unless you configure one. An agent calling Stripe in a loop with a fixed 1-second delay can make 86,400 attempts in a day.
- Cross-process and cross-restart. Agent failures aren't just network timeouts — they include process crashes, OOM kills, Kubernetes pod evictions, and deployment restarts. When a workflow step resumes after a process crash, it doesn't know whether the previous Stripe call succeeded before the crash. Without an idempotency key, it calls Stripe again.
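To put numbers on the "potentially unbounded" point, here is a minimal sketch that counts how many attempts fit into one day under two retry schedules. The function name and the backoff parameters (1-second initial delay, coefficient 2.0, 100-second maximum interval) are illustrative assumptions, not values read from any engine's configuration.

```python
def attempts_within(horizon_s: float, initial_s: float,
                    coefficient: float, max_interval_s: float) -> int:
    """Count attempts that start before horizon_s, with the first attempt at t=0."""
    t, delay, attempts = 0.0, initial_s, 0
    while t < horizon_s:
        attempts += 1
        t += delay
        # Grow the delay geometrically, but never beyond the maximum interval.
        delay = min(delay * coefficient, max_interval_s)
    return attempts


DAY_S = 24 * 60 * 60

# Fixed 1-second delay: coefficient 1.0 keeps the delay constant.
fixed = attempts_within(DAY_S, initial_s=1.0, coefficient=1.0, max_interval_s=1.0)

# Exponential backoff capped at 100 seconds (assumed parameters).
backoff = attempts_within(DAY_S, initial_s=1.0, coefficient=2.0, max_interval_s=100.0)

print(fixed, backoff)  # 86400 870
```

Even with capped exponential backoff, several hundred attempts per day reach the vendor, and each one is a potential duplicate if it carries no idempotency key.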
How to build a stable idempotency key
The core requirement: the idempotency key must be the same value across all retries of the same logical operation, and different across different logical operations.
```python
import time
import uuid

# Placeholder values so the snippet runs on its own
workflow_run_id = "run_42"
customer_id = "cus_123"

# Pattern: workflow_run_id + item_id
idempotency_key = f"{workflow_run_id}-{customer_id}"
# Wrong: using a timestamp (different on every retry)
idempotency_key = f"{customer_id}-{int(time.time())}"
# Wrong: using a random UUID (different on every retry)
idempotency_key = f"{customer_id}-{uuid.uuid4()}"
# Wrong: using only the customer ID (collides across different billing runs)
idempotency_key = customer_id  # Two billing runs for the same customer -> same key -> second run is deduplicated!
```
The workflow run ID component ensures uniqueness across different logical runs for the same customer. The customer ID component ensures uniqueness within a run that processes multiple customers. Together they produce a key that's stable across retries of the same step within the same run, and unique across different runs.
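The pattern can be wrapped in a small helper so every call site builds keys the same way. This is a sketch under assumptions: the function name, the optional operation suffix (useful when one record goes through several distinct API calls), and the hashing fallback are illustrative choices. The 255-character limit is Stripe's documented maximum key length; other vendors may differ.

```python
import hashlib
from typing import Optional

MAX_KEY_LENGTH = 255  # Stripe's documented limit; treat as an assumption for other vendors


def build_idempotency_key(run_id: str, item_id: str,
                          operation: Optional[str] = None) -> str:
    """Build a key that is stable across retries and unique per run, item, and operation."""
    parts = [run_id, item_id]
    if operation is not None:
        parts.append(operation)
    raw = "-".join(parts)
    if len(raw) <= MAX_KEY_LENGTH:
        return raw
    # Fall back to a deterministic digest so long IDs still map to a stable key.
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


key_first_attempt = build_idempotency_key("run_42", "cus_123")
key_retry = build_idempotency_key("run_42", "cus_123")
key_next_run = build_idempotency_key("run_43", "cus_123")
```

Because the helper uses only its inputs, a retry of the same step produces the same key, while the next billing run for the same customer produces a different one.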
Idempotency key sources by workflow framework
| Framework | Stable run ID to use | Key pattern |
|---|---|---|
| Temporal | workflow.GetInfo(ctx).WorkflowExecution.ID + RunID | f"{workflow_id}-{run_id}-{customer_id}" |
| Inngest | runId (available in function context, stable across step retries) | `${runId}-${customerId}` |
| Trigger.dev | context.run.id (stable across task retries) | `${context.run.id}-${customerId}` |
| Prefect | FlowRunContext.get().flow_run.id | f"{flow_run_id}-{customer_id}" |
| Hatchet | ctx.workflow_run_id() | f"{run_id}-{customer_id}" |
| Dagster | context.run_id (in op context) | f"{run_id}-{customer_id}" |
| Airflow | context['run_id'] (task context variable) | f"{dag_id}-{run_id}-{customer_id}" |
| Apache Beam | Job ID (passed as pipeline argument before submission) | f"{job_id}-{customer_id}" |
What each vendor's idempotency guarantee actually covers
Vendor idempotency keys deduplicate identical requests within a time window. The window and exact semantics vary:
| Vendor | How to send | Dedup window | What's deduplicated |
|---|---|---|---|
| Stripe | Idempotency-Key HTTP header, or idempotency_key SDK parameter | 24 hours | Same key + same endpoint + same API key → returns cached response, no new charge. A different API key (e.g., rotating keys mid-run) voids the deduplication even for the same idempotency key string. |
| Twilio | No built-in idempotency key. Application-level dedup required (check a sent log before calling, or maintain a sent-message ID table). | Application-managed | Twilio has no native per-request deduplication. Before calling messages.create(), check whether a message was already sent for this run_id + recipient combination. Store the Twilio SID after send. On retry, if the SID exists, skip the send. |
| Resend | No built-in idempotency key. Application-level dedup required (store sent email IDs against your run ID). | Application-managed | Resend has no per-request deduplication. Before calling resend.emails.send(), check whether an email was already sent for this run_id + recipient combination. Store the Resend email ID after send. |
| Stripe Billing (invoices) | Idempotency-Key header on POST /v1/invoices | 24 hours | Same as Stripe above. Covers invoice creation, not finalization or payment. Creating and finalizing are separate API calls, each needing its own idempotency key. |
The gap idempotency keys don't cover: the 24-hour cliff
Stripe's 24-hour idempotency window is the most common point of failure for long-running or stuck agents. If a billing pipeline fails at step 500 of 1,000 and isn't retried until 26 hours later (after incident investigation, on-call escalation, and remediation), the idempotency keys for the steps that already ran have expired. Retrying the pipeline from scratch will re-charge customers 1–499, who were already successfully charged in the first run. Even resuming from step 500 can double-charge customer 500 if that charge went through before the failure was recorded.
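The cliff can be reproduced with a small in-memory stand-in for a vendor that deduplicates by key within a fixed window. FakeVendor and its charge method are hypothetical names for illustration only; the sketch does not model any real vendor's internals beyond "same key inside the window replays the cached result".

```python
from dataclasses import dataclass, field

DEDUP_WINDOW_S = 24 * 60 * 60  # 24-hour window, as described for Stripe above


@dataclass
class FakeVendor:
    """Hypothetical vendor that deduplicates requests by key within a fixed window."""
    charges: list = field(default_factory=list)
    seen: dict = field(default_factory=dict)  # key -> (timestamp, cached charge id)

    def charge(self, key: str, customer_id: str, now_s: float) -> str:
        cached = self.seen.get(key)
        if cached is not None and now_s - cached[0] < DEDUP_WINDOW_S:
            return cached[1]  # replay inside the window: no new charge
        charge_id = f"ch_{len(self.charges) + 1}"
        self.charges.append((charge_id, customer_id))
        self.seen[key] = (now_s, charge_id)
        return charge_id


vendor = FakeVendor()
key = "run_42-cus_1"
first = vendor.charge(key, "cus_1", now_s=0)
retry_1h = vendor.charge(key, "cus_1", now_s=1 * 3600)    # deduplicated
retry_26h = vendor.charge(key, "cus_1", now_s=26 * 3600)  # window expired: new charge
```

The key never changed, yet the 26-hour retry created a second charge, which is why the key alone is not sufficient for runs that can be retried late.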
The idempotency key cliff means you need a second layer of protection: a spend cap that prevents the total dollar amount of vendor calls from exceeding a threshold per run, regardless of how many retries occur or when they happen. Take a vault key with a $5,000 cap on a 1,000-customer billing run at $10 each. A 26-hour-late retry that tries to re-charge all 1,000 customers gets through 500 charges ($5,000) and is stopped on the 501st. The cap has no record of the charges made in the first attempt, so those 500 new charges can still be duplicates, but the remaining 500 customers are protected.
This isn't a complete solution — the cap doesn't know about charges made in a previous run with a different vault key — but it limits the blast radius of an idempotency expiry event from "re-charged every customer" to "re-charged at most N dollars worth of customers."
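The arithmetic behind that blast-radius limit can be checked with a local sketch. RunBudget and SpendCapExceeded are hypothetical names for a simple in-process counter; they illustrate the cap logic only and are not Keybrake's implementation.

```python
class SpendCapExceeded(Exception):
    """Raised when a charge would push the run past its cap."""


class RunBudget:
    """Hypothetical per-run spend counter, tracked in cents to avoid float rounding."""

    def __init__(self, cap_cents: int):
        self.cap_cents = cap_cents
        self.spent_cents = 0

    def authorize(self, amount_cents: int) -> None:
        if self.spent_cents + amount_cents > self.cap_cents:
            raise SpendCapExceeded(
                f"cap {self.cap_cents} would be exceeded at {self.spent_cents}"
            )
        self.spent_cents += amount_cents


budget = RunBudget(cap_cents=500_000)  # $5,000 cap
charged = 0
try:
    for _ in range(1000):         # late retry attempts all 1,000 customers
        budget.authorize(1_000)   # $10 each
        charged += 1
except SpendCapExceeded:
    pass  # the run stops here instead of re-charging everyone

print(charged, budget.spent_cents)  # 500 500000
```

The retry is cut off after 500 charges, which matches the example: the cap bounds the damage in dollars without knowing which of those charges are duplicates.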
Scoping vault keys as idempotency backstops
```python
import os

import requests
import stripe

# Assumption: the Keybrake API key is supplied via an environment variable.
KEYBRAKE_API_KEY = os.environ["KEYBRAKE_API_KEY"]


def run_billing_step(run_id: str, customers: list, budget_usd: float = 1000.0):
    # Issue vault key once per run; the cap is the backstop if idempotency keys expire
    resp = requests.post(
        "https://proxy.keybrake.com/vault/keys",
        headers={"Authorization": f"Bearer {KEYBRAKE_API_KEY}"},
        json={
            "vendor": "stripe",
            "daily_usd_cap": budget_usd,
            "allowed_endpoints": ["POST /v1/payment_intents"],
            "expires_in": "48h",  # longer than Stripe's 24h idempotency window
            "agent_run_label": f"billing/{run_id}",
        },
        timeout=30,
    )
    resp.raise_for_status()  # fail fast if the vault key could not be issued
    vault_key = resp.json()["vault_key"]
    stripe.api_key = vault_key
    stripe.api_base = "https://proxy.keybrake.com/stripe"

    for customer in customers:
        # Idempotency key is stable across retries of this specific run
        idempotency_key = f"{run_id}-{customer['id']}"
        try:
            charge = stripe.PaymentIntent.create(
                amount=customer["amount_cents"],
                currency="usd",
                customer=customer["id"],
                idempotency_key=idempotency_key,
            )
        except stripe.error.RateLimitError as e:
            if "cap_exhausted" in str(e):
                raise RuntimeError(
                    f"Spend cap hit at customer {customer['id']}; stopping run"
                ) from e
            raise  # Transient rate limit; let the workflow engine retry
```
The vault key TTL is set to 48 hours — longer than Stripe's 24-hour idempotency window. This means a run that's retried up to 48 hours after its first attempt still has a live cap, limiting re-charges to the configured dollar ceiling even if idempotency keys for early customers have expired. The agent_run_label ties every vendor call in the audit log to the specific run ID, so you can reconstruct exactly which customers were charged and when — essential for incident investigation when idempotency does fail.
How Keybrake fits
Keybrake acts as the idempotency backstop that operates at the dollar level rather than the per-request level. Idempotency keys prevent individual duplicate calls; vault key caps prevent duplicate spend at the run level. Together they form two-layer protection: idempotency keys handle the common case (same call, same window), caps handle the edge cases (expired keys, different keys, stuck loops that outlast the deduplication window). The Keybrake audit log gives you a complete record of every vendor call with its amount, timestamp, and run label — so when an idempotency failure does occur, you have the data to identify which customers were doubled and by how much.
Related questions
What if two different workflow runs process the same customer at the same time (race condition)?
The idempotency key pattern described here (run_id + customer_id) is per-run — it doesn't prevent two different runs from both charging the same customer. For this you need application-level mutual exclusion: either a database row-level lock on the customer record before the Stripe call, or a distributed lock (Redis, DynamoDB conditional writes) keyed on the customer ID. The workflow engine's concurrency controls can limit how many runs execute simultaneously but don't prevent two sequential runs from independently charging the same customer. Vault key caps help: if run A charges customer X and run B also charges customer X, the cap for run B may be hit sooner — but this is a heuristic, not a guarantee, since the runs have separate vault keys with separate caps.
Does using a vault key proxy change Stripe's idempotency behavior?
The Keybrake proxy forwards the Idempotency-Key header to Stripe unchanged. Stripe's deduplication checks the key against the same API key that was used for the original request. Since the vault key resolves to the same underlying Stripe secret key on every call through the proxy, Stripe's deduplication works correctly: same vault key + same Idempotency-Key header + same endpoint = deduplicated. The proxy doesn't change Stripe's idempotency semantics.
How do I handle Twilio deduplication without a native idempotency key?
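Store a sent-message record in your database before the Twilio call (not after): write (run_id, recipient_phone, status="pending") with a unique constraint on (run_id, recipient_phone). On retry, if the insert fails with a unique violation, check whether the message was sent (status="sent"); if yes, skip the Twilio call. If status is "pending" (the previous attempt crashed before recording completion), make the Twilio call and update status to "sent" with the Twilio SID. This write-before-call pattern with idempotent database updates covers most retry scenarios without native Twilio dedup. One gap remains: if the previous attempt crashed after Twilio accepted the message but before the status update, the row is still "pending" and the retry sends a second message. To narrow that gap, reconcile "pending" rows against Twilio's message list for that recipient before resending.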
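A minimal sketch of the write-before-call pattern, using SQLite so it runs anywhere. The table name, column names, and the send_once helper are assumptions for illustration, and send_fn stands in for the real Twilio call so the example has no network dependency.

```python
import sqlite3
from typing import Callable


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sent_messages (
               run_id TEXT NOT NULL,
               recipient TEXT NOT NULL,
               status TEXT NOT NULL,
               sid TEXT,
               UNIQUE (run_id, recipient))"""
    )


def send_once(conn: sqlite3.Connection, run_id: str, recipient: str,
              send_fn: Callable[[str], str]) -> str:
    """Return the message SID, calling send_fn only if no 'sent' record exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO sent_messages (run_id, recipient, status) "
                "VALUES (?, ?, 'pending')",
                (run_id, recipient),
            )
    except sqlite3.IntegrityError:
        status, sid = conn.execute(
            "SELECT status, sid FROM sent_messages WHERE run_id = ? AND recipient = ?",
            (run_id, recipient),
        ).fetchone()
        if status == "sent":
            return sid  # already delivered on an earlier attempt: skip the send
        # status is 'pending': an earlier attempt did not record completion
    sid = send_fn(recipient)
    with conn:
        conn.execute(
            "UPDATE sent_messages SET status = 'sent', sid = ? "
            "WHERE run_id = ? AND recipient = ?",
            (sid, run_id, recipient),
        )
    return sid


conn = sqlite3.connect(":memory:")
init_db(conn)
calls = []


def fake_send(recipient: str) -> str:
    """Stand-in for the vendor call; records each invocation."""
    calls.append(recipient)
    return f"SM{len(calls):04d}"


first = send_once(conn, "run_42", "+15550100", fake_send)
retry = send_once(conn, "run_42", "+15550100", fake_send)
```

The retry returns the stored SID without invoking the sender again. In production the same idea would sit on your primary database, with the unique constraint doing the work of the missing vendor-side key.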
Further reading
- Temporal AI agent API key — Temporal's retry model and how WorkflowExecution.ID maps to a stable idempotency key across retries and continues-as-new.
- Inngest AI agent API key — Inngest's step.run() memoization and how runId produces stable idempotency keys across function retries.
- AI agent API key rotation — key rotation is the other source of idempotency failure: rotating the Stripe key mid-run voids idempotency keys issued under the previous key.
- AI agent rate limiting — the relationship between idempotency and rate limiting: rate limit errors are retriable, and the retry must use the same idempotency key as the original call.