Modal Stripe Integration: Restricted API Keys, Spend Caps, and Agent Governance
Modal's @app.function(retries=3) is how you make a serverless billing function resilient. It is also the most reliable mechanism for silently firing duplicate Stripe charges when any line of code after the charge raises an exception.
Modal is the cloud compute platform of choice for teams running AI inference workloads and agent pipelines at scale — pay-per-second containers, GPU access in milliseconds, and a Python-native API that makes distributed workloads feel local. When those workloads touch Stripe — agent-triggered charges, usage-based billing, subscription renewals — Modal's retry primitives, shared secret model, and concurrent execution patterns introduce billing failure modes that do not appear in local testing and are invisible in Modal's function logs until a customer notices they have been charged twice.
This post covers three failure modes specific to Modal's architecture: @app.function(retries=N) re-running the entire callable including the Stripe charge on any downstream exception, modal.Secret.from_name() injecting the same unrestricted Stripe key into every concurrent invocation spawned by .map() or .starmap(), and Modal web endpoints serving the same charge request twice when a client retries on a delayed response. Each section includes Python code and the governance pattern that closes it — content-hash idempotency keys at the Stripe layer and per-invocation vault keys via the Keybrake proxy at the key-management layer. A gap analysis closes the post with four additional Modal-specific edge cases.
Failure mode 1: @app.function(retries=N) re-fires Stripe charge on downstream exception
Modal's retries parameter on @app.function specifies how many times Modal should retry a function invocation when it raises an unhandled exception. The intent for a billing function is to handle transient infrastructure failures — a database write timeout, a flaky internal RPC, a momentary network error talking to a data warehouse. The problem is that Modal retries the entire Python callable from the first line: there is no checkpoint within a function invocation, and Modal has no awareness of which side effects completed successfully before the exception was raised.
# billing.py (UNSAFE): retries re-fire the Stripe charge on any downstream failure
import os

import modal
import stripe

app = modal.App("billing-agent")


@app.function(
    retries=3,
    secrets=[modal.Secret.from_name("stripe-prod")],  # injects STRIPE_SECRET_KEY
)
def charge_customer(customer_id: str, amount_cents: int, billing_period: str) -> str:
    stripe.api_key = os.environ["STRIPE_SECRET_KEY"]  # shared production key

    # Charge succeeds: Stripe returns ch_A.
    # (With the module-level API, stripe-python exposes this as stripe.Charge.create.)
    charge = stripe.Charge.create(
        amount=amount_cents,
        currency="usd",
        customer=customer_id,
        description=f"Subscription {billing_period}",
        # No idempotency_key: every retry creates a new Stripe charge object
    )

    # If this write raises a timeout, Modal retries the full function:
    #   Retry 1: the charge call runs again → ch_B
    #   Retry 2: → ch_C. Retry 3: → ch_D. Four charges total.
    # write_charge_to_database is the application's own persistence helper.
    write_charge_to_database(customer_id, charge["id"], billing_period)
    return charge["id"]
The failure sequence: the charge call returns ch_A. write_charge_to_database() raises a psycopg2.OperationalError on a connection timeout. Modal catches the unhandled exception, waits out its retry delay, and re-runs the function from the beginning. On retry 1, the charge call fires again. Stripe has no record linking this request to the prior one, so it creates ch_B. With retries=3 and a persistent database issue, the customer is charged four times: once on the initial attempt and once on each of the three retries. Modal's function logs show the failed attempts and a final failure event. The duplicate charges appear only in the Stripe Dashboard.
This pattern is acutely dangerous because the most likely failure point is not Stripe: Stripe's API is typically more available than a downstream database or internal service. Any transient error in the recording step triggers the retry chain, and because the Stripe call precedes the write, every retry re-fires it. The Stripe Dashboard does show the duplicate charges, but only to someone actively looking; nothing raises an alert by default. Teams typically discover the issue only when a customer disputes a charge or reconciliation fails at month-end.
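The retry arithmetic can be reproduced without Modal or Stripe. The sketch below is a minimal simulation, assuming a hypothetical in-memory `FakeStripe` and a hypothetical `run_with_retries` helper that mimics whole-function retries; neither is part of Modal's or Stripe's API.

```python
import itertools


class FakeStripe:
    """Hypothetical in-memory stand-in: every create call makes a new charge."""

    def __init__(self):
        self.charges = []
        self._ids = itertools.count(1)

    def create_charge(self, customer_id, amount_cents):
        charge_id = f"ch_{next(self._ids)}"
        self.charges.append((charge_id, customer_id, amount_cents))
        return charge_id


def run_with_retries(fn, retries):
    """Re-run fn from its first line after each exception (whole-function retry)."""
    last_error = None
    for _attempt in range(retries + 1):  # 1 initial attempt + N retries
        try:
            return fn()
        except Exception as exc:
            last_error = exc
    raise last_error


def failing_write(charge_id):
    """Stand-in for a database write that keeps timing out."""
    raise TimeoutError("database write timed out")


fake_stripe = FakeStripe()


def charge_customer():
    charge_id = fake_stripe.create_charge("cus_ABC", 2000)  # side effect first
    failing_write(charge_id)  # then the downstream step fails
    return charge_id


try:
    run_with_retries(charge_customer, retries=3)
except TimeoutError:
    pass  # the function ultimately fails, but the charges already exist

print(len(fake_stripe.charges))  # charges created for a single billing event
```

The function reports failure, yet the fake Stripe holds one charge per attempt, which is the mismatch between Modal's logs and the Stripe Dashboard described above.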
The fix: content-hash idempotency key + vault key per invocation
The idempotency key must be derived from the billing parameters — not generated at function entry with uuid.uuid4(), which produces a different value on every retry. A SHA-256 hash of (customer_id, amount_cents, billing_period) is stable across every retry of the same invocation, so Stripe deduplicates all retries into the original ch_A regardless of how many times the retries budget fires.
# billing.py (SAFE): content-hash idempotency key + vault key per invocation
import hashlib
import os

import modal
import stripe

app = modal.App("billing-agent")


def billing_idempotency_key(customer_id: str, amount_cents: int, billing_period: str) -> str:
    # Derived only from the billing parameters, so it is identical on every retry.
    raw = f"{customer_id}:{amount_cents}:{billing_period}:modal-billing"
    return hashlib.sha256(raw.encode()).hexdigest()[:32]


@app.function(
    retries=3,
    secrets=[modal.Secret.from_name("keybrake-vault")],  # injects VAULT_KEY env var
)
def charge_customer(
    customer_id: str,
    amount_cents: int,
    billing_period: str,
) -> str:
    # vault_key is scoped to POST /v1/charges only, daily cap = amount_cents + 10%
    stripe_client = stripe.StripeClient(
        os.environ["VAULT_KEY"],
        # StripeClient takes an API host override through base_addresses
        base_addresses={"api": "https://proxy.keybrake.com/stripe"},
    )

    idempotency_key = billing_idempotency_key(customer_id, amount_cents, billing_period)

    # Same key on every retry: Stripe returns ch_A without creating ch_B, ch_C, ch_D
    charge = stripe_client.charges.create(
        params={
            "amount": amount_cents,
            "currency": "usd",
            "customer": customer_id,
            "description": f"Subscription {billing_period}",
            "metadata": {"billing_period": billing_period},
        },
        options={"idempotency_key": idempotency_key},
    )

    # write_charge_to_database is the application's own persistence helper.
    write_charge_to_database(customer_id, charge.id, billing_period)
    return charge.id
With this pattern, every retry of the same invocation produces the same idempotency key. Stripe recognizes it and returns the original ch_A without creating a new charge, provided the retry arrives while Stripe still holds the key (Stripe may prune idempotency keys once they are at least 24 hours old). The vault key's daily spend cap provides the backstop: if a bug bypasses the idempotency key for some reason, the cap limits total exposure to slightly above one expected charge rather than to three or four times the expected charge.
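To see why the key must be a content hash rather than a uuid4, the two strategies can be compared against a toy deduplicating backend. This is a sketch under stated assumptions: `IdempotentFakeStripe` and `simulate` are hypothetical helpers that only model "same key returns the same charge"; they are not Stripe's implementation.

```python
import hashlib
import uuid


def content_hash_key(customer_id: str, amount_cents: int, billing_period: str) -> str:
    raw = f"{customer_id}:{amount_cents}:{billing_period}:modal-billing"
    return hashlib.sha256(raw.encode()).hexdigest()[:32]


class IdempotentFakeStripe:
    """Hypothetical backend: a repeated key returns the original charge id."""

    def __init__(self):
        self._by_key = {}
        self.created = 0

    def create_charge(self, amount_cents: int, idempotency_key: str) -> str:
        if idempotency_key in self._by_key:
            return self._by_key[idempotency_key]  # deduplicated
        self.created += 1
        charge_id = f"ch_{self.created}"
        self._by_key[idempotency_key] = charge_id
        return charge_id


def simulate(key_factory, attempts: int = 4):
    """Call the backend once per attempt, building a key each time."""
    backend = IdempotentFakeStripe()
    ids = [backend.create_charge(2000, key_factory()) for _ in range(attempts)]
    return backend.created, ids


# A fresh uuid4 per attempt defeats deduplication.
random_created, _ = simulate(lambda: uuid.uuid4().hex)

# A content hash is identical on every attempt, so only one charge is created.
stable_created, stable_ids = simulate(
    lambda: content_hash_key("cus_ABC", 2000, "2026-06")
)

print(random_created, stable_created)
```

Four attempts with random keys create four charges; four attempts with the content-hash key create one.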
Failure mode 2: Secret.from_name() shares one Stripe key across all concurrent .map() invocations
Modal's .map() and .starmap() methods distribute a function call across a list of inputs, running each invocation in a separate container — potentially all in parallel. This is the canonical Modal pattern for billing an entire customer cohort in one shot: one call to charge_customer.map(customers) fires N concurrent invocations that collectively process the entire list faster than any sequential loop.
The problem is that modal.Secret.from_name("stripe-prod") injects the same STRIPE_SECRET_KEY into every invocation in the map. There is no per-invocation or per-customer spend cap at the Stripe key level. A single error in the amount_cents calculation — a missing division, an off-by-one in unit conversion, a stale price object from a cached lookup — will charge every customer in the cohort the wrong amount simultaneously, with no circuit breaker and no way to halt the in-progress map without canceling the entire job.
# orchestrate.py (UNSAFE): one shared key, no per-invocation spend cap
import modal

app = modal.App("billing-orchestrator")


@app.function(secrets=[modal.Secret.from_name("stripe-prod")])
def charge_customer(customer: dict) -> str:
    import os

    import stripe

    stripe.api_key = os.environ["STRIPE_SECRET_KEY"]  # same key in all N containers

    # Bug: the amount is already in cents but is converted a second time,
    # so each customer is charged $200 instead of $2.00.
    charge = stripe.Charge.create(
        amount=customer["amount_cents"] * 100,  # wrong: should be amount_cents as-is
        currency="usd",
        customer=customer["id"],
    )
    return charge["id"]


@app.local_entrypoint()
def main():
    # get_customer_cohort is the application's own lookup; here it returns 500 customers.
    customers = get_customer_cohort()

    # All 500 invocations share the same STRIPE_SECRET_KEY.
    # The bug charges each customer $200 instead of $2: $100,000 total before anyone notices.
    # No per-customer cap, no aggregate cap, no circuit breaker.
    results = list(charge_customer.map(customers))
The concurrency amplifies the blast radius. A sequential loop with the same bug would at least allow manual intervention after the first few wrong charges. A .map() across 500 customers fires all 500 concurrently; by the time the first few Stripe webhook callbacks arrive and any monitoring alert fires, most of the charges are already created.
The fix requires per-invocation vault keys, each scoped to a single customer's expected charge amount, issued by the orchestrating function before the map is dispatched. Keybrake issues each vault key with a daily spend cap equal to the expected charge for that customer; even if the amount calculation is wrong, each invocation can charge at most the expected amount before the proxy rejects further charges from that key.
# orchestrate.py (SAFE): per-invocation vault keys, each capped at one customer's amount
import os

import httpx
import modal

app = modal.App("billing-orchestrator")


def issue_vault_key(customer_id: str, amount_cents: int) -> str:
    """Issue a Keybrake vault key scoped to one customer's charge amount."""
    resp = httpx.post(
        "https://proxy.keybrake.com/admin/vault_keys",
        headers={"Authorization": f"Bearer {os.environ['KEYBRAKE_ADMIN_KEY']}"},
        json={
            "label": f"billing-{customer_id}",
            "vendor": "stripe",
            "allowed_endpoints": ["POST /v1/charges"],
            "daily_usd_cap": round(amount_cents / 100 * 1.1, 2),  # cap at amount + 10%
            "expires_in_seconds": 3600,
        },
    )
    resp.raise_for_status()  # fail loudly rather than dispatching without a key
    return resp.json()["vault_key"]


# The worker needs no secret: it receives its own vault key as an argument.
# The admin key stays with the orchestrator, which reads KEYBRAKE_ADMIN_KEY
# from the environment where main() runs.
@app.function()
def charge_customer(customer_id: str, amount_cents: int, billing_period: str, vault_key: str) -> str:
    import hashlib

    import stripe

    stripe_client = stripe.StripeClient(
        vault_key,  # unique per customer, capped at this customer's amount
        base_addresses={"api": "https://proxy.keybrake.com/stripe"},
    )
    raw = f"{customer_id}:{amount_cents}:{billing_period}:modal-billing"
    idempotency_key = hashlib.sha256(raw.encode()).hexdigest()[:32]

    charge = stripe_client.charges.create(
        params={
            "amount": amount_cents,
            "currency": "usd",
            "customer": customer_id,
            "description": f"Subscription {billing_period}",
        },
        options={"idempotency_key": idempotency_key},
    )
    return charge.id


@app.local_entrypoint()
def main():
    customers = get_customer_cohort()  # the application's own cohort lookup
    billing_period = "2026-06"

    # Issue one vault key per customer before dispatching the map.
    # Each key is capped at that customer's amount, so a bad amount_cents calculation
    # can overcharge one customer by at most 10% before the proxy rejects the charge.
    keyed_customers = [
        (c["id"], c["amount_cents"], billing_period, issue_vault_key(c["id"], c["amount_cents"]))
        for c in customers
    ]
    results = list(charge_customer.starmap(keyed_customers))
With this pattern, each invocation carries its own vault key, and each vault key can authorize charges up to slightly above one customer's expected billing amount. A bug in the amount calculation can now overcharge any one customer by at most about 10% before the proxy's cap rejects the request, instead of by an unbounded amount. This holds only when the cap is computed from a trusted figure, such as the plan price on record, rather than from the same possibly wrong value that is handed to the worker. The orchestrating function issues all vault keys before dispatching the map, so the total authorized spend is bounded by the sum of the per-customer caps rather than by whatever the shared production key would allow.
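The cap logic itself is simple enough to model locally. The following is a rough sketch of how a per-key daily cap could reject an inflated amount; `CappedKey`, `CapExceeded`, and `cap_for` are hypothetical names used for illustration and say nothing about how the Keybrake proxy is actually implemented.

```python
from dataclasses import dataclass


class CapExceeded(Exception):
    """Raised when a charge would push a key past its cap."""


@dataclass
class CappedKey:
    """Hypothetical model of a vault key with a running daily total."""

    cap_cents: int
    spent_cents: int = 0

    def authorize(self, amount_cents: int) -> None:
        if self.spent_cents + amount_cents > self.cap_cents:
            raise CapExceeded(f"{amount_cents} exceeds remaining cap")
        self.spent_cents += amount_cents


def cap_for(expected_cents: int, headroom_percent: int = 10) -> int:
    """Cap = expected amount plus headroom, in integer cents."""
    return expected_cents + expected_cents * headroom_percent // 100


# The cap comes from a trusted expected price (assumed here: $2.00),
# not from the value the worker is about to charge.
expected_cents = 200
key = CappedKey(cap_cents=cap_for(expected_cents))

buggy_amount_cents = 20_000  # the $200 bug from the unsafe example
try:
    key.authorize(buggy_amount_cents)
    buggy_rejected = False
except CapExceeded:
    buggy_rejected = True

key.authorize(expected_cents)  # the correct amount fits under the cap

try:
    key.authorize(expected_cents)  # a second charge on the same key does not
    second_rejected = False
except CapExceeded:
    second_rejected = True

print(buggy_rejected, key.spent_cents, second_rejected)
```

Integer cents avoid floating-point drift in the cap comparison; the 10% headroom mirrors the figure used in the orchestrator above.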
Failure mode 3: Modal web endpoints receive the same charge request twice on client retry
Modal web endpoints — created with @app.function() plus the @modal.web_endpoint() decorator — run in Modal containers that can handle concurrent requests. A Modal web endpoint serving as a billing API works well for agent-triggered charges: the agent calls POST /bill, Modal spins up a container (or routes to a warm one), and the endpoint calls Stripe. The billing failure occurs when the caller retries the HTTP request.
The retry scenario: the agent calls POST /bill. Modal is cold-starting a container for the first invocation (typical cold start: 1–4 seconds). The agent's HTTP client times out after 3 seconds and retries the request. Modal now has two concurrent requests for the same billing event: one being handled by the container that just came up, and one freshly dispatched. Both requests reach Stripe's charge-creation call without an idempotency key, because the caller did not pass one and the endpoint does not derive one server-side. Two charges are created: the first request creates ch_A, the second creates ch_B.
# UNSAFE: web endpoint derives no idempotency key server-side
import os

import modal
import stripe
from modal import web_endpoint

app = modal.App("billing-api")


@app.function(secrets=[modal.Secret.from_name("stripe-prod")])
@web_endpoint(method="POST")
def bill(body: dict) -> dict:
    stripe.api_key = os.environ["STRIPE_SECRET_KEY"]

    # Two concurrent requests (agent retry on cold-start timeout) both reach here.
    # Neither has an idempotency key. Both create a Stripe charge.
    charge = stripe.Charge.create(
        amount=body["amount_cents"],
        currency="usd",
        customer=body["customer_id"],
        description=body.get("description", "Agent charge"),
        # No idempotency_key: concurrent retries = duplicate charges
    )
    return {"charge_id": charge["id"], "status": "ok"}
The key distinction from the retry failure mode is that here the caller controls the retry, not Modal. Modal does not deduplicate concurrent HTTP requests to a web endpoint — it routes each to an available container and expects the application to handle idempotency. The fix is to derive the idempotency key server-side from the request parameters, so that even if two concurrent requests arrive for the same billing event, both produce the same idempotency key and Stripe deduplicates them.
# SAFE: server-side idempotency key derived from request body
import hashlib
import os

import modal
import stripe
from modal import web_endpoint

app = modal.App("billing-api")


@app.function(secrets=[modal.Secret.from_name("keybrake-vault")])
@web_endpoint(method="POST")
def bill(body: dict) -> dict:
    customer_id = body["customer_id"]
    amount_cents = int(body["amount_cents"])
    billing_period = body["billing_period"]

    stripe_client = stripe.StripeClient(
        os.environ["VAULT_KEY"],  # scoped to POST /v1/charges, daily cap set at key creation
        base_addresses={"api": "https://proxy.keybrake.com/stripe"},
    )

    # Derived from request params: stable across concurrent retries of the same event
    raw = f"{customer_id}:{amount_cents}:{billing_period}:modal-billing"
    idempotency_key = hashlib.sha256(raw.encode()).hexdigest()[:32]

    charge = stripe_client.charges.create(
        params={
            "amount": amount_cents,
            "currency": "usd",
            "customer": customer_id,
            "description": f"Subscription {billing_period}",
            "metadata": {"billing_period": billing_period},
        },
        options={"idempotency_key": idempotency_key},
    )
    return {"charge_id": charge.id, "status": "ok"}
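With the idempotency key derived server-side, both concurrent requests, the original and the retry, produce the same key. The first request to reach Stripe creates ch_A. If the second request arrives after the first has completed, Stripe recognizes the idempotency key and returns the same ch_A response without creating ch_B. If it arrives while the first is still in flight, Stripe can instead reject it with a 409 conflict (error code idempotency_key_in_use); the endpoint should treat that as retryable, and the retry then returns ch_A. In both cases no second charge is created, and the caller ends up with the same charge_id, which is the correct behavior for an idempotent billing endpoint.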
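The ordering of two requests that share a key can be walked through step by step. The sketch below models, deterministically and in memory, a backend that replays completed results and refuses a duplicate while the original is still in flight; `FakeIdempotencyLayer` and `KeyInUse` are hypothetical names, and the in-flight rejection is an assumption modeled on Stripe's documented idempotency_key_in_use error rather than a copy of its behavior.

```python
class KeyInUse(Exception):
    """Stand-in for a conflict response on a key that is mid-request."""


class FakeIdempotencyLayer:
    """Hypothetical model: completed keys replay, in-flight keys conflict."""

    def __init__(self):
        self._results = {}      # key -> charge id for completed requests
        self._in_flight = set() # keys whose first request has not finished
        self.created = 0

    def begin(self, key):
        """Return a stored charge id, or None if the caller may proceed."""
        if key in self._results:
            return self._results[key]
        if key in self._in_flight:
            raise KeyInUse(key)
        self._in_flight.add(key)
        return None

    def finish(self, key):
        """Complete the in-flight request and store its result."""
        self.created += 1
        charge_id = f"ch_{self.created}"
        self._in_flight.discard(key)
        self._results[key] = charge_id
        return charge_id


layer = FakeIdempotencyLayer()
key = "same-content-hash"

original_cached = layer.begin(key)  # original request starts; nothing stored yet

try:
    layer.begin(key)                # client retry arrives while original is in flight
    conflict = False
except KeyInUse:
    conflict = True

first_id = layer.finish(key)        # original request completes
retry_id = layer.begin(key)         # retried request now replays the stored result

print(conflict, first_id, retry_id, layer.created)
```

Whichever path the duplicate takes, only one charge is ever created for the key.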
Approach comparison
| Approach | Retry protection | Per-invocation spend cap | Concurrent dedup | Audit log | One-click revoke |
|---|---|---|---|---|---|
| Raw STRIPE_SECRET_KEY in Modal Secret | None | None | None | Stripe Dashboard only | Full key rotation required |
| Idempotency key only (no vault key) | Yes (Stripe dedup) | None | Yes (Stripe dedup) | Stripe Dashboard only | Full key rotation required |
| Vault key only (no idempotency key) | Partial (cap limits damage) | Yes | Partial (cap limits damage) | Keybrake audit log | Instant (vault key revoke) |
| Idempotency key + vault key via Keybrake proxy | Yes | Yes | Yes | Keybrake + Stripe both | Instant |
Gap analysis: four additional Modal billing edge cases
1. @app.function(allow_concurrent_inputs=N) multiplies the dedup risk
Modal supports serving multiple concurrent inputs per container via the allow_concurrent_inputs parameter. With allow_concurrent_inputs=10, a single container processes up to 10 invocations simultaneously in async coroutines. If a billing function uses allow_concurrent_inputs without async-safe Stripe client initialization — for example, setting stripe.api_key as a module-level assignment that async coroutines share — one coroutine can overwrite the key while another is mid-request. This is not a Stripe-level deduplication failure but a credential contamination failure: invocation A may charge using invocation B's vault key. Issue each vault key per invocation and use stripe.StripeClient(vault_key) as an instance (not the module-level singleton) to isolate credentials across concurrent coroutines.
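The credential contamination can be reproduced with plain asyncio. This is a minimal sketch, assuming a hypothetical `Settings` class as a stand-in for a module-level `stripe.api_key` and a hypothetical `Client` class as a stand-in for a per-invocation client instance; no real Stripe calls are made.

```python
import asyncio


class Settings:
    """Stand-in for module-level state such as a global api_key."""

    api_key = None


async def charge_with_global(key, log):
    Settings.api_key = key            # each coroutine overwrites the shared value
    await asyncio.sleep(0)            # yield control, as a network call would
    log.append((key, Settings.api_key))  # (intended key, key actually in effect)


class Client:
    """Stand-in for a per-invocation client that owns its credential."""

    def __init__(self, api_key):
        self.api_key = api_key

    async def charge(self, log):
        await asyncio.sleep(0)
        log.append(self.api_key)


async def main():
    global_log, instance_log = [], []
    await asyncio.gather(
        charge_with_global("vault_A", global_log),
        charge_with_global("vault_B", global_log),
    )
    await asyncio.gather(
        Client("vault_A").charge(instance_log),
        Client("vault_B").charge(instance_log),
    )
    return global_log, instance_log


global_log, instance_log = asyncio.run(main())
print(global_log)    # one entry shows an intended key paired with the other key
print(instance_log)  # each instance used only its own key
```

With shared module state, the coroutine that intended to use vault_A resumes after the other has overwritten the key; with instances, each coroutine keeps its own credential.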
2. @app.function(schedule=modal.Cron(...)) with overlapping runs
Modal supports cron-scheduled functions via modal.Cron. If a single run of a scheduled billing function takes longer than its cron interval (uncommon, but possible with a short interval such as hourly usage billing over a large cohort), a second run may start before the first completes. Both runs iterate over the same customer list and both call Stripe without cross-run coordination. Content-hash idempotency keys eliminate the duplicate charges, but both runs will still log "processed" entries for each customer, doubling the write volume to any downstream billing database. The idempotency key closes the Stripe risk; a database-level unique constraint on (customer_id, billing_period) closes the double-write risk.
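A unique constraint of this kind can be sketched with the standard library's sqlite3 module. The table name `billing_records` and the helper `record_charge` are assumptions made for illustration; a production billing database would use its own schema and driver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE billing_records (
        customer_id    TEXT NOT NULL,
        billing_period TEXT NOT NULL,
        charge_id      TEXT NOT NULL,
        UNIQUE (customer_id, billing_period)
    )
    """
)


def record_charge(conn, customer_id, billing_period, charge_id) -> bool:
    """Insert the record once; return False if this period was already recorded."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO billing_records "
        "(customer_id, billing_period, charge_id) VALUES (?, ?, ?)",
        (customer_id, billing_period, charge_id),
    )
    conn.commit()
    return cur.rowcount == 1


# Two overlapping runs both try to record the same (deduplicated) charge.
first_write = record_charge(conn, "cus_ABC", "2026-06", "ch_A")
second_write = record_charge(conn, "cus_ABC", "2026-06", "ch_A")

row_count = conn.execute("SELECT COUNT(*) FROM billing_records").fetchone()[0]
print(first_write, second_write, row_count)
```

The second run's write is ignored by the database itself, so the table keeps one row per customer and period regardless of how many runs overlap.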
3. Modal Function.spawn() and gather() fire-and-forget gap
Modal's .spawn() method launches a function asynchronously and returns a FunctionCall handle without waiting for the result. When a billing orchestrator uses charge_customer.spawn() for each customer and then calls modal.FunctionCall.gather(*handles) to collect results, a failure in gather(), for example a timeout in the gather itself, leaves the spawned invocations running. If the orchestrator retries the entire orchestration flow in response to the gather timeout, it spawns new invocations for all customers even though the original invocations are still executing. Content-hash idempotency keys protect against duplicate Stripe charges from the parallel execution, but the orchestrator should track completed invocation IDs before retrying; otherwise every failed orchestration attempt adds another full set of spawned invocations.
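One way to picture the tracking step is a small ledger consulted before any re-spawn. The sketch below is only an illustration: the `ledger` dictionary, its status strings, and `customers_to_spawn` are hypothetical, and how the statuses are obtained from Modal is left out.

```python
def customers_to_spawn(customers, ledger):
    """Return only customers with no completed or still-running invocation."""
    return [c for c in customers if ledger.get(c) not in ("completed", "running")]


customers = ["cus_A", "cus_B", "cus_C"]

# Assumed state after a gather timeout: A finished, B is still running,
# and C failed outright.
ledger = {"cus_A": "completed", "cus_B": "running", "cus_C": "failed"}

naive_retry = list(customers)                          # re-spawns everything
tracked_retry = customers_to_spawn(customers, ledger)  # re-spawns only what is needed

print(len(naive_retry), tracked_retry)
```

The naive retry adds three more invocations on every orchestration failure, while the tracked retry re-spawns only the customer whose invocation actually failed.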
4. Modal NetworkFileSystem idempotency key cache invalidation
Some Modal billing implementations cache idempotency keys or charge IDs in a modal.NetworkFileSystem or a modal.Volume to implement pre-flight deduplication: "check if this customer was already charged this billing_period before calling Stripe." This pattern works reliably for sequential runs but introduces a race condition in concurrent runs: two invocations for the same customer (from a retry or concurrent map) both read the NFS cache simultaneously, both find no entry, and both proceed to charge Stripe. The cache write happens after the charge, so both invocations write to the NFS cache successfully — and both create a Stripe charge. The content-hash idempotency key at the Stripe layer closes this race condition regardless of what the local cache says, making the NFS dedup cache a convenient read optimization rather than the primary safety mechanism.
Pytest enforcement suite
# tests/test_modal_billing_governance.py
import hashlib
from unittest.mock import MagicMock, patch

import pytest
import stripe


def billing_idempotency_key(customer_id: str, amount_cents: int, billing_period: str) -> str:
    raw = f"{customer_id}:{amount_cents}:{billing_period}:modal-billing"
    return hashlib.sha256(raw.encode()).hexdigest()[:32]


def test_idempotency_key_deterministic():
    """Same inputs always produce the same key: retries are safe."""
    k1 = billing_idempotency_key("cus_ABC", 2000, "2026-06")
    k2 = billing_idempotency_key("cus_ABC", 2000, "2026-06")
    assert k1 == k2


def test_idempotency_key_distinct_across_customers():
    """Different customers produce different keys: no cross-customer dedup."""
    k1 = billing_idempotency_key("cus_ABC", 2000, "2026-06")
    k2 = billing_idempotency_key("cus_XYZ", 2000, "2026-06")
    assert k1 != k2


def test_idempotency_key_distinct_across_periods():
    """Different billing periods produce different keys."""
    k1 = billing_idempotency_key("cus_ABC", 2000, "2026-06")
    k2 = billing_idempotency_key("cus_ABC", 2000, "2026-07")
    assert k1 != k2


def test_stripe_client_uses_vault_key_not_env():
    """Stripe client must use vault_key arg, not module-level stripe.api_key."""
    vault_key = "vault_key_test_abc123"
    proxy = {"api": "https://proxy.keybrake.com/stripe"}
    with patch("stripe.StripeClient") as mock_client_cls:
        mock_client = MagicMock()
        mock_client_cls.return_value = mock_client
        mock_client.charges.create.return_value = MagicMock(id="ch_test")
        client = stripe.StripeClient(vault_key, base_addresses=proxy)
        mock_client_cls.assert_called_once_with(vault_key, base_addresses=proxy)


def test_map_invocations_use_distinct_vault_keys():
    """Each customer in a .map() must receive a distinct vault key."""
    customers = [{"id": "cus_A", "amount_cents": 2000}, {"id": "cus_B", "amount_cents": 5000}]
    vault_keys = {c["id"]: f"vault_key_{c['id']}_unique" for c in customers}
    # All vault keys must be distinct
    assert len(set(vault_keys.values())) == len(customers)
FAQ
Does Modal's built-in retries differ from try/except retry loops inside the function?
Yes. An explicit try/except loop inside the function body lets you call the Stripe charge only once, outside the retry scope, and retry only the downstream write step. Modal's retries parameter retries the entire function callable, including any Stripe calls at the top. If you use Modal's built-in retry parameter, content-hash idempotency keys are mandatory. If you manage retries yourself inside the function body, you can narrow the retry scope to the failing step and avoid the duplicate charge entirely — but you lose Modal's automatic retry reporting.
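A narrowed retry scope might look like the sketch below. It is a minimal illustration using hypothetical stand-ins (`create_charge`, `flaky_write`, and a small `retry` helper) rather than real Stripe or database calls.

```python
def retry(fn, attempts):
    """Call fn up to `attempts` times, re-raising the last error if all fail."""
    last_error = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as exc:
            last_error = exc
    raise last_error


charges = []
write_attempts = {"count": 0}


def create_charge():
    """Stand-in for the Stripe call."""
    charges.append("ch_A")
    return "ch_A"


def flaky_write(charge_id):
    """Stand-in for a database write that fails twice, then succeeds."""
    write_attempts["count"] += 1
    if write_attempts["count"] < 3:
        raise TimeoutError("database write timed out")
    return charge_id


def charge_customer():
    charge_id = create_charge()  # outside the retry scope: runs exactly once
    retry(lambda: flaky_write(charge_id), attempts=4)  # only the write is retried
    return charge_id


result = charge_customer()
print(result, len(charges), write_attempts["count"])
```

The write is attempted three times while the charge is created once. If the write exhausts its attempts, the exception still escapes the function, so a content-hash idempotency key remains worthwhile whenever Modal's own retries are also enabled.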
Can I pass idempotency_key through Modal's job queue to avoid re-computation?
Yes. If the orchestrator issues an idempotency key before dispatching the .map() and passes it as a parameter, the function uses the pre-issued key on every invocation — including retries. This is equivalent to deriving the key server-side from the billing parameters and is safe as long as the key is derived from the billing parameters (not generated with uuid4()) so that retries produce the same key.
Do I need a vault key per function invocation if I'm only processing one customer per invocation?
Yes. The vault key's spend cap is the mechanism that limits blast radius per invocation. Without it, a bug in any single invocation can make arbitrary charges against the production Stripe key. With a per-invocation vault key capped at the expected charge amount, the worst-case exposure per invocation is bounded. The administrative overhead of issuing vault keys is low: the Keybrake API issues a key in one HTTP call, and keys can be pre-issued for all customers before the map is dispatched.
How does the Keybrake proxy handle Modal cold starts?
The Keybrake proxy is a persistent HTTPS endpoint outside Modal — it does not share Modal's cold start behavior. When your Modal function starts up and makes its first request to proxy.keybrake.com/stripe/v1/charges, the proxy is already running and responds within the proxy's normal latency (typically <10ms overhead). The cold start delay only affects the Modal container, not the proxy.
What happens if the Keybrake proxy is unreachable during a Modal function invocation?
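The charge call will raise a connection error (stripe.APIConnectionError; stripe.error.APIConnectionError in older stripe-python releases). This propagates as an unhandled exception in the Modal function, which triggers Modal's retry logic (if configured). If the proxy was unreachable, the charge was never submitted to Stripe and the retry simply creates it. A connection error does not always prove that, though: a timeout can occur after the proxy has already forwarded the request. The content-hash idempotency key covers both cases, because the retry either creates the charge for the first time or returns the one that already exists. The customer is charged at most once, and only when a charge request completes successfully.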
Can I use a single vault key for all customers in a .map() if I set a high enough spend cap?
You can, but it eliminates the per-customer blast-radius isolation that per-invocation keys provide. A single shared vault key with a cap of $10,000 for a 500-customer billing run still allows any individual invocation to charge up to $10,000 (until the aggregate cap is reached) if a bug produces a wrong amount. Per-invocation keys capped at each customer's expected amount limit each invocation's worst-case exposure to slightly above one charge, regardless of what the others do.
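The difference in worst-case exposure is plain arithmetic. The sketch below uses the 500-customer run and the $10,000 shared cap from the answer above; the $2.00 expected charge and the 10% headroom are assumptions carried over from the earlier examples.

```python
customers = 500
expected_cents = 200                 # assumed $2.00 expected charge per customer
shared_cap_cents = 10_000 * 100      # one shared vault key capped at $10,000

# Per-invocation cap: expected amount plus 10% headroom, in integer cents.
per_key_cap_cents = expected_cents + expected_cents * 10 // 100

# Worst case for a single buggy invocation under each scheme.
shared_worst_single_cents = shared_cap_cents
per_key_worst_single_cents = per_key_cap_cents

# Worst case across the whole run with per-invocation keys.
per_key_worst_total_cents = customers * per_key_cap_cents

print(shared_worst_single_cents / 100)   # dollars one invocation could charge
print(per_key_worst_single_cents / 100)
print(per_key_worst_total_cents / 100)
```

Under the shared key, one invocation can consume the entire $10,000 cap; under per-invocation keys, one invocation is limited to $2.20 and the full run to $1,100.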
Add spend caps and audit logs to your Modal Stripe integration
Keybrake issues scoped vault keys for your Modal billing functions — one key per invocation, each capped at the expected charge amount. The proxy enforces limits, logs every Stripe call, and gives you a kill-switch that revokes a key without rotating your production secret.