
Modal AI agent API key: spend caps for serverless agent functions

Modal makes deploying Python agent functions easy: add a decorator, call .remote(), and Modal handles containers, scaling, and GPU allocation. That ease of scale is exactly what makes uncapped API keys dangerous. A Modal function that calls Stripe can go from one instance to fifty in seconds with no per-invocation spending control. This page covers the specific risk profile of Modal AI agents and the vault-key pattern, which adds spending guardrails while you keep managing credentials through Modal Secrets.

TL;DR

Modal functions are stateless and scale out in parallel on demand. Both are excellent properties for compute, but they mean every concurrent invocation hits the same vendor API key with no per-run budget. A vault key proxy adds the per-invocation dimension: each invocation issues a short-lived vault key with its own dollar cap, uses it to call the vendor through the proxy, and lets it expire on a TTL sized to the function's runtime. No credential state is shared across concurrent runs.

How Modal AI agents call vendor APIs

In Modal, agent code lives inside functions decorated with @app.function(). You call them with .remote() for a single invocation or .map() for parallel execution across an iterable. The pattern looks like:

import os

import modal
import stripe

app = modal.App("billing-agent")

# The Stripe SDK must be installed in the container image, not just locally.
image = modal.Image.debian_slim().pip_install("stripe")

stripe_secret = modal.Secret.from_name("stripe-secret")  # provides STRIPE_SECRET_KEY

@app.function(image=image, secrets=[stripe_secret])
def charge_customer(customer_id: str, amount_cents: int) -> dict:
    stripe.api_key = os.environ["STRIPE_SECRET_KEY"]  # same full-access key for every invocation
    return stripe.PaymentIntent.create(
        amount=amount_cents,
        currency="usd",
        customer=customer_id,
    )

@app.local_entrypoint()
def main():
    customer_ids = load_customer_batch()  # placeholder for your own loader; could return 5 or 5,000 IDs
    list(charge_customer.map(customer_ids, kwargs={"amount_cents": 2999}))

The modal.Secret pattern is correct — never hardcode credentials. The problem is structural: every invocation of charge_customer uses the same full-access STRIPE_SECRET_KEY, and .map() can parallelize that to hundreds of concurrent Stripe calls with no per-batch or per-invocation cap.

Three gaps Modal's native tooling doesn't fill for vendor spend control

Gap | What happens in practice | Modal's answer
--- | --- | ---
No per-invocation spend cap | A .map() call over a customer list with a data bug processes 5,000 customers instead of 50. Modal faithfully runs all 5,000 invocations in parallel. The ceiling on damage is your Stripe account's limits, not the roughly $1,500 (50 × $29.99) you expected for this batch job. | None. Modal logs show invocation counts and durations, not vendor dollar amounts.
No per-invocation revoke | You run modal app stop after noticing the batch is misbehaving. Invocations that already started may be mid-execution and may already have made Stripe calls. Updating the Stripe secret in Modal only reaches containers started afterward. Running containers keep the old key in memory, and that key stays valid at Stripe until you roll it there, which also cuts off every other service using it. | App stop halts new invocations but cannot revoke credentials held by running ones.
No per-call audit with invocation context | Modal's function logs show what each invocation printed and how it finished. They don't parse dollar amounts from Stripe responses or link individual Stripe charges to the Modal invocation that made them. | Function logs and traces. No vendor cost parsing, no invocation-to-charge attribution.
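The third gap concerns attribution: which invocation produced which charge, and for how much. The minimal sketch below shows the aggregation that an audit log with invocation context makes possible. It assumes each row pairs a run label (like the agent_run_label used later on this page) with the parsed Stripe PaymentIntent response. The row shape and the function name are hypothetical. Only the PaymentIntent fields object, amount, and currency come from Stripe's API.

```python
from collections import defaultdict
from decimal import Decimal


def spend_by_run_label(audit_rows):
    """Sum USD PaymentIntent amounts per run label.

    audit_rows: iterable of (run_label, stripe_response) pairs (hypothetical
    shape), where stripe_response is the parsed JSON body Stripe returned.
    Stripe reports `amount` as an integer in the smallest currency unit
    (cents for USD). This counts requested amounts; `amount_received`
    would track settled funds instead.
    """
    totals_cents: dict[str, int] = defaultdict(int)
    for label, resp in audit_rows:
        if resp.get("object") != "payment_intent" or resp.get("currency") != "usd":
            continue  # this sketch ignores other object types and currencies
        totals_cents[label] += resp["amount"]
    return {label: Decimal(cents) / 100 for label, cents in totals_cents.items()}


rows = [
    ("modal-billing/cus_A", {"object": "payment_intent", "amount": 2999, "currency": "usd"}),
    ("modal-billing/cus_A", {"object": "payment_intent", "amount": 2999, "currency": "usd"}),
    ("modal-billing/cus_B", {"object": "payment_intent", "amount": 2999, "currency": "usd"}),
]
print(spend_by_run_label(rows))  # cus_A: 59.98, cus_B: 29.99
```

If a customer is charged twice, that customer's label shows a doubled total right away. Modal's own logs won't compute that signal for you.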

The concurrency risk: why Modal makes this worse than a cron job

A typical cron-based billing job runs sequentially, one customer at a time. A data bug therefore works through N bad records one by one, which gives you time to notice and cancel. Modal's .map() fans all N out in parallel: a 5,000-item list can become thousands of concurrent Stripe calls that finish in roughly the time it takes to process one. The main limits are Modal's autoscaling limits and Stripe's API rate limits.

This isn't a Modal design flaw — it's the correct behavior for CPU-bound batch work. It becomes a liability only when the batch items involve vendor API calls that cost money. The vault key pattern adds the dimension Modal is missing: per-invocation spend caps that are enforced by the proxy, not by your function code.
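The sketch below puts rough numbers on the sequential-versus-parallel difference by modeling a batch as waves of simultaneous calls. The per-call latency, the 10-minute reaction time, and the concurrency of 1,000 are hypothetical inputs, not measurements. In practice, concurrency is capped by Modal's autoscaling limits and Stripe's rate limits.

```python
import math


def calls_before_reaction(n_items: int, seconds_per_call: float,
                          reaction_s: float, concurrency: int) -> int:
    """Vendor calls completed before an operator can react.

    Models the batch as waves of `concurrency` simultaneous calls, each wave
    taking `seconds_per_call`. concurrency=1 is a sequential cron job.
    """
    waves_done = math.floor(reaction_s / seconds_per_call)
    return min(n_items, waves_done * concurrency)


def batch_duration_s(n_items: int, seconds_per_call: float, concurrency: int) -> float:
    """Wall-clock time to finish the whole batch under the same wave model."""
    return math.ceil(n_items / concurrency) * seconds_per_call


# Hypothetical numbers: 5,000 items, 0.5 s per Stripe call, 10 minutes to notice.
print(calls_before_reaction(5_000, 0.5, 600, concurrency=1))      # 1200
print(calls_before_reaction(5_000, 0.5, 600, concurrency=1_000))  # 5000
print(batch_duration_s(5_000, 0.5, concurrency=1))                # 2500.0
print(batch_duration_s(5_000, 0.5, concurrency=1_000))            # 2.5
```

Under these assumptions, the sequential job has completed 1,200 calls when someone notices at the 10-minute mark, so 3,800 are still cancellable. The parallel run finished all 5,000 in under three seconds.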

Scoping vault keys per invocation in Modal

The vault key is issued at the start of each function invocation and has a short TTL, so it outlives the function only briefly. Modal's stateless execution model makes this natural:

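import os

import httpx
import modal
import stripe

app = modal.App("billing-agent")

# httpx and the Stripe SDK must be installed in the container image.
image = modal.Image.debian_slim().pip_install("httpx", "stripe")

keybrake_secret = modal.Secret.from_name("keybrake-secret")  # provides KEYBRAKE_API_KEY
# STRIPE_SECRET_KEY lives in Keybrake, not in Modal

# Send all Stripe SDK traffic through the proxy.
stripe.api_base = "https://proxy.keybrake.com/stripe"

@app.function(image=image, secrets=[keybrake_secret])
def charge_customer(customer_id: str, amount_cents: int) -> dict:
    # Issue a short-lived vault key for this invocation only
    r = httpx.post(
        "https://proxy.keybrake.com/vault/keys",
        headers={"Authorization": f"Bearer {os.environ['KEYBRAKE_API_KEY']}"},
        json={
            "vendor": "stripe",
            "daily_usd_cap": 50.0,                  # acts per invocation: this key lives 5 minutes
            "allowed_endpoints": ["POST /v1/payment_intents"],
            "expires_in": "5m",                      # short; the function runs in seconds
            "agent_run_label": f"modal-billing/{customer_id}",
        },
        timeout=10.0,
    )
    r.raise_for_status()  # stop here if no vault key was issued
    vault_key = r.json()["vault_key"]

    # Pass the key per request rather than setting the global stripe.api_key,
    # so inputs sharing a container never share a credential.
    return stripe.PaymentIntent.create(
        amount=amount_cents,
        currency="usd",
        customer=customer_id,
        idempotency_key=f"modal-{customer_id}-{amount_cents}",
        api_key=vault_key,
    )

@app.local_entrypoint()
def main():
    customer_ids = load_customer_batch()  # placeholder for your own loader
    # return_exceptions=True: a capped (429) invocation comes back as an
    # exception in the results instead of aborting the whole map.
    results = list(charge_customer.map(
        customer_ids, kwargs={"amount_cents": 2999}, return_exceptions=True
    ))
    failed = [r for r in results if isinstance(r, Exception)]
    print(f"{len(results) - len(failed)} succeeded, {len(failed)} failed")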

Each concurrent invocation now has its own vault key with its own $50 cap. Suppose a data bug turns the $29.99 charge into $2,999. Every invocation's request exceeds its cap, and the proxy rejects each one with a 429 that your error handling catches, instead of completing 5,000 $2,999 charges in parallel. The cap bounds each invocation, not the batch. A list 100× too long at the correct $29.99 still goes through; a per-batch key (see Related questions) covers that case.
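You can check the per-invocation behavior without Modal or Stripe at all. This toy model gives each simulated invocation a fresh $50 key and runs 5,000 of them on a thread pool standing in for .map(). ToyVaultKey, CapExceeded, toy_invocation, and run_batch are hypothetical stand-ins for the proxy's enforcement, not Keybrake's API.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


class CapExceeded(Exception):
    """Stands in for the proxy's HTTP 429 when a key's cap would be exceeded."""


@dataclass
class ToyVaultKey:
    cap_cents: int
    spent_cents: int = 0

    def charge(self, amount_cents: int) -> None:
        # The cap is checked before the call would be forwarded to the vendor.
        if self.spent_cents + amount_cents > self.cap_cents:
            raise CapExceeded(f"{amount_cents} cents would exceed cap {self.cap_cents}")
        self.spent_cents += amount_cents


def toy_invocation(customer_id: str, amount_cents: int) -> int:
    key = ToyVaultKey(cap_cents=5_000)  # fresh $50 key for this invocation only
    key.charge(amount_cents)
    return amount_cents


def run_batch(customer_ids: list[str], amount_cents: int) -> tuple[int, int]:
    """Run every invocation concurrently; return (cents charged, invocations rejected).

    Exceptions come back as values, similar to .map(..., return_exceptions=True).
    """
    def one(customer_id: str):
        try:
            return toy_invocation(customer_id, amount_cents)
        except CapExceeded as exc:
            return exc

    with ThreadPoolExecutor(max_workers=32) as pool:
        results = list(pool.map(one, customer_ids))
    charged = sum(r for r in results if not isinstance(r, CapExceeded))
    rejected = sum(1 for r in results if isinstance(r, CapExceeded))
    return charged, rejected


ids = [f"cus_{i}" for i in range(5_000)]
print(run_batch(ids, 299_900))     # amount bug ($2,999): (0, 5000)
print(run_batch(ids, 2_999))       # count bug, correct amount: (14995000, 0)
print(run_batch(ids[:50], 2_999))  # the batch you meant to run: (149950, 0)
```

In the first run, every invocation is stopped independently. The second run shows the count bug going through in full: $149,950 instead of $1,499.50.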

The Modal Secret configuration only needs KEYBRAKE_API_KEY — the Stripe secret never touches your Modal environment.

Modal cron schedules and the compounding risk

Modal supports cron-scheduled functions via @app.function(schedule=modal.Cron("0 9 * * *")). A daily billing function that runs correctly 364 days a year and hits a data error on day 365 will fan out every invocation before anyone is watching. The vault key's per-invocation cap is the control that fires automatically, with no operator in the loop and no alert latency.

How Keybrake fits

Keybrake is the proxy layer between your Modal functions and Stripe, Twilio, or Resend. You replace the vendor's modal.Secret with a single KEYBRAKE_API_KEY secret, and issue short-lived vault keys inside each function invocation. The real vendor secrets stay in Keybrake, never in Modal. Each invocation has an independent dollar cap, and the audit log records every call with the customer ID or run label you set — queryable by vendor, date, and invocation context.


Related questions

Does issuing a vault key per invocation add significant latency to Modal functions?

The vault key issuance call (POST /vault/keys) adds one HTTP round-trip at function start. For functions that run for seconds or more, this is negligible: typically under 50 ms, depending on the network path to the proxy. For sub-second functions, the overhead is proportionally higher. In those cases, you can issue one vault key per batch in the local entrypoint, before the .map(), and pass it to each invocation. The cap then covers the whole batch rather than each item. The tradeoff is explicit: coarser per-item enforcement in exchange for lower overhead.
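A per-batch key also bounds a failure mode that per-item caps don't: too many items at a legitimate amount. This sketch simulates one key shared across a concurrent batch. It assumes the proxy updates a key's running total atomically across concurrent requests, which the lock models here. SharedBatchKey, run_batch, and the 20% headroom are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class CapExceeded(Exception):
    """Stands in for the proxy's HTTP 429."""


class SharedBatchKey:
    """Toy model of one vault key shared by every item in a batch.

    The lock stands in for the proxy's atomic check-and-add on the key's
    running total; in the real pattern that accounting happens in the proxy.
    """

    def __init__(self, cap_cents: int) -> None:
        self.cap_cents = cap_cents
        self.spent_cents = 0
        self._lock = threading.Lock()

    def charge(self, amount_cents: int) -> None:
        with self._lock:
            if self.spent_cents + amount_cents > self.cap_cents:
                raise CapExceeded("batch cap reached")
            self.spent_cents += amount_cents


def run_batch(n_items: int, amount_cents: int, batch_cap_cents: int) -> tuple[int, int]:
    """Charge n_items concurrently against one shared key; return (succeeded, cents spent)."""
    key = SharedBatchKey(batch_cap_cents)

    def one(_: int) -> bool:
        try:
            key.charge(amount_cents)
            return True
        except CapExceeded:
            return False

    with ThreadPoolExecutor(max_workers=32) as pool:
        succeeded = sum(pool.map(one, range(n_items)))
    return succeeded, key.spent_cents


# Hypothetical sizing: 50 expected items x $29.99, plus 20% headroom = $1,799.40.
BATCH_CAP = 50 * 2_999 * 12 // 10  # 179_940 cents
print(run_batch(50, 2_999, BATCH_CAP))     # expected batch: (50, 149950)
print(run_batch(5_000, 2_999, BATCH_CAP))  # count bug: stops at (60, 179940)
```

This is the tradeoff the answer describes. The runaway batch stops after 60 charges ($1,799.40) instead of running all 5,000. But a single wrong charge smaller than the remaining batch budget, say $500, would no longer be rejected on its own.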

Can I use Modal containers as a long-running agent rather than a one-off function?

Yes — Modal supports @app.cls() container classes that can hold state across method calls. For long-running agent containers, issue the vault key in the @modal.enter() lifecycle method (called once when the container starts) and store it in self. The vault key's TTL should match the expected container lifetime. If the container runs longer than the key TTL, you'll need to re-issue in the method body — a check against the key's expires_at returned at issue time works well here.

What's the right per-invocation cap for a Modal billing function?

Set the cap to the maximum legitimate spend for a single invocation, plus a small buffer. If each invocation charges one customer for a maximum of $99.99, a cap of $120 gives you a 20% buffer for legitimate edge cases while ensuring that a data error producing a $9,999 charge returns a 429 rather than completing. The cap is not a budget — it's a circuit breaker. Err toward tight rather than generous: too-tight caps produce catchable exceptions you'll notice; too-loose caps defeat the purpose.
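The sizing rule is simple arithmetic. The sketch below uses Decimal so money values don't pick up float rounding error. per_invocation_cap and trips_cap are illustrative names, and the 20% default mirrors the example above.

```python
from decimal import Decimal, ROUND_CEILING


def per_invocation_cap(max_legit_usd: str, buffer_pct: int = 20) -> Decimal:
    """Max legitimate spend per invocation plus a buffer, rounded up to whole dollars."""
    raw = Decimal(max_legit_usd) * (100 + buffer_pct) / 100
    return raw.to_integral_value(rounding=ROUND_CEILING)


def trips_cap(charge_usd: str, cap: Decimal) -> bool:
    """True if a single charge of this size would be rejected by the cap."""
    return Decimal(charge_usd) > cap


cap = per_invocation_cap("99.99")
print(cap)                      # 120
print(trips_cap("99.99", cap))  # False: the legitimate maximum passes
print(trips_cap("9999", cap))   # True: the data-error charge is blocked
```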

Further reading