
FastAPI AI agent API key: scoping vendor calls in async agent endpoints

FastAPI is a popular Python framework for AI agent backends: its async-first design, automatic OpenAPI generation, and dependency injection system make it a natural host for agent HTTP endpoints, tool-call handlers, and webhook receivers. FastAPI's asyncio concurrency model means multiple agent requests are handled concurrently in the same process. Each concurrent request can call Stripe, Twilio, or Resend with no per-request dollar cap, so a single misconfigured prompt or recursive tool-call loop can exhaust the daily budget shared by all concurrent agent runs. This page covers the vault-key pattern that scopes vendor API spend per FastAPI agent request.

TL;DR

Issue a vault key in a FastAPI dependency using Depends(). The dependency runs once per request, issues a Keybrake vault key scoped to that request's budget, and injects it into the route handler. The route's agent tools then send Stripe calls through the proxy at https://proxy.keybrake.com/stripe with the vault key as the credential, either over plain HTTP or through the Stripe Python SDK. With the SDK, point stripe.api_base at the proxy and pass the vault key per call (api_key=vault_key) rather than assigning it to the module-level stripe.api_key, which is shared by every request. The vault key expires on its own when its TTL elapses, whether the request completed or failed. Revoking a runaway request mid-flight is a single DELETE /vault/keys/{key_id} API call from your admin endpoint.

How FastAPI AI agent backends call vendor APIs

A typical FastAPI agent endpoint receives a prompt, runs an LLM loop with tools, and calls Stripe in the tool implementations:

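import asyncio
import os

import stripe
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
stripe.api_key = os.environ["STRIPE_SECRET_KEY"]  # shared across all requests

class AgentRequest(BaseModel):
    prompt: str

@app.post("/agent/run")
async def run_agent(request: AgentRequest):
    # agent_loop is the application's own LLM tool loop (not shown here).
    # It may call charge_customer() many times.
    result = await agent_loop(request.prompt, tools=[charge_customer])
    return {"result": result}

async def charge_customer(customer_id: str, amount_cents: int) -> dict:
    # The Stripe SDK call is blocking, so run it in a worker thread
    # instead of stalling the event loop for every other request.
    intent = await asyncio.to_thread(
        stripe.PaymentIntent.create,
        amount=amount_cents,
        currency="usd",
        customer=customer_id,
    )
    return {"intent_id": intent.id, "status": intent.status}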

This pattern has two risks that FastAPI's concurrency model amplifies. First, the Stripe key is shared across all concurrent requests via the module-level stripe.api_key. Ten simultaneous agent requests, each in its own LLM tool loop, can all call charge_customer() at once: 10 agent runs × an uncapped number of tool calls per run = unbounded Stripe spend, with no per-run limit. Second, a stuck LLM reasoning loop (a model that keeps emitting tool calls because each tool result triggers another) is not stopped by Stripe's rate limits, which cap request rate rather than dollars. It continues until the model's context window fills or something outside the loop stops it, such as a timeout or a process restart, and every tool call charges the shared key.

Three gaps FastAPI's native tooling doesn't fill for vendor spend control

Gap	What happens in practice	FastAPI's answer
No per-request spend cap	FastAPI has no concept of per-request vendor API budget. A route handler can call Stripe zero or a thousand times depending on the LLM's reasoning. `BackgroundTasks` dispatches additional work after the response is sent — vendor calls in background tasks continue after the HTTP response returns. Rate limiting middleware (SlowAPI, limits) caps request rate, not vendor spend. There is no built-in hook to inspect or stop vendor API calls made inside async route handlers.	FastAPI rate limiting middleware caps HTTP request rate. No per-request vendor dollar cap exists in the framework.
No mid-request vendor revoke without process restart	The Stripe key is a module-level variable. Rotating it requires either an application restart (new processes get the new key; running requests complete with the old one) or a custom thread-safe key swap (fragile, not part of the Stripe SDK contract). Cancelling the asyncio task with `task.cancel()` raises `CancelledError` only at the task's next `await`, so it cannot interrupt a blocking Stripe SDK call that is already in flight, and a call handed off to a worker thread runs to completion regardless. There is no per-request credential that expires independently of the application process.	No per-request vendor credential. Revoking a key affects all concurrent requests using the same `stripe.api_key`.
No per-call audit with request context	Uvicorn's access log captures request metadata (client address, method, path, status). Stripe's dashboard logs by API key and timestamp. Correlating which FastAPI request triggered which Stripe charge requires adding structured logging with a `request_id` middleware and manually injecting that ID into every Stripe API call's metadata. Teams that don't do this upfront have no way to reconstruct what a specific agent run charged when an incident occurs.	FastAPI middleware can add request IDs. Propagating them into Stripe call metadata and correlating with Stripe's dashboard is manual application work with no built-in support.
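
The cancellation gap in the table can be checked with the standard library alone. The sketch below is an illustration under assumptions, not production code: blocking_charge is a hypothetical stand-in for a blocking Stripe SDK call, run in a worker thread via asyncio.to_thread. Cancelling the task stops the loop from reaching its next step, but the charge already running in the thread still completes.

```python
import asyncio
import time

events: list[str] = []

def blocking_charge() -> None:
    """Hypothetical stand-in for a blocking vendor SDK call."""
    time.sleep(0.05)
    events.append("charge completed")

async def tool_loop() -> None:
    await asyncio.to_thread(blocking_charge)
    events.append("next step")  # never reached once the task is cancelled

async def main() -> None:
    task = asyncio.create_task(tool_loop())
    await asyncio.sleep(0.01)   # the charge is now running in a worker thread
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        events.append("task cancelled")
    await asyncio.sleep(0.1)    # give the worker thread time to finish

asyncio.run(main())
print(events)
```

The task ends with CancelledError, yet the in-flight call is not recalled. A credential that can be revoked on the vendor side of the call is what closes this gap.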

The async concurrency amplification risk

FastAPI's async model is the core strength and the core risk. A single uvicorn worker handles many requests concurrently via asyncio. When 20 agent requests arrive simultaneously, 20 concurrent LLM tool loops are active in the same event loop. Each tool loop can call Stripe on every iteration. An agent making 5 Stripe calls per reasoning step with 10 reasoning steps produces 50 Stripe calls per request, so 20 concurrent requests produce 1,000 Stripe calls in total from one FastAPI instance, many of them in flight at the same time.
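
The multiplication above can be reproduced with a small simulation. This sketch uses only asyncio and replaces the Stripe call with a counter (fake_stripe_call is a hypothetical stand-in); the numbers are the paragraph's illustrative ones: 20 requests, 10 steps, 5 calls per step.

```python
import asyncio

calls_made = 0      # total simulated vendor calls
in_flight = 0       # calls currently waiting on a response
peak_in_flight = 0  # highest concurrency observed

async def fake_stripe_call() -> None:
    """Hypothetical stand-in for one Stripe API call."""
    global calls_made, in_flight, peak_in_flight
    calls_made += 1
    in_flight += 1
    peak_in_flight = max(peak_in_flight, in_flight)
    await asyncio.sleep(0)  # yield to the event loop, as real network I/O would
    in_flight -= 1

async def agent_run(steps: int, calls_per_step: int) -> None:
    """One simulated agent request: each reasoning step fires its calls together."""
    for _ in range(steps):
        await asyncio.gather(*(fake_stripe_call() for _ in range(calls_per_step)))

async def simulate(requests: int = 20, steps: int = 10, calls_per_step: int = 5) -> int:
    await asyncio.gather(*(agent_run(steps, calls_per_step) for _ in range(requests)))
    return calls_made

total = asyncio.run(simulate())
print(total)  # 20 * 10 * 5 = 1000
```

Nothing in the event loop limits either the total or the number of calls in flight; any cap has to be enforced where the vendor call is made.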

FastAPI's BackgroundTasks adds another layer. If your agent endpoint fires background processing after returning the HTTP response, those background tasks continue calling Stripe after the client has already received their reply. The request is "done" from the user's perspective, but vendor spend continues accumulating in the background. Standard rate-limit middleware only sees the original request — it doesn't cap background task calls.

Scoping vault keys per FastAPI agent request

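import os
from typing import Annotated

import httpx
from fastapi import Depends, FastAPI
from pydantic import BaseModel

app = FastAPI()

class AgentRequest(BaseModel):
    # Fields used below; the types are assumptions, adjust to your own schema.
    prompt: str
    run_id: str
    budget_usd: float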

KEYBRAKE_BASE = "https://proxy.keybrake.com"
KEYBRAKE_ADMIN_KEY = os.environ["KEYBRAKE_ADMIN_KEY"]

async def get_vault_key(request: AgentRequest) -> str:
    """FastAPI dependency: issues a per-request vault key."""
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            f"{KEYBRAKE_BASE}/vault/keys",
            headers={"Authorization": f"Bearer {KEYBRAKE_ADMIN_KEY}"},
            json={
                "vendor": "stripe",
                "daily_usd_cap": request.budget_usd,
                "allowed_endpoints": ["POST /v1/payment_intents"],
                "expires_in": "30m",
                "label": f"agent-run-{request.run_id}",
            },
        )
    resp.raise_for_status()
    return resp.json()["vault_key"]

VaultKey = Annotated[str, Depends(get_vault_key)]

@app.post("/agent/run")
async def run_agent(request: AgentRequest, vault_key: VaultKey):
    # Each request gets its own scoped vault key
    result = await agent_loop(
        prompt=request.prompt,
        tools=[make_charge_tool(vault_key)],
    )
    return {"result": result}

def make_charge_tool(vault_key: str):
    async def charge_customer(customer_id: str, amount_cents: int) -> dict:
        async with httpx.AsyncClient() as client:
            resp = await client.post(
                f"{KEYBRAKE_BASE}/stripe/v1/payment_intents",
                headers={"Authorization": f"Bearer {vault_key}"},
                data={  # Stripe's v1 API expects form-encoded bodies, not JSON
                    "amount": amount_cents,
                    "currency": "usd",
                    "customer": customer_id,
                },
            )

        if resp.status_code == 429 and resp.json().get("code") == "cap_exhausted":
            return {
                "error": "spend_cap_reached",
                "message": "Budget exhausted for this agent run. Do not retry.",
            }

        resp.raise_for_status()
        return resp.json()

    return charge_customer

The get_vault_key dependency runs once per request, before the route handler executes. Each request receives a fresh vault key scoped to request.budget_usd, so concurrent requests each get their own independent cap. The make_charge_tool(vault_key) closure captures the per-request vault key, so LLM tool calls within the same request share one accumulating cap regardless of how many times the tool fires. When the cap is exhausted, the tool returns a structured error that instructs the model not to retry. That instruction is advisory: it discourages a retry loop, and even if the model does retry, every further call is rejected by the same exhausted cap.
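
To make the accumulation behaviour concrete, here is a small in-memory model of a per-key cap. It is not Keybrake's implementation: RunBudget and try_charge are hypothetical names, and real enforcement happens in the proxy. The sketch only illustrates that calls made with the same key draw down one shared budget while other keys are unaffected.

```python
from dataclasses import dataclass

@dataclass
class RunBudget:
    """Hypothetical in-memory stand-in for one vault key's spend cap."""
    cap_cents: int
    spent_cents: int = 0

    def try_charge(self, amount_cents: int) -> dict:
        # Reject any charge that would push this key past its cap.
        if self.spent_cents + amount_cents > self.cap_cents:
            return {
                "error": "spend_cap_reached",
                "message": "Budget exhausted for this agent run. Do not retry.",
            }
        self.spent_cents += amount_cents
        return {"status": "ok", "spent_cents": self.spent_cents}

# Two concurrent requests each hold their own budget.
run_a = RunBudget(cap_cents=5_000)
run_b = RunBudget(cap_cents=5_000)

results_a = [run_a.try_charge(2_000) for _ in range(4)]  # runaway loop in request A
result_b = run_b.try_charge(2_000)                       # request B is unaffected
```

In this model the third and fourth charges in request A are refused, while request B's first charge still succeeds.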

How Keybrake fits

Keybrake is the proxy layer between your FastAPI agent tools and Stripe, Twilio, or Resend. The vault key issued per request replaces the shared stripe.api_key that was previously a module-level variable accessible to all concurrent requests. Each FastAPI request gets its own vault key with its own cap — a runaway LLM tool loop in one request hits its cap and stops without affecting any other concurrent agent run. The real Stripe secret stays in Keybrake and never appears in FastAPI logs, uvicorn access logs, or OpenTelemetry traces. Revoking a specific in-flight request's key is a single DELETE /vault/keys/{key_id} call — effective on the next proxied vendor call within that request, without restarting uvicorn, without affecting other requests.


Related questions

How do I prevent the LLM from retrying after cap exhaustion?

Return a structured response from the tool function that explicitly instructs the model not to retry: {"error": "spend_cap_reached", "message": "Budget exhausted for this agent run. Do not retry."}. Most LLM tool-calling implementations (OpenAI function tools, LangChain tools, Pydantic AI tools) surface the tool's return value as a tool message in the conversation. If the model's system prompt includes a general instruction like "If a tool returns spend_cap_reached, stop immediately and report the budget limit to the user," the model is far more likely to end the reasoning loop than to retry. Without this, a model may treat the 429 as a transient failure and call the tool again.
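
Because a prompt instruction is only advisory, it can help to back it with a deterministic stop in the loop itself. The sketch below assumes a simplified loop in which the model's requested tool calls arrive as (name, arguments) pairs. run_tool_calls and fake_charge are hypothetical names, and a real loop would call the LLM between steps.

```python
import asyncio
from typing import Awaitable, Callable

Tool = Callable[..., Awaitable[dict]]

async def run_tool_calls(
    requested: list[tuple[str, dict]],
    tools: dict[str, Tool],
) -> list[dict]:
    """Dispatch tool calls in order, stopping at the first spend_cap_reached result."""
    results: list[dict] = []
    for name, args in requested:
        result = await tools[name](**args)
        results.append(result)
        if result.get("error") == "spend_cap_reached":
            break  # hard stop: later tool calls are never dispatched
    return results

async def fake_charge(customer_id: str, amount_cents: int) -> dict:
    """Hypothetical tool that reports cap exhaustion for large amounts."""
    if amount_cents > 1_000:
        return {"error": "spend_cap_reached", "message": "Do not retry."}
    return {"status": "succeeded"}

requested = [
    ("charge_customer", {"customer_id": "cus_1", "amount_cents": 500}),
    ("charge_customer", {"customer_id": "cus_1", "amount_cents": 5_000}),
    ("charge_customer", {"customer_id": "cus_1", "amount_cents": 500}),
]
results = asyncio.run(run_tool_calls(requested, {"charge_customer": fake_charge}))
```

The third requested call is never dispatched, so the stop does not depend on the model following the instruction.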

How do I track which FastAPI request triggered which Stripe charge?

Use the vault key's label field as your correlation ID. Set it to your request's run_id (or a FastAPI middleware-generated request ID) when issuing the key. Keybrake's audit log records this label on every proxied call. Query the audit log by label = "agent-run-{run_id}" to get all vendor calls made by a specific FastAPI request, with amounts, timestamps, and response codes — without cross-referencing your application logs with Stripe's dashboard.

How does this interact with FastAPI BackgroundTasks?

Pass the vault key explicitly to background tasks rather than letting them inherit a shared key. In your route handler, capture vault_key from the dependency and pass it as an argument to background_tasks.add_task(my_task, vault_key=vault_key). The background task then uses that per-request vault key for any vendor calls — its spend counts against the same cap as the foreground request. This means background task vendor calls are bounded by the same per-request budget. If the background task runs after the vault key's TTL expires, the proxied calls will fail with an expired-key error — size your TTL to cover the expected background task duration.
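
One way to size the TTL is to derive it from the request timeout plus the expected background duration, with some headroom. The helper below is a sketch under assumptions: ttl_for_run is a hypothetical name, the 20% margin is arbitrary, and it assumes expires_in accepts whole-minute strings in the same form as the "30m" value used earlier.

```python
import math

def ttl_for_run(request_timeout_s: float, background_s: float, margin: float = 0.2) -> str:
    """Return an expires_in value in whole minutes covering request plus background work."""
    total_s = (request_timeout_s + background_s) * (1 + margin)
    minutes = max(1, math.ceil(total_s / 60))  # round up, never below one minute
    return f"{minutes}m"

# A 2-minute request timeout followed by up to 15 minutes of background work.
expires_in = ttl_for_run(request_timeout_s=120, background_s=900)
print(expires_in)  # 21m
```

A TTL that is too short makes late background calls fail, while one that is too long leaves an unused key valid for longer than necessary, so the margin is a trade-off to tune.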

Can I use this with OpenAI function tools registered in a FastAPI endpoint?

Yes. The vault key is passed into the tool function's closure, so the OpenAI SDK doesn't need to know about it. Define your tool functions as closures that capture vault_key and keep them in a name-to-function mapping. The tools=[...] argument of client.chat.completions.create() takes the tools' JSON schemas, not the Python functions themselves. When the model emits a tool call, your dispatch logic looks up the closure by name and calls it with the model's arguments. The closure uses the captured vault key for vendor API calls through the Keybrake proxy. The model receives the tool result and spend accumulates against the cap; once the cap is exhausted, the tool returns the structured error message, which tells the model to stop.
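
A minimal sketch of that dispatch step, kept free of SDK calls so it runs on its own. It assumes each tool call has been reduced to a function name and a JSON-encoded arguments string, which is how OpenAI tool calls carry them. make_tools, dispatch, and the key value are hypothetical, and the vendor call is stubbed out.

```python
import asyncio
import json

def make_tools(vault_key: str) -> dict:
    """Build the per-request dispatch table; each closure captures vault_key."""
    async def charge_customer(customer_id: str, amount_cents: int) -> dict:
        # A real tool would POST to the proxy with this key (stubbed out here).
        return {
            "customer": customer_id,
            "amount": amount_cents,
            "key_suffix": vault_key[-4:],
        }
    return {"charge_customer": charge_customer}

async def dispatch(tools: dict, name: str, arguments_json: str) -> dict:
    """Look up the closure by tool name and call it with the model's arguments."""
    if name not in tools:
        return {"error": "unknown_tool", "message": f"No tool named {name}."}
    return await tools[name](**json.loads(arguments_json))

tools = make_tools("vk_example_1234")  # hypothetical key value
result = asyncio.run(
    dispatch(tools, "charge_customer", '{"customer_id": "cus_1", "amount_cents": 500}')
)
```

Building the table inside the route handler, once per request, keeps each key confined to the tool calls of its own agent run.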