
Flask AI agent API key management: scoped vault keys for agent tool functions

Flask is a common Python choice for building AI agent tool backends: a lightweight REST API that receives tool calls from a LangChain, OpenAI Agents SDK, or CrewAI agent, executes Python code, calls Stripe or Twilio, and returns a result. The problem with API keys in this pattern is structural: os.environ['STRIPE_KEY'] is read once at process startup and shared across every concurrent request. If twenty users run agents simultaneously, all twenty use the same Stripe credential. There's no per-user spend cap, no per-request endpoint scope, and no way to stop one user's misbehaving agent without rotating the key for all users. Vault keys solve this by adding per-request credential scoping as a thin layer in front of your existing tool functions.

TL;DR

Store the Keybrake API token in your Flask config (instead of the raw Stripe secret). Add a before_request hook that issues a vault key scoped to the current user and agent run and stores it in Flask's g object. Tool functions read the vault key from g.vault_key and call https://proxy.keybrake.com/stripe/... instead of the Stripe API directly. Add a teardown_request hook that revokes the vault key; unlike after_request, it also runs when the view raises an exception. No changes to tool function signatures; the vault key flows through Flask's request context.

The Flask AI agent pattern

A typical Flask AI agent tool backend looks like this:

import os
import stripe
from flask import Flask, request, jsonify

app = Flask(__name__)
stripe.api_key = os.environ["STRIPE_KEY"]  # shared across ALL requests

@app.post("/tools/charge")
def charge_tool():
    data = request.get_json()
    intent = stripe.PaymentIntent.create(
        amount=data["amount_cents"],
        currency="usd",
        customer=data["customer_id"]
    )
    return jsonify(intent)

This is clean and simple. The problem: stripe.api_key is a module-level global set to the same value for every concurrent request. A buggy agent in one user's session can create charges on behalf of any customer, can call stripe.Refund.create() just as easily as stripe.PaymentIntent.create(), and there's no way to stop that user's agent without rotating the key and disrupting every other active session.

Adding vault keys via Flask request context

The vault key pattern integrates with Flask through before_request / teardown_request hooks and Flask's g context object:

import os, requests
from flask import Flask, request, jsonify, g

app = Flask(__name__)
KEYBRAKE_TOKEN = os.environ["KEYBRAKE_TOKEN"]

@app.before_request
def issue_vault_key():
    # Skip for health check routes that don't call vendor APIs
    if request.path in ["/health", "/metrics"]:
        return

    user_id = request.headers.get("X-User-Id", "anonymous")
    run_id = request.headers.get("X-Agent-Run-Id", "unknown")

    resp = requests.post(
        "https://api.keybrake.com/v1/keys",
        headers={"Authorization": f"Bearer {KEYBRAKE_TOKEN}"},
        json={
            "label": f"flask-{user_id}-{run_id}",
            "vendor": "stripe",
            "allowed_endpoints": [
                "/v1/payment_intents",
                "/v1/payment_intents/*"
            ],
            "daily_usd_cap": 1000,  # daily cap on this key; the key itself only lives for one request
            "expires_in": "5m"
        },
        timeout=5
    )
    resp.raise_for_status()
    key_data = resp.json()
    g.vault_key = key_data["token"]
    g.vault_key_id = key_data["id"]

@app.teardown_request
def revoke_vault_key(exc):
    if hasattr(g, "vault_key_id"):
        try:
            requests.delete(
                f"https://api.keybrake.com/v1/keys/{g.vault_key_id}",
                headers={"Authorization": f"Bearer {KEYBRAKE_TOKEN}"},
                timeout=5
            )
        except Exception:
            pass  # Key will expire via TTL if revocation fails

@app.post("/tools/charge")
def charge_tool():
    data = request.get_json()
    # Use vault key from request context — not the shared stripe.api_key
    resp = requests.post(
        "https://proxy.keybrake.com/stripe/v1/payment_intents",
        headers={"Authorization": f"Bearer {g.vault_key}"},
        json={
            "amount": data["amount_cents"],
            "currency": "usd",
            "customer": data["customer_id"]
        },
        timeout=10  # without a timeout, a stalled proxy call would block the worker
    )
    if resp.status_code == 429 and resp.json().get("code") == "cap_exhausted":
        return jsonify({"error": "spend_cap_exceeded"}), 402
    resp.raise_for_status()
    return jsonify(resp.json())
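
The cap check above calls resp.json() before confirming that the body is JSON, so a non-JSON error page from the proxy would raise inside the view. A more defensive mapping might look like the sketch below. The helper name map_proxy_response and the 502 fallback are assumptions made for illustration; only the 429 status and the cap_exhausted code come from the example above.

```python
import json
from typing import Any, Dict, Tuple


def map_proxy_response(status_code: int, body_text: str) -> Tuple[Dict[str, Any], int]:
    """Translate a proxy response into a (payload, HTTP status) pair for the agent."""
    try:
        parsed = json.loads(body_text)
    except ValueError:
        parsed = None  # body was not JSON, e.g. an HTML error page

    # Only JSON objects are treated as usable bodies in this sketch.
    body = parsed if isinstance(parsed, dict) else {}

    # Spend cap hit: surface it as 402 so the agent can stop retrying.
    if status_code == 429 and body.get("code") == "cap_exhausted":
        return {"error": "spend_cap_exceeded"}, 402

    # Success: pass the vendor payload through unchanged.
    if 200 <= status_code < 300:
        return body, status_code

    # Anything else is reported as an upstream failure (assumed convention).
    return {"error": "upstream_error", "upstream_status": status_code}, 502
```

Inside charge_tool this would be used as payload, status = map_proxy_response(resp.status_code, resp.text), followed by return jsonify(payload), status.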

Vault keys with Flask-based LangChain tool servers

When Flask serves as the backend for LangChain custom tools, vault key issuance fits the tool server's request lifecycle. The agent side, for example a LangChain RemoteRunnable or another HTTP-based tool, sends POST requests to your Flask endpoint with the tool call arguments. The vault key is issued in before_request, flows through g, and is revoked in teardown_request, so the LangChain tool definitions do not need to change:

from langchain.tools import tool
from flask import g
import requests

# LangChain tool that uses the per-request vault key from Flask's g.
# It must be invoked while a Flask request context is active.
@tool
def charge_customer(customer_id: str, amount_cents: int) -> dict:
    """Create a Stripe payment intent for a customer."""
    resp = requests.post(
        "https://proxy.keybrake.com/stripe/v1/payment_intents",
        headers={"Authorization": f"Bearer {g.vault_key}"},
        json={"amount": amount_cents, "currency": "usd", "customer": customer_id},
        timeout=10
    )
    resp.raise_for_status()
    return resp.json()

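The tool function doesn't know or care that the credential in g.vault_key is a vault key rather than a raw Stripe secret: it sends the same Authorization: Bearer header either way. The proxy handles policy enforcement and audit logging. One constraint applies: g exists only inside a Flask request context, so the tool must be invoked from within a request handler (or a test request context), not from a background thread or worker.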
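
Because g is only populated inside a Flask request context, a tool that reads g.vault_key directly is awkward to unit-test. One option, sketched here, is to keep the request construction in a plain function that takes the key as an argument and let the tool pass g.vault_key in. The name build_charge_request, the positive-amount check, and the 10-second timeout are assumptions for illustration; the URL and payload shape are copied from the example above.

```python
from typing import Any, Dict

PROXY_URL = "https://proxy.keybrake.com/stripe/v1/payment_intents"


def build_charge_request(vault_key: str, customer_id: str, amount_cents: int) -> Dict[str, Any]:
    """Return keyword arguments for requests.post(); performs no network I/O."""
    if amount_cents <= 0:
        raise ValueError("amount_cents must be positive")
    return {
        "url": PROXY_URL,
        "headers": {"Authorization": f"Bearer {vault_key}"},
        "json": {"amount": amount_cents, "currency": "usd", "customer": customer_id},
        "timeout": 10,  # assumed value; tune for your deployment
    }
```

Inside the tool, the call becomes requests.post(**build_charge_request(g.vault_key, customer_id, amount_cents)), and the builder can be tested without Flask or a network connection.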

Flask with Celery: vault keys for async agent tasks

Many Flask AI agent backends use Celery for async task processing — the Flask endpoint enqueues a task, and a Celery worker processes it asynchronously. Vault keys require a different lifecycle approach for async tasks, since Flask's g context doesn't persist across the queue boundary:

import os

import requests
from celery import Celery

celery_app = Celery("tasks", broker="redis://localhost:6379/0")
KEYBRAKE_TOKEN = os.environ["KEYBRAKE_TOKEN"]

@celery_app.task(bind=True)
def process_billing_task(self, run_id: str, customer_ids: list):
    # Issue vault key at the START of the Celery task, not in Flask's before_request
    resp = requests.post(
        "https://api.keybrake.com/v1/keys",
        headers={"Authorization": f"Bearer {KEYBRAKE_TOKEN}"},
        json={
            "label": f"celery-billing-{run_id}",
            "vendor": "stripe",
            "allowed_endpoints": ["/v1/payment_intents", "/v1/payment_intents/*"],
            "daily_usd_cap": 5000,
            "expires_in": "30m"  # Longer TTL for async tasks
        },
        timeout=5
    )
    resp.raise_for_status()  # fail the task early if no key was issued
    key_data = resp.json()
    vault_key = key_data["token"]
    key_id = key_data["id"]

    try:
        for customer_id in customer_ids:
            # charge_via_proxy is assumed to be defined elsewhere; it calls the
            # Keybrake proxy with the vault key, as charge_tool does
            charge_via_proxy(customer_id, vault_key)
    finally:
        try:
            requests.delete(
                f"https://api.keybrake.com/v1/keys/{key_id}",
                headers={"Authorization": f"Bearer {KEYBRAKE_TOKEN}"},
                timeout=5
            )
        except requests.RequestException:
            pass  # Key will expire via TTL if revocation fails
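
The issue, try, finally shape in the task above can be factored into a context manager so that every task gets the same revoke-on-exit behavior. This is a sketch under assumptions: scoped_vault_key is a hypothetical helper, and the issue and revoke callables stand in for the POST and DELETE calls to the Keybrake API (they are injected so the helper can be exercised without network access).

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


@contextmanager
def scoped_vault_key(
    issue: Callable[[], Dict[str, str]],
    revoke: Callable[[str], None],
) -> Iterator[str]:
    """Issue a vault key, yield its token, and always attempt revocation."""
    key = issue()  # expected to return {"token": ..., "id": ...}
    try:
        yield key["token"]
    finally:
        try:
            revoke(key["id"])
        except Exception:
            pass  # best effort: the key's TTL is the fallback
```

A task body would then read with scoped_vault_key(issue_fn, revoke_fn) as vault_key: followed by the charging loop. Revocation is attempted whether the loop finishes or raises, and a revocation error does not mask the original exception.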

Latency and concurrency considerations

Concern	Impact	Mitigation
Vault key issuance latency	~30–80ms per request (HTTPS API call to api.keybrake.com)	Use async issuance (aiohttp/httpx async) for high-throughput endpoints; or pre-issue keys for session-based workflows with a 15-minute TTL
Issuance failure	before_request fails → Flask returns 500 for all agent requests	Wrap issuance in try/except; on failure, log and optionally fall back to rate-limited direct Stripe call; alert on error rate
Revocation failure	teardown_request silently fails → key expires via TTL	TTL is the safety net; revocation failure is non-critical if TTL is short (5 minutes)
Concurrent workers sharing KEYBRAKE_TOKEN	All workers use the same Keybrake API token to issue vault keys	Expected — the Keybrake API token is the master credential; vault keys are the per-request credentials. Each worker issues its own vault keys independently.
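
The pre-issuance mitigation in the first row amounts to a small per-session cache: reuse a key until shortly before its TTL elapses, then issue a new one. A minimal in-process sketch follows. The class name VaultKeyCache, its parameters, and the 60-second safety margin are assumptions; the issue callable stands in for the POST to the Keybrake keys endpoint and is expected to return the token string.

```python
import time
from typing import Callable, Dict, Tuple


class VaultKeyCache:
    """Reuse one vault key per session until shortly before its TTL elapses."""

    def __init__(
        self,
        issue: Callable[[str], str],
        ttl_seconds: float = 900.0,    # 15 minutes, matching the table
        margin_seconds: float = 60.0,  # refresh early to avoid using a dying key
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._issue = issue
        self._ttl = ttl_seconds
        self._margin = margin_seconds
        self._clock = clock
        # session_id -> (token, expiry time on the injected clock)
        self._keys: Dict[str, Tuple[str, float]] = {}

    def get(self, session_id: str) -> str:
        now = self._clock()
        cached = self._keys.get(session_id)
        if cached is not None and now < cached[1] - self._margin:
            return cached[0]  # still comfortably inside the TTL
        token = self._issue(session_id)
        self._keys[session_id] = (token, now + self._ttl)
        return token
```

The trade-off is scope: a cached key outlives a single request, so per-request revocation in teardown_request no longer applies and the TTL becomes the main bound. The cache is also local to one worker process, so each worker issues its own key per session, and it never evicts entries, which a long-running worker would need to handle.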
