Flask AI agent API key management: scoped vault keys for agent tool functions
Flask is a common Python choice for building AI agent tool backends: a lightweight REST API that receives tool calls from a LangChain, OpenAI Agents SDK, or CrewAI agent, executes Python code, calls Stripe or Twilio, and returns a result. The problem with API keys in this pattern is structural: os.environ['STRIPE_KEY'] is read once at process startup and shared across every concurrent request. If twenty users run agents at the same time, all twenty use the same Stripe credential. There's no per-user spend cap, no per-request endpoint scope, and no way to stop one user's misbehaving agent without rotating the key for all users. Vault keys solve this by adding per-request credential scoping as a thin layer in front of your existing tool functions.
TL;DR
Store the Keybrake API token in your Flask config (instead of the raw Stripe secret). Add a before_request hook that issues a vault key scoped to the current user's session and stores it in Flask's g object. Tool functions read the vault key from g.vault_key and call https://proxy.keybrake.com/stripe/... instead of the Stripe API directly. Add a teardown_request hook that revokes the vault key when the request ends. No changes to tool function signatures; the vault key flows through Flask's request context.
The Flask AI agent pattern
A typical Flask AI agent tool backend looks like this:
```python
import os

import stripe
from flask import Flask, request, jsonify

app = Flask(__name__)
stripe.api_key = os.environ["STRIPE_KEY"]  # shared across ALL requests

@app.post("/tools/charge")
def charge_tool():
    data = request.get_json()
    intent = stripe.PaymentIntent.create(
        amount=data["amount_cents"],
        currency="usd",
        customer=data["customer_id"]
    )
    return jsonify(intent)
```
This is clean and simple. The problem: stripe.api_key is a module-level global set to the same value for every concurrent request. A buggy agent in one user's session can create charges on behalf of any customer, can call stripe.Refund.create() just as easily as stripe.PaymentIntent.create(), and there's no way to stop that user's agent without rotating the key and disrupting every other active session.
Adding vault keys via Flask request context
The vault key pattern integrates with Flask through before_request / teardown_request hooks and Flask's g context object:
```python
import os

import requests
from flask import Flask, request, jsonify, g

app = Flask(__name__)
KEYBRAKE_TOKEN = os.environ["KEYBRAKE_TOKEN"]

@app.before_request
def issue_vault_key():
    # Skip for health check routes that don't call vendor APIs
    if request.path in ["/health", "/metrics"]:
        return
    user_id = request.headers.get("X-User-Id", "anonymous")
    run_id = request.headers.get("X-Agent-Run-Id", "unknown")
    resp = requests.post(
        "https://api.keybrake.com/v1/keys",
        headers={"Authorization": f"Bearer {KEYBRAKE_TOKEN}"},
        json={
            "label": f"flask-{user_id}-{run_id}",
            "vendor": "stripe",
            "allowed_endpoints": [
                "/v1/payment_intents",
                "/v1/payment_intents/*"
            ],
            # The key lives for a single request, so the daily cap
            # effectively acts as a per-request cap
            "daily_usd_cap": 1000,
            "expires_in": "5m"
        },
        timeout=5
    )
    resp.raise_for_status()
    key_data = resp.json()
    g.vault_key = key_data["token"]
    g.vault_key_id = key_data["id"]

@app.teardown_request
def revoke_vault_key(exc):
    if hasattr(g, "vault_key_id"):
        try:
            requests.delete(
                f"https://api.keybrake.com/v1/keys/{g.vault_key_id}",
                headers={"Authorization": f"Bearer {KEYBRAKE_TOKEN}"},
                timeout=5
            )
        except Exception:
            pass  # Key will expire via TTL if revocation fails

@app.post("/tools/charge")
def charge_tool():
    data = request.get_json()
    # Use the vault key from request context, not the shared stripe.api_key
    resp = requests.post(
        "https://proxy.keybrake.com/stripe/v1/payment_intents",
        headers={"Authorization": f"Bearer {g.vault_key}"},
        json={
            "amount": data["amount_cents"],
            "currency": "usd",
            "customer": data["customer_id"]
        },
        timeout=10
    )
    # The proxy signals an exhausted spend cap with 429 + code "cap_exhausted"
    if resp.status_code == 429 and resp.json().get("code") == "cap_exhausted":
        return jsonify({"error": "spend_cap_exceeded"}), 402
    resp.raise_for_status()
    return jsonify(resp.json())
```
Vault keys with Flask-based LangChain tool servers
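When Flask serves as a LangChain custom tool backend, vault key issuance integrates cleanly with the tool server's request lifecycle. The agent sends POST requests carrying the tool call arguments to your Flask endpoint (for example through LangChain's RemoteRunnable, if your endpoints follow the LangServe protocol). The vault key is issued in before_request, flows through g, and is revoked in teardown_request, so the LangChain tool definitions don't need to change. One constraint: the tool must execute inside the Flask request that issued the key, because reading g outside an active application context raises a RuntimeError.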
```python
import requests
from flask import g
from langchain.tools import tool

# LangChain tool that uses the per-request vault key from Flask's g.
# It must be invoked while a Flask request is being handled.
@tool
def charge_customer(customer_id: str, amount_cents: int) -> dict:
    """Create a Stripe payment intent for a customer."""
    resp = requests.post(
        "https://proxy.keybrake.com/stripe/v1/payment_intents",
        headers={"Authorization": f"Bearer {g.vault_key}"},
        json={"amount": amount_cents, "currency": "usd", "customer": customer_id},
        timeout=10
    )
    resp.raise_for_status()
    return resp.json()
```
The tool function doesn't know or care that the credential in g.vault_key is a vault key rather than a raw Stripe secret — it sends the same Authorization: Bearer header either way. The proxy handles the policy enforcement and audit logging transparently.
Flask with Celery: vault keys for async agent tasks
Many Flask AI agent backends use Celery for async task processing — the Flask endpoint enqueues a task, and a Celery worker processes it asynchronously. Vault keys require a different lifecycle approach for async tasks, since Flask's g context doesn't persist across the queue boundary:
```python
import os

import requests
from celery import Celery

celery_app = Celery("tasks", broker="redis://localhost:6379/0")
KEYBRAKE_TOKEN = os.environ["KEYBRAKE_TOKEN"]

@celery_app.task(bind=True)
def process_billing_task(self, run_id: str, customer_ids: list):
    # Issue vault key at the START of the Celery task, not in Flask's before_request
    resp = requests.post(
        "https://api.keybrake.com/v1/keys",
        headers={"Authorization": f"Bearer {KEYBRAKE_TOKEN}"},
        json={
            "label": f"celery-billing-{run_id}",
            "vendor": "stripe",
            "allowed_endpoints": ["/v1/payment_intents", "/v1/payment_intents/*"],
            "daily_usd_cap": 5000,
            "expires_in": "30m"  # Longer TTL for async tasks
        },
        timeout=5
    )
    resp.raise_for_status()
    key_data = resp.json()
    vault_key = key_data["token"]
    key_id = key_data["id"]
    try:
        for customer_id in customer_ids:
            # charge_via_proxy is assumed to be defined elsewhere; it posts to
            # the Keybrake proxy using vault_key as the bearer token
            charge_via_proxy(customer_id, vault_key)
    finally:
        try:
            requests.delete(
                f"https://api.keybrake.com/v1/keys/{key_id}",
                headers={"Authorization": f"Bearer {KEYBRAKE_TOKEN}"},
                timeout=5
            )
        except requests.RequestException:
            pass  # Key will expire via TTL if revocation fails
```
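The issue/try/finally/revoke shape of the task above can be factored into a context manager so every Celery task gets the same lifecycle. The following is a minimal sketch, not part of any Keybrake SDK: scoped_vault_key and its http parameter are hypothetical names introduced here, and the request body reuses only the fields shown in the task above.

```python
from contextlib import contextmanager

import requests

KEYS_URL = "https://api.keybrake.com/v1/keys"


@contextmanager
def scoped_vault_key(api_token, label, ttl="30m", http=requests):
    """Issue a vault key, yield its token, and always attempt revocation.

    `http` is a hypothetical injection point (anything with post/delete),
    so the lifecycle can be exercised in tests without network access.
    """
    auth = {"Authorization": f"Bearer {api_token}"}
    resp = http.post(
        KEYS_URL,
        headers=auth,
        json={"label": label, "vendor": "stripe", "expires_in": ttl},
        timeout=5,
    )
    resp.raise_for_status()
    key_data = resp.json()
    try:
        yield key_data["token"]
    finally:
        try:
            http.delete(f"{KEYS_URL}/{key_data['id']}", headers=auth, timeout=5)
        except requests.RequestException:
            pass  # The TTL remains the safety net if revocation fails
```

Inside a task, the body then becomes a with block: `with scoped_vault_key(KEYBRAKE_TOKEN, f"celery-billing-{run_id}") as vault_key:` followed by the charge loop. Revocation is attempted even when a charge raises.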
Latency and concurrency considerations
| Concern | Impact | Mitigation |
|---|---|---|
| Vault key issuance latency | ~30–80ms per request (HTTPS API call to api.keybrake.com) | Use async issuance (aiohttp/httpx async) for high-throughput endpoints; or pre-issue keys for session-based workflows with a 15-minute TTL |
| Issuance failure | before_request fails → Flask returns 500 for all agent requests | Wrap issuance in try/except; on failure, log and optionally fall back to rate-limited direct Stripe call; alert on error rate |
| Revocation failure | teardown_request silently fails → key expires via TTL | TTL is the safety net; revocation failure is non-critical if TTL is short (5 minutes) |
| Concurrent workers sharing KEYBRAKE_TOKEN | All workers use the same Keybrake API token to issue vault keys | Expected — the Keybrake API token is the master credential; vault keys are the per-request credentials. Each worker issues its own vault keys independently. |
Related questions
Can I use the vault key approach with Flask-RESTful or Flask-RESTX?
Yes. Flask-RESTful and Flask-RESTX are both built on top of Flask and use the same request lifecycle, so app-level before_request and teardown_request hooks run for their resources exactly as they do for plain view functions. Resource methods can read g.vault_key the same way plain Flask view functions do. Flask-RESTX namespaces do not provide a before_request decorator of their own. To issue vault keys only on routes that actually call vendor APIs, register the API on a Flask Blueprint and attach the hook with the blueprint's before_request decorator, check request.path or request.blueprint inside the app-level hook, or wrap the relevant resources through the namespace's decorators argument or a Resource's method_decorators.
What if my Flask agent makes multiple tool calls within a single request — does it need multiple vault keys?
No. One vault key per request is the right model. The vault key's spend cap accumulates across all calls made with that key during the request — so if your agent makes three Stripe calls totaling $300 and the cap is $1,000, all three are allowed and the cumulative spend is tracked. You only need multiple vault keys if different tool calls within the same request need different vendor scopes (e.g., some calls go to Stripe, some to Twilio) — in that case, issue a separate vault key per vendor in before_request and store them as g.stripe_vault_key and g.twilio_vault_key.
How do I test vault key integration in Flask unit tests?
For unit tests of individual tool functions, push a request context with app.test_request_context(), set g.vault_key directly, and call the function; before_request hooks do not run in that mode. Requests made through Flask's test client do run before_request, so create a test configuration that sets USE_VAULT_KEYS = False and gate the hook on that flag. The tool functions can then fall back to direct Stripe test mode calls, and the tests never reach the Keybrake API. For integration tests that verify the vault key enforcement behavior itself (spend cap blocking, endpoint enforcement), use the Keybrake API's test mode, which uses test-mode vault keys that proxy to Stripe's test environment.
Further reading
- FastAPI AI agent API key — the same vault key pattern for FastAPI's async ASGI request model, which handles per-request context differently from Flask's WSGI model.
- LangChain Stripe API key — how to configure a LangChain agent's Stripe tool to use vault keys instead of direct API keys, including CrewAI and AutoGen patterns.
- AI agent API key lifecycle — the four lifecycle phases (issuance, enforcement, expiration, revocation) and how they map to Flask's request lifecycle hooks.
- AI agent credential management — broader credential management architecture for Python-based agent backends including environment variable management and secret rotation.
- Celery AI agent API key — vault key lifecycle for Celery workers used in async Flask agent architectures, where keys must be issued at task start rather than at request start.