AI agent API key best practices: the 7-control checklist for production
Most API key guidance is written for humans — developers who rotate keys on a schedule, notice spend on a weekly billing review, and can be reached on Slack when something goes wrong. AI agents don't follow that model. They can issue thousands of API calls in the time it takes a human to open a browser tab. Here's the checklist that accounts for that difference.
TL;DR
Seven controls: restricted scope (endpoint allowlist), per-agent keys (not shared), per-run expiry (auto-rotate at run end), pre-call spend cap (not a billing alert), sub-second revoke (not a vendor rotation), queryable audit log (with agent context, not just request IDs), and no key in agent context (vault key pattern). Most production agents hitting money-moving APIs need all seven. For Stripe, Twilio, and Resend specifically, the vendor-native features cover controls 1–2 only partially (endpoint restriction is Stripe-only) and leave controls 3–7 almost entirely uncovered; per the checklist table, the one exception is Resend's instant revoke.
Why standard key hygiene isn't enough for agents
Standard key hygiene: use restricted keys, rotate quarterly, store in a secrets manager, never commit to git. That's the right baseline for human-operated services. It falls short for agents for three reasons:
- Agents loop. A human developer makes one API call, checks the result, then makes the next. An agent in a stuck reasoning loop can make the same call hundreds of times in a minute, so the blast radius of a misconfigured key is much larger.
- Agents are multi-instance. A scheduled agent, a concurrent web request handler, and a background job can all hold the same key simultaneously. Revoking the key to stop one breaks all three.
- Agents are opaque. When a developer calls an API, the call shows up in logs with the developer's IP and user agent. When an agent calls an API, you see the agent process's IP — no agent identity, no run ID, no way to join the vendor's log to your application log without additional instrumentation.
Control 1 — Restricted scope (endpoint allowlist)
The agent should only be able to call the endpoints it actually needs. For a billing agent: payment_intents.create and customers.retrieve. Not refunds.create. Not account.update. Not payouts.create.
Vendor support: Stripe restricted keys support endpoint-level restrictions. Twilio AuthTokens and Resend API keys do not — they're all-or-nothing per account.
Gap: endpoint restriction is path-level, not parameter-level. You can restrict to POST /v1/payment_intents but you can't say "only for customer IDs in this list." A proxy-enforced allowlist can add parameter-level checks the vendor's native restriction doesn't support.
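As a rough illustration of what a proxy-side allowlist with parameter checks could look like, here is a minimal sketch. The `AllowRule` class, the `is_allowed` function, and the customer IDs are all hypothetical names made up for this example; they are not part of any vendor's API.

```python
import re
from dataclasses import dataclass, field


@dataclass
class AllowRule:
    """One allowlist entry. All field names here are illustrative."""
    method: str
    path_pattern: str  # regex matched against the whole request path
    # Hypothetical parameter constraints: parameter name -> permitted values.
    allowed_params: dict = field(default_factory=dict)


def is_allowed(rules, method, path, params):
    """Return True if some rule permits this method, path, and parameters."""
    for rule in rules:
        if rule.method != method.upper():
            continue
        if re.fullmatch(rule.path_pattern, path) is None:
            continue
        # Every constrained parameter must be present with a permitted value.
        if all(params.get(name) in allowed
               for name, allowed in rule.allowed_params.items()):
            return True
    return False


# Example policy for a billing agent (customer IDs are made up).
BILLING_RULES = [
    AllowRule("POST", r"/v1/payment_intents", {"customer": {"cus_123", "cus_456"}}),
    AllowRule("GET", r"/v1/customers/[^/]+"),
]
```

With these rules, a `POST /v1/payment_intents` for `cus_123` passes, the same call for an unlisted customer is rejected, and anything under `/v1/refunds` never matches a rule at all. The default is deny: a request is blocked unless a rule explicitly allows it.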
Control 2 — Per-agent keys (not shared)
Each agent role should have its own key. Billing Agent gets a billing key. Refund Agent gets a refund key. Notification Agent gets a notification key.
Why: if the Refund Agent is compromised, you revoke the refund key. The Billing Agent keeps running. If they share a key, you've taken down two production paths to stop one incident.
In practice: most frameworks default to one key per application, not one key per agent. This requires intentional setup — separate Stripe restricted keys or separate vault keys per agent role.
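One way to make the per-role setup explicit is to resolve the key by agent role and refuse to fall back to a shared key. The sketch below assumes keys are supplied through environment variables; the variable names and the `key_for_role` helper are hypothetical.

```python
import os

# Hypothetical environment variable names, one per agent role.
ROLE_KEY_ENV = {
    "billing": "BILLING_AGENT_KEY",
    "refund": "REFUND_AGENT_KEY",
    "notification": "NOTIFICATION_AGENT_KEY",
}


def key_for_role(role, env=None):
    """Return the key dedicated to one agent role, or fail loudly."""
    env = os.environ if env is None else env
    if role not in ROLE_KEY_ENV:
        raise ValueError(f"unknown agent role: {role!r}")
    var = ROLE_KEY_ENV[role]
    key = env.get(var)
    if not key:
        # No silent fallback: a shared default key would defeat the control.
        raise RuntimeError(f"{var} is not set; refusing to use a shared key")
    return key
```

Failing hard on a missing key is the design choice that matters here: a quiet fallback to an application-wide key is how two agents end up sharing one credential without anyone deciding they should.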
Control 3 — Per-run expiry
Long-lived keys accumulate exfiltration risk. An agent run that completes in 30 minutes doesn't need a key that's valid for 90 days. Issue the key with an expires_in that matches the expected run duration, so the key becomes inactive automatically once that window elapses, shortly after the run ends.
Vendor support: Stripe, Twilio, and Resend don't support key expiry natively — keys are valid until manually revoked. Per-run expiry requires a proxy that issues ephemeral credentials mapped to the real key.
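A minimal sketch of the expiry half of that proxy, assuming an in-memory store and a `vault_key_` token prefix. The `EphemeralKeyStore` class and its method names are hypothetical; a real deployment would need shared, persistent state.

```python
import secrets
import time


class EphemeralKeyStore:
    """Issues short-lived tokens and checks whether they are still valid."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so expiry can be tested without sleeping
        self._deadline = {}   # token -> time after which it is inactive

    def issue(self, expires_in):
        """Create a token that stays active for expires_in seconds."""
        token = "vault_key_" + secrets.token_urlsafe(16)
        self._deadline[token] = self._clock() + expires_in
        return token

    def is_active(self, token):
        """Unknown tokens and expired tokens are both treated as inactive."""
        deadline = self._deadline.get(token)
        return deadline is not None and self._clock() < deadline
```

For the 30-minute run in the example above, the agent would receive `store.issue(expires_in=1800)`, and the proxy would call `is_active` before forwarding each request.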
Control 4 — Pre-call spend cap (not a billing alert)
A billing alert fires after spend has already happened — typically 1–24 hours after the threshold is crossed. A spend cap blocks the call before it goes to the vendor. For a stuck loop, the difference is the entire blast radius beyond the cap.
Vendor support: Twilio has a spend threshold alert (after-the-fact). Stripe has Radar Rules (fraud product, not designed for this). Resend has no spend control at all. Pre-call enforcement requires a proxy.
What to set the cap to: 2–3× the expected daily spend for the agent. High enough not to trigger on legitimate peak usage; low enough that a stuck loop hits it before it's a billing problem.
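The difference between an alert and a cap is where the check sits: before the vendor call, not after. A minimal sketch of that check follows; `DailySpendCap`, `SpendCapExceeded`, and `suggested_cap` are hypothetical names, and the daily reset is left out for brevity.

```python
from decimal import Decimal


class SpendCapExceeded(Exception):
    """Raised instead of forwarding a call that would exceed the cap."""


def suggested_cap(expected_daily_usd, multiplier=3):
    """Cap as a multiple of expected daily spend (the text suggests 2-3x)."""
    return Decimal(str(expected_daily_usd)) * multiplier


class DailySpendCap:
    def __init__(self, daily_usd_cap):
        self.cap = Decimal(str(daily_usd_cap))
        self.spent = Decimal("0")

    def authorize(self, amount_usd):
        """Reserve budget for one call, or raise before anything is sent."""
        amount = Decimal(str(amount_usd))
        if self.spent + amount > self.cap:
            raise SpendCapExceeded(
                f"{amount} would exceed the cap: {self.spent} of {self.cap} used"
            )
        self.spent += amount
```

The proxy would call `authorize` first and only forward the request if it returns. A stuck loop then fails on the first call that crosses the cap, and every call after it, without the vendor ever seeing them.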
Control 5 — Sub-second revoke
Revoking a Stripe key takes 1–5 minutes to propagate. At a stuck loop's call rate (hundreds of calls/minute), that's hundreds to thousands of additional calls after you click Delete. Sub-second revoke requires either a proxy (which can mark the vault key inactive instantly) or a circuit-breaker flag (which requires your code to check it before every call).
Why it matters: incident response at 2am is already hard. Knowing that your revoke takes effect in one second instead of three minutes changes the playbook from "minimize damage" to "prevent damage."
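To put numbers on that, and to show the circuit-breaker variant, here is a small sketch. The `KillSwitch` class is a hypothetical in-process flag; agents running in several processes would need the flag in a shared store instead. The 300 calls/minute rate used below is an assumed figure for illustration.

```python
import threading


class KeyRevoked(Exception):
    """Raised when a call is attempted after the kill switch was thrown."""


class KillSwitch:
    """In-process circuit-breaker flag, checked before every outbound call."""

    def __init__(self):
        self._revoked = threading.Event()

    def revoke(self):
        self._revoked.set()

    def guard(self):
        """Call this immediately before each vendor request."""
        if self._revoked.is_set():
            raise KeyRevoked("kill switch is on; call blocked")


def calls_after_revoke(calls_per_minute, propagation_seconds):
    """Estimate how many calls still get through while a revoke propagates."""
    return calls_per_minute * propagation_seconds / 60
```

At an assumed 300 calls per minute, a three-minute propagation lets through `calls_after_revoke(300, 180)` = 900 calls, while a one-second revoke lets through 5.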
Control 6 — Queryable audit log with agent context
After an incident, you need to answer: which agent made this call, in which run, at what time, for which amount, and did it trip any policy? Stripe's request log gives you request_id, timestamp, and response status. It doesn't know about your agent_run_id or agent_name.
A proxy audit log adds those columns. The query that matters most:
```sql
SELECT ts, endpoint, amount_usd, policy_check_result
FROM calls
WHERE metadata->>'agent_run_id' = 'run_xyz'
  AND policy_check_result != 'allowed'
ORDER BY ts DESC;
```
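This returns every call in the run that policy enforcement flagged, most recent first. The last row marks the moment the agent started hitting policy limits and which call triggered enforcement; to get the total spend up to that point, sum amount_usd over the run's allowed calls before that timestamp.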
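The post-incident questions can be tried end to end against an in-memory SQLite table. This is a sketch under assumptions: `agent_run_id` is stored as a plain column instead of inside a JSON `metadata` field, the `blocked_spend_cap` result value is invented, and the sample rows are made-up data for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calls (ts TEXT, endpoint TEXT, amount_usd REAL, "
    "policy_check_result TEXT, agent_run_id TEXT)"
)
# Made-up sample rows: two allowed calls, then one blocked by the spend cap.
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?, ?, ?)",
    [
        ("2024-01-01T02:00:00Z", "POST /v1/payment_intents", 20.0, "allowed", "run_xyz"),
        ("2024-01-01T02:00:05Z", "POST /v1/payment_intents", 20.0, "allowed", "run_xyz"),
        ("2024-01-01T02:00:10Z", "POST /v1/payment_intents", 20.0, "blocked_spend_cap", "run_xyz"),
        ("2024-01-01T02:00:11Z", "POST /v1/payment_intents", 15.0, "allowed", "run_abc"),
    ],
)


def first_blocked_call(conn, run_id):
    """Earliest call in the run that policy enforcement did not allow."""
    return conn.execute(
        "SELECT ts, endpoint, policy_check_result FROM calls "
        "WHERE agent_run_id = ? AND policy_check_result != 'allowed' "
        "ORDER BY ts ASC LIMIT 1",
        (run_id,),
    ).fetchone()


def spend_before(conn, run_id, ts):
    """Total of allowed calls in the run strictly before the given timestamp."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(amount_usd), 0) FROM calls "
        "WHERE agent_run_id = ? AND policy_check_result = 'allowed' AND ts < ?",
        (run_id, ts),
    ).fetchone()
    return total
```

Comparing timestamps as text works here because every row uses the same ISO 8601 UTC format, which sorts lexicographically in time order.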
Control 7 — No real key in agent context
The agent should never see the real Stripe secret key. An agent running a multi-step reasoning loop with tool calls can leak its context in several ways: the reasoning context is logged for debugging, the LLM provider stores it for safety review, or the context is accidentally returned in an error message. If the real key is in that context, it ends up in external storage.
The vault key pattern ensures the agent only ever holds a vault_key_xxx — a per-agent, per-run, scoped credential that maps to the real key only inside the proxy. The real key is held in a secrets manager, never exposed to the agent runtime.
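The core of the pattern is a lookup that only the proxy can perform. A minimal sketch, assuming an in-memory mapping and a bearer-style header; the `VaultProxy` class, its methods, and the placeholder secret are all hypothetical, and the real header shape depends on the vendor.

```python
import secrets


class VaultProxy:
    """Holds real vendor keys; agents only ever receive vault keys."""

    def __init__(self, real_keys):
        self._real_keys = dict(real_keys)  # vendor name -> real secret
        self._vendor_for = {}              # vault key -> vendor name

    def issue(self, vendor):
        """Create a vault key for one vendor. The real secret is not returned."""
        if vendor not in self._real_keys:
            raise KeyError(f"no real key configured for {vendor!r}")
        vault_key = "vault_key_" + secrets.token_urlsafe(16)
        self._vendor_for[vault_key] = vendor
        return vault_key

    def revoke(self, vault_key):
        self._vendor_for.pop(vault_key, None)

    def outbound_headers(self, vault_key):
        """Swap the vault key for the real secret, inside the proxy only."""
        vendor = self._vendor_for.get(vault_key)
        if vendor is None:
            raise PermissionError("unknown or revoked vault key")
        return {"Authorization": f"Bearer {self._real_keys[vendor]}"}
```

Because the agent's context only ever contains the `vault_key_...` string, a leaked log or error message exposes a credential that can be revoked in the proxy without rotating the real vendor key.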
The 7-control checklist
| Control | Stripe native | Twilio native | Resend native | Vault key proxy |
|---|---|---|---|---|
| 1. Restricted scope | Partial (path-level) | No | No | Yes (path + param) |
| 2. Per-agent keys | Manual (create multiple) | Manual | Manual | Yes (issue per agent role) |
| 3. Per-run expiry | No | No | No | Yes (expires_in field) |
| 4. Pre-call spend cap | No | No (alert only) | No | Yes (daily_usd_cap) |
| 5. Sub-second revoke | No (1–5 min propagation) | No (30s–2 min) | Yes (instant) | Yes (<1s) |
| 6. Agent-context audit log | No (request_id only) | No | No | Yes (agent_run_id, metadata) |
| 7. No real key in agent | Depends on how you use it | Depends | Depends | Yes (vault_key only) |
How Keybrake fits
Keybrake is the proxy that closes controls 3–7 for Stripe, Twilio, and Resend. You issue vault keys from the Keybrake dashboard, attach the policy you need, and the agents call proxy.keybrake.com/<vendor> instead of the vendor directly. Controls 1–2 (restricted scope, per-agent keys) are handled at the vault key policy level — each vault key specifies its own endpoint allowlist. The real secret keys live in Keybrake's secrets store, not in your agent's environment.
Related questions
Is this overkill for a simple scheduled job that calls Stripe once per hour?
For a scheduled job that runs at a known cadence and calls a single known endpoint, controls 1 and 2 (restricted scope, dedicated key) are probably enough. The elevated-risk scenario is an agent with a reasoning loop or multi-step plan — one where the agent decides how many times to call an API, not a timer. If your Stripe calls are agent-driven rather than schedule-driven, all seven controls are relevant.
Do I need all seven controls before going to production?
For read-only agents (reporting, data retrieval): controls 1 and 2 are the priority; the others are nice-to-have. For write agents that charge customers or send messages: controls 1, 2, 4, and 5 are the minimum; 3, 6, and 7 complete the hardened setup. In practice, the vault key pattern (Keybrake) implements controls 3–7 automatically once you've done 1–2 at the policy configuration level — so "add the proxy" is a single step that covers most of the checklist.
Where does LiteLLM fit in this checklist?
LiteLLM covers comparable controls, but for LLM inference endpoints (OpenAI, Anthropic, Gemini, etc.), not for Stripe, Twilio, or Resend. The two proxies are complementary: LiteLLM caps your model spend; Keybrake caps your vendor-API spend. An agent that uses both has a spend cap on every outbound call, LLM and SaaS tool alike. See our LiteLLM + Stripe page for the side-by-side scope comparison.
Further reading
- Stripe API key with restricted access — the 10-control coverage matrix for Stripe's native restriction feature.
- AI agent kill switch patterns — the four stop mechanisms, with real latency numbers for controls 4 and 5.
- AI agent audit trail — the audit log schema for control 6, including the SQL queries that matter post-incident.
- How to give an AI agent a Stripe API key — the complete 5-step guide that covers the setup side of this checklist.