AI agent API key rotation: when to rotate vs. when to revoke
When an AI agent goes rogue or a key is suspected compromised, you have two options: rotate the production key (new key, old one deprecated, 30–90 second propagation window where calls fail) or revoke the agent's vault key (sub-second, no production change). They solve different problems. Getting the choice wrong costs either downtime or blast radius.
TL;DR
Rotate the production key when: the key itself was exposed (logged, committed to git, seen in a breach). Revoke the agent vault key when: the agent is behaving incorrectly but the underlying key is still private. Most 3am "my agent is doing something weird" incidents call for revocation, not rotation. The proxy-based vault key pattern gives you sub-second agent-specific revocation without touching the production key at all.
The rotation timeline you need to understand
Key rotation sounds instant. It isn't. Here's what actually happens when you rotate a Stripe restricted key:
1. Create new key (seconds): Stripe generates the new key; you copy it.
2. Deploy new key to all consumers (1–5 min): update env vars, redeploy every service using the key, wait for the deployment to propagate.
3. Deprecate old key (seconds): mark the old key for deletion in Stripe Dashboard.
4. Old key stops working (typically immediate, but CDN/proxy caches can add 30–60 seconds).
During steps 2–4, any service that is still on the old key when it stops working gets 401s. If your agent is running a long task, it errors mid-run. If the agent is handling customer-facing requests, those requests fail. The total time from "I notice the agent is misbehaving" to "the key is rotated and deployed" is typically 3–10 minutes with a well-practiced runbook, and 15–30 minutes at 3am when you're doing it for the first time.
The revocation timeline (with vault keys)
When the agent holds a vault key rather than the production key, revocation is a one-call operation:
curl -X DELETE https://api.keybrake.com/vault-keys/vault_key_abc \
  -H "Authorization: Bearer YOUR_KEYBRAKE_KEY"
# Response: {"status": "revoked", "effective_at": "2026-06-01T03:42:11Z"}
The proxy marks the vault key as revoked. The next API call from the agent — the very next one, milliseconds later — gets a 403 with a vault_key_revoked error code. The agent receives this as a tool error and (if well-written) stops and reports the failure. The production Stripe key is untouched. Every other service keeps working. Total time from decision to effect: under 5 seconds, including the time to open your dashboard.
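The behavior described above can be modeled in a few lines. This is a minimal sketch, not Keybrake's implementation: `VaultKeyStore`, `authorize`, `call_tool`, and the `{"error": "vault_key_revoked"}` body shape are hypothetical names chosen for illustration. It shows why revocation applies to the very next call (the proxy checks the key on every request) and how a well-written agent can turn the 403 into a hard stop.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class VaultKeyStore:
    """Hypothetical in-memory stand-in for the proxy's vault key table."""
    revoked_at: dict = field(default_factory=dict)  # vault key id -> revocation time

    def revoke(self, key_id: str) -> dict:
        # Mirrors the shape of the DELETE response shown above.
        now = datetime.now(timezone.utc)
        self.revoked_at[key_id] = now
        return {"status": "revoked", "effective_at": now.strftime("%Y-%m-%dT%H:%M:%SZ")}

    def authorize(self, key_id: str) -> tuple:
        # Checked on every proxied request, so a revocation applies to the next call.
        if key_id in self.revoked_at:
            return 403, {"error": "vault_key_revoked"}  # assumed body shape
        return 200, {}


def call_tool(store: VaultKeyStore, key_id: str) -> str:
    """Agent-side wrapper: turn a revocation into a tool error that stops the run."""
    status, body = store.authorize(key_id)
    if status == 403 and body.get("error") == "vault_key_revoked":
        raise PermissionError("vault key revoked; stopping and reporting")
    return "ok"
```

Raising instead of retrying is the important design choice here: an agent that retries on 403 keeps generating traffic after you have decided to stop it.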
Decision table: rotation vs. revocation
| Scenario | Right action | Why |
|---|---|---|
| Agent is stuck in a loop, burning spend | Revoke vault key | Key hasn't been exposed; you just need to stop this agent. Production key stays intact. |
| Key was committed to a public git repo | Rotate production key | The key itself is exposed. Vault keys never contain the production key, so rotate it at the proxy; existing vault keys keep working against the new upstream key. |
| Agent responded strangely to a suspicious prompt | Revoke vault key first, investigate | Fastest way to stop potential blast. Investigate whether the key was actually exfiltrated before deciding whether to also rotate. |
| Scheduled compliance rotation (90-day cycle) | Rotate production key | This is a policy rotation, not an incident. Zero-downtime rotation (see below) lets you rotate without service interruption. |
| Agent's vault key was seen in application logs | Revoke vault key | The vault key is exposed, not the production key. Revoke the specific vault key; production key stays intact. |
| Third-party vendor breach included your API key | Rotate production key immediately | Assume full key exposure. Revoke all active vault keys, rotate, re-issue. |
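The table reduces to two questions: is a vault key or the agent the problem, and is the production key itself the problem? The sketch below encodes that logic so it can live in a runbook script. The function name, parameters, and action strings are hypothetical, and it deliberately leaves out the "investigate" step, which needs a human.

```python
def choose_actions(production_key_exposed: bool,
                   vault_key_exposed: bool = False,
                   agent_misbehaving: bool = False,
                   scheduled_rotation: bool = False) -> list:
    """Return the actions to take, fastest first."""
    actions = []
    # Revocation is sub-second, so it always goes first when it applies.
    if vault_key_exposed or agent_misbehaving:
        actions.append("revoke_vault_key")
    # Rotation is only needed when the production key itself is at stake.
    if production_key_exposed or scheduled_rotation:
        actions.append("rotate_production_key")
    return actions or ["no_action"]


# Agent stuck in a loop, key still private.
print(choose_actions(False, agent_misbehaving=True))
# Key committed to a public repo.
print(choose_actions(True))
```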
Zero-downtime rotation (when you must rotate the production key)
When you do need to rotate the production key, whether because it was exposed or because a scheduled rotation is due, the proxy layer makes it zero-downtime:
1. Create new production key in Stripe Dashboard.
2. Add new key to Keybrake as the upstream for your Stripe vendor config. This takes effect on the next proxied request, with no agent code changes.
3. Verify proxied calls are working with the new key (check the audit log for successful responses).
4. Remove old key from Keybrake and deprecate it in Stripe. No deployments required; all vault keys keep working because they reference the proxy, not the production key.
This is the structural advantage of the proxy pattern: the production key is a configuration detail at the proxy, not a secret distributed across every service that uses it. Rotating the production key is a 2-minute operation, not a 15-minute deployment.
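To see why no redeploy is needed, it helps to model the indirection. In this minimal sketch (the `Proxy` class and its method names are hypothetical, and the key strings are placeholders), a vault key maps to a vendor and the vendor maps to the current upstream key, so swapping the upstream changes what every vault key resolves to.

```python
class Proxy:
    """Hypothetical proxy: vault key -> vendor -> current production key."""

    def __init__(self):
        self.upstream_keys = {}  # vendor -> production keys, newest first
        self.vault_keys = {}     # vault key id -> vendor

    def add_upstream(self, vendor: str, key: str) -> None:
        # The newest key goes first and is used for the next proxied request.
        self.upstream_keys.setdefault(vendor, []).insert(0, key)

    def remove_upstream(self, vendor: str, key: str) -> None:
        self.upstream_keys[vendor].remove(key)

    def issue_vault_key(self, key_id: str, vendor: str) -> None:
        self.vault_keys[key_id] = vendor

    def resolve(self, vault_key_id: str) -> str:
        # The vault key never stores the production key, only the vendor.
        vendor = self.vault_keys[vault_key_id]
        return self.upstream_keys[vendor][0]


proxy = Proxy()
proxy.add_upstream("stripe", "PLACEHOLDER_OLD_KEY")
proxy.issue_vault_key("vault_key_abc", "stripe")

proxy.add_upstream("stripe", "PLACEHOLDER_NEW_KEY")         # add the new upstream
assert proxy.resolve("vault_key_abc") == "PLACEHOLDER_NEW_KEY"  # verify
proxy.remove_upstream("stripe", "PLACEHOLDER_OLD_KEY")      # remove the old one
```

The agent's vault key string is unchanged throughout, which is what makes the rotation invisible to agent code.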
Ephemeral vault keys: rotation by default
The best rotation strategy for agents is one where you never have to manually rotate: issue vault keys that expire automatically at the end of each run. An agent run that is expected to take 30 minutes gets a vault key that expires in 45 minutes:
{
  "vendor": "stripe",
  "daily_usd_cap": 100,
  "allowed_endpoints": ["POST /v1/payment_intents", "GET /v1/customers/*"],
  "expires_in": "45m",
  "agent_run_id": "checkout_run_abc123"
}
When the run ends normally, the vault key expires harmlessly. If the run is killed mid-way, the vault key still expires on its own schedule. A later run can't reuse it, because the key did not exist before this run and stops working shortly after it. There is no key to rotate; the key's lifetime is tied to the run's lifetime.
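The expiry check itself is simple arithmetic. The sketch below assumes the `expires_in` value is an integer followed by a unit; only the `"45m"` form appears in the policy above, so the `s` and `h` units and both function names are assumptions made for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed unit suffixes; only "m" is confirmed by the policy example.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_expires_in(value: str) -> timedelta:
    """Parse a duration such as '45m' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smh])", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def is_expired(issued_at: datetime, expires_in: str, now: datetime) -> bool:
    """True once the key's lifetime has fully elapsed."""
    return now >= issued_at + parse_expires_in(expires_in)


issued = datetime(2026, 6, 1, 3, 0, tzinfo=timezone.utc)
# 30 minutes into the run: the key is still valid.
assert not is_expired(issued, "45m", issued + timedelta(minutes=30))
# Two hours later: a leaked copy of the key is useless.
assert is_expired(issued, "45m", issued + timedelta(hours=2))
```

The 15-minute margin between the expected run time and the expiry is the trade-off to tune: too tight and slow runs fail near the end, too loose and a leaked key stays usable for longer.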
How Keybrake fits
Keybrake provides the vault key layer: issue keys with per-run expiry, revoke them in one click or one API call, and rotate the underlying production key in the proxy config without touching agent code. The Free tier covers 1,000 proxied requests/month; the Hobby tier ($29/month) adds all vendors, 30-day audit retention, and webhook alerts when a vault key is revoked.
Related questions
How often should I rotate production API keys for agents?
If you're using vault keys (ephemeral, per-run), the production key rotation cycle is a compliance decision rather than an operational one — 90 days is common for SOC 2 compliance. If agents hold production keys directly (no proxy), rotate every 30 days and on any suspected exposure. The more agents sharing a key, the more important rotation becomes: a single compromised context window can exfiltrate a key that was handed to it as a tool parameter.
What happens to in-flight agent calls when I revoke a vault key?
The next API call from the agent after the revocation takes effect returns a 403. Calls that are already in flight (the HTTP request has already been sent to the proxy) complete normally: revocation affects the next call, not the one being processed at the moment you hit revoke. That leaves a window of at most one in-flight request, typically under 100ms. If you need truly atomic revocation, use the hard_revoke flag, which also drops in-flight connections; this is useful for suspected active exfiltration.
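The difference between the two modes is easiest to see in a small state model. This is a sketch of the semantics described above, not of the proxy's internals: the `RevocableKey` class, its methods, and the `hard` parameter (standing in for the hard_revoke flag) are hypothetical.

```python
class RevocableKey:
    """Hypothetical model of one vault key's state at the proxy."""

    def __init__(self):
        self.revoked = False
        self.in_flight = set()  # request ids already accepted by the proxy

    def begin(self, request_id: str) -> bool:
        """Admission check. Returns False (a 403) once the key is revoked."""
        if self.revoked:
            return False
        self.in_flight.add(request_id)
        return True

    def finish(self, request_id: str) -> bool:
        """True if the request completed, False if a hard revoke dropped it."""
        if request_id in self.in_flight:
            self.in_flight.remove(request_id)
            return True
        return False

    def revoke(self, hard: bool = False) -> None:
        self.revoked = True
        if hard:
            # Hard revoke also drops requests that were already admitted.
            self.in_flight.clear()


key = RevocableKey()
key.begin("req_1")
key.revoke()                      # default revoke
assert key.finish("req_1")        # the in-flight call still completes
assert not key.begin("req_2")     # the next call is rejected
```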
Can I rotate a vault key without the agent noticing?
Not in the traditional sense: vault keys are not rotated, they are revoked and re-issued. But you can issue a new vault key, with the same policy or an updated one, and pass it to the agent as a dependency. The agent makes its next calls with the new key; the old key expires on its schedule. From the agent's perspective, nothing changed: it just has a new key string in its context. This is the "rolling" vault key pattern, useful for long-running agents where you want to change the policy mid-run without stopping the agent.
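One way to pass the key "as a dependency" is to have the agent ask a provider for its current key before each call instead of caching the string. The sketch below illustrates that idea; `RollingKeyProvider`, `agent_step`, and the key strings are hypothetical and not part of any Keybrake SDK.

```python
class RollingKeyProvider:
    """Hypothetical dependency that hands the agent its current vault key."""

    def __init__(self, initial_key: str):
        self._current = initial_key

    def current(self) -> str:
        return self._current

    def roll(self, new_key: str) -> str:
        """Swap in a newly issued vault key and return the old one."""
        old, self._current = self._current, new_key
        return old


def agent_step(provider: RollingKeyProvider, send) -> None:
    # The agent reads the key at call time, so a roll applies to its next call.
    send(provider.current())


used = []
provider = RollingKeyProvider("vault_key_old")
agent_step(provider, used.append)
provider.roll("vault_key_new")    # new key issued mid-run
agent_step(provider, used.append)
assert used == ["vault_key_old", "vault_key_new"]
```

The old key returned by `roll` can be left to expire on its schedule or revoked explicitly once the agent's calls are confirmed to use the new one.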
Further reading
- AI agent kill switch patterns — four patterns for stopping a runaway agent, their latencies, and when to use each.
- Rotate vs. revoke: the 2am incident playbook — step-by-step guide for the 10 minutes after you realize something is wrong.
- AI agent audit trail — the log schema that makes post-incident forensics possible after a rotation or revocation event.
- AI agent API key best practices — the complete checklist: scoping, issuance, rotation, revocation, and audit for every SaaS key your agent holds.