AI agent API key rotation: when to rotate vs. when to revoke

When an AI agent goes rogue or you suspect a key has been compromised, you have two options: rotate the production key (new key issued, old one deprecated, and a 30–90 second propagation window in which calls fail) or revoke the agent's vault key (sub-second, no change to production). They solve different problems. Getting the choice wrong costs you either downtime or a larger blast radius.

TL;DR

Rotate the production key when: the key itself was exposed (logged, committed to git, seen in a breach). Revoke the agent vault key when: the agent is behaving incorrectly but the underlying key is still private. Most 3am "my agent is doing something weird" incidents call for revocation, not rotation. The proxy-based vault key pattern gives you sub-second agent-specific revocation without touching the production key at all.

The rotation timeline you need to understand

Key rotation sounds instant. It isn't. Here's what actually happens when you rotate a Stripe restricted key:

1. Create new key (seconds): Stripe generates the new key; you copy it.
2. Deploy new key to all consumers (1–5 min): update env vars, redeploy every service using the key, and wait for the deployment to propagate.
3. Deprecate old key (seconds): mark the old key for deletion in the Stripe Dashboard.
4. Old key stops working (typically immediate, but CDN/proxy caches can add 30–60 seconds).

From the moment the old key is deprecated until every consumer has the new one, any service still using the old key gets 401s. If your agent is running a long task, it errors mid-run. If the agent is handling customer-facing requests, those requests fail. The total time from "I notice the agent is misbehaving" to "the key is rotated and deployed" is typically 3–10 minutes with a well-practiced runbook, and 15–30 minutes at 3am when you're doing it for the first time.

The revocation timeline (with vault keys)

When the agent holds a vault key rather than the production key, revocation is a one-call operation:

curl -X DELETE https://api.keybrake.com/vault-keys/vault_key_abc \
  -H "Authorization: Bearer YOUR_KEYBRAKE_KEY"
# Response: {"status": "revoked", "effective_at": "2026-06-01T03:42:11Z"}

The proxy marks the vault key as revoked. The next API call from the agent — the very next one, milliseconds later — gets a 403 with a vault_key_revoked error code. The agent receives this as a tool error and (if well-written) stops and reports the failure. The production Stripe key is untouched. Every other service keeps working. Total time from decision to effect: under 5 seconds, including the time to open your dashboard.
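What "stops and reports the failure" might look like on the agent side is sketched below. This is a minimal illustration, not Keybrake's client library: the 403 status and the vault_key_revoked code come from the text above, while the JSON shape (body["error"]["code"]), the VaultKeyRevoked exception, and both function names are assumptions made for the example. The responses are simulated, so nothing is sent over the network.

```python
class VaultKeyRevoked(Exception):
    """Raised when the proxy reports that this agent's vault key was revoked."""


def check_proxy_response(status_code: int, body: dict) -> dict:
    """Return the body unchanged unless it signals a revoked vault key.

    Assumption: the error code lives at body["error"]["code"]; the real
    response shape may differ.
    """
    error_code = (body.get("error") or {}).get("code")
    if status_code == 403 and error_code == "vault_key_revoked":
        raise VaultKeyRevoked("vault key revoked; stopping run")
    return body


def run_agent_steps(responses):
    """Process simulated (status, body) pairs; halt at the first revocation."""
    processed = 0
    for status_code, body in responses:
        try:
            check_proxy_response(status_code, body)
        except VaultKeyRevoked as exc:
            # Do not retry: a revoked key will never start working again.
            return {"processed": processed, "stopped": True, "reason": str(exc)}
        processed += 1
    return {"processed": processed, "stopped": False, "reason": None}


# Simulated run: the key is revoked between the first and second call.
simulated = [
    (200, {"id": "pi_1"}),
    (403, {"error": {"code": "vault_key_revoked"}}),
    (200, {"id": "pi_2"}),
]
result = run_agent_steps(simulated)
```

The design point is that revocation is treated as terminal rather than retryable, so the third simulated call is never attempted.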

Decision table: rotation vs. revocation

Scenario	Right action	Why
Agent is stuck in a loop, burning spend	Revoke vault key	Key hasn't been exposed; you just need to stop this agent. Production key stays intact.
Key was committed to a public git repo	Rotate production key	The key itself is exposed. Vault keys that were not leaked with it reference the proxy rather than the production key, so they keep working once the proxy's upstream key is swapped; revoke and re-issue only those that were exposed too.
Agent responded strangely to a suspicious prompt	Revoke vault key first, investigate	Fastest way to stop potential blast. Investigate whether the key was actually exfiltrated before deciding whether to also rotate.
Scheduled compliance rotation (90-day cycle)	Rotate production key	This is a policy rotation, not an incident. Zero-downtime rotation (see below) lets you rotate without service interruption.
Agent's vault key was seen in application logs	Revoke vault key	The vault key is exposed, not the production key. Revoke the specific vault key; production key stays intact.
Third-party vendor breach included your API key	Rotate production key immediately	Assume full key exposure. Revoke all active vault keys, rotate, re-issue.
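The table reduces to one question: what was exposed? A rough sketch of that rule as code follows. The function name, the Action enum, and the four boolean inputs are hypothetical and only illustrate the reasoning; a real runbook would weigh more context than this.

```python
from enum import Enum


class Action(Enum):
    REVOKE_VAULT_KEY = "revoke vault key"
    ROTATE_PRODUCTION_KEY = "rotate production key"
    NONE = "no key action"


def choose_actions(
    production_key_exposed: bool,
    vault_key_exposed: bool,
    agent_misbehaving: bool,
    scheduled_rotation_due: bool = False,
) -> list:
    """Map incident facts to key actions, cheapest containment first."""
    actions = []
    # Stopping one agent never requires touching the production key.
    if vault_key_exposed or agent_misbehaving:
        actions.append(Action.REVOKE_VAULT_KEY)
    # Only exposure of the production key itself (or policy) forces rotation.
    if production_key_exposed or scheduled_rotation_due:
        actions.append(Action.ROTATE_PRODUCTION_KEY)
    return actions or [Action.NONE]


looping_agent = choose_actions(False, False, True)
key_in_public_repo = choose_actions(True, False, False)
suspicious_prompt_then_leak = choose_actions(True, False, True)
```

When both conditions hold, revocation comes first in the returned list because it takes effect soonest; rotation follows once the agent is contained.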

Zero-downtime rotation (when you must rotate the production key)

When you do need to rotate the production key, whether because it was exposed or because policy requires it, the proxy layer makes the rotation zero-downtime:

1. Create new production key in Stripe Dashboard.
2. Add new key to Keybrake as the upstream for your Stripe vendor config. This takes effect on the next proxied request, with no agent code changes.
3. Verify proxied calls are working with the new key (check the audit log for successful responses).
4. Remove old key from Keybrake and deprecate it in Stripe. No deployments required; all vault keys keep working because they reference the proxy, not the production key.

This is the structural advantage of the proxy pattern: the production key is a configuration detail at the proxy, not a secret distributed across every service that uses it. Rotating the production key is a 2-minute operation, not a 15-minute deployment.
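To make the indirection concrete, here is a toy in-memory model of a proxy's vendor config. It is a sketch under stated assumptions, not Keybrake's implementation or API: the ProxyVendorConfig class, its method names, and the placeholder key strings are all invented for illustration. It shows only the property the steps rely on, namely that a vault key resolves to whichever upstream key the proxy currently holds.

```python
class ProxyVendorConfig:
    """Toy stand-in for a proxy's per-vendor config (hypothetical)."""

    def __init__(self, upstream_key: str):
        self.upstream_keys = [upstream_key]  # the last entry is the one in use
        self.vault_keys = set()

    def issue_vault_key(self, vault_key: str) -> None:
        self.vault_keys.add(vault_key)

    def add_upstream_key(self, new_key: str) -> None:
        # Step 2: the new key takes effect on the next proxied request.
        self.upstream_keys.append(new_key)

    def remove_upstream_key(self, old_key: str) -> None:
        # Step 4: never leave the proxy without an upstream key.
        if len(self.upstream_keys) == 1:
            raise ValueError("refusing to remove the only upstream key")
        self.upstream_keys.remove(old_key)

    def resolve(self, vault_key: str) -> str:
        """Return the upstream key a proxied call would be sent with."""
        if vault_key not in self.vault_keys:
            raise PermissionError("unknown or revoked vault key")
        return self.upstream_keys[-1]


config = ProxyVendorConfig("prod_key_old")  # placeholder, not a real key format
config.issue_vault_key("vault_key_abc")

before = config.resolve("vault_key_abc")
config.add_upstream_key("prod_key_new")
during = config.resolve("vault_key_abc")   # same vault key, new upstream
config.remove_upstream_key("prod_key_old")
after = config.resolve("vault_key_abc")
```

The agent's credential (vault_key_abc) never changes across the three calls, which is why no agent redeploy is needed.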

Ephemeral vault keys: rotation by default

The best rotation strategy for agents is one where you never have to manually rotate: issue vault keys that expire automatically at the end of each run. An agent run that is expected to take 30 minutes gets a vault key that expires in 45 minutes:

{
  "vendor": "stripe",
  "daily_usd_cap": 100,
  "allowed_endpoints": ["POST /v1/payment_intents", "GET /v1/customers/*"],
  "expires_in": "45m",
  "agent_run_id": "checkout_run_abc123"
}

When the run ends normally, the vault key expires harmlessly. If the run is killed midway, the vault key still expires on schedule, so a later run cannot reuse it. There is no key to rotate; the key's lifetime is bounded by the run's expected duration plus a small buffer.
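The expiry rule can be sketched as a simple time comparison. This is an illustration only: the source shows a single duration value ("45m"), so the s/m/h grammar accepted by parse_expires_in is an assumption, as are both function names. The check itself would run at the proxy, not in the agent.

```python
from datetime import datetime, timedelta, timezone

# Assumed duration grammar: an integer followed by s, m, or h.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_expires_in(value: str) -> timedelta:
    """Turn a string such as "45m" into a timedelta."""
    number, unit = value[:-1], value[-1:]
    if unit not in _UNITS or not number.isdigit():
        raise ValueError(f"unsupported duration: {value!r}")
    return timedelta(**{_UNITS[unit]: int(number)})


def is_vault_key_valid(issued_at: datetime, expires_in: str, now: datetime) -> bool:
    """A key is usable only strictly before issued_at + expires_in."""
    return now < issued_at + parse_expires_in(expires_in)


issued = datetime(2026, 6, 1, 3, 0, tzinfo=timezone.utc)
during_run = issued + timedelta(minutes=30)   # the expected 30-minute run
next_run = issued + timedelta(hours=2)        # a later, unrelated run

valid_during = is_vault_key_valid(issued, "45m", during_run)
valid_later = is_vault_key_valid(issued, "45m", next_run)
```

The 15-minute gap between the expected run length and the expiry is the only window in which a key from a killed run remains usable; shortening the buffer shrinks that window at the cost of failing slow runs.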

How Keybrake fits

Keybrake provides the vault key layer: issue keys with per-run expiry, revoke them in one click or one API call, and rotate the underlying production key in the proxy config without touching agent code. The Free tier covers 1,000 proxied requests/month; the Hobby tier ($29/month) adds all vendors, 30-day audit retention, and webhook alerts when a vault key is revoked.
