Agent operations · Incident response
AI agent kill switch: the four patterns, with latency numbers
Your agent is stuck in a loop. You have 10 seconds before it issues refund number 4,300, or sends SMS number 18,000, or emails customer number 340,000. Here are the four kill switches that actually exist, how fast each one really is, and where each one leaks.
TL;DR
Four real kill switches for a running agent: network-level block (cut the path from agent to internet), credential revoke (rotate or delete the API key), circuit-breaker (a flag your own code checks before every outbound call), and human-in-the-loop (every action goes through an approval gate). Their real stop latencies are roughly seconds, 1-5 minutes, next request, and never-started — with tradeoffs on leak surface and operational cost that matter more than speed. For money-moving agents, the minimum is credential revoke; the best is a proxy-enforced policy-level kill that combines the speed of a flag with the trust boundary of a real network block.
The rule of 10 seconds
If an agent is stuck in a tight loop calling a money-moving API (Stripe charges.create, Twilio messages, Resend emails), you have about ten seconds before the blast radius crosses from "manageable" into "postmortem." That's the window a kill switch has to close. Close it slower than that and you've still saved some of the damage, but you're doing cleanup for days.
The four patterns below are ordered by how they actually fit into that ten-second window.
Pattern 1 — network-level block (cut the path)
What it is: you cut the agent's ability to reach the outbound endpoint. Means in practice: kill the agent process, remove its network route (iptables rule, security-group change, container network detach), or take down the upstream DNS. It's the brute-force option — you're not stopping the specific bad call, you're stopping all calls.
Stop latency: near-instant once the block lands. The tricky part is landing it. Killing a process takes seconds; changing a security group on AWS propagates in 10-30 seconds; DNS changes can take minutes to propagate depending on TTL.
Leak surface: in-flight requests still complete. If the agent already fired 20 charges.create calls in parallel before you cut the path, those 20 still go through. For async tasks queued on SQS / Celery / Temporal, the queue keeps draining into the void and the agent thinks everything worked.
Fails when: the agent isn't running on your infrastructure. If it's a Claude-in-browser agent using a third-party MCP endpoint, or a scheduled Lambda, or a LangChain process on a contractor's laptop — you can't cut their network. You have to revoke the credential instead.
When to reach for it: you have direct access to the host, you're comfortable with a blunt instrument, and the alternative is worse.
Pattern 2 — credential revoke
What it is: you invalidate the API key the agent is using. Vendor dashboard → delete key → the next call fails with 401 Unauthorized. Doesn't matter where the agent runs.
Stop latency: this is where people get surprised. Most vendors have a propagation tail — the call that's already past the edge cache keeps working for some window after you click Delete. Measured worst case from our own tests:
- Stripe: ~1-3 minutes after rotation for the old key to start failing. The platform documents "key changes are live within minutes." In our runs: median 45s, p95 3m12s.
- Twilio: ~30s-2 minutes. Auth tokens update across edge POPs, and Twilio keeps a short cache.
- Resend: near-instant — they invalidate synchronously. This is unusual.
- OpenAI: 1-5 minutes — longer cache horizon, especially for org keys.
Leak surface: during the propagation window, the agent keeps burning through calls. If the loop makes one call every 400ms, a 3-minute tail means 450 more calls land after the moment you clicked Delete.
Fails when: the agent has multiple keys (one per vendor, one per environment) and you only revoke the one you remembered. Also: rotating the key means every legitimate consumer using that key also breaks — you're causing a self-inflicted outage to stop one agent.
When to reach for it: agent is off-infrastructure, and you accept the propagation tail. First-line response for most incidents.
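You can measure your own vendor's propagation tail instead of trusting our numbers: revoke a test key, then poll a cheap authenticated endpoint with the old key until it starts failing. A sketch — `check_key` is any callable you supply that returns the HTTP status of such a probe (e.g. a GET to Stripe's /v1/balance using the old secret key):

```python
import time

def measure_revoke_tail(check_key, poll_interval: float = 1.0,
                        timeout: float = 600.0) -> float:
    """Return how many seconds a just-revoked key kept working.

    Call this immediately after clicking Delete in the vendor dashboard.
    `check_key` makes one cheap authenticated request with the OLD key
    and returns the HTTP status code.
    """
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        if check_key() == 401:  # revocation has propagated to this edge
            return time.monotonic() - start
        time.sleep(poll_interval)
    raise TimeoutError("old key still working after timeout")
```

One caveat on interpreting the result: you're measuring the edge POP your probe happens to hit, so run it from the same region as the agent.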
Pattern 3 — circuit-breaker (a flag your code checks)
What it is: you wrap every outbound call in a check against a "kill flag" stored somewhere both the agent and the operator can reach — a Redis key, a feature flag, a column in your DB. Before each API call, the agent reads the flag; if set, it aborts the call.
Stop latency: next-request-fast. The flag can be flipped in milliseconds; the agent sees it on its next call. If the loop is doing a call every few hundred ms, you stop within one call of the flag flip.
Leak surface: requests already in flight still complete. Agents that batch or pipeline (parallel fan-out to issue 50 refunds at once) will complete all 50. Agents that make long-running calls (video generation, complex Stripe Checkout sessions) finish the current call. Bigger problem: the flag only works if the agent's code actually reads it before every call. One forgotten code path — one SDK wrapper that skips the check — and the kill switch is a lie.
Fails when: the agent code is not yours. Third-party SDKs, MCP servers, Stripe Agent Toolkit — none of them know about your flag. Also: a malicious or misaligned agent can read its own code and decide to skip the check ("I am trying to help the user; I should not abort just because a flag is set"). It's a cooperative kill switch, not an enforced one.
When to reach for it: the agent is yours, the code is yours, and you've added the check at every outbound boundary. Best for fast rollback without credential rotation drama.
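The whole pattern fits in a decorator. A minimal sketch — the `FLAGS` dict stands in for the shared store (with real Redis this would be `r.get(...)` / `r.set(...)`), and `issue_refund` is a hypothetical agent action:

```python
import functools

class KillSwitchTripped(RuntimeError):
    pass

# Stand-in for Redis / a feature flag service: any store both the agent
# and the operator can reach. The operator flips the flag; the agent reads it.
FLAGS: dict[str, bool] = {}

def guarded(flag_key: str):
    """Check the kill flag immediately before every outbound call."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            if FLAGS.get(flag_key):  # flipped by the operator mid-incident
                raise KillSwitchTripped(
                    f"{flag_key} is set; aborting {fn.__name__}")
            return fn(*args, **kwargs)
        return inner
    return wrap

@guarded("kill:refund_agent")
def issue_refund(charge_id: str) -> str:
    # Real code would call the vendor SDK here, e.g. a Stripe refund.
    return f"refunded {charge_id}"
```

The decorator makes the leak surface visible: only functions you remembered to wrap are covered, which is exactly the "one forgotten code path" failure mode above.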
Pattern 4 — human-in-the-loop
What it is: every money-moving call routes to an approval queue (Slack message, dashboard notification, email) and only proceeds after a human clicks Approve. This isn't a reactive kill switch — it's a preventative one. The agent can't "run away" because it never runs without supervision.
Stop latency: N/A — requests never started. The "kill" is decline-the-approval.
Leak surface: alert fatigue. If the agent generates 80 approvals per day and 78 are trivial, the human starts Approve-all'ing the first 60 without looking, and the two bad calls slip through unnoticed. Also: the operational overhead defeats the point of agents — if every call needs a human, you're just paying more for a slower SaaS.
Fails when: the use case requires throughput. Approvals don't scale to 10,000 calls/day. They also don't scale to async agents that run at 3am.
When to reach for it: high-stakes, low-volume agents. A finance agent issuing ACH transfers. A marketing agent launching a six-figure ad campaign. Not a support agent answering tickets.
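The core of an approval gate is small; the work is in the delivery channel. A sketch with the notifier injected (in a real system `notify` would post to Slack or email); class and method names are illustrative:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ApprovalGate:
    """Proposed actions wait in a queue; nothing runs without a human click."""
    notify: Callable[[str], None]            # e.g. post to a Slack channel
    pending: dict = field(default_factory=dict)
    _next_id: int = 0

    def propose(self, description: str, action: Callable[[], object]) -> int:
        """Queue an action and alert the human. Returns a ticket id."""
        self._next_id += 1
        self.pending[self._next_id] = (description, action)
        self.notify(f"[approval #{self._next_id}] {description}")
        return self._next_id

    def approve(self, ticket: int):
        _, action = self.pending.pop(ticket)
        return action()                      # runs only after explicit approval

    def decline(self, ticket: int) -> None:
        self.pending.pop(ticket)             # the "kill": action never starts
```

Note the shape of the kill: `decline` is trivially safe because the action was never started, which is exactly the "N/A — requests never started" row in the matrix below.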
Coverage matrix
| Pattern | Stop latency | Works off-infra | Enforced (vs cooperative) | Leaks in-flight? |
|---|---|---|---|---|
| Network block | Seconds (once landed) | No | Enforced | Yes |
| Credential revoke | 1-5 min (propagation) | Yes | Enforced | Yes (for the tail) |
| Circuit-breaker flag | Next request | If agent reads flag | Cooperative | Yes |
| Human-in-the-loop | N/A (never starts) | Yes | Enforced (upstream) | No |
Proxy-enforced policy — the fifth option, which is really the combination
If you sit a proxy between the agent and the vendor API, the "revoke" button becomes a policy change on the proxy instead of a rotation at the vendor. That collapses the propagation tail from minutes to sub-second (the next packet hitting the proxy gets rejected under the new policy) and gives you the enforced-by-network-boundary property you want from credential revoke. Other legitimate consumers keep working — they're using different vault keys with different policies — so you don't self-outage. And the audit log records everything that happened up to the revoke, including the call that tripped it.
This is pattern 2 with sub-second propagation and pattern 3 without the cooperative-flag leak. The trade: you added a hop to the request path. In our benchmarks the hop is 6-14ms when the proxy is co-located with the agent's region, 40-80ms cross-region — usually below the noise floor of the vendor's own response time.
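The per-request decision the proxy makes is simple enough to sketch. The field names here are illustrative, not Keybrake's actual policy schema; the point is that revocation is a field flip, checked on every packet:

```python
from dataclasses import dataclass, field

@dataclass
class Policy:
    """One policy per vault key. Illustrative fields only."""
    revoked: bool = False
    daily_cap_cents: int = 50_000
    endpoint_allowlist: frozenset = frozenset({"/v1/refunds"})
    max_amount_per_call_cents: int = 5_000

def check(policy: Policy, spent_today_cents: int,
          endpoint: str, amount_cents: int) -> tuple:
    """Decision made before forwarding a request to the vendor.
    Returns (allowed, reason). Flipping `revoked` kills the key on the
    next packet — no vendor-side rotation, no propagation tail."""
    if policy.revoked:
        return False, "policy revoked"
    if endpoint not in policy.endpoint_allowlist:
        return False, f"endpoint {endpoint} not allowlisted"
    if amount_cents > policy.max_amount_per_call_cents:
        return False, "over per-call max"
    if spent_today_cents + amount_cents > policy.daily_cap_cents:
        return False, "daily cap exceeded"
    return True, "ok"
```

The denied branch is also where the audit row gets written, which is how the "call that tripped it" ends up in the log.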
An incident playbook for 2am
- t=0: you get the page. "Stripe charges up 94x in the last 5 minutes."
- t=5s: open the Keybrake dashboard (or wherever your policy lives). Flip the revoke flag for the agent's vault key.
- t=10s: the next outbound call from the agent hits the proxy, is rejected by the now-revoked policy, and 401s back to the agent. The SDK retry logic tries twice, gives up, and the loop errors out.
- t=60s: query the audit log: `SELECT * FROM calls WHERE vault_key='vault_xyz' ORDER BY ts DESC LIMIT 50`. Count the damage, enumerate the customers affected, decide whether to refund.
- t=5min: the agent runner (you have one — Temporal, Airflow, something) surfaces the 401 and pages again. Kill the process. Postmortem in the morning.
Without a proxy, the stop at t=10s becomes 3-8 minutes of credential-revoke propagation, and the t=60s audit is cobbled together from vendor dashboards with no agent_run_id to join on.
How Keybrake fits
Keybrake is a governance proxy for the non-LLM SaaS APIs your agent hits — Stripe, Twilio, Resend. You issue a vault key per agent or per run, attach a policy (daily $ cap, endpoint allowlist, customer allowlist, max-amount-per-call, expires-at), and the agent calls proxy.keybrake.com/<vendor> as if it were the real endpoint. Revoke is policy-level and takes effect on the next packet. Audit log includes every call with parsed cost, agent_run_id, policy-check result, and vendor response. If you ever wanted "revoke at 2am that takes effect by 2:00:01," this is the shape of it.
Related questions
Can I build the proxy-enforced version myself?
Yes — it's not a complicated primitive. The hard parts are the vendor-specific stuff: parsing cost from response headers/bodies (different per vendor), handling idempotency-key semantics correctly so retries don't double-charge, and making the policy engine fast enough to not dominate the request hop. The secondary hard part is the audit schema — you want one row per call with enough columns to write the "all calls that hit a cap this week" query in a single SELECT, and that schema is the part that's easy to get wrong on the first try. If you want to own it, the patterns are documented in our Stripe-agent blog post.
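To make the audit-schema point concrete, here is a one-row-per-call sketch using SQLite. The column names are illustrative, not Keybrake's actual schema; the test of a good schema is that the incident queries are single SELECTs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE calls (
        ts            REAL NOT NULL,      -- unix timestamp of the request
        agent_run_id  TEXT NOT NULL,      -- the join key vendor dashboards lack
        vault_key     TEXT NOT NULL,
        vendor        TEXT NOT NULL,      -- 'stripe' | 'twilio' | 'resend'
        endpoint      TEXT NOT NULL,
        amount_cents  INTEGER,            -- parsed cost, NULL if non-monetary
        policy_result TEXT NOT NULL,      -- 'ok' | 'cap_exceeded' | 'revoked' ...
        vendor_status INTEGER             -- NULL when the proxy blocked the call
    )""")
conn.execute(
    "INSERT INTO calls VALUES "
    "(1700000000.0, 'run_1', 'vault_xyz', 'stripe', '/v1/charges',"
    " 900, 'cap_exceeded', NULL)")

# "All calls that hit a cap this week" as one SELECT:
capped = conn.execute(
    "SELECT agent_run_id, endpoint FROM calls "
    "WHERE policy_result = 'cap_exceeded' AND ts > 1699000000").fetchall()
```

Write the row before forwarding to the vendor and update it with the vendor's status afterward, so blocked calls and completed calls land in the same table.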
Is a circuit-breaker flag ever the right answer on its own?
For internal tools with a single code path you fully control — yes. A Redis-backed flag your own code checks before every outbound call is simple, fast, and costs nothing. It breaks down when the agent uses third-party SDKs, when there are multiple code paths you forgot about, or when the agent is an LLM that could decide to ignore the check. For money-moving calls at production scale, use it in combination with a proxy or credential-revoke path, not alone.
Why not just set a low per-day spend cap on the credential and let it fail hard at the vendor?
Stripe supports amount limits only via Radar rules (a fraud product, an awkward fit) and has nothing native for "max $X per day per key." Twilio has billing alerts but no automatic stop. Resend doesn't expose a per-key cap. So "let the vendor cap you" doesn't actually work for any of our target vendors, and you're stuck building the cap somewhere else — either in your code (pattern 3) or in a proxy (pattern 2 with sub-second revoke, aka us).
How fast does Keybrake's revoke actually take effect?
Sub-second in the median case. Policy state is held in-memory on each proxy instance and flushed to a shared backing store on every change; the flush propagates to peer instances within ~200ms. Worst-case revoke latency is the next request hitting a peer that hasn't received the flush yet — bounded at one second by our current sync cadence. Compared to Stripe's 1-5 minute rotation tail, it's two to three orders of magnitude faster.
Does human-in-the-loop plus a proxy make sense?
Yes, for high-stakes low-volume. Run the agent through the proxy; set policy such that calls over $N or to certain customer IDs hit a "needs-approval" state; the proxy queues them; the human clicks through a Slack-delivered approval UI; the proxy releases. You get preventative kill from pattern 4 and reactive kill from pattern 2+, without building both separately.
Further reading
- Stripe API key with restricted access — the 10-control coverage matrix, including the "mid-run revoke" row where Stripe's native feature scores Partial.
- MCP API key auth — the four credential-flow patterns for MCP servers, kill-switch implications inside each.
- LiteLLM alternatives (open source) — the LLM-side equivalent of this page; LLM proxies have their own revoke semantics.
- How to give an AI agent a Stripe API key — the five-control checklist that makes a kill switch possible in the first place.