
AI agent access control: why static API scopes aren't enough for autonomous agents

Access control for AI agents is fundamentally different from access control for human users or traditional services. Static RBAC and API scopes answer who can call an API and which endpoints they can reach — but autonomous agents introduce two requirements those tools can't meet: per-run spend caps (how much can this specific agent execution charge before it's stopped?) and per-run revoke (how do I stop this specific agent run without affecting other running agents?). This page maps the three-layer access control model for AI agents and shows where static scopes end and where per-run enforcement begins.

TL;DR

AI agent access control has three layers: authentication (which agent is calling?), authorization (which endpoints can it call?), and enforcement (how much can it spend, and can you stop it mid-run?). Static API keys and OAuth scopes cover the first two layers but leave the third entirely unaddressed. Per-run vault keys issued at the start of each agent execution close the enforcement gap: each vault key carries a per-run dollar cap, an endpoint allowlist, a TTL, and a per-run revoke path. Static credentials handle identity; vault keys handle runtime enforcement.

The three-layer access control model for AI agents

Layer	Question it answers	Traditional tool	Limitation for AI agents
1. Authentication	Who is this agent? Is it authorized to call at all?	API keys, OAuth client credentials, mTLS certificates	Identifies the service, not the specific execution. All runs of the same agent share the same credential — you can't distinguish run A from run B at the authentication layer.
2. Authorization	Which resources and endpoints can this agent call?	OAuth scopes, Stripe Restricted Keys, AWS IAM policies, RBAC roles	Scopes are static and shared across all runs. Granting a Stripe Restricted Key `payment_intents:write` scope lets every run of the agent create payment intents without limit. Scopes control endpoint access, not per-run usage volume.
3. Enforcement	How much can this specific run charge? Can it be stopped?	No standard tool fills this layer for AI agents	Cloud billing alarms fire after spend (8-48h lag). Vendor rate limits cap per-second velocity, not per-run dollars. Secrets rotation is account-wide and blunt. No native tool provides per-run pre-call enforcement.

Why static API scopes don't provide enforcement

A Stripe Restricted Key with payment_intents:write scope tells Stripe: "any request arriving with this key is authorized to create payment intents." It does not tell Stripe: "stop accepting requests from this key once the total charges created in this session exceed $500." The key is account-level and long-lived until someone rotates it, so it cannot be scoped to a single agent run's budget.

OAuth scopes share the same limitation. An access token with transactions:write scope allows the bearer to write transactions until the token expires. The scope encodes endpoint authorization (what) but has no mechanism for runtime budget enforcement (how much). Adding more granular scopes (read vs. write, by resource type) is a horizontal expansion of authorization but does not add a vertical budget cap.

RBAC policies (AWS IAM, GCP IAM, Kubernetes RBAC) express: "subject X can perform action Y on resource Z." None of those three dimensions represents dollar spend. An IAM policy can grant s3:PutObject but cannot say "stop this agent execution once the storage costs it has incurred reach $100."
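To make the limitation concrete, here is a minimal sketch of a scope check. The function name `is_authorized` and the scope set are hypothetical and not taken from any vendor SDK. The check has no memory of earlier calls, so the 200th request gets the same answer as the first.

```python
def is_authorized(granted_scopes: set[str], required_scope: str) -> bool:
    """Stateless scope check: the answer depends only on the granted scopes."""
    return required_scope in granted_scopes


# Hypothetical scope set attached to an agent's credential.
scopes = {"payment_intents:write"}

# Every call gets the same decision, regardless of how many calls came
# before it or how much those calls charged.
decisions = [is_authorized(scopes, "payment_intents:write") for _ in range(200)]
```

Nothing in the inputs to `is_authorized` carries spend so far, which is why finer-grained scopes alone cannot express a budget.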

What enforcement actually requires for AI agents

Effective access control enforcement for AI agents requires four properties that static scopes cannot provide:

Per-run identity — each agent execution is a distinct principal, not just one instance of a shared service identity. "Agent billing-pipeline, run ID abc-123" is a different enforcement subject than "Agent billing-pipeline, run ID def-456" even if both use the same codebase and same Stripe account.
Pre-call dollar cap — the cap fires before the vendor API call is forwarded, not after the charge lands on your billing statement. A $500 cap must block the $501st dollar of spend in real time, not send an alert 8 hours later.
Sub-second revoke — stopping a specific agent run must take effect within a single request cycle, not after Secrets Manager propagation, not after container recycling, not after a rate limit resets.
Per-call audit with run context — each vendor API call in the audit log must carry the agent run ID, so you can reconstruct exactly what a specific execution charged after an incident.
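The four properties above can be sketched as a small in-memory gate. This is an illustration under stated assumptions, not a production design: `RunGate` and its fields are hypothetical names, amounts are tracked in integer cents, and a real enforcement point would sit in a proxy with durable state.

```python
from dataclasses import dataclass, field


@dataclass
class RunGate:
    """In-memory enforcement state for one agent run (illustrative only)."""
    run_id: str
    cap_cents: int
    spent_cents: int = 0
    revoked: bool = False
    audit_log: list = field(default_factory=list)

    def authorize(self, endpoint: str, amount_cents: int) -> bool:
        """Decide before forwarding and record the decision with the run ID."""
        allowed = (not self.revoked
                   and self.spent_cents + amount_cents <= self.cap_cents)
        if allowed:
            self.spent_cents += amount_cents
        self.audit_log.append({"run_id": self.run_id, "endpoint": endpoint,
                               "amount_cents": amount_cents, "allowed": allowed})
        return allowed

    def revoke(self) -> None:
        """Kill switch: takes effect on the next authorize() call."""
        self.revoked = True


# Two runs of the same agent are separate enforcement subjects.
run_a = RunGate(run_id="abc-123", cap_cents=50_000)  # $500 cap
run_b = RunGate(run_id="def-456", cap_cents=50_000)

first = run_a.authorize("POST /v1/payment_intents", 49_900)  # within the cap
second = run_a.authorize("POST /v1/payment_intents", 200)    # would reach $501
run_b.revoke()
third = run_b.authorize("POST /v1/payment_intents", 100)     # run is revoked
```

Here `first` is allowed, `second` is blocked because it would push run abc-123 past $500, and `third` is blocked because run def-456 was revoked. Revoking one run leaves the other run's state untouched.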

The enforcement gap in practice

Consider a LangChain agent that calls Stripe in a tool. The agent's Stripe Restricted Key has payment_intents:write scope — it can create payment intents. The key was scoped correctly by the access control team: it can't access refunds, customers, or subscription data. But:

The LangChain agent enters a reasoning loop and calls create_payment_intent() 200 times before context window exhaustion. Each call creates a new charge. The Restricted Key's scope says nothing about how many times the tool can fire.
An incident is detected. To stop the agent, the team rotates the Restricted Key in the Stripe dashboard. This revokes access for every agent, every environment, and every service using that key — collateral damage across production, staging, and other agent deployments.
A customer reports a double charge. The team opens Stripe's dashboard and sees 200 events from the same key. There's no way to filter by agent run ID — Stripe's logs record the key identifier and timestamp, not the agent's run_id from the LangChain runner.

All three failures occur despite correct Layer 1 (authentication) and Layer 2 (authorization) controls. The Restricted Key is scoped correctly. The agent is authenticated. The problem is exclusively at Layer 3: no per-run spend cap, no per-run revoke, no per-run audit log.

Per-run vault keys as the enforcement layer

import os

import httpx

KEYBRAKE_BASE = "https://proxy.keybrake.com"
KEYBRAKE_ADMIN_KEY = os.environ["KEYBRAKE_ADMIN_KEY"]

def start_agent_run(run_id: str, budget_usd: float) -> str:
    """Issue a per-run vault key. Call this once, at run start."""
    resp = httpx.post(
        f"{KEYBRAKE_BASE}/vault/keys",
        headers={"Authorization": f"Bearer {KEYBRAKE_ADMIN_KEY}"},
        json={
            "vendor": "stripe",
            "daily_usd_cap": budget_usd,
            "allowed_endpoints": ["POST /v1/payment_intents"],
            "expires_in": "1h",
            "label": run_id,   # carries run context into every audit log entry
        },
    )
    resp.raise_for_status()
    return resp.json()["vault_key"]

def revoke_agent_run(key_id: str) -> None:
    """Revoke a specific run's key. Effective on the next proxied call."""
    httpx.delete(
        f"{KEYBRAKE_BASE}/vault/keys/{key_id}",
        headers={"Authorization": f"Bearer {KEYBRAKE_ADMIN_KEY}"},
    ).raise_for_status()

# Usage: per-run key, not a shared credential
vault_key = start_agent_run(run_id="billing-run-abc123", budget_usd=200.0)

# Agent tools receive vault_key and use proxy.keybrake.com as the Stripe base URL.
# BillingAgent is a placeholder for your own agent class; it is not defined here.
agent = BillingAgent(stripe_key=vault_key, stripe_base="https://proxy.keybrake.com/stripe")

The vault key maps to Layer 3: it carries a per-run dollar cap (enforcement), a per-run revoke path (kill switch), and a per-run label (audit correlation). Layers 1 and 2 remain in place: the admin key authenticates the vault key issuance, and the vault key's endpoint allowlist is the authorization scope for the run. The real Stripe Restricted Key stays in Keybrake; the vault key presented to the agent is revocable without affecting any other run.

Access control comparison: static scopes vs. per-run vault keys

Property	Stripe Restricted Key	Per-run vault key (Keybrake)
Endpoint authorization	Yes — scope to specific API methods	Yes — allowed_endpoints allowlist per key
Per-run dollar cap	No — no spend limit per key issuance	Yes — daily_usd_cap enforced pre-call
Per-run revoke	No — rotating the key kills all users of it	Yes — DELETE one key, one run stops
Per-call audit with run context	No — Stripe logs by key, not run_id	Yes — label field in every audit log entry
Expiry scoped to run	No — Restricted Keys don't auto-expire	Yes — expires_in TTL per vault key
Vendor coverage	Stripe only	Any vendor API (Stripe, Twilio, Resend, etc.)

How Keybrake fits

Keybrake provides the enforcement layer for AI agent access control. Authentication (admin API key, OAuth) and authorization (vault key endpoint allowlist, equivalent to scopes) are both present in the vault key issuance call — but the critical additions are the per-run dollar cap, the TTL, and the per-run label. The proxy enforces the cap pre-call: when cumulative spend in a run reaches daily_usd_cap, the next request returns 429 before reaching Stripe. The label field ties every audit log entry to the agent run ID — one query by label reconstructs the full cost of any specific run. The DELETE endpoint revokes a single run's key without touching any other run's credential.
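As a rough illustration of reconstructing a run's cost by label, the sketch below totals spend per label over a handful of made-up audit entries. The entry shape (`label`, `endpoint`, `amount_cents`) and the helper `spend_by_label` are assumptions for this example; the actual audit log schema is not shown in this article.

```python
from collections import defaultdict

# Hypothetical audit entries; field names are assumptions for illustration.
audit_entries = [
    {"label": "billing-run-abc123", "endpoint": "POST /v1/payment_intents", "amount_cents": 2_500},
    {"label": "billing-run-def456", "endpoint": "POST /v1/payment_intents", "amount_cents": 1_000},
    {"label": "billing-run-abc123", "endpoint": "POST /v1/payment_intents", "amount_cents": 4_000},
]


def spend_by_label(entries: list[dict]) -> dict[str, int]:
    """Sum spend in cents for each run label."""
    totals: dict[str, int] = defaultdict(int)
    for entry in entries:
        totals[entry["label"]] += entry["amount_cents"]
    return dict(totals)


totals = spend_by_label(audit_entries)
```

With run context on every entry, the cost of `billing-run-abc123` comes from a single grouping step rather than from correlating timestamps against a shared key.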

Related questions

How does this relate to zero-trust security for AI agents?

Zero-trust assumes no implicit trust based on network location — every request must be authenticated and authorized. For AI agents, zero-trust's "authenticate every request" principle means per-run credentials, not shared long-lived keys. Per-run vault keys implement zero-trust at the request level: each agent execution is authenticated by its vault key, and authorization is scoped to that execution's endpoint allowlist and spend cap. The Keybrake proxy is the policy enforcement point (PEP) in zero-trust terms — it evaluates every request against the vault key's policy before forwarding it. See AI agent zero-trust for a full mapping of zero-trust principles to agent execution patterns.

How do per-run vault keys interact with multi-agent systems?

In multi-agent systems (orchestrator + sub-agents, CrewAI teams, LangGraph multi-actor), each agent should have its own vault key. An orchestrator that spawns sub-agents issues a vault key for each sub-agent's budget — typically a fraction of the orchestrator's total budget. A sub-agent that exceeds its cap is stopped without stopping the orchestrator or other sub-agents. The orchestrator's own vault key covers its direct vendor calls (orchestration overhead). This per-agent-per-run model provides attribution: you can query Keybrake's audit log by sub-agent label to see each agent's individual contribution to the run's total spend.
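One way an orchestrator might divide its budget is a proportional split, sketched below. The helper `split_budget`, the agent names, and the weights are all hypothetical. Amounts are integer cents, and floor division keeps the sum of the caps at or below the total.

```python
def split_budget(total_cents: int, weights: dict[str, int]) -> dict[str, int]:
    """Allocate per-agent caps in proportion to weights, rounding down."""
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive number")
    return {name: total_cents * weight // weight_sum
            for name, weight in weights.items()}


# $200 total: the orchestrator keeps a small share for its own vendor calls.
caps = split_budget(20_000, {"orchestrator": 1, "researcher": 3, "invoicer": 6})
```

Each value in `caps` would become the budget passed when issuing that agent's vault key, so a sub-agent that exhausts its share stops without consuming the others' allocation.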

How is this different from what OPA (Open Policy Agent) provides?

OPA evaluates policy at request time against structured policy documents (Rego). It can express complex authorization rules: "agent X can call endpoint Y if attribute Z is true." OPA is an excellent tool for Layer 2 (authorization) because it can encode nuanced allowlist policies. But OPA is stateless: it evaluates each request independently. Cumulative spend caps require stateful counters that accumulate across requests for the same run, and OPA's Rego evaluation doesn't natively maintain counters across requests. Enforcing a $500 per-run cap requires a stateful service that tracks how much the run has spent so far, not a per-request policy evaluation. Keybrake's atomic counter per vault key is that stateful enforcement layer. OPA and Keybrake are complementary: OPA for complex authorization rules, Keybrake for stateful cumulative spend enforcement.
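The difference between per-request evaluation and a stateful counter can be sketched as follows. This is a minimal single-process illustration, not Keybrake's implementation, which this page does not show. `SpendCounter` and `try_reserve` are hypothetical names, and a lock stands in for whatever atomic primitive a real service would use.

```python
import threading


class SpendCounter:
    """Cumulative spend for one run, with an atomic check-and-increment."""

    def __init__(self, cap_cents: int) -> None:
        self.cap_cents = cap_cents
        self.spent_cents = 0
        self._lock = threading.Lock()

    def try_reserve(self, amount_cents: int) -> bool:
        """Reserve spend only if the total stays within the cap."""
        with self._lock:  # check and update happen as one step
            if self.spent_cents + amount_cents > self.cap_cents:
                return False
            self.spent_cents += amount_cents
            return True


counter = SpendCounter(cap_cents=50_000)  # $500 per-run cap
results: list[bool] = []


def worker() -> None:
    # Each simulated tool call tries to charge $25.
    results.append(counter.try_reserve(2_500))


threads = [threading.Thread(target=worker) for _ in range(200)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Of the 200 concurrent attempts, exactly 20 fit under the $500 cap. Without the lock, two requests could both pass the check before either updates the total, which is the failure a stateless per-request policy cannot rule out.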