The control plane problem: why your AI agent fleet needs a vendor API gateway
When one agent makes one API call, a .env file is a perfectly fine credential store. When fifty agents make calls to Stripe, Twilio, and Resend, you have a control plane problem, and a .env file is not a control plane.
The missing layer in most agent stacks isn't another LLM gateway. It's a proxy for what LLM gateways deliberately don't handle: the vendor API calls that move real money, made with credentials that have no concept of "how much has this specific run spent so far."
The scale inflection point
There's a specific point in every team's agent journey where the credential model breaks. It doesn't break because of security negligence — it breaks because of a structural mismatch between how SaaS API keys were designed and how autonomous agents actually use them.
SaaS API keys — your Stripe secret, your Twilio auth token, your Resend API key — were designed for server-to-server integrations that are essentially deterministic. A webhook handler fires once per event. A nightly batch job runs at a predictable cadence. A checkout session is initiated by a human clicking a button. In all of these cases, the credential is static because the caller is constrained: by a human action, by a cron schedule, by the event rate of a production system.
Autonomous agents are none of these things. They're goal-directed processes that decide dynamically how many times to call a vendor and in what sequence. A billing agent given a queue of 200 overdue invoices can — and sometimes will — issue 200 charge retries in a four-minute window because it misread the idempotency key semantics. A customer support agent given permission to issue refunds will, under certain failure modes, issue the same refund three times while waiting for confirmation. A data pipeline agent given access to Resend will, when confronted with a timeout, re-queue the same batch send it already dispatched.
The inflection point isn't "you have a lot of agents." It's "you have any agents that call vendor APIs with real money on the other side, with no hard ceiling on how much they can spend in a single run."
Why LLM gateways don't cover this
If you're running a non-trivial agent stack, you're probably already routing LLM traffic through LiteLLM, Portkey, or OpenRouter. These are the right tools for that layer. They speak the OpenAI-compatible protocol, they count tokens, they enforce budget limits in token-months, they provide model fallback logic, and they give you a unified dashboard across providers. They're good at what they do.
What they don't do is see your Stripe traffic. When your agent's charge_customer tool fires, that call goes directly to api.stripe.com — not through LiteLLM. LiteLLM's budget enforcement operates on tokens, not on the dollar value of payment intents. Its audit log records model invocations, not vendor API calls. Its kill-switch rotates the model provider credential, not the Stripe key your billing agent is using.
This isn't a gap in LiteLLM's design — it's a deliberate scope boundary. Token economics and vendor economics are genuinely different problems. Counting tokens requires parsing OpenAI-compatible JSON. Counting Stripe spend requires parsing PaymentIntent.amount from a Stripe response body. Counting Twilio spend requires multiplying message count by carrier rate from Twilio's price field. Counting Resend spend requires knowing their per-email rate and applying it per successful delivery. These parsers are vendor-specific, not schema-generic.
The governance stack we described in The 2026 agent governance stack is deliberately two-layer: an LLM gateway at Layer 1 for token economics, a vendor API gateway at Layer 3 for SaaS API economics. The second layer doesn't replace the first. It covers what the first explicitly doesn't.
Why traditional API gateways get agents wrong
The next tool engineers reach for is Kong, AWS API Gateway, Nginx, or Caddy with reverse proxy config. These are mature, production-grade, and designed exactly for putting a gateway in front of an upstream service. You could theoretically route your agent's Stripe traffic through Kong with a rate-limit plugin.
The problem isn't the infrastructure — it's that the policy primitives don't match the agent risk model.
Traditional API gateways enforce policy on the request: rate limits by IP, authentication by JWT scope, routing by path prefix. These are the right primitives for a service-to-service mesh where the consumers are well-defined, bounded services that don't dynamically decide to retry 47 times.
Agent risks are denominated differently:
- Spend is the risk unit, not requests-per-second. An agent that makes 10 requests per second for 30 seconds is fine if each call costs $0.001 (Twilio domestic SMS). The same RPS for 30 seconds is catastrophic if each call is a $100 PaymentIntent. Kong's rate-limit plugin counts requests. It doesn't parse the amount field from the Stripe response body.
- The credential is shared, not per-caller. Traditional gateways authenticate the request by verifying a JWT that identifies the calling service. For agents, the "calling service" is the same process running 50 concurrent agent runs, and they all share one STRIPE_SECRET_KEY. There's no JWT per agent-run to rate-limit against. You'd need to inject per-run credentials, which is exactly what a vault key proxy does.
- Revocation granularity doesn't exist. If you want to stop one agent run without stopping all 49 other concurrent runs, Kong has no mechanism for it. You'd have to block by the IP of the running process, which, in a serverless or container-scheduled environment, is the same IP as all the other runs.
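To make the "spend, not requests-per-second" point concrete, here is a small arithmetic sketch using the per-call figures quoted in the first bullet. The function name is made up for illustration.

```python
def burst_exposure_usd(rps: float, seconds: float, cost_per_call_usd: float) -> float:
    """Worst-case spend if every call in a burst is forwarded and billed."""
    return rps * seconds * cost_per_call_usd


# Same request rate and duration, very different financial risk.
sms_burst = burst_exposure_usd(10, 30, 0.001)       # sub-cent calls
payment_burst = burst_exposure_usd(10, 30, 100.0)   # $100 PaymentIntents

print(f"SMS burst:           ${sms_burst:,.2f}")
print(f"PaymentIntent burst: ${payment_burst:,.2f}")
```

A request-count limiter sees 300 calls in both cases; only a limiter that reads the amount can tell the two bursts apart.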
The four properties a vendor API gateway for agents must have
The right gateway for agent vendor calls needs four properties that neither LLM gateways nor traditional gateways provide:
Property 1
Short-lived, per-run credentials
The agent should never hold the real vendor API key. It should hold a vault key — a token that the gateway maps to the real key and that expires when the agent run ends (or when you revoke it explicitly). Per-run credential issuance means a stuck run can be stopped without touching the 49 runs that are working correctly, without rotating the master credential, and without a propagation delay. The vault key is the fundamental unit of control.
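A minimal in-memory sketch of this idea follows. The class and method names are hypothetical, not Keybrake's API, and a real key store would be persistent and shared across gateway instances. It shows only the mapping from vault key to real key, expiry, and revocation of a single run.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class VaultKeyRecord:
    real_key: str       # the vendor secret; never handed to the agent
    expires_at: float   # unix timestamp
    revoked: bool = False


class VaultKeyStore:
    """In-memory stand-in for a gateway key store (illustrative only)."""

    def __init__(self) -> None:
        self._records = {}  # vault_key -> VaultKeyRecord

    def issue(self, real_key: str, ttl_seconds: float, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        vault_key = "vault_key_" + secrets.token_urlsafe(16)
        self._records[vault_key] = VaultKeyRecord(real_key, now + ttl_seconds)
        return vault_key

    def revoke(self, vault_key: str) -> None:
        record = self._records.get(vault_key)
        if record is not None:
            record.revoked = True

    def resolve(self, vault_key: str, now: Optional[float] = None) -> Optional[str]:
        """Return the real vendor key, or None if unknown, expired, or revoked."""
        now = time.time() if now is None else now
        record = self._records.get(vault_key)
        if record is None or record.revoked or now >= record.expires_at:
            return None
        return record.real_key


store = VaultKeyStore()
run_a = store.issue("sk_test_placeholder", ttl_seconds=3600, now=0)
run_b = store.issue("sk_test_placeholder", ttl_seconds=3600, now=0)
store.revoke(run_a)  # stop the stuck run only; run_b keeps working
```

Both runs map to the same underlying vendor secret, yet revoking one leaves the other untouched, which is the property a shared static key cannot give you.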
Property 2
Per-session spend tracking with vendor-specific cost parsing
The gateway must parse the cost of each call from the vendor's response — not from a fixed pricing table, but from the actual response fields. Stripe returns amount and currency on PaymentIntent objects. Twilio returns price on Message resources. Resend charges per delivery at a fixed rate. These parsers are vendor-specific and must be maintained as vendor APIs evolve. The gateway accumulates spend per vault key and enforces a cap before forwarding the next request if the cap would be exceeded.
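The accumulate-then-check behaviour could be sketched roughly as below. The `SpendLedger` name and its methods are assumptions for illustration. The sketch uses `Decimal` to avoid floating-point drift in money sums and treats a call that lands exactly on the cap as allowed.

```python
from decimal import Decimal


class SpendLedger:
    """Tracks accumulated spend per vault key (in-memory, illustrative)."""

    def __init__(self) -> None:
        self._spent = {}  # vault_key -> Decimal USD spent so far

    def spent(self, vault_key: str) -> Decimal:
        return self._spent.get(vault_key, Decimal("0"))

    def would_exceed(self, vault_key: str, estimate_usd: Decimal, cap_usd: Decimal) -> bool:
        """Pre-flight check: would forwarding this call push the key past its cap?"""
        return self.spent(vault_key) + estimate_usd > cap_usd

    def record(self, vault_key: str, cost_usd: Decimal) -> None:
        """Post-flight update using the cost parsed from the vendor response."""
        self._spent[vault_key] = self.spent(vault_key) + cost_usd


ledger = SpendLedger()
cap = Decimal("500")
ledger.record("vault_key_abc", Decimal("450"))

print(ledger.would_exceed("vault_key_abc", Decimal("50"), cap))     # False: lands exactly on the cap
print(ledger.would_exceed("vault_key_abc", Decimal("100"), cap))    # True: would reach 550
print(ledger.would_exceed("vault_key_other", Decimal("100"), cap))  # False: separate key, separate budget
```

Under concurrency, the check and the update would need to happen atomically (for example inside one database transaction); this sketch ignores that.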
Property 3
Endpoint allowlisting at the policy layer, not the infrastructure layer
An agent given permission to create invoices shouldn't be able to delete payment methods. This sounds obvious but it's hard to enforce with a static API key, even a Stripe Restricted Key — because Restricted Keys define allowed endpoints at issuance time, not at runtime per-agent. The gateway needs to enforce allowed_endpoints from a per-vault-key policy, evaluated on every request, before forwarding. An agent trying to call DELETE /v1/payment_methods when its policy only allows POST /v1/invoices gets a 403 from the gateway before Stripe ever sees the request.
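One way such a per-request check might look is sketched here. It assumes the policy stores (method, path-prefix) pairs, which goes slightly beyond the path-only `allowed_endpoints` list used elsewhere in this post. The function name is hypothetical.

```python
import posixpath


def is_allowed(method: str, path: str, allowed: set) -> bool:
    """True if (method, path) matches an allowed (METHOD, prefix) pair.

    A prefix matches the exact path or any sub-resource beneath it, so
    "/v1/invoices" covers "/v1/invoices/in_123" but not "/v1/invoices_x".
    """
    method = method.upper()
    # Drop the query string and collapse "." / ".." segments before matching,
    # so "/v1/invoices/../payment_methods" cannot sneak past the prefix test.
    path = posixpath.normpath(path.split("?", 1)[0])
    for allowed_method, prefix in allowed:
        if method != allowed_method:
            continue
        if path == prefix or path.startswith(prefix + "/"):
            return True
    return False


policy = {("POST", "/v1/invoices")}

print(is_allowed("POST", "/v1/invoices", policy))           # True
print(is_allowed("DELETE", "/v1/payment_methods", policy))  # False: the gateway would answer 403
```

Matching on a prefix plus a trailing slash, not a bare `startswith`, is the design choice that keeps look-alike paths from slipping through.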
Property 4
Sub-second revocation with no collateral damage
When something goes wrong, the correct response is to revoke the vault key for the specific run. Not rotate the master Stripe secret (which takes 2–5 minutes to propagate through Stripe's infrastructure and immediately breaks every other consumer). Not kill the agent process (which may have already queued work on other workers). Just revoke the vault key: one API call, one row update in the key store, enforced on the next forwarded request from that run. The full reasoning on why rotation is not an emergency stop is in the rotate vs revoke playbook.
What the architecture looks like in practice
The gateway sits as a thin reverse proxy between your agent's tool functions and the vendor APIs. The agent code changes in exactly two ways: the base URL changes from the vendor's domain to the proxy's domain, and the API key changes from the static vendor key to a short-lived vault key issued for this run.
Before (direct vendor calls):
  Agent tool → api.stripe.com/v1/payment_intents (STRIPE_SECRET_KEY)
  Agent tool → api.twilio.com/2010-04-01/Messages (TWILIO_AUTH_TOKEN)
  Agent tool → api.resend.com/emails (RESEND_API_KEY)

After (via vendor API gateway):
  Agent tool → proxy.keybrake.com/stripe/v1/payment_intents (vault_key_abc)
  Agent tool → proxy.keybrake.com/twilio/Messages (vault_key_abc)
  Agent tool → proxy.keybrake.com/resend/emails (vault_key_abc)

Gateway per-request flow:
  1. Authenticate: vault_key_abc → look up policy
  2. Check endpoint: allowed? → forward or 403
  3. Check spend: (today_spend + this_call_estimate) ≤ daily_usd_cap? → forward or 402
  4. Swap credential: vault_key_abc → real VENDOR_KEY for upstream call
  5. Parse response cost: amount/price/fixed_rate → update spend counter
  6. Log: run_id, vendor, endpoint, status, cost_usd, latency_ms
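The six-step flow can be sketched as a single handler. Everything here is illustrative: the `Policy` and `Gateway` names are hypothetical, amounts are integer cents, and the upstream call is an injected function so the sketch runs without network access. It is meant to show the ordering of the checks and the status codes, not a production proxy.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Policy:
    real_key: str
    allowed_endpoints: tuple
    daily_cap_cents: int
    spent_cents: int = 0
    revoked: bool = False


@dataclass
class Gateway:
    policies: dict  # vault_key -> Policy
    # upstream(real_key, path, body) -> (http_status, cost_cents)
    upstream: Callable
    audit_log: list = field(default_factory=list)

    def handle(self, vault_key: str, path: str, body: dict, estimate_cents: int) -> int:
        policy = self.policies.get(vault_key)
        if policy is None or policy.revoked:                  # 1. authenticate
            return 401
        if path not in policy.allowed_endpoints:              # 2. endpoint check
            return 403
        if policy.spent_cents + estimate_cents > policy.daily_cap_cents:
            return 402                                        # 3. spend check
        # 4. swap credential: only the gateway ever touches real_key
        status, cost_cents = self.upstream(policy.real_key, path, body)
        policy.spent_cents += cost_cents                      # 5. update spend
        self.audit_log.append({"vault_key": vault_key, "endpoint": path,
                               "status": status, "cost_cents": cost_cents})  # 6. log
        return status


def fake_stripe(real_key: str, path: str, body: dict):
    """Stand-in for the vendor: always succeeds and charges body['amount'] cents."""
    return 200, body.get("amount", 0)


gw = Gateway(
    policies={"vault_key_abc": Policy("sk_test_placeholder", ("/v1/payment_intents",), 50_000)},
    upstream=fake_stripe,
)
statuses = [
    gw.handle("vault_key_abc", "/v1/payment_intents", {"amount": 40_000}, 40_000),
    gw.handle("vault_key_abc", "/v1/payment_intents", {"amount": 20_000}, 20_000),
    gw.handle("vault_key_abc", "/v1/refunds", {}, 0),
    gw.handle("vault_key_unknown", "/v1/payment_intents", {}, 0),
]
print(statuses)  # [200, 402, 403, 401]
```

Only the first call reaches the fake vendor. The other three are rejected at the gateway, in the same order the flow lists its checks.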
The vault key holds the policy. You issue it at the start of each agent run with the parameters for that run:
# Issue a vault key before the agent run starts
curl -X POST https://proxy.keybrake.com/v1/keys \
  -H "Authorization: Bearer $KEYBRAKE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "vendor": "stripe",
    "daily_usd_cap": 500,
    "allowed_endpoints": ["/v1/payment_intents", "/v1/invoices"],
    "expires_in": 3600
  }'

# Response
{
  "vault_key": "vault_key_abc123xyz",
  "expires_at": "2026-06-04T19:00:00Z",
  "policy": {
    "vendor": "stripe",
    "daily_usd_cap": 500,
    "allowed_endpoints": ["/v1/payment_intents", "/v1/invoices"]
  }
}
The agent run uses this vault key instead of the Stripe secret. When the run ends — successfully or not — the vault key is revoked. If something goes wrong mid-run, revocation is one API call:
# Revoke immediately; takes effect on the next request from this run
curl -X DELETE https://proxy.keybrake.com/v1/keys/vault_key_abc123xyz \
  -H "Authorization: Bearer $KEYBRAKE_API_KEY"
# → 204 No Content. Next forwarded request from this vault key gets 401.
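On the agent side, "revoked when the run ends, successfully or not" maps naturally onto a try/finally. The sketch below assumes you wrap the issue and revoke calls in two functions of your own. `issue` and `revoke` are hypothetical callables, stubbed here so the example runs offline.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def vault_key_for_run(issue: Callable[[], str], revoke: Callable[[str], None]) -> Iterator[str]:
    """Issue a vault key for one agent run and always revoke it afterwards."""
    vault_key = issue()
    try:
        yield vault_key
    finally:
        # Runs whether the body finished normally or raised.
        revoke(vault_key)


revoked = []  # records which keys the stub "revoked"


def fake_issue() -> str:
    return "vault_key_abc123xyz"


try:
    with vault_key_for_run(fake_issue, revoked.append) as key:
        raise RuntimeError("agent run failed mid-way")
except RuntimeError:
    pass

print(revoked)  # ['vault_key_abc123xyz']
```

The expiry on the key remains the backstop for the case where the process dies before the `finally` block can run.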
The critical implementation detail: parsing cost from the response
Most engineers, when they sketch this architecture, focus on the credential layer and the policy layer. The part they underestimate is the cost parser.
To enforce a daily USD cap, the gateway needs to know how much each request costs. This is vendor-specific in ways that aren't obvious:
- Stripe: The cost of a POST /v1/payment_intents is in the response body: amount divided by 100 (cents to dollars), times the currency conversion if non-USD. But only if status is succeeded or requires_capture. A payment_intent.create that returns requires_action hasn't moved money yet. Getting this wrong, by either over-counting or under-counting, makes your spend cap unreliable.
- Twilio: The cost of a POST /2010-04-01/Messages is in the price field, but only once the message transitions to delivered, not when it's queued. For spend cap enforcement, you either use Twilio's per-SMS rate table (approximate but real-time) or wait for the message status callback (accurate but delayed).
- Resend: Fixed per-email rate against your plan. Simpler to parse (count successful 200s, multiply by rate), but requires knowing what plan the account is on.
This is why a vendor API gateway for agents can't be a generic reverse proxy with a rate-limit plugin. The cost semantics are baked into each vendor's data model. You need vendor-specific parsers, and you need to maintain them as vendor APIs evolve.
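As a rough illustration of how different the three parsers are, here is a sketch that follows the rules described above. The function names, the injected `fx_to_usd` lookup, and the fallback rate are assumptions. The Stripe branch assumes a two-decimal currency, and the Twilio branch takes the absolute value on the assumption that the price may arrive as a negative string and is already in USD.

```python
from decimal import Decimal
from typing import Callable

# Statuses this post counts as spend for a PaymentIntent.
STRIPE_COUNTED_STATUSES = {"succeeded", "requires_capture"}


def stripe_cost_usd(payment_intent: dict, fx_to_usd: Callable[[str], Decimal]) -> Decimal:
    """Cost of a PaymentIntent response; zero if no money has moved yet."""
    if payment_intent.get("status") not in STRIPE_COUNTED_STATUSES:
        return Decimal("0")
    # Assumes a two-decimal currency; zero-decimal currencies such as JPY
    # would need their own branch.
    major_units = Decimal(payment_intent["amount"]) / Decimal(100)
    return major_units * fx_to_usd(payment_intent["currency"])


def twilio_cost_usd(message: dict, fallback_rate_usd: Decimal) -> Decimal:
    """Use the reported price when present, else an approximate rate-table value."""
    price = message.get("price")
    if price is None:  # not yet priced, e.g. still queued
        return fallback_rate_usd
    return abs(Decimal(price))


def resend_cost_usd(http_status: int, per_email_rate_usd: Decimal) -> Decimal:
    """Fixed plan rate per successful send; failed requests cost nothing."""
    return per_email_rate_usd if http_status == 200 else Decimal("0")


def fx(currency: str) -> Decimal:
    """Hypothetical FX lookup with made-up rates."""
    return Decimal("1") if currency == "usd" else Decimal("1.10")


print(stripe_cost_usd({"status": "succeeded", "amount": 2000, "currency": "usd"}, fx))
print(stripe_cost_usd({"status": "requires_action", "amount": 2000, "currency": "usd"}, fx))
print(twilio_cost_usd({"price": None}, Decimal("0.008")))
```

Each function reads a different field, applies a different status rule, and needs different outside information (FX rates, a rate table, the plan's per-email price), which is the maintenance cost the text is pointing at.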
For a detailed breakdown of what a minimal but production-appropriate implementation looks like, see the full AI agent API gateway implementation guide — it covers the Node.js proxy handler, the SQLite vault key store, all three vendor cost parsers, and the build-vs-buy decision matrix.
When to build it yourself vs use a managed proxy
The minimal self-hosted version (SQLite vault key store, three vendor parsers, Node.js proxy handler) takes about four hours to build to a point where it can handle production traffic with reasonable reliability. The operational burden after that is maintenance: keeping cost parsers current as vendor APIs update, handling the edge cases in multi-currency cost attribution, and monitoring the latency the proxy adds (which should be under 5ms for a well-implemented pass-through).
Build it yourself if:
- You need to add vendors not covered by managed services (industry-specific APIs, internal services).
- You're in a compliance environment where traffic can't leave your infrastructure, even if a managed service offers a self-hosted option.
- You want to extend the policy model with custom logic (e.g., routing spend to different internal cost centers by agent type).
Use a managed proxy (like Keybrake) if:
- Your vendors are Stripe, Twilio, and Resend — the three most common agent-facing SaaS APIs with parseable cost signals.
- You don't want to own the parser maintenance when Stripe updates their API versioning or Twilio changes their pricing response format.
- You want the vault key management, audit log, and dashboard without building a separate admin surface.
The architecture is the same in both cases. The tradeoff is operational surface, not technical approach.
The control plane principle
The underlying principle here is older than AI agents. It's the same reason Kubernetes has a control plane separate from worker nodes, why AWS IAM issues short-lived session credentials rather than long-lived root keys, and why Vault exists as a dedicated secrets manager rather than a config file.
When you have a process that's going to make decisions autonomously about how to use a resource — compute, money, communication channels — you need a separate control plane that can observe and constrain those decisions without participating in them. The process and the control plane should be able to operate independently, including in failure modes. The control plane's job is specifically to be the limiting factor that the process cannot override.
For AI agents calling SaaS vendor APIs, that control plane is a vendor API gateway with vault key issuance, per-run spend enforcement, and sub-second revocation. The agent code doesn't know or care that the gateway exists — it just uses the vault key it was issued, the same way it would use any other API credential. The control plane operates at a level the agent code can't see and can't bypass.
If you have agents calling LLMs, you have an LLM gateway. If those same agents call Stripe, Twilio, or Resend, the control plane is incomplete without the second layer. That's the gap. This post is about the gap.
Keybrake closes the vendor API gap
Vault key issuance, per-run spend caps, endpoint allowlisting, and a per-call audit log for Stripe, Twilio, and Resend. Two-line agent code change. Join the waitlist for v1 access.