AI agent API gateway: routing, policy enforcement, and spend control for multi-vendor agent calls
An LLM gateway like LiteLLM handles the token budget and model routing for the AI side of your agents. But your agents don't just call LLMs — they call Stripe to charge customers, Twilio to send SMS, Resend to deliver email, and Shopify to fulfill orders. These vendor calls carry real money and real side effects. An AI agent API gateway sits between your agent and these vendor APIs, enforcing per-agent spend caps, scoping credentials to individual runs, logging every call to an audit trail, and providing a kill-switch that works in under one second. This guide covers what that gateway looks like, how to build a minimal version, and when a managed proxy makes more sense than DIY.
TL;DR
An AI agent API gateway is a reverse proxy that: (1) authenticates requests using short-lived vault keys, not long-lived vendor API keys; (2) enforces per-agent policies (spend caps, endpoint allowlists, TTLs); (3) translates vault key requests to real vendor API calls; and (4) logs every request and its cost. The agent never sees the real Stripe or Twilio API key — only the vault key that expires when the agent run ends.
Why agents need a different kind of API gateway
Traditional API gateways (Kong, AWS API Gateway, nginx) are designed for human-facing services: they rate-limit by IP, authenticate users via JWT or OAuth, and route traffic between microservices. They are not designed for the specific risks that autonomous agents introduce:
- Unbounded spend in a tight loop. An agent that encounters an error and retries a Stripe charge can spend thousands of dollars before the next human review. Traditional gateways enforce request-per-second limits, not total dollar spend caps.
- Credential scope creep. Without a gateway, agents typically hold the full vendor API key, which grants access to every endpoint. An agent permitted to create invoices doesn't need, and shouldn't have, permission to delete payment methods or modify webhook endpoints.
- Side-effect attribution. When 50 users' agents are all calling Stripe through the same API key, there is no way to attribute which charge came from which agent run without instrumentation. Traditional access logs don't parse vendor response bodies for cost data.
- Emergency revocation speed. Rotating a vendor API key to stop a runaway agent takes minutes and disrupts all other users' agents. A vault key kill-switch revokes one agent's credential in milliseconds without affecting others.
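To make the first risk concrete, the sketch below simulates an agent retrying the same charge in a loop against a cumulative dollar cap. All names (`SpendState`, `tryCharge`, `simulateRetryLoop`) and the numbers are illustrative assumptions, not part of any vendor or gateway API. A request-per-second limit alone would admit every one of these retries.

```typescript
// Hypothetical illustration: a cumulative dollar cap stops a retry loop
// that a per-second rate limit would let through.
interface SpendState {
  capUsd: number;
  spentUsd: number;
}

// Records the charge and returns true only if it fits under the cap.
function tryCharge(state: SpendState, amountUsd: number): boolean {
  if (state.spentUsd + amountUsd > state.capUsd) return false;
  state.spentUsd += amountUsd;
  return true;
}

// Simulates an agent that retries the same charge `attempts` times.
function simulateRetryLoop(
  capUsd: number,
  amountUsd: number,
  attempts: number,
): { accepted: number; spentUsd: number } {
  const state: SpendState = { capUsd, spentUsd: 0 };
  let accepted = 0;
  for (let i = 0; i < attempts; i++) {
    if (tryCharge(state, amountUsd)) accepted++;
  }
  return { accepted, spentUsd: state.spentUsd };
}
```

With a $500 cap and $100 charges, 60 retries result in 5 accepted charges and 55 rejections, regardless of how quickly the loop runs.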
The AI agent API gateway architecture
The gateway sits at the edge of your vendor API calls:
┌─────────────────────────────────────────────────────────────┐
│ Agent (LLM + tools) │
│ │
│ chargeCustomer({ amount: 100, customer: "cus_xxx" }) │
└──────────────────────────┬───────────────────────────────────┘
│ vault_key_xxx (short-lived, scoped)
▼
┌─────────────────────────────────────────────────────────────┐
│ AI Agent API Gateway (proxy) │
│ │
│ 1. Authenticate: vault_key_xxx → look up policy │
│ 2. Enforce: daily_usd_cap=$500, allowed=/v1/payment_intents │
│ 3. Forward: swap vault key → real STRIPE_SECRET_KEY │
│ 4. Log: request + response + cost (parsed from response) │
│ 5. Revoke: mark vault_key_xxx spent after response returns │
└──────────────────────────┬───────────────────────────────────┘
│ real Stripe API key (never leaves gateway)
▼
api.stripe.com
The agent has the vault key. The gateway has the real vendor API key. The two never meet in the agent's process memory.
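From the agent's side, the only change is the base URL and the credential. A minimal sketch of the `chargeCustomer` tool from the diagram follows; the helper name `buildChargeRequest`, the `GatewayRequest` shape, the added `currency` parameter, and the local gateway address are assumptions made for illustration.

```typescript
// Hypothetical agent-side helper: builds the request the agent sends to the gateway.
interface GatewayRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildChargeRequest(
  proxyBaseUrl: string, // assumed gateway address, e.g. "http://localhost:8080"
  vaultKey: string, // short-lived vault key, never the real Stripe key
  params: { amount: number; currency: string; customer: string },
): GatewayRequest {
  // Stripe accepts form-encoded bodies; amount is in the smallest currency unit.
  const body = new URLSearchParams({
    amount: String(params.amount),
    currency: params.currency,
    customer: params.customer,
  }).toString();
  return {
    // The /stripe prefix tells the gateway which vendor to forward to.
    url: `${proxyBaseUrl}/stripe/v1/payment_intents`,
    method: "POST",
    headers: {
      authorization: `Bearer ${vaultKey}`,
      "content-type": "application/x-www-form-urlencoded",
    },
    body,
  };
}
```

The result can be passed to `fetch(req.url, req)`. Nothing in the agent process references `STRIPE_SECRET_KEY`.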
Minimal self-hosted gateway implementation
A minimal AI agent API gateway in Node.js requires three components:
1. Vault key store and policy enforcement
// gateway/vault.ts
import Database from "better-sqlite3";
const db = new Database("./data.db");
db.exec(`
CREATE TABLE IF NOT EXISTS vault_keys (
id TEXT PRIMARY KEY,
token TEXT UNIQUE NOT NULL,
vendor TEXT NOT NULL,
allowed_endpoints TEXT NOT NULL, -- JSON array
daily_usd_cap REAL NOT NULL,
daily_usd_spent REAL DEFAULT 0,
expires_at INTEGER NOT NULL,
revoked INTEGER DEFAULT 0
);
CREATE TABLE IF NOT EXISTS audit_log (
id INTEGER PRIMARY KEY AUTOINCREMENT,
vault_key_id TEXT NOT NULL,
vendor TEXT NOT NULL,
method TEXT NOT NULL,
path TEXT NOT NULL,
status_code INTEGER,
cost_usd REAL,
request_at INTEGER NOT NULL
);
`);
export function enforcePolicy(
token: string,
method: string,
path: string,
): { valid: true; keyId: string } | { valid: false; reason: string } {
const key = db
.prepare("SELECT * FROM vault_keys WHERE token = ?")
.get(token) as any;
if (!key) return { valid: false, reason: "unknown_token" };
if (key.revoked) return { valid: false, reason: "revoked" };
if (Date.now() / 1000 > key.expires_at) return { valid: false, reason: "expired" };
if (key.daily_usd_spent >= key.daily_usd_cap)
return { valid: false, reason: "cap_exhausted" };
const allowed: string[] = JSON.parse(key.allowed_endpoints);
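// Match on the path only, so a query string cannot affect the allowlist check.
const pathOnly = path.split("?")[0];
const pathAllowed = allowed.some((pattern) => {
if (pattern.endsWith("/*")) {
// Keep the trailing slash so "/v1/payment_intents/*" does not match "/v1/payment_intents_other".
return pathOnly.startsWith(pattern.slice(0, -1));
}
return pathOnly === pattern;
});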
if (!pathAllowed) return { valid: false, reason: "endpoint_not_allowed" };
return { valid: true, keyId: key.id };
}
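The store above validates keys but does not show how one is created. One possible way to mint a row for the `vault_keys` table is sketched below. The helper name `newVaultKeyRow`, the `vk_` and `vault_key_` prefixes, and the token length are assumptions; only `randomBytes` from Node's `crypto` module is a real API.

```typescript
import { randomBytes } from "node:crypto";

// Mirrors the columns of the vault_keys table.
interface VaultKeyRow {
  id: string;
  token: string;
  vendor: string;
  allowed_endpoints: string; // JSON array, as in the schema
  daily_usd_cap: number;
  daily_usd_spent: number;
  expires_at: number; // Unix seconds, compared against Date.now() / 1000
  revoked: number; // 0 or 1
}

// Hypothetical helper: builds the row for a new short-lived, scoped key.
function newVaultKeyRow(
  vendor: string,
  allowedEndpoints: string[],
  dailyUsdCap: number,
  ttlSeconds: number,
  nowMs: number = Date.now(),
): VaultKeyRow {
  return {
    id: `vk_${randomBytes(8).toString("hex")}`,
    // 32 random bytes from the OS CSPRNG, hex-encoded to 64 characters.
    token: `vault_key_${randomBytes(32).toString("hex")}`,
    vendor,
    allowed_endpoints: JSON.stringify(allowedEndpoints),
    daily_usd_cap: dailyUsdCap,
    daily_usd_spent: 0,
    expires_at: Math.floor(nowMs / 1000) + ttlSeconds,
    revoked: 0,
  };
}
```

The row can then be written with a parameterized `INSERT`. The kill-switch is the inverse operation: setting `revoked = 1` for that `id` makes the next `enforcePolicy` call fail with `revoked`.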
2. Vendor proxy handler
// gateway/proxy.ts
import http from "http";
import https from "https";
import { enforcePolicy } from "./vault.ts";
const VENDOR_TARGETS: Record<string, { host: string; realKey: string }> = {
stripe: {
host: "api.stripe.com",
realKey: process.env.STRIPE_SECRET_KEY!,
},
twilio: {
host: "api.twilio.com",
realKey: `${process.env.TWILIO_ACCOUNT_SID}:${process.env.TWILIO_AUTH_TOKEN}`,
},
resend: {
host: "api.resend.com",
realKey: process.env.RESEND_API_KEY!,
},
};
export function createProxyServer() {
return http.createServer(async (req, res) => {
// URL pattern: /stripe/v1/payment_intents → vendor=stripe, path=/v1/payment_intents
const match = req.url?.match(/^\/(\w+)(\/.*)/);
if (!match) {
res.writeHead(404).end(JSON.stringify({ error: "unknown_vendor" }));
return;
}
const [, vendor, vendorPath] = match;
const target = VENDOR_TARGETS[vendor];
if (!target) {
res.writeHead(404).end(JSON.stringify({ error: "unsupported_vendor" }));
return;
}
// Extract vault key from Authorization header
const auth = req.headers["authorization"] ?? "";
const token = auth.startsWith("Bearer ") ? auth.slice(7) : "";
const check = enforcePolicy(token, req.method!, vendorPath);
if (!check.valid) {
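// 429 for an exhausted cap, 403 for a valid key calling a disallowed endpoint, 401 otherwise.
const status =
check.reason === "cap_exhausted" ? 429 : check.reason === "endpoint_not_allowed" ? 403 : 401;
res.writeHead(status).end(JSON.stringify({ error: check.reason }));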
return;
}
// Forward to vendor with real API key
const proxyReq = https.request(
{
hostname: target.host,
path: vendorPath,
method: req.method,
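headers: {
...req.headers,
host: target.host,
// Twilio expects HTTP Basic auth (SID:token); Stripe and Resend accept Bearer tokens.
authorization:
vendor === "twilio"
? `Basic ${Buffer.from(target.realKey).toString("base64")}`
: `Bearer ${target.realKey}`,
},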
},
(proxyRes) => {
res.writeHead(proxyRes.statusCode!, proxyRes.headers);
proxyRes.pipe(res);
// Cost parsing and audit logging happen on response body (vendor-specific)
},
);
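// Surface upstream connection failures instead of leaving the client hanging.
proxyReq.on("error", () => {
if (!res.headersSent) res.writeHead(502);
res.end(JSON.stringify({ error: "upstream_error" }));
});
req.pipe(proxyReq);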
});
}
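The handler streams the vendor response straight to the client, so the body is never available for cost parsing. One way to close that gap is to buffer the response before relaying it. The helper below is a sketch under two assumptions: the name `readJsonBody` is hypothetical, and the vendor response is uncompressed (which in practice means not forwarding the agent's `accept-encoding` header).

```typescript
// Hypothetical helper: collects a streamed body and attempts to parse it as JSON.
// Node's http.IncomingMessage yields Buffers, which are Uint8Arrays, so it fits this type.
async function readJsonBody(
  chunks: AsyncIterable<Uint8Array>,
): Promise<{ raw: string; json: unknown }> {
  const decoder = new TextDecoder("utf-8");
  let raw = "";
  for await (const chunk of chunks) {
    // stream: true keeps multi-byte characters intact across chunk boundaries.
    raw += decoder.decode(chunk, { stream: true });
  }
  raw += decoder.decode(); // flush any buffered partial character
  try {
    return { raw, json: JSON.parse(raw) };
  } catch {
    // Non-JSON bodies (for example an HTML error page) carry no cost data.
    return { raw, json: null };
  }
}
```

In the response callback this would replace `proxyRes.pipe(res)`: await `readJsonBody(proxyRes)`, pass `json` to the cost parser, then send `raw` to the client with `res.end(raw)`. Buffering trades a little latency and memory for the ability to log cost before the agent sees the response.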
3. Cost parsing per vendor
Each vendor exposes cost differently — you need vendor-specific parsers:
// Cost parsing per vendor
function parseCost(vendor: string, statusCode: number, responseBody: any): number {
if (vendor === "stripe") {
// Stripe: parse amount from PaymentIntent or Charge responses
if (responseBody.object === "payment_intent" && responseBody.amount) {
return responseBody.amount / 100; // Stripe amounts are in the smallest currency unit (cents for USD); this assumes a USD charge
}
return 0;
}
if (vendor === "twilio") {
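// Twilio: price is a negative decimal string on message and call resources; it is often null until the message or call has been billed, so a create response may yield 0 here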
if (responseBody.price) {
return Math.abs(parseFloat(responseBody.price));
}
return 0;
}
if (vendor === "resend") {
// Resend: fixed rate ~$0.001 per email, no per-request cost in response
if (statusCode === 200 && responseBody.id) return 0.001;
return 0;
}
return 0;
}
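The parsed cost only matters once it is written back: the audit log needs a row, and `daily_usd_spent` has to grow so that the next policy check can trip the cap. The sketch below shows that step against in-memory objects. The names `KeyBudget`, `recordCall`, and `capExhausted` are hypothetical; the field names follow the schema from step 1.

```typescript
// Hypothetical in-memory shapes mirroring the vault_keys and audit_log tables.
interface KeyBudget {
  id: string;
  daily_usd_cap: number;
  daily_usd_spent: number;
}

interface AuditEntry {
  vault_key_id: string;
  vendor: string;
  method: string;
  path: string;
  status_code: number;
  cost_usd: number;
  request_at: number; // Unix seconds
}

// Adds the parsed cost to the key's running total and returns the audit row to store.
function recordCall(
  key: KeyBudget,
  call: { vendor: string; method: string; path: string; statusCode: number },
  costUsd: number,
  nowMs: number = Date.now(),
): AuditEntry {
  key.daily_usd_spent += costUsd;
  return {
    vault_key_id: key.id,
    vendor: call.vendor,
    method: call.method,
    path: call.path,
    status_code: call.statusCode,
    cost_usd: costUsd,
    request_at: Math.floor(nowMs / 1000),
  };
}

// Same comparison enforcePolicy makes before forwarding the next request.
function capExhausted(key: KeyBudget): boolean {
  return key.daily_usd_spent >= key.daily_usd_cap;
}
```

In the SQLite version, the `INSERT` into `audit_log` and an `UPDATE vault_keys SET daily_usd_spent = daily_usd_spent + ?` would run in one transaction. Because the cap is checked before the call and the cost is known only after it, the total can overshoot the cap by at most one request.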
Build vs buy decision matrix
| Factor | Build self-hosted | Use managed (Keybrake) |
|---|---|---|
| Time to first scoped call | 2–5 days (proxy + vault key store + policy enforcement) | ~30 minutes (POST /v1/keys, change API base URL) |
| Ongoing maintenance | You own TLS cert renewal, SQLite backups, Node.js upgrades, and vendor API schema changes | Managed — vendor schema changes handled by Keybrake |
| Audit log compliance | You build storage, retention, and querying; GDPR deletion is your problem | 90-day retention on Team plan; one-click data export |
| Vendor expansion | Each new vendor requires a new proxy handler, cost parser, and policy type | New vendors added by Keybrake; same vault key API across all vendors |
| Control and customization | Full control — custom policy types, custom cost parsing, internal LDAP integration | Standard policy types (spend cap, endpoint allowlist, TTL, user attribution) |
| Appropriate for | Teams with compliance requirements that prevent third-party proxies; 10+ vendors with non-standard APIs | Most AI agent teams calling Stripe, Twilio, Resend, Shopify, Postmark, Segment |
Vault key API used by agents
Regardless of whether you build or use a managed gateway, the agent-facing API should follow the same pattern:
# Issue a vault key (one per agent run or per tool call)
POST https://api.keybrake.com/v1/keys
Authorization: Bearer ${KEYBRAKE_TOKEN}
{
"label": "agent-run-${runId}",
"vendor": "stripe",
"allowed_endpoints": ["/v1/payment_intents", "/v1/payment_intents/*"],
"daily_usd_cap": 500,
"expires_in": "10m"
}
→ { "id": "vk_xxx", "token": "vault_key_xxx" }
# Use the vault key against the proxy
POST https://proxy.keybrake.com/stripe/v1/payment_intents
Authorization: Bearer vault_key_xxx
(standard Stripe API request body)
→ standard Stripe API response (or 429 with {"code":"cap_exhausted"} if over cap)
# Revoke when done
DELETE https://api.keybrake.com/v1/keys/vk_xxx
Authorization: Bearer ${KEYBRAKE_TOKEN}
→ 204 No Content
The agent only needs the vault_key_xxx token and the proxy URL. The real vendor API key is never distributed to the agent process.
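The issue, use, revoke sequence above maps naturally onto a wrapper that guarantees revocation even when the agent run throws. The sketch below assumes a hypothetical `KeyClient` interface standing in for whichever gateway you use (self-hosted or managed); only the policy fields are taken from the request body shown above.

```typescript
// Policy fields as shown in the POST /v1/keys body.
interface KeyPolicy {
  label: string;
  vendor: string;
  allowed_endpoints: string[];
  daily_usd_cap: number;
  expires_in: string;
}

interface IssuedKey {
  id: string;
  token: string;
}

// Hypothetical client abstraction; a real one would wrap the HTTP calls.
interface KeyClient {
  issue(policy: KeyPolicy): Promise<IssuedKey>;
  revoke(id: string): Promise<void>;
}

// Issues a key, hands only the token to the agent run, and always revokes afterwards.
async function withVaultKey<T>(
  client: KeyClient,
  policy: KeyPolicy,
  run: (token: string) => Promise<T>,
): Promise<T> {
  const key = await client.issue(policy);
  try {
    return await run(key.token);
  } finally {
    // Runs on success and on failure, so a crashed run does not leave a live key.
    await client.revoke(key.id);
  }
}
```

The `expires_in` TTL remains the backstop if the process dies before the `finally` block runs.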
Related questions
How is an AI agent API gateway different from LiteLLM or other LLM gateways?
LLM gateways like LiteLLM handle routing between LLM providers (OpenAI, Anthropic, Gemini), token budgeting, and model fallback. They deal exclusively with the LLM side of the agent. An AI agent API gateway handles the vendor side: Stripe, Twilio, Resend, Shopify, and the other SaaS APIs that agents call to take real-world actions. The two are complementary, not competitive: use LiteLLM to control LLM spend, and an agent API gateway such as Keybrake to control vendor spend. The risk profiles differ too. Exceeding an LLM budget wastes inference dollars; a runaway loop of Stripe charges moves real money out of customer accounts.
Does an AI agent API gateway add meaningful latency?
A self-hosted gateway on the same network as your agent adds 1–5ms — negligible compared to LLM inference times (500ms–5s) and vendor API roundtrips (50–200ms). A managed gateway like Keybrake adds ~10–30ms from the proxy hop, depending on geographic proximity. For interactive agents where response latency matters, deploy a managed gateway in a region close to your agent infrastructure. For batch agents (overnight job runs), latency is irrelevant. The cost of not having spend enforcement — a runaway loop on Stripe — is measured in dollars, not milliseconds.
Can an AI agent API gateway handle webhook verification for inbound vendor events?
The gateway pattern described here covers outbound agent calls (agent → vendor). Inbound webhooks (vendor → your server) are a separate concern. For Stripe webhooks, verification uses the webhook signing secret — a separate credential from the API key, not proxied through the gateway. Keybrake does not currently proxy inbound webhooks. For webhook security specifically (verifying that an inbound request came from Stripe rather than an attacker), see the AI agent webhook security guide.
What's the right granularity for vault key issuance — per agent, per run, or per tool call?
Per agent run is the most common granularity: issue one vault key at the start of an agent's task and revoke it when the task completes. Per tool call (one vault key per Stripe API call) provides the tightest spend isolation but adds latency overhead (~50ms per tool call) and complicates streaming agent patterns. Per agent (one long-lived vault key for an agent's entire lifetime) defeats the purpose — if the agent runs for days, the vault key's spend cap needs to be very large to avoid false cap hits. Start with per-run: it aligns vault key lifetime with agent run lifetime, provides clear attribution in the audit log, and keeps the spend cap meaningful (the total expected spend for one agent task).
Further reading
- AI agent spend reporting — how to aggregate vault key audit logs into per-agent, per-user, and per-feature spend dashboards.
- AI agent policy enforcement — building the policy evaluation layer of the gateway: allowlist matching, cap calculation, and enforcement ordering.
- AI agent zero-trust — extending gateway security with mutual TLS, token binding, and short-TTL credentials for high-assurance agent deployments.
- AI agent audit trail — designing the audit log schema for agent API calls: what to record, how long to retain, and how to query for incident investigation.
- AI agent cost management — end-to-end cost management strategy for agents calling multiple vendors, including gateway enforcement, budget alerts, and chargeback attribution.