AI agent API gateway: routing, policy enforcement, and spend control for multi-vendor agent calls

An LLM gateway like LiteLLM handles the token budget and model routing for the AI side of your agents. But your agents don't just call LLMs — they call Stripe to charge customers, Twilio to send SMS, Resend to deliver email, and Shopify to fulfill orders. These vendor calls carry real money and real side effects. An AI agent API gateway sits between your agent and these vendor APIs, enforcing per-agent spend caps, scoping credentials to individual runs, logging every call to an audit trail, and providing a kill-switch that works in under one second. This guide covers what that gateway looks like, how to build a minimal version, and when a managed proxy makes more sense than DIY.

TL;DR

An AI agent API gateway is a reverse proxy that: (1) authenticates requests using short-lived vault keys, not long-lived vendor API keys; (2) enforces per-agent policies (spend caps, endpoint allowlists, TTLs); (3) translates vault key requests to real vendor API calls; and (4) logs every request and its cost. The agent never sees the real Stripe or Twilio API key — only the vault key that expires when the agent run ends.

Why agents need a different kind of API gateway

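Traditional API gateways (Kong, AWS API Gateway, nginx) are designed for human-facing services: they rate-limit by IP, authenticate users via JWT or OAuth, and route traffic between microservices. They are not designed for the specific risks that autonomous agents introduce: a runaway loop that spends real money through a vendor API, long-lived vendor keys sitting in the agent's process, and no record of which run made which call.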

The AI agent API gateway architecture

The gateway sits at the edge of your vendor API calls:

┌─────────────────────────────────────────────────────────────┐
│                     Agent (LLM + tools)                      │
│                                                              │
│  chargeCustomer({ amount: 100, customer: "cus_xxx" })        │
└──────────────────────────┬───────────────────────────────────┘
                           │ vault_key_xxx (short-lived, scoped)
                           ▼
┌─────────────────────────────────────────────────────────────┐
│               AI Agent API Gateway (proxy)                   │
│                                                              │
│  1. Authenticate: vault_key_xxx → look up policy             │
│  2. Enforce: daily_usd_cap=$500, allowed=/v1/payment_intents │
│  3. Forward: swap vault key → real STRIPE_SECRET_KEY         │
│  4. Log: request + response + cost (parsed from response)    │
│  5. Revoke: mark vault_key_xxx spent after response returns  │
└──────────────────────────┬───────────────────────────────────┘
                           │ real Stripe API key (never leaves gateway)
                           ▼
                    api.stripe.com

The agent has the vault key. The gateway has the real vendor API key. The two never meet in the agent's process memory.
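
From the agent's side, the only things that change are the base URL and the credential. A minimal sketch of a tool helper, assuming a hypothetical gateway address and the `/{vendor}{path}` routing used by the proxy in this guide; `buildGatewayRequest` and `GatewayRequest` are illustrative names, not part of any SDK:

```typescript
// Hypothetical helper: builds the request an agent tool would send to the gateway.
// The agent process holds only the vault key; no vendor secret appears here.
interface GatewayRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildGatewayRequest(
  gatewayBaseUrl: string, // assumed gateway address, e.g. "http://localhost:8080"
  vaultKey: string,
  vendor: string,
  vendorPath: string,
  params: Record<string, string | number>,
): GatewayRequest {
  // Stripe takes form-encoded request bodies, so encode the params that way.
  const body = new URLSearchParams(
    Object.entries(params).map(([k, v]) => [k, String(v)]),
  ).toString();

  return {
    // Strip a trailing slash from the base URL so the path is not doubled up.
    url: `${gatewayBaseUrl.replace(/\/$/, "")}/${vendor}${vendorPath}`,
    method: "POST",
    headers: {
      authorization: `Bearer ${vaultKey}`,
      "content-type": "application/x-www-form-urlencoded",
    },
    body,
  };
}

const exampleRequest = buildGatewayRequest(
  "http://localhost:8080/",
  "vault_key_xxx",
  "stripe",
  "/v1/payment_intents",
  { amount: 100, currency: "usd", customer: "cus_xxx" },
);
```

The resulting object can be handed to whatever HTTP client the agent already uses. Nothing in it identifies the real Stripe key, which is the point of the design.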

Minimal self-hosted gateway implementation

A minimal AI agent API gateway in Node.js requires three components:

1. Vault key store and policy enforcement

// gateway/vault.ts
import Database from "better-sqlite3";

const db = new Database("./data.db");

db.exec(`
  CREATE TABLE IF NOT EXISTS vault_keys (
    id TEXT PRIMARY KEY,
    token TEXT UNIQUE NOT NULL,
    vendor TEXT NOT NULL,
    allowed_endpoints TEXT NOT NULL,  -- JSON array
    daily_usd_cap REAL NOT NULL,
    daily_usd_spent REAL DEFAULT 0,
    expires_at INTEGER NOT NULL,
    revoked INTEGER DEFAULT 0
  );
  CREATE TABLE IF NOT EXISTS audit_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    vault_key_id TEXT NOT NULL,
    vendor TEXT NOT NULL,
    method TEXT NOT NULL,
    path TEXT NOT NULL,
    status_code INTEGER,
    cost_usd REAL,
    request_at INTEGER NOT NULL
  );
`);

export function enforcePolicy(
  token: string,
  method: string, // not used by any check yet
  path: string,
): { valid: true; keyId: string; vendor: string } | { valid: false; reason: string } {
  const key = db
    .prepare("SELECT * FROM vault_keys WHERE token = ?")
    .get(token) as any;

  if (!key) return { valid: false, reason: "unknown_token" };
  if (key.revoked) return { valid: false, reason: "revoked" };
  // expires_at is stored in Unix seconds
  if (Date.now() / 1000 > key.expires_at) return { valid: false, reason: "expired" };
  if (key.daily_usd_spent >= key.daily_usd_cap)
    return { valid: false, reason: "cap_exhausted" };

  // Match on the pathname only, so a query string cannot defeat an exact match.
  const pathname = path.split("?")[0];
  // Reject dot segments so "/v1/payment_intents/../charges" cannot pass a prefix rule.
  if (pathname.split("/").includes(".."))
    return { valid: false, reason: "endpoint_not_allowed" };

  const allowed: string[] = JSON.parse(key.allowed_endpoints);
  const pathAllowed = allowed.some((pattern) => {
    if (pattern.endsWith("/*")) {
      // Keep the trailing slash: "/v1/payment_intents/*" must not match
      // "/v1/payment_intents_other".
      return pathname.startsWith(pattern.slice(0, -1));
    }
    return pathname === pattern;
  });

  if (!pathAllowed) return { valid: false, reason: "endpoint_not_allowed" };

  // Return the key's vendor so the caller can confirm it matches the route.
  return { valid: true, keyId: key.id, vendor: key.vendor };
}

2. Vendor proxy handler

// gateway/proxy.ts
import http from "http";
import https from "https";
import { enforcePolicy } from "./vault.ts";

const VENDOR_TARGETS: Record<string, { host: string; realKey: string }> = {
  stripe: {
    host: "api.stripe.com",
    realKey: process.env.STRIPE_SECRET_KEY!,
  },
  twilio: {
    host: "api.twilio.com",
    realKey: `${process.env.TWILIO_ACCOUNT_SID}:${process.env.TWILIO_AUTH_TOKEN}`,
  },
  resend: {
    host: "api.resend.com",
    realKey: process.env.RESEND_API_KEY!,
  },
};

export function createProxyServer() {
  return http.createServer(async (req, res) => {
    // URL pattern: /stripe/v1/payment_intents → vendor=stripe, path=/v1/payment_intents
    const match = req.url?.match(/^\/(\w+)(\/.*)/);
    if (!match) {
      res.writeHead(404).end(JSON.stringify({ error: "unknown_vendor" }));
      return;
    }

    const [, vendor, vendorPath] = match;
    const target = VENDOR_TARGETS[vendor];
    if (!target) {
      res.writeHead(404).end(JSON.stringify({ error: "unsupported_vendor" }));
      return;
    }

    // Extract vault key from Authorization header
    const auth = req.headers["authorization"] ?? "";
    const token = auth.startsWith("Bearer ") ? auth.slice(7) : "";
    const check = enforcePolicy(token, req.method!, vendorPath);

    // A key issued for one vendor must not work on another vendor's route.
    const reason = !check.valid
      ? check.reason
      : check.vendor !== vendor
        ? "vendor_mismatch"
        : null;

    if (reason) {
      // 429: cap exhausted. 403: valid key used outside its scope. 401: bad or expired key.
      const status =
        reason === "cap_exhausted"
          ? 429
          : reason === "endpoint_not_allowed" || reason === "vendor_mismatch"
            ? 403
            : 401;
      res.writeHead(status).end(JSON.stringify({ error: reason }));
      return;
    }

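    // Forward to vendor with the real credential. Twilio uses HTTP Basic auth
    // (AccountSid:AuthToken); Stripe and Resend accept a Bearer token.
    const authorization =
      vendor === "twilio"
        ? `Basic ${Buffer.from(target.realKey).toString("base64")}`
        : `Bearer ${target.realKey}`;

    const proxyReq = https.request(
      {
        hostname: target.host,
        path: vendorPath,
        method: req.method,
        headers: {
          ...req.headers,
          host: target.host,
          authorization, // replaces the agent's vault key header
        },
      },
      (proxyRes) => {
        res.writeHead(proxyRes.statusCode!, proxyRes.headers);
        proxyRes.pipe(res);
        // Cost parsing and audit logging happen on response body (vendor-specific)
      },
    );

    // Without an error handler, a DNS or TLS failure would crash the process.
    proxyReq.on("error", () => {
      if (!res.headersSent) res.writeHead(502);
      res.end(JSON.stringify({ error: "upstream_unreachable" }));
    });

    req.pipe(proxyReq);
  });
}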
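
The policy check in `enforcePolicy` only refuses a call once the cap has already been spent, so a single large charge can overshoot it. One way to close that gap, sketched here on the assumption that the gateway buffers the request body before forwarding, is to estimate the cost from the request itself. `estimateStripeCostUsd` and `wouldExceedCap` are hypothetical helpers, and this sketch only understands USD PaymentIntent creation:

```typescript
// Hypothetical pre-flight check: estimate what a request will cost before forwarding it.
// Assumes the gateway has buffered the form-encoded request body as a string.
function estimateStripeCostUsd(method: string, path: string, body: string): number {
  // Only PaymentIntent creation is priced in this sketch; everything else is treated as free.
  if (method !== "POST" || path.split("?")[0] !== "/v1/payment_intents") return 0;

  const form = new URLSearchParams(body);
  const amount = Number(form.get("amount"));
  const currency = (form.get("currency") ?? "").toLowerCase();

  // Anything that cannot be priced is treated as infinitely expensive,
  // so it is refused rather than waved through.
  if (!Number.isFinite(amount) || amount <= 0 || currency !== "usd") return Infinity;

  return amount / 100; // USD amounts are sent in cents
}

// True when forwarding the request would push the key past its daily cap.
function wouldExceedCap(spentUsd: number, capUsd: number, estimatedUsd: number): boolean {
  return spentUsd + estimatedUsd > capUsd;
}
```

With a $500 cap and $450 already spent, a $100 charge would be refused before it reaches Stripe, while the post-hoc check alone would let it through and only block the next call.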

3. Cost parsing per vendor

Each vendor exposes cost differently — you need vendor-specific parsers:

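// Cost parsing per vendor
function parseCost(vendor: string, statusCode: number, responseBody: any): number {
  // Empty bodies (for example a 204) carry no cost information.
  if (!responseBody) return 0;

  if (vendor === "stripe") {
    // Stripe: parse amount from PaymentIntent responses.
    // Caveat: reads and updates also return a PaymentIntent, so a production
    // parser should count only newly created or confirmed intents.
    if (responseBody.object === "payment_intent" && responseBody.amount) {
      // Amounts are in the currency's smallest unit: cents for USD.
      // Dividing by 100 is wrong for zero-decimal currencies such as JPY.
      return responseBody.amount / 100;
    }
    return 0;
  }

  if (vendor === "twilio") {
    // Twilio: price is in the response body for messages and calls, as a
    // negative decimal string. It is often null on the create response and
    // filled in later, in which case this returns 0.
    if (responseBody.price) {
      return Math.abs(parseFloat(responseBody.price));
    }
    return 0;
  }

  if (vendor === "resend") {
    // Resend: fixed rate ~$0.001 per email, no per-request cost in response
    if (statusCode === 200 && responseBody.id) return 0.001;
    return 0;
  }

  return 0;
}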
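
Once a cost has been parsed, it still has to be written back against the key, and the schema's `daily_usd_spent` column needs something to reset it each day. A sketch of that bookkeeping as a pure function, assuming a UTC day boundary and a hypothetical `spendDay` field that is not in the schema shown earlier:

```typescript
interface SpendState {
  dailyUsdSpent: number;
  spendDay: string; // hypothetical field: the UTC date ("YYYY-MM-DD") the total applies to
}

function recordSpend(state: SpendState, costUsd: number, now: Date): SpendState {
  const today = now.toISOString().slice(0, 10); // toISOString is always UTC

  // A new UTC day starts the running total from zero.
  const carried = state.spendDay === today ? state.dailyUsdSpent : 0;

  // Add in whole micro-dollars so sub-cent costs (such as ~$0.001 per email)
  // accumulate without floating-point drift.
  const totalMicros = Math.round(carried * 1e6) + Math.round(costUsd * 1e6);

  return { dailyUsdSpent: totalMicros / 1e6, spendDay: today };
}
```

In the SQLite-backed gateway this would likely become a single `UPDATE` inside the same transaction that writes the audit row, so the counter and the log cannot disagree.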

Build vs buy decision matrix

| Factor | Build self-hosted | Use managed (Keybrake) |
|---|---|---|
| Time to first scoped call | 2–5 days (proxy + vault key store + policy enforcement) | ~30 minutes (POST /v1/keys, change API base URL) |
| Ongoing maintenance | You own TLS cert renewal, SQLite backups, Node.js upgrades, and vendor API schema changes | Managed: vendor schema changes handled by Keybrake |
| Audit log compliance | You build storage, retention, and querying; GDPR deletion is your problem | 90-day retention on Team plan; one-click data export |
| Vendor expansion | Each new vendor requires a new proxy handler, cost parser, and policy type | New vendors added by Keybrake; same vault key API across all vendors |
| Control and customization | Full control: custom policy types, custom cost parsing, internal LDAP integration | Standard policy types (spend cap, endpoint allowlist, TTL, user attribution) |
| Appropriate for | Teams with compliance requirements that prevent third-party proxies; 10+ vendors with non-standard APIs | Most AI agent teams calling Stripe, Twilio, Resend, Shopify, Postmark, Segment |

Vault key API used by agents

Regardless of whether you build or use a managed gateway, the agent-facing API should follow the same pattern:

# Issue a vault key (one per agent run or per tool call)
POST https://api.keybrake.com/v1/keys
Authorization: Bearer ${KEYBRAKE_TOKEN}
{
  "label": "agent-run-${runId}",
  "vendor": "stripe",
  "allowed_endpoints": ["/v1/payment_intents", "/v1/payment_intents/*"],
  "daily_usd_cap": 500,
  "expires_in": "10m"
}
→ { "id": "vk_xxx", "token": "vault_key_xxx" }

# Use the vault key against the proxy
POST https://proxy.keybrake.com/stripe/v1/payment_intents
Authorization: Bearer vault_key_xxx
(standard Stripe API request body)
→ standard Stripe API response (or 429 with {"code":"cap_exhausted"} if over cap)

# Revoke when done
DELETE https://api.keybrake.com/v1/keys/vk_xxx
Authorization: Bearer ${KEYBRAKE_TOKEN}
→ 204 No Content

The agent only needs the vault_key_xxx token and the proxy URL. The real vendor API key is never distributed to the agent process.
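
Whichever gateway issues the key, the lifecycle around an agent run is the same: issue, run, revoke, with revocation still happening when the run throws. A sketch with the issue and revoke calls injected as functions, so it makes no assumption about the HTTP client or the gateway's exact API; `withVaultKey` and `VaultKey` are illustrative names:

```typescript
interface VaultKey {
  id: string;
  token: string;
}

// Wraps one agent run in a vault key's lifetime.
// `issue` and `revoke` are supplied by the caller (for example, thin wrappers
// around the POST and DELETE requests shown above).
async function withVaultKey<T>(
  issue: () => Promise<VaultKey>,
  revoke: (id: string) => Promise<void>,
  run: (token: string) => Promise<T>,
): Promise<T> {
  const key = await issue();
  try {
    return await run(key.token);
  } finally {
    // Runs on success and on failure, so a run that throws does not leave a live key behind.
    await revoke(key.id);
  }
}
```

The key's TTL remains the backstop for cases this cannot cover, such as the agent process being killed before the `finally` block runs.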

Related questions

How is an AI agent API gateway different from LiteLLM or other LLM gateways?

LLM gateways like LiteLLM handle routing between LLM providers (OpenAI, Anthropic, Gemini), token budgeting, and model fallback. They deal exclusively with the LLM side of the agent. An AI agent API gateway handles the vendor side: Stripe, Twilio, Resend, Shopify, and the other SaaS APIs that agents call to take real-world actions. The two are complementary, not competitive: use LiteLLM to control LLM spend, and an agent API gateway such as Keybrake to control vendor spend. The risk profiles differ too. Exceeding an LLM budget wastes inference dollars; a runaway agent with no Stripe spend cap charges real customers real money.

Does an AI agent API gateway add meaningful latency?

A self-hosted gateway on the same network as your agent adds 1–5ms — negligible compared to LLM inference times (500ms–5s) and vendor API roundtrips (50–200ms). A managed gateway like Keybrake adds ~10–30ms from the proxy hop, depending on geographic proximity. For interactive agents where response latency matters, deploy a managed gateway in a region close to your agent infrastructure. For batch agents (overnight job runs), latency is irrelevant. The cost of not having spend enforcement — a runaway loop on Stripe — is measured in dollars, not milliseconds.

Can an AI agent API gateway handle webhook verification for inbound vendor events?

The gateway pattern described here covers outbound agent calls (agent → vendor). Inbound webhooks (vendor → your server) are a separate concern. For Stripe webhooks, verification uses the webhook signing secret — a separate credential from the API key, not proxied through the gateway. Keybrake does not currently proxy inbound webhooks. For webhook security specifically (verifying that an inbound request came from Stripe rather than an attacker), see the AI agent webhook security guide.

What's the right granularity for vault key issuance — per agent, per run, or per tool call?

Per agent run is the most common granularity: issue one vault key at the start of an agent's task and revoke it when the task completes. Per tool call (one vault key per Stripe API call) provides the tightest spend isolation but adds latency overhead (~50ms per tool call) and complicates streaming agent patterns. Per agent (one long-lived vault key for an agent's entire lifetime) defeats the purpose — if the agent runs for days, the vault key's spend cap needs to be very large to avoid false cap hits. Start with per-run: it aligns vault key lifetime with agent run lifetime, provides clear attribution in the audit log, and keeps the spend cap meaningful (the total expected spend for one agent task).