Express.js AI agent API key management: scoped vault keys for Node agent backends

Express.js is a common Node.js choice for AI agent tool backends: a POST route receives tool call arguments from an OpenAI Agents SDK, LangChain.js, or Vercel AI SDK agent, executes JavaScript, calls Stripe or Twilio using process.env.STRIPE_KEY, and returns JSON. The problem is structural: process.env.STRIPE_KEY is a process-wide value set once at startup and shared across every concurrent request. Twenty users running agents simultaneously all hit the same Stripe credential. There is no per-user spend cap, no per-request endpoint scope, and no way to stop one runaway agent without rotating the key for everyone. Vault keys solve this with an Express middleware that issues a per-request scoped credential, attaches it to the req object, and revokes it in a response hook.

TL;DR

Store KEYBRAKE_TOKEN in your environment instead of the raw Stripe secret. Write an async Express middleware, async (req, res, next), that issues a vault key, attaches it as req.vaultKey, registers a res.on("finish") listener that revokes the key once the response has been sent, then calls next(). Route handlers read req.vaultKey and POST to https://proxy.keybrake.com/stripe/... instead of calling the Stripe SDK directly. Route handler signatures do not change.

The Express AI agent tool backend pattern

A typical Express AI agent tool backend looks like this:

import express from "express";
import Stripe from "stripe";

const app = express();
app.use(express.json());

const stripe = new Stripe(process.env.STRIPE_KEY);  // shared across ALL requests

app.post("/tools/charge", async (req, res) => {
  const { amount, customerId } = req.body;
  const intent = await stripe.paymentIntents.create({
    amount,
    currency: "usd",
    customer: customerId,
  });
  res.json(intent);
});

This is clean Node.js. The problem: the Stripe instance is initialized once at module load with the same API key for every concurrent request. A stuck agent loop creating payment intents for User A can also hit endpoints that should be locked to User B's data — there's no request-scoped enforcement boundary. Rotating the key to kill the runaway agent disrupts every active session simultaneously.

Adding vault keys via Express middleware

Express middleware runs before route handlers and has access to the req and res objects that routes already use:

import fetch from "node-fetch"; // or native fetch in Node 18+

async function vaultKeyMiddleware(req, res, next) {
  // Skip non-agent routes (health checks, static assets, etc.)
  if (!req.path.startsWith("/tools/")) {
    return next();
  }

  const userId = req.headers["x-user-id"] ?? "anonymous";
  const runId = req.headers["x-agent-run-id"] ?? "unknown";

  try {
    const issueResp = await fetch("https://api.keybrake.com/v1/keys", {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${process.env.KEYBRAKE_TOKEN}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        label: `express-${userId}-${runId}`,
        vendor: "stripe",
        allowed_endpoints: [
          "/v1/payment_intents",
          "/v1/payment_intents/*",
        ],
        daily_usd_cap: 500,
        expires_in: "5m",
      }),
    });

    if (!issueResp.ok) throw new Error(`Keybrake issuance: ${issueResp.status}`);
    const keyData = await issueResp.json();
    req.vaultKey = keyData.token;
    req.vaultKeyId = keyData.id;
  } catch (err) {
    console.error("Vault key issuance failed:", err);
    req.vaultKey = null;
    req.vaultKeyId = null;
  }

  // Revoke vault key after the response body is finished
  res.on("finish", async () => {
    if (req.vaultKeyId) {
      try {
        await fetch(`https://api.keybrake.com/v1/keys/${req.vaultKeyId}`, {
          method: "DELETE",
          headers: { "Authorization": `Bearer ${process.env.KEYBRAKE_TOKEN}` },
        });
      } catch (_) {
        // Key expires via TTL if revocation fails — acceptable
      }
    }
  });

  next();
}

app.use(vaultKeyMiddleware);

Route handlers now use the per-request vault key from req.vaultKey:

app.post("/tools/charge", async (req, res) => {
  if (!req.vaultKey) {
    return res.status(503).json({ error: "vault_key_unavailable" });
  }

  const resp = await fetch(
    "https://proxy.keybrake.com/stripe/v1/payment_intents",
    {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${req.vaultKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        amount: req.body.amount,
        currency: "usd",
        customer: req.body.customerId,
      }),
    }
  );

  // Parse the body once: a fetch Response body can only be read a single time
  const body = await resp.json();

  if (resp.status === 429 && body.code === "cap_exhausted") {
    return res.status(402).json({ error: "spend_cap_exceeded" });
  }

  if (!resp.ok) {
    return res.status(resp.status).json(body);
  }

  res.json(body);
});

Using vault keys with the Stripe Node SDK's custom base URL

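If you prefer the Stripe SDK's typed methods over raw fetch calls, the Stripe Node SDK v8+ accepts a configuration object as the second argument to the Stripe constructor, including host, port, and protocol overrides. This lets you keep your existing Stripe SDK usage while routing through the Keybrake proxy. One caveat: path is not among the SDK's documented configuration options, and the constructor rejects keys it does not recognize, so the /stripe prefix cannot be set this way. The code below assumes the proxy can route Stripe traffic by host alone; if it cannot, a custom httpClient that prepends the prefix is one option:

import Stripe from "stripe";

// In your route handler or a per-request factory:
function createStripeClient(vaultKey) {
  return new Stripe(vaultKey, {
    // Route through Keybrake proxy instead of api.stripe.com.
    // Assumption: the proxy accepts Stripe's own /v1/... paths on this host.
    host: "proxy.keybrake.com",
    protocol: "https",
  });
}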

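app.post("/tools/charge", async (req, res) => {
  // Issuance can fail in the middleware, so check before building a client
  if (!req.vaultKey) {
    return res.status(503).json({ error: "vault_key_unavailable" });
  }

  const stripeClient = createStripeClient(req.vaultKey);
  const intent = await stripeClient.paymentIntents.create({
    amount: req.body.amount,
    currency: "usd",
    customer: req.body.customerId,
  });
  res.json(intent);
});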

This is the cleanest migration path if you have existing Stripe SDK calls — minimal code changes, same typed API, policy enforcement handled by the proxy.

Express with Bull/BullMQ: vault keys for async agent queues

Many Express AI agent backends use Bull or BullMQ for async task processing. The vault key from the Express middleware does not cross the queue boundary: it is revoked when the HTTP response finishes, usually before the job runs. Issue a fresh vault key inside the worker's processor function instead (BullMQ shown here):

import { Worker } from "bullmq";
import fetch from "node-fetch";

const billingWorker = new Worker("billing-agent", async (job) => {
  const { runId, customerIds } = job.data;

  // Issue the vault key when the job starts; the Express middleware's key is not available here
  const issueResp = await fetch("https://api.keybrake.com/v1/keys", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.KEYBRAKE_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      label: `bull-billing-${runId}`,
      vendor: "stripe",
      allowed_endpoints: ["/v1/payment_intents", "/v1/payment_intents/*"],
      daily_usd_cap: 5000,
      expires_in: "30m",
    }),
  });

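  // Fail the job early if issuance did not succeed, rather than charging with an undefined key
  if (!issueResp.ok) throw new Error(`Keybrake issuance: ${issueResp.status}`);
  const { token: vaultKey, id: vaultKeyId } = await issueResp.json();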

  try {
    for (const customerId of customerIds) {
      await chargeCustomer(customerId, vaultKey);
    }
  } finally {
    await fetch(`https://api.keybrake.com/v1/keys/${vaultKeyId}`, {
      method: "DELETE",
      headers: { "Authorization": `Bearer ${process.env.KEYBRAKE_TOKEN}` },
    });
  }
});
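
The issue, work, revoke sequence in the worker above can be factored into a small helper so that every job type gets the same cleanup behavior. This is a minimal sketch, not a Keybrake API: withVaultKey, issueKey, and revokeKey are hypothetical names, and the issuing and revoking functions are passed in so the helper does not assume any particular HTTP client.

```javascript
// Hypothetical helper (not part of any SDK): issues a key, runs `work` with
// its token, and always attempts revocation afterwards.
// `issueKey` must resolve to { token, id }; `revokeKey` receives the id.
async function withVaultKey(issueKey, revokeKey, work) {
  const key = await issueKey();
  try {
    return await work(key.token);
  } finally {
    try {
      await revokeKey(key.id);
    } catch (_) {
      // Revocation is best effort; the key's TTL is the fallback.
    }
  }
}
```

Inside the worker, issueKey would wrap the POST to https://api.keybrake.com/v1/keys and revokeKey the DELETE shown above. Swallowing the revocation error keeps a failed cleanup from masking an error thrown by the job itself.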

Concurrency and performance considerations

Concern	Impact	Mitigation
Middleware latency	~30–80ms per request (async HTTPS to api.keybrake.com)	The call is non-blocking, so concurrent requests don't queue behind each other, but each request still pays the round trip. For very high throughput, pre-issue session-scoped keys with a 15-minute TTL and cache them in Redis
Issuance failure behavior	`req.vaultKey = null`; route handlers must check and return 503	Add a single retry after a short delay (for example 100ms) inside the middleware. If both attempts fail, 503 is the right response: treat vault key availability as a service dependency
Response streaming (`res.write` / SSE)	"finish" fires once res.end() has been called and the response has been handed off to the operating system, so it works for chunked and streaming responses that the server ends. If the client disconnects first, as is common with Server-Sent Events, "finish" does not fire; only "close" does	For long-lived streams, revoke on "close" as well as "finish", and rely on the TTL as the fallback
PM2 cluster mode	Multiple PM2 workers each hold their own Node process and issue independent vault keys using the same KEYBRAKE_TOKEN	Expected. KEYBRAKE_TOKEN is the master credential shared across workers; vault keys are the per-request scoped tokens. No shared state needed between PM2 workers for vault key management.
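
The retry suggested for issuance failures in the table can be sketched as follows. The helper name withOneRetry and its delayMs parameter are assumptions made for illustration; the attempt function would wrap the issuance fetch from the middleware and throw on a non-2xx response.

```javascript
// Resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: runs `attempt` once, and once more after `delayMs`
// if the first call throws. A second failure propagates to the caller,
// which is where the middleware would set req.vaultKey = null.
async function withOneRetry(attempt, delayMs = 100) {
  try {
    return await attempt();
  } catch (firstError) {
    await sleep(delayMs);
    return await attempt();
  }
}
```

Keeping the retry count at one bounds the added latency on the failure path to roughly one extra round trip plus the delay.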

Related questions

Can I use vault keys with Express Router sub-routers for different vendor scopes?

Yes — and this is cleaner than a single top-level middleware. Attach a vault key middleware specific to each Express Router and configure different vendor scopes per router. For example: stripeRouter.use(vaultKeyMiddleware({ vendor: "stripe", endpoints: ["/v1/payment_intents/*"], cap: 500 })) and twilioRouter.use(vaultKeyMiddleware({ vendor: "twilio", cap: 50 })). This gives each router its own spend boundary rather than sharing a single vault key across calls to multiple vendors within the same request. The middleware factory pattern (a function that returns a middleware function) is idiomatic Express for this use case.
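
A minimal sketch of such a factory, under stated assumptions: the option names vendor, endpoints, and cap mirror the calls in the answer above, while ttl and fetchImpl are hypothetical extras (fetchImpl only exists so the HTTP client can be swapped out in tests). The request body fields are the ones used in the middleware earlier in this article.

```javascript
// Sketch of a middleware factory: each call returns an Express middleware
// bound to one vendor scope. `ttl` and `fetchImpl` are illustrative options.
function vaultKeyMiddleware({ vendor, endpoints, cap, ttl = "5m", fetchImpl = fetch }) {
  const authHeader = { Authorization: `Bearer ${process.env.KEYBRAKE_TOKEN}` };

  return async function issueVaultKey(req, res, next) {
    req.vaultKey = null;
    req.vaultKeyId = null;
    try {
      const resp = await fetchImpl("https://api.keybrake.com/v1/keys", {
        method: "POST",
        headers: { ...authHeader, "Content-Type": "application/json" },
        body: JSON.stringify({
          vendor,
          allowed_endpoints: endpoints, // omitted from the JSON when undefined
          daily_usd_cap: cap,
          expires_in: ttl,
        }),
      });
      if (!resp.ok) throw new Error(`Keybrake issuance: ${resp.status}`);
      const keyData = await resp.json();
      req.vaultKey = keyData.token;
      req.vaultKeyId = keyData.id;
    } catch (err) {
      console.error("Vault key issuance failed:", err);
    }

    // Revoke once the response has been sent; the TTL covers failures.
    res.on("finish", () => {
      if (!req.vaultKeyId) return;
      fetchImpl(`https://api.keybrake.com/v1/keys/${req.vaultKeyId}`, {
        method: "DELETE",
        headers: authHeader,
      }).catch(() => {});
    });

    next();
  };
}
```

Each router then mounts its own instance, for example stripeRouter.use(vaultKeyMiddleware({ vendor: "stripe", endpoints: ["/v1/payment_intents/*"], cap: 500 })), and route handlers keep reading req.vaultKey as before.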

How does vault key revocation work with Express's async error handling?

If an Express route passes an error to your error-handling middleware (app.use((err, req, res, next) => ...)), the res.on("finish") listener in the vault key middleware still fires: "finish" is emitted whenever a response is sent, regardless of whether it was a 200 or a 500. Note that Express 4 does not catch rejected promises from async route handlers, so wrap the handler body in try/catch and call next(err); Express 5 forwards the rejection automatically. If no response is sent at all (e.g., the process crashes), the key expires via its TTL. The 5-minute TTL means worst-case exposure is 5 minutes, not indefinite, which is acceptable for most use cases. For stricter requirements, use a shorter TTL (1–2 minutes) combined with a keepalive pattern that extends the TTL for long-running agent requests.
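
One further case falls back to the TTL: a client that disconnects before the response ends. Node's http.ServerResponse then emits "close" without "finish". A hedged sketch of a listener that covers both events and revokes at most once is below; revokeWhenDone is a hypothetical helper name, and revoke stands in for the DELETE call from the middleware.

```javascript
// Hypothetical helper: calls `revoke` exactly once, on whichever of
// "finish" (response sent) or "close" (connection ended) arrives first.
function revokeWhenDone(res, revoke) {
  let done = false;
  const run = () => {
    if (done) return;
    done = true;
    Promise.resolve()
      .then(revoke)
      .catch(() => {
        // Revocation failed; the key's TTL is the fallback.
      });
  };
  res.once("finish", run);
  res.once("close", run);
}
```

The done flag matters because a normally completed response emits both events, and a second DELETE for the same key would be wasted work.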

What's the right vault key TTL for an Express AI agent tool backend?

Set the TTL to the 95th percentile of your agent request duration plus a small buffer. For synchronous agent tool calls that complete in under 30 seconds, a 1–2 minute TTL is right. For streaming or long-running agent sessions that can span several minutes, use a 15-minute TTL with explicit revocation on session end. The TTL is your safety net if revocation fails — a too-long TTL means a leaked key stays active longer than needed. A too-short TTL means the key might expire mid-request before the agent finishes its tool call sequence. Check your Express access logs for P95 request duration on agent tool routes to calibrate.
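
The calibration step can be sketched as a small calculation over request durations pulled from your access logs. The function names and the 25% default buffer are assumptions chosen for illustration, not a recommendation from Keybrake; percentile uses the nearest-rank method.

```javascript
// Nearest-rank percentile over request durations in milliseconds.
function percentile(durationsMs, p) {
  if (durationsMs.length === 0) throw new Error("no samples");
  const sorted = [...durationsMs].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical policy: P95 plus a proportional buffer, rounded up to whole seconds.
function suggestTtlSeconds(durationsMs, bufferRatio = 0.25) {
  const p95 = percentile(durationsMs, 95);
  return Math.ceil((p95 * (1 + bufferRatio)) / 1000);
}
```

With 100 samples spread evenly from 1 to 100 seconds, the P95 is 95 seconds and the suggested TTL is 119 seconds, which sits inside the 1–2 minute range mentioned above.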