Cloudflare Workers AI agent API key management: vault keys for edge AI workflows
Cloudflare Workers is increasingly the runtime for edge AI agent logic: Workers AI for inference, Cloudflare Workflows for multi-step orchestration, AI Gateway for LLM routing, and Durable Objects for agent state. When these Workers need to call Stripe, Twilio, or Resend, the API key story is the same as on every other serverless runtime: the key lives in Workers Secrets and is shared across all invocations of the Worker, with no per-invocation spend cap or endpoint scope. Vault keys fit naturally into Cloudflare's architecture: store the Keybrake API token in Workers Secrets, issue a vault key at the start of each invocation or workflow step, and the real Stripe secret never travels to the edge at all.
TL;DR
Store the Keybrake API token (not the Stripe secret) in a Cloudflare Workers Secret. At the start of each Worker invocation or Workflow step, call the Keybrake API to issue a vault key scoped to the workflow's spend cap and allowed endpoints. Use the vault key token as the Authorization: Bearer header when proxying vendor calls through https://proxy.keybrake.com/stripe/.... The real Stripe secret lives in Keybrake's infrastructure — not in Workers Secrets, not in the Worker's memory, and not in any fetch request that could be logged at the edge.
Cloudflare's AI agent primitives and where credentials fit
Cloudflare has assembled a set of primitives for building AI agents at the edge:
- Workers AI — run inference on Cloudflare's GPU network directly from a Worker. Used for intent classification, text generation, and embedding generation at low latency.
- Cloudflare Workflows — multi-step, durable workflow orchestration built into the Workers runtime. Each step is a Worker function with retry semantics. The equivalent of Temporal or Prefect for Workers.
- AI Gateway — a Cloudflare-managed LLM proxy that adds rate limiting, caching, and observability for calls to OpenAI, Anthropic, and other LLM providers. Analogous to Keybrake but for LLM endpoints, not vendor SaaS APIs.
- Durable Objects — stateful edge compute; holds agent conversation state, pending actions, and workflow progress in a single strongly-consistent object.
- Workers Secrets — encrypted environment variables bound to a Worker; the right place to store the Keybrake API token used to issue vault keys.
AI Gateway handles LLM API key governance. Keybrake handles vendor SaaS API key governance (Stripe, Twilio, Resend). They're complementary: an AI agent can route its LLM calls through AI Gateway and its Stripe calls through Keybrake simultaneously.
The vault key pattern for Cloudflare Workers
// Worker: issue a vault key, proxy the Stripe call, then revoke the key
interface Env {
  KEYBRAKE_TOKEN: string; // Workers Secret holding the Keybrake API token
}

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const body = await request.json() as { customerId: string; amountCents: number };
    const workflowId = request.headers.get("X-Workflow-Id") ?? crypto.randomUUID();

    // Step 1: Issue vault key using the Keybrake token from Workers Secrets
    const keyRes = await fetch("https://api.keybrake.com/v1/keys", {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${env.KEYBRAKE_TOKEN}`, // from Workers Secret
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        label: `workers-charge-${workflowId}`,
        vendor: "stripe",
        allowed_endpoints: [
          "/v1/payment_intents",
          "/v1/payment_intents/*"
        ],
        daily_usd_cap: 500,
        expires_in: "5m"
      }),
    });
    if (!keyRes.ok) {
      return new Response(JSON.stringify({ error: "vault_key_issuance_failed" }), {
        status: 500, headers: { "Content-Type": "application/json" }
      });
    }
    const { token: vaultKey, id: keyId } = await keyRes.json() as { token: string; id: string };

    try {
      // Step 2: Proxy Stripe call through Keybrake using the vault key
      const stripeRes = await fetch(
        "https://proxy.keybrake.com/stripe/v1/payment_intents",
        {
          method: "POST",
          headers: {
            "Authorization": `Bearer ${vaultKey}`, // vault key, not Stripe secret
            "Content-Type": "application/json",
          },
          body: JSON.stringify({
            amount: body.amountCents,
            currency: "usd",
            customer: body.customerId,
          }),
        }
      );
      // Read the body once: a Response body cannot be consumed twice
      const payload = await stripeRes.json() as { code?: string };
      if (stripeRes.status === 429 && payload.code === "cap_exhausted") {
        return new Response(
          JSON.stringify({ error: "spend_cap_exceeded" }),
          { status: 402, headers: { "Content-Type": "application/json" } }
        );
      }
      return new Response(JSON.stringify(payload), {
        status: stripeRes.status,
        headers: { "Content-Type": "application/json" }
      });
    } finally {
      // Step 3: Revoke vault key, fire and forget (TTL is safety net).
      // waitUntil lets the revoke finish after the response is returned;
      // the catch keeps a failed revoke from replacing the response with an error.
      ctx.waitUntil(
        fetch(`https://api.keybrake.com/v1/keys/${keyId}`, {
          method: "DELETE",
          headers: { "Authorization": `Bearer ${env.KEYBRAKE_TOKEN}` },
        }).catch(() => {})
      );
    }
  }
};
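The issue, use, revoke sequence above can be factored into a small helper so that every vendor call follows the same lifecycle. The following is a minimal sketch under stated assumptions, not part of any Keybrake SDK: `withVaultKey`, `FetchLike`, and `VaultKeyPolicy` are hypothetical names, and the request and response shapes are assumed to match the example above. The fetch function is passed in explicitly so the helper can be exercised without a network.

```typescript
// Minimal structural types so the sketch does not depend on Workers typings.
type HttpResponse = { ok: boolean; status: number; json(): Promise<unknown> };
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<HttpResponse>;

// Policy fields mirror the request body used in the Worker example (assumed shape).
interface VaultKeyPolicy {
  label: string;
  vendor: string;
  allowed_endpoints: string[];
  daily_usd_cap: number;
  expires_in: string;
}

interface VaultKey {
  token: string;
  id: string;
}

const KEYS_URL = "https://api.keybrake.com/v1/keys";

// Hypothetical helper: issue a key, run `use` with it, then revoke the key
// whether `use` succeeded or threw.
async function withVaultKey<T>(
  fetchImpl: FetchLike,
  keybrakeToken: string,
  policy: VaultKeyPolicy,
  use: (key: VaultKey) => Promise<T>,
): Promise<T> {
  const issued = await fetchImpl(KEYS_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${keybrakeToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(policy),
  });
  if (!issued.ok) {
    throw new Error(`vault key issuance failed with status ${issued.status}`);
  }
  const key = (await issued.json()) as VaultKey;
  try {
    return await use(key);
  } finally {
    // Best-effort revoke: a failure here must not mask the outcome of `use`.
    // The key's TTL remains the safety net.
    await fetchImpl(`${KEYS_URL}/${key.id}`, {
      method: "DELETE",
      headers: { Authorization: `Bearer ${keybrakeToken}` },
    }).catch(() => undefined);
  }
}
```

In a Worker, the global `fetch` and `env.KEYBRAKE_TOKEN` would be passed as the first two arguments. This version awaits the revoke before returning, which trades a little latency for simplicity; handing the revoke to `ctx.waitUntil` instead keeps it off the response path.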
Vault keys in Cloudflare Workflows
Cloudflare Workflows provides durable multi-step orchestration with automatic retry. Each workflow step is a distinct execution unit — vault keys should be issued per step rather than per workflow, because a workflow can span minutes to hours and individual steps are independently retried:
import { WorkflowEntrypoint, WorkflowStep, WorkflowEvent } from 'cloudflare:workers';

interface Env {
  KEYBRAKE_TOKEN: string; // Workers Secret holding the Keybrake API token
}

interface BillingWorkflowParams {
  customerId: string;
  amountCents: number;
  runId: string;
}

export class BillingWorkflow extends WorkflowEntrypoint<Env, BillingWorkflowParams> {
  async run(event: WorkflowEvent<BillingWorkflowParams>, step: WorkflowStep) {
    const { customerId, amountCents, runId } = event.payload;

    // Issue a vault key scoped to this workflow step (not the entire workflow)
    const vaultKey = await step.do("issue-vault-key", async () => {
      const res = await fetch("https://api.keybrake.com/v1/keys", {
        method: "POST",
        headers: {
          "Authorization": `Bearer ${this.env.KEYBRAKE_TOKEN}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({
          label: `workflow-${runId}-charge-step`,
          vendor: "stripe",
          allowed_endpoints: ["/v1/payment_intents", "/v1/payment_intents/*"],
          daily_usd_cap: 1000,
          expires_in: "10m" // Longer TTL for workflow steps with retry
        }),
      });
      // Throw on failure so the step is retried instead of persisting an error body as the key
      if (!res.ok) throw new Error(`vault key issuance failed: ${res.status}`);
      return await res.json() as { token: string; id: string };
    });

    // Charge the customer using the per-step vault key
    const payment = await step.do("charge-customer", async () => {
      const res = await fetch("https://proxy.keybrake.com/stripe/v1/payment_intents", {
        method: "POST",
        headers: {
          "Authorization": `Bearer ${vaultKey.token}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({ amount: amountCents, currency: "usd", customer: customerId }),
      });
      return await res.json();
    });

    // Revoke the step's vault key
    await step.do("revoke-vault-key", async () => {
      await fetch(`https://api.keybrake.com/v1/keys/${vaultKey.id}`, {
        method: "DELETE",
        headers: { "Authorization": `Bearer ${this.env.KEYBRAKE_TOKEN}` },
      });
    });

    return payment;
  }
}
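The 10-minute TTL in the workflow above has to outlast the charge step's worst-case retry window, otherwise a late retry runs with an expired key. The sketch below shows one way to size it. All names (`RetryPlan`, `worstCaseWindowSeconds`, `vaultKeyTtlSeconds`) are hypothetical, and the backoff formulas are an assumption made for illustration: constant keeps the base delay, linear multiplies it by the retry number, and exponential doubles it on each retry. Check the retry configuration you actually pass to `step.do` before relying on these numbers.

```typescript
type Backoff = "constant" | "linear" | "exponential";

interface RetryPlan {
  limit: number;                 // retries allowed after the first attempt
  delaySeconds: number;          // base delay before a retry
  backoff: Backoff;              // how the delay grows (assumed formulas below)
  attemptTimeoutSeconds: number; // worst-case duration of a single attempt
}

// Delay before the n-th retry (1-based), under the assumed backoff formulas.
function delayBeforeRetry(plan: RetryPlan, retryIndex: number): number {
  switch (plan.backoff) {
    case "constant":
      return plan.delaySeconds;
    case "linear":
      return plan.delaySeconds * retryIndex;
    case "exponential":
      return plan.delaySeconds * 2 ** (retryIndex - 1);
  }
}

// Worst case: every attempt runs to its timeout and every retry is used.
function worstCaseWindowSeconds(plan: RetryPlan): number {
  let total = plan.attemptTimeoutSeconds; // first attempt
  for (let i = 1; i <= plan.limit; i++) {
    total += delayBeforeRetry(plan, i) + plan.attemptTimeoutSeconds;
  }
  return total;
}

// TTL to request for the vault key: the retry window plus a safety buffer.
function vaultKeyTtlSeconds(plan: RetryPlan, bufferSeconds = 60): number {
  return worstCaseWindowSeconds(plan) + bufferSeconds;
}

// Illustrative numbers: 3 retries, 10 s base delay, 30 s per attempt.
// Window = 30 + (10 + 30) + (20 + 30) + (40 + 30) = 190 s; TTL = 190 + 60 = 250 s.
const examplePlan: RetryPlan = {
  limit: 3,
  delaySeconds: 10,
  backoff: "exponential",
  attemptTimeoutSeconds: 30,
};
const exampleTtl = vaultKeyTtlSeconds(examplePlan); // 250
```

With these illustrative values the required TTL is well under 10 minutes. A step with more retries or longer attempt timeouts may need either a longer `expires_in` or a fresh key per attempt.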
Cloudflare AI Gateway vs. Keybrake: complementary, not competing
| Property | Cloudflare AI Gateway | Keybrake |
|---|---|---|
| What it proxies | LLM providers: OpenAI, Anthropic, Mistral, Cohere, Workers AI | Vendor SaaS APIs: Stripe, Twilio, Resend |
| Spend enforcement | LLM token cost caps | Dollar spend caps on vendor charges (Stripe amount, Twilio SMS price) |
| Audit log | LLM request/response logs with token counts | Vendor API call logs with dollar cost per call |
| Per-execution scoping | Rate limiting by AI Gateway key, not per-workflow-run | Per-run vault keys with individual spend caps and endpoint allowlists |
| Revocation | No per-execution revocation | Per-vault-key revocation; kills one run without affecting others |

How they interact: both can be active simultaneously. AI Gateway routes LLM calls, and Keybrake proxies vendor SaaS calls. A Cloudflare Workflow step can call Workers AI via AI Gateway and Stripe via Keybrake in the same step.
Durable Objects for vault key state caching
For high-throughput Workers that process many short requests (e.g., a webhook handler receiving thousands of events per minute), issuing a vault key per request adds a Keybrake API round trip to every request. Durable Objects can cache vault keys for a session or user, reducing issuance to once per session rather than once per request:
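import { DurableObject } from "cloudflare:workers";

interface Env {
  KEYBRAKE_TOKEN: string; // Workers Secret holding the Keybrake API token
}

const SESSION_KEY_TTL_MS = 15 * 60_000; // must match expires_in below

// Durable Object: caches vault key for a user session
export class AgentSession extends DurableObject<Env> {
  private vaultKey: { token: string; id: string; expiresAt: number } | null = null;

  async getVaultKey(): Promise<string> {
    const now = Date.now();
    // Reuse cached key if it expires more than 2 minutes from now
    if (this.vaultKey && this.vaultKey.expiresAt > now + 120_000) {
      return this.vaultKey.token;
    }
    // Issue new vault key (15-minute session key, reused across requests)
    const res = await fetch("https://api.keybrake.com/v1/keys", {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${this.env.KEYBRAKE_TOKEN}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        label: `session-${this.ctx.id.toString()}`,
        vendor: "stripe",
        allowed_endpoints: ["/v1/payment_intents", "/v1/payment_intents/*"],
        daily_usd_cap: 200, // Per-session cap
        expires_in: "15m"
      }),
    });
    if (!res.ok) throw new Error(`vault key issuance failed: ${res.status}`);
    const { token, id } = await res.json() as { token: string; id: string };
    // Track expiry locally from the requested TTL rather than relying on a response field
    this.vaultKey = { token, id, expiresAt: now + SESSION_KEY_TTL_MS };
    return token;
  }
}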
Related questions
Does Cloudflare's edge network add latency to requests going to proxy.keybrake.com?
Cloudflare Workers run on Cloudflare's edge network in 200+ cities. The fetch from a Worker to proxy.keybrake.com exits Cloudflare's network and travels to Keybrake's infrastructure as a standard HTTPS request. If Keybrake's proxy is hosted on a major cloud provider (GCP, AWS, or Fly.io), the latency from a Cloudflare PoP to the proxy is typically 10–50ms — comparable to any cross-cloud HTTPS call. Keybrake's proxy then forwards to Stripe at similar latency. The total round-trip overhead versus direct-to-Stripe is approximately one extra HTTPS hop, typically 20–80ms depending on geographic proximity.
Can I use Cloudflare KV to cache vault keys across Workers invocations?
Yes. Cloudflare KV is appropriate for caching vault keys that are valid for multiple minutes and shared across many Worker invocations for the same user session or workflow run. Store the vault key token and its expiration timestamp in KV with the user session ID or workflow run ID as the key. Set the KV entry's TTL to match the vault key's TTL minus a safety buffer (e.g., if the vault key expires in 15 minutes, set KV TTL to 12 minutes). On each Worker invocation, read from KV first and only call the Keybrake API if no cached key exists. This reduces vault key issuance to once per session rather than once per request.
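The read-first, issue-on-miss flow described above might look like the following. This is a sketch under assumptions: `KvLike` is a hand-written subset of the Workers KV binding (`get`, and `put` with `expirationTtl` in seconds), `getCachedVaultKey` and the `vault-key:` key prefix are hypothetical names, and `issue` stands in for whatever function calls the Keybrake API. The 15-minute TTL and 3-minute buffer reproduce the 12-minute example from the answer.

```typescript
// Subset of the Workers KV binding used by this sketch.
interface KvLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

interface CachedVaultKey {
  token: string;
  id: string;
  expiresAt: number; // epoch milliseconds, computed locally at issuance
}

const KEY_TTL_SECONDS = 15 * 60;      // TTL requested for the vault key
const SAFETY_BUFFER_SECONDS = 3 * 60; // stop reusing the key this long before expiry

// Hypothetical helper: return a cached token for the session, issuing a new
// vault key only when no usable entry exists in KV.
async function getCachedVaultKey(
  kv: KvLike,
  sessionId: string,
  issue: () => Promise<{ token: string; id: string }>,
  now: () => number = Date.now,
): Promise<string> {
  const cacheKey = `vault-key:${sessionId}`;

  const cached = await kv.get(cacheKey);
  if (cached !== null) {
    const entry = JSON.parse(cached) as CachedVaultKey;
    // Double-check expiry in case the KV entry outlived its intended window.
    if (entry.expiresAt - SAFETY_BUFFER_SECONDS * 1000 > now()) {
      return entry.token;
    }
  }

  const fresh = await issue();
  const entry: CachedVaultKey = {
    token: fresh.token,
    id: fresh.id,
    expiresAt: now() + KEY_TTL_SECONDS * 1000,
  };
  // KV TTL = vault key TTL minus the safety buffer (15 min - 3 min = 12 min).
  await kv.put(cacheKey, JSON.stringify(entry), {
    expirationTtl: KEY_TTL_SECONDS - SAFETY_BUFFER_SECONDS,
  });
  return entry.token;
}
```

Because KV is eventually consistent, invocations in different locations may occasionally each miss the cache and issue their own key. The cache reduces issuance calls; it does not guarantee exactly one key per session.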
How does this work with Cloudflare Workers calling multiple vendor APIs (Stripe and Twilio) in the same invocation?
Issue a separate vault key for each vendor. A Workers invocation that needs to both charge a customer (Stripe) and send an SMS confirmation (Twilio) issues two vault keys: one with vendor: "stripe" and one with vendor: "twilio". Each key has its own spend cap and endpoint allowlist. The two key issuance calls can be made in parallel (Promise.all) to avoid sequential latency. Both keys are revoked in a finally block when the invocation completes. The audit log in Keybrake shows both the Stripe charge and the Twilio SMS as separate entries attributed to the same workflow run label.
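A sketch of the two-vendor flow follows: both keys are issued in parallel with `Promise.all` and revoked in a `finally` block. The names `issueKey` and `withStripeAndTwilioKeys` are hypothetical, the request shape is assumed to match the single-vendor example earlier in this article, and the Twilio endpoint pattern and both spend caps are illustrative values rather than documented Keybrake settings.

```typescript
// Minimal structural types so the sketch does not depend on Workers typings.
type HttpResponse = { ok: boolean; status: number; json(): Promise<unknown> };
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<HttpResponse>;

interface VaultKey {
  token: string;
  id: string;
}

interface VendorPolicy {
  vendor: "stripe" | "twilio";
  allowed_endpoints: string[];
  daily_usd_cap: number;
}

const KEYS_URL = "https://api.keybrake.com/v1/keys";

// Issue one vault key for one vendor; both keys share the same run label.
async function issueKey(
  fetchImpl: FetchLike,
  keybrakeToken: string,
  runLabel: string,
  policy: VendorPolicy,
): Promise<VaultKey> {
  const res = await fetchImpl(KEYS_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${keybrakeToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ label: runLabel, ...policy, expires_in: "5m" }),
  });
  if (!res.ok) {
    throw new Error(`issuing ${policy.vendor} vault key failed: ${res.status}`);
  }
  return (await res.json()) as VaultKey;
}

// Hypothetical helper: issue both keys in parallel, run `work`, revoke both.
async function withStripeAndTwilioKeys<T>(
  fetchImpl: FetchLike,
  keybrakeToken: string,
  runLabel: string,
  work: (keys: { stripe: VaultKey; twilio: VaultKey }) => Promise<T>,
): Promise<T> {
  // If one issuance fails, the other key is left to expire via its TTL.
  const [stripe, twilio] = await Promise.all([
    issueKey(fetchImpl, keybrakeToken, runLabel, {
      vendor: "stripe",
      allowed_endpoints: ["/v1/payment_intents", "/v1/payment_intents/*"],
      daily_usd_cap: 500, // illustrative
    }),
    issueKey(fetchImpl, keybrakeToken, runLabel, {
      vendor: "twilio",
      allowed_endpoints: ["/2010-04-01/Accounts/*/Messages.json"], // illustrative pattern
      daily_usd_cap: 5, // illustrative
    }),
  ]);
  try {
    return await work({ stripe, twilio });
  } finally {
    // Best-effort revoke of both keys; failures are swallowed so they
    // cannot mask the outcome of `work`.
    const revoke = (id: string) =>
      fetchImpl(`${KEYS_URL}/${id}`, {
        method: "DELETE",
        headers: { Authorization: `Bearer ${keybrakeToken}` },
      }).catch(() => undefined);
    await Promise.all([revoke(stripe.id), revoke(twilio.id)]);
  }
}
```

Inside `work`, the Stripe call would carry `keys.stripe.token` and the Twilio call `keys.twilio.token`, each sent through the proxy in its own request.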
What's the relationship between Cloudflare Workers Secrets and the Stripe secret key?
In the vault key pattern, the Stripe secret key is not stored in Workers Secrets at all. It lives in Keybrake's encrypted secrets vault and never travels to Cloudflare's edge. Workers Secrets holds the Keybrake API token — a credential that can issue short-lived vault keys. The security property is that a compromised Workers Secret (the Keybrake token) can only issue vault keys, not make direct Stripe API calls. An attacker with the Keybrake token could issue vault keys with the policies you've configured (vendor: stripe, allowed_endpoints, spend cap), but cannot call Stripe's API without routing through Keybrake's proxy — where every call is logged, capped, and can be blocked by revoking all outstanding keys via the Keybrake dashboard.
Further reading
- Deno Deploy AI agent API key — the same vault key pattern for Deno Deploy's V8-isolate serverless runtime, which shares similar per-invocation isolation properties with Cloudflare Workers.
- AI agent API key lifecycle — how vault key TTL and revocation work as lifecycle phases aligned to the Cloudflare Workflow step lifecycle.
- AI agent budget enforcement — why per-call spend caps are distinct from cloud billing alarms, and how the atomic enforcement model works in the proxy layer.
- AI agent webhook authentication — authenticating Cloudflare Workers webhook handlers that trigger agent runs, ensuring the vault key is issued only for authenticated webhook deliveries.
- AI agent observability — how to correlate Keybrake audit log entries (vendor API calls) with Cloudflare Workers observability (request traces, Worker CPU time) for end-to-end agent run visibility.