Azure Durable Functions AI agent API key: scoping vendor calls in durable orchestrations
Azure Durable Functions is a serverless extension that adds stateful workflow capabilities to Azure Functions — orchestrator functions coordinate activity functions with guaranteed execution, fan-out/fan-in via context.df.Task.all(), and waitForExternalEvent for human-in-the-loop flows. AI agent teams on Azure adopt Durable Functions because it handles durable execution, automatic retries, and parallel fan-out without managing queues or workers. When Task.all() dispatches activity functions that call Stripe, Twilio, or Resend, Durable Functions' reliability features become vendor spend amplifiers: Task.all() schedules N concurrent activity calls each making independent vendor requests with no shared dollar cap, callActivityWithRetry re-executes failed activities that may have already charged Stripe, and there is no per-orchestration dollar limit in the Durable Functions runtime. This page covers the vault-key pattern that bounds vendor spend per Durable Functions orchestration.
TL;DR
Issue the vault key in the first activity call of your Durable Functions orchestrator. Pass the resulting vault key and the instanceId as part of every downstream activity's input. In fan-out, map over customer records and include the vault key and a stable itemIndex in each activity input — all concurrent activities share one cap that accumulates atomically. Use instanceId + itemIndex as the Stripe idempotency key so callActivityWithRetry retries are safe. Revoking a runaway orchestration is a single DELETE /vault/keys/{key_id} call — no terminateAsync, no Key Vault rotation, no function restart.
How Durable Functions AI agent orchestrations call vendor APIs
A typical agent billing orchestrator uses Task.all() to fan out charge activities across all customers:
// orchestrator.ts
import * as df from "durable-functions";
import { OrchestrationContext, OrchestrationHandler } from "durable-functions";

// Assumed minimal shape of a customer record
interface Customer {
  id: string;
  amount_cents: number;
}

const orchestrator: OrchestrationHandler = function* (context: OrchestrationContext) {
  const input = context.df.getInput() as { planId: string };

  // Fetch customer list
  const customers: Customer[] = yield context.df.callActivity("FetchCustomers", {
    planId: input.planId
  });

  // Fan-out: schedule one charge activity per customer
  const tasks = customers.map((customer) =>
    context.df.callActivityWithRetry(
      "ChargeCustomer",
      new df.RetryOptions(30_000, 3), // 30 s first retry interval, 3 attempts in total
      {
        customer,
        stripeKey: process.env.STRIPE_SECRET_KEY // ← same key for all activities
      }
    )
  );

  return yield context.df.Task.all(tasks); // N Stripe calls with no shared cap
};

df.app.orchestration("BillingOrchestrator", orchestrator);
This pattern has two compounding risks. First, Task.all() schedules every activity task at once: for 3,000 customers, the orchestrator enqueues 3,000 activity executions, each calling Stripe with the same STRIPE_SECRET_KEY. How many run simultaneously is bounded only by the host's activity concurrency settings (such as maxConcurrentActivityFunctions) and scale-out; there is no dollar-spend stop condition at the orchestrator level. Second, callActivityWithRetry re-schedules a failed activity until maxNumberOfAttempts total attempts have been made. If an activity throws after Stripe has applied the charge, the retry calls Stripe again without an idempotency key and creates a duplicate charge on every retry.
Three gaps Durable Functions' native tooling doesn't fill for vendor spend control
| Gap | What happens in practice | Durable Functions' answer |
|---|---|---|
| No per-orchestration spend cap | Durable Functions has no mechanism to halt a Task.all() fan-out when cumulative vendor API spend reaches a dollar threshold. Concurrency control (maxConcurrentActivityFunctions in host.json) limits concurrent activity executions by count, not cost. Azure Monitor cost alerts fire after spend has occurred, too late to stop a fan-out that completes in minutes. Orchestration timeout limits wall-clock duration, not dollars spent. If you use Task.any() with a timeout task, the orchestrator can stop waiting on the fan-out early, but in-flight activities keep running and the trigger is not dollar-aware. | Azure Cost Management sends alerts after spend. No pre-call, per-orchestration dollar cap in the Durable Functions runtime. |
| No mid-orchestration vendor revoke without Key Vault rotation | The Stripe API key is typically stored in Azure Key Vault and loaded as an app setting or fetched by each activity at cold start. Rotating the Key Vault secret creates a new version, but warm function instances that already loaded the secret into process memory continue using the old version until the function host is recycled. Terminating an orchestration via terminateAsync marks it as terminated in the Durable Functions history, but activity tasks that are already executing (messages on the Activity Queue) continue to completion. The termination signal takes effect on the next orchestrator replay, not on already-dispatched activities. | terminateAsync marks the orchestration as terminated but cannot interrupt in-flight activity executions. No per-orchestration API key scoping that revokes cleanly. |
| No per-call audit with orchestration context | Application Insights captures function invocation events and durations but doesn't parse dollar amounts from Stripe response bodies, correlate Stripe PaymentIntent.id values with the Durable Functions instanceId and activity iteration index in a structured cost table, or provide a queryable per-orchestration spend summary. Reconstructing what a runaway orchestration charged requires cross-referencing Application Insights telemetry, Durable Functions history, and the Stripe dashboard. No shared identifier propagates automatically from instanceId into Stripe's response metadata without explicit instrumentation. | Application Insights logs function invocations and errors. No structured vendor cost tracking or instanceId-to-charge correlation natively. |
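The third gap comes down to having one record per vendor call that carries the orchestration's identifiers. As a rough sketch of what such explicit instrumentation could feed, the snippet below sums spend per orchestration. The `ChargeAuditRecord` shape and `spendByOrchestration` helper are hypothetical names chosen for illustration, not a Keybrake, Stripe, or Application Insights schema.

```typescript
// Hypothetical audit record written once per vendor call; field names are illustrative.
interface ChargeAuditRecord {
  instanceId: string;      // Durable Functions orchestration instance ID
  itemIndex: number;       // position of the item in the fan-out array
  paymentIntentId: string; // identifier returned by the vendor
  amountCents: number;
}

// Sums charged amounts per orchestration instance.
function spendByOrchestration(records: ChargeAuditRecord[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const r of records) {
    totals.set(r.instanceId, (totals.get(r.instanceId) ?? 0) + r.amountCents);
  }
  return totals;
}

const auditTotals = spendByOrchestration([
  { instanceId: "run-a", itemIndex: 0, paymentIntentId: "pi_1", amountCents: 1200 },
  { instanceId: "run-a", itemIndex: 1, paymentIntentId: "pi_2", amountCents: 800 },
  { instanceId: "run-b", itemIndex: 0, paymentIntentId: "pi_3", amountCents: 500 },
]);

console.log(auditTotals.get("run-a")); // 2000
console.log(auditTotals.get("run-b")); // 500
```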
The fan-out/fan-in amplification risk
Task.all() is the canonical fan-out pattern in Durable Functions and the primary spend amplifier for agent billing workflows. Unlike sequential activity calls, where the orchestrator can check a running total between calls, Task.all() schedules all activities as queue messages at once, and each becomes an independent activity execution with no coordination between instances. A Task.all() over 500 customer records schedules 500 Stripe calls. The orchestrator resumes only when the last activity completes, so by the time it could inspect a total, every successful activity has already charged its customer.
The retry amplification compounds this. callActivityWithRetry with a RetryOptions(30_000, 3) policy allows 3 attempts in total: the initial execution plus up to 2 retries. If the activity throws after Stripe has applied the charge (e.g. a network timeout after a successful Stripe API call), the retry calls Stripe again without an idempotency key, and each attempt independently charges the customer. An activity that fails on every attempt can produce 3 separate charges for the same customer.
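The effect of an idempotency key on retried charges can be checked with a small simulation. This is a sketch only, not Durable Functions or Stripe code: `FakeStripe`, `chargeThenFail`, and `runWithRetry` are hypothetical names, and the retry loop assumes the maximum attempt count includes the first execution.

```typescript
// Hypothetical in-memory stand-in for a payment vendor; not the Stripe SDK.
class FakeStripe {
  private seenKeys = new Set<string>();
  chargeCount = 0;

  // Applies a charge unless the same idempotency key was already used.
  charge(idempotencyKey?: string): void {
    if (idempotencyKey !== undefined) {
      if (this.seenKeys.has(idempotencyKey)) return; // duplicate request, no new charge
      this.seenKeys.add(idempotencyKey);
    }
    this.chargeCount += 1;
  }
}

// Simulates an activity whose vendor call succeeds but whose response is lost,
// so the activity throws after the charge was applied.
function chargeThenFail(stripe: FakeStripe, idempotencyKey?: string): never {
  stripe.charge(idempotencyKey);
  throw new Error("network timeout after charge");
}

// Assumed retry semantics: maxNumberOfAttempts is the total number of attempts.
function runWithRetry(maxNumberOfAttempts: number, activity: () => void): void {
  for (let attempt = 1; attempt <= maxNumberOfAttempts; attempt++) {
    try {
      activity();
      return;
    } catch {
      // Swallow the error and try again until attempts run out.
    }
  }
}

const withoutKey = new FakeStripe();
runWithRetry(3, () => chargeThenFail(withoutKey));

const withKey = new FakeStripe();
runWithRetry(3, () => chargeThenFail(withKey, "instance-abc-7"));

console.log(withoutKey.chargeCount); // 3
console.log(withKey.chargeCount); // 1
```

With a key derived from the orchestration instance and the item's position, every attempt presents the same key, so the simulated vendor applies the charge once.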
Scoping vault keys per Durable Functions orchestration
// orchestrator.ts (with vault key)
import * as df from "durable-functions";
import { OrchestrationContext, OrchestrationHandler } from "durable-functions";

// Assumed minimal shape of a customer record
interface Customer {
  id: string;
  amount_cents: number;
}

interface VaultKeyResult {
  vault_key: string;
  key_id: string;
  expires_at: string;
}

const orchestrator: OrchestrationHandler = function* (context: OrchestrationContext) {
  const instanceId = context.df.instanceId;
  const input = context.df.getInput() as { planId: string; budgetUsd?: number };

  // Step 1: Issue a vault key scoped to this orchestration
  const vault: VaultKeyResult = yield context.df.callActivity("IssueVaultKey", {
    vendor: "stripe",
    daily_usd_cap: input.budgetUsd ?? 500,
    allowed_endpoints: ["POST /v1/payment_intents"],
    expires_in: "2h",
    agent_run_label: `azure-durable/${instanceId}`
  });

  // Step 2: Fetch customers
  const customers: Customer[] = yield context.df.callActivity("FetchCustomers", {
    planId: input.planId
  });

  // Step 3: Fan-out, with all activities sharing the same vault key and cap
  const tasks = customers.map((customer, itemIndex) =>
    context.df.callActivityWithRetry(
      "ChargeCustomer",
      // The Node.js RetryOptions constructor takes only the first retry interval (ms)
      // and the maximum number of attempts; it has no exception filter.
      new df.RetryOptions(30_000, 3),
      {
        customer,
        vault_key: vault.vault_key,
        orchestration_id: instanceId,
        item_index: itemIndex
      }
    )
  );

  return yield context.df.Task.all(tasks);
};

df.app.orchestration("BillingOrchestrator", orchestrator);
// charge-customer-activity.ts
import * as df from "durable-functions";

const KEYBRAKE_BASE = "https://proxy.keybrake.com";

// Assumed input and output shapes, matching what the orchestrator passes in
interface ChargeInput {
  customer: { id: string; amount_cents: number };
  vault_key: string;
  orchestration_id: string;
  item_index: number;
}

interface VaultKeyInput {
  vendor: string;
  daily_usd_cap: number;
  allowed_endpoints: string[];
  expires_in: string;
  agent_run_label: string;
}

interface VaultKeyResult {
  vault_key: string;
  key_id: string;
  expires_at: string;
}

df.app.activity("ChargeCustomer", {
  handler: async (input: ChargeInput) => {
    const { customer, vault_key, orchestration_id, item_index } = input;
    // Stable across retries: same instance ID and array position every time
    const idempotencyKey = `${orchestration_id}-${item_index}`;

    const res = await fetch(`${KEYBRAKE_BASE}/stripe/v1/payment_intents`, {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${vault_key}`,
        "Idempotency-Key": idempotencyKey,
        "Content-Type": "application/json"
      },
      body: JSON.stringify({
        amount: customer.amount_cents,
        currency: "usd",
        customer: customer.id
      })
    });

    if (res.status === 429) {
      const body = (await res.json()) as { code?: string; message?: string };
      if (body.code === "cap_exhausted") {
        // Return instead of throwing: a thrown error would be retried by
        // callActivityWithRetry, and the cap would reject every retry as well.
        return { status: "cap_exhausted", message: body.message };
      }
    }
    if (!res.ok) throw new Error(`Stripe error: ${res.status}`);
    return res.json();
  }
});

df.app.activity("IssueVaultKey", {
  handler: async (input: VaultKeyInput): Promise<VaultKeyResult> => {
    const res = await fetch(`${KEYBRAKE_BASE}/vault/keys`, {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${process.env.KEYBRAKE_API_KEY}`,
        "Content-Type": "application/json"
      },
      body: JSON.stringify(input)
    });
    if (!res.ok) throw new Error(`Keybrake error: ${res.status}`);
    return (await res.json()) as VaultKeyResult;
  }
});
The IssueVaultKey activity issues the vault key once at the start of the orchestration. The orchestrator passes vault.vault_key along with instanceId and itemIndex into each fan-out activity's input. All concurrent ChargeCustomer activities receive the same vault key, and the same cap accumulates atomically across all concurrent executions. The idempotency key orchestration_id-item_index is stable across retries: Durable Functions replays the orchestrator deterministically with the same instanceId, and the same itemIndex is computed from the same array position. Cap exhaustion is terminal and should not be retried. The Node.js RetryOptions constructor accepts only a first retry interval and a maximum attempt count, with no exception filter, so the reliable way to keep a cap_exhausted 429 out of the retry path is for ChargeCustomer to return it as a result rather than throw.
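To make "one cap that accumulates atomically" concrete, here is a small in-memory model of concurrent activities drawing on a single cap. It is a hypothetical sketch, not Keybrake's implementation: `SharedCap`, `chargeActivity`, and `simulateFanOut` are illustrative names, and the atomicity here comes simply from the check and the update happening in one synchronous step.

```typescript
// Hypothetical in-memory model of a shared spend cap; not Keybrake's implementation.
class SharedCap {
  private spentCents = 0;
  private capCents: number;

  constructor(capCents: number) {
    this.capCents = capCents;
  }

  // Check-and-add in one synchronous step, so no other caller can
  // interleave between the check and the update.
  tryReserve(amountCents: number): boolean {
    if (this.spentCents + amountCents > this.capCents) return false;
    this.spentCents += amountCents;
    return true;
  }

  get spent(): number {
    return this.spentCents;
  }
}

type ChargeOutcome = "charged" | "cap_exhausted";

// Simulated activity: waits a random moment, then asks the cap before charging.
async function chargeActivity(cap: SharedCap, amountCents: number): Promise<ChargeOutcome> {
  await new Promise((resolve) => setTimeout(resolve, Math.random() * 5));
  return cap.tryReserve(amountCents) ? "charged" : "cap_exhausted";
}

async function simulateFanOut(): Promise<{ charged: number; rejected: number; spent: number }> {
  const cap = new SharedCap(50_000); // $500.00 cap, in cents
  // 100 concurrent activities at $9.00 each would total $900.00 without a cap.
  const outcomes = await Promise.all(
    Array.from({ length: 100 }, () => chargeActivity(cap, 900))
  );
  const charged = outcomes.filter((o) => o === "charged").length;
  return { charged, rejected: outcomes.length - charged, spent: cap.spent };
}

const fanOutResult = simulateFanOut();
fanOutResult.then((r) => console.log(r)); // { charged: 55, rejected: 45, spent: 49500 }
```

Which 55 of the 100 calls succeed depends on timing, but the total never exceeds the cap, which is the property the shared vault key is meant to provide.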
How Keybrake fits
Keybrake is the proxy layer between your charge activity functions and Stripe, Twilio, or Resend. The vault key issued in IssueVaultKey replaces the full-access Stripe key previously loaded from Key Vault into activity function process memory. The real Stripe secret stays in Keybrake — it never appears in Application Insights telemetry, Durable Functions execution history, or activity function environment variables. For Task.all() fan-out, all concurrent activities receive the same vault key in their input and the same cap accumulates atomically. Revoking a runaway orchestration is a single DELETE /vault/keys/{key_id} call — effective on the next proxied request from any activity, with no terminateAsync call, no Key Vault rotation, and no impact on other orchestrations running different vault keys.
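The revoke call could be wrapped in a small helper along these lines. This is a sketch under assumptions: the `DELETE /vault/keys/{key_id}` path and the base URL come from this page, while the helper names and the Bearer authorization scheme (assumed to mirror the IssueVaultKey call) are not confirmed by it.

```typescript
// Base URL as used in the activity code on this page.
const KEYBRAKE_BASE_URL = "https://proxy.keybrake.com";

interface RevokeRequest {
  url: string;
  init: { method: "DELETE"; headers: Record<string, string> };
}

// Builds the request separately from sending it so it can be inspected or logged.
// Hypothetical helper; the Bearer scheme is an assumption.
function buildRevokeRequest(keyId: string, apiKey: string): RevokeRequest {
  return {
    url: `${KEYBRAKE_BASE_URL}/vault/keys/${encodeURIComponent(keyId)}`,
    init: { method: "DELETE", headers: { Authorization: `Bearer ${apiKey}` } },
  };
}

// Sends the revoke call; resolves to true when the response status is 2xx.
async function revokeVaultKey(keyId: string, apiKey: string): Promise<boolean> {
  const { url, init } = buildRevokeRequest(keyId, apiKey);
  const res = await fetch(url, init);
  return res.ok;
}

const revokeExample = buildRevokeRequest("key_123", "example-api-key");
console.log(revokeExample.url); // https://proxy.keybrake.com/vault/keys/key_123
```

The `key_id` needed here is the one returned by IssueVaultKey, so it is worth logging alongside the instanceId when the key is issued.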
Related questions
Does the vault key survive the Durable Functions orchestrator replay mechanism?
Yes — because the vault key is issued inside a callActivity call, and Durable Functions stores all activity outputs in the execution history. On orchestrator replay, the framework returns the stored activity result from history without re-executing the activity. This means the vault key is issued exactly once per orchestration instance, even across multiple orchestrator replays triggered by activity completions. The vault key is not re-issued on replay — it is replayed from history. This is the correct behavior: you want one vault key per orchestration run, not one per replay event.
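The replay behavior can be illustrated with a miniature model. This is a deliberately simplified, hypothetical sketch of history-based replay (the real runtime is far more involved); `replayOnce`, `storedResults`, and the fake activities are illustrative names only.

```typescript
// Fake activities; IssueVaultKey has a side effect we want to happen exactly once.
let issueCount = 0;
const fakeActivities: Record<string, () => string> = {
  IssueVaultKey: () => `vk_${++issueCount}`,
  FetchCustomers: () => "cus_1,cus_2",
};

// Orchestrator: yields activity names and receives their results.
function* miniOrchestrator(): Generator<string, string, string> {
  const vaultKey = yield "IssueVaultKey";
  const customers = yield "FetchCustomers";
  return `${vaultKey} -> ${customers}`;
}

// One replay: re-run the generator from the top, answering yields from stored results.
// When the stored results run out, execute that one activity, record it, and stop.
function replayOnce(storedResults: string[]): { done: boolean; output?: string } {
  const gen = miniOrchestrator();
  let step = gen.next();
  let index = 0;
  while (!step.done) {
    if (index >= storedResults.length) {
      storedResults.push(fakeActivities[step.value]()); // first execution, result is stored
      return { done: false };
    }
    step = gen.next(storedResults[index]); // answered from history, activity not re-run
    index += 1;
  }
  return { done: true, output: step.value };
}

const storedResults: string[] = [];
let replayCount = 1;
let replayResult = replayOnce(storedResults);
while (!replayResult.done) {
  replayResult = replayOnce(storedResults);
  replayCount += 1;
}

console.log(replayCount); // 3
console.log(issueCount); // 1
console.log(replayResult.output); // vk_1 -> cus_1,cus_2
```

The orchestrator body runs from the top three times, yet the key-issuing activity executes once, because later replays receive its stored result.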
How do I prevent RetryOptions from retrying cap exhaustion errors?
Some language SDKs let the retry policy decide per exception whether to retry; the .NET in-process RetryOptions, for example, exposes a Handle callback. The Node.js RetryOptions constructor takes only the first retry interval and the maximum number of attempts, so it cannot exclude an error type. A try/catch around a yielded callActivityWithRetry does not help either, because the exception reaches the orchestrator only after the retry policy is exhausted. Instead, have the activity return cap exhaustion as a result value rather than throwing, then filter those results in the orchestrator after Task.all() completes. The key principle is that cap exhaustion is intentional and terminal: retrying it wastes time and burns the retry budget before getting the same 429 response.
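One way to keep cap exhaustion out of the retry path in Node.js is for the activity to return a tagged result and for the orchestrator to sort the results once the fan-out finishes. The sketch below shows only that sorting step; the `ChargeResult` shape and `summarizeFanOut` helper are hypothetical, not part of the Durable Functions SDK.

```typescript
// Hypothetical result shape returned by each charge activity.
type ChargeResult =
  | { status: "charged"; itemIndex: number; paymentIntentId: string }
  | { status: "cap_exhausted"; itemIndex: number };

interface FanOutSummary {
  chargedCount: number;
  skippedIndexes: number[]; // items to carry into a later, separately budgeted run
}

// Splits fan-out results into completed charges and items the cap rejected.
function summarizeFanOut(results: ChargeResult[]): FanOutSummary {
  const summary: FanOutSummary = { chargedCount: 0, skippedIndexes: [] };
  for (const r of results) {
    if (r.status === "charged") summary.chargedCount += 1;
    else summary.skippedIndexes.push(r.itemIndex);
  }
  return summary;
}

const fanOutSummary = summarizeFanOut([
  { status: "charged", itemIndex: 0, paymentIntentId: "pi_1" },
  { status: "cap_exhausted", itemIndex: 1 },
  { status: "charged", itemIndex: 2, paymentIntentId: "pi_3" },
  { status: "cap_exhausted", itemIndex: 3 },
]);

console.log(fanOutSummary.chargedCount); // 2
console.log(fanOutSummary.skippedIndexes); // [ 1, 3 ]
```

Because the function is pure and deterministic, it is safe to call from orchestrator code on the array that Task.all() yields.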
What vault key TTL should I use for orchestrations that call waitForExternalEvent?
Size the TTL to a single phase of vendor calls, and never let one key span the wait. If your orchestration follows the pattern issue vault key → start vendor calls → wait for approval → resume vendor calls, a key issued at the start may expire during the approval wait, or must be given a multi-day TTL to survive it. Instead, use two issuance points: one IssueVaultKey activity before the first set of vendor calls and a second after waitForExternalEvent returns. Each vault key covers only the vendor calls in its phase. This keeps TTLs tight (minutes, not days) and avoids issuing long-lived keys to cover multi-day wait states.
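The two-phase shape might look like the generator below. To stay self-contained it uses simplified, hypothetical stand-ins: `Step`, `issueKey`, and `drive` are illustrative, the `"15m"` TTL string is an assumed format, and a real orchestrator would yield `context.df.callActivity(...)` and `context.df.waitForExternalEvent(...)` instead of plain objects.

```typescript
// Simplified stand-ins for the tasks an orchestrator yields.
type Step =
  | { kind: "activity"; name: string; input: unknown }
  | { kind: "externalEvent"; name: string };

interface PhaseKey {
  vault_key: string;
  key_id: string;
}

// Describes an IssueVaultKey call with a short TTL for one phase.
function issueKey(instanceId: string, phase: string): Step {
  return {
    kind: "activity",
    name: "IssueVaultKey",
    input: { vendor: "stripe", expires_in: "15m", agent_run_label: `azure-durable/${instanceId}/${phase}` },
  };
}

function* twoPhaseBilling(instanceId: string): Generator<Step, string[], any> {
  // Phase 1: short-lived key covering only the pre-approval vendor calls.
  const key1: PhaseKey = yield issueKey(instanceId, "phase-1");
  yield { kind: "activity", name: "ChargeCustomer", input: { vault_key: key1.vault_key, item_index: 0 } };

  // The wait can last days; no key needs to stay alive across it.
  yield { kind: "externalEvent", name: "Approval" };

  // Phase 2: fresh key issued only after approval arrives.
  const key2: PhaseKey = yield issueKey(instanceId, "phase-2");
  yield { kind: "activity", name: "ChargeCustomer", input: { vault_key: key2.vault_key, item_index: 1 } };
  return [key1.key_id, key2.key_id];
}

// Tiny driver standing in for the runtime: answers each yielded step.
function drive(instanceId: string): { steps: string[]; keyIds: string[] } {
  const gen = twoPhaseBilling(instanceId);
  const steps: string[] = [];
  let issued = 0;
  let next = gen.next();
  while (!next.done) {
    const step = next.value;
    steps.push(step.name);
    const reply =
      step.kind === "activity" && step.name === "IssueVaultKey"
        ? { vault_key: `vk_${++issued}`, key_id: `key_${issued}` }
        : undefined;
    next = gen.next(reply);
  }
  return { steps, keyIds: next.value as string[] };
}

const twoPhaseRun = drive("abc123");
console.log(twoPhaseRun.steps.join(" > "));
// IssueVaultKey > ChargeCustomer > Approval > IssueVaultKey > ChargeCustomer
console.log(twoPhaseRun.keyIds); // [ 'key_1', 'key_2' ]
```

Each phase gets its own key and its own cap, so revoking or exhausting one does not affect the other.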
Further reading
- Temporal AI agent API key — similar per-workflow vault key pattern; Temporal Activities map to Durable Functions activities and the same idempotency-key-from-workflow-ID pattern applies.
- AWS Step Functions AI agent API key — the AWS equivalent of Durable Functions fan-out, with Map state replacing Task.all() and the same vault-key-via-ItemSelector pattern.
- AI agent idempotency — why orchestration-ID-based idempotency keys are essential when callActivityWithRetry re-executes failed activities that may have already charged Stripe.
- AI agent spend reporting — the four SQL queries that give per-orchestration cost visibility that Application Insights doesn't provide natively.