Multi-Agent Systems · API Key Governance

Five things multi-agent systems break when they share an API key

Keybrake · June 1, 2026 · 9 min read

Sharing one API key across CrewAI agents, LangGraph nodes, or AutoGen assistants looks fine in development. In production, it produces five distinct failure modes that are hard to debug precisely because the key itself gives you no visibility into which agent caused what.

This post names the five breakages, shows concrete examples in LangGraph, CrewAI, and AutoGen, and walks through the per-agent vault key pattern that closes all of them without changing how you write your agents.

The pattern that feels fine until it isn't

Multi-agent frameworks encourage a clean environment variable convention: set STRIPE_SECRET_KEY once, import it in every agent that needs Stripe, and let each agent construct its own client. The pattern is idiomatic in all three major frameworks:

# LangGraph — two nodes, one key
stripe.api_key = os.environ["STRIPE_SECRET_KEY"]   # module-level, shared by all nodes

@graph.node
def billing_node(state):
    return stripe.Charge.create(amount=state["amount"], ...)

@graph.node
def refund_node(state):
    return stripe.Refund.create(charge=state["charge_id"], ...)


# CrewAI — two agents, one key
stripe_key = os.environ["STRIPE_SECRET_KEY"]   # passed to both agent tools

class ChargeCustomerTool(BaseTool):
    def _run(self, amount: int, customer_id: str):
        stripe.api_key = stripe_key
        return stripe.Charge.create(...)

class IssueRefundTool(BaseTool):
    def _run(self, charge_id: str):
        stripe.api_key = stripe_key
        return stripe.Refund.create(...)


# AutoGen — two assistants, one key
def stripe_tool(action: str, **kwargs):
    stripe.api_key = os.environ["STRIPE_SECRET_KEY"]
    if action == "charge":
        return stripe.Charge.create(**kwargs)
    if action == "refund":
        return stripe.Refund.create(**kwargs)

All three patterns are functionally correct. The agents call the right Stripe endpoints, the responses come back as expected, and tests pass cleanly. The problem isn't in development — it's in the five things that break when you run this in production with real money and concurrent agents.

Breakage 1: Attribution collapse

What breaks

You can't tell which agent spent what

Every charge, refund, and SMS appears under the same API key in the vendor dashboard. The Stripe event log shows Request-Id values but no agent context. When $2,400 processes in a Tuesday afternoon run, you cannot reconstruct which agent triggered it, which user interaction drove that agent, or whether it was intentional throughput or a runaway loop.

Stripe's event log, Twilio's usage records, and Resend's activity feed are all keyed by API key, not by caller. When one key serves three agents, all three look identical at the vendor level. You can add metadata to individual Stripe calls — stripe.Charge.create(metadata={"agent": "billing_agent"}) — but this requires every tool, in every framework, to consistently pass the right metadata. One tool that forgets produces a gap. One new agent that skips metadata re-opens the attribution problem.

The deeper issue: metadata is under your control during development, but attribution is a governance property. It should be enforced at the infrastructure layer, not remembered by each tool author. When you have ten agents across two frameworks maintained by three engineers, "remember to add metadata" is not a policy — it's a hope.

Breakage 2: Rate-limit contention

What breaks

One agent's burst kills the quota for all agents

Stripe's default rate limit is 100 read requests/second and 100 write requests/second per key. When your LangGraph graph has a high-throughput data-enrichment node running concurrent calls on the same key as a real-time billing node, the enrichment burst triggers 429s that propagate to the billing node — which had nothing to do with the burst. The billing node fails for a reason caused by a different part of the graph.

Rate limits in Stripe, Twilio, and Resend are enforced per API key. Multi-agent systems often have agents with very different call patterns: a research agent that bursts (many short calls in parallel) and a billing agent that is steady-state (one call per user action). When both share a key, the research agent's burst consumes the rate limit budget that the billing agent needs. The billing agent gets 429s. Your retry logic kicks in, which consumes more of the shared budget, which makes the contention worse.

In a CrewAI crew with hierarchical mode, the orchestrating agent dispatching concurrent sub-tasks to worker agents makes this worse: sub-tasks often hit the same Stripe endpoint in parallel, and the orchestrator has no visibility into the accumulated request rate because each agent operates independently.

Breakage 3: Blast radius on compromise

What breaks

One leaked key forces a full rotation that kills all agents

Your AutoGen assistant logs a debug trace to your observability platform. The trace includes the full tool call context, which includes the Stripe key. The key appears in a Datadog dashboard that an intern has access to. The correct response is to rotate the key immediately. Rotating the key kills every agent, script, and integration that uses it — including the low-risk read-only analytics agent that had nothing to do with the exposure.

Shared keys are a risk-aggregation problem. Agents in a multi-agent system typically have different risk profiles: some agents need write access to sensitive endpoints (charge creation, subscription management), others only need read access to low-sensitivity data (customer lookup, invoice list). When all agents share one key, the key must be as permissive as the most permissive agent. When that key is compromised, the blast radius includes all agents, not just the one that was exposed.

Stripe's answer to this is Restricted Keys — you can issue a key scoped to specific endpoints and permissions. But one Restricted Key per agent requires either issuing multiple Stripe keys (which gets unwieldy past five agents) or using a proxy that issues scoped derived keys without requiring a Stripe key rotation per agent. For a detailed look at what Restricted Keys cover and where they fall short, see Why your Stripe Restricted Key probably isn't restricted enough.

Breakage 4: Scope too broad for every agent

What breaks

The least-privileged agent gets the most-privileged key

Your LangGraph graph has three nodes: a customer lookup node (Customers: Read), a subscription status node (Subscriptions: Read), and a billing node (Charges: Write, Subscriptions: Write). A single Restricted Key serving all three must grant Charges: Write and Subscriptions: Write — because the billing node needs them. The lookup node and status node now operate with write access they should never have. If either is compromised via prompt injection or a logic bug, the damage is not limited to reads.

The principle of least privilege says each agent should only have the permissions it needs. In a multi-agent system, different agents need different permissions by design — that's often the whole point of decomposing a complex workflow into specialized agents. But if the key is shared, the key must be permissive enough for the most privileged agent. The less-privileged agents are silently over-provisioned.

This is not a theoretical concern. Prompt injection attacks against agents specifically try to get agents to use their existing tool access for unintended purposes. An agent that should only read customer records but has a key that also allows charge creation is a higher-value target than one that genuinely cannot create charges. The attack surface is determined by the key, not by the tool definition — a malicious prompt can instruct the agent to call Stripe directly, bypassing the tool schema.

Breakage 5: Audit log collapse

What breaks

One event stream, no per-agent reconstruction

Your multi-agent billing system processes 800 API calls in a day. Something went wrong — a customer was double-charged. The Stripe event log shows the call timeline, but all 800 calls are attributed to the same key. To reconstruct which agent made which call, you need to cross-reference your application logs, match request IDs, correlate timestamps, and join across multiple log sources. If any log source dropped an event, the reconstruction is incomplete. If agents ran concurrently, interleaved timestamps make sequencing ambiguous.

Auditability is not just a security concern — it's an operational necessity in any system that moves money. When a customer disputes a charge, the question is "which agent did this and why." When an agent loop runs, the question is "how many times did this call get made." Shared keys produce a single flat event stream at the vendor level. Without a proxy layer that records agent identity per call at request time, post-incident reconstruction is manual and incomplete.

The metadata workaround (passing metadata={"agent_id": "billing_v2"} in each Stripe call) gets you partway there, but it only works for Stripe, it requires every tool to implement it consistently, it doesn't cover HTTP-level metadata like the calling process or vault key, and it's post-hoc — the metadata is in the charge record, but there's no unified audit table you can SELECT agent_id, SUM(amount) FROM audit_log GROUP BY agent_id across all vendors.

The per-agent vault key pattern

The fix is to issue one key per agent — not one Stripe key per agent, but one derived proxy key per agent that maps to the real vendor key at the proxy layer. This is the vault key pattern: each agent gets a vault_key_xxx with a policy (allowed endpoints, daily spend cap, expiry), and the proxy holds the real Stripe key. No agent ever sees the real key.

The change is minimal in all three frameworks. The only difference is the base URL and the key value:

# LangGraph — one vault key per node
import stripe

billing_client = stripe.StripeClient(
    api_key=os.environ["VAULT_KEY_BILLING"],
    base_url="https://proxy.keybrake.com/stripe/v1/"
)

refund_client = stripe.StripeClient(
    api_key=os.environ["VAULT_KEY_REFUNDS"],
    base_url="https://proxy.keybrake.com/stripe/v1/"
)

@graph.node
def billing_node(state):
    return billing_client.charges.create(amount=state["amount"], ...)

@graph.node
def refund_node(state):
    return refund_client.refunds.create(charge=state["charge_id"], ...)

Each vault key has been issued with a policy that matches the agent's actual needs:

# VAULT_KEY_BILLING policy
{
  "vendor": "stripe",
  "allowed_endpoints": ["POST /v1/charges", "POST /v1/payment_intents"],
  "daily_usd_cap": 5000,
  "expires_in": "8h"
}

# VAULT_KEY_REFUNDS policy
{
  "vendor": "stripe",
  "allowed_endpoints": ["POST /v1/refunds"],
  "daily_usd_cap": 500,
  "expires_in": "4h"
}

The same pattern applies to CrewAI. Each agent role gets its own vault key, passed to its tool set at initialization:

# CrewAI — vault key per agent role
from crewai import Agent, Task, Crew
from crewai.tools import BaseTool
import stripe

def make_stripe_client(vault_key: str) -> stripe.StripeClient:
    return stripe.StripeClient(
        api_key=vault_key,
        base_url="https://proxy.keybrake.com/stripe/v1/"
    )

class ChargeCustomerTool(BaseTool):
    name = "charge_customer"
    description = "Create a charge for a customer"
    client: stripe.StripeClient

    def _run(self, amount: int, customer_id: str) -> dict:
        return self.client.charges.create(
            params={"amount": amount, "customer": customer_id, "currency": "usd"}
        )

billing_agent = Agent(
    role="Billing Specialist",
    tools=[ChargeCustomerTool(client=make_stripe_client(os.environ["VAULT_KEY_BILLING"]))]
)

refund_agent = Agent(
    role="Refund Specialist",
    tools=[IssueRefundTool(client=make_stripe_client(os.environ["VAULT_KEY_REFUNDS"]))]
)

For AutoGen, vault keys slot into the function tool initialization, with each assistant receiving only the client scoped to its role:

# AutoGen — vault key per assistant
import autogen
import stripe

def make_billing_tool(vault_key: str):
    client = stripe.StripeClient(
        api_key=vault_key,
        base_url="https://proxy.keybrake.com/stripe/v1/"
    )
    def charge_customer(amount: int, customer_id: str) -> dict:
        return client.charges.create(
            params={"amount": amount, "customer": customer_id, "currency": "usd"}
        )
    return charge_customer

def make_refund_tool(vault_key: str):
    client = stripe.StripeClient(
        api_key=vault_key,
        base_url="https://proxy.keybrake.com/stripe/v1/"
    )
    def issue_refund(charge_id: str) -> dict:
        return client.refunds.create(params={"charge": charge_id})
    return issue_refund

billing_assistant = autogen.AssistantAgent(
    name="billing_assistant",
    llm_config={
        "tools": [{"function": make_billing_tool(os.environ["VAULT_KEY_BILLING"])}]
    }
)

refund_assistant = autogen.AssistantAgent(
    name="refund_assistant",
    llm_config={
        "tools": [{"function": make_refund_tool(os.environ["VAULT_KEY_REFUNDS"])}]
    }
)

How the five breakages map to the vault key pattern

Breakage	What the vault key provides
Attribution collapse	Each vault key is unique per agent — the proxy audit log records `vault_key_id` on every call. `GROUP BY vault_key_id` reconstructs per-agent spend without relying on metadata consistency.
Rate-limit contention	Each vault key has its own daily request budget. A burst on `VAULT_KEY_BILLING` does not consume quota from `VAULT_KEY_REFUNDS`. The proxy enforces budget before forwarding — one agent can't starve another.
Blast radius on compromise	Revoking `VAULT_KEY_REFUNDS` kills only the refund agent. The real Stripe key is never rotated. Other agents continue working.
Scope too broad	The `allowed_endpoints` policy on each vault key is enforced at the proxy. The refund agent cannot create charges even if a prompt injection attempts it — the call is rejected before reaching Stripe.
Audit log collapse	The proxy records every call: `vault_key_id`, `endpoint`, `amount_usd`, `vendor_request_id`, `ts`. One table, all vendors, per-agent granularity. No cross-log correlation required.

The migration is one environment variable per agent

The only code change in your agent is the API key value and the base URL. Your tool definitions, graph topology, crew configuration, and agent prompts are untouched. The Stripe Python SDK, Twilio Helper Library, and Resend SDK all support base URL overrides — the proxy is transparent to the SDK beyond the URL.

The operational change is how you manage keys: instead of one STRIPE_SECRET_KEY in your environment, you have one vault key per agent role, each with an explicit policy. Vault keys are short-lived (hours, not months), automatically expired, and revocable per-agent. The real Stripe key stays in the proxy's secure store and is never exported to any agent runtime.

For more on why even Stripe's own Restricted Keys don't fully solve the scope problem, see Why your Stripe Restricted Key probably isn't restricted enough. For the CrewAI-specific key management patterns, see CrewAI API key management. For the AutoGen patterns, see AutoGen agent API key setup.

Per-agent vault keys for Stripe, Twilio, and Resend

Keybrake issues scoped vault keys with endpoint allowlists, per-day spend caps, and a per-call audit log — one key per agent, no Stripe key rotation required. Join the waitlist to try it on your multi-agent system.