OpenAI Swarm Stripe Integration: Restricted API Keys, Spend Caps, and Agent Governance

By Keybrake · June 15, 2026 · 9 min read

OpenAI Swarm, the lightweight experimental multi-agent framework that OpenAI published as an educational project, makes multi-agent Stripe integration straightforward to prototype. The risk surfaces in three specific places: context_variables propagates the bare Stripe API key to every agent in the handoff chain regardless of role; tool errors lead the LLM to call your Stripe tool again with no idempotency key; and max_turns permits multiple billing iterations in a single run() invocation with no spend-cap enforcement between iterations.

This post covers all three failure modes specific to OpenAI Swarm and the two-layer governance pattern that closes each one: a restricted Stripe API key as a first layer, and per-role vault keys via a spend-cap proxy as a second.

The standard OpenAI Swarm Stripe pattern

The standard pattern for a Stripe-capable Swarm agent defines a tool function that calls Stripe, puts the API key in context_variables for sharing across agents, and runs the swarm with client.run(). Swarm injects context_variables into any tool function that declares a parameter with that name; the function pulls the key from it and executes the charge:

from swarm import Swarm, Agent
import stripe

client = Swarm()

def charge_stripe(
    context_variables: dict,
    customer_id: str,
    amount_cents: int,
    billing_period: str,
) -> str:
    stripe.api_key = context_variables["stripe_key"]  # sk_live_...

    charge = stripe.Charge.create(
        amount=amount_cents,
        currency="usd",
        customer=customer_id,
        description=f"Subscription {billing_period}",
        # No idempotency_key
    )
    return f"Charge created: {charge.id} status={charge.status}"


billing_agent = Agent(
    name="billing",
    instructions=(
        "You handle subscription billing. "
        "Use charge_stripe to create Stripe charges."
    ),
    functions=[charge_stripe],
)

result = client.run(
    agent=billing_agent,
    messages=[{"role": "user", "content": "Charge customer cus_Abc123 $29 for the June plan"}],
    context_variables={"stripe_key": "sk_live_xxx"},
)

print(result.messages[-1]["content"])

This works correctly in the happy path. Three distinct failure modes emerge when you introduce agent handoffs, transient tool errors, or multi-step billing workflows with a non-trivial max_turns budget.

Failure mode 1: `context_variables` propagates the bare Stripe key across every handoff

Swarm passes context_variables to every agent invoked during a run, including agents that receive control via a handoff function. A common design puts "shared configuration" in context_variables — including the Stripe key — so the billing agent, the refund agent, and the customer service agent can all reach Stripe. The consequence is that every agent in the handoff chain has the same full-permission bare key, regardless of its intended role:

from swarm import Swarm, Agent

client = Swarm()

def charge_stripe(context_variables, customer_id: str,
                  amount_cents: int, billing_period: str) -> str:
    import stripe
    stripe.api_key = context_variables["stripe_key"]  # Full sk_live_ key
    charge = stripe.Charge.create(
        amount=amount_cents, currency="usd",
        customer=customer_id,
        description=f"Subscription {billing_period}",
    )
    return f"Charged: {charge.id}"

def lookup_charge(context_variables, charge_id: str) -> str:
    import stripe
    # This is a read-only lookup — but stripe_key in context_variables
    # is the same sk_live_ key as the billing agent uses.
    stripe.api_key = context_variables["stripe_key"]
    charge = stripe.Charge.retrieve(charge_id)
    return f"Status: {charge.status}, Amount: {charge.amount}"

def handoff_to_support():
    """Transfer to the customer support agent."""
    return support_agent  # support_agent also receives context_variables["stripe_key"]

billing_agent = Agent(
    name="billing",
    instructions="Handle subscription billing.",
    functions=[charge_stripe, handoff_to_support],
)

support_agent = Agent(
    name="support",
    instructions=(
        "Handle customer service requests. "
        "Use lookup_charge to check charge status."
    ),
    functions=[lookup_charge],
    # Every function on this agent can read context_variables["stripe_key"]
    # after the handoff. If a charge-capable tool is ever added to this list,
    # or a prompt injection steers the agent toward one,
    # the full billing key is already in scope.
)

result = client.run(
    agent=billing_agent,
    messages=[{"role": "user", "content": "Charge cus_Abc123 $29 for June, then transfer to support."}],
    context_variables={"stripe_key": "sk_live_xxx"},
)

What breaks: When handoff_to_support() transfers control, Swarm passes the full context_variables dict — including stripe_key: sk_live_xxx — to support_agent. The support agent has no business creating Stripe charges, but if a follow-up prompt is ambiguous ("sort out that charge"), an adversarial prompt injection reaches it, or a future developer adds a charge_stripe import, the full billing key is already available. There is no key rotation, no scope narrowing, and no audit differentiation between what the billing agent did and what the support agent did — both use the same Stripe key in your audit log.

The fix: never put the bare Stripe key in context_variables. Instead, issue per-role vault keys at the proxy layer and inject them via factory closures. The billing agent's tool factory gets a vault key scoped to POST /v1/charges; the support agent's tool factory gets a vault key scoped to GET /v1/charges only. If the support agent somehow calls a billing tool using its audit key, the proxy rejects the POST with 403:

import hashlib, os
import stripe
from swarm import Swarm, Agent

PROXY_URL   = "https://proxy.keybrake.com"
BILLING_KEY = os.environ["KEYBRAKE_VAULT_KEY_BILLING"]  # POST /v1/charges only
AUDIT_KEY   = os.environ["KEYBRAKE_VAULT_KEY_AUDIT"]    # GET /v1/charges only


def make_billing_tools():
    billing_client = stripe.StripeClient(
        api_key=BILLING_KEY,
        # StripeClient takes base_addresses, not base_url; the SDK appends /v1/... to "api"
        base_addresses={"api": PROXY_URL + "/stripe"},
    )

    def charge_stripe(
        context_variables: dict,
        customer_id: str,
        amount_cents: int,
        billing_period: str,
    ) -> str:
        idempotency_key = hashlib.sha256(
            f"{customer_id}:{amount_cents}:{billing_period}".encode()
        ).hexdigest()[:32]

        try:
            charge = billing_client.charges.create(
                params={
                    "amount":      int(amount_cents),
                    "currency":    "usd",
                    "customer":    customer_id,
                    "description": f"Subscription {billing_period}",
                },
                # The idempotency key is a request option (header), not a body parameter
                options={"idempotency_key": idempotency_key},
            )
            return f"Charged: {charge.id} status={charge.status}"
        except stripe.StripeError as e:
            return f"Stripe error: {e}"  # Return, not raise — prevents LLM retry loop

    return [charge_stripe]


def make_support_tools():
    audit_client = stripe.StripeClient(
        api_key=AUDIT_KEY,
        # StripeClient takes base_addresses, not base_url; the SDK appends /v1/... to "api"
        base_addresses={"api": PROXY_URL + "/stripe"},
    )

    def lookup_charge(context_variables: dict, charge_id: str) -> str:
        try:
            charge = audit_client.charges.retrieve(charge_id)
            return f"Status: {charge.status}, Amount: {charge.amount}"
        except stripe.StripeError as e:
            return f"Stripe error: {e}"

    return [lookup_charge]


client = Swarm()

billing_agent = Agent(
    name="billing",
    instructions="Handle subscription billing using charge_stripe.",
    functions=make_billing_tools(),
)

support_agent = Agent(
    name="support",
    instructions="Handle customer service using lookup_charge for charge status.",
    functions=make_support_tools(),
    # No stripe_key in context_variables — audit vault key is closed over in make_support_tools()
)

What this fixes: Vault keys are bound at tool-factory time via closure — they never appear in context_variables, so Swarm's handoff mechanism cannot leak them. The billing agent's charge_stripe closure holds the billing vault key (POST /v1/charges only). The support agent's lookup_charge closure holds the audit vault key (GET /v1/charges only). If the support agent somehow attempts a POST /v1/charges call, the proxy returns 403. The spend-cap proxy logs every tool call under its respective vault key, giving you per-agent audit differentiation in one place.

Failure mode 2: Tool exception triggers LLM retry without idempotency key

Swarm's run() loop processes tool calls one at a time and sends each result back to the LLM as a tool message. When a tool function raises an uncaught exception, Swarm catches it and returns the exception string as the tool result, so the LLM sees something like "Error: APIConnectionError: Connection reset by peer". The LLM, wanting to complete the billing task, calls the tool again. The original Stripe charge may have already completed before the network error occurred, so a second call with no idempotency key creates a second charge:

import stripe
from swarm import Swarm, Agent

client = Swarm()

def charge_stripe(context_variables, customer_id: str,
                  amount_cents: int, billing_period: str) -> str:
    stripe.api_key = context_variables["stripe_key"]

    # stripe.Charge.create() sends the charge to Stripe.
    # If Stripe accepts the charge but the response takes too long
    # (e.g. POST /v1/charges times out on the client side), stripe-python
    # raises APIConnectionError. The charge already exists in Stripe.
    charge = stripe.Charge.create(
        amount=amount_cents,
        currency="usd",
        customer=customer_id,
        description=f"Subscription {billing_period}",
        # No idempotency_key
    )
    return f"Charged: {charge.id}"

    # If the above raises:
    #   stripe.error.APIConnectionError: Connection reset by peer
    # Swarm sends:
    #   {"role": "tool", "content": "Error: APIConnectionError: Connection reset by peer"}
    # The LLM calls charge_stripe again.
    # stripe.Charge.create() fires again — second charge in Stripe, no deduplication.


billing_agent = Agent(
    name="billing",
    instructions="Handle billing. Retry if the tool returns an error.",
    functions=[charge_stripe],
)

result = client.run(
    agent=billing_agent,
    messages=[{"role": "user", "content": "Charge cus_Abc123 $29 for June"}],
    context_variables={"stripe_key": "sk_live_xxx"},
    max_turns=5,  # LLM will retry up to 5 times on tool errors
)

What breaks: stripe.error.APIConnectionError means the connection to Stripe's API dropped — but the charge may have already been created on Stripe's end before the network failure. Without an idempotency key, the LLM's second call to charge_stripe creates a completely new Stripe charge for the same customer, amount, and billing period. The Stripe Dashboard shows two distinct charges with two distinct charge_id values. The customer sees two entries on their credit card statement. The Swarm run log shows two successful tool results with different charge IDs, making it appear that billing ran twice by intent rather than by error.

Two changes close this: compute a content-hash idempotency key before the Stripe call (so all retries of the same billing operation resolve to the same charge), and return the StripeError as a string instead of re-raising (so Swarm sends a final error to the LLM rather than triggering another retry turn):

import hashlib, os
import stripe
from swarm import Swarm, Agent

PROXY_URL   = "https://proxy.keybrake.com"
BILLING_KEY = os.environ["KEYBRAKE_VAULT_KEY_BILLING"]


def make_billing_tools():
    billing_client = stripe.StripeClient(
        api_key=BILLING_KEY,
        # StripeClient takes base_addresses, not base_url; the SDK appends /v1/... to "api"
        base_addresses={"api": PROXY_URL + "/stripe"},
    )

    def charge_stripe(
        context_variables: dict,
        customer_id: str,
        amount_cents: int,
        billing_period: str,
    ) -> str:
        idempotency_key = hashlib.sha256(
            f"{customer_id}:{amount_cents}:{billing_period}".encode()
        ).hexdigest()[:32]

        try:
            charge = billing_client.charges.create(
                params={
                    "amount":      int(amount_cents),
                    "currency":    "usd",
                    "customer":    customer_id,
                    "description": f"Subscription {billing_period}",
                },
                # The idempotency key is a request option (header), not a body parameter
                options={"idempotency_key": idempotency_key},
            )
            return f"Charged: {charge.id} status={charge.status} idem={idempotency_key}"
        except stripe.StripeError as e:
            # Return the error as a string so it reaches the LLM as a tool result.
            # The LLM can escalate or report the failure instead of looping.
            # Do NOT re-raise: an uncaught exception gives the LLM (or the caller) a reason to try again.
            return f"Stripe error (not retried): {e} idempotency_key={idempotency_key}"

    return [charge_stripe]


client = Swarm()
billing_agent = Agent(
    name="billing",
    instructions=(
        "Handle subscription billing using charge_stripe. "
        "If charge_stripe returns a Stripe error, report it to the user — do not retry."
    ),
    functions=make_billing_tools(),
)

result = client.run(
    agent=billing_agent,
    messages=[{"role": "user", "content": "Charge cus_Abc123 $29 for June"}],
    context_variables={},
)

What this fixes: The idempotency key is derived from (customer_id, amount_cents, billing_period), the same parameters the LLM passes on every call attempt for the same billing operation. Whether charge_stripe runs once, twice, or five times for the June invoice of cus_Abc123, Stripe deduplicates requests that carry the same key and returns the original charge for as long as it retains the key (Stripe documents keys as eligible for removal once they are at least 24 hours old). Returning the StripeError as a string instead of re-raising means Swarm presents it to the LLM as a final result; the instruction "do not retry" gives the LLM clear guidance to escalate rather than loop. The vault key routes every attempt through the spend-cap proxy, which records it in the audit log.
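The deduplication behavior described above can be illustrated without touching Stripe. The sketch below is a toy in-memory model, not Stripe's implementation: FakeChargeAPI and billing_key are hypothetical names introduced only for this illustration, and the model ignores key expiry and parameter-mismatch checks.

```python
import hashlib
import itertools


class FakeChargeAPI:
    """Toy stand-in for an idempotent charge endpoint (illustration only)."""

    def __init__(self):
        self._saved = {}                 # idempotency key -> first response
        self._ids = itertools.count(1)

    def create(self, amount, customer, idempotency_key=None):
        # A repeated key replays the saved response instead of charging again.
        if idempotency_key is not None and idempotency_key in self._saved:
            return self._saved[idempotency_key]
        charge = {"id": f"ch_fake_{next(self._ids):03d}",
                  "amount": amount, "customer": customer}
        if idempotency_key is not None:
            self._saved[idempotency_key] = charge
        return charge


def billing_key(customer_id: str, amount_cents: int, billing_period: str) -> str:
    """Same content hash the tool functions in this post compute."""
    raw = f"{customer_id}:{amount_cents}:{billing_period}".encode()
    return hashlib.sha256(raw).hexdigest()[:32]


api = FakeChargeAPI()
key = billing_key("cus_Abc123", 2900, "2026-06")
first = api.create(2900, "cus_Abc123", idempotency_key=key)
retry = api.create(2900, "cus_Abc123", idempotency_key=key)   # LLM retry
unkeyed = api.create(2900, "cus_Abc123")                      # no key: new charge
print(first["id"], retry["id"], unkeyed["id"])
```

The keyed retry returns the first charge, while the unkeyed call produces a new one, which is the duplicate-charge case from the failure mode.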

Failure mode 3: `max_turns` permits multiple billing iterations per run

Swarm's run() function continues processing tool calls and LLM responses until the LLM stops requesting tools or max_turns is reached. The default max_turns is effectively unbounded (set to float("inf") in the reference implementation). In a multi-step billing workflow — validate customer, check existing charges, create the charge, send a receipt, update CRM — the LLM may call charge_stripe at multiple points if intermediate steps return ambiguous results or if the agent instruction includes conditional retry logic:

from swarm import Swarm, Agent
import stripe

client = Swarm()

def check_existing_charge(context_variables, customer_id: str, billing_period: str) -> str:
    stripe.api_key = context_variables["stripe_key"]
    charges = stripe.Charge.list(customer=customer_id, limit=5)
    for ch in charges.data:
        if billing_period in (ch.description or ""):
            return f"Existing charge found: {ch.id} status={ch.status}"
    return "No existing charge found for this period."

def charge_stripe(context_variables, customer_id: str,
                  amount_cents: int, billing_period: str) -> str:
    stripe.api_key = context_variables["stripe_key"]
    charge = stripe.Charge.create(
        amount=amount_cents, currency="usd", customer=customer_id,
        description=f"Subscription {billing_period}",
        # No idempotency_key
    )
    return f"Charged: {charge.id}"

billing_agent = Agent(
    name="billing",
    instructions=(
        "Handle subscription billing. "
        "Always check for existing charges first. "
        "If the charge status is 'pending' or unclear, create a new charge."
    ),
    # "Pending or unclear" instructs the LLM to call charge_stripe again
    # if the prior call returned an indeterminate result. Combined with a
    # high max_turns, multiple Stripe calls can fire in one run().
    functions=[check_existing_charge, charge_stripe],
)

result = client.run(
    agent=billing_agent,
    messages=[{"role": "user", "content": "Make sure cus_Abc123 is billed $29 for June."}],
    context_variables={"stripe_key": "sk_live_xxx"},
    max_turns=10,  # 10 turns allows multiple charge_stripe calls
)

What breaks: The agent instruction "if the charge status is 'pending' or unclear, create a new charge" combined with a max_turns=10 budget can lead the LLM to call charge_stripe multiple times in one run(). A stripe.Charge with status pending is a normal in-flight charge — charging again produces a duplicate. Without a daily spend cap enforced at the proxy layer, nothing between the LLM and Stripe prevents charge_stripe from being called 3 or 4 times in a single run before the LLM exhausts its turn budget. Each call creates an independent Stripe charge with no deduplication.

The proxy spend-cap provides a backstop: set a daily USD cap per vault key that matches the maximum expected single-customer charge for the billing period. Even if the LLM calls charge_stripe ten times, the proxy rejects all calls after the cap is hit. The idempotency key guarantees that any legitimate retry of the same billing operation collapses to one Stripe charge regardless of how many tool calls the LLM makes:

import hashlib, os
import stripe
from swarm import Swarm, Agent

PROXY_URL   = "https://proxy.keybrake.com"
BILLING_KEY = os.environ["KEYBRAKE_VAULT_KEY_BILLING"]  # daily_usd_cap=30 in Keybrake policy
AUDIT_KEY   = os.environ["KEYBRAKE_VAULT_KEY_AUDIT"]


def make_billing_tools():
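    # StripeClient takes base_addresses, not base_url; the SDK appends /v1/... to "api"
    proxy_base = {"api": PROXY_URL + "/stripe"}
    billing_client = stripe.StripeClient(api_key=BILLING_KEY, base_addresses=proxy_base)
    audit_client   = stripe.StripeClient(api_key=AUDIT_KEY,   base_addresses=proxy_base)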

    def check_existing_charge(
        context_variables: dict, customer_id: str, billing_period: str
    ) -> str:
        try:
            charges = audit_client.charges.list(params={"customer": customer_id, "limit": 5})
            for ch in charges.data:
                if billing_period in (ch.description or ""):
                    return f"Existing charge: {ch.id} status={ch.status}"
            return "No existing charge for this period."
        except stripe.StripeError as e:
            return f"Lookup error: {e}"

    def charge_stripe(
        context_variables: dict,
        customer_id: str,
        amount_cents: int,
        billing_period: str,
    ) -> str:
        idempotency_key = hashlib.sha256(
            f"{customer_id}:{amount_cents}:{billing_period}".encode()
        ).hexdigest()[:32]

        try:
            charge = billing_client.charges.create(
                params={
                    "amount":      int(amount_cents),
                    "currency":    "usd",
                    "customer":    customer_id,
                    "description": f"Subscription {billing_period}",
                },
                # The idempotency key is a request option (header), not a body parameter
                options={"idempotency_key": idempotency_key},
            )
            return f"Charged: {charge.id} status={charge.status}"
        except stripe.StripeError as e:
            return f"Stripe error: {e}"

    return [check_existing_charge, charge_stripe]


client = Swarm()
billing_agent = Agent(
    name="billing",
    instructions=(
        "Handle subscription billing. "
        "Check for existing charges first. "
        "If a charge already exists for the billing period, report it — do not charge again."
    ),
    functions=make_billing_tools(),
)

result = client.run(
    agent=billing_agent,
    messages=[{"role": "user", "content": "Make sure cus_Abc123 is billed $29 for June."}],
    context_variables={},
    max_turns=10,
)

What this fixes: Three controls work together. First, check_existing_charge uses the audit vault key (GET /v1/charges only), so the LLM can look up prior charges without creating new ones, and the updated instruction to report rather than re-charge prevents the "pending = charge again" loop. Second, the content-hash idempotency key means that even if charge_stripe is called multiple times for the same (customer_id, amount_cents, billing_period) tuple, Stripe deduplicates them to one charge. Third, the billing vault key's daily spend cap (30 USD) is enforced at the proxy layer: after one successful $29 charge only $1 of headroom remains, so any further $29 POST /v1/charges attempt is rejected with a 402 error that the LLM sees as a final tool result. No amount of max_turns can spend beyond the policy limit.
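The cap arithmetic can be checked with a small simulation. This is only a sketch of the idea, not Keybrake's implementation: DailyCapLedger, CapExceeded, and simulate are hypothetical names, and the ledger buckets spend by calendar day for simplicity, which may differ from the window the proxy actually uses.

```python
from dataclasses import dataclass, field
from datetime import date


class CapExceeded(Exception):
    """Raised when a charge would push the day's total past the cap."""


@dataclass
class DailyCapLedger:
    cap_cents: int
    spent_by_day: dict = field(default_factory=dict)  # date -> cents authorized

    def authorize(self, amount_cents: int, today: date) -> None:
        used = self.spent_by_day.get(today, 0)
        if used + amount_cents > self.cap_cents:
            raise CapExceeded(f"cap {self.cap_cents}: {used} + {amount_cents}")
        self.spent_by_day[today] = used + amount_cents


def simulate(attempts, cap_cents, today):
    """Run each attempted charge through the ledger and record the outcome."""
    ledger = DailyCapLedger(cap_cents=cap_cents)
    outcomes = []
    for amount in attempts:
        try:
            ledger.authorize(amount, today)
            outcomes.append("allowed")
        except CapExceeded:
            outcomes.append("rejected")
    return outcomes


# Three $29 attempts against a $30 daily cap: only the first fits.
outcomes = simulate([2900, 2900, 2900], cap_cents=3000, today=date(2026, 6, 15))
print(outcomes)  # ['allowed', 'rejected', 'rejected']
```

Setting the cap just above the expected single charge is what makes the second attempt fail; a cap of 60 USD would let one duplicate through.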

One-line proxy override

The Keybrake proxy is compatible with the Stripe Python SDK's StripeClient interface. Switching a Swarm tool function from direct Stripe calls to the proxy means constructing a StripeClient that points at the proxy and making the call through it; the tool function's signature and return value stay the same:

# Before: direct to Stripe with module-level api_key
import os
import stripe
stripe.api_key = os.environ["STRIPE_SECRET_KEY"]
charge = stripe.Charge.create(amount=2900, currency="usd", customer="cus_Abc123")

# After: routes through Keybrake proxy, enforces spend cap, writes audit log
from stripe import StripeClient
stripe_client = StripeClient(
    api_key=os.environ["KEYBRAKE_VAULT_KEY"],
    # StripeClient takes base_addresses, not base_url; the SDK appends /v1/... to "api"
    base_addresses={"api": "https://proxy.keybrake.com/stripe"},
)
charge = stripe_client.charges.create(
    params={"amount": 2900, "currency": "usd", "customer": "cus_Abc123"}
)

No changes to the Swarm agent definition, the client.run() call, or the function signatures are required. Only the Stripe client initialization, and the call made through that client inside the tool factory, change.

Comparison: raw key vs restricted key vs vault key

Property	Raw `sk_live_` key	Restricted Stripe key	Vault key (Keybrake proxy)
Endpoint allowlist	No — full API access	Partial — Stripe-enforced resource set	Yes — per-key policy, any Stripe endpoint
Daily spend cap	No	No	Yes — proxy enforces USD cap per vault key
Per-agent isolation	No — all agents share one key	No — all agents share one restricted key	Yes — billing agent vs support agent get different vault keys
Handoff key leak	Leaks via `context_variables`	Leaks via `context_variables`	Never in `context_variables` — closed over in tool factory
Idempotency key guard	Only if you add it manually	Only if you add it manually	Only if you add it manually (idempotency at Stripe layer)
Audit log	Stripe Dashboard only	Stripe Dashboard only	Proxy audit table with vault key, agent name, timestamp, amount
Kill switch	Rotate secret in all agents	Revoke in Stripe Dashboard	One-click vault key revoke in Keybrake Dashboard

pytest enforcement suite

These five tests verify idempotency-key determinism, retry safety, vault key isolation, and spend-cap handling. They run against mocked clients that simulate the proxy's responses, so no network access is needed:

import hashlib
from unittest.mock import MagicMock

PROXY_URL   = "https://proxy.keybrake.com"
BILLING_KEY = "vault_billing_test"
AUDIT_KEY   = "vault_audit_test"


def _idempotency_key(customer_id: str, amount_cents: int, billing_period: str) -> str:
    return hashlib.sha256(
        f"{customer_id}:{amount_cents}:{billing_period}".encode()
    ).hexdigest()[:32]


def test_idempotency_key_is_deterministic():
    """Same billing params must produce the same idempotency key across all retries."""
    k1 = _idempotency_key("cus_abc", 2900, "2026-06")
    k2 = _idempotency_key("cus_abc", 2900, "2026-06")
    assert k1 == k2

def test_different_periods_produce_different_keys():
    """Different billing periods must not share an idempotency key."""
    k_june = _idempotency_key("cus_abc", 2900, "2026-06")
    k_july = _idempotency_key("cus_abc", 2900, "2026-07")
    assert k_june != k_july

def test_stripe_error_returned_not_raised():
    """Tool function must return StripeError as string, not re-raise it."""
    import stripe
    from stripe import StripeClient

    # No spec: a class-level spec may hide services that StripeClient attaches per instance
    mock_client = MagicMock()
    mock_client.charges.create.side_effect = stripe.APIConnectionError("timeout")

    def charge_stripe(context_variables, customer_id, amount_cents, billing_period):
        idem_key = _idempotency_key(customer_id, amount_cents, billing_period)
        try:
            charge = mock_client.charges.create(
                params={
                    "amount": int(amount_cents), "currency": "usd",
                    "customer": customer_id,
                    "description": f"Subscription {billing_period}",
                },
                # The idempotency key is a request option, not a body parameter
                options={"idempotency_key": idem_key},
            )
            return f"Charged: {charge.id}"
        except stripe.StripeError as e:
            return f"Stripe error: {e}"  # Must return, not raise

    result = charge_stripe({}, "cus_abc", 2900, "2026-06")
    assert result.startswith("Stripe error:")
    # If this raised, Swarm would retry the tool — that would cause duplicate charges

def test_billing_key_cannot_list_charges():
    """Billing vault key must only allow POST /v1/charges — read operations rejected."""
    import stripe
    from stripe import StripeClient

    # No spec: a class-level spec may hide services that StripeClient attaches per instance
    mock_billing_client = MagicMock()
    mock_billing_client.charges.list.side_effect = stripe.PermissionError(
        "403: vault key policy denies GET /v1/charges"
    )

    def check_existing_charge_with_billing_key(context_variables, customer_id, billing_period):
        try:
            charges = mock_billing_client.charges.list(params={"customer": customer_id, "limit": 5})
            return "found" if charges.data else "none"
        except stripe.StripeError as e:
            return f"error: {e}"

    result = check_existing_charge_with_billing_key({}, "cus_abc", "2026-06")
    assert "error" in result  # Billing key must not allow list — use audit key

def test_spend_cap_rejection_stops_loop():
    """After spend cap is hit, proxy returns error that the tool function must surface cleanly."""
    import stripe
    from stripe import StripeClient

    call_count = 0

    # No spec: a class-level spec may hide services that StripeClient attaches per instance
    mock_capped_client = MagicMock()
    def cap_side_effect(**kwargs):
        nonlocal call_count
        call_count += 1
        if call_count > 1:
            raise stripe.StripeError("402: daily spend cap exceeded for vault key")
        charge = MagicMock()
        charge.id = "ch_test_001"
        charge.status = "succeeded"
        return charge

    mock_capped_client.charges.create.side_effect = cap_side_effect

    def charge_stripe(context_variables, customer_id, amount_cents, billing_period):
        idem_key = _idempotency_key(customer_id, amount_cents, billing_period)
        try:
            charge = mock_capped_client.charges.create(
                params={
                    "amount": int(amount_cents), "currency": "usd",
                    "customer": customer_id,
                    "description": f"Subscription {billing_period}",
                },
                # The idempotency key is a request option, not a body parameter
                options={"idempotency_key": idem_key},
            )
            return f"Charged: {charge.id}"
        except stripe.StripeError as e:
            return f"Stripe error: {e}"

    r1 = charge_stripe({}, "cus_abc", 2900, "2026-06")
    r2 = charge_stripe({}, "cus_abc", 2900, "2026-06")  # Would be capped at proxy
    assert "ch_test_001" in r1
    assert "spend cap exceeded" in r2

Gap analysis

Five gaps remain after applying idempotency keys and vault key isolation in Swarm:

Parallel tool calls in one LLM turn. The underlying Chat Completions API can return multiple tool call objects in a single response. Swarm's reference implementation processes them sequentially, but custom Swarm wrappers or modifications that process tool_calls in parallel can fire two charge_stripe calls simultaneously. The idempotency key collapses identical calls, but different amounts or different periods in the same LLM response produce separate charges that are both valid. Verify that your Swarm run loop processes billing tool calls sequentially.
Agent instruction prompt injection via context_variables. Swarm lets an agent's instructions be a function of context_variables, so if any value that ends up interpolated into the prompt is derived from user input (customer name, billing note, description), a malicious input can inject instructions into the agent's context. Combined with a billing tool, this can cause the LLM to call charge_stripe with attacker-controlled parameters. Validate and sanitize all context_variables values that originate from user input before the client.run() call.
Handoff function returning a string (non-Agent) as a fallback. Swarm handoff functions can return either an Agent object (transfers control) or a string (stays with current agent). A handoff function that returns a fallback string on error means the billing agent retains control instead of transferring to the support agent. If the billing agent's subsequent instruction includes retry logic, it may call charge_stripe again. Test handoff functions under error conditions to verify the correct agent receives control.
Streaming mode and partial tool call results. Swarm supports a stream=True mode that yields chunks as the LLM generates them. If streaming is interrupted mid-tool-call (client disconnect, timeout), the partial tool call may not be recorded in the message history. A subsequent client.run() call to resume the session won't know whether the prior tool call completed. The content-hash idempotency key is the only protection against this — without it, the resumed run may fire a second Stripe charge.
Swarm Result.context_variables update exposing keys. Swarm tool functions can return a Result object with updated context_variables. If a tool function returns a Result that includes a Stripe charge ID or other billing data in context_variables, that data propagates to all subsequent agents in the run. Ensure no tool function returns Stripe keys, tokens, or sensitive billing parameters via Result.context_variables.
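One way to act on the last gap is to scrub a context_variables dict before it is passed to client.run() or returned from a tool. The sketch below is a minimal illustration under stated assumptions: scrub_context_variables and the SENSITIVE_NAMES set are hypothetical, and the pattern only covers the standard Stripe secret and restricted key prefixes, so other credential formats would need their own rules.

```python
import re

# Stripe secret keys start with sk_ and restricted keys with rk_,
# followed by live_ or test_.
SECRET_PATTERN = re.compile(r"^(sk|rk)_(live|test)_")

# Hypothetical deny-list of variable names that should never be shared.
SENSITIVE_NAMES = {"stripe_key", "api_key", "vault_key"}


def scrub_context_variables(ctx: dict) -> dict:
    """Return a copy of ctx without entries that look like credentials."""
    clean = {}
    for name, value in ctx.items():
        if name.lower() in SENSITIVE_NAMES:
            continue  # dropped by name
        if isinstance(value, str) and SECRET_PATTERN.match(value):
            continue  # dropped because the value looks like a Stripe key
        clean[name] = value
    return clean


ctx = {
    "customer_name": "Ada",
    "stripe_key": "sk_live_xxx",   # dropped by name
    "note": "rk_test_abc",         # dropped by value pattern
    "plan": "pro",
    "retries": 2,
}
safe_ctx = scrub_context_variables(ctx)
print(safe_ctx)
```

Running the same filter over any context_variables a tool hands back in a Result keeps a later code change from reintroducing the handoff leak described in failure mode 1.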

FAQ

Does Swarm support concurrent agent runs?

The reference OpenAI Swarm implementation is single-threaded and processes one agent run at a time within a single client.run() call. However, if your application calls client.run() concurrently from multiple threads or async tasks (for different customers), each run operates independently with its own message history. Vault keys with per-key daily spend caps and per-key audit logs give you isolation between concurrent runs — a single bare Stripe key shared across concurrent client.run() calls cannot be isolated by customer or by billing period.

Is OpenAI Swarm production-ready?

OpenAI released Swarm as an educational and experimental framework, not as a production-ready library, and its README now points to the OpenAI Agents SDK as the production-ready successor. For production multi-agent Stripe billing, consider using the governance patterns in this series with a production framework such as the OpenAI Agents SDK, LangChain, CrewAI, or AutoGen, or with the OpenAI Assistants API directly. The failure modes documented here (idempotency key gaps, context_variables key leakage, retry cycle vulnerability) apply to any Chat Completions-based multi-agent system, not just Swarm.

What happens if the LLM calls `charge_stripe` with a different `billing_period` on retry?

The content-hash idempotency key is derived from (customer_id, amount_cents, billing_period). If the LLM passes a different billing_period value on retry — for example, "June 2026" on the first call and "2026-06" on the retry — the idempotency key changes and Stripe creates a new charge. Normalize the billing_period format before computing the key (e.g., always use YYYY-MM) and include validation in the tool function that rejects non-normalized period values.
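A minimal sketch of that normalization step follows. The function name normalize_billing_period and the list of accepted input formats are assumptions made for illustration; month-name parsing via strptime depends on the process locale, so this assumes an English or C locale.

```python
import re
from datetime import datetime

_PERIOD_RE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])$")

# Hypothetical set of alternative spellings an LLM might produce.
_ACCEPTED_FORMATS = ("%B %Y", "%b %Y", "%Y/%m")


def normalize_billing_period(raw: str) -> str:
    """Return the period as YYYY-MM, or raise ValueError if it cannot be parsed."""
    text = raw.strip()
    if _PERIOD_RE.match(text):
        return text  # already canonical
    for fmt in _ACCEPTED_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m")
        except ValueError:
            continue
    raise ValueError(f"Unrecognized billing period: {raw!r}")


# Both spellings collapse to the same canonical value,
# so they hash to the same idempotency key.
print(normalize_billing_period("June 2026"))
print(normalize_billing_period("2026-06"))
```

Calling this at the top of charge_stripe, and returning the ValueError message as the tool result, keeps a reworded retry from producing a fresh key.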

How do I handle a Swarm agent that needs to charge multiple customers in one run?

If an agent charges multiple customers (e.g., a batch billing agent that iterates over a customer list), each charge must have a unique idempotency key. The content-hash approach handles this automatically — hashlib.sha256(f"{customer_id}:{amount_cents}:{billing_period}".encode()) produces a distinct key per customer. The daily spend cap on the billing vault key should be set to the expected total across all customers in the batch, not just one customer. For large batches, consider splitting across multiple vault keys with per-customer caps so a runaway loop on one customer can't exhaust the cap for others.
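As a worked example of the batch case, the sketch below derives one key per customer and sums the amounts to size the cap. The customer IDs and amounts are made up for illustration, and billing_key is a hypothetical helper that mirrors the hash used in the tool functions in this post.

```python
import hashlib


def billing_key(customer_id: str, amount_cents: int, billing_period: str) -> str:
    """Content-hash idempotency key, one per (customer, amount, period)."""
    raw = f"{customer_id}:{amount_cents}:{billing_period}".encode()
    return hashlib.sha256(raw).hexdigest()[:32]


# Hypothetical batch: (customer_id, amount_cents)
batch = [("cus_A", 2900), ("cus_B", 4900), ("cus_C", 2900)]
period = "2026-06"

keys = {cust: billing_key(cust, amount, period) for cust, amount in batch}

# The daily cap for a shared vault key must cover the whole batch.
required_cap_cents = sum(amount for _, amount in batch)

print(len(set(keys.values())), "distinct keys")
print(required_cap_cents, "cents needed for the batch")
```

Here the batch needs 10700 cents (107 USD) of headroom even though two customers pay the same amount, because the customer ID in the hash keeps their keys distinct.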

Can I use the module-level `stripe.api_key` pattern instead of `StripeClient`?

The module-level stripe.api_key = "vault_xxx" pattern works for single-agent, single-threaded runs. But it is process-global: if two Swarm runs execute in the same process (even sequentially), the last stripe.api_key assignment wins. For multi-agent or concurrent setups, use stripe.StripeClient(api_key=VAULT_KEY, base_addresses={"api": PROXY_URL + "/stripe"}) to create an isolated client instance per tool factory. The StripeClient approach also makes the proxy base address override straightforward without modifying global state.

How does the vault key's daily spend cap interact with Swarm's `max_turns`?

The vault key's daily USD cap and Swarm's max_turns are independent controls. max_turns limits the number of LLM-tool-call cycles in one client.run() invocation. The vault key's cap limits the total USD spent against Stripe in one rolling 24-hour window across all runs that use the same vault key. Both should be set: max_turns prevents runaway agentic loops at the Swarm layer; the spend cap prevents runaway billing at the proxy layer even if Swarm allows more turns than expected. The idempotency key prevents duplicate charges within the same billing operation regardless of how many turns occur.

Put the brakes on your Swarm agent's Stripe key

Keybrake issues vault keys your Swarm tool factories can close over — scoped to the exact Stripe endpoints each agent role needs, with daily USD caps that stop runaway billing before it starts. One proxy, per-agent audit logs, one-click revoke.