OpenAI Swarm Stripe Integration: Restricted API Keys, Spend Caps, and Agent Governance
OpenAI Swarm, the lightweight experimental multi-agent framework that OpenAI published as an educational reference, makes multi-agent Stripe integration straightforward to prototype. The risk surfaces in three specific places: context_variables propagates the bare Stripe API key to every agent in the handoff chain regardless of role; a tool error that reaches the LLM prompts it to call your Stripe tool again with no idempotency key; and max_turns permits multiple billing iterations in a single run() invocation with no spend-cap enforcement between iterations.
This post covers all three failure modes specific to OpenAI Swarm and the two-layer governance pattern that closes each one: a restricted Stripe API key as a first layer, and per-role vault keys via a spend-cap proxy as a second.
The standard OpenAI Swarm Stripe pattern
The standard pattern for a Stripe-capable Swarm agent defines a tool function that calls Stripe, puts the API key in context_variables for sharing across agents, and runs the swarm with client.run(). Swarm injects context_variables into any tool function that declares a parameter with that name; the function pulls the key from it and executes the charge:
from swarm import Swarm, Agent
import stripe

client = Swarm()

def charge_stripe(
    context_variables: dict,
    customer_id: str,
    amount_cents: int,
    billing_period: str,
) -> str:
    stripe.api_key = context_variables["stripe_key"]  # sk_live_...
    charge = stripe.Charge.create(
        amount=amount_cents,
        currency="usd",
        customer=customer_id,
        description=f"Subscription {billing_period}",
        # No idempotency_key
    )
    return f"Charge created: {charge.id} status={charge.status}"

billing_agent = Agent(
    name="billing",
    instructions=(
        "You handle subscription billing. "
        "Use charge_stripe to create Stripe charges."
    ),
    functions=[charge_stripe],
)

result = client.run(
    agent=billing_agent,
    messages=[{"role": "user", "content": "Charge customer cus_Abc123 $29 for the June plan"}],
    context_variables={"stripe_key": "sk_live_xxx"},
)
print(result.messages[-1]["content"])
This works correctly in the happy path. Three distinct failure modes emerge when you introduce agent handoffs, transient tool errors, or multi-step billing workflows with a non-trivial max_turns budget.
Failure mode 1: context_variables propagates the bare Stripe key across every handoff
Swarm passes context_variables to every agent invoked during a run, including agents that receive control via a handoff function. A common design puts "shared configuration" in context_variables — including the Stripe key — so the billing agent, the refund agent, and the customer service agent can all reach Stripe. The consequence is that every agent in the handoff chain has the same full-permission bare key, regardless of its intended role:
from swarm import Swarm, Agent
import stripe

client = Swarm()

def charge_stripe(context_variables, customer_id: str,
                  amount_cents: int, billing_period: str) -> str:
    stripe.api_key = context_variables["stripe_key"]  # Full sk_live_ key
    charge = stripe.Charge.create(
        amount=amount_cents, currency="usd",
        customer=customer_id,
        description=f"Subscription {billing_period}",
    )
    return f"Charged: {charge.id}"

def lookup_charge(context_variables, charge_id: str) -> str:
    # This is a read-only lookup, but stripe_key in context_variables
    # is the same sk_live_ key the billing agent uses.
    stripe.api_key = context_variables["stripe_key"]
    charge = stripe.Charge.retrieve(charge_id)
    return f"Status: {charge.status}, Amount: {charge.amount}"

def handoff_to_support():
    """Transfer to the customer support agent."""
    return support_agent  # support_agent's tools also receive context_variables["stripe_key"]

billing_agent = Agent(
    name="billing",
    instructions="Handle subscription billing.",
    functions=[charge_stripe, handoff_to_support],
)

support_agent = Agent(
    name="support",
    instructions=(
        "Handle customer service requests. "
        "Use lookup_charge to check charge status."
    ),
    functions=[lookup_charge],
    # Every tool on this agent is handed context_variables["stripe_key"] after the handoff.
    # If a prompt injection steers the run toward a charging tool, or someone later adds
    # charge_stripe to this list, the full billing key is already in scope.
)

result = client.run(
    agent=billing_agent,
    messages=[{"role": "user", "content": "Charge cus_Abc123 $29 for June, then transfer to support."}],
    context_variables={"stripe_key": "sk_live_xxx"},
)
What breaks: When handoff_to_support() transfers control, Swarm passes the full context_variables dict — including stripe_key: sk_live_xxx — to support_agent. The support agent has no business creating Stripe charges, but if a follow-up prompt is ambiguous ("sort out that charge"), an adversarial prompt injection reaches it, or a future developer adds a charge_stripe import, the full billing key is already available. There is no key rotation, no scope narrowing, and no audit differentiation between what the billing agent did and what the support agent did — both use the same Stripe key in your audit log.
The fix: never put the bare Stripe key in context_variables. Instead, issue per-role vault keys at the proxy layer and inject them via factory closures. The billing agent's tool factory gets a vault key scoped to POST /v1/charges; the support agent's tool factory gets a vault key scoped to GET /v1/charges only. If the support agent somehow calls a billing tool using its audit key, the proxy rejects the POST with 403:
import hashlib, os
import stripe
from swarm import Swarm, Agent

PROXY_URL = "https://proxy.keybrake.com"
BILLING_KEY = os.environ["KEYBRAKE_VAULT_KEY_BILLING"]  # POST /v1/charges only
AUDIT_KEY = os.environ["KEYBRAKE_VAULT_KEY_AUDIT"]      # GET /v1/charges only

def make_billing_tools():
    billing_client = stripe.StripeClient(
        api_key=BILLING_KEY,
        # StripeClient takes base_addresses (not base_url); the SDK appends /v1/... itself
        base_addresses={"api": PROXY_URL + "/stripe"},
    )

    def charge_stripe(
        context_variables: dict,
        customer_id: str,
        amount_cents: int,
        billing_period: str,
    ) -> str:
        idempotency_key = hashlib.sha256(
            f"{customer_id}:{amount_cents}:{billing_period}".encode()
        ).hexdigest()[:32]
        try:
            charge = billing_client.charges.create(
                params={
                    "amount": int(amount_cents),
                    "currency": "usd",
                    "customer": customer_id,
                    "description": f"Subscription {billing_period}",
                },
                # A request option, not a body parameter: sent as the Idempotency-Key header
                options={"idempotency_key": idempotency_key},
            )
            return f"Charged: {charge.id} status={charge.status}"
        except stripe.StripeError as e:
            return f"Stripe error: {e}"  # Return, not raise, so the LLM gets a clean failure

    return [charge_stripe]

def make_support_tools():
    audit_client = stripe.StripeClient(
        api_key=AUDIT_KEY,
        base_addresses={"api": PROXY_URL + "/stripe"},
    )

    def lookup_charge(context_variables: dict, charge_id: str) -> str:
        try:
            charge = audit_client.charges.retrieve(charge_id)
            return f"Status: {charge.status}, Amount: {charge.amount}"
        except stripe.StripeError as e:
            return f"Stripe error: {e}"

    return [lookup_charge]

client = Swarm()

billing_agent = Agent(
    name="billing",
    instructions="Handle subscription billing using charge_stripe.",
    functions=make_billing_tools(),
)

support_agent = Agent(
    name="support",
    instructions="Handle customer service using lookup_charge for charge status.",
    functions=make_support_tools(),
    # No stripe_key in context_variables: the audit vault key is closed over in make_support_tools()
)
What this fixes: Vault keys are bound at tool-factory time via closure — they never appear in context_variables, so Swarm's handoff mechanism cannot leak them. The billing agent's charge_stripe closure holds the billing vault key (POST /v1/charges only). The support agent's lookup_charge closure holds the audit vault key (GET /v1/charges only). If the support agent somehow attempts a POST /v1/charges call, the proxy returns 403. The spend-cap proxy logs every tool call under its respective vault key, giving you per-agent audit differentiation in one place.
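One way to keep a bare key from creeping back into context_variables is a guard that runs just before client.run(). The sketch below is a hypothetical helper (assert_no_stripe_secrets is not part of Swarm or stripe-python). It assumes a leaked key can be recognized by Stripe's sk_ and rk_ key prefixes, and it walks nested dicts and lists looking for them.

```python
from typing import Any, Iterator

# Prefixes used by Stripe secret and restricted keys (live and test mode).
SECRET_PREFIXES = ("sk_live_", "sk_test_", "rk_live_", "rk_test_")


def _walk(value: Any, path: str = "context_variables") -> Iterator[tuple[str, Any]]:
    """Yield (path, leaf) pairs for every leaf inside nested dicts and lists."""
    if isinstance(value, dict):
        for key, item in value.items():
            yield from _walk(item, f"{path}[{key!r}]")
    elif isinstance(value, (list, tuple)):
        for index, item in enumerate(value):
            yield from _walk(item, f"{path}[{index}]")
    else:
        yield path, value


def assert_no_stripe_secrets(context_variables: dict) -> None:
    """Raise ValueError if any string leaf looks like a Stripe secret key."""
    for path, leaf in _walk(context_variables):
        if isinstance(leaf, str) and leaf.startswith(SECRET_PREFIXES):
            raise ValueError(
                f"Stripe key found at {path}; close over it in a tool factory instead"
            )
```

Calling assert_no_stripe_secrets(ctx) immediately before client.run(..., context_variables=ctx) turns an accidental key in shared configuration into a loud failure at startup rather than a silent leak across handoffs. A prefix check is only a heuristic, so it complements the closure pattern rather than replacing it.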
Failure mode 2: Tool exception triggers LLM retry without idempotency key
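Swarm's run() loop executes the tool calls from each LLM response in order and appends each result to the conversation as a tool message. A failed Stripe call can come back to the LLM in two ways. If the tool, or a wrapper around it, catches the exception and returns its text, the LLM sees a tool result such as "Error: APIConnectionError: Connection reset by peer". If nothing catches it, the exception leaves run() (the reference implementation does not appear to wrap tool execution in a try/except), and whatever application code catches it typically resubmits the same request. In both cases the LLM, wanting to complete the billing task, calls the tool again. The original Stripe charge may have already completed before the network error occurred, so a second call with no idempotency key means a second charge: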
import stripe
from swarm import Swarm, Agent

client = Swarm()

def charge_stripe(context_variables, customer_id: str,
                  amount_cents: int, billing_period: str) -> str:
    stripe.api_key = context_variables["stripe_key"]
    # stripe.Charge.create() sends the charge to Stripe.
    # If Stripe accepts the charge but the response takes too long
    # (e.g. POST /v1/charges times out on the client side), stripe-python
    # raises APIConnectionError. The charge already exists in Stripe.
    charge = stripe.Charge.create(
        amount=amount_cents,
        currency="usd",
        customer=customer_id,
        description=f"Subscription {billing_period}",
        # No idempotency_key
    )
    return f"Charged: {charge.id}"

# If the above raises:
#   stripe.error.APIConnectionError: Connection reset by peer
# the error text ends up in front of the LLM as a tool result, for example:
#   {"role": "tool", "content": "Error: APIConnectionError: Connection reset by peer"}
# The LLM calls charge_stripe again.
# stripe.Charge.create() fires again: second charge in Stripe, no deduplication.

billing_agent = Agent(
    name="billing",
    instructions="Handle billing. Retry if the tool returns an error.",
    functions=[charge_stripe],
)

result = client.run(
    agent=billing_agent,
    messages=[{"role": "user", "content": "Charge cus_Abc123 $29 for June"}],
    context_variables={"stripe_key": "sk_live_xxx"},
    max_turns=5,  # Leaves room for several retry turns after a tool error
)
What breaks: stripe.error.APIConnectionError means the connection to Stripe's API dropped — but the charge may have already been created on Stripe's end before the network failure. Without an idempotency key, the LLM's second call to charge_stripe creates a completely new Stripe charge for the same customer, amount, and billing period. The Stripe Dashboard shows two distinct charges with two distinct charge_id values. The customer sees two entries on their credit card statement. The Swarm run log shows two successful tool results with different charge IDs, making it appear that billing ran twice by intent rather than by error.
Two changes close this: compute a content-hash idempotency key before the Stripe call (so all retries of the same billing operation resolve to the same charge), and catch the StripeError and return it as a string instead of letting it propagate (so the LLM receives an explicit, final failure message that the agent instructions tell it to report rather than retry):
import hashlib, os
import stripe
from swarm import Swarm, Agent

PROXY_URL = "https://proxy.keybrake.com"
BILLING_KEY = os.environ["KEYBRAKE_VAULT_KEY_BILLING"]

def make_billing_tools():
    billing_client = stripe.StripeClient(
        api_key=BILLING_KEY,
        # StripeClient takes base_addresses (not base_url); the SDK appends /v1/... itself
        base_addresses={"api": PROXY_URL + "/stripe"},
    )

    def charge_stripe(
        context_variables: dict,
        customer_id: str,
        amount_cents: int,
        billing_period: str,
    ) -> str:
        idempotency_key = hashlib.sha256(
            f"{customer_id}:{amount_cents}:{billing_period}".encode()
        ).hexdigest()[:32]
        try:
            charge = billing_client.charges.create(
                params={
                    "amount": int(amount_cents),
                    "currency": "usd",
                    "customer": customer_id,
                    "description": f"Subscription {billing_period}",
                },
                # A request option, not a body parameter: sent as the Idempotency-Key header
                options={"idempotency_key": idempotency_key},
            )
            return f"Charged: {charge.id} status={charge.status} idem={idempotency_key}"
        except stripe.StripeError as e:
            # Return the error as a string so it reaches the LLM as a tool result.
            # The LLM can escalate or report the failure without another attempt.
            # Do NOT re-raise: a raised exception aborts the turn and invites a blind retry.
            return f"Stripe error (not retried): {e} idempotency_key={idempotency_key}"

    return [charge_stripe]

client = Swarm()

billing_agent = Agent(
    name="billing",
    instructions=(
        "Handle subscription billing using charge_stripe. "
        "If charge_stripe returns a Stripe error, report it to the user and do not retry."
    ),
    functions=make_billing_tools(),
)

result = client.run(
    agent=billing_agent,
    messages=[{"role": "user", "content": "Charge cus_Abc123 $29 for June"}],
    context_variables={},
)
What this fixes: The idempotency key is derived from (customer_id, amount_cents, billing_period), the same parameters the LLM passes on every call attempt for the same billing operation. Whether charge_stripe runs once, twice, or five times for the June invoice of cus_Abc123, Stripe deduplicates all requests that carry the same key and returns the original charge, for as long as it retains the key (Stripe documents that keys may be pruned once they are at least 24 hours old). Returning the StripeError as a string instead of re-raising means the LLM receives it as a final result; the instruction "do not retry" gives the LLM clear guidance to escalate rather than loop. The vault key routes through the spend-cap proxy, writing a deduplicated audit log entry.
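The deduplication behavior can be illustrated without touching Stripe. The sketch below uses FakeChargeStore, a hypothetical in-memory stand-in that models only two rules: a repeated key replays the first result, and a repeated key with different parameters is rejected. It is a toy model for reasoning about retries, not a description of how Stripe implements idempotency.

```python
import hashlib
import itertools


class FakeChargeStore:
    """Toy model of idempotent charge creation; not Stripe's implementation."""

    def __init__(self) -> None:
        self._by_key: dict[str, tuple[dict, str]] = {}
        self._ids = itertools.count(1)

    def create(self, params: dict, idempotency_key: str) -> str:
        if idempotency_key in self._by_key:
            saved_params, charge_id = self._by_key[idempotency_key]
            if saved_params != params:
                # A reused key with different parameters is an error, not a new charge.
                raise ValueError("idempotency key reused with different parameters")
            return charge_id  # Replay: same charge, nothing new created
        charge_id = f"ch_fake_{next(self._ids):03d}"
        self._by_key[idempotency_key] = (dict(params), charge_id)
        return charge_id

    @property
    def charge_count(self) -> int:
        return len(self._by_key)


def idempotency_key(customer_id: str, amount_cents: int, billing_period: str) -> str:
    raw = f"{customer_id}:{amount_cents}:{billing_period}".encode()
    return hashlib.sha256(raw).hexdigest()[:32]


store = FakeChargeStore()
params = {"amount": 2900, "currency": "usd", "customer": "cus_Abc123"}
key = idempotency_key("cus_Abc123", 2900, "2026-06")
first = store.create(params, key)
retry = store.create(params, key)  # Simulated LLM retry of the same operation
```

Here first and retry refer to the same charge and store.charge_count stays at 1, which is the property the content-hash key is meant to buy: the number of tool calls no longer determines the number of charges.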
Failure mode 3: max_turns permits multiple billing iterations per run
Swarm's run() function continues processing tool calls and LLM responses until the LLM stops requesting tools or max_turns is reached. The default max_turns is effectively unbounded (set to float("inf") in the reference implementation). In a multi-step billing workflow — validate customer, check existing charges, create the charge, send a receipt, update CRM — the LLM may call charge_stripe at multiple points if intermediate steps return ambiguous results or if the agent instruction includes conditional retry logic:
from swarm import Swarm, Agent
import stripe

client = Swarm()

def check_existing_charge(context_variables, customer_id: str, billing_period: str) -> str:
    stripe.api_key = context_variables["stripe_key"]
    charges = stripe.Charge.list(customer=customer_id, limit=5)
    for ch in charges.data:
        if billing_period in (ch.description or ""):
            return f"Existing charge found: {ch.id} status={ch.status}"
    return "No existing charge found for this period."

def charge_stripe(context_variables, customer_id: str,
                  amount_cents: int, billing_period: str) -> str:
    stripe.api_key = context_variables["stripe_key"]
    charge = stripe.Charge.create(
        amount=amount_cents, currency="usd", customer=customer_id,
        description=f"Subscription {billing_period}",
        # No idempotency_key
    )
    return f"Charged: {charge.id}"

billing_agent = Agent(
    name="billing",
    instructions=(
        "Handle subscription billing. "
        "Always check for existing charges first. "
        "If the charge status is 'pending' or unclear, create a new charge."
    ),
    # "Pending or unclear" instructs the LLM to call charge_stripe again
    # if the prior call returned an indeterminate result. Combined with a
    # high max_turns, multiple Stripe calls can fire in one run().
    functions=[check_existing_charge, charge_stripe],
)

result = client.run(
    agent=billing_agent,
    messages=[{"role": "user", "content": "Make sure cus_Abc123 is billed $29 for June."}],
    context_variables={"stripe_key": "sk_live_xxx"},
    max_turns=10,  # 10 turns allows multiple charge_stripe calls
)
What breaks: The agent instruction "if the charge status is 'pending' or unclear, create a new charge" combined with a max_turns=10 budget can lead the LLM to call charge_stripe multiple times in one run(). A stripe.Charge with status pending is a normal in-flight charge — charging again produces a duplicate. Without a daily spend cap enforced at the proxy layer, nothing between the LLM and Stripe prevents charge_stripe from being called 3 or 4 times in a single run before the LLM exhausts its turn budget. Each call creates an independent Stripe charge with no deduplication.
The proxy spend-cap provides a backstop: set a daily USD cap per vault key that matches the maximum expected single-customer charge for the billing period. Even if the LLM calls charge_stripe ten times, the proxy rejects all calls after the cap is hit. The idempotency key guarantees that any legitimate retry of the same billing operation collapses to one Stripe charge regardless of how many tool calls the LLM makes:
import hashlib, os
import stripe
from swarm import Swarm, Agent

PROXY_URL = "https://proxy.keybrake.com"
BILLING_KEY = os.environ["KEYBRAKE_VAULT_KEY_BILLING"]  # daily_usd_cap=30 in Keybrake policy
AUDIT_KEY = os.environ["KEYBRAKE_VAULT_KEY_AUDIT"]

def make_billing_tools():
    # StripeClient takes base_addresses (not base_url); the SDK appends /v1/... itself
    addresses = {"api": PROXY_URL + "/stripe"}
    billing_client = stripe.StripeClient(api_key=BILLING_KEY, base_addresses=addresses)
    audit_client = stripe.StripeClient(api_key=AUDIT_KEY, base_addresses=addresses)

    def check_existing_charge(
        context_variables: dict, customer_id: str, billing_period: str
    ) -> str:
        try:
            charges = audit_client.charges.list(params={"customer": customer_id, "limit": 5})
            for ch in charges.data:
                if billing_period in (ch.description or ""):
                    return f"Existing charge: {ch.id} status={ch.status}"
            return "No existing charge for this period."
        except stripe.StripeError as e:
            return f"Lookup error: {e}"

    def charge_stripe(
        context_variables: dict,
        customer_id: str,
        amount_cents: int,
        billing_period: str,
    ) -> str:
        idempotency_key = hashlib.sha256(
            f"{customer_id}:{amount_cents}:{billing_period}".encode()
        ).hexdigest()[:32]
        try:
            charge = billing_client.charges.create(
                params={
                    "amount": int(amount_cents),
                    "currency": "usd",
                    "customer": customer_id,
                    "description": f"Subscription {billing_period}",
                },
                # A request option, not a body parameter: sent as the Idempotency-Key header
                options={"idempotency_key": idempotency_key},
            )
            return f"Charged: {charge.id} status={charge.status}"
        except stripe.StripeError as e:
            return f"Stripe error: {e}"

    return [check_existing_charge, charge_stripe]

client = Swarm()

billing_agent = Agent(
    name="billing",
    instructions=(
        "Handle subscription billing. "
        "Check for existing charges first. "
        "If a charge already exists for the billing period, report it and do not charge again."
    ),
    functions=make_billing_tools(),
)

result = client.run(
    agent=billing_agent,
    messages=[{"role": "user", "content": "Make sure cus_Abc123 is billed $29 for June."}],
    context_variables={},
    max_turns=10,
)
What this fixes: Three controls work together. First, check_existing_charge uses the audit vault key (GET /v1/charges only), so the LLM can look up prior charges without creating new ones, and the updated instruction to report rather than re-charge prevents the "pending = charge again" loop. Second, the content-hash idempotency key means that even if charge_stripe is called multiple times for the same (customer_id, amount_cents, billing_period) tuple, Stripe deduplicates them to one charge. Third, the billing vault key's daily spend cap (30 USD) is enforced at the proxy layer: after one successful $29 charge only $1 of headroom remains, so every subsequent $29 POST /v1/charges attempt is rejected with a 402 error that the LLM sees as a final tool result. No amount of max_turns can spend beyond the policy limit.
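The cap arithmetic is easy to check in isolation. The sketch below is a hypothetical in-memory ledger (DailyCapLedger and try_spend are illustrative names, not Keybrake's implementation or API), and it assumes the cap is evaluated as "spend so far plus this request must not exceed the cap". A real proxy would also need a daily reset and durable storage.

```python
from dataclasses import dataclass


@dataclass
class DailyCapLedger:
    """Toy per-key spend ledger; resetting at the day boundary is left out."""

    cap_cents: int
    spent_cents: int = 0

    def try_spend(self, amount_cents: int) -> bool:
        """Record the spend and return True only if it fits under the cap."""
        if amount_cents <= 0:
            raise ValueError("amount_cents must be positive")
        if self.spent_cents + amount_cents > self.cap_cents:
            return False  # Would exceed the cap: reject without recording
        self.spent_cents += amount_cents
        return True


ledger = DailyCapLedger(cap_cents=3000)  # Mirrors daily_usd_cap=30
results = [ledger.try_spend(2900) for _ in range(10)]  # Ten attempted $29 charges
```

Only the first of the ten attempts is accepted, and the ledger ends at 2900 cents spent. Setting the cap just above the largest legitimate charge is what makes the second attempt fail; a cap of 60 USD would let two $29 charges through.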
One-line proxy override
The Keybrake proxy is compatible with the Stripe Python SDK's StripeClient interface. Switching a Swarm tool function from direct Stripe calls to the proxy means building a StripeClient whose API base address points at the proxy (StripeClient takes base_addresses rather than a base_url argument) and calling its service methods with a params dict:
# Before: direct to Stripe with module-level api_key
import os
import stripe

stripe.api_key = os.environ["STRIPE_SECRET_KEY"]
charge = stripe.Charge.create(amount=2900, currency="usd", customer="cus_Abc123")

# After: routes through Keybrake proxy, enforces spend cap, writes audit log
from stripe import StripeClient

stripe_client = StripeClient(
    api_key=os.environ["KEYBRAKE_VAULT_KEY"],
    base_addresses={"api": "https://proxy.keybrake.com/stripe"},
)
charge = stripe_client.charges.create(
    params={"amount": 2900, "currency": "usd", "customer": "cus_Abc123"}
)
No changes to the Swarm agent definition, the client.run() call, or the function signatures are required. Inside the tool factory, only the Stripe client initialization and the call style (service method plus params dict) change.
Comparison: raw key vs restricted key vs vault key
| Property | Raw sk_live_ key | Restricted Stripe key | Vault key (Keybrake proxy) |
|---|---|---|---|
| Endpoint allowlist | No: full API access | Partial: Stripe-enforced resource set | Yes: per-key policy, any Stripe endpoint |
| Daily spend cap | No | No | Yes: proxy enforces USD cap per vault key |
| Per-agent isolation | No: all agents share one key | No: all agents share one restricted key | Yes: billing agent vs support agent get different vault keys |
| Handoff key leak | Leaks via context_variables | Leaks via context_variables | Never in context_variables; closed over in tool factory |
| Idempotency key guard | Only if you add it manually | Only if you add it manually | Only if you add it manually (idempotency at Stripe layer) |
| Audit log | Stripe Dashboard only | Stripe Dashboard only | Proxy audit table with vault key, agent name, timestamp, amount |
| Kill switch | Rotate secret in all agents | Revoke in Stripe Dashboard | One-click vault key revoke in Keybrake Dashboard |
pytest enforcement suite
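These five tests cover idempotency-key determinism, vault key isolation, retry safety, and spend-cap rejection. They run against mocked Stripe clients that simulate the proxy's responses, so no network access or live vault key is needed: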
import hashlib
from unittest.mock import MagicMock

import stripe

def _idempotency_key(customer_id: str, amount_cents: int, billing_period: str) -> str:
    return hashlib.sha256(
        f"{customer_id}:{amount_cents}:{billing_period}".encode()
    ).hexdigest()[:32]

def test_idempotency_key_is_deterministic():
    """Same billing params must produce the same idempotency key across all retries."""
    k1 = _idempotency_key("cus_abc", 2900, "2026-06")
    k2 = _idempotency_key("cus_abc", 2900, "2026-06")
    assert k1 == k2

def test_different_periods_produce_different_keys():
    """Different billing periods must not share an idempotency key."""
    k_june = _idempotency_key("cus_abc", 2900, "2026-06")
    k_july = _idempotency_key("cus_abc", 2900, "2026-07")
    assert k_june != k_july

def test_stripe_error_returned_not_raised():
    """Tool function must return StripeError as string, not re-raise it."""
    # Plain MagicMock: spec=StripeClient may not expose instance attributes such as .charges
    mock_client = MagicMock()
    mock_client.charges.create.side_effect = stripe.APIConnectionError("timeout")

    def charge_stripe(context_variables, customer_id, amount_cents, billing_period):
        idem_key = _idempotency_key(customer_id, amount_cents, billing_period)
        try:
            charge = mock_client.charges.create(
                params={
                    "amount": int(amount_cents), "currency": "usd",
                    "customer": customer_id,
                    "description": f"Subscription {billing_period}",
                },
                options={"idempotency_key": idem_key},
            )
            return f"Charged: {charge.id}"
        except stripe.StripeError as e:
            return f"Stripe error: {e}"  # Must return, not raise

    result = charge_stripe({}, "cus_abc", 2900, "2026-06")
    assert result.startswith("Stripe error:")
    # If this raised, the call would likely be retried, risking a duplicate charge

def test_billing_key_cannot_list_charges():
    """Billing vault key must only allow POST /v1/charges; read operations are rejected."""
    mock_billing_client = MagicMock()
    mock_billing_client.charges.list.side_effect = stripe.PermissionError(
        "403: vault key policy denies GET /v1/charges"
    )

    def check_existing_charge_with_billing_key(context_variables, customer_id, billing_period):
        try:
            charges = mock_billing_client.charges.list(params={"customer": customer_id, "limit": 5})
            return "found" if charges.data else "none"
        except stripe.StripeError as e:
            return f"error: {e}"

    result = check_existing_charge_with_billing_key({}, "cus_abc", "2026-06")
    assert "error" in result  # Billing key must not allow list; use the audit key

def test_spend_cap_rejection_stops_loop():
    """After spend cap is hit, proxy returns error that the tool function must surface cleanly."""
    call_count = 0
    mock_capped_client = MagicMock()

    def cap_side_effect(**kwargs):
        nonlocal call_count
        call_count += 1
        if call_count > 1:
            raise stripe.StripeError("402: daily spend cap exceeded for vault key")
        charge = MagicMock()
        charge.id = "ch_test_001"
        charge.status = "succeeded"
        return charge

    mock_capped_client.charges.create.side_effect = cap_side_effect

    def charge_stripe(context_variables, customer_id, amount_cents, billing_period):
        idem_key = _idempotency_key(customer_id, amount_cents, billing_period)
        try:
            charge = mock_capped_client.charges.create(
                params={
                    "amount": int(amount_cents), "currency": "usd",
                    "customer": customer_id,
                    "description": f"Subscription {billing_period}",
                },
                options={"idempotency_key": idem_key},
            )
            return f"Charged: {charge.id}"
        except stripe.StripeError as e:
            return f"Stripe error: {e}"

    r1 = charge_stripe({}, "cus_abc", 2900, "2026-06")
    r2 = charge_stripe({}, "cus_abc", 2900, "2026-06")  # Would be capped at proxy
    assert "ch_test_001" in r1
    assert "spend cap exceeded" in r2
Gap analysis
Five gaps remain after applying idempotency keys and vault key isolation in Swarm:
- Parallel tool calls in one LLM turn. The underlying Chat Completions API can return multiple tool call objects in a single response. Swarm's reference implementation processes them sequentially, but custom Swarm wrappers or modifications that process tool_calls in parallel can fire two charge_stripe calls simultaneously. The idempotency key collapses identical calls, but different amounts or different periods in the same LLM response produce separate charges that are both valid. Verify that your Swarm run loop processes billing tool calls sequentially.
- Agent instruction prompt injection via context_variables. If any context_variables value is derived from user input (customer name, billing note, description), a malicious input can inject instructions into the agent's context. Combined with a billing tool, this can cause the LLM to call charge_stripe with attacker-controlled parameters. Validate and sanitize all context_variables values that originate from user input before the client.run() call.
- Handoff function returning a string (non-Agent) as a fallback. Swarm handoff functions can return either an Agent object (transfers control) or a string (stays with current agent). A handoff function that returns a fallback string on error means the billing agent retains control instead of transferring to the support agent. If the billing agent's subsequent instruction includes retry logic, it may call charge_stripe again. Test handoff functions under error conditions to verify the correct agent receives control.
- Streaming mode and partial tool call results. Swarm supports a stream=True mode that yields chunks as the LLM generates them. If streaming is interrupted mid-tool-call (client disconnect, timeout), the partial tool call may not be recorded in the message history. A subsequent client.run() call to resume the session won't know whether the prior tool call completed. The content-hash idempotency key is the only protection against this; without it, the resumed run may fire a second Stripe charge.
- Swarm Result.context_variables update exposing keys. Swarm tool functions can return a Result object with updated context_variables. If a tool function returns a Result that includes a Stripe charge ID or other billing data in context_variables, that data propagates to all subsequent agents in the run. Ensure no tool function returns Stripe keys, tokens, or sensitive billing parameters via Result.context_variables.
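For the first gap, one possible in-process backstop is a guard that lets only one charge attempt through per customer and billing period, even when calls arrive from several threads. The sketch below is a hypothetical helper (ChargeOnceGuard is not part of Swarm); it assumes one guard instance is created per client.run() and consulted at the top of charge_stripe.

```python
import threading


class ChargeOnceGuard:
    """Allow one charge attempt per (customer_id, billing_period) per guard."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._claimed: set[tuple[str, str]] = set()

    def claim(self, customer_id: str, billing_period: str) -> bool:
        """Return True for the first caller of a pair, False for later ones."""
        pair = (customer_id, billing_period)
        with self._lock:  # Check-and-add must be atomic across threads
            if pair in self._claimed:
                return False
            self._claimed.add(pair)
            return True


guard = ChargeOnceGuard()
first_attempt = guard.claim("cus_Abc123", "2026-06")
second_attempt = guard.claim("cus_Abc123", "2026-06")  # Same pair, e.g. a different amount
```

Inside the tool, a False result would be returned to the LLM as a string such as "charge already attempted for this period" instead of calling Stripe. Because the pair ignores the amount, this also blocks the case the idempotency key cannot: two calls in one response for the same customer and period with different amounts. It only protects a single process, so the proxy cap remains the cross-process limit.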
FAQ
Does Swarm support concurrent agent runs?
The reference OpenAI Swarm implementation is single-threaded and processes one agent run at a time within a single client.run() call. However, if your application calls client.run() concurrently from multiple threads or async tasks (for different customers), each run operates independently with its own message history. Vault keys with per-key daily spend caps and per-key audit logs give you isolation between concurrent runs — a single bare Stripe key shared across concurrent client.run() calls cannot be isolated by customer or by billing period.
Is OpenAI Swarm production-ready?
OpenAI released Swarm as an educational and experimental framework, not as a production-ready library. For production multi-agent Stripe billing, consider using the governance patterns in this series with a production framework such as LangChain, CrewAI, or AutoGen, or with the OpenAI Assistants API directly. The failure modes documented here — idempotency key gaps, context_variables key leakage, retry cycle vulnerability — apply to any Chat Completions-based multi-agent system, not just Swarm.
What happens if the LLM calls charge_stripe with a different billing_period on retry?
The content-hash idempotency key is derived from (customer_id, amount_cents, billing_period). If the LLM passes a different billing_period value on retry — for example, "June 2026" on the first call and "2026-06" on the retry — the idempotency key changes and Stripe creates a new charge. Normalize the billing_period format before computing the key (e.g., always use YYYY-MM) and include validation in the tool function that rejects non-normalized period values.
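A minimal sketch of that normalization step follows. normalize_billing_period is a hypothetical helper, and the set of accepted input shapes (canonical YYYY-MM, plus English month names such as "June 2026" or "Jun 2026") is an assumption for illustration; anything else is rejected rather than guessed at. Month-name parsing relies on strptime, which depends on the process locale.

```python
import hashlib
import re
from datetime import datetime

_CANONICAL = re.compile(r"\d{4}-(0[1-9]|1[0-2])")


def normalize_billing_period(raw: str) -> str:
    """Return the period as YYYY-MM, or raise ValueError if it is unrecognized."""
    text = raw.strip()
    if _CANONICAL.fullmatch(text):
        return text
    for fmt in ("%B %Y", "%b %Y"):  # "June 2026", "Jun 2026"
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m")
        except ValueError:
            continue
    raise ValueError(f"unrecognized billing period: {raw!r}")


def idempotency_key(customer_id: str, amount_cents: int, billing_period: str) -> str:
    """Hash the normalized period so equivalent spellings share one key."""
    period = normalize_billing_period(billing_period)
    raw = f"{customer_id}:{amount_cents}:{period}".encode()
    return hashlib.sha256(raw).hexdigest()[:32]
```

With this in place, a first call with "June 2026" and a retry with "2026-06" hash to the same key, while an input like "2026-13" fails fast inside the tool, where the ValueError can be caught and returned to the LLM as an error string.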
How do I handle a Swarm agent that needs to charge multiple customers in one run?
If an agent charges multiple customers (e.g., a batch billing agent that iterates over a customer list), each charge must have a unique idempotency key. The content-hash approach handles this automatically — hashlib.sha256(f"{customer_id}:{amount_cents}:{billing_period}".encode()) produces a distinct key per customer. The daily spend cap on the billing vault key should be set to the expected total across all customers in the batch, not just one customer. For large batches, consider splitting across multiple vault keys with per-customer caps so a runaway loop on one customer can't exhaust the cap for others.
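The batch sizing described above can be worked out before the run starts. The sketch below uses a hypothetical plan_batch helper that derives one key per charge, sums the amounts to get the cap the batch needs, and reports any keys that collide (which would happen if the same customer and amount appear twice in the list).

```python
import hashlib
from collections import Counter


def idempotency_key(customer_id: str, amount_cents: int, billing_period: str) -> str:
    raw = f"{customer_id}:{amount_cents}:{billing_period}".encode()
    return hashlib.sha256(raw).hexdigest()[:32]


def plan_batch(charges: list[tuple[str, int]], billing_period: str) -> dict:
    """Return per-charge keys, the cap the batch needs, and any colliding keys."""
    keys = [idempotency_key(cid, cents, billing_period) for cid, cents in charges]
    duplicates = sorted(k for k, n in Counter(keys).items() if n > 1)
    return {
        "keys": keys,
        "required_cap_cents": sum(cents for _, cents in charges),
        "duplicate_keys": duplicates,
    }


plan = plan_batch([("cus_A", 2900), ("cus_B", 4900), ("cus_C", 2900)], "2026-06")
```

For this three-customer batch the required cap is 10700 cents (107 USD) and all three keys are distinct. A non-empty duplicate_keys list is worth treating as an input error, since those entries would collapse into a single charge at Stripe while still being counted in the cap.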
Can I use the module-level stripe.api_key pattern instead of StripeClient?
The module-level stripe.api_key = "vault_xxx" pattern works for single-agent, single-threaded runs. But it is process-global: if two Swarm runs execute in the same process (even sequentially), the last stripe.api_key assignment wins. For multi-agent or concurrent setups, use stripe.StripeClient(api_key=VAULT_KEY, base_addresses={"api": PROXY_URL}) to create an isolated client instance per tool factory; note that StripeClient takes base_addresses rather than a base_url argument. The StripeClient approach also keeps the proxy address override out of global state, whereas the module-level equivalent is the process-wide stripe.api_base setting.
How does the vault key's billing-period spend cap interact with Swarm's max_turns?
The vault key's daily USD cap and Swarm's max_turns are independent controls. max_turns limits the number of LLM-tool-call cycles in one client.run() invocation. The vault key's cap limits the total USD spent against Stripe in one rolling 24-hour window across all runs that use the same vault key. Both should be set: max_turns prevents runaway agentic loops at the Swarm layer; the spend cap prevents runaway billing at the proxy layer even if Swarm allows more turns than expected. The idempotency key prevents duplicate charges within the same billing operation regardless of how many turns occur.
Put the brakes on your Swarm agent's Stripe key
Keybrake issues vault keys your Swarm tool factories can close over — scoped to the exact Stripe endpoints each agent role needs, with daily USD caps that stop runaway billing before it starts. One proxy, per-agent audit logs, one-click revoke.