Agent Governance

CrewAI + Stripe: spend limits and a kill switch for your billing agent

CrewAI makes it trivial to wire a Stripe tool into a multi-agent crew. The framework handles tool routing, retries, and memory, but whatever Stripe API key the tool holds is used as-is: there's no cap on what the agent can spend, no way to revoke its access mid-run, and no per-call audit trail. This post covers that gap and shows a pattern that closes it.

The standard CrewAI Stripe tool

Most teams start with something like this — a BaseTool subclass that wraps the Stripe Python library:

import stripe
from crewai.tools import BaseTool
from pydantic import BaseModel, Field

class CreateChargeInput(BaseModel):
    amount: int = Field(..., description="Amount in cents")
    currency: str = Field(default="usd")
    customer_id: str = Field(..., description="Stripe customer ID")
    description: str = Field(default="")

class CreateChargeTool(BaseTool):
    name: str = "create_stripe_charge"
    description: str = "Create a Stripe charge for a customer"
    args_schema: type[BaseModel] = CreateChargeInput

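    # Defaults mirror CreateChargeInput, so omitted optional args don't break the call.
    def _run(self, amount: int, customer_id: str,
             currency: str = "usd", description: str = "") -> str:
        stripe.api_key = "sk_live_..."  # ← long-lived, unrestricted key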
        charge = stripe.Charge.create(
            amount=amount,
            currency=currency,
            customer=customer_id,
            description=description,
        )
        return f"Charge created: {charge.id}"

The tool is clean. The problem is the key embedded in it — or passed in from an environment variable with no constraints attached.

Three ways it breaks in production

Failure 1: the runaway loop. The billing crew is given a task to "charge all overdue accounts." A lookup bug causes it to iterate over the same account repeatedly. Each loop iteration calls CreateChargeTool. The loop runs for 40 minutes before someone notices. By then: 300 duplicate charges, $45,000 out the door, and a customer support queue that takes three days to clear.

Failure 2: the stuck agent mid-run. The agent is processing a legitimate batch when its context window fills and it starts hallucinating customer IDs. You want to stop it immediately. The only way to cut off its Stripe access is to roll the key in the Stripe dashboard, which also breaks every other service using the same key. The "kill switch" costs you a 20-minute emergency rotation across your entire stack.

Failure 3: no audit trail on the agent's actions. Something looks off in next month's revenue numbers. You need to know which charges were made autonomously versus manually. Stripe logs every charge, but nothing in those logs says "this was made by the billing-crew agent on run #47." Reconstructing the timeline takes hours of cross-referencing timestamps.

None of these are CrewAI bugs. They're architecture gaps — places where the framework's job ends and yours begins.

Step 1: start with a Stripe restricted key

The fastest partial fix is to use a Stripe restricted API key instead of your full secret key. A restricted key (prefixed rk_live_ in live mode) can only reach the Stripe API resources you enable for it, so you can scope it to exactly what the agent needs.

For a billing agent that only creates charges and reads customer data:

# Stripe restricted key permissions for a billing agent

Charges:        Write
Customers:      Read
Refunds:        None  ← agent can't issue refunds autonomously
PaymentIntents: Write
All else:       None

See the full Stripe restricted key permissions reference for the complete list of ~60 toggles and which combinations make sense for each agent archetype.
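A restricted key only helps if it is actually the key the crew receives. A cheap safeguard is to check the key prefix at startup and refuse to build the tool otherwise. This is a minimal sketch: Stripe's full secret keys start with sk_live_ or sk_test_ and restricted keys with rk_live_ or rk_test_, while the function names are hypothetical helpers, not part of stripe-python or CrewAI.

```python
# Stripe key prefixes: restricted keys are rk_live_/rk_test_,
# full secret keys are sk_live_/sk_test_.
RESTRICTED_PREFIXES = ("rk_live_", "rk_test_")
SECRET_PREFIXES = ("sk_live_", "sk_test_")


def classify_stripe_key(key: str) -> str:
    """Return "restricted", "secret", or "unknown" based on the key prefix."""
    if key.startswith(RESTRICTED_PREFIXES):
        return "restricted"
    if key.startswith(SECRET_PREFIXES):
        return "secret"
    return "unknown"


def require_restricted_key(key: str) -> str:
    """Fail fast at crew startup unless the agent was given a restricted key."""
    kind = classify_stripe_key(key)
    if kind != "restricted":
        raise ValueError(f"billing agent needs a restricted (rk_) key, got a {kind} key")
    return key
```

Call require_restricted_key on whatever key you load (from an environment variable, a secrets manager, and so on) before constructing the tool, so a misconfigured deployment fails before the agent makes a single call.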

A restricted key limits blast radius. It doesn't cap spend (the agent can still create as many charges as it likes), and it doesn't give you a per-run kill switch: revoking the key cuts off everything that uses it. For that, you need a proxy layer.

Step 2: route through a governance proxy

The cleanest pattern for CrewAI Stripe governance is a reverse proxy that sits between the agent and the Stripe API. The agent uses a short-lived vault key (not the real Stripe key). The proxy enforces policy, forwards the call, and logs every request.

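The proxy has three responsibilities: check each call against the policy attached to the vault key (spend cap, allowed endpoints, expiry) and reject anything that violates it; forward allowed calls to Stripe using the real credentials, which the agent never sees; and log every request to a per-run audit trail.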

From the agent's perspective, the only change is the base URL and the key:

import stripe

# Before: live Stripe key, api.stripe.com
stripe.api_key = "sk_live_..."

# After: vault key, proxy URL
stripe.api_key = "vault_key_abc123"
stripe.api_base = "https://proxy.keybrake.com/stripe"

The rest of the tool code is unchanged. CrewAI doesn't know a proxy is involved.
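On the wire, stripe-python sends the same request it always did, just to the new base: for a charge, a POST to https://proxy.keybrake.com/stripe/v1/charges with the vault key as a Bearer token in the Authorization header. A minimal sketch of the credential swap a proxy like this performs before forwarding, assuming the /stripe prefix is simply stripped and the real restricted key lives only in the proxy's configuration. The function and constant names are hypothetical, this is not Keybrake's implementation, and looking up which real key belongs to which vault key is omitted.

```python
from urllib.parse import urlsplit

UPSTREAM_BASE = "https://api.stripe.com"
PROXY_PREFIX = "/stripe"  # assumed mount point, matching stripe.api_base above


def rewrite_for_upstream(url: str, headers: dict[str, str],
                         real_key: str) -> tuple[str, dict[str, str]]:
    """Map an incoming proxy request onto the upstream Stripe request.

    The agent's vault key is dropped and replaced with the real key,
    so Stripe credentials never leave the proxy.
    """
    parts = urlsplit(url)
    if not parts.path.startswith(PROXY_PREFIX + "/"):
        raise ValueError(f"unexpected path: {parts.path}")
    upstream_url = UPSTREAM_BASE + parts.path[len(PROXY_PREFIX):]
    if parts.query:
        upstream_url += "?" + parts.query
    # Copy every header except Authorization (case-insensitive), then set the real key.
    out = {k: v for k, v in headers.items() if k.lower() != "authorization"}
    out["Authorization"] = f"Bearer {real_key}"
    return upstream_url, out
```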

Updated CrewAI tool with governance

Here's the full updated tool, parameterized so each crew run gets its own vault key:

import stripe
from crewai.tools import BaseTool
from pydantic import BaseModel, Field
import os

class CreateChargeInput(BaseModel):
    amount: int = Field(..., description="Amount in cents")
    currency: str = Field(default="usd")
    customer_id: str = Field(..., description="Stripe customer ID")
    description: str = Field(default="")

class GovernedChargeTool(BaseTool):
    name: str = "create_stripe_charge"
    description: str = "Create a Stripe charge for a customer"
    args_schema: type[BaseModel] = CreateChargeInput
    vault_key: str = Field(..., description="Per-run vault key from Keybrake")
    proxy_url: str = Field(
        default="https://proxy.keybrake.com/stripe"
    )

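    # Defaults mirror CreateChargeInput, so omitted optional args don't break the call.
    def _run(self, amount: int, customer_id: str,
             currency: str = "usd", description: str = "") -> str:
        # Point the Stripe client at the proxy and authenticate with the vault key.
        # stripe.api_key and stripe.api_base are process-wide settings, so every
        # Stripe call in this process now goes through the proxy with this key.
        stripe.api_key = self.vault_key
        stripe.api_base = self.proxy_url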

        charge = stripe.Charge.create(
            amount=amount,
            currency=currency,
            customer=customer_id,
            description=description,
        )
        return f"Charge created: {charge.id}"

And in the crew definition:

import os
from crewai import Crew, Agent, Task

# Provision a vault key before the crew starts
# (call Keybrake admin API or use a pre-generated key per run)
vault_key = os.environ["BILLING_VAULT_KEY"]

charge_tool = GovernedChargeTool(vault_key=vault_key)

billing_agent = Agent(
    role="Billing Specialist",
    goal="Process overdue account charges accurately",
    backstory="Expert at managing billing operations with strict accuracy requirements",
    tools=[charge_tool],
    verbose=True,
)

billing_task = Task(
    description="Review overdue accounts and charge each one the outstanding balance. "
                "Do not charge any account more than once. "
                "Stop immediately if any charge fails.",
    agent=billing_agent,
    expected_output="List of successful charge IDs and any failures",
)

crew = Crew(agents=[billing_agent], tasks=[billing_task])
result = crew.kickoff()

Policy configuration for a billing agent

When you provision the vault key, you attach a policy. A typical billing-agent policy:

{
  "vendor": "stripe",
  "daily_usd_cap": 50000,
  "allowed_endpoints": [
    "/v1/charges",
    "/v1/customers",
    "/v1/payment_intents"
  ],
  "expires_at": "2026-06-12T23:59:59Z"
}

What each field does:

Field             | Effect
daily_usd_cap     | The proxy rejects any call that would push the day's proxied spend past this limit. The agent's call fails with HTTP 402.
allowed_endpoints | Any call to an endpoint not on this list (e.g. /v1/refunds) gets a 403, so the agent can't issue refunds even if it tries.
expires_at        | After this time the vault key is invalid. A short expiry bounds how long a runaway run can do damage, even if nobody notices it.
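Taken together, the three fields amount to a pre-flight check the proxy runs before forwarding anything. A minimal sketch, assuming the policy JSON above, amounts in USD cents, and a running daily total supplied by the caller. The 402 and 403 codes follow the field descriptions; the function name, the sub-path matching rule, and returning 401 for an expired key are assumptions.

```python
from datetime import datetime, timezone


def check_policy(policy: dict, method: str, path: str, amount_cents: int,
                 spent_today_usd: float, now: datetime) -> tuple[int, str]:
    """Pre-flight check; returns (http_status, reason), where 200 means forward."""
    if now.tzinfo is None:
        now = now.replace(tzinfo=timezone.utc)  # treat naive times as UTC
    # "2026-06-12T23:59:59Z": swap the Z suffix for an offset fromisoformat accepts
    expires_at = datetime.fromisoformat(policy["expires_at"].replace("Z", "+00:00"))
    if now >= expires_at:
        return 401, "vault key expired"  # status code is an assumption
    # Allow listed endpoints and their sub-paths, e.g. /v1/customers/cus_123
    if not any(path == ep or path.startswith(ep + "/")
               for ep in policy["allowed_endpoints"]):
        return 403, f"endpoint {path} not allowed"
    # Only money-moving POSTs count toward the cap; amounts assumed to be USD cents
    if method == "POST" and amount_cents > 0:
        if spent_today_usd + amount_cents / 100 > policy["daily_usd_cap"]:
            return 402, "daily_usd_cap exceeded"
    return 200, "ok"
```

The expiry check runs first, so an expired key is rejected regardless of what it asks for; the spend check runs last because it is the only one that depends on the request body.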

You can revoke the vault key from the Keybrake dashboard at any time before it expires. This is the kill switch: the agent's next call is rejected, and the live Stripe key is never touched.
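The kill switch works because revocation is a property of the vault key inside the proxy, not of the Stripe credential behind it, which is exactly what Failure 2 was missing. A minimal in-memory sketch, assuming one record per vault key; a real proxy would keep this in a shared store, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class VaultKeyRegistry:
    keys: dict[str, str] = field(default_factory=dict)  # vault key -> policy id
    revoked: set[str] = field(default_factory=set)      # vault keys killed early

    def issue(self, vault_key: str, policy_id: str) -> None:
        self.keys[vault_key] = policy_id

    def revoke(self, vault_key: str) -> None:
        # Only this vault key stops working; the upstream Stripe key is untouched,
        # so other runs and services keep going.
        self.revoked.add(vault_key)

    def is_active(self, vault_key: str) -> bool:
        # Checked on every proxied request, so revocation applies to the next call.
        return vault_key in self.keys and vault_key not in self.revoked
```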

What the audit log gives you

Every proxied call gets a row in the audit log:

run_id        | billing-crew-2026-06-12-run-47
agent_id      | billing-specialist
timestamp_utc | 2026-06-12T14:23:01Z
vendor        | stripe
endpoint      | /v1/charges
method        | POST
http_status   | 200
vendor_req_id | req_abc123
cost_usd      | 249.00
policy_id     | pol_xyz

When something looks off in next month's numbers, you query by run_id and get every charge the agent made, in order, with timestamps. No cross-referencing Stripe logs against your task queue. The agent's footprint is a named, queryable entity.
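If the audit rows land in a SQL table, "query by run_id" is a single ordered SELECT. A sketch using an in-memory SQLite table whose columns mirror the row above; the table name audit_log and the second, run-48 row are made up for illustration. Ordering by timestamp_utc works as plain string ordering because every timestamp uses the same ISO 8601 UTC format.

```python
import sqlite3


def charges_for_run(conn: sqlite3.Connection, run_id: str) -> list[tuple]:
    """Every Stripe charge call a crew run made, oldest first."""
    return conn.execute(
        """SELECT timestamp_utc, agent_id, vendor_req_id, http_status, cost_usd
           FROM audit_log
           WHERE run_id = ? AND vendor = 'stripe' AND endpoint = '/v1/charges'
           ORDER BY timestamp_utc""",
        (run_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE audit_log (
    run_id TEXT, agent_id TEXT, timestamp_utc TEXT, vendor TEXT, endpoint TEXT,
    method TEXT, http_status INTEGER, vendor_req_id TEXT, cost_usd REAL,
    policy_id TEXT)""")
conn.executemany(
    "INSERT INTO audit_log VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("billing-crew-2026-06-12-run-47", "billing-specialist", "2026-06-12T14:23:01Z",
         "stripe", "/v1/charges", "POST", 200, "req_abc123", 249.00, "pol_xyz"),
        # Hypothetical row from a different run, to show the filter working
        ("billing-crew-2026-06-12-run-48", "billing-specialist", "2026-06-12T15:02:44Z",
         "stripe", "/v1/charges", "POST", 200, "req_def456", 99.00, "pol_xyz"),
    ],
)
```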

How this compares to other frameworks

The same pattern works for LangChain agents with Stripe tools and for agents built on the OpenAI Agents SDK. CrewAI's multi-agent architecture adds one wrinkle: several agents in a crew can share the same tool. In that case, provision one vault key per crew run (not per agent), so the key's spend cap is shared across every agent in the crew. With the $50,000 cap above, if Agent A charges $40,000 and Agent B then tries to charge $20,000, Agent B's call is blocked because it would bring the total to $60,000.

For a crew where agents have meaningfully different risk profiles — a read-heavy research agent and a write-heavy billing agent — provision separate vault keys with different policies. The proxy tracks spend per-key, not per-crew.
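Because spend is tracked per vault key, agents that share a key share a budget, and agents with separate keys don't. A sketch of that bookkeeping with the $40,000 / $20,000 example; the class name is hypothetical, and a real proxy would also reset totals daily and persist them.

```python
from collections import defaultdict


class SpendTracker:
    """Running USD spend per vault key; every agent holding a key draws on one total."""

    def __init__(self, caps_usd: dict[str, float]):
        self.caps_usd = caps_usd              # vault key -> cap from its policy
        self.spent_usd = defaultdict(float)   # vault key -> spend recorded so far

    def try_spend(self, vault_key: str, amount_usd: float) -> bool:
        """Record the spend and return True, or return False if it would exceed the cap."""
        if self.spent_usd[vault_key] + amount_usd > self.caps_usd[vault_key]:
            return False  # the proxy would answer HTTP 402; nothing is recorded
        self.spent_usd[vault_key] += amount_usd
        return True


tracker = SpendTracker({"billing-run-47": 50000, "research-run-47": 0})
tracker.try_spend("billing-run-47", 40000)  # Agent A: allowed
tracker.try_spend("billing-run-47", 20000)  # Agent B: blocked, 60,000 > 50,000
```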

FAQ

Can I use a Stripe restricted key AND a proxy at the same time?

Yes, and you should. The restricted key limits which Stripe resources the agent can touch; the proxy adds spend caps, per-run revocation, and the audit log. They're complementary layers. The proxy holds the restricted key and presents a vault key to the agent — the agent never sees the Stripe credentials directly.

What happens when the agent hits the spend cap?

The proxy returns HTTP 402 with a JSON body describing the policy violation. stripe-python maps a 402 to stripe.error.CardError when the body uses Stripe's {"error": {...}} envelope and raises a generic stripe.error.APIError otherwise; either way, the exception's http_status is 402. CrewAI catches the exception and hands the error back to the agent as the tool result. A well-prompted agent will stop and report the failure; a poorly prompted agent may retry, but the proxy will keep blocking until the cap resets or the key is rotated.
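To get the "well-prompted" behavior without relying on the prompt alone, the tool itself can turn a rejection into an unambiguous stop message instead of letting the exception bubble up. A sketch, assuming the exception carries an http_status attribute (stripe-python's error classes do); the helper name, the set of stop statuses, and the message wording are hypothetical. Note that Stripe also uses 402 for genuine card declines, which the example task says should stop the run anyway.

```python
from typing import Callable

# 402 and 403 are the proxy's policy rejections (402 also covers real card
# declines); treating 401 as "vault key expired or revoked" is an assumption.
STOP_STATUSES = {401, 402, 403}


def call_with_policy_guard(create_charge: Callable[[], str]) -> str:
    """Run a charge call; turn rejections into a terminal message for the agent."""
    try:
        return create_charge()
    except Exception as exc:
        # stripe-python errors expose the HTTP status as .http_status
        status = getattr(exc, "http_status", None)
        if status in STOP_STATUSES:
            return (f"STOP: charge rejected with HTTP {status}: {exc}. "
                    "Do not retry; report this failure.")
        raise  # anything else (e.g. network errors) propagates unchanged
```

Inside GovernedChargeTool._run, you would wrap the stripe.Charge.create call in a lambda and return the guard's result, so the agent sees a plain instruction to stop rather than a stack trace it might decide to work around.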

Does the proxy add latency?

Minimal — typically under 5ms for the proxy hop. For a billing agent making tens to hundreds of calls per run, this is negligible. Stripe's own p99 API latency is ~300ms; the proxy adds less than 2% overhead.

How do I provision vault keys programmatically?

Via the Keybrake admin API. You POST a key-creation request with a policy attached, and get a vault key back. In a CI/CD or orchestrator context, you generate the key at the start of each crew run and pass it in as an environment variable or constructor argument to the tool. The key expires with the run.
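The admin API's endpoint and request shape aren't documented in this post, so the sketch below only builds the request. The URL, the run_id/policy wrapper fields, and the function name are all hypothetical placeholders; the policy body reuses the example above. What it does show is the per-run pattern: derive expires_at from the run's start time plus a TTL, so the key can't outlive the run by much.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone
from typing import Optional

# Placeholder: substitute the real Keybrake admin endpoint; this URL is hypothetical.
ADMIN_URL = "https://keybrake-admin.example.invalid/vault-keys"


def build_vault_key_request(run_id: str, admin_token: str, ttl: timedelta,
                            now: Optional[datetime] = None) -> urllib.request.Request:
    """Build (but don't send) a request for a vault key scoped to one crew run."""
    now = now or datetime.now(timezone.utc)  # pass a timezone-aware datetime
    policy = {
        "vendor": "stripe",
        "daily_usd_cap": 50000,
        "allowed_endpoints": ["/v1/charges", "/v1/customers", "/v1/payment_intents"],
        # Expire a fixed TTL after the run starts, in the policy's UTC "Z" format
        "expires_at": (now + ttl).astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    body = json.dumps({"run_id": run_id, "policy": policy}).encode("utf-8")
    return urllib.request.Request(
        ADMIN_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {admin_token}",
                 "Content-Type": "application/json"},
    )
```

In an orchestrator you would send this with urllib.request.urlopen (or your HTTP client of choice) at the start of each run, pull the vault key out of whatever the admin API returns, and pass it to GovernedChargeTool(vault_key=...).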

What if the CrewAI agent retries a failed charge?

When a tool call fails, CrewAI passes the error back to the agent, which may decide to retry. If the failure was a policy violation (spend cap hit, endpoint blocked, key expired), retrying will keep failing, because the proxy state doesn't change between retries. Design your task description to include explicit stop conditions: "if any charge fails, report the failure and stop, do not retry." This is good agent hygiene regardless of the governance layer.

Can the proxy enforce customer allowlists (limit charges to specific customers)?

The policy supports an allowed_customer_ids list. The proxy inspects the POST body and rejects calls whose customer parameter isn't on the list. This is useful for a crew that should only ever touch a specific cohort: for example, a dunning agent that should only charge accounts explicitly in the overdue queue, not any arbitrary customer ID the agent constructs.
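Stripe's v1 API takes application/x-www-form-urlencoded request bodies, so this check comes down to parsing the form and comparing the customer field. A minimal sketch, assuming the allowlist is held as a set of IDs; the function name is hypothetical, and a real proxy also has to decide what to do with calls that omit customer entirely (this sketch rejects them, along with bodies that repeat the field).

```python
from urllib.parse import parse_qs


def customer_allowed(body: bytes, allowed_customer_ids: set[str]) -> bool:
    """True only if the form-encoded body names exactly one allowlisted customer."""
    form = parse_qs(body.decode("utf-8"))  # e.g. {"customer": ["cus_..."], ...}
    customers = form.get("customer", [])
    return len(customers) == 1 and customers[0] in allowed_customer_ids
```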

Add governance to your CrewAI Stripe tools

The proxy is live at proxy.keybrake.com. Drop in a vault key and two lines of config — the rest of your tool code stays identical.