Stripe Agent Toolkit + MCP: how to restrict what the agent can do

Stripe's Agent Toolkit ships an MCP server that hands your LLM a set of payment tools. The permissions flag controls which tools exist at the protocol layer — but not how much the agent spends, which customers it can touch, or how fast you can revoke. Here's what the toolkit itself does, where the cliff is, and a five-minute config change that closes the gap.

TL;DR

Stripe's Agent Toolkit exposes 14 default tools (create charge, refund, list customers, create product, create price, create invoice, etc.) through a Model Context Protocol server. You pick which tools to expose via a --tools flag. That is the toolkit's only access control, and it is tool-level, not dollar-level. All exposed tools run against one Stripe API key (Secret or Restricted) whose scopes the MCP server inherits exactly. There is no per-call dollar cap, no customer allowlist, and no mid-run revoke short of rotating the key. The clean fix is a two-line Claude Desktop config change that routes the toolkit through a governance proxy: no code, no fork.

What Stripe Agent Toolkit actually is

The Stripe Agent Toolkit (stripe/agent-toolkit, published as @stripe/agent-toolkit on npm and stripe-agent-toolkit on PyPI) is Stripe's first-party library for exposing payment operations to LLM agents. It has three integration modes:

  1. MCP server (npx -y @stripe/mcp --tools=...) — a standards-based MCP server that any MCP client (Claude Desktop, Cursor, Windsurf, agent frameworks like modelcontextprotocol) can connect to.
  2. LangChain / Vercel AI SDK / CrewAI tools — language-native tool objects that drop into each framework's tool-calling flow.
  3. Raw tool definitions — JSON schemas you hand to any LLM with function calling, which you wire up to the toolkit's handler yourself.

Across all three, the same design principle holds: each Stripe API method becomes a named tool with a JSON-schema input, and the toolkit executes that method using the Stripe API key you configured. The LLM doesn't see the key; the LLM sees the tool schema plus the result.
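To make that principle concrete, here is a minimal sketch of the pattern, not the toolkit's actual code. The schema contents, the make_handler helper, and the fake_stripe transport are all hypothetical names introduced for illustration. The point is only that the model is shown the schema, while the key lives inside the handler.

```python
from typing import Any, Callable

# Hypothetical tool definition: this is the only thing the model is shown.
CREATE_REFUND_SCHEMA = {
    "name": "create_refund",
    "description": "Refund a PaymentIntent, fully or partially.",
    "input_schema": {
        "type": "object",
        "properties": {
            "payment_intent": {"type": "string"},
            "amount": {"type": "integer", "description": "Amount in cents"},
        },
        "required": ["payment_intent"],
    },
}


def make_handler(api_key: str, call_stripe: Callable[[str, str, dict], dict]):
    """Bind the key into a closure; the model only ever supplies `args`."""

    def handle(args: dict) -> dict:
        required = CREATE_REFUND_SCHEMA["input_schema"]["required"]
        missing = [name for name in required if name not in args]
        if missing:
            return {"error": f"missing required fields: {missing}"}
        return call_stripe(api_key, "/v1/refunds", args)

    return handle


def fake_stripe(api_key: str, path: str, params: dict) -> dict:
    """Stand-in transport so the sketch runs without network access."""
    return {"object": "refund", "path": path, **params}


handler = make_handler("rk_test_placeholder", fake_stripe)
result = handler({"payment_intent": "pi_123", "amount": 500})
```

Neither the schema nor the result contains the key, which is the property the paragraph above describes.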

The 14 default tools — and the agent's scope inherits from the key

The default tool list (as of the 0.5.x version line) is roughly:

| Tool | Calls | Restricted Key scope needed | Blast radius if abused |
|---|---|---|---|
| create_payment_link | paymentLinks.create | Payment Links: Write | Low (link has to be clicked) |
| create_product | products.create | Products: Write | Low (catalog clutter) |
| list_products | products.list | Products: Read | None |
| create_price | prices.create | Prices: Write | Low (pricing clutter) |
| list_prices | prices.list | Prices: Read | None |
| create_customer | customers.create | Customers: Write | Medium (PII in Stripe) |
| list_customers | customers.list | Customers: Read | Medium (PII exfil) |
| create_invoice | invoices.create | Invoices: Write | High (dollar amounts, auto-collect) |
| create_invoice_item | invoiceItems.create | Invoices: Write | High |
| finalize_invoice | invoices.finalizeInvoice | Invoices: Write | High (triggers collection) |
| retrieve_balance | balance.retrieve | Balance: Read | None |
| create_refund | refunds.create | Refunds: Write | High (reversal loop) |
| list_payment_intents | paymentIntents.list | PaymentIntents: Read | Medium (PII) |
| create_charge | charges.create | Charges: Write | Critical (money out) |

The toolkit's --tools flag lets you expose a subset (e.g. --tools=list_customers,create_refund) for a support agent that should only issue refunds. That's a useful coarse-grained filter. It is not a substitute for scoping the key itself — if the key has Charges: Write, any code path that calls charges.create still works even if the tool isn't exposed, since the toolkit (or anything else running with that key) can make the call directly.

The 2025-11 MCP config block

A typical Claude Desktop config for running Stripe Agent Toolkit over MCP looks like this:

{
  "mcpServers": {
    "stripe": {
      "command": "npx",
      "args": [
        "-y",
        "@stripe/mcp",
        "--tools=create_refund,list_customers,retrieve_balance"
      ],
      "env": {
        "STRIPE_API_KEY": "sk_live_..."
      }
    }
  }
}

This gives the LLM three tools (refund, list customers, check balance) while the key it uses has the full scope of whatever sk_live_... is. Mis-scoping the key is the most common footgun: people paste in their secret key because it's the one they have in .env, and the agent now has Write access to everything Stripe offers, even though the exposed tools need only one write scope and two read scopes.
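One cheap guard against that footgun is to lint the config before the agent starts. The sketch below is an illustration, not part of the toolkit. The key_warnings function is a hypothetical helper, and it assumes only that live secret keys start with sk_live_ and that the config follows the mcpServers layout shown above.

```python
def key_warnings(config: dict) -> list:
    """Return one warning per MCP server configured with a live secret key."""
    warnings = []
    for name, server in config.get("mcpServers", {}).items():
        key = server.get("env", {}).get("STRIPE_API_KEY", "")
        if key.startswith("sk_live_"):
            warnings.append(
                f"{name}: live secret key in STRIPE_API_KEY; "
                "use a Restricted Key (rk_live_...) scoped to the exposed tools"
            )
    return warnings


# Same shape as the Claude Desktop config block above.
example_config = {
    "mcpServers": {
        "stripe": {
            "command": "npx",
            "args": ["-y", "@stripe/mcp", "--tools=create_refund,list_customers,retrieve_balance"],
            "env": {"STRIPE_API_KEY": "sk_live_..."},
        }
    }
}

warnings = key_warnings(example_config)
```

In practice you would load claude_desktop_config.json with json.load and fail the startup script if the returned list is non-empty.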

The three governance gaps the toolkit doesn't close

Even with the right Restricted Key and the tightest --tools subset, three things that matter for a production money-moving agent remain open:

Gap 1 — no per-day USD cap

If create_refund is exposed and the agent hallucinates a double-refund loop, each call succeeds until something external breaks the loop. Stripe's Restricted Key does not support "max $X per day" for refunds or charges. The toolkit doesn't add one. You'd have to build and enforce that cap yourself, outside the toolkit.
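For a sense of what such an external cap involves, here is a minimal in-memory sketch. It assumes a single process, amounts in cents, and a budget that resets at UTC midnight. The DailyUsdCap class is a hypothetical name, and a real deployment would need shared, durable storage so the count survives restarts and covers every agent instance.

```python
from datetime import datetime, timezone
from typing import Optional


class DailyUsdCap:
    """Tracks spend per UTC day and rejects calls that would exceed the cap."""

    def __init__(self, cap_cents: int):
        self.cap_cents = cap_cents
        self.day = None
        self.spent_cents = 0

    def try_spend(self, amount_cents: int, now: Optional[datetime] = None) -> bool:
        today = (now or datetime.now(timezone.utc)).date()
        if today != self.day:
            # First call of a new UTC day: reset the running total.
            self.day = today
            self.spent_cents = 0
        if self.spent_cents + amount_cents > self.cap_cents:
            return False  # caller should refuse to forward the request
        self.spent_cents += amount_cents
        return True


cap = DailyUsdCap(cap_cents=50_000)  # $500 per day
night = datetime(2025, 11, 3, 2, 0, tzinfo=timezone.utc)
decisions = [cap.try_spend(20_000, now=night) for _ in range(3)]
next_day = cap.try_spend(20_000, now=datetime(2025, 11, 4, 2, 0, tzinfo=timezone.utc))
```

The check has to run before the request reaches Stripe, which is why it cannot live inside the key's scopes.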

Gap 2 — no customer-level allowlist

A support agent should probably only refund customers whose tickets it's currently handling. The toolkit has no way to say "the agent may only touch customers in {cus_A, cus_B, cus_C}." You can't filter a Restricted Key to specific customer IDs outside of Stripe Connect. So if the agent hallucinates a customer ID and issues a refund, the call succeeds.
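An allowlist check of this kind has to sit outside Stripe. A minimal sketch follows, assuming the caller has already retrieved the PaymentIntent being refunded and that its customer field is either a customer ID string, an expanded object with an id, or null. The refund_allowed function is a hypothetical helper.

```python
def refund_allowed(payment_intent: dict, allowlist: set) -> bool:
    """Allow a refund only if the PaymentIntent belongs to an allowlisted customer."""
    customer = payment_intent.get("customer")
    if isinstance(customer, dict):
        # The customer field was expanded into a full object.
        customer = customer.get("id")
    return isinstance(customer, str) and customer in allowlist


allowlist = {"cus_A", "cus_B", "cus_C"}
ok = refund_allowed({"id": "pi_1", "customer": "cus_A"}, allowlist)
hallucinated = refund_allowed({"id": "pi_2", "customer": "cus_Z"}, allowlist)
guest = refund_allowed({"id": "pi_3", "customer": None}, allowlist)
```

Payments with no customer attached are rejected here by design; whether that is the right default depends on how your checkout creates PaymentIntents.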

Gap 3 — mid-run revoke is slow

If the agent goes wrong at 2am, you're in the Stripe Dashboard → Developers → API Keys → Delete flow. Measured in our own tests: median 45 seconds for the propagation to start rejecting calls, p95 of 3 minutes 12 seconds. If the loop is doing 1 call every 400ms, that's roughly 110 to 480 bonus calls that get through. Rotating also breaks any other legitimate consumer of that key.
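The estimate follows directly from dividing the revoke delay by the call interval. A small sketch of that arithmetic, using the measured figures quoted above and integer milliseconds to avoid floating-point rounding:

```python
def calls_during_revoke(revoke_delay_ms: int, call_interval_ms: int) -> int:
    """Number of whole calls a loop completes before the key stops working."""
    return revoke_delay_ms // call_interval_ms


median_calls = calls_during_revoke(45_000, 400)   # 45 s median      -> 112
p95_calls = calls_during_revoke(192_000, 400)     # 3 min 12 s p95   -> 480
```

Multiply either number by the average amount per call to estimate the exposure for your own agent.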

The five-minute fix

You don't need to fork the toolkit or build custom middleware. The toolkit's MCP server passes through the STRIPE_API_BASE environment variable to stripe-node's config, which rewrites the base URL for every outbound HTTP call. Swap the env var to point at a governance proxy, and the MCP server's tools now go through policy-enforcement on every call:

{
  "mcpServers": {
    "stripe": {
      "command": "npx",
      "args": [
        "-y",
        "@stripe/mcp",
        "--tools=create_refund,list_customers,retrieve_balance"
      ],
      "env": {
        "STRIPE_API_KEY": "vault_live_...",
        "STRIPE_API_BASE": "https://proxy.keybrake.com/stripe"
      }
    }
  }
}

Two changed lines, no toolkit fork, no code modification, no LLM retraining. The vault key is issued with a policy attached: { daily_usd_cap: 500, customer_allowlist: ["cus_A", "cus_B"], endpoint_allowlist: ["/v1/refunds", "/v1/customers"], max_refund_usd: 50, expires_at: "2026-05-01" }. The MCP server has no idea it's being proxied; Stripe's SDK simply sends the same requests to a different host. Revoke is a single HTTP call that completes in under a second against any proxy instance.
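To illustrate what evaluating such a policy on each call could look like, here is a hedged sketch covering three of the fields: endpoint_allowlist, max_refund_usd, and expires_at. It is not Keybrake's implementation. The check function, the exact-match endpoint rule, and the choice to treat the expiry date as exclusive are all assumptions made for the example.

```python
from datetime import date

POLICY = {
    "endpoint_allowlist": ["/v1/refunds", "/v1/customers"],
    "max_refund_usd": 50,
    "expires_at": "2026-05-01",
}


def check(policy: dict, path: str, params: dict, today: date) -> tuple:
    """Return (allowed, reason) for one outbound Stripe request."""
    if today >= date.fromisoformat(policy["expires_at"]):
        return False, "key expired"
    if path not in policy["endpoint_allowlist"]:
        return False, "endpoint not allowed"
    if path == "/v1/refunds":
        amount_cents = params.get("amount")
        if amount_cents is None:
            # A refund with no amount refunds the whole payment, so require one.
            return False, "refund amount required"
        if amount_cents > policy["max_refund_usd"] * 100:
            return False, "refund exceeds per-call maximum"
    return True, "ok"


today = date(2026, 4, 1)
small = check(POLICY, "/v1/refunds", {"payment_intent": "pi_1", "amount": 2_500}, today)
large = check(POLICY, "/v1/refunds", {"payment_intent": "pi_1", "amount": 9_900}, today)
charge = check(POLICY, "/v1/charges", {"amount": 2_500}, today)
```

A request that fails any check would be answered by the proxy directly instead of being forwarded to Stripe.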

What this protects against that the toolkit alone doesn't

| Control | Toolkit alone | Toolkit + Restricted Key | Toolkit + RK + proxy |
|---|---|---|---|
| LLM sees the key | No | No | No |
| Scope-limit by tool | Partial (--tools flag) | Yes (key scopes) | Yes + endpoint allowlist |
| Per-day USD cap | No | No | Yes |
| Customer allowlist | No | No (non-Connect) | Yes |
| Max-amount-per-call | No | No | Yes |
| Mid-run revoke | Rotate key only | Rotate key only | Sub-second policy change |
| Per-call audit | Whatever you log | Stripe Events API | Full request/response/cost/policy-result |

An opinionated Claude Desktop setup for a support agent

A concrete example: an agent that handles refund-request tickets, may only touch customers whose tickets are currently assigned, and may refund at most $50 per request with a $500/day ceiling:

  1. Mint a Stripe Restricted Key with: Refunds: Write, Customers: Read, PaymentIntents: Read, Balance: Read, everything else None. Save as rk_live_....
  2. Create a Keybrake vault key pointing at that RK. Attach policy: { endpoint_allowlist: ["/v1/refunds", "/v1/customers/*", "/v1/payment_intents/*", "/v1/balance"], max_refund_usd: 50, daily_usd_cap: 500, customer_allowlist: <sync'd from ticket system>, expires_at: end-of-shift }. Save as vault_live_....
  3. Update claude_desktop_config.json with the MCP block above, using the vault key and the proxy base URL. Restrict --tools to create_refund,list_customers,retrieve_balance.
  4. Restart Claude Desktop. Every tool-call the agent makes now hits the proxy, checks the policy, and either forwards to Stripe or 403s. Audit log accumulates in Keybrake.
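The endpoint_allowlist in step 2 uses wildcard patterns. One plausible way to evaluate them is shell-style glob matching, sketched below with Python's fnmatch module. Whether the proxy uses these exact matching rules is an assumption, and endpoint_allowed is a hypothetical helper.

```python
from fnmatch import fnmatchcase

ENDPOINT_ALLOWLIST = [
    "/v1/refunds",
    "/v1/customers/*",
    "/v1/payment_intents/*",
    "/v1/balance",
]


def endpoint_allowed(path: str, allowlist=ENDPOINT_ALLOWLIST) -> bool:
    """True if the request path matches any pattern in the allowlist."""
    return any(fnmatchcase(path, pattern) for pattern in allowlist)


refund = endpoint_allowed("/v1/refunds")                  # True
one_customer = endpoint_allowed("/v1/customers/cus_A")    # True
charges = endpoint_allowed("/v1/charges")                 # False
customer_list = endpoint_allowed("/v1/customers")         # False
```

Under this interpretation the bare /v1/customers list endpoint does not match /v1/customers/* and would need its own entry; the proxy's real matching rules may differ, so check its documentation before relying on a pattern.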

Blast radius of a stuck loop is now capped at $500 and bounded to customers the agent was allowed to touch. Revoke is policy-change-in-a-dashboard. The toolkit code never changed.

How Keybrake fits

Keybrake is the proxy in that config — the thing STRIPE_API_BASE points at. We issue vault keys with policies, enforce them on every call, parse the real cost from Stripe's response, log everything tied to your agent-run ID. Works with Stripe Agent Toolkit over MCP (as shown), over LangChain, over Vercel AI SDK, or raw — the governance happens at the network boundary, so every toolkit integration mode gets it for free. v1 covers Stripe, Twilio, Resend.

Related questions

Does the --tools flag prevent the agent from using tools I didn't list?

At the MCP layer, yes — tools not listed aren't advertised to the LLM, so it doesn't know to call them. But the underlying Stripe API key still has whatever scope you gave it. If the LLM writes raw curl commands to shell tools, or a different MCP server uses the same key, everything the key is scoped for still works. The --tools flag is a discoverability filter, not a security boundary; the key's own Restricted scope is the real security boundary.

Can I use a Stripe Restricted Key with the toolkit?

Yes — rk_live_... or rk_test_... works in the STRIPE_API_KEY env var. The toolkit doesn't care whether it's an sk_ secret key or an rk_ Restricted Key; stripe-node treats them identically on the wire. See our working example for scope choices.

Does the toolkit handle idempotency keys automatically?

Partially. The toolkit passes an Idempotency-Key on create-style calls when it can construct one deterministically from the tool arguments. In practice, if the LLM retries a tool call with slightly different arguments (temperature-induced variation), the idempotency key changes too and Stripe treats it as a new operation. For strong dedupe, set your own idempotency strategy — typically "<agent_run_id>:<tool_name>:<stable_hash_of_args>" — upstream of the toolkit.
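A minimal sketch of that key format, assuming the arguments are JSON-serializable. The idempotency_key function is a hypothetical helper; the hash is taken over a canonical JSON encoding so that key order and whitespace do not change the result.

```python
import hashlib
import json


def idempotency_key(agent_run_id: str, tool_name: str, args: dict) -> str:
    """Build '<agent_run_id>:<tool_name>:<stable_hash_of_args>'."""
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:32]
    return f"{agent_run_id}:{tool_name}:{digest}"


first = idempotency_key("run_42", "create_refund", {"payment_intent": "pi_1", "amount": 500})
retry = idempotency_key("run_42", "create_refund", {"amount": 500, "payment_intent": "pi_1"})
changed = idempotency_key("run_42", "create_refund", {"payment_intent": "pi_1", "amount": 501})
```

Here first and retry are identical while changed differs, so a retry with reordered arguments dedupes but a retry with a different amount does not. To dedupe across LLM-induced variation, hash only the fields that define the operation (for example the payment_intent) rather than the full argument set.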

What about Stripe Connect? Does the toolkit support it?

Yes — pass stripeAccount in the toolkit's configuration and the MCP server will add the Stripe-Account header on every call. This is how most Connect platforms use the toolkit: each agent runs scoped to one connected account, inheriting that account's Restricted Key permissions. You still want a proxy in front for per-day caps and revoke, but you get customer-allowlist for free because the connected-account scope already limits the universe of customers the call can touch.

Is this safe to run against a production Stripe account?

Not on the default config, no. The minimum floor for production is: Restricted Key (not Secret Key), narrow --tools list, idempotency strategy, monitoring on Stripe events, and a documented revoke procedure. With those in place it's about as safe as any manually-used Restricted Key. With a proxy layer added, the blast radius of mistakes is explicitly bounded by the policy you set — which is what most teams actually want before letting an LLM anywhere near create_charge.

Further reading