Cohere Command R Stripe Integration: Restricted API Keys, Spend Caps, and Agent Governance

By Keybrake · June 18, 2026 · 9 min read

Cohere's Command R and Command R+ models make it easy to build billing-capable agents with tool calling: define a charge_stripe tool, pass it to co.chat(), and run a while response.tool_calls: loop until the model stops invoking tools. Three failure modes emerge in production. First, Command R+ can return multiple tool_calls entries in a single response, so the loop fires two Stripe charges back to back before any tool result reaches the model. Second, retries layered around a multi-step loop (the Cohere SDK's RequestOptions(max_retries=N) plus an application-level retry wrapper) can re-run the loop after a network error; the rerun has no memory of the charge that already completed, and the model calls charge_stripe again. Third, because Cohere's chat API is stateless, sessions reconstructed from a stored chat_history can replay completed billing operations when an ambiguous follow-up prompt references prior context.

The standard Cohere + Stripe setup

A typical Cohere billing agent looks like this with the v1 Python SDK:

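import os

import cohere
import stripe

co = cohere.Client(api_key=os.environ["COHERE_API_KEY"])
stripe.api_key = os.environ["STRIPE_KEY"]  # ← bare key, shared by all calls

TOOLS = [{
    "name": "charge_stripe",
    "description": "Charge a customer for their monthly subscription.",
    "parameter_definitions": {
        "customer_id": {"description": "Stripe customer ID", "type": "str", "required": True},
        "amount_cents": {"description": "Amount to charge in cents", "type": "int", "required": True},
        "billing_period": {"description": "Billing period identifier e.g. '2026-Q2'", "type": "str", "required": True}
    }
}]

def execute_tool(name: str, params: dict) -> dict:
    # Naive dispatcher: no idempotency key, so every call creates a new charge,
    # and any Stripe exception propagates out of the agent loop.
    if name == "charge_stripe":
        charge = stripe.Charge.create(
            amount=params["amount_cents"],
            currency="usd",
            customer=params["customer_id"],
        )
        return {"status": charge.status, "charge_id": charge.id}
    return {"status": "error", "message": f"unknown tool: {name}"}
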
def run_billing_agent(message: str, chat_history: list) -> str:
    response = co.chat(
        model="command-r-plus-08-2024",
        message=message,
        tools=TOOLS,
        chat_history=chat_history,
    )

    while response.tool_calls:
        tool_results = []
        for tc in response.tool_calls:  # ← iterates ALL tool calls in one response
            result = execute_tool(tc.name, tc.parameters)
            tool_results.append({
                "call": tc,
                "outputs": [{"result": result}]
            })

        response = co.chat(
            model="command-r-plus-08-2024",
            message="",
            chat_history=response.chat_history,
            tools=TOOLS,
            tool_results=tool_results,
        )

    return response.text

Clean and readable, and it behaves correctly as long as every response carries at most one tool call and nothing is retried. The problems surface when the model returns more than one tool call, when an exception escapes the loop mid-iteration, or when the session is reconstructed from a stored history.

Failure mode 1: parallel `tool_calls` emit two charges in one response

Command R+ supports parallel tool calling. When a billing task has scope ambiguity — "process all Q2 outstanding invoices," "charge both the starter and pro tier customers," "handle this batch of five accounts" — the model may return two charge_stripe entries in a single response.tool_calls list instead of one at a time.

What goes wrong: the for tc in response.tool_calls: loop executes both charges back to back, and neither result reaches the model until the loop finishes. The first stripe.Charge.create() completes. The second raises a transient APIConnectionError (Stripe accepted the request, but the network response never arrived). The exception escapes the loop, so the caller has no record that the first charge completed. The outer retry runs the full loop again and both charges run a second time. Both customers are billed twice.

The model has no way to know this happened. The retry starts over from the original user message, so the model sees no tool results at all and generates the same two tool calls again. Don't count on downstream duplicate detection (card-issuer declines or fraud rules) to catch it: without an idempotency key, Stripe treats every create request as a new charge.

Here's a minimal reproduction with Command R+:

response = co.chat(
    model="command-r-plus-08-2024",
    message="Process all Q2 outstanding invoices for accounts A100 and A101",
    tools=TOOLS,
)

# response.tool_calls may be:
# [
#   ToolCall(name='charge_stripe', parameters={'customer_id': 'cus_A100', 'amount_cents': 4900, 'billing_period': '2026-Q2'}),
#   ToolCall(name='charge_stripe', parameters={'customer_id': 'cus_A101', 'amount_cents': 4900, 'billing_period': '2026-Q2'}),
# ]
#
# Both fire in the for-loop. If the second raises APIConnectionError,
# the retry re-runs all tool calls — including the one that already charged cus_A100.

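An idempotency key that stays stable across every retry is the fix that holds no matter where the retry comes from. If stripe.Charge.create() receives the same idempotency key as a prior completed request, Stripe returns the saved result of that request (the original charge object) rather than creating a new one. Two limits apply: reusing a key with different parameters returns an error, and Stripe may prune keys once they are at least 24 hours old, after which the same key creates a new charge.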
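The replay behavior can be pinned down with a toy, in-memory model. It is not Stripe's implementation, and the names FakeChargeAPI and IdempotencyMismatch are hypothetical; it encodes only the documented rules that a repeated key returns the saved result and that reusing a key with different parameters is rejected. Stripe's pruning of keys older than 24 hours is deliberately left out.

```python
from __future__ import annotations

import itertools


class IdempotencyMismatch(Exception):
    """Raised when a key is reused with different request parameters."""


class FakeChargeAPI:
    """Toy model of idempotent charge creation (not Stripe's code).

    Not modelled: Stripe may prune keys that are at least 24 hours old.
    """

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.charges: list[dict] = []  # every charge actually created
        self._by_key: dict[str, tuple[dict, dict]] = {}  # key -> (params, result)

    def create(self, params: dict, idempotency_key: str | None = None) -> dict:
        if idempotency_key is not None and idempotency_key in self._by_key:
            stored_params, stored_result = self._by_key[idempotency_key]
            if stored_params != params:
                # Same key, different request: rejected instead of replayed.
                raise IdempotencyMismatch(idempotency_key)
            return stored_result  # replay: saved result, no new charge
        charge = {"id": f"ch_{next(self._ids)}", **params}
        self.charges.append(charge)
        if idempotency_key is not None:
            self._by_key[idempotency_key] = (dict(params), charge)
        return charge


api = FakeChargeAPI()
params = {"customer": "cus_A100", "amount": 4900, "currency": "usd"}
first = api.create(params, idempotency_key="key-1")
retry = api.create(params, idempotency_key="key-1")
print(first["id"], retry["id"], len(api.charges))  # ch_1 ch_1 1
```

Because a retry resends identical parameters, a key derived from those parameters always lands in the replay branch; a key that changes per attempt never does.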

Failure mode 2: `RequestOptions(max_retries=N)` compounds the multi-step loop

The Cohere Python SDK exposes request_options on every API call for configuring timeout, retries, and headers. It is common to add retry logic at the SDK level for resilience:

from cohere.core import RequestOptions

response = co.chat(
    model="command-r-plus-08-2024",
    message=message,
    tools=TOOLS,
    chat_history=chat_history,
    request_options=RequestOptions(max_retries=3, timeout_in_seconds=30),
)

What goes wrong: the loop calls co.chat() once to get tool calls and again after every batch of tool results. An SDK-level retry re-sends a single HTTP request to Cohere; it never re-runs your tool code, so on its own it cannot execute charge_stripe twice. Retrying the first call is harmless because no tool has run yet, and retrying a tool-result submission re-sends the same history and results, so the model sees the confirmed charge and continues correctly. The trouble starts when a tool-result submission still fails after Stripe has charged the customer and the SDK's retries are exhausted: the exception escapes the loop, and the only record of the charge is a local tool_results list that is about to be discarded. If the application wraps the whole loop in a retry decorator, the next attempt re-runs it from the user message and calls charge_stripe again, with no idempotency key and no memory of the prior charge. Each layer of retries multiplies the attempts of the layer beneath it.

There are two failure layers here. SDK-level retries are mostly safe on their own; the dangerous layer is application-level retry on the whole loop:

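import httpx
from cohere.core.api_error import ApiError
from tenacity import retry, retry_if_exception_type, stop_after_attempt

MODEL = "command-r-plus-08-2024"
# tenacity stands in for any application-level retry decorator.
@retry(stop=stop_after_attempt(3),
       retry=retry_if_exception_type((httpx.TimeoutException, ApiError)))
def run_billing_agent(message, chat_history):
    # If any co.chat() call raises, the decorator re-runs this entire function.
    # Stripe was already charged in the first attempt. Second attempt charges again.
    response = co.chat(model=MODEL, message=message, tools=TOOLS, chat_history=chat_history)
    while response.tool_calls:
        results = [{"call": tc, "outputs": [execute_tool(tc.name, tc.parameters)]}
                   for tc in response.tool_calls]
        response = co.chat(model=MODEL, message="", chat_history=response.chat_history,
                           tools=TOOLS, tool_results=results)
    return response.text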

Every framework covered in this series has this same outer-retry problem. The fix is always the same: an idempotency key derived from the billing operation's content, not from the run attempt. A content-hash key derived from (customer_id, amount_cents, billing_period) is identical whether it's the first attempt or the third, so Stripe collapses all retries into a single charge.
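A short simulation makes the difference between attempt-scoped and content-scoped keys visible. Everything in it is hypothetical: content_key mirrors the payload format used later in this article, run_with_outer_retry stands in for a retry decorator, and a dict plays the role of Stripe's idempotency store. The first attempt records a charge but "loses" the response, so the wrapper retries once.

```python
from __future__ import annotations

import hashlib
import uuid
from typing import Callable


def content_key(customer_id: str, amount_cents: int, billing_period: str) -> str:
    payload = f"{customer_id}:{amount_cents}:{billing_period}:cohere-billing"
    return hashlib.sha256(payload.encode()).hexdigest()[:40]


def run_with_outer_retry(key_fn: Callable[[int], str], attempts: int = 3) -> list[str]:
    """Simulate an application-level retry around one billing operation.

    key_fn(attempt) returns the idempotency key used on that attempt.
    Returns the IDs of the charges the fake ledger actually created.
    """
    ledger: dict[str, str] = {}  # idempotency key -> charge id
    created: list[str] = []

    def create_charge(idem_key: str) -> str:
        if idem_key in ledger:  # Stripe-style replay of the saved result
            return ledger[idem_key]
        charge_id = f"ch_{len(created) + 1}"
        ledger[idem_key] = charge_id
        created.append(charge_id)
        return charge_id

    for attempt in range(attempts):
        create_charge(key_fn(attempt))
        if attempt > 0:
            break  # the retry succeeds
        # attempt 0: the charge was recorded but the response was "lost",
        # so the outer retry fires

    return created


per_attempt = run_with_outer_retry(lambda _: str(uuid.uuid4()))
content_hash = run_with_outer_retry(lambda _: content_key("cus_A100", 4900, "2026-Q2"))
print(len(per_attempt), len(content_hash))  # 2 1
```

With a fresh uuid4 per attempt the ledger ends up with two charges; with the content-hash key the retry replays the first one.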

Failure mode 3: `chat_history` accumulation replays billing on resumed sessions

Cohere's chat API is stateless: there is no server-side session. The caller reconstructs context on every call by passing chat_history, a list of prior USER, CHATBOT, SYSTEM, and TOOL turns (tool calls ride on CHATBOT turns; tool results come back as TOOL turns). A typical customer-service billing agent stores this history in a database and reloads it when the customer opens a new support conversation.

What goes wrong: the stored chat_history contains the prior tool call and its result: a CHATBOT turn whose tool_calls include charge_stripe({customer_id: ..., amount_cents: ..., billing_period: "2026-Q2"}), followed by a TOOL turn carrying {status: "succeeded", charge_id: "ch_..."}. This history is fed into the next session. The customer sends a follow-up such as "can you also handle June?" or "please retry that." The model sees a completed billing tool call in its context window and has no way to tell that "retry" is ambiguous, so it calls charge_stripe again with the same or updated arguments. Nothing deduplicates the call, because it belongs to a new conversation turn rather than being a retry of the same SDK call. Stripe creates a new charge.

Here is the chat history structure that creates the replay risk:

# Chat history stored in DB after a successful billing session
stored_history = [
    {"role": "USER",    "message": "Process Q1 invoice for cus_A100"},
    {"role": "CHATBOT", "message": "", "tool_calls": [
        {"name": "charge_stripe", "parameters": {"customer_id": "cus_A100", "amount_cents": 4900, "billing_period": "2026-Q1"}}
    ]},
    {"role": "TOOL",    "tool_results": [{"call": ..., "outputs": [{"status": "succeeded", "charge_id": "ch_xyz"}]}]},
    {"role": "CHATBOT", "message": "Successfully charged $49.00 for Q1. Let me know if you need anything else."},
]

# New session, customer follows up
response = co.chat(
    model="command-r-plus-08-2024",
    message="Now process Q2 as well",  # Legitimate new request
    chat_history=stored_history,        # ← Prior billing is visible to the model
    tools=TOOLS,
)
# Model sees: Q1 charge already done via charge_stripe.
# "Q2 as well" → calls charge_stripe again for 2026-Q2. Correct.
# But: "same as last time" → may call charge_stripe with 2026-Q1 again. Duplicate.
# "retry if it failed" → same as above. Duplicate charge if Q1 succeeded.

The fix has two parts. First, a check_existing_charge tool (using a read-only audit vault key) gives the model a way to look up prior charge status before creating a new one. Second, a content-hash idempotency key collapses any duplicate charge_stripe calls with the same (customer_id, amount_cents, billing_period) tuple into a single Stripe charge, regardless of how many conversation turns produced the call. The key only protects while Stripe still remembers it (keys may be pruned after 24 hours), so for a session resumed days later the check_existing_charge lookup is the guard that matters.
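Here is a minimal sketch of the lookup behind a check_existing_charge tool. It takes the charge-listing call as a parameter so it can run without network access, and it assumes that charge_stripe stores the billing period in the charge's metadata (metadata={"billing_period": ...}), which the code in this article does not yet do. Treating pending charges as existing and failed ones as absent is a design assumption.

```python
from __future__ import annotations

from typing import Callable, Iterable, Mapping, Optional

# list_charges(customer_id) -> iterable of charge-like mappings
ListCharges = Callable[[str], Iterable[Mapping]]


def check_existing_charge(
    list_charges: ListCharges,
    customer_id: str,
    billing_period: str,
    amount_cents: Optional[int] = None,
) -> dict:
    """Report whether a live charge already covers this customer and period.

    Assumes charges carry metadata={"billing_period": ...} from creation time.
    """
    for charge in list_charges(customer_id):
        metadata = charge.get("metadata") or {}
        if metadata.get("billing_period") != billing_period:
            continue
        if amount_cents is not None and charge.get("amount") != amount_cents:
            continue
        # Failed charges count as absent, so a genuine retry is still possible.
        if charge.get("status") in ("succeeded", "pending"):
            return {"exists": True, "charge_id": charge.get("id"),
                    "status": charge.get("status")}
    return {"exists": False}
```

In production, list_charges could wrap the audit client, for example lambda cid: audit_client.charges.list(params={"customer": cid, "limit": 100}).auto_paging_iter(), where audit_client is a StripeClient built from the read-only audit vault key. Registering the function as a Cohere tool follows the same parameter_definitions pattern as charge_stripe.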

The two-layer fix

The pattern that closes all three failure modes combines a content-hash idempotency key with a per-run vault key from a spend-cap proxy. Neither layer alone is sufficient.

Layer 1: content-hash idempotency key

Derive the idempotency key from the billing operation's content, not from the request attempt. The same (customer_id, amount_cents, billing_period) tuple always produces the same key — so parallel tool calls, application-level retries, and chat-history replays all collapse to a single Stripe charge:

import hashlib

def make_idempotency_key(customer_id: str, amount_cents: int, billing_period: str) -> str:
    payload = f"{customer_id}:{amount_cents}:{billing_period}:cohere-billing"
    return hashlib.sha256(payload.encode()).hexdigest()[:40]

def charge_stripe_tool(customer_id: str, amount_cents: int, billing_period: str) -> dict:
    idem_key = make_idempotency_key(customer_id, amount_cents, billing_period)
    try:
        charge = stripe.Charge.create(
            amount=amount_cents,
            currency="usd",
            customer=customer_id,
            idempotency_key=idem_key,
        )
        return {"status": "succeeded", "charge_id": charge.id}
    except stripe.error.StripeError as e:
        # Return the error as a tool result instead of re-raising.
        # Re-raising lets the exception escape the Cohere loop,
        # which application-level retry wrappers treat as retriable.
        return {"status": "error", "message": str(e)}

Layer 2: per-run vault keys via Keybrake proxy

A restricted Stripe key limits which resources the agent can touch, but not how much it can charge. A vault key from the proxy adds a daily USD cap per key, so a runaway billing loop for one customer cannot exhaust the day's budget for all customers, and a leaked key cannot charge more than its cap:

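import os

import cohere
import stripe

co = cohere.Client(api_key=os.environ["COHERE_API_KEY"])
# TOOLS and make_idempotency_key are the definitions shown earlier in this article.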

def make_billing_tool(vault_key: str):
    """Returns a charge_stripe callable bound to one vault key."""
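    stripe_client = stripe.StripeClient(
        api_key=vault_key,
        # StripeClient takes base_addresses, not base_url. No trailing slash:
        # the client appends paths such as /v1/charges to this base.
        base_addresses={"api": "https://proxy.keybrake.com/stripe"},
    )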

    def charge_stripe_tool(customer_id: str, amount_cents: int, billing_period: str) -> dict:
        idem_key = make_idempotency_key(customer_id, amount_cents, billing_period)
        try:
            charge = stripe_client.charges.create(params={
                "amount": amount_cents,
                "currency": "usd",
                "customer": customer_id,
            }, options={"idempotency_key": idem_key})
            return {"status": "succeeded", "charge_id": charge.id}
        except stripe.StripeError as e:
            return {"status": "error", "message": str(e)}

    return charge_stripe_tool

def run_billing_agent(message: str, chat_history: list) -> str:
    vault_key = get_vault_key("billing")  # per-run key from Keybrake (your own helper, not shown)
    charge_fn = make_billing_tool(vault_key)

    response = co.chat(
        model="command-r-plus-08-2024",
        message=message,
        tools=TOOLS,
        chat_history=chat_history,
    )

    while response.tool_calls:
        tool_results = []
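        for tc in response.tool_calls:
            if tc.name == "charge_stripe":
                result = charge_fn(**tc.parameters)
            else:
                # Answer every tool call so `result` is never unbound or stale.
                result = {"status": "error", "message": f"unknown tool: {tc.name}"}
            tool_results.append({"call": tc, "outputs": [result]})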

        response = co.chat(
            model="command-r-plus-08-2024",
            message="",
            chat_history=response.chat_history,
            tools=TOOLS,
            tool_results=tool_results,
        )

    return response.text

The one-line proxy override is the StripeClient construction that points the client's API base address at the proxy under https://proxy.keybrake.com/stripe/ instead of api.stripe.com. The proxy enforces the endpoint allowlist (billing vault key: POST /v1/charges only) and the daily USD cap (billing vault key cap = expected max single-run charge). An audit vault key (GET /v1/charges only, no cap) powers the check_existing_charge lookup tool that guards against chat_history replay.
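The proxy's internals are not shown in this article. As a way to reason about what a capped key rejects, here is a hypothetical in-memory model of the two checks described above: an exact method+path allowlist and a per-key daily cap. VaultKeyPolicy, the 403 for a disallowed endpoint, counting spend when a request is admitted, and resetting the cap per UTC date are all assumptions, not Keybrake's documented behavior; only the 429 on cap exhaustion comes from this article.

```python
from __future__ import annotations

import datetime as dt
from dataclasses import dataclass, field


@dataclass
class VaultKeyPolicy:
    """Hypothetical model of one vault key: allowlist plus daily cap."""
    allowed: set[tuple[str, str]]       # exact (method, path) pairs
    daily_cap_cents: int | None         # None = uncapped (e.g. an audit key)
    spent: dict[dt.date, int] = field(default_factory=dict)

    def check(self, method: str, path: str, amount_cents: int = 0,
              today: dt.date | None = None) -> int:
        """Return an HTTP-like status: 200 admitted, 403 or 429 rejected."""
        if (method, path) not in self.allowed:
            return 403  # endpoint not on this key's allowlist
        day = today or dt.datetime.now(dt.timezone.utc).date()
        used = self.spent.get(day, 0)
        if self.daily_cap_cents is not None and used + amount_cents > self.daily_cap_cents:
            return 429  # "Daily cap exceeded"
        # Assumption: spend is counted when the request is admitted.
        self.spent[day] = used + amount_cents
        return 200


billing = VaultKeyPolicy(allowed={("POST", "/v1/charges")}, daily_cap_cents=10_000)
audit = VaultKeyPolicy(allowed={("GET", "/v1/charges")}, daily_cap_cents=None)
```

With a $100.00 cap, two $49.00 charges are admitted and a third is rejected with 429, while the audit key can read charges but can never create one.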

Comparison: raw key vs restricted key vs vault key

Property	Raw key (`sk_live_`)	Restricted key (`rk_live_`)	Vault key (proxy) + content-hash idempotency key
Endpoint allowlist	All Stripe endpoints	Selected resource types	Exact method+path (`POST /v1/charges`)
Daily USD cap	None	None	Per-key cap enforced at proxy
Per-run isolation	Module-level global — all calls share	Same global problem	New key per `co.chat()` loop run
Parallel tool call guard	No dedup — two charges fire	No dedup	Idempotency key collapses duplicates
SDK/app retry guard	No guard — re-fires charge	No guard	Content-hash idem key across all retries
Chat history replay guard	No guard	No guard	Audit vault key powers pre-charge lookup; idem key collapses replays
Audit log	Stripe dashboard only	Stripe dashboard only	Per-request structured log at proxy (customer, agent run ID, key, amount, timestamp)

Pytest enforcement suite

import pytest
import stripe
from unittest.mock import MagicMock, patch

# "billing_agent" is a placeholder for the module that holds the agent code.
from billing_agent import (
    charge_stripe_tool,
    get_vault_key,
    make_billing_tool,
    make_idempotency_key,
)

@pytest.fixture
def mock_create():
    # Replace StripeClient so make_billing_tool binds to a mock client.
    with patch("stripe.StripeClient") as mock_client_cls:
        yield mock_client_cls.return_value.charges.create

def test_idempotency_key_is_deterministic():
    k1 = make_idempotency_key("cus_A100", 4900, "2026-Q2")
    k2 = make_idempotency_key("cus_A100", 4900, "2026-Q2")
    assert k1 == k2

def test_different_periods_produce_different_keys():
    k1 = make_idempotency_key("cus_A100", 4900, "2026-Q2")
    k2 = make_idempotency_key("cus_A100", 4900, "2026-Q3")
    assert k1 != k2

def test_stripe_error_returned_not_raised(mock_create):
    mock_create.side_effect = stripe.APIConnectionError("timeout")
    charge_fn = make_billing_tool("vk_test")
    result = charge_fn("cus_A100", 4900, "2026-Q2")
    assert result["status"] == "error"
    assert "timeout" in result["message"]
    # No exception propagated, so nothing triggers an application-level retry

def test_parallel_tool_calls_share_idempotency_key():
    seen_keys = []
    def fake_create(**kwargs):
        # stripe.Charge.create is called with keyword arguments only
        seen_keys.append(kwargs["idempotency_key"])
        return MagicMock(id="ch_test")

    # Simulate two parallel tool_calls for the same billing operation
    with patch("stripe.Charge.create", side_effect=fake_create):
        charge_stripe_tool("cus_A100", 4900, "2026-Q2")
        charge_stripe_tool("cus_A100", 4900, "2026-Q2")

    # Same idempotency key on both calls, so Stripe creates one charge
    assert len(seen_keys) == 2 and seen_keys[0] == seen_keys[1]

def test_per_run_vault_keys_are_distinct():
    # Integration test: get_vault_key calls the Keybrake proxy.
    key_a = get_vault_key("billing")
    key_b = get_vault_key("billing")
    assert key_a != key_b  # Each run issues a fresh vault key from the proxy

Gap analysis

1. Cohere v2 API (`ClientV2`) messages format

The v2 API (cohere.ClientV2) takes an OpenAI-style messages list instead of message plus chat_history. The chat-history replay risk is identical, because stored messages are reconstructed on session resume, and the parallel tool-call and retry failure modes apply unchanged. Apply the same idempotency key and vault key patterns; what changes is the call signature (co.chat(model=..., messages=[...], tools=[...]) vs co.chat(message=..., chat_history=[...])) and the tool-definition format, which moves from parameter_definitions to a JSON Schema parameters object.
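One concrete porting step is the tool-definition shape: v1 uses parameter_definitions, while v2 uses a JSON Schema parameters object inside a {"type": "function", ...} entry. The helper below is a sketch (v1_tool_to_v2 is a hypothetical name) that converts definitions like the TOOLS list in this article; its type map covers only common v1 type names and raises on anything else rather than guessing.

```python
def v1_tool_to_v2(tool: dict) -> dict:
    """Convert a Cohere v1 tool definition to the v2 function-tool shape."""
    type_map = {"str": "string", "int": "integer", "float": "number", "bool": "boolean"}
    properties: dict = {}
    required: list = []
    for name, spec in tool.get("parameter_definitions", {}).items():
        properties[name] = {
            "type": type_map[spec["type"]],  # KeyError on unmapped v1 types
            "description": spec.get("description", ""),
        }
        if spec.get("required"):
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }
```

The converted list, [v1_tool_to_v2(t) for t in TOOLS], is what goes into tools= on the v2 chat call; the charge function, idempotency key, and vault key stay exactly as they are.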

2. Command A and future model releases

Cohere's Command A model (2025) has a 256k context window and is optimized for agentic tasks. A larger context window increases the chat-history replay risk: more prior billing operations fit in context, and the model has more historical evidence to draw on when deciding whether to re-execute a tool. Content-hash idempotency keys are context-window-agnostic — the key is derived from the operation's content, not the conversation position.

3. Cohere Embed + Rerank in billing pipelines

Some billing agents use Cohere Embed to retrieve relevant invoice records from a vector store before calling charge_stripe. If the retrieval step returns the same invoice record twice (near-duplicate embeddings, re-indexed documents), the agent may call charge_stripe twice for the same invoice in a single run. A content-hash idempotency key derived from the invoice record's (customer_id, amount_cents, billing_period) fields deduplicates this at the Stripe layer regardless of how many retrieval results the model consumed.
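The Stripe-side key is the backstop. A cheap complement, sketched here as an assumption rather than part of the original pattern, is to collapse retrieved invoice records by the same (customer_id, amount_cents, billing_period) tuple before they reach the model, so it never sees two copies of one invoice. The record field names and dedupe_invoices are hypothetical.

```python
def dedupe_invoices(records: list[dict]) -> list[dict]:
    """Keep the first record for each (customer_id, amount_cents, billing_period)."""
    seen = set()
    unique = []
    for rec in records:
        # int() normalizes amounts that were indexed as strings
        key = (rec["customer_id"], int(rec["amount_cents"]), rec["billing_period"])
        if key in seen:
            continue  # near-duplicate embedding or re-indexed copy
        seen.add(key)
        unique.append(rec)
    return unique
```

Deduplicating before the prompt reduces how often the model proposes a duplicate; the idempotency key still catches whatever slips through.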

4. Structured generation and tool-call schema mismatch

Command R+ uses a trained tool-calling format. When the model's output does not match the declared parameter_definitions schema (wrong type for amount_cents, missing billing_period), the call can fail with a cohere.BadRequestError, or the SDK can hand back a ToolCall whose parameters do not match the schema. Application code that catches this and retries the full co.chat() loop restarts the model from the user message: if the model had already called charge_stripe successfully before hitting the schema error on a later tool call, the retry re-executes that charge. Validate tool-call parameters before passing them to Stripe, and never retry the full loop on schema errors once any Stripe call has completed.
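Validating before the Stripe call can be as small as the sketch below. It assumes the three charge_stripe parameters from TOOLS, the "2026-Q2" period style from the tool description, and a hypothetical per-charge ceiling (MAX_CHARGE_CENTS); on any problem it returns an error message instead of raising, so a schema slip never escapes the loop and triggers a full retry.

```python
from __future__ import annotations

import re

MAX_CHARGE_CENTS = 100_000                # hypothetical per-charge ceiling
PERIOD_RE = re.compile(r"\d{4}-Q[1-4]")   # assumes the '2026-Q2' style from TOOLS


def validate_charge_params(params: dict) -> tuple[dict | None, str | None]:
    """Return (clean_params, None) on success or (None, error) on failure."""
    customer_id = params.get("customer_id")
    if not isinstance(customer_id, str) or not customer_id.startswith("cus_"):
        return None, "customer_id must be a Stripe customer ID (cus_...)"

    amount = params.get("amount_cents")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(amount, bool) or not isinstance(amount, (int, float, str)):
        return None, "amount_cents must be an integer"
    try:
        amount_f = float(amount)  # models sometimes emit "4900" or 4900.0
    except ValueError:
        return None, "amount_cents must be an integer"
    if not amount_f.is_integer() or not 0 < amount_f <= MAX_CHARGE_CENTS:
        return None, f"amount_cents must be a whole number in 1..{MAX_CHARGE_CENTS}"

    period = params.get("billing_period")
    if not isinstance(period, str) or not PERIOD_RE.fullmatch(period):
        return None, "billing_period must look like '2026-Q2'"

    return {"customer_id": customer_id, "amount_cents": int(amount_f),
            "billing_period": period}, None
```

In the agent loop it slots in before the charge: clean, err = validate_charge_params(tc.parameters), then result = charge_fn(**clean) if clean else {"status": "error", "message": err}.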

FAQ

Does Cohere's `force_single_step=True` prevent parallel tool calls?

No. As Cohere documents it, force_single_step=True (v1 API) limits the agent to a single step of tool use: the model emits its tool calls in one response, receives the results, and then answers without calling tools again. That single step can still contain several parallel tool calls, so it does not close failure mode 1; it only limits how many rounds of tool calls the loop can produce. It also does nothing for SDK/app-level retry (failure mode 2) or chat_history replay (failure mode 3). If your billing logic must be strictly sequential, enforce one charge per response in your own tool dispatcher, and apply idempotency keys regardless.

Can I use Stripe's built-in idempotency without a content-hash key?

You can pass any string as the idempotency key. A UUID generated once per loop run works if the retry logic always uses the same UUID. The problem is that UUIDs are regenerated on application restart, container redeploy, or after an uncaught exception clears the local variable. A content-hash key derived from (customer_id, amount_cents, billing_period) survives all of these events because it is recomputed from data, not stored state.

How do I handle two legitimate charges for the same customer in the same billing period?

Add a disambiguator to the key: (customer_id, amount_cents, billing_period, charge_type) where charge_type is "subscription", "overage", "setup-fee", etc. This keeps the key stable across retries while allowing multiple distinct charges per period.
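A sketch of that disambiguator, extending the make_idempotency_key helper from Layer 1. The default charge_type value and the choice to keep the old payload for subscriptions are assumptions: keeping the original payload means keys issued before the change still match, so a retry that straddles the deploy still replays instead of charging again.

```python
import hashlib


def make_idempotency_key(customer_id: str, amount_cents: int, billing_period: str,
                         charge_type: str = "subscription") -> str:
    """Content-hash idempotency key with a charge_type disambiguator."""
    if charge_type == "subscription":
        # Same payload as the original helper, so existing keys are unchanged.
        payload = f"{customer_id}:{amount_cents}:{billing_period}:cohere-billing"
    else:
        payload = f"{customer_id}:{amount_cents}:{billing_period}:{charge_type}:cohere-billing"
    return hashlib.sha256(payload.encode()).hexdigest()[:40]
```

A subscription charge and an overage for the same customer, amount, and period now get different keys, while each one stays stable across retries.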

What happens if the vault key daily cap is exhausted mid-batch?

The proxy returns 429 Daily cap exceeded. stripe-python raises a StripeError subclass for that response (RateLimitError when the body is a Stripe-style error object), so charge_stripe_tool catches it and returns {"status": "error", "message": ...} carrying the proxy's message. The model receives this as a tool result and can either stop the loop or report the cap to the caller. The key distinction from an uncapped key: the cap applies to a single vault key. Other customers' billing runs use their own vault keys with their own caps, so one runaway batch does not exhaust the shared Stripe account.

Does this pattern work with Cohere's multi-agent `connectors` API?

Cohere's connectors (server-side data retrieval integrations) do not expose tool-calling in the same way. For custom tool use with the Cohere connectors API, the same pattern applies: wrap the Stripe call in a connector handler that computes a content-hash idempotency key before calling stripe.Charge.create(). Per-run vault keys require that the connector handler receive the vault key per request, not at initialization time.

Should I use `request_options=RequestOptions(max_retries=0)` to disable SDK retries?

You can, but that is not where the risk lives. SDK retries re-send one HTTP request to Cohere and never re-run your tool code. On the initial call the model hasn't called any tools yet, so retries are safe. On the tool-result submission call (the second co.chat() in the loop), a retry re-sends the same tool results; disabling retries there does not prevent a duplicate charge, and it makes it more likely that a transient error escapes the loop after Stripe has already charged. The layer to protect is the application-level retry wrapper around the whole loop, which must never re-run charge_stripe without an idempotency key. Whether or not SDK retries are enabled, a content-hash idempotency key in the tool function is the correct guard.

Scoped keys for every billing call

Keybrake issues per-run vault keys with endpoint allowlists and daily USD caps; combined with content-hash idempotency keys, parallel tool calls, retry loops, and session replays all collapse to a single Stripe charge. Drop-in proxy endpoint, one line of code to switch.