Agent Governance

Mistral Agents API Stripe Integration: Restricted API Keys, Spend Caps, and Agent Governance

By Keybrake · June 18, 2026 · 9 min read

Mistral's Agents API makes it straightforward to build a billing-capable agent with function calling — define a charge_stripe tool, attach it to an agent at agents.create() time, and call agents.complete() for each billing request. Three specific failure modes emerge at scale: Mistral large models emit parallel tool_calls in a single completion, potentially firing two Stripe charges before any result is registered; the Stripe key baked into agent-level tool definitions is shared across all concurrent agents.complete() calls with no per-run isolation; and production retry wrappers on agents.complete() retrigger billing after transient HTTP errors, firing a second charge on a customer who was already billed.

This post covers all three failure modes specific to the Mistral Agents API and the two-layer governance pattern that closes each one: a restricted Stripe API key as a first layer, and per-run vault keys via a spend-cap proxy as a second.

The standard Mistral Agents API Stripe pattern

The standard pattern for a Stripe-capable Mistral agent creates a persistent agent with a charge_stripe function tool and calls agents.complete() per billing request. The tool function is defined at the module level, imports the Stripe API key from the environment, and creates the charge:

import os
import json
import stripe
from mistralai import Mistral

stripe.api_key = os.environ["STRIPE_SECRET_KEY"]  # sk_live_...
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Tool schema registered with the agent
charge_stripe_schema = {
    "type": "function",
    "function": {
        "name": "charge_stripe",
        "description": "Create a Stripe charge for a customer subscription.",
        "parameters": {
            "type": "object",
            "properties": {
                "customer_id":    {"type": "string",  "description": "Stripe customer ID (cus_...)"},
                "amount_cents":   {"type": "integer", "description": "Amount to charge in cents"},
                "billing_period": {"type": "string",  "description": "Billing period, e.g. '2026-06'}
            },
            "required": ["customer_id", "amount_cents", "billing_period"]
        }
    }
}

# Agent created once at startup — tool definition stored at agent level
agent = client.beta.agents.create(
    model="mistral-large-latest",
    description="Subscription billing agent",
    instructions="Handle subscription billing. Use charge_stripe to create Stripe charges.",
    tools=[charge_stripe_schema],
)


def run_billing(customer_id: str, amount_cents: int, billing_period: str) -> str:
    response = client.beta.agents.complete(
        agent_id=agent.id,
        messages=[{
            "role": "user",
            "content": f"Charge customer {customer_id} ${amount_cents // 100} for {billing_period}."
        }]
    )

    # Iterate tool calls and execute them
    for choice in response.choices:
        if choice.message.tool_calls:
            for tool_call in choice.message.tool_calls:
                if tool_call.function.name == "charge_stripe":
                    args = json.loads(tool_call.function.arguments)
                    charge = stripe.Charge.create(
                        amount=args["amount_cents"],
                        currency="usd",
                        customer=args["customer_id"],
                        description=f"Subscription {args['billing_period']}",
                        # No idempotency_key
                    )
                    return f"Charged: {charge.id} status={charge.status}"

    return "No charge created"

This works correctly in the common case. Three failure modes surface at production scale.

Failure mode 1: Parallel tool calls emit two `charge_stripe` invocations in one completion

Mistral large models support parallel function calling — the ability to return multiple tool_calls entries in a single assistant message. This is useful when independent tools can run concurrently (e.g., get_customer_info + check_existing_charges), but it becomes dangerous when a billing workflow has any ambiguity about scope. If the user message mentions two customers, two billing periods, or the model interprets "process all outstanding invoices" as requiring two simultaneous charges, it emits both as tool_calls in one response. The standard iteration loop executes both before any result is registered:

import os, json
import stripe
from mistralai import Mistral

stripe.api_key = os.environ["STRIPE_SECRET_KEY"]
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# ⚠ Agent prompt is ambiguous about scope — "process outstanding invoices"
response = client.beta.agents.complete(
    agent_id=agent.id,
    messages=[{
        "role": "user",
        "content": (
            "Process all outstanding invoices for Q2: "
            "cus_Abc123 owes $29 for June, cus_Def456 owes $29 for June."
        )
    }]
)

# Mistral-large may return:
# choice.message.tool_calls = [
#   ToolCall(function=FunctionCall(name="charge_stripe",
#            arguments='{"customer_id":"cus_Abc123","amount_cents":2900,"billing_period":"2026-06"}')),
#   ToolCall(function=FunctionCall(name="charge_stripe",
#            arguments='{"customer_id":"cus_Def456","amount_cents":2900,"billing_period":"2026-06"}'))
# ]

for choice in response.choices:
    if choice.message.tool_calls:
        for tool_call in choice.message.tool_calls:
            if tool_call.function.name == "charge_stripe":
                args = json.loads(tool_call.function.arguments)
                # Both charges fire here before any result goes back to the model.
                # If the second stripe.Charge.create() raises a transient error,
                # the first charge already completed — but the caller has no record of it.
                charge = stripe.Charge.create(
                    amount=args["amount_cents"],
                    currency="usd",
                    customer=args["customer_id"],
                    description=f"Subscription {args['billing_period']}",
                    # No idempotency_key — each call is treated as a new charge
                )
                print(f"Charged {args['customer_id']}: {charge.id}")

What breaks: Mistral-large emits two simultaneous charge_stripe tool calls in one completion. Both fire immediately in the iteration loop — cus_Abc123 and cus_Def456 are each billed. If the second stripe.Charge.create() raises a transient network error, the partial-success state is invisible to the model: no tool result was ever sent back, so the model has no record that the first charge completed. On the retry turn (or when the caller's outer retry fires), the model calls charge_stripe again for both customers — the first customer gets billed twice. Neither Stripe nor the model can distinguish the retry from the original without an idempotency key.

The fix is to compute a content-hash idempotency key from (customer_id, amount_cents, billing_period) before each Stripe call, so all retries of the same billing operation — regardless of which turn or which parallel slot they come from — resolve to the same Stripe charge. Wrap the Stripe call in a try/except that returns the error as a string so the model sees a final result rather than an exception that triggers another retry:

import hashlib, os, json
import stripe
from mistralai import Mistral

PROXY_URL   = "https://proxy.keybrake.com"
BILLING_KEY = os.environ["KEYBRAKE_VAULT_KEY_BILLING"]  # POST /v1/charges only

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

billing_client = stripe.StripeClient(
    api_key=BILLING_KEY,
    base_url=PROXY_URL + "/stripe/",
)


def execute_tool_call(tool_call) -> str:
    if tool_call.function.name != "charge_stripe":
        return f"Unknown tool: {tool_call.function.name}"

    args = json.loads(tool_call.function.arguments)
    customer_id    = args["customer_id"]
    amount_cents   = int(args["amount_cents"])
    billing_period = args["billing_period"]

    idempotency_key = hashlib.sha256(
        f"{customer_id}:{amount_cents}:{billing_period}".encode()
    ).hexdigest()[:32]

    try:
        charge = billing_client.charges.create(params={
            "amount":          amount_cents,
            "currency":        "usd",
            "customer":        customer_id,
            "description":     f"Subscription {billing_period}",
            "idempotency_key": idempotency_key,
        })
        return f"Charged: {charge.id} status={charge.status} idem={idempotency_key}"
    except stripe.StripeError as e:
        # Return error as string — model sees a final result, not an exception
        return f"Stripe error (not retried): {e} idem={idempotency_key}"


response = client.beta.agents.complete(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "Process outstanding invoices for Q2..."}]
)

for choice in response.choices:
    if choice.message.tool_calls:
        for tool_call in choice.message.tool_calls:
            result = execute_tool_call(tool_call)
            print(result)

What this fixes: The idempotency key is derived from (customer_id, amount_cents, billing_period) — deterministic across all calls for the same billing operation. Whether Mistral emits the same charge_stripe call in two parallel slots, across two retry turns, or across two separate agents.complete() invocations, Stripe deduplicates all requests with the same key and returns the original charge object. Returning StripeError as a string prevents the model from entering a retry loop on its next turn. The vault key routes all calls through the spend-cap proxy, writing one deduplicated audit log entry per unique (customer_id, amount_cents, billing_period) triple.

Failure mode 2: Agent-level key definition shares one Stripe key across all concurrent runs

Mistral agents are created once at startup via agents.create() and reused across many agents.complete() calls. The standard pattern defines the Stripe client at module load time — either as a module-level global or baked into the tool function via the enclosing scope — and the same client is used by every concurrent billing run. In a multi-tenant billing scenario where dozens of customer invoices are processed simultaneously, every concurrent agents.complete() call shares the same Stripe key with the same daily spend cap:

import os, json
import stripe
from mistralai import Mistral
from concurrent.futures import ThreadPoolExecutor

# Module-level Stripe client — shared by ALL concurrent agents.complete() calls
stripe.api_key = os.environ["STRIPE_SECRET_KEY"]  # sk_live_...
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

agent = client.beta.agents.create(
    model="mistral-large-latest",
    instructions="Handle subscription billing.",
    tools=[charge_stripe_schema],
)


def bill_customer(customer: dict) -> str:
    response = client.beta.agents.complete(
        agent_id=agent.id,  # Same agent_id for every customer
        messages=[{
            "role": "user",
            "content": f"Charge {customer['id']} ${customer['amount_usd']} for {customer['period']}."
        }]
    )
    # Tool call execution uses module-level stripe.api_key
    for choice in response.choices:
        if choice.message.tool_calls:
            for tc in choice.message.tool_calls:
                args = json.loads(tc.function.arguments)
                # Every thread uses the same sk_live_ key — no per-customer isolation
                charge = stripe.Charge.create(**args)
                return f"{customer['id']}: {charge.id}"
    return f"{customer['id']}: no charge"


customers = [
    {"id": "cus_Abc123", "amount_usd": 29, "period": "2026-06"},
    {"id": "cus_Def456", "amount_usd": 99, "period": "2026-06"},
    # ... 50 more customers
]

# All 52 threads share the same Stripe key with no per-customer spend cap
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(bill_customer, customers))

What breaks: All 52 concurrent billing threads share the same Stripe key with no per-customer or per-run isolation. If the monthly billing run for a Team customer at $99/month has a bug that retries ten times, a single customer's billing loop can exhaust the day's proxy spend cap — blocking every other customer's invoice for the rest of the day. In a stricter scenario: if the same billing key is used for both production charges and an internal test suite that runs concurrently, a test environment mistake can charge a production customer. There's no audit differentiation between which agents.complete() call created which Stripe charge, because all calls use the same key.

The fix is to issue per-run vault keys from the proxy — one vault key per billing operation or per billing session. Each vault key carries its own daily cap equal to the expected maximum for that customer tier. The vault key is injected at run time, not stored at agent-creation time, so a runaway billing loop for one customer cannot affect another:

import hashlib, os, json
import stripe
from mistralai import Mistral
from concurrent.futures import ThreadPoolExecutor

PROXY_URL = "https://proxy.keybrake.com"
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

agent = client.beta.agents.create(
    model="mistral-large-latest",
    instructions="Handle subscription billing.",
    tools=[charge_stripe_schema],
)


def make_billing_client(vault_key: str) -> stripe.StripeClient:
    """Create a Stripe client scoped to one billing run via per-run vault key."""
    return stripe.StripeClient(
        api_key=vault_key,
        base_url=PROXY_URL + "/stripe/",
    )


def execute_charge(billing_client: stripe.StripeClient, args: dict) -> str:
    idempotency_key = hashlib.sha256(
        f"{args['customer_id']}:{args['amount_cents']}:{args['billing_period']}".encode()
    ).hexdigest()[:32]
    try:
        charge = billing_client.charges.create(params={
            "amount":          int(args["amount_cents"]),
            "currency":        "usd",
            "customer":        args["customer_id"],
            "description":     f"Subscription {args['billing_period']}",
            "idempotency_key": idempotency_key,
        })
        return f"Charged: {charge.id} status={charge.status}"
    except stripe.StripeError as e:
        return f"Stripe error: {e}"


def bill_customer(customer: dict) -> str:
    # Fetch a per-run vault key from the proxy (POST /admin/vault-keys)
    # Each key has: daily_cap = customer tier max, allowed_endpoints = [POST /v1/charges]
    vault_key = customer["vault_key"]  # provisioned per billing run, not shared
    billing_client = make_billing_client(vault_key)

    response = client.beta.agents.complete(
        agent_id=agent.id,
        messages=[{
            "role": "user",
            "content": f"Charge {customer['id']} ${customer['amount_usd']} for {customer['period']}."
        }]
    )

    for choice in response.choices:
        if choice.message.tool_calls:
            for tc in choice.message.tool_calls:
                if tc.function.name == "charge_stripe":
                    args = json.loads(tc.function.arguments)
                    return execute_charge(billing_client, args)

    return f"{customer['id']}: no charge"

What this fixes: Each billing run receives its own vault key from the proxy, provisioned with a daily cap equal to the expected maximum for that customer's tier (e.g., $99 for Team, $399 for Scale). A runaway billing loop for one customer can exhaust only that customer's vault key cap — it cannot affect other customers' billing runs because they hold separate vault keys. The proxy audit log records every charge under its respective vault key, giving per-customer billing attribution in one queryable table. The agent-level tool definition no longer embeds a Stripe key, so rotating Stripe credentials requires only a proxy-side update.

Failure mode 3: Retry wrapper on `agents.complete()` retriggers billing after partial completion

Production systems wrap agents.complete() in a retry decorator — tenacity, backoff, or a manual loop — to handle transient Mistral API errors (HTTP 429, 503). The retry fires a new agents.complete() call from the start of the user turn. If the original agents.complete() call succeeded far enough for the model to call charge_stripe and Stripe to accept the charge, but the Mistral API then timed out before returning the final completion to the caller, the retry starts a fresh completion. The model has no memory of the previous turn's tool execution; it calls charge_stripe again. Second Stripe charge, same customer, same amount:

import time, os, json
import stripe
from mistralai import Mistral

stripe.api_key = os.environ["STRIPE_SECRET_KEY"]
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])


def bill_with_retry(agent_id: str, customer_id: str,
                    amount_cents: int, billing_period: str,
                    max_retries: int = 3) -> str:
    last_error = None
    for attempt in range(max_retries):
        try:
            # Each retry starts a FRESH agents.complete() from the user turn.
            # If a prior attempt got far enough for the model to call charge_stripe
            # and Stripe to create the charge — but the Mistral API then timed out —
            # this retry fires another completion. The model calls charge_stripe again.
            response = client.beta.agents.complete(
                agent_id=agent_id,
                messages=[{
                    "role": "user",
                    "content": (
                        f"Charge customer {customer_id} "
                        f"${amount_cents // 100} for {billing_period}."
                    )
                }]
            )

            for choice in response.choices:
                if choice.message.tool_calls:
                    for tc in choice.message.tool_calls:
                        if tc.function.name == "charge_stripe":
                            args = json.loads(tc.function.arguments)
                            # First attempt: Stripe accepts the charge, Mistral times out.
                            # Retry: Stripe has no idempotency key → creates a second charge.
                            charge = stripe.Charge.create(
                                amount=args["amount_cents"],
                                currency="usd",
                                customer=args["customer_id"],
                                description=f"Subscription {args['billing_period']}",
                                # No idempotency_key
                            )
                            return f"Charged: {charge.id}"

        except Exception as e:
            last_error = e
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # exponential backoff — meanwhile, Stripe already charged

    return f"Failed after {max_retries} attempts: {last_error}"

What breaks: The retry loop fires a brand-new agents.complete() call on each attempt. The Mistral model starts from the user message with no knowledge of prior tool calls. It calls charge_stripe again — Stripe sees a new POST request without an idempotency key and creates a new charge. With max_retries=3 and all three attempts partially completing before timeout, the customer is billed three times. The Mistral completion log shows three distinct completion requests, each with a successful charge_stripe tool call result, making it appear that billing was intentionally triggered three times.

The idempotency key is the only safe mechanism here: since it must be computed from billing-operation parameters before the Stripe call, it is stable across all retry attempts. Even if agents.complete() fires three times and the model calls charge_stripe three times, Stripe deduplicates all three and returns the original charge:

import hashlib, time, os, json
import stripe
from mistralai import Mistral

PROXY_URL   = "https://proxy.keybrake.com"
BILLING_KEY = os.environ["KEYBRAKE_VAULT_KEY_BILLING"]

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
billing_client = stripe.StripeClient(
    api_key=BILLING_KEY,
    base_url=PROXY_URL + "/stripe/",
)


def compute_idempotency_key(customer_id: str,
                             amount_cents: int,
                             billing_period: str) -> str:
    return hashlib.sha256(
        f"{customer_id}:{amount_cents}:{billing_period}".encode()
    ).hexdigest()[:32]


def execute_charge_safe(args: dict) -> str:
    idem = compute_idempotency_key(
        args["customer_id"],
        int(args["amount_cents"]),
        args["billing_period"],
    )
    try:
        charge = billing_client.charges.create(params={
            "amount":          int(args["amount_cents"]),
            "currency":        "usd",
            "customer":        args["customer_id"],
            "description":     f"Subscription {args['billing_period']}",
            "idempotency_key": idem,
        })
        return f"Charged: {charge.id} status={charge.status} idem={idem}"
    except stripe.StripeError as e:
        return f"Stripe error: {e} idem={idem}"


def bill_with_retry(agent_id: str, customer_id: str,
                    amount_cents: int, billing_period: str,
                    max_retries: int = 3) -> str:
    last_error = None
    for attempt in range(max_retries):
        try:
            response = client.beta.agents.complete(
                agent_id=agent_id,
                messages=[{
                    "role": "user",
                    "content": (
                        f"Charge customer {customer_id} "
                        f"${amount_cents // 100} for {billing_period}."
                    )
                }]
            )

            for choice in response.choices:
                if choice.message.tool_calls:
                    for tc in choice.message.tool_calls:
                        if tc.function.name == "charge_stripe":
                            args = json.loads(tc.function.arguments)
                            # Safe: idempotency key derived from billing parameters,
                            # stable across all retry attempts.
                            # Stripe deduplicates; proxy logs one entry per unique key.
                            return execute_charge_safe(args)

        except Exception as e:
            last_error = e
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)

    return f"Failed after {max_retries} attempts: {last_error}"

What this fixes: The idempotency key is derived from (customer_id, amount_cents, billing_period) before the Stripe call — independently of which agents.complete() attempt generated it and which parallel tool slot it came from. If bill_with_retry fires three times and the Mistral model calls charge_stripe three times across those attempts, all three POST requests carry the same idempotency key. Stripe returns the original charge object on the second and third requests; the customer is billed exactly once. The vault key routes all three requests through the spend-cap proxy, which records one audit entry per unique idempotency key and attributes the deduplicated charge to the correct billing vault key.

Comparison: raw key vs restricted key vs vault key

Capability	Raw Stripe key	Restricted Stripe key	Vault key (proxy)
Endpoint allowlist	All endpoints	Configured at Stripe	Configured per vault key
Daily USD spend cap	None	None	Per vault key, enforced at proxy
Per-run isolation	None — all concurrent runs share key	None — all runs share key	Yes — one vault key per billing run
Parallel tool call guard	None — each call creates a new charge	None	Idempotency key deduplicates at Stripe layer
Retry wrapper guard	None — each retry attempt re-charges	None	Idempotency key deduplicates across attempts
Audit log	Stripe Dashboard only	Stripe Dashboard only	Proxy audit table, per vault key, queryable
Kill switch	Rotate key (breaks all agents)	Rotate key (breaks all agents)	Revoke vault key (one agent, zero downtime)

pytest suite

import hashlib
import pytest
from unittest.mock import MagicMock, patch


# --- Idempotency key stability ---

def test_idempotency_key_deterministic():
    """Same billing parameters → same key across all retry attempts."""
    def compute_key(customer_id, amount_cents, billing_period):
        return hashlib.sha256(
            f"{customer_id}:{amount_cents}:{billing_period}".encode()
        ).hexdigest()[:32]

    key1 = compute_key("cus_Abc123", 2900, "2026-06")
    key2 = compute_key("cus_Abc123", 2900, "2026-06")
    assert key1 == key2, "Idempotency key must be stable across retry attempts"


def test_different_periods_produce_different_keys():
    """Different billing periods → different idempotency keys (no cross-period dedup)."""
    def compute_key(customer_id, amount_cents, billing_period):
        return hashlib.sha256(
            f"{customer_id}:{amount_cents}:{billing_period}".encode()
        ).hexdigest()[:32]

    june_key = compute_key("cus_Abc123", 2900, "2026-06")
    july_key = compute_key("cus_Abc123", 2900, "2026-07")
    assert june_key != july_key, "Different billing periods must produce distinct idempotency keys"


# --- Error handling ---

def test_stripe_error_returned_not_raised():
    """StripeError must be returned as string so model sees a final result, not an exception."""
    import stripe

    with patch("stripe.StripeClient") as MockClient:
        mock_charges = MagicMock()
        mock_charges.create.side_effect = stripe.StripeError("Connection reset by peer")
        MockClient.return_value.charges = mock_charges

        billing_client = MockClient(api_key="vk_test", base_url="https://proxy.example.com/stripe/")

        try:
            billing_client.charges.create(params={"amount": 2900})
            result = "Charged: ch_xxx"
        except stripe.StripeError as e:
            result = f"Stripe error: {e}"

        assert result.startswith("Stripe error:"), (
            "StripeError must be caught and returned as string, not re-raised"
        )


# --- Per-run vault key isolation ---

def test_per_run_vault_keys_are_distinct():
    """Each billing run gets its own vault key — no shared key across concurrent runs."""
    vault_keys = [f"vk_run_{i:04d}" for i in range(10)]
    assert len(set(vault_keys)) == 10, "Each concurrent billing run must receive a distinct vault key"


# --- Parallel tool call deduplication ---

def test_parallel_tool_calls_deduplicated_by_idempotency_key():
    """Two parallel tool_calls with the same parameters → same idempotency key → one Stripe charge."""
    def compute_key(customer_id, amount_cents, billing_period):
        return hashlib.sha256(
            f"{customer_id}:{amount_cents}:{billing_period}".encode()
        ).hexdigest()[:32]

    # Simulate Mistral returning two identical tool_calls in one completion
    parallel_calls = [
        {"customer_id": "cus_Abc123", "amount_cents": 2900, "billing_period": "2026-06"},
        {"customer_id": "cus_Abc123", "amount_cents": 2900, "billing_period": "2026-06"},
    ]

    keys = [compute_key(**call) for call in parallel_calls]
    assert keys[0] == keys[1], (
        "Identical parallel tool_calls must produce the same idempotency key "
        "so Stripe deduplicates them to one charge"
    )

Gap analysis

The patterns above close the three primary failure modes. Four edge cases remain worth testing in your specific Mistral deployment:

Mistral streaming mode parallel tool calls. When using agents.stream() instead of agents.complete(), tool call deltas arrive as a stream of events. A common streaming handler accumulates tool call chunks and fires the tool when the stream closes. If the stream is interrupted mid-way through a parallel tool call list, the handler may fire only the first of two tool calls before the connection drops — then retry the full list. Test that your streaming handler deduplicates by idempotency key, not by tool call index.
model="mistral-large-latest" vs pinned model version. Mistral's parallel function calling behaviour differs across model versions. mistral-large-latest resolves to the current production model and may change as Mistral releases updates. A model update could change how aggressively the model batches parallel tool calls. Pin to a specific versioned model ID in production billing code and review tool-calling behaviour on each model upgrade.
Per-run vault key provisioning latency. Fetching a new vault key from the proxy adds a round-trip before each billing run. In a high-concurrency batch (50+ customers billed in parallel), the vault key provisioning endpoint becomes a bottleneck. Consider pre-fetching vault keys in bulk at the start of a billing batch and mapping one key per customer in your run coordinator, rather than fetching one key per agents.complete() call.
Agent instructions inherited by all runs. Mistral agent instructions are stored at the agent level and applied to every agents.complete() call. If your billing instructions include any reference to specific customers, amounts, or prior transactions, they become shared state across all billing runs for that agent. Keep agent instructions generic ("Handle subscription billing") and pass customer-specific context exclusively in the user message of each agents.complete() call.

Frequently asked questions

Does Mistral's Agents API support conversation continuity across `agents.complete()` calls?

Yes — if you pass prior conversation turns in the messages array, the model has access to prior tool calls and results. For billing agents, keep each billing operation as a single-turn conversation: one user message, one agents.complete() call. Avoid passing billing history from prior runs to prevent the model from replaying prior charges on ambiguous follow-up prompts.

Can I use `tool_choice="any"` to force the model to call `charge_stripe`?

Yes, but with caution: tool_choice="any" forces at least one tool call per completion. Combined with a retry wrapper that fires a new agents.complete() on error, it guarantees the model always calls a tool on every attempt — including retries where the original call already succeeded. Only use tool_choice="any" in billing code if combined with idempotency keys; otherwise retry + forced tool use = guaranteed double charge.

Is the idempotency key safe to reuse across different `Mistral` API clients?

The idempotency key is sent to Stripe (via the proxy), not to Mistral. It is safe to compute the same key in multiple threads, processes, or retry wrappers — Stripe's idempotency layer handles deduplication server-side. The key is only consumed at the Stripe layer; Mistral has no visibility into it.

What happens if the vault key's daily cap is exhausted mid-billing-batch?

The proxy returns an HTTP 403 with a spend_cap_exceeded error body. The tool function's try/except stripe.StripeError block catches this and returns it as a string result to the Mistral model. The model sees a final error (not a network failure) and stops retrying. Your billing coordinator should check for spend_cap_exceeded in the tool result and alert rather than retry — exhausted cap usually indicates a billing loop or misconfiguration, not a transient error.

How do I handle the case where the same customer has two outstanding invoices that should both be charged?

Pass both billing operations as separate agents.complete() calls with separate user messages — one per invoice. Do not batch both into a single message like "charge cus_Abc123 for June and July." Batching invites the model to emit parallel tool calls that can fail independently, leaving you in a partially-completed state. Sequential calls are slower but trivially auditable: each has its own idempotency key, vault key allocation, and proxy log entry.

Should I create one Mistral agent per customer tier, or one shared agent for all billing?

One shared agent per billing role is sufficient — the per-run isolation comes from vault keys, not from agent identity. The agent's instructions define the billing pattern; the vault key caps the spend for each run. If you need genuinely different billing behaviour per tier (e.g., different currency, different retry policy), create separate agents; otherwise, one agent with per-run vault keys covers all tiers cleanly.

Scoped vault keys for Mistral billing agents

Keybrake issues per-run vault keys with daily spend caps, endpoint allowlists, and a proxy audit log — so one Mistral billing loop can't exhaust your entire Stripe daily limit and every charge is attributed to the run that created it.