
Agno Stripe Integration: Restricted API Keys, Spend Caps, and Agent Governance

By Keybrake · June 14, 2026 · 9 min read

Agno (formerly Phidata) makes it straightforward to build a payment-capable agent: define a @tool-decorated function that calls Stripe, hand it to an Agent, and let the LLM decide when to charge. What it doesn't handle for you is what happens when that tool raises an exception mid-call. Agno feeds the error back to the LLM, the LLM often retries the tool, and if no idempotency key was sent, the retry can create a second Stripe charge.

This post covers three failure modes specific to Agno's architecture (tool exception retry, Team multi-agent scope, and session history replay) and shows the two-layer governance pattern that closes all three: a restricted Stripe API key as a first layer, and per-role vault keys via a spend-cap proxy as a second.

The standard Agno Stripe pattern

Agno's tool system is built around plain Python functions decorated with @tool. The agent calls them when the LLM decides they're needed, and the return value is fed back into context. A billing agent looks like this:

import os
import stripe
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.tools import tool

stripe.api_key = os.environ["STRIPE_SECRET_KEY"]  # sk_live_...

@tool
def charge_stripe(customer_id: str, amount_cents: int, description: str) -> str:
    """Charge a Stripe customer and return the charge ID."""
    charge = stripe.Charge.create(
        amount=amount_cents,
        currency="usd",
        customer=customer_id,
        description=description,
    )
    return charge.id

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    tools=[charge_stripe],
    show_tool_calls=True,
)

agent.run("Charge customer cus_Abc123 $29 for the Hobby plan upgrade")

This works under ideal conditions. Production conditions are not ideal: networks time out, Stripe returns transient 429 rate-limit errors, and a charge can succeed on Stripe's side while the API response times out before it reaches your agent. Each of these surfaces as an exception inside the tool, but they are not equivalent: a 429 means Stripe did not process the request, while a lost response means the charge may already exist.
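A minimal sketch of that distinction, assuming you can recover the HTTP status from the failed call (stripe-python errors carry an http_status attribute, which is None when no response arrived). classify_failure and tool_error_message are hypothetical helpers, and treating 409 and every 5xx as "outcome unknown" is a conservative heuristic, not a Stripe guarantee. Returning an explicit message like this gives the LLM a reason not to retry blindly.

```python
from enum import Enum
from typing import Optional


class ChargeOutcome(Enum):
    NOT_PROCESSED = "not_processed"  # Stripe answered 4xx: no successful charge was created
    UNKNOWN = "unknown"              # no response, a 409, or a 5xx: the charge may exist


def classify_failure(http_status: Optional[int]) -> ChargeOutcome:
    """Heuristic: could a failed charge call have gone through on Stripe's side?

    http_status is None when no HTTP response arrived (connection reset, read timeout).
    """
    # 409 can mean another request with the same idempotency key is still in flight
    if http_status is None or http_status == 409 or http_status >= 500:
        return ChargeOutcome.UNKNOWN
    if 400 <= http_status < 500:
        return ChargeOutcome.NOT_PROCESSED
    raise ValueError(f"not a failure status: {http_status}")


def tool_error_message(http_status: Optional[int]) -> str:
    """Text a charge tool could hand back to the LLM instead of a raw exception."""
    if classify_failure(http_status) is ChargeOutcome.UNKNOWN:
        return ("Charge outcome UNKNOWN: it may have succeeded. "
                "Do not call charge_stripe again; check whether the charge exists first.")
    return "Charge rejected by Stripe: no successful charge was created."
```

The tool would catch the Stripe exception, pass its http_status to tool_error_message, and return the string rather than re-raising.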

Failure mode 1: Tool exception retry fires a duplicate charge

When a @tool function raises an uncaught exception in Agno, the framework catches it, formats the error as a tool result, and returns it to the LLM as a failed tool call. The LLM then decides what to do with that failure, and given a bare error message and no further guidance, it frequently retries the tool with the same arguments.

# Agno feeds this back to the LLM as a tool result:
# {"tool": "charge_stripe", "result": "Error: APIConnectionError: ..."}
# LLM response: "The charge failed. Retrying..."
# Second charge fires — now you have two charges for one operation.

What breaks: A network timeout, which stripe-python raises as an APIConnectionError, occurs after Stripe has accepted the charge but before the response reaches your agent. The tool raises an exception. Agno returns the error to the LLM. The LLM retries charge_stripe with identical arguments. The second call creates a new charge: without an idempotency key, Stripe has no way to tell it is a duplicate. The customer is billed twice.
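The sequence above can be reproduced with a toy model. FakeStripe is a hypothetical in-memory stand-in, not the Stripe SDK: it accepts a charge, can "lose" the response after accepting it, and honours idempotency keys the way Stripe does while it still holds the key. The retry without a key leaves two charges; the retry with a stable key leaves one.

```python
import itertools
from typing import Dict, List, Optional


class FakeStripe:
    """Toy stand-in for the Charges API (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.charges: List[str] = []        # every charge that actually exists
        self._by_key: Dict[str, str] = {}   # idempotency key -> original charge ID
        self._ids = itertools.count(1)

    def create_charge(self, amount: int, idempotency_key: Optional[str] = None,
                      drop_response: bool = False) -> str:
        if idempotency_key is not None and idempotency_key in self._by_key:
            return self._by_key[idempotency_key]  # replay: return the original charge
        charge_id = f"ch_{next(self._ids):03d}"
        self.charges.append(charge_id)  # the charge now exists on "Stripe's" side
        if idempotency_key is not None:
            self._by_key[idempotency_key] = charge_id
        if drop_response:
            # Simulates the response being lost after the charge was accepted
            raise ConnectionError("read timed out")
        return charge_id


def run_with_llm_retry(api: FakeStripe, key: Optional[str]) -> str:
    """First call times out after acceptance; the 'LLM' retries with the same arguments."""
    try:
        return api.create_charge(2900, idempotency_key=key, drop_response=True)
    except ConnectionError:
        return api.create_charge(2900, idempotency_key=key)


no_key = FakeStripe()
run_with_llm_retry(no_key, key=None)
with_key = FakeStripe()
run_with_llm_retry(with_key, key="sess-1-cus_Abc123-2900")
print(len(no_key.charges), len(with_key.charges))  # 2 1
```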

The fix is to generate a stable idempotency key before the tool is called and capture it in the tool's closure. The key must be deterministic for the same logical operation: same customer, same amount, same agent session. The simplest approach is to derive it from the session ID so any number of retries within one agent run produce the same key:

import uuid
import os
import stripe
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.tools import tool

stripe.api_key = os.environ["STRIPE_SECRET_KEY"]

def make_stripe_tool(session_id: str):
    """Return a charge_stripe tool with idempotency key bound to session_id."""

    @tool
    def charge_stripe(customer_id: str, amount_cents: int, description: str) -> str:
        """Charge a Stripe customer and return the charge ID."""
        # Key is stable across all retries for this (session, customer, amount) triple
        idempotency_key = f"{session_id}-{customer_id}-{amount_cents}"
        charge = stripe.Charge.create(
            amount=amount_cents,
            currency="usd",
            customer=customer_id,
            description=description,
            idempotency_key=idempotency_key,
        )
        return charge.id

    return charge_stripe

session_id = str(uuid.uuid4())

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    tools=[make_stripe_tool(session_id)],
    session_id=session_id,
)

agent.run("Charge customer cus_Abc123 $29 for the Hobby plan upgrade")

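What this fixes: Every retry within the same agent session that reuses the same customer and amount sends the same idempotency key to Stripe. Stripe recognizes the key and returns the original charge object instead of creating a new one, provided the retry arrives while Stripe still holds the key (Stripe may prune keys once they are 24 hours old). If the LLM rewords the description on retry, Stripe rejects the reused key with an idempotency error rather than charging again. Either way, the customer is billed once no matter how many retries the LLM makes.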

One subtlety: the idempotency key must be unique per logical operation, not per session alone. If the same agent session charges the same customer twice (two separate upgrades), the keys must differ. Adding amount_cents and customer_id to the key handles the common case; for multi-step sequences in one session, consider including a step counter.
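One way to add that step counter, as a sketch: OperationKeys is a hypothetical helper, not part of Agno or Stripe. The counter for a (customer, amount) pair advances only after a charge is confirmed, so retries of an unconfirmed charge (including the timeout-after-accept case) keep the original key, while a second, deliberate charge gets a new one.

```python
import hashlib
from collections import defaultdict
from typing import DefaultDict, Tuple


class OperationKeys:
    """Hypothetical helper: one idempotency key per logical charge within a session."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        # Number of confirmed charges so far for each (customer, amount) pair
        self._completed: DefaultDict[Tuple[str, int], int] = defaultdict(int)

    def key_for(self, customer_id: str, amount_cents: int) -> str:
        """Same key for every retry until the current charge is confirmed."""
        step = self._completed[(customer_id, amount_cents)]
        raw = f"{self.session_id}:{customer_id}:{amount_cents}:{step}"
        return "agno-" + hashlib.sha256(raw.encode()).hexdigest()[:32]

    def mark_succeeded(self, customer_id: str, amount_cents: int) -> None:
        """Advance the step counter so the next deliberate charge gets a new key."""
        self._completed[(customer_id, amount_cents)] += 1


keys = OperationKeys("sess-1")
first = keys.key_for("cus_Abc123", 2900)
retry = keys.key_for("cus_Abc123", 2900)   # same logical charge: same key
keys.mark_succeeded("cus_Abc123", 2900)
second = keys.key_for("cus_Abc123", 2900)  # second upgrade: new key
print(first == retry, first == second)  # True False
```

In the tool, call keys.key_for(...) before stripe.Charge.create and keys.mark_succeeded(...) after it returns. The counter lives in process memory, so it resets if the process restarts.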

Failure mode 2: Team agents share Stripe access across all members

Agno's Team class lets a coordinator model delegate tasks to specialized member agents. A billing team might have one agent that handles charges and another that handles refunds — sensible separation of concerns. The problem is that each member agent's tools are visible to the team leader, and the leader can route any billing-adjacent task to either member:

from agno.team.team import Team
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.tools import tool
import stripe, os

STRIPE_KEY = os.environ["STRIPE_SECRET_KEY"]  # sk_live_ — one key for all

@tool
def charge_stripe(customer_id: str, amount_cents: int, description: str) -> str:
    """Charge a customer."""
    stripe.api_key = STRIPE_KEY
    charge = stripe.Charge.create(
        amount=amount_cents, currency="usd",
        customer=customer_id, description=description,
    )
    return charge.id

@tool
def refund_stripe(charge_id: str, amount_cents: int) -> str:
    """Refund a charge."""
    stripe.api_key = STRIPE_KEY
    refund = stripe.Refund.create(charge=charge_id, amount=amount_cents)
    return refund.id

billing_agent = Agent(
    name="Biller",
    model=OpenAIChat(id="gpt-4o-mini"),
    tools=[charge_stripe],
)

fulfillment_agent = Agent(
    name="Fulfiller",
    model=OpenAIChat(id="gpt-4o-mini"),
    tools=[refund_stripe, charge_stripe],  # Has charge access too, for billing corrections
)

team = Team(
    members=[billing_agent, fulfillment_agent],
    model=OpenAIChat(id="gpt-4o"),
    mode="route",
)

team.run("Process the billing correction for order #4821")

What breaks: One Stripe key governs both agents. The fulfillment agent has both charge and refund access. The team leader can route "billing correction" to either agent. If the team leader delegates to Fulfiller (which has charge_stripe), the fulfillment agent can charge the customer — no policy enforcement stops it. A compromised team leader prompt or a confused routing decision can trigger charges from an agent that shouldn't be billing.

The fix is per-role vault keys issued from a proxy. Each agent type gets a vault key that enforces exactly what that role is allowed to do on Stripe. The billing agent's vault key allows only POST /v1/charges. The fulfillment agent's vault key allows only POST /v1/refunds. Even if the team leader routes incorrectly, the wrong vault key will get a 403 from the proxy:

import os
import stripe
from agno.team.team import Team
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.tools import tool

PROXY_URL = "https://proxy.keybrake.com"

def make_biller(vault_key: str) -> Agent:
    # StripeClient takes base_addresses (there is no base_url argument);
    # API calls then go to https://proxy.keybrake.com/stripe/v1/...
    client = stripe.StripeClient(api_key=vault_key, base_addresses={"api": PROXY_URL + "/stripe"})

    @tool
    def charge_stripe(customer_id: str, amount_cents: int, description: str) -> str:
        """Charge a Stripe customer."""
        charge = client.charges.create(params={
            "amount": amount_cents,
            "currency": "usd",
            "customer": customer_id,
            "description": description,
        })
        return charge.id

    return Agent(name="Biller", model=OpenAIChat(id="gpt-4o-mini"), tools=[charge_stripe])

def make_fulfiller(vault_key: str) -> Agent:
    client = stripe.StripeClient(api_key=vault_key, base_addresses={"api": PROXY_URL + "/stripe"})

    @tool
    def refund_stripe(charge_id: str, amount_cents: int) -> str:
        """Refund a Stripe charge."""
        refund = client.refunds.create(params={"charge": charge_id, "amount": amount_cents})
        return refund.id

    return Agent(name="Fulfiller", model=OpenAIChat(id="gpt-4o-mini"), tools=[refund_stripe])

team = Team(
    members=[
        make_biller(os.environ["VAULT_KEY_BILLING"]),
        make_fulfiller(os.environ["VAULT_KEY_FULFILLMENT"]),
    ],
    model=OpenAIChat(id="gpt-4o"),
    mode="route",
)

team.run("Process the billing correction for order #4821")

What this fixes: Each vault key is issued with a policy that restricts it to one endpoint group. The proxy enforces the policy at the HTTP layer — the agent can't exceed it even if the LLM tries. Daily spend caps on each vault key bound how much each role can charge in 24 hours. One-click revoke in the Keybrake dashboard kills a compromised agent without touching the real Stripe key.
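To make the policy concrete, here is a simplified model of the checks described above: an endpoint allowlist, a daily cap, and a revoke flag per vault key. VaultKeyPolicy is hypothetical and is not Keybrake's schema or code; the status codes (401, 403, 429) are illustrative choices.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, FrozenSet, Tuple


@dataclass
class VaultKeyPolicy:
    """Hypothetical per-role policy (not Keybrake's schema)."""

    allowed: FrozenSet[Tuple[str, str]]  # (HTTP method, path) pairs this key may call
    daily_cap_cents: int                 # ceiling on money moved per UTC day
    revoked: bool = False
    spent: Dict[date, int] = field(default_factory=dict)  # cents authorized per day

    def authorize(self, method: str, path: str, amount_cents: int, today: date) -> int:
        """Return the HTTP status a proxy might answer with; 200 means forward to Stripe."""
        if self.revoked:
            return 401  # kill switch: this key no longer works at all
        endpoint = (method.upper(), path.split("?", 1)[0])  # ignore any query string
        if endpoint not in self.allowed:
            return 403  # endpoint outside this role's allowlist
        already = self.spent.get(today, 0)
        if already + amount_cents > self.daily_cap_cents:
            return 429  # daily cap would be exceeded; nothing is recorded
        self.spent[today] = already + amount_cents
        return 200


# One policy per Team role, mirroring the Biller/Fulfiller split above
billing = VaultKeyPolicy(frozenset({("POST", "/v1/charges")}), daily_cap_cents=5_000)
fulfillment = VaultKeyPolicy(frozenset({("POST", "/v1/refunds")}), daily_cap_cents=5_000)
```

This model counts spend when a call is authorized rather than when Stripe confirms it, which errs toward blocking; a real proxy would also need to persist the ledger across restarts.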

Failure mode 3: Session history replay invites repeat tool calls

Agno supports persistent agent sessions via SqliteAgentStorage. When you pass add_history_to_messages=True, Agno loads prior conversation turns, including tool call results, back into the LLM context on each run. This is useful for conversational continuity, but it creates a subtle billing risk when the session is resumed with a prompt that's ambiguous about whether a past action should be repeated:

from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.storage.agent.sqlite import SqliteAgentStorage
from agno.tools import tool
import stripe, os

stripe.api_key = os.environ["STRIPE_SECRET_KEY"]

@tool
def charge_stripe(customer_id: str, amount_cents: int, description: str) -> str:
    """Charge a Stripe customer."""
    charge = stripe.Charge.create(
        amount=amount_cents, currency="usd",
        customer=customer_id, description=description,
    )
    return charge.id

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    tools=[charge_stripe],
    storage=SqliteAgentStorage(table_name="billing_sessions", db_file="agent.db"),
    add_history_to_messages=True,
    num_history_responses=10,
)

SESSION_ID = "customer-cus_Abc123-billing"

# First run — June 1
agent.run("Charge cus_Abc123 $29 for June subscription", session_id=SESSION_ID)
# charge_stripe called → ch_001 created

# ... 30 days later ...

# Second run — July 1 (same session_id, history loaded)
agent.run("Charge cus_Abc123 $29 for July subscription", session_id=SESSION_ID)
# History in context: [user: "Charge $29 for June", tool_result: "ch_001"]
# LLM sees a prior successful charge and correctly fires a new one — fine.

# But consider this prompt:
agent.run("Retry the last billing operation", session_id=SESSION_ID)
# LLM sees prior charge in history, interprets "retry" as re-calling charge_stripe
# → fires a second charge with the same $29 — no idempotency key, new ch_002
# Customer billed twice for July.

What breaks: The session history replay is not guarded against ambiguous prompts. "Retry the last billing operation", "Reprocess the payment", or even "Confirm that the June subscription was handled" can all cause the LLM to re-invoke charge_stripe based on the historical tool results in context. Without idempotency keys, each invocation produces a new charge.

The fix has two parts. First, derive the idempotency key from the charge itself rather than from the session ID alone: a hash of (customer_id, amount_cents, billing_period) yields the same key every time the same charge is attempted, so it is safe to send twice. Stripe may prune idempotency keys once they are 24 hours old, though, and a replayed prompt can arrive weeks later. So second, give the agent a way to check whether a charge already exists for a given period before it calls Stripe:

import hashlib, json, os
import stripe
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.storage.agent.sqlite import SqliteAgentStorage
from agno.tools import tool

stripe.api_key = os.environ["STRIPE_SECRET_KEY"]

def charge_idempotency_key(customer_id: str, amount_cents: int, billing_period: str) -> str:
    payload = json.dumps({"c": customer_id, "a": amount_cents, "p": billing_period}, sort_keys=True)
    return "agno-" + hashlib.sha256(payload.encode()).hexdigest()[:32]

@tool
def charge_stripe(customer_id: str, amount_cents: int, billing_period: str) -> str:
    """Charge a Stripe customer for a specific billing period.

    Args:
        customer_id: Stripe customer ID (cus_...)
        amount_cents: Amount to charge in cents
        billing_period: Billing period in YYYY-MM format (e.g. '2026-07')

    Returns:
        The Stripe charge ID. If the same key was sent while Stripe still held it
        (at least 24 hours), this is the original charge's ID.
    """
    key = charge_idempotency_key(customer_id, amount_cents, billing_period)
    charge = stripe.Charge.create(
        amount=amount_cents,
        currency="usd",
        customer=customer_id,
        description=f"Subscription {billing_period}",
        metadata={"billing_period": billing_period},  # lets check_existing_charge match exactly
        idempotency_key=key,
    )
    return charge.id  # Within the key's lifetime, Stripe returns the original charge for the same key

@tool
def check_existing_charge(customer_id: str, billing_period: str) -> str:
    """Check if a charge already exists for this customer and billing period.

    Call this before charge_stripe to avoid duplicate charges.
    """
    # Inspects only the 10 most recent charges; page further for long histories
    charges = stripe.Charge.list(customer=customer_id, limit=10)
    for ch in charges.data:
        # Failed charges don't count: the period still needs to be billed
        if ch.metadata.get("billing_period") == billing_period and ch.status != "failed":
            return f"Charge already exists: {ch.id} ({ch.amount / 100:.2f} {ch.currency})"
    return "No existing charge found for this billing period."

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    tools=[check_existing_charge, charge_stripe],
    storage=SqliteAgentStorage(table_name="billing_sessions", db_file="agent.db"),
    add_history_to_messages=True,
    num_history_responses=5,
    instructions=[
        "Always call check_existing_charge before charge_stripe.",
        "Never charge a customer if an existing charge is found for the same billing period.",
    ],
)

What this fixes: The content-hash idempotency key is safe to send on any retry while Stripe still holds it (at least 24 hours): Stripe treats the repeat as a duplicate and returns the original charge object. The check_existing_charge tool covers the longer horizon by letting the agent confirm whether a charge already exists before it calls the billing tool. The instructions steer the LLM to verify before charging, even on ambiguous prompts like "retry" or "reprocess", but they are guidance rather than enforcement: the model can still skip the check.
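Because idempotency keys expire and instructions can be ignored by the model, a check that depends on neither can live inside the tool itself. A minimal sketch using the standard library's sqlite3, assuming a hypothetical ChargeLedger table keyed on (customer_id, billing_period): the tool claims the slot before calling Stripe and refuses if it is already taken.

```python
import sqlite3


class ChargeLedger:
    """Hypothetical local ledger: at most one charge per (customer, billing period)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        # charge_id stays NULL until Stripe confirms the charge
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS charges ("
            " customer_id TEXT NOT NULL,"
            " billing_period TEXT NOT NULL,"
            " charge_id TEXT,"
            " PRIMARY KEY (customer_id, billing_period))"
        )

    def claim(self, customer_id: str, billing_period: str) -> bool:
        """Reserve the slot before calling Stripe; False means it is already taken."""
        try:
            with self.db:  # commits on success, rolls back on error
                self.db.execute(
                    "INSERT INTO charges (customer_id, billing_period) VALUES (?, ?)",
                    (customer_id, billing_period),
                )
            return True
        except sqlite3.IntegrityError:
            return False

    def record(self, customer_id: str, billing_period: str, charge_id: str) -> None:
        """Attach the Stripe charge ID once the charge is confirmed."""
        with self.db:
            self.db.execute(
                "UPDATE charges SET charge_id = ? WHERE customer_id = ? AND billing_period = ?",
                (charge_id, customer_id, billing_period),
            )

    def release(self, customer_id: str, billing_period: str) -> None:
        """Free an unconfirmed claim after Stripe definitively rejected the charge."""
        with self.db:
            self.db.execute(
                "DELETE FROM charges WHERE customer_id = ? AND billing_period = ?"
                " AND charge_id IS NULL",
                (customer_id, billing_period),
            )
```

Inside charge_stripe, return early if ledger.claim(...) is False; otherwise call Stripe and then ledger.record(...) with the charge ID. If Stripe definitively rejects the charge, call ledger.release(...) so the period can be retried; if the outcome is unknown, leave the claim in place and reconcile by hand.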

The proxy layer: closing all three at once

Idempotency keys and per-role tool factories solve the correctness problems. But they don't address the governance problems: what happens when an agent exceeds a daily budget, calls an endpoint it shouldn't, or needs to be killed mid-run? For those, you need a policy enforcement layer outside the agent process.

Keybrake works as a drop-in proxy between your Agno agents and Stripe. You issue a vault_key_xxx per agent type, attach a policy (daily USD cap, allowed endpoints, expiry), and point stripe.StripeClient at the proxy URL. The real Stripe key never leaves the proxy:

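import hashlib
import json
import os
import stripe
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.tools import tool

PROXY_URL = "https://proxy.keybrake.com"
VAULT_KEY = os.environ["KEYBRAKE_VAULT_KEY_BILLING"]

client = stripe.StripeClient(
    api_key=VAULT_KEY,
    # StripeClient routes requests through base_addresses; there is no base_url argument
    base_addresses={"api": PROXY_URL + "/stripe"},
)

def charge_idempotency_key(customer_id: str, amount_cents: int, billing_period: str) -> str:
    # Same content-hash scheme as in failure mode 3: every retry of one charge reuses one key
    payload = json.dumps({"c": customer_id, "a": amount_cents, "p": billing_period}, sort_keys=True)
    return "agno-" + hashlib.sha256(payload.encode()).hexdigest()[:32]

@tool
def charge_stripe(customer_id: str, amount_cents: int, billing_period: str) -> str:
    """Charge a Stripe customer for a specific billing period."""
    charge = client.charges.create(
        params={
            "amount": amount_cents,
            "currency": "usd",
            "customer": customer_id,
            "description": f"Subscription {billing_period}",
        },
        # Sent as an Idempotency-Key header; assumes the proxy forwards it to Stripe
        options={"idempotency_key": charge_idempotency_key(customer_id, amount_cents, billing_period)},
    )
    return charge.id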

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    tools=[charge_stripe],
)

The proxy reads the real Stripe key from its database, enforces the vault key's policy (daily cap, endpoint allowlist), logs the call to the audit table, and forwards to Stripe. The stripe.StripeClient interface is unchanged — no other code needs to know about the proxy.
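The key substitution step can be sketched as follows. KeySwapper is hypothetical and is not Keybrake's implementation; it assumes the agent sends its vault key the way stripe-python sends any API key (an Authorization: Bearer header), and it uses exact-case header names for brevity.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AuditEntry:
    vault_key: str
    method: str
    path: str
    user_agent: str
    timestamp: float
    forwarded: bool


class KeySwapper:
    """Hypothetical sketch of a proxy's key handling (not Keybrake's code)."""

    def __init__(self, secrets: Dict[str, str]) -> None:
        self._secrets = secrets  # vault key -> real Stripe secret, never sent to agents
        self.audit: List[AuditEntry] = []

    def outbound_headers(self, method: str, path: str,
                         headers: Dict[str, str]) -> Optional[Dict[str, str]]:
        """Return headers to forward to Stripe, or None if the call must be refused."""
        auth = headers.get("Authorization", "")
        vault_key = auth[len("Bearer "):] if auth.startswith("Bearer ") else ""
        real = self._secrets.get(vault_key)
        # Every attempt is logged, including refused ones
        self.audit.append(AuditEntry(vault_key, method, path,
                                     headers.get("User-Agent", ""), time.time(),
                                     forwarded=real is not None))
        if real is None:
            return None  # unknown or revoked vault key
        out = dict(headers)
        out["Authorization"] = f"Bearer {real}"  # swap in the real key only on the way out
        return out
```

Revoking a vault key in this model means deleting its entry from secrets; the real Stripe key, and every other vault key, is untouched.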

Comparison: raw key vs. restricted key vs. vault key

Control	Raw `sk_live_`	Restricted key	Vault key (proxy)
Endpoint allowlist	All endpoints	Static, set in Stripe dashboard	Per-vault-key, enforced at proxy
Daily spend cap	None	None	USD cap per vault key per day
Per-role isolation	Shared across all agents	Shared across all agents	One vault key per agent role
Retry dedup	Requires manual idempotency keys	Requires manual idempotency keys	Idempotency key enforced at proxy layer
Session replay guard	None	None	Daily cap bounds the overrun (duplicates still possible)
Audit trail	Stripe Dashboard only	Stripe Dashboard only	Per-call log with vault key, UA, timestamp
Kill switch	Rotate key (breaks all agents)	Disable in dashboard (all users)	Revoke one vault key, others unaffected

pytest enforcement: verify before you ship

These properties are testable. A test suite that enforces them before every deploy is cheaper than a duplicate-charge incident:

import hashlib
import json
import os
from unittest.mock import MagicMock, patch

import pytest
import stripe

PROXY_URL = "https://proxy.keybrake.com"
VAULT_KEY_BILLING = os.environ.get("KEYBRAKE_VAULT_KEY_BILLING", "vk_test_billing")
VAULT_KEY_FULFILLMENT = os.environ.get("KEYBRAKE_VAULT_KEY_FULFILLMENT", "vk_test_fulfillment")

def make_key(customer_id, amount_cents, billing_period):
    # Mirrors charge_idempotency_key above; in a real suite, import it from your billing module
    payload = json.dumps({"c": customer_id, "a": amount_cents, "p": billing_period}, sort_keys=True)
    return "agno-" + hashlib.sha256(payload.encode()).hexdigest()[:32]

@pytest.fixture
def mock_stripe_client():
    # Patch StripeClient so no test reaches the network; yields the client instance it returns
    with patch("stripe.StripeClient") as m:
        client = MagicMock()
        m.return_value = client
        yield client

def test_billing_vault_key_not_live():
    """Vault keys must not start with sk_live_."""
    assert not VAULT_KEY_BILLING.startswith("sk_live_"), "Live Stripe key used directly; use a vault key"
    assert not VAULT_KEY_FULFILLMENT.startswith("sk_live_"), "Live Stripe key used directly; use a vault key"

def test_charge_uses_proxy_url(mock_stripe_client):
    """Stripe client must point at the Keybrake proxy, not api.stripe.com."""
    # Replace this direct construction with a call to your own client factory;
    # otherwise the test only checks its own arguments.
    stripe.StripeClient(api_key=VAULT_KEY_BILLING, base_addresses={"api": PROXY_URL + "/stripe"})
    _, kwargs = stripe.StripeClient.call_args
    assert kwargs["base_addresses"]["api"].startswith(PROXY_URL), f"Expected proxy URL {PROXY_URL}"

def test_idempotency_key_stable_across_retries():
    """Same (customer, amount, period) must produce the same idempotency key every time."""
    k1 = make_key("cus_Abc123", 2900, "2026-07")
    k2 = make_key("cus_Abc123", 2900, "2026-07")
    assert k1 == k2, "Idempotency key is non-deterministic; retries will create duplicates"

def test_different_periods_get_different_keys():
    """Same customer, different billing periods must get different idempotency keys."""
    assert make_key("cus_Abc123", 2900, "2026-06") != make_key("cus_Abc123", 2900, "2026-07")

def test_wrong_vault_key_surfaces_permission_error(mock_stripe_client):
    """A proxy 403 should reach the tool as stripe.PermissionError.

    This exercises the mock only; confirm real enforcement with an integration
    test against the proxy using the fulfillment vault key.
    """
    client = stripe.StripeClient(api_key=VAULT_KEY_FULFILLMENT, base_addresses={"api": PROXY_URL + "/stripe"})
    client.charges.create.side_effect = stripe.PermissionError(
        "This vault key is not authorized for POST /v1/charges", code="permission_denied"
    )
    with pytest.raises(stripe.PermissionError):
        client.charges.create(params={
            "amount": 2900, "currency": "usd",
            "customer": "cus_Abc123", "description": "Hobby plan",
        })

Gap analysis: what this doesn't cover

Even with idempotency keys, per-role vault keys, and a spend-cap proxy, some failure modes remain:

Parallel tool calls in a single LLM turn. Agno supports parallel tool execution when the LLM emits multiple tool calls in one response. If two charge calls land in the same turn, which can happen when a batch prompt asks to charge multiple customers, both fire before either result is returned. Your idempotency key must cover every argument that distinguishes one charge from another (customer, amount, and period) so that distinct charges don't collide and identical ones do. The proxy's daily cap provides a hard ceiling, but won't stop two valid charges in one turn.
Knowledge base retrieval triggering tool calls. When Agno agents use a knowledge base (vector store + retrieval), retrieved documents can contain text that looks like billing instructions. An agent that retrieves a document saying "customer cus_Abc123 requires a $29 charge" may interpret that as an instruction rather than context. Content policies for retrieval results and tool-call pre-confirmation prompts both help here.
Structured output validation loops. When an Agno agent uses Pydantic-structured output, a validation failure can cause the response to be retried, and the retried response may repeat any tool calls it contained. If a Stripe charge fired during the failed attempt, the retry can produce a second charge. The idempotency key must therefore be derived deterministically from the charge's intent (its arguments, or context fixed before the run starts), never from a random value or timestamp generated inside the tool at retry time.
SqliteAgentStorage PII retention. Session history stored in SQLite contains the full conversation, including any customer IDs, charge amounts, or partial card data the LLM happened to echo. If the SQLite file is shared across environments or backed up without encryption, this history leaks billing context. Encrypt the storage file and rotate sessions after billing operations complete.
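The tool-call pre-confirmation mentioned for knowledge-base retrieval can be sketched in plain Python. require_confirmation is a hypothetical decorator, not an Agno feature, and whether it composes with Agno's @tool decorator in your Agno version is an assumption to verify. The approve callback could prompt a human or apply a rule; the demo uses a simple amount threshold.

```python
import functools
import inspect
from typing import Callable, Dict


def require_confirmation(approve: Callable[[str, Dict[str, object]], bool]):
    """Hypothetical decorator: gate a money-moving tool behind an approval callback."""

    def decorator(fn):
        sig = inspect.signature(fn)

        @functools.wraps(fn)  # keeps name and docstring; exposes fn via __wrapped__
        def wrapper(*args, **kwargs):
            # Normalise positional and keyword arguments into one dict for the approver
            call_args = dict(sig.bind(*args, **kwargs).arguments)
            if not approve(fn.__name__, call_args):
                return f"{fn.__name__} was NOT executed: confirmation denied."
            return fn(*args, **kwargs)

        return wrapper

    return decorator


# Demo policy: auto-approve up to $50; anything larger is refused (a human prompt could go here)
@require_confirmation(lambda name, args: args["amount_cents"] <= 5_000)
def charge_stripe(customer_id: str, amount_cents: int) -> str:
    """Stand-in for the real Stripe tool."""
    return f"charged {customer_id} {amount_cents}"
```

Returning a refusal string rather than raising keeps the LLM from reading the denial as a transient error worth retrying.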

FAQ

Does Agno have a built-in retry limit I can set?

Agno's Agent class accepts a tool_call_limit parameter that caps the total number of tool calls per agent.run() invocation. Setting this to a small number (3–5) limits the blast radius of a retry loop. It doesn't prevent duplicate charges on its own — you still need idempotency keys — but it stops an infinite retry spiral from burning through your Stripe daily budget.
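If you want a tighter bound on the billing tool specifically, a per-tool counter can sit alongside tool_call_limit. The max_calls decorator below is a hypothetical sketch, not an Agno parameter; it counts every attempt, including ones that raise, and you would call reset() at the start of each run.

```python
import functools


def max_calls(limit: int):
    """Hypothetical decorator: refuse a tool after `limit` attempts in one run."""

    def decorator(fn):
        count = 0

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            nonlocal count
            if count >= limit:
                return f"{fn.__name__} refused: limit of {limit} calls reached for this run."
            count += 1  # counted before the call, so attempts that raise still count
            return fn(*args, **kwargs)

        def reset() -> None:
            """Call at the start of each agent run."""
            nonlocal count
            count = 0

        wrapper.reset = reset
        return wrapper

    return decorator


@max_calls(2)
def charge_stripe(customer_id: str) -> str:
    """Stand-in for the real Stripe tool."""
    return f"ok {customer_id}"
```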

Can I use Agno's structured output mode with Stripe tools safely?

Yes, with one addition: generate your idempotency key in the calling code before agent.run(), pass it as part of the task context or inject it into the tool factory's closure, and make sure the key is derived from the intent (customer, amount, period) rather than from a timestamp or random value generated inside the tool. A timestamp-based key changes on retry; a content-hash-based key is stable.

How do I scope vault keys per Agno Team member?

Issue a separate vault key for each agent role in your Team. Pass each vault key into the corresponding agent's tool factory via a closure (as shown in the Team example above). The Keybrake proxy enforces the endpoint allowlist on each key independently — a billing agent's vault key allows POST /v1/charges, a refund agent's key allows POST /v1/refunds. Even if the team leader routes incorrectly, the wrong key gets a 403.

Does using SqliteAgentStorage mean I should avoid session resumption for billing agents?

Not necessarily — session continuity is useful. The risk is specific to resuming with ambiguous prompts that might re-trigger billing. Mitigate it by: (1) always calling a check_existing_charge tool before charge_stripe, (2) using num_history_responses to limit how many prior turns are loaded, and (3) adding an explicit instruction to "never charge if an existing charge is found for the same billing period".

Is the Keybrake proxy compatible with Agno's async mode?

Yes. Agno supports async agents via await agent.arun(). The proxy is a standard HTTPS endpoint, so a stripe.StripeClient whose base_addresses point at the proxy works from both sync and async code; inside async tools, prefer the SDK's _async method variants (for example client.charges.create_async) so a Stripe call doesn't block the event loop. The proxy itself handles concurrent requests from multiple agents without cross-contamination because each request carries its own vault key header.

What's the latency overhead of routing through the proxy?

If you run the proxy on the same host as your agent, each request takes an extra local hop of roughly 1–3ms, which is small next to Stripe's own response-time variance. If the agent calls the hosted proxy at proxy.keybrake.com instead, expect an additional network round trip and measure it from where your agents run. For agents where billing calls are a small fraction of overall LLM time, the overhead is negligible in practice. For high-frequency billing loops (thousands of charges per minute), the proxy's connection pool and SQLite write throughput become the relevant constraints to benchmark.

Put a brake pedal on your Agno agents

Issue vault keys per agent role, set daily spend caps per vendor, and get a queryable audit log of every Stripe call — without changing your billing logic. The proxy is live at proxy.keybrake.com.