Semantic Kernel Stripe Plugin: Restricted API Keys, Spend Caps, and Agent Governance

By Keybrake · June 13, 2026 · 9 min read

Semantic Kernel's plugin system makes it easy to register a Stripe function and let the model call it. With FunctionChoiceBehavior.Auto(), SK invokes that plugin whenever the LLM requests it, with no built-in spend cap, no limit tied to the money moved, and no kill switch scoped to a specific SK session.

This post covers the three failure modes specific to SK's plugin architecture and shows the governance pattern — restricted Stripe API keys as a first layer, vault keys via Keybrake as a second layer — that closes all three gaps.

The standard SK Stripe plugin pattern

In Semantic Kernel, you expose Stripe access by decorating methods with @kernel_function and registering the class as a plugin. The kernel advertises the plugin's functions to the chat completion service, and the model can request one on any turn where it decides a Stripe call is the right next step.

import os
import stripe
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion
from semantic_kernel.connectors.ai.function_choice_behavior import FunctionChoiceBehavior
from semantic_kernel.functions import kernel_function
from semantic_kernel.contents import ChatHistory

stripe.api_key = os.environ["STRIPE_SECRET_KEY"]  # sk_live_...

class StripePlugin:
    @kernel_function(name="create_charge", description="Create a Stripe charge for a customer")
    async def create_charge(self, customer_id: str, amount_cents: int, description: str) -> str:
        charge = stripe.Charge.create(
            amount=amount_cents,
            currency="usd",
            customer=customer_id,
            description=description,
        )
        return f"Charged {amount_cents/100:.2f} USD. ID: {charge.id}"

    @kernel_function(name="refund_charge", description="Refund a Stripe charge")
    async def refund_charge(self, charge_id: str) -> str:
        refund = stripe.Refund.create(charge=charge_id)
        return f"Refunded. Refund ID: {refund.id}"

kernel = Kernel()
kernel.add_service(AzureChatCompletion(
    deployment_name=os.environ["AZURE_OAI_DEPLOYMENT"],
    endpoint=os.environ["AZURE_OAI_ENDPOINT"],
    api_key=os.environ["AZURE_OAI_KEY"],
))
kernel.add_plugin(StripePlugin(), plugin_name="stripe")

settings = kernel.get_prompt_execution_settings_from_service_id("default")
settings.function_choice_behavior = FunctionChoiceBehavior.Auto()

chat_history = ChatHistory()
chat_history.add_system_message(
    "You are a billing agent. Process overdue accounts by charging the card on file."
)

This pattern works. A billing agent can iterate over a list of customers, call stripe-create_charge for each, and return a summary. The problem is what happens when the agent gets confused, receives a malformed input, or the LLM decides to retry.

Failure mode 1: Plugin registration exposes all functions at once

Risk: Every function in StripePlugin is available to the planner from the moment you call kernel.add_plugin(). There is no per-function key scoping. If the plugin has both create_charge and refund_charge, both use the same stripe.api_key. An agent that should only charge — never refund — has unrestricted access to both operations from the same credential.

This is the all-or-nothing problem with SK's plugin model. You register a plugin to give an agent Stripe access, and you get all of Stripe at once. A billing-only agent shouldn't have stripe.Refund.create available, but it does.

The same problem applies across plugin methods. A customer lookup function and a charge creation function sharing one secret key means the same key that reads customer data can also spend money. If the LLM hallucinates a tool call — and it will — the blast radius is the entire key's permission set.
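
One way to narrow that blast radius is to split the plugin by capability, give each class its own credential, and register only what a given agent role needs. The sketch below is a minimal illustration under assumptions: the class names, role names, and environment variable names are hypothetical, and the @kernel_function decorators are left out so the example runs without SK installed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChargePlugin:
    """Holds a credential that should only be able to create charges."""
    api_key: str


@dataclass(frozen=True)
class RefundPlugin:
    """Holds a separate credential that should only be able to refund."""
    api_key: str


# Hypothetical role table: which plugin classes each agent role may register.
ROLE_PLUGINS = {
    "billing": (ChargePlugin,),
    "support": (RefundPlugin,),
}

# Hypothetical mapping from plugin class to the variable that holds its key.
KEY_ENV_VARS = {
    ChargePlugin: "STRIPE_CHARGE_KEY",
    RefundPlugin: "STRIPE_REFUND_KEY",
}


def plugins_for_role(role: str, env: dict[str, str]) -> list[object]:
    """Build only the plugins a role is allowed to have, each with its own key."""
    return [cls(api_key=env[KEY_ENV_VARS[cls]]) for cls in ROLE_PLUGINS[role]]
```

A billing agent built this way never has a refund function in its kernel, so a hallucinated refund call has nothing to bind to, and the charge credential cannot refund even if it leaks.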

Failure mode 2: FunctionChoiceBehavior.Auto() has no spend budget

Risk: FunctionChoiceBehavior.Auto() lets the model invoke Stripe plugins as often as it requests them within the auto-invoke limit. That limit (maximum_auto_invoke_attempts in the Python SDK) counts model round trips, not function calls or dollars, and a single round trip can carry several parallel tool calls. There is no built-in spend limit. A stuck loop, or an instruction like "process all 847 overdue accounts" driven by an outer loop that keeps re-invoking the kernel, can become hundreds of Stripe charges before any human can intervene.

Semantic Kernel's auto-invoke loop runs until the LLM stops requesting tool calls or the attempt limit is reached. Neither condition knows how much money has moved. In practice, a billing agent processing a large batch keeps calling stripe-create_charge until it succeeds on every account, fails with an unrecoverable exception, exhausts its attempts, or fills the context window with tool results.

SK has no per-session spend cap and no per-function call counter. You can implement a wrapper, but the default pattern has no spend-aware guard.

# The auto-invoke loop has no spend-aware stop.
# chat_function is assumed to be a chat prompt function defined elsewhere.
async for response in kernel.invoke_stream(
    function=chat_function,
    settings=settings,
    chat_history=chat_history,
):
    # SK keeps calling stripe-create_charge for as long as the LLM requests it
    # No per-session spend cap, no budget expressed in dollars
    pass
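
The wrapper mentioned above could look something like the following. This is a minimal in-process sketch, not an SK feature: SessionBudget, BudgetExceeded, and budgeted are hypothetical names, and the decorator assumes a plain async function whose first two arguments are the customer ID and the amount in cents.

```python
import functools


class BudgetExceeded(Exception):
    """Raised when a session has used up its call or spend budget."""


class SessionBudget:
    """Tracks calls and cents reserved for one agent session (hypothetical helper)."""

    def __init__(self, max_calls: int, max_cents: int):
        self.max_calls = max_calls
        self.max_cents = max_cents
        self.calls = 0
        self.cents = 0

    def reserve(self, amount_cents: int) -> None:
        """Check both limits before the call is made, then record the attempt."""
        if self.calls + 1 > self.max_calls:
            raise BudgetExceeded(f"call limit of {self.max_calls} reached")
        if self.cents + amount_cents > self.max_cents:
            raise BudgetExceeded(f"spend limit of {self.max_cents} cents reached")
        self.calls += 1
        self.cents += amount_cents


def budgeted(budget: SessionBudget):
    """Decorator: reserve budget before running an async charge function."""
    def wrap(fn):
        @functools.wraps(fn)
        async def inner(customer_id: str, amount_cents: int, *args, **kwargs):
            try:
                budget.reserve(amount_cents)
            except BudgetExceeded as exc:
                # Return a tool result instead of raising so the model sees a stop signal.
                return f"Budget exceeded: {exc}. Stop processing."
            return await fn(customer_id, amount_cents, *args, **kwargs)
        return inner
    return wrap
```

The budget is reserved on attempt, so a charge that later fails still counts against it, which errs on the side of stopping early. Because the counter lives in process memory, it does not survive restarts or cover agents running in other processes.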

Failure mode 3: ChatHistory persists Stripe data across turns

Risk: Semantic Kernel's ChatHistory stores full function results — including Stripe charge IDs, customer IDs, and amounts — as tool messages in the conversation context. In a long-running agent session, the LLM can reference a charge ID from turn 3 to attempt a refund on turn 12, even if the current turn's intent was unrelated to refunds. The context window becomes an inadvertent permission system.

SK's function calling pattern appends each tool result to the ChatHistory as a structured message. A charge result like "Charged 199.00 USD. ID: ch_3NVq8W2eZvKYlo2C0pB8XYZQ" sits in the history for the rest of the session. A downstream planning step that encounters an instruction about refunds — even from a separate user turn — now has a charge ID to reference.

This is less of a concern in short-lived request/response sessions, but matters significantly in agent loops that process a series of tasks in a single ChatHistory instance.
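
For long-running loops, one mitigation is to scrub object IDs from tool results before they are returned to the kernel and stored in history. The helper below is a sketch under assumptions: redact_stripe_ids is a hypothetical name, and the prefix list covers only the object types discussed in this post.

```python
import re

# Stripe object IDs are a short type prefix, an underscore, then an alphanumeric body.
# The prefix list is an assumption covering charges, customers, refunds, and payment intents.
_STRIPE_ID = re.compile(r"\b(ch|cus|re|pi)_[A-Za-z0-9]{8,}\b")


def redact_stripe_ids(tool_result: str) -> str:
    """Replace Stripe object IDs with a typed placeholder before storing in history."""
    return _STRIPE_ID.sub(lambda m: f"{m.group(1)}_[redacted]", tool_result)
```

The trade-off is that the agent can no longer reference a charge later in the same session. If a later step legitimately needs the ID, keep the mapping in application state outside ChatHistory.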

Fix layer 1: Restrict the Stripe API key before it touches SK

The fastest improvement is to replace the live secret key with a Stripe restricted API key scoped to exactly what the plugin needs. For a billing agent that only charges existing customers:

# Stripe Dashboard → Developers → API keys → Create restricted key
# Minimum permissions for a charge-only billing agent:
#
#   Charges          → Write
#   Customers        → Read
#   Payment intents  → None
#   Refunds          → None     ← explicit None, not just absent
#   Balance          → None
#   Billing          → None

# In SK plugin:
stripe.api_key = os.environ["STRIPE_RESTRICTED_KEY"]  # rk_live_...

class StripePlugin:
    @kernel_function(name="create_charge", description="Create a Stripe charge for a customer")
    async def create_charge(self, customer_id: str, amount_cents: int, description: str) -> str:
        # If the LLM hallucinates a refund_charge call, Stripe rejects it at the API layer
        # before it ever touches a real charge — the restricted key has no refund permission
        charge = stripe.Charge.create(
            amount=amount_cents,
            currency="usd",
            customer=customer_id,
            description=description,
        )
        return f"Charged {amount_cents/100:.2f} USD. ID: {charge.id}"

The restricted key limits the blast radius of any single plugin invocation: even if the LLM hallucinates an out-of-scope call, Stripe returns a 403 before money moves. But the restricted key still doesn't give you a per-session spend cap, a per-invocation kill switch, or an audit trail beyond what Stripe's own dashboard shows.
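
A cheap complement is a startup guard that refuses to run the agent with an unrestricted key at all. The sketch below relies only on Stripe's documented key prefixes (sk_ for secret keys, rk_ for restricted keys); the function name is hypothetical.

```python
def require_restricted_key(api_key: str) -> str:
    """Return the key unchanged if it is a restricted key, otherwise raise."""
    if api_key.startswith(("rk_live_", "rk_test_")):
        return api_key
    if api_key.startswith(("sk_live_", "sk_test_")):
        raise ValueError("Refusing to start: unrestricted secret key supplied to the agent")
    raise ValueError("Refusing to start: unrecognized Stripe key format")
```

Calling it as stripe.api_key = require_restricted_key(os.environ["STRIPE_RESTRICTED_KEY"]) fails fast when a secret key is configured by mistake. It checks the key type only, not which permissions the restricted key carries.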

Fix layer 2: Per-invocation vault key via Keybrake

A vault key is a short-lived proxy credential you issue per agent invocation. Your SK plugin talks to proxy.keybrake.com using the vault key instead of talking to Stripe directly with a raw API key. The proxy looks up the real restricted Stripe key, enforces your policy (daily spend cap, endpoint allowlist, expiry), and logs every call.

import os
import httpx
from semantic_kernel.functions import kernel_function

PROXY_BASE = "https://proxy.keybrake.com/stripe"

class StripeVaultPlugin:
    def __init__(self, vault_key: str):
        # vault_key is provisioned per SK invocation — not a module-level constant
        self._vault_key = vault_key
        self._client = httpx.AsyncClient(
            base_url=PROXY_BASE,
            headers={"Authorization": f"Bearer {self._vault_key}"},
            timeout=30.0,
        )

    @kernel_function(name="create_charge", description="Create a Stripe charge for a customer")
    async def create_charge(self, customer_id: str, amount_cents: int, description: str) -> str:
        resp = await self._client.post(
            "/v1/charges",
            data={
                "amount": str(amount_cents),
                "currency": "usd",
                "customer": customer_id,
                "description": description,
            },
        )
        if resp.status_code == 429:
            return "Spend cap reached — charge blocked by policy. Stop processing."
        resp.raise_for_status()
        charge = resp.json()
        return f"Charged {amount_cents/100:.2f} USD. ID: {charge['id']}"

    @kernel_function(name="list_customers", description="List Stripe customers")
    async def list_customers(self, limit: int = 10) -> str:
        resp = await self._client.get("/v1/customers", params={"limit": limit})
        resp.raise_for_status()
        customers = resp.json()
        return str([c["id"] for c in customers.get("data", [])])

Provision and inject the vault key at the invocation boundary — not at module import time:

import uuid
import httpx

async def run_billing_agent(accounts: list[dict]) -> str:
    # Provision a fresh vault key for this invocation only
    invocation_id = str(uuid.uuid4())[:8]
    vault_key = await provision_vault_key(
        name=f"sk-billing-{invocation_id}",
        vendor="stripe",
        daily_usd_cap=500.0,           # hard cap: stops at $500 regardless of account list size
        allowed_operations=["charges.create", "customers.list"],
        expires_in_seconds=3600,        # auto-revoke after 1 hour
    )

    kernel = Kernel()
    kernel.add_service(AzureChatCompletion(...))
    kernel.add_plugin(StripeVaultPlugin(vault_key=vault_key), plugin_name="stripe")

    settings = kernel.get_prompt_execution_settings_from_service_id("default")
    settings.function_choice_behavior = FunctionChoiceBehavior.Auto()

    chat_history = ChatHistory()
    chat_history.add_system_message(
        f"You are a billing agent. Process these accounts: {accounts}. "
        f"Invocation ID: {invocation_id}. Stop immediately if the proxy returns a spend cap error."
    )
    chat_history.add_user_message("Process all overdue accounts.")

    # The proxy enforces the $500 daily cap server-side.
    # On a 429 the plugin returns a "Spend cap reached" tool result, and the system
    # prompt tells the model to stop. chat_function is assumed to be defined elsewhere.
    result = await kernel.invoke(chat_function, settings=settings, chat_history=chat_history)
    return str(result)

async def provision_vault_key(name: str, vendor: str, daily_usd_cap: float,
                               allowed_operations: list[str], expires_in_seconds: int) -> str:
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "https://proxy.keybrake.com/admin/vault-keys",
            json={
                "name": name,
                "vendor": vendor,
                "daily_usd_cap": daily_usd_cap,
                "allowed_operations": allowed_operations,
                "expires_in_seconds": expires_in_seconds,
            },
            headers={"Authorization": f"Bearer {os.environ['KEYBRAKE_ADMIN_KEY']}"},
        )
        # Fail loudly if provisioning was rejected instead of raising a KeyError below
        resp.raise_for_status()
        return resp.json()["vault_key"]

Each SK invocation gets its own vault key, its own spend budget, and its own expiry. Revoking one agent's access doesn't require rotating the underlying Stripe key.
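
To make the policy semantics concrete, here is a small in-memory model of the checks a vault key policy implies. It is an illustrative sketch only, not Keybrake's implementation: the VaultPolicy class, the order of the checks, and the 401 status for an expired key are assumptions, while the 403 and 429 statuses follow the responses used elsewhere in this post.

```python
from dataclasses import dataclass


@dataclass
class VaultPolicy:
    """Hypothetical in-memory model of a vault key policy (not Keybrake's code)."""
    allowed_operations: frozenset
    daily_cap_cents: int
    expires_at: float  # Unix timestamp after which the key is dead
    spent_cents: int = 0

    def authorize(self, operation: str, amount_cents: int, now: float) -> tuple:
        """Return an (HTTP status, reason) pair and record spend when allowed."""
        if now >= self.expires_at:
            return 401, "vault key expired"  # status code is an assumption
        if operation not in self.allowed_operations:
            return 403, f"operation {operation} not in allowlist"
        if self.spent_cents + amount_cents > self.daily_cap_cents:
            return 429, "spend cap reached"
        self.spent_cents += amount_cents
        return 200, "ok"
```

The point of the model is that every check runs outside the agent's process, so a confused planner cannot talk its way past the cap.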

SK-specific patterns worth noting

Injecting vault keys via KernelArguments

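If you're using SK's templated prompt functions, you can pass the vault key through KernelArguments rather than instantiating the plugin class with it. This suits direct kernel.invoke calls. With FunctionChoiceBehavior.Auto(), plain parameters of a kernel function are part of the schema the model sees, so prefer constructor injection for auto-invoked plugins: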

import httpx
from semantic_kernel.functions import KernelArguments, kernel_function

# Plugin reads vault key from context arguments at invocation time
class StripePlugin:
    @kernel_function(name="create_charge")
    async def create_charge(
        self,
        customer_id: str,
        amount_cents: int,
        vault_key: str = "",  # injected via KernelArguments
    ) -> str:
        headers = {"Authorization": f"Bearer {vault_key}"}
        async with httpx.AsyncClient() as client:
            resp = await client.post(
                "https://proxy.keybrake.com/stripe/v1/charges",
                headers=headers,
                data={"amount": str(amount_cents), "currency": "usd", "customer": customer_id},
            )
            return resp.text

# At invocation time:
args = KernelArguments(vault_key=vault_key, customer_id="cus_...", amount_cents=1999)

Handling the planner's spend cap signal

When the proxy returns a 429 with a spend cap message, the plugin passes a stop message to the LLM as a tool result. Add an explicit instruction in the system prompt for how to handle it:

chat_history.add_system_message(
    "You are a billing agent. "
    "If a Stripe call returns 'spend cap reached', stop all processing immediately "
    "and return a summary of what was charged before the cap was hit. "
    "Do not retry. Do not attempt alternative payment methods."
)

Without this instruction, some LLMs will interpret a 429 as a transient error and retry — which is exactly the wrong behavior when the 429 is a policy enforcement signal, not a rate limit.
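
Prompt instructions are advisory, so it can help to back them with a latch in the plugin that turns every call after the first policy 429 into a local no-op. The sketch below is a hypothetical helper, not part of SK or Keybrake.

```python
from typing import Optional


class SpendCapLatch:
    """Once tripped, short-circuits further calls without touching the network."""

    STOP_MESSAGE = "Spend cap reached. Charge blocked by policy. Stop processing."

    def __init__(self) -> None:
        self.tripped = False

    def check(self) -> Optional[str]:
        """Return the stop message if the cap was already hit, else None."""
        return self.STOP_MESSAGE if self.tripped else None

    def observe(self, status_code: int) -> Optional[str]:
        """Record a proxy response; trip the latch on a policy 429."""
        if status_code == 429:
            self.tripped = True
            return self.STOP_MESSAGE
        return None
```

Inside create_charge, call latch.check() before the HTTP request and return its message if one comes back, then pass resp.status_code to latch.observe() afterwards. A model that ignores the instruction and retries gets the same stop message each time, with no further traffic to the proxy.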

Comparison: raw key vs restricted key vs vault key

Control dimension	Raw secret key (`sk_live_…`)	Restricted key (`rk_live_…`)	Vault key via Keybrake
Endpoint scope	Full Stripe API access	Locked to selected resource types	Allowlist of specific operations
Per-session spend cap	None	None	Yes — hard cap, returns 429 when hit
Per-invocation revoke	Rotates all agents globally	Rotates all agents globally	Revoke single vault key; others unaffected
Audit trail	Stripe dashboard (request level)	Stripe dashboard (request level)	Proxy audit log with vault_key_id, invocation, timestamp
Key expiry	Manual rotation only	Manual rotation only	Auto-expire after TTL (e.g. 1 hour per invocation)
ChatHistory data risk	Full charge IDs persisted in history	Full charge IDs persisted in history	Charge IDs still persist, but any vault key that reaches history is revoked after the session; no live credential leaks

Enforcement tests

Test that your governance layer works correctly before running against production:

import pytest
import httpx
from unittest.mock import AsyncMock, patch

@pytest.fixture
def vault_key():
    return "vk_test_abc123"

@pytest.mark.asyncio
async def test_create_charge_routes_through_proxy(vault_key):
    with patch("httpx.AsyncClient.post") as mock_post:
        # A real httpx.Response keeps raise_for_status() and json() synchronous
        mock_post.return_value = httpx.Response(
            200,
            json={"id": "ch_test_123", "amount": 1999},
            request=httpx.Request("POST", "https://proxy.keybrake.com/stripe/v1/charges"),
        )
        plugin = StripeVaultPlugin(vault_key=vault_key)
        result = await plugin.create_charge("cus_test", 1999, "Test charge")

        # base_url and the auth header live on the client, not in the post() call args
        assert "proxy.keybrake.com" in str(plugin._client.base_url)
        auth = plugin._client.headers["authorization"]
        # Confirm vault key (not live key) was in headers
        assert auth == f"Bearer {vault_key}"
        # Confirm live key was NOT in headers
        assert "sk_live" not in auth
        # Confirm the call targeted the charges endpoint
        assert mock_post.call_args.args[0] == "/v1/charges"
        assert "Charged 19.99 USD" in result

@pytest.mark.asyncio
async def test_spend_cap_triggers_stop(vault_key):
    with patch("httpx.AsyncClient.post") as mock_post:
        mock_post.return_value = AsyncMock(
            status_code=429,
            json=lambda: {"error": "spend cap reached", "cap_usd": 500.0, "spent_usd": 500.0},
        )
        plugin = StripeVaultPlugin(vault_key=vault_key)
        result = await plugin.create_charge("cus_test", 9999, "Over-cap charge")

        assert "spend cap reached" in result.lower()
        assert "Stop processing" in result

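@pytest.mark.asyncio
async def test_disallowed_operation_blocked(vault_key):
    # StripeVaultPlugin has no refund method, so simulate the proxy rejecting a call
    # that is outside the allowlist and check that the 403 surfaces as an exception
    with patch("httpx.AsyncClient.post") as mock_post:
        mock_post.return_value = httpx.Response(
            403,
            json={"error": "operation charges.create not in allowlist"},
            request=httpx.Request("POST", "https://proxy.keybrake.com/stripe/v1/charges"),
        )
        plugin = StripeVaultPlugin(vault_key=vault_key)
        with pytest.raises(httpx.HTTPStatusError):
            await plugin.create_charge("cus_test", 1999, "Blocked charge")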

@pytest.mark.asyncio
async def test_vault_key_not_live_key(vault_key):
    """Ensure the plugin never sends a live key to any endpoint."""
    calls = []

    async def capturing_post(self, url, **kwargs):
        calls.append({"url": url, "headers": dict(self.headers)})
        # Return a real response; calling an AsyncMock here would hand back a coroutine
        return httpx.Response(
            200,
            json={"id": "ch_ok", "amount": 100},
            request=httpx.Request("POST", "https://proxy.keybrake.com/stripe/v1/charges"),
        )

    with patch.object(httpx.AsyncClient, "post", capturing_post):
        plugin = StripeVaultPlugin(vault_key=vault_key)
        await plugin.create_charge("cus_test", 100, "test")

    for call in calls:
        auth = call["headers"].get("authorization", "")
        assert "sk_live" not in auth, "Live Stripe key found in request headers"
        assert "rk_live" not in auth, "Restricted key found in request headers (use vault key)"

Gap analysis

The vault key pattern closes the most critical gaps, but a few remain worth noting:

No per-plugin-function cap. The vault key's daily cap is aggregate across every operation the agent performs, so you can't cap charges separately from customer lookups on one key. Per-operation caps require a separate vault key per function type.
LLM ignores spend cap signal. If you don't add explicit instruction about what to do on a 429, some LLMs treat it as a transient error and retry. The proxy blocks the retries, but the agent may keep calling until its planner loop terminates from context length, not from a clean stop signal.
Template injection of the vault key is unsafe. If you inject the vault key via a Handlebars template variable rather than a constructor or KernelArguments, the key ends up in the rendered prompt and therefore in ChatHistory. Use constructor injection or KernelArguments, not template variable injection.
Azure OpenAI token limits affect loop termination. SK with Azure OpenAI uses the same token limit as the underlying model. A very long batch run can exhaust the context window without ever hitting the spend cap, leaving the billing state partially processed. Design for idempotent charges (unique idempotency key per customer per run) to handle partial completion safely.
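
The idempotency advice in the last item can be sketched as a deterministic key derived from the run and the customer. This is a minimal illustration: the function name and key format are hypothetical, and it assumes the proxy forwards the Idempotency-Key header to Stripe unchanged.

```python
import hashlib


def idempotency_key(run_id: str, customer_id: str, amount_cents: int) -> str:
    """Derive a stable key so a retried charge in the same run is deduplicated."""
    raw = f"{run_id}:{customer_id}:{amount_cents}".encode("utf-8")
    return "billing-" + hashlib.sha256(raw).hexdigest()[:32]
```

Send the value as an Idempotency-Key header on the POST to /v1/charges. Re-running a partially completed batch with the same run ID then repeats the same keys, so customers already charged in that run are not charged twice.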

FAQ

Does this work with Semantic Kernel's Process Framework (SK Process)?

Yes. SK Process steps are just function calls under the hood. You can pass the vault key as a process variable injected at step creation time, using the same constructor injection pattern shown above. Each process gets its own vault key with its own TTL.

What happens if proxy.keybrake.com is unavailable?

The httpx.AsyncClient will raise a connection error, which propagates as a tool call failure. The LLM may interpret this as a transient error and retry. Design your system prompt to tell the agent to stop and report the failure rather than retry or find an alternative path to Stripe.

Can I use this with the Handlebars planner?

Yes. The Handlebars planner invokes registered plugin functions the same way as FunctionChoiceBehavior.Auto(). The vault key enforcement is at the HTTP layer — the proxy doesn't know or care whether the call came from a Handlebars plan, an auto-invoke loop, or a manual kernel invocation.

How do I provision vault keys in production without an admin key in each agent?

Use a lightweight provisioning service that holds the admin key and issues vault keys on request from authenticated agents. Each agent's container has only its issued vault key, not the admin key. The provisioning service can enforce per-agent quotas on how many vault keys it will issue per hour.
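
The per-agent quota could be as simple as a sliding-window counter inside the provisioning service. The class below is a hypothetical sketch of that idea; it keeps state in memory, so a multi-instance service would need a shared store instead.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class IssuanceQuota:
    """Sliding-window limit on vault keys issued per agent (hypothetical helper)."""

    def __init__(self, max_per_window: int, window_seconds: float = 3600.0):
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self._issued = defaultdict(deque)  # agent_id -> timestamps of recent issuances

    def allow(self, agent_id: str, now: Optional[float] = None) -> bool:
        """Return True and record the issuance if the agent is under quota."""
        now = time.monotonic() if now is None else now
        recent = self._issued[agent_id]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_per_window:
            return False
        recent.append(now)
        return True
```

The service would call allow(agent_id) before forwarding a provisioning request with the admin key, and reject the request when it returns False.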

Does Semantic Kernel's function calling work differently on Azure OpenAI vs OpenAI?

The function calling behavior is the same — SK abstracts the connector. The governance pattern works identically regardless of which backend model you're using. The differences are in token limits, rate limits, and regional availability — none of which affect the proxy routing pattern.

Do I need a separate Keybrake vault key for each SK plugin method?

No. One vault key per agent invocation is sufficient. The vault key's policy specifies which operations are allowed (e.g. ["charges.create", "customers.list"]). The proxy blocks any operation not in that list, regardless of which plugin method called it. You only need separate keys if you want a separate spend cap per operation.

Add spend caps to your SK Stripe plugin

Keybrake is a scoped API-key proxy for the SaaS APIs your agents call. Issue a vault key per SK invocation, enforce a daily spend cap, and get a full audit log — without touching your Stripe setup.