Azure AI Agent Service Stripe Integration: Restricted API Keys, Spend Caps, and Agent Governance

By Keybrake · June 15, 2026 · 9 min read

Microsoft's Azure AI Agent Service and the azure-ai-projects SDK make it straightforward to build Stripe-capable agents using tool calls over Azure-hosted models. The risk surfaces in three specific places: azure-core's built-in retry policy re-executing tool call handlers on transient errors, the Agent Service marking failed tool runs as retriable and re-submitting identical tool call arguments to a new run, and persistent Thread history causing the agent to replay completed billing operations on ambiguous follow-up messages.

This post covers all three failure modes specific to the Azure AI Agent Service architecture and shows the two-layer governance pattern that closes each one: a restricted Stripe API key as a first layer, and per-run vault keys via a spend-cap proxy as a second.

The standard Azure AI Agent Service Stripe pattern

The baseline pattern for a Stripe-capable agent using Azure AI Agent Service defines a function tool, creates a thread, and uses create_and_process_run() to let the service orchestrate multi-step tool execution:

import json, os, stripe
from azure.ai.projects import AIProjectClient
from azure.ai.projects.models import FunctionTool, ToolSet
from azure.identity import DefaultAzureCredential

stripe.api_key = os.environ["STRIPE_SECRET_KEY"]  # sk_live_...

def charge_stripe(customer_id: str, amount_cents: int, billing_period: str) -> str:
    """Charge a Stripe customer for a billing period."""
    charge = stripe.Charge.create(
        amount=amount_cents,
        currency="usd",
        customer=customer_id,
        description=f"Subscription {billing_period}",
    )
    return json.dumps({"charge_id": charge.id, "status": charge.status})

client = AIProjectClient.from_connection_string(
    conn_str=os.environ["AIPROJECT_CONNECTION_STRING"],
    credential=DefaultAzureCredential(),
)

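# In the azure-ai-projects preview SDK, a ToolSet is built with add() and
# passed as toolset= so create_and_process_run() can execute the function.
toolset = ToolSet()
toolset.add(FunctionTool({charge_stripe}))

agent = client.agents.create_agent(
    model="gpt-4o",
    name="billing-agent",
    instructions="You are a billing agent. Charge customers for their subscription plans.",
    toolset=toolset,
)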

thread = client.agents.create_thread()
client.agents.create_message(
    thread_id=thread.id,
    role="user",
    content="Charge customer cus_Abc123 $29 for the June plan",
)

run = client.agents.create_and_process_run(
    thread_id=thread.id,
    agent_id=agent.id
)
print(f"Run status: {run.status}")

This works correctly in the happy path. Three distinct failure modes emerge when azure-core retries the inference call, when the Agent Service retries a failed run, or when a resumed Thread contains completed billing history.

Failure mode 1: azure-core retry re-executes the tool call handler

The azure-ai-inference package uses azure-core for HTTP transport, which applies a default retry policy on transient errors (for example 429 Too Many Requests and 503 Service Unavailable). When you use ChatCompletionsClient directly, the lower-level inference client under Azure AI Foundry, and add application-level retry logic around the agent turn, you get two retry layers: the SDK-level azure-core retry and your own. azure-core only replays the HTTP request to the model. The application-level retry is the dangerous one when its scope also covers the tool handler: if the model returns a tool_calls response, the handler fires the Stripe charge, and an error raised after Stripe has processed it (such as a response lost in transit) re-runs the turn. The handler then fires a second Stripe charge with no idempotency key:

import json, os, stripe
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import (
    UserMessage, ChatCompletionsFunctionToolDefinition, FunctionDefinition
)
from azure.core.credentials import AzureKeyCredential
from tenacity import retry, stop_after_attempt, wait_exponential

stripe.api_key = os.environ["STRIPE_SECRET_KEY"]

inference_client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_AI_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_AI_KEY"]),
)

charge_tool = ChatCompletionsFunctionToolDefinition(
    function=FunctionDefinition(
        name="charge_stripe",
        description="Charge a Stripe customer.",
        parameters={
            "type": "object",
            "properties": {
                "customer_id":    {"type": "string"},
                "amount_cents":   {"type": "integer"},
                "billing_period": {"type": "string"},
            },
            "required": ["customer_id", "amount_cents", "billing_period"],
        },
    )
)

# Application-level retry on top of azure-core's built-in retry.
# The retry scope covers the model call AND the tool handler, so any
# exception raised after the charge re-runs the whole turn.
@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=2, max=10))
def run_billing_turn(messages):
    response = inference_client.complete(
        messages=messages,
        tools=[charge_tool],
        model="gpt-4o",
    )
    if response.choices[0].finish_reason != "tool_calls":
        return response
    for tc in response.choices[0].message.tool_calls:
        if tc.function.name == "charge_stripe":
            args = json.loads(tc.function.arguments)
            # No idempotency key: if this call raises after Stripe has
            # processed it, the retried turn creates a second charge
            stripe.Charge.create(
                amount=args["amount_cents"],
                currency="usd",
                customer=args["customer_id"],
                description=f"Subscription {args['billing_period']}",
            )
    return response

response = run_billing_turn([UserMessage("Charge cus_Abc123 $29 for June")])

What breaks: azure-core retries the complete() HTTP call on 429 or 503 errors automatically, which is harmless on its own because no tool has run yet. The tenacity retry is different: it wraps the whole turn, a common pattern for production reliability. If Stripe processes the charge but the response is lost in transit, stripe.Charge.create() raises a connection error and tenacity re-runs the turn. The new complete() call produces an equivalent tool_calls response (same customer, same amount, same billing period), and the handler fires stripe.Charge.create() again with no idempotency key. Stripe creates a second charge. The customer is double-billed.

The fix: compute a content-hash idempotency key from the tool call arguments before the Stripe call. Because the arguments are identical across retries, the key is identical, and Stripe returns the original charge object for every repeat of that key instead of creating a new charge. Stripe retains idempotency keys for at least 24 hours, which covers typical retry windows:

import hashlib, json, os, stripe
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage, ChatCompletionsFunctionToolDefinition, FunctionDefinition
from azure.core.credentials import AzureKeyCredential

PROXY_URL = "https://proxy.keybrake.com"
VAULT_KEY  = os.environ["KEYBRAKE_VAULT_KEY_BILLING"]  # POST /v1/charges only

inference_client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_AI_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_AI_KEY"]),
)

def handle_tool_call(tc_name: str, args: dict) -> dict:
    if tc_name != "charge_stripe":
        return {"error": "unknown_tool"}

    idempotency_key = hashlib.sha256(
        f"{args['customer_id']}:{args['amount_cents']}:{args['billing_period']}".encode()
    ).hexdigest()[:32]

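    # StripeClient is pointed at another host via base_addresses
    client = stripe.StripeClient(api_key=VAULT_KEY, base_addresses={"api": PROXY_URL + "/stripe"})
    try:
        charge = client.charges.create(
            params={
                "amount":      args["amount_cents"],
                "currency":    "usd",
                "customer":    args["customer_id"],
                "description": f"Subscription {args['billing_period']}",
            },
            # The idempotency key is a request option, not a charge parameter
            options={"idempotency_key": idempotency_key},
        )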
        return {"charge_id": charge.id, "status": charge.status}
    except stripe.StripeError as e:
        return {"error": str(e), "idempotency_key": idempotency_key}

What this fixes: The idempotency key is computed from (customer_id, amount_cents, billing_period) — the same arguments the model emits for the same billing operation regardless of how many times the handler runs. Whether the retry fires once or three times, Stripe deduplicates all requests with the same key and returns the original charge object. The customer is charged exactly once. The vault key also routes the call through the spend-cap proxy, enforcing the daily cap and writing an audit log entry.
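The key is only stable if the arguments are byte-identical on every attempt, and a model may emit "June" on one turn and "2026-06" on the next. The sketch below shows one way to canonicalize the inputs before hashing. The helper names normalize_period and make_idempotency_key, the accepted input formats, and the default_year parameter are all assumptions made for illustration, not part of the original handler.

```python
import hashlib
import re

_MONTHS = {
    "january": 1, "february": 2, "march": 3, "april": 4,
    "may": 5, "june": 6, "july": 7, "august": 8,
    "september": 9, "october": 10, "november": 11, "december": 12,
}

def normalize_period(billing_period: str, default_year: int) -> str:
    """Map 'June', 'june 2026', or '2026-6' to the canonical form '2026-06'."""
    text = billing_period.strip().lower()
    match = re.fullmatch(r"(\d{4})-(\d{1,2})", text)
    if match:
        year, month = int(match.group(1)), int(match.group(2))
    else:
        parts = text.split()
        if not parts or parts[0] not in _MONTHS:
            raise ValueError(f"unrecognized billing period: {billing_period!r}")
        month = _MONTHS[parts[0]]
        # A bare month name falls back to the caller-supplied year (assumption)
        year = int(parts[1]) if len(parts) > 1 else default_year
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {billing_period!r}")
    return f"{year:04d}-{month:02d}"

def make_idempotency_key(customer_id: str, amount_cents: int,
                         billing_period: str, default_year: int) -> str:
    """Hash canonicalized arguments so equivalent requests share one key."""
    period = normalize_period(billing_period, default_year)
    raw = f"{customer_id.strip()}:{int(amount_cents)}:{period}"
    return hashlib.sha256(raw.encode()).hexdigest()[:32]
```

With this in place, make_idempotency_key("cus_Abc123", 2900, "June", 2026) and make_idempotency_key("cus_Abc123", 2900, "2026-06", 2026) resolve to the same key. Rejecting unrecognized periods with ValueError, rather than hashing them as-is, is a deliberate choice here: an unparseable period is safer surfaced to the model than billed under an unpredictable key.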

Failure mode 2: Azure AI Agent Service run retry fires a duplicate charge

Azure AI Agent Service's create_and_process_run() orchestrates multi-step tool execution: the service calls the model, receives tool calls, invokes your function tool implementations, and continues until the run completes. If a tool call fails — your function raises an exception, times out, or returns an error status — the service marks that run step as failed. When you retry the run (by creating a new run on the same thread), the agent replays the tool call from the thread's state. If the Stripe charge actually executed before the error was reported (e.g., the charge succeeded but the network call returning the result timed out), the retry fires a duplicate charge:

import json, os, stripe
from azure.ai.projects import AIProjectClient
from azure.ai.projects.models import FunctionTool, ToolSet, RunStatus
from azure.identity import DefaultAzureCredential

stripe.api_key = os.environ["STRIPE_SECRET_KEY"]

def charge_stripe(customer_id: str, amount_cents: int, billing_period: str) -> str:
    # This Stripe call may succeed even when the function appears to fail
    # (e.g., stripe.Charge.create() completes, but the response processing raises)
    charge = stripe.Charge.create(
        amount=amount_cents,
        currency="usd",
        customer=customer_id,
        description=f"Subscription {billing_period}",
        # No idempotency_key — retry scenario creates second charge
    )
    # If an exception occurs here (after the charge is created),
    # the Agent Service sees the tool as failed
    result = {"charge_id": charge.id, "status": charge.status}
    return json.dumps(result)

client = AIProjectClient.from_connection_string(
    conn_str=os.environ["AIPROJECT_CONNECTION_STRING"],
    credential=DefaultAzureCredential(),
)
# ToolSet is built with add() and passed as toolset= in the preview SDK
toolset = ToolSet()
toolset.add(FunctionTool({charge_stripe}))
agent = client.agents.create_agent(
    model="gpt-4o",
    name="billing-agent",
    instructions="Charge customers for their subscriptions.",
    toolset=toolset,
)
thread = client.agents.create_thread()
client.agents.create_message(thread_id=thread.id, role="user",
    content="Charge cus_Abc123 $29 for June")

run = client.agents.create_and_process_run(thread_id=thread.id, agent_id=agent.id)

if run.status == RunStatus.FAILED:
    # Common pattern: retry the run on failure
    # The thread already contains the prior (failed) tool call attempt
    # The agent re-executes charge_stripe with the same arguments
    retry_run = client.agents.create_and_process_run(
        thread_id=thread.id, agent_id=agent.id
    )  # Second Stripe charge fires here

What breaks: When charge_stripe raises an exception after stripe.Charge.create() completes, the Agent Service records the tool call as failed. On the retry run, the agent re-calls charge_stripe with the same arguments — the same customer, same amount, same billing period — and the function fires another stripe.Charge.create() with no idempotency key. Stripe creates a second charge. The customer is billed twice for the same period. This also applies when the run fails for unrelated reasons (model error, quota exceeded) and you retry the entire run — the agent's state in the thread determines which tool calls it re-executes.

The fix is the same content-hash idempotency key pattern, applied inside the tool function itself so it's stable across any number of run retries:

import hashlib, json, os, stripe
from azure.ai.projects import AIProjectClient
from azure.ai.projects.models import FunctionTool, ToolSet
from azure.identity import DefaultAzureCredential

PROXY_URL = "https://proxy.keybrake.com"
VAULT_KEY  = os.environ["KEYBRAKE_VAULT_KEY_BILLING"]

def charge_stripe(customer_id: str, amount_cents: int, billing_period: str) -> str:
    """Charge a Stripe customer. Safe to retry — idempotency key prevents duplicates."""
    idempotency_key = hashlib.sha256(
        f"{customer_id}:{amount_cents}:{billing_period}".encode()
    ).hexdigest()[:32]

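    # StripeClient is pointed at another host via base_addresses
    stripe_client = stripe.StripeClient(api_key=VAULT_KEY, base_addresses={"api": PROXY_URL + "/stripe"})
    try:
        charge = stripe_client.charges.create(
            params={
                "amount":      amount_cents,
                "currency":    "usd",
                "customer":    customer_id,
                "description": f"Subscription {billing_period}",
            },
            # The idempotency key is a request option, not a charge parameter
            options={"idempotency_key": idempotency_key},
        )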
        return json.dumps({"charge_id": charge.id, "status": charge.status})
    except stripe.StripeError as e:
        # Return error as a JSON string — Agent Service will surface this to the model
        return json.dumps({"error": str(e), "idempotency_key": idempotency_key})

client = AIProjectClient.from_connection_string(
    conn_str=os.environ["AIPROJECT_CONNECTION_STRING"],
    credential=DefaultAzureCredential(),
)
# ToolSet is built with add() and passed as toolset= in the preview SDK
toolset = ToolSet()
toolset.add(FunctionTool({charge_stripe}))
agent = client.agents.create_agent(
    model="gpt-4o",
    name="billing-agent",
    instructions="Charge customers using charge_stripe. The function handles idempotency.",
    toolset=toolset,
)

What this fixes: The idempotency key is computed inside charge_stripe from the function arguments the Agent Service passes. Regardless of how many run retries fire — two, five, or ten — all calls for the same (customer_id, amount_cents, billing_period) produce the same idempotency key. Stripe deduplicates them and returns the original charge. Returning the StripeError as a JSON string (rather than re-raising) lets the Agent Service surface the error to the model for graceful handling, rather than marking the tool call as a platform-level failure that triggers another retry cycle.
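To make the deduplication concrete, here is a minimal simulation, assuming an in-memory stand-in for Stripe. FakeStripe, charge_with_key, and simulate_run_retries are hypothetical names invented for this sketch, and FakeStripe only imitates the one behavior that matters here (repeat keys return the stored charge). It is not a model of Stripe's full idempotency semantics.

```python
import hashlib

class FakeStripe:
    """In-memory stand-in that imitates idempotency-key deduplication."""

    def __init__(self):
        self.charges = {}  # idempotency_key -> charge dict
        self.calls = 0     # how many times the tool reached "Stripe"

    def create_charge(self, customer: str, amount: int, idempotency_key: str) -> dict:
        self.calls += 1
        if idempotency_key not in self.charges:
            # First request with this key creates the charge
            self.charges[idempotency_key] = {
                "id": f"ch_{len(self.charges) + 1}",
                "customer": customer,
                "amount": amount,
            }
        # Every repeat of the key gets the stored charge back
        return self.charges[idempotency_key]

def charge_with_key(fake: FakeStripe, customer_id: str,
                    amount_cents: int, billing_period: str) -> dict:
    """Same key derivation as the tool function: hash of the three arguments."""
    key = hashlib.sha256(
        f"{customer_id}:{amount_cents}:{billing_period}".encode()
    ).hexdigest()[:32]
    return fake.create_charge(customer_id, amount_cents, key)

def simulate_run_retries(fake: FakeStripe, attempts: int) -> list:
    """Call the tool once per simulated run retry and collect the charge IDs."""
    return [
        charge_with_key(fake, "cus_Abc123", 2900, "2026-06")["id"]
        for _ in range(attempts)
    ]
```

Running simulate_run_retries(FakeStripe(), 5) reaches the fake five times but leaves a single entry in charges, which is the property the tool function relies on across run retries.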

Failure mode 3: Thread history replays completed billing

Azure AI Agent Service persists conversation state in a Thread, analogous to OpenAI Assistants' threads. A Thread accumulates the full history: user messages, assistant messages, and the tool calls and tool outputs recorded for each run. When a new run is created on an existing thread, the agent has access to the entire history. On ambiguous follow-up messages, the agent may interpret prior completed billing operations as incomplete and re-execute the charge_stripe tool:

import json, os, stripe
from azure.ai.projects import AIProjectClient
from azure.ai.projects.models import FunctionTool, ToolSet
from azure.identity import DefaultAzureCredential

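stripe.api_key = os.environ["STRIPE_SECRET_KEY"]

def charge_stripe(customer_id: str, amount_cents: int, billing_period: str) -> str:
    """Unprotected charge function from failure mode 2 (no idempotency key)."""
    charge = stripe.Charge.create(
        amount=amount_cents, currency="usd", customer=customer_id,
        description=f"Subscription {billing_period}",
    )
    return json.dumps({"charge_id": charge.id, "status": charge.status})

client = AIProjectClient.from_connection_string(
    conn_str=os.environ["AIPROJECT_CONNECTION_STRING"],
    credential=DefaultAzureCredential(),
)
# ToolSet is built with add() and passed as toolset= in the preview SDK
toolset = ToolSet()
toolset.add(FunctionTool({charge_stripe}))
agent = client.agents.create_agent(
    model="gpt-4o",
    name="billing-agent",
    instructions="You are a billing agent.",
    toolset=toolset,
)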

# Existing thread from a prior session — contains completed billing history
thread_id = os.environ["EXISTING_THREAD_ID"]

# Run 1 (prior session): charged cus_Abc123 $29 for June.
# Thread now contains: UserMessage, AssistantMessage (tool_call), ToolResult (charge_ch_xxx)

# Run 2 (current session): ambiguous customer service message
client.agents.create_message(
    thread_id=thread_id,
    role="user",
    content="The customer says the June invoice hasn't arrived — can you look into it?",
)
run = client.agents.create_and_process_run(thread_id=thread_id, agent_id=agent.id)
# Agent sees prior charge_stripe call and its result in thread history.
# "Look into it" is ambiguous — agent may re-call charge_stripe with same args.
# No idempotency key in charge_stripe — second charge fires.

What breaks: The Thread contains the full billing history, including the original charge_stripe tool call and its arguments. The follow-up asking the agent to look into the June invoice is ambiguous: it could mean "check the charge status" or "re-run the charge because maybe it didn't go through." The agent, seeing the prior tool call in context, may decide to re-execute it. Because the original charge_stripe has no idempotency key, Stripe creates a second charge for the same customer and billing period. The customer is billed twice for June.

Three controls work together: idempotency keys make the Stripe call safe regardless of re-execution; a separate read-only function for charge lookups steers the model away from re-billing; and a billing vault key that allows only POST /v1/charges combined with an audit vault key that allows only GET /v1/charges enforces the separation at the proxy layer:

import hashlib, json, os, stripe
from azure.ai.projects import AIProjectClient
from azure.ai.projects.models import FunctionTool, ToolSet
from azure.identity import DefaultAzureCredential

PROXY_URL    = "https://proxy.keybrake.com"
BILLING_KEY  = os.environ["KEYBRAKE_VAULT_KEY_BILLING"]  # POST /v1/charges only
AUDIT_KEY    = os.environ["KEYBRAKE_VAULT_KEY_AUDIT"]    # GET /v1/charges only

def charge_stripe(customer_id: str, amount_cents: int, billing_period: str) -> str:
    """Bill a customer. Requires explicit billing_period. Never re-use for lookups."""
    idempotency_key = hashlib.sha256(
        f"{customer_id}:{amount_cents}:{billing_period}".encode()
    ).hexdigest()[:32]
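    # StripeClient is pointed at another host via base_addresses
    stripe_client = stripe.StripeClient(api_key=BILLING_KEY, base_addresses={"api": PROXY_URL + "/stripe"})
    try:
        charge = stripe_client.charges.create(
            params={
                "amount":      amount_cents,
                "currency":    "usd",
                "customer":    customer_id,
                "description": f"Subscription {billing_period}",
            },
            # The idempotency key is a request option, not a charge parameter
            options={"idempotency_key": idempotency_key},
        )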
        return json.dumps({"charge_id": charge.id, "status": charge.status})
    except stripe.StripeError as e:
        return json.dumps({"error": str(e)})

def get_charge_status(charge_id: str) -> str:
    """Read-only: look up an existing charge by ID. Use for customer service lookups."""
    audit_client = stripe.StripeClient(api_key=AUDIT_KEY, base_addresses={"api": PROXY_URL + "/stripe"})
    try:
        charge = audit_client.charges.retrieve(charge_id)
        return json.dumps({
            "charge_id":    charge.id,
            "status":       charge.status,
            "amount":       charge.amount,
            "description":  charge.description,
        })
    except stripe.StripeError as e:
        return json.dumps({"error": str(e)})

client = AIProjectClient.from_connection_string(
    conn_str=os.environ["AIPROJECT_CONNECTION_STRING"],
    credential=DefaultAzureCredential(),
)
# One FunctionTool holds both functions; the ToolSet is passed as toolset=
toolset = ToolSet()
toolset.add(FunctionTool({charge_stripe, get_charge_status}))
agent = client.agents.create_agent(
    model="gpt-4o",
    name="billing-agent",
    instructions=(
        "Use charge_stripe only when explicitly instructed to bill a customer for a "
        "specific billing_period that is confirmed in the current message. "
        "Use get_charge_status to look up existing charges from conversation history. "
        "Never re-execute charge_stripe for a billing_period already present in thread history "
        "unless the current message contains an explicit instruction to re-bill."
    ),
    toolset=toolset,
)

What this fixes: When a customer service message arrives, the agent has get_charge_status available (read-only, backed by the audit vault key) and no longer needs charge_stripe to answer a lookup. The audit vault key cannot create charges: any POST sent with it is rejected by the proxy with 403, so the lookup path can never bill. If the model still calls charge_stripe, that call uses the billing vault key and the content-hash idempotency key collapses the duplicate attempt into the original charge at Stripe. The system instruction requiring an explicit billing_period confirmation reduces the probability of history-triggered re-billing.
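The proxy-side allowlist can be pictured as a method-and-path check per vault key. The sketch below is an illustration only: VaultKeyPolicy, authorize, and the glob-style path patterns are hypothetical and say nothing about how the Keybrake proxy is actually implemented. The endpoint lists mirror the billing and audit roles described in this post.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

@dataclass(frozen=True)
class VaultKeyPolicy:
    """Hypothetical policy record: a name plus allowed (METHOD, path pattern) pairs."""
    name: str
    allowed: tuple

BILLING_POLICY = VaultKeyPolicy("billing", (("POST", "/stripe/v1/charges"),))
AUDIT_POLICY = VaultKeyPolicy("audit", (("GET", "/stripe/v1/charges/*"),))

def authorize(policy: VaultKeyPolicy, method: str, path: str) -> int:
    """Return 200 when the request matches the allowlist, otherwise 403."""
    for allowed_method, pattern in policy.allowed:
        if method.upper() == allowed_method and fnmatchcase(path, pattern):
            return 200
    return 403
```

Under these assumed policies, authorize(AUDIT_POLICY, "POST", "/stripe/v1/charges") yields 403 while authorize(AUDIT_POLICY, "GET", "/stripe/v1/charges/ch_123") yields 200, which is the separation the two-key setup depends on.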

One-line proxy override

The Keybrake proxy is compatible with the Stripe Python SDK's StripeClient interface. Switching from a direct Stripe call to the proxy requires one line change in your tool function:

# Before — direct to Stripe
stripe.api_key = os.environ["STRIPE_SECRET_KEY"]
charge = stripe.Charge.create(amount=2900, currency="usd", customer="cus_Abc123")

# After — routes through Keybrake proxy, enforces spend cap, writes audit log
from stripe import StripeClient
stripe_client = StripeClient(
    api_key=os.environ["KEYBRAKE_VAULT_KEY"],
    # StripeClient takes the alternate host via base_addresses
    base_addresses={"api": "https://proxy.keybrake.com/stripe"},
)
charge = stripe_client.charges.create(
    params={"amount": 2900, "currency": "usd", "customer": "cus_Abc123"}
)

No changes to the Azure AI Agent Service or azure-ai-projects configuration are needed — the function tool definition, agent setup, thread management, and run orchestration are unchanged. Only the Stripe call inside the tool function routes through the proxy.

Comparison: raw key vs restricted key vs vault key

Property	Raw `sk_live_` key	Restricted Stripe key	Vault key (Keybrake proxy)
Endpoint allowlist	No — full API access	Partial — Stripe-enforced resource set	Yes — per-role allowlist, proxy-enforced
Daily spend cap	No	No	Yes — configurable per vault key
Per-agent isolation	No — all Azure agents share one key	No — still shared across agents	Yes — one vault key per agent or role
Run retry guard	No — retry re-executes tool, duplicate charge	No — application-level problem	Yes — with idempotency key inside tool function
Thread history guard	No — history can re-trigger billing	No — application-level problem	Partial — read-only audit key prevents re-billing on lookup calls
Audit trail	Stripe Dashboard only	Stripe Dashboard only	Yes — per-call log at proxy layer, queryable
Kill switch	Rotate key (affects all agents)	Rotate key (affects all agents)	Revoke vault key (scoped to one agent or role)

pytest enforcement suite

import pytest, hashlib, json, os, stripe

PROXY_URL   = "https://proxy.keybrake.com"
BILLING_KEY = os.environ.get("KEYBRAKE_VAULT_KEY_BILLING", "")
AUDIT_KEY   = os.environ.get("KEYBRAKE_VAULT_KEY_AUDIT", "")

def test_billing_vault_key_not_live():
    """Billing vault key must never be a raw Stripe live key."""
    assert not BILLING_KEY.startswith("sk_live_"), (
        "KEYBRAKE_VAULT_KEY_BILLING must be a vault key, not sk_live_"
    )

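def test_stripe_client_uses_proxy():
    """StripeClient must route through the Keybrake proxy, not api.stripe.com."""
    class _CapturingClient(stripe.HTTPClient):
        # Stub transport: records each URL instead of sending the request
        name = "capturing-stub"

        def __init__(self):
            super().__init__()
            self.urls = []

        def request(self, method, url, headers, post_data=None, **kwargs):
            self.urls.append(url)
            return '{"id": "ch_stub", "object": "charge"}', 200, {}

    http = _CapturingClient()
    client = stripe.StripeClient(
        api_key=BILLING_KEY or "vault_key_placeholder",
        base_addresses={"api": PROXY_URL + "/stripe"},
        http_client=http,
    )
    client.charges.retrieve("ch_stub")
    assert http.urls and all(u.startswith(PROXY_URL + "/stripe/") for u in http.urls), (
        "StripeClient requests must go to proxy.keybrake.com"
    )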

def test_idempotency_key_is_deterministic():
    """Same (customer, amount, period) must produce the same idempotency key."""
    def make_key(customer_id, amount_cents, billing_period):
        return hashlib.sha256(
            f"{customer_id}:{amount_cents}:{billing_period}".encode()
        ).hexdigest()[:32]

    key1 = make_key("cus_Abc123", 2900, "2026-06")
    key2 = make_key("cus_Abc123", 2900, "2026-06")
    assert key1 == key2, "Idempotency key must be deterministic for same inputs"

def test_different_customers_get_different_keys():
    """Different customers in the same billing period must get different keys."""
    def make_key(customer_id, amount_cents, billing_period):
        return hashlib.sha256(
            f"{customer_id}:{amount_cents}:{billing_period}".encode()
        ).hexdigest()[:32]

    key_abc = make_key("cus_Abc123", 2900, "2026-06")
    key_xyz = make_key("cus_Xyz789", 2900, "2026-06")
    assert key_abc != key_xyz, "Different customers must produce different idempotency keys"

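def test_audit_key_cannot_create_charges():
    """Audit vault key must be rejected for POST /v1/charges: a 403 surfaces as PermissionError."""
    class _ForbiddenClient(stripe.HTTPClient):
        # Stub transport standing in for the proxy's 403 response.
        # stripe-python uses requests by default, so patching httpx has no effect.
        name = "forbidden-stub"

        def request(self, method, url, headers, post_data=None, **kwargs):
            body = json.dumps({"error": {
                "type": "invalid_request_error",
                "message": "vault_key_not_authorized",
            }})
            return body, 403, {}

    client = stripe.StripeClient(
        api_key=AUDIT_KEY or "vault_key_placeholder",
        base_addresses={"api": PROXY_URL + "/stripe"},
        http_client=_ForbiddenClient(),
    )
    with pytest.raises(stripe.PermissionError):
        client.charges.create(
            params={"amount": 2900, "currency": "usd", "customer": "cus_test"}
        )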

Gap analysis

Double retry layer: azure-core + application retry. The azure-ai-inference SDK applies azure-core's default retry policy on transient errors such as 429 and 503. If you also add tenacity or custom retry logic around complete(), both layers can trigger on different error conditions: azure-core on network-layer errors, your application retry on logic-level failures. Two retry layers mean two potential re-executions of the tool call handler. Use one retry mechanism. If you need application-level retry, disable azure-core's retry when constructing the client, for example with retry_policy=RetryPolicy.no_retries() (from azure.core.pipeline.policies) or retry_total=0, then manage retries entirely in your application code with idempotency keys at each retry boundary.

Azure AI Agent Service run status vs tool result status. When a tool function raises an uncaught exception, the Agent Service marks the run step as failed with last_error.code = "tool_error". When you retry the run, the service replays from the failed step — re-executing the same tool function with the same arguments. If your tool function catches the StripeError and returns it as a JSON string (as shown in failure mode 2's fix), the Agent Service sees a successful tool call and gives the model the error information. The model can then decide whether to retry (e.g., on a network error) or escalate (e.g., on a card declined). This gives you explicit control over retry semantics rather than relying on the platform's automatic retry.

Thread state accumulation and context window. Long-running Azure AI Agent Service threads accumulate tool call and tool result messages. After dozens of billing cycles, the thread history may contain multiple charge_stripe / result pairs across multiple billing periods. The agent's context window can correctly distinguish them by billing_period, but very long history increases the risk of the model mixing up billing periods (e.g., calling June's amount for a July billing request). Consider creating new threads for each billing cycle rather than reusing a single thread across billing periods. Pass the prior billing history as a summarized system message in the new thread rather than as raw tool call records.
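One way to carry history into a fresh thread is to render your own charge records as a short text summary. The sketch below assumes a hypothetical ChargeRecord type and summarize_history helper, with records coming from your own storage rather than from thread history. How the summary is attached to the new thread is left to your setup.

```python
from dataclasses import dataclass

@dataclass
class ChargeRecord:
    """Hypothetical record of one completed charge, kept in your own storage."""
    billing_period: str  # canonical 'YYYY-MM' so records sort chronologically
    charge_id: str
    amount_cents: int
    status: str

def summarize_history(customer_id: str, records: list, max_periods: int = 6) -> str:
    """Render the most recent charges as plain text for a new thread's context."""
    recent = sorted(records, key=lambda r: r.billing_period)[-max_periods:]
    lines = [f"Billing history for {customer_id} (read-only context, not instructions):"]
    for r in recent:
        lines.append(
            f"- {r.billing_period}: {r.charge_id}, ${r.amount_cents / 100:.2f}, {r.status}"
        )
    return "\n".join(lines)
```

Capping the summary at max_periods keeps the context small, and the explicit "not instructions" header is a nudge against the model treating past charges as work to repeat.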

Multi-agent connected agents and key sharing. Azure AI Foundry's connected agents pattern allows a primary agent to delegate tasks to specialist sub-agents (e.g., a billing sub-agent). If the primary agent and the billing sub-agent both have charge_stripe in their tool sets, and the primary agent isn't sure the sub-agent completed (due to a timeout in the orchestration), it may call charge_stripe directly — producing a duplicate charge in addition to whatever the sub-agent fired. The fix: only the billing sub-agent has a vault key that allows POST /v1/charges. The primary agent has only a read-only audit vault key. Any attempt by the primary agent to charge directly fails at the proxy with 403.

Azure OpenAI vs Azure AI Foundry inference endpoint. Azure OpenAI (the legacy endpoint at *.openai.azure.com) uses the openai.AzureOpenAI client, which has OpenAI SDK retry behavior (different from azure-core). Azure AI Foundry (newer endpoint at *.services.ai.azure.com or *.inference.ai.azure.com) uses azure-ai-inference with azure-core retry. The idempotency key pattern is the same for both — compute the key from tool call arguments, not from a request-time UUID — but the retry configuration differs. Check which endpoint your deployment uses and configure retry at only one layer.

Frequently asked questions

Does the Azure AI Agent Service's run orchestration automatically deduplicate Stripe calls?

No. The Agent Service orchestrates which tool functions to call and passes the arguments, but it has no awareness of what happens inside your tool implementation. If your tool function fires stripe.Charge.create() and that call reaches Stripe, the Agent Service has no way to know the charge succeeded — it only sees the return value your function returns. Idempotency key management is entirely the responsibility of the tool function implementation.

When should I create a new thread vs reuse an existing thread?

Create a new thread per billing session or billing period. Threads are cheap to create (they're just state containers), and a fresh thread eliminates the risk of the agent misinterpreting prior billing history as instructions for the current run. Pass context between sessions via summarized system messages in the new thread rather than raw tool call history. If you need to look up prior charges in a new thread, use the get_charge_status read-only tool with the charge ID stored in your own database — not retrieved from thread history.
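Storing charge IDs in your own database can be as small as one table keyed by customer and billing period. The following is a minimal sketch using SQLite from the standard library. The table layout and the helper names open_ledger, record_charge, and find_charge_id are assumptions for illustration.

```python
import sqlite3

def open_ledger(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a ledger with one row per customer and billing period."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS charges (
               customer_id    TEXT NOT NULL,
               billing_period TEXT NOT NULL,
               charge_id      TEXT NOT NULL,
               PRIMARY KEY (customer_id, billing_period)
           )"""
    )
    return conn

def record_charge(conn: sqlite3.Connection, customer_id: str,
                  billing_period: str, charge_id: str) -> None:
    """Store the charge ID; a repeat for the same period keeps the first ID."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO charges VALUES (?, ?, ?)",
            (customer_id, billing_period, charge_id),
        )

def find_charge_id(conn: sqlite3.Connection, customer_id: str, billing_period: str):
    """Return the stored charge ID for this customer and period, or None."""
    row = conn.execute(
        "SELECT charge_id FROM charges WHERE customer_id = ? AND billing_period = ?",
        (customer_id, billing_period),
    ).fetchone()
    return row[0] if row else None
```

A lookup tool can then call find_charge_id first and pass the result to get_charge_status, so the charge ID never has to be recovered from thread history.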

How does the Azure AI Agent Service run retry differ from application-level retry?

The Agent Service retries at the run level — when a run fails, you create a new run on the same thread, and the agent replays from the point of failure using the thread state. Application-level retry (via tenacity or similar) retries the HTTP call to the inference endpoint, before any tool calls execute. The two layers are independent: the Agent Service doesn't know about your application-level retry, and your application retry doesn't know about the Agent Service's run state. Idempotency keys in the tool function protect against both: the key is derived from tool arguments, which are identical at both the run level and the inference level for the same billing operation.

Can I use the billing vault key for both charging and looking up charges?

You can, but you shouldn't. Using a single vault key for both POST (create charges) and GET (retrieve charges) means the audit trail can't distinguish between billing intent and status-check intent at the proxy layer. More importantly, any tool or agent that holds that key can create charges, including ones meant only for lookups. With separate vault keys (billing key for POST /v1/charges only, audit key for GET /v1/charges only), the lookup path cannot create a charge no matter what the model asks it to do, and the separation is enforced at the proxy layer rather than by the model's function choice.

Does this pattern work with Azure OpenAI's function calling interface?

Yes. Azure OpenAI's function calling (openai.AzureOpenAI client with tools parameter) uses the same JSON-based tool call format. The idempotency key logic is identical: compute from the arguments JSON in the tool_calls response and pass to stripe.StripeClient with base_url pointing at the proxy. The retry behavior differs (OpenAI SDK retry vs azure-core retry), but the protective pattern is the same.

What vault key policy should I configure for a billing agent?

For the billing vault key: allowed endpoints = POST /stripe/v1/charges only; daily USD cap = your max expected billing volume per agent (e.g., $10,000 for an agent processing up to 344 × $29 charges per day); expires_at = end of billing cycle. For the audit vault key: allowed endpoints = GET /stripe/v1/charges/* only; no spend cap needed (GET calls don't charge). Issue one billing vault key per billing agent instance — not one key per organization. If a billing agent misbehaves, revoke its key without affecting other agents or billing cycles.
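The cap arithmetic above is integer division: a $10,000 daily cap is 1,000,000 cents, and 1,000,000 // 2,900 = 344 full charges. The sketch below works through that number and shows a toy in-process cap tracker. DailySpendCap and max_charges_under_cap are hypothetical names, and the proxy's real enforcement logic is not described in this post.

```python
def max_charges_under_cap(daily_cap_cents: int, charge_cents: int) -> int:
    """How many full charges of a fixed size fit under the daily cap."""
    return daily_cap_cents // charge_cents

class DailySpendCap:
    """Toy tracker: approves a charge only if it keeps the total within the cap."""

    def __init__(self, cap_cents: int):
        self.cap_cents = cap_cents
        self.spent_cents = 0

    def try_spend(self, amount_cents: int) -> bool:
        if amount_cents <= 0:
            raise ValueError("amount_cents must be positive")
        if self.spent_cents + amount_cents > self.cap_cents:
            return False  # would exceed the cap, so reject without recording
        self.spent_cents += amount_cents
        return True
```

Feeding 400 attempts of $29 into DailySpendCap(1_000_000) approves the first 344 and rejects the rest, leaving $9,976.00 spent against the $10,000 cap.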

Vault keys for Azure AI Agent Service Stripe workflows

Keybrake issues scoped vault keys for Stripe: per-agent endpoint allowlists, daily spend caps, and a per-call audit log. Adopting it is a one-line change in your tool function, from stripe.api_key to stripe.StripeClient(api_key=VAULT_KEY, base_addresses={"api": PROXY_URL + "/stripe"}). The proxy is live now.