LangSmith Stripe Tracing: Close the Observability Gap for AI Agent Payments
LangSmith gives you full visibility into your LLM calls — tokens, latency, reasoning chains, tool invocations. But the moment your agent charges Stripe, LangSmith goes blind. It records that the tool was called, not whether the charge succeeded, what the charge ID was, or how much money moved. This post covers the gap and how to close it without forking your code.
What LangSmith actually traces
LangSmith is LangChain's hosted tracing and evaluation platform. When you set LANGCHAIN_TRACING_V2=true and a LangSmith API key in your environment, every LLM call in your chain is traced automatically:
- Model name, input messages, output tokens
- Latency at each step of a chain or agent
- Tool definitions and the arguments the model chose to pass
- Errors and retries at the LLM layer
- Total token cost (via model pricing tables LangSmith maintains)
This is genuinely useful. If your agent makes five LLM calls before deciding to trigger a Stripe charge, you can see each reasoning step, the token cost of the planning phase, and exactly which tool arguments the model produced.
What LangSmith does not trace is what happens after the tool arguments are handed to your tool function. From LangSmith's perspective, a tool call is a black box: arguments in, result string out.
The observability gap: tool execution is a black box
Consider a simple LangChain billing agent. The agent decides to charge a customer $49.00 and calls a charge_customer tool. In LangSmith, you see something like:
```
Tool: charge_customer
Input: {"customer_id": "cus_abc123", "amount_cents": 4900, "currency": "usd"}
Output: "Charge created: ch_xyz789"
Latency: 310ms
```
That looks complete. But here's what LangSmith didn't capture:
- Which Stripe key was used — the restricted key, the full secret, or the wrong env var because the agent ran in the wrong environment?
- The actual HTTP response: was the charge status succeeded or pending? Did Stripe return a card_error that your tool code swallowed?
- The real cost: LangSmith tracks LLM token cost. It has no knowledge that your agent just moved $49 in Stripe.
- Daily spend accumulation — if the agent runs 50 times today, LangSmith shows you 50 tool calls. It does not tell you the agent charged $2,450 to real cards.
- Rate-limit headers: Stripe 429s that your retry logic absorbs silently show up in LangSmith only as slightly higher latency.
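The accumulation problem is easy to quantify once the tool inputs are out of LangSmith. As a minimal sketch, assuming you have exported each charge_customer input as the JSON string shown above, the hypothetical helper total_moved_cents (not a LangSmith API) sums what the agent requested per currency:

```python
import json
from collections import defaultdict


def total_moved_cents(tool_inputs: list[str]) -> dict[str, int]:
    """Sum amount_cents per currency across exported charge_customer inputs.

    Each element is the JSON argument string the model passed to the tool.
    This counts what the agent *asked* to charge, not what Stripe confirmed.
    """
    totals: dict[str, int] = defaultdict(int)
    for raw in tool_inputs:
        args = json.loads(raw)
        currency = args.get("currency", "usd").lower()
        totals[currency] += int(args["amount_cents"])
    return dict(totals)


# 50 runs of the $49.00 charge from the example above.
runs = ['{"customer_id": "cus_abc123", "amount_cents": 4900, "currency": "usd"}'] * 50
totals = total_moved_cents(runs)
print(f"${totals['usd'] / 100:,.2f}")  # $2,450.00
```

Note that this totals requested amounts: declined or pending charges are counted too, which is exactly why the Stripe response data matters.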
For a simple invoice-on-demand agent, this gap is acceptable. For an autonomous billing agent running unattended — recurring charges, dunning retries, refund decisions — the gap becomes a liability.
A minimal LangChain + Stripe example
Let's make the gap concrete. Here's a LangChain agent that creates Stripe charges, with LangSmith tracing enabled:
```python
import os

import stripe
from langchain_anthropic import ChatAnthropic
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool

# LangSmith tracing (traces LLM calls automatically)
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = os.environ["LANGSMITH_API_KEY"]
os.environ["LANGCHAIN_PROJECT"] = "billing-agent"

stripe.api_key = os.environ["STRIPE_SECRET_KEY"]  # full secret key: risky


@tool
def charge_customer(customer_id: str, amount_cents: int, currency: str = "usd") -> str:
    """Create a Stripe charge for a customer."""
    # Note: Stripe rejects confirm=True without a payment method; to charge a
    # saved card, also pass payment_method=<pm_id> and off_session=True.
    charge = stripe.PaymentIntent.create(
        amount=amount_cents,
        currency=currency,
        customer=customer_id,
        confirm=True,
        automatic_payment_methods={"enabled": True, "allow_redirects": "never"},
    )
    return f"PaymentIntent {charge.id}: {charge.status}"


llm = ChatAnthropic(model="claude-sonnet-4-6")
tools = [charge_customer]
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a billing agent. Use the charge_customer tool when asked."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=False)

result = executor.invoke({"input": "Charge customer cus_abc123 $49 for the Pro plan renewal."})
print(result["output"])
```
With LANGCHAIN_TRACING_V2=true, LangSmith captures the Claude call, the tool invocation arguments, and the string result. What it does not see is the Stripe POST /v1/payment_intents HTTP call, the response body, the charge.id in the Stripe audit trail, or the fact that $49 moved between accounts.
What LangSmith shows vs. what it misses
Here's the gap laid out as a table:
| Signal | LangSmith | LangSmith + Keybrake |
|---|---|---|
| LLM model and token count | ✅ Full trace | ✅ Full trace |
| Agent reasoning chain | ✅ Full trace | ✅ Full trace |
| Tool call arguments (what the model chose) | ✅ Logged | ✅ Logged |
| Stripe HTTP request path + method | ❌ Not captured | ✅ Every call logged |
| Stripe response status + charge ID | ❌ Not captured | ✅ Full response metadata |
| Which Stripe key was used | ❌ Not captured | ✅ Vault key ID logged per call |
| Real dollar amount moved per call | ❌ Not captured | ✅ Parsed from response, cumulative |
| Daily vendor spend cap enforcement | ❌ No enforcement | ✅ Hard stop at configured limit |
| Stripe rate-limit events (429s) | ❌ Appears as latency | ✅ Explicit 429 log entries |
| Per-agent or per-run isolation | ❌ Shared key across runs | ✅ Vault key per agent instance |
| Kill switch (revoke mid-run) | ❌ Not possible | ✅ Revoke vault key, all calls stop |
Adding Keybrake: two-line proxy routing
Keybrake is a reverse proxy that sits between your agent and Stripe. You route Stripe API calls through proxy.keybrake.com/stripe/v1/ instead of api.stripe.com/v1/. The proxy looks up your real Stripe key (stored server-side), enforces your policy (spend cap, allowed endpoints, expiry), forwards the call to Stripe, and logs the result.
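Keybrake's internals are not documented here, so the following is only a sketch of the request flow that paragraph describes, under assumed names: a hypothetical VaultPolicy holding allowed endpoints and an expiry, a strip_proxy_prefix helper that maps the proxy path back to Stripe's path, and a check_request function that decides whether a call may be forwarded. Spend-cap accounting is left out to keep it short.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from fnmatch import fnmatchcase


@dataclass
class VaultPolicy:
    # Hypothetical policy record; field names follow the admin API example
    # later in this post but are assumptions, not Keybrake's actual schema.
    vendor: str
    allowed_endpoints: list[str]  # e.g. "POST /v1/payment_intents"
    expires_at: datetime


def strip_proxy_prefix(proxy_path: str, vendor: str) -> str:
    """Map a proxy path like "/stripe/v1/..." back to the upstream "/v1/..." path."""
    prefix = f"/{vendor}"
    if not proxy_path.startswith(prefix + "/"):
        raise ValueError(f"path {proxy_path!r} is not routed to {vendor}")
    return proxy_path[len(prefix):]


def check_request(policy: VaultPolicy, method: str, path: str,
                  now: datetime) -> tuple[bool, str]:
    """Decide whether one upstream request may be forwarded: (allowed, reason)."""
    if now >= policy.expires_at:
        return False, "vault key expired"
    request_line = f"{method.upper()} {path}"
    for pattern in policy.allowed_endpoints:
        # fnmatchcase is case-sensitive and its "*" also matches "/".
        if fnmatchcase(request_line, pattern):
            return True, "ok"
    return False, f"endpoint not allowed: {request_line}"


# Usage, with the endpoint list from the vault key example.
policy = VaultPolicy(
    vendor="stripe",
    allowed_endpoints=[
        "POST /v1/payment_intents",
        "GET /v1/payment_intents/*",
        "GET /v1/customers/*",
    ],
    expires_at=datetime(2026, 9, 1, tzinfo=timezone.utc),
)
now = datetime(2026, 1, 15, tzinfo=timezone.utc)
path = strip_proxy_prefix("/stripe/v1/payment_intents", "stripe")
print(check_request(policy, "POST", path, now))           # (True, 'ok')
print(check_request(policy, "POST", "/v1/refunds", now))  # (False, 'endpoint not allowed: POST /v1/refunds')
```

Matching on a "METHOD /path" string keeps the policy readable. A real proxy would also have to decide whether a pattern such as GET /v1/payment_intents/* should cover nested paths, since fnmatch's * crosses slashes.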
For the Stripe Python SDK, the change is two lines: replace the API key with a vault key and point api_base at the proxy:
```python
import os

import stripe

# Before: direct Stripe call, no observability
# stripe.api_key = os.environ["STRIPE_SECRET_KEY"]

# After: routed through Keybrake proxy
stripe.api_key = os.environ["KEYBRAKE_VAULT_KEY"]  # vault_key_xxx
stripe.api_base = "https://proxy.keybrake.com/stripe"  # one new line
```
Every Stripe method call in your codebase (stripe.PaymentIntent.create(), stripe.Refund.create(), stripe.Customer.retrieve()) now flows through the proxy without any other changes. The Stripe SDK's request structure is preserved; the proxy simply intercepts, enforces policy, and forwards.
Updated LangSmith + Keybrake example
```python
import os

import stripe
from langchain_anthropic import ChatAnthropic
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool
from pydantic import BaseModel, Field

# LangSmith: traces LLM layer
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = os.environ["LANGSMITH_API_KEY"]
os.environ["LANGCHAIN_PROJECT"] = "billing-agent"

# Keybrake: traces Stripe layer
stripe.api_key = os.environ["KEYBRAKE_VAULT_KEY"]  # vault_key_xxx
stripe.api_base = "https://proxy.keybrake.com/stripe"  # proxy routing


class ChargeInput(BaseModel):
    customer_id: str = Field(description="Stripe customer ID (cus_...)")
    amount_cents: int = Field(gt=0, le=100_000, description="Amount in cents, max $1,000")
    currency: str = Field(default="usd", pattern="^[a-z]{3}$")


@tool(args_schema=ChargeInput)
def charge_customer(customer_id: str, amount_cents: int, currency: str = "usd") -> str:
    """Create a Stripe PaymentIntent for a customer. Max $1,000 per call."""
    # confirm=True needs a payment method; for a saved card also pass
    # payment_method=<pm_id> and off_session=True.
    intent = stripe.PaymentIntent.create(
        amount=amount_cents,
        currency=currency,
        customer=customer_id,
        confirm=True,
        automatic_payment_methods={"enabled": True, "allow_redirects": "never"},
    )
    return f"PaymentIntent {intent.id} status={intent.status} amount={amount_cents/100:.2f} {currency.upper()}"


llm = ChatAnthropic(model="claude-sonnet-4-6")
tools = [charge_customer]
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a billing agent. Always confirm the amount before charging."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=False)

result = executor.invoke({"input": "Charge customer cus_abc123 $49 for the Pro plan."})
print(result["output"])
```
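Compared with the original example, the Keybrake change is the two stripe.* lines at the top. The ChargeInput schema (which caps a single call at $1,000) and the stricter system prompt are optional hardening. LangSmith tracing, LangChain, and the agent wiring are unchanged.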
What you can now observe end-to-end
After adding the proxy, you have two complementary observability layers:
LangSmith shows
- The full LLM reasoning chain that led to the charging decision
- Token usage and latency for the Claude model call
- The ChargeInput the model constructed (customer ID, amount, currency)
- The string the tool returned (PaymentIntent pi_xxx status=succeeded)
- Total LLM cost for the run
Keybrake audit log shows
- The HTTP method and path: POST /stripe/v1/payment_intents
- Which vault key was used: vault_key_prod_agent_billing
- Stripe's response status (200, 402 card declined, 429 rate limited)
- The charge amount parsed from the response body
- Cumulative vendor spend for the day against your configured cap
- Request duration and Stripe's Request-Id header for support lookup
Together, LangSmith tells you why the agent decided to charge, and Keybrake tells you what actually happened at Stripe. Neither alone gives the full picture.
Production setup: per-agent vault keys and spend caps
In production, you issue a separate vault key per agent instance or per deployment environment. This matters for two reasons:
- Isolation. If a staging agent misbehaves, revoking its vault key does not affect the production agent. With a shared Stripe key, your only option is rotating the secret everywhere at once.
- Attribution. Keybrake's audit log ties each Stripe call to a vault key. When LangSmith shows you an anomalous run at 03:00 UTC, you can cross-reference the Keybrake log for the same timestamp to see exactly which Stripe calls that run made.
Create a vault key with a $500/day cap, scoped to creating and reading PaymentIntents and reading Customers:

```http
POST https://proxy.keybrake.com/admin/vault_keys
Authorization: Bearer {your_admin_key}
Content-Type: application/json

{
  "label": "billing-agent-prod",
  "vendor": "stripe",
  "daily_usd_cap": 500,
  "allowed_endpoints": [
    "POST /v1/payment_intents",
    "GET /v1/payment_intents/*",
    "GET /v1/customers/*"
  ],
  "expires_at": "2026-09-01T00:00:00Z"
}
```

Response:

```json
{
  "vault_key": "vault_key_prod_abc123xyz",
  "vendor": "stripe",
  "daily_usd_cap": 500,
  "status": "active"
}
```
Set this vault key in your agent's environment and the spend cap is enforced server-side — no SDK changes, no try/except wrappers in your tool code. If the agent hits $500 in Stripe charges in one day, subsequent calls return HTTP 429 from the proxy (not from Stripe), and LangSmith will log the tool error for you to investigate.
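The cap behaviour described above can be sketched as a small counter. This is an assumption about how such a proxy could work, not Keybrake's code: spend is keyed by UTC calendar day, amounts are recorded after Stripe confirms them (matching "parsed from response, cumulative" in the comparison table), and once the day's total reaches the cap every further request gets a 429 before it reaches Stripe. The class name DailySpendCap is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


class DailySpendCap:
    """Hypothetical per-vault-key daily spend counter (amounts in cents)."""

    def __init__(self, cap_cents: int) -> None:
        self.cap_cents = cap_cents
        self._day: Optional[str] = None
        self._spent_cents = 0

    def _roll(self, now: datetime) -> None:
        # Start a fresh total whenever the UTC calendar day changes.
        day = now.astimezone(timezone.utc).date().isoformat()
        if day != self._day:
            self._day = day
            self._spent_cents = 0

    def precheck(self, now: datetime) -> int:
        """Status the proxy would return before forwarding: 200 or 429."""
        self._roll(now)
        return 429 if self._spent_cents >= self.cap_cents else 200

    def record(self, amount_cents: int, now: datetime) -> None:
        """Add an amount parsed from a successful Stripe response."""
        self._roll(now)
        self._spent_cents += amount_cents

    @property
    def spent_cents(self) -> int:
        return self._spent_cents


# $500/day cap, as in the vault key above; the agent keeps issuing $49.00 charges.
cap = DailySpendCap(cap_cents=50_000)
t = datetime(2026, 3, 2, 14, 0, tzinfo=timezone.utc)
for _ in range(12):
    if cap.precheck(t) == 200:
        cap.record(4_900, t)
print(cap.spent_cents, cap.precheck(t))  # 53900 429
```

The eleventh charge is let through at $490 spent and lands the total at $539, because the amount is only known after Stripe responds. A stricter design would also compare the requested amount against the remaining budget before forwarding.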
Setting vault keys per LangSmith project
LangSmith's project concept maps naturally to Keybrake's vault key concept. One LangSmith project, one vault key:
```bash
# .env.production
LANGCHAIN_PROJECT=billing-agent-prod
LANGSMITH_API_KEY=ls_prod_...
KEYBRAKE_VAULT_KEY=vault_key_prod_abc123xyz

# .env.staging
LANGCHAIN_PROJECT=billing-agent-staging
LANGSMITH_API_KEY=ls_staging_...
KEYBRAKE_VAULT_KEY=vault_key_staging_def456uvw
```
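Because the pairing lives in two separate environment variables, a mismatched deploy (a staging project charging through the production vault key) is easy to miss. A minimal startup guard, assuming the naming convention in the files above holds (project names end in "-prod"/"-staging", vault keys start with "vault_key_prod_"/"vault_key_staging_"), could look like this; check_env_pairing is a hypothetical helper, not part of LangSmith or Keybrake.

```python
from collections.abc import Mapping


def check_env_pairing(env: Mapping[str, str]) -> str:
    """Return the environment name if project and vault key agree, else raise.

    Assumes LANGCHAIN_PROJECT ends in "-<env>" and KEYBRAKE_VAULT_KEY starts
    with "vault_key_<env>_". Pass os.environ at process start.
    """
    project = env["LANGCHAIN_PROJECT"]
    vault_key = env["KEYBRAKE_VAULT_KEY"]
    project_env = project.rsplit("-", 1)[-1]
    if not vault_key.startswith(f"vault_key_{project_env}_"):
        # Show only the key's prefix so the secret part never reaches logs.
        redacted = vault_key.rsplit("_", 1)[0] + "_***"
        raise RuntimeError(
            f"LANGCHAIN_PROJECT={project!r} does not match vault key {redacted}"
        )
    return project_env


prod = {
    "LANGCHAIN_PROJECT": "billing-agent-prod",
    "KEYBRAKE_VAULT_KEY": "vault_key_prod_abc123xyz",
}
print(check_env_pairing(prod))  # prod
```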
With this setup, LangSmith runs for billing-agent-staging and Keybrake logs for vault_key_staging_def456uvw can be correlated by timestamp to reconstruct any run end-to-end.
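Timestamp correlation can be sketched as a window join. Both record shapes below are hypothetical stand-ins (a LangSmith run exported with its start and end times, and Keybrake audit entries carrying a vault key and timestamp); adapt the field names to whatever your exports actually contain. A small padding absorbs clock skew between the two systems.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class RunWindow:
    # Hypothetical export of one LangSmith run.
    run_id: str
    start: datetime
    end: datetime


@dataclass
class AuditEntry:
    # Hypothetical shape of one Keybrake audit-log row.
    vault_key: str
    at: datetime
    method: str
    path: str
    status: int


def calls_for_run(run: RunWindow, entries: list[AuditEntry], vault_key: str,
                  skew: timedelta = timedelta(seconds=2)) -> list[AuditEntry]:
    """Audit entries for `vault_key` inside the run's time window, oldest first."""
    lo, hi = run.start - skew, run.end + skew
    return sorted(
        (e for e in entries if e.vault_key == vault_key and lo <= e.at <= hi),
        key=lambda e: e.at,
    )


utc = timezone.utc
run = RunWindow("run-1", datetime(2026, 3, 2, 3, 0, 0, tzinfo=utc),
                datetime(2026, 3, 2, 3, 0, 9, tzinfo=utc))
log = [
    AuditEntry("vault_key_staging_def456uvw", datetime(2026, 3, 2, 3, 0, 4, tzinfo=utc),
               "POST", "/v1/payment_intents", 200),
    AuditEntry("vault_key_prod_abc123xyz", datetime(2026, 3, 2, 3, 0, 5, tzinfo=utc),
               "POST", "/v1/payment_intents", 200),
    AuditEntry("vault_key_staging_def456uvw", datetime(2026, 3, 2, 4, 0, 0, tzinfo=utc),
               "GET", "/v1/customers/cus_abc123", 200),
]
matched = calls_for_run(run, log, "vault_key_staging_def456uvw")
for e in matched:
    print(e.method, e.path, e.status)  # POST /v1/payment_intents 200
```

Timestamp matching becomes ambiguous when several runs overlap on the same vault key; tagging each Stripe call with the run ID removes that ambiguity.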
Correlating LangSmith traces with Keybrake logs
For deeper correlation, add the LangSmith run ID to your tool's Stripe metadata so it appears in both systems:
```python
import stripe
from langchain_core.runnables import RunnableConfig
from langchain_core.tools import tool

# ChargeInput is the Pydantic schema from the previous example.


@tool(args_schema=ChargeInput)
def charge_customer(
    customer_id: str,
    amount_cents: int,
    currency: str = "usd",
    config: RunnableConfig = None,
) -> str:
    """Create a Stripe PaymentIntent for a customer."""
    # LangChain injects the RunnableConfig. Its "callbacks" entry is usually a
    # CallbackManager whose parent_run_id identifies an enclosing LangSmith run;
    # it may also be a plain list or missing, so read it defensively.
    callbacks = (config or {}).get("callbacks")
    parent_run_id = getattr(callbacks, "parent_run_id", None)
    run_id = str(parent_run_id) if parent_run_id else None

    intent = stripe.PaymentIntent.create(
        amount=amount_cents,
        currency=currency,
        customer=customer_id,
        confirm=True,
        automatic_payment_methods={"enabled": True, "allow_redirects": "never"},
        metadata={
            "langsmith_run_id": run_id or "unknown",
            "agent": "billing-agent",
        },
    )
    return f"PaymentIntent {intent.id} status={intent.status}"
```
The langsmith_run_id appears in the PaymentIntent's metadata in Stripe and in the request body Keybrake forwards. Given a PaymentIntent from a dispute or refund request, you can read langsmith_run_id from its metadata and find the LangSmith run that created it in seconds.
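Stripe stores metadata values as strings and bounds them (up to 50 keys, key names up to 40 characters, values up to 500 characters), so it can help to build the correlation metadata in one place and validate it before the API call. The helpers below are a hypothetical sketch, not part of LangChain or the Stripe SDK; correlation_metadata returns the dict you would pass as metadata=, and run_id_from_metadata does the reverse lookup on a retrieved PaymentIntent's metadata.

```python
from typing import Optional

MAX_KEYS = 50        # Stripe metadata limits
MAX_KEY_LEN = 40
MAX_VALUE_LEN = 500


def correlation_metadata(run_id: Optional[str], agent: str,
                         extra: Optional[dict[str, object]] = None) -> dict[str, str]:
    """Build Stripe metadata linking a PaymentIntent to a LangSmith run."""
    meta: dict[str, object] = {
        "langsmith_run_id": run_id or "unknown",
        "agent": agent,
        **(extra or {}),
    }
    if len(meta) > MAX_KEYS:
        raise ValueError(f"too many metadata keys: {len(meta)} > {MAX_KEYS}")
    out: dict[str, str] = {}
    for key, value in meta.items():
        if len(key) > MAX_KEY_LEN:
            raise ValueError(f"metadata key too long: {key!r}")
        text = str(value)  # Stripe returns metadata values as strings
        if len(text) > MAX_VALUE_LEN:
            raise ValueError(f"metadata value for {key!r} exceeds {MAX_VALUE_LEN} chars")
        out[key] = text
    return out


def run_id_from_metadata(metadata: dict[str, str]) -> Optional[str]:
    """Return the stored run ID, or None if it was never captured."""
    value = metadata.get("langsmith_run_id")
    return None if value in (None, "unknown") else value


meta = correlation_metadata("run-123", "billing-agent", {"amount_cents": 4900})
print(run_id_from_metadata(meta))  # run-123
```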
Gap analysis: what this setup still doesn't cover
LangSmith + Keybrake closes the major observability gap, but a few scenarios still fall through:
- Stripe webhooks. When Stripe sends a payment_intent.succeeded webhook to your server, that event was not triggered by your agent's HTTP call; it's an inbound request. Neither LangSmith nor Keybrake captures inbound webhooks. You'll want a separate webhook handler with its own logging.
- Stripe API calls outside the agent. If your server-side code (outside LangChain) also calls Stripe through the same vault key, Keybrake will capture those calls but LangSmith will not have a corresponding trace. Keep agent Stripe calls and server-side Stripe calls on separate vault keys.
- LangSmith token cost vs. real cost. LangSmith estimates LLM cost using published model pricing. If you're on a custom contract or a batched pricing tier, LangSmith's cost estimate may differ from your invoice. This is a LangSmith limitation, unrelated to Keybrake.
- Complex multi-step charges. If one agent run creates a PaymentIntent, a second run confirms it, and a third run captures it, Keybrake logs each call separately but the LangSmith traces are on three different run IDs. Extending the langsmith_run_id metadata approach above with a session ID shared across the runs ties them back together.
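For the webhook gap, the inbound handler needs its own logging, and it should verify Stripe's signature before trusting the event. Stripe signs each delivery with a Stripe-Signature header of the form t=<timestamp>,v1=<hex signature>, where the signature is HMAC-SHA256 over "<timestamp>.<raw body>" using the endpoint's signing secret. The stdlib sketch below shows that check plus a log line; in production, the official SDK's stripe.Webhook.construct_event(payload, sig_header, secret) performs the same verification. The function name, logger name, and secret are illustrative.

```python
import hashlib
import hmac
import json
import logging
import time
from typing import Optional

log = logging.getLogger("stripe-webhooks")
TOLERANCE_SECONDS = 300  # reject deliveries signed more than 5 minutes ago


def verify_and_log(payload: bytes, sig_header: str, secret: str,
                   now: Optional[float] = None) -> dict:
    """Verify a Stripe-Signature header against the raw body, log, and return the event."""
    pairs = [p.strip().split("=", 1) for p in sig_header.split(",") if "=" in p]
    timestamps = [v for k, v in pairs if k == "t"]
    signatures = [v for k, v in pairs if k == "v1"]
    if not timestamps or not signatures:
        raise ValueError("malformed Stripe-Signature header")
    ts = int(timestamps[0])
    now = time.time() if now is None else now
    if now - ts > TOLERANCE_SECONDS:
        raise ValueError("timestamp outside tolerance")
    # Stripe signs "<timestamp>.<raw body>" with HMAC-SHA256.
    signed = f"{ts}.".encode() + payload
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    if not any(hmac.compare_digest(expected, s) for s in signatures):
        raise ValueError("signature mismatch")
    event = json.loads(payload)
    obj = event.get("data", {}).get("object", {})
    # Metadata written by the agent's tool (if any) ties this inbound event
    # back to the LangSmith run that created the PaymentIntent.
    log.info("stripe event %s type=%s object=%s run=%s",
             event.get("id"), event.get("type"), obj.get("id"),
             obj.get("metadata", {}).get("langsmith_run_id"))
    return event


# Usage with a locally signed test payload (hypothetical secret).
secret = "whsec_example"
body = json.dumps({
    "id": "evt_1",
    "type": "payment_intent.succeeded",
    "data": {"object": {"id": "pi_123", "metadata": {"langsmith_run_id": "run-123"}}},
}).encode()
ts = 1_700_000_000
sig = hmac.new(secret.encode(), f"{ts}.".encode() + body, hashlib.sha256).hexdigest()
event = verify_and_log(body, f"t={ts},v1={sig}", secret, now=ts + 10)
print(event["type"])  # payment_intent.succeeded
```

Verify against the raw request bytes, not re-serialized JSON: any change in whitespace or key order breaks the signature.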
FAQ
Does adding Keybrake slow down Stripe calls?
The proxy adds 5–15ms of latency for the proxy-to-Stripe leg. Stripe API calls typically take 200–400ms, so the overhead is roughly 1–7.5%. For billing agents that run offline or asynchronously, this is not perceptible. For latency-sensitive payment flows, benchmark your specific use case first.
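Working the two quoted ranges through gives the best and worst cases. This is plain arithmetic on the figures above, not a measurement:

```python
def overhead_pct(proxy_ms: float, stripe_ms: float) -> float:
    """Extra proxy hop as a percentage of the Stripe call it wraps."""
    return 100 * proxy_ms / stripe_ms


best = overhead_pct(5, 400)    # fastest proxy hop, slowest Stripe call
worst = overhead_pct(15, 200)  # slowest proxy hop, fastest Stripe call
print(f"{best:.2f}% to {worst:.2f}%")  # 1.25% to 7.50%
```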
Does LangSmith see Keybrake's 429 (spend cap exceeded) errors?
Yes. When the proxy rejects a call because the daily spend cap is hit, the Stripe SDK raises a stripe.error.RateLimitError (the SDK maps HTTP 429 to this exception, provided the proxy's error body follows Stripe's JSON error format). If your tool lets that exception propagate, LangSmith logs it as a tool error with the full traceback. This is useful: you can set a LangSmith alert on tool errors matching RateLimitError to get notified as soon as an agent hits its spend cap.
Can I use LangSmith's dataset and evaluation features alongside Keybrake?
Yes, but carefully. LangSmith lets you replay traced runs as evaluations. When replaying a run that contained a Stripe tool call, the evaluation will hit Keybrake again — and if the vault key has a daily cap, evaluation runs count toward it. Use a separate vault key with a very low cap (e.g., $1) for evaluation runs to prevent accidental charges.
Does this work with LangGraph agents?
Yes. LangGraph builds on LangChain and is traced by LangSmith automatically. Tool nodes in a LangGraph graph work the same way as LangChain tools, so adding the two stripe.* lines is sufficient. The graph's full state machine is visible in LangSmith's trace, and each Stripe call within any node is logged by Keybrake.
Do I need to set up Keybrake's real Stripe key server-side before this works?
Yes. The vault key your agent uses is a credential that maps server-side to your real Stripe key plus a policy. You create the vault key via Keybrake's admin API (or dashboard), specify the real Stripe key, set a daily cap and allowed endpoints, and Keybrake stores the real key encrypted. Your agent code never sees the real Stripe key — only the vault key. Start at proxy.keybrake.com to set up your first vault key.
I use LangSmith's prompt playground to test my agent. Will Stripe calls fire in the playground?
Yes, if your tool is wired to a live vault key. LangSmith's playground runs your chain with real tool calls. To prevent accidental charges during prompt iteration, either (a) use a vault key scoped to Stripe's test mode (with a test Stripe key), or (b) use a vault key with a $0 cap that blocks all calls. Both approaches let you iterate on your prompt logic without risk.
Summary
LangSmith and Keybrake cover complementary blind spots in AI agent observability:
- LangSmith — LLM reasoning, token cost, chain latency, tool arguments
- Keybrake — Stripe HTTP calls, charge IDs, real dollar amounts, spend cap enforcement, per-agent isolation
The integration requires two lines of code (stripe.api_key and stripe.api_base) and no changes to your LangSmith setup, agent logic, or tool definitions. Both systems are fully independent and can be added to an existing LangChain codebase incrementally.
If your agent makes Stripe calls today with a full secret key and no spend cap, the next incident is a matter of when, not if. A stuck retry loop, a prompt injection, or a logic bug can create real charges before LangSmith can alert you — because LangSmith doesn't see those charges. Adding Keybrake closes the loop.
Try the proxy — free up to 1,000 requests/month →
See also: LangChain Stripe Integration: Safe Agent Payments with Policy Enforcement · AI Agent API Governance in Python: Policy Models, Spend Enforcement, and Audit Logs · Stripe Restricted API Key Permissions: Complete Reference for AI Agents