AI agent spend reporting: per-run cost visibility for autonomous systems
When an AI agent makes payments via Stripe, sends SMS via Twilio, or sends email via Resend, the vendor's dashboard shows you what happened — account totals, maybe a transaction list. But it can't tell you which agent run caused the charge, whether that run was within budget, which agent is the most expensive across your fleet, or what your projected monthly spend is at the current run rate. These questions require per-run attribution, and vendor dashboards weren't built for it. This page covers the data model, query patterns, and alerting design for effective AI agent spend reporting.
TL;DR
The foundation of agent spend reporting is an audit log where every vendor API call is tagged with agent_run_id, agent_name, and cost_usd_parsed (stored as cost_usd in the schema below), data that vendor dashboards don't expose. A proxy audit log provides this at the call level; vendor dashboards provide it at the account level, with up to a 24-hour lag. With a per-call audit log, four SQL queries answer the four questions operations teams actually need: per-run cost summary, per-agent aggregate, per-vendor daily trend, and projected monthly spend. Anomalous-spend alerts run against the same table. The reporting layer is simple once the data source is right; the hard part is getting the right data source.
Why vendor dashboards aren't enough for agent spend reporting
Stripe's Dashboard, Twilio's Monitor, and Resend's email log all provide transaction-level records. But they share four gaps, one per reporting need in the table below, that make them inadequate as the sole reporting source for AI agent operations:
| Reporting need | Vendor dashboard | Proxy audit log |
|---|---|---|
| Per-run cost summary: how much did run X cost? | No agent run context. Stripe shows charges grouped by customer, not by which agent execution triggered them. You'd need to reconstruct this by matching Stripe's created timestamp to your orchestration system's run timeline, a manual join with no shared key. | Every call tagged with agent_run_id. SELECT SUM(cost_usd) FROM audit_log WHERE agent_run_id = 'run_abc123' returns the exact per-run cost in milliseconds. |
| Per-agent aggregate: which agent spends the most? | No agent identity in vendor records. Stripe PaymentIntent has a metadata field you can populate, but only if you remember to do so in every SDK call: a convention, not enforcement. Twilio and Resend have no equivalent. | Every call tagged with agent_name. SELECT agent_name, SUM(cost_usd) FROM audit_log GROUP BY agent_name ORDER BY 2 DESC gives the full fleet breakdown. |
| Per-vendor daily trend: is Stripe spend accelerating? | Stripe's Dashboard shows daily totals for the account, not per-agent or per-workflow-type. A spike caused by a single runaway agent blends into the account total and may not be identifiable as agent-caused until hours later. | SELECT vendor, date(called_at), SUM(cost_usd) FROM audit_log GROUP BY 1,2 shows per-vendor daily trends, filterable by agent name or run label. |
| Projected monthly spend: at current rate, what's the monthly bill? | Stripe shows the current billing period total. No per-agent run-rate projection. Vendor billing alerts fire after thresholds are crossed, with no forward-looking projection. | 7-day rolling average per agent × 30 gives the monthly projection per agent, computed from the audit log without waiting for a billing cycle. |
The minimum audit log schema for agent spend reporting
Effective spend reporting starts with the right columns in the audit log. The minimum schema that enables all four reporting layers:
CREATE TABLE audit_log (
id TEXT PRIMARY KEY,
agent_run_id TEXT NOT NULL, -- workflow run ID, task ID, or session ID
agent_name TEXT NOT NULL, -- "billing-agent", "notification-agent"
vendor TEXT NOT NULL, -- "stripe", "twilio", "resend"
endpoint TEXT NOT NULL, -- "POST /v1/payment_intents"
cost_usd REAL, -- parsed from vendor response
vendor_txn_id TEXT, -- Stripe PaymentIntent.id, Twilio SID, Resend message ID
policy_verdict TEXT, -- "allowed", "cap_hit", "endpoint_blocked"
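called_at TEXT NOT NULL, -- ISO 8601 UTC timestamp stored as "YYYY-MM-DD HH:MM:SS", the format datetime() returns, so the text comparisons in the queries below are correct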
vault_key_id TEXT -- which vault key authorized this call
);
The cost_usd column is the critical one that vendor dashboards don't provide in a per-call, agent-tagged format. It's parsed from vendor response data: Stripe's amount field (in cents, divided by 100), Twilio's price field on the status callback, Resend's fixed per-email rate from the account tier. A proxy that forwards calls to these vendors can parse this data at call time and write it to the audit log before returning the response to the agent.
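As a rough illustration of the parsing step described above, here is a minimal Python sketch. The function name `parse_cost_usd`, the flat `RESEND_USD_PER_EMAIL` rate, and the shape of the `body` dict are assumptions made for this example, not a documented proxy API. It assumes Stripe amounts arrive in cents for USD and that Twilio's price arrives as a decimal string that may be negative or null.

```python
from typing import Optional

# Assumed flat per-email rate; the real figure depends on your Resend plan.
RESEND_USD_PER_EMAIL = 0.0004


def parse_cost_usd(vendor: str, body: dict) -> Optional[float]:
    """Return the USD cost of one vendor call, or None if it cannot be parsed."""
    if vendor == "stripe":
        # Stripe amounts are in the smallest currency unit (cents for USD).
        # Non-USD charges are left unparsed rather than mislabeled as USD.
        if "amount" not in body or body.get("currency", "usd").lower() != "usd":
            return None
        return body["amount"] / 100
    if vendor == "twilio":
        # Price is a decimal string such as "-0.00750" and may be null
        # until the message has been billed.
        price = body.get("price")
        return abs(float(price)) if price is not None else None
    if vendor == "resend":
        # No per-call price in the response, so apply the assumed flat rate.
        return RESEND_USD_PER_EMAIL
    return None


print(parse_cost_usd("stripe", {"amount": 2500, "currency": "usd"}))
print(parse_cost_usd("twilio", {"price": "-0.00750"}))
```

Returning None for anything that can't be parsed maps onto the nullable cost_usd column, so an unparsed call is still logged without inventing a cost for it.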
Four query patterns for agent spend reporting
1. Per-run cost summary
-- Cost breakdown for a single agent run
SELECT
vendor,
endpoint,
COUNT(*) AS call_count,
SUM(cost_usd) AS total_cost_usd,
MIN(called_at) AS first_call,
MAX(called_at) AS last_call
FROM audit_log
WHERE agent_run_id = 'temporal/billing-workflow/run_abc123'
GROUP BY vendor, endpoint
ORDER BY total_cost_usd DESC;
This answers the incident-response question "what did run X actually do and how much did it cost?" in under a second, without pulling logs from Stripe and cross-referencing timestamps.
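To try the per-run query end to end, the following sketch loads the schema into an in-memory SQLite database with a few made-up rows and runs the same query through Python's standard sqlite3 module. The sample data, the shortened Twilio endpoint label, and the helper name `run_cost_summary` are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE audit_log (
        id TEXT PRIMARY KEY,
        agent_run_id TEXT NOT NULL,
        agent_name TEXT NOT NULL,
        vendor TEXT NOT NULL,
        endpoint TEXT NOT NULL,
        cost_usd REAL,
        vendor_txn_id TEXT,
        policy_verdict TEXT,
        called_at TEXT NOT NULL,
        vault_key_id TEXT
    )
""")

# Made-up sample rows: three calls from one run, one call from another run.
RUN = "temporal/billing-workflow/run_abc123"
conn.executemany(
    "INSERT INTO audit_log VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("1", RUN, "billing-agent", "stripe", "POST /v1/payment_intents",
         25.0, "pi_1", "allowed", "2025-01-10 09:00:00", "vk_1"),
        ("2", RUN, "billing-agent", "stripe", "POST /v1/payment_intents",
         40.0, "pi_2", "allowed", "2025-01-10 09:00:05", "vk_1"),
        ("3", RUN, "billing-agent", "twilio", "POST /Messages.json",
         0.0075, "SM_1", "allowed", "2025-01-10 09:00:07", "vk_1"),
        ("4", "run_other", "billing-agent", "stripe", "POST /v1/payment_intents",
         99.0, "pi_3", "allowed", "2025-01-10 10:00:00", "vk_1"),
    ],
)


def run_cost_summary(conn, run_id):
    """Per-vendor, per-endpoint cost breakdown for a single agent run."""
    return conn.execute(
        """
        SELECT vendor, endpoint, COUNT(*), SUM(cost_usd),
               MIN(called_at), MAX(called_at)
        FROM audit_log
        WHERE agent_run_id = ?
        GROUP BY vendor, endpoint
        ORDER BY SUM(cost_usd) DESC
        """,
        (run_id,),  # bound parameter instead of string interpolation
    ).fetchall()


summary = run_cost_summary(conn, RUN)
for row in summary:
    print(row)
```

Binding the run ID as a parameter keeps the query safe to call with IDs taken from an alert payload or an incident ticket.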
2. Per-agent aggregate (fleet view)
-- Per-agent spend for the last 7 days
SELECT
agent_name,
vendor,
COUNT(*) AS total_calls,
SUM(cost_usd) AS total_spend_usd,
AVG(cost_usd) AS avg_cost_per_call,
MAX(cost_usd) AS max_single_call_usd
FROM audit_log
WHERE called_at >= datetime('now', '-7 days')
GROUP BY agent_name, vendor
ORDER BY total_spend_usd DESC;
This identifies which agents are the highest spenders, which vendors are being used most, and whether any agent's average cost per call is anomalously high (a signal of calls with larger-than-expected amounts).
3. Per-vendor daily trend
-- Daily spend per vendor for the last 30 days
SELECT
date(called_at) AS day,
vendor,
COUNT(*) AS calls,
SUM(cost_usd) AS spend_usd
FROM audit_log
WHERE called_at >= datetime('now', '-30 days')
GROUP BY date(called_at), vendor
ORDER BY day DESC, spend_usd DESC;
This is the time-series view that shows whether Stripe spend is accelerating (the daily total keeps growing), whether a new agent deployment increased costs, and whether spend is flat or seasonal. The vendor's billing portal shows the same account-level daily totals, but they can't be filtered by agent or tied to a specific run.
4. Projected monthly spend
-- 30-day spend projection based on 7-day rolling average per agent
SELECT
agent_name,
vendor,
SUM(cost_usd) / 7.0 AS avg_daily_spend_usd,
SUM(cost_usd) / 7.0 * 30 AS projected_monthly_spend_usd
FROM audit_log
WHERE called_at >= datetime('now', '-7 days')
GROUP BY agent_name, vendor
ORDER BY projected_monthly_spend_usd DESC;
Run this weekly to give engineering and finance a forward-looking view. If the projection exceeds budget, you can reduce per-run caps or adjust run frequency before the month ends.
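One caveat worth checking with real data: dividing by a fixed 7.0 understates the daily rate for an agent that was deployed partway through the window. The sketch below runs the projection against made-up rows and adds a second column that divides by the number of days with at least one call. That second column is an assumption of this example, not part of the query above, and it overstates spend for agents that only run on some days, so treat the two numbers as a lower and upper estimate. The table is trimmed to the columns the query reads, and `NOW` is fixed so the result is reproducible.

```python
import sqlite3
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"
NOW = datetime(2025, 1, 15)  # fixed "now" so the example is reproducible

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit_log (agent_name TEXT, vendor TEXT, cost_usd REAL, called_at TEXT)"
)


def noon(days_ago):
    """Timestamp at 12:00 on the day `days_ago` days before NOW."""
    return (NOW - timedelta(days=days_ago) + timedelta(hours=12)).strftime(FMT)


rows = [("billing-agent", "stripe", 10.0, noon(d)) for d in range(1, 8)]  # all 7 days
rows.append(("billing-agent", "stripe", 500.0, noon(14)))                 # outside window
rows += [("notification-agent", "twilio", 7.0, noon(d)) for d in (1, 2)]  # 2 days only
conn.executemany("INSERT INTO audit_log VALUES (?, ?, ?, ?)", rows)

projection = conn.execute(
    """
    SELECT agent_name,
           vendor,
           SUM(cost_usd) / 7.0 * 30 AS projected_monthly_usd,
           SUM(cost_usd) / COUNT(DISTINCT date(called_at)) * 30 AS projected_active_days_usd
    FROM audit_log
    WHERE called_at >= datetime(:now, '-7 days') AND called_at < :now
    GROUP BY agent_name, vendor
    ORDER BY projected_monthly_usd DESC
    """,
    {"now": NOW.strftime(FMT)},
).fetchall()

for row in projection:
    print(row)
```

With this sample data the two estimates agree for the agent that ran every day and diverge for the one that has only two days of history.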
Alerting triggers for agent spend anomalies
Spend reporting should drive automated alerts, not just dashboards engineers check manually. Three alert patterns that catch real incidents:
| Alert | Query | What it catches |
|---|---|---|
| Cap hit rate spike | SELECT COUNT(*) FROM audit_log WHERE policy_verdict = 'cap_hit' AND called_at >= datetime('now', '-1 hour'). Alert if count > 10 in 1 hour. | An agent hitting its cap repeatedly suggests the cap is set too low, or the agent is in a retry loop. Cap hits that repeat indicate the agent isn't handling the 429 correctly. |
| Per-run cost exceeds threshold | SELECT agent_run_id, SUM(cost_usd) FROM audit_log WHERE called_at >= datetime('now', '-1 hour') GROUP BY agent_run_id HAVING SUM(cost_usd) > 200. Alert on any run exceeding $200. | A single run with unexpectedly high spend, usually caused by a data error (larger customer list than expected) or a logic bug (a loop that wasn't supposed to iterate that many times). Catches incidents before the daily cap is hit. |
| Spend velocity anomaly | Compare current-hour spend to average-hour spend for the same agent over the last 30 days. Alert if current hour is >3× the 30-day average. | A deployment that accidentally doubled the batch size, a new agent type that's more expensive than expected, or an on-call incident that triggered many retries. The velocity check catches structural changes that cap-hit alerts don't. |
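The velocity check is the only alert in the table described in prose rather than SQL. One possible reading, sketched below, defines the average hour as total spend over the previous 30 days divided by 720 hours, and flags any agent whose last-hour spend exceeds three times that figure. The function name `velocity_alerts`, the sample data, and the choice to exclude the current hour from the baseline are assumptions of this sketch.

```python
import sqlite3
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"
NOW = datetime(2025, 1, 31, 12, 0, 0)  # fixed "now" so the example is reproducible

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit_log (agent_name TEXT, cost_usd REAL, called_at TEXT)")

rows = []
for agent in ("billing-agent", "notification-agent"):
    # Baseline: $24 per day for 30 days, which averages $1 per hour.
    for d in range(30):
        rows.append((agent, 24.0, (NOW - timedelta(days=d, hours=2)).strftime(FMT)))
# Current hour: one agent spikes to $5, the other stays at $2.
rows.append(("billing-agent", 5.0, (NOW - timedelta(minutes=30)).strftime(FMT)))
rows.append(("notification-agent", 2.0, (NOW - timedelta(minutes=30)).strftime(FMT)))
conn.executemany("INSERT INTO audit_log VALUES (?, ?, ?)", rows)


def velocity_alerts(conn, now, factor=3.0):
    """Agents whose last-hour spend exceeds `factor` times their 30-day hourly average."""
    return conn.execute(
        """
        WITH current_hour AS (
            SELECT agent_name, SUM(cost_usd) AS spend
            FROM audit_log
            WHERE called_at >= datetime(:now, '-1 hour') AND called_at < :now
            GROUP BY agent_name
        ),
        baseline AS (
            SELECT agent_name, SUM(cost_usd) / (30 * 24.0) AS avg_hourly
            FROM audit_log
            WHERE called_at >= datetime(:now, '-30 days')
              AND called_at < datetime(:now, '-1 hour')
            GROUP BY agent_name
        )
        SELECT c.agent_name, c.spend, b.avg_hourly
        FROM current_hour c
        JOIN baseline b ON b.agent_name = c.agent_name
        WHERE c.spend > :factor * b.avg_hourly
        ORDER BY c.agent_name
        """,
        {"now": now.strftime(FMT), "factor": factor},
    ).fetchall()


alerts = velocity_alerts(conn, NOW)
print(alerts)
```

Because of the inner join, an agent with no history in the baseline window never triggers this alert; a newly deployed agent would need a separate rule, such as the per-run threshold above.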
How Keybrake provides the data source
Keybrake is a proxy that sits between your agents and their vendor APIs. Every call that passes through the proxy is written to an audit log with agent_run_id (from the agent_run_label you set on the vault key), vendor, endpoint, cost_usd_parsed (extracted from the vendor response; the cost_usd column in the schema above), and policy_verdict. This is the audit log that drives all four query patterns above.
The proxy provides what vendor dashboards can't: per-call, per-agent attribution available immediately after the call, not hours later in a billing report. The GET /audit endpoint queries the audit log with filters by agent_run_id, agent_name, vendor, and time range — the same queries above can be run via the API without maintaining your own database.
Related questions
Can I build agent spend reporting without a proxy, using Stripe metadata?
Partially. Stripe's PaymentIntent.metadata accepts arbitrary key-value pairs — you can add agent_run_id and agent_name to every charge. This enables per-agent Stripe spend queries via the Stripe API or data export. But three gaps remain: (1) Twilio and Resend have no equivalent metadata field, so vendor coverage is incomplete; (2) you have to consistently pass the metadata in every Stripe SDK call across your entire codebase — a convention that fails silently when missed; (3) parsed cost requires a custom ETL from Stripe's export format into your reporting database, which adds maintenance overhead. A proxy enforces the tagging at the call level regardless of which code path calls the vendor.
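If you do go the metadata route, one way to reduce the "fails silently" risk is to route every charge through a single helper that refuses to build a request without agent tags. The sketch below is a hypothetical helper (`tagged_payment_intent_params` is not part of any SDK); it only builds the parameter dict, which you would then pass to the Stripe library's PaymentIntent create call.

```python
def tagged_payment_intent_params(amount_cents, currency, agent_run_id, agent_name):
    """Build PaymentIntent parameters, failing loudly if agent tags are missing."""
    if not agent_run_id or not agent_name:
        raise ValueError("agent_run_id and agent_name are required on every charge")
    return {
        "amount": amount_cents,  # smallest currency unit, e.g. cents for USD
        "currency": currency,
        "metadata": {"agent_run_id": agent_run_id, "agent_name": agent_name},
    }


params = tagged_payment_intent_params(2500, "usd", "run_abc123", "billing-agent")
print(params)
# With the official Python library this would be sent as:
#   stripe.PaymentIntent.create(**params)
```

This still depends on every code path using the helper, so it narrows the gap rather than closing it.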
What's the difference between agent spend reporting and LLM cost reporting?
They track different things and require different data sources. LLM cost reporting (via LiteLLM, Helicone, Portkey) tracks token consumption and inference costs, typically $0.001–$0.10 per call and bounded by the context window. Agent spend reporting tracks vendor API costs: Stripe charges, Twilio SMS, Resend email. A single SMS or email typically costs around a cent or less, but a Stripe charge can be any amount, and call volume is unbounded (a stuck loop can make thousands of calls). The two are on different axes: an agent with a $10 LLM cost per run can generate $10,000 in Stripe charges. Both reporting layers should exist, joined on agent_run_id to give a complete per-run cost picture (LLM inference cost + vendor action cost).
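A sketch of that join, assuming LLM costs land in a separate table keyed by the same run ID. The `llm_cost_log` table and its columns are hypothetical; in practice the LLM side would come from whatever your gateway exports. Costs are stacked with UNION ALL and summed per run, which avoids the row multiplication a direct join between two multi-row tables would cause and keeps runs that appear in only one source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Both tables are trimmed to the columns this query reads.
conn.execute("CREATE TABLE audit_log (agent_run_id TEXT, cost_usd REAL)")
conn.execute("CREATE TABLE llm_cost_log (agent_run_id TEXT, cost_usd REAL)")  # hypothetical

# Made-up sample data: run_a has both kinds of cost, run_b only LLM cost.
conn.executemany("INSERT INTO llm_cost_log VALUES (?, ?)",
                 [("run_a", 0.5), ("run_a", 0.25), ("run_b", 0.125)])
conn.executemany("INSERT INTO audit_log VALUES (?, ?)",
                 [("run_a", 100.0), ("run_a", 50.0)])

per_run = conn.execute(
    """
    SELECT agent_run_id,
           SUM(llm_usd) AS llm_usd,
           SUM(vendor_usd) AS vendor_usd,
           SUM(llm_usd) + SUM(vendor_usd) AS total_usd
    FROM (
        SELECT agent_run_id, cost_usd AS llm_usd, 0.0 AS vendor_usd
        FROM llm_cost_log
        UNION ALL
        -- Unparsed vendor costs (NULL) count as zero here.
        SELECT agent_run_id, 0.0, COALESCE(cost_usd, 0.0)
        FROM audit_log
    )
    GROUP BY agent_run_id
    ORDER BY total_usd DESC
    """
).fetchall()

for row in per_run:
    print(row)
```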
How long should I retain audit log data for spend reporting?
For operational alerting (cap hit rate, velocity anomalies), 7 days of retention is sufficient for the cap-hit and per-run queries, which only read recent data; the 30-day velocity baseline needs at least 30 days. For per-agent trend analysis and monthly projections, 90 days lets you compare current month to last quarter and spot seasonal patterns. For compliance and incident investigation, 1 year is a common choice: you want to be able to reconstruct what an agent did in response to a customer dispute months later. Keybrake's Team plan includes 90-day audit retention; the Free plan retains 7 days.
Further reading
- AI agent audit trail — the foundational data model for tracking what agents did, with the minimum four-column schema and the three implementation paths.
- AI agent cost management — the three-axis decomposition of agent costs (LLM tokens, vendor API spend, infrastructure) and why they require different controls.
- AI agent Stripe spend cap — why Stripe's native billing alerts are post-charge (not pre-charge) and the distinction between a cap and an alert for spend reporting.
- AI agent multi-tenant isolation — when serving multiple customers, per-tenant spend reporting requires per-tenant vault keys so audit queries can filter by tenant without cross-contamination.