AI agent spend reporting: per-run cost visibility for autonomous systems

When an AI agent makes payments via Stripe, sends SMS via Twilio, or sends email via Resend, the vendor's dashboard shows you what happened — account totals, maybe a transaction list. But it can't tell you which agent run caused the charge, whether that run was within budget, which agent is the most expensive across your fleet, or what your projected monthly spend is at the current run rate. These questions require per-run attribution, and vendor dashboards weren't built for it. This page covers the data model, query patterns, and alerting design for effective AI agent spend reporting.

TL;DR

The foundation of agent spend reporting is an audit log where every vendor API call is tagged with agent_run_id, agent_name, and a parsed per-call cost (cost_usd), none of which vendor dashboards expose. A proxy audit log provides this at the call level, as soon as the call completes; vendor dashboards report at the account level, often with a lag. With a per-call audit log, four SQL queries answer the four questions operations teams actually need: per-run cost summary, per-agent aggregate, per-vendor daily trend, and projected monthly spend. The same log also drives anomalous-spend alerts. The reporting layer is simple once the data source is right; the hard part is getting the right data source.

Why vendor dashboards aren't enough for agent spend reporting

Stripe's Dashboard, Twilio's Monitor, and Resend's email log all provide transaction-level records. But they share four gaps that make them inadequate as the sole reporting source for AI agent operations:

Reporting need	Vendor dashboard	Proxy audit log
Per-run cost summary — how much did run X cost?	No agent run context. Stripe shows charges grouped by customer, not by which agent execution triggered them. You'd need to reconstruct this by matching Stripe's `created` timestamp to your orchestration system's run timeline — a manual join with no shared key.	Every call tagged with `agent_run_id`. `SELECT SUM(cost_usd) FROM audit_log WHERE agent_run_id = 'run_abc123'` returns the exact per-run cost in milliseconds.
Per-agent aggregate — which agent spends the most?	No agent identity in vendor records. Stripe `PaymentIntent` has a `metadata` field you can populate, but only if you remember to do so in every SDK call, which makes it a convention, not enforcement. Twilio has no direct equivalent on messages, and Resend's optional `tags` carry the same opt-in limitation.	Every call tagged with `agent_name`. `SELECT agent_name, SUM(cost_usd) FROM audit_log GROUP BY agent_name ORDER BY 2 DESC` gives the full fleet breakdown.
Per-vendor daily trend — is Stripe spend accelerating?	Stripe's Dashboard shows daily totals for the account — not per-agent or per-workflow-type. A spike caused by a single runaway agent blends into the account total and may not be identifiable as agent-caused until hours later.	`SELECT vendor, date(called_at), SUM(cost_usd) FROM audit_log GROUP BY 1,2` shows per-vendor daily trends, filterable by agent name or run label.
Projected monthly spend — at current rate, what's the monthly bill?	Stripe shows the current billing period total. No per-agent run-rate projection. Vendor billing alerts fire after thresholds are crossed — no forward-looking projection.	7-day rolling average per agent × 30 gives the monthly projection per agent, computed from the audit log without waiting for a billing cycle.

The minimum audit log schema for agent spend reporting

Effective spend reporting starts with the right columns in the audit log. The minimum schema that enables all four reporting layers:

CREATE TABLE audit_log (
    id            TEXT PRIMARY KEY,
    agent_run_id  TEXT NOT NULL,  -- workflow run ID, task ID, or session ID
    agent_name    TEXT NOT NULL,  -- "billing-agent", "notification-agent"
    vendor        TEXT NOT NULL,  -- "stripe", "twilio", "resend"
    endpoint      TEXT NOT NULL,  -- "POST /v1/payment_intents"
    cost_usd      REAL,           -- parsed from vendor response
    vendor_txn_id TEXT,           -- Stripe PaymentIntent.id, Twilio SID, Resend message ID
    policy_verdict TEXT,          -- "allowed", "cap_hit", "endpoint_blocked"
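    called_at     TEXT NOT NULL,  -- UTC timestamp as 'YYYY-MM-DD HH:MM:SS', the format SQLite's datetime() returns, so text comparisons in the queries stay correct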
    vault_key_id  TEXT            -- which vault key authorized this call
);

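The cost_usd column is the critical one: vendor dashboards don't provide cost in a per-call, agent-tagged format. It's parsed from vendor response data: Stripe's amount field (in the smallest currency unit, so cents divided by 100 for USD), Twilio's price field on the message resource, and Resend's fixed per-email rate from the account tier. A proxy that forwards calls to these vendors can parse this data at call time and write it to the audit log before returning the response to the agent. Where a vendor fills in the price only after the call returns, as Twilio can for message pricing, the row can be written with a NULL cost_usd and backfilled later; the nullable column allows for that.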
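As a rough illustration of the parsing step described above, the sketch below maps a decoded vendor response body to a cost_usd value. The function name `parse_cost_usd`, the constant `RESEND_RATE_USD_PER_EMAIL`, and its value are hypothetical and not taken from any product; the sketch also assumes USD-only Stripe charges and that Twilio reports price as a (typically negative) string that may be null.

```python
from typing import Optional

# Hypothetical flat rate for illustration; the real figure depends on your Resend plan.
RESEND_RATE_USD_PER_EMAIL = 0.0004


def parse_cost_usd(vendor: str, body: dict) -> Optional[float]:
    """Return the USD cost of one vendor call, or None if it is not known yet."""
    if vendor == "stripe":
        # Stripe amounts are in the smallest currency unit (cents for USD).
        if str(body.get("currency", "usd")).lower() != "usd":
            return None  # currency conversion is out of scope for this sketch
        amount = body.get("amount")
        return None if amount is None else amount / 100.0
    if vendor == "twilio":
        # Assumed shape: price is a string such as "-0.00750", or None until billed.
        price = body.get("price")
        return None if price is None else abs(float(price))
    if vendor == "resend":
        # No per-call price in the response, so fall back to the configured rate.
        return RESEND_RATE_USD_PER_EMAIL
    return None  # unknown vendor: leave cost_usd NULL rather than guess
```

Returning None instead of 0 keeps unknown costs distinguishable from free calls, and SQL's SUM and AVG skip NULL values, so the reporting queries are unaffected by rows awaiting a backfill.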

Four query patterns for agent spend reporting

1. Per-run cost summary

-- Cost breakdown for a single agent run
SELECT
    vendor,
    endpoint,
    COUNT(*) AS call_count,
    SUM(cost_usd) AS total_cost_usd,
    MIN(called_at) AS first_call,
    MAX(called_at) AS last_call
FROM audit_log
WHERE agent_run_id = 'temporal/billing-workflow/run_abc123'
GROUP BY vendor, endpoint
ORDER BY total_cost_usd DESC;

This answers the incident-response question "what did run X actually do and how much did it cost?" in under a second, without pulling logs from Stripe and cross-referencing timestamps.
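If you run this query from application code rather than a SQL console, it is worth binding the run ID as a parameter. The sketch below does that with Python's built-in sqlite3 module against an in-memory table; the table is trimmed to the columns the query touches, and the sample rows, the short run IDs, and the `run_cost_summary` helper are made up for illustration.

```python
import sqlite3

# In-memory stand-in for the audit log, trimmed to the columns this query uses.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE audit_log (
        id TEXT PRIMARY KEY,
        agent_run_id TEXT NOT NULL,
        vendor TEXT NOT NULL,
        endpoint TEXT NOT NULL,
        cost_usd REAL,
        called_at TEXT NOT NULL
    )
""")
# Made-up sample rows for illustration only.
conn.executemany(
    "INSERT INTO audit_log VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("c1", "run_abc123", "stripe", "POST /v1/payment_intents", 40.0, "2025-01-01 10:00:00"),
        ("c2", "run_abc123", "stripe", "POST /v1/payment_intents", 25.0, "2025-01-01 10:00:05"),
        ("c3", "run_abc123", "twilio", "POST /Messages", 0.5, "2025-01-01 10:00:09"),
        ("c4", "run_other", "stripe", "POST /v1/payment_intents", 99.0, "2025-01-01 11:00:00"),
    ],
)


def run_cost_summary(conn, agent_run_id):
    """Per-vendor, per-endpoint cost breakdown for a single run."""
    return conn.execute(
        """
        SELECT vendor, endpoint, COUNT(*) AS call_count, SUM(cost_usd) AS total_cost_usd
        FROM audit_log
        WHERE agent_run_id = ?
        GROUP BY vendor, endpoint
        ORDER BY total_cost_usd DESC
        """,
        (agent_run_id,),  # bound parameter, never string-formatted into the SQL
    ).fetchall()


summary = run_cost_summary(conn, "run_abc123")
for vendor, endpoint, calls, total in summary:
    print(f"{vendor:8} {endpoint:28} calls={calls} total=${total:.2f}")
```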

2. Per-agent aggregate (fleet view)

-- Per-agent spend for the last 7 days
SELECT
    agent_name,
    vendor,
    COUNT(*) AS total_calls,
    SUM(cost_usd) AS total_spend_usd,
    AVG(cost_usd) AS avg_cost_per_call,
    MAX(cost_usd) AS max_single_call_usd
FROM audit_log
WHERE called_at >= datetime('now', '-7 days')
GROUP BY agent_name, vendor
ORDER BY total_spend_usd DESC;

This identifies which agents are the highest spenders, which vendors are being used most, and whether any agent's average cost per call is anomalously high (a signal of calls with larger-than-expected amounts).

3. Per-vendor daily trend

-- Daily spend per vendor for the last 30 days
SELECT
    date(called_at) AS day,
    vendor,
    COUNT(*) AS calls,
    SUM(cost_usd) AS spend_usd
FROM audit_log
WHERE called_at >= datetime('now', '-30 days')
GROUP BY date(called_at), vendor
ORDER BY day DESC, spend_usd DESC;

This is the time-series view that shows whether Stripe spend is accelerating (daily total growing), whether a new agent deployment increased costs, and whether spend is flat or seasonal. Account-level daily totals are also available in each vendor's billing portal, but there they can't be filtered by agent or tied to run context.
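To show the per-agent filtering that the audit log allows, here is a minimal sketch of the same daily-trend query with an optional agent filter, using Python's sqlite3 module. The `daily_vendor_spend` helper, its `since` parameter (a fixed cutoff in place of datetime('now', '-30 days'), so the example is reproducible), and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed to the columns the trend query needs.
conn.execute("""
    CREATE TABLE audit_log (
        id TEXT PRIMARY KEY,
        agent_name TEXT NOT NULL,
        vendor TEXT NOT NULL,
        cost_usd REAL,
        called_at TEXT NOT NULL
    )
""")
# Made-up sample rows; called_at is assumed to be stored as 'YYYY-MM-DD HH:MM:SS' UTC.
conn.executemany(
    "INSERT INTO audit_log VALUES (?, ?, ?, ?, ?)",
    [
        ("d1", "billing-agent", "stripe", 10.0, "2025-01-01 09:00:00"),
        ("d2", "billing-agent", "stripe", 30.0, "2025-01-02 09:00:00"),
        ("d3", "notification-agent", "twilio", 1.0, "2025-01-02 09:30:00"),
        ("d4", "billing-agent", "stripe", 20.0, "2025-01-02 15:00:00"),
    ],
)


def daily_vendor_spend(conn, since, agent_name=None):
    """Daily spend per vendor since a cutoff; agent_name=None means the whole fleet."""
    return conn.execute(
        """
        SELECT date(called_at) AS day, vendor, COUNT(*) AS calls, SUM(cost_usd) AS spend_usd
        FROM audit_log
        WHERE called_at >= :since
          AND (:agent IS NULL OR agent_name = :agent)
        GROUP BY date(called_at), vendor
        ORDER BY day DESC, spend_usd DESC
        """,
        {"since": since, "agent": agent_name},
    ).fetchall()


fleet = daily_vendor_spend(conn, "2025-01-01 00:00:00")
billing_only = daily_vendor_spend(conn, "2025-01-01 00:00:00", "billing-agent")
```

Comparing the fleet-wide rows with the single-agent rows for the same day is what makes a spike attributable to one agent rather than blended into an account total.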

4. Projected monthly spend

-- 30-day spend projection based on 7-day rolling average per agent
SELECT
    agent_name,
    vendor,
    SUM(cost_usd) / 7.0 AS avg_daily_spend_usd,
    SUM(cost_usd) / 7.0 * 30 AS projected_monthly_spend_usd
FROM audit_log
WHERE called_at >= datetime('now', '-7 days')
GROUP BY agent_name, vendor
ORDER BY projected_monthly_spend_usd DESC;

Run this weekly to give engineering and finance a forward-looking view. If the projection exceeds budget, you can reduce per-run caps or adjust run frequency before the month ends.
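The arithmetic behind the projection is simple enough to check by hand: $84 of spend over the trailing 7 days is $12 per day, or $360 over 30 days. A small sketch of the budget comparison follows; the helper names and the budget figures are hypothetical, and the projection assumes the last 7 days are representative of the month.

```python
def project_monthly(trailing_7d_spend_usd: float) -> float:
    """Trailing 7-day spend -> average daily spend -> 30-day projection."""
    return trailing_7d_spend_usd / 7.0 * 30


def over_budget(trailing_7d: dict, monthly_budget: dict) -> dict:
    """Return {agent_name: projected_usd} for agents projected to exceed their budget."""
    flagged = {}
    for agent, spend in trailing_7d.items():
        projected = project_monthly(spend)
        # Agents without a configured budget are skipped rather than flagged.
        if agent in monthly_budget and projected > monthly_budget[agent]:
            flagged[agent] = projected
    return flagged


# Hypothetical inputs: 7-day totals from the query above, budgets from finance.
trailing = {"billing-agent": 84.0, "notification-agent": 7.0}
budgets = {"billing-agent": 300.0, "notification-agent": 50.0}
flagged = over_budget(trailing, budgets)
```

Here billing-agent projects to $360 against a $300 budget and is flagged, while notification-agent projects to $30 against $50 and is not.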

Alerting triggers for agent spend anomalies

Spend reporting should drive automated alerts, not just dashboards engineers check manually. Three alert patterns that catch real incidents:

Alert	Query	What it catches
Cap hit rate spike	`SELECT agent_name, COUNT(*) FROM audit_log WHERE policy_verdict = 'cap_hit' AND called_at >= datetime('now', '-1 hour') GROUP BY agent_name HAVING COUNT(*) > 10` (alert on any agent with more than 10 cap hits in 1 hour)	An agent hitting its cap repeatedly suggests the cap is set too low, or the agent is in a retry loop. Repeated cap hits indicate the agent isn't handling the proxy's rejection (the 429) correctly.
Per-run cost exceeds threshold	`SELECT agent_run_id, SUM(cost_usd) FROM audit_log WHERE called_at >= datetime('now', '-1 hour') GROUP BY agent_run_id HAVING SUM(cost_usd) > 200` — alert on any run exceeding $200	A single run with unexpectedly high spend — usually caused by a data error (larger customer list than expected) or a logic bug (a loop that wasn't supposed to iterate that many times). Catches incidents before the daily cap is hit.
Spend velocity anomaly	Compare current-hour spend to average-hour spend for the same agent over the last 30 days. Alert if current hour is >3× the 30-day average.	A deployment that accidentally doubled the batch size, a new agent type that's more expensive than expected, or an on-call incident that triggered many retries. The velocity check catches structural changes that cap-hit alerts don't.
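The spend velocity check has no query in the table, so here is one possible sketch of it using Python's sqlite3 module. It makes several assumptions that the text does not settle: the "current hour" is the calendar hour containing `now`, the baseline averages only hours in which the agent actually spent something, and agents with no prior history are skipped. The `velocity_anomalies` helper and the sample rows are hypothetical.

```python
import sqlite3


def velocity_anomalies(conn, now, factor=3.0):
    """Return (agent_name, current_hour_spend, baseline_avg) for agents whose
    current-hour spend exceeds factor x their average active-hour spend
    over the previous 30 days. `now` is 'YYYY-MM-DD HH:MM:SS' in UTC."""
    return conn.execute(
        """
        WITH hourly AS (
            SELECT agent_name,
                   strftime('%Y-%m-%d %H', called_at) AS hour,
                   SUM(cost_usd) AS spend
            FROM audit_log
            WHERE called_at >= datetime(:now, '-30 days') AND called_at <= :now
            GROUP BY agent_name, hour
        ),
        baseline AS (
            SELECT agent_name, AVG(spend) AS avg_hourly
            FROM hourly
            WHERE hour < strftime('%Y-%m-%d %H', :now)
            GROUP BY agent_name
        )
        SELECT h.agent_name, h.spend, b.avg_hourly
        FROM hourly AS h
        JOIN baseline AS b ON b.agent_name = h.agent_name
        WHERE h.hour = strftime('%Y-%m-%d %H', :now)
          AND h.spend > :factor * b.avg_hourly
        ORDER BY h.spend DESC
        """,
        {"now": now, "factor": factor},
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit_log (id TEXT PRIMARY KEY, agent_name TEXT NOT NULL, "
    "cost_usd REAL, called_at TEXT NOT NULL)"
)
# Made-up rows: billing-agent normally spends $10 per active hour, then $50.
conn.executemany(
    "INSERT INTO audit_log VALUES (?, ?, ?, ?)",
    [
        ("v1", "billing-agent", 10.0, "2025-01-10 09:15:00"),
        ("v2", "billing-agent", 10.0, "2025-01-11 09:15:00"),
        ("v3", "billing-agent", 50.0, "2025-01-12 09:05:00"),
        ("v4", "notification-agent", 5.0, "2025-01-11 09:15:00"),
        ("v5", "notification-agent", 6.0, "2025-01-12 09:10:00"),
    ],
)
alerts = velocity_anomalies(conn, "2025-01-12 09:30:00")
```

Because the current hour is usually only partly elapsed, this comparison is conservative: an agent that trips the threshold mid-hour is already well above its norm. Averaging over all 720 hours instead of active hours would be a stricter baseline for agents that run infrequently.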

How Keybrake provides the data source

Keybrake is a proxy that sits between your agents and their vendor APIs. Every call that passes through the proxy is written to an audit log with agent_run_id (from the agent_run_label you set on the vault key), vendor, endpoint, cost_usd_parsed (the parsed cost extracted from the vendor response, shown as cost_usd in the schema above), and policy_verdict. This is the audit log that drives all four query patterns above.

The proxy provides what vendor dashboards can't: per-call, per-agent attribution available immediately after the call, not hours later in a billing report. The GET /audit endpoint queries the audit log with filters by agent_run_id, agent_name, vendor, and time range — the same queries above can be run via the API without maintaining your own database.
