Hatchet Stripe Integration: Restricted API Keys, Spend Caps, and Agent Governance

By Keybrake · July 2, 2026

Hatchet's @hatchet.step(retries=3) is how you make a background billing task resilient. It is also a reliable way to fire up to four identical Stripe charges (the original attempt plus three retries) when any code after the charge raises an exception in the step body.

Hatchet is a durable workflow engine designed for teams who need reliable background job processing — step-based DAGs with retries, concurrency controls, cron scheduling, and child workflow fanout. When those workflows include Stripe billing — subscription renewals, usage-based charges, agent-triggered payments — Hatchet's retry primitives, parallel execution model, and cron scheduling introduce failure modes that are invisible in local testing and difficult to trace from Hatchet's dashboard after the fact.

This post covers three failure modes specific to Hatchet's architecture: step-level retries=N re-running the entire step function including the Stripe charge on any downstream exception, context.spawn_workflow() fanout creating parallel child workflows that share one unrestricted Stripe key with no per-child spend cap, and cron workflows without a concurrency guard spawning a second billing run before the first has finished. Each section includes Python code and the governance pattern that closes it — content-hash idempotency keys at the Stripe layer and per-step vault keys via the Keybrake proxy at the key-management layer. A gap analysis closes the post with four additional Hatchet-specific edge cases.

Failure mode 1: `@hatchet.step(retries=N)` re-fires Stripe charge on downstream exception

Hatchet's step retry policy re-executes the entire step function when it raises an unhandled exception. The intent is to handle transient failures — a database connection timeout, a downstream RPC error, a momentary HTTP failure talking to an internal service. The problem is that the step is re-executed from line 1: Hatchet has no concept of partial step completion, and it has no visibility into which side effects finished before the exception was raised.

# billing_workflow.py — UNSAFE: step retries re-fire stripe.charges.create() on any downstream failure
import stripe
import os
from hatchet_sdk import Hatchet, Context

hatchet = Hatchet()

@hatchet.workflow(name="charge-customer")
class ChargeCustomerWorkflow:

    @hatchet.step(retries=3)
    def charge_customer(self, context: Context) -> dict:
        data = context.workflow_input()
        customer_id = data["customer_id"]
        amount_cents = data["amount_cents"]
        billing_period = data["billing_period"]

        stripe.api_key = os.environ["STRIPE_SECRET_KEY"]

        # Step creates ch_A — Stripe returns 200 OK
        charge = stripe.Charge.create(
            amount=amount_cents,
            currency="usd",
            customer=customer_id,
            description=f"Subscription {billing_period}",
            # No idempotency_key — every retry creates a new Stripe charge object
        )

        # If this DB write raises a connection timeout, Hatchet retries the entire step:
        # Retry 1: stripe.charges.create() fires again → ch_B (new charge, no dedup)
        # Retry 2: → ch_C. Retry 3: → ch_D. Customer billed four times.
        write_charge_to_database(customer_id, charge["id"], billing_period)

        return {"charge_id": charge["id"]}

The failure chain: the charge call returns ch_A. write_charge_to_database() raises a psycopg2.OperationalError on a connection timeout. Hatchet catches the unhandled exception, waits for whatever backoff the step's retry configuration specifies, and re-dispatches the step to a worker. On the first retry, the charge call runs again; Stripe has nothing connecting this request to the original, so it creates ch_B. With retries=3 and a persistent database outage, the step runs four times (the original attempt plus three retries) and the customer is charged four times. Hatchet's run view records the step failing on every attempt and then failing terminally. The duplicate charges are visible only in the Stripe Dashboard, spaced apart by the time between attempts.
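The arithmetic of that chain can be reproduced without Hatchet or Stripe. The sketch below is a pure-Python model under stated assumptions: `FakeStripe` is a hypothetical stand-in that mimics Stripe's documented behavior of returning the original result when an idempotency key repeats, and `run_with_retries` is a hypothetical loop that re-runs the whole step on any exception, which is what a step-level retry policy does.

```python
import hashlib


class FakeStripe:
    """Hypothetical stand-in for Stripe's charge endpoint.

    Replays the stored charge when an idempotency key repeats.
    """

    def __init__(self):
        self.charges = []    # every charge object actually created
        self._by_key = {}    # idempotency_key -> charge id

    def create_charge(self, amount, customer, idempotency_key=None):
        if idempotency_key is not None and idempotency_key in self._by_key:
            return self._by_key[idempotency_key]
        charge_id = f"ch_{len(self.charges) + 1}"
        self.charges.append((charge_id, customer, amount))
        if idempotency_key is not None:
            self._by_key[idempotency_key] = charge_id
        return charge_id


def run_with_retries(step, retries):
    """Re-run the whole step from the top on any exception, like retries=N."""
    last_error = None
    for _attempt in range(retries + 1):
        try:
            return step()
        except Exception as exc:  # a real executor would also back off here
            last_error = exc
    raise last_error


def failing_db_write(*_args):
    raise ConnectionError("database timeout")  # simulated persistent outage


def make_step(stripe_api, use_key):
    def step():
        key = None
        if use_key:
            raw = "cus_abc123:2999:2026-07:hatchet-billing"
            key = hashlib.sha256(raw.encode()).hexdigest()[:32]
        charge_id = stripe_api.create_charge(2999, "cus_abc123", idempotency_key=key)
        failing_db_write("cus_abc123", charge_id)  # raises after the charge
        return charge_id
    return step


for use_key in (False, True):
    api = FakeStripe()
    try:
        run_with_retries(make_step(api, use_key), retries=3)
    except ConnectionError:
        pass  # terminal failure after all attempts
    print(f"idempotency key={use_key}: {len(api.charges)} charge(s) created")
```

Without a key the model creates four charges; with the content-hash key it creates one, even though the step still fails terminally.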

This pattern is particularly dangerous because the exception that triggers it usually does not come from Stripe: the post-charge recording step (a database write, an internal RPC) is typically the less reliable dependency. Any transient error in that step triggers the retry chain, and because the Stripe call comes before the write (the correct ordering for most billing logic), every retry re-fires it. If the step is configured with a retry backoff, the duplicate charges arrive spaced out rather than milliseconds apart, which makes them harder to correlate by inspection.

The fix: content-hash idempotency key + vault key per step

The idempotency key must be derived from the billing parameters, not generated at step entry with uuid.uuid4() — which produces a different value on each retry attempt. A SHA-256 hash of (customer_id, amount_cents, billing_period) is stable across every retry of the same step, so Stripe deduplicates all retries back to the original ch_A regardless of how many times Hatchet re-executes the step. Hatchet also exposes a stable context.workflow_run_id() that is consistent across all retries of a workflow run — usable as a component of the idempotency key when the billing parameters are not fully known at key-generation time.
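The difference between the two key strategies is easy to check directly. This sketch simulates the four attempts of a retries=3 step: `key_from_uuid` models the anti-pattern of generating a key at step entry, and `key_from_params` produces the same hash as `billing_idempotency_key` in the safe example that follows. The optional `run_id` argument is an illustrative assumption showing how a stable run identifier could be mixed in.

```python
import hashlib
import uuid
from typing import Optional


def key_from_uuid() -> str:
    # Anti-pattern: a fresh random key every time the step body starts.
    return uuid.uuid4().hex[:32]


def key_from_params(customer_id: str, amount_cents: int, billing_period: str,
                    run_id: Optional[str] = None) -> str:
    # Content hash: identical inputs always produce the identical key.
    parts = [customer_id, str(amount_cents), billing_period]
    if run_id is not None:
        parts.append(run_id)  # optional: scope the key to one workflow run
    raw = ":".join(parts) + ":hatchet-billing"
    return hashlib.sha256(raw.encode()).hexdigest()[:32]


attempts = range(4)  # initial attempt + retries=3
uuid_keys = {key_from_uuid() for _ in attempts}
hash_keys = {key_from_params("cus_abc123", 2999, "2026-07") for _ in attempts}

print(len(uuid_keys))  # 4: Stripe sees four unrelated requests
print(len(hash_keys))  # 1: Stripe replays the first charge
```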

# billing_workflow.py — SAFE: content-hash idempotency key + vault key per step
import stripe
import hashlib
import httpx
import os
from hatchet_sdk import Hatchet, Context

hatchet = Hatchet()

def billing_idempotency_key(customer_id: str, amount_cents: int, billing_period: str) -> str:
    raw = f"{customer_id}:{amount_cents}:{billing_period}:hatchet-billing"
    return hashlib.sha256(raw.encode()).hexdigest()[:32]

def issue_vault_key(customer_id: str, amount_cents: int) -> str:
    resp = httpx.post(
        "https://proxy.keybrake.com/admin/vault_keys",
        headers={"Authorization": f"Bearer {os.environ['KEYBRAKE_ADMIN_KEY']}"},
        json={
            "label": f"hatchet-billing-{customer_id}",
            "vendor": "stripe",
            "allowed_endpoints": ["POST /v1/charges"],
            "daily_usd_cap": round(amount_cents / 100 * 1.1, 2),  # cap at amount + 10%
            "expires_in_seconds": 3600,
        },
        timeout=5.0,
    )
    resp.raise_for_status()
    return resp.json()["vault_key"]

@hatchet.workflow(name="charge-customer")
class ChargeCustomerWorkflow:

    @hatchet.step(retries=3)
    def charge_customer(self, context: Context) -> dict:
        data = context.workflow_input()
        customer_id = data["customer_id"]
        amount_cents = data["amount_cents"]
        billing_period = data["billing_period"]

        idempotency_key = billing_idempotency_key(customer_id, amount_cents, billing_period)
        vault_key = issue_vault_key(customer_id, amount_cents)

        stripe_client = stripe.StripeClient(
            vault_key,
            base_addresses={"api": "https://proxy.keybrake.com/stripe"},  # route via proxy
        )

        # Same key on every retry — Stripe returns ch_A without creating ch_B, ch_C, ch_D
        charge = stripe_client.charges.create(
            params={
                "amount": amount_cents,
                "currency": "usd",
                "customer": customer_id,
                "description": f"Subscription {billing_period}",
                "metadata": {"billing_period": billing_period},
            },
            options={"idempotency_key": idempotency_key},
        )

        write_charge_to_database(customer_id, charge.id, billing_period)
        return {"charge_id": charge.id}

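Two additional safeguards. Scope the vault key to POST /v1/charges only, so the step cannot read customer data or issue refunds regardless of what Hatchet passes as input. And cap the vault key slightly above the expected charge amount, so an amount_cents that is far off is rejected at the proxy before it reaches Stripe. The cap only does this when it is computed from an independent expected price (the plan price, for example) rather than from the same amount_cents being charged, as the example above does for brevity. Because the step issues a fresh vault key on every attempt, the cap bounds each attempt; the idempotency key is what stops retries from creating new charges.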
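Why the cap's source matters can be shown in a few lines. A minimal sketch, assuming a hypothetical `PLAN_PRICES_CENTS` catalog as the independent source of the expected price; `cap_usd` mirrors the 10% buffer used in `issue_vault_key`, and `proxy_allows` is a hypothetical model of a spend-cap check, not Keybrake's implementation.

```python
# Hypothetical catalog: the trusted, independently maintained expected price.
PLAN_PRICES_CENTS = {"pro": 2999, "team": 9900}


def cap_usd(expected_cents: int, buffer: float = 0.10) -> float:
    # Same formula as issue_vault_key: expected amount plus a 10% buffer.
    return round(expected_cents / 100 * (1 + buffer), 2)


def proxy_allows(charge_cents: int, cap: float) -> bool:
    # Model of a spend-cap check: reject a charge that would exceed the cap.
    return charge_cents / 100 <= cap


computed_cents = 2999 * 100  # bug: a value already in cents converted again

# Cap derived from the buggy value itself: the bug inflates the cap too.
print(proxy_allows(computed_cents, cap_usd(computed_cents)))            # True
# Cap derived from the catalog price: the inflated charge is rejected.
print(proxy_allows(computed_cents, cap_usd(PLAN_PRICES_CENTS["pro"])))  # False
```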

Failure mode 2: `context.spawn_workflow()` fanout shares one Stripe key across all child workflows

Hatchet supports spawning child workflows from a parent step and waiting for their results with context.spawn_workflow(). A common billing pattern spawns one child workflow per customer — for N customers, N child workflows execute concurrently, each running their own charge_customer step. All of those steps share the same STRIPE_SECRET_KEY environment variable injected into every Hatchet worker process. There is no per-child spend cap, no per-child key scope, and no mechanism to halt the fanout when an error is detected in the first few responses.

# fanout_workflow.py — UNSAFE: child workflows share one Stripe key, no per-child spend cap
import stripe
import os
from hatchet_sdk import Hatchet, Context

hatchet = Hatchet()

@hatchet.workflow(name="monthly-billing-fanout")
class MonthlyBillingFanoutWorkflow:

    @hatchet.step()
    def fan_out(self, context: Context) -> dict:
        data = context.workflow_input()
        customers = data["customers"]  # list of {id, amount_cents}
        billing_period = data["billing_period"]

        # Spawn one child workflow per customer; all run concurrently on Hatchet workers.
        # Every child inherits the same STRIPE_SECRET_KEY from the worker environment.
        # A unit bug (an amount already in cents converted again) overcharges everyone at once.
        child_refs = [
            context.spawn_workflow(
                "charge-customer",
                {"customer_id": c["id"],
                 "amount_cents": c["amount_cents"] * 100,  # bug: already cents, charged 100x
                 "billing_period": billing_period},
            )
            for c in customers
        ]

        # By the time the first child reports an error, every wrong charge already exists
        results = [ref.result() for ref in child_refs]
        return {"billed": len(results)}

The concurrency amplifies the blast radius of any calculation error. A sequential loop over the same customers would expose the error after the first few wrong charges; a monitoring alert could trigger before the loop exhausted the customer list. Hatchet's parallel fanout dispatches all N child workflows before any results are available — by the time the first Stripe webhook fires from the first wrong charge, the entire cohort is already billed. Hatchet's run view shows all child workflows as in-progress simultaneously, and there is no built-in circuit breaker that halts the fanout when one child returns a billing error.

The fix: issue a per-customer vault key in the parent step before spawning each child, and pass it as workflow input. Each vault key is scoped to POST /v1/charges and capped at that customer's expected charge amount plus a small buffer, where the expected amount comes from a source independent of the calculation being guarded (the plan price, for example). A calculation error that inflates the amount is then rejected at the proxy for each affected child, so no customer is charged beyond their cap, and the error surfaces as failed child workflows.
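A small model of per-child isolation, assuming the proxy enforces a cumulative cap per vault key. `FakeProxy`, `issue_key`, and `charge` are hypothetical names for illustration, not Keybrake's API. Each child gets its own key capped at that customer's expected amount, and one child carries an inflated amount.

```python
class SpendCapExceeded(Exception):
    pass


class FakeProxy:
    """Hypothetical model: one cumulative spend cap per vault key."""

    def __init__(self):
        self._caps = {}
        self._spent = {}

    def issue_key(self, customer_id: str, expected_cents: int) -> str:
        key = f"vk_{customer_id}"
        self._caps[key] = round(expected_cents * 1.1)  # cap in cents, +10%
        self._spent[key] = 0
        return key

    def charge(self, key: str, amount_cents: int) -> None:
        if self._spent[key] + amount_cents > self._caps[key]:
            raise SpendCapExceeded(key)
        self._spent[key] += amount_cents


expected = {"cus_1": 2999, "cus_2": 4999, "cus_3": 1999}
actual = {"cus_1": 2999, "cus_2": 4999 * 100, "cus_3": 1999}  # cus_2 is inflated

proxy = FakeProxy()
keys = {cid: proxy.issue_key(cid, cents) for cid, cents in expected.items()}

charged, blocked = [], []
for cid, amount in actual.items():
    try:
        proxy.charge(keys[cid], amount)
        charged.append(cid)
    except SpendCapExceeded:
        blocked.append(cid)

print(charged)  # ['cus_1', 'cus_3']
print(blocked)  # ['cus_2']
```

The inflated child is rejected on its own key while the correctly priced children are unaffected, which is the isolation a single shared key cannot provide.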

# fanout_workflow.py — SAFE: per-child vault keys, each capped at one customer's amount
import stripe
import hashlib
import httpx
import os
from hatchet_sdk import Hatchet, Context

hatchet = Hatchet()

def issue_vault_key(customer_id: str, amount_cents: int) -> str:
    resp = httpx.post(
        "https://proxy.keybrake.com/admin/vault_keys",
        headers={"Authorization": f"Bearer {os.environ['KEYBRAKE_ADMIN_KEY']}"},
        json={
            "label": f"hatchet-fanout-{customer_id}",
            "vendor": "stripe",
            "allowed_endpoints": ["POST /v1/charges"],
            "daily_usd_cap": round(amount_cents / 100 * 1.1, 2),
            "expires_in_seconds": 3600,
        },
        timeout=5.0,
    )
    resp.raise_for_status()
    return resp.json()["vault_key"]

@hatchet.workflow(name="monthly-billing-fanout")
class MonthlyBillingFanoutWorkflow:

    @hatchet.step()
    def fan_out(self, context: Context) -> dict:
        data = context.workflow_input()
        customers = data["customers"]
        billing_period = data["billing_period"]

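        # Issue one vault key per customer in the parent step before spawning children.
        # For brevity the cap is derived from c["amount_cents"]; in production derive it
        # from an independent expected price so an amount bug cannot also inflate the cap.
        child_refs = []
        for c in customers:
            vault_key = issue_vault_key(c["id"], c["amount_cents"])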
            child_refs.append(
                context.spawn_workflow(
                    "charge-customer",
                    {
                        "customer_id": c["id"],
                        "amount_cents": c["amount_cents"],
                        "billing_period": billing_period,
                        "vault_key": vault_key,  # scoped + capped per customer
                    },
                )
            )

        results = [ref.result() for ref in child_refs]
        return {"billed": len(results)}

@hatchet.workflow(name="charge-customer")
class ChargeCustomerWorkflow:

    @hatchet.step(retries=3)
    def charge_customer(self, context: Context) -> dict:
        data = context.workflow_input()
        customer_id = data["customer_id"]
        amount_cents = data["amount_cents"]
        billing_period = data["billing_period"]
        vault_key = data["vault_key"]  # scoped to this customer, capped at amount + 10%

        idempotency_key = hashlib.sha256(
            f"{customer_id}:{amount_cents}:{billing_period}:hatchet-billing".encode()
        ).hexdigest()[:32]

        stripe_client = stripe.StripeClient(
            vault_key,
            base_addresses={"api": "https://proxy.keybrake.com/stripe"},  # route via proxy
        )

        charge = stripe_client.charges.create(
            params={
                "amount": amount_cents,
                "currency": "usd",
                "customer": customer_id,
                "description": f"Subscription {billing_period}",
                "metadata": {"billing_period": billing_period},
            },
            options={"idempotency_key": idempotency_key},
        )
        return {"charge_id": charge.id}

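Each child workflow receives a vault key scoped to its own customer's charge. If the amount calculation inflates the value (an amount already in cents converted again, a proration formula off by a factor, a currency conversion applied twice), the proxy rejects that child's charge attempt with 429 Spend cap exceeded, so no customer is charged beyond their cap. Each rejected child fails, and the parent sees those failures when it collects results. Two limits apply: the cap catches an inflated amount only when it is computed from an independent expected price rather than from the amount being charged, and a spend cap cannot catch an undercharge. Because the parent spawns every child before awaiting any result, halting the cohort on the first failure also requires dispatching in batches and checking results between them.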
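Per-child caps bound each customer's exposure, but a parent that spawns every child up front cannot stop the rest of the cohort early. One way to add an early stop is to dispatch in fixed-size batches and halt when a batch reports failures. This is a pure-Python sketch using `concurrent.futures`; `charge_child` is a hypothetical stand-in for spawning a child workflow and awaiting its result, not a Hatchet API.

```python
from concurrent.futures import ThreadPoolExecutor


def charge_child(customer: dict) -> bool:
    # Hypothetical stand-in for spawn + result(): True on success, False when
    # the child fails (modelled here as the proxy rejecting an over-cap amount).
    return customer["amount_cents"] <= customer["expected_cents"] * 1.1


def batched_fanout(customers: list, batch_size: int, max_failures: int = 0) -> dict:
    done, failed = [], []
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        for start in range(0, len(customers), batch_size):
            batch = customers[start:start + batch_size]
            # Children inside one batch still run concurrently.
            results = list(pool.map(charge_child, batch))
            for customer, ok in zip(batch, results):
                (done if ok else failed).append(customer["id"])
            if len(failed) > max_failures:
                break  # circuit breaker: do not dispatch the next batch
    return {"done": done, "failed": failed, "dispatched": len(done) + len(failed)}


cohort = [{"id": f"cus_{i}", "expected_cents": 2999, "amount_cents": 2999 * 100}
          for i in range(100)]  # every amount inflated by the same bug
print(batched_fanout(cohort, batch_size=10)["dispatched"])  # 10
```

The batch size trades throughput for how many children can be in flight before the breaker trips.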

Failure mode 3: Cron workflow overlap fires duplicate billing runs for the same period

Hatchet supports cron-triggered workflows via the on_crons parameter. A cron trigger starts a new workflow instance on schedule regardless of whether an earlier instance is still running, and nothing stops a manual or API trigger from starting another instance in the middle of a scheduled run. If a billing run is slow (Stripe rate limits, a large customer cohort, a downstream database bottleneck) and a second instance starts for the same billing period before the first completes, two instances run concurrently. Both execute the same billing step for the same customers, producing duplicate charges for every customer that both instances reach.

# monthly_billing.py — UNSAFE: no concurrency guard, cron overlap creates duplicate charges
import stripe
import os
from hatchet_sdk import Hatchet, Context

hatchet = Hatchet()

@hatchet.workflow(
    name="monthly-billing",
    on_crons=["0 9 1 * *"],  # 9 AM on the 1st of every month
)
class MonthlyBillingWorkflow:

    @hatchet.step(retries=2)
    def charge_all_customers(self, context: Context) -> dict:
        billing_period = get_current_billing_period()  # e.g. "2026-07"
        customers = fetch_customers_due_for_billing(billing_period)

        stripe.api_key = os.environ["STRIPE_SECRET_KEY"]

        charged = []
        for customer in customers:
            # If this loop is slow and a second cron fires, the second instance
            # starts from the top of the same customer list with no deduplication.
            # Every customer processed by both instances is charged twice.
            charge = stripe.Charge.create(
                amount=customer["amount_cents"],
                currency="usd",
                customer=customer["id"],
                description=f"Subscription {billing_period}",
            )
            charged.append(charge["id"])

        return {"billed": len(charged)}

The cron overlap failure is distinct from the step retry failure in an important way: it is not triggered by an error. The second workflow instance is legitimate (it fired on schedule or was triggered on purpose), and Hatchet does not know that it duplicates a run already in progress unless you configure a concurrency limit. With the monthly schedule above, a second run for the same period comes from a manual or API trigger during the scheduled run; with a tighter schedule, such as an hourly sweep of customers due for billing, a run that takes 90 minutes overlaps the next tick on its own. Hatchet's run history shows two successful runs for the same period, and both charge sets appear in Stripe with no obvious connection between them.
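A minimal threaded model of two overlapping runs over the same customer list. `FakeStripe` here is a hypothetical, lock-protected stand-in that replays a repeated idempotency key; the point is the number of charge objects with and without a content-hash key, not the timing of the threads.

```python
import hashlib
import threading


class FakeStripe:
    """Hypothetical thread-safe stand-in that replays repeated idempotency keys."""

    def __init__(self):
        self._lock = threading.Lock()
        self.charges = []
        self._by_key = {}

    def create_charge(self, customer, amount, idempotency_key=None):
        with self._lock:
            if idempotency_key in self._by_key:
                return self._by_key[idempotency_key]
            charge_id = f"ch_{len(self.charges) + 1}"
            self.charges.append((customer, amount))
            if idempotency_key is not None:
                self._by_key[idempotency_key] = charge_id
            return charge_id


def billing_run(api, customers, period, use_key):
    for customer_id, amount in customers:
        key = None
        if use_key:
            raw = f"{customer_id}:{amount}:{period}:hatchet-billing"
            key = hashlib.sha256(raw.encode()).hexdigest()[:32]
        api.create_charge(customer_id, amount, idempotency_key=key)


def overlapping_runs(use_key):
    customers = [(f"cus_{i}", 2999) for i in range(50)]
    api = FakeStripe()
    # Two instances for the same billing period, running at the same time.
    runs = [threading.Thread(target=billing_run, args=(api, customers, "2026-07", use_key))
            for _ in range(2)]
    for t in runs:
        t.start()
    for t in runs:
        t.join()
    return len(api.charges)


print(overlapping_runs(use_key=False))  # 100: every customer charged twice
print(overlapping_runs(use_key=True))   # 50: the later request for each key is replayed
```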

# monthly_billing.py — SAFE: concurrency guard + idempotency keys prevent overlap double-billing
import stripe
import hashlib
import httpx
import os
from hatchet_sdk import Hatchet, Context, ConcurrencyExpression, ConcurrencyLimitStrategy

hatchet = Hatchet()

def issue_vault_key(customer_id: str, amount_cents: int, billing_period: str) -> str:
    resp = httpx.post(
        "https://proxy.keybrake.com/admin/vault_keys",
        headers={"Authorization": f"Bearer {os.environ['KEYBRAKE_ADMIN_KEY']}"},
        json={
            "label": f"hatchet-cron-{customer_id}-{billing_period}",
            "vendor": "stripe",
            "allowed_endpoints": ["POST /v1/charges"],
            "daily_usd_cap": round(amount_cents / 100 * 1.1, 2),
            "expires_in_seconds": 7200,
        },
        timeout=5.0,
    )
    resp.raise_for_status()
    return resp.json()["vault_key"]

@hatchet.workflow(
    name="monthly-billing",
    on_crons=["0 9 1 * *"],
    # Only one billing run per period: CANCEL_NEWEST cancels a second instance for the
    # same billing_period while the first is still in progress. Every trigger path must
    # supply billing_period in the input, or the expression has nothing to group on.
    concurrency=ConcurrencyExpression(
        expression="input.billing_period",
        max_runs=1,
        limit_strategy=ConcurrencyLimitStrategy.CANCEL_NEWEST,
    ),
)
class MonthlyBillingWorkflow:

    @hatchet.step(retries=2)
    def charge_all_customers(self, context: Context) -> dict:
        # billing_period passed as workflow input or derived from trigger timestamp
        data = context.workflow_input()
        billing_period = data.get("billing_period", get_current_billing_period())
        customers = fetch_customers_due_for_billing(billing_period)

        charged = []
        for customer in customers:
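            # Same derivation as the charge-customer workflow, so a cron run and a
            # manual charge for the same customer and period share one key.
            idempotency_key = hashlib.sha256(
                f"{customer['id']}:{customer['amount_cents']}:{billing_period}:hatchet-billing".encode()
            ).hexdigest()[:32]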

            vault_key = issue_vault_key(
                customer["id"], customer["amount_cents"], billing_period
            )

            stripe_client = stripe.StripeClient(
                vault_key,
                base_addresses={"api": "https://proxy.keybrake.com/stripe"},  # route via proxy
            )

            charge = stripe_client.charges.create(
                params={
                    "amount": customer["amount_cents"],
                    "currency": "usd",
                    "customer": customer["id"],
                    "description": f"Subscription {billing_period}",
                    "metadata": {"billing_period": billing_period},
                },
                options={"idempotency_key": idempotency_key},
            )
            charged.append(charge.id)

        return {"billed": len(charged)}

Two layers of protection. The ConcurrencyExpression keyed to input.billing_period limits each billing period to one concurrent run: if a second trigger arrives while the first run is still in progress and both carry the same billing_period input, Hatchet cancels the newer run rather than queuing it. The guard groups on the input, so it only works when every trigger path (the cron, events, manual runs) supplies billing_period; check how your cron trigger populates input before relying on it. And within the run, content-hash idempotency keys mean that if the guard is bypassed (a manual trigger that omits billing_period, for example), Stripe deduplicates the repeated charge attempts back to the original charge objects, provided they arrive within Stripe's idempotency-key retention window.
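To make the difference between cancelling and queuing concrete, here is a toy model of a group-keyed concurrency limit with max_runs=1. `GroupLimiter` is hypothetical and is not how Hatchet implements its scheduler; it only shows the two outcomes a second trigger for the same group key can have.

```python
from collections import defaultdict, deque


class GroupLimiter:
    """Toy model of a concurrency limit keyed by a group expression."""

    def __init__(self, max_runs: int = 1, strategy: str = "cancel_newest"):
        self.max_runs = max_runs
        self.strategy = strategy  # "cancel_newest" or "queue"
        self.active = defaultdict(set)
        self.queued = defaultdict(deque)
        self.cancelled = []

    def trigger(self, run_id: str, group_key: str) -> str:
        if len(self.active[group_key]) < self.max_runs:
            self.active[group_key].add(run_id)
            return "running"
        if self.strategy == "cancel_newest":
            self.cancelled.append(run_id)
            return "cancelled"
        self.queued[group_key].append(run_id)
        return "queued"

    def finish(self, run_id: str, group_key: str) -> None:
        self.active[group_key].discard(run_id)
        if self.queued[group_key]:
            # A queued run starts once a slot frees: it still bills, just later.
            self.active[group_key].add(self.queued[group_key].popleft())


cancel = GroupLimiter(strategy="cancel_newest")
print(cancel.trigger("run_a", "2026-07"))  # running
print(cancel.trigger("run_b", "2026-07"))  # cancelled
print(cancel.trigger("run_c", "2026-08"))  # running: different billing period

queue = GroupLimiter(strategy="queue")
print(queue.trigger("run_a", "2026-07"))   # running
print(queue.trigger("run_b", "2026-07"))   # queued
queue.finish("run_a", "2026-07")
print(queue.active["2026-07"])             # {'run_b'}: the second run executes anyway
```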

Approach comparison

Approach	Step retry safe?	Fanout isolation?	Cron overlap safe?	Spend cap?	Audit log
Raw Stripe key, no idempotency	No — duplicate charges on retry	No — one shared key	No — overlap double-charges	No	Stripe Dashboard only
Idempotency key only	Yes — retries deduplicate	No — one shared key	Yes — same key on both runs	No	Stripe Dashboard only
Concurrency guard only	No — retry still fires new charge	No — one shared key	Yes — second run cancelled	No	Hatchet run log only
Vault key only (no idempotency)	No — retry fires new charge (cap may absorb)	Yes — per-child key, capped	Partial — cap limits blast radius	Yes — per-step	Proxy + Stripe
Idempotency + concurrency guard + vault key	Yes	Yes	Yes	Yes	Proxy + Stripe
Keybrake proxy (recommended)	Yes	Yes — per-child vault key	Yes — proxy-level dedup + Stripe dedup	Yes — enforced at proxy	Full queryable audit log

Gap analysis: four additional Hatchet failure modes

1. An on_failure step that issues a compensating charge stacks on the original step's retry charges

Hatchet supports defining an on_failure step that runs when another step fails, enabling compensating transactions — refunding a payment if a downstream fulfillment step fails, for example. If an on_failure step issues a charge as a compensating action (charging a late fee, triggering a penalty billing event) and the original step's retries also involve a charge, you can have both the original step's retry charges and the on_failure step's charge fire for the same workflow run. Audit each step's Stripe calls independently: on_failure steps should use audit-scoped vault keys (GET only) to inspect prior charges rather than writing new ones, reserving charge creation exclusively for the primary billing step with its own idempotency key.

2. Retrying the fanout step re-issues vault keys and re-spawns children that already charged

When a vault key is passed as workflow input — as in the fanout pattern above — and the parent step is re-executed because the fanout itself fails, the parent re-issues a new set of vault keys and re-spawns child workflows. If some child workflows from the first execution already completed their charges successfully, re-spawning those children triggers additional charge attempts. The content-hash idempotency key handles deduplication at the Stripe layer, but the proxy will record a second audit entry for each re-spawned child. Add a pre-flight step before fan_out that queries the Keybrake audit log for charges already recorded in this billing period — if a customer has a successful charge logged, skip spawning their child workflow in the retry.
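The pre-flight check reduces to a set difference. A sketch, assuming the audit records (or your own billing ledger) can be fetched as dicts with hypothetical `customer_id`, `billing_period`, and `status` fields; the real shape of Keybrake's audit log is not shown here.

```python
def customers_to_spawn(customers: list, audit_records: list, billing_period: str) -> list:
    # Customers with a successful charge already recorded for this period.
    already_charged = {
        r["customer_id"]
        for r in audit_records
        if r["billing_period"] == billing_period and r["status"] == "succeeded"
    }
    return [c for c in customers if c["id"] not in already_charged]


customers = [{"id": "cus_1"}, {"id": "cus_2"}, {"id": "cus_3"}]
audit = [
    {"customer_id": "cus_1", "billing_period": "2026-07", "status": "succeeded"},
    {"customer_id": "cus_2", "billing_period": "2026-07", "status": "failed"},
    {"customer_id": "cus_3", "billing_period": "2026-06", "status": "succeeded"},
]
print([c["id"] for c in customers_to_spawn(customers, audit, "2026-07")])
# ['cus_2', 'cus_3']
```

Only a succeeded charge in the same period suppresses a respawn; failed attempts and other periods are spawned again.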

3. Hatchet's event trigger at-least-once delivery creates duplicate workflow instances

Treat Hatchet's event-triggered workflows (on_events=["billing:charge-requested"]) as at-least-once: a billing service that re-pushes a billing:charge-requested event after a client timeout can trigger two workflow instances from one logical event, and both reach the Stripe charge call independently. A content-hash idempotency key derived from the billing parameters collapses that duplicate at Stripe. It also collapses two distinct events that happen to carry identical parameters, so when that can legitimately happen (two purchases of the same amount in the same period), key on the event's unique identifier instead. Where your Hatchet version supports deduplicating event pushes or workflow runs, use it to collapse duplicates before the workflow is triggered.
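The trade-off between the two key sources shows up with three pushes: an original event, a re-push of the same event, and a second, distinct event that happens to carry identical billing parameters. A sketch; the `event_id` field is an assumption about your own event payload, not a Hatchet field.

```python
import hashlib


def _h(raw: str) -> str:
    return hashlib.sha256(raw.encode()).hexdigest()[:32]


def content_key(evt: dict) -> str:
    # Derived from billing parameters only.
    return _h(f"{evt['customer_id']}:{evt['amount_cents']}:{evt['billing_period']}:hatchet-billing")


def event_key(evt: dict) -> str:
    # Derived from the event's own unique identifier.
    return _h(f"{evt['event_id']}:hatchet-billing")


original = {"event_id": "evt_1", "customer_id": "cus_1",
            "amount_cents": 500, "billing_period": "2026-07"}
repush = dict(original)                      # same event delivered twice
distinct = dict(original, event_id="evt_2")  # a second, legitimate purchase

print(content_key(original) == content_key(repush))    # True: duplicate collapsed
print(content_key(original) == content_key(distinct))  # True: legitimate charge also collapsed
print(event_key(original) == event_key(repush))        # True: duplicate collapsed
print(event_key(original) == event_key(distinct))      # False: both purchases charge
```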

4. `context.workflow_run_id()` changes across manual re-runs

Hatchet's context.workflow_run_id() is a stable identifier for a single workflow execution. If you use it as the idempotency key or a component of it, a manually triggered re-run of the same workflow (from Hatchet's dashboard or via the API) generates a new workflow_run_id and therefore a new idempotency key, and Stripe treats the re-run's charge as entirely independent of the one already made. Derive the idempotency key from the billing parameters instead, sha256(customer_id + ":" + str(amount_cents) + ":" + billing_period + ":hatchet-billing"), so it is the same for every execution of the same logical billing operation, whether triggered by a cron, an event, or a manual re-run. One limit applies: Stripe may prune idempotency keys once they are at least 24 hours old, so the key protects re-runs within that window only; a re-run days later must check your own billing records before charging.
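The two layers cover different windows: the idempotency key covers retries and overlaps within Stripe's retention window, and your own billing records cover re-runs after it. A minimal sketch, assuming a hypothetical in-memory `ledger` keyed by the same content hash; in practice this would be a database table with a unique constraint on the key.

```python
import hashlib
from datetime import datetime, timedelta, timezone

# Stripe may prune an idempotency key once it is at least this old.
STRIPE_KEY_MIN_RETENTION = timedelta(hours=24)


def billing_key(customer_id: str, amount_cents: int, billing_period: str) -> str:
    raw = f"{customer_id}:{amount_cents}:{billing_period}:hatchet-billing"
    return hashlib.sha256(raw.encode()).hexdigest()[:32]


def stripe_dedup_guaranteed(first_sent_at: datetime, now: datetime) -> bool:
    # Inside the window, a repeated key replays the original charge.
    return now - first_sent_at < STRIPE_KEY_MIN_RETENTION


def should_send_charge(ledger: dict, key: str) -> bool:
    # The ledger is the long-term record of what was billed; Stripe's keys are not.
    return key not in ledger


key = billing_key("cus_abc123", 2999, "2026-07")
t0 = datetime(2026, 7, 1, 9, 0, tzinfo=timezone.utc)
ledger = {key: t0}  # hypothetical: key -> time the charge was first recorded

rerun_at = t0 + timedelta(days=3)
print(stripe_dedup_guaranteed(t0, rerun_at))  # False: Stripe may create a new charge
print(should_send_charge(ledger, key))         # False: the ledger stops the re-run
```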

Pytest enforcement suite

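"""
pytest test_hatchet_stripe_governance.py
Checks the key-derivation and key-issuance rules behind the two-layer pattern.
Tests 4 and 5 document invariants; they do not exercise Hatchet or the proxy.
"""
import hashlib
import os

import httpx
import pytest
from unittest.mock import patch, MagicMock

# ── Test 1: step retry idempotency: same parameters → same key across all attempts ──

def billing_idempotency_key(customer_id, amount_cents, billing_period):
    raw = f"{customer_id}:{amount_cents}:{billing_period}:hatchet-billing"
    return hashlib.sha256(raw.encode()).hexdigest()[:32]

def test_idempotency_key_stable_across_retries():
    key1 = billing_idempotency_key("cus_abc123", 2999, "2026-07")
    key2 = billing_idempotency_key("cus_abc123", 2999, "2026-07")
    assert key1 == key2, "Idempotency key must be identical on every retry attempt"
    assert len(key1) == 32

# ── Test 2: fanout isolation: each customer receives its own vault key and cap ──

def issue_vault_key(customer_id, amount_cents):
    # Mirrors the fanout helper; in a real suite, import it from fanout_workflow.
    resp = httpx.post(
        "https://proxy.keybrake.com/admin/vault_keys",
        headers={"Authorization": f"Bearer {os.environ['KEYBRAKE_ADMIN_KEY']}"},
        json={
            "label": f"hatchet-fanout-{customer_id}",
            "vendor": "stripe",
            "allowed_endpoints": ["POST /v1/charges"],
            "daily_usd_cap": round(amount_cents / 100 * 1.1, 2),
            "expires_in_seconds": 3600,
        },
        timeout=5.0,
    )
    resp.raise_for_status()
    return resp.json()["vault_key"]

def test_vault_key_per_child_workflow(monkeypatch):
    monkeypatch.setenv("KEYBRAKE_ADMIN_KEY", "test-admin-key")
    customers = [{"id": f"cus_{i}", "amount_cents": (i + 1) * 999} for i in range(10)]
    # One mocked proxy response per call, each carrying a distinct vault key
    responses = [
        MagicMock(json=MagicMock(return_value={"vault_key": f"vk_{i}"}))
        for i in range(len(customers))
    ]
    with patch("httpx.post", side_effect=responses) as mock_post:
        vault_keys = [issue_vault_key(c["id"], c["amount_cents"]) for c in customers]
    assert len(set(vault_keys)) == 10, "Each child workflow must receive a unique vault key"
    # Each key is capped at its own customer's amount, not a shared run-level cap
    caps = [call.kwargs["json"]["daily_usd_cap"] for call in mock_post.call_args_list]
    assert caps == [round(c["amount_cents"] / 100 * 1.1, 2) for c in customers]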

# ── Test 3: cron dedup — same billing period → same idempotency key on both runs ──

def test_cron_overlap_produces_same_idempotency_key():
    run1_key = billing_idempotency_key("cus_abc123", 2999, "2026-07")
    run2_key = billing_idempotency_key("cus_abc123", 2999, "2026-07")
    assert run1_key == run2_key, (
        "Both cron instances must produce the same idempotency key for the same "
        "customer and billing period — Stripe deduplicates the overlap"
    )

# ── Test 4: workflow_run_id is NOT safe as sole idempotency component ──

def test_workflow_run_id_not_safe_as_sole_idempotency_key():
    import uuid
    run_id_1 = str(uuid.uuid4())
    run_id_2 = str(uuid.uuid4())
    assert run_id_1 != run_id_2, (
        "Manual re-runs generate new workflow_run_ids — using run_id alone as "
        "the idempotency key creates a new Stripe charge on each re-run"
    )

# ── Test 5: vault key scope blocks audit-step from creating charges ──

def test_audit_vault_key_blocked_from_charge_endpoint():
    audit_allowed_endpoints = ["GET /v1/charges"]
    attempted_endpoint = "POST /v1/charges"
    is_allowed = attempted_endpoint in audit_allowed_endpoints
    assert not is_allowed, (
        "Audit-scope vault key must be rejected by the proxy when attempting "
        "POST /v1/charges — only the billing step's vault key may create charges"
    )

Frequently asked questions

Does Hatchet's step retry guarantee mean I don't need idempotency keys?

No — Hatchet's retry guarantee is that the step will be re-executed. That is precisely the mechanism that creates duplicate charges when a step has side effects that are not idempotent. Idempotency keys make the re-execution safe at the Stripe layer; they do not prevent re-execution or change Hatchet's behavior. Stripe deduplicates re-executions that carry the same key; without one, every retry attempt creates an independent charge object.

Can I use Hatchet's `context.workflow_run_id()` as the idempotency key?

Only as a component, not as the sole key, and only for cron or event-triggered runs — not for manually triggered re-runs. workflow_run_id() is stable across retries of the same execution, which makes it useful for deduplicating retries within a run. But a manual re-run of the same workflow from Hatchet's dashboard generates a new workflow_run_id, which would create a new Stripe charge. Derive the key from the billing parameters instead: sha256(customer_id + ":" + str(amount_cents) + ":" + billing_period + ":" + context_string) is stable across all execution paths for the same logical billing operation.

How do per-child vault keys work when Hatchet retries the parent fanout step?

If the parent fanout step raises an exception and is retried, it re-issues vault keys and re-spawns child workflows. Children from the first attempt that already completed will be re-spawned with new vault keys. The content-hash idempotency key handles deduplication at Stripe — the re-spawned child produces the same idempotency key as the original, so Stripe returns the original charge ID without creating a new one. The proxy records a second audit entry for the attempt, which is acceptable: the customer is not double-billed, and the audit log shows the retry activity.

What concurrency strategy should I use for monthly billing: CANCEL_NEWEST or GROUP_ROUND_ROBIN?

Use CANCEL_NEWEST for billing workflows. GROUP_ROUND_ROBIN queues runs that exceed the limit for a group key and starts them once a slot frees, so with max_runs=1 a second billing run for the same period does not run concurrently, but it still runs after the first finishes and re-attempts a charge for every customer the first run already billed. CANCEL_NEWEST discards the second trigger for the same billing period instead, so each period gets one active run. The idempotency key remains the backstop if a duplicate run slips past the guard.

Does the Keybrake proxy add meaningful latency to Hatchet step execution?

The proxy adds a single additional network hop between your Hatchet worker and Stripe's API. In practice this is 2–5ms of additional round-trip latency within the same region, which is negligible compared to Stripe's own API response time (typically 100–400ms). For batch billing runs processing thousands of customers sequentially, the proxy's per-request overhead is dominated by the Stripe API response time and is not the bottleneck. If you run Hatchet workers on a different cloud region from the proxy, co-locate them — the proxy is stateless and can be deployed to any region.

Can I use a single long-lived vault key for the entire monthly billing run instead of per-customer keys?

You can, but you lose the per-customer blast-radius isolation that makes vault keys valuable in billing workflows. A single run-level vault key capped at total_expected_charges * 1.1 lets a calculation error overcharge customers until cumulative spend reaches the cap. Per-customer vault keys bound each error to the individual customer's expected amount: an inflated charge is rejected for that customer, while every customer whose amount is correct is charged normally. For billing workflows, the overhead of issuing N vault keys for N customers is justified by this isolation.
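The difference is easiest to see with a moderate error, such as every amount doubled by a proration bug. A pure-Python model under an assumed cumulative-cap rule (a charge is rejected if it would push its key's total past the cap); amounts are integer cents and the 10% buffer matches the examples in this post. `overcharged_customers` is a hypothetical helper for illustration.

```python
def overcharged_customers(expected: list, actual: list, per_customer: bool) -> int:
    """Count customers charged more than expected under each cap layout."""
    if per_customer:
        caps = [e * 110 // 100 for e in expected]  # one key per customer, +10%
        return sum(1 for e, a, cap in zip(expected, actual, caps) if e < a <= cap)
    run_cap = sum(expected) * 110 // 100  # one key for the whole run, +10%
    spent = overcharged = 0
    for e, a in zip(expected, actual):
        if spent + a > run_cap:
            continue  # rejected at the proxy
        spent += a
        if a > e:
            overcharged += 1
    return overcharged


expected = [2999] * 10
doubled = [2 * e for e in expected]  # proration bug doubles every amount
print(overcharged_customers(expected, doubled, per_customer=False))  # 5
print(overcharged_customers(expected, doubled, per_customer=True))   # 0
```

With one run-level key, half the cohort is overcharged before the shared budget runs out; with per-customer keys, every doubled charge exceeds its own cap and none go through.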

Put the brakes on your Hatchet workflow's Stripe keys

Keybrake issues scoped vault keys for each Hatchet step, child workflow, or cron billing run — each capped at one customer's expected charge, scoped to the endpoints your workflow actually needs, and logged to a queryable audit trail. When a step retries or a cron overlaps, the proxy absorbs the blast. Join the waitlist for early access.
