
Ray AI agent API key: spend caps for distributed agent workloads

Ray turns a Python function into a distributed, parallelized workload with a single decorator. For ML training and data processing, that's transformative. For AI agents that call Stripe, Twilio, or Resend, it's a liability: ray.remote() can spawn hundreds of simultaneous vendor API calls in the time it takes your monitoring alert to fire. This page covers what Ray's runtime doesn't handle for vendor spend control, and the vault-key pattern that bounds per-job spend without touching your cluster config.

TL;DR

Ray's @ray.remote tasks execute in parallel across workers — which is great for throughput but dangerous for vendor APIs that charge per call. Issue one vault key at job start, enforce a per-job dollar cap at the proxy layer, and get a structured audit log with the Ray job ID attached. If a job runs wild, revoke the vault key from the dashboard without touching the real API key that every other Ray job shares.

How Ray AI agents call vendor APIs

In a Ray workload, vendor API calls are typically inside @ray.remote functions or inside Actor methods. A billing agent processing a customer list might look like:

import ray
import stripe
import os

ray.init()

@ray.remote
def charge_customer(customer_id: str, amount_cents: int) -> dict:
    stripe.api_key = os.environ["STRIPE_SECRET_KEY"]  # full-access key
    return stripe.PaymentIntent.create(
        amount=amount_cents,
        currency="usd",
        customer=customer_id,
    )

def run_billing(customer_ids: list[str], amount_cents: int):
    futures = [
        charge_customer.remote(cid, amount_cents)
        for cid in customer_ids
    ]
    return ray.get(futures)  # blocks until all complete

This is idiomatic Ray. The issue: the full-access STRIPE_SECRET_KEY has to be present in the environment of every worker that runs the task, there's no per-job cap, and the list comprehension submits every task to the scheduler at once. Ray's scheduler limits concurrency by available CPU resources, not by vendor API spend.

Three gaps Ray's runtime doesn't fill for vendor spend control

Gap	What happens in practice	Ray's answer
No per-job spend cap	A billing job receives a list of 10,000 customer IDs from a data pipeline bug. Ray dispatches all 10,000 `charge_customer.remote()` calls. Depending on cluster size, 500-2,000 may execute simultaneously. Stripe processes every successfully-delivered call. The damage is bounded only by your Stripe account credit limit.	None. Ray Dashboard shows task throughput and resource usage, not dollar spend on vendor APIs.
No job-level revoke	You call `ray.cancel(future)` or `ray.shutdown()` to stop a runaway job. Tasks already submitted to workers may have made vendor API calls before cancellation reached them. Rotating the Stripe secret to prevent further calls breaks every other Ray job sharing that environment variable.	Task cancellation via `ray.cancel()` stops pending tasks but cannot recall vendor API calls already dispatched by executing tasks.
No per-call audit with Ray job context	Ray's task events and logs show execution time and error rates, but don't parse dollar amounts from Stripe responses or cross-reference the Stripe `Request-Id` with the Ray `job_id` in a queryable format.	Ray Dashboard task timelines and application logs. No structured cost tracking, no Ray-job-to-Stripe-charge correlation.
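To make the third gap concrete, here is a minimal sketch of what a queryable per-call record could look like. The schema, the `VendorCallRecord` name, and the helper functions are assumptions for illustration; they are not part of Ray, Stripe, or Keybrake.

```python
import json
from collections import defaultdict
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class VendorCallRecord:
    """One vendor API call tagged with the Ray job that made it (hypothetical schema)."""
    ray_job_id: str         # e.g. from ray.get_runtime_context().get_job_id()
    vendor: str             # "stripe", "twilio", ...
    endpoint: str           # e.g. "POST /v1/payment_intents"
    vendor_request_id: str  # e.g. the Request-Id header on a Stripe response
    amount_cents: int       # dollar impact of the call, in cents
    status_code: int        # HTTP status returned to the task


def to_log_line(record: VendorCallRecord) -> str:
    """Serialize a record as one JSON line so log tooling can filter on any field."""
    return json.dumps(asdict(record), sort_keys=True)


def spend_by_job(log_lines: list[str]) -> dict[str, int]:
    """Sum the cents of successful (2xx) calls per Ray job ID."""
    totals: dict[str, int] = defaultdict(int)
    for line in log_lines:
        rec = json.loads(line)
        if 200 <= rec["status_code"] < 300:
            totals[rec["ray_job_id"]] += rec["amount_cents"]
    return dict(totals)
```

With records in this shape, "how much did Ray job X charge, and which Stripe requests were they?" becomes a filter on two fields rather than a manual join between Ray logs and the Stripe dashboard.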

The parallelism risk: ray.remote and simultaneous vendor calls

Ray's parallelism model is its greatest strength and, in the context of vendor API calls, its most significant risk surface. When you write:

futures = [charge_customer.remote(cid, amount_cents) for cid in customer_ids]
ray.get(futures)

You are submitting all tasks to the Ray scheduler at once. On a cluster with 64 CPU cores, Ray will execute up to 64 charge_customer tasks simultaneously (each task reserves one CPU by default), each making its own Stripe API call. On a cluster with 500 cores, it's 500 simultaneous calls. The data pipeline bug that sends 10,000 IDs instead of 100 does not take 100 times longer to play out: with 500 cores it needs about 20 waves of tasks instead of one, so the whole run can finish before anyone has time to intervene.
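The arithmetic behind these figures can be sketched with a rough model. It assumes every task reserves one CPU (Ray's default for tasks) and that all tasks take about the same time; `fanout_profile` and the $25 charge amount are hypothetical, chosen only for illustration.

```python
import math


def fanout_profile(n_tasks: int, cluster_cpus: int, amount_cents: int,
                   cpus_per_task: int = 1) -> dict:
    """Estimate concurrency, scheduling waves, and dollar exposure of a fan-out."""
    if cpus_per_task <= 0 or cluster_cpus < cpus_per_task:
        raise ValueError("cluster cannot run a task with these CPU requirements")
    slots = cluster_cpus // cpus_per_task      # tasks that can run at once
    return {
        "max_in_flight": min(n_tasks, slots),  # simultaneous vendor calls
        "waves": math.ceil(n_tasks / slots),   # rounds needed to drain the queue
        "total_exposure_usd": n_tasks * amount_cents / 100,
    }


# The buggy 10,000-ID run versus the intended 100-ID run on a 500-core cluster.
buggy = fanout_profile(10_000, cluster_cpus=500, amount_cents=2_500)
intended = fanout_profile(100, cluster_cpus=500, amount_cents=2_500)
```

Under these assumptions the buggy run needs 20 waves against 1 for the intended run, while the dollar exposure grows by the full factor of 100. Wall-clock time scales with the number of waves, so adding cores shrinks the window in which a human can react.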

A per-job vault key turns 429s into a circuit breaker. The proxy enforces the cap atomically: once the cumulative spend across all concurrent calls hits the cap, subsequent calls return 429. Ray tasks that receive 429 raise exceptions, which surface as task failures in the Ray Dashboard — visible and queryable, not silent charges on next month's Stripe statement.
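What "atomically" buys you can be shown with a local simulation. This is not Keybrake's implementation: `SpendCap` and `simulate_fanout` are hypothetical names, threads stand in for Ray workers, and a lock stands in for whatever mechanism the proxy uses. The point is only that the check and the increment happen as one step, so concurrent callers cannot overshoot the cap.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SpendCap:
    """In-memory stand-in for a proxy-side budget (illustrative only)."""

    def __init__(self, cap_cents: int):
        self._cap = cap_cents
        self._spent = 0
        self._lock = threading.Lock()

    def try_spend(self, amount_cents: int) -> int:
        """Return 200 and record the spend, or 429 if it would exceed the cap."""
        with self._lock:  # check and update as a single atomic step
            if self._spent + amount_cents > self._cap:
                return 429
            self._spent += amount_cents
            return 200

    @property
    def spent_cents(self) -> int:
        with self._lock:
            return self._spent


def simulate_fanout(n_calls: int, amount_cents: int, cap_cents: int,
                    workers: int = 64) -> tuple[list[int], int]:
    """Fire n_calls concurrent charges against one shared cap."""
    cap = SpendCap(cap_cents)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = list(pool.map(lambda _: cap.try_spend(amount_cents), range(n_calls)))
    return statuses, cap.spent_cents


# 10,000 concurrent $25 charges against a $500 cap.
statuses, spent = simulate_fanout(10_000, amount_cents=2_500, cap_cents=50_000)
```

Exactly 20 calls get through and the rest see 429, no matter how the threads interleave. Without the lock, two workers could both read a total just under the cap and both proceed.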

The Actor pattern: long-running agent workers

Ray Actors are long-lived Python objects that maintain state across method calls. An AI customer-service agent built on Ray Actors might keep a conversation context and call Twilio to send SMS updates. Each Actor instance holds its own reference to the Stripe or Twilio key:

import os

import ray
from twilio.rest import Client

@ray.remote
class CustomerAgent:
    def __init__(self):
        self.twilio_key = os.environ["TWILIO_AUTH_TOKEN"]  # per-actor copy

    def send_notification(self, to: str, message: str) -> dict:
        client = Client(os.environ["TWILIO_ACCOUNT_SID"], self.twilio_key)
        msg = client.messages.create(to=to, from_="+15550001234", body=message)
        return {"sid": msg.sid, "status": msg.status}  # plain dict, matches the annotation

If you spin up 200 CustomerAgent actors for a notification campaign, you now have 200 actor instances each holding a reference to the full-access Twilio key. Revoking access to one actor means rotating the key for all 200 — and for every other system sharing that key.

Scoping vault keys per Ray job

Issue the vault key in the driver (the script that calls ray.init()), then pass it as a parameter to remote functions or actor constructors:

import ray
import httpx
import stripe
import os

ray.init()

def issue_vault_key(job_id: str, budget_usd: float) -> str:
    r = httpx.post(
        "https://proxy.keybrake.com/vault/keys",
        headers={"Authorization": f"Bearer {os.environ['KEYBRAKE_API_KEY']}"},
        json={
            "vendor": "stripe",
            "daily_usd_cap": budget_usd,
            "allowed_endpoints": ["POST /v1/payment_intents"],
            "expires_in": "2h",
            "agent_run_label": f"ray-billing/{job_id}",
        },
    )
    r.raise_for_status()  # fail fast if the vault key could not be issued
    return r.json()["vault_key"]

@ray.remote
def charge_customer(customer_id: str, amount_cents: int, vault_key: str) -> dict:
    stripe.api_key = vault_key                          # scoped vault key
    stripe.api_base = "https://proxy.keybrake.com/stripe"
    return stripe.PaymentIntent.create(
        amount=amount_cents,
        currency="usd",
        customer=customer_id,
        idempotency_key=f"{customer_id}-{amount_cents}",
    )

def run_billing(customer_ids: list[str], amount_cents: int, budget_usd: float = 500.0):
    job_id = ray.get_runtime_context().get_job_id()
    vault_key = issue_vault_key(str(job_id), budget_usd)
    futures = [
        charge_customer.remote(cid, amount_cents, vault_key)
        for cid in customer_ids
    ]
    return ray.get(futures)

The vault key is issued once in the driver and passed to all remote tasks as a string parameter. Ray serializes it along with the other task arguments — it's ephemeral and scoped to this job. The real Stripe secret never leaves Keybrake. The $500 cap is shared across all concurrent task invocations: once the parallel calls collectively hit $500, further calls return 429s.
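One practical consequence: `ray.get` re-raises a failed task's exception in the driver, so a single capped call can hide the results of tasks that did succeed. A minimal sketch of one way to handle this is to catch the error inside the task and return a structured outcome. `CapExceeded` is a hypothetical stand-in for whatever rate-limit error your vendor SDK raises on a 429, and `safe_charge` and `summarize` are illustrative names; the `charge` callable would be the body of `charge_customer`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class CapExceeded(Exception):
    """Hypothetical stand-in for the SDK error raised on an HTTP 429."""


@dataclass
class ChargeOutcome:
    customer_id: str
    ok: bool
    error: Optional[str] = None


def safe_charge(customer_id: str, charge: Callable[[str], object]) -> ChargeOutcome:
    """Run one charge and convert a cap rejection into data instead of an exception."""
    try:
        charge(customer_id)
        return ChargeOutcome(customer_id, ok=True)
    except CapExceeded as exc:
        return ChargeOutcome(customer_id, ok=False, error=str(exc))


def summarize(outcomes: list[ChargeOutcome]) -> tuple[list[str], list[str]]:
    """Split outcomes into charged and blocked customer IDs."""
    charged = [o.customer_id for o in outcomes if o.ok]
    blocked = [o.customer_id for o in outcomes if not o.ok]
    return charged, blocked
```

The driver then receives one outcome per customer and can report which IDs were charged before the cap tripped and which were blocked, rather than losing the whole batch to the first exception.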

How Keybrake fits

Keybrake is the proxy layer between your Ray tasks and Stripe, Twilio, or Resend. You swap the real API key for the vault key and point the vendor SDK at https://proxy.keybrake.com/stripe. The real secret stays in Keybrake's environment, not serialized into Ray remote task arguments or actor state. Each Ray job gets its own vault key with its own dollar cap, endpoint allowlist, and expiry. Parallel task fan-outs that exceed the cap return 429s — catchable exceptions in your remote functions, not silent charges across hundreds of concurrent worker processes.
