Ray AI agent API key: spend caps for distributed agent workloads
Ray turns a Python function into a distributed, parallelized workload with a single decorator. For ML training and data processing, that's transformative. For AI agents that call Stripe, Twilio, or Resend, it's a liability: ray.remote() can spawn hundreds of simultaneous vendor API calls in the time it takes your monitoring alert to fire. This page covers what Ray's runtime doesn't handle for vendor spend control, and the vault-key pattern that bounds per-job spend without touching your cluster config.
TL;DR
Ray's @ray.remote tasks execute in parallel across workers — which is great for throughput but dangerous for vendor APIs that charge per call. Issue one vault key at job start, enforce a per-job dollar cap at the proxy layer, and get a structured audit log with the Ray job ID attached. If a job runs wild, revoke the vault key from the dashboard without touching the real API key that every other Ray job shares.
How Ray AI agents call vendor APIs
In a Ray workload, vendor API calls are typically inside @ray.remote functions or inside Actor methods. A billing agent processing a customer list might look like:
import ray
import stripe
import os

ray.init()

@ray.remote
def charge_customer(customer_id: str, amount_cents: int) -> dict:
    stripe.api_key = os.environ["STRIPE_SECRET_KEY"]  # full-access key
    return stripe.PaymentIntent.create(
        amount=amount_cents,
        currency="usd",
        customer=customer_id,
    )

def run_billing(customer_ids: list[str], amount_cents: int):
    futures = [
        charge_customer.remote(cid, amount_cents)
        for cid in customer_ids
    ]
    return ray.get(futures)  # blocks until all complete
This is idiomatic Ray. The issue is threefold: every worker that runs the task needs the full-access STRIPE_SECRET_KEY in its environment, there's no per-job cap, and the list comprehension submits every task to the scheduler at once. Ray's scheduler limits concurrency by available CPU resources, not by vendor API spend.
Three gaps Ray's runtime doesn't fill for vendor spend control
| Gap | What happens in practice | Ray's answer |
|---|---|---|
| No per-job spend cap | A billing job receives a list of 10,000 customer IDs from a data pipeline bug. Ray dispatches all 10,000 charge_customer.remote() calls. Depending on cluster size, 500-2,000 may execute simultaneously. Stripe processes every successfully-delivered call. The damage is bounded only by your Stripe account credit limit. | None. Ray Dashboard shows task throughput and resource usage, not dollar spend on vendor APIs. |
| No job-level revoke | You call ray.cancel(future) or ray.shutdown() to stop a runaway job. Tasks already submitted to workers may have made vendor API calls before cancellation reached them. Rotating the Stripe secret to prevent further calls breaks every other Ray job sharing that environment variable. | Task cancellation via ray.cancel() stops pending tasks but cannot recall vendor API calls already dispatched by executing tasks. |
| No per-call audit with Ray job context | Ray's task events and logs show execution time and error rates, but don't parse dollar amounts from Stripe responses or cross-reference the Stripe Request-Id with the Ray job_id in a queryable format. | Ray Dashboard task timelines and application logs. No structured cost tracking, no Ray-job-to-Stripe-charge correlation. |
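To make the third gap concrete, here is a minimal sketch of a per-call audit record that carries the Ray job ID next to the Stripe request ID and the dollar amount. The `AuditRecord` fields and the `spend_by_job` helper are illustrative assumptions, not a schema defined by Ray, Stripe, or Keybrake.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class AuditRecord:
    """One vendor call tagged with the Ray job that made it (hypothetical schema)."""
    ray_job_id: str
    vendor: str
    endpoint: str
    stripe_request_id: str
    amount_usd: float
    status: int


def spend_by_job(records):
    """Sum the spend of successful (2xx) calls per Ray job ID."""
    totals = {}
    for rec in records:
        if 200 <= rec.status < 300:
            totals[rec.ray_job_id] = totals.get(rec.ray_job_id, 0.0) + rec.amount_usd
    return totals


# Made-up sample data: three calls from one job (one rejected), one from another.
records = [
    AuditRecord("01000000", "stripe", "POST /v1/payment_intents", "req_a", 20.0, 200),
    AuditRecord("01000000", "stripe", "POST /v1/payment_intents", "req_b", 20.0, 200),
    AuditRecord("01000000", "stripe", "POST /v1/payment_intents", "req_c", 20.0, 429),
    AuditRecord("02000000", "stripe", "POST /v1/payment_intents", "req_d", 5.0, 200),
]
print(json.dumps(asdict(records[0])))  # one queryable JSON line per call
print(spend_by_job(records))
```

With records in this shape, "how much did job X spend" becomes a simple aggregation instead of a manual join between Ray logs and the Stripe dashboard.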
The parallelism risk: ray.remote and simultaneous vendor calls
Ray's parallelism model is its greatest strength and, in the context of vendor API calls, its most significant risk surface. When you write:
futures = [charge_customer.remote(cid, amount_cents) for cid in customer_ids]
ray.get(futures)
You are submitting all tasks to the Ray scheduler at once. With the default of one CPU per task, a cluster with 64 CPU cores will execute up to 64 charge_customer tasks simultaneously, each making its own Stripe API call. On a cluster with 500 cores, it's 500 simultaneous calls. The data pipeline bug that sends 10,000 IDs instead of 100 does not take 100 times longer to play out: on a large or autoscaling cluster it can finish in little more than the wall-clock time of a correct 100-ID run, because Ray just uses more parallelism.
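A back-of-the-envelope estimate shows how much of a runaway job completes before anyone can react. The sketch below assumes one CPU per task (Ray's default), a fixed per-call latency, and a fixed alert delay; the function names and the 0.5-second and 60-second figures are illustrative assumptions, not measurements.

```python
import math


def waves_needed(total_tasks, cpu_cores):
    """Number of scheduling waves to drain the queue at one CPU per task."""
    return math.ceil(total_tasks / cpu_cores)


def calls_before_alert(total_tasks, cpu_cores, call_seconds, alert_delay_seconds):
    """Estimate how many vendor calls finish before an alert fires."""
    waves_before_alert = math.floor(alert_delay_seconds / call_seconds)
    return min(total_tasks, waves_before_alert * cpu_cores)


# 10,000 buggy tasks, 0.5 s per Stripe call, alert fires after 60 s.
print(waves_needed(100, 64), waves_needed(10_000, 64))
print(calls_before_alert(10_000, 64, 0.5, 60))
print(calls_before_alert(10_000, 500, 0.5, 60))
```

Under these assumptions, a 64-core cluster completes most of the 10,000 calls before the alert fires, and a 500-core cluster completes all of them.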
A per-job vault key turns 429s into a circuit breaker. The proxy enforces the cap atomically: once the cumulative spend across all concurrent calls hits the cap, subsequent calls return 429. Ray tasks that receive 429 raise exceptions, which surface as task failures in the Ray Dashboard — visible and queryable, not silent charges on next month's Stripe statement.
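The word "atomically" matters here: the check against the cap and the increment of cumulative spend have to happen as one step, or concurrent calls can each pass the check and overshoot together. The following is a minimal in-memory sketch of that idea using a lock and threads as a stand-in for concurrent Ray tasks. `SpendCap` is a hypothetical class for illustration and is not how Keybrake is implemented.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SpendCap:
    """In-memory stand-in for a proxy-side spend cap (illustrative only)."""

    def __init__(self, cap_cents):
        self.cap_cents = cap_cents
        self.spent_cents = 0
        self._lock = threading.Lock()

    def authorize(self, amount_cents):
        """Return 200 and record the spend, or 429 if it would exceed the cap."""
        with self._lock:  # check-and-increment is one atomic step
            if self.spent_cents + amount_cents > self.cap_cents:
                return 429
            self.spent_cents += amount_cents
            return 200


cap = SpendCap(cap_cents=50_000)  # $500 cap, tracked in integer cents
with ThreadPoolExecutor(max_workers=64) as pool:
    # 1,000 concurrent $20 charges compete for the same cap.
    statuses = list(pool.map(cap.authorize, [2_000] * 1_000))

print(statuses.count(200), statuses.count(429), cap.spent_cents)
```

However the threads interleave, exactly 25 charges fit under the cap and the remaining 975 are rejected, so recorded spend never exceeds $500.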
The Actor pattern: long-running agent workers
Ray Actors are long-lived Python objects that maintain state across method calls. An AI customer-service agent built on Ray Actors might keep a conversation context and call Twilio to send SMS updates. Each Actor instance holds its own reference to the Stripe or Twilio key:
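import os

import ray
from twilio.rest import Client

@ray.remote
class CustomerAgent:
    def __init__(self):
        self.twilio_key = os.environ["TWILIO_AUTH_TOKEN"]  # per-actor copy

    def send_notification(self, to: str, message: str) -> dict:
        client = Client(os.environ["TWILIO_ACCOUNT_SID"], self.twilio_key)
        msg = client.messages.create(to=to, from_="+15550001234", body=message)
        # Return plain data rather than the SDK object so the result serializes cleanly.
        return {"sid": msg.sid, "status": msg.status}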
If you spin up 200 CustomerAgent actors for a notification campaign, you now have 200 actor instances each holding a reference to the full-access Twilio key. Revoking access to one actor means rotating the key for all 200 — and for every other system sharing that key.
Scoping vault keys per Ray job
Issue the vault key in the driver (the script that calls ray.init()), then pass it as a parameter to remote functions or actor constructors:
import ray
import httpx
import stripe
import os

ray.init()

def issue_vault_key(job_id: str, budget_usd: float) -> str:
    """Request one scoped vault key for this Ray job from the Keybrake proxy."""
    r = httpx.post(
        "https://proxy.keybrake.com/vault/keys",
        headers={"Authorization": f"Bearer {os.environ['KEYBRAKE_API_KEY']}"},
        json={
            "vendor": "stripe",
            "daily_usd_cap": budget_usd,
            "allowed_endpoints": ["POST /v1/payment_intents"],
            "expires_in": "2h",
            "agent_run_label": f"ray-billing/{job_id}",
        },
    )
    r.raise_for_status()  # fail in the driver before any task fans out
    return r.json()["vault_key"]

@ray.remote
def charge_customer(customer_id: str, amount_cents: int, vault_key: str) -> dict:
    stripe.api_key = vault_key  # scoped vault key
    stripe.api_base = "https://proxy.keybrake.com/stripe"
    return stripe.PaymentIntent.create(
        amount=amount_cents,
        currency="usd",
        customer=customer_id,
        idempotency_key=f"{customer_id}-{amount_cents}",
    )

def run_billing(customer_ids: list[str], amount_cents: int, budget_usd: float = 500.0):
    job_id = ray.get_runtime_context().get_job_id()
    vault_key = issue_vault_key(str(job_id), budget_usd)  # issued once, in the driver
    futures = [
        charge_customer.remote(cid, amount_cents, vault_key)
        for cid in customer_ids
    ]
    return ray.get(futures)
The vault key is issued once in the driver and passed to all remote tasks as a string parameter. Ray serializes it along with the other task arguments — it's ephemeral and scoped to this job. The real Stripe secret never leaves Keybrake. The $500 cap is shared across all concurrent task invocations: once the parallel calls collectively hit $500, further calls return 429s.
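Because `ray.get` on a list of futures raises as soon as it reaches a failed task, it can help to convert cap rejections into plain return values inside the task so the driver can count what was charged and what was blocked. The sketch below is one way to do that. `call_with_status` is a hypothetical helper, `FakeVendorError` and `fake_charge` stand in for a vendor SDK, and the code assumes the SDK's exception exposes an `http_status` attribute.

```python
def call_with_status(fn, *args, **kwargs):
    """Run a vendor call and return a plain dict instead of raising on 401/429."""
    try:
        return {"ok": True, "status": 200, "result": fn(*args, **kwargs)}
    except Exception as exc:
        status = getattr(exc, "http_status", None)  # assumed SDK attribute
        if status in (401, 429):
            return {"ok": False, "status": status, "result": None}
        raise  # anything else is a real failure; let Ray report it


class FakeVendorError(Exception):
    """Stand-in for an SDK error that carries an HTTP status (hypothetical)."""

    def __init__(self, http_status):
        super().__init__(f"HTTP {http_status}")
        self.http_status = http_status


def fake_charge(customer_id, capped):
    """Simulated vendor call: raises 429 once the cap has been reached."""
    if capped:
        raise FakeVendorError(429)
    return {"customer": customer_id}


outcomes = [call_with_status(fake_charge, f"cus_{i}", capped=(i >= 2)) for i in range(4)]
charged = [o for o in outcomes if o["ok"]]
blocked = [o for o in outcomes if o["status"] == 429]
print(len(charged), len(blocked))
```

In a Ray job, the body of the remote function would wrap its vendor call this way, and the driver would partition the results returned by `ray.get`.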
How Keybrake fits
Keybrake is the proxy layer between your Ray tasks and Stripe, Twilio, or Resend. You swap the real API key for the vault key and point the vendor SDK at https://proxy.keybrake.com/stripe. The real secret stays in Keybrake's environment, not serialized into Ray remote task arguments or actor state. Each Ray job gets its own vault key with its own dollar cap, endpoint allowlist, and expiry. Parallel task fan-outs that exceed the cap return 429s — catchable exceptions in your remote functions, not silent charges across hundreds of concurrent worker processes.
Related questions
Does passing the vault key as a task parameter expose it in Ray's object store?
The vault key string is serialized along with the other task arguments, the same way any string parameter is. Ray typically inlines small arguments into the task specification rather than placing them in the distributed object store, but either way the value travels through Ray's runtime. This is no more or less secure than passing the real API key as a parameter (which is the alternative). The key advantage is that the vault key has a short TTL (expires_in: "2h"), a dollar cap, and can be revoked via the Keybrake dashboard even if it's already been distributed to workers. The real Stripe secret, by contrast, lives only in Keybrake and is never serialized into Ray's runtime at all.
How do I handle the vault key for long-lived Ray Actors?
Pass the vault key in the Actor's __init__ and store it as an instance attribute. Set the vault key TTL to match the Actor's expected lifetime plus a buffer. If your actors run for more than a few hours, implement a refresh_key() method that issues a new vault key and updates the instance attribute — call it from the driver when the key is close to expiry. The per-actor cap should reflect the maximum spend a single actor instance should make in its lifetime; if you run 100 actors, issue 100 vault keys (one per actor) rather than sharing a single key across all actor instances.
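The refresh logic described above can be sketched as a small class that tracks the key's expiry. This is a plain Python class for illustration; in a Ray deployment the same methods would live on the `@ray.remote` actor. `VaultKeyHolder`, the five-minute margin, and the `vk_...` key strings are assumptions, and this variant has the driver issue the new key and pass it in rather than having the actor request one itself.

```python
import time


class VaultKeyHolder:
    """Holds a vault key and reports when it is close to expiry (illustrative)."""

    def __init__(self, vault_key, ttl_seconds, now=None):
        self.vault_key = vault_key
        self.expires_at = (time.time() if now is None else now) + ttl_seconds

    def needs_refresh(self, margin_seconds=300, now=None):
        """True once the key is within margin_seconds of expiring."""
        now = time.time() if now is None else now
        return now >= self.expires_at - margin_seconds

    def refresh_key(self, new_vault_key, ttl_seconds, now=None):
        """Swap in a freshly issued key and reset the expiry."""
        self.vault_key = new_vault_key
        self.expires_at = (time.time() if now is None else now) + ttl_seconds


# Explicit timestamps keep the example deterministic.
holder = VaultKeyHolder("vk_old", ttl_seconds=7200, now=0)
print(holder.needs_refresh(now=3600))  # halfway through the TTL
print(holder.needs_refresh(now=7000))  # inside the five-minute margin
holder.refresh_key("vk_new", ttl_seconds=7200, now=7000)
print(holder.needs_refresh(now=7100), holder.vault_key)
```

The driver would poll `needs_refresh` on each actor, issue a replacement key when it returns true, and pass that key to `refresh_key`.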
What happens to in-flight Ray tasks when I revoke the vault key from the Keybrake dashboard?
Revocation takes effect at the proxy layer immediately. Tasks that have already made a vendor API call and received a response are unaffected — those transactions are complete. Tasks that are mid-execution and haven't yet called the vendor API will receive a 401 when they do, which surfaces as an exception in the Ray task and marks that task instance as failed. Tasks that haven't been scheduled yet will also receive 401s when they eventually execute. This gives you a clean stop: future vendor calls are blocked, past calls are preserved in the audit log.
Further reading
- AI agent kill switch patterns — the four ways to stop a runaway agent and their real stop latencies, including distributed workload scenarios.
- AI agent audit trail schema — what belongs in a structured per-call log and the SQL queries that matter when reviewing a billing incident.
- Modal AI agent API key — similar pattern for Modal's serverless functions: per-invocation spend caps on parallel vendor calls.
- AI agent credential management — the full architecture: storage vs. access vs. enforcement vs. audit, and which tools handle which layer.