
Google Cloud Run Jobs AI agent API key: scoping vendor calls in batch task containers

Google Cloud Run Jobs is a fully managed compute platform that runs containerized tasks to completion — no servers to manage, no queue infrastructure, and built-in retry logic. AI agent teams adopt Cloud Run Jobs for batch billing workflows, bulk notification sends, and recurring agent pipelines triggered by Cloud Scheduler. The parallelism setting fans out multiple task container instances simultaneously — each instance calls Stripe, Twilio, or Resend independently using the same API key mounted from Secret Manager. There is no per-job dollar cap in Cloud Run Jobs; task retries re-run failed containers that may have already billed Stripe; and no structured per-job vendor cost log exists natively. This page covers the vault-key pattern that bounds vendor spend per Cloud Run job execution.

TL;DR

Issue one scoped vault key per job execution by calling the Keybrake API before the main task loop runs, either from a step that runs before you start the execution or from task 0's entrypoint. Get the key to every task container in one of two ways: as a per-execution environment variable override set from the API response when you start the execution, or by writing it to a Cloud Storage object that all task instances read at startup. Each container sends vault_key as the Authorization: Bearer credential when calling Stripe through the Keybrake proxy at https://proxy.keybrake.com/stripe/.... All parallel task containers then share one accumulating cap. Revoking a runaway job is a single DELETE /vault/keys/{key_id} call, with no Secret Manager rotation and no container image redeployment.

How Cloud Run Jobs AI agent workflows call vendor APIs

A typical agent batch billing job fans out task containers across a customer slice:

# Job definition with the gcloud CLI
# gcloud run jobs create billing-agent \
#   --image gcr.io/myproject/billing-agent:latest \
#   --tasks 500 \
#   --parallelism 50 \
#   --max-retries 3 \
#   --set-secrets STRIPE_SECRET_KEY=stripe-key:latest \
#   --region us-central1

# billing-agent container entrypoint (Python)
import os

import stripe

TASK_INDEX = int(os.environ["CLOUD_RUN_TASK_INDEX"])   # 0-based
TASK_COUNT = int(os.environ["CLOUD_RUN_TASK_COUNT"])   # total tasks

stripe.api_key = os.environ["STRIPE_SECRET_KEY"]

# Each container handles its slice of customers.
# fetch_customers_slice is your own application code (not shown).
# No idempotency key is sent here: this is the risky baseline.
customers = fetch_customers_slice(TASK_INDEX, TASK_COUNT)
for customer in customers:
    stripe.PaymentIntent.create(
        amount=customer["amount_cents"],
        currency="usd",
        customer=customer["id"],
    )

This pattern has two compounding risks. First, --parallelism 50 means Cloud Run Jobs runs up to 50 task containers at once, each calling Stripe independently with the same STRIPE_SECRET_KEY from Secret Manager. With 500 tasks, 50 containers are charging customers at any moment, and nothing at the job level stops the run when dollar spend crosses a threshold. Cloud Billing budget alerts do not help here: they track Google Cloud charges, not Stripe or Twilio spend, and even for Google Cloud costs they notify after billing data is ingested rather than stopping work. Second, --max-retries 3 re-executes a failed task from the beginning. A task that charged nine customers and then crashed on the tenth charges those nine again on retry, and if Stripe returned a 500 after a charge actually went through, the retry repeats that charge too. The only protection is a stable idempotency key on every payment call. Cloud Run Jobs exposes CLOUD_RUN_EXECUTION and CLOUD_RUN_TASK_INDEX, which you can use to build one, but it does not add one for you.
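To make the retry risk concrete, here is a small, self-contained simulation. It never talks to Stripe: FakeVendor is a hypothetical stand-in that records one charge per request and, like Stripe, replays the first result when it sees an idempotency key again. One task charges a slice of ten customers, fails after the seventh charge lands, and is retried once, as Cloud Run Jobs would do under --max-retries.

```python
from typing import Optional


class FakeVendor:
    """Hypothetical stand-in for a payments API that honors idempotency keys."""

    def __init__(self) -> None:
        self.charges: list[str] = []    # one entry per real charge
        self.seen: dict[str, int] = {}  # idempotency key -> charge index

    def charge(self, customer_id: str, idempotency_key: Optional[str]) -> int:
        if idempotency_key is not None and idempotency_key in self.seen:
            return self.seen[idempotency_key]  # replayed: no new charge
        self.charges.append(customer_id)
        index = len(self.charges) - 1
        if idempotency_key is not None:
            self.seen[idempotency_key] = index
        return index


def run_task(vendor: FakeVendor, customers: list[str], fail_at: int,
             use_keys: bool, execution: str = "exec-1") -> None:
    """Charge every customer; raise right after customers[fail_at] is charged."""
    for i, customer in enumerate(customers):
        key = f"{execution}-{customer}" if use_keys else None
        vendor.charge(customer, key)
        if i == fail_at:
            raise RuntimeError("vendor returned 500 after the charge landed")


def simulate(use_keys: bool) -> int:
    """Run one failing attempt plus one retry; return the number of real charges."""
    vendor = FakeVendor()
    customers = [f"cus_{n}" for n in range(10)]
    try:
        run_task(vendor, customers, fail_at=6, use_keys=use_keys)   # attempt 0 fails
    except RuntimeError:
        run_task(vendor, customers, fail_at=-1, use_keys=use_keys)  # retry succeeds
    return len(vendor.charges)


print(simulate(use_keys=False))  # 17: customers 0-6 are charged twice
print(simulate(use_keys=True))   # 10: the retry replays the first seven charges
```

Without keys, the retry re-runs the whole slice and produces seven duplicate charges; with keys stable across attempts, the retry only adds the three customers the first attempt never reached.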

Three gaps Cloud Run Jobs' native tooling doesn't fill for vendor spend control

| Gap | What happens in practice | Cloud Run Jobs' answer |
|---|---|---|
| No per-job spend cap | Cloud Run Jobs has no mechanism to stop a job when cumulative vendor API spend reaches a dollar threshold; parallelism limits concurrent containers by count, not by cost. A billing job for 10,000 customers with a bug that charges each customer twice runs to completion, and every task finishes before anyone is alerted. | Cloud Billing budget alerts cover Google Cloud charges only, arrive after spend is recorded, and notify rather than stop tasks. Cloud Run Jobs itself has no pre-call, per-job dollar cap. |
| No mid-job vendor revoke without key rotation | The Stripe key is injected from Secret Manager as an environment variable when each container starts. Adding a new secret version does not reach containers that already hold the old value, and actually cutting the key off means rolling it in Stripe, which also breaks every other service that uses it. Cancelling the execution (gcloud run jobs executions cancel) sends SIGTERM to each running container, which then has 10 seconds before it is killed. A Stripe call already in flight completes, so nothing prevents that last call from being charged. | Cancellation sends SIGTERM with a 10-second grace period, and in-flight vendor API calls complete before shutdown. There is no per-call revoke that takes effect before a request is forwarded. |
| No per-call audit with job context | Cloud Logging captures container stdout/stderr and any structured log entries your code emits, but it does not parse dollar amounts from Stripe response bodies, link a Stripe PaymentIntent ID to the Cloud Run execution ID and task index, or provide a queryable per-job vendor spend summary. Reconstructing what a runaway job charged means cross-referencing Cloud Logging, Cloud Run execution metadata, and the Stripe Dashboard by matching timestamps by hand. | Cloud Logging records execution events. There is no built-in vendor cost tracking or execution-to-charge correlation. |

The parallelism amplification risk

The parallelism setting is the primary spend amplifier. With --parallelism 100 on a 1,000-task job, Cloud Run Jobs keeps up to 100 task containers running at once, each calling Stripe independently. If each task handles one customer and its vendor call takes 200ms, the job's calls finish in roughly 10 waves of 100, about 2 seconds of call time. Container startup adds wall time in practice, but the charges still accumulate as fast as Stripe can respond. At 100 concurrent containers, you can hit vendor rate limits or run up significant spend within seconds.
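A back-of-the-envelope model makes the arithmetic explicit. This sketch assumes every task handles exactly one customer and that tasks run in sequential waves of `parallelism` containers, each spending `startup_s` seconds starting and `work_s` seconds on its vendor call. The 3-second startup in the second call is an illustrative assumption, not a measured Cloud Run figure.

```python
import math


def wave_model(tasks: int, parallelism: int, work_s: float, startup_s: float = 0.0):
    """Return (waves, wall_seconds, average_vendor_calls_per_second)."""
    waves = math.ceil(tasks / parallelism)  # batches of concurrent containers
    wall_s = waves * (startup_s + work_s)   # waves run one after another
    avg_rate = tasks / wall_s               # one vendor call per task
    return waves, wall_s, avg_rate


print(wave_model(1_000, 100, work_s=0.2))                 # (10, 2.0, 500.0)
print(wave_model(1_000, 100, work_s=0.2, startup_s=3.0))  # (10, 32.0, 31.25)
```

Even when startup dominates wall time, every one of the 1,000 charges still lands in well under a minute, far faster than any alert a person could act on.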

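Task retries compound this. When a task container exits with a non-zero exit code, Cloud Run Jobs retries that task up to --max-retries times. A container that called Stripe, got a 500, and exited with code 1 is retried without any check on whether Stripe actually processed the charge. Unless every payment call carries an idempotency key that stays the same across attempts, each retry sends a fresh charge request that Stripe treats as distinct. Build the key from values such as CLOUD_RUN_EXECUTION plus the customer ID, and never from CLOUD_RUN_TASK_ATTEMPT, which changes on every retry. The task index (CLOUD_RUN_TASK_INDEX) alone is not enough when a task processes several customers, because every charge in that slice would share one key.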
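A minimal sketch of a retry-stable key, assuming the Cloud Run Jobs environment variables CLOUD_RUN_EXECUTION and CLOUD_RUN_TASK_ATTEMPT. The helper name and the sample execution name are hypothetical. What matters is what goes into the key: the execution and the customer, never the attempt number.

```python
import hashlib
import os
from typing import Mapping


def payment_idempotency_key(customer_id: str, env: Mapping[str, str] = os.environ) -> str:
    """Return a key that is identical on every attempt of the same task."""
    execution = env.get("CLOUD_RUN_EXECUTION", "local")
    # CLOUD_RUN_TASK_ATTEMPT is deliberately ignored: it increases on every retry,
    # so including it would make each retry look like a brand-new charge.
    raw = f"{execution}:{customer_id}"
    # Hashing keeps the key a fixed 64 characters regardless of input length.
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


first = payment_idempotency_key(
    "cus_123", {"CLOUD_RUN_EXECUTION": "billing-agent-x7k2p", "CLOUD_RUN_TASK_ATTEMPT": "0"}
)
retry = payment_idempotency_key(
    "cus_123", {"CLOUD_RUN_EXECUTION": "billing-agent-x7k2p", "CLOUD_RUN_TASK_ATTEMPT": "2"}
)
print(first == retry)  # True
```

With stripe-python, the key can be passed as `idempotency_key=` to `stripe.PaymentIntent.create`; through a proxy it travels in the Idempotency-Key header. Because the key includes the execution name, manually re-running the job creates a new execution and new keys. If a re-run must never re-charge, derive the key from something like the billing period instead.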

Scoping vault keys per Cloud Run job execution

import json
import os
import time

import httpx
from google.api_core.exceptions import NotFound, PreconditionFailed
from google.cloud import storage

KEYBRAKE_BASE = "https://proxy.keybrake.com"
KEYBRAKE_ADMIN_KEY = os.environ["KEYBRAKE_ADMIN_KEY"]
TASK_INDEX = int(os.environ.get("CLOUD_RUN_TASK_INDEX", "0"))
TASK_COUNT = int(os.environ.get("CLOUD_RUN_TASK_COUNT", "1"))
JOB_EXECUTION = os.environ.get("CLOUD_RUN_EXECUTION", "local")

# Task 0 issues the vault key and publishes it to GCS; every other task waits
# for that object. Checking blob.exists() alone is not enough: tasks in the
# first wave start together, so any of them could see "no object yet".

def issue_vault_key() -> str:
    resp = httpx.post(
        f"{KEYBRAKE_BASE}/vault/keys",
        headers={"Authorization": f"Bearer {KEYBRAKE_ADMIN_KEY}"},
        json={
            "vendor": "stripe",
            "daily_usd_cap": float(os.environ.get("JOB_BUDGET_USD", "500")),
            "allowed_endpoints": ["POST /v1/payment_intents"],
            "expires_in": "2h",
            "label": JOB_EXECUTION,
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["vault_key"]

def get_or_issue_vault_key(wait_s: float = 60.0) -> str:
    bucket = storage.Client().bucket(os.environ["GCS_VAULT_BUCKET"])
    blob = bucket.blob(f"vault-keys/{JOB_EXECUTION}.json")

    if TASK_INDEX == 0 and not blob.exists():
        vault_key = issue_vault_key()
        try:
            # if_generation_match=0: create the object only if it does not exist,
            # so a retried task 0 can never overwrite a key already in use
            blob.upload_from_string(
                json.dumps({"vault_key": vault_key}), if_generation_match=0
            )
            return vault_key
        except PreconditionFailed:
            pass  # an earlier attempt of task 0 published first; use its key

    # Every other task (and a retried task 0) polls until the key is published
    deadline = time.monotonic() + wait_s
    while True:
        try:
            return json.loads(blob.download_as_text())["vault_key"]
        except NotFound:
            if time.monotonic() >= deadline:
                raise
            time.sleep(0.5)

vault_key = get_or_issue_vault_key()

# fetch_customers_slice is your own application code (not shown)
customers = fetch_customers_slice(TASK_INDEX, TASK_COUNT)
for customer in customers:
    # Idempotency key: stable across task retries for the same customer
    idempotency_key = f"{JOB_EXECUTION}-{customer['id']}"

    resp = httpx.post(
        f"{KEYBRAKE_BASE}/stripe/v1/payment_intents",
        headers={
            "Authorization": f"Bearer {vault_key}",
            "Idempotency-Key": idempotency_key,
        },
        # Stripe's v1 API takes form-encoded parameters, not a JSON body
        data={
            "amount": customer["amount_cents"],
            "currency": "usd",
            "customer": customer["id"],
        },
        timeout=30,
    )

    if resp.status_code == 429 and resp.json().get("code") == "cap_exhausted":
        print(f"[{JOB_EXECUTION}] spend cap exhausted at task {TASK_INDEX}, exiting")
        break   # Fall through to a normal exit (code 0); do not retry cap exhaustion

    resp.raise_for_status()

This pattern issues one vault key per Cloud Run job execution no matter how many task containers start at once. Only task 0 calls the issuance endpoint, the if_generation_match=0 precondition stops a retried task 0 from publishing a second key, and the other tasks poll the object until it appears. Because GCS reads are strongly consistent, every task that reads after the write gets the same key. Each customer's idempotency key is the execution name plus the customer ID, so it stays the same across Cloud Run retries of the same task; Stripe keeps idempotency keys for at least 24 hours, which covers retries within a job. When the cap is exhausted, the loop breaks and the container exits with code 0, so Cloud Run Jobs does not count the task as failed and does not retry it.

How Keybrake fits

Keybrake is the proxy layer between your Cloud Run task containers and Stripe, Twilio, or Resend. The vault key replaces the STRIPE_SECRET_KEY Secret Manager secret that was previously injected directly into each container — the real Stripe secret stays in Keybrake and never appears in Cloud Logging, container environment dumps, or task execution metadata. All parallel task containers share one accumulating spend cap; the cap is enforced atomically on each proxied call, so 50 simultaneous containers each trying to push past the cap are all blocked at the proxy layer, not after the charges land. Revoking a runaway job is a single DELETE /vault/keys/{key_id} call — effective on the next proxied request, without Secret Manager rotation, without container image changes, and without affecting other job executions holding different vault keys.
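The revoke call can be scripted as a kill switch. This sketch uses only the standard library and assumes the DELETE /vault/keys/{key_id} endpoint described above, authenticated with the same admin key used for issuance. Where key_id comes from (for example, a field in the issuance response) is an assumption to check against Keybrake's API; the helper names are hypothetical.

```python
import urllib.parse
import urllib.request

KEYBRAKE_BASE = "https://proxy.keybrake.com"


def build_revoke_request(key_id: str, admin_key: str) -> urllib.request.Request:
    """Build the DELETE request that revokes one vault key."""
    # Percent-encode the ID so it cannot alter the URL path
    url = f"{KEYBRAKE_BASE}/vault/keys/{urllib.parse.quote(key_id, safe='')}"
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={"Authorization": f"Bearer {admin_key}"},
    )


def revoke(key_id: str, admin_key: str) -> int:
    """Send the revoke request and return the HTTP status (raises on 4xx/5xx)."""
    with urllib.request.urlopen(build_revoke_request(key_id, admin_key), timeout=10) as resp:
        return resp.status
```

Keeping request construction separate from sending makes the kill switch easy to test without touching the network.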


Related questions

How do I share one vault key across all parallel task containers without issuing N keys?

Write the vault key to a Cloud Storage object immediately after issuance, and have every task container read that object at startup. GCS reads are strongly consistent after a write, so every container that reads after task 0's write gets the same value. The window before that write is real: the first wave of tasks starts together, so tasks other than 0 should poll the object until it appears, with a timeout comfortably longer than task 0's startup plus the issuance call (for example, 60 seconds; a few seconds may not be enough on a cold start). Write the object with an if_generation_match=0 precondition so a retried task 0 cannot overwrite a key that other tasks are already using. Do not issue a separate vault key in each task container: that creates N independent caps and defeats the goal of bounding total job spend.
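The polling step can be factored into a small helper. This is a sketch with hypothetical names: `read` is any callable that returns the key or None. In a real task it might wrap `blob.download_as_text()` and return None on `google.api_core.exceptions.NotFound`. `sleep` and `clock` are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Optional


def wait_for_value(
    read: Callable[[], Optional[str]],
    timeout_s: float = 60.0,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call read() every interval_s seconds until it returns a value.

    Raises TimeoutError if nothing appears within timeout_s.
    """
    deadline = clock() + timeout_s
    while True:
        value = read()
        if value is not None:
            return value
        if clock() >= deadline:
            raise TimeoutError(f"no value after {timeout_s}s")
        sleep(interval_s)


# Demo with a fake reader that "finds" the key on the third poll.
responses = iter([None, None, "vk_example"])
print(wait_for_value(lambda: next(responses), sleep=lambda s: None))  # vk_example
```

Raising on timeout, rather than returning None, makes the task exit non-zero, so Cloud Run Jobs retries it. That is the behavior you want if task 0 failed to publish a key at all.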

How should I handle cap exhaustion so Cloud Run Jobs doesn't retry the task?

Exit with code 0 when the proxy returns a 429 with code: cap_exhausted. Cloud Run Jobs treats a non-zero exit code as a failure that triggers a retry (up to max-retries), while a zero exit code marks the task as completed successfully. Cap exhaustion is the intended stop condition, so a retry would only run into the same exhausted cap. Log the exhaustion event before exiting so your Cloud Logging queries can identify which job execution hit the cap and at which task index.
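A minimal sketch of that exit path. The 429 status and the code: cap_exhausted body come from the text above; the log field names and helper names are illustrative. Cloud Run forwards a single line of JSON written to stdout to Cloud Logging as a structured entry, with severity and message treated as special fields, so the event can be filtered by execution and task index.

```python
import json
import sys


def is_cap_exhausted(status_code: int, body: dict) -> bool:
    """True only for the proxy's cap-exhausted signal (429 plus code)."""
    return status_code == 429 and body.get("code") == "cap_exhausted"


def cap_exhausted_log_line(execution: str, task_index: int) -> str:
    """Single-line JSON that Cloud Logging ingests as a structured WARNING."""
    return json.dumps({
        "severity": "WARNING",
        "message": "vault key spend cap exhausted",
        "execution": execution,
        "task_index": task_index,
    })


def exit_for_cap_exhaustion(execution: str, task_index: int) -> None:
    """Log the event, then exit 0 so Cloud Run Jobs records success and skips retries."""
    print(cap_exhausted_log_line(execution, task_index), flush=True)
    sys.exit(0)
```

A vendor rate-limit 429 whose body lacks that code makes `is_cap_exhausted` return False. It therefore still goes through normal error handling and, if the error propagates, a retry.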

What vault key TTL should I use for Cloud Run Jobs?

Set expires_in to cover the whole execution's expected wall-clock runtime with a 50% buffer, because one key is shared by every task in the execution. A 500-task job with --parallelism 50, where each task makes a single 200ms call, spends about 2 seconds on vendor calls (500 / 50 × 0.2 s); with container startup on top, something like expires_in: "5m" leaves room. For longer-running jobs (large slices per task, slow vendor responses, or low parallelism), start from your historical execution durations and add the buffer. The task timeout (--task-timeout) bounds a single task attempt, not the execution. A 500-task job at parallelism 50 runs in at least 10 waves, plus any retries, so a TTL equal to the task timeout can expire while later waves are still charging customers.
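When there is no history yet, a pessimistic ceiling can be computed from the job settings. This sketch uses a simplifying model (an assumption, not how Cloud Run schedules tasks exactly): tasks run in sequential waves of `parallelism`, and each wave may take its full task timeout on every attempt, including retries, before the next wave starts. The output format assumes expires_in accepts minutes written like the "5m" example.

```python
import math


def vault_key_ttl_seconds(
    tasks: int,
    parallelism: int,
    task_timeout_s: int,
    max_retries: int,
    buffer: float = 1.5,
) -> int:
    """Rough upper bound on how long one execution-wide vault key must stay valid."""
    waves = math.ceil(tasks / parallelism)
    # Worst case under the model: every attempt in every wave hits the timeout
    worst_case_s = waves * (max_retries + 1) * task_timeout_s
    return math.ceil(worst_case_s * buffer)


def as_minutes(seconds: int) -> str:
    """Format for expires_in, assuming '<n>m' is accepted (rounds up)."""
    return f"{math.ceil(seconds / 60)}m"


ttl = vault_key_ttl_seconds(tasks=500, parallelism=50, task_timeout_s=600, max_retries=3)
print(ttl, as_minutes(ttl))  # 36000 600m
```

This ceiling is deliberately generous and is best treated as an upper bound. Historical durations plus the 50% buffer usually give a much shorter TTL, and a shorter TTL limits how long a leaked key stays useful.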

Can I use this with Cloud Scheduler to run the billing job on a schedule?

Yes. Configure a Cloud Scheduler job with an HTTP target that calls the Cloud Run Admin API's run method for your job (the same operation as gcloud run jobs execute), authenticated with an OAuth token for a service account that is allowed to run the job. Each triggered execution gets a unique CLOUD_RUN_EXECUTION environment variable. Use it as the vault key label and GCS object name so each scheduled execution gets its own scoped key. Per-execution spend is then queryable in Keybrake's audit log: filter by label = {execution_id} to reconstruct what each scheduled batch charged.
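Two small helpers tie this together. The first builds the Cloud Run Admin API v2 jobs.run URL that a Cloud Scheduler HTTP target would POST to. The second derives the vault key label and GCS object name from CLOUD_RUN_EXECUTION, following the `vault-keys/{execution}.json` naming used in the job code above. The helper names, project, and execution values are hypothetical.

```python
import os
from typing import Mapping


def run_job_url(project: str, region: str, job: str) -> str:
    """Cloud Run Admin API v2 jobs.run endpoint, for a Cloud Scheduler HTTP target."""
    name = f"projects/{project}/locations/{region}/jobs/{job}"
    return f"https://run.googleapis.com/v2/{name}:run"


def execution_scoped_names(env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Return (vault key label, GCS object name), both tied to one execution."""
    execution = env["CLOUD_RUN_EXECUTION"]
    return execution, f"vault-keys/{execution}.json"


print(run_job_url("my-project", "us-central1", "billing-agent"))
# https://run.googleapis.com/v2/projects/my-project/locations/us-central1/jobs/billing-agent:run
print(execution_scoped_names({"CLOUD_RUN_EXECUTION": "billing-agent-x7k2p"}))
# ('billing-agent-x7k2p', 'vault-keys/billing-agent-x7k2p.json')
```

Because both names derive from the same variable, the audit-log label and the GCS object for a given scheduled run always match.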

Further reading