
Google Cloud Run Jobs AI agent API key: scoping vendor calls in batch task containers

Google Cloud Run Jobs is a fully managed compute platform that runs containerized tasks to completion — no servers to manage, no queue infrastructure, and built-in retry logic. AI agent teams adopt Cloud Run Jobs for batch billing workflows, bulk notification sends, and recurring agent pipelines triggered by Cloud Scheduler. The parallelism setting fans out multiple task container instances simultaneously — each instance calls Stripe, Twilio, or Resend independently using the same API key mounted from Secret Manager. There is no per-job dollar cap in Cloud Run Jobs; task retries re-run failed containers that may have already billed Stripe; and no structured per-job vendor cost log exists natively. This page covers the vault-key pattern that bounds vendor spend per Cloud Run job execution.

TL;DR

Add a setup container step — or call the Keybrake API from your job's entrypoint before the main task loop — to issue a scoped vault key for each job run. Pass the vault key to task containers via an environment variable set from the API response, or write it to a shared Cloud Storage object that all task instances read at startup. Each container uses vault_key as the Authorization: Bearer credential when calling Stripe through the Keybrake proxy at https://proxy.keybrake.com/stripe/.... All parallel task containers share one accumulating cap. Revoking a runaway job is a single DELETE /vault/keys/{key_id} call — no Secret Manager rotation, no container image redeployment.

How Cloud Run Jobs AI agent workflows call vendor APIs

A typical agent batch billing job fans out task containers across a customer slice:

# cloudbuild.yaml trigger (or gcloud CLI equivalent)
# gcloud run jobs create billing-agent \
#   --image gcr.io/myproject/billing-agent:latest \
#   --tasks 500 \
#   --parallelism 50 \
#   --max-retries 3 \
#   --set-secrets STRIPE_SECRET_KEY=stripe-key:latest \
#   --region us-central1

# billing-agent container entrypoint (Python)
import os

import stripe

TASK_INDEX = int(os.environ["CLOUD_RUN_TASK_INDEX"])   # 0-based
TASK_COUNT = int(os.environ["CLOUD_RUN_TASK_COUNT"])   # total tasks

stripe.api_key = os.environ["STRIPE_SECRET_KEY"]

# Each container handles its slice of customers
customers = fetch_customers_slice(TASK_INDEX, TASK_COUNT)
for customer in customers:
    stripe.PaymentIntent.create(
        amount=customer["amount_cents"],
        currency="usd",
        customer=customer["id"],
    )

This pattern has two compounding risks. First, --parallelism 50 lets Cloud Run Jobs run up to 50 task containers at once, each independently calling Stripe with the same STRIPE_SECRET_KEY from Secret Manager. A job with 500 tasks and parallelism 50 can have 50 containers charging customers at any moment, and there is no dollar-spend stop condition at the job level. Cloud Billing budget alerts track Google Cloud charges rather than what Stripe bills, and they fire only after spend has been incurred (typically hours later, after billing ingestion), too late to stop a batch job that completes in minutes. Second, --max-retries 3 re-executes failed tasks. If Stripe returned a 500 after the charge had already been processed, the retry creates a duplicate charge unless every payment call carries an idempotency key that stays the same across retries. Cloud Run Jobs exposes CLOUD_RUN_TASK_INDEX and CLOUD_RUN_EXECUTION as environment variables, but it does not build or attach that key for you.
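
The entrypoint above calls fetch_customers_slice without showing it. As a rough sketch only, one way to partition work by task index is a strided slice over a stably ordered list. The helper name slice_for_task and the sample customer IDs are hypothetical, and the approach assumes every task (and every retry) sees the customer list in the same order.

```python
def slice_for_task(items: list, task_index: int, task_count: int) -> list:
    """Return the items owned by one task, using a strided partition."""
    if not 0 <= task_index < task_count:
        raise ValueError("task_index must be in [0, task_count)")
    # Task i takes items i, i + task_count, i + 2 * task_count, ...
    return items[task_index::task_count]


# Hypothetical data: ten customers split across three tasks.
customer_ids = [f"cus_{n}" for n in range(10)]
slices = [slice_for_task(customer_ids, i, 3) for i in range(3)]
```

Because the assignment depends only on the index and the list order, a retried task is handed the same customers it had on its first attempt, which is what makes a retry-stable idempotency key possible.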

Three gaps Cloud Run Jobs' native tooling doesn't fill for vendor spend control

Gap	What happens in practice	Cloud Run Jobs' answer
No per-job spend cap	Cloud Run Jobs has no mechanism to stop a job when cumulative vendor API spend reaches a dollar threshold. `parallelism` limits concurrent containers by count, not by cost. A billing job processing 10,000 customers with a bug that charges twice per customer runs to completion: every task finishes before any billing alert fires. Cloud Billing budget alerts arrive on an 8-to-48 hour lag; they notify, but they don't stop tasks already running.	Cloud Billing budget alerts fire after spend is incurred. No pre-call per-job dollar cap exists in Cloud Run Jobs itself.
No mid-job vendor revoke without Secret Manager rotation	The Stripe key is mounted from Secret Manager at container startup. To revoke it mid-job, you must rotate the Secret Manager secret version, but containers that already loaded the secret into process memory continue using the old key until they exit. Cancelling a Cloud Run job execution (`gcloud run jobs executions cancel`) sends a SIGTERM to each container, which has 10 seconds to finish current work. A container mid-way through a Stripe API call will complete that call before exiting. No mechanism prevents the last in-flight call from being charged.	`gcloud run jobs executions cancel` sends SIGTERM with a 10-second graceful shutdown window. In-flight vendor API calls complete before shutdown. No per-call revoke that fires before the API call is forwarded.
No per-call audit with job context	Cloud Logging captures container stdout/stderr and structured log entries emitted by your code, but it doesn't parse dollar amounts from Stripe response bodies, correlate Stripe `PaymentIntent.id` with the Cloud Run job execution ID and task index, or provide a queryable per-job vendor spend summary. Reconstructing what a runaway job charged requires cross-referencing Cloud Logging, Cloud Run job execution metadata, and the Stripe Dashboard with manual timestamp alignment.	Cloud Logging captures execution events. No structured vendor cost tracking or job-execution-to-charge correlation built in.
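
The audit gap in the last row can be narrowed from inside the container. The sketch below is an illustration rather than a built-in feature: it emits one JSON line per charge to stdout, which Cloud Logging ingests as a structured entry. The field names and the helper charge_log_entry are assumptions; the CLOUD_RUN_* variables are the ones Cloud Run Jobs sets for each task.

```python
import json
import os
import sys


def charge_log_entry(payment_intent_id: str, amount_cents: int, customer_id: str) -> str:
    """Build one JSON log line tying a Stripe charge to the job execution."""
    entry = {
        "severity": "INFO",
        "message": "vendor_charge",
        "vendor": "stripe",
        "payment_intent_id": payment_intent_id,
        "customer_id": customer_id,
        "amount_cents": amount_cents,
        # Defaults keep the helper usable when run outside Cloud Run.
        "job": os.environ.get("CLOUD_RUN_JOB", "local"),
        "execution": os.environ.get("CLOUD_RUN_EXECUTION", "local"),
        "task_index": int(os.environ.get("CLOUD_RUN_TASK_INDEX", "0")),
        "task_attempt": int(os.environ.get("CLOUD_RUN_TASK_ATTEMPT", "0")),
    }
    return json.dumps(entry)


# Placeholder IDs for illustration only.
line = charge_log_entry("pi_example", 1999, "cus_example")
print(line, file=sys.stdout, flush=True)
```

This still leaves the aggregation to a log query, so it complements a proxy-side spend log rather than replacing it.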

The parallelism amplification risk

The parallelism setting is the primary spend amplifier. Setting --parallelism 100 on a 1,000-task job lets Cloud Run Jobs keep up to 100 task containers running at once, each independently calling Stripe. If each container processes a single customer and exits in 200 ms, the job works through all 1,000 tasks in 10 waves, roughly 2 seconds of wall time before container startup overhead, and vendor charges accumulate as fast as Stripe can respond. At 100 concurrent containers each making one Stripe call, you can exhaust a per-day rate limit or accumulate significant spend in seconds.
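
The arithmetic behind that estimate can be checked with a few lines. This is an idealized sketch: it ignores container startup and scheduling overhead, and the $25 average charge per task is an invented figure used only to show how quickly the total accumulates.

```python
import math


def batch_estimate(tasks: int, parallelism: int, seconds_per_task: float, usd_per_task: float):
    """Return (waves, wall-clock seconds, total USD) for an idealized job."""
    waves = math.ceil(tasks / parallelism)   # batches of concurrent tasks
    wall_seconds = waves * seconds_per_task  # each wave lasts one task duration
    total_usd = tasks * usd_per_task         # one charge per task
    return waves, wall_seconds, total_usd


# 1,000 tasks, parallelism 100, 200 ms per task, hypothetical $25 per charge.
waves, wall_seconds, total_usd = batch_estimate(1000, 100, 0.2, 25.0)
print(f"{waves} waves, ~{wall_seconds:.1f}s, ${total_usd:,.0f} charged")
```

Under these assumptions the whole budget is spent well before any human-paced alert could arrive, which is why the stop condition has to sit in the request path.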

The task retry amplification compounds this. When a task container exits with a non-zero exit code, Cloud Run Jobs retries it up to max-retries times. A container that called Stripe, got a 500, and exited with code 1 is retried without any check on whether Stripe actually processed the charge. Unless the idempotency key is built from values that stay fixed across retries, such as the execution name, the task index (available as CLOUD_RUN_TASK_INDEX), and the customer ID, each retry creates a new charge attempt that Stripe treats as a distinct request.
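
A retry-stable key like the one described above might be derived as follows. This is a minimal sketch: the helper name, the input format, and the example values are assumptions, and hashing is used only to keep the key at a fixed length (Stripe accepts idempotency keys of up to 255 characters).

```python
import hashlib


def idempotency_key(execution: str, task_index: int, customer_id: str) -> str:
    """Derive a key that is identical on every retry of the same task."""
    # Deliberately excludes CLOUD_RUN_TASK_ATTEMPT, which changes on each retry.
    raw = f"{execution}:{task_index}:{customer_id}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


# Hypothetical execution name and customer IDs.
first_attempt = idempotency_key("billing-agent-abc12", 7, "cus_123")
retry_attempt = idempotency_key("billing-agent-abc12", 7, "cus_123")
other_customer = idempotency_key("billing-agent-abc12", 7, "cus_456")
```

The first attempt and the retry produce the same key, so Stripe can recognize the second request as a replay, while a different customer in the same task gets a different key.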

Scoping vault keys per Cloud Run job execution

import json
import os
import time

import httpx
from google.cloud import storage

KEYBRAKE_BASE = "https://proxy.keybrake.com"
KEYBRAKE_ADMIN_KEY = os.environ["KEYBRAKE_ADMIN_KEY"]
TASK_INDEX = int(os.environ.get("CLOUD_RUN_TASK_INDEX", "0"))
TASK_COUNT = int(os.environ.get("CLOUD_RUN_TASK_COUNT", "1"))
JOB_EXECUTION = os.environ.get("CLOUD_RUN_EXECUTION", "local")

# Task 0 issues the vault key and writes it to GCS.
# All other tasks wait for that object and read the vault key from it.
# Only task 0 ever calls the issuance endpoint, so one key exists per execution.
def get_or_issue_vault_key(wait_seconds: float = 120.0) -> str:
    bucket = storage.Client().bucket(os.environ["GCS_VAULT_BUCKET"])
    blob = bucket.blob(f"vault-keys/{JOB_EXECUTION}.json")

    if TASK_INDEX != 0:
        # A bare exists()-then-issue check would race: every container in the
        # first wave could see "no object yet" and issue its own key.
        # Assumes task 0 starts within wait_seconds of its siblings; on timeout
        # the task fails and Cloud Run Jobs retries it.
        deadline = time.monotonic() + wait_seconds
        while not blob.exists():
            if time.monotonic() >= deadline:
                raise TimeoutError("vault key was not published in time")
            time.sleep(1)
        return json.loads(blob.download_as_text())["vault_key"]

    # Task 0: reuse the key if an earlier attempt of this task already wrote it.
    if blob.exists():
        return json.loads(blob.download_as_text())["vault_key"]

    resp = httpx.post(
        f"{KEYBRAKE_BASE}/vault/keys",
        headers={"Authorization": f"Bearer {KEYBRAKE_ADMIN_KEY}"},
        json={
            "vendor": "stripe",
            "daily_usd_cap": float(os.environ.get("JOB_BUDGET_USD", "500")),
            "allowed_endpoints": ["POST /v1/payment_intents"],
            "expires_in": "2h",
            "label": JOB_EXECUTION,
        },
    )
    resp.raise_for_status()
    vault_key = resp.json()["vault_key"]
    blob.upload_from_string(json.dumps({"vault_key": vault_key}))
    return vault_key

vault_key = get_or_issue_vault_key()

customers = fetch_customers_slice(TASK_INDEX, TASK_COUNT)
for customer in customers:
    # Idempotency key: stable across task retries for the same customer
    idempotency_key = f"{JOB_EXECUTION}-{customer['id']}"

    resp = httpx.post(
        f"{KEYBRAKE_BASE}/stripe/v1/payment_intents",
        headers={
            "Authorization": f"Bearer {vault_key}",
            "Idempotency-Key": idempotency_key,
        },
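        # Stripe's v1 API takes form-encoded bodies, so send data= rather than
        # json= (assuming the proxy forwards the body to Stripe unchanged).
        data={
            "amount": customer["amount_cents"],
            "currency": "usd",
            "customer": customer["id"],
        },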
    )

    if resp.status_code == 429 and resp.json().get("code") == "cap_exhausted":
        print(f"[{JOB_EXECUTION}] spend cap exhausted at task {TASK_INDEX}, exiting")
        break   # Exit cleanly — do not retry cap exhaustion

    resp.raise_for_status()

This pattern ensures one vault key is issued per Cloud Run job execution regardless of how many parallel task containers start simultaneously. GCS's strongly consistent object reads mean all containers see the same vault key once task 0 writes it. Each customer's idempotency key is job_execution_id + customer_id — stable across Cloud Run retries of the same task container. When the cap is exhausted, the container exits cleanly with code 0 so Cloud Run Jobs does not count it as a failed task requiring retry.

How Keybrake fits

Keybrake is the proxy layer between your Cloud Run task containers and Stripe, Twilio, or Resend. The vault key replaces the STRIPE_SECRET_KEY Secret Manager secret that was previously injected directly into each container — the real Stripe secret stays in Keybrake and never appears in Cloud Logging, container environment dumps, or task execution metadata. All parallel task containers share one accumulating spend cap; the cap is enforced atomically on each proxied call, so 50 simultaneous containers each trying to push past the cap are all blocked at the proxy layer, not after the charges land. Revoking a runaway job is a single DELETE /vault/keys/{key_id} call — effective on the next proxied request, without Secret Manager rotation, without container image changes, and without affecting other job executions holding different vault keys.
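
For reference, the revoke call could be assembled like this. It is a sketch under stated assumptions: the DELETE /vault/keys/{key_id} path and the bearer admin key come from this page, while serving it from the same host as key issuance and the key_id value itself are assumptions (the issuance example above only reads vault_key from the response). The request is built but not sent.

```python
import urllib.request

KEYBRAKE_BASE = "https://proxy.keybrake.com"  # assumed to also serve /vault/keys


def build_revoke_request(key_id: str, admin_key: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request that revokes one vault key."""
    return urllib.request.Request(
        url=f"{KEYBRAKE_BASE}/vault/keys/{key_id}",
        method="DELETE",
        headers={"Authorization": f"Bearer {admin_key}"},
    )


# Placeholder identifiers for illustration only.
req = build_revoke_request("key_example", "admin_example")
# To send it: urllib.request.urlopen(req, timeout=10)
```

Keeping the key ID next to the job execution name, for example in the same GCS object as the vault key, would let an operator revoke a specific run without looking anything up.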
