
Haystack AI agent API key: scoping vendor tool calls in pipelines

Haystack's Pipeline abstraction and Agent component make it straightforward to build LLM-driven workflows that call external tools — Stripe for payments, Twilio for notifications, Resend for email. The pipeline's component graph handles routing; the Agent component loops tool calls until the LLM decides it's done. That "loop until done" semantics is exactly the spending risk: an agent instructed to "process all overdue accounts" will call the Stripe tool for each account it identifies, with no native cap on how many times that tool fires per run. This page covers the vault-key pattern that adds per-run spend guardrails to Haystack agent pipelines.

TL;DR

Haystack's Agent component loops tool calls until a stop condition — and vendor API tools called inside that loop have no native per-run spend cap. Issue a vault key before the pipeline run, inject it into the tool that wraps the vendor API call, and enforce a per-run dollar cap at the proxy layer. If an agent loop goes wrong, revoke the vault key from the Keybrake dashboard without touching the real API key shared across all Haystack pipelines.

How Haystack AI agents call vendor APIs

In a Haystack 2.x pipeline, vendor API calls typically live inside custom components that the Agent's tool-calling loop invokes. A billing agent might be built like this:

from haystack import Pipeline, component
from haystack.components.agents import Agent
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.tools import ComponentTool
import stripe
import os

@component
class StripeCharger:
    @component.output_types(result=dict)
    def run(self, customer_id: str, amount_cents: int) -> dict:
        stripe.api_key = os.environ["STRIPE_SECRET_KEY"]  # full-access key
        intent = stripe.PaymentIntent.create(
            amount=amount_cents,
            currency="usd",
            customer=customer_id,
        )
        return {"result": intent}

# Expose the component to the LLM as a callable tool
stripe_tool = ComponentTool(
    component=StripeCharger(),
    name="charge_customer",
    description="Charge a customer via Stripe for a given amount in cents",
)

agent = Agent(
    chat_generator=OpenAIChatGenerator(model="gpt-4o"),
    tools=[stripe_tool],
    max_agent_steps=20,  # caps loop iterations, not dollars
)

pipeline = Pipeline()
pipeline.add_component("agent", agent)

result = pipeline.run({
    "agent": {
        # The Agent expects ChatMessage objects, not raw role/content dicts
        "messages": [ChatMessage.from_user("Process all overdue accounts in the database")]
    }
})

This is clean Haystack 2.x code. The problem: STRIPE_SECRET_KEY is a full-access key, max_agent_steps=20 caps loop iterations rather than tool calls (a single step can trigger several charges, each potentially for a different customer), and there's no dollar cap on cumulative spend across the run.

Three gaps Haystack's native tooling doesn't fill for vendor spend control

Gap	What happens in practice	Haystack's answer
No per-run spend cap	An agent instructed to "process overdue accounts" finds 150 accounts. With `max_agent_steps=20` and multiple accounts processed per step, the agent calls the Stripe tool dozens of times before hitting the step limit. Each call issues a real Stripe charge. There's no native way to say "stop charging when cumulative spend exceeds $500 for this run."	None. `max_agent_steps` caps LLM loop iterations, not vendor API spend. A single step can trigger multiple tool calls.
No mid-run vendor revoke	A pipeline run is hanging or producing unexpected behavior. You want to stop it from making more Stripe calls immediately. Rotating the Stripe secret key halts all Stripe calls globally — including from other pipeline runs, cron jobs, and background workers that are behaving correctly.	No built-in per-run credential scoping. Stopping a pipeline run doesn't revoke vendor API credentials used within it.
No per-call audit with pipeline context	Haystack's pipeline tracing (via Datadog or OpenTelemetry) shows component timings and LLM traces, but doesn't parse dollar amounts from Stripe responses or cross-reference Stripe `Request-Id` values with Haystack pipeline run IDs in a queryable format.	Component output inspection and pipeline tracing. No structured cost extraction for vendor API calls.

The agent loop risk: tool calls that compound per run

Haystack's Agent component is designed to loop: the LLM decides which tool to call, Haystack executes it, the result is added to the message history, the LLM evaluates the updated context and decides whether to call another tool or respond with a final answer. This loop continues until either the LLM stops calling tools or max_agent_steps is reached.

For a billing agent, this means:

1. The LLM receives "process all overdue accounts" and calls charge_customer for account 1.
2. Haystack executes the component. Stripe charges account 1.
3. The LLM sees the success and calls charge_customer for account 2.
4. This continues until the LLM produces a final answer (after processing all accounts) or hits max_agent_steps.

With 20 steps, a billing agent can issue at least 20 Stripe charges per run, and more if the model emits several tool calls in a single step. A data error that inflates the account list has a direct multiplier effect on how much real money moves before the run completes.
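The multiplier can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name, the $49.00 charge amount, and the figure of three parallel tool calls per step are assumptions chosen for the example, not Haystack constants.

```python
def worst_case_exposure_cents(max_agent_steps: int, calls_per_step: int, amount_cents: int) -> int:
    """Upper bound on money moved in one run if every tool call charges amount_cents."""
    return max_agent_steps * calls_per_step * amount_cents


# 20 steps, one charge per step, $49.00 per charge
sequential = worst_case_exposure_cents(20, 1, 4_900)

# Same run if the model emits 3 tool calls per step (hypothetical figure)
parallel = worst_case_exposure_cents(20, 3, 4_900)

print(f"sequential: ${sequential / 100:,.2f}")  # sequential: $980.00
print(f"parallel:   ${parallel / 100:,.2f}")    # parallel:   $2,940.00
```

Neither figure is bounded by anything in the pipeline itself, which is the gap a per-run dollar cap is meant to close.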

Injecting vault keys into Haystack components

Issue the vault key before the pipeline run, then pass it to the component via constructor injection:

import httpx
from haystack import Pipeline, component
from haystack.components.agents import Agent
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.tools import ComponentTool
import stripe
import os
import uuid

def issue_vault_key(run_id: str, budget_usd: float) -> str:
    r = httpx.post(
        "https://proxy.keybrake.com/vault/keys",
        headers={"Authorization": f"Bearer {os.environ['KEYBRAKE_API_KEY']}"},
        json={
            "vendor": "stripe",
            # The key expires in 1h, so this daily cap effectively bounds the run
            "daily_usd_cap": budget_usd,
            "allowed_endpoints": ["POST /v1/payment_intents"],
            "expires_in": "1h",
            "agent_run_label": f"haystack-billing/{run_id}",
        },
        timeout=10.0,
    )
    r.raise_for_status()  # fail before the run starts if no key was issued
    return r.json()["vault_key"]

@component
class StripeCharger:
    def __init__(self, vault_key: str):
        self.vault_key = vault_key      # scoped key passed at construction time

    @component.output_types(result=dict)
    def run(self, customer_id: str, amount_cents: int) -> dict:
        stripe.api_key = self.vault_key
        stripe.api_base = "https://proxy.keybrake.com/stripe"  # route calls through the proxy
        intent = stripe.PaymentIntent.create(
            amount=amount_cents,
            currency="usd",
            customer=customer_id,
            # A repeated identical charge for the same customer reuses the first result
            idempotency_key=f"{customer_id}-{amount_cents}",
        )
        return {"result": intent}

def run_billing_pipeline(budget_usd: float = 500.0):
    run_id = str(uuid.uuid4())
    vault_key = issue_vault_key(run_id, budget_usd)

    stripe_tool = ComponentTool(
        component=StripeCharger(vault_key=vault_key),
        name="charge_customer",
        description="Charge a customer via Stripe for a given amount in cents",
    )

    agent = Agent(
        chat_generator=OpenAIChatGenerator(model="gpt-4o"),
        tools=[stripe_tool],
        max_agent_steps=20,
    )

    pipeline = Pipeline()
    pipeline.add_component("agent", agent)

    return pipeline.run({
        "agent": {
            # The Agent expects ChatMessage objects, not raw role/content dicts
            "messages": [ChatMessage.from_user("Process all overdue accounts")]
        }
    })

The vault key is issued once per pipeline run and injected into the component constructor. Every tool call the Agent makes uses the same vault key and draws on the same dollar cap. Because the key is issued for a single run and expires after an hour, its daily_usd_cap effectively acts as a per-run cap. The real Stripe secret never appears in the pipeline code. The Keybrake audit log records each tool call with agent_run_label: "haystack-billing/{run_id}", giving you a queryable per-run view of what the agent spent and on which customers.

How Keybrake fits

Keybrake is the proxy layer between your Haystack components and Stripe, Twilio, or Resend. You swap the real API key for the vault key (injected at pipeline construction time) and point the vendor SDK at https://proxy.keybrake.com/stripe. The real secret stays in Keybrake, never in your pipeline code or component state. Each pipeline run gets its own vault key with its own dollar cap, endpoint allowlist, and expiry. Tool calls that would exceed the cap get a 429 from the proxy. In Haystack that surfaces as a tool error, and because the proxy rejects every further over-cap request, no additional charges go through regardless of what the agent tries next.
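To make the enforcement model concrete, here is a minimal sketch of the kind of decision a proxy could make for each request. It is not Keybrake's implementation: the VaultKeyPolicy class, the authorize function, and the 401 and 403 status codes are assumptions for illustration. Only the 429 on an exceeded cap comes from the description above.

```python
import time
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class VaultKeyPolicy:
    """Hypothetical per-key state a proxy might track."""
    cap_cents: int
    allowed_endpoints: FrozenSet[str]
    expires_at: float  # Unix timestamp
    spent_cents: int = 0
    revoked: bool = False


def authorize(policy: VaultKeyPolicy, endpoint: str, amount_cents: int,
              now: Optional[float] = None) -> int:
    """Return the HTTP status to answer with; spend is recorded only on 200."""
    now = time.time() if now is None else now
    if policy.revoked or now >= policy.expires_at:
        return 401  # assumed status for a revoked or expired key
    if endpoint not in policy.allowed_endpoints:
        return 403  # assumed status for an endpoint outside the allowlist
    if policy.spent_cents + amount_cents > policy.cap_cents:
        return 429  # cap would be exceeded, so the charge is not forwarded
    policy.spent_cents += amount_cents
    return 200


policy = VaultKeyPolicy(
    cap_cents=50_000,
    allowed_endpoints=frozenset({"POST /v1/payment_intents"}),
    expires_at=time.time() + 3600,
)
statuses = [authorize(policy, "POST /v1/payment_intents", 20_000) for _ in range(3)]
print(statuses)  # [200, 200, 429]
```

The third $200 charge would push cumulative spend to $600 against a $500 cap, so it is rejected and the recorded spend stays at $400. Setting revoked to True blocks only this key, which is what makes a mid-run stop possible without rotating the shared vendor secret.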


Related questions

How does the vault key interact with Haystack's pipeline serialization (.to_dict() / .from_dict())?

Haystack pipelines can be serialized to YAML or dict for storage and reloading. Because serialization captures a component's init parameters, a vault key passed to the constructor will typically end up in the serialized pipeline in plain text unless the component excludes it. For most use cases, you should not serialize the vault key: it's short-lived and run-specific. The recommended pattern is to construct the pipeline fresh per run (as shown in the code example), not to serialize a pipeline that contains a vault key and reload it later. If you need to serialize the pipeline structure without the vault key, write a placeholder during serialization and inject the real vault key at deserialization time.
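The placeholder approach might look like the sketch below. To stay runnable without Haystack installed, it uses a plain stand-in class rather than a real component. The class name, the placeholder string, and the extra vault_key argument on from_dict are all assumptions for illustration, and the type / init_parameters layout only mirrors the general shape of a serialized component.

```python
from typing import Any, Dict

VAULT_KEY_PLACEHOLDER = "<vault-key-injected-at-load>"  # hypothetical marker value


class StripeChargerConfig:
    """Stand-in for a component's serializable state; not a Haystack class."""

    def __init__(self, vault_key: str, currency: str = "usd"):
        self.vault_key = vault_key
        self.currency = currency

    def to_dict(self) -> Dict[str, Any]:
        # Never write the live key; emit the placeholder instead.
        return {
            "type": "StripeChargerConfig",
            "init_parameters": {
                "vault_key": VAULT_KEY_PLACEHOLDER,
                "currency": self.currency,
            },
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any], vault_key: str) -> "StripeChargerConfig":
        # Swap the placeholder for a freshly issued key at load time.
        params = dict(data["init_parameters"])
        params["vault_key"] = vault_key
        return cls(**params)


saved = StripeChargerConfig(vault_key="vk_run_1").to_dict()
restored = StripeChargerConfig.from_dict(saved, vault_key="vk_run_2")
print(saved["init_parameters"]["vault_key"])  # <vault-key-injected-at-load>
print(restored.vault_key)                     # vk_run_2
```

The stored structure stays reusable across runs while each reload gets its own key, so an expired or revoked key never comes back from disk.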

Can I use one vault key across multiple tools in the same pipeline run?

Within one vendor, yes: every tool that calls Stripe can share the run's Stripe vault key and draw on the same cap. Across vendors, no: each vault key is scoped to a single vendor. If your pipeline has both a StripeCharger and a TwilioNotifier, issue one vault key per vendor (each with its own per-run cap) and inject the appropriate key into each component. The caps are enforced independently at the proxy layer, for example a $500 Stripe cap and a $50 Twilio cap. A single Keybrake account holds both vendor keys. If you want a combined cross-vendor cap, you'd need to implement that at the application layer.
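An application-layer combined cap could be as small as a shared counter that every vendor tool reserves against before it spends. The sketch below is one possible shape under that assumption. RunBudget and its reserve method are hypothetical names, and the lock is there in case tool calls run concurrently.

```python
import threading
from typing import Dict


class RunBudget:
    """Shared per-run budget that each vendor tool reserves against before spending."""

    def __init__(self, cap_cents: int):
        self.cap_cents = cap_cents
        self.spent_cents = 0
        self.by_vendor: Dict[str, int] = {}
        self._lock = threading.Lock()

    def reserve(self, vendor: str, amount_cents: int) -> bool:
        """Record the spend and return True, or return False if it would exceed the cap."""
        with self._lock:
            if self.spent_cents + amount_cents > self.cap_cents:
                return False
            self.spent_cents += amount_cents
            self.by_vendor[vendor] = self.by_vendor.get(vendor, 0) + amount_cents
            return True


budget = RunBudget(cap_cents=52_500)        # $525.00 combined across vendors
print(budget.reserve("stripe", 50_000))     # True
print(budget.reserve("twilio", 2_000))      # True
print(budget.reserve("twilio", 1_000))      # False
```

Each component would receive the same RunBudget instance through its constructor, alongside its vendor-specific vault key, and skip the vendor call when reserve returns False. The proxy caps remain the hard backstop; this layer only adds the cross-vendor total.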

What happens when the agent loop receives a 429 from the vault key proxy?

The Haystack component raises an exception (likely a stripe.error.RateLimitError or similar, since the proxy returns the same status code the vendor would). The Agent component receives the exception as a tool error and includes it in the message history. The LLM then decides how to proceed — typically it will generate a final response explaining that the budget was exceeded, rather than calling the tool again. If you want deterministic behavior, you can handle the exception in your component and return a structured error result like {"error": "budget_exceeded", "cap_usd": 500} — the LLM can interpret this clearly and stop the loop.
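A deterministic handler along those lines could look like the following sketch. It relies on the exception exposing an http_status attribute, as stripe-python errors do. The helper name and the wrapping of the error under a result key (to match the component's declared output) are assumptions. It also cannot tell a budget 429 from a genuine vendor rate limit, so treat the error label as a simplification.

```python
from typing import Any, Callable, Dict


def charge_with_budget_guard(charge: Callable[[], Any], cap_usd: float) -> Dict[str, Any]:
    """Run charge() and convert an HTTP 429 failure into a structured result."""
    try:
        return {"result": charge()}
    except Exception as exc:
        # Only a 429 is translated; every other failure propagates unchanged.
        if getattr(exc, "http_status", None) == 429:
            return {"result": {"error": "budget_exceeded", "cap_usd": cap_usd}}
        raise


class FakeRateLimitError(Exception):
    """Test double standing in for the SDK's rate-limit exception."""
    http_status = 429


def over_budget_charge() -> Any:
    raise FakeRateLimitError("cap reached")


print(charge_with_budget_guard(over_budget_charge, cap_usd=500))
# {'result': {'error': 'budget_exceeded', 'cap_usd': 500}}
```

Inside the component's run method, the call to stripe.PaymentIntent.create would be passed in as the charge callable, so the LLM sees the same structured result every time the cap is hit.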