LLM infrastructure · Proxy-shape comparison

LiteLLM Proxy alternatives: six gateways for the proxy-server shape

If you typed "LiteLLM proxy alternatives" into Google, you almost certainly meant the LiteLLM Proxy server specifically — the standalone gateway you run as a container, not the Python SDK you import in your application code. The two have different alternatives and different decision criteria. This page reviews six gateways that match the proxy-server shape (three LLM-specific, three API-gateways-with-LLM-plugins), scored on the five concerns that actually matter when you're picking one.

TL;DR

Six gateways match the LiteLLM Proxy shape: Portkey Gateway (the closest direct OSS competitor — MIT-licensed, OpenAI-compatible, expressive routing YAML), Bifrost (Go-based, the right pick if p99 latency on a high-QPS workload matters), LangGate (Kubernetes-operator-native, declarative CRDs for models / providers / virtual keys), Envoy AI Gateway (Envoy filter — the right pick if you already run Envoy), Kong AI Gateway (Kong plugin — same logic for Kong shops), and Apache APISIX ai-proxy (the open-source generic equivalent for shops that already standardised on APISIX). The first three optimise for "LLM gateway as a discrete product"; the last three optimise for "LLM gateway as one route in your existing API gateway." Pick on which side of that line your platform team sits, then on the five proxy-shape concerns below. None of these capture spend on Stripe, Twilio, or Resend traffic — if that's actually what you're trying to solve, scroll to the pivot near the bottom.

First, a disambiguation: LiteLLM Proxy vs LiteLLM SDK

Searches for "LiteLLM proxy alternatives" and "LiteLLM alternatives" lead to different shortlists, and conflating them wastes evaluation cycles. LiteLLM ships two separate products under one name. The LiteLLM SDK is the Python library you import (from litellm import completion) — it normalises the OpenAI / Anthropic / Google / Cohere / Bedrock API shapes inside your application process. The LiteLLM Proxy is a standalone HTTP server (the litellm CLI launches it; it ships as a container) that sits on the network path between your application and provider endpoints, accepts OpenAI-compatible requests, applies routing / fallbacks / virtual-key budgets / spend caps, and forwards to the actual provider. The two share routing logic but they're operationally different things: the SDK runs in your application's address space; the Proxy runs on its own port and gets traffic from many applications.
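
To make the two shapes concrete, here's a minimal LiteLLM Proxy config. The model_list structure and the os.environ/ reference follow LiteLLM's documented config.yaml schema; the model IDs are just examples:

```yaml
# config.yaml — a minimal LiteLLM Proxy setup.
# Run with: litellm --config config.yaml --port 4000
model_list:
  - model_name: gpt-4o                # the name clients request
    litellm_params:
      model: openai/gpt-4o            # provider/model the proxy forwards to
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20240620
      api_key: os.environ/ANTHROPIC_API_KEY
```

Any OpenAI-compatible client then points its base URL at the proxy (e.g. http://localhost:4000/v1) instead of at api.openai.com; that network hop is what makes it a proxy rather than an SDK.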

If you're searching for proxy alternatives specifically, you've already decided one of three things: (a) you don't want application code linked against an LLM-routing SDK at all, you want it isolated behind an HTTP boundary; (b) you have non-Python applications calling your LLM stack and the SDK is Python-only; (c) you've tried LiteLLM Proxy and want something with different operational characteristics — usually faster, smaller, or k8s-native. SDK-shaped alternatives (e.g. the OpenAI SDK directly with provider switching, LangChain's model abstractions, Vercel's AI SDK) don't address any of those — they sit inside the application process. So the alternatives below are all server-shaped: you deploy them, point traffic at them, and treat them as infrastructure.

Calibration: this page is opinionated about the proxy-shape concerns but not opinionated about which gateway is "best." All six are legitimately good for the slot they're designed for. The picks below are about fit; LiteLLM Proxy is also a reasonable answer to "what should I run" — alternatives matter only when one of the proxy-shape concerns below is the one that drove you to look for them.

The five proxy-shape concerns

Before the vendor reviews, here's the rubric. These are the dimensions that differentiate proxy-server gateways from each other; ranked roughly by how often they decide a switch.

  1. Deployment topology. Sidecar container, sidecar pod with its own Service, daemonset on every node, regional gateway behind a load balancer, or filter inside an existing API gateway. Each topology has different scaling, observability, and blast-radius properties. LiteLLM Proxy is most often deployed as a regional gateway with multiple replicas behind a service; some teams sidecar it. The right topology constrains which alternative you can pick.
  2. Hot-reload of virtual keys and policies. Can you add a virtual key, change its budget, or revoke it without restarting the proxy or breaking in-flight requests? LiteLLM Proxy supports this via its DB-backed config (Redis / Postgres). Some alternatives require a config-file rewrite + reload; others have a control-plane API you call from your provisioning layer.
  3. Multi-tenant isolation. If you have a SaaS product where every tenant gets their own LLM budget, can the gateway enforce per-tenant caps without the gateway becoming the failure mode for your control plane? This is the dimension where Kubernetes-native alternatives often win and where the more monolithic gateways struggle.
  4. Route-level fallback configuration. "If the primary gpt-4o endpoint returns 5xx, try Anthropic Opus; if both fail, fall back to Haiku and tag the request with a degraded flag." How expressive is the routing grammar, and is fallback per-route or only global? LiteLLM Proxy and Portkey Gateway are at the expressive end; the API-gateway plugins are usually at the simpler end. (A sketch of this policy in declarative form follows this list.)
  5. Streaming and WebSocket behaviour. SSE and chunked transfer for token-by-token streaming need careful proxy handling — buffering breaks the user experience. WebSocket support matters for the realtime APIs (OpenAI Realtime, ElevenLabs, etc). Some gateways handle streaming cleanly; others buffer and ruin the latency.
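
To make concern 4 concrete, here's the quoted fallback policy written as a declarative route config. This is an illustrative, gateway-agnostic sketch; no specific gateway uses exactly these field names:

```yaml
# Illustrative only — not any particular gateway's schema.
route: /v1/chat/completions
targets:
  - provider: openai
    model: gpt-4o
    on_error: [5xx, timeout]      # hand off to the next target on these
  - provider: anthropic
    model: claude-3-opus
    on_error: [5xx, timeout]
  - provider: anthropic
    model: claude-3-haiku
    response_tags: [degraded]     # callers can see quality was reduced
```

The question per gateway is how much of this is expressible per-route versus only globally, and how verbose the equivalent gets.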

Option 1 — Portkey Gateway

Repo: Portkey-AI/gateway. Licence: MIT. Implementation: Node / TypeScript on Hono.

Portkey Gateway is the closest direct alternative to LiteLLM Proxy in spirit and feature surface. Same OpenAI-compatible front door, same idea of a routing config, broader provider catalogue (~250 providers vs LiteLLM's ~100). The differentiator is the routing grammar: Portkey's config-as-YAML lets you express weighted load-balancing, A/B tests across providers, latency-aware fallbacks, and conditional retries with the same primitives — LiteLLM gets there too but the syntax is busier. Portkey's open-source gateway is the proxy itself; the SaaS control plane (dashboards, prompt library, team budget UI) is paid, and that's where the open-source vs hosted line is more apparent than for LiteLLM (whose UI is more open).
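
A sketch of what that routing grammar looks like: the strategy / targets structure matches Portkey's config grammar as documented, but treat the exact field names as approximate and verify against the current gateway docs.

```yaml
# Hedged sketch of a Portkey fallback config (verify field names).
strategy:
  mode: fallback                       # or loadbalance, with weights
  on_status_codes: [429, 500, 502, 503, 504]
targets:
  - virtual_key: openai-prod           # primary: gpt-4o via OpenAI
    override_params:
      model: gpt-4o
  - virtual_key: anthropic-prod        # fallback on the codes above
    override_params:
      model: claude-3-opus-20240229
```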

Where it wins on the five concerns. Routing grammar (concern 4) is the strongest among LLM-specific gateways — the YAML structure is compact, fallback is declarative per-route, and the conditions can include response codes, latency thresholds, and provider-specific error categories. Streaming (concern 5) is handled cleanly with SSE passthrough.

Where it loses. Multi-tenant isolation (concern 3) lives in the hosted control plane; the open-source gateway has virtual keys but the team-level budget aggregation and per-tenant admin UI are paid. Hot-reload (concern 2) works via API but the docs are gated behind the hosted product. If you stay fully open-source, you'll be writing your own admin tooling.

Option 2 — Bifrost

Repo: maximhq/bifrost. Licence: Apache 2.0. Implementation: Go.

Bifrost's pitch is performance: roughly 10× faster than LiteLLM Proxy on steady-state throughput in its published benchmarks. The Go implementation avoids Python's GIL and the async-to-sync boundaries that cost LiteLLM Proxy tail latency at high QPS. In practice the difference is invisible at low throughput (a research workload at 5 req/s) and material at high throughput (a customer-facing RAG service at 500+ req/s where p99 matters). It's also a smaller binary with a smaller memory footprint, which makes it a better fit for sidecar deployment.

Where it wins on the five concerns. Deployment topology (concern 1) — the small binary and low memory footprint mean it can co-deploy as a sidecar without doubling the pod's resource request. Streaming (concern 5) is a first-class concern in the Go implementation; latency under load is consistent.
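
A sketch of the sidecar topology the small footprint enables. This is standard Kubernetes; the Bifrost image name, port, and resource numbers are assumptions, so check the Bifrost docs before copying:

```yaml
# Pod with Bifrost as a sidecar (illustrative; image/port assumed).
apiVersion: v1
kind: Pod
metadata:
  name: rag-service
spec:
  containers:
    - name: app
      image: ghcr.io/example/rag-service:latest   # your application
      env:
        - name: OPENAI_BASE_URL                   # OpenAI SDK env var
          value: http://localhost:8080/v1         # talk to the sidecar
    - name: bifrost
      image: maximhq/bifrost:latest               # assumed image name
      ports:
        - containerPort: 8080
      resources:
        requests:
          cpu: 100m
          memory: 64Mi    # the small footprint is the point here
```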

Where it loses. Provider breadth is similar to LangGate (around 30 providers vs LiteLLM Proxy's 100). The management UI is minimal — there's a JSON config file and a few admin endpoints; if you want a dashboard for spend and request inspection, you'll either build one or pair Bifrost with Helicone for observability. Routing grammar (concern 4) is functional but less expressive than Portkey's YAML.

Option 3 — LangGate

Repo: langgate/langgate. Licence: Apache 2.0. Implementation: Python; ships as a Kubernetes operator.

LangGate is the Kubernetes-native answer. Models, providers, virtual keys, and budgets are CRDs. You declare them in YAML manifests, apply them with kubectl, and the operator reconciles. This is a much cleaner ops story than LiteLLM Proxy's config-file + DB-backed approach if you already run Kubernetes — your LLM gateway config lives next to the rest of your platform config in Git, gets the same review process, and inherits your existing cluster's secret management and RBAC.
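
Purely to illustrate the declarative flow, here's a hypothetical manifest. The CRD group, kind, and fields below are made-up stand-ins, not LangGate's actual schema (check the repo for the real CRDs):

```yaml
# Hypothetical VirtualKey manifest — illustrative shape only.
apiVersion: langgate.example/v1alpha1   # made-up group/version
kind: VirtualKey
metadata:
  name: tenant-acme-key
  namespace: tenant-acme                # namespace-scoped isolation
spec:
  budget:
    dailyUSD: 50                        # per-tenant spend cap
  models:
    - gpt-4o
    - claude-3-haiku
```

Whatever the real field names are, the workflow is the point: kubectl apply, Git review, operator reconciliation, no config-file reload.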

Where it wins on the five concerns. Multi-tenant isolation (concern 3) — the operator can scope CRDs to namespaces, so each tenant's LLM resources live in their own namespace with the cluster's standard isolation guarantees. Hot-reload (concern 2) is essentially free; CRD edits trigger reconciliation, in-flight requests aren't disrupted. Deployment topology (concern 1) — runs as a proper k8s operator with horizontal scale-out, no Redis dependency for multi-instance coordination the way LiteLLM Proxy needs.

Where it loses. Provider breadth (around 30, like Bifrost). Outside of Kubernetes the model breaks down — you don't run LangGate on a single VM the way you run LiteLLM Proxy. If your platform isn't k8s, this isn't your gateway.

Option 4 — Envoy AI Gateway

Repo: envoyproxy/ai-gateway. Licence: Apache 2.0. Implementation: Envoy filter (Wasm + native).

Envoy AI Gateway is a different shape: it's not a discrete LLM gateway product, it's a set of filters for the Envoy proxy you already run as your service mesh's data plane (or as your edge load balancer, or as your Istio sidecar). You get LLM-aware features — provider abstraction, virtual API keys with budgets, per-request token accounting, retry on provider-specific error codes — applied as Envoy listener filters on the routes carrying LLM traffic. Configuration is in your existing Envoy or Istio control-plane config (xDS, Gateway API, or k8s-native EnvoyFilter resources, depending on your setup).
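
The plumbing underneath is ordinary Gateway API. The HTTPRoute below is standard Kubernetes Gateway API, not anything AI-specific; the project's own CRDs for provider backends and token accounting layer on top of routes like this, and their exact kinds and fields vary by release (check envoyproxy/ai-gateway):

```yaml
# Standard Gateway API route carrying LLM traffic through an
# existing Envoy gateway; resource names are illustrative.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-traffic
spec:
  parentRefs:
    - name: edge-gateway          # your existing Envoy-backed Gateway
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /v1/chat/completions
      backendRefs:
        - name: openai-backend    # upstream managed by the AI gateway
          port: 443
```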

Where it wins on the five concerns. Deployment topology (concern 1) — if you already run Envoy, the LLM gateway is now zero net infrastructure. No new container, no new class of failure modes; it's a filter on an existing proxy. Multi-tenant isolation (concern 3) inherits Envoy's existing per-route, per-vhost, per-cluster isolation primitives — the strongest in this list because Envoy was built for multi-tenant routing from day one. Streaming (concern 5) is handled at the Envoy layer, where SSE passthrough is well-tested for general HTTP.

Where it loses. Routing grammar (concern 4) is Envoy's, not an LLM-specific YAML — that means the grammar is more powerful for general HTTP routing but less ergonomic for "fall back from gpt-4o to Opus to Haiku based on response code." You'll write filter chains rather than a routing block. Hot-reload (concern 2) is fine for Envoy operators; if your team isn't fluent in xDS, the learning curve is non-trivial. Provider breadth depends on which filters are upstream-merged at the time you deploy — the Anthropic and Bedrock filters are mature; the long tail is thinner than in the LLM-specific gateways.

Option 5 — Kong AI Gateway

Repo: Kong's kong-ai-plugins set. Licence: Apache 2.0 (OSS plugins) plus a paid Enterprise tier with additional plugins. Implementation: Kong plugin (Lua) on top of OpenResty / nginx.

Kong AI Gateway is the equivalent move for shops that have standardised on Kong as their API gateway. Same compositional logic as the Envoy approach: instead of running a separate LLM gateway, you add LLM-aware Kong plugins (ai-proxy, ai-prompt-template, ai-prompt-decorator, ai-rate-limiting-advanced) to the routes that carry LLM traffic. The OSS plugins cover the basics (provider proxy, prompt templating, rate limiting); the Enterprise plugins add semantic caching, prompt firewall, and per-consumer token-bucket budgets. Kong's strength is the consumer / route / service abstraction — virtual keys map cleanly to Kong consumers, which already have a mature admin API.
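
A sketch of the attach-a-plugin move in Kong's declarative format. The route_type, auth, and model fields follow the ai-proxy plugin docs as I recall them; verify against your Kong version:

```yaml
# kong.yml (decK declarative format) — hedged sketch.
_format_version: "3.0"
services:
  - name: llm
    url: https://api.openai.com
    routes:
      - name: chat
        paths:
          - /v1/chat/completions
    plugins:
      - name: ai-proxy
        config:
          route_type: llm/v1/chat
          auth:
            header_name: Authorization
            header_value: Bearer <OPENAI_API_KEY>   # inject via secrets
          model:
            provider: openai
            name: gpt-4o
```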

Where it wins on the five concerns. Deployment topology (concern 1) — same gain as Envoy if you already run Kong; the LLM gateway is one set of plugins on existing routes. Hot-reload (concern 2) works via Kong's admin API — virtual key creation, deletion, and budget changes propagate without restart. Multi-tenant isolation (concern 3) leans on Kong's consumer + workspace model.

Where it loses. The most differentiated features (semantic caching, advanced budget enforcement) are in the Enterprise tier — you'll be evaluating commercial pricing if you want feature parity with LiteLLM Proxy's open-source feature set. Routing grammar (concern 4) for cross-provider fallback is less expressive than Portkey or LiteLLM Proxy itself; you can express it via plugin chaining but it's verbose.

Option 6 — Apache APISIX with the ai-proxy plugin

Repo: apache/apisix + ai-proxy, ai-prompt-template, ai-rate-limiting plugins. Licence: Apache 2.0. Implementation: APISIX plugin (Lua) on top of OpenResty / nginx.

Apache APISIX is the open-source general-purpose API gateway equivalent to Kong, with a similar plugin model and a similar set of LLM-aware plugins shipped as ai-*. The trade-off vs Kong is the standard one: APISIX is fully open-source, etcd-backed, and broadly Kong-feature-equivalent at the OSS layer; Kong is more polished and has a larger commercial-feature set behind the Enterprise tier. For shops on APISIX, or shops choosing between generic gateways from scratch with a strong open-source preference, the AI-plugin set makes APISIX a genuine LiteLLM Proxy alternative without adding a new gateway product to the stack.
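
The equivalent move in APISIX, sketched in standalone declarative style. The ai-proxy plugin's provider / auth / options fields follow the APISIX docs as I recall them; verify against your APISIX version:

```yaml
# apisix.yaml (standalone mode) — hedged sketch.
routes:
  - uri: /v1/chat/completions
    plugins:
      ai-proxy:
        provider: openai
        auth:
          header:
            Authorization: "Bearer <OPENAI_API_KEY>"   # inject via secrets
        options:
          model: gpt-4o
#END
```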

Where it wins on the five concerns. Deployment topology (concern 1) — same compositional gain as Envoy and Kong. Hot-reload (concern 2) is APISIX's strength; etcd-backed config means consumer / route / plugin changes are sub-second across all gateway nodes. Multi-tenant isolation (concern 3) via APISIX's consumer-group + service abstraction.

Where it loses. Provider catalogue and routing grammar (concern 4) are thinner than the LLM-specific gateways — the ai-proxy plugin handles OpenAI-compatible providers and a handful of others well; the long tail you'd get from LiteLLM Proxy isn't there yet. Streaming (concern 5) works but the ecosystem of community-tested edge cases is smaller than Envoy's.

Capability matrix — the five proxy-shape concerns

| Gateway | Topology fit | Hot-reload keys | Multi-tenant isolation | Route-level fallback | Streaming / SSE |
| --- | --- | --- | --- | --- | --- |
| LiteLLM Proxy (baseline) | Standalone or sidecar | DB-backed (Redis / Postgres) | Yes (team / virtual key) | Expressive | Clean |
| Portkey Gateway | Standalone | API-driven (hosted UI gated) | Hosted control plane only | Most expressive YAML | Clean SSE passthrough |
| Bifrost | Standalone or sidecar (small) | Config + admin endpoints | Basic | Functional | First-class |
| LangGate | K8s operator only | CRD edits (free) | Namespace-scoped (strongest) | CRDs, declarative | Clean |
| Envoy AI Gateway | Filter on existing Envoy | xDS / Gateway API | Inherits Envoy's (strongest) | Filter chains (verbose) | Envoy-grade |
| Kong AI Gateway | Plugin on existing Kong | Admin API | Consumer + workspace | Plugin chaining (verbose) | nginx-grade |
| Apache APISIX ai-proxy | Plugin on existing APISIX | etcd-backed (sub-second) | Consumer-group + service | Plugin chaining (verbose) | nginx-grade |

Read the table by row, not by column — there is no "best" gateway. The right pick is the row that matches what's currently constraining you. If multi-tenant isolation is the top concern, Envoy AI Gateway and LangGate dominate. If you're trying to keep the LLM gateway as a single discrete product with the most expressive routing config, Portkey Gateway and LiteLLM Proxy are the head-to-head pair. If you already run a generic API gateway, adopting the AI plugins for that gateway is close to a strict win on operational complexity.

The decision rule, in three branches

Most teams find their answer in one of three branches:

Branch A — discrete LLM gateway is what you want. You don't run a generic API gateway for the rest of your services, or you want the LLM gateway as a separately-owned product so the LLM team controls its own roadmap. Pick from: LiteLLM Proxy, Portkey Gateway, Bifrost, LangGate. Decide between them on three dimensions in this order: (1) where you deploy — k8s native means LangGate; sidecar-friendly with low footprint means Bifrost; generic standalone means LiteLLM Proxy or Portkey Gateway; (2) routing expressiveness — if you write complex fallback strategies regularly, Portkey wins; (3) provider breadth — if you depend on the long tail of providers, LiteLLM Proxy still has the widest catalogue.

Branch B — you already run a generic API gateway. Envoy, Kong, or APISIX is in production for your non-LLM traffic. Add the LLM plugins to that gateway rather than introducing a new product. The cost of a second gateway product (deployment, alerting, on-call, version upgrades) almost always outweighs the cost of slightly less expressive LLM routing. Revisit only if the routing-expressiveness gap actually bites a feature you need.

Branch C — you mainly want observability. If you typed "LiteLLM proxy alternatives" because the LiteLLM Proxy dashboard isn't what you wanted, the right move is probably not a different proxy — it's adding Helicone (or one of the observability-first tools from the broader open-source review) alongside whichever proxy you keep. Observability composes with proxying; you don't need to swap one to add the other.

What none of these gateways do

Every gateway above operates on LLM traffic — requests to api.openai.com, api.anthropic.com, the Bedrock and Vertex endpoints, the open-weights endpoints. None of them sit on the network path to api.stripe.com, api.twilio.com, api.resend.com, or any of the other roughly thirty SaaS APIs that an autonomous agent reaches for in a typical week. That's not a feature gap; it's a category gap. The concerns the LLM gateways are built around (token accounting, model abstraction, prompt templating, semantic caching) don't translate to SaaS-tool traffic, and the concerns that matter for SaaS-tool traffic (per-vendor USD caps with vendor-specific cost parsers, customer-scope allowlists, parameter-value allowlists, sub-second mid-run revoke, immutable audit) aren't on any of these gateways' roadmaps.

This matters because the dollar-blast incidents in 2026 postmortems are heavily tilted toward SaaS-tool spend, not LLM spend. A stuck gpt-4o loop at 100 req/s burns roughly $360/hour at typical token sizes. A stuck Stripe refund loop at the same rate burns $5.4M/hour at $15 per charge (100 req/s × 3,600 s × $15). The LLM-token blowout is recoverable; the SaaS-tool blowout often isn't (chargebacks, customer trust, regulatory exposure). If the runaway you're trying to prevent is on the SaaS side, none of the gateways above will catch it — the calls happen outside their network path, and even if you routed them through, the gateways have no per-vendor cost parsers to enforce caps with.

The architectural answer is dual proxy: an LLM gateway from the list above on LLM traffic, and a separate SaaS-tool governance proxy on money-moving SaaS calls. Same network-boundary pattern, same virtual-key idea, different per-vendor logic. We cover the dual-proxy framing more fully on the LiteLLM alternative for Stripe page and the 2026 agent governance stack long-form post, including the audit-trail schema that joins both proxy logs on a single agent_run_id.

What Keybrake does (the SaaS-tool half of the dual proxy)

Keybrake is a reverse proxy on the SaaS-tool axis. You issue a vault_key per agent or per run; you attach a policy with daily USD cap, endpoint allowlist, customer or merchant allowlist, max-amount-per-call, and TTL. The agent calls proxy.keybrake.com/<vendor>; we look up the underlying real key, enforce the policy, forward the request, parse the cost from the vendor's response (Stripe's amount on the charge object, Twilio's price on the message resource, Resend's tier-table calculation), and write a per-call row to an immutable audit log. Mid-run revoke takes under a second — you don't redeploy anything; you flip a flag in the dashboard and the next call gets a 403. v1 vendors are Stripe, Twilio, Resend; we expand from there as the cost-parser library grows.
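
The policy attached to a vault_key maps one-to-one to the controls above. The field names in this sketch are illustrative rather than Keybrake's literal schema:

```yaml
# Illustrative vault-key policy (field names are a sketch).
vault_key: vk_refund_agent_w08        # issued per agent or per run
vendor: stripe
policy:
  daily_usd_cap: 200
  max_amount_per_call_usd: 15
  endpoint_allowlist:
    - POST /v1/refunds
  customer_allowlist:
    - cus_ExampleOnly                 # hypothetical customer ID
  ttl: 24h                            # key expires even if never revoked
```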

Most teams pick an LLM gateway from this page and a SaaS-tool governance proxy from us. The two compose cleanly via a shared x-agent-run-id header that both proxies pass through and log; one weekly cost report covers both axes.
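
What the join buys you, sketched as two audit rows from the same run. Field names are illustrative; the single agent_run_id join key is the mechanism:

```yaml
# One agent run, two proxies, one join key (illustrative schema).
- agent_run_id: run_7f3a      # propagated via x-agent-run-id
  proxy: llm-gateway
  vendor: openai
  model: gpt-4o
  cost_usd: 0.0042
- agent_run_id: run_7f3a      # same run, SaaS-tool side
  proxy: keybrake
  vendor: stripe
  endpoint: POST /v1/refunds
  cost_usd: 15.00
```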

Get early access

FAQ

Is "LiteLLM proxy" the same product as "LiteLLM"?

It's one of two products under the LiteLLM name. The LiteLLM SDK is the Python library you import (from litellm import completion); the LiteLLM Proxy is the standalone HTTP server you run as a container. They share routing logic but they have different alternatives — SDK alternatives sit inside your application process, proxy alternatives sit on the network. This page reviews proxy alternatives only; for SDK-shaped alternatives, the right comparison is the OpenAI SDK + LangChain's model abstractions + Vercel's AI SDK.

Does Envoy AI Gateway require Istio?

No. Envoy AI Gateway is a set of filters for Envoy itself; you can run Envoy standalone as a regional gateway without Istio, or as a Kubernetes Gateway API implementation, or as the data plane of an Istio mesh. The filters work the same way in all three deployments. If you don't already run Envoy in any of those modes, adopting Envoy purely for the LLM gateway is usually the wrong move — pick an LLM-specific gateway from branch A instead.

Why isn't Helicone in the matrix?

Helicone can be used as a proxy but its centre of gravity is observability — it's the right pick if your primary need is request/response logging, cost attribution, and trace UI. We treat it as an adjacent category here (you can run it alongside any of these six). The five-option open-source review at /seo/litellm-alternatives-open-source covers Helicone in detail.

Why isn't OpenRouter in this list?

A shape mismatch again: OpenRouter is hosted-only, not a self-deployable proxy. It's a great answer to "what's the laziest way to get one billing surface across many providers" but it's a different operational shape from the gateways here. If you need self-host (regulated environments, air-gap, on-prem), OpenRouter is out; if you don't, OpenRouter often dominates the discrete-LLM-gateway branch on ops cost.

Can I run two gateways in series — e.g. Envoy AI Gateway in front of LiteLLM Proxy?

You can, and a few teams do. The pattern is Envoy AI Gateway as the edge proxy doing authentication / virtual-key validation / rate limiting, and LiteLLM Proxy further inside doing the provider routing and fallback logic. The added latency is single-digit milliseconds; the added complexity is a second product to operate. Most teams that consider this end up picking one tier or the other. The pattern is sensible specifically if you've adopted Envoy as your edge for non-LLM reasons and only want LLM-specific routing logic isolated.

What about running LiteLLM Proxy on serverless (Cloud Run, Lambda, Workers)?

Cloud Run works well — LiteLLM Proxy is just a Python container. Lambda is awkward (Python cold-start + LLM-streaming response shape don't fit Lambda's response model cleanly). Workers / edge environments aren't supported by LiteLLM Proxy directly; if you want edge, Bifrost (Go) or Portkey Gateway (Hono / TypeScript) compile to environments where LiteLLM Proxy can't run, which is sometimes the actual reason people search for "LiteLLM proxy alternatives" — not an issue with the proxy itself, an issue with where they need to deploy it.

Does Keybrake replace any of these LLM gateways?

No. Keybrake is on a different axis (SaaS-tool APIs, not LLM endpoints). We're typically deployed in series with one of the gateways above, not instead of it. If your only spend concern is OpenAI / Anthropic / Bedrock, pick from this page and skip us. If your spend concern includes Stripe / Twilio / Resend, you need both.

Further reading