LLM infrastructure · Honest review

LiteLLM alternatives (open source): five projects reviewed for real

Five open-source projects that get called "LiteLLM alternatives" in the wild — what each one actually does, where they overlap, where they don't, and a different framing if your real problem isn't OpenAI spend at all.

TL;DR

There are five open-source projects people mean when they say "LiteLLM alternative": Portkey Gateway, Helicone, LangGate, OpenRouter's proxy, and Bifrost. Each one is an LLM-provider proxy — they all sit between your code and OpenAI / Anthropic / Google / open-weights endpoints, normalize the API shape, and add observability. They differ on where the caching happens, whether the control plane is self-hosted, and how opinionated the routing logic is. None of them is a drop-in replacement for all of LiteLLM, and none of them will cap spend on Stripe, Twilio, or Resend — because none of them are in that network path. If that last sentence matches your situation, scroll to the bottom.

Why we wrote this page (and what to calibrate for)

We ship an API-key governance proxy, and the query "litellm alternatives open source" is a common doorway from Google into our corner. Most readers who land here are genuinely evaluating LLM proxies — for those readers, this page is an honest five-option review, and we don't claim to be one of the options. A smaller second group is here because someone on their team said "we need something like LiteLLM but for Stripe" — that's a different product shape, and it's the one we build; we'll separate the two framings clearly at the end.

Calibration: all five of the projects below are legitimate and actively maintained. The review is about fit, not quality. None of them are bad. The one that's right for you depends on whether your constraint is routing, observability, policy, or just "I don't want to pay LiteLLM's hosted plan."

Option 1 — Portkey Gateway (open source)

Repo: Portkey-AI/gateway. Model: open-source LLM gateway with an optional hosted control plane.

Portkey Gateway is a standalone HTTP proxy that accepts OpenAI-compatible requests and forwards them to any of ~250 providers. It's competitive with LiteLLM on provider coverage and has first-class support for fallbacks, retries, and load-balanced routing (the config file lets you express "try gpt-4o at 70% traffic, Anthropic Opus at 30%, fall through to Haiku on 5xx"). The gateway itself is MIT-licensed; the SaaS control plane (dashboards, budget alerts, prompt library) is paid.
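A sketch of what that weighted-routing config can look like, assuming the JSON grammar from the Portkey-AI/gateway repo (field names and model IDs are approximate; check the repo for the current schema; the fall-through-to-Haiku-on-5xx leg would be a nested `fallback` strategy and is omitted here):

```json
{
  "strategy": { "mode": "loadbalance" },
  "targets": [
    {
      "provider": "openai",
      "weight": 0.7,
      "override_params": { "model": "gpt-4o" }
    },
    {
      "provider": "anthropic",
      "weight": 0.3,
      "override_params": { "model": "claude-3-opus" }
    }
  ]
}
```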

Where it shines: you need sophisticated routing logic (A/B tests between providers, latency-aware fallbacks, weighted load-balancing) and you're OK self-hosting the gateway. The YAML routing grammar is the most expressive in this list.

Where it falls short of LiteLLM: cost tracking is usable but not as thoroughly pre-modeled per provider (LiteLLM has more accurate token-cost tables baked in, especially for smaller providers). Virtual keys and team budgets are in the hosted control plane, not the open-source gateway — you're running the dashboards yourself if you stay fully open.

Option 2 — Helicone

Repo: Helicone/helicone. Model: Apache-licensed LLM observability platform, deployable via Docker Compose.

Helicone is the one you pick if what you actually need is request/response logging, cost attribution, and caching — observability-first rather than gateway-first. It works either as a true proxy (route requests through Helicone's endpoint) or as an async logger (fire-and-forget from your app). The self-hosted version is free and runs out of a Supabase + Postgres backend.

Where it shines: debugging prompts, tracing cost per user / session / tag, semantic caching (not just exact-match), user-level rate limiting. The trace UI is the best in class for this list.
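To make the exact-match vs. semantic distinction concrete, here's a minimal exact-match cache sketch in Python. This is illustrative only, not Helicone's implementation; a semantic cache swaps the hash lookup for an embedding-similarity search so near-duplicate prompts can also hit.

```python
import hashlib
import json

class ExactMatchCache:
    """Minimal exact-match response cache keyed on a hash of (model, prompt).

    Illustrative sketch -- not Helicone's actual implementation. A semantic
    cache replaces the hash lookup with an embedding-similarity search, so
    that near-duplicate prompts can also return a cached response.
    """

    def __init__(self):
        self._store = {}

    def _key(self, model: str, prompt: str) -> str:
        # Canonical JSON so key order never changes the hash.
        payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def get(self, model: str, prompt: str):
        return self._store.get(self._key(model, prompt))

    def put(self, model: str, prompt: str, response: str) -> None:
        self._store[self._key(model, prompt)] = response

cache = ExactMatchCache()
cache.put("gpt-4o", "What is 2+2?", "4")
assert cache.get("gpt-4o", "What is 2+2?") == "4"   # exact match hits
assert cache.get("gpt-4o", "What's 2+2?") is None   # near-duplicate misses
```

The last two lines are the whole argument for semantic caching: a one-character rewording misses an exact-match cache entirely.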

Where it falls short of LiteLLM: provider translation is thinner. Helicone expects you're already OpenAI-compatible or Anthropic-compatible client-side; it's not trying to be the universal adapter. If you want "one SDK for everything," this isn't it.

Option 3 — LangGate

Repo: langgate/langgate. Model: Apache-licensed model gateway, Kubernetes-native deployment.

LangGate is the closest-to-LiteLLM-in-spirit open-source alternative — intentionally designed as a drop-in replacement. Same OpenAI-compatible front door, similar virtual-key concept, similar budget/spend-cap primitives. It's younger than LiteLLM (started mid-2024) but has meaningful traction inside Kubernetes-heavy shops because it ships as a proper Kubernetes operator rather than an application binary you manage yourself.

Where it shines: Kubernetes-native teams who want declarative config (CRDs for models, providers, virtual keys). It scales out horizontally on its own, without the external coordination (Redis) that LiteLLM needs for multi-instance deployments.

Where it falls short of LiteLLM: provider breadth. LiteLLM supports ~100 providers out of the box; LangGate supports about 30. For the long tail (Deepinfra, Replicate, Together, Fireworks, Groq), LiteLLM still wins. Also: the community is smaller, so fewer community-contributed provider integrations.

Option 4 — OpenRouter's proxy (client library + hosted endpoint)

Repo: OpenRouter publishes an open-source client but the router itself is a hosted service. Model: managed proxy with a unified billing surface.

This is the lazy-mode answer. OpenRouter is a hosted gateway that already implements what most teams use LiteLLM for: one API key, ~350 models, one invoice. You pay OpenRouter a small markup on inference (~5%) and forget about provider keys entirely. It's not "open source" in the LiteLLM sense — you don't self-host — but the integration surface is so thin (change your baseURL) that it competes for the same mindshare.

Where it shines: you want zero-ops, you're OK with a third party holding your provider credentials, and the ~5% markup is cheaper than the FTE-hour cost of maintaining LiteLLM yourself.

Where it falls short of LiteLLM: it's hosted-only (dealbreaker for regulated environments), you're trusting OpenRouter with your prompts and responses (privacy boundary), and routing logic is opinionated — you get their fallback strategy, not yours.

Option 5 — Bifrost

Repo: maximhq/bifrost. Model: Apache-licensed, performance-focused LLM gateway written in Go.

Bifrost's pitch is that it's 10x faster than LiteLLM on steady-state throughput (LiteLLM's Python/FastAPI baseline is the implicit comparison — Bifrost's Go implementation avoids the GIL and the async-to-sync boundaries that cost LiteLLM tail latency). In practice the speedup matters at the high-QPS end — if you're running a customer-facing RAG service that does 500+ req/s of LLM calls, the p99 difference is measurable. If you're at 5 req/s, you won't notice.

Where it shines: latency-sensitive, high-throughput production workloads. Also: minimal memory footprint, which matters if you're trying to co-deploy the proxy on the same node as the app.

Where it falls short of LiteLLM: the Python-ecosystem integrations (LangChain, LlamaIndex, Haystack) are thinner. Provider coverage is similar to LangGate (30-ish). The management UI is minimal.

Coverage matrix — at a glance

| Project | Self-host | Provider breadth | Virtual keys / caps | Best-in-class dimension |
|---|---|---|---|---|
| LiteLLM | Yes | ~100 (widest) | Yes (open-source) | Breadth |
| Portkey Gateway | Yes (gateway) | ~250 | In hosted tier only | Routing grammar |
| Helicone | Yes | Fewer (passthrough) | Yes | Observability / caching |
| LangGate | Yes (K8s) | ~30 | Yes (CRDs) | Kubernetes ergonomics |
| OpenRouter | No (hosted) | ~350 models | Single account | Zero-ops onboarding |
| Bifrost | Yes (Go) | ~30 | Basic | Latency / throughput |

A harder question: is your real problem LLM spend at all?

Every project above controls spend on OpenAI, Anthropic, Google, Cohere, and the ~30 open-weights endpoints. If your runaway bill last month was from one of those, you're in the right aisle — pick based on which row of the table matches your ops situation.

But we see a regular pattern where teams type "LiteLLM alternative" into Google when the actual incident was different: an agent that reached into a Stripe key, a notification job that ran away on Twilio, a draft that went out to a whole Resend audience.

In incidents like those, LiteLLM would not have helped. LiteLLM caps LLM tokens, not SaaS-tool API calls. None of the five alternatives above would have helped either — they also only see LLM traffic. If you cap GPT-4o at $500/day but your agent reaches into your Stripe key for 4,300 charges on the side, that cap never trips. The dollar burn happened outside the proxy's network path.

The proxy shape that catches that failure mode is the same network-boundary pattern, but sitting in front of api.stripe.com, api.twilio.com, and api.resend.com instead of api.openai.com. Same idea: virtual keys with policies (daily $ cap, customer allowlist, endpoint allowlist, max-amount-per-call, TTL), per-call audit log, sub-second revoke. Different vendors.
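A minimal sketch of that policy shape in Python. The field names and the allowlist format are illustrative assumptions, not Keybrake's (or any vendor's) actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Policy:
    # All field names are illustrative -- not any vendor's actual schema.
    daily_usd_cap: float
    max_usd_per_call: float
    endpoint_allowlist: set = field(default_factory=set)
    spent_today_usd: float = 0.0

def check(policy: Policy, endpoint: str, amount_usd: float) -> tuple[bool, str]:
    """Return (allowed, reason) for one outbound SaaS API call."""
    if endpoint not in policy.endpoint_allowlist:
        return False, f"endpoint {endpoint} not in allowlist"
    if amount_usd > policy.max_usd_per_call:
        return False, "over per-call max"
    if policy.spent_today_usd + amount_usd > policy.daily_usd_cap:
        return False, "would exceed daily cap"
    policy.spent_today_usd += amount_usd  # record spend on allowed calls
    return True, "ok"

p = Policy(daily_usd_cap=500.0, max_usd_per_call=50.0,
           endpoint_allowlist={"POST /v1/charges"})
assert check(p, "POST /v1/charges", 20.0) == (True, "ok")
assert check(p, "POST /v1/payouts", 20.0)[0] is False   # not allowlisted
assert check(p, "POST /v1/charges", 490.0)[0] is False  # would blow daily cap
```

The point of the sketch: these checks only work if the proxy sits in the request path to api.stripe.com, which is exactly where an LLM gateway does not sit.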

That's the category Keybrake is building in. It's not an LLM gateway — we would lose a feature-comparison against any of the five above on LLM coverage, because we don't touch LLM traffic at all. We're the other-half-of-the-stack: a SaaS-tool governance proxy. Most teams running agents in 2026 will end up wanting both, in series: LiteLLM (or one of the above) in front of LLM calls, a governance proxy in front of SaaS-tool calls, one audit schema that joins across agent run IDs.

What to do if that framing fits

  1. Pick an LLM gateway from the five above based on the dimension that matches your constraint (breadth → LiteLLM; routing → Portkey; observability → Helicone; Kubernetes → LangGate; latency → Bifrost; zero-ops → OpenRouter).
  2. Separately, put a governance proxy in front of your money-moving SaaS APIs. If you're on Stripe, the minimum floor is a scoped Restricted Key — see the five-tick example. The ceiling is a proxy like Keybrake that adds the controls Restricted Keys don't — see the coverage matrix for what gets added.
  3. Join the two audit trails on agent_run_id. One weekly spend report that covers both LLM cost and SaaS-tool cost per agent run makes everything downstream (postmortems, cost attribution, spend caps) one query instead of two.
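Step 3 can be sketched in a few lines of Python. The record shapes (`agent_run_id`, `usd`) are assumptions for illustration, not any gateway's actual export schema:

```python
from collections import defaultdict

# Illustrative export rows -- field names are assumptions, not a real schema.
llm_log = [
    {"agent_run_id": "run-1", "usd": 0.42},
    {"agent_run_id": "run-2", "usd": 1.10},
]
saas_log = [
    {"agent_run_id": "run-1", "vendor": "stripe", "usd": 12.00},
    {"agent_run_id": "run-2", "vendor": "twilio", "usd": 0.35},
]

def spend_per_run(llm_rows, saas_rows):
    """One report covering both LLM and SaaS-tool cost per agent run."""
    totals = defaultdict(lambda: {"llm_usd": 0.0, "saas_usd": 0.0})
    for row in llm_rows:
        totals[row["agent_run_id"]]["llm_usd"] += row["usd"]
    for row in saas_rows:
        totals[row["agent_run_id"]]["saas_usd"] += row["usd"]
    return dict(totals)

report = spend_per_run(llm_log, saas_log)
assert report["run-1"] == {"llm_usd": 0.42, "saas_usd": 12.00}
```

In a warehouse this is the same join expressed as one `GROUP BY agent_run_id` query across the two audit tables.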

How Keybrake fits (if you want the governance-proxy half)

Keybrake sits between your agent (or MCP server, or Stripe Agent Toolkit, or n8n workflow) and the SaaS API. You issue a vault_key per agent or per run; you attach a policy (daily USD cap, endpoint allowlist, customer allowlist, max-amount, expiry); the agent sends requests to proxy.keybrake.com/<vendor>; we enforce the policy, forward the call, parse the real cost from the vendor response, and log it. One-click revoke works mid-run — no code redeploy. v1 vendors are Stripe, Twilio, Resend; we're hand-picking from there.

Get early access

Related questions

Is LiteLLM itself open-source?

Yes — LiteLLM is MIT-licensed. The confusion is that BerriAI also ships a paid hosted tier ("LiteLLM Enterprise") with additional features like a full admin UI, SSO, and managed virtual keys. The open-source version has most of the functionality; what tips teams toward alternatives is usually operational friction (multi-instance coordination, dashboard polish, specific provider coverage) rather than the core proxy being gated.

Which of these five has the best cost-tracking per provider?

For breadth and accuracy of token-cost tables, LiteLLM itself is still the leader. Helicone is close for the providers it supports and better for breakdowns by user / session / tag. For custom billing logic (markups, quotas per team), Portkey's hosted tier or LangGate's CRD-based config are cleanest.

What about Envoy / Apache APISIX as a generic-purpose gateway?

Both can be configured to forward LLM traffic, but they don't speak the model-routing abstractions (virtual keys, fallbacks, provider-specific retry logic). You'd be building LiteLLM's business logic on top of a generic HTTP gateway. Teams with strong Envoy expertise sometimes go this route; most don't find the effort pays off versus a purpose-built gateway.

If I pick OpenRouter, can I still self-host anything?

Not the routing — that's their hosted endpoint. You can self-host your application and call OpenRouter as one of its providers. For fully regulated environments (HIPAA, FedRAMP, air-gapped) OpenRouter is generally not usable; pick a self-host option above.

Does Keybrake replace any of these?

No. We're in an adjacent category — governance for non-LLM SaaS APIs (Stripe, Twilio, Resend). We're typically deployed alongside an LLM gateway from this list, not instead of one. If your only spend concern is OpenAI / Anthropic, pick from above and skip us.

Further reading