Bloomberg has released a free and open-source software (FOSS) “AI Gateway” that it co-created with Tetrate, after concluding that proprietary options were too expensive and ostensibly “open-source” alternatives too limited, with their better features gated behind enterprise licences.
The financial services firm aims to use the project to manage API calls to large language models (LLMs). The first stable version (v0.1) of their joint creation, an Apache 2.0-licensed “Envoy AI Gateway”, is now available.
As first reported by The Stack in early October 2024, the two teamed up to build it around the CNCF “Envoy” project, which both contribute to. Bloomberg’s platform engineering teams will use the Gateway in front of multiple LLMs to handle authentication, rate limiting and other features.
(If developers want to use a range of LLMs for their applications without a central gateway managing access, both shadow IT and costs can start to escalate. An “AI gateway” lets those responsible for providing platforms keep an eye on costs and on who is using what.)
Bloomberg earlier said that it teamed up with Envoy maintainer Tetrate to “build it” rather than “buy it” in order to avoid vendor lock-in, or having to pay for features of ostensibly open-source projects that are only accessible through additional enterprise licences; API calls to LLMs are already expensive, and adding another layer of cost in front of them is not attractive.
The two said the initial Envoy AI Gateway release provides the following:
- “Unified API to simplify client integration with multiple LLM providers... Version 0.1 includes integrations with AWS Bedrock and OpenAI.”
- “Upstream Authorization to simplify sign-in with multiple LLM service providers…”
- “Usage Rate Limiting based on word tokens, ensuring cost-effectiveness and operational control. Token rates can be limited by LLM provider, customized per model or tailored to each client for a defined time period.”
On the near-term project roadmap, meanwhile:
- “Google Gemini 2.0 Integration out-of-the-box”
- “Provider and Model Fallback Logic to ensure continuation of services should an AI service become temporarily unavailable”
- “Prompt Templating to provide consistent context to the LLM service across requests”
- “Semantic Caching to lower LLM usage costs by reusing responses from semantically similar requests, thereby minimizing expensive LLM interactions”
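The semantic caching idea on the roadmap can be illustrated with a minimal sketch. This is not the project's implementation: a real gateway would call an embedding model and a vector store, whereas the toy `embed` function and class names below are stand-ins invented for this example. The principle is the same, though: if a new prompt is sufficiently similar to one already answered, return the cached response and skip the paid upstream call.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real gateway would call an
    embedding model here instead."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

class SemanticCache:
    """Reuse a cached LLM response when a new prompt is 'close enough'
    to one already answered, avoiding an expensive LLM interaction."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.entries = []  # [(embedding, response), ...]

    def lookup(self, prompt: str):
        vec = embed(prompt)
        for cached_vec, response in self.entries:
            if cosine(vec, cached_vec) >= self.threshold:
                return response  # cache hit: skip the upstream call
        return None  # cache miss: caller must query the LLM

    def store(self, prompt: str, response: str):
        self.entries.append((embed(prompt), response))

cache = SemanticCache(threshold=0.8)
cache.store("what is the capital of france", "Paris")
print(cache.lookup("what is the capital of france today"))  # Paris (similar prompt)
print(cache.lookup("summarise this earnings report"))       # None (no similar prompt)
```

The threshold is the key tuning knob: too low and unrelated prompts get stale answers, too high and the cache rarely hits.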
“Envoy AI Gateway will enable Bloomberg to equip its engineers with the infrastructure needed to deliver generative AI applications quickly and at scale,” said Steven Bower, Manager of Bloomberg’s Cloud Native Compute Services Engineering group, in a canned statement on February 25.
The project had its genesis when Dan Sun, Engineering Team Lead for Bloomberg's Cloud Native Compute Services and AI Inference team, came to the Envoy community and outlined his views of the problem space. Tetrate, a significant contributor to the Envoy project, stepped in to support. (Sun is also the founder of KServe, a tool for serving predictive and generative AI models on Kubernetes that has been adopted by AMD and NVIDIA.)
The teams opted to build the AI Gateway on the foundations of Envoy Gateway, a Cloud Native Computing Foundation (CNCF) project launched in 2022 that builds on the Kubernetes Gateway API and aims, in part, to be a reference implementation for running Envoy in Kubernetes as an ingress controller. (Envoy itself is a popular, if complex, OSS project originally launched in 2016 that can be used in a range of different ways, including in microservices-based architectures, to handle service discovery, load balancing, TLS termination, and HTTP/2 and gRPC proxying.)