Building a High-Availability LLM Gateway: Bifrost Deployment and Setup Guide

Tags: bifrost gateway, high availability llm gateway, go language api gateway, semantic caching, llm failover

When building AI applications, one of the biggest headaches is integrating APIs from multiple providers while coping with outages and latency. Bifrost, a lightweight open-source gateway written in Go, was created to solve exactly this problem. It unifies providers such as OpenAI, Claude (Anthropic), and Gemini behind a single standard interface, offering high speed along with built-in automatic failover and load balancing.


Bifrost GitHub Open Source Address: https://github.com/maximhq/bifrost

Bifrost Features

  • Extreme Performance: Written in Go, it adds only microsecond-level overhead per request, with throughput far exceeding Python-based gateways such as LiteLLM.
  • Unified Interface: Fully compatible with the OpenAI API standard; integrate once to call global mainstream models (OpenAI, Anthropic, Google, AWS, etc.) and local models.
  • High Availability Architecture: Built-in automatic failover and intelligent load balancing make a single provider outage transparent to clients, keeping the service uninterrupted.
  • Intelligent Caching: Supports semantic caching, significantly reducing costs for repeated requests and improving response speeds.
  • Lightweight Deployment: No complex dependencies; official Docker images are provided, making it ideal for containerized environments and resource-constrained scenarios.
  • Strong Observability: Built-in detailed request logs and monitoring metrics facilitate tracking of latency, error rates, and token consumption.
  • Flexible Routing: Supports traffic distribution strategies based on model names, user tags, or custom rules.

Bifrost vs. New API

In China, the most widely used gateway for mainstream large models is probably New API. As an emerging high-performance alternative, Bifrost overlaps with New API in functionality but also differs from it significantly. As the comparison below shows, neither is simply better; choose based on your specific scenario.

| Feature | Bifrost | New API (One-API) |
| --- | --- | --- |
| Performance | ⬆️ Extremely high | High |
| Deployment difficulty | ⬆️ Simple | Medium |
| Applicable scenarios | Enterprises, internal platforms, and individual developers pursuing extreme stability, low latency, and multi-model disaster recovery | Merchants and webmasters who want to build a site to resell API keys |
| Models & ecosystem | Rich; good support for international models, weaker support for Chinese domestic models | ⬆️ Very rich; good support for both domestic and international models |
| Special engineering features | ⬆️ Semantic caching, adaptive load balancing, automatic failover, MCP (Model Context Protocol) support | ⬆️ Model redirection, more granular billing |
| Web interface | Simple, English only | ⬆️ Polished, multilingual |

Deploying Bifrost with Docker Compose

Bifrost deployment is very simple; you can complete it in 30 seconds using Docker Compose. Create a compose.yaml file and fill in the following content:

services:
  bifrost:
    image: maximhq/bifrost
    container_name: bifrost
    ports:
      - "8080:8080" 
    user: "0:0"
    volumes:
      - ./data:/app/data 
    restart: unless-stopped

Then run docker compose up -d. Once it is up, visit http://IP:8080 (replace IP with your server's address).
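To confirm the container started cleanly, the standard Docker Compose commands are enough (run them from the directory containing compose.yaml):

```shell
# Show container status; bifrost should be listed as running
docker compose ps

# Follow the startup logs to catch permission or port errors early
docker compose logs -f bifrost
```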

Note: user: "0:0" runs the container as root. If you prefer not to run as root, pass a different UID and GID manually and make sure ./data in the current directory is writable by that user; otherwise the container will fail to start.
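For example, to run the container as a non-root user, the host directory can be prepared like this (a sketch; 1000:1000 is an assumed example UID:GID, so match it to whatever user: value you set in compose.yaml):

```shell
# Create the host data directory that Bifrost mounts at /app/data
mkdir -p ./data

# Hand it to the UID:GID the container will run as (assumed 1000:1000 here)
# so Bifrost can write its data; requires root/sudo on the host
sudo chown -R 1000:1000 ./data
```

Then change user: "0:0" to user: "1000:1000" in compose.yaml and restart with docker compose up -d.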

Initialization

By default, Bifrost runs without any access protection, which is obviously insecure for production environments. You can set up account and password access under [Settings - Security - Password protect the dashboard].


Adding Channels

Next, add large model channels. In the console, go to [Models - Model Providers - Add New Provider] to add them. Built-in support includes over 20 common channels such as Anthropic, OpenAI, and Gemini, and you can also customize any OpenAI-compatible channel.


Taking Gemini as an example, simply give the channel a name, enter your API key, select the models to make available (e.g., gemini-2.5-flash), and save.

Bifrost supports adding multiple upstream channels; simply choose different types of channels to add.

Setting Virtual Keys

Virtual keys make calls to the Bifrost gateway more convenient and secure, so this step is mandatory. First create a virtual key in the console under [Governance - Virtual Keys].


When creating a virtual key, you can set limits on quotas and token lengths.


I initially assumed that once a virtual key was created it could be used directly, but my calls returned a 401 error, and it took me several hours to figure out why. The reason: a newly created virtual key cannot be used for direct calls by default (which is a bit counter-intuitive); you need to enable this separately in the security settings.

The specific steps are: In the console, go to [Settings - Security] and check all three options below:

  • Disable authentication on inference calls
  • Enforce Virtual Keys on Inference
  • Allow Direct API Keys


Then save, and you can use the virtual key to make calls.

Calling the API

The Bifrost gateway is fully compatible with the OpenAI format. The curl command call method is as follows:

curl -X POST http://IP:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_VIRTUAL_KEY" \
  -d '{
    "model": "gemini/gemini-2.5-flash",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
  • Replace YOUR_VIRTUAL_KEY with the virtual key you just created.
  • The model field needs the channel prefix. For example, with a Gemini channel exposing the gemini-2.5-flash model, the complete model ID is: gemini/gemini-2.5-flash.
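Since Bifrost follows the OpenAI API standard, streaming responses should also work the standard way by adding "stream": true to the request body. A sketch under that assumption (replace IP and the key placeholder with your own values):

```shell
# -N disables curl's output buffering so SSE chunks print as they arrive;
# with "stream": true the gateway should return Server-Sent Events
# (incremental deltas) instead of a single JSON body
curl -N -X POST http://IP:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_VIRTUAL_KEY" \
  -d '{
    "model": "gemini/gemini-2.5-flash",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
```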

Nginx Reverse Proxy

In production environments, services are generally not accessed directly by IP. It is recommended to configure a domain name and SSL via an Nginx reverse proxy; the following configuration can be used as a reference:

location / {
    proxy_pass http://IP:8080;

    client_max_body_size 64m;

    # Generous timeouts for long-running LLM responses
    proxy_connect_timeout 120s;
    proxy_send_timeout 300s;
    proxy_read_timeout 300s;

    # Standard headers
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

    # Settings for streaming (SSE) responses: disable response
    # buffering so tokens reach the client as they are generated
    proxy_http_version 1.1;
    proxy_buffering off;
}

Routing Rules

Bifrost also supports creating complex routing rules to achieve load balancing and failover for backend AI large models, thereby ensuring stability and high availability. Due to space limitations, we cannot elaborate in detail here; a future article will explain Bifrost routing rule configuration separately.

Conclusion

Bifrost, with its extreme performance and lightweight deployment, is a powerful tool for building high-availability AI backends. Although it is not as rich as New API in its Chinese-model ecosystem and billing features, it has clear advantages for self-hosted scenarios that prioritize low latency and automatic failover. I hope this article helps you quickly build your own gateway and make your model calls faster and more stable. A future post will dive into routing rule configuration, so stay tuned.

Bifrost GitHub Open Source Address: https://github.com/maximhq/bifrost