Show HN: GoModel – an open-source AI gateway in Go
TL;DR Highlight
GoModel unifies access to OpenAI, Anthropic, Gemini, and other AI providers through a single, OpenAI-compatible API, offering a compiled-language alternative to LiteLLM.
Who Should Read
Backend developers simultaneously using multiple LLM providers, or those interested in the performance, supply chain security, and Go ecosystem integration benefits over LiteLLM.
Core Mechanics
- GoModel is a Go-written AI gateway that integrates various providers—OpenAI, Anthropic, Gemini, xAI, Groq, OpenRouter, Z.ai, Azure OpenAI, Oracle, and Ollama—into a single OpenAI-compatible API.
- It can be launched with a single Docker command, requiring only the API keys for the desired providers as environment variables; at least one provider key is needed for operation.
- Positioned as an alternative to LiteLLM, it natively supports observability (monitoring), guardrails (safety filters), and streaming (streaming responses).
- Its use of the Go compiled language is highlighted as a strength, offering greater security against runtime supply chain attacks compared to Python-based LiteLLM due to fixed dependencies at compile time.
- It supports Prometheus metric integration and includes separate configuration files (prometheus.yml) and a docker-compose.yaml for easy monitoring environment setup.
- A semantic caching layer appears to be present, with the gateway embedding requests and using vector similarity search to determine cache hits.
- A Helm chart is included, enabling deployment in Kubernetes environments.
- Currently, it has 319 stars and 20 forks on GitHub and is actively being committed, indicating an early-stage project.
Evidence
- "In response to a question about the importance of being written in Go, a comment pointed out that Go compiled binaries have a significantly smaller runtime supply chain attack surface than Python-based tools, a point also made by the developer of a similar Go gateway (sbproxy.dev). An experienced AI proxy maintainer noted that the most challenging aspect is adapting to changing input/output structures with each model/provider release, emphasizing that integration within 24 hours of a new model launch is crucial for a well-managed project. Concerns were raised about the maintenance burden of keeping up with provider updates due to the lack of a robust Go SDK compared to JavaScript and Python, a challenge the author acknowledges. A vllm user inquired about Ollama integration, and requests were made for cost tracking per model/route, particularly for mixed free/paid model usage. Questions were also raised about potential open-source rug pulls, and the need for the unified API to abstract provider-specific parameters like temperature, reasoning effort, and tool choice mode."
How to Apply
- "If you're using multiple LLM providers and want to avoid modifying client code with each model switch, deploy GoModel as an intermediary gateway and route all requests to its OpenAI-compatible endpoint at `http://localhost:8080`. Provider switching is then managed through environment variables. If you're running LiteLLM and concerned about Python runtime supply chain security or memory/performance overhead, consider switching to GoModel. Its compiled binary has no runtime dependencies and the Docker image is lightweight. For centralized management of AI traffic in Kubernetes, leverage the included Helm chart to deploy GoModel to your cluster and integrate it with Prometheus to monitor model response times and error rates. If your team manages AI provider keys individually, use GoModel as an internal gateway, directing team members to its endpoint to centralize key management."
Code Example
# Minimal execution (using only OpenAI)
docker run --rm -p 8080:8080 \
-e OPENAI_API_KEY="your-openai-key" \
enterpilot/gomodel
# Using multiple providers simultaneously
docker run --rm -p 8080:8080 \
-e OPENAI_API_KEY="your-openai-key" \
-e ANTHROPIC_API_KEY="your-anthropic-key" \
-e GEMINI_API_KEY="your-gemini-key" \
-e GROQ_API_KEY="your-groq-key" \
-e OPENROUTER_API_KEY="your-openrouter-key" \
-e XAI_API_KEY="your-xai-key" \
-e AZURE_API_KEY="your-azure-key" \
-e AZURE_BASE_URL="https://your-resource.openai.azure.com/openai/deployments/your-deployment" \
-e AZURE_API_VERSION="2024-10-21" \
enterpilot/gomodel
# Then, in the client, only change the base_url
# openai.OpenAI(base_url="http://localhost:8080", api_key="any-value")Terminology
Related Papers
Show HN: Lowfat – pluggable CLI filter that saved 91.8% of my LLM tokens
AI 에이전트가 CLI 명령어 출력을 읽을 때 불필요한 노이즈를 제거해 토큰 사용량을 줄여주는 Rust 기반 CLI 필터 도구. Claude Code, OpenCode 등 주요 AI 코딩 에이전트와 통합 가능하다.
1-Bit Bonsai Image 4B Image Generation for Local Devices
4B 파라미터 이미지 생성 모델의 가중치를 1비트/3값으로 극단적으로 압축해서 iPhone에서도 돌아가게 만든 모델. 7.75GB짜리 diffusion transformer를 0.93GB까지 줄였다.
Show HN: Tiny-vLLM – high performance LLM inference engine in C++ and CUDA
vLLM의 핵심 기능을 C++와 CUDA로 직접 구현하며 배울 수 있는 교육용 LLM 추론 엔진 프로젝트로, 소스코드와 단계별 강의가 함께 제공된다.
Real-time LLM Inference on Standard GPUs: 3k tokens/s per request
Kog AI가 8× AMD MI300X에서 요청당 3,000 tokens/s를 달성하는 LLM 추론 엔진을 공개했고, 기존 소프트웨어 스택의 병목을 GPU 메모리 대역폭 최대화로 풀어냈다는 내용이다.
A sleep-like consolidation mechanism for LLMs
LLM이 긴 컨텍스트를 처리할 때 발생하는 Attention 비용 문제를 해결하기 위해, 사람의 수면처럼 주기적으로 컨텍스트를 fast weight에 압축·저장하는 새로운 메커니즘을 제안한 논문이다.
CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs
GPU에서 Transformer 학습 시 발생하는 메모리 병목을 해결하기 위해, 정규화·활성화 등 소규모 연산들을 GEMM 출력이 칩 위에 있는 동안 함께 실행하는 커널 추상화 CODA를 소개한다. LLM이 이 추상화를 활용해 고성능 커널을 자동 생성할 수 있다는 점이 특히 주목받고 있다.