The AI Gateway
Control Plane.
Route, govern, and observe every LLM call — across Azure, AWS, Anthropic, GCP, and Ollama. Single binary. No Kubernetes. Ships in 30 seconds.
[ CAPABILITIES ]
EVERYTHING AN
AI GATEWAY NEEDS.
Purpose-built for teams shipping AI to production. Not a SaaS. Not a platform. A binary you run anywhere.
Multi-Provider Routing
Send OpenAI-format requests to any LLM provider. Azure, AWS Bedrock, Anthropic, GCP Vertex, Cohere, or local Ollama — Knull transforms and routes transparently.
- Zero code changes
- Automatic format translation
- Fallback routing
Policy Enforcement
Per-key budget limits, model allow/deny lists, and token quotas. Enforced at the request layer in the ExtProc — before traffic ever reaches your provider.
- Budget limits (USD/tokens)
- Allow/deny model lists
- Real-time enforcement
RBAC & Teams
Teams, users, API keys, and roles. Enterprise-grade access control with full audit logs. Manage who can call which models, with what budget.
- Team-scoped API keys
- Role-based permissions
- Immutable audit trail
MCP Gateway
Proxy Model Context Protocol tool servers through the data plane. GitHub, Filesystem, custom tools — all with API key auth, usage tracking, and policy enforcement.
- SSE & HTTP transport
- Session management
- Auth-enforced tool calls
Built-in Agents
Native agentic loop with tool use. Run agents via the REST API — model call, tool execution, result injection, repeat — until completion. No external orchestrator needed.
- Multi-turn loops
- MCP tool integration
- Configurable max iterations
Full Observability
Token usage, USD cost tracking, latency histograms, and model performance — all in real-time. Prometheus metrics and a built-in analytics dashboard.
- Cost per model/key/team
- Latency percentiles
- Prometheus-compatible
[ CODE FIRST ]
WORKS WITH EVERY
OPENAI CLIENT.
Knull is a drop-in replacement for any OpenAI-compatible client. Zero code changes. Route to Azure, AWS, Anthropic, GCP, or Ollama — all through the same endpoint.
1975
Data Plane Port
8000
Admin API Port
1064
Metrics Port
HTTP/1.1
Transport
# Knull Core Configuration
# Works with any OpenAI-compatible client
gateways:
- name: aigw-run
port: 1975
models:
# Azure OpenAI
- id: gpt-4o-mini
provider: azure
endpoint: ${AZURE_OPENAI_ENDPOINT_HOSTNAME}
apiKey: ${AZURE_OPENAI_API_KEY}
# AWS Bedrock (Claude 3.5)
- id: claude-3-5-sonnet
provider: aws_anthropic
endpoint: bedrock-runtime.us-east-1.amazonaws.com
awsRegion: us-east-1
awsAccessKey: ${AWS_ACCESS_KEY_ID}
awsSecretKey: ${AWS_SECRET_ACCESS_KEY}
# Anthropic Direct
- id: claude-opus-4
provider: anthropic
apiKey: ${ANTHROPIC_API_KEY}
# GCP Vertex
- id: gemini-2-pro
provider: gcp_vertex
endpoint: us-central1-aiplatform.googleapis.com
gcpProject: ${GCP_PROJECT_ID}
gcpRegion: us-central1
# Local Ollama
- id: llama-3
provider: openai_compatible
endpoint: localhost:11434[ PROVIDER SUPPORT ]
ROUTE TO ANY PROVIDER.
ONE ENDPOINT.
One port. One API format. Every provider. Knull handles request transformation, credential injection, and protocol differences transparently.
OpenAI
OpenAI native
GPT-4o, GPT-4o-mini and all OpenAI models
Azure OpenAI
HTTP/1.1 enforced
Azure deployments with auto API key injection
AWS Bedrock
SigV4 signing
Claude, Llama, Titan via inference profiles
Anthropic
Anthropic native
Direct Anthropic API with streaming
GCP Vertex
OAuth2 bearer
Gemini models via Vertex AI endpoint
Cohere
Cohere native
Command models with full chat support
Ollama
OpenAI compatible
Local models — Llama, Mistral, Phi, etc.
Custom
OpenAI compatible
Any OpenAI-compatible endpoint with schema override
Any OpenAI-compatible endpoint works via provider: openai_compatible
[ CONTROL PLANE ]
Six layers between your request and the provider.
Policy enforcement, cost attribution, smart routing, and full observability — built into every call. Not bolted on afterward.
Drop-in. No code changes.
Your apps keep calling the OpenAI API. Knull intercepts every request at the edge — adding auth, routing, and policy — without a single line of application code changing.
Enforce rules before tokens fire.
Define exactly who can call what model, when, and how much — at the team, key, or request level. Policy violations are rejected inline. No exceptions.
No more end-of-month bill surprises.
Every token counted, attributed, and capped in real time — by team, by key, by model. Automatic budget stops before overruns happen. Export to finance tools, not just dashboards.
The right provider, every request.
Route by cost, latency, or custom rules. Fail over automatically when a provider is down. A/B test models in production. All config-driven — no deploys.
Full audit trail. Zero extra setup.
Every LLM call logged with full context — model, team, tokens, cost, latency. Export to Datadog or Grafana natively. Detect spend anomalies before they become incidents.
AI agents that use tools — safely.
Proxy any MCP tool server through the same control plane. Policy, cost tracking, and audit logging for agent tool calls — not just model calls. One gateway for all AI activity.
Exposed ports
:1975DATA PLANELLM proxy ingress:8000ADMIN APIConfig + management:9856MCP PROXYTool call gateway:1064METRICSPrometheus endpoint[ QUICKSTART ]
START IN 30 SECONDS.
Single binary. Self-contained. Write your config, run the binary, send requests. That's the entire setup.
# Build from source
git clone https://github.com/knull-sh
cd knull
make build
# Run with your config
./bin/knull run examples/knull.yamlApache 2.0 License · Free to use, self-host, and modify