Open Source · Apache 2.0

The AI Gateway
Control Plane.

Route, govern, and observe every LLM call — across Azure, AWS, Anthropic, GCP, and Ollama. Single binary. No Kubernetes. Ships in 30 seconds.

OpenAI CompatiblePolicy EnforcementCost TrackingMCP GatewayRBAC + Teams
6Providers
1Binary
30sSetup
Live alpha — request access
localhost:8000 · Knull Admin
K

AI GATEWAY · LAST 30 DAYS

$47.85Total Spend
Requests68.4k
Tokens5.2M
Tok / Req76

Request Throughput

Daily — last 7 days
MTWTFSS

Cost Flow

Provider routing
KNULLGATEWAY45% · $21.5529% · $13.8722% · $10.514% · $0.00
~0msAPI Key Lookup
~0msBudget Check
6+LLM Providers
Drop-inOpenAI Compatible
SQLite / PGDatabase
Single FileBinary Size
NeverK8s Required
Built-inMCP Tools
EnterpriseRBAC
Real-timeCost Tracking
Unix SocketExtProc Transport
NativeAgents
~0msAPI Key Lookup
~0msBudget Check
6+LLM Providers
Drop-inOpenAI Compatible
SQLite / PGDatabase
Single FileBinary Size
NeverK8s Required
Built-inMCP Tools
EnterpriseRBAC
Real-timeCost Tracking
Unix SocketExtProc Transport
NativeAgents
CAPABILITIES

[ CAPABILITIES ]

EVERYTHING AN
AI GATEWAY NEEDS.

Purpose-built for teams shipping AI to production. Not a SaaS. Not a platform. A binary you run anywhere.

Data Plane

Multi-Provider Routing

Send OpenAI-format requests to any LLM provider. Azure, AWS Bedrock, Anthropic, GCP Vertex, Cohere, or local Ollama — Knull transforms and routes transparently.

  • Zero code changes
  • Automatic format translation
  • Fallback routing
Control Plane

Policy Enforcement

Per-key budget limits, model allow/deny lists, and token quotas. Enforced at the request layer in the ExtProc — before traffic ever reaches your provider.

  • Budget limits (USD/tokens)
  • Allow/deny model lists
  • Real-time enforcement
Access Control

RBAC & Teams

Teams, users, API keys, and roles. Enterprise-grade access control with full audit logs. Manage who can call which models, with what budget.

  • Team-scoped API keys
  • Role-based permissions
  • Immutable audit trail
Tool Execution

MCP Gateway

Proxy Model Context Protocol tool servers through the data plane. GitHub, Filesystem, custom tools — all with API key auth, usage tracking, and policy enforcement.

  • SSE & HTTP transport
  • Session management
  • Auth-enforced tool calls
Agentic Loops

Built-in Agents

Native agentic loop with tool use. Run agents via the REST API — model call, tool execution, result injection, repeat — until completion. No external orchestrator needed.

  • Multi-turn loops
  • MCP tool integration
  • Configurable max iterations
Analytics

Full Observability

Token usage, USD cost tracking, latency histograms, and model performance — all in real-time. Prometheus metrics and a built-in analytics dashboard.

  • Cost per model/key/team
  • Latency percentiles
  • Prometheus-compatible
WORKFLOW

[ HOW IT WORKS ]

FROM CONFIG TO COMPLETION IN 30 SECONDS.

Step 01 / 04

Write your config

One YAML file. List your providers with API keys, set your gateway port. No CRDs, no Helm charts, no service mesh.

Write your config
Start the binary
Send requests
Observe everything
write_your_config.sh
gateways:
- name: aigw
port: 1975
models:
- id: gpt-4o-mini
provider: azure
apiKey: ${AZURE_API_KEY}
- id: claude-3-5-sonnet
provider: anthropic
apiKey: ${ANTHROPIC_KEY}
- id: gemini-2
provider: gcp_vertex
gcpProject: ${GCP_PROJECT}
SCROLL FOR NEXT STEP
INTEGRATION

[ CODE FIRST ]

WORKS WITH EVERY
OPENAI CLIENT.

Knull is a drop-in replacement for any OpenAI-compatible client. Zero code changes. Route to Azure, AWS, Anthropic, GCP, or Ollama — all through the same endpoint.

OpenAI SDK, LangChain, LlamaIndex — all work unchanged
API keys never leave your infra — Knull holds credentials
Switch providers without touching application code
Streaming, function calling, and JSON mode supported

1975

Data Plane Port

8000

Admin API Port

1064

Metrics Port

HTTP/1.1

Transport

# Knull Core Configuration
# Works with any OpenAI-compatible client

gateways:
  - name: aigw-run
    port: 1975

models:
  # Azure OpenAI
  - id: gpt-4o-mini
    provider: azure
    endpoint: ${AZURE_OPENAI_ENDPOINT_HOSTNAME}
    apiKey: ${AZURE_OPENAI_API_KEY}

  # AWS Bedrock (Claude 3.5)
  - id: claude-3-5-sonnet
    provider: aws_anthropic
    endpoint: bedrock-runtime.us-east-1.amazonaws.com
    awsRegion: us-east-1
    awsAccessKey: ${AWS_ACCESS_KEY_ID}
    awsSecretKey: ${AWS_SECRET_ACCESS_KEY}

  # Anthropic Direct
  - id: claude-opus-4
    provider: anthropic
    apiKey: ${ANTHROPIC_API_KEY}

  # GCP Vertex
  - id: gemini-2-pro
    provider: gcp_vertex
    endpoint: us-central1-aiplatform.googleapis.com
    gcpProject: ${GCP_PROJECT_ID}
    gcpRegion: us-central1

  # Local Ollama
  - id: llama-3
    provider: openai_compatible
    endpoint: localhost:11434
PROVIDERS

[ PROVIDER SUPPORT ]

ROUTE TO ANY PROVIDER.
ONE ENDPOINT.

One port. One API format. Every provider. Knull handles request transformation, credential injection, and protocol differences transparently.

Any OpenAI-compatible endpoint works via provider: openai_compatible

INTERNALS

[ CONTROL PLANE ]

Six layers between your request and the provider.

Policy enforcement, cost attribution, smart routing, and full observability — built into every call. Not bolted on afterward.

Layer 01· OPENAI-COMPATIBLE · PORT 1975

Drop-in. No code changes.

Your apps keep calling the OpenAI API. Knull intercepts every request at the edge — adding auth, routing, and policy — without a single line of application code changing.

OpenAI SDK compatible, zero refactoring
Virtual key auth + TLS termination
Sub-2ms latency overhead
Overhead< 2ms
Layer 02· RBAC · RATE LIMITS · ALLOWLISTS

Enforce rules before tokens fire.

Define exactly who can call what model, when, and how much — at the team, key, or request level. Policy violations are rejected inline. No exceptions.

Team + key-level RBAC, model allowlists
Per-team budget caps with hard enforcement
Rate limiting — configurable per key
Eval latency< 0.5ms
Layer 03· REAL-TIME SPEND ATTRIBUTION

No more end-of-month bill surprises.

Every token counted, attributed, and capped in real time — by team, by key, by model. Automatic budget stops before overruns happen. Export to finance tools, not just dashboards.

Token-level cost per team / key / model
Hard budget stops — no partial overruns
Normalized cost across all providers
Granularity$0.0001
Layer 04· FAILOVER · COST ROUTING · CANARY

The right provider, every request.

Route by cost, latency, or custom rules. Fail over automatically when a provider is down. A/B test models in production. All config-driven — no deploys.

Automatic provider failover < 50ms
Cost-optimized and latency-aware routing
Canary routing for model experiments
Failover< 50ms
Layer 05· AUDIT · ANALYTICS · ANOMALIES

Full audit trail. Zero extra setup.

Every LLM call logged with full context — model, team, tokens, cost, latency. Export to Datadog or Grafana natively. Detect spend anomalies before they become incidents.

Immutable audit log, every request
Prometheus metrics + Grafana dashboards
Anomaly detection with configurable alerts
RetentionConfigurable
Layer 06· TOOL PROXY · AGENT ACCESS CONTROL

AI agents that use tools — safely.

Proxy any MCP tool server through the same control plane. Policy, cost tracking, and audit logging for agent tool calls — not just model calls. One gateway for all AI activity.

Unified MCP tool registry for all agents
Per-agent access control policies
Full tool call session audit trail
ProtocolMCP 2024-11

Exposed ports

:1975DATA PLANELLM proxy ingress
:8000ADMIN APIConfig + management
:9856MCP PROXYTool call gateway
:1064METRICSPrometheus endpoint

[ QUICKSTART ]

START IN 30 SECONDS.

Single binary. Self-contained. Write your config, run the binary, send requests. That's the entire setup.

$
# Build from source
git clone https://github.com/knull-sh
cd knull
make build

# Run with your config
./bin/knull run examples/knull.yaml

Apache 2.0 License · Free to use, self-host, and modify