Overview
KubeBolt is an open-source Kubernetes monitoring and management platform. Full cluster visibility in under 2 minutes — no agents, no Prometheus, no databases.
What is KubeBolt?
KubeBolt reads directly from the Kubernetes API Server and Metrics Server to give you a visual, intuitive dashboard with actionable insights. It's designed for development teams and small-to-medium engineering organizations that deploy on Kubernetes but don't have dedicated SRE teams.
What you get
- 23 resource views with live CPU/memory metrics from Metrics Server
- Interactive cluster topology map with Grid and Flow layouts
- 12-rule insights engine with actionable recommendations
- Multi-cluster support — all kubeconfig contexts auto-discovered
- Gateway API support — Gateways and HTTPRoutes via dynamic client
- Cluster actions — Pod terminal, file browser, port-forward, restart, scale, YAML edit
- AI Copilot — 16 tools querying live data, multi-provider (Claude, GPT, Ollama, vLLM)
- Built-in authentication — Admin/Editor/Viewer roles with JWT sessions
- Global search — ⌘K across all resource types
- RBAC-aware — adapts to your ServiceAccount permissions automatically
Performance
| Metric | Value |
|---|---|
| Backend memory usage | ~19 MB |
| Frontend bundle (gzipped) | ~148 KB |
| API response time | <5ms (from informer cache) |
| Startup time | <3s |
Compatibility
| Provider | Metrics Server | Notes |
|---|---|---|
| Amazon EKS | Pre-installed | Mount ~/.aws for credentials |
| Google GKE | Pre-installed | Works out of the box |
| Azure AKS | Pre-installed | Works out of the box |
| k3s / k0s | Pre-installed | Works out of the box |
| Docker Desktop | Manual install | Use docker-kubeconfig.sh helper |
| Minikube | Addon | minikube addons enable metrics-server |
Quick Start
Get KubeBolt running in under 2 minutes.
Fastest: Helm Chart
helm install kubebolt \
  oci://ghcr.io/clm-cloud-solutions/kubebolt/helm/kubebolt

kubectl port-forward svc/kubebolt 3000:80
Open http://localhost:3000 and log in with the default admin user.
The install seeds an admin user and prints the generated password to the server logs. Check with kubectl logs deployment/kubebolt-api | grep "Generated admin password". You can also set it explicitly: --set auth.adminPassword=YourPassword.
Docker Compose
# Remote clusters (EKS, GKE, AKS)
kubectl config use-context my-cluster
cd deploy && docker compose up -d

# Docker Desktop K8s (needs kubeconfig rewrite)
./deploy/docker-kubeconfig.sh
cd deploy && docker compose up -d
Frontend at http://localhost:3000. Nginx proxies /api and /ws to the Go backend.
Local Development
Requires Go 1.25+ and Node 20+.
# Backend (port 8080)
cd apps/api && go run cmd/server/main.go --kubeconfig ~/.kube/config

# Frontend (port 5173, proxies /api and /ws to backend)
cd apps/web && npm install && npm run dev
Installation Methods
Multiple ways to install KubeBolt, from Docker Compose to in-cluster Helm deployment.
Helm Chart (recommended for Kubernetes)
Production deployment inside your cluster. OCI-based chart on GitHub Container Registry. Also listed on Artifact Hub.
helm install kubebolt oci://ghcr.io/clm-cloud-solutions/kubebolt/helm/kubebolt

kubectl port-forward svc/kubebolt 3000:80

# With Ingress and custom admin password
helm install kubebolt \
  oci://ghcr.io/clm-cloud-solutions/kubebolt/helm/kubebolt \
  --set auth.adminPassword=YourSecurePassword \
  --set ingress.enabled=true \
  --set ingress.host=kubebolt.example.com
The chart creates a ClusterRole with full read permissions, a ServiceAccount, and the KubeBolt deployment. Configurable values include image tags, resource limits, Ingress, auth, and RBAC settings.
Docker Compose
Full stack with separate API and web containers. Multi-arch images (amd64/arm64).
# Remote clusters (EKS, GKE, AKS) kubectl config use-context my-cluster cd deploy && docker compose up -d # Docker Desktop K8s (needs kubeconfig rewrite) ./deploy/docker-kubeconfig.sh cd deploy && docker compose up -d
Frontend at http://localhost:3000. Admin password printed to logs (docker compose logs api).
For EKS, the compose file mounts ~/.aws for AWS credential access. Ensure your AWS session is active.
Single Binary (macOS, Linux, Windows)
One executable with embedded frontend. API + UI in a single process on one port. Download from GitHub Releases.
# macOS Apple Silicon
curl -LO https://github.com/clm-cloud-solutions/kubebolt/releases/latest/download/kubebolt-darwin-arm64
chmod +x kubebolt-darwin-arm64 && mv kubebolt-darwin-arm64 /usr/local/bin/kubebolt

# Linux amd64
curl -LO https://github.com/clm-cloud-solutions/kubebolt/releases/latest/download/kubebolt-linux-amd64
chmod +x kubebolt-linux-amd64 && sudo mv kubebolt-linux-amd64 /usr/local/bin/kubebolt

# Run (auto-detects kubeconfig; admin password printed to logs)
kubebolt --kubeconfig ~/.kube/config
Available for darwin-arm64, darwin-amd64, linux-arm64, linux-amd64, and windows-amd64. Verify with sha256sum -c CHECKSUMS.txt.
The binary reads a .env file from the current directory. Put KUBEBOLT_ADMIN_PASSWORD, KUBEBOLT_AI_API_KEY, etc. in a .env file next to the binary. System env vars and CLI flags take precedence.
Local Development
Requires Go 1.25+ and Node 20+.
# Backend (port 8080)
cd apps/api && go run cmd/server/main.go --kubeconfig ~/.kube/config

# Frontend (port 5173, proxies /api and /ws to backend)
cd apps/web && npm install && npm run dev
To build your own single binary:
make build-binary # Produces apps/api/kubebolt (embedded frontend)
Homebrew (macOS, Linux)
Install and update via Homebrew. Automatic version management via brew upgrade. Available for macOS and Linux (amd64 + arm64).
# Add the CLM tap (one-time setup)
brew tap clm-cloud-solutions/tap

# Install
brew install kubebolt

# Run
kubebolt --kubeconfig ~/.kube/config

# Update to the latest version
brew upgrade kubebolt
Docker (single container)
Single image with embedded frontend. No nginx, no compose. Multi-arch (amd64/arm64). Runs as non-root.
docker run -p 3000:3000 \
-v ~/.kube:/root/.kube:ro \
ghcr.io/clm-cloud-solutions/kubebolt:latest
The image is signed with Cosign. Verify with:
cosign verify ghcr.io/clm-cloud-solutions/kubebolt:latest \
--certificate-identity-regexp 'https://github.com/clm-cloud-solutions/kubebolt/.*' \
--certificate-oidc-issuer https://token.actions.githubusercontent.com
For EKS, mount ~/.aws and set AWS_PROFILE so the binary can obtain tokens. The image includes aws-cli.
kubectl Plugin (krew)
Run KubeBolt as a kubectl subcommand. Available via CLM's custom krew index.
# Prerequisite: install krew (one-time)
# https://krew.sigs.k8s.io/docs/user-guide/setup/install/

# Add the CLM custom index
kubectl krew index add clm https://github.com/clm-cloud-solutions/krew-index.git

# Install
kubectl krew install clm/kubebolt

# Run (uses current kubectl context)
kubectl kubebolt

# Update
kubectl krew upgrade clm/kubebolt
Coming Soon
The following distribution methods are planned but require additional infrastructure.
Install Script
One-command install that auto-detects OS and architecture. Requires a custom domain (kubebolt.dev) to host the script.
curl -fsSL https://get.kubebolt.dev | sh
Official krew-index
After upstream acceptance into kubernetes-sigs/krew-index, users will be able to install without adding the CLM custom index:
kubectl krew install kubebolt
Kubernetes Operator
Declarative lifecycle management via a KubeBolt CRD. Separate controller project with Kubebuilder, handles upgrades and self-healing.
kubectl apply -f https://get.kubebolt.dev/operator.yaml
Architecture
Go backend with in-memory caches and BoltDB for auth. React frontend with live WebSocket updates.
System Diagram
Kubernetes Cluster(s)
  API Server + Metrics Server
        │ kubeconfig (all contexts)
        ▼
KubeBolt Backend (Go)
├─ Cluster Manager    → multi-cluster lifecycle, async connection
├─ Shared Informers   → typed resources (client-go)
├─ Dynamic Client     → Gateway API CRDs (unstructured)
├─ Permission Probe   → 22 SelfSubjectAccessReview calls
├─ Metrics Collector  → 30s poll, in-memory cache
├─ Insights Engine    → 12 rules
├─ Auth Service       → JWT sessions, BoltDB user store, role enforcement
├─ REST API (Chi v5)  → resource lists, details, YAML, logs
├─ WebSocket Hub      → real-time broadcasts
└─ Copilot Proxy      → LLM tool-calling bridge
        │ REST + WebSocket
        ▼
KubeBolt Frontend (React 18 + TypeScript + Vite 5 + Tailwind)
├─ 23 Resource Views  → TanStack Table + Query
├─ Cluster Map        → React Flow 11
├─ AI Copilot         → multi-provider, 16 tools
└─ Theme              → dark/light via CSS custom properties (--kb-*)
Go Workspace
Monorepo with go.work containing three modules:
- apps/api — Main backend server (entry: cmd/server/main.go)
- packages/agent — Phase 2 lightweight node agent (stub)
- packages/shared — Shared Go utilities
Key Backend Packages
| Package | Purpose |
|---|---|
cluster/manager.go | Multi-cluster lifecycle, context switching, async initial connection |
cluster/connector.go | Shared informers + dynamic client, 20s cache sync timeout, 15s rest timeout |
cluster/permissions.go | RBAC probing via SSAR, cluster-wide then namespace fallback, semaphore of 10 |
cluster/nslister.go | Multi-namespace lister wrappers for namespace-scoped ServiceAccounts |
cluster/graph.go | In-memory topology graph with debounced rebuild (2s) |
cluster/relationships.go | Edge detection: ownerRefs, selectors, Gateway parentRefs, volumes |
metrics/collector.go | Metrics Server polling, per-namespace fallback, graceful degradation |
insights/engine.go | 12-rule evaluation engine |
auth/service.go | User management, JWT issue/verify, role enforcement, BoltDB persistence |
auth/middleware.go | Auth middleware with httpOnly cookie extraction and role-based route guards |
websocket/hub.go | Broadcast hub, 4096 buffer, silent drops when no clients |
api/router.go | Chi router with requireConnector middleware |
api/handlers.go | REST handlers with metrics injection, YAML, logs, deployment history |
Data Flow
- Manager reads kubeconfig contexts → async connection (HTTP server binds immediately, returns 503 until connected)
- Permission probe: 22 SSAR calls, cluster-wide then namespace fallback, ~2-5s
- Informers start only for permitted resources
- Namespace-scoped SAs → per-namespace informer factories with multi-lister aggregation
- Dynamic client discovers Gateway API CRDs (5s timeout, gracefully skipped)
- Metrics Collector polls every 30s → in-memory cache (per-namespace when cluster-wide denied)
- REST API serves enriched resources with metrics injection, paginated (50/page). 403 for restricted.
- WebSocket broadcasts resource changes with debounced topology rebuilds
Resource Views
23 resource types, each with search, filtering, pagination, and live metrics.
Supported Resources
| Category | Resources |
|---|---|
| Workloads | Pods, Deployments, StatefulSets, DaemonSets, Jobs, CronJobs, ReplicaSets |
| Traffic | Services, Ingresses, Gateways, HTTPRoutes, EndpointSlices |
| Storage | PersistentVolumeClaims, PersistentVolumes, StorageClasses |
| Config | ConfigMaps, Secrets (keys only, never values), HPAs |
| Cluster | Nodes, Namespaces, Events, Roles, ClusterRoles, RoleBindings, ClusterRoleBindings |
Resource Detail Views
Each resource has a tabbed detail page at /:type/:namespace/:name. Available tabs vary by resource type:
- Overview — Key fields, status, conditions, labels, annotations
- YAML — Syntax-highlighted definition with copy/download/edit. Secrets redacted.
- Pods — For workloads (Deployments, StatefulSets, DaemonSets, Jobs)
- Logs — Pod and workload logs with container selector, tail lines (100/500/1000), 10s auto-refresh
- Containers — Container specs, env vars, ports, mounts (Pods only)
- Volumes — Volume mounts and claims (Pods only)
- Related — Parent and child resources from topology edges
- History — Revision history via ReplicaSets (Deployments) or ControllerRevisions (StatefulSets, DaemonSets)
- Events — Events filtered to the specific resource
- Monitor — SVG donut gauges for CPU/memory from Metrics Server
- Files — Exec-based file browser with directory navigation and content viewer
- Terminal — WebSocket-to-SPDY exec bridge with xterm.js
Insights Engine
12 built-in rules that detect common Kubernetes issues and provide actionable recommendations.
Rule Definitions
| Rule | Severity | Condition |
|---|---|---|
| crash-loop | Critical | Pod in CrashLoopBackOff with restarts >3/hour |
| oom-killed | Critical | Container terminated with OOMKilled (exit code 137) |
| zero-replicas | Critical | Deployment with 0 available replicas |
| node-not-ready | Critical | Node condition Ready ≠ True |
| image-pull-backoff | Critical | Pod in ImagePullBackOff state |
| cpu-throttle-risk | Warning | CPU usage >80% of limit sustained |
| memory-pressure | Warning | Memory usage >85% of limit |
| hpa-maxed-out | Warning | HPA current replicas == max replicas |
| pvc-pending | Warning | PVC in Pending state for >5 minutes |
| frequent-restarts | Warning | Pod with >5 restarts in 24 hours (non crash-loop) |
| resource-underrequest | Info | Requests <40% of actual usage |
| evicted-pods | Info | Pods evicted from node due to pressure |
Each insight includes the affected resource, a human-readable message, and a specific suggestion with remediation steps or kubectl commands.
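The shape of such a rule engine can be sketched in Go. The struct fields, rule set, and messages below are illustrative stand-ins; KubeBolt's real engine evaluates its 12 rules against live informer data.

```go
package main

import "fmt"

// PodStatus is a simplified stand-in for the live pod data a rule sees.
type PodStatus struct {
	Name          string
	Reason        string
	RestartsPerHr int
}

type Insight struct {
	Rule     string
	Severity string
	Message  string
}

// Rule pairs a predicate with a message builder, mirroring the
// condition/suggestion split in the table above.
type Rule struct {
	Name     string
	Severity string
	Match    func(PodStatus) bool
	Message  func(PodStatus) string
}

var rules = []Rule{
	{
		Name: "crash-loop", Severity: "Critical",
		Match:   func(p PodStatus) bool { return p.Reason == "CrashLoopBackOff" && p.RestartsPerHr > 3 },
		Message: func(p PodStatus) string { return fmt.Sprintf("%s is crash-looping (%d restarts/hr)", p.Name, p.RestartsPerHr) },
	},
	{
		Name: "image-pull-backoff", Severity: "Critical",
		Match:   func(p PodStatus) bool { return p.Reason == "ImagePullBackOff" },
		Message: func(p PodStatus) string { return p.Name + " cannot pull its image" },
	},
}

func evaluate(pods []PodStatus) []Insight {
	var out []Insight
	for _, p := range pods {
		for _, r := range rules {
			if r.Match(p) {
				out = append(out, Insight{r.Name, r.Severity, r.Message(p)})
			}
		}
	}
	return out
}

func main() {
	pods := []PodStatus{
		{Name: "api-7f9", Reason: "CrashLoopBackOff", RestartsPerHr: 6},
		{Name: "web-2c1"},
	}
	for _, ins := range evaluate(pods) {
		fmt.Printf("[%s] %s: %s\n", ins.Severity, ins.Rule, ins.Message)
	}
}
```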
Cluster Map
Interactive topology visualization of resource relationships.
Layout Modes
- Grid — Compact grid of resources within namespace regions. Best for overview.
- Flow — Horizontal dependency chain: Ingress/Gateway → HTTPRoute → Service → Deployment → ReplicaSet → Pod. Best for understanding traffic flow.
In both modes, namespace regions are arranged in a grid of up to 3 columns. Regions are ReactFlow group nodes with child resource nodes.
Relationship Detection
| Relationship | Detection Method | Edge Type |
|---|---|---|
| Deployment → ReplicaSet → Pod | ownerReferences chain | owns |
| Service → Pods | Label selector matching | selects |
| Ingress → Service | spec.rules.http.paths.backend | routes |
| Gateway → HTTPRoute | HTTPRoute spec.parentRefs | routes |
| HTTPRoute → Service | spec.rules.backendRefs | routes |
| HPA → Deployment | spec.scaleTargetRef | hpa |
| Pod → PVC | spec.volumes.persistentVolumeClaim | mounts |
| PVC → PV | spec.volumeName | bound |
| Pod → ConfigMap/Secret | volumes + envFrom | mounts / envFrom |
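The "selects" edge (Service → Pods) boils down to equality-based label-selector matching. A minimal sketch, with invented data — the real detection runs over informer caches in cluster/relationships.go:

```go
package main

import "fmt"

// selectorMatches reports whether every selector key/value pair is
// present in the pod's labels (equality-based matching only; set-based
// matchExpressions are out of scope for this sketch).
func selectorMatches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return len(selector) > 0 // treat an empty selector as selecting nothing
}

func main() {
	svcSelector := map[string]string{"app": "web"}
	pods := map[string]map[string]string{
		"web-1": {"app": "web", "pod-template-hash": "abc"},
		"db-1":  {"app": "db"},
	}
	for name, labels := range pods {
		if selectorMatches(svcSelector, labels) {
			fmt.Printf("edge: service/web -> pod/%s (selects)\n", name)
		}
	}
}
```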
AI Copilot
Talk to your cluster in natural language. Press ⌘J to open.
How it Works
The copilot sends your question to a configured LLM provider along with tool definitions that map to KubeBolt's REST API. The LLM calls tools to fetch live cluster data, analyzes the results, and responds with data-backed answers and kubectl commands.
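The bridge between LLM tool calls and live data is essentially a dispatch table from tool name to handler. A sketch with canned handlers — the tool names echo the real ones, but the argument shapes and return payloads here are invented:

```go
package main

import "fmt"

// toolCall is what the LLM returns: a tool name plus string arguments.
type toolCall struct {
	Name string
	Args map[string]string
}

// tools maps a tool name to a handler; real handlers would hit the
// REST API / informer caches and return live JSON.
var tools = map[string]func(map[string]string) string{
	"get_insights": func(_ map[string]string) string {
		return `[{"rule":"crash-loop","severity":"Critical"}]`
	},
	"list_resources": func(a map[string]string) string {
		return fmt.Sprintf(`{"type":%q,"items":[]}`, a["type"])
	},
}

// dispatch runs one tool call; unknown tools return an error payload
// that is fed back to the LLM instead of crashing the loop.
func dispatch(c toolCall) string {
	fn, ok := tools[c.Name]
	if !ok {
		return `{"error":"unknown tool"}`
	}
	return fn(c.Args)
}

func main() {
	fmt.Println(dispatch(toolCall{Name: "get_insights"}))
	fmt.Println(dispatch(toolCall{Name: "list_resources", Args: map[string]string{"type": "pods"}}))
}
```

The loop repeats — tool call, result, next tool call — until the model produces a final text answer.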
16 Native Tools
| Tool | What it does |
|---|---|
get_cluster_overview | Resource counts, CPU/memory, health score, events |
list_resources | List any of 23 resource types with filtering |
get_resource_detail | Full detail of a specific resource |
get_resource_yaml | Raw YAML definition (secrets redacted) |
get_resource_describe | kubectl-describe output for deep troubleshooting |
get_pod_logs | Pod logs with container, tail, since and grep options |
get_workload_pods | Pods owned by a workload controller |
get_workload_history | Revision history for Deployments/StatefulSets/DaemonSets |
get_cronjob_jobs | Job children of a CronJob to investigate execution history |
get_topology | Full cluster topology graph |
get_insights | Active insights with severity |
get_events | Events with filtering |
search_resources | Global search by name across 16 resource types |
get_permissions | Detected RBAC permissions |
list_clusters | All available kubeconfig contexts |
get_kubebolt_docs | Product knowledge base (features, navigation, admin pages) |
Supported Providers
- Anthropic — Claude Sonnet 4.6, Claude Opus 4.6/4.7, Claude Haiku 4.5. Prompt caching on system prompt + tool definitions.
- OpenAI — GPT-5, GPT-5 Mini, GPT-4o, GPT-4o Mini. Automatic prompt caching, max_completion_tokens for reasoning models.
- Self-hosted — Ollama, vLLM, Groq, DeepSeek (OpenAI-compatible).
Contextual "Ask Copilot"
Launches the Copilot panel with a pre-loaded prompt that already carries the cluster, namespace, resource name and symptom. Three surfaces:
- Insights — "Diagnose this insight and recommend a fix"
- Resource Detail (Pods, Deployments, StatefulSets, Services, Nodes) — "Investigate this resource"
- Events — button on every Warning row, "Explain this Kubernetes Warning event"
Templates live in services/copilot/triggers.ts and are versioned so the LLM never sees stale framing.
Conversation memory
Long sessions stop bleeding context. When the estimated conversation size crosses SESSION_BUDGET_TOKENS × AUTO_COMPACT_THRESHOLD (default 80%), the handler folds older turns into a summary generated by the provider's cheap-tier model (Haiku 4.5 / gpt-4o-mini) and stubs bulky tool_results in the preserved tail. The active turn's tool_results are always protected so mid-flight compacts never truncate a response.
A Scissors icon in the panel header exposes the same primitive on demand — "new session with summary" collapses the whole transcript into a single summary message so you can pivot topics without losing context.
| Env var | Default | Purpose |
|---|---|---|
KUBEBOLT_AI_AUTO_COMPACT | true | Master switch for auto-compaction |
KUBEBOLT_AI_SESSION_BUDGET_TOKENS | context window of the model | Total ceiling. Trigger fires at budget × threshold |
KUBEBOLT_AI_AUTO_COMPACT_THRESHOLD | 0.80 | Fraction of the budget at which compact fires |
KUBEBOLT_AI_COMPACT_MODEL | auto (cheap tier of same provider) | Override the summarisation model |
KUBEBOLT_AI_COMPACT_PRESERVE_TURNS | 3 | Turns kept intact after compaction |
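The trigger condition from the table above is simple arithmetic: compaction fires once the estimated conversation size crosses budget × threshold. A sketch (the function name and the way tokens are estimated are illustrative):

```go
package main

import "fmt"

// shouldCompact reports whether auto-compaction fires: the estimated
// conversation tokens have reached budget × threshold.
func shouldCompact(estimatedTokens, budgetTokens int, threshold float64) bool {
	return float64(estimatedTokens) >= float64(budgetTokens)*threshold
}

func main() {
	budget := 200000 // e.g. the model's context window
	// With the default threshold of 0.80, the trigger point is 160000.
	fmt.Println(shouldCompact(150000, budget, 0.80))
	fmt.Println(shouldCompact(165000, budget, 0.80))
}
```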
Scope guardrail
The system prompt defines in-scope (Kubernetes operations, DevOps/SRE topics that support the user's cluster, KubeBolt itself) and out-of-scope (general coding unrelated to cluster resources, non-technical topics, competitor cloud products). The LLM refuses out-of-scope questions with a one-sentence polite redirect in the user's language — never answers partially.
Admin Copilot Usage
At /admin/copilot-usage: sessions, tokens billed, cache hit rate, estimated USD cost (best-effort pricing table for Anthropic and OpenAI), top tools with error rates, and a per-session drill-down modal with tool breakdown and compact events. Range selector 24h / 7d / 30d. Stored locally in BoltDB with a 30-day / 5000-entry retention cap. Requires authentication to be enabled.
RBAC & Permissions
KubeBolt auto-detects your kubeconfig's permissions and adapts automatically.
Permission Detection
- Uses the SelfSubjectAccessReview API to test the list verb for 22 resource types
- Two-phase: cluster-wide first, then namespace-level fallback for RoleBinding-based access
- Concurrent execution (semaphore of 10), completes in ~2-5 seconds
- If the SSAR API itself is unavailable, falls back to assuming full access
Access Levels
| Level | Backend | Frontend |
|---|---|---|
| Cluster-admin | All informers start normally | Full UI, no restrictions |
| Cluster read-only | Informers for permitted resources only | Restricted items dimmed, "Limited access" banner |
| Namespace-scoped | Per-namespace informer factories with multi-lister aggregation | Resources scoped to permitted namespaces |
Frontend Behavior
- "Limited access — showing X of Y resource types" banner
- Sidebar items dimmed with shield icon for restricted resources
- Summary cards show "No access" instead of "0"
- "No access to Nodes — capacity data unavailable" for node restrictions
- PermissionDenied component for 403 resource pages
API Endpoint
GET /api/v1/cluster/permissions returns the full permission map per resource type with canList, canWatch, canGet, namespaceScoped, and namespaces fields.
Authentication
Built-in username/password authentication with role-based access control. No external identity provider required.
Overview
KubeBolt ships with a built-in auth system that supports three roles: Admin, Editor, and Viewer. Auth is enabled by default and uses BoltDB for user storage — no external database needed. Sessions are managed via JWT tokens stored in httpOnly cookies.
An admin user is seeded automatically on first startup. The generated password is printed to the server logs. Change it immediately after first login.
Roles
KubeBolt enforces three roles with increasing levels of access:
| Action | Viewer | Editor | Admin |
|---|---|---|---|
| View resources, metrics, topology, insights | Yes | Yes | Yes |
| View pod logs | Yes | Yes | Yes |
| Use AI Copilot (read-only tools) | Yes | Yes | Yes |
| Pod terminal (exec) | No | Yes | Yes |
| Edit YAML / Apply changes | No | Yes | Yes |
| Restart / Scale workloads | No | Yes | Yes |
| Delete resources | No | Yes | Yes |
| Port forwarding | No | Yes | Yes |
| Switch clusters | No | Yes | Yes |
| Manage users (create, edit, delete) | No | No | Yes |
| Change auth settings | No | No | Yes |
Session Management
- JWT tokens issued on login, stored in httpOnly secure cookies (not localStorage)
- Token expiry is configurable (default: 24 hours)
- Tokens are validated on every API request via auth middleware
- Logout invalidates the cookie client-side
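The issue/verify cycle behind those sessions can be sketched with a minimal HS256 JWT built from the Go standard library. This is an illustration of the mechanism, not KubeBolt's auth/service.go; the claim names and demo secret are invented (the real secret is auto-generated or taken from KUBEBOLT_JWT_SECRET):

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"strings"
	"time"
)

func b64(b []byte) string { return base64.RawURLEncoding.EncodeToString(b) }

// sign builds header.payload.signature with an HMAC-SHA256 signature
// over the base64url-encoded signing input.
func sign(secret []byte, claims map[string]any) (string, error) {
	header := b64([]byte(`{"alg":"HS256","typ":"JWT"}`))
	payload, err := json.Marshal(claims)
	if err != nil {
		return "", err
	}
	signingInput := header + "." + b64(payload)
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(signingInput))
	return signingInput + "." + b64(mac.Sum(nil)), nil
}

// verify recomputes the signature and compares in constant time.
// (A real verifier would also check the exp claim and the alg header.)
func verify(secret []byte, token string) bool {
	i := strings.LastIndex(token, ".")
	if i < 0 {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(token[:i]))
	return hmac.Equal([]byte(b64(mac.Sum(nil))), []byte(token[i+1:]))
}

func main() {
	secret := []byte("demo-secret")
	tok, _ := sign(secret, map[string]any{
		"sub":  "admin",
		"role": "Admin",
		"exp":  time.Now().Add(24 * time.Hour).Unix(),
	})
	fmt.Println("valid:", verify(secret, tok))
	fmt.Println("tampered key:", verify([]byte("wrong"), tok))
}
```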
Storage
User accounts are stored in a local BoltDB file. By default, the database is written to ./data/kubebolt.db. Use the KUBEBOLT_DATA_DIR environment variable to customize the storage path. In Kubernetes deployments, mount a PersistentVolume to this path for durability.
Environment Variables
| Variable | Default | Description |
|---|---|---|
KUBEBOLT_AUTH_ENABLED | true | Enable or disable authentication. Set to false to allow anonymous access. |
KUBEBOLT_ADMIN_PASSWORD | auto-generated | Override the default admin password on first boot. Ignored if admin user already exists. |
KUBEBOLT_JWT_SECRET | auto-generated | Secret key for signing JWT tokens. Auto-generated and persisted in BoltDB if not set. |
KUBEBOLT_DATA_DIR | ./data | Directory for BoltDB storage file. |
Helm Configuration
When deploying via Helm, configure auth through values:
# values.yaml
auth:
  enabled: true
  adminPassword: "my-secure-password"

# Or use an existing Kubernetes secret
auth:
  enabled: true
  existingSecret: "kubebolt-auth-secret"
  # Secret must contain keys: admin-password, jwt-secret
# Install with inline password
helm install kubebolt \
  oci://ghcr.io/clm-cloud-solutions/kubebolt/helm/kubebolt \
  --set auth.adminPassword="my-secure-password"

# Install with existing secret
helm install kubebolt \
  oci://ghcr.io/clm-cloud-solutions/kubebolt/helm/kubebolt \
  --set auth.existingSecret=kubebolt-auth-secret
Disabling Authentication
To run KubeBolt without authentication (e.g., behind a VPN or for local development):
# Local development
KUBEBOLT_AUTH_ENABLED=false go run cmd/server/main.go --kubeconfig ~/.kube/config

# Docker Compose (set in deploy/.env)
KUBEBOLT_AUTH_ENABLED=false

# Helm
helm install kubebolt \
  oci://ghcr.io/clm-cloud-solutions/kubebolt/helm/kubebolt \
  --set auth.enabled=false
Metrics Server
KubeBolt uses the Kubernetes Metrics Server for CPU/memory data.
How it Works
The Metrics Collector polls metrics.k8s.io/v1beta1 (PodMetrics and NodeMetrics) every 30 seconds. Results are stored in an in-memory cache — no database required.
Graceful Degradation
If Metrics Server is not installed, KubeBolt shows all resource state and events normally. CPU/memory bars display a message with a one-click install command. The Collector distinguishes between "not installed" and "403 Forbidden" via apierrors.IsForbidden().
Namespace-scoped Metrics
When cluster-wide metrics access is denied (namespace-scoped ServiceAccounts), the Collector falls back to per-namespace polling: PodMetricses(ns).List() for each accessible namespace.
Multi-Cluster
All kubeconfig contexts auto-discovered. Switch clusters in one click.
How it Works
The Cluster Manager reads all contexts from the kubeconfig file at startup. The initial connection targets the current-context. You can switch clusters at runtime via the API or the cluster selector in the UI.
Switching
When switching clusters, the manager tears down the old connector (informers, metrics collector, insights engine) and creates a new one for the target context. The permission probe runs again for the new cluster. The frontend shows a "Connecting to cluster" overlay during the switch.
API
GET /api/v1/clusters → list all contexts
POST /api/v1/clusters/switch → { "context": "production-eks" }
Notifications
Alert delivery to Slack, Discord and email when insights fire. Optional — disabled when no channel is configured.
Channels
Each channel activates when its connection details are set. You can enable one, two or all three at once.
- Slack — Block Kit messages with severity colour bar and a button that deep-links back to the resource in KubeBolt.
- Discord — Embed messages with equivalent formatting.
- Email (SMTP) — Plain/HTML email with three delivery modes: instant, hourly digest, or daily digest. Supports multiple recipients.
Global settings
Cross-channel knobs that apply to every notifier:
| Env var | Default | Purpose |
|---|---|---|
KUBEBOLT_NOTIFICATIONS_ENABLED | true | Master kill switch — set false for maintenance windows. |
KUBEBOLT_NOTIFICATIONS_MIN_SEVERITY | warning | Insights below this threshold don't notify. Values: critical, warning, info. |
KUBEBOLT_NOTIFICATIONS_COOLDOWN | 1h | Dedup window — the same insight fires at most once per cooldown. |
KUBEBOLT_NOTIFICATIONS_BASE_URL | — | Public URL of your KubeBolt — embedded as a deep link in messages. |
KUBEBOLT_NOTIFICATIONS_INCLUDE_RESOLVED | false | Also notify when an insight transitions to resolved. |
Slack
# Create an Incoming Webhook in Slack, then:
KUBEBOLT_SLACK_WEBHOOK_URL=https://hooks.slack.com/services/T.../B.../...
Discord
# Server Settings → Integrations → Webhooks → New Webhook, then:
KUBEBOLT_DISCORD_WEBHOOK_URL=https://discord.com/api/webhooks/...
Email (SMTP)
KUBEBOLT_SMTP_HOST=smtp.example.com
KUBEBOLT_SMTP_PORT=587
KUBEBOLT_SMTP_USERNAME=alerts@example.com
KUBEBOLT_SMTP_PASSWORD=...
KUBEBOLT_SMTP_FROM=KubeBolt <alerts@example.com>
KUBEBOLT_SMTP_TO=sre@example.com,oncall@example.com
KUBEBOLT_SMTP_DIGEST_MODE=hourly # instant | hourly | daily
The /admin/notifications page shows which channels are configured, surfaces the current global settings, and lets you send a test notification to each channel without restarting. Visible to authenticated users with the Admin role only.
REST API Reference
All endpoints under /api/v1.
Cluster
| Method | Endpoint | Description |
|---|---|---|
| GET | /clusters | List all kubeconfig contexts |
| POST | /clusters/switch | Switch active cluster |
| GET | /cluster/overview | Full cluster summary with counts, CPU/memory, health, events, workloads |
| GET | /cluster/permissions | Probed RBAC permissions per resource type |
Resources
| Method | Endpoint | Description |
|---|---|---|
| GET | /resources/:type | List with pagination (?limit=50), filtering (?namespace=, ?search=, ?status=) |
| GET | /resources/:type/:ns/:name | Detail with metrics injection |
| GET | /resources/:type/:ns/:name/yaml | Raw YAML (secrets redacted, managedFields stripped) |
| GET | /resources/pods/:ns/:name/logs | Pod logs (?container=, ?tailLines=100) |
| GET | /resources/:workload/:ns/:name/pods | Pods owned by deployment/statefulset/daemonset/job |
| GET | /resources/deployments/:ns/:name/history | Revision history via ReplicaSets |
Other
| Method | Endpoint | Description |
|---|---|---|
| GET | /topology | Full topology graph (nodes + edges) |
| GET | /insights | Active insights (?severity=critical,warning) |
| GET | /events | Events (?type=Warning, ?involvedName=, ?involvedKind=) |
| WS | /ws | WebSocket for real-time updates |
WebSocket Events
Real-time updates via /api/v1/ws.
Event Types
| Type | Description |
|---|---|
resource:updated | Kubernetes resource changed |
resource:deleted | Kubernetes resource removed |
event:new | New Kubernetes event |
insight:new | New insight detected |
insight:resolved | Insight resolved |
metrics:refresh | Metrics cache updated (every 30s) |
cluster:switched | Active cluster changed |
Broadcast buffer: 4096 messages. Messages dropped silently when no clients are connected to avoid log spam during cluster switches.
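The non-blocking drop behavior is a standard Go pattern: publish into a bounded channel with a select/default. A sketch (buffer shrunk and a drop counter added for the demo; the real hub drops silently):

```go
package main

import "fmt"

// hub broadcasts messages through a bounded buffer; when the buffer is
// full, Publish drops the message instead of blocking the informer
// callbacks that produce events.
type hub struct {
	broadcast chan string
	dropped   int
}

func newHub(size int) *hub { return &hub{broadcast: make(chan string, size)} }

// Publish never blocks: a full buffer means the message is dropped.
func (h *hub) Publish(msg string) {
	select {
	case h.broadcast <- msg:
	default:
		h.dropped++ // counted here for illustration only
	}
}

func main() {
	h := newHub(2)
	for i := 0; i < 5; i++ {
		h.Publish(fmt.Sprintf("resource:updated #%d", i))
	}
	fmt.Println("buffered:", len(h.broadcast), "dropped:", h.dropped)
}
```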
Amazon EKS Guide
Deploy KubeBolt on Amazon EKS clusters.
Authentication
- IRSA (IAM Roles for Service Accounts) — recommended for in-cluster deployment
- Pod Identity — newer EKS authentication method, also supported
- AWS CLI — for local/Docker deployment, mount
~/.awswith active SSO session
Docker Compose
The compose file already mounts ~/.aws for EKS token generation. Ensure your AWS profile/SSO session is active:
kubectl config use-context my-eks-cluster
cd deploy && docker compose up -d
Helm with ALB Ingress
helm install kubebolt \
oci://ghcr.io/clm-cloud-solutions/kubebolt/helm/kubebolt \
--set ingress.enabled=true \
--set ingress.className=alb \
--set ingress.host=kubebolt.example.com
EKS Fargate is compatible — KubeBolt doesn't require DaemonSets in Phase 1.
Google GKE Guide
Deploy KubeBolt on Google Kubernetes Engine.
Authentication
- Workload Identity — recommended for in-cluster deployments
- GKE comes with Metrics Server pre-installed
Helm with GCE Ingress
helm install kubebolt \
oci://ghcr.io/clm-cloud-solutions/kubebolt/helm/kubebolt \
--set ingress.enabled=true \
--set ingress.className=gce
GKE Autopilot clusters are compatible — no privileged containers needed.
Azure AKS Guide
Deploy KubeBolt on Azure Kubernetes Service.
Authentication
- Azure AD Workload Identity — recommended for in-cluster deployments
- Azure RBAC integration via kubeconfig
Helm with AGIC
helm install kubebolt \
oci://ghcr.io/clm-cloud-solutions/kubebolt/helm/kubebolt \
--set ingress.enabled=true \
--set ingress.className=azure-application-gateway
Docker Desktop Guide
Run KubeBolt with Docker Desktop's built-in Kubernetes.
The Problem
Docker Desktop K8s uses 127.0.0.1:6443 as the API server address. This works from your host machine, but not from inside a container (which has its own localhost).
The Solution
Use the helper script to rewrite the kubeconfig to use kubernetes.docker.internal instead:
# 1. Enable Kubernetes in Docker Desktop → Settings → Kubernetes → Enable
# 2. Switch context
kubectl config use-context docker-desktop

# 3. Generate container-compatible kubeconfig
./deploy/docker-kubeconfig.sh

# 4. Start
cd deploy && docker compose up -d
Metrics Server
Docker Desktop doesn't include Metrics Server. Install it manually:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Roadmap
Where KubeBolt is heading.
Phase 1.0 — Core Platform ✅
23 resource views, multi-cluster, Gateway API, insights engine, cluster map, RBAC detection, Docker Compose.
Phase 1.3 — Terminal & Actions ✅
Pod terminal (xterm.js), port-forwarding, restart/scale, describe, YAML editing (CodeMirror 6), delete, global search (⌘K).
Phase 1.4 — File Browser & History ✅
Exec-based file browser, StatefulSet/DaemonSet history via ControllerRevisions, CronJob child jobs, YAML export.
Phase 1.5 — Distribution ✅
Helm chart (OCI), multi-arch images (amd64/arm64), GitHub releases, Artifact Hub, cloud guides.
Phase 1.6 — Animated Map & Notifications ✅
Draggable map nodes with animated pulse halos, cluster management UI, Slack/Discord/email notifications with digest modes, global settings (master toggle, base URL, resolved-insight alerts).
Phase 1.7 — Authentication ✅
Built-in username/password auth with Admin/Editor/Viewer roles, user management UI, JWT sessions, BoltDB storage.
Phase 1.8 — AI Copilot ✅
In-app AI assistant with 16 cluster tools, multi-provider (Claude, GPT, Ollama, vLLM), BYO API key, SSE streaming, fallback model, structured logging with per-session token accounting.
Phase 1.9 — Extended Distribution ✅
Single binary with embedded frontend, Homebrew tap, Docker single-container, kubectl plugin (krew).
Phase 1.5.x — Copilot maturity ✅
Contextual "Ask Copilot" buttons across insights, resource detail pages and warning events. Conversation memory with auto-compact at 80% of the budget (via cheap-tier model) and manual "new session with summary". Admin usage analytics page with cost estimates (BoltDB-backed, 30-day retention). Product knowledge base tool (get_kubebolt_docs) and scope guardrail. Prompt caching for Anthropic and OpenAI.
Phase 2.0 — OTel-native agent
Lightweight DaemonSet agent as an optimized distribution of the OpenTelemetry Collector. <1% CPU, <50MB RAM per node, <40MB binary. Exports in parallel to the customer's OTel backend — citizen of the ecosystem, not a silo.
Phase 2.1 — Hierarchical AI agents
Six specialized layers with different autonomy and compute budgets: deterministic detectors → router (Haiku) → investigator (Sonnet) → planner → deterministic executor → postmortem. 70% resolved without AI, 25% with cheap AI, 5% with expensive AI.
Phase 3.0 — SaaS Platform
Multi-tenant platform with OAuth/SSO, team collaboration, custom dashboards, billing.
Contributing
KubeBolt is open source under the MIT license. Contributions are welcome.
Development Setup
Requires Go 1.25+ and Node 20+.
# Clone
git clone https://github.com/clm-cloud-solutions/kubebolt.git
cd kubebolt

# Backend
cd apps/api && go run cmd/server/main.go --kubeconfig ~/.kube/config

# Frontend (separate terminal)
cd apps/web && npm install && npm run dev
CI
GitHub Actions on push/PR to main:
- Backend: go build ./... (Go 1.25, ubuntu-latest)
- Frontend: npm ci && npm run build (Node 20, ubuntu-latest)
Repository Structure
kubebolt/
├── apps/api/          # Go backend
├── apps/web/          # React frontend
├── packages/agent/    # Phase 2 node agent (stub)
├── packages/shared/   # Shared Go utilities
├── deploy/            # Docker Compose + Helm + scripts
├── docs/              # SPEC.md + images
├── go.work            # Go workspace
├── CLAUDE.md          # Claude Code context
└── README.md