| title | description | author | ms.date | ms.topic | keywords | estimated_reading_time |
|---|---|---|---|---|---|---|
| Khepri Operator | Myth-inspired Kubernetes operator that restarts opted-in crash-looping workloads by deleting stuck pods after a configurable restart threshold and cooldown | GitHub Copilot | 2026-04-05 | overview | | 6 |
Overview
Khepri Operator is a small Kubernetes operator built with controller-runtime. It watches pods and deletes only those you have explicitly opted in through an annotation, and only when they remain in CrashLoopBackOff beyond a configured restart threshold.
The repository is intentionally lightweight. It does not include full Kubebuilder scaffolding, and it stays focused on the operator codebase itself.
Deployment manifests and cluster-specific rollout configuration live in a separate repository. This repository contains the operator source, tests, build logic, and CI pipeline.
Project layout
The repository follows the common Go convention of keeping the executable entrypoint under cmd/ and implementation details under internal/.
- `cmd/khepri-operator` contains the process entrypoint and CLI wiring
- `internal/config` contains configuration defaults, environment loading, and validation
- `internal/controller` contains the pod reconciliation logic and unit tests
How it works
Khepri targets only workloads that explicitly opt in through a pod-template annotation. When a managed pod is stuck in CrashLoopBackOff and has crossed the configured restart threshold, Khepri deletes the pod so its owner can recreate it.
Cooldown state is tracked inside the operator process and keyed by the pod's controller owner when it has one, falling back to the pod's own key otherwise. This prevents rapid delete loops against the same workload while keeping the opt-in annotation purely declarative.
Safety model
The operator is conservative by default.
- It only acts on pods annotated with `khepri.io/restart-on-crashloop=true`
- It requires a minimum restart count before deleting a pod
- It applies a cooldown window per controller owner or pod key
- It exposes health and readiness probes
- It supports leader election
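The opt-in and restart-threshold checks can be sketched as a single predicate. This is an illustration under stated assumptions: `containerStatus` is a hypothetical stand-in for the Kubernetes container status fields, and treating a restart count equal to the threshold as eligible (`>=`) is a guess, not confirmed behavior.

```go
package main

import "fmt"

// containerStatus is a hypothetical, trimmed-down stand-in for the
// Kubernetes container status fields this check reads.
type containerStatus struct {
	WaitingReason string // e.g. "CrashLoopBackOff"; empty if not waiting
	RestartCount  int32
}

// eligibleForDeletion requires the opt-in annotation with the exact expected
// value, plus at least one container in CrashLoopBackOff whose restart count
// is at or above the threshold (the >= comparison is an assumption).
func eligibleForDeletion(annotations map[string]string, statuses []containerStatus,
	annotationKey, annotationValue string, restartThreshold int32) bool {
	if v, ok := annotations[annotationKey]; !ok || v != annotationValue {
		return false
	}
	for _, s := range statuses {
		if s.WaitingReason == "CrashLoopBackOff" && s.RestartCount >= restartThreshold {
			return true
		}
	}
	return false
}

func main() {
	optedIn := map[string]string{"khepri.io/restart-on-crashloop": "true"}
	crashing := []containerStatus{{WaitingReason: "CrashLoopBackOff", RestartCount: 6}}

	fmt.Println(eligibleForDeletion(optedIn, crashing, "khepri.io/restart-on-crashloop", "true", 5)) // true
	fmt.Println(eligibleForDeletion(nil, crashing, "khepri.io/restart-on-crashloop", "true", 5))     // false: not opted in
}
```

The cooldown check from the previous section would run after this predicate, so an eligible pod can still be skipped while its owner is cooling down.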
Important
Deleting a crash-looping pod does not fix a broken image, bad configuration, or missing secret. Use this operator only for workloads where recreating the pod is an accepted remediation step.
Configuration
You can configure the operator with flags or environment variables.
| Setting | Flag | Environment variable | Default |
|---|---|---|---|
| Metrics bind address | `--metrics-bind-address` | `METRICS_BIND_ADDRESS` | `:8080` |
| Health probe address | `--health-probe-bind-address` | `HEALTH_PROBE_BIND_ADDRESS` | `:8081` |
| Watch namespace | `--watch-namespace` | `WATCH_NAMESPACE` | empty (all namespaces) |
| Opt-in annotation key | `--target-annotation` | `TARGET_ANNOTATION` | `khepri.io/restart-on-crashloop` |
| Opt-in annotation value | `--target-annotation-value` | `TARGET_ANNOTATION_VALUE` | `true` |
| Cooldown | `--cooldown` | `COOLDOWN` | `2m` |
| Restart threshold | `--restart-threshold` | `RESTART_THRESHOLD` | `5` |
| Leader election | `--leader-elect` | `LEADER_ELECTION` | `false` |
| Development logging | `--development-logging` | `DEVELOPMENT_LOGGING` | `false` |
Local development
Prerequisites
- Go 1.26 or newer
Build and test
```shell
make fmt
make vet
make test
make build
```
Run the operator locally
The operator uses your current Kubernetes context.
```shell
go run ./cmd/khepri-operator --development-logging=true
```
If you want to limit the operator to one namespace:
```shell
go run ./cmd/khepri-operator --watch-namespace=default --development-logging=true
```
Opt-in annotation
The operator ignores pods unless the workload's pod template carries the opt-in annotation.
This annotation is not used for cooldown tracking. It is only a stable configuration flag that tells Khepri which workloads it is allowed to touch.
Default key and value:
```text
khepri.io/restart-on-crashloop=true
```
CI pipeline
The Woodpecker pipeline covers source validation, Sonar analysis, and publish packaging.
- `fmt` fails if `gofmt` would rewrite tracked Go files
- `vet` runs static analysis with the Go toolchain
- `test` executes unit tests and generates a coverage profile
- `sonar` publishes code quality and coverage results from the generated Go coverage profile
- `build` confirms the Linux manager binary can be produced
- `docker` packages the CI-built binary into the runtime image for publish events
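The rule behind the `fmt` step can be illustrated in Go with the standard `go/format` package, whose `Source` function formats code in gofmt's canonical style. This is a hypothetical standalone checker, not the command the Woodpecker pipeline actually runs.

```go
package main

import (
	"bytes"
	"fmt"
	"go/format"
	"os"
)

// needsGofmt reports whether canonical gofmt formatting would change src.
// A syntax error is returned as an error rather than treated as "changed".
func needsGofmt(src []byte) (bool, error) {
	formatted, err := format.Source(src)
	if err != nil {
		return false, err
	}
	return !bytes.Equal(src, formatted), nil
}

func main() {
	// Check each Go file passed on the command line, as a CI step might.
	unformatted := 0
	for _, path := range os.Args[1:] {
		src, err := os.ReadFile(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(2)
		}
		changed, err := needsGofmt(src)
		if err != nil {
			fmt.Fprintf(os.Stderr, "%s: %v\n", path, err)
			os.Exit(2)
		}
		if changed {
			fmt.Println(path)
			unformatted++
		}
	}
	// A non-zero exit code is what makes a CI step fail.
	if unformatted > 0 {
		os.Exit(1)
	}
}
```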