khepri-operator

title: Khepri Operator
description: Myth-inspired Kubernetes operator that restarts opted-in crash-looping workloads by deleting stuck pods after a configurable restart threshold and cooldown
author: GitHub Copilot
ms.date: 2026-04-05
ms.topic: overview
keywords: kubernetes, operator, crashloopbackoff, controller-runtime
estimated_reading_time:

Overview

Khepri Operator is a small Kubernetes operator built with controller-runtime. It watches pods and deletes only those that carry an explicit opt-in annotation and remain in CrashLoopBackOff beyond a configured restart threshold.

The repository is intentionally lightweight. It does not include full Kubebuilder scaffolding, and it stays focused on the operator codebase itself.

Deployment manifests and cluster-specific rollout configuration live in a separate repository. This repository contains the operator source, tests, build logic, and CI pipeline.

Project layout

The repository follows the common Go convention of keeping the executable entrypoint under cmd/ and implementation details under internal/.

cmd/khepri-operator contains the process entrypoint and CLI wiring
internal/config contains configuration defaults, environment loading, and validation
internal/controller contains the pod reconciliation logic and unit tests

How it works

Khepri targets only workloads that explicitly opt in through a pod-template annotation. When a managed pod is stuck in CrashLoopBackOff and has crossed the configured restart threshold, Khepri deletes the pod so its owner can recreate it.
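The threshold check described above can be sketched in plain Go. This is a minimal illustration, not the operator's actual code: `ContainerStatus` is a simplified stand-in for the Kubernetes container status type, `exceedsThreshold` is a hypothetical helper name, and treating "crossed the threshold" as "restart count greater than or equal to the threshold" is an assumption.

```go
package main

import "fmt"

// ContainerStatus is a simplified, hypothetical stand-in for the fields of
// the Kubernetes container status that matter for this check.
type ContainerStatus struct {
	Name          string
	RestartCount  int32
	WaitingReason string // empty when the container is not in a waiting state
}

// crashLoopReason is the waiting reason reported for a container in back-off.
const crashLoopReason = "CrashLoopBackOff"

// exceedsThreshold reports whether any container is waiting in
// CrashLoopBackOff with at least threshold restarts.
// Assumption: the comparison is inclusive (>=).
func exceedsThreshold(statuses []ContainerStatus, threshold int32) bool {
	for _, s := range statuses {
		if s.WaitingReason == crashLoopReason && s.RestartCount >= threshold {
			return true
		}
	}
	return false
}

func main() {
	statuses := []ContainerStatus{
		{Name: "app", RestartCount: 7, WaitingReason: crashLoopReason},
		{Name: "sidecar", RestartCount: 0},
	}
	fmt.Println("delete candidate:", exceedsThreshold(statuses, 5))
}
```

A container that restarts often for another reason, such as an image pull failure, does not qualify in this sketch, because only the CrashLoopBackOff waiting reason is matched.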

Cooldown state is tracked in the operator process and keyed by owner when possible. That prevents rapid delete loops against the same workload while keeping the opt-in annotation purely declarative.
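One way to picture in-process cooldown tracking is a mutex-guarded map from a key to the time of the last deletion. The sketch below is an assumption about the general shape, not the operator's implementation; `cooldownTracker`, `newCooldownTracker`, and `allow` are hypothetical names, and the key string in `main` is purely illustrative.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// cooldownTracker remembers when each key was last acted on.
// The state lives only in memory.
type cooldownTracker struct {
	mu       sync.Mutex
	window   time.Duration
	lastSeen map[string]time.Time
}

func newCooldownTracker(window time.Duration) *cooldownTracker {
	return &cooldownTracker{window: window, lastSeen: make(map[string]time.Time)}
}

// allow reports whether an action for key is permitted at time now.
// When it returns true, it also records now as the latest action time.
func (c *cooldownTracker) allow(key string, now time.Time) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if last, ok := c.lastSeen[key]; ok && now.Sub(last) < c.window {
		return false // still inside the cooldown window
	}
	c.lastSeen[key] = now
	return true
}

func main() {
	tracker := newCooldownTracker(2 * time.Minute)
	start := time.Now()
	key := "default/ReplicaSet/web-abc" // illustrative key format

	fmt.Println(tracker.allow(key, start))                     // first action for this key
	fmt.Println(tracker.allow(key, start.Add(30*time.Second))) // inside the window
	fmt.Println(tracker.allow(key, start.Add(3*time.Minute)))  // window has elapsed
}
```

With a two-minute window, the three calls print `true`, `false`, and `true`. Because a tracker like this keeps its state in memory, the history would be lost whenever the process restarts.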

Safety model

The operator is conservative by default.

It only acts on pods annotated with khepri.io/restart-on-crashloop=true
It requires a minimum restart count before deleting a pod
It applies a cooldown window per controller owner or pod key
It exposes health and readiness probes
It supports leader election
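The "per controller owner or pod key" rule can be illustrated with a small key-derivation function. This is a sketch under assumptions: `OwnerRef` and `PodMeta` are simplified stand-ins for Kubernetes object metadata (the real owner reference uses a pointer for its controller flag), `cooldownKey` is a hypothetical name, and the key format is invented for the example.

```go
package main

import "fmt"

// OwnerRef is a simplified, hypothetical stand-in for a Kubernetes owner reference.
type OwnerRef struct {
	Kind       string
	Name       string
	Controller bool // true for the managing controller
}

// PodMeta is a simplified, hypothetical stand-in for pod metadata.
type PodMeta struct {
	Namespace string
	Name      string
	Owners    []OwnerRef
}

// cooldownKey prefers the controlling owner so that every replacement pod of
// the same workload shares one cooldown entry. Pods without a controller
// fall back to their own namespace and name.
func cooldownKey(p PodMeta) string {
	for _, o := range p.Owners {
		if o.Controller {
			return fmt.Sprintf("%s/%s/%s", p.Namespace, o.Kind, o.Name)
		}
	}
	return fmt.Sprintf("%s/Pod/%s", p.Namespace, p.Name)
}

func main() {
	managed := PodMeta{
		Namespace: "default",
		Name:      "web-abc-x1",
		Owners:    []OwnerRef{{Kind: "ReplicaSet", Name: "web-abc", Controller: true}},
	}
	bare := PodMeta{Namespace: "default", Name: "solo"}

	fmt.Println(cooldownKey(managed))
	fmt.Println(cooldownKey(bare))
}
```

Keying by owner matters because a recreated pod usually gets a new name. A key based only on the pod name would never match the replacement, so the cooldown would not slow down repeated deletions.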

Important

Deleting a crash-looping pod does not fix a broken image, bad configuration, or missing secret. Use this operator only for workloads where recreating the pod is an accepted remediation step.

Configuration

You can configure the operator with flags or environment variables.

Setting	Flag	Environment variable	Default
Metrics bind address	`--metrics-bind-address`	`METRICS_BIND_ADDRESS`	`:8080`
Health probe address	`--health-probe-bind-address`	`HEALTH_PROBE_BIND_ADDRESS`	`:8081`
Watch namespace	`--watch-namespace`	`WATCH_NAMESPACE`	empty (all namespaces)
Opt-in annotation key	`--target-annotation`	`TARGET_ANNOTATION`	`khepri.io/restart-on-crashloop`
Opt-in annotation value	`--target-annotation-value`	`TARGET_ANNOTATION_VALUE`	`true`
Cooldown	`--cooldown`	`COOLDOWN`	`2m`
Restart threshold	`--restart-threshold`	`RESTART_THRESHOLD`	`5`
Leader election	`--leader-elect`	`LEADER_ELECTION`	`false`
Development logging	`--development-logging`	`DEVELOPMENT_LOGGING`	`false`
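To make the table concrete, the sketch below loads two of the settings from environment variables, falls back to the documented defaults, and validates the result. It assumes nothing about how `internal/config` is written: `Config`, `loadConfig`, and the validation rules (non-negative cooldown, threshold of at least 1) are hypothetical, and flag handling and flag-over-environment precedence are left out.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
	"time"
)

// Config holds a subset of the documented settings.
type Config struct {
	Cooldown         time.Duration
	RestartThreshold int
}

// loadConfig reads COOLDOWN and RESTART_THRESHOLD through lookup and falls
// back to the documented defaults (2m and 5) when a variable is unset or empty.
func loadConfig(lookup func(string) (string, bool)) (Config, error) {
	cfg := Config{Cooldown: 2 * time.Minute, RestartThreshold: 5}

	if v, ok := lookup("COOLDOWN"); ok && v != "" {
		d, err := time.ParseDuration(v)
		if err != nil {
			return Config{}, fmt.Errorf("COOLDOWN: %w", err)
		}
		cfg.Cooldown = d
	}
	if v, ok := lookup("RESTART_THRESHOLD"); ok && v != "" {
		n, err := strconv.Atoi(v)
		if err != nil {
			return Config{}, fmt.Errorf("RESTART_THRESHOLD: %w", err)
		}
		cfg.RestartThreshold = n
	}

	// Assumed validation rules; the real checks may differ.
	if cfg.Cooldown < 0 {
		return Config{}, errors.New("cooldown must not be negative")
	}
	if cfg.RestartThreshold < 1 {
		return Config{}, errors.New("restart threshold must be at least 1")
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "invalid configuration:", err)
		os.Exit(1)
	}
	fmt.Printf("cooldown=%s restart-threshold=%d\n", cfg.Cooldown, cfg.RestartThreshold)
}
```

Passing the lookup function as a parameter keeps the loader testable without mutating the process environment. `COOLDOWN` accepts any value understood by `time.ParseDuration`, such as `90s` or `5m`.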

Local development

Prerequisites

Go 1.26 or newer

Build and test

make fmt
make vet
make test
make build

Run the operator locally

The operator uses your current Kubernetes context.

go run ./cmd/khepri-operator --development-logging=true

If you want to limit the operator to one namespace:

go run ./cmd/khepri-operator --watch-namespace=default --development-logging=true

Opt-in annotation

The operator ignores pods unless the workload's pod template carries the opt-in annotation.

This annotation is not used for cooldown tracking. It is only a stable configuration flag that tells Khepri which workloads it is allowed to touch.

Default key and value:

khepri.io/restart-on-crashloop=true
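The opt-in check itself amounts to a lookup in the pod's annotation map. The sketch below assumes an exact, case-sensitive match on both key and value; `isOptedIn` is a hypothetical helper name, and the real comparison may be more lenient.

```go
package main

import "fmt"

// Documented defaults for the opt-in annotation.
const (
	defaultAnnotationKey   = "khepri.io/restart-on-crashloop"
	defaultAnnotationValue = "true"
)

// isOptedIn reports whether annotations contains key with exactly the
// expected value. A nil map is treated as "not opted in".
func isOptedIn(annotations map[string]string, key, want string) bool {
	got, ok := annotations[key]
	return ok && got == want
}

func main() {
	optedIn := map[string]string{defaultAnnotationKey: "true"}
	optedOut := map[string]string{defaultAnnotationKey: "false"}

	fmt.Println(isOptedIn(optedIn, defaultAnnotationKey, defaultAnnotationValue))
	fmt.Println(isOptedIn(optedOut, defaultAnnotationKey, defaultAnnotationValue))
	fmt.Println(isOptedIn(nil, defaultAnnotationKey, defaultAnnotationValue))
}
```

Passing the key and value as parameters mirrors the `--target-annotation` and `--target-annotation-value` settings, which allow both to be changed from their defaults.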

CI pipeline

The Woodpecker pipeline covers source validation, Sonar analysis, and image packaging for publish events. It runs the following steps:

fmt fails if gofmt would rewrite tracked Go files
vet runs static analysis with the Go toolchain
test executes unit tests and generates a coverage profile
sonar publishes code quality and coverage results from the generated Go coverage profile
build confirms the Linux manager binary can be produced
docker packages the CI-built binary into the runtime image for publish events