title: Khepri Operator
description: Myth-inspired Kubernetes operator that restarts opted-in crash-looping workloads by deleting stuck pods after a configurable restart threshold and cooldown
author: GitHub Copilot
ms.date: 2026-04-05
ms.topic: overview
keywords: kubernetes, operator, crashloopbackoff, controller-runtime
estimated_reading_time: 6

Overview

Khepri Operator is a small Kubernetes operator built with controller-runtime. It watches pods and deletes only those that carry an explicit opt-in annotation and are still in CrashLoopBackOff after reaching a configured restart threshold.

The repository is intentionally lightweight. It does not include full Kubebuilder scaffolding, and it stays focused on the operator codebase itself.

Deployment manifests and cluster-specific rollout configuration live in a separate repository. This repository contains the operator source, tests, build logic, and CI pipeline.

Project layout

The repository follows the common Go convention of keeping the executable entrypoint under cmd/ and implementation details under internal/.

  • cmd/khepri-operator contains the process entrypoint and CLI wiring
  • internal/config contains configuration defaults, environment loading, and validation
  • internal/controller contains the pod reconciliation logic and unit tests

How it works

Khepri targets only workloads that explicitly opt in through a pod-template annotation. When a managed pod is stuck in CrashLoopBackOff and has crossed the configured restart threshold, Khepri deletes the pod so its owner can recreate it.
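
The decision described above can be sketched in a few lines of Go. This is an illustration under stated assumptions, not the operator's actual code: the `pod` and `containerStatus` types are simplified, hypothetical stand-ins for the Kubernetes API types, `shouldDelete` is a made-up name, and treating the threshold as inclusive (`>=`) is a guess.

```go
package main

import "fmt"

// containerStatus and pod are simplified, hypothetical stand-ins for the
// Kubernetes API types; they carry only the fields this sketch needs.
type containerStatus struct {
	RestartCount  int32
	WaitingReason string // empty when the container is not waiting
}

type pod struct {
	Name        string
	Annotations map[string]string
	Statuses    []containerStatus
}

// shouldDelete reports whether the pod is opted in and has at least one
// container waiting in CrashLoopBackOff with a restart count at or above
// the threshold. The inclusive comparison is an assumption.
func shouldDelete(p pod, key, value string, threshold int32) bool {
	if p.Annotations[key] != value {
		return false // not opted in: never touch this pod
	}
	for _, s := range p.Statuses {
		if s.WaitingReason == "CrashLoopBackOff" && s.RestartCount >= threshold {
			return true
		}
	}
	return false
}

func main() {
	p := pod{
		Name:        "web-abc",
		Annotations: map[string]string{"khepri.io/restart-on-crashloop": "true"},
		Statuses:    []containerStatus{{RestartCount: 7, WaitingReason: "CrashLoopBackOff"}},
	}
	fmt.Println(shouldDelete(p, "khepri.io/restart-on-crashloop", "true", 5))
}
```

Checking the annotation first keeps the function a no-op for every workload that has not opted in, whatever its container state.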

Cooldown state is tracked in the operator process and keyed by owner when possible. That prevents rapid delete loops against the same workload while keeping the opt-in annotation purely declarative.
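
One way to picture in-process cooldown tracking is a mutex-guarded map from a key to the time of the last delete. The sketch below rests on assumptions: `cooldownTracker`, `cooldownKey`, `Allow`, and the `at` helper are hypothetical names, and the key format (owner UID when present, otherwise namespace and pod name) is only one plausible reading of "keyed by owner when possible".

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// cooldownTracker remembers the last delete per key in process memory,
// so the state is lost when the operator restarts.
type cooldownTracker struct {
	mu     sync.Mutex
	window time.Duration
	last   map[string]time.Time
}

func newCooldownTracker(window time.Duration) *cooldownTracker {
	return &cooldownTracker{window: window, last: make(map[string]time.Time)}
}

// cooldownKey prefers the owner UID so that all pods of one workload share
// a window, and falls back to the pod's namespace and name (assumed format).
func cooldownKey(ownerUID, namespace, name string) string {
	if ownerUID != "" {
		return "owner/" + ownerUID
	}
	return "pod/" + namespace + "/" + name
}

// Allow reports whether a delete may proceed at time now and, if so,
// records now as the start of a new cooldown window.
func (c *cooldownTracker) Allow(key string, now time.Time) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if t, ok := c.last[key]; ok && now.Sub(t) < c.window {
		return false
	}
	c.last[key] = now
	return true
}

// at builds a timestamp a given number of seconds after the Unix epoch;
// it exists only to keep the example deterministic.
func at(seconds int64) time.Time { return time.Unix(seconds, 0) }

func main() {
	c := newCooldownTracker(2 * time.Minute)
	key := cooldownKey("uid-123", "default", "web-abc")
	fmt.Println(c.Allow(key, at(0)))   // first delete for this owner
	fmt.Println(c.Allow(key, at(30)))  // still inside the 2m window
	fmt.Println(c.Allow(key, at(180))) // window has elapsed
}
```

Passing the current time in as a parameter, rather than calling `time.Now()` inside `Allow`, keeps the tracker easy to unit test.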

Safety model

The operator is conservative by default.

  • It only acts on pods annotated with khepri.io/restart-on-crashloop=true
  • It requires a minimum restart count before deleting a pod
  • It applies a cooldown window per controller owner or pod key
  • It exposes health and readiness probes
  • It supports leader election

Important

Deleting a crash-looping pod does not fix a broken image, bad configuration, or missing secret. Use this operator only for workloads where recreating the pod is an accepted remediation step.

Configuration

You can configure the operator with flags or environment variables.

Setting                  Flag                         Environment variable       Default
Metrics bind address     --metrics-bind-address       METRICS_BIND_ADDRESS       :8080
Health probe address     --health-probe-bind-address  HEALTH_PROBE_BIND_ADDRESS  :8081
Watch namespace          --watch-namespace            WATCH_NAMESPACE            empty (all namespaces)
Opt-in annotation key    --target-annotation          TARGET_ANNOTATION          khepri.io/restart-on-crashloop
Opt-in annotation value  --target-annotation-value    TARGET_ANNOTATION_VALUE    true
Cooldown                 --cooldown                   COOLDOWN                   2m
Restart threshold        --restart-threshold          RESTART_THRESHOLD          5
Leader election          --leader-elect               LEADER_ELECTION            false
Development logging      --development-logging        DEVELOPMENT_LOGGING        false
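
As a rough illustration of how flags and environment variables can share one set of defaults, the sketch below uses only the Go standard library and covers three of the settings from the table. It is not taken from internal/config: the `config` struct, `envOr`, and `load` are hypothetical names, the precedence (flag over environment variable over built-in default) is an assumption, and so are the validation rules.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
	"strconv"
	"time"
)

// config holds a subset of the settings from the table above.
type config struct {
	WatchNamespace   string
	Cooldown         time.Duration
	RestartThreshold int
}

// envOr returns the environment value for key, or fallback when the
// variable is unset or empty.
func envOr(key, fallback string) string {
	if v, ok := os.LookupEnv(key); ok && v != "" {
		return v
	}
	return fallback
}

// load resolves each setting as flag > environment variable > default
// (assumed precedence) and applies assumed validation rules.
func load(args []string) (config, error) {
	var cfg config
	cooldown, err := time.ParseDuration(envOr("COOLDOWN", "2m"))
	if err != nil {
		return cfg, fmt.Errorf("COOLDOWN: %w", err)
	}
	threshold, err := strconv.Atoi(envOr("RESTART_THRESHOLD", "5"))
	if err != nil {
		return cfg, fmt.Errorf("RESTART_THRESHOLD: %w", err)
	}

	fs := flag.NewFlagSet("khepri-operator", flag.ContinueOnError)
	fs.StringVar(&cfg.WatchNamespace, "watch-namespace", envOr("WATCH_NAMESPACE", ""), "namespace to watch; empty means all")
	fs.DurationVar(&cfg.Cooldown, "cooldown", cooldown, "minimum time between deletes for one workload")
	fs.IntVar(&cfg.RestartThreshold, "restart-threshold", threshold, "restart count required before deleting a pod")
	if err := fs.Parse(args); err != nil {
		return cfg, err
	}

	if cfg.Cooldown <= 0 {
		return cfg, errors.New("cooldown must be positive")
	}
	if cfg.RestartThreshold < 1 {
		return cfg, errors.New("restart threshold must be at least 1")
	}
	return cfg, nil
}

func main() {
	cfg, err := load(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Feeding the environment value in as the flag default means an explicit flag always wins, while an unset flag falls through to the environment and then to the built-in default.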

Local development

Prerequisites

  • Go 1.26 or newer

Build and test

make fmt
make vet
make test
make build

Run the operator locally

The operator uses your current Kubernetes context.

go run ./cmd/khepri-operator --development-logging=true

If you want to limit the operator to one namespace:

go run ./cmd/khepri-operator --watch-namespace=default --development-logging=true

Opt-in annotation

The operator ignores pods unless the workload's pod template carries the opt-in annotation.

This annotation is not used for cooldown tracking. It is only a stable configuration flag that tells Khepri which workloads it is allowed to touch.

Default key and value:

khepri.io/restart-on-crashloop=true

CI pipeline

The Woodpecker pipeline covers source validation, Sonar analysis, and image packaging for publish events.

  • fmt fails if gofmt would rewrite tracked Go files
  • vet runs static analysis with the Go toolchain
  • test executes unit tests and generates a coverage profile
  • sonar publishes code quality and coverage results from the generated Go coverage profile
  • build confirms the Linux manager binary can be produced
  • docker packages the CI-built binary into the runtime image for publish events