Overseer
The Overseer is an ephemeral container that Holden spawns when it needs to recreate itself. Since Docker containers can’t modify their own labels or pull a fresh image while running, Holden delegates this to a short-lived Overseer process.
The Overseer is the same Docker image as Holden, started with the --overseer flag.
Idle Mode
When Holden spawns an Overseer, it sets an in-memory idle flag and stops reconciling apps. This is just a variable; no container renames or Docker API calls are needed.
If Holden restarts while idle (process crash, server reboot), the flag resets. Holden boots fresh and checks for a running holden-overseer or stale holden-next to decide whether to spawn a new Overseer.
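As a rough illustration of the idle flag, the sketch below assumes a reconcile loop that checks a plain in-memory boolean. The class and method names are hypothetical and are not taken from Holden's source.

```python
class Orchestrator:
    """Hypothetical stand-in for Holden's reconcile loop."""

    def __init__(self, apps):
        self.apps = list(apps)
        self.idle = False      # in-memory only, so it is lost on restart
        self.reconciled = []

    def enter_idle(self):
        # Called right before the Overseer is spawned.
        self.idle = True

    def reconcile_once(self):
        # While idle, skip all app reconciliation.
        if self.idle:
            return 0
        self.reconciled.extend(self.apps)
        return len(self.apps)


holden = Orchestrator(["blog", "api"])
count_before = holden.reconcile_once()
holden.enter_idle()
count_after = holden.reconcile_once()

# A restart is modelled as building a new object: the flag starts out False.
restarted = Orchestrator(["blog", "api"])
```

Because the flag lives only in process memory, a restart cannot leave Holden stuck in idle mode; the boot-time container checks take over instead.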
Container Names
During the update process:
| Name | Role |
|---|---|
| holden | Current orchestrator (idle while Overseer works) |
| holden-next | Replacement being health-checked |
| holden-overseer | Ephemeral Overseer process |
This follows the same -next convention used for zero-downtime app deployments.
When Holden Spawns an Overseer
Initial boot - When Holden starts and wasn’t created by the Overseer (no holden.created-at label), it spawns an Overseer to recreate itself with the correct labels.
Label changes - Docker labels are immutable on running containers. If you change HOLDEN_PUBLIC_DOMAIN, Holden needs new Traefik labels. The Overseer recreates Holden with the correct labels.
Maintenance - After cleanup, Holden always spawns an Overseer to recreate itself, even if no new image is available. This ensures a fresh state and queues all apps for reconciliation.
Recovery - If Holden boots and finds a holden-next container or running holden-overseer, a previous update was interrupted. It spawns an Overseer to clean up and retry.
The Overseer container is always named holden-overseer. Docker prevents duplicate container names, so only one Overseer can run at a time. If Holden tries to spawn an Overseer but one already exists, it knows an update is already in progress and enters idle mode.
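The boot-time checks and the name-based lock can be sketched as follows. This is a simplified model under stated assumptions: `NameRegistry` is a hypothetical in-memory stand-in for Docker's unique-name rule, and the function names are illustrative only.

```python
class NameConflict(Exception):
    """Raised when a container name is already taken."""


class NameRegistry:
    """Hypothetical in-memory model of Docker's unique container names."""

    def __init__(self, names=()):
        self.names = set(names)

    def create(self, name):
        if name in self.names:
            raise NameConflict(name)
        self.names.add(name)


def needs_overseer(own_labels, registry):
    """Return the reason a freshly booted Holden needs an Overseer, or None."""
    if "holden.created-at" not in own_labels:
        return "initial boot"
    if "holden-next" in registry.names or "holden-overseer" in registry.names:
        return "recovery"
    return None


def spawn_overseer(registry):
    """Try to claim the holden-overseer name; Holden goes idle either way."""
    try:
        registry.create("holden-overseer")
        return "spawned"
    except NameConflict:
        return "already running"


registry = NameRegistry(["holden"])
first_attempt = spawn_overseer(registry)
second_attempt = spawn_overseer(registry)
```

The fixed name acts as a mutex: no separate lock file or coordination service is needed, because the second create attempt fails on the name conflict.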
Trust Model
The Overseer doesn’t trust instructions from Holden. It inspects the current Holden container, reads the environment variables, and derives the correct configuration independently.
Bootstrap Data
The Overseer gets everything by inspecting the current Holden container:
| Data | Source | Examples |
|---|---|---|
| Environment vars | Container inspection | HOLDEN_PUBLIC_DOMAIN, HOLDEN_TRAEFIK_*, GITHUB_PAT, etc. |
| Image tag | Container inspection | e.g., holden:latest |
| Volume mounts | Container inspection | Docker socket, /data |
| Network memberships | Container inspection | Traefik network, default network |
| Port bindings | Container inspection | 6020:6020, 127.0.0.1:6021:6021 |
Labels (Traefik config) are derived from the environment variables.
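To make the table concrete, here is a minimal sketch of pulling those fields out of one `docker inspect` entry that has already been parsed from JSON. The field paths (`Config.Env`, `Config.Image`, `Mounts`, `NetworkSettings.Networks`, `HostConfig.PortBindings`) follow Docker's inspect output; the function name and all sample values are made up for illustration.

```python
def bootstrap_from_inspect(inspect):
    """Extract env vars, image, mounts, networks, and port bindings."""
    env = {}
    for item in inspect["Config"]["Env"]:
        # Split on the first "=" only, since values may contain "=" too.
        key, _, value = item.partition("=")
        env[key] = value
    return {
        "env": env,
        "image": inspect["Config"]["Image"],
        "mounts": [(m["Source"], m["Destination"]) for m in inspect["Mounts"]],
        "networks": sorted(inspect["NetworkSettings"]["Networks"]),
        "port_bindings": inspect["HostConfig"]["PortBindings"],
    }


# Sample data only; the paths and values are illustrative.
sample = {
    "Config": {
        "Image": "holden:latest",
        "Env": ["HOLDEN_PUBLIC_DOMAIN=holden.example.com", "GITHUB_PAT=a=b"],
    },
    "Mounts": [
        {"Source": "/var/run/docker.sock", "Destination": "/var/run/docker.sock"},
        {"Source": "/srv/holden", "Destination": "/data"},
    ],
    "NetworkSettings": {"Networks": {"traefik": {}, "default": {}}},
    "HostConfig": {
        "PortBindings": {
            "6020/tcp": [{"HostIp": "", "HostPort": "6020"}],
            "6021/tcp": [{"HostIp": "127.0.0.1", "HostPort": "6021"}],
        }
    },
}
data = bootstrap_from_inspect(sample)
```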
The Algorithm
sequenceDiagram
participant H as Holden
participant O as Overseer
participant D as Docker
participant G as Git
H->>H: Set idle flag
H->>D: Spawn holden-overseer container
O->>D: Find container with holden.orchestrator=true
O->>D: Inspect → get env vars, image tag, port bindings
O->>O: Parse env vars → get public_domain
O->>O: Generate Traefik labels
loop Until health check passes
O->>D: Remove holden-next if exists (cleanup)
O->>D: Pull fresh image
Note over O,D: Phase 1 — Validate
O->>D: Create holden-next (no port bindings)
O->>D: Wait for health check
alt Health check fails
O->>D: Remove holden-next
O->>O: Wait (exponential backoff, 1h cap)
end
end
O->>D: Remove holden-next
Note over O,D: Phase 2 — Swap
O->>D: Stop and remove holden
Note over H: Old Holden gone
O->>D: Create holden (with port bindings)
O->>D: Start holden
O->>O: Exit
The Overseer uses a two-phase approach to avoid port binding conflicts:
Phase 1 — Validate: Create holden-next without port bindings to confirm the image is healthy. The old holden container still holds the ports during validation. Once health check passes, remove holden-next.
Phase 2 — Swap: Stop and remove the old holden container (freeing the ports), then create a new holden container with the correct labels, env, volumes, networks, and port bindings replicated from the original.
There is a brief downtime (seconds) between removing the old container and starting the new one. Port bindings survive because they’re copied from the original container’s inspection data.
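The two phases can be simulated against an in-memory fake to show the ordering of operations. This is a sketch, not Holden's implementation: `FakeDocker`, `replace_holden`, and the `is_healthy` and `sleep` callbacks are all hypothetical names introduced for the example.

```python
class FakeDocker:
    """In-memory stand-in for the Docker API; records calls in `log`."""

    def __init__(self):
        self.containers = {"holden": {"ports": True}}
        self.log = []

    def remove(self, name):
        # Removing a missing container is a no-op, like a cleanup step.
        if self.containers.pop(name, None) is not None:
            self.log.append(f"remove {name}")

    def create(self, name, ports):
        if name in self.containers:
            raise RuntimeError(f"name conflict: {name}")
        self.containers[name] = {"ports": ports}
        self.log.append(f"create {name} ports={ports}")


def replace_holden(docker, is_healthy, sleep, initial=10, cap=3600):
    delay = initial
    while True:
        # Phase 1: validate a copy without ports next to the running holden.
        docker.remove("holden-next")   # stale leftover, if any
        docker.log.append("pull")
        docker.create("holden-next", ports=False)
        healthy = is_healthy("holden-next")
        docker.remove("holden-next")
        if healthy:
            break
        sleep(delay)
        delay = min(delay * 2, cap)
    # Phase 2: swap. The ports are free only once the old container is gone.
    docker.remove("holden")
    docker.create("holden", ports=True)


docker = FakeDocker()
results = iter([False, True])   # first health check fails, second passes
waits = []
replace_holden(docker, lambda name: next(results), waits.append)
```

In this run the old `holden` entry is not touched until a validation pass succeeds, which is why a failed health check costs only a retry and no downtime.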
Step by step:
1. Holden enters idle mode - Sets in-memory idle flag, spawns holden-overseer container with --overseer flag
2. Find current Holden - By holden.orchestrator=true label
3. Inspect container - Get env vars, image tag, volumes, networks, and port bindings
4. Parse env vars - Get HOLDEN_PUBLIC_DOMAIN, generate Traefik labels
5. Cleanup - Remove holden-next if it exists (from a previous failed run)
6. Pull fresh image - Same tag, but latest digest (force pull)
7. Create holden-next - Without port bindings (validation only)
8. Wait for health check - If it fails: remove holden-next, wait (exponential backoff: 10s initial, 2x multiplier, 1h cap), retry from step 5
9. Remove holden-next - Validation passed, no longer needed
10. Stop and remove old holden - Frees port bindings
11. Create new holden - With correct image, labels, env, volumes, networks, and port bindings
12. Overseer exits - Only after success (or if manually stopped)
The Overseer never gives up. If health checks keep failing, it retries forever (at 1h intervals once capped).
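The retry schedule described above (10s initial, 2x multiplier, 1h cap) can be written as a small generator. The function name and parameter names are illustrative; only the three numbers come from the text.

```python
from itertools import islice


def backoff_delays(initial=10, multiplier=2, cap=3600):
    """Yield retry delays in seconds, doubling until the cap is reached."""
    delay = initial
    while True:
        yield min(delay, cap)
        delay = min(delay * multiplier, cap)


first_twelve = list(islice(backoff_delays(), 12))
# [10, 20, 40, 80, 160, 320, 640, 1280, 2560, 3600, 3600, 3600]
```

With these values the cap is reached on the tenth wait, after which every retry is one hour apart.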
If the Overseer is stopped or crashes during validation (Phase 1), the original holden container is still running under its own name, so no recovery is needed. Any stale holden-next container is cleaned up on the next Overseer run.
Bootstrap Requirements
Your initial docker-compose.yml must include the discovery label:

    services:
      holden:
        image: benjick/holden:latest
        labels:
          - "holden.orchestrator=true" # Required for Overseer
        # ... rest of config

Without this label, the Overseer can’t find the Holden container to replace.
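This requirement can be checked mechanically before the first boot. The sketch below assumes the Compose file has already been parsed into a Python dict (for example with a YAML library) and handles both forms Compose accepts for `labels`, a list of `key=value` strings or a mapping. The helper name is hypothetical.

```python
DISCOVERY_LABEL = "holden.orchestrator"


def has_discovery_label(service):
    """True if a Compose service dict carries holden.orchestrator=true."""
    labels = service.get("labels", [])
    if isinstance(labels, dict):
        # Mapping form: labels: {holden.orchestrator: "true"}
        return str(labels.get(DISCOVERY_LABEL)).lower() == "true"
    # List form: labels: ["holden.orchestrator=true"]
    return f"{DISCOVERY_LABEL}=true" in labels


list_form = {"image": "benjick/holden:latest",
             "labels": ["holden.orchestrator=true"]}
map_form = {"image": "benjick/holden:latest",
            "labels": {"holden.orchestrator": "true"}}
missing = {"image": "benjick/holden:latest"}
```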
Updating docker-compose.yml
After the first boot, the running holden container is managed by the Overseer, not by Compose. If you change your docker-compose.yml (e.g., add an environment variable), Compose won’t update the running container because it doesn’t recognize it as its own.
To apply changes:

    docker rm -f holden && docker compose up -d --pull always holden

This removes the Overseer-managed container (freeing the name) and lets Compose create a fresh one with the updated config. Because that container has no holden.created-at label, Holden spawns an Overseer on boot, which recreates it with the correct labels.
Why This Design
Labels are immutable - Docker doesn’t allow modifying labels on a running container. The only way to change labels is to recreate the container.
Same pattern as app deploys - The Overseer uses the same -next convention as zero-downtime app deployments. Create the replacement, health-check it, then swap.
Clean recovery - If the Overseer crashes, the original holden container is still running with its correct name. No rename-back or special recovery logic needed — just clean up holden-next and retry.
Automatic rollback - If holden-next never becomes healthy, the Overseer removes it and retries. The original holden is untouched until the new one is confirmed healthy.
Git as source of truth - By always reading from git, the Overseer guarantees correctness. It doesn’t matter what state Holden was in or what instructions it tried to pass.
Generated Labels
The Overseer adds these labels to the new Holden container:
Creation timestamp - holden.created-at=<ISO timestamp> marks when this container was created. Its presence tells Holden it was properly created by the Overseer.
Traefik labels - When HOLDEN_PUBLIC_DOMAIN is set:

    traefik.enable: "true"
    traefik.http.routers.holden.rule: "Host(`holden.example.com`)"
    traefik.http.routers.holden.entrypoints: "websecure"
    traefik.http.routers.holden.tls.certresolver: "letsencrypt"
    traefik.http.services.holden.loadbalancer.server.port: "6020"

This routes webhook traffic from Traefik to Holden’s port 6020. The entrypoint and certresolver names are configurable. The internal API (6021) remains bound to localhost only, accessible via the CLI.
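Deriving these labels from the environment might look roughly like the sketch below. The function name is hypothetical, and the entrypoint and certresolver are taken as plain parameters because the exact environment variables that configure them are not spelled out here.

```python
from datetime import datetime, timezone


def generate_labels(env, entrypoint="websecure", certresolver="letsencrypt",
                    now=None):
    """Build the labels for the new Holden container from its env vars."""
    now = now or datetime.now(timezone.utc)
    labels = {"holden.created-at": now.isoformat()}
    domain = env.get("HOLDEN_PUBLIC_DOMAIN")
    if domain:
        # Traefik labels are added only when a public domain is configured.
        labels.update({
            "traefik.enable": "true",
            "traefik.http.routers.holden.rule": f"Host(`{domain}`)",
            "traefik.http.routers.holden.entrypoints": entrypoint,
            "traefik.http.routers.holden.tls.certresolver": certresolver,
            "traefik.http.services.holden.loadbalancer.server.port": "6020",
        })
    return labels


fixed = datetime(2024, 1, 1, tzinfo=timezone.utc)
with_domain = generate_labels({"HOLDEN_PUBLIC_DOMAIN": "holden.example.com"},
                              now=fixed)
without_domain = generate_labels({}, now=fixed)
```

Keeping this a pure function of the environment means the Overseer reaches the same labels no matter what state the old Holden container was in.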