
Reconciliation

Reconciliation compares desired state (from config) to running state (from Docker) and makes them match. This is the core of what Holden does when processing an app from the queue.

When an app is taken from the queue, Holden first checks whether anything has changed. If not, the app is skipped entirely.

```mermaid
flowchart TD
    A["Quick check: containers, git, image digests"] --> B{Changed?}
    B -->|No| C["Skip"]
    B -->|Yes| D["Fetch config from git"]
    D --> E["Resolve variables"]
    E --> F["Compare desired vs running"]
    F --> G["Apply changes"]
```

Before doing a full reconcile, Holden checks for changes cheaply:

  1. Container state: are all expected containers running? If any are missing or stopped, reconcile.
  2. Git changes: git ls-remote compares the remote HEAD to the last deployed commit. No clone needed.
  3. Image digests: for apps with update_policy: always (the default), check the registry for newer images.
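A minimal Python sketch of the git check, assuming Holden runs `git ls-remote <url> HEAD` and keeps the SHA of the last deployed commit. The function names are hypothetical; only the git command and its output format (a SHA, a tab, then the ref name) are real.

```python
import subprocess
from typing import Optional


def parse_ls_remote_head(output: str) -> Optional[str]:
    """Return the SHA that `git ls-remote` reports for HEAD, if any.

    Each output line is a SHA, a tab character, then the ref name.
    """
    for line in output.splitlines():
        sha, _, ref = line.partition("\t")
        if ref == "HEAD":
            return sha
    return None


def remote_head(repo_url: str) -> Optional[str]:
    # Asks the remote for its HEAD commit; nothing is cloned or fetched.
    # check=True makes a network failure raise CalledProcessError.
    result = subprocess.run(
        ["git", "ls-remote", repo_url, "HEAD"],
        capture_output=True, text=True, check=True, timeout=30,
    )
    return parse_ls_remote_head(result.stdout)


def git_changed(remote_sha: Optional[str], last_deployed_sha: str) -> bool:
    # Assumption: if no HEAD is advertised (None), it counts as a change.
    return remote_sha != last_deployed_sha
```

Only the small `ls-remote` response crosses the network, which is what makes this check cheap compared with a clone.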

If all checks pass, the app is skipped — no clone, no diff, no Docker API calls.

The quick check only runs when Holden has already reconciled the app at least once (it needs a previous state to compare against). First-time reconciliations always go through the full process.

Git network errors trigger a full reconcile (fail-open — don’t skip when you can’t verify). Image digest errors skip the check (fail-closed — don’t trigger a reconcile you might not need).
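The decision rules above can be sketched as a single function. This is an illustration, not Holden's code: `LastState`, `CheckError`, and the callables that stand in for Docker, git, and registry lookups are all hypothetical. It shows the order of the checks, the first-time rule, and the asymmetric error handling.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


class CheckError(Exception):
    """Raised by a check that could not reach git or the registry."""


@dataclass
class LastState:
    # Hypothetical record kept from the previous successful reconcile.
    commit: str
    digests: Dict[str, str] = field(default_factory=dict)  # image -> digest


def needs_full_reconcile(
    last: Optional[LastState],
    containers_ok: Callable[[], bool],
    remote_commit: Callable[[], str],
    remote_digests: Callable[[], Dict[str, str]],
    update_policy: str = "always",
) -> bool:
    # First-time reconcile: no previous state to compare against.
    if last is None:
        return True
    # 1. Container state: an expected container is missing or stopped.
    if not containers_ok():
        return True
    # 2. Git: a network error forces a reconcile (fail-open).
    try:
        if remote_commit() != last.commit:
            return True
    except CheckError:
        return True
    # 3. Image digests, only checked for update_policy "always".
    if update_policy == "always":
        try:
            if remote_digests() != last.digests:
                return True
        except CheckError:
            pass  # registry error: skip this check (fail-closed)
    return False
```

The checks short-circuit: once one reports a change, the later, more expensive ones are never run.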

When the quick check detects changes (or can’t run), Holden does a full reconcile:

  1. Fetch config from git (sparse checkout — only holden.yml and holden.vars.yml)
  2. Reconcile needs (postgres, valkey, etc.) — health-checked before proceeding
  3. Resolve variables (${needs.*}, ${config.*}, ${secret.*})
  4. Pull images — force-pull when update_policy is always, skip if image exists locally otherwise
  5. Compare desired state to running state
  6. Apply changes
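Step 3 can be pictured as plain string substitution. The sketch below assumes each namespace (`needs`, `config`, `secret`) resolves to a flat mapping of keys to strings and that keys may contain dots (for example `postgres.url`); both the key format and the `resolve` function are hypothetical. Needs are reconciled in step 2, before this step, which is presumably what makes their values available to `${needs.*}` references.

```python
import re
from typing import Dict

# Matches ${namespace.key}, e.g. ${needs.postgres.url} or ${secret.API_KEY}.
_VAR = re.compile(r"\$\{(needs|config|secret)\.([A-Za-z0-9_.\-]+)\}")


def resolve(text: str, values: Dict[str, Dict[str, str]]) -> str:
    """Substitute ${needs.*}, ${config.*} and ${secret.*} references.

    Unknown references raise KeyError, so a typo fails the deploy
    instead of silently becoming an empty string.
    """
    def substitute(match: re.Match) -> str:
        namespace, key = match.group(1), match.group(2)
        try:
            return values[namespace][key]
        except KeyError:
            raise KeyError(f"unresolved variable ${{{namespace}.{key}}}") from None

    return _VAR.sub(substitute, text)
```

For example, `resolve("DB=${needs.postgres.url}", values)` returns the connection string stored under `values["needs"]["postgres.url"]`.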

Webhooks, holden deploy, and holden deploy <app> always trigger a full reconcile with force-pull — they bypass the quick check.
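The pull rule from step 4, together with the forced pull for webhooks and holden deploy, reduces to a small decision function. `should_pull` is a hypothetical name, and the code assumes that any policy other than `always` means "pull only if missing", since no other policy values are named here.

```python
def should_pull(update_policy: str, image_present: bool, forced: bool) -> bool:
    """Decide whether to pull an image during a full reconcile.

    forced is True for webhook and `holden deploy` triggers.
    """
    if forced or update_policy == "always":
        # Force-pull so a newer image pushed under the same tag is picked up.
        return True
    # Any other policy: pull only when the image isn't present locally.
    return not image_present
```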

For each service in the config, Holden compares desired state to running state:

| Situation | Action |
|---|---|
| New service (not running) | Pull image, create container |
| Config changed | Zero-downtime update if health check defined, otherwise stop-and-recreate |
| New image available | Zero-downtime update if health check defined, otherwise stop-and-recreate |
| No changes | Nothing |
| Service removed from config | Remove container |

Running state is discovered via labels — containers with holden.managed=true.
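The comparison table maps naturally onto a diff between two dictionaries. The following is a sketch under assumptions: services and containers are keyed by service name, and a change is detected by comparing an image digest and a hash of the resolved config. `Desired`, `Running`, `config_hash`, and `plan` are hypothetical. Only containers labeled `holden.managed=true` are considered, so unrelated containers on the host are never touched.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Desired:
    image_digest: str
    config_hash: str  # hypothetical hash of the service's resolved config
    health_check: bool


@dataclass
class Running:
    image_digest: str
    config_hash: str
    labels: Dict[str, str]


def plan(desired: Dict[str, Desired], running: Dict[str, Running]) -> Dict[str, str]:
    # Only containers labeled holden.managed=true belong to Holden.
    managed = {name: c for name, c in running.items()
               if c.labels.get("holden.managed") == "true"}
    actions: Dict[str, str] = {}
    for name, want in desired.items():
        have = managed.get(name)
        if have is None:
            actions[name] = "create"
        elif (have.config_hash, have.image_digest) != (want.config_hash, want.image_digest):
            # Config changed or new image: the health check decides the strategy.
            actions[name] = "zero-downtime update" if want.health_check else "stop-and-recreate"
        else:
            actions[name] = "nothing"
    # Managed containers whose service is no longer in the config.
    for name in managed.keys() - desired.keys():
        actions[name] = "remove"
    return actions
```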

Needs containers (postgres, valkey, etc.) are reconciled first and must pass health checks before services start. Holden waits up to 45 seconds for each needs container to become healthy — if the timeout expires, the deploy fails. This ensures databases are ready before your app tries to connect.
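The 45-second wait can be sketched as a polling loop with a deadline. `wait_healthy`, `NeedsUnhealthy`, and the one-second poll interval are assumptions; `is_healthy` stands in for whatever health signal Holden reads (for example, a container's Docker health status). The clock and sleep functions are injectable so the timeout logic can be tested without real waiting.

```python
import time
from typing import Callable


class NeedsUnhealthy(Exception):
    """A needs container did not become healthy before the deadline."""


def wait_healthy(
    name: str,
    is_healthy: Callable[[], bool],
    timeout: float = 45.0,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll until the needs container is healthy; fail the deploy on timeout."""
    deadline = clock() + timeout
    while True:
        if is_healthy():
            return
        if clock() >= deadline:
            raise NeedsUnhealthy(f"{name} not healthy after {timeout:.0f}s")
        sleep(interval)
```

Using a monotonic clock keeps the deadline correct even if the system time is adjusted during the wait.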

All containers are created with restart: unless-stopped — this isn’t configurable. Holden manages services that should be running; the restart policy handles what Docker does between reconciliation runs.
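For illustration, this is roughly what a container-create request body to the Docker Engine API could look like for a managed service. `Image`, `Env`, `Labels`, and `HostConfig.RestartPolicy` are real Engine API fields; the function name, and the assumption that Holden builds its request this way, are hypothetical.

```python
from typing import Any, Dict


def container_create_body(image: str, env: Dict[str, str]) -> Dict[str, Any]:
    """Build a Docker Engine API container-create body for a managed service."""
    return {
        "Image": image,
        "Env": [f"{key}={value}" for key, value in sorted(env.items())],
        # Lets Holden find its own containers on the next reconcile.
        "Labels": {"holden.managed": "true"},
        "HostConfig": {
            # Fixed, not configurable: Docker restarts the container whenever
            # it exits (and after a daemon restart) unless it was explicitly stopped.
            "RestartPolicy": {"Name": "unless-stopped"},
        },
    }
```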

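Because the restart policy is fixed, Holden isn’t suited to one-shot tasks: a container that exits would simply be restarted. For those, use docker run directly.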

The queue worker processes apps sequentially. If app A takes a long time (slow image pull, hanging health check), apps B, C, D wait their turn.

If a webhook or poll timer pushes app A again while it’s still being processed, the push is a no-op (the queue is deduplicated). Once the current reconciliation finishes, app A won’t be in the queue unless something pushed it again after completion.
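A minimal sketch of a deduplicated queue with a sequential worker, matching the behavior described above: a push is dropped if the app is already waiting or currently being processed. `AppQueue` and `drain` are hypothetical names, and the sketch is single-threaded for clarity. A real worker receiving pushes from webhook handlers would need a lock around the queue.

```python
from collections import deque
from typing import Callable, Optional


class AppQueue:
    """FIFO of app names in which duplicate pushes are dropped."""

    def __init__(self) -> None:
        self._items: deque = deque()
        self._queued: set = set()
        self._current: Optional[str] = None

    def push(self, app: str) -> bool:
        # No-op if the app is already waiting or is being processed now.
        if app in self._queued or app == self._current:
            return False
        self._items.append(app)
        self._queued.add(app)
        return True

    def take(self) -> Optional[str]:
        if not self._items:
            return None
        app = self._items.popleft()
        self._queued.discard(app)
        self._current = app
        return app

    def done(self) -> None:
        self._current = None


def drain(queue: AppQueue, reconcile: Callable[[str], None]) -> None:
    # One app at a time: a slow reconcile delays every app behind it.
    while (app := queue.take()) is not None:
        try:
            reconcile(app)
        finally:
            queue.done()
```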

If Holden crashes mid-deploy, it may leave a -next container behind (from a zero-downtime deployment). On restart:

  1. Holden boots and queues all apps
  2. When the app is processed, reconciliation sees the -next container
  3. Removes it as a leftover
  4. Proceeds with a fresh deploy

The old container keeps running throughout — no downtime from the crash. The deploy just restarts from the beginning.
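The recovery steps can be sketched as follows, assuming leftovers are recognizable by a `-next` suffix on the container name. The exact naming scheme, the function names, and the `remove`/`deploy` callables are hypothetical.

```python
from typing import Callable, List, Tuple


def recovery_plan(container_names: List[str]) -> Tuple[List[str], List[str]]:
    """Split an app's containers into leftovers to remove and ones to keep.

    A name ending in "-next" is assumed to be the unfinished replacement
    from a zero-downtime deploy that was interrupted by a crash.
    """
    leftovers = [n for n in container_names if n.endswith("-next")]
    keep = [n for n in container_names if not n.endswith("-next")]
    return leftovers, keep


def recover_and_deploy(
    container_names: List[str],
    remove: Callable[[str], None],
    deploy: Callable[[], None],
) -> None:
    leftovers, _ = recovery_plan(container_names)
    for name in leftovers:
        remove(name)  # the old container is untouched and keeps serving
    deploy()  # start the deploy again from the beginning
```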

Holden re-queues all apps every HOLDEN_POLL_INTERVAL seconds (default: 300). Set it to 0 to disable polling.
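A sketch of the poll timer, assuming the interval is read from the HOLDEN_POLL_INTERVAL environment variable as described. `poll_interval`, `start_poller`, `list_apps`, and `push` are hypothetical names. `threading.Event.wait` serves as an interruptible sleep, so the poller can be stopped without waiting out a full interval.

```python
import os
import threading
from typing import Callable, List, Mapping, Optional


def poll_interval(env: Mapping[str, str] = os.environ) -> float:
    """Seconds between re-queues: 300 if unset, 0 turns polling off."""
    return float(env.get("HOLDEN_POLL_INTERVAL", "300"))


def start_poller(
    list_apps: Callable[[], List[str]],
    push: Callable[[str], bool],
    interval: float,
    stop: threading.Event,
) -> Optional[threading.Thread]:
    if interval == 0:
        return None  # polling disabled; webhooks and holden deploy still queue apps
    def loop() -> None:
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(interval):
            for app in list_apps():
                push(app)  # the queue drops apps that are already pending
    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```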

For users with webhooks, polling acts as a safety net — catching anything webhooks might have missed (network issues, GitHub outages) and detecting drift from external changes (someone manually stopped a container).

For users without webhooks, polling is the primary trigger for detecting changes.