Reconciliation

Reconciliation compares desired state (from config) to running state (from Docker) and makes them match. This is the core of what Holden does when processing an app from the queue.

The Process

When an app is taken from the queue, Holden first checks whether anything has changed. If not, the app is skipped entirely.

flowchart TD
    A["Quick check: containers, git, image digests"] --> B{Changed?}
    B -->|No| C["Skip"]
    B -->|Yes| D["Fetch config from git"]
    D --> E["Resolve variables"]
    E --> F["Compare desired vs running"]
    F --> G["Apply changes"]

Quick Check

Before doing a full reconcile, Holden checks for changes cheaply:

Container state — Are all expected containers running? If any are missing or stopped, reconcile.
Git changes — git ls-remote compares the remote HEAD to the last deployed commit. No clone needed.
Image digests — For apps with update_policy: always (the default), check the registry for newer images.
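
As an illustration of the git check, the sketch below reads the remote HEAD with `git ls-remote` and compares it to the last deployed commit. The function names (`parse_ls_remote`, `remote_head`, `git_changed`) and the 30-second timeout are assumptions made for this example, not Holden's actual internals.

```python
import subprocess
from typing import Optional


def parse_ls_remote(output: str, ref: str = "HEAD") -> Optional[str]:
    """Return the commit hash listed for `ref` in `git ls-remote` output."""
    for line in output.splitlines():
        # Each line has the form "<sha>\t<ref>".
        parts = line.split("\t")
        if len(parts) == 2 and parts[1] == ref:
            return parts[0]
    return None


def remote_head(repo_url: str) -> Optional[str]:
    """Ask the remote for its HEAD commit without cloning anything."""
    result = subprocess.run(
        ["git", "ls-remote", repo_url, "HEAD"],
        capture_output=True,
        text=True,
        check=True,   # a non-zero exit raises CalledProcessError
        timeout=30,   # hypothetical limit chosen for this sketch
    )
    return parse_ls_remote(result.stdout)


def git_changed(remote_sha: Optional[str], last_deployed_sha: str) -> bool:
    """True when the remote HEAD differs from the last deployed commit."""
    return remote_sha != last_deployed_sha
```

Keeping the parser separate from the subprocess call makes the comparison easy to test without network access.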

If none of these checks detects a change, the app is skipped: no clone, no diff, and no Docker API calls beyond the container state check.

The quick check only runs when Holden has already reconciled the app at least once (it needs a previous state to compare against). First-time reconciliations always go through the full process.

The two network-dependent checks fail in opposite directions. A git network error triggers a full reconcile (fail-open: don’t skip when you can’t verify). An image digest error only skips the digest check (fail-closed: don’t trigger a reconcile you might not need).
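
The quick-check rules above can be summarised as one decision function. This is a minimal sketch under stated assumptions: the three checks are passed in as callables, `CheckError` is a hypothetical exception standing in for a git or registry network failure, and none of these names come from Holden itself.

```python
from typing import Callable


class CheckError(Exception):
    """Hypothetical error: a check could not reach git or the registry."""


def needs_reconcile(
    has_previous_state: bool,
    containers_ok: Callable[[], bool],
    git_changed: Callable[[], bool],
    newer_image: Callable[[], bool],
    update_policy: str = "always",
) -> bool:
    """Return True when a full reconcile should run."""
    # First-time reconciliations have nothing to compare against.
    if not has_previous_state:
        return True
    # Missing or stopped containers always trigger a reconcile.
    if not containers_ok():
        return True
    try:
        if git_changed():
            return True
    except CheckError:
        # Git could not be verified, so reconcile rather than skip.
        return True
    if update_policy == "always":
        try:
            if newer_image():
                return True
        except CheckError:
            # Digest lookup failed: skip this check only.
            pass
    return False
```

Note the asymmetry in the two `except` branches: a git failure returns `True`, while a digest failure falls through to the final `return False`.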

Full Reconcile

When the quick check detects changes (or can’t run), Holden does a full reconcile:

Fetch config from git (sparse checkout — only holden.yml and holden.vars.yml)
Reconcile needs (postgres, valkey, etc.) — health-checked before proceeding
Resolve variables (${needs.*}, ${config.*}, ${secret.*})
Pull images — force-pull when update_policy is always, skip if image exists locally otherwise
Compare desired state to running state
Apply changes
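
To make the variable resolution step concrete, here is a small sketch that substitutes `${needs.*}`, `${config.*}`, and `${secret.*}` references in a string. The flat `scopes` dictionary, the example keys, and the choice to raise on an unknown variable are assumptions for illustration; Holden's real lookup rules may differ.

```python
import re

# Matches ${needs.KEY}, ${config.KEY}, and ${secret.KEY}.
_VAR = re.compile(r"\$\{(needs|config|secret)\.([^}]+)\}")


def resolve(text: str, scopes: dict) -> str:
    """Replace every ${scope.key} reference with its value from `scopes`."""

    def replace(match: re.Match) -> str:
        scope, key = match.group(1), match.group(2)
        try:
            return scopes[scope][key]
        except KeyError:
            # Failing loudly is a choice made for this sketch.
            raise KeyError(f"unresolved variable ${{{scope}.{key}}}") from None

    return _VAR.sub(replace, text)


# Hypothetical values, for demonstration only.
scopes = {
    "needs": {"postgres.url": "postgres://db:5432/app"},
    "config": {"port": "8080"},
    "secret": {"api_key": "s3cr3t"},
}
line = resolve("DATABASE_URL=${needs.postgres.url} PORT=${config.port}", scopes)
```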

Webhooks, holden deploy, and holden deploy <app> always trigger a full reconcile with force-pull — they bypass the quick check.

Comparing State

For each service in the config, Holden compares desired state to running state:

Situation	Action
New service (not running)	Pull image, create container
Config changed	Zero-downtime update if health check defined, otherwise stop-and-recreate
New image available	Zero-downtime update if health check defined, otherwise stop-and-recreate
No changes	Nothing
Service removed from config	Remove container

Running state is discovered via labels — containers with holden.managed=true.
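
The comparison table can be read as a planning function that maps each service to one action. In this sketch the `Desired` and `Running` records, the `config_hash` and `image_digest` fields, and the action strings are hypothetical; only the `holden.managed=true` label is taken from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Desired:
    config_hash: str
    image_digest: str
    has_health_check: bool


@dataclass(frozen=True)
class Running:
    config_hash: str
    image_digest: str


def managed_only(containers: list) -> list:
    """Keep only containers carrying the holden.managed=true label."""
    return [c for c in containers if c.get("labels", {}).get("holden.managed") == "true"]


def plan(desired: dict, running: dict) -> dict:
    """Map each service name to the action the table prescribes."""
    actions = {}
    for name, want in desired.items():
        have = running.get(name)
        if have is None:
            actions[name] = "create"
        elif (have.config_hash, have.image_digest) != (want.config_hash, want.image_digest):
            # Config change and new image are handled the same way.
            actions[name] = (
                "zero-downtime-update" if want.has_health_check else "stop-and-recreate"
            )
        else:
            actions[name] = "none"
    # Anything still running but no longer in the config is removed.
    for name in running:
        if name not in desired:
            actions[name] = "remove"
    return actions
```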

Needs First

Needs containers (postgres, valkey, etc.) are reconciled first and must pass health checks before services start. Holden waits up to 45 seconds for each needs container to become healthy — if the timeout expires, the deploy fails. This ensures databases are ready before your app tries to connect.
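
The wait described above amounts to polling a health probe until a deadline. The sketch below assumes a one-second poll interval and an `is_healthy` callable supplied by the caller; both are illustrative choices, and only the 45-second limit comes from the text.

```python
import time
from typing import Callable


def wait_healthy(
    is_healthy: Callable[[], bool],
    timeout: float = 45.0,
    interval: float = 1.0,  # assumed poll interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `is_healthy` until it returns True or `timeout` seconds pass."""
    deadline = clock() + timeout
    while True:
        if is_healthy():
            return True
        if clock() >= deadline:
            # The caller treats this as a failed deploy.
            return False
        sleep(interval)
```

Passing `clock` and `sleep` as parameters lets the timeout path be exercised in tests without actually waiting.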

Restart Policy

All containers are created with restart: unless-stopped — this isn’t configurable. Holden manages services that should be running; the restart policy handles what Docker does between reconciliation runs.

For one-shot tasks, use docker run directly.

Queue Isolation

The queue worker processes apps sequentially. If app A takes a long time (slow image pull, hanging health check), apps B, C, D wait their turn.

If a webhook or poll timer pushes app A again while it’s still being processed, the push is a no-op because the queue is deduplicated. Once the current reconciliation finishes, app A is not queued again until something pushes it after completion.
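
One way to get this behaviour is to track an app as "known" from the moment it is pushed until its reconciliation finishes. The `DedupQueue` class and its method names below are hypothetical, a sketch of the idea rather than Holden's queue.

```python
from collections import deque
from typing import Optional


class DedupQueue:
    """FIFO queue that ignores pushes for apps already waiting or in progress."""

    def __init__(self) -> None:
        self._waiting = deque()
        self._known = set()  # apps that are waiting or being processed

    def push(self, app: str) -> bool:
        """Queue `app`; return False if the push was a no-op."""
        if app in self._known:
            return False
        self._known.add(app)
        self._waiting.append(app)
        return True

    def take(self) -> Optional[str]:
        """Hand the next app to the worker; it stays known until done()."""
        return self._waiting.popleft() if self._waiting else None

    def done(self, app: str) -> None:
        """Mark the reconciliation finished so later pushes are accepted."""
        self._known.discard(app)
```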

Crash Recovery

If Holden crashes mid-deploy, it may leave a -next container behind (from a zero-downtime deployment). On restart:

Holden boots and queues all apps
When the app is processed, reconciliation sees the -next container
Removes it as a leftover
Proceeds with a fresh deploy

The old container keeps running throughout — no downtime from the crash. The deploy just restarts from the beginning.
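
The cleanup step can be pictured as splitting the discovered containers into leftovers and survivors. This sketch assumes leftovers are identified purely by a `-next` name suffix; the function name and the example container names are made up for illustration.

```python
def split_leftovers(container_names: list, suffix: str = "-next") -> tuple:
    """Separate interrupted-deploy leftovers from containers to keep."""
    leftovers = sorted(n for n in container_names if n.endswith(suffix))
    survivors = sorted(n for n in container_names if not n.endswith(suffix))
    return leftovers, survivors


# Hypothetical names: "web" kept serving while "web-next" was being started.
to_remove, to_keep = split_leftovers(["web", "web-next", "worker"])
```

Here `to_remove` holds the interrupted deployment's container, while the containers in `to_keep` are left untouched until the fresh deploy replaces them.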

Polling

Holden re-queues all apps every HOLDEN_POLL_INTERVAL seconds (default: 300). Set to 0 to disable.
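
A sketch of how the setting might be read, assuming the value is a plain non-negative integer in the environment. The `poll_interval` function and its error handling are illustrative; only the variable name, the default of 300, and the meaning of 0 come from the text.

```python
import os
from typing import Mapping, Optional


def poll_interval(env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Return the polling interval in seconds, or None when polling is disabled."""
    raw = env.get("HOLDEN_POLL_INTERVAL", "300")
    seconds = int(raw)  # raises ValueError on non-numeric input
    if seconds < 0:
        raise ValueError("HOLDEN_POLL_INTERVAL must be 0 or greater")
    # 0 disables polling entirely.
    return seconds if seconds > 0 else None
```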

For users with webhooks, polling acts as a safety net — catching anything webhooks might have missed (network issues, GitHub outages) and detecting drift from external changes (someone manually stopped a container).

For users without webhooks, polling is the primary trigger for detecting changes.