Maintenance

Holden runs a nightly maintenance window that stops all containers, backs up data, and ensures all apps are reconciled with their latest configuration.

Every app stops during the maintenance window, even if there’s nothing to back up:

  • Consistent backups — Databases and volumes are backed up while stopped, ensuring data consistency
  • Clears memory leaks — Periodic restarts prevent gradual memory bloat
  • Forces stateless design — If your app can’t handle restarts, you’ll find out during maintenance, not during a crisis
  • Predictable window — All restarts happen at 3am (or whenever you schedule), not randomly throughout the day

When maintenance starts:

  1. Holden sets the isMaintenance flag to true
  2. The queue worker sees the flag and stops picking up new jobs
  3. Wait for any in-flight job to complete (e.g., a deployment mid-healthcheck)
  4. Proceed with maintenance

The flag is never cleared — this Holden instance will die at the end of maintenance anyway. The fresh instance starts with the flag unset.
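
The drain sequence above can be sketched roughly as follows. This is an illustration only, not Holden's actual code: the QueueWorker class, its method names, and the use of a thread are all assumptions made for the example. Only the idea of a one-way maintenance flag comes from the text.

```python
import queue
import threading


class QueueWorker:
    """Hypothetical worker that runs queued jobs one at a time."""

    def __init__(self) -> None:
        self.jobs = queue.Queue()
        # One-way flag: set at the start of maintenance and never cleared,
        # because the process is replaced at the end of maintenance.
        self.is_maintenance = threading.Event()
        self.thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self.thread.start()

    def _run(self) -> None:
        # The flag is checked before every pickup, so no new job starts
        # once maintenance has begun.
        while not self.is_maintenance.is_set():
            try:
                job = self.jobs.get(timeout=0.05)
            except queue.Empty:
                continue
            job()  # an in-flight job always runs to completion

    def begin_maintenance(self) -> None:
        self.is_maintenance.set()  # step 1: raise the flag
        self.thread.join()         # step 3: wait for the in-flight job
```

Joining the worker thread, rather than polling a separate "busy" flag, avoids a race where a job is picked up just as the flag is raised.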

Holden processes apps one at a time, so each app is down only while its own steps run:

  1. Stop containers
  2. Run backups
  3. Pull new image (if update_policy: during_maintenance) — caches it for later
  4. Start containers

After all apps are processed:

  1. Run cleanup (see below)
  2. Holden reboots via Overseer

The fresh Holden instance queues all apps on boot, triggering reconciliation. Apps with pre-pulled images (step 3) get updated; others are verified against their current config.
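
As a rough sketch of the ordering described above, the whole window can be pictured as one loop followed by two global steps. The App type, the run_maintenance function, and the act callback are hypothetical names for this example; only the step order and the during_maintenance policy value come from the text.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class App:
    name: str
    update_policy: str = ""  # only "during_maintenance" is taken from the docs


def run_maintenance(apps: List[App], act: Callable[[str, str], None]) -> None:
    """Call act(step, app_name) for each step, in the documented order.

    `act` stands in for the real container, backup, and image operations.
    """
    for app in apps:  # one app at a time
        act("stop", app.name)
        act("backup", app.name)
        if app.update_policy == "during_maintenance":
            act("pull", app.name)  # image is cached, applied after the reboot
        act("start", app.name)
    # Global steps run once, after every app has been processed.
    act("cleanup", "")
    act("reboot", "")
```

Passing a recording callback such as `lambda step, name: log.append((step, name))` makes the order easy to inspect.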

Maintenance is triggered when the cron-scheduled time passes. With schedule 0 3 * * *, maintenance runs after 3am daily.
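
One way to picture "triggered when the scheduled time passes" is a periodic check that asks whether a scheduled instant fell between the previous check and now. The sketch below is an assumption about how such a check could work, not Holden's implementation. The daily_due function is hypothetical and only handles daily schedules of the form "M H * * *".

```python
from datetime import datetime, timedelta


def daily_due(schedule: str, last_check: datetime, now: datetime) -> bool:
    """Return True if a daily cron time fell within (last_check, now]."""
    minute, hour, day_of_month, month, day_of_week = schedule.split()
    if (day_of_month, month, day_of_week) != ("*", "*", "*"):
        raise ValueError("this sketch only handles daily schedules")
    # Most recent scheduled instant at or before `now`.
    fire = now.replace(hour=int(hour), minute=int(minute), second=0, microsecond=0)
    if fire > now:
        fire -= timedelta(days=1)
    return last_check < fire <= now
```

With the default schedule, a check at 03:00:30 that follows a check at 02:59 reports the window as due, while later checks the same day do not.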

Configure maintenance via environment variables:

HOLDEN_MAINTENANCE_SCHEDULE="0 3 * * *" # Cron format (default: 3am daily)
HOLDEN_BACKUP_DIR=/mnt/backups # Where backups are stored
HOLDEN_MAX_BACKUPS=10 # Keep this many per app (default: 10)
Field        Required  Description
schedule     No        Cron expression (default: 0 3 * * *)
backup_dir   No        Backup staging directory (backups disabled if not set)
max_backups  No        Retention count per app (default: 10)
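
To make the defaults concrete, here is a minimal sketch of reading these settings. The variable names and default values come from the text above; the MaintenanceConfig type, the load_config function, and the choice to treat an empty HOLDEN_BACKUP_DIR as unset are assumptions for the example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MaintenanceConfig:
    schedule: str
    backup_dir: Optional[str]
    max_backups: int

    @property
    def backups_enabled(self) -> bool:
        # Backups are disabled when no backup directory is configured.
        return self.backup_dir is not None


def load_config(env: Mapping[str, str] = os.environ) -> MaintenanceConfig:
    return MaintenanceConfig(
        schedule=env.get("HOLDEN_MAINTENANCE_SCHEDULE", "0 3 * * *"),
        # Assumption: an empty value is treated the same as an unset one.
        backup_dir=env.get("HOLDEN_BACKUP_DIR") or None,
        max_backups=int(env.get("HOLDEN_MAX_BACKUPS", "10")),
    )
```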

Needs (automatic) - All needs containers (postgres, valkey, garage) are backed up by default.

App volumes (opt-in) - Only volumes listed in backup_volumes:

holden.yml
services:
  web:
    volumes:
      - ./uploads:/app/uploads
      - ./cache:/app/cache
backup_volumes:
  - ./uploads # Backed up
  # ./cache is not backed up

backup_volumes is defined at the app level (not per-service) and uses host paths (the left side of volume definitions).
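
The matching rule can be sketched as below: take the host side of every volume definition and keep the ones named in backup_volumes. The volumes_to_back_up function is hypothetical, and rejecting entries that match no declared volume is an assumption of this sketch, not documented behavior.

```python
from typing import Dict, List


def volumes_to_back_up(services: Dict[str, dict], backup_volumes: List[str]) -> List[str]:
    """Return the host paths that are opted in to backup."""
    host_paths: List[str] = []
    for service in services.values():
        for volume in service.get("volumes", []):
            host = volume.split(":", 1)[0]  # left side of "host:container"
            if host not in host_paths:
                host_paths.append(host)
    # Assumption: an entry that matches no declared volume is a config error.
    unknown = [path for path in backup_volumes if path not in host_paths]
    if unknown:
        raise ValueError(f"backup_volumes not found in any service: {unknown}")
    return [path for path in host_paths if path in backup_volumes]
```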

/mnt/backups/
├── myapp/
│   ├── 2024-01-15T03:00:00Z/
│   │   ├── postgres/
│   │   ├── valkey/
│   │   └── uploads/
│   └── 2024-01-14T03:00:00Z/
│       └── ...
└── other-app/
    └── ...
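
Given this layout, retention under max_backups can be pictured as "keep the newest N timestamp directories per app". The sketch below is one plausible reading, not Holden's code; backup_path and snapshots_to_delete are hypothetical helpers. It relies on the fact that fixed-width UTC timestamps like those shown sort chronologically as plain strings.

```python
from pathlib import PurePosixPath
from typing import List


def backup_path(backup_dir: str, app: str, timestamp: str, component: str) -> str:
    """Build <backup_dir>/<app>/<timestamp>/<component>, as in the tree above."""
    return str(PurePosixPath(backup_dir) / app / timestamp / component)


def snapshots_to_delete(snapshot_names: List[str], max_backups: int) -> List[str]:
    """Return the snapshot directories that exceed the retention count."""
    if max_backups < 0:
        raise ValueError("max_backups must not be negative")
    newest_first = sorted(snapshot_names, reverse=True)
    return newest_first[max_backups:]
```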

Holden stages backups locally. Getting them offsite is up to you:

# /etc/cron.d/holden-offsite (runs after maintenance window)
0 4 * * * rclone sync /mnt/backups remote:holden-backups

To restore from a maintenance backup, see Backup & Restore.

Apps with update_policy: during_maintenance get their images pre-pulled during step 3 of the per-app sequence. This is the only time these apps check for new images, so there are no surprise updates during the day.

Use during_maintenance for production apps where you want predictable update windows.

After all apps have been processed, Holden removes resources that are no longer needed.

Containers with Holden labels (holden.managed=true) that don’t match any registered app are removed. Such orphans are left behind when you remove an app with holden app remove.

Data directories are never touched. If you want to delete an app’s data, do it manually from HOLDEN_BASE_DATA_DIR.
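
The container rule above amounts to a simple filter. In this sketch the holden.managed=true label comes from the text, while the holden.app label used to tie a container to its app is purely hypothetical, as are the function name and the dictionary shape used for containers.

```python
from typing import Dict, Iterable, List, Set

APP_LABEL = "holden.app"  # hypothetical label name, not confirmed by the docs


def orphaned_containers(containers: Iterable[Dict], registered_apps: Set[str]) -> List[str]:
    """Return names of managed containers whose app is no longer registered."""
    orphans: List[str] = []
    for container in containers:
        labels = container.get("labels", {})
        if labels.get("holden.managed") != "true":
            continue  # not managed by Holden, never touched
        if labels.get(APP_LABEL) not in registered_apps:
            orphans.append(container["name"])
    return orphans
```

Only containers are selected here; nothing in the filter refers to data directories, which matches the rule that app data is left alone.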

Holden attempts to remove all networks with the holden.managed=true label. Docker refuses to remove networks that have containers attached, so only empty networks are deleted.
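
This "try everything, let Docker reject the busy ones" approach can be sketched as a loop that tolerates failure. The remove callback and the NetworkInUse exception are stand-ins invented for the example; a real implementation would call Docker and handle whatever error it reports.

```python
from typing import Callable, Iterable, List


class NetworkInUse(Exception):
    """Hypothetical error for a network that still has containers attached."""


def remove_empty_networks(networks: Iterable[str], remove: Callable[[str], None]) -> List[str]:
    """Attempt to remove every managed network and return the ones removed."""
    removed: List[str] = []
    for name in networks:
        try:
            remove(name)
        except NetworkInUse:
            continue  # still attached to containers, so leave it in place
        removed.append(name)
    return removed
```

The design avoids inspecting each network first: the refusal itself is the emptiness check.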

When HOLDEN_DANGLING_IMAGES is enabled, Holden prunes dangling Docker images after maintenance.

After cleanup, Holden always spawns an Overseer to recreate itself. This happens even if Holden’s image hasn’t changed.

Why always reboot?

  • Fresh state — Any accumulated state or memory is cleared
  • Config refresh — Env var changes (like HOLDEN_PUBLIC_DOMAIN) take effect
  • Exercises the Overseer — Code paths that only run occasionally tend to bit-rot
  • Queues all apps — The fresh Holden queues all apps on boot, ensuring reconciliation

The reboot adds ~10-30 seconds of Holden unavailability (during health check). Since this happens at 3am during a maintenance window, the impact is minimal.