These used to be actual lists created by an admittedly terrible script that was run nightly by a systemd --user timer, but now I back up pacman's whole local database whenever there's a change, using pakbak and systemd.path units.

Pakbak is pretty straightforward: you configure where you want the backups stored and the number of backups you want to keep, and then enable pakbak.path. Whenever there is a change to /var/lib/pacman/local in the filesystem, pakbak.path triggers pakbak.service, which runs /usr/lib/systemd/scripts/pakbak. The pakbak script checks for pacman's lock file before continuing, then creates a tar archive of /var/lib/pacman/local.
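To make the lock check and the archive step concrete, here is a rough sketch of that behaviour. This is not pakbak's actual code: the function name backup_pacman_db, its arguments, and the archive name pacman_database.tar are made up for illustration, and I'm assuming the script simply gives up when the lock is present. The one thing taken from pacman itself is the lock file location, /var/lib/pacman/db.lck.

```shell
# Sketch only: approximates the behaviour described above, not pakbak itself.
# backup_pacman_db ROOT DEST   (hypothetical helper)
#   ROOT  filesystem root to read from (normally /)
#   DEST  directory that receives the archive
backup_pacman_db() {
    local root="${1:-/}" dest="$2"
    local lock="$root/var/lib/pacman/db.lck"

    # pacman holds db.lck while a transaction is running; skip this run if it exists.
    if [ -e "$lock" ]; then
        echo "pacman database is locked, not backing up" >&2
        return 1
    fi

    mkdir -p "$dest" || return 1
    # Store paths relative to ROOT so the archive contains var/lib/pacman/local/...
    # (pacman_database.tar is a placeholder name.)
    tar -C "$root" -cf "$dest/pacman_database.tar" var/lib/pacman/local
}
```

Storing the paths relative to the root is what makes the resulting archive restorable by simply untarring at /.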

If later access is needed, untar the archive and point pacman at that dir (e.g. to get the package lists from an unrecoverable system). The archive includes the dirs leading up to the actual database, so if simple recovery is the goal, just untar at /. For accessing the database in other situations, it might be prudent to pass --strip-components=3 to tar in order to get just the local subdir.
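A minimal sketch of the second case, pulling just the local subdir out into a scratch directory. The helper name extract_local_db and the paths in the usage comment are placeholders of mine; --strip-components is the tar option mentioned above, and --dbpath is pacman's option for using an alternate database location.

```shell
# extract_local_db ARCHIVE TARGET   (hypothetical helper)
# Unpacks only the "local" subdirectory of the archive into TARGET.
extract_local_db() {
    local archive="$1" target="$2"
    mkdir -p "$target" || return 1
    # Drop the leading var/lib/pacman components so TARGET/local holds the database.
    tar -xf "$archive" -C "$target" --strip-components=3
}

# Example (paths are placeholders):
#   extract_local_db ~/pacman_database.tar /tmp/olddb
#   pacman --dbpath /tmp/olddb -Qqe    # list explicitly installed packages
```

Since pacman expects the database path to contain a local directory, stripping exactly three components leaves the layout it is looking for.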

# Backup the database to this folder

# Define how long backups should be kept
# Can be a number of days or empty to disable

# Define how many backups should be kept
# If more backup are found, the oldest are deleted
# Can be a number of file or empty to disable

With this configuration, only one copy of the database is kept around and all the others are deleted. Since the backups are committed to a git repo, there's no need to keep several copies to have access to the history.

Once pakbak has created the archive, it is added, committed, and pushed to a vcsh git repo by a couple of systemd --user services. This is triggered by any change to the directory that pakbak writes its output to for that host. Activation of this process is handled by pkglists-commit.path:

Description=Path activation for pkglists-commit.service



This unit watches $HOME/.pkglists/$HOSTNAME for changes, and on any activity, activates pkglists-commit.service:

Description=Add, commit, and push pacman db backups

ExecStartPre=/usr/bin/bash -c 'while pgrep pakbak >/dev/null; do sleep 1; done'
ExecStartPre=/usr/bin/vcsh pkglists pull
ExecStartPre=/usr/bin/vcsh pkglists add -A %h/.pkglists/%H/*
ExecStartPre=/usr/bin/vcsh pkglists commit -m "Auto-committing %H pacman db"
ExecStart=/usr/bin/vcsh pkglists push

This service waits for pakbak to complete, pulls the pkglists repo, stages any changes under $HOME/.pkglists/$HOSTNAME, commits to the repo, then pushes. Thanks to git-add's -A option, the add step also stages removals, so git drops the old database from the branch. That avoids ending up with multiple databases when pulling from or cloning the repo.
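The effect of -A is easy to see with plain git. The sketch below uses git directly rather than vcsh, and the function name commit_backup and its arguments are mine, not part of the setup above. Unlike the unit, it also skips the commit when nothing is staged, which is an addition of mine so that an unchanged tree isn't treated as a failure.

```shell
# commit_backup REPO SUBDIR   (hypothetical helper, plain git instead of vcsh)
# Stages additions and deletions under SUBDIR, then commits if anything changed.
commit_backup() {
    local repo="$1" subdir="$2"
    # -A stages new, modified, and deleted files under the given path.
    git -C "$repo" add -A -- "$subdir" || return 1
    # Skip the commit when nothing is staged, so an unchanged tree is not an error.
    if git -C "$repo" diff --cached --quiet; then
        return 0
    fi
    git -C "$repo" commit -q -m "Auto-committing $subdir pacman db"
}
```

If the directory held an old archive in the previous commit and now holds only a new one, a single call records both the removal and the addition, so the tip of the branch only ever contains the newest archive.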

This is where the main magic happens that allows automatic distributed backups. Every host that has this set up also houses the backups. The branch on origin is technically the canonical master, but since this is git, a recovery can be done from any machine's clone. Also, since the trigger is literally any change to pacman's local database (read: installation, upgrade, or removal of a package), the chances of all the clients having the latest revision of the repo are much higher.

  • I'm not a big fan of using my own systemd.path unit since one is already provided by pakbak. However, since A) this process deals with files in a home folder, and B) there isn't any foreseeable situation where a systemd --user instance won't be running when the database gets updated, I opted for systemd --user units to manage this part of the solution, and --user units can't depend on --system units. This could easily be fixed by installing a copy of pakbak's units to /usr/lib/systemd/user/ in the PKGBUILD and running it from there.
  • All of this could probably be replaced with my own systemd --user units or a pacman hook, but for the moment, I'm more concerned with getting it working.