Wherein it seems I need a new backup strategy.

This is part 13 of my k8s migration series.

Over the last week, I’ve started working on the backup strategy for the new Kubernetes cluster. The original plan was to stick with what I’m already doing in my Nomad cluster. But it turns out I can’t, so I need a new strategy.

If you’re prone to suffering from IT-related nightmares, you might wish to skip this one. The Nomad backup implementation ain’t pretty, and my current plans for the k8s backup implementation ain’t going to make it any prettier. You’ve been warned.

Speaking of backups

You should do backups. They don’t have to be perfect. As you will see in the next section, mine definitely aren’t. But they’re serving me well.

I’ve only ever needed backups once, right after I left university and my entire life was stored on a single laptop’s internal HDD - and that HDD failed. But: I was lucky, in that I had backups of my /home directory only about 24h old.

So all good, you might think. But not really. You see, my backups were encrypted. And now guess where the only copy of that decryption key was stored. Exactly. I got pretty lucky again, in that I was able to read the key from the broken disk. But these days, I’ve got my keys stored in several places.

In fact, I went one more step: There’s an unencrypted USB stick with a copy of my password manager and PGP keys in a bank vault. So don’t forget to separately back up your backup’s encryption key!

Generally, backups are supposed to be 3-2-1:

  • Three different copies of your data
  • On two different kinds of media
  • With one copy off-site

I do not have an off-site copy anywhere yet, save for the aforementioned USB stick with my unencrypted password manager.

And for me, the “two different kinds of media” isn’t really two different kinds of media. It’s more like “two independent systems”, because even the data that’s genuinely important to me is too big to store on multiple DVDs, which is the only medium I would call “sensible” for a consumer.

What I do find important is incremental backups. Don’t just overwrite the previous day’s backup with the current one. Incremental backups aren’t there to protect you from faulty devices, but to protect you from yourself - be it a fat-fingered rm -rf /, or a ransomware infection. With incremental backups, you can always go back.

My general strategy is:

  • One yearly backup
  • One monthly backup for the past six months
  • One weekly backup for the past six weeks
  • A daily backup for the past seven days

That should make sure I can recover even from an accidental deletion I only notice a couple of months later.
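As an illustration of how such a policy selects snapshots, here’s a simplified sketch in Python. In practice restic does this for me via forget with the --keep-daily/--keep-weekly/--keep-monthly/--keep-yearly flags; the function below only captures the basic “keep the newest snapshot per period” idea, not restic’s exact semantics:

```python
from datetime import date

def snapshots_to_keep(dates, keep_daily=7, keep_weekly=6,
                      keep_monthly=6, keep_yearly=1):
    # Keep the newest snapshot of each day/week/month/year, for the
    # given number of most recent periods that actually have snapshots.
    # Simplified compared to restic's real "forget" logic.
    dates = sorted(set(dates), reverse=True)  # newest first
    keep = set()
    policies = [
        (lambda d: (d.year, d.month, d.day), keep_daily),
        (lambda d: d.isocalendar()[:2], keep_weekly),  # (year, ISO week)
        (lambda d: (d.year, d.month), keep_monthly),
        (lambda d: d.year, keep_yearly),
    ]
    for period_key, limit in policies:
        seen = []
        for d in dates:
            key = period_key(d)
            if key not in seen:
                seen.append(key)
                if len(seen) <= limit:
                    keep.add(d)  # newest snapshot in this period
    return keep
```

With daily snapshots for the first two weeks of January 2024, for example, this keeps the last seven days plus the newest snapshot of the previous ISO week.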

The current state of backups in my Nomad cluster

The basis of my backups, both in my Homelab and for my other hosts, is restic. It’s a CLI backup program which supports a wide range of backup targets, encryption and incremental backups.

Restic then pushes all of those, both the data volumes from my Homelab services and my /home dir from my workstation and laptop, into an S3 bucket on my Ceph cluster, on a pool with two replicas. For the stuff coming from my workstation, this is already an improvement, because the backup is stored on different disks than the original data. But it doesn’t do very much for my Homelab data volumes, because those are all located on that same Ceph cluster. The only advantage those backups bring is their incremental nature, so if I accidentally delete a volume, I can still get the data back.

The second part, which fulfills the “two different types of media” requirement is a backup on an external HDD. This backup is a bit more selective than the relatively broad restic backup, because that single external HDD isn’t big enough to hold all of my data. But it is easily big enough to hold all the data I genuinely care about.

I’m running both of those backup jobs through the Nomad cluster. The first backup, dubbed my “services backup”, backs up the data volumes attached to my Homelab services. The second one, called the “external backup”, takes a couple of the S3 buckets used as targets in the services backup, and clones them onto an external HDD.

The services backup

The services backup is deployed in the Nomad cluster as a System Batch type job. These jobs are similar to Kubernetes’ DaemonSet, in that they run a job instance on every host, but they are of the “run-to-completion” type, similar to Kubernetes’ Job object, instead of starting a daemon which stays active on each node.

This job needs to be run in privileged mode, because it mounts the directory where CSI drivers mount CSI volumes on the host into the job’s container.

Yes, you read that right: On all my Nomad cluster hosts, every night, there runs a container which mounts the mount directories of all mounted CSI volumes.

Once that’s done, the container runs a small Python program I’ve written to do the actual backup. It does roughly the following:

  1. Check which jobs are running on the current node
  2. Check which volumes from those jobs are noted as needing backups in a config file
  3. Run restic against those directories and push them into an S3 bucket on Ceph

In addition, I’m also using rclone to back up S3 buckets from those apps which use them for storage. This, too, does not make the data more resilient, but it again protects against accidental bucket deletion and similar mishaps.
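In Python, the core of those steps boils down to assembling restic (and rclone) invocations per volume. A minimal sketch - the S3 endpoint, password file path and remote names here are placeholder assumptions, not my actual configuration:

```python
import subprocess

# Placeholder endpoint and credentials location - assumptions for
# illustration, not the real setup.
S3_REPO_PREFIX = "s3:http://ceph.internal"
PASSWORD_FILE = "/etc/backup/restic-password"

def restic_backup_cmd(volume_dir, bucket):
    # Step 3: push one mounted volume directory into its restic
    # repository in an S3 bucket on Ceph.
    return ["restic", "--repo", f"{S3_REPO_PREFIX}/{bucket}",
            "--password-file", PASSWORD_FILE, "backup", volume_dir]

def rclone_clone_cmd(app_bucket, backup_bucket):
    # Clone an app's own S3 bucket into its backup bucket.
    return ["rclone", "sync", f"ceph:{app_bucket}", f"ceph:{backup_bucket}"]

def run_backups(volumes_to_buckets):
    # volumes_to_buckets: mounted directory -> backup bucket, as read
    # from the backup config file (step 2).
    for volume_dir, bucket in volumes_to_buckets.items():
        subprocess.run(restic_backup_cmd(volume_dir, bucket), check=True)
```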

This approach has a number of downsides. First, it is not 100% reliable: I’m backing up the data from volumes while those volumes are mounted and in use by their services. I don’t have too much of a problem with that, simply because the data on disk does not change much during the times when I run the backups. But it is still something to consider. In addition, the overall backup job setup is not the safest from a security point of view. I’m mounting the data of all of my services into a single container. At least from a data access standpoint, that container is basically root on my cluster nodes and can access the data of every service in the Homelab.

The external disk backup

The second part of the backup strategy is to take the backup repositories created by the service backups described above, as well as the ones created by my host backups, and cloning them onto an external HDD connected to one of my nodes. This is also implemented as a Nomad job.

This job receives the /dev device for the external USB HDD, instead of an already mounted directory. This is another piece of defense in depth, as it allows me to mount the backup disk only when it is actually needed, instead of having it mounted all the time. That is a small defense against both encryption ransomware and accidental deletion.

But it also has one downside: Security, again. To be allowed to call the mount command in the container, I have to run it in privileged mode.

This job does not have to do anything fancy in the implementation itself. It mounts the external HDD and then runs rclone on all of the backup S3 buckets defined in the services backup’s configuration file, plus a couple of additional buckets, e.g. for my /home backups. All of those get cloned onto the external HDD. Here, I’m not using restic and incremental backups, simply because the individual backup buckets already contain the incremental backups.
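Sketched as a command plan, the whole job fits in a few lines - the device path, mount point and rclone remote name are placeholders, not my real configuration:

```python
import subprocess

def external_backup_plan(device, mountpoint, buckets):
    # Build the full command sequence: mount the disk, clone each backup
    # bucket, unmount again so the disk isn't attached any longer than
    # necessary. No restic here - the buckets already hold incremental
    # restic repositories, so a plain clone suffices.
    cmds = [["mount", device, mountpoint]]
    for bucket in buckets:
        cmds.append(["rclone", "sync", f"ceph:{bucket}", f"{mountpoint}/{bucket}"])
    cmds.append(["umount", mountpoint])
    return cmds

def run_plan(cmds):
    # Executing this for real needs a privileged container, as noted above.
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```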

This part of the backup I had already transferred over to my Kubernetes cluster, without much issue.

The issue with migrating the backups to the k8s cluster

My main issue came yesterday, when I started to plan the addition of the service backup job to Kubernetes. The basic functionality seemed to be available. I could just mount the /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com directory into the container. That’s the directory where the Ceph CSI plugin I use to provide PersistentVolumes mounts the volumes.

But then I checked how to actually run the job. As noted above, I needed to have a run-to-completion pod running on every host in the cluster. And it looks like k8s just doesn’t have anything equivalent to Nomad’s System Batch type of job.

So what to do instead? One option would be to change the small Python app I wrote, so that it doesn’t just run the current backup cycle and then exits, but instead runs continuously. I could then put it into a DaemonSet on each k8s node. That would very likely have worked. But since my initial tests with k8s back in August 2023, I had thought that I might implement a small “Homelab Backups” operator.

Over-engineered, but hopefully fun

So if we take a hard look at my current setup for services and external backups, there are a number of crutches in there. First of all, there’s the sequencing problem. The nightly external HDD backup job should only run when all of the service backup jobs have done their work for the night. But I was never able to come up with a good way to do that, so I settled for just launching the external HDD backup an hour after the service backup job. Not very elegant, but worked well enough.

Then there was the issue of the “run the service job on every host” approach. It’s a shotgun approach, and not very explicit in its configuration. It was entirely possible that no job with volumes needing backup was running on a given host, so the service backup run on that host would have been a waste.

Finally, the backup configuration, namely which volumes and S3 buckets should be backed up, was done in configuration files for the two backup jobs - not the individual app’s jobs. So when removing or adding an app, I always had to remember that I needed to also update the config of the backup jobs.

The idea I came up with, which solves all of the issues above, is to implement a “Homelab Backup” Kubernetes Operator. That operator would handle “HomelabBackup” objects, which I could configure individually for each app I’m running that needs backups. When I then remove the app, that manifest would be removed along with it, and the backup for that particular app would stop.

It might look something like this:

kind: HomelabBackup
metadata:
  name: nextcloud
spec:
  backupBucket: my-nextcloud-backup-bucket
  schedule: "30 2 * * *"
  sources:
    - name: my-nextcloud-pvc
      external: false
    - name: my-nextcloud-data-bucket
      external: true

This would allow the definition of PersistentVolumeClaims and S3 buckets to back up, and also where to back them up to.

The operator would then not start one backup job per node, but instead launch one k8s Job workload per HomelabBackup object. I would also be able to watch those Jobs, and once they have all finished one way or the other, launch the external backup Job right away, instead of waiting an arbitrary amount of time. The dependency would now be explicit.
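Deciding when all service backup Jobs are done comes down to checking each Job’s status conditions. A sketch over plain dict-shaped status data (in the live cluster, this data would come from the batch/v1 API):

```python
def all_jobs_finished(job_statuses):
    # A k8s Job reports a "Complete" or "Failed" condition with
    # status "True" once it's done, one way or the other.
    for status in job_statuses:
        conditions = status.get("conditions") or []
        done = any(c.get("type") in ("Complete", "Failed")
                   and c.get("status") == "True"
                   for c in conditions)
        if not done:
            return False
    return True
```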

With the operator launching the jobs, I will also be able to launch each job on the node where the volume is currently mounted. This can be done by looking at Kubernetes’ VolumeAttachment objects, which show on which node any given volume is currently attached.
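The lookup itself is simple. Here it is sketched over dict-shaped VolumeAttachment data; in the operator, the same information would come from the storage.k8s.io/v1 API:

```python
def node_for_volume(attachments, pv_name):
    # Find the node a PersistentVolume is currently attached to, by
    # scanning VolumeAttachment objects. Returns None if unattached.
    for va in attachments:
        source = va.get("spec", {}).get("source", {})
        if (source.get("persistentVolumeName") == pv_name
                and va.get("status", {}).get("attached")):
            return va["spec"]["nodeName"]
    return None
```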

I’m also considering some scheduling, to make sure that on any given node, there’s only ever going to be a single Job running, because anything else would likely tax my 1 Gbps network.

Looking around, I found the kopf framework for Kubernetes Operators in Python. This looks pretty well suited for my needs, and Python is currently among my most familiar languages anyway. It would be nice to go for Go instead, but I would first have to familiarize myself with the language before I could write the operator. And the main goal here is still to move forward with the k8s migration.

Overall, I’m not actually too mad about this detour. It looks like it’s going to be an interesting dive into Kubernetes’ API and operator implementations, and it’s going to fix a couple of problems with my old backup implementation. In the end, stuff like this is why I set the migration up in such a way that I could do it iteratively, while both the Nomad and k8s clusters run side by side.