In the last couple of months, I’ve been working on a k8s operator for running backups of persistent volumes and S3 buckets in my cluster. Previous installments of the series can be found here.

And now, I’m finally done with it, and over the weekend, I ran the first successful backups. Time to describe what I’ve implemented, why and how.

Recap

Let’s start with a recap. For a more detailed description of the problem, have a look at this post in my k8s migration series.

In short, my previous backup implementation on my Nomad cluster runs a container on each host in the cluster. This container then checks which jobs run on the host and backs up the volumes noted in the config file for that job. This approach would not work on Kubernetes, because k8s does not provide an API similar to Nomad’s Sysbatch jobs. Those types of jobs launch a given container on every host in the cluster, with a run-to-completion setup. Kubernetes, on the other hand, only knows Jobs, which cannot be run on every host simultaneously, and DaemonSets, which don’t have run-to-completion semantics.

There would have, of course, been the easy way out: Using an existing solution. But where’s the fun in that?

So I decided to take this chance to learn the Kubernetes API a bit better, and write my own operator. Because I’m relatively familiar with Python, I decided to use the Kopf framework.

The end goal was to have a per-app configuration in the form of a custom resource definition which tells the operator which volumes and buckets need to be backed up. Here is an example I used for my tests:

apiVersion: mei-home.net/v1alpha1
kind: HomelabServiceBackup
metadata:
  name: test-service-backup
  namespace: testing
  labels:
    homelab/part-of: testing
spec:
  runNow: "12"
  backupBucketName: "backup-operator-testing"
  backups:
    - type: pvc
      name: mysql-pv-claim
      namespace: testing
    - type: pvc
      name: wp-pv-claim
      namespace: testing
    - type: s3
      name: service-backup-test

This object instructs my operator to back up the MySQL and WordPress volumes of a WordPress deployment I launched just for testing purposes. It also contains an S3 bucket that’s not used by the deployment and just exists to test that part of the operator.

High level overview

Alright, let’s assume that we’ve got the above example HomelabServiceBackup (HLSB). What do I want to happen when a backup is triggered?

On the most basic level, I want two things to happen:

  1. The two pvc type entries in the spec.backups list are run through restic to back them up. This means the backup needs access to those volumes.
  2. The s3 type bucket is downloaded to a temporary location, and then restic is run on that temporary location to make an incremental backup of the bucket.

BAD THINGS. This paragraph is the “Do as I say, not as I do” part of this post. First of all, running backups on live data is generally a bad idea. You might end up with inconsistent state in your backup. Second, there are perfectly good block-level backup capabilities right in Ceph. With consistency guarantees. But I don’t like those. They basically require a second Ceph cluster as a backup target. To reiterate: What I’m doing here is bad. And I know that what I’m doing here is bad. It’s working for me, but I’m really not advising you to do the same thing. That’s the main reason I will likely never publish the operator I wrote - I just don’t think it’s a good idea.

With that out of the way, which steps need to be completed?

  1. Determine where each of the pvc type volumes is mounted
  2. Split the volumes into groups by the host they’re currently mounted on
  3. For each of those groups:
    • Create a ConfigMap with the configuration for that particular group
    • Create a Job for each group and launch them, in sequence
  4. Determine whether all jobs were successful and update the HLSB object in the k8s cluster

The HLSB object has a status.state property, which can be one of:

  • Running
  • Success
  • Failed

These are then later used by a Grafana panel, based on Prometheus data from kube-state-metrics, to show whether all of the backups were successful.

Now let’s have a closer look at the above steps.

Implementation details

Finding volumes and hosts

Let’s look at the backup list from the example HLSB above again:

backups:
  - type: pvc
    name: mysql-pv-claim
    namespace: testing
  - type: pvc
    name: wp-pv-claim
    namespace: testing

I’m ignoring the s3 type entry here, because quite frankly, it’s not that interesting.

For the pvc type entries, the very first step is to determine on which host they’re currently mounted. Because a PVC might be RWO, we cannot just mount it into the backup Pod while the app using it is still running. Instead, I use a hostPath volume to mount the directory where the Ceph CSI provider mounts the volume into the backup container.

For that to work, I need to know on which host the volume is actually mounted. And for apps with multiple pods and associated volumes, these may be multiple hosts. Which presents yet another challenge: Restic, when backing up to a repository, locks that repository, so there can only ever be a single writer. My backup buckets are separated by app, so even if an app has multiple volumes defined, like the example above, only one backup can run at a time. If multiple volumes happen to be mounted on a single host, that’s not a problem: the backup Job for that host can back up all of them. But if they happen to be mounted on separate hosts, there need to be multiple Jobs, running one after the other.

So how to get the volumes? With the Kubernetes API. As input for our journey, we’ve got the PVC, with its name and namespace, from the list of things to back up.

So the first action is to fetch the PVC via the Kubernetes API. Because I’m writing async code in Kopf, I’m using kubernetes_asyncio instead of the official Kubernetes Python lib.
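
Just to sketch what that lookup looks like with kubernetes_asyncio (the function and variable names here are mine, not the operator’s actual code):

from kubernetes_asyncio import client

async def fetch_pvc(name, namespace):
    # Assumes the client config has already been loaded, e.g. via
    # kubernetes_asyncio.config.load_incluster_config().
    async with client.ApiClient() as api_client:
        core_v1 = client.CoreV1Api(api_client)
        # Returns a V1PersistentVolumeClaim object.
        return await core_v1.read_namespaced_persistent_volume_claim(
            name, namespace)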

Here’s what the PVC looks like, with the wp-pv-claim from the example:

{
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {
        "labels": {
            "app": "wordpress",
            "app.kubernetes.io/managed-by": "Helm",
            "homelab/part-of": "testing"
        },
        "name": "wp-pv-claim",
        "namespace": "testing",
    },
    "spec": {
        "accessModes": [
            "ReadWriteOnce"
        ],
        "resources": {
            "requests": {
                "storage": "10Gi"
            }
        },
        "storageClassName": "rbd-bulk",
        "volumeMode": "Filesystem",
        "volumeName": "pvc-733b8bc9-0a44-446c-a736-3d97ba52f01f"
    },
    "status": {
        "accessModes": [
            "ReadWriteOnce"
        ],
        "capacity": {
            "storage": "10Gi"
        },
        "phase": "Bound"
    }
}

I removed a couple of pieces which aren’t that interesting. With this info in hand, we can go to the next step: fetching the PersistentVolume backing this claim. This can also be done pretty easily with the read_persistent_volume API, which only needs a name as input, because PersistentVolumes are cluster-level resources. The name of the volume backing the claim can be taken from the spec.volumeName property.
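
Continuing the sketch from above, still with illustrative names:

from kubernetes_asyncio import client

async def fetch_pv(pvc):
    async with client.ApiClient() as api_client:
        core_v1 = client.CoreV1Api(api_client)
        # PersistentVolumes are cluster-scoped, so only the name is needed.
        return await core_v1.read_persistent_volume(pvc.spec.volume_name)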

The result for the above PVC would look like this, again with unimportant bits removed:

{
    "apiVersion": "v1",
    "kind": "PersistentVolume",
    "metadata": {
        "name": "pvc-733b8bc9-0a44-446c-a736-3d97ba52f01f",
    },
    "spec": {
        "accessModes": [
            "ReadWriteOnce"
        ],
        "capacity": {
            "storage": "10Gi"
        },
        "csi": {
            "controllerExpandSecretRef": {
                "name": "rook-csi-rbd-provisioner",
                "namespace": "rook-cluster"
            },
            "driver": "rook-ceph.rbd.csi.ceph.com",
            "fsType": "ext4",
            "nodeStageSecretRef": {
                "name": "rook-csi-rbd-node",
                "namespace": "rook-cluster"
            },
            "volumeAttributes": {
                "clusterID": "rook-cluster",
                "imageFeatures": "layering,exclusive-lock,object-map,fast-diff",
                "imageName": "csi-vol-3361c6d5-4269-4ab2-bc14-771420b768a7",
                "journalPool": "rbd-bulk",
                "pool": "rbd-bulk",
            },
            "volumeHandle": "0001-000c-rook-cluster-0000000000000003-3361c6d5-4269-4ab2-bc14-771420b768a7"
        },
        "persistentVolumeReclaimPolicy": "Retain",
        "storageClassName": "rbd-bulk",
        "volumeMode": "Filesystem"
    },
    "status": {
        "phase": "Bound"
    }
}

One potentially useful side-note: The spec.csi.volumeAttributes.imageName property is the name of the backing RBD volume in Ceph.

The third thing we need is the VolumeAttachment for the PersistentVolume, which tells us where it is currently mounted. Sadly, there is no API to find the attachment for a given PersistentVolume (or multiple attachments of the same volume, if it is RWX). So instead, I’m fetching all of the attachments with the list_volume_attachments API. This one, again, is not namespaced. Here is the current attachment for the above PersistentVolume:

{
    "apiVersion": "storage.k8s.io/v1",
    "kind": "VolumeAttachment",
    "metadata": {
        "creationTimestamp": "2024-12-29T10:44:46Z",
        "name": "csi-8aee698fd97659b400535fa69969815fad87d2b761d69625d04afc95d53bf252",
        "resourceVersion": "152545692",
        "uid": "6cbe234b-e2c7-4596-a4b6-03d66eb45f5f"
    },
    "spec": {
        "attacher": "rook-ceph.rbd.csi.ceph.com",
        "nodeName": "sehith",
        "source": {
            "persistentVolumeName": "pvc-733b8bc9-0a44-446c-a736-3d97ba52f01f"
        }
    },
    "status": {
        "attached": true
    }
}

The spec.nodeName provides us with what we need: The name of the host where the volume is currently mounted.
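
Since there is no direct lookup, a rough sketch of the filtering could look like this (the helper name is mine; list_volume_attachment is the method name the Python client generates):

from kubernetes_asyncio import client

async def find_attachment_nodes(pv_name):
    async with client.ApiClient() as api_client:
        storage_v1 = client.StorageV1Api(api_client)
        attachments = await storage_v1.list_volume_attachment()
        # An RWX volume may have more than one attachment, hence the list.
        return [a.spec.node_name for a in attachments.items
                if a.spec.source.persistent_volume_name == pv_name]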

Next, how to figure out which hostPath to use to mount that volume into the backup container? That’s done with this small Python function:

from hashlib import sha256

# Base directory under which the kubelet keeps CSI global mounts.
CSI_MOUNT_PREFIX = "/var/lib/kubelet/plugins/kubernetes.io/csi"


def get_ceph_csi_host_path(pv):
    # The kubelet's mount path contains the CSI driver name and a
    # SHA-256 hash of the volume handle.
    volume_handle = pv.spec.csi.volume_handle
    driver = pv.spec.csi.driver
    vol_id_digest = sha256(bytes(volume_handle, 'utf-8')).hexdigest()
    p = "/".join([
        CSI_MOUNT_PREFIX,
        driver,
        vol_id_digest,
        "globalmount",
        volume_handle
    ])
    return p

It takes the PersistentVolume as input, as well as the CSI_MOUNT_PREFIX, which is /var/lib/kubelet/plugins/kubernetes.io/csi. In addition, there is a hash of the spec.csi.volume_handle in the path. The full mount path looks like this:

/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/fb3f47df032796f8ee3f021a858f09772c60bf6b30a75288a4887852a59b071f/globalmount/0001-000c-rook-cluster-0000000000000003-3361c6d5-4269-4ab2-bc14-771420b768a7

And yes, for some reason the path contains the volume’s volume_handle once in plain form and once in hashed form. I have no idea what the reason behind that is. Plus, it’s worth noting that this path layout is specific to the Ceph CSI driver. The paths for other drivers would look different.

Creating the configuration file

Because we’ve only got two volumes in our example HLSB, let’s assume that both of them are mounted on the same host. So this particular backup would only need to run a single Job. That Job needs to be told what it’s supposed to back up, which I’m doing by creating a fresh ConfigMap for the job. An example for the two volumes in our example HLSB would look like this:

kind: ConfigMap
apiVersion: v1
data:
  hlsb-conf.yaml: |
    retention:
      daily: 7
      monthly: 6
      weekly: 6
      yearly: 1
    volumes:
    - name: testing-mysql-pv-claim
    - name: testing-wp-pv-claim    

This config describes the retention policy and the volumes for this backup. The retention policy is one of the shortcuts I took. It’s actually more of a global config, which I would normally provide to the backup Job via environment variables. But because the retention is not just a simple single value, I decided that it’s just easier to add it to the config file, even though it’s not specific to the currently executed backup Job.

The entries in the volumes: list are the combination of the PVC’s namespace+name. These are also the names of the directories under which they’re mounted into the backup container.
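
Building such a ConfigMap is mostly just object construction. A rough sketch, with hypothetical helper and parameter names:

import yaml
from kubernetes_asyncio import client

def build_backup_configmap(cm_name, cm_namespace, retention, pvcs):
    # pvcs: list of (pvc_namespace, pvc_name) tuples for one host group.
    conf = {
        "retention": retention,
        "volumes": [{"name": f"{ns}-{name}"} for ns, name in pvcs],
    }
    return client.V1ConfigMap(
        api_version="v1",
        kind="ConfigMap",
        metadata=client.V1ObjectMeta(name=cm_name, namespace=cm_namespace),
        data={"hlsb-conf.yaml": yaml.safe_dump(conf)},
    )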

Creating the Job

As I’ve noted above, each host where one of the app’s volumes is mounted gets a Job. These Jobs only have one Pod, running a relatively simple Python app that reads the config file and runs restic backup on the mount directories of all the volumes to be backed up.

{
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {
        "labels": {
            "hlsb": "audiobookshelf_backup-audiobookshelf",
            "homelab/part-of": "hlsb"
        },
        "name": "audiobookshelf-backup-audiobookshelf-5746d54b-3826-486d-b33f",
        "namespace": "backups",
    },
    "spec": {
        "backoffLimit": 1,
        "completions": 1,
        "parallelism": 1,
        "template": {
            "spec": {
                "affinity": {
                    "podAntiAffinity": {
                        "requiredDuringSchedulingIgnoredDuringExecution": [
                            {
                                "labelSelector": {
                                    "matchLabels": {
                                        "homelab/part-of": "hlsb"
                                    }
                                },
                                "topologyKey": "kubernetes.io/hostname"
                            }
                        ]
                    }
                },
                "containers": [
                    {
                        "command": [
                            "hn-backup",
                            "kube-services"
                        ],
                        "env": [
                            {
                                "name": "HLSB_S3_BACKUP_HOST",
                                "value": "s3-k8s.mei-home.net:443"
                            },
                            {
                                "name": "HLSB_S3_SERVICE_HOST",
                                "value": "s3-k8s.mei-home.net:443"
                            },
                            {
                                "name": "HLSB_BACKUP_BUCKET",
                                "value": "backup-audiobookshelf"
                            },
                            {
                                "name": "HLSB_S3_SCRATCH_VOL_DIR",
                                "value": "/hlsb-mounts/backup-s3-scratch"
                            },
                            {
                                "name": "HLSB_VOL_MOUNT_DIR",
                                "value": "/hlsb-mounts"
                            },
                            {
                                "name": "HLSB_NAME",
                                "value": "backup-audiobookshelf"
                            },
                            {
                                "name": "HLSB_NS",
                                "value": "audiobookshelf"
                            },
                            {
                                "name": "HLSB_CONFIG",
                                "value": "/hlsb-mounts/hlsb-conf.yaml"
                            },
                            {
                                "name": "HLSB_S3_BACKUP_ACCESS_KEY_ID",
                                "valueFrom": {
                                    "secretKeyRef": {
                                        "key": "AccessKey",
                                        "name": "s3-backup-buckets-cred",
                                        "optional": false
                                    }
                                }
                            },
                            {
                                "name": "HLSB_S3_BACKUP_SECRET_KEY",
                                "valueFrom": {
                                    "secretKeyRef": {
                                        "key": "SecretKey",
                                        "name": "s3-backup-buckets-cred",
                                        "optional": false
                                    }
                                }
                            },
                            {
                                "name": "HLSB_S3_SERVICE_ACCESS_KEY_ID",
                                "valueFrom": {
                                    "secretKeyRef": {
                                        "key": "AccessKey",
                                        "name": "s3-backup-buckets-cred",
                                        "optional": false
                                    }
                                }
                            },
                            {
                                "name": "HLSB_S3_SERVICE_SECRET_KEY",
                                "valueFrom": {
                                    "secretKeyRef": {
                                        "key": "SecretKey",
                                        "name": "s3-backup-buckets-cred",
                                        "optional": false
                                    }
                                }
                            },
                            {
                                "name": "HLSB_RESTIC_PW",
                                "valueFrom": {
                                    "secretKeyRef": {
                                        "key": "pw",
                                        "name": "restic-pw",
                                        "optional": false
                                    }
                                }
                            }
                        ],
                        "image": "harbor.mei-home.net/homelab/hn-backup:5.0.0",
                        "name": "hlsb",
                        "volumeMounts": [
                            {
                                "mountPath": "/hlsb-mounts/audiobookshelf-abs-data-volume",
                                "name": "vol-backup-audiobookshelf-abs-data-volume"
                            },
                            {
                                "mountPath": "/hlsb-mounts",
                                "name": "vol-backup-confmap",
                                "readOnly": true
                            }
                        ]
                    }
                ],
                "nodeSelector": {
                    "kubernetes.io/hostname": "khepri"
                },
                "priorityClassName": "system-node-critical",
                "restartPolicy": "Never",
                "volumes": [
                    {
                        "hostPath": {
                            "path": "/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/4e3bcff1fd37dd7554102fbe925eef191491c4f5fd7323a4564c4008d86ee967/globalmount/0001-000c-rook-cluster-0000000000000003-642bef40-20b8-4df0-ab2f-6190c6b78d74",
                            "type": ""
                        },
                        "name": "vol-backup-audiobookshelf-abs-data-volume"
                    },
                    {
                        "configMap": {
                            "defaultMode": 420,
                            "name": "backup-confmap-audiobookshelf-backup-audiobookshelf",
                            "optional": false
                        },
                        "name": "vol-backup-confmap"
                    }
                ]
            }
        }
    }
}

This one does not fit the HLSB I’ve been using as an example, but I hope you can forgive that oversight. I forgot to save the JSON for one of the jobs I ran against my example HLSB.

Let’s start with the metadata property:

"metadata": {
    "labels": {
        "hlsb": "audiobookshelf_backup-audiobookshelf",
        "homelab/part-of": "hlsb"
    },
    "name": "audiobookshelf-backup-audiobookshelf-5746d54b-3826-486d-b33f",
    "namespace": "backups",
}

For identification purposes, all pieces belonging to a certain HLSB have that HLSB’s namespace and name in the hlsb label. In addition, they’re all marked as part-of the Homelab service backup as part of my general labeling scheme. The name of the Job again contains the namespace and name of the HLSB and is capped off by a random string. It is generated like this:

import uuid

def get_new_job_name(hlsb_name, hlsb_namespace):
    # Job names may be at most 63 characters and must not end in a "-".
    name = f"{hlsb_namespace}-{hlsb_name}-{uuid.uuid4()}"
    truncated_name = name[0:61]
    if truncated_name[-1] == "-":
        return truncated_name[0:-1]
    else:
        return truncated_name

Creating this name was a lot more complicated than I anticipated. Because I don’t currently have any integration tests against a real k8s cluster, this function was a surprising source of issues. To begin with, the name of a Job can only be 63 chars long at maximum. So appending the full UUID led to errors during the initial testing. Then I thought I had it, with my test HLSB running backups successfully. And then I implemented the above HLSB, for my Audiobookshelf deployment. And I found that the cutoff at 61 chars I implemented left the name ending in a -, which k8s also doesn’t allow - hence the check whether the name ends in -. 🤦

Another thing worth mentioning: The backup jobs run in my backups namespace, not in the app’s namespace. This is mostly so that I can comfortably keep all of the necessary secrets in a separate namespace.

Then let’s continue with the spec, more precisely the affinity I’ve set up:

"affinity": {
    "podAntiAffinity": {
        "requiredDuringSchedulingIgnoredDuringExecution": [
            {
                "labelSelector": {
                    "matchLabels": {
                        "homelab/part-of": "hlsb"
                    }
                },
                "topologyKey": "kubernetes.io/hostname"
            }
        ]
    }
},

This config prevents multiple backup Jobs from running on the same host. This is necessary because sometimes, especially with larger S3 buckets to be backed up, the rclone invocation in the backup container can use quite a lot of resources. Plus, I just generally didn’t want to tax any specific node too much.

Next, the node selector, which ensures that the Job runs on the host where the required volumes are mounted:

"nodeSelector": {
    "kubernetes.io/hostname": "khepri"
},

This value is computed from the results of the PVC probing I’ve described above. The volumes to be backed up get grouped by the hosts they’re currently mounted on, and then every resulting group/host gets one Job.
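
The grouping itself is simple enough. A minimal sketch, with names of my own choosing:

from collections import defaultdict

def group_pvcs_by_node(pvc_infos):
    # pvc_infos: list of (pvc_namespace, pvc_name, node_name) tuples, with
    # node_name taken from the VolumeAttachment as described above.
    groups = defaultdict(list)
    for ns, name, node in pvc_infos:
        groups[node].append((ns, name))
    return groups  # one backup Job per key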

And then the more interesting part, the volumes:

"volumes": [
    {
        "hostPath": {
            "path": "/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/4e3bcff1fd37dd7554102fbe925eef191491c4f5fd7323a4564c4008d86ee967/globalmount/0001-000c-rook-cluster-0000000000000003-642bef40-20b8-4df0-ab2f-6190c6b78d74",
            "type": ""
        },
        "name": "vol-backup-audiobookshelf-abs-data-volume"
    },
    {
        "configMap": {
            "defaultMode": 420,
            "name": "backup-confmap-audiobookshelf-backup-audiobookshelf",
            "optional": false
        },
        "name": "vol-backup-confmap"
    }
]

The hostPath.path is computed as described above, via the information from the persistent volume. And the name for the volume is defined as vol-backup-pvc_namespace-pvc_name. Additionally, the ConfigMap described in the previous section also gets mounted.
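
Put together with the client models, that looks roughly like this (a sketch, not the operator’s exact code):

from kubernetes_asyncio import client

def create_pvc_host_path_volume(pvc_namespace, pvc_name, pv):
    # Mount the Ceph CSI global mount directory of the PV into the Pod.
    host_path = client.V1HostPathVolumeSource(
        path=get_ceph_csi_host_path(pv))
    return client.V1Volume(
        name=f"vol-backup-{pvc_namespace}-{pvc_name}",
        host_path=host_path)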

And finally, the container itself. Let’s start with the command and image:

"command": [
    "hn-backup",
    "kube-services"
],
"image": "harbor.mei-home.net/homelab/hn-backup:5.0.0",

I’ve kept it pretty simple. And instead of mucking around with lots of command line switches, the configuration is done via the config file and environment variables. I won’t say much about the hn-backup program, as it’s mainly just a wrapper around rclone for fetching S3 buckets to be backed up and restic for the backups themselves.
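
To give a rough idea of what it does for an s3 type entry, here is a heavily simplified sketch; the rclone remote name and the flags are placeholders, and the real hn-backup does quite a bit more:

import subprocess

def backup_s3_bucket(bucket, scratch_dir, backup_repo):
    # Mirror the service bucket into the scratch volume with rclone, then
    # let restic make an incremental backup of that local copy.
    subprocess.run(
        ["rclone", "sync", f"service:{bucket}", f"{scratch_dir}/{bucket}"],
        check=True)
    subprocess.run(
        ["restic", "-r", backup_repo, "backup", f"{scratch_dir}/{bucket}"],
        check=True)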

The volume mounts look like this:

"volumeMounts": [
    {
        "mountPath": "/hlsb-mounts/audiobookshelf-abs-data-volume",
        "name": "vol-backup-audiobookshelf-abs-data-volume"
    },
    {
        "mountPath": "/hlsb-mounts",
        "name": "vol-backup-confmap",
        "readOnly": true
    }
]

All mounts are done into the /hlsb-mounts directory in the container, which is then used by hn-backup to construct the paths to be backed up.

Then there are the env variables, which I use to define the common configuration. So while the ConfigMap contains options specific to the current Job, the env variables contain the common configs. These options are defined in the HomelabBackupConfig CRD, an instance of which looks like this:

apiVersion: mei-home.net/v1alpha1
kind: HomelabBackupConfig
metadata:
  name: backup-config
  namespace: backups
  labels:
    homelab/part-of: hlbo
spec:
  serviceBackup:
    schedule: "30 1 * * *"
    scratchVol: vol-service-backup-scratch
    s3BackupConfig:
      s3Host: s3-k8s.mei-home.net:443
      s3Credentials:
        secretName: s3-backup-buckets-cred
        accessKeyIDProperty: AccessKey
        secretKeyProperty: SecretKey
    s3ServiceConfig:
      s3Host: s3-k8s.mei-home.net:443
      s3Credentials:
        secretName: s3-backup-buckets-cred
        accessKeyIDProperty: AccessKey
        secretKeyProperty: SecretKey
    resticPasswordSecret:
      secretName: restic-pw
      secretKey: pw
    resticRetentionPolicy:
      daily: 7
      weekly: 6
      monthly: 6
      yearly: 1
    jobSpec:
      jobNS: "backups"
      image: harbor.mei-home.net/homelab/hn-backup:5.0.0
      command:
        - "hn-backup"
        - "kube-services"

This CRD describes options common to all backups, so they don’t need to be repeated in every HomelabServiceBackup manifest. The most important parts here are the configs for S3 access.

s3BackupConfig describes access to the backup buckets to which restic will write the backup. It contains the host, optionally with port, and how to get the S3 credentials. Very important to me here was being able to specify not just the name of the Secret, but also the key inside the Secret to use for the specific credential, because I’ve been pretty annoyed by some Helm charts which only allow specifying the Secret’s name and then expect certain keys to exist. That makes using generated secrets, like those created by Ceph Rook for S3 buckets, a real pain.
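
With both the Secret name and the key configurable, turning such an s3Credentials block into env variables is straightforward. A sketch with a hypothetical helper:

from kubernetes_asyncio import client

def env_from_s3_credentials(prefix, s3_credentials):
    # s3_credentials is the s3Credentials block from the HomelabBackupConfig.
    def secret_env(var_name, key_property):
        return client.V1EnvVar(
            name=var_name,
            value_from=client.V1EnvVarSource(
                secret_key_ref=client.V1SecretKeySelector(
                    name=s3_credentials["secretName"],
                    key=s3_credentials[key_property],
                    optional=False)))
    return [
        secret_env(f"{prefix}_ACCESS_KEY_ID", "accessKeyIDProperty"),
        secret_env(f"{prefix}_SECRET_KEY", "secretKeyProperty"),
    ]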

The s3ServiceConfig has exactly the same structure, but provides the credentials for access to buckets used by services, which might also be backed up, and which might live on a completely different system. This is the case for my Nomad cluster apps right now, for example. Their S3 buckets still live on the baremetal Ceph cluster, while the backup buckets have already been migrated to the Ceph Rook cluster. And I decided to make such a setup possible here as well, just in case I wanted to migrate to a different S3 setup at some point.

The resticPasswordSecret describes the encryption password for the restic backup repos in the individual S3 buckets.

All of this information is put into environment variables on the Pod running the backup. Let’s start with the backup credentials:

{
    "name": "HLSB_S3_BACKUP_HOST",
    "value": "s3-k8s.mei-home.net:443"
},
{
    "name": "HLSB_S3_BACKUP_ACCESS_KEY_ID",
    "valueFrom": {
        "secretKeyRef": {
            "key": "AccessKey",
            "name": "s3-backup-buckets-cred",
            "optional": false
        }
    }
},
{
    "name": "HLSB_S3_BACKUP_SECRET_KEY",
    "valueFrom": {
        "secretKeyRef": {
            "key": "SecretKey",
            "name": "s3-backup-buckets-cred",
            "optional": false
        }
    }
},

The configs for the S3 service bucket credentials are very similar, so I won’t repeat them here. One noteworthy thing about the above setup, especially for the Secrets: The ServiceAccount for the operator does not require access to any Secrets in its namespace. Of course, that’s a bit cosmetic - because the operator is allowed to launch Jobs, which in turn can access the secrets. But still, I found it nice that due to the way I’d set things up, the operator itself would not need to touch any Secrets.

More interesting might be some odds and ends I’ve also defined in env variables, just to make accessing them more convenient. To my shame, I have to admit that I lied above when I pretended that I had a clean separation between generic config going into environment variables and per-Job config going into the config file. One piece of per-Job info did end up in the environment variables: the name of the backup bucket. I have absolutely no idea why I went inconsistent with just this one value.

Some other interesting variables:

{
    "name": "HLSB_S3_SCRATCH_VOL_DIR",
    "value": "/hlsb-mounts/backup-s3-scratch"
},
{
    "name": "HLSB_VOL_MOUNT_DIR",
    "value": "/hlsb-mounts"
},
{
    "name": "HLSB_NAME",
    "value": "backup-audiobookshelf"
},
{
    "name": "HLSB_NS",
    "value": "audiobookshelf"
},
{
    "name": "HLSB_CONFIG",
    "value": "/hlsb-mounts/hlsb-conf.yaml"
},

These provide convenient access to the S3 scratch volume, which rclone uses to download an entire S3 bucket so restic can then back it up. The HLSB’s name and namespace also ended up being convenient to have available in the Pod, if only for some meaningful log output. And finally, it’s nice to have the path to the config file available as well.

And that’s it - that’s the entire Job. I’ve long thought about providing some code snippets used for creating the V1Job, but honestly, it’s just not very interesting. It took me a while to get right, but in the end it was all just value assignments. Here’s an example, the function which creates the Pod Volume spec for the scratch volume:

import logging

from kubernetes_asyncio.client import (V1PersistentVolumeClaimVolumeSource,
                                        V1Volume)


def create_s3_scratch_volume(backup_conf_spec):
    if "scratchVol" not in backup_conf_spec:
        logging.error("Did not find scratchVol in backup config.")
        return None

    # Reference the scratch PVC named in the HomelabBackupConfig.
    pvc = V1PersistentVolumeClaimVolumeSource(
        claim_name=backup_conf_spec["scratchVol"], read_only=False)
    # S3_SCRATCH_VOL_NAME is a module-level constant with the Pod volume name.
    volume = V1Volume(name=S3_SCRATCH_VOL_NAME,
                      persistent_volume_claim=pvc)
    return volume

The backup_conf_spec here is the spec.serviceBackup object from the HomelabBackupConfig I’ve shown above. And the rest of the roughly 630 lines it took me to create the V1Job programmatically look very similar, perhaps with the occasional if thrown in, but mostly just value assignments and logs.

And because I’m a kind man, I will spare you all of it.

But I still want to show you some code I think could be interesting, so let’s jump to the Job execution.

Job execution

The Job itself gets submitted via the Python API again, nothing special here. But what is special: the current daemon (Kopf’s nomenclature for a long-running change handler that doesn’t just run to completion for a specific event) needs to know when the current Job has finished, one way or another. For this I decided to make use of the fact that I was writing asynchronous code. So while the daemon waits for the Job to finish, it should yield. And luckily, Kopf already provides a way to watch events from any k8s object type you might be interested in. So I set up a watcher for events from Jobs:

@kopf.on.event('jobs', labels={'homelab/part-of': 'hlsb'})
def job_event_handler(type, status, labels, **kwargs):
    jobs.handle_job_events(type, status, labels)

This filters for the events of all Jobs with the homelab/part-of: hlsb label.

The actual handling of events then happens in this function:

def handle_job_events(type, status, labels):
    if type in ["None", "DELETED", None]:
        logging.debug(
            f"Ignored job event:\nStatus: {status}\nLabels: {labels}")
        return

    if "hlsb" not in labels:
        logging.error(
            "Got event without hlsb label:"
            + f"\nStatus: {status}"
            + f"\nLabels: {labels}")
        return
    else:
        # The hlsb label has the form "<namespace>_<name>".
        ns, name = labels["hlsb"].split("_")
        job_state = get_job_state(status)
        if job_state in [JobState.COMPLETE, JobState.FAILED]:
            logging.info(f"Found finished job for {ns}/{name}")
            set_job_finished_event(ns, name)

This function only concerns itself with failed or completed jobs. And if it finds such a job, it sets a “Job finished” event. These events are part of the Python standard library’s async synchronization primitives, see here. A coroutine can wait on an event via its wait() method and is woken up when another coroutine calls set() on it. And that’s what happens in the set_job_finished_event function, which is called when the Job has been detected as finished.
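
The pattern boils down to a dict of asyncio.Event objects, keyed by the HLSB’s namespace and name. A minimal sketch of how this fits together (names are illustrative, not necessarily the operator’s):

import asyncio

job_finished_events = {}

def set_job_finished_event(ns, name):
    # Called from the Job event handler; wakes up the waiting daemon.
    if (ns, name) in job_finished_events:
        job_finished_events[(ns, name)].set()

async def wait_for_job(ns, name):
    # Called by the daemon after submitting a Job; yields until the watcher
    # signals that the Job has completed or failed.
    event = asyncio.Event()
    job_finished_events[(ns, name)] = event
    await event.wait()
    del job_finished_events[(ns, name)]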

So how to determine whether a k8s Job has finished, failed or is still running? It took me a while to figure out, but the safest way seems to be to look at the Job.status.conditions array. If the status doesn’t have that member at all, it’s a pretty good bet that the Job is running or pending. Then you can iterate over the conditions: if a condition has type Failed and status True, the Job has failed; if the type is Complete and the status is True, the Job has finished successfully. Here’s an example:

"conditions": [
{
    "lastProbeTime": "2025-01-10T01:30:23Z",
    "lastTransitionTime": "2025-01-10T01:30:23Z",
    "status": "True",
    "type": "Complete"
}
]

And here’s how that looks in Python:

def get_job_state(job_status):
    if "conditions" not in job_status or not job_status["conditions"]:
        return JobState.RUNNING

    for cond in job_status["conditions"]:
        if cond["type"] == "Failed" and cond["status"] == "True":
            return JobState.FAILED
        elif cond["type"] == "Complete" and cond["status"] == "True":
            return JobState.COMPLETE

    return JobState.RUNNING

Conclusion

And that’s it. To be completely honest, this is the third time I’m typing this conclusion, and I almost rm -rf’d this post multiple times. I don’t think it’s that good or engaging. It seems I’m just not that good at writing programming blog posts. I hope those of you who made it to this point still got something out of it.

So, time to do a recap: What did this bring me? And was it a good idea? It all started out with my burning wish to just copy+paste my backup mechanism from Nomad to Kubernetes, more-or-less verbatim. Add to that the fact that I don’t get to do much programming at $dayjob, and I was just missing it a bit. Honestly, if someone were to ask me “What’s your most-used programming language?”, my honest answer would need to be “Whatever Atlassian calls JIRA’s markup language.”

But I also learned quite a bit. I had never really worked with the k8s API before, and this was a good way to dive deeper into it. Although I’m not entirely convinced that knowing I’m able to write small operators isn’t just a tad bit dangerous. 😬

My first commit to the repo was on May 9th, 2024. Adding it all up, this took me nine months to do. With rather long interruptions at times, but most of those were more due to motivation than anything else. If I had just used something existing, I would have the k8s migration done by now. But where’s the fun in that?

There’s still a lot I would like to refactor in the implementation. For example, those of you who know the k8s API probably wondered why I went with async events instead of just creating a “watch” on the Jobs and waiting for them to finish that way. I’m honestly not sure. But I would like to dive into k8s API watches. Then there’s the unit test code. There’s so much repetition in those tests, and especially in the mocks. Then there are still a lot of hardcoded constants in the code I’d like to make configurable via the HomelabBackupConfig or HomelabServiceBackup. And finally, there’s also my wish to finally go and learn Golang. With this operator, I’ve got a really good-sized first project. And I would have the advantage that it’s not a greenfield project. Most of the design is already done, so I would be able to concentrate on writing Go.

I will write one more post on the operator, as part of the Nomad to k8s series, treating it as just another app and describing what the deployment looks like.

And finally, I’m quite happy that I’m done with this now. I’ve been looking forward to being able to continue the k8s migration for way too long.

My longing for continuing the migration has been getting so bad that I’ve started to miss YAML.

Almost.