In the last couple of months, I’ve been working on a k8s operator for running backups of persistent volumes and S3 buckets in my cluster. Previous installments of the series can be found here.
And now, I’m finally done with it, and over the weekend, I ran the first successful backups. Time to describe what I’ve implemented, why and how.
Recap
Let’s start with a recap. For a more detailed description of the problem, have a look at this post in my k8s migration series.
In short, my previous backup implementation on my Nomad cluster runs a container on each host in the cluster. This container then checks which jobs run on the host and backs up the volumes noted in the config file for that job. This approach would not work on Kubernetes, because k8s does not provide an API similar to Nomad’s Sysbatch jobs. Those types of jobs launch a given container on every host in the cluster, with a run-to-completion setup. Kubernetes, on the other hand, only knows Jobs, which cannot be run on every host simultaneously, and DaemonSets, which don’t have run-to-completion semantics.
There would have, of course, been the easy way out: Using an existing solution. But where’s the fun in that?
So I decided to take this chance to learn the Kubernetes API a bit better, and write my own operator. Because I’m relatively familiar with Python, I decided to use the Kopf framework.
The end goal was to have a per-app configuration in the form of a custom resource definition which tells the operator which volumes and buckets need to be backed up. Here is an example I used for my tests:
apiVersion: mei-home.net/v1alpha1
kind: HomelabServiceBackup
metadata:
  name: test-service-backup
  namespace: testing
  labels:
    homelab/part-of: testing
spec:
  runNow: "12"
  backupBucketName: "backup-operator-testing"
  backups:
    - type: pvc
      name: mysql-pv-claim
      namespace: testing
    - type: pvc
      name: wp-pv-claim
      namespace: testing
    - type: s3
      name: service-backup-test
This object instructs my operator to back up the MySQL and WordPress volumes of a WordPress deployment I launched just for testing purposes. It also contains an S3 bucket that’s not used by the deployment and just exists to test that part of the operator.
High level overview
Alright, let’s assume that we’ve got the above example HomelabServiceBackup (HLSB). What do I want to happen when a backup is triggered?
On the most basic level, I want two things to happen:
- The two pvc type entries in the spec.backups list are run through restic to back them up. This means the backup needs access to those volumes.
- The s3 type bucket is downloaded to a temporary location, and then restic is run on that temporary location to make an incremental backup of the bucket.
BAD THINGS. This paragraph is the “Do as I say, not as I do” part of this post. First of all, running backups on live data is generally a bad idea. You might end up with inconsistent state in your backup. Second, there are perfectly good block-level backup capabilities right in Ceph. With consistency guarantees. But I don’t like those. They basically require a second Ceph cluster as a backup target. To reiterate: What I’m doing here is bad. And I know that what I’m doing here is bad. It’s working for me, but I’m really not advising you to do the same thing. That’s the main reason I will likely never publish the operator I wrote - I just don’t think it’s a good idea.
With that out of the way, which steps need to be completed?
- Determine where each of the pvc type volumes is mounted
- Split the volumes into groups by the host they’re currently mounted on
- For each of those groups:
  - Create a ConfigMap with the configuration for that particular group
  - Create a Job for the group and launch them, in sequence
- Determine whether all Jobs were successful and update the HLSB object in the k8s cluster
The HLSB object has a status.state property, which can be one of:
- Running
- Success
- Failed
These are then later used by a Grafana panel, using Prometheus data from kube-state-metrics, to show whether all of the backups were successful.
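As a sketch of how that status update might look via the CustomObjectsApi (this assumes the CRD defines a status subresource and that the plural is homelabservicebackups; it's an illustration, not the operator's actual code):

from kubernetes_asyncio import client

async def set_hlsb_state(api: client.ApiClient, ns: str, name: str, state: str):
    # Patch only status.state; requires the CRD to define a status subresource.
    custom = client.CustomObjectsApi(api)
    await custom.patch_namespaced_custom_object_status(
        group="mei-home.net", version="v1alpha1", namespace=ns,
        plural="homelabservicebackups", name=name,
        body={"status": {"state": state}})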
Now let’s have a closer look at the above steps.
Implementation details
Finding volumes and hosts
Let’s look at the backup list from the example HLSB above again:
backups:
- type: pvc
name: mysql-pv-claim
namespace: testing
- type: pvc
name: wp-pv-claim
namespace: testing
I’m ignoring the s3
type entry here, because quite frankly, it’s not that
interesting.
For the pvc type entries, the very first step is to determine on which host
they’re currently mounted. Because a PVC might be RWO, we cannot simply mount
it into the backup Pod while the app using it is running. Instead, I use a
hostPath volume to mount the directory where the Ceph CSI driver mounts the
volume into the backup container.
For that to work, I need to know on which host the volume is actually mounted. And for apps with multiple Pods and associated volumes, these may be multiple hosts. Which presents yet another challenge: restic locks the repository it backs up to, so there can only ever be a single writer. My backup buckets are separated by app, so even if an app has multiple volumes defined, like the example above, only one backup Job per app can run at a time. If multiple volumes happen to be mounted on a single host, that’s not a problem: the backup Job for that host can back up all of them. But if they happen to be mounted on separate hosts, there need to be multiple Jobs, running one after the other.
So how do we get the volumes? With the Kubernetes API. As input for our journey, we’ve got the PVC, with its name and namespace, from the list of things to back up.
So the first action is to fetch the PVC via the Kubernetes API. Because I’m writing async code in Kopf, I’m using kubernetes_asyncio instead of the official Kubernetes Python library.
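Just as a minimal sketch (assuming in-cluster config; error handling omitted), fetching a PVC with kubernetes_asyncio looks roughly like this:

from kubernetes_asyncio import client, config

async def fetch_pvc(name, namespace):
    config.load_incluster_config()  # running inside the cluster
    async with client.ApiClient() as api:
        core = client.CoreV1Api(api)
        # Returns a V1PersistentVolumeClaim object
        return await core.read_namespaced_persistent_volume_claim(name, namespace)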
Here’s what the PVC looks like, with the wp-pv-claim
from the example:
{
  "apiVersion": "v1",
  "kind": "PersistentVolumeClaim",
  "metadata": {
    "labels": {
      "app": "wordpress",
      "app.kubernetes.io/managed-by": "Helm",
      "homelab/part-of": "testing"
    },
    "name": "wp-pv-claim",
    "namespace": "testing"
  },
  "spec": {
    "accessModes": [
      "ReadWriteOnce"
    ],
    "resources": {
      "requests": {
        "storage": "10Gi"
      }
    },
    "storageClassName": "rbd-bulk",
    "volumeMode": "Filesystem",
    "volumeName": "pvc-733b8bc9-0a44-446c-a736-3d97ba52f01f"
  },
  "status": {
    "accessModes": [
      "ReadWriteOnce"
    ],
    "capacity": {
      "storage": "10Gi"
    },
    "phase": "Bound"
  }
}
I removed a couple of pieces which aren’t that interesting. With this info in
hand, we can go to the next step: fetching the PersistentVolume backing this
claim. This can also be done pretty easily with the read_persistent_volume
API, which only needs a name as input, because PersistentVolumes are
cluster-level resources. The name of the volume backing the claim can be taken
from the spec.volumeName property.
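Continuing the sketch from above (core being the CoreV1Api client, pvc the claim we just fetched):

# PersistentVolumes are cluster-scoped, so only the name is needed.
pv = await core.read_persistent_volume(pvc.spec.volume_name)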
The result for the above PVC would look like this, again with unimportant bits removed:
{
  "apiVersion": "v1",
  "kind": "PersistentVolume",
  "metadata": {
    "name": "pvc-733b8bc9-0a44-446c-a736-3d97ba52f01f"
  },
  "spec": {
    "accessModes": [
      "ReadWriteOnce"
    ],
    "capacity": {
      "storage": "10Gi"
    },
    "csi": {
      "controllerExpandSecretRef": {
        "name": "rook-csi-rbd-provisioner",
        "namespace": "rook-cluster"
      },
      "driver": "rook-ceph.rbd.csi.ceph.com",
      "fsType": "ext4",
      "nodeStageSecretRef": {
        "name": "rook-csi-rbd-node",
        "namespace": "rook-cluster"
      },
      "volumeAttributes": {
        "clusterID": "rook-cluster",
        "imageFeatures": "layering,exclusive-lock,object-map,fast-diff",
        "imageName": "csi-vol-3361c6d5-4269-4ab2-bc14-771420b768a7",
        "journalPool": "rbd-bulk",
        "pool": "rbd-bulk"
      },
      "volumeHandle": "0001-000c-rook-cluster-0000000000000003-3361c6d5-4269-4ab2-bc14-771420b768a7"
    },
    "persistentVolumeReclaimPolicy": "Retain",
    "storageClassName": "rbd-bulk",
    "volumeMode": "Filesystem"
  },
  "status": {
    "phase": "Bound"
  }
}
One potentially useful side-note: The spec.csi.volumeAttributes.imageName
property is the name of the backing RBD volume in Ceph.
The third thing we need is the VolumeAttachment for the PersistentVolume,
which tells us where it is currently mounted. Sadly, there is no API to find
the attachment for a given PersistentVolume (or the multiple attachments of
the same volume, if it is RWX). So instead, I’m fetching all of the
attachments with the list_volume_attachment API, which, again, is not
namespaced.
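A sketch of that lookup, assuming the same kubernetes_asyncio client setup as before:

storage = client.StorageV1Api(api)
attachments = await storage.list_volume_attachment()
# No per-PV lookup exists, so filter the full list by the PV's name.
nodes = [a.spec.node_name
         for a in attachments.items
         if a.spec.source.persistent_volume_name == pv.metadata.name]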
Here is the current attachment for the above PersistentVolume:
{
"apiVersion": "storage.k8s.io/v1",
"kind": "VolumeAttachment",
"metadata": {
"creationTimestamp": "2024-12-29T10:44:46Z",
"name": "csi-8aee698fd97659b400535fa69969815fad87d2b761d69625d04afc95d53bf252",
"resourceVersion": "152545692",
"uid": "6cbe234b-e2c7-4596-a4b6-03d66eb45f5f"
},
"spec": {
"attacher": "rook-ceph.rbd.csi.ceph.com",
"nodeName": "sehith",
"source": {
"persistentVolumeName": "pvc-733b8bc9-0a44-446c-a736-3d97ba52f01f"
}
},
"status": {
"attached": true
}
}
The spec.nodeName
provides us with what we need: The name of the host where
the volume is currently mounted.
Next, how to figure out which hostPath
to use to mount that volume into the
backup container? That’s done with this small Python function:
from hashlib import sha256

# Defined elsewhere in the operator:
CSI_MOUNT_PREFIX = "/var/lib/kubelet/plugins/kubernetes.io/csi"

def get_ceph_csi_host_path(pv):
    volume_handle = pv.spec.csi.volume_handle
    driver = pv.spec.csi.driver
    vol_id_digest = sha256(bytes(volume_handle, 'utf-8')).hexdigest()
    p = "/".join([
        CSI_MOUNT_PREFIX,
        driver,
        vol_id_digest,
        "globalmount",
        volume_handle
    ])
    return p
It takes the PersistentVolume as input, as well as the CSI_MOUNT_PREFIX
, which
is /var/lib/kubelet/plugins/kubernetes.io/csi
. In addition, there is a hash of
the spec.csi.volume_handle
in the path. The full mount path looks like this:
/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/fb3f47df032796f8ee3f021a858f09772c60bf6b30a75288a4887852a59b071f/globalmount/0001-000c-rook-cluster-0000000000000003-3361c6d5-4269-4ab2-bc14-771420b768a7
And yes, for some reason the path contains the volume’s volume_handle once in
plain form and once in hashed form. I have no idea what the reason behind that
is. Plus, it’s worth noting that this path layout is specific to the Ceph CSI
driver; the paths for other drivers would look different.
Creating the configuration file
Because we’ve only got two volumes in our example HLSB, let’s assume that both of them are mounted on the same host. So this particular backup would only need to run a single Job. That Job needs to be told what it’s supposed to back up, which I’m doing by creating a fresh ConfigMap for the job. An example for the two volumes in our example HLSB would look like this:
kind: ConfigMap
apiVersion: v1
data:
  hlsb-conf.yaml: |
    retention:
      daily: 7
      monthly: 6
      weekly: 6
      yearly: 1
    volumes:
      - name: testing-mysql-pv-claim
      - name: testing-wp-pv-claim
This config describes the retention policy and the volumes for this backup. The retention policy is one of the shortcuts I took. It’s actually more of a global config, which I would normally provide to the backup Job via environment variables. But because the retention is not just a simple single value, I decided that it’s just easier to add it to the config file, even though it’s not specific to the currently executed backup Job.
The entries in the volumes:
list are the combination of the PVC’s namespace+name.
These are also the names of the directories under which they’re mounted into
the backup container.
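To illustrate how such a ConfigMap could be assembled (the function name and the exact label set here are illustrative, not the operator’s actual code):

import yaml
from kubernetes_asyncio.client import V1ConfigMap, V1ObjectMeta

def build_job_configmap(name, namespace, retention, volume_names):
    conf = {
        "retention": retention,
        "volumes": [{"name": n} for n in volume_names],
    }
    return V1ConfigMap(
        metadata=V1ObjectMeta(name=name, namespace=namespace,
                              labels={"homelab/part-of": "hlsb"}),
        data={"hlsb-conf.yaml": yaml.safe_dump(conf)})

It would then be submitted with core.create_namespaced_config_map(namespace, confmap).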
Creating the Job
As I’ve noted above, each host where one of the app’s volumes is mounted gets a
Job. These Jobs only have one Pod, running a relatively simple Python app that
reads the config file and runs restic backup
on the mount directories of all
the volumes to be backed up.
{
  "apiVersion": "batch/v1",
  "kind": "Job",
  "metadata": {
    "labels": {
      "hlsb": "audiobookshelf_backup-audiobookshelf",
      "homelab/part-of": "hlsb"
    },
    "name": "audiobookshelf-backup-audiobookshelf-5746d54b-3826-486d-b33f",
    "namespace": "backups"
  },
  "spec": {
    "backoffLimit": 1,
    "completions": 1,
    "parallelism": 1,
    "template": {
      "spec": {
        "affinity": {
          "podAntiAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [
              {
                "labelSelector": {
                  "matchLabels": {
                    "homelab/part-of": "hlsb"
                  }
                },
                "topologyKey": "kubernetes.io/hostname"
              }
            ]
          }
        },
        "containers": [
          {
            "command": [
              "hn-backup",
              "kube-services"
            ],
            "env": [
              {
                "name": "HLSB_S3_BACKUP_HOST",
                "value": "s3-k8s.mei-home.net:443"
              },
              {
                "name": "HLSB_S3_SERVICE_HOST",
                "value": "s3-k8s.mei-home.net:443"
              },
              {
                "name": "HLSB_BACKUP_BUCKET",
                "value": "backup-audiobookshelf"
              },
              {
                "name": "HLSB_S3_SCRATCH_VOL_DIR",
                "value": "/hlsb-mounts/backup-s3-scratch"
              },
              {
                "name": "HLSB_VOL_MOUNT_DIR",
                "value": "/hlsb-mounts"
              },
              {
                "name": "HLSB_NAME",
                "value": "backup-audiobookshelf"
              },
              {
                "name": "HLSB_NS",
                "value": "audiobookshelf"
              },
              {
                "name": "HLSB_CONFIG",
                "value": "/hlsb-mounts/hlsb-conf.yaml"
              },
              {
                "name": "HLSB_S3_BACKUP_ACCESS_KEY_ID",
                "valueFrom": {
                  "secretKeyRef": {
                    "key": "AccessKey",
                    "name": "s3-backup-buckets-cred",
                    "optional": false
                  }
                }
              },
              {
                "name": "HLSB_S3_BACKUP_SECRET_KEY",
                "valueFrom": {
                  "secretKeyRef": {
                    "key": "SecretKey",
                    "name": "s3-backup-buckets-cred",
                    "optional": false
                  }
                }
              },
              {
                "name": "HLSB_S3_SERVICE_ACCESS_KEY_ID",
                "valueFrom": {
                  "secretKeyRef": {
                    "key": "AccessKey",
                    "name": "s3-backup-buckets-cred",
                    "optional": false
                  }
                }
              },
              {
                "name": "HLSB_S3_SERVICE_SECRET_KEY",
                "valueFrom": {
                  "secretKeyRef": {
                    "key": "SecretKey",
                    "name": "s3-backup-buckets-cred",
                    "optional": false
                  }
                }
              },
              {
                "name": "HLSB_RESTIC_PW",
                "valueFrom": {
                  "secretKeyRef": {
                    "key": "pw",
                    "name": "restic-pw",
                    "optional": false
                  }
                }
              }
            ],
            "image": "harbor.mei-home.net/homelab/hn-backup:5.0.0",
            "name": "hlsb",
            "volumeMounts": [
              {
                "mountPath": "/hlsb-mounts/audiobookshelf-abs-data-volume",
                "name": "vol-backup-audiobookshelf-abs-data-volume"
              },
              {
                "mountPath": "/hlsb-mounts",
                "name": "vol-backup-confmap",
                "readOnly": true
              }
            ]
          }
        ],
        "nodeSelector": {
          "kubernetes.io/hostname": "khepri"
        },
        "priorityClassName": "system-node-critical",
        "restartPolicy": "Never",
        "volumes": [
          {
            "hostPath": {
              "path": "/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/4e3bcff1fd37dd7554102fbe925eef191491c4f5fd7323a4564c4008d86ee967/globalmount/0001-000c-rook-cluster-0000000000000003-642bef40-20b8-4df0-ab2f-6190c6b78d74",
              "type": ""
            },
            "name": "vol-backup-audiobookshelf-abs-data-volume"
          },
          {
            "configMap": {
              "defaultMode": 420,
              "name": "backup-confmap-audiobookshelf-backup-audiobookshelf",
              "optional": false
            },
            "name": "vol-backup-confmap"
          }
        ]
      }
    }
  }
}
This one does not fit the HLSB I’ve been using as an example, but I hope you can forgive that oversight. I forgot to save the JSON for one of the jobs I ran against my example HLSB.
Let’s start with the metadata
property:
"metadata": {
"labels": {
"hlsb": "audiobookshelf_backup-audiobookshelf",
"homelab/part-of": "hlsb"
},
"name": "audiobookshelf-backup-audiobookshelf-5746d54b-3826-486d-b33f",
"namespace": "backups",
}
For identification reasons, all pieces belonging to a certain HLSB have that
HLSB’s namespace and name in the hlsb label. In addition, they’re all marked as
part-of the Homelab service backup as part of my general labeling scheme.
The name of the Job again contains the namespace and name of the HLSB and is
capped off by a random string. It is generated like this:
import uuid

def get_new_job_name(hlsb_name, hlsb_namespace):
    name = f"{hlsb_namespace}-{hlsb_name}-{uuid.uuid4()}"
    truncated_name = name[0:61]
    if truncated_name[-1] == "-":
        return truncated_name[0:-1]
    else:
        return truncated_name
Creating this name was a lot more complicated than I anticipated. Because I
don’t currently have any integration tests against a real k8s cluster, this
function was a surprising source of issues. To begin with, the name of a Job
can be at most 63 characters long, so appending the full UUID led to errors
during initial testing. Then I thought I had it, with my test HLSB running
backups successfully. And then I implemented the above HLSB for my
Audiobookshelf deployment, and found that the cutoff at 61 chars I implemented
left the name ending in a -. Which k8s also doesn’t allow, hence the check
whether the name ends in a -. 🤦
Another thing worth mentioning: The backup jobs run in my backups
namespace,
not in the app’s namespace. This is mostly so that I can comfortably keep all of
the necessary secrets in a separate namespace.
Then let’s continue with the spec, more precisely the affinity I’ve set up:
"affinity": {
"podAntiAffinity": {
"requiredDuringSchedulingIgnoredDuringExecution": [
{
"labelSelector": {
"matchLabels": {
"homelab/part-of": "hlsb"
}
},
"topologyKey": "kubernetes.io/hostname"
}
]
}
},
This config prevents multiple backup Jobs from running on the same host. This is necessary because sometimes, especially with larger S3 buckets to back up, the rclone invocation in the backup container can use quite a lot of resources. Plus, I just generally didn’t want to tax any specific node too much.
Next, the node selector, which ensures that the Job runs on the host where the required volumes are mounted:
"nodeSelector": {
"kubernetes.io/hostname": "khepri"
},
This is computed from the values gathered during the PVC probing I’ve described above. The volumes to be backed up are grouped by the hosts they’re currently mounted on, and every resulting group/host gets one Job, as sketched below.
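The grouping itself is straightforward; a sketch (the helper name is mine, not the operator’s):

from collections import defaultdict

def group_by_node(pvc_to_node):
    # pvc_to_node: list of (pvc_key, node_name) pairs gathered from the
    # VolumeAttachment probing described earlier.
    groups = defaultdict(list)
    for pvc_key, node in pvc_to_node:
        groups[node].append(pvc_key)
    return groups  # one backup Job per key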
And then the more interesting part, the volumes:
"volumes": [
{
"hostPath": {
"path": "/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/4e3bcff1fd37dd7554102fbe925eef191491c4f5fd7323a4564c4008d86ee967/globalmount/0001-000c-rook-cluster-0000000000000003-642bef40-20b8-4df0-ab2f-6190c6b78d74",
"type": ""
},
"name": "vol-backup-audiobookshelf-abs-data-volume"
},
{
"configMap": {
"defaultMode": 420,
"name": "backup-confmap-audiobookshelf-backup-audiobookshelf",
"optional": false
},
"name": "vol-backup-confmap"
}
]
The hostPath.path
is computed as described above, via the information from the
persistent volume. And the name for the volume is defined as vol-backup-pvc_namespace-pvc_name
.
Additionally, the ConfigMap described in the previous section also gets
mounted.
And finally, the container itself. Let’s start with the command and image:
"command": [
"hn-backup",
"kube-services"
],
"image": "harbor.mei-home.net/homelab/hn-backup:5.0.0",
I’ve kept it pretty simple. And instead of mucking around with lots of command
line switches, the configuration is done via the config file and environment
variables.
I won’t say much about the hn-backup
program, as it’s mainly just a wrapper
around rclone for fetching S3 buckets to be backed up
and restic for the backups themselves.
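To give a rough idea, the restic side of it boils down to something like this hypothetical approximation (env variable names taken from the Job spec above; hn-backup’s actual invocation differs):

import os
import subprocess

def restic_backup(path, bucket):
    env = dict(
        os.environ,
        RESTIC_REPOSITORY=f"s3:https://{os.environ['HLSB_S3_BACKUP_HOST']}/{bucket}",
        RESTIC_PASSWORD=os.environ["HLSB_RESTIC_PW"],
        AWS_ACCESS_KEY_ID=os.environ["HLSB_S3_BACKUP_ACCESS_KEY_ID"],
        AWS_SECRET_ACCESS_KEY=os.environ["HLSB_S3_BACKUP_SECRET_KEY"])
    # One restic invocation per volume mount directory.
    subprocess.run(["restic", "backup", path], env=env, check=True)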
The volume mounts look like this:
"volumeMounts": [
{
"mountPath": "/hlsb-mounts/audiobookshelf-abs-data-volume",
"name": "vol-backup-audiobookshelf-abs-data-volume"
},
{
"mountPath": "/hlsb-mounts",
"name": "vol-backup-confmap",
"readOnly": true
}
]
All mounts are done into the /hlsb-mounts
directory in the container, which
is then used by hn-backup to construct the paths to be backed up.
Then there are the env variables. Those I use to define the common configuration. So while the ConfigMap contains options relevant to the current Job, the env variables contain the configs shared by all Jobs. These options are defined in the HomelabBackupConfig CRD, an example of which looks like this:
apiVersion: mei-home.net/v1alpha1
kind: HomelabBackupConfig
metadata:
  name: backup-config
  namespace: backups
  labels:
    homelab/part-of: hlbo
spec:
  serviceBackup:
    schedule: "30 1 * * *"
    scratchVol: vol-service-backup-scratch
    s3BackupConfig:
      s3Host: s3-k8s.mei-home.net:443
      s3Credentials:
        secretName: s3-backup-buckets-cred
        accessKeyIDProperty: AccessKey
        secretKeyProperty: SecretKey
    s3ServiceConfig:
      s3Host: s3-k8s.mei-home.net:443
      s3Credentials:
        secretName: s3-backup-buckets-cred
        accessKeyIDProperty: AccessKey
        secretKeyProperty: SecretKey
    resticPasswordSecret:
      secretName: restic-pw
      secretKey: pw
    resticRetentionPolicy:
      daily: 7
      weekly: 6
      monthly: 6
      yearly: 1
    jobSpec:
      jobNS: "backups"
      image: harbor.mei-home.net/homelab/hn-backup:5.0.0
      command:
        - "hn-backup"
        - "kube-services"
This CRD describes options common to all backups, so they don’t need to be repeated in every HomelabServiceBackup manifest. The most important parts here are the configs for S3 access.
s3BackupConfig describes access to the backup buckets to which restic will
write the backup. It contains the host, optionally with port, and how to get
the S3 credentials. Very important to me here was being able to specify not
just the name of the Secret, but also the key inside the Secret to use for the
specific credential, because I’ve been pretty annoyed by some Helm charts
which only allow specifying the Secret’s name and then expect certain keys to
exist. That makes using generated Secrets, like those created by Ceph Rook for
S3 buckets, a real pain.
The s3ServiceConfig
has exactly the same structure, but provides the
credentials for access to buckets used by services, which might also be backed
up, and which might live on a completely different system. This is the case for
my Nomad cluster apps right now, for example. Their S3 buckets still live on the
baremetal Ceph cluster, while the backup buckets have already been migrated to
the Ceph Rook cluster. And I decided to make such a setup possible here as well,
just in case I wanted to migrate to a different S3 setup at some point.
The resticPasswordSecret
describes the encryption password for the restic
backup repos in the individual S3 buckets.
All of this information is put into environment variables on the Pod running the backup. Let’s start with the backup credentials:
{
  "name": "HLSB_S3_BACKUP_HOST",
  "value": "s3-k8s.mei-home.net:443"
},
{
  "name": "HLSB_S3_BACKUP_ACCESS_KEY_ID",
  "valueFrom": {
    "secretKeyRef": {
      "key": "AccessKey",
      "name": "s3-backup-buckets-cred",
      "optional": false
    }
  }
},
{
  "name": "HLSB_S3_BACKUP_SECRET_KEY",
  "valueFrom": {
    "secretKeyRef": {
      "key": "SecretKey",
      "name": "s3-backup-buckets-cred",
      "optional": false
    }
  }
},
The configs for the S3 service bucket credentials are very similar, so I won’t repeat them here. One noteworthy thing about the above setup, especially for the Secrets: the ServiceAccount for the operator does not require access to any Secrets in its namespace. Of course, that’s a bit cosmetic, because the operator is allowed to launch Jobs, which in turn can access the Secrets. But still, I found it nice that, due to the way I’d set things up, the operator itself never needs to touch any Secrets.
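To make that concrete, the operator’s Role can then look roughly like this, with no rule for secrets at all (a sketch with hypothetical names and verbs, not my actual manifest):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: hlbo-operator
  namespace: backups
rules:
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["create", "get", "list", "watch", "delete"]
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["create", "get", "delete"]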
More interesting might be some odds and ends I’ve also defined in env variables, just to make accessing them more convenient. To my shame, I have to admit that I lied above when I pretended that I had a clean separation between generic config going into environment variables and per-Job configs going into the config file. One piece of per-Job info did end up in the environment variables: the name of the backup bucket. I have absolutely no idea why I decided to go inconsistent with just this one value.
Some other interesting variables:
{
  "name": "HLSB_S3_SCRATCH_VOL_DIR",
  "value": "/hlsb-mounts/backup-s3-scratch"
},
{
  "name": "HLSB_VOL_MOUNT_DIR",
  "value": "/hlsb-mounts"
},
{
  "name": "HLSB_NAME",
  "value": "backup-audiobookshelf"
},
{
  "name": "HLSB_NS",
  "value": "audiobookshelf"
},
{
  "name": "HLSB_CONFIG",
  "value": "/hlsb-mounts/hlsb-conf.yaml"
},
These provide convenient access to the S3 scratch volume, which is used by rclone for downloading an entire S3 bucket, which is then backed up by restic. The HLSB’s name and namespace also ended up being convenient to have available in the Pod, if only for some meaningful log outputs. And finally it’s nice to have the path to the config file available as well.
And that’s it - that’s the entire Job. I’ve long thought about providing some code snippets used for creating the V1Job, but honestly, it’s just not very interesting. It took me a while to get right, but in the end it was all just value assignments. Here’s an example, the function which creates the Pod Volume spec for the scratch volume:
def create_s3_scratch_volume(backup_conf_spec):
    if "scratchVol" not in backup_conf_spec:
        logging.error("Did not find scratchVol in backup config.")
        return None
    pvc = V1PersistentVolumeClaimVolumeSource(
        claim_name=backup_conf_spec["scratchVol"], read_only=False)
    volume = V1Volume(name=S3_SCRATCH_VOL_NAME,
                      persistent_volume_claim=pvc)
    return volume
The backup_conf_spec
here is the spec.serviceBackup
object from the
HomelabBackupConfig I’ve shown above. And the rest of the roughly 630 lines it
took me to create the V1Job programmatically look very similar, perhaps
with the occasional if
thrown in, but mostly just value assignments and logs.
And because I’m a kind man, I will spare you all of it.
But I still want to show you some code I think could be interesting, so let’s jump to the Job execution.
Job execution
The Job itself gets submitted via the Python API again, nothing special there. But what is special: the current daemon (Kopf’s nomenclature for a long-running change handler that doesn’t just run to completion for a specific event) needs to know when the current Job has finished, one way or another. For this I decided to make use of the fact that I was writing asynchronous code: while the daemon waits for the Job to finish, it should yield. Luckily, Kopf already provides a way to watch events from any k8s object type you might be interested in. So I set up a watcher for events from Jobs:
@kopf.on.event('jobs', labels={'homelab/part-of': 'hlsb'})
def job_event_handler(type, status, labels, **kwargs):
    jobs.handle_job_events(type, status, labels)
This filters for the events of all Jobs with the homelab/part-of: hlsb
label.
The actual handling of events then happens in this function:
def handle_job_events(type, status, labels):
    if type in ["None", "DELETED", None]:
        logging.debug(
            f"Ignored job event:\nStatus: {status}\nLabels: {labels}")
        return
    if "hlsb" not in labels:
        logging.error(
            "Got event without hlsb label:"
            + f"\nStatus: {status}"
            + f"\nLabels: {labels}")
        return
    else:
        # The hlsb label has the form "<namespace>_<name>".
        ns, name = labels["hlsb"].split("_")
        job_state = get_job_state(status)
        if job_state in [JobState.COMPLETE, JobState.FAILED]:
            logging.info(f"Found finished job for {ns}/{name}")
            set_job_finished_event(ns, name)
This function only concerns itself with failed or completed jobs. And if it
finds such a job, it sets a “Job finished” event. These events are part of the
Python standard library’s async synchronization primitives, see here.
They’re awaitable objects, where the coroutine waiting on an event can be
woken up by executing the event.set
method. And that’s what happens in the
set_job_finished_event
function called when the Job has been detected as
finished.
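A sketch of how the two sides might fit together (the event registry here is my assumption for illustration, not necessarily the operator’s actual bookkeeping):

import asyncio

_job_finished_events = {}

def set_job_finished_event(ns, name):
    _job_finished_events.setdefault((ns, name), asyncio.Event()).set()

async def wait_for_job(ns, name):
    ev = _job_finished_events.setdefault((ns, name), asyncio.Event())
    await ev.wait()  # the daemon yields here until the watcher fires
    ev.clear()       # ready for the next Job in the sequence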
So how to determine whether a k8s Job has finished, failed or is still running?
Took me a while to figure out, but the safest way seems to be to look at the
Job.status.conditions
array. If the status
doesn’t have that member at all,
it’s a pretty good bet that the Job is running or pending.
Then you can iterate over the conditions: if a given condition has type Failed
and status True, the Job has failed. Likewise, if type is Complete and status
is True, the Job has completed successfully. Here’s an example:
"conditions": [
{
"lastProbeTime": "2025-01-10T01:30:23Z",
"lastTransitionTime": "2025-01-10T01:30:23Z",
"status": "True",
"type": "Complete"
}
]
And here’s how that looks in Python:
def get_job_state(job_status):
    if "conditions" not in job_status or not job_status["conditions"]:
        return JobState.RUNNING
    for cond in job_status["conditions"]:
        if cond["type"] == "Failed" and cond["status"] == "True":
            return JobState.FAILED
        elif cond["type"] == "Complete" and cond["status"] == "True":
            return JobState.COMPLETE
    return JobState.RUNNING
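For completeness, a plausible definition of the JobState enum these snippets reference (assumed, since the original isn’t shown):

from enum import Enum, auto

class JobState(Enum):
    RUNNING = auto()
    COMPLETE = auto()
    FAILED = auto()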
Conclusion
And that’s it. To be completely honest, this is the third time I’m typing this
conclusion, and I almost rm -rf
’d this post multiple times. I don’t think
it’s that good or engaging. It seems I’m just not that good at writing programming
blog posts. I hope those of you who made it to this point still got something
out of it.
So, time to do a recap: What did this bring me? And was it a good idea? It all started out with my burning wish to just copy+paste my backup mechanism from Nomad to Kubernetes, more-or-less verbatim. Add to that the fact that I don’t get to do much programming at $dayjob, and I was just missing it a bit. Honestly, if someone were to ask me “What’s your most-used programming language?”, my honest answer would need to be “Whatever Atlassian calls JIRA’s markup language.”
But I also learned quite a bit. I had never really worked with the k8s API before, and this was a good way to dive deeper into it. Although I’m not entirely convinced that knowing I’m able to write small operators isn’t just a tad bit dangerous. 😬
My first commit to the repo was on May 9th, 2024. Adding it all up, this took me nine months to do. With rather long interruptions at times, but most of those were more due to motivation than anything else. If I had just used something existing, I would have the k8s migration done by now. But where’s the fun in that?
There’s still a lot I would like to refactor in the implementation. For example, those of you who know the k8s API probably wondered why I went with async events instead of just creating a “watch” on the Jobs and waiting for them to finish that way. I’m honestly not sure. But I would like to dive into k8s API watches. Then there’s the UT code: there’s so much repetition in those tests, and especially in the mocks. Then there are still a lot of hardcoded constants in the code I’d like to make configurable via the HomelabBackupConfig or HomelabServiceBackup. And finally, there’s also my wish to finally go and learn Golang. With this operator, I’ve got a really good-sized first project. And I would have the advantage that it’s not a greenfield project: most of the design is already done, so I would be able to concentrate on writing Go.
I will write one more post on the operator, as part of the Nomad to k8s series, treating it as just another app and describing what the deployment looks like.
And finally, I’m quite happy that I’m done with this now. I’ve been looking forward to being able to continue the k8s migration for way too long.
My longing for continuing the migration has been getting so bad that I’ve started to miss YAML.
Almost.