<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Hlbo on ln --help</title>
    <link>https://blog.mei-home.net/tags/hlbo/</link>
    <description>Recent content in Hlbo on ln --help</description>
    <generator>Hugo -- 0.147.2</generator>
    <language>en</language>
    <lastBuildDate>Fri, 10 Jan 2025 22:10:52 +0100</lastBuildDate>
    <atom:link href="https://blog.mei-home.net/tags/hlbo/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Homelab Backup Operator Part III: Running Backups</title>
      <link>https://blog.mei-home.net/posts/backup-operator-3-running-backups/</link>
      <pubDate>Fri, 10 Jan 2025 22:10:52 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/backup-operator-3-running-backups/</guid>
      <description>Implementing the actual backups</description>
      <content:encoded><![CDATA[<p>In the last couple of months, I&rsquo;ve been working on a k8s operator for running
backups of persistent volumes and S3 buckets in my cluster.
Previous installments of the series can be found <a href="https://blog.mei-home.net/tags/hlbo/">here</a>.</p>
<p>And now, I&rsquo;m finally done with it, and over the weekend, I ran the first
successful backups. Time to describe what I&rsquo;ve implemented, why and how.</p>
<h2 id="recap">Recap</h2>
<p>Let&rsquo;s start with a recap. For a more detailed description of the problem,
have a look at <a href="https://blog.mei-home.net/posts/k8s-migration-12-backup-issues/">this post in my k8s migration series</a>.</p>
<p>In short, my previous backup implementation on my Nomad cluster runs a container
on each host in the cluster. This container then checks which jobs run on the
host and backs up the volumes noted in the config file for that job.
This approach would not work on Kubernetes, because k8s does not provide an API
similar to Nomad&rsquo;s <a href="https://developer.hashicorp.com/nomad/docs/schedulers#system-batch">Sysbatch jobs</a>.
Those types of jobs launch a given container on every host in the cluster, with
a run-to-completion setup.
Kubernetes, on the other hand, only knows Jobs, which cannot be run on every host
simultaneously, and DaemonSets, which don&rsquo;t have run-to-completion semantics.</p>
<p>There would have, of course, been the easy way out: Using an existing solution.
But where&rsquo;s the fun in that?</p>
<p>So I decided to take this chance to learn the Kubernetes API a bit better,
and write my own operator. Because I&rsquo;m relatively familiar with Python,
I decided to use the <a href="https://github.com/nolar/kopf">Kopf</a> framework.</p>
<p>The end goal was to have a per-app configuration in the form of a custom resource
definition which tells the operator which volumes and buckets need to be backed
up. Here is an example I used for my tests:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">mei-home.net/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">HomelabServiceBackup</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">test-service-backup</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">testing</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">testing</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">runNow</span>: <span style="color:#e6db74">&#34;12&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">backupBucketName</span>: <span style="color:#e6db74">&#34;backup-operator-testing&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">backups</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">type</span>: <span style="color:#ae81ff">pvc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">mysql-pv-claim</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">testing</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">type</span>: <span style="color:#ae81ff">pvc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wp-pv-claim</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">testing</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">type</span>: <span style="color:#ae81ff">s3</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">service-backup-test</span>
</span></span></code></pre></div><p>This object instructs my operator to back up the MySQL and WordPress volumes
of a WordPress deployment I launched just for testing purposes. It also contains
an S3 bucket that&rsquo;s not used by the deployment and just exists to test that part
of the operator.</p>
<h2 id="high-level-overview">High-level overview</h2>
<p>Alright, let&rsquo;s assume that we&rsquo;ve got the above example HomelabServiceBackup (HLSB).
What do I want to happen when a backup is triggered?</p>
<p>On the most basic level, I want two things to happen:</p>
<ol>
<li>The two <code>pvc</code> type entries in the <code>spec.backups</code> list are run through restic
to back them up. This means the backup needs access to those volumes.</li>
<li>The <code>s3</code> type bucket is downloaded to a temporary location, and then restic
is run on that temporary location to make an incremental backup of the bucket.</li>
</ol>
<p><strong>BAD THINGS.</strong> This paragraph is the &ldquo;Do as I say, not as I do&rdquo; part of this
post. First of all, running backups on live data is generally a bad idea. You
might end up with inconsistent state in your backup.
Second, there are perfectly good block-level backup capabilities right in Ceph.
With consistency guarantees. But I don&rsquo;t like those. They basically require a
second Ceph cluster as a backup target.
<strong>To reiterate:</strong> What I&rsquo;m doing here is bad. And I know that what I&rsquo;m doing
here is bad. It&rsquo;s working for me, but I&rsquo;m really not advising you to do the same
thing. That&rsquo;s the main reason I will likely never publish the operator I wrote -
I just don&rsquo;t think it&rsquo;s a good idea.</p>
<p>With that out of the way, which steps need to be completed?</p>
<ol>
<li>Determine where each of the <code>pvc</code> type volumes is mounted</li>
<li>Split the volumes into groups by the host they&rsquo;re currently mounted on</li>
<li>For each of those groups:
<ul>
<li>Create a ConfigMap with the configuration for that particular group</li>
<li>Create a Job for that group and launch it; the Jobs run in sequence, one group after the other</li>
</ul>
</li>
<li>Determine whether all jobs were successful and update the HLSB object in the
k8s cluster</li>
</ol>
<p>The HLSB object has a <code>status.state</code> property, which can be one of:</p>
<ul>
<li><code>Running</code></li>
<li><code>Success</code></li>
<li><code>Failed</code></li>
</ul>
<p>These are later used by a Grafana panel, fed with Prometheus data from
kube-state-metrics, to show whether all of the backups were successful.</p>
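<p>As a minimal sketch of that last step, the overall state could be derived from the per-Job outcomes roughly like this. The function name and the convention that <code>None</code> means &ldquo;not finished yet&rdquo; are assumptions made for illustration, not the operator&rsquo;s actual code:</p>

```python
from typing import Iterable, Optional


def aggregate_state(job_results: Iterable[Optional[bool]]) -> str:
    """Map per-Job outcomes to a value for status.state.

    Each entry is True (Job succeeded), False (Job failed) or
    None (Job has not finished yet).
    """
    results = list(job_results)
    # A single failed Job fails the whole backup run.
    if any(r is False for r in results):
        return "Failed"
    # Unfinished Jobs without any failure mean the run is still going.
    if any(r is None for r in results):
        return "Running"
    return "Success"
```

<p>Checking for failures first means a run is reported as <code>Failed</code> as soon as one Job fails, even if others are still pending.</p>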
<p>Now let&rsquo;s have a closer look at the above steps.</p>
<h2 id="implementation-details">Implementation details</h2>
<h3 id="finding-volumes-and-hosts">Finding volumes and hosts</h3>
<p>Let&rsquo;s look at the backup list from the example HLSB above again:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">backups</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">type</span>: <span style="color:#ae81ff">pvc</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">mysql-pv-claim</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">testing</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">type</span>: <span style="color:#ae81ff">pvc</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wp-pv-claim</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">testing</span>
</span></span></code></pre></div><p>I&rsquo;m ignoring the <code>s3</code> type entry here, because quite frankly, it&rsquo;s not that
interesting.</p>
<p>For the <code>pvc</code> type entries, the very first step is to determine on which host
they&rsquo;re currently mounted. Because a PVC might be RWO, I cannot just mount
it into the backup Pod while the app using it is already running. Instead,
I use a <a href="https://kubernetes.io/docs/concepts/storage/volumes/#hostpath">hostPath</a>
volume to mount the directory where the Ceph CSI provider mounts the volume
into the backup container.</p>
<p>For that to work, I need to know on which host the volume is actually mounted.
And for apps having multiple pods and associated volumes, these may be multiple
hosts. Which presents yet another challenge: Restic, when backing up to a
repository, locks that repository, so there can only ever be a single writer.
My backup buckets are separated by app, so even if an app has multiple volumes
defined, like the example above, I can only ever run one backup at a time.
If multiple volumes happen to be mounted on a single host, that&rsquo;s not a problem.
The backup Job for that host can back up all of them. But if they happen to be mounted
on separate hosts, there need to be multiple Jobs, running one after the other.</p>
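<p>The grouping itself can be sketched in a few lines. This is only an illustration: the function name is made up, and the second node name, <code>node-b</code>, is hypothetical:</p>

```python
from collections import defaultdict


def group_volumes_by_host(volume_hosts):
    """Group (volume name, node name) pairs into one list per node."""
    groups = defaultdict(list)
    for volume, node in volume_hosts:
        groups[node].append(volume)
    return dict(groups)


# Both volumes on one node: a single Job can back up both.
same_host = group_volumes_by_host([
    ("testing-mysql-pv-claim", "sehith"),
    ("testing-wp-pv-claim", "sehith"),
])

# Hypothetical placement on two nodes: two Jobs, run one after the other.
two_hosts = group_volumes_by_host([
    ("testing-mysql-pv-claim", "sehith"),
    ("testing-wp-pv-claim", "node-b"),
])
```

<p>Each key of the resulting dict corresponds to one Job, and because restic locks the repository, those Jobs have to be awaited one at a time.</p>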
<p>So how to get the volumes? With the Kubernetes API. As input for our journey,
we&rsquo;ve got the PVC defined, with its name and namespace, in the list of things
to back up.</p>
<p>So the first action is to fetch the PVC via the Kubernetes API. Because I&rsquo;m
writing async code in Kopf, I&rsquo;m using <a href="https://github.com/tomplus/kubernetes_asyncio">kubernetes_asyncio</a>
instead of the official Kubernetes Python lib.</p>
<p>Here&rsquo;s what the PVC looks like, with the <code>wp-pv-claim</code> from the example:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;apiVersion&#34;</span>: <span style="color:#e6db74">&#34;v1&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;kind&#34;</span>: <span style="color:#e6db74">&#34;PersistentVolumeClaim&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;metadata&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;labels&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;app&#34;</span>: <span style="color:#e6db74">&#34;wordpress&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;app.kubernetes.io/managed-by&#34;</span>: <span style="color:#e6db74">&#34;Helm&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;homelab/part-of&#34;</span>: <span style="color:#e6db74">&#34;testing&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;wp-pv-claim&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;namespace&#34;</span>: <span style="color:#e6db74">&#34;testing&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;spec&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;accessModes&#34;</span>: [
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;ReadWriteOnce&#34;</span>
</span></span><span style="display:flex;"><span>        ],
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;resources&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;requests&#34;</span>: {
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;storage&#34;</span>: <span style="color:#e6db74">&#34;10Gi&#34;</span>
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;storageClassName&#34;</span>: <span style="color:#e6db74">&#34;rbd-bulk&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;volumeMode&#34;</span>: <span style="color:#e6db74">&#34;Filesystem&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;volumeName&#34;</span>: <span style="color:#e6db74">&#34;pvc-733b8bc9-0a44-446c-a736-3d97ba52f01f&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;status&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;accessModes&#34;</span>: [
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;ReadWriteOnce&#34;</span>
</span></span><span style="display:flex;"><span>        ],
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;capacity&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;storage&#34;</span>: <span style="color:#e6db74">&#34;10Gi&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;phase&#34;</span>: <span style="color:#e6db74">&#34;Bound&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>I removed a couple of pieces which aren&rsquo;t that interesting. With this info in
hand, we can go to the next step, fetching the PersistentVolume backing this
claim. This can also be done pretty easily with the <code>read_persistent_volume</code>
API, which only needs a name as input, because PersistentVolumes are cluster
level resources. The name of the volume backing the claim can be taken from
the <code>spec.volumeName</code> property.</p>
<p>The result for the above PVC would look like this, again with unimportant bits
removed:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;apiVersion&#34;</span>: <span style="color:#e6db74">&#34;v1&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;kind&#34;</span>: <span style="color:#e6db74">&#34;PersistentVolume&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;metadata&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;pvc-733b8bc9-0a44-446c-a736-3d97ba52f01f&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;spec&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;accessModes&#34;</span>: [
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;ReadWriteOnce&#34;</span>
</span></span><span style="display:flex;"><span>        ],
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;capacity&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;storage&#34;</span>: <span style="color:#e6db74">&#34;10Gi&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;csi&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;controllerExpandSecretRef&#34;</span>: {
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;rook-csi-rbd-provisioner&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;namespace&#34;</span>: <span style="color:#e6db74">&#34;rook-cluster&#34;</span>
</span></span><span style="display:flex;"><span>            },
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;driver&#34;</span>: <span style="color:#e6db74">&#34;rook-ceph.rbd.csi.ceph.com&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;fsType&#34;</span>: <span style="color:#e6db74">&#34;ext4&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;nodeStageSecretRef&#34;</span>: {
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;rook-csi-rbd-node&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;namespace&#34;</span>: <span style="color:#e6db74">&#34;rook-cluster&#34;</span>
</span></span><span style="display:flex;"><span>            },
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;volumeAttributes&#34;</span>: {
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;clusterID&#34;</span>: <span style="color:#e6db74">&#34;rook-cluster&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;imageFeatures&#34;</span>: <span style="color:#e6db74">&#34;layering,exclusive-lock,object-map,fast-diff&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;imageName&#34;</span>: <span style="color:#e6db74">&#34;csi-vol-3361c6d5-4269-4ab2-bc14-771420b768a7&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;journalPool&#34;</span>: <span style="color:#e6db74">&#34;rbd-bulk&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;pool&#34;</span>: <span style="color:#e6db74">&#34;rbd-bulk&#34;</span>
</span></span><span style="display:flex;"><span>            },
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;volumeHandle&#34;</span>: <span style="color:#e6db74">&#34;0001-000c-rook-cluster-0000000000000003-3361c6d5-4269-4ab2-bc14-771420b768a7&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;persistentVolumeReclaimPolicy&#34;</span>: <span style="color:#e6db74">&#34;Retain&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;storageClassName&#34;</span>: <span style="color:#e6db74">&#34;rbd-bulk&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;volumeMode&#34;</span>: <span style="color:#e6db74">&#34;Filesystem&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;status&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;phase&#34;</span>: <span style="color:#e6db74">&#34;Bound&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>One potentially useful side-note: The <code>spec.csi.volumeAttributes.imageName</code>
property is the name of the backing RBD volume in Ceph.</p>
<p>The third thing we need is the <a href="https://kubernetes.io/docs/reference/kubernetes-api/config-and-storage-resources/volume-attachment-v1/">VolumeAttachment</a>
for the PersistentVolume, which tells us where it is currently mounted.
Sadly, these don&rsquo;t have an API to find the attachment for a given
PersistentVolume (or multiple attachments of the same volume, if it is RWX).
So instead, I&rsquo;m fetching all of the attachments with the <code>list_volume_attachment</code>
API. This one, again, is not namespaced.
Here is the current attachment for the above PersistentVolume:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;apiVersion&#34;</span>: <span style="color:#e6db74">&#34;storage.k8s.io/v1&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;kind&#34;</span>: <span style="color:#e6db74">&#34;VolumeAttachment&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;metadata&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;creationTimestamp&#34;</span>: <span style="color:#e6db74">&#34;2024-12-29T10:44:46Z&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;csi-8aee698fd97659b400535fa69969815fad87d2b761d69625d04afc95d53bf252&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;resourceVersion&#34;</span>: <span style="color:#e6db74">&#34;152545692&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;uid&#34;</span>: <span style="color:#e6db74">&#34;6cbe234b-e2c7-4596-a4b6-03d66eb45f5f&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;spec&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;attacher&#34;</span>: <span style="color:#e6db74">&#34;rook-ceph.rbd.csi.ceph.com&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;nodeName&#34;</span>: <span style="color:#e6db74">&#34;sehith&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;source&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;persistentVolumeName&#34;</span>: <span style="color:#e6db74">&#34;pvc-733b8bc9-0a44-446c-a736-3d97ba52f01f&#34;</span>
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;status&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;attached&#34;</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>The <code>spec.nodeName</code> provides us with what we need: The name of the host where
the volume is currently mounted.</p>
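<p>Put together, the three lookups could be sketched as follows. This is not the operator&rsquo;s actual code: the function name is made up, and <code>core_api</code> and <code>storage_api</code> are assumed to be <code>kubernetes_asyncio</code> <code>CoreV1Api</code> and <code>StorageV1Api</code> instances created elsewhere:</p>

```python
async def find_pv_and_nodes(core_api, storage_api, pvc_name, pvc_namespace):
    """Resolve a PVC to its PersistentVolume and the nodes it is attached to.

    core_api and storage_api are assumed to be kubernetes_asyncio
    CoreV1Api and StorageV1Api instances (or objects with the same methods).
    """
    pvc = await core_api.read_namespaced_persistent_volume_claim(
        name=pvc_name, namespace=pvc_namespace)
    # PersistentVolumes are cluster-scoped, so the name alone is enough.
    pv = await core_api.read_persistent_volume(name=pvc.spec.volume_name)
    # No per-volume lookup exists, so list all attachments and filter.
    attachments = await storage_api.list_volume_attachment()
    nodes = [
        att.spec.node_name
        for att in attachments.items
        if att.spec.source.persistent_volume_name == pv.metadata.name
        and att.status is not None
        and att.status.attached
    ]
    return pv, nodes
```

<p>Returning a list of nodes rather than a single name keeps the RWX case open, where one volume may be attached to several hosts at once.</p>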
<p>Next, how to figure out which <code>hostPath</code> to use to mount that volume into the
backup container? That&rsquo;s done with this small Python function:</p>
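<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">from</span> hashlib <span style="color:#f92672">import</span> sha256
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>CSI_MOUNT_PREFIX <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/var/lib/kubelet/plugins/kubernetes.io/csi&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">get_ceph_csi_host_path</span>(pv):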
</span></span><span style="display:flex;"><span>    volume_handle <span style="color:#f92672">=</span> pv<span style="color:#f92672">.</span>spec<span style="color:#f92672">.</span>csi<span style="color:#f92672">.</span>volume_handle
</span></span><span style="display:flex;"><span>    driver <span style="color:#f92672">=</span> pv<span style="color:#f92672">.</span>spec<span style="color:#f92672">.</span>csi<span style="color:#f92672">.</span>driver
</span></span><span style="display:flex;"><span>    vol_id_digest <span style="color:#f92672">=</span> sha256(bytes(volume_handle, <span style="color:#e6db74">&#39;utf-8&#39;</span>))<span style="color:#f92672">.</span>hexdigest()
</span></span><span style="display:flex;"><span>    p <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/&#34;</span><span style="color:#f92672">.</span>join([
</span></span><span style="display:flex;"><span>        CSI_MOUNT_PREFIX,
</span></span><span style="display:flex;"><span>        driver,
</span></span><span style="display:flex;"><span>        vol_id_digest,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;globalmount&#34;</span>,
</span></span><span style="display:flex;"><span>        volume_handle
</span></span><span style="display:flex;"><span>    ])
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> p
</span></span></code></pre></div><p>It takes the PersistentVolume as input and builds the path from the <code>CSI_MOUNT_PREFIX</code> constant, which
is <code>/var/lib/kubelet/plugins/kubernetes.io/csi</code>, the CSI driver name, a SHA-256 hash of
the <code>spec.csi.volume_handle</code>, and the handle itself. The full mount path looks like this:</p>
<pre tabindex="0"><code>/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/fb3f47df032796f8ee3f021a858f09772c60bf6b30a75288a4887852a59b071f/globalmount/0001-000c-rook-cluster-0000000000000003-3361c6d5-4269-4ab2-bc14-771420b768a7
</code></pre><p>And yes, for some reason the path contains the volume&rsquo;s <code>volume_handle</code> once in
plain form, and once in hashed form. No idea what&rsquo;s the reason behind that.
Plus, it&rsquo;s worth noting that this is specific to the Ceph CSI driver. The
paths for other drivers would look different.</p>
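<p>To show where such a path ends up, here is a rough sketch of turning it into the <code>hostPath</code> volume and matching volume mount for the backup Pod. The container-side mount root <code>/hlsb-mounts</code>, the read-only mount, and both function names are assumptions for illustration only:</p>

```python
from hashlib import sha256

# Prefix as named in the text above.
CSI_MOUNT_PREFIX = "/var/lib/kubelet/plugins/kubernetes.io/csi"
# Hypothetical directory inside the backup container.
CONTAINER_MOUNT_ROOT = "/hlsb-mounts"


def ceph_csi_host_path(driver, volume_handle):
    """Build the Ceph CSI mount path on the host for a volume handle."""
    digest = sha256(volume_handle.encode("utf-8")).hexdigest()
    return "/".join(
        [CSI_MOUNT_PREFIX, driver, digest, "globalmount", volume_handle])


def host_path_volume(pvc_namespace, pvc_name, driver, volume_handle):
    """Build the Pod volume and container volumeMount entries for one PVC."""
    name = f"{pvc_namespace}-{pvc_name}"
    volume = {
        "name": name,
        "hostPath": {
            "path": ceph_csi_host_path(driver, volume_handle),
            "type": "Directory",
        },
    }
    mount = {
        "name": name,
        "mountPath": f"{CONTAINER_MOUNT_ROOT}/{name}",
        # Assumption: a backup only needs to read the data.
        "readOnly": True,
    }
    return volume, mount
```

<p>Using <code>"type": "Directory"</code> makes the Pod fail to start if the path does not exist on the host, which would catch a volume that moved between the lookup and the Job launch.</p>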
<h3 id="creating-the-configuration-file">Creating the configuration file</h3>
<p>Because we&rsquo;ve only got two volumes in our example HLSB, let&rsquo;s assume that both
of them are mounted on the same host. So this particular backup would only need
to run a single Job. That Job needs to be told what it&rsquo;s supposed to back up,
which I&rsquo;m doing by creating a fresh ConfigMap for the job. An example for the
two volumes in our example HLSB would look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ConfigMap</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">hlsb-conf.yaml</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    retention:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      daily: 7
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      monthly: 6
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      weekly: 6
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      yearly: 1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    volumes:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    - name: testing-mysql-pv-claim
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    - name: testing-wp-pv-claim</span>
</span></span></code></pre></div><p>This config describes the retention policy and the volumes for this backup.
The retention policy is one of the shortcuts I took. It&rsquo;s actually more of a
global config, which I would normally provide to the backup Job via environment
variables. But because the retention is not just a simple single value, I
decided that it&rsquo;s just easier to add it to the config file, even though it&rsquo;s not
specific to the currently executed backup Job.</p>
<p>The entries in the <code>volumes:</code> list are the combination of the PVC&rsquo;s namespace+name.
These are also the names of the directories under which they&rsquo;re mounted into
the backup container.</p>
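<p>A ConfigMap body like the one above could be generated along these lines. This is a sketch using only the standard library; the function name and the ConfigMap name are hypothetical, and the YAML is rendered by hand to keep the example dependency-free:</p>

```python
def build_backup_configmap(name, namespace, retention, pvcs):
    """Build a ConfigMap body for one backup Job.

    pvcs is a list of (namespace, name) pairs; each becomes one
    "namespace-name" entry in the volumes list.
    """
    lines = ["retention:"]
    for period, count in retention.items():
        lines.append(f"  {period}: {count}")
    lines.append("volumes:")
    for pvc_namespace, pvc_name in pvcs:
        lines.append(f"- name: {pvc_namespace}-{pvc_name}")
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": namespace},
        "data": {"hlsb-conf.yaml": "\n".join(lines) + "\n"},
    }


configmap = build_backup_configmap(
    name="test-service-backup-conf",  # hypothetical ConfigMap name
    namespace="backups",
    retention={"daily": 7, "monthly": 6, "weekly": 6, "yearly": 1},
    pvcs=[("testing", "mysql-pv-claim"), ("testing", "wp-pv-claim")],
)
print(configmap["data"]["hlsb-conf.yaml"])
```

<p>The returned dict has the shape of a regular ConfigMap manifest, so it could be handed to the Kubernetes API as the request body when creating the object.</p>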
<h3 id="creating-the-job">Creating the Job</h3>
<p>As I&rsquo;ve noted above, each host where one of the app&rsquo;s volumes is mounted gets a
Job. These Jobs only have one Pod, running a relatively simple Python app that
reads the config file and runs <code>restic backup</code> on the mount directories of all
the volumes to be backed up.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;apiVersion&#34;</span>: <span style="color:#e6db74">&#34;batch/v1&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;kind&#34;</span>: <span style="color:#e6db74">&#34;Job&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;metadata&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;labels&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;hlsb&#34;</span>: <span style="color:#e6db74">&#34;audiobookshelf_backup-audiobookshelf&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;homelab/part-of&#34;</span>: <span style="color:#e6db74">&#34;hlsb&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;audiobookshelf-backup-audiobookshelf-5746d54b-3826-486d-b33f&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;namespace&#34;</span>: <span style="color:#e6db74">&#34;backups&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;spec&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;backoffLimit&#34;</span>: <span style="color:#ae81ff">1</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;completions&#34;</span>: <span style="color:#ae81ff">1</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;parallelism&#34;</span>: <span style="color:#ae81ff">1</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;template&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;spec&#34;</span>: {
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;affinity&#34;</span>: {
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;podAntiAffinity&#34;</span>: {
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">&#34;requiredDuringSchedulingIgnoredDuringExecution&#34;</span>: [
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;labelSelector&#34;</span>: {
</span></span><span style="display:flex;"><span>                                    <span style="color:#f92672">&#34;matchLabels&#34;</span>: {
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;homelab/part-of&#34;</span>: <span style="color:#e6db74">&#34;hlsb&#34;</span>
</span></span><span style="display:flex;"><span>                                    }
</span></span><span style="display:flex;"><span>                                },
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;topologyKey&#34;</span>: <span style="color:#e6db74">&#34;kubernetes.io/hostname&#34;</span>
</span></span><span style="display:flex;"><span>                            }
</span></span><span style="display:flex;"><span>                        ]
</span></span><span style="display:flex;"><span>                    }
</span></span><span style="display:flex;"><span>                },
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;containers&#34;</span>: [
</span></span><span style="display:flex;"><span>                    {
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">&#34;command&#34;</span>: [
</span></span><span style="display:flex;"><span>                            <span style="color:#e6db74">&#34;hn-backup&#34;</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#e6db74">&#34;kube-services&#34;</span>
</span></span><span style="display:flex;"><span>                        ],
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">&#34;env&#34;</span>: [
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_S3_BACKUP_HOST&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;s3-k8s.mei-home.net:443&#34;</span>
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_S3_SERVICE_HOST&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;s3-k8s.mei-home.net:443&#34;</span>
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_BACKUP_BUCKET&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;backup-audiobookshelf&#34;</span>
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_S3_SCRATCH_VOL_DIR&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;/hlsb-mounts/backup-s3-scratch&#34;</span>
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_VOL_MOUNT_DIR&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;/hlsb-mounts&#34;</span>
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_NAME&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;backup-audiobookshelf&#34;</span>
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_NS&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;audiobookshelf&#34;</span>
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_CONFIG&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;/hlsb-mounts/hlsb-conf.yaml&#34;</span>
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_S3_BACKUP_ACCESS_KEY_ID&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;valueFrom&#34;</span>: {
</span></span><span style="display:flex;"><span>                                    <span style="color:#f92672">&#34;secretKeyRef&#34;</span>: {
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;key&#34;</span>: <span style="color:#e6db74">&#34;AccessKey&#34;</span>,
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;s3-backup-buckets-cred&#34;</span>,
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;optional&#34;</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>                                    }
</span></span><span style="display:flex;"><span>                                }
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_S3_BACKUP_SECRET_KEY&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;valueFrom&#34;</span>: {
</span></span><span style="display:flex;"><span>                                    <span style="color:#f92672">&#34;secretKeyRef&#34;</span>: {
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;key&#34;</span>: <span style="color:#e6db74">&#34;SecretKey&#34;</span>,
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;s3-backup-buckets-cred&#34;</span>,
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;optional&#34;</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>                                    }
</span></span><span style="display:flex;"><span>                                }
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_S3_SERVICE_ACCESS_KEY_ID&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;valueFrom&#34;</span>: {
</span></span><span style="display:flex;"><span>                                    <span style="color:#f92672">&#34;secretKeyRef&#34;</span>: {
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;key&#34;</span>: <span style="color:#e6db74">&#34;AccessKey&#34;</span>,
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;s3-backup-buckets-cred&#34;</span>,
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;optional&#34;</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>                                    }
</span></span><span style="display:flex;"><span>                                }
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_S3_SERVICE_SECRET_KEY&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;valueFrom&#34;</span>: {
</span></span><span style="display:flex;"><span>                                    <span style="color:#f92672">&#34;secretKeyRef&#34;</span>: {
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;key&#34;</span>: <span style="color:#e6db74">&#34;SecretKey&#34;</span>,
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;s3-backup-buckets-cred&#34;</span>,
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;optional&#34;</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>                                    }
</span></span><span style="display:flex;"><span>                                }
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_RESTIC_PW&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;valueFrom&#34;</span>: {
</span></span><span style="display:flex;"><span>                                    <span style="color:#f92672">&#34;secretKeyRef&#34;</span>: {
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;key&#34;</span>: <span style="color:#e6db74">&#34;pw&#34;</span>,
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;restic-pw&#34;</span>,
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;optional&#34;</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>                                    }
</span></span><span style="display:flex;"><span>                                }
</span></span><span style="display:flex;"><span>                            }
</span></span><span style="display:flex;"><span>                        ],
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">&#34;image&#34;</span>: <span style="color:#e6db74">&#34;harbor.mei-home.net/homelab/hn-backup:5.0.0&#34;</span>,
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;hlsb&#34;</span>,
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">&#34;volumeMounts&#34;</span>: [
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;mountPath&#34;</span>: <span style="color:#e6db74">&#34;/hlsb-mounts/audiobookshelf-abs-data-volume&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;vol-backup-audiobookshelf-abs-data-volume&#34;</span>
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;mountPath&#34;</span>: <span style="color:#e6db74">&#34;/hlsb-mounts&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;vol-backup-confmap&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;readOnly&#34;</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>                            }
</span></span><span style="display:flex;"><span>                        ]
</span></span><span style="display:flex;"><span>                    }
</span></span><span style="display:flex;"><span>                ],
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;nodeSelector&#34;</span>: {
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;kubernetes.io/hostname&#34;</span>: <span style="color:#e6db74">&#34;khepri&#34;</span>
</span></span><span style="display:flex;"><span>                },
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;priorityClassName&#34;</span>: <span style="color:#e6db74">&#34;system-node-critical&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;restartPolicy&#34;</span>: <span style="color:#e6db74">&#34;Never&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;volumes&#34;</span>: [
</span></span><span style="display:flex;"><span>                    {
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">&#34;hostPath&#34;</span>: {
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;path&#34;</span>: <span style="color:#e6db74">&#34;/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/4e3bcff1fd37dd7554102fbe925eef191491c4f5fd7323a4564c4008d86ee967/globalmount/0001-000c-rook-cluster-0000000000000003-642bef40-20b8-4df0-ab2f-6190c6b78d74&#34;</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>                        },
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;vol-backup-audiobookshelf-abs-data-volume&#34;</span>
</span></span><span style="display:flex;"><span>                    },
</span></span><span style="display:flex;"><span>                    {
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">&#34;configMap&#34;</span>: {
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;defaultMode&#34;</span>: <span style="color:#ae81ff">420</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;backup-confmap-audiobookshelf-backup-audiobookshelf&#34;</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;optional&#34;</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>                        },
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;vol-backup-confmap&#34;</span>
</span></span><span style="display:flex;"><span>                    }
</span></span><span style="display:flex;"><span>                ]
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>This one does not fit the HLSB I&rsquo;ve been using as an example, but I hope you can
forgive that oversight. I forgot to save the JSON for one of the jobs I ran
against my example HLSB.</p>
<p>Let&rsquo;s start with the <code>metadata</code> property:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span><span style="color:#e6db74">&#34;metadata&#34;</span><span style="color:#960050;background-color:#1e0010">:</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;labels&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;hlsb&#34;</span>: <span style="color:#e6db74">&#34;audiobookshelf_backup-audiobookshelf&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;homelab/part-of&#34;</span>: <span style="color:#e6db74">&#34;hlsb&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;audiobookshelf-backup-audiobookshelf-5746d54b-3826-486d-b33f&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;namespace&#34;</span>: <span style="color:#e6db74">&#34;backups&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>For identification, all pieces belonging to a certain HLSB carry that
HLSB&rsquo;s namespace and name in an <code>hlsb</code> label. In addition, they&rsquo;re all marked as
<code>part-of</code> the Homelab service backup, following my general labeling scheme.
The name of the Job again contains the namespace and name of the HLSB and is
capped off by a random string. It is generated like this:</p>
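<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> uuid
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">get_new_job_name</span>(hlsb_name, hlsb_namespace):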
</span></span><span style="display:flex;"><span>    name <span style="color:#f92672">=</span> <span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;</span><span style="color:#e6db74">{</span>hlsb_namespace<span style="color:#e6db74">}</span><span style="color:#e6db74">-</span><span style="color:#e6db74">{</span>hlsb_name<span style="color:#e6db74">}</span><span style="color:#e6db74">-</span><span style="color:#e6db74">{</span>uuid<span style="color:#f92672">.</span>uuid4()<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>    truncated_name <span style="color:#f92672">=</span> name[<span style="color:#ae81ff">0</span>:<span style="color:#ae81ff">61</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> truncated_name[<span style="color:#f92672">-</span><span style="color:#ae81ff">1</span>] <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;-&#34;</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> truncated_name[<span style="color:#ae81ff">0</span>:<span style="color:#f92672">-</span><span style="color:#ae81ff">1</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">else</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> truncated_name
</span></span></code></pre></div><p>Creating this name was a lot more complicated than I anticipated. Because I
don&rsquo;t currently have any integration tests against a real k8s cluster, this
function was a surprising source of issues. To begin with, the name of a Job
can be at most 63 characters long, so appending the full UUID led to errors
during the initial testing. Then I thought I had it, with my test HLSB running
backups successfully. And then I implemented the above HLSB for my
<a href="https://www.audiobookshelf.org/">Audiobookshelf</a> deployment, and found
that the cutoff at 61 chars I implemented left the name ending in a <code>-</code>, which
k8s also doesn&rsquo;t allow. Hence the check whether the name ends in <code>-</code>. &#x1f926;</p>
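<p>Both naming rules can be checked without a cluster. The following is only a sketch: it assumes the Job name should follow DNS label rules (lowercase alphanumerics and <code>-</code>, alphanumeric at both ends, at most 63 characters). <code>is_valid_job_name</code> and <code>build_job_name</code> are hypothetical names, and the <code>suffix</code> parameter exists purely to make the function testable.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">import re
import uuid

# Assumption: names are held to DNS label rules, as described in the lead-in.
DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
MAX_NAME_LENGTH = 63


def is_valid_job_name(name):
    """Check the length limit and the allowed characters of a candidate name."""
    length_ok = len(name) in range(1, MAX_NAME_LENGTH + 1)
    return length_ok and DNS_LABEL.fullmatch(name) is not None


def build_job_name(hlsb_namespace, hlsb_name, suffix=None):
    """Variant of the function above that strips every trailing dash."""
    suffix = suffix or str(uuid.uuid4())
    name = f"{hlsb_namespace}-{hlsb_name}-{suffix}"
    # Cut at 61 characters as in the original, then drop any dashes left at the end.
    return name[:61].rstrip("-")
</code></pre></div>
<p>Using <code>rstrip</code> instead of a single-character check also covers a cut that leaves more than one dash at the end. Note that neither version keeps the random part if namespace and name together already fill the 61 characters.</p>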
<p>Another thing worth mentioning: The backup jobs run in my <code>backups</code> namespace,
not in the app&rsquo;s namespace. This is mostly so that I can comfortably keep all of
the necessary secrets in a separate namespace.</p>
<p>Then let&rsquo;s continue with the spec, more precisely the affinity I&rsquo;ve set up:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span><span style="color:#e6db74">&#34;affinity&#34;</span><span style="color:#960050;background-color:#1e0010">:</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;podAntiAffinity&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;requiredDuringSchedulingIgnoredDuringExecution&#34;</span>: [
</span></span><span style="display:flex;"><span>            {
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;labelSelector&#34;</span>: {
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;matchLabels&#34;</span>: {
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">&#34;homelab/part-of&#34;</span>: <span style="color:#e6db74">&#34;hlsb&#34;</span>
</span></span><span style="display:flex;"><span>                    }
</span></span><span style="display:flex;"><span>                },
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;topologyKey&#34;</span>: <span style="color:#e6db74">&#34;kubernetes.io/hostname&#34;</span>
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>        ]
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>}<span style="color:#960050;background-color:#1e0010">,</span>
</span></span></code></pre></div><p>This config prevents multiple backup Jobs from running on the same host. This is
necessary because sometimes, especially with larger S3 buckets to be backed up,
the rclone invocation in the backup container can use quite a lot of resources.
Plus, I just generally didn&rsquo;t want to tax any specific node too much.</p>
<p>Next, the node selector, which ensures that the Job runs on the host where
the required volumes are mounted:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span><span style="color:#e6db74">&#34;nodeSelector&#34;</span><span style="color:#960050;background-color:#1e0010">:</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;kubernetes.io/hostname&#34;</span>: <span style="color:#e6db74">&#34;khepri&#34;</span>
</span></span><span style="display:flex;"><span>}<span style="color:#960050;background-color:#1e0010">,</span>
</span></span></code></pre></div><p>This is a definition computed from the values provided by the PVC probing
I&rsquo;ve described above. The volumes to be backed up get grouped by the hosts
they&rsquo;re mounted on, and then every resulting group/host gets one Job.</p>
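<p>The grouping step can be sketched in a few lines. The input shape, a list of <code>(hostname, volume)</code> pairs, is an assumption made for this example, and both function names are hypothetical.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">from collections import defaultdict


def group_volumes_by_host(volumes):
    """Group (hostname, volume) pairs so that every host maps to its volumes."""
    groups = defaultdict(list)
    for host, volume in volumes:
        groups[host].append(volume)
    return dict(groups)


def node_selector_for(host):
    """Build the nodeSelector that pins a Job to the given host."""
    return {"kubernetes.io/hostname": host}
</code></pre></div>
<p>Every key of the resulting dict then corresponds to one Job, with <code>node_selector_for</code> producing a selector like the one shown above.</p>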
<p>And then the more interesting part, the volumes:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span><span style="color:#e6db74">&#34;volumes&#34;</span><span style="color:#960050;background-color:#1e0010">:</span> [
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;hostPath&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;path&#34;</span>: <span style="color:#e6db74">&#34;/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/4e3bcff1fd37dd7554102fbe925eef191491c4f5fd7323a4564c4008d86ee967/globalmount/0001-000c-rook-cluster-0000000000000003-642bef40-20b8-4df0-ab2f-6190c6b78d74&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;vol-backup-audiobookshelf-abs-data-volume&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;configMap&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;defaultMode&#34;</span>: <span style="color:#ae81ff">420</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;backup-confmap-audiobookshelf-backup-audiobookshelf&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;optional&#34;</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;vol-backup-confmap&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>]
</span></span></code></pre></div><p>The <code>hostPath.path</code> is computed as described above, via the information from the
persistent volume. And the name for the volume is defined as <code>vol-backup-pvc_namespace-pvc_name</code>.
Additionally, the ConfigMap described in the previous section also gets
mounted.</p>
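<p>A minimal sketch of how such a volume entry and its matching mount could be assembled as plain dicts. The helper names and the default mount root are assumptions for illustration, and the host path is simply passed in, since computing it depends on the persistent volume.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">def volume_name(pvc_namespace, pvc_name):
    """Name shared by the volume and its mount: vol-backup-namespace-name."""
    return f"vol-backup-{pvc_namespace}-{pvc_name}"


def host_path_volume(pvc_namespace, pvc_name, host_path):
    """Build the hostPath entry for the Pod's volumes list."""
    return {
        "name": volume_name(pvc_namespace, pvc_name),
        "hostPath": {"path": host_path, "type": ""},
    }


def volume_mount(pvc_namespace, pvc_name, mount_root="/hlsb-mounts"):
    """Build the matching volumeMounts entry for the container."""
    return {
        "name": volume_name(pvc_namespace, pvc_name),
        "mountPath": f"{mount_root}/{pvc_namespace}-{pvc_name}",
    }
</code></pre></div>
<p>Deriving both names from the same helper keeps the volume and its mount from drifting apart, because k8s matches the two by name.</p>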
<p>And finally, the container itself. Let&rsquo;s start with the command and image:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span><span style="color:#e6db74">&#34;command&#34;</span><span style="color:#960050;background-color:#1e0010">:</span> [
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;hn-backup&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;kube-services&#34;</span>
</span></span><span style="display:flex;"><span>]<span style="color:#960050;background-color:#1e0010">,</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">&#34;image&#34;</span><span style="color:#960050;background-color:#1e0010">:</span> <span style="color:#e6db74">&#34;harbor.mei-home.net/homelab/hn-backup:5.0.0&#34;</span><span style="color:#960050;background-color:#1e0010">,</span>
</span></span></code></pre></div><p>I&rsquo;ve kept it pretty simple. And instead of mucking around with lots of command
line switches, the configuration is done via the config file and environment
variables.
I won&rsquo;t say much about the <code>hn-backup</code> program, as it&rsquo;s mainly just a wrapper
around <a href="https://rclone.org/">rclone</a> for fetching S3 buckets to be backed up
and <a href="https://restic.net/">restic</a> for the backups themselves.</p>
<p>The volume mounts look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span><span style="color:#e6db74">&#34;volumeMounts&#34;</span><span style="color:#960050;background-color:#1e0010">:</span> [
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;mountPath&#34;</span>: <span style="color:#e6db74">&#34;/hlsb-mounts/audiobookshelf-abs-data-volume&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;vol-backup-audiobookshelf-abs-data-volume&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;mountPath&#34;</span>: <span style="color:#e6db74">&#34;/hlsb-mounts&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;vol-backup-confmap&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;readOnly&#34;</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>]
</span></span></code></pre></div><p>All mounts are done into the <code>/hlsb-mounts</code> directory in the container, which
is then used by hn-backup to construct the paths to be backed up.</p>
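<p>The mount entry for a volume can be derived from the same two pieces of information. This is only a sketch of the idea: <code>volume_mount</code> and <code>MOUNT_DIR</code> are hypothetical names, and a plain dict again stands in for the client&rsquo;s model class:</p>

```python
import posixpath

# Base directory inside the container, as seen in the mounts above.
MOUNT_DIR = "/hlsb-mounts"


def volume_mount(pvc_namespace, pvc_name, mount_dir=MOUNT_DIR):
    """Return a volumeMounts entry placing the PVC below the common mount dir."""
    sub_dir = f"{pvc_namespace}-{pvc_name}"
    return {
        # posixpath keeps forward slashes regardless of the host OS.
        "mountPath": posixpath.join(mount_dir, sub_dir),
        "name": f"vol-backup-{sub_dir}",
    }


mount = volume_mount("audiobookshelf", "abs-data-volume")
print(mount["mountPath"])
```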
<p>Then there are the env variables. I use those to define the common configuration.
So while the ConfigMap contains options relevant for the current Job, the
env variables contain the configs shared by all Jobs.
These options are defined in the HomelabBackupConfig CRD, an example of which
would look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">mei-home.net/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">HomelabBackupConfig</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">backup-config</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">backups</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">hlbo</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serviceBackup</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">schedule</span>: <span style="color:#e6db74">&#34;30 1 * * *&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">scratchVol</span>: <span style="color:#ae81ff">vol-service-backup-scratch</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3BackupConfig</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">s3Host</span>: <span style="color:#ae81ff">s3-k8s.mei-home.net:443</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">s3Credentials</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretName</span>: <span style="color:#ae81ff">s3-backup-buckets-cred</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">accessKeyIDProperty</span>: <span style="color:#ae81ff">AccessKey</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyProperty</span>: <span style="color:#ae81ff">SecretKey</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3ServiceConfig</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">s3Host</span>: <span style="color:#ae81ff">s3-k8s.mei-home.net:443</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">s3Credentials</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretName</span>: <span style="color:#ae81ff">s3-backup-buckets-cred</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">accessKeyIDProperty</span>: <span style="color:#ae81ff">AccessKey</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyProperty</span>: <span style="color:#ae81ff">SecretKey</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resticPasswordSecret</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretName</span>: <span style="color:#ae81ff">restic-pw</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">pw</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resticRetentionPolicy</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">daily</span>: <span style="color:#ae81ff">7</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">weekly</span>: <span style="color:#ae81ff">6</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">monthly</span>: <span style="color:#ae81ff">6</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">yearly</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">jobSpec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">jobNS</span>: <span style="color:#e6db74">&#34;backups&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">image</span>: <span style="color:#ae81ff">harbor.mei-home.net/homelab/hn-backup:5.0.0</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">command</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#e6db74">&#34;hn-backup&#34;</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#e6db74">&#34;kube-services&#34;</span>
</span></span></code></pre></div><p>This CRD describes options common to all backups, so they don&rsquo;t need to be
repeated in every HomelabServiceBackup manifest.
The most important parts here are the configs for S3 access.</p>
<p><code>s3BackupConfig</code> describes access to the backup buckets to which restic will
write the backup. It contains the host, optionally with port, and how to get
the S3 credentials. It was very important to me to be able to specify not just
the name of the Secret, but also the key inside the Secret to use for each
credential, because I&rsquo;ve been pretty annoyed by some Helm charts which
only allow specifying the Secret&rsquo;s name and then expect certain keys to exist.
That makes using generated Secrets, like those created by Ceph Rook for S3
buckets, a real pain.</p>
<p>The <code>s3ServiceConfig</code> has exactly the same structure, but provides the
credentials for access to buckets used by services, which might also be backed
up, and which might live on a completely different system. This is the case for
my Nomad cluster apps right now, for example. Their S3 buckets still live on the
baremetal Ceph cluster, while the backup buckets have already been migrated to
the Ceph Rook cluster. And I decided to make such a setup possible here as well,
just in case I wanted to migrate to a different S3 setup at some point.</p>
<p>The <code>resticPasswordSecret</code> describes the encryption password for the restic
backup repos in the individual S3 buckets.</p>
<p>All of this information is put into environment variables on the Pod running
the backup. Let&rsquo;s start with the backup credentials:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_S3_BACKUP_HOST&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;s3-k8s.mei-home.net:443&#34;</span>
</span></span><span style="display:flex;"><span>}<span style="color:#960050;background-color:#1e0010">,</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_S3_BACKUP_ACCESS_KEY_ID&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;valueFrom&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;secretKeyRef&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;key&#34;</span>: <span style="color:#e6db74">&#34;AccessKey&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;s3-backup-buckets-cred&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;optional&#34;</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>}<span style="color:#960050;background-color:#1e0010">,</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_S3_BACKUP_SECRET_KEY&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;valueFrom&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;secretKeyRef&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;key&#34;</span>: <span style="color:#e6db74">&#34;SecretKey&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;s3-backup-buckets-cred&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;optional&#34;</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>}<span style="color:#960050;background-color:#1e0010">,</span>
</span></span></code></pre></div><p>The configs for the S3 service bucket credentials are very similar, so I won&rsquo;t
repeat them here.
One noteworthy thing about the above setup, especially for the Secrets: The
ServiceAccount for the operator does not require access to any Secrets in
its namespace. Of course, that&rsquo;s a bit cosmetic - because the operator is
allowed to launch Jobs, which in turn can access the secrets. But still,
I found it nice that due to the way I&rsquo;d set things up, the operator itself
would not need to touch any Secrets.</p>
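<p>The mapping from the <code>s3BackupConfig</code> section to the three env entries shown above could be sketched roughly like this. The function names <code>secret_env_var</code> and <code>s3_backup_env</code> are made up for illustration, and plain dicts are used instead of the client&rsquo;s model classes:</p>

```python
def secret_env_var(env_name, secret_name, secret_key):
    """Return an env entry whose value is read from one key of a Secret."""
    return {
        "name": env_name,
        "valueFrom": {
            "secretKeyRef": {
                "key": secret_key,
                "name": secret_name,
                "optional": False,
            }
        },
    }


def s3_backup_env(s3_config):
    """Turn an s3BackupConfig-style dict into a list of env entries."""
    creds = s3_config["s3Credentials"]
    return [
        {"name": "HLSB_S3_BACKUP_HOST", "value": s3_config["s3Host"]},
        secret_env_var("HLSB_S3_BACKUP_ACCESS_KEY_ID",
                       creds["secretName"], creds["accessKeyIDProperty"]),
        secret_env_var("HLSB_S3_BACKUP_SECRET_KEY",
                       creds["secretName"], creds["secretKeyProperty"]),
    ]


example_config = {
    "s3Host": "s3-k8s.mei-home.net:443",
    "s3Credentials": {
        "secretName": "s3-backup-buckets-cred",
        "accessKeyIDProperty": "AccessKey",
        "secretKeyProperty": "SecretKey",
    },
}
env = s3_backup_env(example_config)
```

<p>Only the Secret&rsquo;s name and key end up in the Job spec here. The secret values themselves are never read by this code.</p>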
<p>More interesting might be some odds and ends I&rsquo;ve also defined in env variables,
just to make accessing them more convenient.
To my shame, I have to admit that I lied above when I pretended that I had a
clean separation between generic config going into environment variables and
per-Job configs going into the config file. One piece of per-Job info did end
up in the environment variables: the name of the backup bucket. I have
absolutely no idea why I decided to be inconsistent with just this one value.</p>
<p>Some other interesting variables:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_S3_SCRATCH_VOL_DIR&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;/hlsb-mounts/backup-s3-scratch&#34;</span>
</span></span><span style="display:flex;"><span>}<span style="color:#960050;background-color:#1e0010">,</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_VOL_MOUNT_DIR&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;/hlsb-mounts&#34;</span>
</span></span><span style="display:flex;"><span>}<span style="color:#960050;background-color:#1e0010">,</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_NAME&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;backup-audiobookshelf&#34;</span>
</span></span><span style="display:flex;"><span>}<span style="color:#960050;background-color:#1e0010">,</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_NS&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;audiobookshelf&#34;</span>
</span></span><span style="display:flex;"><span>}<span style="color:#960050;background-color:#1e0010">,</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_CONFIG&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;/hlsb-mounts/hlsb-conf.yaml&#34;</span>
</span></span><span style="display:flex;"><span>}<span style="color:#960050;background-color:#1e0010">,</span>
</span></span></code></pre></div><p>These provide convenient access to the S3 scratch volume, which is used by
rclone for downloading an entire S3 bucket, which is then backed up by restic.
The HLSB&rsquo;s name and namespace also ended up being convenient to have available
in the Pod, if only for some meaningful log outputs. And finally it&rsquo;s nice to
have the path to the config file available as well.</p>
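<p>On the consuming side, picking these values up is straightforward. The following is not taken from <code>hn-backup</code>, just a minimal sketch of how a program could read them. <code>load_common_settings</code> and the keys of the returned dict are hypothetical:</p>

```python
import os


def load_common_settings(environ=os.environ):
    """Collect the HLSB_* variables into a dict; raises KeyError if one is missing."""
    return {
        "config_path": environ["HLSB_CONFIG"],
        "mount_dir": environ["HLSB_VOL_MOUNT_DIR"],
        "scratch_dir": environ["HLSB_S3_SCRATCH_VOL_DIR"],
        "name": environ["HLSB_NAME"],
        "namespace": environ["HLSB_NS"],
    }


# Example with the values shown above instead of the real process environment.
settings = load_common_settings({
    "HLSB_CONFIG": "/hlsb-mounts/hlsb-conf.yaml",
    "HLSB_VOL_MOUNT_DIR": "/hlsb-mounts",
    "HLSB_S3_SCRATCH_VOL_DIR": "/hlsb-mounts/backup-s3-scratch",
    "HLSB_NAME": "backup-audiobookshelf",
    "HLSB_NS": "audiobookshelf",
})
print(f"Running backup {settings['namespace']}/{settings['name']}")
```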
<p>And that&rsquo;s it - that&rsquo;s the entire Job. I thought for a long time about providing some
code snippets used for creating the <a href="https://kubernetes-asyncio.readthedocs.io/en/latest/kubernetes_asyncio.client.models.v1_job.html">V1Job</a>,
but honestly, it&rsquo;s just not very interesting. It took me a while to get right,
but in the end it was all just value assignments.
Here&rsquo;s an example, the function which creates the Pod Volume spec for the
scratch volume:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">create_s3_scratch_volume</span>(backup_conf_spec):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> <span style="color:#e6db74">&#34;scratchVol&#34;</span> <span style="color:#f92672">not</span> <span style="color:#f92672">in</span> backup_conf_spec:
</span></span><span style="display:flex;"><span>        logging<span style="color:#f92672">.</span>error(<span style="color:#e6db74">&#34;Did not find scratchVol in backup config.&#34;</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> <span style="color:#66d9ef">None</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    pvc <span style="color:#f92672">=</span> V1PersistentVolumeClaimVolumeSource(
</span></span><span style="display:flex;"><span>        claim_name<span style="color:#f92672">=</span>backup_conf_spec[<span style="color:#e6db74">&#34;scratchVol&#34;</span>], read_only<span style="color:#f92672">=</span><span style="color:#66d9ef">False</span>)
</span></span><span style="display:flex;"><span>    volume <span style="color:#f92672">=</span> V1Volume(name<span style="color:#f92672">=</span>S3_SCRATCH_VOL_NAME,
</span></span><span style="display:flex;"><span>                      persistent_volume_claim<span style="color:#f92672">=</span>pvc)
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> volume
</span></span></code></pre></div><p>The <code>backup_conf_spec</code> here is the <code>spec.serviceBackup</code> object from the
HomelabBackupConfig I&rsquo;ve shown above. And the rest of the roughly 630 lines it
took me to create the V1Job programmatically look very similar, perhaps
with the occasional <code>if</code> thrown in, but mostly just value assignments and logs.</p>
<p>And because I&rsquo;m a kind man, I will spare you all of it.</p>
<p>But I still want to show you some code I think could be interesting, so let&rsquo;s
jump to the Job execution.</p>
<h3 id="job-execution">Job execution</h3>
<p>The Job itself gets submitted via the Python API again, nothing special
here. But what is special: The current daemon (Kopf&rsquo;s nomenclature for a
long-running change handler that doesn&rsquo;t just run to completion for a specific event)
needs to know when the current Job has finished, whether it succeeded or failed. For this I decided
to make use of the fact that I was writing asynchronous code. So while the
daemon waits for the Job to finish, it should yield. And luckily, Kopf
already provides a way to watch events from any k8s object type you might
be interested in. So I set up a watcher for events from Jobs:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.on.event</span>(<span style="color:#e6db74">&#39;jobs&#39;</span>, labels<span style="color:#f92672">=</span>{<span style="color:#e6db74">&#39;homelab/part-of&#39;</span>: <span style="color:#e6db74">&#39;hlsb&#39;</span>})
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">job_event_handler</span>(type, status, labels, <span style="color:#f92672">**</span>kwargs):
</span></span><span style="display:flex;"><span>    jobs<span style="color:#f92672">.</span>handle_job_events(type, status, labels)
</span></span></code></pre></div><p>This filters for the events of all Jobs with the <code>homelab/part-of: hlsb</code> label.</p>
<p>The actual handling of events then happens in this function:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">handle_job_events</span>(type, status, labels):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> type <span style="color:#f92672">in</span> [<span style="color:#e6db74">&#34;None&#34;</span>, <span style="color:#e6db74">&#34;DELETED&#34;</span>, <span style="color:#66d9ef">None</span>]:
</span></span><span style="display:flex;"><span>        logging<span style="color:#f92672">.</span>debug(
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Ignored job event:</span><span style="color:#ae81ff">\n</span><span style="color:#e6db74">Status: </span><span style="color:#e6db74">{</span>status<span style="color:#e6db74">}</span><span style="color:#ae81ff">\n</span><span style="color:#e6db74">Labels: </span><span style="color:#e6db74">{</span>labels<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> <span style="color:#e6db74">&#34;hlsb&#34;</span> <span style="color:#f92672">not</span> <span style="color:#f92672">in</span> labels:
</span></span><span style="display:flex;"><span>        logging<span style="color:#f92672">.</span>error(
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;Got event without hlsb label:&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">+</span> <span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;</span><span style="color:#ae81ff">\n</span><span style="color:#e6db74">Status: </span><span style="color:#e6db74">{</span>status<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">+</span> <span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;</span><span style="color:#ae81ff">\n</span><span style="color:#e6db74">Labels: </span><span style="color:#e6db74">{</span>labels<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">else</span>:
</span></span><span style="display:flex;"><span>        ns, name <span style="color:#f92672">=</span> labels[<span style="color:#e6db74">&#34;hlsb&#34;</span>]<span style="color:#f92672">.</span>split(<span style="color:#e6db74">&#34;_&#34;</span>)
</span></span><span style="display:flex;"><span>        job_state <span style="color:#f92672">=</span> get_job_state(status)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">if</span> job_state <span style="color:#f92672">in</span> [JobState<span style="color:#f92672">.</span>COMPLETE, JobState<span style="color:#f92672">.</span>FAILED]:
</span></span><span style="display:flex;"><span>            logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Found finished job for </span><span style="color:#e6db74">{</span>ns<span style="color:#e6db74">}</span><span style="color:#e6db74">/</span><span style="color:#e6db74">{</span>name<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>            set_job_finished_event(ns, name)
</span></span></code></pre></div><p>This function only concerns itself with failed or completed Jobs. If it
finds such a Job, it sets a &ldquo;Job finished&rdquo; event. These are not k8s events, but <code>asyncio.Event</code> objects from the
Python standard library&rsquo;s async synchronization primitives, see <a href="https://docs.python.org/3/library/asyncio-sync.html#asyncio.Event">here</a>.
A coroutine can await such an event and gets
woken up when somebody calls the <code>event.set</code> method. And that&rsquo;s what happens in the
<code>set_job_finished_event</code> function called when the Job has been detected as
finished.</p>
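<p>To make the mechanism a bit more concrete, here&rsquo;s a minimal, self-contained sketch of such an event registry. It is not the operator&rsquo;s actual code: the module-level dict, <code>get_job_finished_event</code> and <code>wait_for_job</code> are assumptions for this example, only the name <code>set_job_finished_event</code> is borrowed from the handler above:</p>

```python
import asyncio

# Hypothetical registry: one asyncio.Event per (namespace, name) pair.
_job_finished_events = {}


def get_job_finished_event(ns, name):
    """Return the Event for a backup, creating it on first use."""
    key = (ns, name)
    if key not in _job_finished_events:
        _job_finished_events[key] = asyncio.Event()
    return _job_finished_events[key]


def set_job_finished_event(ns, name):
    """Wake up the coroutine waiting for this backup's Job."""
    get_job_finished_event(ns, name).set()


async def wait_for_job(ns, name):
    """Suspend the caller until the Job has been reported as finished."""
    event = get_job_finished_event(ns, name)
    await event.wait()
    event.clear()  # reset, so the Event can be reused for the next run


async def main():
    waiter = asyncio.create_task(
        wait_for_job("audiobookshelf", "backup-audiobookshelf"))
    await asyncio.sleep(0)  # let the waiter start and block on the Event
    set_job_finished_event("audiobookshelf", "backup-audiobookshelf")
    await waiter
    print("Job finished, daemon continues")


asyncio.run(main())
```

<p>One caveat with this sketch: asyncio&rsquo;s primitives are not thread-safe. It assumes that setter and waiter run on the same event loop. If the setter runs in a different thread, scheduling the call via <code>loop.call_soon_threadsafe(event.set)</code> would be the safer variant.</p>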
<p>So how to determine whether a k8s Job has completed, failed or is still running?
Took me a while to figure out, but the safest way seems to be to look at the
<code>Job.status.conditions</code> array. If the <code>status</code> doesn&rsquo;t have that member at all,
it&rsquo;s a pretty good bet that the Job is running or pending.
Otherwise you can iterate over the conditions: if a condition has <code>type</code>
<code>Failed</code> and <code>status</code> <code>True</code>, the Job has failed. Likewise, if <code>type</code> is <code>Complete</code>
and <code>status</code> is <code>True</code>, the Job has completed successfully. Note that <code>status</code> is the string <code>&#34;True&#34;</code>, not a boolean. Here&rsquo;s an example:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span><span style="color:#e6db74">&#34;conditions&#34;</span><span style="color:#960050;background-color:#1e0010">:</span> [
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;lastProbeTime&#34;</span>: <span style="color:#e6db74">&#34;2025-01-10T01:30:23Z&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;lastTransitionTime&#34;</span>: <span style="color:#e6db74">&#34;2025-01-10T01:30:23Z&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;status&#34;</span>: <span style="color:#e6db74">&#34;True&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;Complete&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>]
</span></span></code></pre></div><p>And here&rsquo;s how that looks in Python:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">get_job_state</span>(job_status):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> <span style="color:#e6db74">&#34;conditions&#34;</span> <span style="color:#f92672">not</span> <span style="color:#f92672">in</span> job_status <span style="color:#f92672">or</span> <span style="color:#f92672">not</span> job_status[<span style="color:#e6db74">&#34;conditions&#34;</span>]:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> JobState<span style="color:#f92672">.</span>RUNNING
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">for</span> cond <span style="color:#f92672">in</span> job_status[<span style="color:#e6db74">&#34;conditions&#34;</span>]:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">if</span> cond[<span style="color:#e6db74">&#34;type&#34;</span>] <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;Failed&#34;</span> <span style="color:#f92672">and</span> cond[<span style="color:#e6db74">&#34;status&#34;</span>] <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;True&#34;</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">return</span> JobState<span style="color:#f92672">.</span>FAILED
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">elif</span> cond[<span style="color:#e6db74">&#34;type&#34;</span>] <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;Complete&#34;</span> <span style="color:#f92672">and</span> cond[<span style="color:#e6db74">&#34;status&#34;</span>] <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;True&#34;</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">return</span> JobState<span style="color:#f92672">.</span>COMPLETE
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> JobState<span style="color:#f92672">.</span>RUNNING
</span></span></code></pre></div><h2 id="conclusion">Conclusion</h2>
<p>And that&rsquo;s it. To be completely honest, this is the third time I&rsquo;m typing this
conclusion, and I almost <code>rm -rf</code>&rsquo;d this post multiple times. I don&rsquo;t think
it&rsquo;s that good or engaging. It seems I&rsquo;m just not that good at writing programming
blog posts. I hope those of you who made it to this point still got something
out of it.</p>
<p>So, time to do a recap: What did this bring me? And was it a good idea?
It all started out with my burning wish to just copy+paste my backup mechanism
from Nomad to Kubernetes, more-or-less verbatim. Add to that the fact that I
don&rsquo;t get to do much programming at $dayjob, and I was just missing it a bit.
If someone were to ask me &ldquo;What&rsquo;s your most-used programming language?&rdquo;,
my honest answer would need to be &ldquo;Whatever Atlassian calls JIRA&rsquo;s markup language.&rdquo;</p>
<p>But I also learned quite a bit. I had never really worked with the k8s API
before, and this was a good way to dive deeper into it. Although I suspect
that knowing I&rsquo;m able to write small operators
is a tad bit dangerous. &#x1f62c;</p>
<p>My first commit to the repo was on May 9th, 2024. Adding it all up, this took
me roughly eight months to do. With rather long interruptions at times, but most of those
were more due to motivation than anything else. If I had just used something
existing, I would have the k8s migration done by now. But where&rsquo;s the fun in
that?</p>
<p>There&rsquo;s still a lot I would like to refactor in the implementation. For example,
those of you who know the k8s API probably wondered why I went with async events
instead of just creating a &ldquo;watch&rdquo; on the Jobs and waiting for them to finish that way. I&rsquo;m
honestly not sure. But I would like to dive into k8s API watches.
Then there&rsquo;s the unit test code. There&rsquo;s so much repetition in those tests,
and especially in the mocks. Then there are still a lot of hardcoded constants in
the code I&rsquo;d like to make configurable via the HomelabBackupConfig or
HomelabServiceBackup.
And finally, there&rsquo;s also my wish to finally go and learn Golang. With this
operator, I&rsquo;ve got a really good-sized first project. And I would have the
advantage that it&rsquo;s not a greenfield project. Most of the design is already done,
so I would be able to concentrate on writing Go.</p>
<p>I will write one more post on the operator, as part of the Nomad to k8s series,
treating it as just another app and describing what the deployment looks like.</p>
<p>And finally, I&rsquo;m quite happy that I&rsquo;m done with this now. I&rsquo;ve been looking
forward to being able to continue the k8s migration for way too long.</p>
<p>My longing for continuing the migration has been getting so bad that I&rsquo;ve started
to miss YAML.</p>
<p>Almost.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Homelab Backup Operator Part II: Basic Framework</title>
      <link>https://blog.mei-home.net/posts/backup-operator-2-basic-framework/</link>
      <pubDate>Sat, 25 May 2024 19:40:00 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/backup-operator-2-basic-framework/</guid>
      <description>My first steps in the operator implementation with kopf</description>
      <content:encoded><![CDATA[<p>In the <a href="https://blog.mei-home.net/posts/backup-operator-1-rbac-issues/">last post</a>
of my <a href="https://blog.mei-home.net/tags/hlbo/">Backup Operator series</a>, I lamented the state
of permissions in the <a href="https://github.com/nolar/kopf">kopf</a> Kubernetes Operator
framework. After some thinking, I decided to go ahead with kopf and just accept
the permission/RBAC ugliness.</p>
<p>I&rsquo;ve just finished implementing the first cluster state change in the operator,
so I thought this would be a good point to write a post about my approach and setup.</p>
<p>The journey up to now has been pretty interesting. I learned a bit about the
Kubernetes API, and a lot about how cooperative multitasking with coroutines
works in Python.</p>
<h2 id="why-write-an-entire-operator">Why write an entire operator?</h2>
<p>I&rsquo;ve already written some things about my backup setup in
<a href="https://blog.mei-home.net/posts/k8s-migration-12-backup-issues/">the Kubernetes migration post</a>
which triggered this operator implementation.</p>
<p>Just to give a short refresher: I need to run daily backups on the persistent
volumes and S3 buckets of the services running in my Homelab. I&rsquo;m currently
doing that by launching a run-to-completion job on every one of my Nomad hosts,
which backs up all the volumes that happen to be mounted on that host at the
time.
I can&rsquo;t do that in k8s, because it seems to lack a run-to-completion,
run-on-every-host type of workload. <a href="https://kubernetes.io/docs/concepts/workloads/controllers/job/">Jobs</a>
can do the run-to-completion part, and <a href="https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/">DaemonSets</a>
can do the run-on-every-host part, but there doesn&rsquo;t seem to be a workload type
which can do both in one.
And that&rsquo;s why I&rsquo;ve decided to go with writing my own operator. There are two
main benefits this approach will have, compared to my previous one. First, I
will be able to explicitly schedule the second stage of my backup, copying
certain backups onto an external disk. Right now, I just schedule that phase an
hour after the previous one.
Second, I will be able to package the backup config for each individual service.
In my current approach, I have the definition of which volumes and buckets to
back up configured in the backup job&rsquo;s config. With the Kubernetes operator, I
will introduce a CRD that can be deployed together with each service, e.g. as
part of the Helm chart.</p>
<h2 id="overview-of-the-approach">Overview of the approach</h2>
<p>As I&rsquo;ve mentioned above, I will write the operator in Python and use the
<a href="https://github.com/nolar/kopf">kopf</a> framework to do it. This is simply
because I&rsquo;m currently familiar with three languages: C++, C and Python. And
Python is the most comfortable of the three.
Due to the RBAC problems I described <a href="https://blog.mei-home.net/posts/backup-operator-1-rbac-issues/">in my last post</a>, I briefly looked into other possibilities. But the Kubernetes ecosystem seems
to mostly live in Golang, which I haven&rsquo;t written anything in yet. And the main
goal currently is to get ahead with the Homelab migration to k8s, not to learn
yet another programming language. &#x1f642;</p>
<p>There will be a total of three custom resources the operator will look for.
The first one, HomelabBackupConfig, will be a one-per-cluster resource and
looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apiextensions.k8s.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CustomResourceDefinition</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">homelabbackupconfigs.mei-home.net</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">scope</span>: <span style="color:#ae81ff">Namespaced</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">group</span>: <span style="color:#ae81ff">mei-home.net</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">names</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">HomelabBackupConfig</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">plural</span>: <span style="color:#ae81ff">homelabbackupconfigs</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">singular</span>: <span style="color:#ae81ff">homelabbackupconfig</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">versions</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">v1alpha1</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">served</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">storage</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">schema</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">openAPIV3Schema</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;This object describes the general configuration of all backups created by the Homelab backup operator.&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">serviceBackup</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The configuration for all service level backups created by the operator instance.&#34;</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">schedule</span>:
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The schedule on which all service level backups will be executed.&#34;</span>
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">scratchVol</span>:
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the PVC for scratch space. Needs to be a RWX volume.&#34;</span>
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">s3BackupConfig</span>:
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;Configuration for S3 access to the backup buckets.&#34;</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">s3Host</span>:
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The S3 server hosting the backup buckets.&#34;</span>
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">s3Credentials</span>:
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The S3 credentials for the backup S3 user.&#34;</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">secretName</span>:
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the Secret containing the credentials.&#34;</span>
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">accessKeyIDProperty</span>:
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the property in the secretName secret with the AWS_ACCESS_KEY_ID&#34;</span>
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">secretKeyProperty</span>:
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the property in the secretName secret with the AWS_SECRET_ACCESS_KEY&#34;</span>
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">s3ServiceConfig</span>:
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;Configuration for S3 access to the service buckets which should be backed up.&#34;</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">s3Host</span>:
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The S3 server hosting the buckets which should be backed up.&#34;</span>
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">s3Credentials</span>:
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The S3 credentials for the service S3 user.&#34;</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">secretName</span>:
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the Secret containing the credentials.&#34;</span>
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">accessKeyIDProperty</span>:
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the property in the secretName secret with the AWS_ACCESS_KEY_ID&#34;</span>
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">secretKeyProperty</span>:
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the property in the secretName secret with the AWS_SECRET_ACCESS_KEY&#34;</span>
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">resticPasswordSecret</span>:
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The Secret with the Restic password for the backups.&#34;</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">secretName</span>:
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the Secret containing the password.&#34;</span>
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">secretKey</span>:
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the property in the secretName Secret which contains the Restic password.&#34;</span>
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">jobSpec</span>:
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;Configuration of the Job launched for each service backup.&#34;</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">image</span>:
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The container image to be used for all service Jobs.&#34;</span>
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">command</span>:
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">array</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The command handed to Job.spec.template.containers.command&#34;</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">items</span>:
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">array</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;Additional entries for the containers.env list. These entries can only be of the name,value variety. Other forms of env entries are not supported for now.&#34;</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">items</span>:
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">name</span>:
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the env variable to add.&#34;</span>
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">value</span>:
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The value of the env variable to add.&#34;</span>
</span></span></code></pre></div><p>This resource configures all of the common settings which will be shared by
all of the individual service backups I will describe next.</p>
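<p>To make the schema a bit more tangible, here is a minimal sketch of what an instance of this
resource could look like, written as a Python dict and dumped as JSON. Only the field names, the
group and the version come from the CRD above. Every value (hostnames, Secret names, the image,
the namespace) is a made-up placeholder, and the cron-style <code>schedule</code> string is an
assumption, because the schema only says it is a string.</p>

```python
import json


def s3_section(host: str, secret_name: str) -> dict:
    """Build an s3BackupConfig/s3ServiceConfig section as laid out in the CRD schema."""
    return {
        "s3Host": host,
        "s3Credentials": {
            "secretName": secret_name,
            # The property names inside the Secret are configurable (placeholders here).
            "accessKeyIDProperty": "AccessKey",
            "secretKeyProperty": "SecretKey",
        },
    }


def example_backup_config() -> dict:
    """Return a HomelabBackupConfig manifest; every value is a made-up placeholder."""
    return {
        # group/version as declared in the CRD above
        "apiVersion": "mei-home.net/v1alpha1",
        "kind": "HomelabBackupConfig",
        "metadata": {"name": "backup-config", "namespace": "backups"},
        "spec": {
            "serviceBackup": {
                # Assumption: a cron-style string; the schema only requires a string.
                "schedule": "30 1 * * *",
                "scratchVol": "backup-scratch",
                "s3BackupConfig": s3_section("s3.example.com:443", "backup-s3-creds"),
                "s3ServiceConfig": s3_section("s3.example.com:443", "service-s3-creds"),
                "resticPasswordSecret": {
                    "secretName": "restic-password",
                    "secretKey": "password",
                },
                "jobSpec": {
                    "image": "registry.example.com/backup:latest",
                    "command": ["backup", "run"],
                    "env": [{"name": "LOG_LEVEL", "value": "debug"}],
                },
            }
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, so the output could be piped to `kubectl apply -f -`.
    print(json.dumps(example_backup_config(), indent=2))
```

<p>Building both S3 sections with the same helper mirrors the fact that <code>s3BackupConfig</code>
and <code>s3ServiceConfig</code> share one structure in the schema.</p>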
<p>My backups will be running with <a href="https://restic.net/">restic</a>, backing up each
service into an S3 bucket on my Ceph Rook cluster.
As all service level backups will work like this, and back up to the same
S3 service, it makes sense to centralize the configuration, instead of copying
it into every service backup CRD.
This configuration happens in the <code>s3BackupConfig</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">s3BackupConfig</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;Configuration for S3 access to the backup buckets.&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3Host</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The S3 server hosting the backup buckets.&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3Credentials</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The S3 credentials for the backup S3 user.&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretName</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the Secret containing the credentials.&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">accessKeyIDProperty</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the property in the secretName secret with the AWS_ACCESS_KEY_ID&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyProperty</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the property in the secretName secret with the AWS_SECRET_ACCESS_KEY&#34;</span>
</span></span></code></pre></div><p>Pretty important to me is the flexibility when it comes to what the k8s Secrets
have to look like. I&rsquo;ve been annoyed with some of the Helm charts I&rsquo;ve been using
for prescribing exactly what the properties in the Secret need to be named,
so I introduced config options here to define not only the Secret&rsquo;s name, but
also the names of the properties holding the access key ID and the secret key of
the S3 credentials.
The <code>s3ServiceConfig</code> has the same structure, but will be used for the
credentials for accessing the S3 buckets of services, instead of the S3 backup
buckets.</p>
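<p>The indirection via property names can be sketched in a few lines of Python. This is not the
operator&rsquo;s actual code, just an illustration under a few assumptions: the function name
<code>resolve_s3_credentials</code> is hypothetical, and <code>secret_data</code> stands for the
<code>data</code> map of an already fetched k8s Secret, whose values are base64-encoded.</p>

```python
import base64


def resolve_s3_credentials(s3_config: dict, secret_data: dict) -> dict:
    """Map the configured Secret properties onto the usual AWS variable names.

    s3_config:   an s3BackupConfig or s3ServiceConfig section from the spec.
    secret_data: the `data` map of the referenced Secret (base64-encoded values).
    """
    creds = s3_config["s3Credentials"]

    def read(field: str) -> str:
        # Look up which property of the Secret holds the value we want.
        prop = creds[field]
        if prop not in secret_data:
            raise KeyError(
                f"Secret {creds['secretName']!r} has no property {prop!r}"
            )
        return base64.b64decode(secret_data[prop]).decode("utf-8")

    return {
        "AWS_ACCESS_KEY_ID": read("accessKeyIDProperty"),
        "AWS_SECRET_ACCESS_KEY": read("secretKeyProperty"),
    }


if __name__ == "__main__":
    # Placeholder config and Secret content, purely for illustration.
    config = {
        "s3Host": "s3.example.com:443",
        "s3Credentials": {
            "secretName": "backup-s3-creds",
            "accessKeyIDProperty": "AccessKey",
            "secretKeyProperty": "SecretKey",
        },
    }
    secret = {
        "AccessKey": base64.b64encode(b"example-id").decode("ascii"),
        "SecretKey": base64.b64encode(b"example-secret").decode("ascii"),
    }
    print(resolve_s3_credentials(config, secret))
```

<p>Because the lookup goes through the configured property names, a Secret created by some other
tool can be used as-is, whatever its keys happen to be called.</p>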
<p>The <code>resticPasswordSecret</code> points to the Secret holding the restic password,
which is needed to unlock the restic encryption keys.</p>
<p>Finally, there&rsquo;s the <code>jobSpec</code>. This will likely still change in the future,
as I have not yet implemented that part. This spec will be used to create the
<a href="https://kubernetes.io/docs/concepts/workloads/controllers/job/">Jobs</a> which
will run the actual backup. One will be created for each of the
HomelabServiceBackup instances I will describe next. I will not go into detail
on this part of the CRD today and will instead save it for when I&rsquo;ve actually implemented
the Job creation.</p>
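<p>Until then, here is a rough sketch of how the three <code>jobSpec</code> fields could map onto a
<code>batch/v1</code> Job manifest. Since that part is not implemented yet, all of this is an
assumption: the function <code>build_job_manifest</code>, the Job and container naming and the
<code>restartPolicy</code> choice are hypothetical and only show where <code>image</code>,
<code>command</code> and <code>env</code> would end up.</p>

```python
def build_job_manifest(job_spec: dict, service_name: str, namespace: str) -> dict:
    """Translate a jobSpec section into a plain batch/v1 Job manifest (hypothetical)."""
    container = {
        "name": "backup",  # placeholder container name
        "image": job_spec["image"],
        "command": list(job_spec.get("command", [])),
        # Only name/value pairs are supported by the CRD, so copy exactly those.
        "env": [
            {"name": entry["name"], "value": entry["value"]}
            for entry in job_spec.get("env", [])
        ],
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"backup-{service_name}", "namespace": namespace},
        "spec": {
            "template": {
                "spec": {
                    # Pods of a Job must use restartPolicy Never or OnFailure.
                    "restartPolicy": "Never",
                    "containers": [container],
                }
            }
        },
    }


if __name__ == "__main__":
    # Placeholder values, matching the shape of the jobSpec schema above.
    spec = {
        "image": "registry.example.com/backup:latest",
        "command": ["backup", "run"],
        "env": [{"name": "LOG_LEVEL", "value": "debug"}],
    }
    job = build_job_manifest(spec, "example-service", "backups")
    print(job["metadata"]["name"])
```

<p>Note that in a real Job the command lives under
<code>spec.template.spec.containers[].command</code>, which is where the sketch puts it.</p>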
<p>Then there&rsquo;s the HomelabServiceBackup CRD:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apiextensions.k8s.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CustomResourceDefinition</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">homelabservicebackups.mei-home.net</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">scope</span>: <span style="color:#ae81ff">Namespaced</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">group</span>: <span style="color:#ae81ff">mei-home.net</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">names</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">HomelabServiceBackup</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">plural</span>: <span style="color:#ae81ff">homelabservicebackups</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">singular</span>: <span style="color:#ae81ff">homelabservicebackup</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">versions</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">v1alpha1</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">served</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">storage</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">schema</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">openAPIV3Schema</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;This object describes the configuration of the backups for a specific service.&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">backupBucketName</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the S3 bucket to which the backup should be made.&#34;</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">backups</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">array</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The elements, like PVCs and S3 buckets to back up for this service.&#34;</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">items</span>:
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">type</span>:
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The Type of the element, either s3 or pvc.&#34;</span>
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">enum</span>:
</span></span><span style="display:flex;"><span>                          - <span style="color:#ae81ff">s3</span>
</span></span><span style="display:flex;"><span>                          - <span style="color:#ae81ff">pvc</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">name</span>:
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the element, either the name of an S3 bucket or a PVC&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">status</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;Status of this service backup&#34;</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">nextBackup</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;Date and time of the next backup run&#34;</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">lastBackup</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;Status of latest backup&#34;</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">state</span>:
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">integer</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;State of the last backup. 1: Failed, 0: Successful&#34;</span>
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">timestamp</span>:
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;Date and time the last backup run was executed&#34;</span>
</span></span></code></pre></div><p>This CRD describes the backups to be done for an individual service. It contains
two main parts, the status and the spec. In the spec, I&rsquo;m configuring the
S3 bucket to be used for the backup, and a list of things to back up. Right now,
I&rsquo;ve only got PersistentVolumeClaims and S3 buckets in mind. An instantiation
might look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">mei-home.net/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">HomelabServiceBackup</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">test-service-backup</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">backup-tests</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">hlbo</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">backupBucketName</span>: <span style="color:#e6db74">&#34;non-existant-bucket&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">backups</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">type</span>: <span style="color:#ae81ff">pvc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">non-existant-pvc</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">type</span>: <span style="color:#ae81ff">pvc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">another-non-existant-pvc</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">type</span>: <span style="color:#ae81ff">s3</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">non-existant-S3-bucket</span>
</span></span></code></pre></div><h2 id="kopf-overview">Kopf overview</h2>
<p>Kopf has a relatively nice approach to listening for changes to resources it is
supposed to be watching. It makes use of Kubernetes&rsquo; watch API and combines the
raw watch events into a nicer interface than plain events alone would
provide.</p>
<p>The main mechanism is event handlers. These handlers can be defined for each
of four different event categories:</p>
<ol>
<li>Creation of a new resource</li>
<li>Resume of the handler for an already existing resource after an operator
restart</li>
<li>Deletion of a resource</li>
<li>Change of a resource</li>
</ol>
<p>In addition, there are daemons, which are long running handlers. Instead of
running to completion for every event, they stay active from the moment a
resource is created to the moment it is deleted. They are automatically started
up after operator restarts as well.</p>
<p>Finally, there is a generic event handler, which gets the full firehose of
Kubernetes events, without the diffs and other conveniences you get from
kopf&rsquo;s event category handlers.</p>
<p>The handlers are Python functions with a decorator which describes the
event group it should listen on and the CRD it should listen for. Those
decorators can also be stacked, so you can have the same Python function
handling both the creation of a new resource and the resume after an operator restart.</p>
<p>Handlers generally come in two flavors, using threads or using coroutines.
I spontaneously decided to go with the coroutine approach, because I had never
before used Python&rsquo;s <a href="https://docs.python.org/3/library/asyncio.html">asyncio</a>
feature, but I was familiar with coroutines in C and C++.</p>
<h2 id="handling-the-homelabbackupconfig-crd">Handling the HomelabBackupConfig CRD</h2>
<p>There isn&rsquo;t too much to do with the generic handling for this CRD. There is
only ever supposed to be one of those, and the only thing which needs to be done
with it is to store it in memory in the operator and make it available to the
handlers of the HomelabServiceBackup CRD, so they can use the configs to launch
their job.</p>
<p>The implementation of the handlers themselves I kept pretty simple:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> asyncio
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> kopf
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> hl_backup_operator.homelab_backup_config <span style="color:#66d9ef">as</span> backupconf
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.on.startup</span>()
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">create_backup_config_cond</span>(memo, <span style="color:#f92672">**</span>_):
</span></span><span style="display:flex;"><span>    memo<span style="color:#f92672">.</span>backup_conf_cond <span style="color:#f92672">=</span> asyncio<span style="color:#f92672">.</span>Condition()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.on.create</span>(<span style="color:#e6db74">&#39;homelabbackupconfigs&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.on.resume</span>(<span style="color:#e6db74">&#39;homelabbackupconfigs&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.on.update</span>(<span style="color:#e6db74">&#39;homelabbackupconfigs&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">create_resume_update_handler</span>(spec, meta, memo, <span style="color:#f92672">**</span>kwargs):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">await</span> backupconf<span style="color:#f92672">.</span>handle_creation_and_change(meta[<span style="color:#e6db74">&#34;name&#34;</span>],
</span></span><span style="display:flex;"><span>                                                memo<span style="color:#f92672">.</span>backup_conf_cond, spec)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.on.delete</span>(<span style="color:#e6db74">&#39;homelabbackupconfigs&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">delete_handler</span>(meta, <span style="color:#f92672">**</span>kwargs):
</span></span><span style="display:flex;"><span>    backupconf<span style="color:#f92672">.</span>handle_deletion(meta[<span style="color:#e6db74">&#34;name&#34;</span>])
</span></span></code></pre></div><p>This sets up a combined handler for creation, resumption and updates of the
CRD. It also creates a <a href="https://docs.python.org/3/library/asyncio-sync.html#condition">Condition</a>
which I will later use in the HomelabServiceBackup handlers to notify them
when the config changed.</p>
<p>The <code>homelab_backup_config</code> module looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> datetime
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> logging
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> croniter
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>__CONFIG <span style="color:#f92672">=</span> <span style="color:#66d9ef">None</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">handle_creation_and_change</span>(name, cond, spec):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">global</span> __CONFIG
</span></span><span style="display:flex;"><span>    __CONFIG <span style="color:#f92672">=</span> spec
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Set backup config from </span><span style="color:#e6db74">{</span>name<span style="color:#e6db74">}</span><span style="color:#e6db74"> to: </span><span style="color:#e6db74">{</span>spec<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">async</span> <span style="color:#66d9ef">with</span> cond:
</span></span><span style="display:flex;"><span>        cond<span style="color:#f92672">.</span>notify_all()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">handle_deletion</span>(name):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">global</span> __CONFIG
</span></span><span style="display:flex;"><span>    __CONFIG <span style="color:#f92672">=</span> <span style="color:#66d9ef">None</span>
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>warning(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Config </span><span style="color:#e6db74">{</span>name<span style="color:#e6db74">}</span><span style="color:#e6db74"> deleted. No backups will be scheduled!&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">get_config</span>():
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> __CONFIG
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">get_next_service_time</span>():
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> <span style="color:#f92672">not</span> __CONFIG:
</span></span><span style="display:flex;"><span>        logging<span style="color:#f92672">.</span>error(<span style="color:#e6db74">&#34;Service schedule time requested, but no config present.&#34;</span>
</span></span><span style="display:flex;"><span>                      )
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> <span style="color:#66d9ef">None</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> (<span style="color:#e6db74">&#34;serviceBackup&#34;</span> <span style="color:#f92672">not</span> <span style="color:#f92672">in</span> __CONFIG
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">or</span> <span style="color:#e6db74">&#34;schedule&#34;</span> <span style="color:#f92672">not</span> <span style="color:#f92672">in</span> __CONFIG[<span style="color:#e6db74">&#34;serviceBackup&#34;</span>]):
</span></span><span style="display:flex;"><span>        logging<span style="color:#f92672">.</span>error(<span style="color:#e6db74">&#34;Config serviceBackup.schedule is missing.&#34;</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> <span style="color:#66d9ef">None</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    now <span style="color:#f92672">=</span> datetime<span style="color:#f92672">.</span>datetime<span style="color:#f92672">.</span>now(datetime<span style="color:#f92672">.</span>timezone<span style="color:#f92672">.</span>utc)
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> croniter<span style="color:#f92672">.</span>croniter(__CONFIG[<span style="color:#e6db74">&#34;serviceBackup&#34;</span>][<span style="color:#e6db74">&#34;schedule&#34;</span>], now
</span></span><span style="display:flex;"><span>    )<span style="color:#f92672">.</span>get_next(datetime<span style="color:#f92672">.</span>datetime)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">get_service_backup_spec</span>():
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> <span style="color:#f92672">not</span> __CONFIG <span style="color:#f92672">or</span> <span style="color:#e6db74">&#34;serviceBackup&#34;</span> <span style="color:#f92672">not</span> <span style="color:#f92672">in</span> __CONFIG:
</span></span><span style="display:flex;"><span>        logging<span style="color:#f92672">.</span>error(<span style="color:#e6db74">&#34;Config serviceBackup is missing.&#34;</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> <span style="color:#66d9ef">None</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">else</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> __CONFIG[<span style="color:#e6db74">&#34;serviceBackup&#34;</span>]
</span></span></code></pre></div><p>As I said, I kept it <em>really</em> simple.
This implementation stores the spec as received from the handler in a
module-level variable <code>__CONFIG</code> and then has a couple of functions to make it
available to the rest of the operator.
The only really interesting part is the <code>get_next_service_time</code> function. It
looks at the <code>spec.serviceBackup.schedule</code> value, which is a string in cron
format, for example like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serviceBackup</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">schedule</span>: <span style="color:#e6db74">&#34;30 18 * * *&#34;</span>
</span></span></code></pre></div><p>I decided to keep all times in UTC internally, just to prevent confusing myself.
Instead of writing my own cron parser, I used <a href="https://github.com/kiorky/croniter">croniter</a>.
It doesn&rsquo;t just provide a parser for the cron format, but also provides a helper
to get the time and date of the next scheduled execution, which I make use of
here.</p>
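<p>To make the idea of a next scheduled execution concrete without depending on
croniter, here is a minimal standard-library sketch for the daily case only. The
helper <code>next_daily_run</code> is hypothetical and not part of the operator. It
hard-codes the meaning of a schedule like <code>30 18 * * *</code>, whereas croniter
handles the full cron syntax.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">import datetime


def next_daily_run(now, hour, minute):
    """Return the next daily hour:minute strictly after now (hypothetical helper)."""
    day = datetime.timedelta(days=1)
    # Today's occurrence of the schedule, which may already be in the past.
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # The modulo folds a past candidate forward by one day. A zero delta means
    # the schedule time is exactly now, so the next run is a full day away.
    delta = (candidate - now) % day or day
    return now + delta


now = datetime.datetime(2024, 5, 25, 12, 0, tzinfo=datetime.timezone.utc)
print(next_daily_run(now, 18, 30).isoformat())  # 2024-05-25T18:30:00+00:00
</code></pre></div>
<p>Passing <code>now</code> in as a parameter, instead of reading the clock inside the
function, keeps the computation easy to test.</p>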
<h2 id="implementing-the-homelabservicebackup-handling">Implementing the HomelabServiceBackup handling</h2>
<p>The HomelabServiceBackup resource describes the backup for an individual
service. In the operator, it will ultimately need to launch a Job to run the
backup of the configured PersistentVolumeClaims and S3 buckets belonging to the
service.</p>
<p>The first thing I implemented was waiting for the scheduled execution time
of the backup. For this, I initially thought to use kopf&rsquo;s <a href="https://kopf.readthedocs.io/en/stable/timers/">timers</a>,
but quickly realized that those only allow for a fixed interval. I needed an
adaptable wait instead, depending on the schedule configured in the HomelabBackupConfig.
For that reason, I reached for kopf&rsquo;s <a href="https://kopf.readthedocs.io/en/stable/daemons/">Daemons</a>.
These are long running handlers. One is created for each instance of the watched
resource.</p>
<p>The handler function itself is again simple, as I just call a separate function
in a module:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> asyncio
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> kopf
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> hl_backup_operator.homelab_service_backup <span style="color:#66d9ef">as</span> servicebackup
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.on.startup</span>()
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">create_backup_config_cond</span>(memo, <span style="color:#f92672">**</span>_):
</span></span><span style="display:flex;"><span>    memo<span style="color:#f92672">.</span>backup_conf_cond <span style="color:#f92672">=</span> asyncio<span style="color:#f92672">.</span>Condition()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.daemon</span>(<span style="color:#e6db74">&#34;homelabservicebackups&#34;</span>, initial_delay<span style="color:#f92672">=</span><span style="color:#ae81ff">30</span>)
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">service_backup_daemon</span>(name, namespace, spec, memo, stopped, <span style="color:#f92672">**</span>_):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">await</span> servicebackup<span style="color:#f92672">.</span>homelab_service_daemon(name, namespace, spec, memo,
</span></span><span style="display:flex;"><span>                                               stopped)
</span></span></code></pre></div><p>The daemon will spend most of its time waiting, as it only needs to do something
in two cases:</p>
<ol>
<li>When the scheduled time for a backup has arrived</li>
<li>When the backup schedule changes</li>
</ol>
<p>Let&rsquo;s look at the second case first. This is the reason for the usage of the
memo. The <a href="https://kopf.readthedocs.io/en/stable/memos/">memo</a> is a generic
container handled by kopf and made available to all handlers. I&rsquo;m creating a
Condition during operator startup. Every daemon will wait on this condition,
and the handler for HomelabBackupConfig updates will notify all waiters on
that condition when the HomelabBackupConfig changes. This is necessary because
the schedule is configured in the HomelabBackupConfig, so daemons might need to
adjust their wait timer.</p>
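<p>The broadcast described above can be tried in isolation with nothing but the
standard library. This is a minimal sketch, not the operator&rsquo;s code: the
<code>waiter</code> and <code>demo</code> names are made up, and the three tasks stand in
for three daemons.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">import asyncio


async def waiter(cond, woken, name):
    # A waiter has to hold the lock of the condition before calling wait().
    async with cond:
        await cond.wait()
    woken.append(name)


async def demo():
    cond = asyncio.Condition()
    woken = []
    tasks = [
        asyncio.create_task(waiter(cond, woken, f"daemon-{i}"))
        for i in range(3)
    ]
    # Give every waiter time to reach cond.wait(). A notification is not
    # stored, so a waiter that starts waiting afterwards would miss it.
    await asyncio.sleep(0.05)
    async with cond:
        cond.notify_all()
    await asyncio.gather(*tasks)
    return sorted(woken)


print(asyncio.run(demo()))  # ['daemon-0', 'daemon-1', 'daemon-2']
</code></pre></div>
<p>Since a notification only reaches tasks that are already waiting, each daemon
needs to be back in its wait before the next config change arrives.</p>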
<p>Here is what that waiting currently looks like:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">class</span> <span style="color:#a6e22e">WakeupReason</span>(Enum):
</span></span><span style="display:flex;"><span>    TIMER <span style="color:#f92672">=</span> auto()
</span></span><span style="display:flex;"><span>    SCHEDULE_UPDATE <span style="color:#f92672">=</span> auto()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">cond_waiter</span>(cond):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">async</span> <span style="color:#66d9ef">with</span> cond:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">await</span> cond<span style="color:#f92672">.</span>wait()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">wait_for</span>(waittime, update_condition):
</span></span><span style="display:flex;"><span>    cond_task <span style="color:#f92672">=</span> asyncio<span style="color:#f92672">.</span>create_task(cond_waiter(update_condition),
</span></span><span style="display:flex;"><span>                                    name<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;condwait&#34;</span>)
</span></span><span style="display:flex;"><span>    sleep_task <span style="color:#f92672">=</span> asyncio<span style="color:#f92672">.</span>create_task(asyncio<span style="color:#f92672">.</span>sleep(waittime), name<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;sleepwait&#34;</span>)
</span></span><span style="display:flex;"><span>    done, pending <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> asyncio<span style="color:#f92672">.</span>wait([cond_task, sleep_task],
</span></span><span style="display:flex;"><span>                                       return_when<span style="color:#f92672">=</span>asyncio<span style="color:#f92672">.</span>FIRST_COMPLETED)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">for</span> p <span style="color:#f92672">in</span> pending:
</span></span><span style="display:flex;"><span>        p<span style="color:#f92672">.</span>cancel()
</span></span><span style="display:flex;"><span>    wake_reasons <span style="color:#f92672">=</span> []
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">for</span> d <span style="color:#f92672">in</span> done:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">if</span> d<span style="color:#f92672">.</span>get_name() <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;condwait&#34;</span>:
</span></span><span style="display:flex;"><span>            wake_reasons<span style="color:#f92672">.</span>append(WakeupReason<span style="color:#f92672">.</span>SCHEDULE_UPDATE)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">elif</span> d<span style="color:#f92672">.</span>get_name() <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;sleepwait&#34;</span>:
</span></span><span style="display:flex;"><span>            wake_reasons<span style="color:#f92672">.</span>append(WakeupReason<span style="color:#f92672">.</span>TIMER)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> wake_reasons
</span></span></code></pre></div><p>As I&rsquo;ve noted before, I&rsquo;m using Python&rsquo;s asyncio module, so instead of threads,
I&rsquo;m using coroutines. Luckily, the Python standard library already provides the
means to wait for multiple tasks and even tell me which task is done waiting
when the function returns. So here, I&rsquo;m creating two tasks. One sleeps for
the given <code>waittime</code>, which is the difference between the current time and the
next scheduled backup, in seconds. The second one is waiting on the condition
I mentioned previously. This condition will be notified by the handler for the
HomelabBackupConfig when that resource changes. This is necessary because the
daemon might need to adjust its wait time if the schedule for backups has changed.</p>
<p>Finally, I check which task finished waiting and return a list of enum
values telling the caller why it woke up, so it can take different actions.</p>
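<p>A list rather than a single value makes sense because <code>asyncio.wait</code> with
<code>FIRST_COMPLETED</code> can return more than one finished task when both complete in
the same event loop iteration. How a caller might act on the reasons could be
sketched like this. The <code>decide_action</code> function and its return strings are
hypothetical, and which reason should win when both fire is a judgment call.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">from enum import Enum, auto


class WakeupReason(Enum):
    TIMER = auto()
    SCHEDULE_UPDATE = auto()


def decide_action(wake_reasons):
    """Map wake-up reasons to a next step for the daemon loop (hypothetical)."""
    if WakeupReason.SCHEDULE_UPDATE in wake_reasons:
        # Assumption: a changed schedule takes priority, so the wait time is
        # recomputed before anything else happens.
        return "recompute-wait"
    if WakeupReason.TIMER in wake_reasons:
        return "run-backup"
    # Nothing recognizable happened, so just go back to waiting.
    return "wait-again"


print(decide_action([WakeupReason.TIMER]))  # run-backup
</code></pre></div>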
<p>Then there&rsquo;s the main loop of the daemon:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">homelab_service_daemon</span>(name, namespace, spec, memo, stopped):
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Launching daemon for </span><span style="color:#e6db74">{</span>namespace<span style="color:#e6db74">}</span><span style="color:#e6db74">/</span><span style="color:#e6db74">{</span>name<span style="color:#e6db74">}</span><span style="color:#e6db74">.&#34;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">while</span> <span style="color:#f92672">not</span> stopped:
</span></span><span style="display:flex;"><span>        logging<span style="color:#f92672">.</span>debug(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;In main loop of </span><span style="color:#e6db74">{</span>namespace<span style="color:#e6db74">}</span><span style="color:#e6db74">/</span><span style="color:#e6db74">{</span>name<span style="color:#e6db74">}</span><span style="color:#e6db74"> with spec: </span><span style="color:#e6db74">{</span>spec<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>        next_run <span style="color:#f92672">=</span> backupconfig<span style="color:#f92672">.</span>get_next_service_time()
</span></span><span style="display:flex;"><span>        wait_time <span style="color:#f92672">=</span> next_run <span style="color:#f92672">-</span> datetime<span style="color:#f92672">.</span>datetime<span style="color:#f92672">.</span>now(datetime<span style="color:#f92672">.</span>timezone<span style="color:#f92672">.</span>utc)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">await</span> wait_for(wait_time<span style="color:#f92672">.</span>total_seconds(), memo<span style="color:#f92672">.</span>backup_conf_cond)
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Finished daemon for </span><span style="color:#e6db74">{</span>namespace<span style="color:#e6db74">}</span><span style="color:#e6db74">/</span><span style="color:#e6db74">{</span>name<span style="color:#e6db74">}</span><span style="color:#e6db74">.&#34;</span>)
</span></span></code></pre></div><p>This doesn&rsquo;t do much at the moment, as I haven&rsquo;t implemented the backups
themselves yet. It runs in an endless loop, checking the <code>stopped</code> variable,
which will be set to <code>True</code> by kopf if the HomelabServiceBackup this daemon is
handling is deleted or the operator is stopped. Kopf will also throw a
<a href="https://docs.python.org/3/library/asyncio-exceptions.html#asyncio.CancelledError">CancelledError</a>
into the coroutine in those cases, so the daemon will also be stopped when it
is currently waiting.</p>
<p>The waiting time is computed with the <code>get_next_service_time</code> function I discussed
above.</p>
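<p>The cancellation behaviour can be illustrated with plain asyncio, without kopf. The following is only a sketch: <code>daemon</code> and <code>main</code> are hypothetical names, and the manual <code>task.cancel()</code> stands in for what kopf does when the resource is deleted or the operator stops.</p>

```python
import asyncio


async def daemon(log):
    """Stand-in for a daemon that is currently waiting for its next run."""
    try:
        await asyncio.sleep(3600)
    except asyncio.CancelledError:
        # The exception is raised at the await the coroutine is parked on.
        log.append("cancelled while waiting")
        raise


async def main():
    log = []
    task = asyncio.create_task(daemon(log))
    await asyncio.sleep(0)  # yield once so the daemon reaches its sleep
    task.cancel()           # assumption: this mimics the framework's cancel
    try:
        await task
    except asyncio.CancelledError:
        log.append("daemon task ended")
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

<p>The long sleep never has to expire: the <code>CancelledError</code> interrupts it right away, so a waiting daemon shuts down promptly.</p>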
<h2 id="implementing-status-updates">Implementing status updates</h2>
<p>What triggered this blog post was that I finally got the scheduled triggering
and the updates of the HomelabServiceBackup&rsquo;s status implemented. This
was the first time I changed cluster state via the operator.</p>
<p>My goal was to have each daemon update a field in its HomelabServiceBackup
resource with the scheduled time of the next backup, which would ultimately
look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">status</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">nextBackup</span>: <span style="color:#e6db74">&#34;2024-05-25T18:30:00+00:00&#34;</span>
</span></span></code></pre></div><p>The <code>status.nextBackup</code> field is what I was interested in setting. I first
looked at the <a href="https://github.com/kubernetes-client/python">Kubernetes Python Client</a>,
but found that it did not support asyncio. But I quickly found
<a href="https://github.com/tomplus/kubernetes_asyncio">kubernetes_asyncio</a>.
An interesting thing I learned while looking at these two libraries is that they
were, for the most part, not hand-written. Instead, they use the <a href="https://github.com/openapitools/openapi-generator">openapi-generator</a>
to automatically generate the API code from the Kubernetes API definition. Which
is pretty cool to see, to be honest. It leads to boatloads of repeated code, but
the alternative of writing all that code by hand probably doesn&rsquo;t bear thinking
about.</p>
<p>Of course, one of the downsides of using the Python API client was that it would
not have API support for the CRDs I&rsquo;ve written for my own cluster. Instead,
I needed to use the generic <a href="https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/CustomObjectsApi.md">CustomObjectsApi</a>.</p>
<p>Initially, because I wanted to specifically update the status of my resources,
I looked at the <a href="https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/CustomObjectsApi.md#patch_namespaced_custom_object_status">patch_namespaced_custom_object_status</a>
API. But running that API against a resource which did not have the status set
yet just returns a 404. It took me a <em>long while</em> to realize that the 404 was
not due to an error on my end, but simply because the resource needed to have
a status already for the status API to work.</p>
<p>So instead, I reached for the <a href="https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/CustomObjectsApi.md#patch_namespaced_custom_object">patch_namespaced_custom_object</a>
API. That, too, had a lot of issues. I initially thought I was the first person
to use the Python API package for accessing custom objects.
All the examples I could find stated that this should work:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> asyncio
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> kubernetes_asyncio <span style="color:#f92672">import</span> client, config
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> kubernetes_asyncio.client.api_client <span style="color:#f92672">import</span> ApiClient
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> pprint <span style="color:#f92672">import</span> pprint
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> json
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">main</span>():
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">await</span> config<span style="color:#f92672">.</span>load_kube_config()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">async</span> <span style="color:#66d9ef">with</span> ApiClient() <span style="color:#66d9ef">as</span> api:
</span></span><span style="display:flex;"><span>        mine <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>CustomObjectsApi(api)
</span></span><span style="display:flex;"><span>        res <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> mine<span style="color:#f92672">.</span>patch_namespaced_custom_object(<span style="color:#e6db74">&#34;mei-home.net&#34;</span>, <span style="color:#e6db74">&#34;v1alpha1&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;backups&#34;</span>, <span style="color:#e6db74">&#34;homelabservicebackups&#34;</span>, <span style="color:#e6db74">&#34;test-service-backup&#34;</span>,
</span></span><span style="display:flex;"><span>                body<span style="color:#f92672">=</span>{<span style="color:#e6db74">&#34;status&#34;</span>:{<span style="color:#e6db74">&#34;lastBackup&#34;</span>: {<span style="color:#e6db74">&#34;state&#34;</span>:<span style="color:#ae81ff">1</span>, <span style="color:#e6db74">&#34;timestamp&#34;</span>:<span style="color:#e6db74">&#34;foobar&#34;</span>}}}
</span></span><span style="display:flex;"><span>                )
</span></span><span style="display:flex;"><span>        pprint(res)
</span></span><span style="display:flex;"><span>asyncio<span style="color:#f92672">.</span>run(main())
</span></span></code></pre></div><p>But it did not. Instead, I kept getting errors like this back:</p>
<pre tabindex="0"><code>kubernetes_asyncio.client.exceptions.ApiException: (415)
Reason: Unsupported Media Type
HTTP response body: {&#34;kind&#34;:&#34;Status&#34;,&#34;apiVersion&#34;:&#34;v1&#34;,&#34;metadata&#34;:{},&#34;status&#34;:&#34;Failure&#34;,
&#34;message&#34;:&#34;the body of the request was in an unknown format - accepted media types
include: application/json-patch+json, application/merge-patch+json,
application/apply-patch+yaml&#34;,
&#34;reason&#34;:&#34;UnsupportedMediaType&#34;,
&#34;code&#34;:415}
</code></pre><p>I finally found <a href="https://github.com/tomplus/kubernetes_asyncio/issues/68">this bug</a>.
It seems to indicate that the issue is a wrong media type getting set in the
<code>content-type</code> header. This led me to the <a href="https://github.com/tomplus/kubernetes_asyncio/blob/master/examples/patch.py">examples</a>
file, which shows that a specific content type could be forced, by adding
<code>_content_type='application/merge-patch+json'</code> as a parameter to the
<code>patch_namespaced_custom_object</code> call. With that addition, I was finally able
to properly update the time for the next backup in the status, by adding these
lines to the <code>homelab_service_daemon</code> function from before:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>status_body <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;status&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;nextBackup&#34;</span>: next_run<span style="color:#f92672">.</span>isoformat()
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">await</span> kubeapi<span style="color:#f92672">.</span>patch_mei_home_custom_object(
</span></span><span style="display:flex;"><span>    namespace, kubeapi<span style="color:#f92672">.</span>HOMELABSERVICEBACKUP_PLURAL, name, status_body)
</span></span></code></pre></div><p>The <code>patch_mei_home_custom_object</code> function is just a thin wrapper around
the <code>patch_namespaced_custom_object</code> function from above.</p>
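<p>For illustration, such a wrapper could look roughly like the following sketch. It is not the operator&rsquo;s actual code: the extra <code>api</code> parameter, the constant names and the <code>RecordingApi</code> stand-in are assumptions made so that the example runs without a cluster. In real use, <code>api</code> would be a <code>CustomObjectsApi</code> instance from kubernetes_asyncio.</p>

```python
import asyncio

# Group and version as used in the earlier example call.
GROUP = "mei-home.net"
VERSION = "v1alpha1"
MERGE_PATCH = "application/merge-patch+json"


async def patch_mei_home_custom_object(api, namespace, plural, name, body):
    """Patch a custom object, always forcing the merge-patch content type."""
    return await api.patch_namespaced_custom_object(
        GROUP, VERSION, namespace, plural, name, body=body,
        _content_type=MERGE_PATCH)


class RecordingApi:
    """Hypothetical stand-in for CustomObjectsApi that only records calls."""

    def __init__(self):
        self.calls = []

    async def patch_namespaced_custom_object(self, *args, **kwargs):
        self.calls.append((args, kwargs))
        return kwargs["body"]


async def demo():
    api = RecordingApi()
    body = {"status": {"nextBackup": "2024-05-25T18:30:00+00:00"}}
    await patch_mei_home_custom_object(
        api, "backups", "homelabservicebackups", "test-service-backup", body)
    return api.calls[0]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

<p>Keeping the content type in one place means no caller can forget it and run into the 415 error again.</p>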
<h2 id="some-notes-on-testing">Some notes on testing</h2>
<p>Writing unit tests (UTs) was not always simple here. First of all, I needed to employ a lot
of mocks to remove any attempted k8s cluster access. I&rsquo;m seriously considering
buying some additional Pis and setting up a test cluster. &#x1f601;</p>
<p>My first generic issue was: How do I even properly unit test asyncio code?
Luckily, that issue was easy to answer, at least in the abstract: I used
<a href="https://github.com/pytest-dev/pytest-asyncio">pytest-asyncio</a>. It allows me to
add <code>@pytest.mark.asyncio</code> at the top of my test function, or entire test classes,
and the pytest plugin will automatically set up the event loop infrastructure
and execute the test functions with it.</p>
<p>Still, I had a particular challenge with testing the waiting code, specifically
when it comes to testing whether the Condition properly fires. As a reminder,
here is what the code looks like:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">cond_waiter</span>(cond):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">async</span> <span style="color:#66d9ef">with</span> cond:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">await</span> cond<span style="color:#f92672">.</span>wait()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">wait_for</span>(waittime, update_condition):
</span></span><span style="display:flex;"><span>    cond_task <span style="color:#f92672">=</span> asyncio<span style="color:#f92672">.</span>create_task(cond_waiter(update_condition),
</span></span><span style="display:flex;"><span>                                    name<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;condwait&#34;</span>)
</span></span><span style="display:flex;"><span>    sleep_task <span style="color:#f92672">=</span> asyncio<span style="color:#f92672">.</span>create_task(asyncio<span style="color:#f92672">.</span>sleep(waittime), name<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;sleepwait&#34;</span>)
</span></span><span style="display:flex;"><span>    done, pending <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> asyncio<span style="color:#f92672">.</span>wait([cond_task, sleep_task],
</span></span><span style="display:flex;"><span>                                       return_when<span style="color:#f92672">=</span>asyncio<span style="color:#f92672">.</span>FIRST_COMPLETED)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">for</span> p <span style="color:#f92672">in</span> pending:
</span></span><span style="display:flex;"><span>        p<span style="color:#f92672">.</span>cancel()
</span></span><span style="display:flex;"><span>    wake_reasons <span style="color:#f92672">=</span> []
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">for</span> d <span style="color:#f92672">in</span> done:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">if</span> d<span style="color:#f92672">.</span>get_name() <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;condwait&#34;</span>:
</span></span><span style="display:flex;"><span>            wake_reasons<span style="color:#f92672">.</span>append(WakeupReason<span style="color:#f92672">.</span>SCHEDULE_UPDATE)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">elif</span> d<span style="color:#f92672">.</span>get_name() <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;sleepwait&#34;</span>:
</span></span><span style="display:flex;"><span>            wake_reasons<span style="color:#f92672">.</span>append(WakeupReason<span style="color:#f92672">.</span>TIMER)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> wake_reasons
</span></span></code></pre></div><p>And here is my initial attempt at the test code:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> asyncio
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> unittest.mock <span style="color:#f92672">import</span> AsyncMock, Mock
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> pytest
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> hl_backup_operator.homelab_service_backup <span style="color:#66d9ef">as</span> sut
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@pytest.mark.asyncio</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">class</span> <span style="color:#a6e22e">TestCondWait</span>:
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">test_cond_wait_works</span>(self):
</span></span><span style="display:flex;"><span>        cond <span style="color:#f92672">=</span> asyncio<span style="color:#f92672">.</span>Condition()
</span></span><span style="display:flex;"><span>        test_task <span style="color:#f92672">=</span> asyncio<span style="color:#f92672">.</span>create_task(sut<span style="color:#f92672">.</span>wait_for(<span style="color:#ae81ff">15</span>, cond))
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">async</span> <span style="color:#66d9ef">with</span> cond:
</span></span><span style="display:flex;"><span>            cond<span style="color:#f92672">.</span>notify_all()
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">await</span> test_task
</span></span><span style="display:flex;"><span>        res <span style="color:#f92672">=</span> test_task<span style="color:#f92672">.</span>result()
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">assert</span> res <span style="color:#f92672">==</span> [sut<span style="color:#f92672">.</span>WakeupReason<span style="color:#f92672">.</span>SCHEDULE_UPDATE]
</span></span></code></pre></div><p>I&rsquo;m trying to test whether the Condition works properly. My thinking is that
the code path goes like this:</p>
<ol>
<li>[testcode]: Creates an async task ready to run, executing the function under
test.</li>
<li>[appcode]: Runs until it hits the <code>asyncio.wait</code> line</li>
<li>[appcode]: Now waits for either the timer to expire or the Condition to be
triggered, hands back execution to the [testcode]</li>
<li>[testcode]: Executes the <code>cond.notify_all</code> function</li>
<li>[testcode]: Awaits the task, handing execution back to [appcode]</li>
<li>[appcode]: Gets notified in <code>cond_waiter</code> and runs to completion</li>
</ol>
<p>But that was not what happened. Sprinkling in some <code>print</code> statements, I found
that the test code continues running after the <code>create_task</code> call, straight
through the <code>notify_all</code> call. The first time <code>wait_for</code> gets to do anything
is when the test code hits the <code>await test_task</code> line. And only then does it
reach the <code>await cond.wait</code> line. But at this point, the test code has already
executed the <code>notify_all</code>, so the <code>wait_for</code> function does not return until the
timer of the <code>sleepwait</code> task expires, resulting in a failed UT.</p>
<p>The only way I found around this issue is to have the test code explicitly hand
execution off. I did this by introducing a <code>await asyncio.sleep(0.05)</code> before
the <code>async with cond:</code> line of the test function.
Then the <code>wait_for</code> function gets to run until it hits the <code>await cond.wait</code> and
gets properly notified and the test reliably succeeds.</p>
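<p>The ordering problem can be reproduced without any of the operator code. The following sketch uses only the standard library; <code>waiter</code>, <code>notify</code> and <code>run</code> are hypothetical names. It records the order of events once without and once with the explicit sleep before the notification.</p>

```python
import asyncio


async def waiter(cond, log):
    log.append("waiter: started")
    async with cond:
        await cond.wait()
    log.append("waiter: notified")


async def notify(cond):
    async with cond:
        cond.notify_all()


async def run(yield_first):
    log = []
    cond = asyncio.Condition()
    # create_task only schedules the waiter, it does not run it yet.
    task = asyncio.create_task(waiter(cond, log))
    log.append("test: task created")
    if yield_first:
        # Hand control to the event loop so the waiter reaches cond.wait().
        await asyncio.sleep(0.05)
    await notify(cond)
    log.append("test: notified")
    try:
        await asyncio.wait_for(task, timeout=0.2)
    except asyncio.TimeoutError:
        log.append("waiter: timed out")
    return log


if __name__ == "__main__":
    print(asyncio.run(run(yield_first=False)))
    print(asyncio.run(run(yield_first=True)))
```

<p>Without the sleep, the notification is sent before the waiter has even started, so the log ends with <code>waiter: timed out</code>. With the sleep, the waiter is already parked on the Condition and the log ends with <code>waiter: notified</code>.</p>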
<p>This was, yet again, a case where the UT ends up being more complicated than the
actual code.</p>
<p>One more issue I hit had to do with the merciless advance of time. Have another
look at the <code>homelab_service_daemon</code> function:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">homelab_service_daemon</span>(name, namespace, spec, memo, stopped):
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Launching daemon for </span><span style="color:#e6db74">{</span>namespace<span style="color:#e6db74">}</span><span style="color:#e6db74">/</span><span style="color:#e6db74">{</span>name<span style="color:#e6db74">}</span><span style="color:#e6db74">.&#34;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">while</span> <span style="color:#f92672">not</span> stopped:
</span></span><span style="display:flex;"><span>        logging<span style="color:#f92672">.</span>debug(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;In main loop of </span><span style="color:#e6db74">{</span>namespace<span style="color:#e6db74">}</span><span style="color:#e6db74">/</span><span style="color:#e6db74">{</span>name<span style="color:#e6db74">}</span><span style="color:#e6db74"> with spec: </span><span style="color:#e6db74">{</span>spec<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>        next_run <span style="color:#f92672">=</span> backupconfig<span style="color:#f92672">.</span>get_next_service_time()
</span></span><span style="display:flex;"><span>        wait_time <span style="color:#f92672">=</span> next_run <span style="color:#f92672">-</span> datetime<span style="color:#f92672">.</span>datetime<span style="color:#f92672">.</span>now(datetime<span style="color:#f92672">.</span>timezone<span style="color:#f92672">.</span>utc)
</span></span><span style="display:flex;"><span>        status_body <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;status&#34;</span>: {
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;nextBackup&#34;</span>: next_run<span style="color:#f92672">.</span>isoformat()
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">await</span> kubeapi<span style="color:#f92672">.</span>patch_mei_home_custom_object(
</span></span><span style="display:flex;"><span>            namespace, kubeapi<span style="color:#f92672">.</span>HOMELABSERVICEBACKUP_PLURAL, name, status_body)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">await</span> wait_for(wait_time<span style="color:#f92672">.</span>total_seconds(), memo<span style="color:#f92672">.</span>backup_conf_cond)
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Finished daemon for </span><span style="color:#e6db74">{</span>namespace<span style="color:#e6db74">}</span><span style="color:#e6db74">/</span><span style="color:#e6db74">{</span>name<span style="color:#e6db74">}</span><span style="color:#e6db74">.&#34;</span>)
</span></span></code></pre></div><p>It has to compute the waiting time as the difference between the current time
and the time of the next scheduled backup. But how to handle <code>datetime.now</code> in
UTs? I initially tried to do this with a bit of fuzziness when comparing the
arguments handed to the mocked <code>wait_for</code> with the expected wait time, but that
seemed a bit too brittle.</p>
<p><a href="https://github.com/spulec/freezegun">Freezegun</a> to the rescue. It provides a
nice API to patch <code>datetime.now</code> (and several other related functions) so that
it always returns a deterministic value.
Using it in a UT to verify that <code>homelab_service_daemon</code> calls <code>wait_for</code> as
expected could look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#a6e22e">@pytest.fixture</span>()
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">mock_wait_for</span>(self, mocker):
</span></span><span style="display:flex;"><span>    wait_for_mock <span style="color:#f92672">=</span> AsyncMock(spec<span style="color:#f92672">=</span>sut<span style="color:#f92672">.</span>wait_for)
</span></span><span style="display:flex;"><span>    mocker<span style="color:#f92672">.</span>patch(<span style="color:#e6db74">&#39;hl_backup_operator.homelab_service_backup.wait_for&#39;</span>,
</span></span><span style="display:flex;"><span>                  side_effect<span style="color:#f92672">=</span>wait_for_mock)
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> wait_for_mock
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">test_daemon_waits_correctly</span>(self, mocker, mock_wait_for):
</span></span><span style="display:flex;"><span>    mock_memo <span style="color:#f92672">=</span> Mock()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    mock_stopped <span style="color:#f92672">=</span> Mock()
</span></span><span style="display:flex;"><span>    mock_stopped_bool <span style="color:#f92672">=</span> Mock(side_effect<span style="color:#f92672">=</span>[<span style="color:#66d9ef">False</span>, <span style="color:#66d9ef">True</span>])
</span></span><span style="display:flex;"><span>    mock_stopped<span style="color:#f92672">.</span><span style="color:#a6e22e">__bool__</span> <span style="color:#f92672">=</span> mock_stopped_bool
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    time_now <span style="color:#f92672">=</span> datetime(year<span style="color:#f92672">=</span><span style="color:#ae81ff">2024</span>, month<span style="color:#f92672">=</span><span style="color:#ae81ff">5</span>, day<span style="color:#f92672">=</span><span style="color:#ae81ff">22</span>, hour<span style="color:#f92672">=</span><span style="color:#ae81ff">19</span>, minute<span style="color:#f92672">=</span><span style="color:#ae81ff">12</span>,
</span></span><span style="display:flex;"><span>                        second<span style="color:#f92672">=</span><span style="color:#ae81ff">10</span>, tzinfo<span style="color:#f92672">=</span>timezone<span style="color:#f92672">.</span>utc)
</span></span><span style="display:flex;"><span>    time_trigger <span style="color:#f92672">=</span> datetime(year<span style="color:#f92672">=</span><span style="color:#ae81ff">2024</span>, month<span style="color:#f92672">=</span><span style="color:#ae81ff">5</span>, day<span style="color:#f92672">=</span><span style="color:#ae81ff">22</span>, hour<span style="color:#f92672">=</span><span style="color:#ae81ff">19</span>, minute<span style="color:#f92672">=</span><span style="color:#ae81ff">12</span>,
</span></span><span style="display:flex;"><span>                            second<span style="color:#f92672">=</span><span style="color:#ae81ff">12</span>, tzinfo<span style="color:#f92672">=</span>timezone<span style="color:#f92672">.</span>utc)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    mock_next_service_time <span style="color:#f92672">=</span> Mock(return_value<span style="color:#f92672">=</span>time_trigger)
</span></span><span style="display:flex;"><span>    mocker<span style="color:#f92672">.</span>patch(
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#39;hl_backup_operator.homelab_backup_config.get_next_service_time&#39;</span>,
</span></span><span style="display:flex;"><span>        side_effect<span style="color:#f92672">=</span>mock_next_service_time)
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">with</span> freezegun<span style="color:#f92672">.</span>freeze_time(time_now):
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">await</span> sut<span style="color:#f92672">.</span>homelab_service_daemon(<span style="color:#e6db74">&#34;tests&#34;</span>, <span style="color:#e6db74">&#34;testns&#34;</span>, {}, mock_memo,
</span></span><span style="display:flex;"><span>                                          mock_stopped)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    mock_wait_for<span style="color:#f92672">.</span>assert_awaited_once_with(<span style="color:#ae81ff">2</span>, mock_memo<span style="color:#f92672">.</span>backup_conf_cond)
</span></span></code></pre></div><p>I&rsquo;m mocking away both the <code>wait_for</code> and <code>get_next_service_time</code> functions,
and I&rsquo;m also defining two fixed times, one &ldquo;current&rdquo; time, and one trigger time.
In the <code>with freezegun.freeze_time(time_now)</code> context, <code>datetime.now</code> will now
reliably always return <code>time_now</code> instead of the actual current time. And with
that, I don&rsquo;t need to rely on any fuzziness when testing time-related
functionality.</p>
<h2 id="next-steps">Next steps</h2>
<p>Now that I&rsquo;m finally happy with the groundwork, I still need to implement a couple
of features before starting with the implementation of the backup Jobs
themselves.
The first one is proper handling of the case where there is no HomelabBackupConfig
configured. Currently, the <code>homelab_service_daemon</code> function would crash, because
<code>get_next_service_time</code> would return <code>None</code>, due to not having any configured
schedule. That is easily fixable by extending the waiting time to &ldquo;forever&rdquo;.
With the Condition mechanism already in place, the daemons will be woken up once
a HomelabBackupConfig appears and can then return to the right schedule.</p>
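<p>One possible shape for this &ldquo;wait forever&rdquo; behaviour is sketched below. It is an assumption about how it could be done, not the final implementation: <code>wait_for</code> here treats a wait time of <code>None</code> as &ldquo;no schedule known&rdquo; and then only waits on the Condition. To stay self-contained, it returns the task names instead of the <code>WakeupReason</code> values.</p>

```python
import asyncio


async def cond_waiter(cond):
    async with cond:
        await cond.wait()


async def wait_for(waittime, update_condition):
    """Wait for a notification and, if waittime is not None, for a timer."""
    tasks = [asyncio.create_task(cond_waiter(update_condition),
                                 name="condwait")]
    if waittime is not None:
        tasks.append(asyncio.create_task(asyncio.sleep(waittime),
                                         name="sleepwait"))
    done, pending = await asyncio.wait(tasks,
                                       return_when=asyncio.FIRST_COMPLETED)
    for p in pending:
        p.cancel()
    return sorted(d.get_name() for d in done)


async def demo():
    cond = asyncio.Condition()
    # Case 1: no schedule configured, only a notification ends the wait.
    waiter = asyncio.create_task(wait_for(None, cond))
    await asyncio.sleep(0.05)
    still_waiting = not waiter.done()
    async with cond:
        cond.notify_all()
    woken_by = await waiter
    # Case 2: a schedule exists and the timer fires first.
    timer_result = await wait_for(0.01, cond)
    return still_waiting, woken_by, timer_result


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

<p>The daemon would then pass <code>None</code> whenever <code>get_next_service_time</code> has no schedule to work with, and skip the subtraction that currently crashes.</p>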
<p>The second feature currently missing is mostly for testing purposes. Right now,
I&rsquo;m only able to centrally set the schedule, which would be applicable for all
service daemons. This is bound to become cumbersome once I want to start testing
the Job creation and monitoring, so I will want the possibility to trigger a
single service daemon&rsquo;s backup immediately. I will likely introduce another
parameter into the HomelabServiceBackup CRD which makes the daemon trigger
a backup immediately.</p>
<p>Alright, that&rsquo;s all I have to say for now. This is my first &ldquo;programming&rdquo; post
on this blog, and I&rsquo;m honestly not sure how it came out. Were you actually
able to follow, or was it a confused mess? Was it actually interesting to read?
I&rsquo;d be glad for some feedback, e.g. via my <a href="https://social.mei-home.net/@mmeier">Fediverse account</a>.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Homelab Backup Operator Part I: RBAC permission issues</title>
      <link>https://blog.mei-home.net/posts/backup-operator-1-rbac-issues/</link>
      <pubDate>Sun, 12 May 2024 20:40:59 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/backup-operator-1-rbac-issues/</guid>
      <description>I ran into some issues with the RBAC permissions for my operator</description>
      <content:encoded><![CDATA[<p>As I&rsquo;ve mentioned in my <a href="https://blog.mei-home.net/posts/k8s-migration-12-backup-issues/">last k8s migration post</a>,
I&rsquo;m working on writing a Homelab backup operator for my Kubernetes cluster.
And I&rsquo;ve run into some RBAC/permission issues I can&rsquo;t quite figure out. So let&rsquo;s
see whether writing about it helps. &#x1f642;</p>
<p>First, a short overview of the plan. I&rsquo;m using the <a href="https://github.com/nolar/kopf">kopf</a>
framework to build a Kubernetes operator. This operator&rsquo;s main goal is to handle
HomelabServiceBackup resources. These will contain a list of PersistentVolumeClaims
and S3 buckets which need to be backed up. I intend for there to be one
HomelabServiceBackup object for every service, located in the service&rsquo;s Namespace.</p>
<p>At the same time, I started out with defining a HomelabBackupConfig resource.
This will contain some configs which will be common among all service backups,
things like the hostname of the S3 server to store the backups and the image to
be used for the backup jobs.
There will only ever be one instance of this custom resource, and it should
always reside in the Namespace of the operator itself. At the same time, there
should also only ever be one operator for the entire k8s cluster.</p>
<p>This all seemed sensible to me until this afternoon, when I had finally
finished the yak-shaving every new project needs: creating the repo, configuring
the CI for image generation and UTs, and such things. At that point, I finally had
a container image I could run, with a very simple implementation:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> kopf
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> logging
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.on.create</span>(<span style="color:#e6db74">&#39;homelabbackupconfigs&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">create_handler</span>(spec, status, meta, <span style="color:#f92672">**</span>kwargs):
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Create handler called with meta: </span><span style="color:#e6db74">{</span>meta<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Create handler called with spec: </span><span style="color:#e6db74">{</span>spec<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Create handler called with status: </span><span style="color:#e6db74">{</span>status<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.on.resume</span>(<span style="color:#e6db74">&#39;homelabbackupconfigs&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">resume_handler</span>(spec, status, meta, <span style="color:#f92672">**</span>kwargs):
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Resume handler called with meta: </span><span style="color:#e6db74">{</span>meta<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Resume handler called with spec: </span><span style="color:#e6db74">{</span>spec<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Resume handler called with status: </span><span style="color:#e6db74">{</span>status<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.on.update</span>(<span style="color:#e6db74">&#39;homelabbackupconfigs&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">update_handler</span>(spec, status, meta, diff, <span style="color:#f92672">**</span>kwargs):
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Update handler called with meta: </span><span style="color:#e6db74">{</span>meta<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Update handler called with spec: </span><span style="color:#e6db74">{</span>spec<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Update handler called with status: </span><span style="color:#e6db74">{</span>status<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Update handler called with diff: </span><span style="color:#e6db74">{</span>diff<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.on.delete</span>(<span style="color:#e6db74">&#39;homelabbackupconfigs&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">delete_handler</span>(spec, status, meta, <span style="color:#f92672">**</span>kwargs):
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Delete handler called with meta: </span><span style="color:#e6db74">{</span>meta<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Delete handler called with spec: </span><span style="color:#e6db74">{</span>spec<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Delete handler called with status: </span><span style="color:#e6db74">{</span>status<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span></code></pre></div><p>The intention was merely to get a feeling for what data each of the different
events actually delivers, and to play around with when each of these handlers
gets called.</p>
<p>For the first deployment, I launched kopf with the <code>-A</code> flag, which means it
will use the Kubernetes cluster APIs to watch every Namespace. As noted above,
I want every Namespace to be watched, as every one of them might contain a
HomelabServiceBackup object to take care of the backup for the service residing
in the Namespace.
But I started out with only the HomelabBackupConfig CRD defined, as that&rsquo;s the
first step in my implementation plan. The content of the CRD is not important
for now; I will show it in a later post once I&rsquo;ve actually got the implementation
ready.</p>
<p>I also needed to provide proper RBAC for the deployment, as the operator needs
access to the API server.
My thoughts went like this: For now, I only need the HomelabBackupConfig, and
I only need that in the same Namespace the operator is running in. So I created
the following Role:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">rbac.authorization.k8s.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Role</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">hlbo-role</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">rules</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiGroups</span>: [<span style="color:#e6db74">&#34;&#34;</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>: [<span style="color:#ae81ff">events]</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">verbs</span>: [<span style="color:#ae81ff">create]</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiGroups</span>: [<span style="color:#e6db74">&#34;mei-home.net&#34;</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">homelabbackupconfigs</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">verbs</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">get</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">watch</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">list</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">patch</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">update</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">rbac.authorization.k8s.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">RoleBinding</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">hlbo-role</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">hlbo</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">roleRef</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">apiGroup</span>: <span style="color:#ae81ff">rbac.authorization.k8s.io</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Role</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">hlbo-role</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">subjects</span>:
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ServiceAccount</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">hlbo-account</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">backups</span>
</span></span></code></pre></div><p>This produced a number of errors when trying to launch my rudimentary operator:</p>
<pre tabindex="0"><code>[2024-05-12 14:19:55,454] kopf._core.reactor.o [ERROR   ]
Watcher for homelabbackupconfigs.v1alpha1.mei-home.net@none has failed:
&#39;homelabbackupconfigs.mei-home.net is forbidden: User &#34;system:serviceaccount:backups:hlbo-account&#34; cannot list resource &#34;homelabbackupconfigs&#34; in API group &#34;mei-home.net&#34; at the cluster scope&#39;
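</code></pre><p>Okay, this seems reasonably clear to me. With <code>-A</code>, kopf lists and watches
the resource at the cluster scope, but I&rsquo;ve only created a Role and a RoleBinding,
and those only grant access inside the <code>backups</code> Namespace, where the operator resides.</p>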
<p>I also tried another variant. Instead of using <code>-A</code> to have kopf use the cluster
API, one can provide <code>--namespace=*</code>. This tells kopf to use the namespaced API,
but list all Namespaces and watch them all. Then, I allowed kopf to list all
Namespaces. I still only granted it access to HomelabBackupConfigs in the
backups Namespace, though. This results in a lot of errors when it tries to
watch HomelabBackupConfigs in Namespaces other than backups, but the operator
keeps running. So this might be a &ldquo;solution&rdquo;.</p>
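<p>For illustration, allowing kopf to list Namespaces could look roughly like the
following. This is a minimal sketch under a few assumptions: the name
<code>hlbo-namespace-lister</code> is made up for this example, and the <code>watch</code> verb is
included on the assumption that kopf also wants to notice Namespaces being added or
removed. Only the ServiceAccount and its Namespace are taken from the RoleBinding
above. Listing all Namespaces is a cluster-scope request, so this part needs a
ClusterRole and a ClusterRoleBinding, even though access to the
HomelabBackupConfigs stays in the namespaced Role.</p>
<pre tabindex="0"><code class="language-yaml" data-lang="yaml">apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  # Hypothetical name, chosen for this example only.
  name: hlbo-namespace-lister
rules:
  # Listing all Namespaces happens at the cluster scope,
  # so a namespaced Role cannot grant it.
  - apiGroups: [&#34;&#34;]
    resources: [namespaces]
    # &#34;watch&#34; is an assumption, see the text above.
    verbs: [list, watch]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: hlbo-namespace-lister
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: hlbo-namespace-lister
subjects:
- kind: ServiceAccount
  name: hlbo-account
  namespace: backups
</code></pre>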
<p>I could also return to using <code>-A</code> and just configure everything in a ClusterRole.
But that grants far more permissions than the operator needs. I will also need
to grant it access to the Jobs API, and I don&rsquo;t want to do that cluster-wide
either.</p>
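<p>To make the trade-off concrete, the <code>-A</code> variant would need something along
these lines. It is only a sketch: the name <code>hlbo-cluster-role</code> is hypothetical, and
the rules are simply copied from the Role shown earlier. Once bound with a
ClusterRoleBinding, every rule in it applies in every Namespace, which is exactly
the part I&rsquo;d like to avoid.</p>
<pre tabindex="0"><code class="language-yaml" data-lang="yaml">apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  # Hypothetical name, for illustration only.
  name: hlbo-cluster-role
rules:
  # Same rules as in the namespaced Role, but valid in every
  # Namespace once bound with a ClusterRoleBinding.
  - apiGroups: [&#34;&#34;]
    resources: [events]
    verbs: [create]
  - apiGroups: [&#34;mei-home.net&#34;]
    resources: [homelabbackupconfigs]
    verbs: [get, watch, list, patch, update]
</code></pre>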
<p>And finally, the individual handlers don&rsquo;t allow restricting a specific resource
to a specific Namespace. The only config is the command line flag, and that
applies to all resources and their handlers.</p>
<p>So it looks like I have to search for another framework, as kopf doesn&rsquo;t seem to
allow me to do things in the least-privilege way I want them done. &#x1f614;</p>
<p>If you&rsquo;ve got a good idea or you think I&rsquo;ve overlooked something, please feel
free to ping me on the <a href="https://social.mei-home.net/@mmeier">Fediverse</a>.</p>
]]></content:encoded>
    </item>
  </channel>
</rss>
