<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>K8s on ln --help</title>
    <link>https://blog.mei-home.net/tags/k8s/</link>
    <description>Recent content in K8s on ln --help</description>
    <generator>Hugo -- 0.147.2</generator>
    <language>en</language>
    <lastBuildDate>Wed, 18 Mar 2026 23:45:41 +0100</lastBuildDate>
    <atom:link href="https://blog.mei-home.net/tags/k8s/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Smokeweb: Setting up a CI with Go Caches in Woodpecker</title>
      <link>https://blog.mei-home.net/posts/go-build-caches-in-woodpecker/</link>
      <pubDate>Wed, 18 Mar 2026 23:45:41 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/go-build-caches-in-woodpecker/</guid>
      <description>Setting up a CI with Go build and module caches in WoodpeckerCI</description>
      <content:encoded><![CDATA[<p>In my $dayjob, I&rsquo;m a build engineer in the CI team of a large company. So I&rsquo;m
reasonably confident that this is going to be only the first post in a long
series on the CI setup for my <a href="https://blog.mei-home.net/tags/smoking/">Smoking</a> project.</p>
<p>I like CIs and the automated testing they come with. I think they&rsquo;re one of the
better ideas the tech industry has come up with. I see their benefits every day
at work. So I also have CIs for most of my private projects.</p>
<p>Over the past week, I&rsquo;ve been writing the first lines of code for my <a href="https://blog.mei-home.net/posts/smoking-web/">Smokeweb</a>
project. Just some general plumbing and scaffolding work, plus logging setup
and command line flags. With the first code established, the next task was the
introduction of a Makefile for the project, as well as a CI to automatically
test it all.</p>
<p>For all matters of CI, from project CIs to Docker image builds for my Homelab,
I&rsquo;ve got a <a href="https://woodpecker-ci.org/">WoodpeckerCI</a> instance running locally,
connected to my Forgejo instance. If you&rsquo;d like to read more about the setup,
see <a href="https://blog.mei-home.net/posts/k8s-migration-15-ci/">here</a>.</p>
<p>After creating the first build job, I was a bit shocked at how long it ran:
<figure>
    <img loading="lazy" src="first-run.png"
         alt="A screenshot showing a Woodpecker CI pipeline with two steps, clone and build. The clone step only takes 8 seconds, while the build step takes 02m:04s."/> <figcaption>
            <p>Two minutes is a bit long for a build of a few hundred lines of code.</p>
        </figcaption>
</figure>
</p>
<p>The project really isn&rsquo;t large yet, perhaps a couple hundred lines of code. Two
minutes, even on a Raspberry Pi 4, seems a tad long. A look at the logs showed
that the long duration wasn&rsquo;t due to the build of my project itself, but rather
the compilation of all of its dependencies. That makes a lot more sense.</p>
<p>Researching a bit, I came across two things: <a href="https://pkg.go.dev/cmd/go#hdr-Build_and_test_caching">Go build and test caching</a>
and the <a href="https://go.dev/ref/mod#module-cache">Go module cache</a>. The former caches
build results and test results, while the latter caches downloaded module sources.</p>
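<p>Both caches can be relocated via environment variables, which is what makes them
usable in CI in the first place. A quick way to see where they currently live on a
dev machine (a minimal check, nothing CI-specific):</p>
<pre tabindex="0"><code># Print the current build and module cache locations
go env GOCACHE GOMODCACHE

# Both can be pointed elsewhere for a single invocation, e.g.:
GOCACHE=/tmp/build-cache GOMODCACHE=/tmp/mod-cache go build ./...
</code></pre>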
<p>I decided I wanted both in my CI, so the first thing I needed was a place to
put the caches, where they would persist between pipeline runs. For this, Woodpecker
allows <a href="https://woodpecker-ci.org/docs/usage/volumes">mounting additional volumes</a>.
These are separate from the volume Woodpecker automatically creates for every
Workflow, which is only shared between that Workflow&rsquo;s steps. That volume is
deleted after the Workflow finishes. With the k8s runner I&rsquo;m using, both the
Workflow volume and the additional volumes can be configured as PersistentVolumeClaims.
But while storing the cache on a Workflow&rsquo;s volume would probably already improve
the runtime a bit once I add more steps, each Workflow run would still have to
start from scratch. To avoid this, I&rsquo;ve created an additional PVC like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">PersistentVolumeClaim</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">gocache-volume</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storageClassName</span>: <span style="color:#ae81ff">cephfs-class</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">accessModes</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">ReadWriteMany</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">storage</span>: <span style="color:#ae81ff">5Gi</span>
</span></span></code></pre></div><p>I started out with a 5 GB volume, as my local build cache (at <code>~/.cache/go-build</code>
by default) is about 256 MB at the moment, and my module cache (at <code>~/go/pkg/mod</code>) is
at 752 MB. That should give me some headroom. I&rsquo;m also using my CephFS-based
StorageClass for the PVC, as it allows me to mount the cache into multiple
Pods, e.g. if I ever decide to separate the pipeline into multiple Workflows.</p>
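<p>If you want to size such a volume for your own project, the two numbers above are
easy to measure locally; a small sketch:</p>
<pre tabindex="0"><code># Rough on-disk size of the local build and module caches
du -sh &#34;$(go env GOCACHE)&#34; &#34;$(go env GOMODCACHE)&#34;
</code></pre>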
<p>With that done, I set my CI Workflow up like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">when</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">event</span>: <span style="color:#ae81ff">pull_request</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">variables</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#75715e">&amp;golang-build-cache</span> <span style="color:#ae81ff">/ci-go-cache/build-cache</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#75715e">&amp;golang-mod-cache</span> <span style="color:#ae81ff">/ci-go-cache/mod-cache</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#75715e">&amp;golang-image</span> <span style="color:#ae81ff">golang:1.25.6</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">steps</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">build</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#75715e">*golang-image</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">gocache-volume:/ci-go-cache</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">environment</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">GOCACHE</span>: <span style="color:#75715e">*golang-build-cache</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">GOMODCACHE</span>: <span style="color:#75715e">*golang-mod-cache</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">commands</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">make build</span>
</span></span></code></pre></div><p>One very, very important note: The <code>steps[].environment</code> key is a map. Not a list.
Thank me later. &#x1f609;</p>
<p>With this configuration, the first run of course again took two minutes, but the
next run (after I had figured out that <code>environment</code> is a map, not a list) took
only 25 seconds:</p>
<figure>
    <img loading="lazy" src="fast-run.png"
         alt="Another screenshot of the same two Woodpecker CI steps. This time, clone took 10 seconds and the build took only 25 seconds."/> <figcaption>
            <p>25 seconds sounds a lot better than 2 minutes.</p>
        </figcaption>
</figure>

<p>For good measure, I also introduced another step which explicitly downloads all
module dependencies up-front, so that this isn&rsquo;t done by each individual step
once I&rsquo;ve got more than one running in parallel:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">when</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">event</span>: <span style="color:#ae81ff">pull_request</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">variables</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#75715e">&amp;golang-build-cache</span> <span style="color:#ae81ff">/ci-go-cache/build-cache</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#75715e">&amp;golang-mod-cache</span> <span style="color:#ae81ff">/ci-go-cache/mod-cache</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#75715e">&amp;golang-image</span> <span style="color:#ae81ff">golang:1.25.6</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">steps</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">prepare mod cache</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#75715e">*golang-image</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">gocache-volume:/ci-go-cache</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">environment</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">GOMODCACHE</span>: <span style="color:#75715e">*golang-mod-cache</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">commands</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">go mod download -x</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">build</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#75715e">*golang-image</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">gocache-volume:/ci-go-cache</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">environment</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">GOCACHE</span>: <span style="color:#75715e">*golang-build-cache</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">GOMODCACHE</span>: <span style="color:#75715e">*golang-mod-cache</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">depends_on</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">prepare mod cache</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">commands</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">make build</span>
</span></span></code></pre></div><p>Note how the <code>build</code> step now depends on the new <code>prepare mod cache</code> step, which
runs <code>go mod download -x</code> to download the external dependencies of my module.
Adding <code>depends_on</code> here also switches the Workflow into DAG mode, so steps
without dependencies between them run in parallel.</p>
<p>My final pipeline looks like this for now:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">when</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">event</span>: <span style="color:#ae81ff">pull_request</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">variables</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#75715e">&amp;golang-build-cache</span> <span style="color:#ae81ff">/ci-go-cache/build-cache</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#75715e">&amp;golang-mod-cache</span> <span style="color:#ae81ff">/ci-go-cache/mod-cache</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#75715e">&amp;golang-image</span> <span style="color:#ae81ff">golang:1.25.6</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">steps</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">prepare mod cache</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#75715e">*golang-image</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">gocache-volume:/ci-go-cache</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">environment</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">GOMODCACHE</span>: <span style="color:#75715e">*golang-mod-cache</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">commands</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">go mod download -x</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">build</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#75715e">*golang-image</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">gocache-volume:/ci-go-cache</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">environment</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">GOCACHE</span>: <span style="color:#75715e">*golang-build-cache</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">GOMODCACHE</span>: <span style="color:#75715e">*golang-mod-cache</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">depends_on</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">prepare mod cache</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">commands</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">make build</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">UTs</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#75715e">*golang-image</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">gocache-volume:/ci-go-cache</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">environment</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">GOCACHE</span>: <span style="color:#75715e">*golang-build-cache</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">GOMODCACHE</span>: <span style="color:#75715e">*golang-mod-cache</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">depends_on</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">prepare mod cache</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">commands</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">make ut</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Linters</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#75715e">*golang-image</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">gocache-volume:/ci-go-cache</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">environment</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">GOCACHE</span>: <span style="color:#75715e">*golang-build-cache</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">GOMODCACHE</span>: <span style="color:#75715e">*golang-mod-cache</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">depends_on</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">prepare mod cache</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">commands</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">make fmt vet modules/tidy-check</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Golang CI</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">golangci/golangci-lint:v2.11.3</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">gocache-volume:/ci-go-cache</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">environment</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">GOCACHE</span>: <span style="color:#75715e">*golang-build-cache</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">GOMODCACHE</span>: <span style="color:#75715e">*golang-mod-cache</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">depends_on</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">prepare mod cache</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">commands</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">golangci-lint run</span>
</span></span></code></pre></div><p>Overall, this pipeline runs for about 53 seconds:</p>
<figure>
    <img loading="lazy" src="final-run.png"
         alt="One more screenshot of the Woodpecker UI, this time showing considerably more steps. Clone now takes 12 seconds, prepare mod cache takes 11, the build uses 17 seconds, the UTs take 12 seconds, the linters step 14 and finally, the Golang CI steps uses a whole 30 seconds."/> <figcaption>
            <p>All of the steps save for &lsquo;clone&rsquo; and &lsquo;prepare mod cache&rsquo; ran in parallel.</p>
        </figcaption>
</figure>

<p>One last thing still missing here is cleanup of the caches. Those 5 GB will
likely last me quite a while, but still: it needs proper cleanup. I looked
around a bit on that as well, but didn&rsquo;t find a good solution. As far as I can
tell, the go tool trims stale entries from the build cache on its own, but for
the module cache you can only nuke the whole thing, which I find unfortunate.
A task for later.</p>
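<p>Should the volume ever fill up, the bluntest workaround would be an occasional
manual maintenance step along these lines (just a sketch, using the cache paths
from my pipeline above):</p>
<pre tabindex="0"><code># go clean honors GOCACHE/GOMODCACHE, so point them at the PVC paths first
export GOCACHE=/ci-go-cache/build-cache
export GOMODCACHE=/ci-go-cache/mod-cache

# Wipes both caches entirely - the next pipeline run repopulates them
go clean -cache -modcache
</code></pre>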
<p>Before finishing, let&rsquo;s lighten the mood a bit at my expense. Because you see,
even though my code currently only contains a bit of scaffolding and startup
implementation, I still managed to rack up no fewer than five issues on the first
<a href="https://golangci-lint.run/">golangci-lint</a> run:</p>
<pre tabindex="0"><code>+ golangci-lint run
cmd/init.go:22:15: ST1005: error strings should not be capitalized (staticcheck)
	ErrVersion = errors.New(&#34;Version flag received&#34;)
	             ^
cmd/init.go:53:33: ST1005: error strings should not be capitalized (staticcheck)
		return &amp;application.Config{}, fmt.Errorf(&#34;Got invalid log type: %s&#34;, conf.LogType)
		                              ^
cmd/init.go:95:26: ST1005: error strings should not be capitalized (staticcheck)
		return slog.LevelInfo, fmt.Errorf(&#34;Got invalid debug level: %s&#34;, s)
		                       ^
cmd/main.go:29:3: SA9003: empty branch (staticcheck)
		if err == flag.ErrHelp {
		^
cmd/main.go:30:10: SA9003: empty branch (staticcheck)
		} else {
		       ^
5 issues:
* staticcheck: 5
</code></pre><p>Why yes, I&rsquo;m a bit embarrassed. Especially about those <code>empty branch</code> issues at
the end there. I really did leave an empty <code>if ... else ...</code> in the code after
transforming it into the switch statement sitting right above it, and then
forgot to remove the now-empty if-else once I was done. &#x1f926;</p>
]]></content:encoded>
    </item>
    <item>
      <title>Kubernetes Cert Renewal and Monitoring</title>
      <link>https://blog.mei-home.net/posts/k8s-certs/</link>
      <pubDate>Sun, 07 Dec 2025 11:15:45 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-certs/</guid>
      <description>I had a little k8s certificate SNAFU</description>
      <content:encoded><![CDATA[<p>Wherein I let my kubectl certs expire and implement some monitoring.</p>
<p>A couple of days ago, I was getting through my list of small maintenance tasks
in my Kubernetes cluster. Stuff like checking the resource consumption of new
deployments and adapting the resource limits. And in the middle of it, one of
my kubectl invocations was greeted by this message:</p>
<pre tabindex="0"><code>error: You must be logged in to the server (Unauthorized)
</code></pre><p>So I had a look at my kubectl credentials. For those who don&rsquo;t know, kubectl
authenticates to the cluster with a client TLS cert by default. I had just
copied the <code>admin.conf</code> config file kubeadm helpfully creates during cluster
setup. I didn&rsquo;t really see any reason to set up anything more elaborate,
considering that I&rsquo;m the only admin in the cluster.</p>
<p>And those certs had now expired. Not really a big deal, I have access to the
control plane nodes and could copy the new <code>admin.conf</code>. But I wanted to
introduce some monitoring and document how to renew the kubectl client certs.</p>
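<p>As a quick first check, the expiry date of the client cert can be read straight
out of a kubeconfig; something like this should work, assuming the first
<code>users</code> entry is the relevant one:</p>
<pre tabindex="0"><code># Extract the embedded client cert and print its expiry date
kubectl config view --raw -o jsonpath=&#39;{.users[0].user.client-certificate-data}&#39; \
  | base64 -d \
  | openssl x509 -noout -enddate
</code></pre>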
<p>The first problem to tackle: I wanted something a bit more elaborate than
&ldquo;just <code>cat /etc/kubernetes/admin.conf</code> and copy+paste the cert and key&rdquo;. And
here&rsquo;s where the embarrassment began. The <code>admin.conf</code> is available on my three
control plane nodes. But how to get it onto my command and control machine?</p>
<p>My first thought was: Just use SSH! But I don&rsquo;t allow root logins via SSH, and
the <code>admin.conf</code> is owned by root and not readable by anyone else. So if I
wanted to do it over SSH, I would need to somehow get a sudo call in there as
well. Easier said than done, because the only account with SSH access to my
machines can&rsquo;t just sudo - it needs to provide a password, as an additional
security layer. And it took me a really, really long time to figure out how to
call sudo via SSH and get the password through the pipe to sudo.</p>
<p>Here&rsquo;s the script I came up with:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#75715e">#!/bin/bash
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Kubeadm installs put an admin user kube.conf file at /etc/kubernetes/admin.conf</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># by default</span>
</span></span><span style="display:flex;"><span>ADMIN_FILE<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;/etc/kubernetes/admin.conf&#34;</span>
</span></span><span style="display:flex;"><span>ADMIN_TEMP<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>HOME<span style="color:#e6db74">}</span><span style="color:#e6db74">/temp/admin.conf&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Name of the control plane host</span>
</span></span><span style="display:flex;"><span>CP_HOST<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;control-plane-1&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Request the sudo password and put it into SUDO_PASS</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># -s prevents echoing of the input on the terminal</span>
</span></span><span style="display:flex;"><span>read -p <span style="color:#e6db74">&#34;Sudo pass: &#34;</span> -r -s SUDO_PASS
</span></span><span style="display:flex;"><span>echo
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>ssh myuser@<span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>CP_HOST<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#e6db74">&#34;sudo -p \&#34;\&#34; -S cat </span><span style="color:#e6db74">${</span>ADMIN_FILE<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">&lt;&lt;&lt;</span><span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>SUDO_PASS<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> &gt; ~/temp/admin.conf
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># This extracts the certificate and the private key from the kube config</span>
</span></span><span style="display:flex;"><span>CERT_DATA<span style="color:#f92672">=</span><span style="color:#66d9ef">$(</span>yq -r <span style="color:#e6db74">&#39;.users[0].user.&#34;client-certificate-data&#34;&#39;</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>ADMIN_TEMP<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> | base64 -d | sed -e <span style="color:#e6db74">&#39;s/$/\\n/g&#39;</span> | tr -d <span style="color:#e6db74">&#39;\n&#39;</span><span style="color:#66d9ef">)</span>
</span></span><span style="display:flex;"><span>CERT_KEY<span style="color:#f92672">=</span><span style="color:#66d9ef">$(</span>yq -r <span style="color:#e6db74">&#39;.users[0].user.&#34;client-key-data&#34;&#39;</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>ADMIN_TEMP<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> | base64 -d | sed -e <span style="color:#e6db74">&#39;s/$/\\n/g&#39;</span> | tr -d <span style="color:#e6db74">&#39;\n&#39;</span><span style="color:#66d9ef">)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Removing the temporary file for security</span>
</span></span><span style="display:flex;"><span>rm <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>ADMIN_TEMP<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Finally outputting the cert</span>
</span></span><span style="display:flex;"><span>echo <span style="color:#e6db74">&#34;CERT:&#34;</span>
</span></span><span style="display:flex;"><span>echo <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>CERT_DATA<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>echo <span style="color:#e6db74">&#34;Key:&#34;</span>
</span></span><span style="display:flex;"><span>echo <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>CERT_KEY<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span></code></pre></div><p>The main piece here is the actual copying, which took me way too long to figure
out:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ssh myuser@<span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>CP_HOST<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#e6db74">&#34;sudo -p \&#34;\&#34; -S cat </span><span style="color:#e6db74">${</span>ADMIN_FILE<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">&lt;&lt;&lt;</span><span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>SUDO_PASS<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> &gt; ~/temp/admin.conf
</span></span></code></pre></div><p>It SSHes to one of my CP hosts and runs <code>sudo -p &quot;&quot; -S cat /etc/kubernetes/admin.conf</code>.
The password requested earlier via <code>read</code> is piped into the SSH command&rsquo;s
<code>stdin</code> as a here-string. The <code>-p &quot;&quot;</code> is actually load-bearing here. Without it,
sudo will show a prompt for the password, which will end up being redirected
into the temporary file in addition to the <code>admin.conf</code> file&rsquo;s content.
The <code>-S</code> option tells sudo to read the password from <code>stdin</code> instead of the
terminal.</p>
<p>Another nifty little thing I discovered is <a href="https://mikefarah.gitbook.io/yq/">yq</a>,
basically an equivalent of <a href="https://jqlang.org/">jq</a>, but for YAML files.</p>
<p>I updated my credentials and everything worked again. But the fact that I allowed
the certs to expire bugged me, and I decided to introduce another little script
to regularly check the time to expiry of the kubectl client certs.</p>
<h2 id="monitoring-the-certs">Monitoring the certs</h2>
<p>The main problem with monitoring the cert was that it&rsquo;s a client cert, so there&rsquo;s
no HTTP endpoint I could hit to check it regularly. It is only present on my
command and control machine. So I needed something that runs on the C&amp;C host,
and that I wouldn&rsquo;t forget to check regularly. I ended up writing a small script
which checks the expiration dates, and tucked it into my <code>~/.profile</code> so it runs
whenever I log into the machine.</p>
<p>The script looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span><span style="color:#75715e">#!/bin/bash
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># 30 days</span>
</span></span><span style="display:flex;"><span>WARNING_DURATION<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;2592000&#34;</span>
</span></span><span style="display:flex;"><span>COLOR_RED<span style="color:#f92672">=</span><span style="color:#e6db74">&#39;\e[0;31m&#39;</span>
</span></span><span style="display:flex;"><span>NO_COLOR<span style="color:#f92672">=</span><span style="color:#e6db74">&#39;\033[0m&#39;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>PROD_CERT<span style="color:#f92672">=</span><span style="color:#66d9ef">$(</span>pass show k8s/credentials | jq -r .status.clientCertificateData<span style="color:#66d9ef">)</span>
</span></span><span style="display:flex;"><span>CONFIG_CERT<span style="color:#f92672">=</span><span style="color:#66d9ef">$(</span>pass show k8s/master-credentials | jq -r .status.clientCertificateData<span style="color:#66d9ef">)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">function</span> checkExpiry<span style="color:#f92672">()</span> <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>  cluster<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>1<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>  cert<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>2<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">if</span> ! openssl x509 -checkend <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>WARNING_DURATION<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> -noout &gt; /dev/null <span style="color:#f92672">&lt;&lt;&lt;</span><span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>cert<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>    local endDate
</span></span><span style="display:flex;"><span>    endDate<span style="color:#f92672">=</span><span style="color:#66d9ef">$(</span>openssl x509 -enddate -noout <span style="color:#f92672">&lt;&lt;&lt;</span><span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>cert<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> | cut -d <span style="color:#e6db74">&#39;=&#39;</span> -f2<span style="color:#66d9ef">)</span>
</span></span><span style="display:flex;"><span>    printf <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>COLOR_RED<span style="color:#e6db74">}</span><span style="color:#e6db74">The </span><span style="color:#e6db74">${</span>cluster<span style="color:#e6db74">}</span><span style="color:#e6db74"> cluster kubectl cert is about to expire!\nEnd date: %b</span><span style="color:#e6db74">${</span>NO_COLOR<span style="color:#e6db74">}</span><span style="color:#e6db74">\n&#34;</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>endDate<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>checkExpiry <span style="color:#e6db74">&#34;production&#34;</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>PROD_CERT<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>checkExpiry <span style="color:#e6db74">&#34;configuration&#34;</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>CONFIG_CERT<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span></code></pre></div><p>I&rsquo;m starting out by fetching the credentials from my <a href="https://www.passwordstore.org/">pass store</a>.
If you want to read more about my kube credential setup and how I changed it so
that the kubectl credentials don&rsquo;t just sit unencrypted on the disk, have a look
at <a href="https://blog.mei-home.net/posts/securing-k8s-credentials/">this post</a>.</p>
<p>I&rsquo;m using the <code>openssl</code> command line tool to do the checking; it already has a
<code>-checkend</code> option for checking whether the given certificate is still valid for
at least <code>${WARNING_DURATION}</code> more seconds. Quite a useful option, removing the
need to do date arithmetic in bash. If the cert is not valid for at least another
30 days, the script outputs a warning in red. 30 days should be enough time for
me to log into the C&amp;C host at least once, even during times like the current
one where I&rsquo;m not working on Homelab projects much.</p>
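<p>The exit-code semantics of <code>-checkend</code> are easy to verify by hand: it returns 0
if the cert is still valid after the given number of seconds, and 1 if it expires
sooner. For example, with a placeholder <code>client.crt</code>:</p>
<pre tabindex="0"><code>openssl x509 -checkend 2592000 -noout -in client.crt
echo $?   # 0: valid for another 30 days; 1: expires sooner
</code></pre>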
<p>I&rsquo;m calling the <code>checkExpiry</code> function twice, because I&rsquo;ve got two clusters and
hence two sets of credentials. One is my main cluster running most of my workloads.
The other is intended as a management cluster. It&rsquo;s currently still running in a
VM I only launch when needed, as part of my Tinkerbell experiments. I really need
to get back to those at some point&hellip;</p>
<p>My plan was to just stick the script into my <code>~/.profile</code> file, so the check is
only done once, when I log into the machine. The <code>~/.profile</code> script is only
sourced for a login shell, so it should not be executed when I&rsquo;m just opening a
fresh terminal. But this didn&rsquo;t work out as intended. I&rsquo;m using <a href="https://github.com/tmux/tmux">tmux</a>,
and for some reason, the script was executed whenever I open a new pane or window.</p>
<p>After some searching, I found that tmux runs a login shell for every new pane/window
<a href="https://www.mail-archive.com/tmux-users@lists.sourceforge.net/msg05901.html">by default</a>.
I found the solution for changing that behavior in the <a href="https://wiki.archlinux.org/title/Tmux#Start_a_non-login_shell">Arch Linux wiki</a>.
Following that instruction, I put the following line at the end of my <code>~/.tmux.conf</code>
file:</p>
<pre tabindex="0"><code>set -g default-command &#34;${SHELL}&#34;
</code></pre><p>With that, I&rsquo;d get the following output when the kubectl client cert gets close
to the expiration date:</p>
<pre tabindex="0"><code>The production cluster kubectl cert is about to expire!
End date: Sep 14 11:31:30 2026 GMT
The configuration cluster kubectl cert is about to expire!
End date: May 31 20:29:11 2026 GMT
</code></pre><h2 id="monitoring-kubeadm-certs">Monitoring kubeadm certs</h2>
<p>While looking for instructions on how to renew my kubectl certs, I came upon
<a href="https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/#check-certificate-expiration">this Kubernetes docs page</a>.
It mentions this command for getting the expiration dates of Kubeadm&rsquo;s own certs:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubeadm certs check-expiration
</span></span></code></pre></div><p>This command shows all of the certificates kubeadm generates for a cluster,
including the certs for all of the Kubernetes control plane components:</p>
<pre tabindex="0"><code>CERTIFICATE                  EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                   Sep 14, 2026 11:31 UTC   281d            ca                      no
apiserver                    Sep 14, 2026 10:24 UTC   281d            ca                      no
apiserver-etcd-client        Sep 14, 2026 10:24 UTC   281d            etcd-ca                 no
apiserver-kubelet-client     Sep 14, 2026 10:24 UTC   281d            ca                      no
controller-manager.conf      Sep 14, 2026 10:24 UTC   281d            ca                      no
etcd-healthcheck-client      Sep 14, 2026 10:24 UTC   281d            etcd-ca                 no
etcd-peer                    Sep 14, 2026 10:24 UTC   281d            etcd-ca                 no
etcd-server                  Sep 14, 2026 10:24 UTC   281d            etcd-ca                 no
front-proxy-client           Sep 14, 2026 10:24 UTC   281d            front-proxy-ca          no
scheduler.conf               Sep 14, 2026 10:24 UTC   281d            ca                      no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Dec 17, 2033 19:15 UTC   8y              no
etcd-ca                 Dec 17, 2033 19:15 UTC   8y              no
front-proxy-ca          Dec 17, 2033 19:15 UTC   8y              no
</code></pre><p>Thinking back a little, I recalled that September 14th was the last time I
ran a cluster update - kubeadm upgrades already renew the certificates. In theory,
that means I should be fine: I&rsquo;m doing cluster updates frequently enough that I
should never let those certs expire within their 365-day TTL. But I still wanted
to monitor them somehow, just in case.</p>
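<p>For completeness: the same docs page also covers manual renewal, should an
update ever be too far out. Roughly, on each control plane node:</p>
<pre tabindex="0"><code># Renews all kubeadm-managed certs; afterwards, the control plane
# components need a restart to pick up the new certs
kubeadm certs renew all
</code></pre>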
<p>As some of those are client certs, I couldn&rsquo;t just point my <a href="https://gatus.io/">Gatus</a>
instance at them, like I do for my Let&rsquo;s Encrypt main cert. While looking around,
I came across <a href="https://github.com/enix/x509-certificate-exporter">this Prometheus exporter</a>.
It can run as a DaemonSet on the k8s nodes, watching certificate files (and kube config
files as well) on disk and checking their expiration dates. In short, it looked
exactly like what I wanted. But there was a problem, as stated in <a href="https://github.com/enix/x509-certificate-exporter/tree/main/deploy/charts/x509-certificate-exporter#watchfiles-and-inode-change">their docs</a>:</p>
<blockquote>
<p>Be aware that for every file path provided to watchFiles, the exporter container will be given read access to the parent directory. This is how we handle the problem of changing inodes. Metrics will of course be limited to the single targetted path, as the program is told to watch the real path from watchFiles.</p></blockquote>
<p>The full note explains that making the containing directory available is necessary
because when the certs are rotated, the exporter would keep the old file open, as
it wouldn&rsquo;t have a way to know that the file was rotated. This makes sense. But
I find it problematic. The <code>/etc/kubernetes/pki</code> directory on my control plane
nodes looks like this:</p>
<pre tabindex="0"><code>-rw-r--r-- 1 root root 1123 Sep 14 12:26 apiserver-etcd-client.crt
-rw------- 1 root root 1675 Sep 14 12:26 apiserver-etcd-client.key
-rw-r--r-- 1 root root 1176 Sep 14 12:26 apiserver-kubelet-client.crt
-rw------- 1 root root 1675 Sep 14 12:26 apiserver-kubelet-client.key
-rw-r--r-- 1 root root 1314 Sep 14 12:26 apiserver.crt
-rw------- 1 root root 1675 Sep 14 12:26 apiserver.key
-rw-r--r-- 1 root root 1107 May  1  2025 ca.crt
-rw------- 1 root root 1675 May  1  2025 ca.key
drwxr-xr-x 2 root root 4096 May  1  2025 etcd
-rw-r--r-- 1 root root 1123 May  1  2025 front-proxy-ca.crt
-rw------- 1 root root 1679 May  1  2025 front-proxy-ca.key
-rw-r--r-- 1 root root 1119 Sep 14 12:26 front-proxy-client.crt
-rw------- 1 root root 1675 Sep 14 12:26 front-proxy-client.key
-rw------- 1 root root 1679 May  1  2025 sa.key
-rw-r--r-- 1 root root  451 May  1  2025 sa.pub
</code></pre><p>So if I were to tell the exporter to watch all of the <code>.crt</code> files, it would also
necessarily gain read access to the <code>.key</code> files. Which means that I would now
have a program running in my cluster which could read the certificates and private
keys of the main Kubernetes infrastructure in my Homelab. That just does not
sound like a good idea to me.</p>
<p>I wasn&rsquo;t able to come up with a proper solution, so I decided to just monitor
the apiserver certificate and use it as a stand-in for the other certs&rsquo; expiration
dates. They should all be renewed together during my regular cluster updates,
so monitoring just one of them should be good enough. &#x1f91e;</p>
<p>I did not even have to make any changes in Gatus, as it already reports the
expiry dates of all certificates for HTTPS endpoints it monitors. Creating a
Grafana panel was as easy as using this PromQL query:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-promql" data-lang="promql"><span style="display:flex;"><span>gatus_results_certificate_expiration_seconds{name<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">K8s: API</span>&#34;}
</span></span></code></pre></div><p>It refers to this entry in my Gatus config file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;K8s: API&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">group</span>: <span style="color:#e6db74">&#34;K8s&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">url</span>: <span style="color:#e6db74">&#34;https://k8s.example.com:6443/livez&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">method</span>: <span style="color:#e6db74">&#34;GET&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">interval</span>: <span style="color:#ae81ff">5m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">conditions</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#e6db74">&#34;[STATUS] == 200&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">client</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">insecure</span>: <span style="color:#66d9ef">true</span>
</span></span></code></pre></div><p>One last thing slightly bothering me is the CA certs. Those expire in 8 years,
and I decided not to bother monitoring them. I will leave them un-monitored
to add a bit of potential excitement to future me&rsquo;s life. &#x1f601;</p>
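<p>Should future me ever want to check on them by hand, it&rsquo;s a one-liner on a
control plane node:</p>
<pre tabindex="0"><code>sudo openssl x509 -enddate -noout -in /etc/kubernetes/pki/ca.crt
</code></pre>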
]]></content:encoded>
    </item>
    <item>
      <title>Updating my Kubeadm k8s Cluster from 1.30 to 1.33</title>
      <link>https://blog.mei-home.net/posts/kubernetes-cluster-update/</link>
      <pubDate>Sun, 21 Sep 2025 23:30:40 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/kubernetes-cluster-update/</guid>
      <description>Using Ansible to update my kubeadm k8s cluster from 1.30 to 1.33</description>
      <content:encoded><![CDATA[<p>Wherein I talk about updating my kubeadm Kubernetes cluster from 1.30 to 1.33
using Ansible.</p>
<p>I&rsquo;ve been a bit lax about my Kubernetes cluster updates, and I was still running
Kubernetes v1.30. I&rsquo;m also currently on a mission to work through a number of
the smaller tasks in my Homelab, paying down a bit of technical debt before
tackling the next big projects.</p>
<p>I had already done one update in the past, from my initial Kubernetes 1.29 to 1.30,
using an Ansible playbook I wrote to codify the kubeadm upgrade procedure. But
I never wrote a proper post about it, which I&rsquo;m now rectifying.</p>
<p>There were no really big problems - my cluster stayed up the entire time. But
there were issues in all three of the updates which might be of interest to
at least someone.</p>
<h2 id="the-kubeadm-cluster-update-procedure-and-version-skew">The kubeadm cluster update procedure and version skew</h2>
<p>The update of a kubeadm cluster is relatively straightforward, but it does
require some manual kubeadm actions directly on each node. The documentation
can be found <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/">here</a>.</p>
<p>Please note: Those instructions are versioned, and may change in the future
compared to what I&rsquo;m describing here. Please make sure you&rsquo;re reading the
instructions pertinent to the version you&rsquo;re currently running.</p>
<p>The first thing to do is to read the release notes. These are very nicely prepared
by the Kubernetes team <a href="https://github.com/kubernetes/kubernetes/tree/master/CHANGELOG">here</a>,
sorted by minor version.
And I approve of them wholeheartedly. I&rsquo;ve been known to rant a bit about release
engineering and release notes, but there&rsquo;s nothing to complain about when it comes
to Kubernetes. Besides perhaps their length, but that&rsquo;s to be expected in a
project of Kubernetes&rsquo; size.</p>
<p>I did not find anything relevant or interesting to me directly in any of the
releases, so I won&rsquo;t go into detail about the changes.</p>
<p>One thing to note, which will bite me later, is the <a href="https://kubernetes.io/releases/version-skew-policy/">version skew policy</a>.
It describes the allowed skew between versions, most importantly between the
kubelet and the kube-apiserver said kubelet is talking to. Namely, the versions
between the two can skew at most by a single minor version, and the kubelet must
not be newer than the kube-apiserver. Meaning the kube-apiserver always needs to
be updated first. More on this later, when I stumble over this policy.</p>
<p>Here is a short step-by-step of the kubeadm update process, always starting with
the control plane nodes:</p>
<ol>
<li>Update kubeadm to the new Kubernetes version</li>
<li>On the very first CP node, run <code>kubeadm upgrade apply v1.31.11</code>, for example</li>
<li>Then, update kubeadm on the other CP nodes and run <code>kubeadm upgrade node</code></li>
<li>Only after point 3) is completed on all nodes, update the kubelet as well</li>
</ol>
<p>Steps 3 and 4 are then repeated for all non-CP nodes as well. The order of steps
3 and 4 is important. <code>kubeadm upgrade</code> needs to be run on all CP nodes before
any kubelet is updated. Or at least, that&rsquo;s true on a High Availability cluster,
where the kube-apiservers are sitting behind a virtual IP. That&rsquo;s because of
the version skew policy I mentioned above: The kubelet must never be newer than
the kube-apiserver it is talking to. Which makes some sense: The Kubernetes API
is the public API, with stability guarantees, backwards compatibility and such.
So it will likely be able to serve older kubelets just fine, as it will still
support the older APIs that kubelet depends on. But in the other direction, the
newer kubelet may access APIs which older kube-apiservers simply don&rsquo;t serve
yet.</p>
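<p>To make that concrete, here is roughly what the manual sequence looks like on one
of the other CP nodes, per the kubeadm docs. The package pins and the node name
are placeholders:</p>
<pre tabindex="0"><code># 1. Update kubeadm itself
sudo apt-get update &amp;&amp; sudo apt-get install -y kubeadm=&#39;1.31.11-*&#39;

# 2. Upgrade the control plane components on this node
sudo kubeadm upgrade node

# 3. Drain the node, update kubelet and kubectl, uncordon again
kubectl drain cp-node-2 --ignore-daemonsets
sudo apt-get install -y kubelet=&#39;1.31.11-*&#39; kubectl=&#39;1.31.11-*&#39;
sudo systemctl daemon-reload &amp;&amp; sudo systemctl restart kubelet
kubectl uncordon cp-node-2
</code></pre>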
<h2 id="my-cluster-update-ansible-playbook">My cluster update Ansible playbook</h2>
<p>As I tend to do, I created an Ansible playbook during the first update, so that
I could do something else while the update runs fully automated. That didn&rsquo;t
quite work out for any of the updates this time around, but I&rsquo;ll go into more
detail later.</p>
<p>Let&rsquo;s start with the fact that I&rsquo;m using Ubuntu Linux as my OS on all of my
Homelab hosts, and I&rsquo;m getting the Kubernetes components from the official
apt repos provided by the Kubernetes project.
I&rsquo;m also using <a href="https://cri-o.io/">cri-o</a> as my container runtime. Until recently,
that was also hosted in the <a href="https://k8s.io">k8s.io</a> repos, but has since moved
to the <a href="https://www.opensuse.org/">openSUSE</a> repos.</p>
<p>Before starting the first tasks, here is my <code>group_vars/all.yml</code> file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">crio_version_prev</span>: <span style="color:#ae81ff">v1.30</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kube_version_prev</span>: <span style="color:#ae81ff">v1.30</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kube_version</span>: <span style="color:#ae81ff">v1.31</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kube_version_full</span>: <span style="color:#ae81ff">1.31.11</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">crio_version</span>: <span style="color:#ae81ff">v1.31</span>
</span></span></code></pre></div><p>I&rsquo;ve stored the versions here instead of in the role&rsquo;s <code>defaults/main.yml</code>,
because I also use them in a few other places, mainly my deployment roles for
configuring new cluster nodes.</p>
<p>But enough prelude, here are the first few tasks from the <code>tasks/main.yml</code> file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">update kubernetes repo key</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">copy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">src</span>: <span style="color:#ae81ff">kubernetes-keyring.gpg</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">dest</span>: <span style="color:#ae81ff">/usr/share/keyrings/kubernetes.gpg</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mode</span>: <span style="color:#ae81ff">0644</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">remove old kubernetes deb repo</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">apt_repository</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">repo</span>: &gt;<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      deb [signed-by=/usr/share/keyrings/kubernetes.gpg]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      https://pkgs.k8s.io/core:/stable:/{{ kube_version_prev }}/deb/ /</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">absent</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">filename</span>: <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">ansible_facts[&#39;distribution&#39;] == &#39;Ubuntu&#39;</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">add kubernetes ubuntu repo</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">apt_repository</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">repo</span>: &gt;<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      deb [signed-by=/usr/share/keyrings/kubernetes.gpg]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      https://pkgs.k8s.io/core:/stable:/{{ kube_version }}/deb/ /</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">filename</span>: <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">ansible_facts[&#39;distribution&#39;] == &#39;Ubuntu&#39;</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">update apt after kubernetes repos changed</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">apt</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">update_cache</span>: <span style="color:#66d9ef">yes</span>
</span></span></code></pre></div><p>These tasks deploy the apt key of the <code>k8s.io</code> repo for the main Kubernetes
components, remove the repo of the previous version and add the repo of the new
version. Finally, an apt cache update fetches the package lists from the new repo
before any install tasks run.</p>
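<p>For reference, here is roughly what these tasks boil down to on a node, with the
keyring path and repo URL taken from the tasks above. This is just a manual sketch,
not how I actually run it:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span><span style="color:#75715e"># Swap the versioned repo entry and refresh the package lists.</span>
</span></span><span style="display:flex;"><span>echo &#34;deb [signed-by=/usr/share/keyrings/kubernetes.gpg] https://pkgs.k8s.io/core:/stable:/v1.31/deb/ /&#34; | sudo tee /etc/apt/sources.list.d/kubernetes.list
</span></span><span style="display:flex;"><span>sudo apt-get update
</span></span></code></pre></div>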
<p>One thing to note here is that I&rsquo;m fetching the Kubernetes repo key manually
and storing it in my Ansible repository via this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.31/deb/Release.key | gpg --dearmor -o roles/kube-common/files/kubernetes-keyring.gpg
</span></span></code></pre></div><p>The next step is updating the kubeadm version:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">unpin kubeadm version</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dpkg_selections</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kubeadm</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">selection</span>: <span style="color:#ae81ff">install</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">update_kubeadm</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">update kubeadm</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.apt</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#39;kubeadm={{ kube_version_full }}*&#39;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">install_recommends</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">update_kubeadm</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pin kubeadm version</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dpkg_selections</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kubeadm</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">selection</span>: <span style="color:#ae81ff">hold</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">update_kubeadm</span>
</span></span></code></pre></div><p>The <code>update_kubeadm</code> variable exists because I&rsquo;m running this role twice for control plane nodes:
once to update only kubeadm on all CP nodes, and then a second time to run the
kubelet update. That second run must not repeat the kubeadm update, so the
variable lets me switch it off.</p>
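<p>Expressed as plain shell, the unpin/update/pin sequence corresponds roughly to
the following, using <code>apt-mark</code>, which manipulates the same dpkg selections:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>sudo apt-mark unhold kubeadm
</span></span><span style="display:flex;"><span>sudo apt-get install -y kubeadm=&#39;1.31.11-*&#39;
</span></span><span style="display:flex;"><span>sudo apt-mark hold kubeadm
</span></span></code></pre></div>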
<p>Next is the <code>kubeadm upgrade</code> invocation, the main part of the cluster update:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">run kubeadm update</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">command</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cmd</span>: <span style="color:#e6db74">&#34;kubeadm upgrade node&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">not kube_first_node and update_kubeadm</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">run kubeadm update</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">command</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cmd</span>: <span style="color:#e6db74">&#34;kubeadm upgrade apply -y v{{ kube_version_full }}&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">kube_first_node and update_kubeadm</span>
</span></span></code></pre></div><p>There are two variants of this task, depending on whether <code>kube_first_node</code> is
set. Only the first CP node to be updated needs to run
<code>upgrade apply -y v&lt;NEW_VERSION&gt;</code>; all other CP nodes and all non-CP nodes
just run <code>upgrade node</code>. Again, this variable-based setup exists because
<em>in principle</em>, the update steps are the same for all nodes in the cluster, so
it made more sense to have one role where I can switch individual tasks on and off
than to maintain multiple roles which each repeat most of their tasks.
The kubeadm update covers the control plane components kube-apiserver,
kube-controller-manager and kube-scheduler, as well as etcd. All of these run as
static Pods whose definitions are controlled by kubeadm.</p>
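<p>On the shell, the two variants look like this. Running <code>kubeadm upgrade plan</code>
on the first node beforehand is a nice sanity check, even though my playbook
skips it:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span><span style="color:#75715e"># On the first control plane node only:</span>
</span></span><span style="display:flex;"><span>sudo kubeadm upgrade plan
</span></span><span style="display:flex;"><span>sudo kubeadm upgrade apply -y v1.31.11
</span></span><span style="display:flex;"><span><span style="color:#75715e"># On every other node, control plane and workers alike:</span>
</span></span><span style="display:flex;"><span>sudo kubeadm upgrade node
</span></span></code></pre></div>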
<p>The next step is updating the kubelet and kubectl on the nodes, which
starts with draining the node:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">drain node</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">ceph</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">candc</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">myuser</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">command</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">argv</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">kubectl</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">drain</span>
</span></span><span style="display:flex;"><span>      - --<span style="color:#ae81ff">delete-emptydir-data=true</span>
</span></span><span style="display:flex;"><span>      - --<span style="color:#ae81ff">force=true</span>
</span></span><span style="display:flex;"><span>      - --<span style="color:#ae81ff">ignore-daemonsets=true</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#34;{{ ansible_hostname }}&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">update_non_kubeadm</span>
</span></span></code></pre></div><p>Here is the second variable I&rsquo;m using to restrict which tasks of the role are
executed for a particular host: <code>update_non_kubeadm</code>. It indicates
that all tasks not related to the kubeadm update should be executed.
The drain command is not issued on the node itself, but on my command and
control host, which also runs the Ansible playbook.</p>
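<p>The drain is easy to verify from that same host. A drained node is cordoned and
shows up as <code>SchedulingDisabled</code>; the node name and output here are illustrative:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl get node worker1
</span></span><span style="display:flex;"><span><span style="color:#75715e"># NAME      STATUS                     ROLES    AGE   VERSION</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># worker1   Ready,SchedulingDisabled   &lt;none&gt;   2y    v1.30.11</span>
</span></span></code></pre></div>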
<p>Then comes the update of cri-o:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">remove previous kube cri-o repo</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">apt_repository</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">repo</span>: &gt;<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      deb [signed-by=/usr/share/keyrings/libcontainers-crio-keyring.gpg]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      https://download.opensuse.org/repositories/isv:/cri-o:/stable:/{{ crio_version_prev }}/deb/ /</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">absent</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">filename</span>: <span style="color:#ae81ff">libcontainers-crio</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">ansible_facts[&#39;distribution&#39;] == &#39;Ubuntu&#39; and update_non_kubeadm</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">add libcontainers cri-o repo key</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">copy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">src</span>: <span style="color:#ae81ff">libcontainers-crio-keyring.gpg</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">dest</span>: <span style="color:#ae81ff">/usr/share/keyrings/libcontainers-crio-keyring.gpg</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mode</span>: <span style="color:#ae81ff">0644</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">update_non_kubeadm</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">add kube cri-o repo</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">apt_repository</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">repo</span>: &gt;<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      deb [signed-by=/usr/share/keyrings/libcontainers-crio-keyring.gpg]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      https://download.opensuse.org/repositories/isv:/cri-o:/stable:/{{ crio_version }}/deb/ /</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">filename</span>: <span style="color:#ae81ff">libcontainers-crio</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">ansible_facts[&#39;distribution&#39;] == &#39;Ubuntu&#39; and update_non_kubeadm</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">update apt after cri-o repos changed</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">apt</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">update_cache</span>: <span style="color:#66d9ef">yes</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">update_non_kubeadm</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">update cri-o</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.apt</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">cri-o</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">cri-tools</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">latest</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">install_recommends</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">update_non_kubeadm</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">autostart cri-o</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.systemd_service</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">crio</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">started</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">update_non_kubeadm</span>
</span></span></code></pre></div><p>This is similar to the initial Kubernetes repo setup. Note that cri-o lived in
the k8s.io repos from version 1.30 to 1.32, but was then moved to the
openSUSE repos.</p>
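<p>A quick sanity check that the runtime actually came back on the new version
after the package update; this is not part of the role:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>crio --version
</span></span><span style="display:flex;"><span>sudo systemctl is-active crio
</span></span><span style="display:flex;"><span>sudo crictl version
</span></span></code></pre></div>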
<p>Once cri-o is updated, the last part of the role is updating kubectl and kubelet:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">unpin kubelet version</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dpkg_selections</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kubelet</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">selection</span>: <span style="color:#ae81ff">install</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">update_non_kubeadm</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">update kubelet</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.apt</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#39;kubelet={{ kube_version_full }}*&#39;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">install_recommends</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">update_non_kubeadm</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pin kubelet version</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dpkg_selections</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kubelet</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">selection</span>: <span style="color:#ae81ff">hold</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">update_non_kubeadm</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">unpin kubectl version</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dpkg_selections</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kubectl</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">selection</span>: <span style="color:#ae81ff">install</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">update_non_kubeadm</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">update kubectl</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.apt</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#39;kubectl={{ kube_version_full }}*&#39;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">install_recommends</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">update_non_kubeadm</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pin kubectl version</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dpkg_selections</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kubectl</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">selection</span>: <span style="color:#ae81ff">hold</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">update_non_kubeadm</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">restart kubelet</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">systemd_service</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kubelet</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">daemon_reload</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">restarted</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">update_non_kubeadm</span>
</span></span></code></pre></div><p>And finally, the node is uncordoned:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">uncordon node</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">candc</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">myuser</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">kubernetes.core.k8s_drain</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;{{ ansible_hostname }}&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">uncordon</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">update_non_kubeadm</span>
</span></span></code></pre></div><p>This task is again delegated to my command and control host. So instead of being
executed on the remote host by my Ansible user, the Kubernetes API call runs, for
every host, on a central machine which has the necessary permissions and
credentials to actually talk to the cluster.</p>
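<p>The module call amounts to a plain uncordon from the command and control host,
again with an illustrative node name:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl uncordon worker1
</span></span><span style="display:flex;"><span>kubectl get node worker1 <span style="color:#75715e"># STATUS should be back to just Ready</span>
</span></span></code></pre></div>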
<p>The role I&rsquo;ve described above is then used in a playbook which runs it against
the different groups of hosts in my Homelab. First comes a single control plane
host, which runs the <code>kubeadm upgrade apply -y &lt;NEW_KUBE_VERSION&gt;</code>
command that only the first control plane node needs:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">hosts</span>: <span style="color:#ae81ff">firstcp</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Update first kubernetes controller kubeadm</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">k8s-update-kubeadm-first</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serial</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>: <span style="color:#ae81ff">linear</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">include cluster upgrade role</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">include_role</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kube-cluster-upgrade</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">vars</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">kube_first_node</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">update_kubeadm</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">update_non_kubeadm</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pause for two minutes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">pause</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">minutes</span>: <span style="color:#ae81ff">2</span>
</span></span></code></pre></div><p>Notably, this run has the <code>kube_first_node</code> variable set, but does not yet run
the non-kubeadm updates, meaning the kubelet update.
Next come the remaining control plane nodes:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">hosts</span>: <span style="color:#ae81ff">kube_controllers:!firstcp</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Update other kubernetes controllers kubeadm</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">k8s-update-kubeadm</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serial</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>: <span style="color:#ae81ff">linear</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">include cluster upgrade role</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">include_role</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kube-cluster-upgrade</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">vars</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">update_kubeadm</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">update_non_kubeadm</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pause for two minutes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">pause</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">minutes</span>: <span style="color:#ae81ff">2</span>
</span></span></code></pre></div><p>These nodes don&rsquo;t have <code>kube_first_node</code> set, so they execute the <code>kubeadm upgrade node</code>
command. Here, too, <code>update_non_kubeadm</code> is false, meaning the kubelets
are not updated yet. Without this ordering, there&rsquo;s a danger
that a kubelet which has already been updated would talk to a kube-apiserver which
hasn&rsquo;t been updated yet, potentially leading to errors.</p>
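<p>The skew is easy to check before the kubelet updates start, by comparing the
kube-apiserver version with the kubelet versions reported by the nodes:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl version | grep Server
</span></span><span style="display:flex;"><span>kubectl get nodes -o custom-columns=NAME:.metadata.name,KUBELET:.status.nodeInfo.kubeletVersion
</span></span></code></pre></div>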
<p>After the kubeadm update follows the kubelet update for the controller nodes:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">hosts</span>: <span style="color:#ae81ff">kube_controllers</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Update kubernetes controllers</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">k8s-update-controllers</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serial</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>: <span style="color:#ae81ff">linear</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">include cluster upgrade role</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">include_role</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kube-cluster-upgrade</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">vars</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">update_kubeadm</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">update_non_kubeadm</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wait for vault to be running</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">candc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">myuser</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">kubernetes.core.k8s_info</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Pod</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">vault</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">label_selectors</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#ae81ff">app.kubernetes.io/name=vault</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#ae81ff">app.kubernetes.io/instance=vault</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">field_selectors</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;spec.nodeName={{ ansible_hostname }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">wait</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">wait_condition</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">status</span>: <span style="color:#e6db74">&#34;True&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">type</span>: <span style="color:#e6db74">&#34;Ready&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">wait_sleep</span>: <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">wait_timeout</span>: <span style="color:#ae81ff">300</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">register</span>: <span style="color:#ae81ff">vault_pod_list</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">unseal vault prompt</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">vault</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">pause</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">echo</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">prompt</span>: <span style="color:#e6db74">&#34;Please unseal vault: k exec -it -n vault {{ vault_pod_list.resources[0].metadata.name }} -- vault operator unseal&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pause for two minutes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">pause</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">minutes</span>: <span style="color:#ae81ff">2</span>
</span></span></code></pre></div><p>This runs the role with <code>update_kubeadm: false</code> but <code>update_non_kubeadm: true</code>,
so the kubeadm update is skipped, as it already ran in the previous play, and the
kubelet is updated instead. This is safe to do now because all kube-apiservers
have been updated to the new version at this point.
I&rsquo;m running a two-minute pause task at the end of each play to give the cluster
a bit of time to start all Pods again.
This kubelet update play also contains some handling for my Vault containers, which
run on the control plane nodes and need to be manually unsealed whenever they&rsquo;re
restarted.</p>
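<p>The unseal itself then happens by hand, roughly like this, assuming a Pod name
like <code>vault-0</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span><span style="color:#75715e"># Check whether the restarted pod came back sealed, then unseal it.</span>
</span></span><span style="display:flex;"><span>kubectl exec -n vault vault-0 -- vault status | grep Sealed
</span></span><span style="display:flex;"><span>kubectl exec -it -n vault vault-0 -- vault operator unseal
</span></span></code></pre></div>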
<p>Next up are the Ceph nodes, which I keep separate from the rest of the
worker nodes because they need to be updated one at a time to prevent storage
downtime.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">hosts</span>: <span style="color:#ae81ff">kube_ceph</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Update kubernetes Ceph nodes</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">k8s-update-ceph</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serial</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>: <span style="color:#ae81ff">linear</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">pre_tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">set osd noout</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">candc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">myuser</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">command</span>: <span style="color:#ae81ff">/home/myuser/.krew/bin/kubectl-rook_ceph --operator-namespace rook-ceph -n rook-cluster ceph osd set noout</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">include cluster upgrade role</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">include_role</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kube-cluster-upgrade</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">vars</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">update_kubeadm</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">update_non_kubeadm</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wait for OSDs to start</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">candc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">myuser</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">ceph</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">command</span>: <span style="color:#ae81ff">/home/myuser/.krew/bin/kubectl-rook_ceph --operator-namespace rook-ceph -n rook-cluster ceph osd status &#34;{{ ansible_hostname }}&#34; --format json</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">register</span>: <span style="color:#ae81ff">ceph_end</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">until</span>: <span style="color:#e6db74">&#34;(ceph_end.stdout | trim | from_json | community.general.json_query(&#39;OSDs[*].state&#39;) | select(&#39;contains&#39;, &#39;up&#39;) | length) == (ceph_end.stdout | trim | from_json | community.general.json_query(&#39;OSDs[*]&#39;) | length)&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">retries</span>: <span style="color:#ae81ff">12</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">delay</span>: <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pause for two minutes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">ceph</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">pause</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">minutes</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">post_tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">unset osd noout</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">candc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">myuser</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">command</span>: <span style="color:#ae81ff">/home/myuser/.krew/bin/kubectl-rook_ceph --operator-namespace rook-ceph -n rook-cluster ceph osd unset noout</span>
</span></span></code></pre></div><p>I&rsquo;m also setting the <code>noout</code> flag for Ceph. This ensures that Ceph doesn&rsquo;t start
automatic rebalancing when the OSDs on the upgraded host temporarily go down.
In addition, I&rsquo;m waiting for the OSDs on each host to be up again before continuing
to the next host, to prevent storage issues.</p>
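<p>Stripped of the krew plugin wrapper, the <code>noout</code> dance around each Ceph node
comes down to these Ceph commands:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph osd set noout
</span></span><span style="display:flex;"><span><span style="color:#75715e"># ...drain, update and reboot the node...</span>
</span></span><span style="display:flex;"><span>ceph osd status <span style="color:#75715e"># wait until all OSDs on the host are up again</span>
</span></span><span style="display:flex;"><span>ceph osd unset noout
</span></span></code></pre></div>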
<p>Last but not least are my worker nodes:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">hosts</span>: <span style="color:#ae81ff">kube_workers</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Update kubernetes worker nodes</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">k8s-update-workers</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serial</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>: <span style="color:#ae81ff">linear</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">pre_tasks</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">include cluster upgrade role</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">include_role</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kube-cluster-upgrade</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">vars</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">update_kubeadm</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">update_non_kubeadm</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pause for one minute</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">pause</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">minutes</span>: <span style="color:#ae81ff">1</span>
</span></span></code></pre></div><p>Nothing special about these. In contrast to all the other plays, I&rsquo;m running
two hosts through it in parallel, because the cluster currently has enough slack
to tolerate the loss of two workers at once.</p>
<p>So now let me tell you how that beautiful theory I laid out up to now actually
worked in practice. &#x1f601;</p>
<h2 id="a-tale-of-three-updates">A tale of three updates</h2>
<p>I upgraded from Kubernetes 1.30 all the way to 1.33. None of the three upgrades
went through without at least one issue.</p>
<h3 id="updating-from-130-to-131">Updating from 1.30 to 1.31</h3>
<p>This one was the most complicated to fix. I started
it with the previous iteration of my update playbook, which still fully updated
each control plane node in turn: it first ran the kubeadm update on one
node and then immediately followed up with the kubelet update on that same
node.
Right on the first node, I was greeted with these errors for a number of
Pods:</p>
<pre tabindex="0"><code>NAMESPACE     NAME                                             READY   STATUS                            RESTARTS      AGE
fluentbit     fluentbit-fluent-bit-km8r7                       0/1     CreateContainerConfigError        0             38m
kube-system   cilium-98hzq                                     0/1     Init:CreateContainerConfigError   0             14m
kube-system   cilium-envoy-tklh7                               0/1     CreateContainerConfigError        0             40m
kube-system   etcd-firstcp                                     1/1     Running                           2 (35m ago)   35m
kube-system   kube-apiserver-firstcp                           1/1     Running                           2 (35m ago)   35m
kube-system   kube-controller-manager-firstcp                  1/1     Running                           0             35m
kube-system   kube-scheduler-firstcp                           1/1     Running                           0             35m
kube-system   kube-vip-firstcp                                 1/1     Running                           0             35m
rook-ceph     rook-ceph.cephfs.csi.ceph.com-nodeplugin-bnmsd   0/3     CreateContainerConfigError        0             38m
rook-ceph     rook-ceph.rbd.csi.ceph.com-nodeplugin-hq82g      0/3     CreateContainerConfigError        0             38m
</code></pre><p>Note the error in the <code>STATUS</code> column of all Pods which aren&rsquo;t kubeadm-managed
control plane components. I had never heard of a <code>CreateContainerConfigError</code>
before, so I went to Google and found
<a href="https://github.com/kubernetes/kubernetes/issues/127316">this issue</a>. It identified
the problem pretty clearly, and the Kubernetes maintainers helpfully pointed
to the <a href="https://kubernetes.io/releases/version-skew-policy/#kubelet">version-skew-policy</a>.
After reading said policy several times, I finally realized my mistake and
updated my Ansible playbook to first update kubeadm on all CP nodes
and only then start updating the kubelets. The immediate error was fixed by simply
running the kubeadm update on the other two control plane nodes as well.</p>
<p>After that, the rest of the update went through without a hitch.</p>
<h3 id="updating-from-131-to-132">Updating from 1.31 to 1.32</h3>
<p>In this one I stumbled over the fact that I hadn&rsquo;t fully understood the
release notes for 1.32, or rather their implications. Specifically, this point
in the <a href="https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.32.md#v1320">1.32 release notes</a>:</p>
<blockquote>
<p>kubeadm: kubeadm upgrade node now supports addon and post-upgrade phases. Users can use kubeadm upgrade node phase addon to execute the addon upgrade, or use kubeadm upgrade node --skip-phases addon to skip the addon upgrade. If you were previously skipping an addon subphase on kubeadm init you should now skip the same addon when calling kubeadm upgrade apply and kubeadm upgrade node. Currently, the post-upgrade phase is no-op, and it is mainly used to handle some release-specific post-upgrade tasks.</p></blockquote>
<p>So basically, addons like kube-proxy had been ignored during updates
up to this point, which is why my updates had worked so far. But in 1.32,
the <code>kubeadm upgrade</code> command gained the ability to also update addons, and
seemingly also to deploy them if they&rsquo;re not present, because I suddenly found
kube-proxy Pods on my nodes after the upgrade.</p>
<p>I don&rsquo;t use kube-proxy, because I&rsquo;m using Cilium&rsquo;s kube-proxy replacement,
and I had disabled it in my <code>InitConfiguration</code> like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">skipPhases</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#e6db74">&#34;addon/kube-proxy&#34;</span>
</span></span></code></pre></div><p>But the <code>InitConfiguration</code> isn&rsquo;t read during updates, and it seems that kubeadm
doesn&rsquo;t transfer this setting into the <code>kubeadm-config</code> ConfigMap during cluster
creation. So <code>kubeadm upgrade</code> had no idea that it should skip
the addon, and happily deployed it on my nodes.</p>
<p>Luckily for me, it didn&rsquo;t seem to interfere with anything, and my cluster didn&rsquo;t
just collapse in on itself. I removed them all with the handy instructions from
the <a href="https://docs.cilium.io/en/stable/network/kubernetes/kubeproxy-free/#quick-start">Cilium docs</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl -n kube-system delete ds kube-proxy
</span></span><span style="display:flex;"><span>kubectl -n kube-system delete cm kube-proxy
</span></span></code></pre></div><p>To prevent any further issues, I edited the <code>kubeadm-config</code> ConfigMap, which
can be inspected like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl get -n kube-system configmaps kubeadm-config -o yaml
</span></span></code></pre></div><p>There, I added an entry <code>proxy.disabled: true</code>. With this in place, the problem
did not occur again during the subsequent 1.33 update.</p>
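<p>Whether kubeadm has redeployed kube-proxy after an upgrade is a quick check;
with the fix in place, it should stay absent:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl -n kube-system get ds kube-proxy
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Error from server (NotFound): daemonsets.apps &#34;kube-proxy&#34; not found</span>
</span></span></code></pre></div>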
<h3 id="updating-from-132-to-133">Updating from 1.32 to 1.33</h3>
<p>The last one. I was hoping it would go through without an issue, to at least
have one successful update during which I could move away from the computer and
read a bit, but no such luck.</p>
<p>During the update of the cri-o repository for 1.33, I got this error:</p>
<pre tabindex="0"><code>Failed to update apt cache: E:Failed to fetch https://pkgs.k8s.io/addons:/cri-o:/stable:/v1.33/deb/InRelease  403  Forbidden [IP: 3.167.227.100 443]
</code></pre><p>This was because cri-o&rsquo;s repos moved from k8s.io to openSUSE; see for example
<a href="https://github.com/cri-o/cri-o/issues/9341">this issue</a>. The fix was
pretty simple: I just needed to change the repo address in my playbook.</p>
<p>After that fix, the update ran through without any further issues and I was
finally done. Cost me almost a day of work, but alas, most of the issues were of
my own making.</p>
<h2 id="increased-memory-requests">Increased memory requests?</h2>
<p>And finally for something amusing. When I looked at my Homelab dashboard on
the morning after the upgrade, I found that the memory requests for my worker
nodes were suddenly in the red, with almost 83% of available capacity used:</p>
<p><figure>
    <img loading="lazy" src="resource-usage.png"
         alt="A screenshot of several Grafana gauge visualizations. They show the utilization of memory and CPU resource usage in my k8s cluster, as measured by looking at the total resource requests from all Pods in the cluster. There are three gauges, one for each of my node groups, &#39;Control Plane&#39;, &#39;Ceph&#39; and &#39;Workers&#39;. Interesting here are the values for the &#39;Workers&#39; group, which show 72.5% for the CPU resource consumption and 82.8% for the memory resource consumption."/> <figcaption>
            <p>Resource usage the morning after the update. This shows the sum of resource requests on Pods divided by the overall resources of the group of nodes.</p>
        </figcaption>
</figure>

Normally, the memory utilization sits closer to 60%.</p>
<p>Thinking that the update must have changed something in how the memory utilization
was computed, or that perhaps some Deployment had increased its memory requests
after the update, I looked through my metrics, but wasn&rsquo;t able to find anything.</p>
<p>After some additional checking, I finally found the issue in how I was computing
the values for the metric:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-promql" data-lang="promql"><span style="display:flex;"><span><span style="color:#f92672">(</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">sum</span><span style="color:#f92672">(</span>
</span></span><span style="display:flex;"><span>      kube_pod_container_resource_requests{resource<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">memory</span>&#34;}
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">and</span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>pod<span style="color:#f92672">)</span> <span style="color:#f92672">(</span>kube_pod_status_phase{phase<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">Running</span>&#34;} <span style="color:#f92672">==</span> <span style="color:#ae81ff">1</span><span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">unless</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>node<span style="color:#f92672">)</span> <span style="color:#f92672">(</span>kube_node_spec_taint{}<span style="color:#f92672">))</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">/</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">sum</span><span style="color:#f92672">(</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">(</span>
</span></span><span style="display:flex;"><span>      kube_node_status_capacity{resource<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">memory</span>&#34;}
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">unless</span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>node<span style="color:#f92672">)</span> kube_node_spec_taint{}
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">)</span>
</span></span></code></pre></div><p>So I&rsquo;m summing <code>kube_pod_container_resource_requests</code> for the <code>memory</code> resource,
but only for Running Pods on nodes without a taint. Then I divide that by the
memory capacity of all nodes which don&rsquo;t have a taint. I chose taints as the
filter because they were readily available in the Prometheus data, and my worker
nodes are the only ones which don&rsquo;t have a taint applied to them.</p>
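<p>When debugging a query like this, it can help to evaluate the subexpressions in
isolation, for example against the Prometheus HTTP API. A sketch, with the Prometheus
URL being a placeholder, not my actual instance:</p>
<pre tabindex="0"><code># Check just the denominator: total memory capacity of untainted nodes.
curl -s &#39;http://prometheus.example.com/api/v1/query&#39; \
  --data-urlencode &#39;query=sum(kube_node_status_capacity{resource=&#34;memory&#34;} unless on(node) kube_node_spec_taint{})&#39;
</code></pre>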
<p>What I did not consider: there are a few non-catastrophic taints which Kubernetes
applies automatically, in my case the disk pressure taint. It appeared because the
disks on a few worker nodes were getting a bit full after the many node drains and
subsequent reschedules of Pods, which left a lot more unused images lying around
locally than usual.</p>
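<p>For anyone wanting to check their own nodes for such transient taints, a one-liner
along these lines does the trick; it simply lists each node with the keys of its taints:</p>
<pre tabindex="0"><code># List every node together with its taint keys. Transient taints like
# node.kubernetes.io/disk-pressure show up here as well.
kubectl get nodes -o custom-columns=&#39;NAME:.metadata.name,TAINTS:.spec.taints[*].key&#39;
</code></pre>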
<p>I was quite amused with myself when I realized that I had just spent half an
hour staring at completely the wrong plots. &#x1f601;</p>
<p>And that&rsquo;s it. Here&rsquo;s to hoping that the next Kubernetes update is not interesting
enough to blog about.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Setting up Bookwyrm</title>
      <link>https://blog.mei-home.net/posts/bookwyrm-setup/</link>
      <pubDate>Sun, 31 Aug 2025 23:50:51 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/bookwyrm-setup/</guid>
      <description>Setting up Bookwyrm, translating its docker-compose setup to Kubernetes</description>
      <content:encoded><![CDATA[<p>Wherein I&rsquo;m adding <a href="https://joinbookwyrm.com">Bookwyrm</a> to my Homelab.</p>
<p>I used to read novels. A lot. On school days, I would spend the approximately
twenty minutes between the end of my morning routine and having to head off with
a novel. Ditto for lazy Sunday evenings. During my service as a conscript, I would
always find space for a book in my pack when we went on a training exercise. At
University, the most difficult decision while packing for a trip home would be
judging how many books I would need to pack to ensure I would not run out.</p>
<p>Getting my first Kindle in 2012 was a revolution. Suddenly, I didn&rsquo;t need to
think very hard anymore - I could take my entire library with me. &#x1f389;</p>
<p>But for the last couple of years, my reading has slowly dwindled. So taking a
break from my <a href="https://blog.mei-home.net/tags/series-tinkerbell/">attempts to set up Tinkerbell</a>,
I decided to set up Bookwyrm, the Fediverse alternative to Goodreads.</p>
<p>Which, in hindsight, looks a bit weird: I want to read more novels. So first thing
to do is more homelabbing. &#x1f605;</p>
<h2 id="bookwyrm">Bookwyrm</h2>
<p>So, what does Bookwyrm look like? While I called it the Fediverse Goodreads
alternative, I never actually used Goodreads. So I wasn&rsquo;t sure exactly what I
was getting myself into.</p>
<p>Here is what my home timeline looks like in Bookwyrm:
<figure>
    <img loading="lazy" src="bookwyrm-profile.png"
         alt="A screenshot of my Bookwyrm Home Timeline. At the top is a menu with Lists, Discover and &#39;Your Books&#39; entries, as well as a search field, and on the far right is a profile picture and a dropdown menu with settings. Below on the left is a carousel with my books, first those I&#39;m currently reading, then two books I&#39;ve finished reading and finally a book I&#39;m wanting to read. For each book, its cover is shown. I will go into detail on which books these are in the main post. Below the carousel are some controls for the selected book. It shows the title and a button labeled &#39;Finish reading&#39;, because the selected book is in my &#39;Currently Reading&#39; shelf. Below that are tabs for writing a review of the book, another tab for adding a general comment, and finally one for posting a quote. Below the text box for entering my review is a button for posting, next to a dropdown for choosing post visibility. In the main part of the screen is my timeline, currently filled with my own posts in chronological order. At the top, the most recent post names a book I want to read, including its title, the series it belongs to and its cover. That post has the typical options, namely replying to it, boosting it and liking it. The next one marks a post about me finishing reading a book as having been boosted by my Mastodon account. At the bottom is another carousel, headed &#39;Who to follow&#39; with a couple of proposed accounts, represented by their profile pictures."/> <figcaption>
            <p>My Home Timeline</p>
        </figcaption>
</figure>
</p>
<p>This represents Bookwyrm pretty nicely. Its core function is socializing
about books, so all interactions revolve around a book. I believe there are private
messages which can just be sent to another user, but there is no generic,
Mastodon-like microblogging. In the above
example, you can see two of my posts. The top one represents me marking
<a href="https://bookwyrm.mei-home.net/book/6070/s/the-three-body-problem">The Three-Body Problem</a>
as a book I want to read. The post below it is a boost from my Mastodon account,
where I mark <a href="https://bookwyrm.mei-home.net/book/82/s/false-gods">False Gods</a> as
finished.</p>
<p>On the left of the screenshot is the new post interface, which reinforces what
I wrote above: Bookwyrm is all about books. It is not just a text box I can write
anything in, but is instead made up of actions related to the currently selected
book. For my English-speaking readers, the title of that book roughly translates
to &ldquo;Fateful Hour of a Democracy&rdquo;; it&rsquo;s a book about the history of the Weimar
Republic, that short period in German history which should get a hell of a lot more
emphasis in history lessons than what came before or after it, but sadly doesn&rsquo;t.</p>
<p>Back to Bookwyrm: I can write a review of the book, including a 0-5 star score,
a general comment, or a quote from the book. So every action I can take relates to
the book itself.</p>
<p>Each book also gets its own page, which looks like this:
<figure>
    <img loading="lazy" src="bookwyrm-book-page.png"
         alt="A screenshot of Bookwyrm&#39;s book page for &#39;On Basilisk Station&#39; by David Weber. Below the title is the name of the series, &#39;Honor Harrington&#39;, and the number 1, indicating that it&#39;s the first book. Both the series and the Author name are highlighted to indicate they are links. On the left side, it shows the book&#39;s cover. In this case, of a woman in a military uniform, with a spaceship firing a laser beam in the background. Below that is the rating, full five stars in this case. Then comes some general information about the book, including page count (422 pages), the language, the publishing date and the ISBN. On the right, the main part of the page starts with a description of the book. At the bottom of it is a link indicating nine more editions of the book being available. Then comes a section headed &#39;You have shelved this edition in&#39;, and it shows the &#39;Read&#39; shelf. Then comes a &#39;Your reading activity&#39; section, showing that I started reading this book on August 1st 2004 and finished on August 24th. Below that, the top of new post section I described in the previous section is visible."/> <figcaption>
            <p>An example of a book page</p>
        </figcaption>
</figure>
</p>
<p>Scrolling further down shows the reviews for the book:
<figure>
    <img loading="lazy" src="bookwyrm-book-page-bottom.png"
         alt="Another Bookwyrm screenshot, this time showing the bottom of the book page. There are multiple tabs, one for &#39;Reviews&#39; and one for &#39;Your reviews&#39;. Both just have a single entry, a review from me about the book and the Honor Harrington series overall. Below the review are buttons for boosting, replying and liking."/> <figcaption>
            <p>Bottom of the book page, with a review</p>
        </figcaption>
</figure>
</p>
<p>What I find a bit sad is that the page only shows reviews and related posts;
the automatically created post about me starting to read the book is nowhere to
be found.</p>
<p>Another problem is finding the &ldquo;instance&rdquo; of a book. Here is a screenshot of
searching for &ldquo;On Basilisk Station&rdquo; in Bookwyrm:
<figure>
    <img loading="lazy" src="book-search.png"
         alt="A screenshot of Bookwyrm&#39;s book search results for &#39;On Basilisk Station&#39;. It shows a variety of results from different Bookwyrm instances. All of them vary, in Author title, publication date, cover art, and full book title, some containing &#39;Honor Harrington&#39;, the series name, as part of the title."/> <figcaption>
            <p>Bookwyrm book search results</p>
        </figcaption>
</figure>

One of the good things here is that it got the right results: they&rsquo;re all for
the correct book. Something I haven&rsquo;t shown here is that the initial results
only contain the book from my own instance, but the search can then be broadened
to other sources. Besides Bookwyrm instances, the search also looks at sites
like Inventaire and OpenLibrary.</p>
<p>On better federated instances than mine, the book page for the same book looks
a bit more lively:
<figure>
    <img loading="lazy" src="federated-book.png"
         alt="Another screenshot of Bookwyrm&#39;s book page for &#39;On Basilisk Station&#39;, but this time from another instance than mine. The cover art and the description of the book are different. But besides my lone review, it shows reviews by multiple other people below the book&#39;s description. In addition, at the bottom of the page, there is a list of a number of other ratings, without full reviews, from a number of users. Each shows the user&#39;s name, profile pic, their rating of the book, and the date they read it."/> <figcaption>
            <p>The page for the same book as before, but now from books.theunseen.city.</p>
        </figcaption>
</figure>

This example comes from <a href="https://books.theunseen.city/book/38651/s/on-basilisk-station">books.theunseen.city</a>.
So with more connections, the book page will fill up on my instance as well.</p>
<p>And that&rsquo;s it for the Bookwyrm tour. I still haven&rsquo;t dived deeply into it, and
I&rsquo;m currently following only one other person. But I already like it as a way
for people to follow what I&rsquo;m reading. Let&rsquo;s see what the future holds.</p>
<h2 id="deploying-bookwyrm-on-kubernetes">Deploying Bookwyrm on Kubernetes</h2>
<p>Let&rsquo;s get on with the technical part. I of course wanted to deploy Bookwyrm in
my Kubernetes cluster. But its <a href="https://docs.joinbookwyrm.com/install-prod.html">default docs</a>
are geared towards deployment with docker-compose. And the instructions contain
some &ldquo;please run this script&hellip;&rdquo; steps which I had to integrate into my setup,
so that I wouldn&rsquo;t have to rely on remembering commands documented somewhere.</p>
<p>But the first step had to be to create a container image, as the Bookwyrm
project itself does not supply one.</p>
<h3 id="image-creation">Image creation</h3>
<p>I took the container build instructions from the <a href="https://github.com/bookwyrm-social/bookwyrm/blob/v0.7.5/Dockerfile">official Dockerfile</a>
and added the image to my CI. In the process, I completely remade my container
image build setup, see <a href="https://blog.mei-home.net/posts/improving-container-image-build-perf-with-buildah/">this post</a>
if you&rsquo;re interested.</p>
<p>The final version of the image build looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-Dockerfile" data-lang="Dockerfile"><span style="display:flex;"><span><span style="color:#66d9ef">ARG</span> python_ver<span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">FROM</span><span style="color:#e6db74"> python:${python_ver}</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">ENV</span> PYTHONUNBUFFERED <span style="color:#ae81ff">1</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">RUN</span> mkdir /app /app/static /app/images<span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">WORKDIR</span><span style="color:#e6db74"> /app</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">RUN</span> apt-get update <span style="color:#f92672">&amp;&amp;</span> apt-get install -y gettext libgettextpo-dev tidy libsass-dev <span style="color:#f92672">&amp;&amp;</span> apt-get clean<span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">COPY</span> . /app<span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">RUN</span> env SYSTEM_SASS<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;true&#34;</span> pip install -r requirements.txt --no-cache-dir<span style="color:#960050;background-color:#1e0010">
</span></span></span></code></pre></div><p>I made two important changes compared to the official Dockerfile. First, the
official docker-compose deployment just mounts the Bookwyrm source code into
the image to make it available. I wanted the image to be self-contained, so
instead of only copying the <code>requirements.txt</code> file, I copied the entire
source code into the <code>/app</code> directory.</p>
<p>Another change is the addition of <code>libsass-dev</code> to the installed packages, and
adding the <code>SYSTEM_SASS=&quot;true&quot;</code> variable to the <code>pip</code> invocation installing the
dependencies. I found this to be required due to the arm64 image build. During
the amd64 build, a prebuilt wheel is available for the <code>libsass</code> package. But no
wheel seems to be available for arm64, so the C++ libsass gets
built as part of the <code>pip</code> invocation. This takes quite a while on a Pi 4,
especially as the compile appears to use only one core. The builds looked
like this:</p>
<p><figure>
    <img loading="lazy" src="image-build-time.png"
         alt="A screenshot of Woodpecker&#39;s pipeline overview. It shows a Bookwyrm image build, running for a total of 23 minutes. It has two build steps, one for amd64 and one for arm64. The amd64 image took 05:23 minutes, while the arm64 build of the same image took 17:30."/> <figcaption>
            <p>Image build for Bookwyrm without the system libsass.</p>
        </figcaption>
</figure>

The arm64 build took pretty much 3x as long as the amd64 build. Sure, some of that
can be attributed to the arm64 builds running on a Raspberry Pi 4. But the main
contributing factor was that libsass needed to be rebuilt for arm64,
but not for amd64.
After I started using the system libsass, this is what the build times looked
like:
<figure>
    <img loading="lazy" src="image-build-sys-libsass.png"
         alt="Another screenshot of Woodpecker&#39;s pipeline overview. It again shows the Bookwyrm image build, but while the amd64 build still takes a comparable 5:40 minutes, the arm64 build now only takes 10:38 minutes. Still a lot longer, but no longer quite as bad."/> <figcaption>
            <p>Some improvements of the image build times after I started using the system libsass instead of letting pip build it.</p>
        </figcaption>
</figure>

Good enough for now.</p>
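<p>For reference, this is roughly what the per-architecture builds boil down to when
invoked with buildah by hand; the registry, tag and Python version here are
placeholders, not my actual pipeline values:</p>
<pre tabindex="0"><code># Sketch of the per-arch image builds; cross-building the arm64 variant on
# an amd64 host additionally needs qemu binfmt support.
buildah build --arch amd64 --build-arg python_ver=3.11 \
  -t harbor.example.com/homelab/bookwyrm:v0.7.5-amd64 .
buildah build --arch arm64 --build-arg python_ver=3.11 \
  -t harbor.example.com/homelab/bookwyrm:v0.7.5-arm64 .
</code></pre>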
<p>But there was one issue remaining: As you can see, I&rsquo;m copying the Bookwyrm code
into the image. But I had to get that code from somewhere first, and I wanted
to have it in my Homelab, instead of fetching it from GitHub every time. So
I created a mirror on my Forgejo instance. That brought a new question: how to
fetch that repo from Forgejo within a Woodpecker job? I could certainly have
made it a public repo and just fetched it, but I figured I would try to do it
properly and fetch it with credentials.</p>
<p>But where to get the credentials from? I didn&rsquo;t want to manually add them to the
repo config in Woodpecker, because Woodpecker clearly already had credentials of
its own: it had to fetch the container image repo where I put the
Containerfile for the Bookwyrm image. Reading up a bit, I found the
<a href="https://woodpecker-ci.org/docs/usage/environment#built-in-environment-variables">environment variable docs for Woodpecker</a>.
These contain the <code>CI_NETRC_USERNAME</code> and <code>CI_NETRC_PASSWORD</code> variables. These
are set to the credentials needed to fetch from the git forge configured for
the repository in Woodpecker. Note that the docs say this:</p>
<blockquote>
<p>Credentials for private repos to be able to clone data. (Only available for specific images)</p></blockquote>
<p>Sadly, it doesn&rsquo;t say which images get a netrc file with the credentials mounted.
I found more docs <a href="https://woodpecker-ci.org/docs/usage/project-settings#custom-trusted-clone-plugins">here</a>,
mentioning trusted clone plugins. I tried to build a small Alpine image with
git installed, but still didn&rsquo;t manage to get the credentials into that image.
The error message always read:</p>
<pre tabindex="0"><code>fatal: could not read Username for &#39;https://forgejo.example.com&#39;: No such device or address
</code></pre><p>I then dug through the code and tried to find the check, to see what was wrong
with my new Alpine image, why it didn&rsquo;t get the netrc credentials. I found
<a href="https://github.com/woodpecker-ci/woodpecker/blob/e8beddeb36e5e14bd836d5084dbd49ef10a8a768/pipeline/frontend/yaml/types/container.go#L130">this function</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-go" data-lang="go"><span style="display:flex;"><span><span style="color:#66d9ef">func</span> (<span style="color:#a6e22e">c</span> <span style="color:#f92672">*</span><span style="color:#a6e22e">Container</span>) <span style="color:#a6e22e">IsTrustedCloneImage</span>(<span style="color:#a6e22e">trustedClonePlugins</span> []<span style="color:#66d9ef">string</span>) <span style="color:#66d9ef">bool</span> {
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">return</span> <span style="color:#a6e22e">c</span>.<span style="color:#a6e22e">IsPlugin</span>() <span style="color:#f92672">&amp;&amp;</span> <span style="color:#a6e22e">utils</span>.<span style="color:#a6e22e">MatchImageDynamic</span>(<span style="color:#a6e22e">c</span>.<span style="color:#a6e22e">Image</span>, <span style="color:#a6e22e">trustedClonePlugins</span><span style="color:#f92672">...</span>)
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Note that it doesn&rsquo;t just check the image, but also verifies that the step is
a plugin, not just an image executing commands. Instead of building a plugin,
I decided to try to work with the official clone plugin, which is also used to
clone the initial repository for a Woodpecker pipeline run. This ultimately
worked, and the step for fetching the Bookwyrm repo mirror from my Forgejo
looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">clone bookwyrm repo</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">woodpeckerci/plugin-git</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">settings</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">depth</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">branch</span>: <span style="color:#ae81ff">production</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">partial</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">remote</span>: <span style="color:#ae81ff">https://forgejo.example.com/mirrors/bookwyrm.git</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ref</span>: <span style="color:#e6db74">&#39;v0.7.5&#39;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/woodpecker/bookwyrm</span>
</span></span></code></pre></div><p>Note that the <code>/mirrors/</code> part of the URL is not necessary to use it as a mirror;
I just put my Forgejo mirrors into a group called <code>mirrors</code>.</p>
<p>And with this, I ended up with the Bookwyrm repo, checked out at tag <code>v0.7.5</code>,
available under <code>/woodpecker/bookwyrm</code> for the rest of the pipeline steps.</p>
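<p>Any later step in the pipeline can then work directly with that checkout. A quick
sanity check, just as a sketch and not part of my actual pipeline, would be:</p>
<pre tabindex="0"><code># Verify the clone step put the expected tag into the shared workspace.
cd /woodpecker/bookwyrm
git describe --tags   # expected output: v0.7.5
</code></pre>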
<p>Getting to the point of having the Bookwyrm image was quite a ride, but now it&rsquo;s
time for the actual Kubernetes deployment.</p>
<h3 id="kubernetes-deployment">Kubernetes deployment</h3>
<p>When it comes to dependencies, Bookwyrm requires a Postgres DB and Redis, plus
it supports an S3 bucket for media and other static assets. I will not go into
detail on those dependencies. If you&rsquo;re curious about how I&rsquo;m setting them
up in my Homelab, here are the two relevant posts:</p>
<ul>
<li><a href="https://blog.mei-home.net/posts/k8s-migration-8-cloud-native-pg/">CloudNativePG Postgres DB setup</a></li>
<li><a href="https://blog.mei-home.net/posts/k8s-migration-5-s3-buckets/">Ceph S3 setup</a></li>
</ul>
<p>Looking at Bookwyrm&rsquo;s <a href="https://docs.joinbookwyrm.com/install-prod.html">setup docs</a>,
the initial deployment requires executing a couple of script commands.</p>
<blockquote>
<p>Initialize the database by running ./bw-dev migrate</p></blockquote>
<p>And:</p>
<blockquote>
<p>Initialize the application with ./bw-dev setup, and copy the admin code to use when you create your admin account.</p></blockquote>
<p>So I needed to somehow integrate that into my setup. Looking at the <a href="https://github.com/bookwyrm-social/bookwyrm/blob/v0.7.5/bw-dev">bw-dev script</a>,
it became pretty clear that Bookwyrm is really geared towards a docker-compose
deployment. The script is intended to be run outside of the Bookwyrm container,
as indicated by the fact that it calls docker-compose to achieve things:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span><span style="color:#f92672">[</span>...<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">function</span> runweb <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>    $DOCKER_COMPOSE run --rm web <span style="color:#e6db74">&#34;</span>$@<span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">[</span>...<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">function</span> initdb <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>    runweb python manage.py initdb <span style="color:#e6db74">&#34;</span>$@<span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">function</span> migrate <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>    runweb python manage.py migrate <span style="color:#e6db74">&#34;</span>$@<span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">function</span> admin_code <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>    runweb python manage.py admin_code
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span></code></pre></div><p>This of course won&rsquo;t work in a Kubernetes deployment. To work around this, I
wrote my own script, using the <code>manage.py</code> commands directly, without calling
the <code>bw-dev</code> script. It ended up looking like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ConfigMap</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-script</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">bookwyrm.sh</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    #! /bin/bash
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    migrate() {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      python manage.py migrate &#34;$@&#34; || return 1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    initdb() {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      python manage.py initdb &#34;$@&#34; || return 1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    init() {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      echo &#34;Running init function...&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      migrate || return 1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      migrate &#34;django_celery_beat&#34; || return 1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      initdb || return 1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      python manage.py compile_themes || return 1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      python manage.py collectstatic --no-input || return 1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      python manage.py admin_code || return 1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      return 0
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    update() {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      echo &#34;Running update function...&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      migrate || return 1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      python manage.py compile_themes || return 1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      python manage.py collectstatic --no-input || return 1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      return 0
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    op=&#34;${1}&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    if [[ &#34;${op}&#34; == &#34;init&#34; ]]; then
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      init || exit 1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    elif [[ &#34;${op}&#34; == &#34;update&#34; ]]; then
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      update || exit 1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    else
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      echo &#34;Unknown operation ${op}, aborting.&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      exit 1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    fi
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    exit 0</span>
</span></span></code></pre></div><p>This script supports two operations: the first-deployment initialization, run via
<code>bookwyrm.sh init</code>, and the migrations required during updates, run via
<code>bookwyrm.sh update</code>.</p>
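<p>Inside the container, usage then looks like this, with <code>/hl</code> being the path the
ConfigMap gets mounted to in the Jobs shown below:</p>
<pre tabindex="0"><code># Usage sketch; /hl is the ConfigMap mount path from the Job manifests below.
bash /hl/bookwyrm.sh init     # first deployment
bash /hl/bookwyrm.sh update   # migrations and assets on version updates
</code></pre>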
<p>Next question: how to run the script? For that, I looked into <a href="https://helm.sh/docs/topics/charts_hooks/">Helm chart hooks</a>.
These are annotations put onto a template in a Helm chart which make Helm instantiate
the template only under certain circumstances. There are hooks available for all
phases of the Helm chart lifecycle, from install through upgrade to delete.</p>
<p>I sadly couldn&rsquo;t make use of the <code>post-install</code> hook for the <code>init</code> part of the
Bookwyrm script, because the chart also contains the CloudNativePG and S3 bucket
templates, and I had already installed that part of the chart. So for the init
step, I opted for a simple workaround. The Job&rsquo;s
manifest looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>{{- <span style="color:#ae81ff">if .Values.runInit }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">batch/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Job</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-init</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-init</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>        {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>        {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">restartPolicy</span>: <span style="color:#ae81ff">Never</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">init-script</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">harbor.example.com/homelab/bookwyrm:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">command</span>: [<span style="color:#e6db74">&#34;bash&#34;</span>]
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#ae81ff">/hl/bookwyrm.sh</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#ae81ff">init</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-script</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/hl</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>          {{- <span style="color:#ae81ff">with .Values.env }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>            {{- <span style="color:#ae81ff">toYaml . | nindent 11 }}</span>
</span></span><span style="display:flex;"><span>          {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-script</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">configMap</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-script</span>
</span></span><span style="display:flex;"><span>{{- <span style="color:#ae81ff">end }}</span>
</span></span></code></pre></div><p>So it only gets created when the value <code>runInit</code> is <code>true</code> in the <code>values.yaml</code>
file.</p>
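<p>In practice, this means the init Job gets created exactly once and is then disabled
again. Roughly like this, with release name and chart path being placeholders:</p>
<pre tabindex="0"><code># One-time initialization run; release name and chart path are assumptions.
helm upgrade --install bookwyrm ./bookwyrm --set runInit=true
# Once the Job has succeeded, drop it again with the next deploy.
helm upgrade bookwyrm ./bookwyrm --set runInit=false
</code></pre>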
<p>But for the update Job, which does DB migrations and regenerates static assets,
I was able to use the <code>pre-upgrade</code> hook. The manifest looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">batch/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Job</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-update</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;helm.sh/hook&#34;: </span><span style="color:#ae81ff">pre-upgrade</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-update</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>        {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>        {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">restartPolicy</span>: <span style="color:#ae81ff">Never</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">update-script</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">harbor.example.com/homelab/bookwyrm:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">command</span>: [<span style="color:#e6db74">&#34;bash&#34;</span>]
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#ae81ff">/hl/bookwyrm.sh</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#ae81ff">update</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-script</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/hl</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>          {{- <span style="color:#ae81ff">with .Values.env }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>            {{- <span style="color:#ae81ff">toYaml . | nindent 11 }}</span>
</span></span><span style="display:flex;"><span>          {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-script</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">configMap</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-script</span>
</span></span></code></pre></div><p>Note especially this part:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;helm.sh/hook&#34;: </span><span style="color:#ae81ff">pre-upgrade</span>
</span></span></code></pre></div><p>That is what marks the Job as a hook to be run before anything else is updated.</p>
<p>The upgrade hook has one unfortunate semantic though: it is launched
whenever the Helm chart is updated, not just when the Bookwyrm version is incremented.
That means any change to the chart, even just an added label,
will execute the Job. And it executes
during the <code>helm upgrade</code> run, before anything else. So you run <code>helm upgrade</code>,
and Helm won&rsquo;t return immediately; it waits for the hook to finish,
and only then updates all of the other manifests, where necessary. So these Helm
runs will take a bit longer.
But that still seems a relatively small price compared to having the
instructions written on a documentation page I need to remember to execute when
Bookwyrm is updated.</p>
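<p>Should the migrations ever take really long, Helm&rsquo;s wait timeout can be raised for
those runs. A sketch, with the same placeholder names as above:</p>
<pre tabindex="0"><code># Helm blocks on the pre-upgrade Job, so give slow migrations some headroom.
helm upgrade bookwyrm ./bookwyrm --timeout 15m
</code></pre>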
<p>Here is some of the output of my run of the Bookwyrm initialization:</p>
<pre tabindex="0"><code>Running init function...
Operations to perform:
  Apply all migrations: admin, auth, bookwyrm, contenttypes, django_celery_beat, oauth2_provider, sessions
Running migrations:
  Applying contenttypes.0001_initial... OK
  Applying contenttypes.0002_remove_content_type_name... OK
  Applying auth.0001_initial... OK
  Applying auth.0002_alter_permission_name_max_length... OK
  Applying auth.0003_alter_user_email_max_length... OK
  Applying auth.0004_alter_user_username_opts... OK
  [...]
Operations to perform:
  Apply all migrations: django_celery_beat
Running migrations:
  No migrations to apply.
  Your models in app(s): &#39;bookwyrm&#39; have changes that are not yet reflected in a migration, and so won&#39;t be applied.
  Run &#39;manage.py makemigrations&#39; to make new migrations, and then re-run &#39;manage.py migrate&#39; to apply them.
Compiled SASS/SCSS file: &#39;/app/bookwyrm/static/css/themes/bookwyrm-dark.scss&#39;
Compiled SASS/SCSS file: &#39;/app/bookwyrm/static/css/themes/bookwyrm-light.scss&#39;
257 static files copied.
*******************************************
Use this code to create your admin account:
1234-56-78-910-111213
*******************************************
</code></pre><p>Especially the last part is important, as that code is needed to create the
initial admin account.</p>
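<p>As the script runs inside a Job here, this output lands in the Pod logs, so fetching
the admin code looks something like this:</p>
<pre tabindex="0"><code># Fetch the init Job output, including the admin code, from the cluster.
kubectl logs job/bookwyrm-init
</code></pre>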
<p>With that done, I was finally ready to write the Deployment. For that, I took
the official <a href="https://github.com/bookwyrm-social/bookwyrm/blob/v0.7.5/docker-compose.yml">docker-compose file</a>
as a blueprint:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">services</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">nginx</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">nginx:1.25.2</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">restart</span>: <span style="color:#ae81ff">unless-stopped</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#34;1333:80&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">depends_on</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">web</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">networks</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">main</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">./nginx:/etc/nginx/conf.d</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">static_volume:/app/static</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">media_volume:/app/images</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">db</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">postgres:13</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env_file</span>: <span style="color:#ae81ff">.env</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">pgdata:/var/lib/postgresql/data</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">networks</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">main</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">web</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">build</span>: <span style="color:#ae81ff">.</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env_file</span>: <span style="color:#ae81ff">.env</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">command</span>: <span style="color:#ae81ff">python manage.py runserver 0.0.0.0:8000</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">.:/app</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">static_volume:/app/static</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">media_volume:/app/images</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">exports_volume:/app/exports</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">depends_on</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">db</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">celery_worker</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">redis_activity</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">networks</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">main</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#34;8000&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">redis_activity</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">redis:7.2.1</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">command</span>: <span style="color:#ae81ff">redis-server --requirepass ${REDIS_ACTIVITY_PASSWORD} --appendonly yes --port ${REDIS_ACTIVITY_PORT}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">./redis.conf:/etc/redis/redis.conf</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">redis_activity_data:/data</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env_file</span>: <span style="color:#ae81ff">.env</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">networks</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">main</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">restart</span>: <span style="color:#66d9ef">on</span>-<span style="color:#ae81ff">failure</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">redis_broker</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">redis:7.2.1</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">command</span>: <span style="color:#ae81ff">redis-server --requirepass ${REDIS_BROKER_PASSWORD} --appendonly yes --port ${REDIS_BROKER_PORT}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">./redis.conf:/etc/redis/redis.conf</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">redis_broker_data:/data</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env_file</span>: <span style="color:#ae81ff">.env</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">networks</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">main</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">restart</span>: <span style="color:#66d9ef">on</span>-<span style="color:#ae81ff">failure</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">celery_worker</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env_file</span>: <span style="color:#ae81ff">.env</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">build</span>: <span style="color:#ae81ff">.</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">networks</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">main</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">command</span>: <span style="color:#ae81ff">celery -A celerywyrm worker -l info -Q high_priority,medium_priority,low_priority,streams,images,suggested_users,email,connectors,lists,inbox,imports,import_triggered,broadcast,misc</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">.:/app</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">static_volume:/app/static</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">media_volume:/app/images</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">exports_volume:/app/exports</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">depends_on</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">db</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">redis_broker</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">restart</span>: <span style="color:#66d9ef">on</span>-<span style="color:#ae81ff">failure</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">celery_beat</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env_file</span>: <span style="color:#ae81ff">.env</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">build</span>: <span style="color:#ae81ff">.</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">networks</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">main</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">command</span>: <span style="color:#ae81ff">celery -A celerywyrm beat -l INFO --scheduler django_celery_beat.schedulers:DatabaseScheduler</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">.:/app</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">static_volume:/app/static</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">media_volume:/app/images</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">exports_volume:/app/exports</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">depends_on</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">celery_worker</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">restart</span>: <span style="color:#66d9ef">on</span>-<span style="color:#ae81ff">failure</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">flower</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">build</span>: <span style="color:#ae81ff">.</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">command</span>: <span style="color:#ae81ff">celery -A celerywyrm flower --basic_auth=${FLOWER_USER}:${FLOWER_PASSWORD} --url_prefix=flower</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env_file</span>: <span style="color:#ae81ff">.env</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">.:/app</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">static_volume:/app/static</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">networks</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">main</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">depends_on</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">db</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">redis_broker</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">restart</span>: <span style="color:#66d9ef">on</span>-<span style="color:#ae81ff">failure</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dev-tools</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">build</span>: <span style="color:#ae81ff">dev-tools</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env_file</span>: <span style="color:#ae81ff">.env</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/app/dev-tools/</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">.:/app</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">profiles</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">tools</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">pgdata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">static_volume</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">media_volume</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">exports_volume</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">redis_broker_data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">redis_activity_data</span>:
</span></span><span style="display:flex;"><span><span style="color:#f92672">networks</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">main</span>:
</span></span></code></pre></div><p>It&rsquo;s a pretty long one, so let&rsquo;s go through it piece by piece. I skipped the Nginx
deployment entirely: I&rsquo;m using Bookwyrm&rsquo;s S3 support for static assets and
images, and with that, Nginx doesn&rsquo;t seem to be necessary. For the same reason,
I also don&rsquo;t have any volumes for <code>/app/static</code> and <code>/app/images</code>.
I initially had volumes there, as the docs were not 100% clear on whether the
directories might still be used even with S3, but after a couple of days of
running Bookwyrm, I found them still empty and removed the volumes. I also
ignored the <code>dev-tools</code> service, which seemed unnecessary, and skipped the
<code>redis_activity</code>, <code>redis_broker</code> and <code>db</code> services, as I had
already replaced those with CloudNativePG and my existing Redis instance.</p>
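<p>For completeness: the <code>db</code> replacement is a CloudNativePG <code>Cluster</code> resource
along the lines of the following minimal sketch. The instance count and storage
size here are placeholder assumptions, but the cluster name matters, because
CNPG generates an application Secret named <code>&lt;cluster&gt;-app</code>, which is where the
<code>bookwyrm-pg-cluster-app</code> references further down come from.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"># Minimal sketch of a CNPG cluster standing in for the compose "db" service.
# Instance count and storage size are illustrative, not my actual values.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: bookwyrm-pg-cluster
spec:
  instances: 1
  storage:
    size: 5Gi
</code></pre></div>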
<p>That left me with the following services to run:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">services</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">web</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">build</span>: <span style="color:#ae81ff">.</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env_file</span>: <span style="color:#ae81ff">.env</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">command</span>: <span style="color:#ae81ff">python manage.py runserver 0.0.0.0:8000</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">.:/app</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">static_volume:/app/static</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">media_volume:/app/images</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">exports_volume:/app/exports</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">depends_on</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">db</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">celery_worker</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">redis_activity</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">networks</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">main</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#34;8000&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">celery_worker</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env_file</span>: <span style="color:#ae81ff">.env</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">build</span>: <span style="color:#ae81ff">.</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">networks</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">main</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">command</span>: <span style="color:#ae81ff">celery -A celerywyrm worker -l info -Q high_priority,medium_priority,low_priority,streams,images,suggested_users,email,connectors,lists,inbox,imports,import_triggered,broadcast,misc</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">.:/app</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">static_volume:/app/static</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">media_volume:/app/images</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">exports_volume:/app/exports</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">depends_on</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">db</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">redis_broker</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">restart</span>: <span style="color:#66d9ef">on</span>-<span style="color:#ae81ff">failure</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">celery_beat</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env_file</span>: <span style="color:#ae81ff">.env</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">build</span>: <span style="color:#ae81ff">.</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">networks</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">main</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">command</span>: <span style="color:#ae81ff">celery -A celerywyrm beat -l INFO --scheduler django_celery_beat.schedulers:DatabaseScheduler</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">.:/app</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">static_volume:/app/static</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">media_volume:/app/images</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">exports_volume:/app/exports</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">depends_on</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">celery_worker</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">restart</span>: <span style="color:#66d9ef">on</span>-<span style="color:#ae81ff">failure</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">flower</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">build</span>: <span style="color:#ae81ff">.</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">command</span>: <span style="color:#ae81ff">celery -A celerywyrm flower --basic_auth=${FLOWER_USER}:${FLOWER_PASSWORD} --url_prefix=flower</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env_file</span>: <span style="color:#ae81ff">.env</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">.:/app</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">static_volume:/app/static</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">networks</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">main</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">depends_on</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">db</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">redis_broker</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">restart</span>: <span style="color:#66d9ef">on</span>-<span style="color:#ae81ff">failure</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">networks</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">main</span>:
</span></span></code></pre></div><p>One thing to note is that all services share the same <code>.env</code> file; Bookwyrm&rsquo;s
stack is configured almost entirely via environment variables, which I applaud. To avoid
duplicating the environment for each container, I added this section to my <code>values.yaml</code> file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">POD_IP</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">fieldRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">fieldPath</span>: <span style="color:#ae81ff">status.podIP</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">DEBUG</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;false&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ALLOWED_HOSTS</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;bookwyrm.example.com,localhost,$(POD_IP)&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">SECRET_KEY</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">secret-key</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">key</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">DOMAIN</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;bookwyrm.example.com&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">USE_HTTPS</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">PGPORT</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">port</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">POSTGRES_PASSWORD</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">password</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">POSTGRES_USER</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">user</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">POSTGRES_DB</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">dbname</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">POSTGRES_HOST</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">host</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">REDIS_ACTIVITY_URL</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;redis://redis.redis.svc.cluster.local:6379/0&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">REDIS_BROKER_URL</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;redis://redis.redis.svc.cluster.local:6379/1&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">FLOWER_USER</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">flower</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">user</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">FLOWER_PASSWORD</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">flower</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">pw</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">FLOWER_BASIC_AUTH</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;$(FLOWER_USER):$(FLOWER_PASSWORD)&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">FLOWER_PORT</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;8888&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">EMAIL_HOST</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;mail.example.com&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">EMAIL_PORT</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;465&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">EMAIL_HOST_USER</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;bookwyrm@example.com&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">EMAIL_HOST_PASSWORD</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">mail-pw</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">pw</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">EMAIL_SENDER_NAME</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;bookwyrm&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">EMAIL_SENDER_DOMAIN</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;example.com&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">USE_S3</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">AWS_ACCESS_KEY_ID</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-bucket</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">AWS_ACCESS_KEY_ID</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">AWS_SECRET_ACCESS_KEY</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-bucket</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">AWS_SECRET_ACCESS_KEY</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">AWS_STORAGE_BUCKET_NAME</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">configMapKeyRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-bucket</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">BUCKET_NAME</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">AWS_S3_CUSTOM_DOMAIN</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;s3-bookwyrm.example.com&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">AWS_S3_ENDPOINT_URL</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;http://rook-ceph-rgw-rgw-bulk.rook-cluster.svc&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ENABLE_THUMBNAIL_GENERATION</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span></code></pre></div><p>I won&rsquo;t go through all of the options, but there are a few I would like to highlight.
First, the <code>POD_IP</code> entry is important for the Kubernetes probes to work: the
kubelet accesses the pod via its IP by default, and Django rejects requests whose
<code>Host</code> header isn&rsquo;t listed in <code>ALLOWED_HOSTS</code>, so the pod&rsquo;s IP has to be
added there explicitly. I&rsquo;ve run into the same issue with <a href="https://docs.paperless-ngx.com/">Paperless-ngx</a>
before, which is also a Django app.</p>
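<p>One detail worth spelling out about the <code>$(POD_IP)</code> reference: Kubernetes only
expands <code>$(VAR)</code> when the referenced variable is defined earlier in the same
<code>env</code> list, so the ordering above is deliberate. A minimal sketch of just that
pairing (the hostname is a placeholder):</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml">env:
  # The Downward API injects the pod's IP at container start.
  - name: POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP
  # $(POD_IP) only expands because POD_IP is defined above it in this list;
  # otherwise the literal string "$(POD_IP)" would end up in the variable.
  - name: ALLOWED_HOSTS
    value: "bookwyrm.example.com,localhost,$(POD_IP)"
</code></pre></div>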
<p>Another one worth highlighting is the flower auth:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">FLOWER_USER</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">flower</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">user</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">FLOWER_PASSWORD</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">flower</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">pw</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">FLOWER_BASIC_AUTH</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;$(FLOWER_USER):$(FLOWER_PASSWORD)&#34;</span>
</span></span></code></pre></div><p>In the docker-compose example from Bookwyrm, the credentials are provided on
the command line:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  <span style="color:#f92672">flower</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">build</span>: <span style="color:#ae81ff">.</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">command</span>: <span style="color:#ae81ff">celery -A celerywyrm flower --basic_auth=${FLOWER_USER}:${FLOWER_PASSWORD} --url_prefix=flower</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env_file</span>: <span style="color:#ae81ff">.env</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">.:/app</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">static_volume:/app/static</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">networks</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">main</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">depends_on</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">db</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">redis_broker</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">restart</span>: <span style="color:#66d9ef">on</span>-<span style="color:#ae81ff">failure</span>
</span></span></code></pre></div><p>I never got this approach to work: for reasons I&rsquo;m not sure about, but which
probably have something to do with string escaping, I could not log in with
my credentials. So I moved them into the <code>FLOWER_BASIC_AUTH</code> environment variable,
at which point login immediately started working.</p>
<p>With all of that out of the way, here is the Deployment:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">bookwyrm</span>
</span></span><span style="display:flex;"><span>      {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>      {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>      {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#e6db74">&#34;Recreate&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">bookwyrm</span>
</span></span><span style="display:flex;"><span>        {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>        {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>        {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">automountServiceAccountToken</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">securityContext</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">fsGroup</span>: <span style="color:#ae81ff">1000</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-web</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">harbor.example.com/homelab/bookwyrm:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">command</span>: [<span style="color:#e6db74">&#34;python&#34;</span>]
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;manage.py&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;runserver&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;0.0.0.0:{{ .Values.ports.web }}&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">200m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">500Mi</span>
</span></span><span style="display:flex;"><span>          {{- <span style="color:#ae81ff">with .Values.env }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>            {{- <span style="color:#ae81ff">toYaml . | nindent 11 }}</span>
</span></span><span style="display:flex;"><span>          {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">livenessProbe</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">httpGet</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.ports.web }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;/&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">initialDelaySeconds</span>: <span style="color:#ae81ff">15</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">periodSeconds</span>: <span style="color:#ae81ff">30</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-http</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.ports.web }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-celery-worker</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">harbor.example.com/homelab/bookwyrm:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">command</span>: [<span style="color:#e6db74">&#34;celery&#34;</span>]
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-A&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;celerywyrm&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;worker&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-l&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;info&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-Q&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;high_priority,medium_priority,low_priority,streams,images,suggested_users,email,connectors,lists,inbox,imports,import_triggered,broadcast,misc&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">200m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">200Mi</span>
</span></span><span style="display:flex;"><span>          {{- <span style="color:#ae81ff">with .Values.env }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>            {{- <span style="color:#ae81ff">toYaml . | nindent 11 }}</span>
</span></span><span style="display:flex;"><span>          {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-celery-beat</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">harbor.example.com/homelab/bookwyrm:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">command</span>: [<span style="color:#e6db74">&#34;celery&#34;</span>]
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-A&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;celerywyrm&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;beat&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-l&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;INFO&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;--scheduler&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;django_celery_beat.schedulers:DatabaseScheduler&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">200m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">200Mi</span>
</span></span><span style="display:flex;"><span>          {{- <span style="color:#ae81ff">with .Values.env }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>            {{- <span style="color:#ae81ff">toYaml . | nindent 11 }}</span>
</span></span><span style="display:flex;"><span>          {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-flower</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">harbor.example.com/homelab/bookwyrm:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">command</span>: [<span style="color:#e6db74">&#34;celery&#34;</span>]
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-A&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;celerywyrm&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;flower&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;--url_prefix=flower&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">200m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">200Mi</span>
</span></span><span style="display:flex;"><span>          {{- <span style="color:#ae81ff">with .Values.env }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>            {{- <span style="color:#ae81ff">toYaml . | nindent 11 }}</span>
</span></span><span style="display:flex;"><span>          {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">flower-http</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.ports.flower }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span></code></pre></div><p>One comment on the above: take the resource requests with a grain of salt.
I haven&rsquo;t gotten around to looking at the metrics from the first week of the
deployment, so the values above are still the semi-random ones I drew out of a
hat while writing the manifest.</p>
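<p>Not shown above is the Service sitting in front of the Deployment. As a hedged
sketch, it would match the Deployment&rsquo;s labels and named ports roughly like
this, with the port numbers being assumptions (8000 from the compose file,
8888 from <code>FLOWER_PORT</code>):</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"># Hypothetical Service matching the Deployment's selector and named ports.
apiVersion: v1
kind: Service
metadata:
  name: bookwyrm
spec:
  selector:
    homelab/app: bookwyrm
  ports:
    - name: bookwyrm-http
      port: 8000
      targetPort: bookwyrm-http
    - name: flower-http
      port: 8888
      targetPort: flower-http
</code></pre></div>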
<p>At this point, I thought I was done. But that would have been too easy.</p>
<h3 id="the-power-of-css">The power of CSS</h3>
<p>The reason I knew I wasn&rsquo;t done after all is that the home page of Bookwyrm looked
like this when I first opened it:</p>
<figure>
    <img loading="lazy" src="bookwyrm-unstyled.png"
         alt="A screenshot of the homepage of my Bookwyrm instance before logging in. It is a bit...minimal, shall we say. The only styling visible is the font size of headings and the fact that those are written in bold, and the fact that links have the typical link coloring. Everything, including text boxes for username/password entry, is completely unstyled. And everything is squished on the left side of the page."/> <figcaption>
            <p>There&rsquo;s clearly something wrong.</p>
        </figcaption>
</figure>

<p>Obviously, that&rsquo;s not what it&rsquo;s supposed to look like. Those of you who are a
bit more familiar with webdev than I am will likely see immediately that there&rsquo;s
some problem with the CSS, but to me it was not quite that clear. A look into the
browser console, with its messages about the CSS file not being found, led me to
the same conclusion. Opening the page source, I saw the following:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-html" data-lang="html"><span style="display:flex;"><span>&lt;<span style="color:#f92672">link</span> <span style="color:#a6e22e">href</span><span style="color:#f92672">=</span><span style="color:#e6db74">&#34;https://s3-bookwyrm.mei-home.net/css/themes/bookwyrm-light.css&#34;</span> <span style="color:#a6e22e">rel</span><span style="color:#f92672">=</span><span style="color:#e6db74">&#34;stylesheet&#34;</span> <span style="color:#a6e22e">type</span><span style="color:#f92672">=</span><span style="color:#e6db74">&#34;text/css&#34;</span> /&gt;
</span></span></code></pre></div><p>But looking at the S3 bucket, I saw that the file actually lived under <code>/static/...</code>.
Searching a bit, I found <a href="https://github.com/bookwyrm-social/bookwyrm/issues/3383">this bug</a>.
It had already been fixed in the newest release, <code>v0.7.5</code>, but I had started out with
<code>v0.7.4</code>, as I wanted to have a chance to test my upgrade hook/script right away.</p>
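<p>I won&rsquo;t go into the details of the upgrade hook here, but one common shape for
such a hook is a Helm pre-upgrade Job that runs the Django migrations before the
Deployment is updated. A sketch of that pattern, with all specifics being
illustrative assumptions rather than my exact setup:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"># Illustrative sketch of a pre-upgrade migration Job; env entries from
# .Values.env (database credentials etc.) are omitted for brevity.
apiVersion: batch/v1
kind: Job
metadata:
  name: bookwyrm-migrate
  annotations:
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: harbor.example.com/homelab/bookwyrm:{{ .Values.appVersion }}
          command: ["python", "manage.py", "migrate"]
</code></pre></div>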
<p>After updating to <code>v0.7.5</code>, I at least got some proper styling, but it still
looked like some things were missing:
<figure>
    <img loading="lazy" src="bookwyrm-cors.png"
         alt="A screenshot of the homepage of my Bookwyrm instance. This time, there&#39;s definitely some styling present. But notably, some font issues are visible, with only the glyphs with the Unicode numbers showing, not the actual symbols."/> <figcaption>
            <p>Finally styled, but still with some font glyphs clearly missing.</p>
        </figcaption>
</figure>
</p>
<p>Note especially the missing glyphs for the symbols above &ldquo;Dezentral&rdquo; (decentralized),
&ldquo;Freundlich&rdquo; (friendly) and &ldquo;Nichtkommerziell&rdquo; (non-commercial). And please forgive
the partial German: I hadn&rsquo;t noticed the language mix when taking the screenshot.</p>
<p>Looking at the browser console again, I saw <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/CORS/Errors/CORSMissingAllowOrigin">this error message</a>.
Digging a bit further, I found that I had missed a part of Bookwyrm&rsquo;s <a href="https://docs.joinbookwyrm.com/external-storage.html#cors-settings">S3 setup docs</a>.
I followed <a href="https://docs.hetzner.com/storage/object-storage/howto-protect-objects/cors/">these docs from Hetzner</a>
to apply the necessary CORS config to my S3 bucket. I couldn&rsquo;t apply the JSON
config provided in the Bookwyrm docs directly, because <code>s3cmd</code>, my default S3
tool, only supports XML for the CORS config, not JSON. So I translated it
to this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-xml" data-lang="xml"><span style="display:flex;"><span><span style="color:#f92672">&lt;CORSConfiguration&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&lt;CORSRule&gt;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&lt;AllowedHeader&gt;</span>*<span style="color:#f92672">&lt;/AllowedHeader&gt;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&lt;AllowedMethod&gt;</span>GET<span style="color:#f92672">&lt;/AllowedMethod&gt;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&lt;AllowedMethod&gt;</span>HEAD<span style="color:#f92672">&lt;/AllowedMethod&gt;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&lt;AllowedMethod&gt;</span>POST<span style="color:#f92672">&lt;/AllowedMethod&gt;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&lt;AllowedMethod&gt;</span>PUT<span style="color:#f92672">&lt;/AllowedMethod&gt;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&lt;AllowedMethod&gt;</span>DELETE<span style="color:#f92672">&lt;/AllowedMethod&gt;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&lt;MaxAgeSeconds&gt;</span>3000<span style="color:#f92672">&lt;/MaxAgeSeconds&gt;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&lt;ExposeHeader&gt;</span>Etag<span style="color:#f92672">&lt;/ExposeHeader&gt;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&lt;AllowedOrigin&gt;</span>https://bookwyrm.example.com<span style="color:#f92672">&lt;/AllowedOrigin&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&lt;/CORSRule&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">&lt;/CORSConfiguration&gt;</span>
</span></span></code></pre></div><p>I stored the above XML config in a <code>cors.xml</code> file and applied it to my
Bookwyrm bucket with this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>s3cmd -c s3-conf setcors cors.xml s3://bookwyrm/
</span></span></code></pre></div><p>Here, <code>s3-conf</code> is the s3cmd config for my Ceph S3 setup.</p>
<p>And after that, I was finally done: Bookwyrm looked like it was supposed to! &#x1f389;</p>
<h3 id="initial-network-sync">Initial network sync?</h3>
<p>After I had finally set up my instance, I started entering a few books, mostly
for testing purposes. That was when I noticed that I could hear a lot of disk
activity. And looking at my metrics, I found that the Bookwyrm celery worker
container was using a lot of CPU:
<figure>
    <img loading="lazy" src="high-cpu.png"
         alt="Screenshot of a Grafana time series graph. It shows the entirety of August 24th, from 00:00 to 23:59. For most of this time, the graph, which is from the bookwyrm-celery-worker container, shows more or less a flat line around 0.01, with only very occasional spikes to 0.6 at max. Then came 16:21, and the CPU utilization suddenly went up to peaks of 1.7 and did not get lower than 0.6 anymore, mostly oscillating around 1.4. This went on until about 21:45, when the line went back to 0.01."/> <figcaption>
            <p>CPU utilization of the bookwyrm-celery-worker container.</p>
        </figcaption>
</figure>
</p>
<p>Looking around a bit more, I also found that there were a lot of new objects
created in my S3 pool on Ceph:
<figure>
    <img loading="lazy" src="ceph-objects.png"
         alt="Another Grafana time series screenshot. This time, it shows the object creation and deletion in the Ceph pool used for data storage for my S3 setup. It again shows the entire day, from 00:00 to 23:59. It again mostly stays around 0, meaning no objects are created or deleted. But there is a very regular spike of 12 new objects being created every five minutes. Besides that, there are a couple of spikes, both for lots of added and lots of removed objects. The main event again happens starting around 16:21, with the creations suddenly increasing to about 600 objects. This goes on, like the celery cPU usage from the previous graph, to about 21:45, when it returns to the previous levels."/> <figcaption>
            <p>Object changes in the S3 data pool, negative values are removed objects, positive values are numbers of added objects.</p>
        </figcaption>
</figure>
</p>
<p>So something was clearly going on with Bookwyrm, but I had no idea what it
might be. Checking the S3 bucket, I saw a lot of new book covers appearing in
there, even though I hadn&rsquo;t done much at that point, just added a handful of
books. After flailing around for a bit, I had the idea of looking at flower,
which the Bookwyrm docs advertise as a way to inspect ongoing tasks.</p>
<p>This was the picture presented to me at the time:
<figure>
    <img loading="lazy" src="bookwyrm-flower-tasklist.png"
         alt="A screenshot of flower&#39;s task list. It shows a lot of them, an entire screen of 15 tasks, all started just between 19:39:15 and 19:39:26. The shown task names only have two variations, &#39;base_activity.set_related_field&#39; and &#39;add_status_task&#39;. The args are also shown, and all seem to be the addition of &#39;Works&#39;, which I think is a book in Bookwyrm&#39;s object model."/> <figcaption>
            <p>List of tasks in Flower</p>
        </figcaption>
</figure>
</p>
<p>Noteworthy is that most of the tasks relate to <code>Work</code> objects, which, if
I&rsquo;m not mistaken, are books in Bookwyrm&rsquo;s object model. So a lot was being done
with a lot of books, even though I had only added two or three books myself at
that point and hadn&rsquo;t followed a single person yet. Also note that all of the
shown tasks started within the same minute, 19:39, and it went on and on like
this.</p>
<p>Then I saw that there was a link to my instance in the <code>args</code> column, and I clicked
one of the tasks to get to this details page:
<figure>
    <img loading="lazy" src="bookwyrm-single-task.png"
         alt="A screenshot of flower&#39;s task details for one of the &#39;base_activity.set_related_field&#39; tasks. The important part here is the full content of the args value: &#39;Edition,Work,parent_work,https://bookwyrm.mei-home.net/book/16858,https://bookwyrm.social/book/151006&#39;."/> <figcaption>
            <p>Example of task details.</p>
        </figcaption>
</figure>
</p>
<p>I then checked which book the <a href="https://bookwyrm.mei-home.net/book/16895/s/the-dark-tower">https://bookwyrm.mei-home.net/book/16858</a>
URL shown in the <code>args</code> value points to:
<figure>
    <img loading="lazy" src="dark-tower-book.png"
         alt="A screenshot of the Bookwyrm book page for Stephen King&#39;s The Dark Tower."/> <figcaption>
            <p>This was the book the flower task related to</p>
        </figcaption>
</figure>
</p>
<p>The thing is: I hadn&rsquo;t interacted with that book at all. So I tried a few more
books from other flower tasks, and it was the same story: books I had never
interacted with. The only conclusion I can draw for now is that Bookwyrm looks
at all known instances, downloads their entire database of books, and adds the
books to my instance?</p>
<p>If you actually know what&rsquo;s going on here, please contact me at <a href="https://social.mei-home.net/@mmeier">my Mastodon account</a>
and tell me. I&rsquo;m genuinely curious.</p>
<h2 id="final-thoughts">Final thoughts</h2>
<p>I&rsquo;m really curious what that initial database sync (?) was for.</p>
<p>The Bookwyrm setup also holds one last challenge: Resisting the temptation of
entering all the books I&rsquo;ve read in the last 32 years. &#x1f605;</p>
<p>Last but not least, if you&rsquo;d like to follow my reading, I&rsquo;m <a href="https://bookwyrm.mei-home.net/user/mmeier">https://bookwyrm.mei-home.net/user/mmeier</a>.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Organizing Helm charts and other Manifests with Helmfile</title>
      <link>https://blog.mei-home.net/posts/helmfile/</link>
      <pubDate>Thu, 05 Jun 2025 21:20:53 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/helmfile/</guid>
      <description>How to organize and handle it all?</description>
      <content:encoded><![CDATA[<p>Wherein I describe how I organize Helm charts and other k8s manifests.</p>
<p>I&rsquo;ve had this post lying around in my draft folder for a long, long time,
mostly because I started writing it before I realized how useful it is to write
posts very close to when something happens.</p>
<p>The &ldquo;something happens&rdquo; in this case is the answer to the question &ldquo;How to
organize my Helm charts and other k8s manifests?&rdquo;. I liked Helm fine enough when
I looked at it. It&rsquo;s pretty nice to get all necessary manifests to run an app,
instead of having to write all of them myself.
But the question then was: How to store which exact Helm charts I have
installed, and in which version? And how/where to store the <code>values.yaml</code> files?
And then, what about random manifests, like additional PriorityClasses?</p>
<p>The solution that was pointed out to me on the Fediverse: <a href="https://github.com/helmfile/helmfile">Helmfile</a>.
It&rsquo;s a piece of software that reads a list of Helm charts to be installed and
deploys them onto a cluster. It does not re-implement Helm, but
simply calls a previously installed Helm binary.</p>
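<p>This also means Helmfile has a couple of prerequisites of its own: <code>helm</code> must
already be on the PATH, and the diffs shown further down rely on the helm-diff
plugin. A quick sketch of what I&rsquo;d check first:</p>
<pre tabindex="0"><code># Helmfile shells out to helm, so these need to work
helm version
helm plugin install https://github.com/databus23/helm-diff
helmfile version
</code></pre>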
<p>All of the configuration for Helmfile is stored in a local Yaml file. A
good example for what that config looks like is my <a href="https://cloudnative-pg.io/">CloudNativePG</a>
setup. Helmfile by default reads the config from a file named <code>helmfile.yaml</code>
in the current working dir. My <code>helmfile.yaml</code>, stripped down only to the
CNPG setup, looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">repositories</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cloud-native-pg</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">url</span>: <span style="color:#ae81ff">https://cloudnative-pg.github.io/charts</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">releases</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cnpg-operator</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">chart</span>: <span style="color:#ae81ff">cloud-native-pg/cloudnative-pg</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">version</span>: <span style="color:#ae81ff">v0.21.2</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">cnpg-operator</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">./cnpg-operator/hl-values.yaml.gotmpl</span>
</span></span></code></pre></div><p>And the <code>hl-values.yaml.gotmpl</code> is then just the <code>values.yaml</code> file for the
CNPG Helm chart. With one additional wrinkle: Helmfile can do templating, on the
<code>values.yaml</code> file. Which is pretty cool. Just one example of how I&rsquo;m using this
is my <a href="https://external-secrets.io/latest/">external-secrets</a> addon <code>values.yaml</code> file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">caBundle</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">  {{- exec &#34;curl&#34; (list &#34;https://vault.example.com:8200/v1/my-ca/ca/pem&#34;) | nindent 2 }}</span>
</span></span></code></pre></div><p>Then in turn, I&rsquo;m writing that to a Secret:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Secret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-ca-cert</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">stringData</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">caCert</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    {{- .Values.caBundle | nindent 6 }}</span>
</span></span></code></pre></div><p>And the curl command is executed on the machine where Helmfile runs. This
is particularly nice when you&rsquo;re fetching some Secrets via this mechanism, because
it allows you to use local credentials that only exist on that single machine.</p>
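<p>With all of this templating going on, it can get hard to tell what will actually
hit the cluster. Helpfully, Helmfile can render everything locally without
deploying anything. A sketch:</p>
<pre tabindex="0"><code># render the fully templated manifests for a single release
helmfile template --selector name=cnpg-operator
</code></pre>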
<p>Once you&rsquo;ve entered a release into the Helmfile, it can be deployed with a
command like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>helmfile apply --selector name<span style="color:#f92672">=</span>cnpg-operator
</span></span></code></pre></div><p>This will automatically update all repositories and then run <code>helm upgrade</code>.
Very helpfully, it will also output the diff between the new release and what&rsquo;s
currently deployed on the cluster.</p>
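<p>The diff is also available on its own, which makes for a nice dry run before
an actual <code>apply</code>. As far as I know, it is backed by the helm-diff plugin:</p>
<pre tabindex="0"><code># show what would change, without touching the cluster
helmfile diff --selector name=cnpg-operator
</code></pre>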
<p>Besides working with Helm charts directly, you can also just throw a couple of
manifests into a directory and deploy it the same way. I&rsquo;m doing this for my
own priority classes for example. I just have them in a directory <code>hl-common/</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ls hl-common/
</span></span><span style="display:flex;"><span>prio-hl-critical.yaml  prio-hl-external.yaml
</span></span></code></pre></div><p>Helmfile will then use <a href="https://github.com/helmfile/chartify">Chartify</a> to
turn those loose files into an ad-hoc chart and deploy it.</p>
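<p>The matching release entry is then just a <code>chart</code> pointing at the directory.
A hypothetical sketch of appending one, assuming the <code>releases:</code> list is the
last thing in the file (the namespace here is made up):</p>
<pre tabindex="0"><code>cat &gt;&gt; helmfile.yaml &lt;&lt;&#39;EOF&#39;
  - name: hl-common
    namespace: kube-system
    chart: ./hl-common
EOF
</code></pre>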
<p>The <code>release[].values[]</code> list is also a pretty useful feature. It allows setting
Helm chart values right in the Helmfile instead of a separate <code>values.yaml</code>.
I don&rsquo;t use this too much, as I like having all configs neatly in one file. But
I like using this approach in one instance, namely for <code>appVersion</code>-like values
on Helm charts I wrote myself. Here&rsquo;s an example from my Audiobookshelf entry:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">audiobookshelf</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">chart</span>: <span style="color:#ae81ff">./audiobookshelf</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">audiobookshelf</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">appVersion</span>: <span style="color:#e6db74">&#34;2.23.0&#34;</span>
</span></span></code></pre></div><p>The fact that I have the appVersion in the Helmfile directly makes it a lot more
convenient when I do my regular service update rounds. Unless something deeper
changed, I just need to have my Helmfile open during Service Upgrade Friday and
either update the chart version or the <code>appVersion</code> right there, without having
to switch between all of the <code>values.yaml</code> or <code>Chart.yaml</code> files.</p>
<p>For my standard approach, I&rsquo;m currently working with two release entries when
using a 3rd party chart. Let&rsquo;s look at my <a href="https://forgejo.org/">Forgejo</a>
deployment as an example:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">repositories</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">forgejo</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">url</span>: <span style="color:#ae81ff">code.forgejo.org/forgejo-helm</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">oci</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">releases</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">forgejo</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">chart</span>: <span style="color:#ae81ff">forgejo/forgejo</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">version</span>: <span style="color:#ae81ff">12.5.1</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">forgejo</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">./forgejo/hl-values.yaml.gotmpl</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">forgejo-addons</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">forgejo</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">chart</span>: <span style="color:#ae81ff">./forgejo-addons</span>
</span></span></code></pre></div><p>In this approach, the <code>forgejo/hl-values.yaml.gotmpl</code> file is the <code>values.yaml</code>
file for the Forgejo chart. But in most instances, 3rd party charts don&rsquo;t
contain everything I need. One example that comes up almost every single time
is additional ExternalSecret manifests for credentials, or ObjectBucketClaims
for S3 buckets in my Ceph cluster. And those Yaml files need to go somewhere.</p>
<p>And that&rsquo;s what the <code>$chartname-addons</code> chart is for. It&rsquo;s a normal Helm chart,
including <code>Chart.yaml</code> and <code>templates/</code> directory. It also gets its own <code>values.yaml</code>
file. It gets deployed into the same Namespace as the primary chart.</p>
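<p>On disk, such an addons chart is nothing special. A sketch of the layout, with
hypothetical file names:</p>
<pre tabindex="0"><code>find forgejo-addons/ -type f
forgejo-addons/Chart.yaml
forgejo-addons/values.yaml
forgejo-addons/templates/external-secret.yaml
forgejo-addons/templates/bucket-claim.yaml
</code></pre>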
<p>I also trialed a different approach with some of my earliest charts. For those,
I created a &ldquo;parent&rdquo; chart, which contained the <code>Chart.yaml</code> and any additional
manifests on top of the 3rd party chart. Then said 3rd party chart got added
as a dependency. But I moved away from that approach, as I found the separation
between 3rd party chart and my own manifests in the <code>$chartname-addons</code> approach
more appealing. There was also the fact that I couldn&rsquo;t just update the version
of the 3rd party chart and then deploy - Helm would always error out due to the
<code>Chart.lock</code> file being outdated.</p>
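<p>For completeness, a sketch of what such a parent chart&rsquo;s <code>Chart.yaml</code> looked
like, with the chart name invented and the dependency matching the Forgejo
example above:</p>
<pre tabindex="0"><code>cat &gt; forgejo-parent/Chart.yaml &lt;&lt;&#39;EOF&#39;
apiVersion: v2
name: forgejo-parent
version: 0.1.0
dependencies:
  - name: forgejo
    version: 12.5.1
    repository: oci://code.forgejo.org/forgejo-helm
EOF
</code></pre>
<p>Every version bump of the dependency then also needs a <code>helm dependency update</code>
to refresh <code>Chart.lock</code> - exactly the friction that made me drop this model.</p>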
<h2 id="why-not-gitops">Why not GitOps?</h2>
<p>So the obvious question might be: Why not employ GitOps like Argo or Flux?
Mostly: Time. &#x1f601; I&rsquo;m not averse to adding additional complexity to my
Homelab just for the fun of it. But a GitOps tool should have its own management
cluster, as it wouldn&rsquo;t make much sense to me to have e.g. ArgoCD running in
the same cluster that it&rsquo;s managing. So I skipped this option when I initially
looked at how I wanted to manage it all.</p>
<p>There&rsquo;s also the additional hassle of &ldquo;Okay, and then where will I store the
repo and execute the automation?&rdquo;. I have a Forgejo instance and Woodpecker as
CI, but both of those are deployed in my main cluster. So they would be controlled
by ArgoCD - which they would also be hosting. On the other hand, coming up with
something reasonably small that could host ArgoCD without being too much of a
hassle is its own challenge.</p>
<p>Finally, there&rsquo;s also my current workflow: I generally work on a thing until it
works properly, and only then does it get a commit in the Homelab repo. It would feel a
bit weird to make a commit for every single thing I change, for no other reason than
that I need said commit to trigger a new deployment. I&rsquo;m used to this approach
from work, but there the CI triggers hundreds upon hundreds of jobs and tens of
thousands of tests, and it is literally impossible to run the software on our
developer machines. But here? Making a commit for every change, pushing it just
to make a test deploy - it just feels like a bit much?</p>
<p>All of the above being said - I&rsquo;d really like to hear what those of you who do
run GitOps tools to manage your cluster get out of it. What advantages does it
have for you? And what&rsquo;s your workflow? Do you perhaps always work with Helm
locally, and then let Argo do its thing once everything already works? Ping me
<a href="https://social.mei-home.net/@mmeier">on the Fediverse</a>. I&rsquo;m genuinely curious.
And quite frankly, I want to be convinced - one more project for the Homelab
pile. &#x1f601;</p>
<h2 id="finale">Finale</h2>
<p>And that&rsquo;s it already for this one. I&rsquo;ve had it sitting in draft state for way
too long.</p>
<p>The next post will likely be on the setup of the Tinkerbell lab, as I&rsquo;m done
with that now and have already deployed Tinkerbell - but it&rsquo;s not working properly
yet.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Migrating my Kubernetes Control Plane to Raspberry Pi 5</title>
      <link>https://blog.mei-home.net/posts/control-plane-pi5/</link>
      <pubDate>Mon, 12 May 2025 00:05:05 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/control-plane-pi5/</guid>
      <description>Migrating my Kube control plane from Pi 4 to Pi 5</description>
      <content:encoded><![CDATA[<p>I&rsquo;ve had problems with the stability of my Kubernetes control plane ever since
I migrated it to three Raspberry Pi 4 from their temporary home on a beefy x86
server. I will be going into more detail about the problem first, describe the
Pi 5 with NVMe a bit, and then describe the migration itself.</p>
<h2 id="the-problem">The problem</h2>
<p>I&rsquo;ve noted in a couple of the last posts that I&rsquo;ve started seeing instability
in my Kubernetes control plane. The main symptom I saw were my HashiCorp Vault
Pods going down regularly. This was pretty visible because I have not automated
unsealing for Vault, so each time the Pods are restarted, I have to manually
enter the unseal passphrase.</p>
<p>But looking closer at the nodes, all three Raspberry Pi 4 4GB showed a very high
number of restarts for all of their Pods:</p>
<ul>
<li>kube-vip, which I use to provide a virtual IP for the k8s API</li>
<li>kube-apiserver</li>
<li>kube-scheduler</li>
<li>kube-controller-manager</li>
<li>Ceph MON</li>
</ul>
<p>The only component which wasn&rsquo;t regularly restarted was etcd. I tried to dig
really deeply into the issue, but was never able to figure out what really
triggered the restarts. There were
a lot of timeouts in the logs of etcd, kube-apiserver and kube-vip. There were
also some really long, multi-minute periods where the etcd cluster was unable to
elect a new leader because the members thought they were in different terms. In the end it
all always seemed to heal itself; I never needed to manually interfere to get the
cluster back. But it didn&rsquo;t look good.</p>
<p>The following two plots illustrate this by showing the <code>apiserver_request_aborts_total</code>
and the <code>etcd_request_errors_total</code> metrics for the period where the Pi 4 were
running the control plane. Both metrics show the rate, summed up over all
label values.</p>
<p>Here is the <code>etcd_request_errors_total</code> metric:
<figure>
    <img loading="lazy" src="etcd-errors.png"
         alt="A screenshot of a Grafana time series plot. It shows the rate of errors in the etcd component of my Kubernetes cluster. The plot goes from April 11th to May 2nd. In the beginning, until April 13th, the plot is straight zero. Starting around 00:00 on the 13th, there are constant errors shown. Only at a rate of, at max, 0.6 per second, and most of the time far below that, but still - there were no errors at all before that. Then there&#39;s a large spike around 12:00 on May 2nd up to a rate of three errors per second, after which the plot goes back to straight zero until the end."/> <figcaption>
            <p>Rate of etcd request errors per second. I finished the migration of the control plane to the Pi 4 around 00:00 on April 13th. I migrated to the Pi 5 on May 1st.</p>
        </figcaption>
</figure>
</p>
<p>While the error rate is not <em>that</em> high, it&rsquo;s pretty clear that it started after
I migrated the control plane to the Pi 4 around April 13th, and vanished
completely after I migrated to the Pi 5.
That large spike on May 1st was when I accidentally bumped the USB-to-SATA adapter
of one of the Pi 4 nodes while another one was already down for replacement. The
single remaining Pi 4 did not take that very well. &#x1f605;</p>
<p>Here is a slightly different view of the aborted apiserver requests during the
same period:
<figure>
    <img loading="lazy" src="apiserver-aborts.png"
         alt="A screenshot of a Grafana time series plot. It shows the rate of aborts in the apiserver component of my Kubernetes cluster. The plot goes from April 11th to May 2nd. In the beginning, until April 11th, the plot is straight zero. Starting around 00:00 on the 11th, there are constant aborts shown. Only at a rate of, at max, 0.1 per second, and most of the time far below that, but still - there were no errors at all before that. Then there&#39;s a large spike around 12:00 on May 2nd up to a rate of 0.25 aborts per second, after which the plot goes back to straight zero until the end."/> <figcaption>
            <p>Rate of apiserver request errors per second. I finished the migration of the control plane to the Pi 4 around 00:00 on April 13th. I migrated to the Pi 5 on May 1st.</p>
        </figcaption>
</figure>
</p>
<p>These two plots already show pretty conclusively that something was wrong after
I migrated the control plane to the Pi 4. And that the migration to the Pi 5
fixed the issue. Here is a final plot, showing the container restarts for the
kube-apiserver, kube-scheduler, kube-controller-manager and Vault:
<figure>
    <img loading="lazy" src="container-restarts.png"
         alt="Another Grafana plot over the same time period. This time it shows the number of container restarts. Again, the plot is mostly flat up to about 00:00 on April 13th. Then it has several periods of 20&#43; restarts, but also some periods with no restarts at all. In the evening of April 19th, there is a couple of large spikes up to 120 restarts. The plot goes flat again after May 1st."/> <figcaption>
            <p>The increase in container restarts over the past hour.</p>
        </figcaption>
</figure>
</p>
<p>It&rsquo;s clear here that the problem was not persistent - there were several days
where no restart at all happened. But the problem was definitely there. One
major problem was that I couldn&rsquo;t really figure out what triggered the restarts.
I spent several hours looking at the logs on the control plane hosts, but wasn&rsquo;t
able to identify the real culprit. It looked like at some point etcd just got
overwhelmed, which made both the local apiserver and then finally the kubelet
error out, leading to a round of container restarts.</p>
<p>There were also no clear indications in the machines&rsquo; metrics. The only thing I
found was some increased IOWAIT time on the CPUs, but at the same time it didn&rsquo;t
look like the IO was actually overwhelmed.</p>
<p>I ended up with the conclusion that I was asking just a bit too much of the poor
Pi 4, and decided that this was the right moment to look at the Pi 5 and its
NVMe-capable PCIe connection.</p>
<h2 id="the-raspberry-pi-5">The Raspberry Pi 5</h2>
<p>When looking for a replacement for the three Raspberry Pi 4, it was pretty clear
that I would be going with the new Pi 5. Most of my Homelab already consists of
Pis, and at least the Pis are supported by an array of mainstream Linux
distros instead of empty promises. The main new feature of the Pi 5 for me is the
fact that it now provides an interface to a PCIe Gen2 x1 lane by default. This
lane can be updated to Gen3, but that&rsquo;s currently not officially supported.
With this PCIe lane comes the ability to connect an NVMe SSD and even boot
off of it. As I suspected that part of my problem with the Pi 4 control plane
nodes was IO, this made me hopeful that a Pi 5 would be able to cope.</p>
<p>I also made the decision of buying the <a href="https://www.raspberrypi.com/products/raspberry-pi-5/">Pi 5</a>
in the 8 GB variant, as opposed to the 4 GB Pi 4 variants forming my control plane
before. I don&rsquo;t really see a need for the increased RAM right now, there was
still plenty of free RAM on the 4 GB models. But I wanted to invest in a bit of
future proofing here.</p>
<p>For the cooling I wanted to go passive again. Shortly after the Pi 4&rsquo;s release,
when it was still said that it needed active cooling, I had a very bad experience
with a Pi 4 case with an active fan. And with my rack sitting right next to my desk,
I want quiet. I bought <a href="https://www.berrybase.de/en/armor-gehaeuse-fuer-raspberry-pi-5-schwarz">this case</a>.
It&rsquo;s very similar to the passive heat sinks I&rsquo;ve been using for the Pi 4.</p>
<p>All the article links in this post will go to <a href="https://www.berrybase.de/en/">berrybase.de</a>,
as that was where I bought the equipment. It&rsquo;s mostly in German, but I&rsquo;m reasonably
sure that you could find the same stuff in many other places.</p>
<p>With cooling covered, I next went hunting for a way to fasten the SSD. A
traditional Pi HAT was off the table, due to the use of the large heat sink.
But after some searching, I found some good reviews of Pimoroni&rsquo;s <a href="https://shop.pimoroni.com/products/nvme-base?variant=41219587178579">NVMe base</a>.
Pimoroni is a pretty trustworthy brand, and they provided some compatibility info
on their page. Plus, they were available in the berrybase shop.</p>
<p>I then had a closer look at Pimoroni&rsquo;s compatibility section for NVMe SSDs,
and finally settled on the <a href="https://europe.kioxia.com/en-europe/personal/ssd/exceria-g2.html">Kioxia Exceria G2</a>.
It was on the compatibility list, was relatively affordable, from a trusted brand
and available at my trusted IT hardware retailer. I bought four of them, three
500 GB models for the new control plane and one 1 TB model, for some future
experiments.</p>
<p>Last but not least, I also had to buy a couple of mounting plates for my
<a href="https://racknex.com/raspberry-pi-rackmount-kit-12x-slot-19-inch-um-sbc-207/">Racknex Pi rack mount</a>.</p>
<p>Overall, this is what one of the new Pis cost me:</p>
<table>
  <thead>
      <tr>
          <th>Item</th>
          <th>Cost</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Raspberry Pi 5 8 GB</td>
          <td>€84,90</td>
      </tr>
      <tr>
          <td>Armor Heat Sink</td>
          <td>€9,90</td>
      </tr>
      <tr>
          <td>Power Supply</td>
          <td>€12,40</td>
      </tr>
      <tr>
          <td>NVMe Base</td>
          <td>€16,50</td>
      </tr>
      <tr>
          <td>500 GB Kioxia NVMe SSD</td>
          <td>€32,90</td>
      </tr>
      <tr>
          <td>Mounting Plate</td>
          <td>€10,80</td>
      </tr>
      <tr>
          <td>&mdash;&mdash;&mdash;&mdash;&mdash;&mdash;&mdash;&mdash;</td>
          <td>&mdash;&mdash;&mdash;</td>
      </tr>
      <tr>
          <td>Total</td>
          <td>€167,40</td>
      </tr>
  </tbody>
</table>
<h3 id="construction">Construction</h3>
<p>With all of the parts arriving, I could get to my least favorite part of Homelabbing:
Hardware. That was a bit of a challenge in this project, mostly due to the
PCIe flat cable connecting the Pi and the NVMe base. Sadly, I only now realized
that I completely forgot to take pictures of the construction process. So this
is what one of the Pis looks like fully constructed:</p>
<figure>
    <img loading="lazy" src="pi5-finished.jpg"
         alt="A picture of a Raspberry Pi. The Pi 5 itself is covered with a black with a black aluminum heat sink which is about as high as the front IO connectors and covers the entire board, with some cutouts for access to connectors. At the back, a PCIe cable is going from the Pi down to the NVMe base mounted below the Pi. The cables flimsiness and shortness screams &#39;I am a pain to handle&#39;. The entire assembly is mounted onto a sturdy metal piece, with a front part angling up to about two Pis in height, with a large cutout for the Pi&#39;s IO in the front."/> <figcaption>
            <p>A finished Raspberry Pi with connected NVMe all mounted on a Racknex mounting plate. I will leave the tale of the installation of that very short PCIe cable at the back to your nightmares.</p>
        </figcaption>
</figure>

<p>That flat PCIe cable at the back was a bear to install. Getting it fitted to
the NVMe base was not a big problem. But getting it fitted to the Pi with the
NVMe base already connected to the other end was a nightmare. The cable is
extremely short, so you have to hold the Pi up awkwardly while somehow trying
to seat the connector.
Pimoroni&rsquo;s install instructions were generally okay, but their proposed order was
to first connect the cable to the base and then connect the Pi side. I found this
entirely impossible. If you look very closely, the heat sink only has a small
cutout to put in the PCIe cable. Doing that while the NVMe base is already
connected proved impossible, at least at my level of dexterity, so I went the
other way around. That was still a pain. If I had bungled the job on one of the
Pis and had to reseat the cable, you might now be reading a post about my imminent
plan to move my entire Homelab to a few dedicated servers at Hetzner. &#x1f62c;</p>
<p>One important part to note: The M2.5 screws which come with the Pimoroni NVMe base
are long enough to connect the base, the Pi and the heat sink. But they turned
out too short to also fit the mounting plate. I had to order an additional set
of M2.5 x 20mm screws. Those were long enough to hold it all together.</p>
<p>Once deployed, this is what the three Pis looked like in the rack:</p>
<figure>
    <img loading="lazy" src="pi5-mounted.jpg"
         alt="A picture of a Racknex Pi mount in a 19 inch rack. There are twelve slots to mount Raspberry Pis, with 8 currently occupied. On the very left are two Pi 4, each occupying one slot. They are each covered by a large read heat sink. Each one is connected to a SATA SSD via a USB-to-SATA adapter. The SSDs are mounted vertically behind the Pis. On the right side, six slots are occupied with three Pi 5. Each of them has a network cable plugged in. They are covered with a black heat sink. There is definitely not a single speck of dust visible in the entire picture. Not one. You definitely cannot see the outline of three more SATA SSDs vertically mounted until recently behind the three Pi 5."/> <figcaption>
            <p>My three Pi 5 mounted in the Racknex mount on the right. The two Pi 4 on the left, connected to their SATA SSDs, are a similar setup as my control plane Pis had previously.</p>
        </figcaption>
</figure>

<p>Can we all agree on ignoring the fact that you can see where the SSDs for the
Pi 4 control nodes were mounted before? Thank you. &#x1f601;</p>
<h3 id="looking-closer-at-the-pi-5">Looking closer at the Pi 5</h3>
<p>Now that the hardware is built, let&rsquo;s take a closer look at the Pi 5. I have a
fourth Pi, with 16 GB of RAM and a 1 TB SSD, for some later project, and did
some initial testing with it. As with the rest of my Pi fleet, I&rsquo;m using Ubuntu
here, in version 24.04, the first LTS release compatible with the Pi 5.</p>
<p>I used the <code>ubuntu-24.04.2-preinstalled-server-arm64+raspi.img.xz</code> image from
<a href="https://cdimage.ubuntu.com/ubuntu/releases/24.04.2/release/">the Ubuntu download page</a>.
But before putting it on a USB stick, I wanted to enable the PCIe Gen3 support.
This is not officially supported, but it worked immediately on all three of my
Pi 5 and I haven&rsquo;t had any issues in the week I&rsquo;ve now been running them.
I started by mounting the image locally:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>losetup -f --show -P ubuntu-24.04.2-preinstalled-server-arm64+raspi.img
</span></span><span style="display:flex;"><span>mount /dev/loop0p1 /mnt/raspi_boot/
</span></span><span style="display:flex;"><span>mount /dev/loop0p2 /mnt/raspi_root/
</span></span></code></pre></div><p>Then I enabled the Gen3 support by adding the following lines to <code>config.txt</code>
on the boot partition (mounted at <code>/mnt/raspi_boot/</code> above; on a running Pi it
ends up at <code>/boot/firmware/config.txt</code>):</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-ini" data-lang="ini"><span style="display:flex;"><span><span style="color:#66d9ef">[pi5]</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">dtparam</span><span style="color:#f92672">=</span><span style="color:#e6db74">pciex1</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">dtparam</span><span style="color:#f92672">=</span><span style="color:#e6db74">pciex1_gen=3</span>
</span></span></code></pre></div><p>Unmounting it all works like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>umount /mnt/raspi_boot/ /mnt/raspi_root/
</span></span><span style="display:flex;"><span>losetup -d /dev/loop0
</span></span></code></pre></div><p>And then I wrote it onto a USB stick with this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>dd bs<span style="color:#f92672">=</span>4M <span style="color:#66d9ef">if</span><span style="color:#f92672">=</span>ubuntu-24.04.2-preinstalled-server-arm64+raspi.img of<span style="color:#f92672">=</span>/dev/YOUR_USB_STICK_HERE status<span style="color:#f92672">=</span>progress oflag<span style="color:#f92672">=</span>sync
</span></span></code></pre></div><p>The Pi immediately booted up - I saw it soliciting an IP from my DHCP server.
But I wasn&rsquo;t able to SSH in, because while SSH is enabled in the image, password
login is disabled for security reasons.</p>
<p>But I had come prepared. I&rsquo;ve been wanting to get myself a small screen for
debugging boot issues with my Pis for a long time, because I found connecting one
of my main monitors and switching the source around a bit tedious. I ended up
with <a href="https://www.berrybase.de/en/universal-5-0-display-mit-hdmi-vga-eingang-und-resisitivem-touchscreen">this screen</a>.
It&rsquo;s a bit overkill, because it&rsquo;s also a touch screen, but eh. With it, I could
set up the Pi like this:
<figure>
    <img loading="lazy" src="pi-with-screen.jpg"
         alt="Another picture of a Pi 5. The Pi itself looks similar to the other pictures. The important difference here is that it&#39;s sitting on a desk. It is connected to a relatively small TKL keyboard with a wonderful amount of rainbow puke going on. The center piece is a small 5 inch display. It is connected to both, a USB port and a HDMI port on the Pi. Squinting a bit, the text on the screen is legible, showing a terminal session with a download of an SSH public key and copying that key into the user&#39;s authorized_keys file."/> <figcaption>
            <p>A small 5 inch screen for my Pi experiments was a good idea.</p>
        </figcaption>
</figure>

The screen isn&rsquo;t really something to write home about - the viewing angles in
particular are atrocious - but it did its job and allowed me to quickly copy my
SSH key and add it to the default user, <code>ubuntu</code>.</p>
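<p>In hindsight, the screen might not have been strictly necessary: as far as I
know, the Ubuntu preinstalled images read cloud-init configuration from a
<code>user-data</code> file on the boot partition, so an SSH key could be injected while
the image is still loop-mounted, like the <code>config.txt</code> change above. A hedged
sketch, with a placeholder key:</p>
<pre tabindex="0"><code># assumption: cloud-init reads user-data from the boot partition of the image
cat &gt;&gt; /mnt/raspi_boot/user-data &lt;&lt;&#39;EOF&#39;
ssh_authorized_keys:
  - ssh-ed25519 AAAA... me@workstation
EOF
</code></pre>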
<p>With that finally done came the moment of truth: Would the NVMe SSD be visible? I was
feeling quite some dread at this moment, mostly because the first thing I would
have to do for debugging was to try reseating that fiddly PCIe cable. But I
got lucky:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>root@ubuntu:/tmp/disk-mount# lsblk
</span></span><span style="display:flex;"><span>NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
</span></span><span style="display:flex;"><span>loop0     7:0    <span style="color:#ae81ff">0</span>  38.7M  <span style="color:#ae81ff">1</span> loop /snap/snapd/23546
</span></span><span style="display:flex;"><span>loop1     7:1    <span style="color:#ae81ff">0</span>  38.7M  <span style="color:#ae81ff">1</span> loop /snap/snapd/23772
</span></span><span style="display:flex;"><span>sda       8:0    <span style="color:#ae81ff">1</span>  57.3G  <span style="color:#ae81ff">0</span> disk
</span></span><span style="display:flex;"><span>├─sda1    8:1    <span style="color:#ae81ff">1</span>   512M  <span style="color:#ae81ff">0</span> part /boot/firmware
</span></span><span style="display:flex;"><span>└─sda2    8:2    <span style="color:#ae81ff">1</span>  56.8G  <span style="color:#ae81ff">0</span> part /
</span></span><span style="display:flex;"><span>nvme0n1 259:0    <span style="color:#ae81ff">0</span> 931.5G  <span style="color:#ae81ff">0</span> disk /tmp/disk-mount
</span></span></code></pre></div><p>The NVMe SSD was recognized! &#x1f389;</p>
<p>Next question: Was the Gen3 option working? First, I looked at the <code>dmesg</code> output
and found these encouraging lines:</p>
<pre tabindex="0"><code>[    2.123345] brcm-pcie 1000110000.pcie: Forcing gen 3
[    2.382834] pci 0000:01:00.0: 7.876 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x1 link at 0000:00:00.0 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
</code></pre><p>I went one step further and also checked <code>lspci</code>, because that could have also
been some other PCIe Gen 3 link:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>lspci -vv
</span></span><span style="display:flex;"><span>0000:01:00.0 Non-Volatile memory controller: KIOXIA Corporation NVMe SSD <span style="color:#f92672">(</span>rev 01<span style="color:#f92672">)</span> <span style="color:#f92672">(</span>prog-if <span style="color:#ae81ff">02</span> <span style="color:#f92672">[</span>NVM Express<span style="color:#f92672">])</span>
</span></span><span style="display:flex;"><span>	Subsystem: KIOXIA Corporation NVMe SSD
</span></span><span style="display:flex;"><span>	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
</span></span><span style="display:flex;"><span>	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL<span style="color:#f92672">=</span>fast &gt;TAbort- &lt;TAbort- &lt;MAbort- &gt;SERR- &lt;PERR- INTx-
</span></span><span style="display:flex;"><span>	Latency: <span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>	Interrupt: pin A routed to IRQ <span style="color:#ae81ff">42</span>
</span></span><span style="display:flex;"><span>	Region 0: Memory at 1b00000000 <span style="color:#f92672">(</span>64-bit, non-prefetchable<span style="color:#f92672">)</span> <span style="color:#f92672">[</span>size<span style="color:#f92672">=</span>16K<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>	Capabilities: <span style="color:#f92672">[</span>80<span style="color:#f92672">]</span> Express <span style="color:#f92672">(</span>v2<span style="color:#f92672">)</span> Endpoint, MSI <span style="color:#ae81ff">00</span>
</span></span><span style="display:flex;"><span>		DevCap:	MaxPayload <span style="color:#ae81ff">256</span> bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
</span></span><span style="display:flex;"><span>			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W
</span></span><span style="display:flex;"><span>		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
</span></span><span style="display:flex;"><span>			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
</span></span><span style="display:flex;"><span>			MaxPayload <span style="color:#ae81ff">256</span> bytes, MaxReadReq <span style="color:#ae81ff">512</span> bytes
</span></span><span style="display:flex;"><span>		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
</span></span><span style="display:flex;"><span>		LnkCap:	Port <span style="color:#75715e">#0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L1 &lt;64us</span>
</span></span><span style="display:flex;"><span>			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
</span></span><span style="display:flex;"><span>		LnkCtl:	ASPM Disabled; RCB <span style="color:#ae81ff">64</span> bytes, Disabled- CommClk+
</span></span><span style="display:flex;"><span>			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
</span></span><span style="display:flex;"><span>		LnkSta:	Speed 8GT/s, Width x1 <span style="color:#f92672">(</span>downgraded<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
</span></span><span style="display:flex;"><span>		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
</span></span><span style="display:flex;"><span>			 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix-
</span></span><span style="display:flex;"><span>			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
</span></span><span style="display:flex;"><span>			 FRS- TPHComp- ExtTPHComp-
</span></span><span style="display:flex;"><span>			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
</span></span><span style="display:flex;"><span>		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled,
</span></span><span style="display:flex;"><span>			 AtomicOpsCtl: ReqEn-
</span></span><span style="display:flex;"><span>		LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
</span></span><span style="display:flex;"><span>		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
</span></span><span style="display:flex;"><span>			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
</span></span><span style="display:flex;"><span>			 Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
</span></span><span style="display:flex;"><span>		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
</span></span><span style="display:flex;"><span>			 EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
</span></span><span style="display:flex;"><span>			 Retimer- 2Retimers- CrosslinkRes: unsupported
</span></span><span style="display:flex;"><span>	Capabilities: <span style="color:#f92672">[</span>d0<span style="color:#f92672">]</span> MSI-X: Enable+ Count<span style="color:#f92672">=</span><span style="color:#ae81ff">9</span> Masked-
</span></span><span style="display:flex;"><span>		Vector table: BAR<span style="color:#f92672">=</span><span style="color:#ae81ff">0</span> offset<span style="color:#f92672">=</span><span style="color:#ae81ff">00002000</span>
</span></span><span style="display:flex;"><span>		PBA: BAR<span style="color:#f92672">=</span><span style="color:#ae81ff">0</span> offset<span style="color:#f92672">=</span><span style="color:#ae81ff">00003000</span>
</span></span><span style="display:flex;"><span>	Capabilities: <span style="color:#f92672">[</span>e0<span style="color:#f92672">]</span> MSI: Enable- Count<span style="color:#f92672">=</span>1/8 Maskable- 64bit+
</span></span><span style="display:flex;"><span>		Address: <span style="color:#ae81ff">0000000000000000</span>  Data: <span style="color:#ae81ff">0000</span>
</span></span><span style="display:flex;"><span>	Capabilities: <span style="color:#f92672">[</span>f8<span style="color:#f92672">]</span> Power Management version <span style="color:#ae81ff">3</span>
</span></span><span style="display:flex;"><span>		Flags: PMEClk- DSI- D1- D2- AuxCurrent<span style="color:#f92672">=</span>0mA PME<span style="color:#f92672">(</span>D0-,D1-,D2-,D3hot-,D3cold-<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>		Status: D0 NoSoftRst+ PME-Enable- DSel<span style="color:#f92672">=</span><span style="color:#ae81ff">0</span> DScale<span style="color:#f92672">=</span><span style="color:#ae81ff">0</span> PME-
</span></span><span style="display:flex;"><span>	Capabilities: <span style="color:#f92672">[</span><span style="color:#ae81ff">100</span> v1<span style="color:#f92672">]</span> Latency Tolerance Reporting
</span></span><span style="display:flex;"><span>		Max snoop latency: 0ns
</span></span><span style="display:flex;"><span>		Max no snoop latency: 0ns
</span></span><span style="display:flex;"><span>	Capabilities: <span style="color:#f92672">[</span><span style="color:#ae81ff">110</span> v1<span style="color:#f92672">]</span> L1 PM Substates
</span></span><span style="display:flex;"><span>		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
</span></span><span style="display:flex;"><span>			  PortCommonModeRestoreTime<span style="color:#f92672">=</span>10us PortTPowerOnTime<span style="color:#f92672">=</span>60us
</span></span><span style="display:flex;"><span>		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
</span></span><span style="display:flex;"><span>			   T_CommonMode<span style="color:#f92672">=</span>0us LTR1.2_Threshold<span style="color:#f92672">=</span>76800ns
</span></span><span style="display:flex;"><span>		L1SubCtl2: T_PwrOn<span style="color:#f92672">=</span>60us
</span></span><span style="display:flex;"><span>	Capabilities: <span style="color:#f92672">[</span><span style="color:#ae81ff">128</span> v1<span style="color:#f92672">]</span> Alternative Routing-ID Interpretation <span style="color:#f92672">(</span>ARI<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>		ARICap:	MFVC- ACS-, Next Function: <span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>		ARICtl:	MFVC- ACS-, Function Group: <span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>	Capabilities: <span style="color:#f92672">[</span><span style="color:#ae81ff">200</span> v2<span style="color:#f92672">]</span> Advanced Error Reporting
</span></span><span style="display:flex;"><span>		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
</span></span><span style="display:flex;"><span>		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
</span></span><span style="display:flex;"><span>		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP+ ECRC- UnsupReq- ACSViol-
</span></span><span style="display:flex;"><span>		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
</span></span><span style="display:flex;"><span>		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
</span></span><span style="display:flex;"><span>		AERCap:	First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap+ ECRCChkEn-
</span></span><span style="display:flex;"><span>			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
</span></span><span style="display:flex;"><span>		HeaderLog: <span style="color:#ae81ff">00000001</span> 0000070f 0000001c 185194a3
</span></span><span style="display:flex;"><span>	Capabilities: <span style="color:#f92672">[</span><span style="color:#ae81ff">300</span> v1<span style="color:#f92672">]</span> Secondary PCI Express
</span></span><span style="display:flex;"><span>		LnkCtl3: LnkEquIntrruptEn- PerformEqu-
</span></span><span style="display:flex;"><span>		LaneErrStat: <span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>	Kernel driver in use: nvme
</span></span><span style="display:flex;"><span>	Kernel modules: nvme
</span></span></code></pre></div><p>Okay, that&rsquo;s a lot. But it did show the expected value for the NVMe SSD, in particular
this line:</p>
<pre tabindex="0"><code>LnkSta:	Speed 8GT/s, Width x1 (downgraded)
</code></pre><p>So yay, PCIe Gen3 was working. And I got that same result on all four Pis. I
know I&rsquo;m repeating myself, but at that point I was so happy that I wouldn&rsquo;t need
to reseat that PCIe cable.</p>
<p>Next step was to have a look at the boot order. I thought I would need to
explicitly add the NVMe disk, but it turns out that the factory firmware already
had it in the boot order. I still went in and changed it, because by default the
NVMe was tried before USB boot. And I like it to be the other way around, so
that I could attach a USB stick if I bork the NVMe install in the future
and have it boot off of that first.</p>
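<p>For reference, the <code>BOOT_ORDER</code> value I&rsquo;ll set in the config below is a hex
string read right to left, one boot mode per digit. To my reading of the Pi
bootloader docs: <code>1</code> is SD card, <code>4</code> is USB mass storage, <code>6</code> is NVMe and
<code>f</code> means restart the list. So my value decodes like this:</p>
<pre tabindex="0"><code># read right to left (my reading of the Pi bootloader docs)
BOOT_ORDER=0xf641   # 1: SD card, 4: USB, 6: NVMe, f: start over
</code></pre>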
<p>The bootloader on the Pi sits in an EEPROM chip on the board, and it can be changed
with the <code>rpi-eeprom-config --edit</code> command. A session looks something like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>root@ubuntu:~# rpi-eeprom-config --edit
</span></span><span style="display:flex;"><span>Updating bootloader EEPROM
</span></span><span style="display:flex;"><span> image: /lib/firmware/raspberrypi/bootloader-2712/default/pieeprom-2023-12-06.bin
</span></span><span style="display:flex;"><span>config_src: blconfig device
</span></span><span style="display:flex;"><span>config: /tmp/tmplbyfxh81/boot.conf
</span></span><span style="display:flex;"><span><span style="color:#75715e">################################################################################</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">[</span>all<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>BOOT_UART<span style="color:#f92672">=</span><span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>POWER_OFF_ON_HALT<span style="color:#f92672">=</span><span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>BOOT_ORDER<span style="color:#f92672">=</span>0xf641
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e">################################################################################</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>*** To cancel this update run <span style="color:#e6db74">&#39;sudo rpi-eeprom-update -r&#39;</span> ***
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>*** CREATED UPDATE /tmp/tmplbyfxh81/pieeprom.upd  ***
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>   WARNING: Installing an older bootloader version.
</span></span><span style="display:flex;"><span>            Update the rpi-eeprom package to fetch the latest bootloader images.
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>   CURRENT: Mon Sep <span style="color:#ae81ff">23</span> 13:02:56 UTC <span style="color:#ae81ff">2024</span> <span style="color:#f92672">(</span>1727096576<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>    UPDATE: Wed Dec  <span style="color:#ae81ff">6</span> 18:29:25 UTC <span style="color:#ae81ff">2023</span> <span style="color:#f92672">(</span>1701887365<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>    BOOTFS: /boot/firmware
</span></span><span style="display:flex;"><span><span style="color:#e6db74">&#39;/tmp/tmp.joXPbvsUuq&#39;</span> -&gt; <span style="color:#e6db74">&#39;/boot/firmware/pieeprom.upd&#39;</span>
</span></span><span style="display:flex;"><span>Copying recovery.bin to /boot/firmware <span style="color:#66d9ef">for</span> EEPROM update
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>EEPROM updates pending. Please reboot to apply the update.
</span></span><span style="display:flex;"><span>To cancel a pending update run <span style="color:#e6db74">&#34;sudo rpi-eeprom-update -r&#34;</span>.
</span></span></code></pre></div><p>The problem I ran into here was this line:</p>
<pre tabindex="0"><code>WARNING: Installing an older bootloader version.
         Update the rpi-eeprom package to fetch the latest bootloader images.

CURRENT: Mon Sep 23 13:02:56 UTC 2024 (1727096576)
 UPDATE: Wed Dec  6 18:29:25 UTC 2023 (1701887365)
</code></pre><p>The EEPROM version in Ubuntu, even on the new Ubuntu 24.04 I was running, was too
old, and there was nothing newer available for the LTS release either. So I
installed <a href="https://launchpad.net/~waveform/+archive/ubuntu/eeprom">this PPA</a>.</p>
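<p>A sketch of the commands, assuming the usual <code>ppa:</code> shorthand for the linked
archive:</p>
<pre tabindex="0"><code>sudo add-apt-repository ppa:waveform/eeprom
sudo apt update
sudo apt install rpi-eeprom
sudo rpi-eeprom-config --edit   # re-run the edit, now with a newer image
</code></pre>
<p>After that, I got the same version in the EEPROM update as the factory firmware.</p>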
<h3 id="testing-the-pi-5">Testing the Pi 5</h3>
<p>Next up were a couple of performance tests. I was particularly interested in the
IOPS of the NVMe Pi 5 versus the Pi 4 with a USB-attached SATA SSD, because I
think that the stability issues were mostly due to IO, not CPU performance.</p>
<p>I used <a href="https://github.com/axboe/fio">fio</a> to test the performance on the Pi 5
and Pi 4. On both, I used the following invocation:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>fio --size<span style="color:#f92672">=</span>20M --rw<span style="color:#f92672">=</span>randrw --name<span style="color:#f92672">=</span>IOPS --bs<span style="color:#f92672">=</span>4k --direct<span style="color:#f92672">=</span><span style="color:#ae81ff">1</span> --filename<span style="color:#f92672">=</span>/tmp/disk/testfile --numjobs<span style="color:#f92672">=</span><span style="color:#ae81ff">4</span> --ioengine<span style="color:#f92672">=</span>libaio --iodepth<span style="color:#f92672">=</span><span style="color:#ae81ff">32</span> --refill_buffers --group_reporting --runtime<span style="color:#f92672">=</span><span style="color:#ae81ff">60</span> --time_based
</span></span></code></pre></div><p>So I did random read/write, with a block size of 4k, using direct IO (meaning
all FS caches are bypassed), running for 60 seconds using the libaio engine. I also
ran four processes in parallel, as I figured that there would be more than one
writer with the machines serving as control plane nodes.</p>
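<p>Since my main suspect was etcd, the other number worth measuring is fsync
latency, which is what etcd&rsquo;s write-ahead log cares about most. The etcd docs
suggest an fio invocation along these lines - quoted from memory, so treat it
as a sketch rather than the canonical command:</p>
<pre tabindex="0"><code># etcd-style disk check: sequential writes with an fdatasync after each write
fio --rw=write --ioengine=sync --fdatasync=1 --directory=/tmp/disk --size=22m --bs=2300 --name=etcd-check
</code></pre>
<p>The interesting part of the output is the fsync/fdatasync latency percentiles;
for etcd, the 99th percentile should reportedly stay below roughly 10ms.</p>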
<p>The full results for the Pi 5 look like this:</p>
<pre tabindex="0"><code>IOPS: (groupid=0, jobs=4): err= 0: pid=5241: Mon Apr 21 14:24:29 2025
  read: IOPS=151k, BW=589MiB/s (618MB/s)(34.5GiB/60001msec)
    slat (usec): min=2, max=3070, avg= 5.50, stdev= 6.14
    clat (usec): min=53, max=5266, avg=503.04, stdev=121.10
     lat (usec): min=58, max=5269, avg=508.54, stdev=121.22
    clat percentiles (usec):
     |  1.00th=[  302],  5.00th=[  347], 10.00th=[  371], 20.00th=[  408],
     | 30.00th=[  433], 40.00th=[  461], 50.00th=[  490], 60.00th=[  519],
     | 70.00th=[  553], 80.00th=[  594], 90.00th=[  652], 95.00th=[  701],
     | 99.00th=[  816], 99.50th=[  881], 99.90th=[ 1139], 99.95th=[ 1582],
     | 99.99th=[ 2999]
   bw (  KiB/s): min=582384, max=624383, per=100.00%, avg=604312.95, stdev=1736.39, samples=476
   iops        : min=145596, max=156095, avg=151077.84, stdev=434.11, samples=476
  write: IOPS=151k, BW=589MiB/s (618MB/s)(34.5GiB/60001msec); 0 zone resets
    slat (usec): min=3, max=3058, avg= 6.69, stdev= 7.72
    clat (usec): min=21, max=5212, avg=330.48, stdev=86.79
     lat (usec): min=28, max=5216, avg=337.17, stdev=87.20
    clat percentiles (usec):
     |  1.00th=[  208],  5.00th=[  245], 10.00th=[  262], 20.00th=[  281],
     | 30.00th=[  293], 40.00th=[  306], 50.00th=[  318], 60.00th=[  330],
     | 70.00th=[  343], 80.00th=[  367], 90.00th=[  420], 95.00th=[  474],
     | 99.00th=[  594], 99.50th=[  660], 99.90th=[  906], 99.95th=[ 1156],
     | 99.99th=[ 2933]
   bw (  KiB/s): min=584860, max=622736, per=100.00%, avg=603963.50, stdev=1777.65, samples=476
   iops        : min=146215, max=155684, avg=150990.54, stdev=444.42, samples=476
  lat (usec)   : 50=0.01%, 100=0.02%, 250=3.28%, 500=71.81%, 750=23.57%
  lat (usec)   : 1000=1.19%
  lat (msec)   : 2=0.10%, 4=0.03%, 10=0.01%
  cpu          : usr=17.90%, sys=49.51%, ctx=9396343, majf=0, minf=62
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, &gt;=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, &gt;=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, &gt;=64=0.0%
     issued rwts: total=9052812,9047408,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: bw=589MiB/s (618MB/s), 589MiB/s-589MiB/s (618MB/s-618MB/s), io=34.5GiB (37.1GB), run=60001-60001msec
  WRITE: bw=589MiB/s (618MB/s), 589MiB/s-589MiB/s (618MB/s-618MB/s), io=34.5GiB (37.1GB), run=60001-60001msec

Disk stats (read/write):
  nvme0n1: ios=9025140/9020061, sectors=72201120/72160584, merge=0/12, ticks=4333666/2703156, in_queue=7036831, util=100.00%
</code></pre><p>The bandwidth was 618 MB/s read and write:</p>
<pre tabindex="0"><code> READ: bw=589MiB/s (618MB/s), 589MiB/s-589MiB/s (618MB/s-618MB/s), io=34.5GiB (37.1GB), run=60001-60001msec
WRITE: bw=589MiB/s (618MB/s), 589MiB/s-589MiB/s (618MB/s-618MB/s), io=34.5GiB (37.1GB), run=60001-60001msec
</code></pre><p>This is a respectable result, considering that the max for PCIe Gen3 x1 is around
1 GB/s.
But more important for me are the IOPS. The kube control plane only writes at
about 5-6 MB/s, which even the USB-attached SATA SSDs shouldn&rsquo;t have had a problem
with. And the IOPS were looking quite good:</p>
<pre tabindex="0"><code>read: IOPS=151k, BW=589MiB/s (618MB/s)(34.5GiB/60001msec)
 iops        : min=145596, max=156095, avg=151077.84, stdev=434.11, samples=476
write: IOPS=151k, BW=589MiB/s (618MB/s)(34.5GiB/60001msec); 0 zone resets
 iops        : min=146215, max=155684, avg=150990.54, stdev=444.42, samples=476
</code></pre><p>Both read and write reach over 145k IOPS. So let&rsquo;s look at the Pi 4 and its USB-attached
SATA SSD next:</p>
<pre tabindex="0"><code>IOPS: (groupid=0, jobs=4): err= 0: pid=27703: Mon Apr 21 16:33:26 2025
  read: IOPS=9989, BW=39.0MiB/s (40.9MB/s)(2341MiB/60002msec)
    slat (usec): min=14, max=63796, avg=182.32, stdev=1012.51
    clat (usec): min=229, max=160456, avg=5308.50, stdev=8919.53
     lat (usec): min=278, max=160518, avg=5490.82, stdev=9174.55
    clat percentiles (usec):
     |  1.00th=[  1123],  5.00th=[  1598], 10.00th=[  1958], 20.00th=[  2409],
     | 30.00th=[  2704], 40.00th=[  3032], 50.00th=[  3425], 60.00th=[  3884],
     | 70.00th=[  4490], 80.00th=[  5342], 90.00th=[  6915], 95.00th=[  9110],
     | 99.00th=[ 55837], 99.50th=[ 66323], 99.90th=[ 88605], 99.95th=[ 99091],
     | 99.99th=[116917]
   bw (  KiB/s): min= 3542, max=63975, per=99.82%, avg=39884.73, stdev=5709.77, samples=476
   iops        : min=  885, max=15993, avg=9970.55, stdev=1427.44, samples=476
  write: IOPS=10.0k, BW=39.1MiB/s (41.0MB/s)(2346MiB/60002msec); 0 zone resets
    slat (usec): min=15, max=53787, avg=187.38, stdev=1055.71
    clat (usec): min=703, max=184799, avg=7109.60, stdev=10493.22
     lat (usec): min=929, max=184878, avg=7296.98, stdev=10765.19
    clat percentiles (msec):
     |  1.00th=[    3],  5.00th=[    3], 10.00th=[    3], 20.00th=[    4],
     | 30.00th=[    5], 40.00th=[    5], 50.00th=[    5], 60.00th=[    6],
     | 70.00th=[    7], 80.00th=[    8], 90.00th=[    9], 95.00th=[   12],
     | 99.00th=[   65], 99.50th=[   81], 99.90th=[  108], 99.95th=[  117],
     | 99.99th=[  146]
   bw (  KiB/s): min= 3728, max=64184, per=99.79%, avg=39957.87, stdev=5722.84, samples=476
   iops        : min=  932, max=16046, avg=9988.86, stdev=1430.70, samples=476
  lat (usec)   : 250=0.01%, 500=0.01%, 750=0.03%, 1000=0.22%
  lat (msec)   : 2=5.41%, 4=40.14%, 10=48.37%, 20=2.36%, 50=1.65%
  lat (msec)   : 100=1.72%, 250=0.10%
  cpu          : usr=4.22%, sys=39.42%, ctx=342409, majf=0, minf=108
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, &gt;=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, &gt;=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, &gt;=64=0.0%
     issued rwts: total=599380,600626,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: bw=39.0MiB/s (40.9MB/s), 39.0MiB/s-39.0MiB/s (40.9MB/s-40.9MB/s), io=2341MiB (2455MB), run=60002-60002msec
  WRITE: bw=39.1MiB/s (41.0MB/s), 39.1MiB/s-39.1MiB/s (41.0MB/s-41.0MB/s), io=2346MiB (2460MB), run=60002-60002msec

Disk stats (read/write):
  sda: ios=592523/587906, sectors=4786048/4795680, merge=5733/11554, ticks=1431760/1951190, in_queue=3383201, util=70.66%
</code></pre><p>Well, yeah. The bandwidth doesn&rsquo;t get beyond 41 MB/s in read or write:</p>
<pre tabindex="0"><code> READ: bw=39.0MiB/s (40.9MB/s), 39.0MiB/s-39.0MiB/s (40.9MB/s-40.9MB/s), io=2341MiB (2455MB), run=60002-60002msec
WRITE: bw=39.1MiB/s (41.0MB/s), 39.1MiB/s-39.1MiB/s (41.0MB/s-41.0MB/s), io=2346MiB (2460MB), run=60002-60002msec
</code></pre><p>And the IOPS aren&rsquo;t looking any better:</p>
<pre tabindex="0"><code>  read: IOPS=9989, BW=39.0MiB/s (40.9MB/s)(2341MiB/60002msec)
   iops        : min=  885, max=15993, avg=9970.55, stdev=1427.44, samples=476
  write: IOPS=10.0k, BW=39.1MiB/s (41.0MB/s)(2346MiB/60002msec); 0 zone resets
   iops        : min=  932, max=16046, avg=9988.86, stdev=1430.70, samples=476
</code></pre><p>Again, yeah. Especially the <code>min</code> values are looking really bad - not even 1k IOPS?
And the averages just below 10k aren&rsquo;t exactly awe-inspiring. So the Pi 5 with
NVMe disks gave me an entire order of magnitude more IO - both for bandwidth and
for IOPS.</p>
<p>Next up, some temperature testing. I was worried in this area, because most Pi 5
cases seem to have an active cooler. But I really wanted the passive heat sink
to work. First, I observed that at idle, the Pi 5 already reached about 50 C.
Not a great sign. To put a bit of load on the machine, I started running
<code>stress -c4 -t 600</code> and watched the temps with <code>watch -n 5 cat /sys/class/thermal/thermal_zone0/temp</code>.
I also kept an eye on the CPU frequency with <code>watch -n 5 cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq</code>,
to make sure the Pi didn&rsquo;t downclock.
The good news is that no downclocking happened. But at the end of those
10 minutes, the temps were at a toasty 78 C. And they didn&rsquo;t look stable
there; if I had left the test running for a bit longer, they might have gone
higher.</p>
<p>Looking at the temps on my deployed Pis, I didn&rsquo;t need to worry: The temps of
all three, running the k8s control plane, are around 52 - 55 C.
One more piece to note is the NVMe temp. There&rsquo;s zero airflow over it. I don&rsquo;t
gather the NVMe temps in my metrics, but I did a couple of spot checks, and the
temps were around 65 C. Well within the SSD&rsquo;s spec, but also something I need
to keep a closer eye on in the future. If push comes to shove, I can mount a
couple of large Noctua fans behind the Pis, and that should be enough, even at
low RPMs.</p>
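<p>For such spot checks, <code>nvme-cli</code> does the job, assuming the package is
installed and the disk shows up as <code>/dev/nvme0</code>:</p>
<pre tabindex="0"><code># The temperature is part of the NVMe SMART log
sudo nvme smart-log /dev/nvme0 | grep -i temperature
</code></pre>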
<p>This concluded the testing, and the last thing remaining was to verify that
my Ansible playbooks worked against a Pi 5 without changes. And they mostly did. Both
my image creation with HashiCorp&rsquo;s <a href="https://developer.hashicorp.com/packer">Packer</a>
and my main deployment playbook worked without modification, and booting the test
Pi off of the NVMe worked out of the box. The only change I had to make was to
add the PCIe Gen3 config to the Raspberry Pi play. It&rsquo;s very nice to see how
few changes I needed.</p>
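<p>For reference, that Gen3 config is a single line in the Pi&rsquo;s firmware config,
which on Ubuntu typically lives at <code>/boot/firmware/config.txt</code>:</p>
<pre tabindex="0"><code># Run the PCIe x1 link at Gen3 speed instead of the default Gen2
dtparam=pciex1_gen=3
</code></pre>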
<h2 id="deploying-the-pis">Deploying the Pis</h2>
<p>For the deployment of the Pis, I&rsquo;d set myself a somewhat complicated goal: I
wanted to keep using the host names of my current control plane hosts. Which made
the initial install more bothersome. But I decided against taking down the
original nodes first, because I didn&rsquo;t want to leave the cluster with only two
CP nodes during the new host&rsquo;s install, especially considering the instability that already existed.</p>
<p>So I had roughly the following steps:</p>
<ol>
<li>Boot new Pi from USB</li>
<li>Adapt boot order to put NVMe behind USB</li>
<li>Add a temporary entry with a temporary name in static DHCP</li>
<li>Generate image, but again with temporary hostname</li>
<li>Install image onto the NVMe SSD and reboot</li>
<li>Run full Ubuntu update, set root PW</li>
<li>Run full deployment Ansible playbook</li>
<li>Drain the old control plane node</li>
<li>Remove the old CP node from the Kubernetes cluster with <code>kubeadm reset</code>
and <code>kubectl delete node foo</code> (see the sketch after this list)</li>
<li>Shutdown both nodes</li>
<li>Deploy new HW and remove old Pi 4</li>
<li>Update DHCP entry of old CP node with new Pi 5 MAC and remove temporary entry</li>
<li>Boot new node</li>
<li>Go into Ansible, set node name for new node and re-run deployment playbook, which also sets the hostname</li>
<li>Reboot new node</li>
<li>Add new node to k8s cluster as control plane</li>
</ol>
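<p>As a sketch, steps 8 and 9 boil down to the usual node removal dance, with
<code>foo</code> standing in for the actual node name:</p>
<pre tabindex="0"><code># Drain the old control plane node, ignoring DaemonSet-managed Pods
kubectl drain foo --ignore-daemonsets --delete-emptydir-data

# On the old node itself: tear down everything kubeadm set up
sudo kubeadm reset

# Finally, remove the node object from the cluster
kubectl delete node foo
</code></pre>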
<p>In contrast to previous attempts of mine to switch control plane hosts, this one
went off without a hitch.</p>
<p>And since that moment, I have not had any spurious restarts of any control plane
Pods. Not a single one. So problem solved. By throwing better hardware
at it. &#x1f601;</p>
<p>But before I end this post, here&rsquo;s two more plots. This one shows the CPU utilization
of one of the Pi 4 control plane nodes during a random day:
<figure>
    <img loading="lazy" src="pi4-cpu.png"
         alt="A screenshot of a Grafana time series plot. It shows 24h worth of CPU utilization. The utilization is very stable, with the host about 25% utilization, safe for a couple IOWAIT spikes down to 40%."/> <figcaption>
            <p>CPU utilization of a Pi 4 control plane node.</p>
        </figcaption>
</figure>

And here is a 24h plot of the same node, only now running on a Pi 5:
<figure>
    <img loading="lazy" src="pi5-cpu.png"
         alt="A screenshot of a Grafana time series plot. It shows 24h worth of CPU utilization. As in the previous screenshot, the utilization is pretty stable overall at about 12%. The previous IOWAIT spikes are gone now, and there are only two spikes to about 20% utilization."/> <figcaption>
            <p>CPU utilization of a Pi 5 control plane node.</p>
        </figcaption>
</figure>
</p>
<p>These plots show the more powerful Pi 5 CPU. They also indicate that the IOPS
issue is gone, as the Pi 5 plot doesn&rsquo;t have any IOWAIT spikes anymore.</p>
<p>I would have also loved to show a power consumption plot, but honestly, I don&rsquo;t
see any changes after switching to the Pi 5.</p>
<h2 id="conclusion">Conclusion</h2>
<p>This was a pretty nice project. It accomplished exactly what I had hoped, and
I didn&rsquo;t have any issues at all. Besides those PCIe cables. They almost drove my
entire Homelab into the arms of Hetzner.</p>
<p>Next up will be a post about migrating my Prometheus metrics storage to Thanos.</p>
<p>Re-reading the post and editing a bit, I should perhaps make the next project
a switch of my blog&rsquo;s theme. Those Grafana screenshots really are not very
readable. I need a theme which allows clicking on a figure and enlarging it.</p>
]]></content:encoded>
    </item>
    <item>
      <title>What&#39;s next after the K8s Migration?</title>
      <link>https://blog.mei-home.net/posts/whats-next-after-k8s-migration/</link>
      <pubDate>Tue, 29 Apr 2025 22:00:30 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/whats-next-after-k8s-migration/</guid>
      <description>A meandering stream of thought about future Homelab projects</description>
      <content:encoded><![CDATA[<p>Wherein I go over my future plans for the Homelab, now that the k8s migration
is finally done.</p>
<p>So it&rsquo;s done. The <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration</a> is finally
complete, and I can now get started with some other projects. Or, well, I can
once I&rsquo;ve updated my control plane Pis to Pi 5 with NVMe SSDs.</p>
<p>But what to do then? As it turns out, I&rsquo;ve got a very full backlog. I&rsquo;m decidedly
not in danger of boredom.</p>
<p>Without further ado, here is a meandering tour through my Homelab project list.</p>
<h2 id="baremetal-improvements">Baremetal improvements</h2>
<p>At the moment, all of the hosts in my Homelab are running baremetal; I do not
have any VMs. I&rsquo;ve got both x86 hosts and a lot of Pis. I&rsquo;ve also got hosts with
and without local storage. And their management, especially the creation of new
hosts, is not great at the moment.</p>
<p>In short, I&rsquo;m taking the current Ubuntu LTS and adapting the image for every single
host. I&rsquo;m using HashiCorp&rsquo;s <a href="https://developer.hashicorp.com/packer">Packer</a>
for the generation of the images. This generation varies wildly between x86
and Pi hosts. For the x86 hosts, Packer runs the full Ubuntu installer in a Qemu VM, just in
an automated way. For the Pis, I&rsquo;m taking the preinstalled Ubuntu Pi images.
In both cases I then apply an Ansible playbook which installs basic necessities,
especially my management user with its SSH key and some preconditions for
Ansible usage. Importantly, this playbook also configures the host name. So I
need to generate a fresh image for every new host, even though the only difference
is the Linux hostname config.</p>
<p>What happens next depends on whether the host is netbooted or not. For netbooters,
I put the new image onto a Ceph RBD to serve as the host&rsquo;s root disk. I also extract
the boot partition to the NFS share that provides the boot content both to the hosts
themselves and to my netboot control host, which runs dnsmasq to make the files available
to the netbooters.
For hosts with local storage, the approach is to stick in a USB stick, boot
the host, mount an NFS share with the freshly generated image, and then <code>dd</code> that
image onto the local disk.</p>
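<p>The last step is nothing more exotic than this, with share and image names
made up:</p>
<pre tabindex="0"><code># Mount the NFS share holding the freshly generated image
sudo mount -t nfs nfs.example:/images /mnt

# Write the image onto the local disk
sudo dd if=/mnt/host-image.img of=/dev/sda bs=4M status=progress
</code></pre>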
<p>All of the above is not really great, to be honest. There are a number of manual
steps in there. The first thing I&rsquo;d like to solve is the &ldquo;one image per host&rdquo;
situation. That&rsquo;s only there because I need to provide the hostname in the image so it gets
the right one at first boot.
But that shouldn&rsquo;t be necessary. I should really only need two images,
one for Pis and one for x86 machines. And then I should do all the necessary
config via <a href="https://cloud-init.io/">Cloud-init</a>. This might even include some
parts of the config I&rsquo;m currently doing via the Ansible playbook. For example,
the creation of my management/Ansible user should also be possible with cloud-init.</p>
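<p>As a rough sketch, the host-specific part could then shrink to a small
cloud-init user-data file. Hostname, user name and key here are made up:</p>
<pre tabindex="0"><code>#cloud-config
hostname: cp1
users:
  - name: mgmt
    groups: [sudo]
    shell: /bin/bash
    ssh_authorized_keys:
      - ssh-ed25519 AAAA... mgmt@homelab
</code></pre>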
<p>But even if that works, there&rsquo;s still the actual install of the image on the
host, be it netbooted or local storage. And for this, I&rsquo;ve been eyeing two
tools for quite a while: <a href="https://tinkerbell.org/">tinkerbell</a> and Canonical&rsquo;s
<a href="https://maas.io/">MaaS (Metal as a Service)</a>. Both of those offer some kind of
management for baremetal machines, making use of DHCP and netboot. With this,
I&rsquo;d like to move my hosts a bit more in the direction of cattle.</p>
<p>Both Tinkerbell and MaaS are capable of automatically installing baremetal machines.
There are two issues with that when it comes to my setup, both related to my netbooters.
First, the Raspberry Pis. Those have a &ldquo;bespoke&rdquo; netboot process that doesn&rsquo;t
really follow any standards. But both MaaS and Tinkerbell rely on a certain
pre-boot environment. And the Raspberry Pi can work with both - <em>if</em> you have
an SD card to provide separate bootcode. In principle, you need to UEFI boot the
Pi. And that, as said, works fine. But from everything I&rsquo;ve read up to now,
that approach always requires an SD card or some other local storage to work.
It doesn&rsquo;t work via the Pi&rsquo;s netboot process.
But then again - there&rsquo;s a lot of open source stuff even way down in the Pi&rsquo;s
boot process. And I think it could be very interesting to get really deeply into
the weeds here and implement an adapted Pi firmware myself. That would be a
really large project, because I&rsquo;ve got pretty much no idea about development
that close to the metal. But it could be really interesting.</p>
<p>The second problem is with the root disk on my netbooting hosts. They&rsquo;re Ceph RBDs,
and I had to implement some special scripting in the initramfs to make those work
as root disks. And I doubt that whatever OS install mechanisms Tinkerbell and
MaaS support can actually handle Ceph RBDs as the install target. But that&rsquo;s
another thing to check. Both tools are open source, so perhaps I can hack
something together.</p>
<p>With all of this, I might even make some proper contributions to open source
again, if what I come up with is actually fit for wider consumption.</p>
<p>For all of this to work, I will also need some sort of separate host, because
I don&rsquo;t think running the tools responsible for host definitions and configuration
on the very k8s cluster whose hosts they manage is a great idea.
For stuff like that which should not rely on any other services in the Homelab,
I&rsquo;ve already got a host, what I call my &ldquo;cluster master&rdquo;. It&rsquo;s a Pi 4 and
currently runs my PowerDNS server and dnsmasq for supporting the netbooting
hosts. But that&rsquo;s only a 4GB Pi 4, so I can&rsquo;t run too much on there. And with
MaaS or Tinkerbell, I&rsquo;d rather not run them just in Docker containers or even
baremetal. Instead, I&rsquo;d like to set up a small management k8s cluster. I will
use that to also test some of the lighter/smaller k8s distributions, like
k3s. It will definitely be a single-node cluster.</p>
<p>In addition to running Tinkerbell or MaaS on that cluster, I&rsquo;d also like to get
into Cluster API and ultimately see how I like <a href="https://www.talos.dev/">Talos Linux</a>.
I initially bounced off of it due to it not allowing SSH access to the host, but
now that the number of things actually running baremetal is even smaller than
before, I&rsquo;m warming to the idea. And it supposedly supports Pi 4 already,
and they&rsquo;re working on Pi 5 support, from what I understand. So that could be
really nice to look at.</p>
<p>Furthermore, I&rsquo;ve also been looking at GitOps for the cluster with Flux or Argo,
and both would need some sort of management cluster as well.</p>
<p>So I&rsquo;ve got a lot of things to work on for the baremetal/hardware side of the
Homelab. One important thing I also need to do is to look into whether I want
to continue with the Pi fleet. The advantage is that I&rsquo;m getting a lot of
physical hosts with a relatively small physical footprint and very low
electricity consumption.
But this also comes with downsides. One I&rsquo;ve already mentioned above: Pis don&rsquo;t
always do things the standard way. They also don&rsquo;t have much expandability.
The standard advise today is to get some small form factor thin client from
Dell, Lenovo or HP from the used market. They have a similar expandability
problem, but at least they&rsquo;d have a more standardized boot process, and I
wouldn&rsquo;t need to worry about whether a given OS supports them - they&rsquo;re just
UEFI machines. But they&rsquo;re also 10x the size of a Pi 4. They also aren&rsquo;t passively
cooled, like all my Pis are. And my rack is sitting right next to my desk in my
living room. If I go this route, I will have to look for models which support me
putting in some nice Noctua instead of using whatever is already in there.
Following this path, I&rsquo;d probably put LXD (or rather, Incus) on the machines and
run everything in VMs, again using Ceph RBDs as their root disks, so that the
internal NVMe would also need to support the underlying system.</p>
<p>I&rsquo;m honestly talking myself into going that route right now. What&rsquo;s enticing
me the most is honestly the return to something &ldquo;standard&rdquo;. That&rsquo;s really
tempting. Not having to think about whether my underlying machines can even
support what I want to do with them would be nice.</p>
<p>But then again: The Pi 4 are still good. Sure, I have to replace the control
plane Pi 4 with Pi 5, but the worker nodes are still trucking along. And I would
guess that they will keep working for my needs for another couple of years, at
minimum. Replacing them now, and just for the reason that I&rsquo;d like to have
something more standardized, would be a massive waste.</p>
<h2 id="networking">Networking</h2>
<p>This is a really big one. At the moment, my entire network is 1 GbE. I&rsquo;ve got
a couple of hosts with 2.5 GbE cards, but all of my network infra is still only
1 GbE. I&rsquo;d like to change that. There&rsquo;s really nothing to be done with the Pis,
they&rsquo;re going to stay at 1 GbE. But, and this is the main thing, I&rsquo;ve got my
OPNsense router and my Ceph hosts as well as my desktop. The Ceph hosts would
be nice to have with 2.5 GbE or even more, so they could supply several 1 GbE
connected devices at their full speed.</p>
<p>So some new hardware will be in order. Preferably, I&rsquo;d like to have a switch
with mostly 1 GbE ports, though I&rsquo;d also be happy with 1/2.5 GbE combo ports. But
for some reason, 1 GbE/2.5 GbE ports don&rsquo;t seem to be widespread? I still remember
that 10/100/1000 ports were definitely a thing for a long time. So I will most
likely be looking for two switches instead: one with enough 1 GbE ports for my Pis
and enough high-speed uplinks to connect to a second 2.5 GbE switch, perhaps
even faster. I&rsquo;ve also still got a lot of free PCIe slots in my Ceph machines for a
faster network card.
I&rsquo;m currently eyeing MikroTik for all my future networking hardware needs,
mostly to buy from an EU manufacturer.</p>
<p>Another network appliance I&rsquo;d like to upgrade is my current router. It has more
than enough performance, and the advantage of being quiet. But it&rsquo;s also a mini
PC, with the accompanying lack of expandability. It only has 1 GbE ports, so
even if I upgrade the switches, the connection to the router would still be a
significant bottleneck. For a replacement, I&rsquo;d love something 1U I can mount
into my rack. The main issue is of course the &ldquo;1U&rdquo; wish - because that, almost
invariably, comes with fans. And small diameter fans at that. And as I&rsquo;ve said
above, the rack is sitting next to my desk, so a bit of quiet is appreciated.
I&rsquo;d like to look at the machines that OPNsense themselves offer, as the smaller
ones are looking pretty sweet. Or I could go and see whether anyone has made a
1U machine which is passively cooled. I mean, even 1U in a rack should provide
enough volume to put in enough metal to cool something reasonable in a passive
setup. But yeah, without that, I will likely give a bit of money to OPNsense
for one of their HW offerings.
Hm, just while doing my final read through the post, I&rsquo;m thinking that I
might not necessarily need to replace my router HW. It has six 1 GbE ports.
And I&rsquo;m only using two of them at the moment. Why not just look into combining
them? Sure, it would take up more ports in my future switch, but that might be
acceptable, if it means I can keep using the HW for longer.</p>
<p>Then there is another big elephant in the room - IPv6. I&rsquo;m currently reading
<a href="https://nostarch.com/tcpip.htm">The TCP/IP Guide</a>, and honestly, IPv6 sounds
pretty interesting. And at this point at least, most hardware and software I&rsquo;m
using should be perfectly fine with it.</p>
<p>And finally, I&rsquo;d like to fix two issues I&rsquo;ve currently got with my networking
setup. The first one has to do with using Cilium&rsquo;s LoadBalancer support via BGP.
It allows me to use LoadBalancer functionality in my cluster, and it does so via
publishing routes to virtual IPs pointing at the hosts running the service.
There&rsquo;s just one issue with that: If anything in the Homelab subnet needs to
access one of those LoadBalancer services, I end up with asymmetric routing.
The packets coming from the requesting host go up to the router, because the
LoadBalancer IPs are all in a separate subnet, so they need routing from the
Homelab subnet. But when the Pods send answers, those are not routed back via
the same path. Those packets are sent via the k8s host&rsquo;s interface, because
that&rsquo;s directly connected to the Homelab network.
The main issue this introduces is for the stateful firewall I&rsquo;ve got running
on the router. Here, it&rsquo;s problematic that the router only sees one piece of
the initial TCP connection, but not the other side. By default, pf does not
consider that a valid connection, so it will block packets trying to flow along
it.
I had to configure &ldquo;sloppy state&rdquo; for those firewall rules, which made it work,
but it&rsquo;s still not great, because the first few packets flowing along the
path still get blocked.</p>
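<p>In raw pf terms, the workaround looks roughly like this, with the interface
macro and the LoadBalancer subnet being placeholders:</p>
<pre tabindex="0"><code># Use sloppy state tracking for traffic towards the LoadBalancer subnet, so pf
# accepts connections even though it only ever sees one direction of them
pass in on $homelab_if proto tcp to 10.86.5.0/24 keep state (sloppy)
</code></pre>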
<p>The second issue is about my external DNS. It is currently hosted with my domain
registrar, Strato. Which is fine, there&rsquo;s only one issue I have with Strato: It
doesn&rsquo;t offer any sort of API for its DNS, besides some DynDNS support. So
some things, like the DNS challenge to get a wildcard cert from Let&rsquo;s Encrypt,
need manual intervention. Whenever I need to get a new cert, I need to log into
the Web UI to change the TXT records with the new challenge values. And I&rsquo;d
like to fully automate that.
One option is <a href="https://donotsta.re/users/dns">ServFail</a>. A DNS network with a
bash-based Web UI is right up my alley. But before I can do that, I will have
to fix my mail delivery, because I currently depend on Strato&rsquo;s mail package,
which in turn depends on your DNS being hosted by them - or you entering the
correct data into your own DNS server.</p>
<h2 id="mail">Mail</h2>
<p>Speaking of mail, that is another big one I&rsquo;d like to tackle at some point.
Even though it&rsquo;s currently pretty far down the list. I did buy Michael W Lucas'
<a href="https://www.tiltedwindmillpress.com/product/ryoms/">Run Your Own Mail Server</a>
a little while ago and plan to use it to set up my very own. Let&rsquo;s see whether
it&rsquo;s really as simple as some people claim.</p>
<p>One important thing I need to do first though: Organizing a static IP.</p>
<h2 id="remote-vps-as-an-entrypoint-to-the-homelab">Remote VPS as an entrypoint to the Homelab</h2>
<p>At the moment, the entire Homelab actually runs at home. The DNS for this
blog and other public things I host point to my Deutsche Telekom consumer
VDSL connection. This has been working fine for all these years, but some things
require a static IP. Especially the aforementioned self-hosted mail server.
I&rsquo;m reasonably sure mail sent from a residential IP will be blocked immediately.
I&rsquo;d then do the typical thing and create a WireGuard tunnel between that VPS and
my Homelab. One other thing I plan to use that VPS for is to get an outside
monitoring tool going, so I can actually get some indication of what&rsquo;s going
on when the Homelab completely crashes. Right now, my Gatus monitoring is running
in the k8s cluster it&rsquo;s monitoring. &#x1f605;</p>
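<p>The tunnel itself would be the bog-standard WireGuard setup. A minimal sketch
of the Homelab side, with keys, names and IPs made up:</p>
<pre tabindex="0"><code>[Interface]
PrivateKey = &lt;homelab private key&gt;
Address = 10.99.0.2/24

[Peer]
# The VPS with the static IP
PublicKey = &lt;VPS public key&gt;
Endpoint = vps.example.net:51820
AllowedIPs = 10.99.0.1/32
PersistentKeepalive = 25
</code></pre>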
<h2 id="ceph">Ceph</h2>
<p>Next up is Ceph. As I&rsquo;ve described in <a href="https://blog.mei-home.net/posts/ceph-copy-latency/">one of my previous posts</a>,
one of my HDDs regularly displays an appallingly low IOPS value. I need to
figure out whether it&rsquo;s actually bad or whether there is something else wrong.
But for that, I need to understand Ceph better. A lot better. There was also
some weird behavior when I was moving around hosts after taking down the
baremetal Ceph cluster, where the mClock scheduler was not using a disk&rsquo;s full
capacity while backfilling.</p>
<p>All of these I&rsquo;d like to investigate. For this, I will likely have to actually
read up on the algorithms behind Ceph, including the papers on CRUSH for
example. And then I might even dig into the code, because while the Ceph docs
themselves are pretty good, I&rsquo;d like to really understand what&rsquo;s happening behind
the curtain.</p>
<p>Related to the IOPS issues, I&rsquo;m also considering adding a SATA SSD to all of my
Ceph hosts to put the WAL and RocksDB on it, at least for the HDD OSDs. That
should improve overall performance for operations on my HDD pool, by releasing
the pressure from having to handle the payload and metadata IO from a single
HDD. The main issue with that is that one of my storage hosts is an
<a href="https://www.hardkernel.com/shop/odroid-h4/">Odroid H4</a>, and that only has two
SATA power and data connectors, both already in use. So that one would need to
be replaced by something else.</p>
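<p>Rook supports this via a <code>metadataDevice</code> setting per OSD device, so the
change should be roughly the following snippet in the CephCluster spec. Host and
device names are made up:</p>
<pre tabindex="0"><code>storage:
  nodes:
    - name: cephhost1
      devices:
        - name: sda              # the HDD holding the payload data
          config:
            metadataDevice: sdb  # SATA SSD taking the WAL and RocksDB
</code></pre>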
<p>Finally, one of the things which has been annoying me for a while is the fact
that I&rsquo;m currently hardcoding the IPs for my Ceph MON daemons in several places.
Most importantly, in the Ceph configs for my netbooting hosts. That has the
effect that I can&rsquo;t easily move the MONs around. But it now looks like Rook added
functionality to put the MONs behind Kubernetes services. This would allow me
to move them without having to constantly update the configs and reboot hosts.
I still couldn&rsquo;t have them move around freely, because they&rsquo;re using the local
disk to store their data, but still, not having to worry about their IPs would
be nice.</p>
<h2 id="monitoring">Monitoring</h2>
<p>My beloved graphs. There are going to be more of them. But first, I need to
deploy Thanos for my Prometheus instance. Because that&rsquo;s currently got a 250GiB
persistent volume. And I will need to increase the size of that volume again
this week, as it&rsquo;s currently at 94% full again. And no, I will not be contemplating
reducing my retention period below five years, thank you very much. &#x1f605;
Thanos will allow the Prometheus TSDB to consume the entirety of my HDD pool,
and I will finally be free from needing to regularly increase the size.</p>
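<p>The object storage side of Thanos is a small config file. Pointed at a Ceph RGW
endpoint, it should look roughly like this, with bucket, endpoint and credentials
as placeholders:</p>
<pre tabindex="0"><code>type: S3
config:
  bucket: thanos-metrics
  endpoint: rgw.example.svc:80
  access_key: &lt;key&gt;
  secret_key: &lt;secret&gt;
  insecure: true
</code></pre>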
<p>Once that&rsquo;s accomplished, I want to get into gathering metrics from apps. Right
now, I&rsquo;m only gathering host metrics as well as Ceph and k8s metrics, but that&rsquo;s
pretty much it. But there&rsquo;s a lot of apps running in the Homelab which also provide
metrics, and I&rsquo;d like to gather those too. And make pretty graphs of them. &#x1f913;</p>
<p>One big one I&rsquo;d like to tackle is my blog. Right now, I&rsquo;ve got zero metrics there,
besides the number of requests hitting it as part of my generic web server
metrics gathering. But to be honest, I&rsquo;d like to know more. Purely for the
pretty graphs. I don&rsquo;t want to track anyone or anything like that. Just some
basic &ldquo;How often did this article get clicked&rdquo; graphs. So I might just go
with some log analysis. But I&rsquo;ve also been eyeing something like
<a href="https://plausible.io/">Plausible</a>. It&rsquo;s just because I really like a good
dashboard. &#x1f913;</p>
<p>And then there&rsquo;s the big elephant in my monitoring room. At the moment, I&rsquo;m
mostly seeing any issues when I&rsquo;m actively looking at my Homelab dashboard.
Or, you know, when I suddenly hear fans ramping up or HDDs start rattling like
mad. &#x1f605;
I&rsquo;d like to change that with some proper alerting. Perhaps even including
push notifications to <em>waves arms</em> somewhere. At least for the most important
stuff like SMART issues on my disks.</p>
<p>And finally, I&rsquo;ve been thinking about a public dashboard. Much reduced compared
to what I&rsquo;ve got internally, but perhaps just something like Pod CPU usage,
overall memory usage and stuff like that? I&rsquo;m wondering whether other Homelabbers
would be interested in something along those lines.</p>
<h2 id="the-k8s-cluster">The k8s cluster</h2>
<p>Only two short points here. One, I&rsquo;d like to get back into GitOps with Flux or
Argo. I explored it a bit in the past, but the fact that I&rsquo;d basically need
another cluster, or at least a separate Git forge/CI system, put me off. But
with the plan to run a single-node management cluster in the future, it might
be interesting to look at this again.</p>
<p>Second, I&rsquo;d like to get something like <a href="https://github.com/renovatebot/renovate">renovate</a>
going for my k8s apps. Just so I can have a list of updates with links and
everything when Homelab Service Friday rolls around.</p>
<h2 id="backups">Backups</h2>
<p>And last, as they so often are, backups. Here, again, I&rsquo;d like to improve my
metrics. Restic can produce quite a lot of them, and I&rsquo;d like to gather those.
Again, mostly because I like pretty graphs. &#x1f913;
I even started implementing something a while ago, but never finished it. It&rsquo;s
a nice combination of implementing something in Python and Homelabbing, because
I&rsquo;ll likely use the Prometheus Push Gateway.</p>
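<p>The Push Gateway part is pleasantly simple. Pushing a hypothetical restic
metric from a backup job could look like this:</p>
<pre tabindex="0"><code># Push a made-up metric for job "restic" on instance "host1"
echo "restic_backup_duration_seconds 42" | \
    curl --data-binary @- http://pushgateway.example:9091/metrics/job/restic/instance/host1
</code></pre>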
<p>The biggest issue with my backups at the moment is the complete lack of off-site
backups. My backups currently consist of a battery of S3 buckets on my Ceph
cluster, each of which holds the restic backup repository for one of my services.
Then there&rsquo;s a large external HDD onto which I rclone the most important ones
daily. The biggest problem here is that said external HDD is sitting on the
top shelf of the rack that also holds the other servers in the Homelab. So
should anything physically happen to my Homelab, that second backup location
is also going to be gone.</p>
<p>My idea is to pretty much take the content of the HDD and sync it to a Hetzner
StorageBox. Or potentially make the backups a bit more independent and sync
the important S3 buckets to Hetzner directly, so the external HDD and the off-site
backups are a bit more independent.</p>
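<p>The sync itself would then be a one-liner once the two rclone remotes are
configured. Remote and bucket names are made up:</p>
<pre tabindex="0"><code># Mirror the most important restic repositories to the StorageBox
rclone sync ceph-s3:restic-nextcloud storagebox:backups/restic-nextcloud
</code></pre>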
<h2 id="what-will-i-actually-do-next">What will I actually do next?</h2>
<p>I hope you enjoyed this tour through my Homelab backlog. It was pretty nice to
write a &ldquo;stream of consciousness&rdquo; post like this, as compared to my normal
tutorial/here-is-what-I-did-and-why posts.</p>
<p>The last remaining question: What will I actually do next in the Homelab?
First step is going to be deploying the three Pi 5 with NVMe currently still
strewn all over my table. Once that&rsquo;s done, next will very likely be the Thanos
deployment. I&rsquo;m getting a bit tired of regularly increasing the Prometheus PVC&rsquo;s size.</p>
<p>And then, the next big project will very likely be the baremetal deployment
enhancement. I got myself a bit excited while writing about digging into the
Pi bootloader and trying to get it to chainload iPXE.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Final: It&#39;s done</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-final/</link>
      <pubDate>Thu, 24 Apr 2025 13:40:00 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-final/</guid>
      <description>The migration is complete.</description>
      <content:encoded><![CDATA[<p>Wherein I try to draw a conclusion about my migration to k8s.</p>
<p>This is the final part of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>After a total of 26 posts, this will be the last one in the migration series.
On the evening of April 13th, after one year, three months and 26 days, I set the
final task of my k8s migration plan to &ldquo;Done&rdquo;. I made the first commits for the
migration on December 19th 2023, shortly after starting my Christmas vacation
that year. It was the addition of the first VMs, for the control plane nodes.
I already did some experimentation in November, but I don&rsquo;t count that as time
spent on the migration.</p>
<p>Overall, I had defined 864 tasks for the migration, most of them during the
initial planning phase.</p>
<p>Apropos planning phase: How did that turn out? In my <a href="https://blog.mei-home.net/posts/k8s-migration-0-plan/">first migration post</a>, I laid out in detail how I planned to proceed. And for the most part,
I did follow that plan. The one thing I did not foresee was that k8s does not have
a combined CronJob+DaemonSet kind of workload, meaning a run-to-completion workload
that can be started on a schedule with an instance running on every machine.
That was what I was doing with my backups in Nomad. But it wasn&rsquo;t possible
in Kubernetes. This led me to the decision to put the migration on hold and
implement my very own Kubernetes operator for orchestrating my backups.
Besides that sidetracking, most things went according to plan. Except for the very
last step, <a href="https://blog.mei-home.net/posts/k8s-migration-25-controller-migration/">migrating the controllers</a>.
In short, the Pi 4 with USB-attached SSDs were too slow to handle the control
plane. This will be remedied with some Pi 5 with attached NVMe SSDs next week,
but I didn&rsquo;t see any reason to postpone this post.</p>
<h2 id="nomad-vs-k8s">Nomad VS k8s</h2>
<p>Let&rsquo;s take a closer look at Nomad vs k8s. Starting with the number of allocations
I had in the Nomad cluster vs the number of Pods I have in the k8s cluster.
The two are not exactly comparable, but at least approximately.</p>
<p>In the Nomad cluster, shortly before starting the migration in December,
I had <strong>57</strong> allocations. While I currently have <strong>193</strong> running Pods in the
k8s cluster. This is of course partially because I&rsquo;m running more things in the
k8s cluster than I ran in the Nomad cluster. For example, each host already has
one more Pod than the Nomad hosts had allocations due to the Cilium Pod.</p>
<p>One big topic I&rsquo;d like to call out is the comparative maturity of the ecosystems,
meaning &ldquo;how much ready-made stuff is available?&rdquo;. For sure, the comparison is
slightly unfair - Kubernetes runs on all of the large cloud providers, it is
a full Linux foundation project, is used in many public and private clouds.
Nomad, on the other hand, is only supported and mainly developed by a single
company. As far as I&rsquo;m aware, there is no public offering for running Nomad
as a service for customers to run their own workloads on it. It&rsquo;s used in private
cloud deployments only, for the most part.</p>
<p>Take, as an example, Ceph. With <a href="https://rook.io/">Rook Ceph</a>, there is a very
good package for deploying a Ceph cluster into a Kubernetes cluster. There is no
such way in Nomad, at least to my knowledge. You can still deploy Ceph baremetal
and then use the official Ceph CSI driver in Nomad to control volumes, of course.
But that&rsquo;s not the same as a good piece of software allowing me to run the entire
cluster inside Nomad.</p>
<p>Then there&rsquo;s just the sheer generic support for tools of all stripes. For example
stuff like external-secrets or external-dns. Sure, Nomad has direct, and very good
support for Vault. But that&rsquo;s not even close to the level of support for secret
providers that external-secrets provides.</p>
<p>And finally, there&rsquo;s Helm. Again as far as I know, Nomad doesn&rsquo;t have anything
similar that&rsquo;s equally widely used. At the beginning, I was a bit hesitant
to use Helm charts. Instead I wanted to write all of the manifests myself. I relented pretty
quickly, at least for the Helm charts which were provided by the projects
themselves. So I&rsquo;m fine with using e.g. Gitea&rsquo;s chart, because it is at least
supported by the Gitea project. But I wouldn&rsquo;t use a Gitea chart from a third
party, because the project itself will make its release announcements for the
methods they officially support, not for third party Helm charts. So for each
tool, I would have to read two release notes - the ones from the app itself,
and the one for the Helm chart. Sure, I also need to do that for first party
Charts, but at least there I can be reasonably sure that they got all the necessary
adaptations correct.
In Nomad, on the other hand, I wrote every single job and volume file myself.
This definitely fostered a better understanding of both, the app I wanted to
deploy and Nomad, but it does get a bit repetitive at some point.</p>
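<p>For illustration, consuming a first-party chart is just the usual pair of
commands. The repo URL below is the one the Gitea project documents, as far as I
know:</p>
<pre tabindex="0"><code># Add the project's own chart repository and install from it
helm repo add gitea-charts https://dl.gitea.com/charts/
helm install gitea gitea-charts/gitea --namespace gitea --create-namespace
</code></pre>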
<h2 id="ceph-rook">Ceph Rook</h2>
<p>I would like to concentrate a bit on Rook Ceph here. One thing I would like to
highlight is that it worked really nicely for me, and I was able to reason
pretty well about what the operator would do - for the most part. See the mishap
with the controller migration for an example of how I completely screwed up
and almost lost my storage cluster.</p>
<p>But what I&rsquo;m still not certain about: Would I have been quite as comfortable with
Rook Ceph if I hadn&rsquo;t been running Ceph baremetal for a couple of years beforehand?
I have been brooding about this question since I added the point to the notes
for this blog post. But I got nowhere. I&rsquo;d like to be the kind of person who
can spew forth some nugget of wisdom, but I&rsquo;m starting to get the feeling that
I don&rsquo;t really have that much to say about the migration&hellip;</p>
<p>One thing that did surprise me was the sheer number of auxiliary Pods
Rook spins up. In total, the operator namespace runs 41 Pods in my cluster, and
the cluster namespace runs another 28. I actually ended up considerably reducing
the resource requests for several Pod types, because after setting up Rook,
I pretty much ran out of resources on my initial small cluster.</p>
<h2 id="resource-utilization">Resource utilization</h2>
<p>So what does the comparison of the resource consumption look like? I haven&rsquo;t
been able to come up with something general that makes sense - there are more things
running now, due to stuff like external-secrets or external-dns for example,
which Nomad simply did not have. Overall, I&rsquo;m happy to report that I&rsquo;ve now got
more resources available for workloads, due to the simple reason that I&rsquo;ve got
the Ceph hosts as part of the cluster as well. And that allows me to use any free
resources on them as well.</p>
<p>One thing we can look at is the control plane nodes, because those are basically
doing the same thing in both clusters.
Under Nomad, those nodes were running the control plane
for the cluster, meaning one of each of these:</p>
<ul>
<li>Nomad server</li>
<li>Consul server</li>
<li>Vault server</li>
<li>Ceph MON daemon</li>
</ul>
<p>And it&rsquo;s basically the same in the k8s cluster control plane:</p>
<ul>
<li>kube-apiserver</li>
<li>kube-controller-manager</li>
<li>kube-scheduler</li>
<li>kube-vip</li>
<li>Ceph MON daemon</li>
<li>Vault Pod</li>
</ul>
<p>Here are the CPU loads. As a reminder, the machine we&rsquo;re talking about here is
a Raspberry Pi 4 4GB, with a SATA SSD attached via USB.
The load on an average day in 2023, before any k8s migrations, looked like this:</p>
<p><figure>
    <img loading="lazy" src="cpu-control-plane-2023.png"
         alt="A screenshot of a Grafana time series plot. It shows the CPU usage in percent for the different CPU states on a Linux system. The system, over the entire day, shows about 88% idle. Further around 6 percent is system load, with the remaining 6% being user load. A couple of spikes for IOWAIT load down to about 40% utilization are visible."/> <figcaption>
            <p>CPU utilization by CPU state on one of my Pi 4 control plane nodes on an average day before the k8s migration.</p>
        </figcaption>
</figure>

As you can see, the load is somewhere around 88% idle, with only a few IOWAIT
spikes down to 60% idle.
Next is the same host, but from yesterday, now running the k8s control plane:
<figure>
    <img loading="lazy" src="cpu-load-control-plane-2025.png"
         alt="A screenshot of a Grafana time series plot. It shows the CPU usage in percent for the different CPU states on a Linux system. The system, over the entire day, shows about 76% idle, again with a number of IOWAIT spikes, this time deep down to less than 30% idle CPU. The usage is again split equally between Sys and User state. But in contrast to the previous graph, there&#39;s now also a visible bit of about 1% to 2% IOWAIT during the entire day."/> <figcaption>
            <p>CPU utilization by CPU state on one of my Pi 4 control plane nodes on an average day after the k8s migration.</p>
        </figcaption>
</figure>

Here the difference becomes clear - the k8s control plane needs about 10% more
CPU in total. In addition, there&rsquo;s now a clearly visible, constant 1% to 2%
IOWAIT during the entire day.</p>
<p>I believe the majority of this difference is not due to the k8s control plane
being inherently less efficient. Instead I think it&rsquo;s entirely due to operators.
In the Nomad cluster, the only requests made to the control plane were the ones
kicked off by me entering some command, and the normal chatter between the cluster
servers and the clients on the workers.
But in the k8s cluster, I&rsquo;ve got a number of operators running which all use
the k8s API, and hence need to make apiserver requests and ultimately etcd requests.
Just off the top of my head:</p>
<ul>
<li>The Prometheus operator, probably running at least a watch on a number of resources</li>
<li>The Cilium operator and the Cilium per-node Pods, which definitely contribute
to the load</li>
<li>The Rook operator, which needs to keep track of all the Ceph daemon deployments
as well as PersistentVolumeClaims</li>
<li>Traefik, which has to keep tabs on Ingresses as well as its own resources</li>
<li>External DNS and external secrets</li>
<li>My own backup operator</li>
<li>CloudNativePG, again with a number of deployments and own CRDs it needs to keep
an eye on</li>
</ul>
<p>I believe that all of these taken together put quite some load on the apiserver,
and hence on etcd. And that in turn might be too much for the USB-attached SSDs
on my control plane nodes. In contrast, the Nomad/Consul servers did not get
this many requests all the time.</p>
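<p>One way to check that theory would be the apiserver&rsquo;s own request metrics,
with something along these lines:</p>
<pre tabindex="0"><code># Dump the raw apiserver metrics and look at the request totals per resource
kubectl get --raw /metrics | grep '^apiserver_request_total' | head
</code></pre>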
<h2 id="going-incremental">Going incremental</h2>
<p>The decision to do the migration slowly, with some extra capacity to run the
two clusters side by side was an unquestionable positive. Sure, it cost a bit
more due to the increased electricity consumption, but I think it was worth it.</p>
<p>Going incrementally mostly afforded me one thing: The ability to do things
properly right from the start. It allowed me time to start with the Rook cluster,
instead of first migrating to k8s and then migrating the baremetal Ceph cluster
to Rook. It left me the time to write extensive notes and to write blog post
on any interesting pieces of the migration.</p>
<p>In addition, the experimental phase I did before even starting the migration
was also a good idea in hindsight. It allowed me to get some basic setup going,
especially exploring <a href="https://github.com/helmfile/helmfile">Helmfile</a>. I promise,
I will be writing a post about that at some point as well. &#x1f642;
One thing though: I wish I had dug a bit deeper into the backups. I did have the
backup setup on the agenda, but for some reason I saw the CronJob and decided
that it did everything I needed. I only realized that it didn&rsquo;t when I actually
got to the implementation of the k8s backups. It would
have been nicer to write the backup operator up front, instead of in the middle
of the migration.
Because running two workload clusters - Nomad/Consul and baremetal Ceph/Rook Ceph - was not actually that much fun.</p>
<h2 id="advantages-gained">Advantages gained</h2>
<p>I&rsquo;ve gained quite some advantages for my Homelab from the migration to k8s,
apart from the original goal of moving away from HashiCorp&rsquo;s tooling.
The first thing I&rsquo;d like to mention is how much I enjoy Kubernetes as &ldquo;platform&rdquo;.
I&rsquo;ve now got a lot more things running on a common platform - Kubernetes - than
I had before. My individual hosts contain a lot less configuration. It&rsquo;s basically
just the kubelet now, where before I needed Nomad and Consul agents which needed
to be manually configured, including generating tokens for each individual host.</p>
<p>In that same vein, I also like the fact that both Vault and Ceph are now running
in Kubernetes instead of individually. Don&rsquo;t get me wrong, it doesn&rsquo;t reduce
the maintenance for both that much, but I still got to remove quite some Ansible
code.</p>
<p>Another big one was virtual IPs. With my Nomad cluster, I had an &ldquo;Ingress&rdquo; host
running things like FluentD and Traefik that machines from outside the cluster
needed to access. And that host was fixed: it had all the firewall rules configured
and so on. When that host was down, access to my Homelab services was down. But
back then, I didn&rsquo;t see any other way. Although I could probably have done
something with e.g. HAproxy or the like?
But with my k8s cluster, I no longer have that problem. I&rsquo;m using Cilium&rsquo;s
BGP LoadBalancer functionality to provide routes to my different services with
a virtual IP. So for example my Traefik ingress can now be deployed wherever,
and Cilium would update the routes when the host changed.</p>
<p>Another one in the &ldquo;quite nice&rdquo; category is that I finally got rid of Docker
in my Homelab. The daemon was just annoying me from time to time. For example
there was a memory leak in the FluentD logging driver for several months a couple
of years ago. I&rsquo;m now running cri-o as the CRI for
Kubernetes, and it just feels a lot better. One of the big advantages is that I
can configure pull-through caches not just for DockerHub, but any registry,
without having to muck around with image locations in manifests or Helm charts.</p>
<p>And the final advantage is that I&rsquo;ve now got more things which I can control
with versioned code. This is especially visible in Ceph. Here, I can now create
S3 buckets via the ObjectBucketClaim instead of doing it manually on the command
line. The same goes for example for Ceph users or even CephFS volumes. And
the Rook team is continually improving the Ceph API support too, for example
with the addition of bucket policies for the ObjectBucketClaim.</p>
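<p>For a taste of what that looks like, a bucket claim is just a small manifest.
Name and storage class here are made up:</p>
<pre tabindex="0"><code>apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: backup-bucket
spec:
  generateBucketName: backup-bucket
  storageClassName: ceph-bucket
</code></pre>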
<h2 id="conclusion">Conclusion</h2>
<p>I had fun. That&rsquo;s really all there is to it, in the end, right? The best decision
of the entire migration was to make it so I could do it incrementally. I never
had any longer downtimes of any of the services in the Homelab I rely on. That
in turn meant that I could do it at my own pace. If I didn&rsquo;t feel like homelabbing
on a weekend, I didn&rsquo;t need to. The Homelab was always in a stable state I could
leave it at. It was interesting to dive into this new (to me) technology and
kick the tires, and I like what I ended up with.</p>
<p>The only two things which could have gone better were the backup situation for one,
and the performance/stability problems with the control plane for another.
It would have been more comfortable to have implemented the backup operator at
the beginning, instead of interrupting the migration for a couple of months.</p>
<p>So what&rsquo;s next? I will be starting another blog post right after this one where
I detail some of the larger ideas I&rsquo;ve got in mind. It would bloat this post a
bit too much to detail them here.</p>
<p>But short-term, I will work on replacing my control plane nodes with Pi 5 with
NVMe SSDs to hopefully fix the instability issues they&rsquo;re currently suffering
from. The last piece of hardware I was waiting for arrived today, and I will
likely get to it next week, as there&rsquo;s another long weekend in Germany. And then
I will get stuck into all the small and medium sized tasks that I&rsquo;ve been
postponing for the past 1.5 years. For example migrating to Forgejo from Gitea,
adding SSO support to some more services, cleaning up my log parsing and adding
some more services.</p>
<p>Finally, I&rsquo;ve greatly enjoyed accompanying the migration with this series of
blog posts. One thing I&rsquo;ve learned is that it is easier and more fun to write
a post about something when doing it right after the thing is done, instead of
putting the post on an ever-growing pile of posts to write at some point in
the future.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 25: Control Plane Migration</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-25-controller-migration/</link>
      <pubDate>Wed, 09 Apr 2025 23:47:45 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-25-controller-migration/</guid>
      <description>Migrating my control plane to my Pi 4 hosts.</description>
      <content:encoded><![CDATA[<p>Wherein I migrate my control plane to the Raspberry Pi 4 nodes it is intended
to run on.</p>
<p>This is part 26 of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>This one did not remotely go as well as I thought. Initially, I wasn&rsquo;t even
sure that this was going to be worth a blog post. But my own impatience and
the slowly aging Pi 4 conspired to ensure I&rsquo;ve got something to write about.</p>
<p>But let&rsquo;s start with where we are. This is very likely the penultimate post
of this series. By the time I&rsquo;m writing this, the migration is done:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>task proj:homelab.k8s.migration stats
</span></span><span style="display:flex;"><span>Category                   Data
</span></span><span style="display:flex;"><span>Pending                    <span style="color:#ae81ff">8</span>
</span></span><span style="display:flex;"><span>Waiting                    <span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>Recurring                  <span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>Completed                  <span style="color:#ae81ff">808</span>
</span></span><span style="display:flex;"><span>Deleted                    <span style="color:#ae81ff">48</span>
</span></span><span style="display:flex;"><span>Total                      <span style="color:#ae81ff">864</span>
</span></span></code></pre></div><p>Those last eight remaining tasks are just some cleanup. But at the beginning
of the weekend, I still had one major thing to do: Migrating my control plane
from the three virtual machines it has been living on for over a year to the
three Raspberry Pi 4 4GB with attached SATA SSDs which have been serving as my
control plane before.</p>
<p>Control plane here means the k8s control plane, consisting of etcd, the
kube-apiserver, the kube-controller-manager and the kube-scheduler. In addition,
I had kube-vip running to provide a virtual IP for the k8s API. And the MONs of
my Rook Ceph cluster were running on there as well. And finally my Vault instances
are also assigned to those nodes.</p>
<p>While the kube control plane components probably don&rsquo;t need any explanation,
the other pieces do. Let&rsquo;s start with the Ceph MONs. Why put them here, instead
of on the Ceph nodes themselves? Mostly habit; it was the setup I had previously.
Originally born from the thought that I might be running my Ceph nodes on Pi 4
as well. And on those hosts, memory would have been at a premium. I ended up not
going with that idea, but I still liked the thought of having control plane
nodes which run the servers/controller components of all my major services.
In the Nomad cluster setup, these nodes were running the Consul, Vault and
Nomad servers as well as the Ceph MONs. I liked that setup and decided to keep
it for the k8s setup. I couldn&rsquo;t run the MONs on any worker nodes, because none
of those have local storage. They all have their root disks on Ceph RBDs, which
means they could only run the MONs for that same Ceph cluster until the first
time they all went down at the same time. &#x1f609;</p>
<p>The reason for running Vault on the control plane nodes is one of convenience.
I&rsquo;ve got some automation for regular node updates. But my Vault instances need
manual unsealing. This means that after the reboot as part of the regular update,
I would need to manually unseal the instance on the host which was just updated.
This is fine in the current setup - the controllers are the first nodes to be
updated anyway, so I just need to pay attention right at the beginning of the
node update playbook. And after those nodes have been restarted and their Vault
instances have been unsealed, I can go and do something else.</p>
<p>So I needed to migrate the kube control plane and the MONs over to the Pis. I
would need to do the following steps:</p>
<ol>
<li>Setup Kubernetes on the three Pi 4</li>
<li>Join the three Pi 4 to the kube control plane</li>
<li>Add MONs on the three new nodes, for a total of 6 Ceph MONs</li>
<li>Add the new MONs to the MON lists and reboot everything</li>
<li>Remove the old control plane nodes</li>
</ol>
<p>The most complicated step here was the MON migration. That&rsquo;s due to the fact
that the MONs are generally configured via their IPs, so I had to change some
configuration. Specifically the configs outside the k8s cluster needed manual
adaptation, and the most important config here was the MON list used by my
netbooting hosts to get their root disks. Just to make sure everything was okay,
I needed to reboot all netbooting hosts in my Homelab.</p>
<p>In preparation for the move, I pinned the Ceph MON deployments to the existing
control plane nodes, to make sure that they would only be migrated when I told them
to migrate:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">cephClusterSpec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">placement</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mon</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">nodeAffinity</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">requiredDuringSchedulingIgnoredDuringExecution</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">nodeSelectorTerms</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>                - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/role&#34;</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;controller&#34;</span>
</span></span><span style="display:flex;"><span>                - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;kubernetes.io/hostname&#34;</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;oldcp1&#34;</span>
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;oldcp2&#34;</span>
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;oldcp3&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tolerations</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">effect</span>: <span style="color:#ae81ff">NoSchedule</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">node-role.kubernetes.io/control-plane</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">Exists</span>
</span></span></code></pre></div><h2 id="migrating-the-k8s-control-plane">Migrating the k8s control plane</h2>
<p>This was reasonably easy to accomplish. I just needed to join the three Pi 4
into the cluster as control plane nodes.</p>
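<p>Just to sketch what that looks like: assuming a kubeadm-based cluster, joining
an additional control plane node boils down to something like this (the endpoint,
token, hash and certificate key are all placeholders):</p>
<pre tabindex="0"><code># on an existing control plane node: re-upload the control plane certs
# and print a fresh join command
sudo kubeadm init phase upload-certs --upload-certs
sudo kubeadm token create --print-join-command

# on the new Pi, combining the outputs of the two commands above
sudo kubeadm join &lt;api-endpoint&gt;:6443 --token &lt;token&gt; \
    --discovery-token-ca-cert-hash sha256:&lt;hash&gt; \
    --control-plane --certificate-key &lt;cert-key&gt;
</code></pre>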
<p>But here I hit my first stumbling block. While most components - nominally
including the Cilium Pod - came up, the Fluentbit Pod for log collection did
not. Instead, on both hosts, it showed errors like these:</p>
<pre tabindex="0"><code>[2025/04/10 22:16:59] [ info] [filter:kubernetes:kubernetes.0] local POD info OK
[2025/04/10 22:16:59] [ info] [filter:kubernetes:kubernetes.0] testing connectivity with API server...
[2025/04/10 22:17:09] [error] [net] connection #60 timeout after 10 seconds to: kubernetes.default.svc:443
[2025/04/10 22:17:09] [error] [filter:kubernetes:kubernetes.0] kube api upstream connection error
[2025/04/10 22:17:09] [ warn] [filter:kubernetes:kubernetes.0] could not get meta for POD fluentbit-fluent-bit-ls6tt
</code></pre><p>After some fruitless research, I found this line in the logs of the Cilium Pods
of the new control plane hosts:</p>
<pre tabindex="0"><code>Failed to initialize datapath, retrying later&#34; module=agent.datapath.orchestrator error=&#34;failed to delete xfrm policies on node configuration changed: protocol not supported&#34; retry-delay=10s
</code></pre><p>This brought me to the <a href="https://docs.cilium.io/en/stable/operations/system_requirements/#ubuntu-22-04-on-raspberry-pi">Cilium system requirements docs</a>.
And there it states pretty clearly that for the exact Ubuntu version I&rsquo;m running,
an additional package with kernel modules is needed. I hadn&rsquo;t had any issues
with my Pi 4 worker nodes before, though. That was because those already had
the <code>linux-modules-extra-raspi</code> package installed, as that&rsquo;s needed for Ceph
support, and all of my worker nodes use Ceph RBDs for their root disks.
But the controller nodes never needed that, due to having local storage.</p>
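<p>The fix itself was quick. A sketch of what needs to happen on each new controller
(the Cilium Pod label is the chart default, assumed here):</p>
<pre tabindex="0"><code>sudo apt install linux-modules-extra-raspi
sudo reboot
# afterwards, bounce the Cilium Pod on the affected node so it retries
# the datapath initialization with the modules available
kubectl -n kube-system delete pod -l k8s-app=cilium \
    --field-selector spec.nodeName=&lt;new-node&gt;
</code></pre>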
<p>After installing the additional package, the new nodes worked properly. What I
found a bit disappointing was that the Cilium Pods did not show any indication
that anything was wrong, besides that single log line I showed above.</p>
<p>Another interesting sign that something was wrong was that I saw entries like
these in my firewall logs:</p>
<pre tabindex="0"><code>HomelabInterface		2025-04-11T00:34:59	300.300.300.4:39696	310.310.17.198:4240	tcp	Block all local access
HomelabInterface		2025-04-11T00:34:54	300.300.300.5:42022	310.310.19.209:2020	tcp	Block all local access
</code></pre><p>Which is odd, because the <code>310.310.0.0/16</code> CIDR subnet is my Pod subnet, and
those packets should really never show up at my firewall.</p>
<p>With that, my k8s control plane was up and running without further issue.</p>
<h2 id="how-not-to-migrate-mons">How not to migrate MONs</h2>
<p>Do not follow the steps in this section. I will speculate a bit on what I did
wrong, but I do not have another cluster to migrate to confirm what the right
way would be.</p>
<p><strong>This section is a cautionary tale, not a guide.</strong></p>
<p>So let&rsquo;s set the table. At the beginning of this, I had three MON daemons running
on the three old control plane nodes. Everything was fine. I planned to start
with replacing two old MONs with two new ones, leaving the one old MON available
to the netbooting hosts with their old configuration.</p>
<p>So I started out with just replacing two of the old nodes with two new ones in
the placement config for the MONs in the Rook Ceph cluster <code>values.yaml</code> file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">cephClusterSpec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">placement</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mon</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">nodeAffinity</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">requiredDuringSchedulingIgnoredDuringExecution</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">nodeSelectorTerms</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>                - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/role&#34;</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;controller&#34;</span>
</span></span><span style="display:flex;"><span>                - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;kubernetes.io/hostname&#34;</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;oldcp1&#34;</span>
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;newcp1&#34;</span>
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;newcp2&#34;</span>
</span></span></code></pre></div><p>Deploying this did not work. It left the two MONs for <code>newcp1</code> and <code>newcp2</code>
in Pending state, because a single remaining old MON was too few. I then tried to
increase the number of MONs to five, with the three old nodes and the two new
ones. That brought my heart to a standstill with this message showing up in the
Rook operator&rsquo;s logs:</p>
<pre tabindex="0"><code>2025-04-12 10:31:16.256775 I | ceph-spec: ceph-object-store-user-controller: CephCluster &#34;k8s-rook&#34; found but skipping reconcile since ceph health is &amp;{Health:HEALTH_ERR Details:map[error:{Severity:Urgent Message:failed to get status. . [...]/src/mon/MonMap.h: In function &#39;void MonMap::add(const mon_i
nfo_t&amp;)&#39; thread 7f8ea6f2c640 time 2025-04-12T10:29:53.668780+0000
[...]/src/mon/MonMap.h: 221: FAILED ceph_assert(addr_mons.count(a) == 0)
</code></pre><p>Luckily for me, the operator checks the quorum before removing too many MONs,
and so the cluster was not broken. I went back to my original
config, with three MONs placed on the three old control plane nodes. That alone
did not bring back the cluster - it was still showing the above error. So I also
edited the <code>rook-ceph-mon-endpoints</code> ConfigMap in the cluster namespace.
Its <code>data</code> key looks something like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">data</span>: <span style="color:#ae81ff">m=300.300.300.1:6789,k=300.300.300.2:6789,l=300.300.300.3:6789,n=300.300.300.4:6789,o=300.300.300.5:6789</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">mapping</span>: <span style="color:#e6db74">&#39;{&#34;node&#34;:{&#34;k&#34;:{&#34;Name&#34;:&#34;oldcp1&#34;,&#34;Hostname&#34;:&#34;oldcp1&#34;,&#34;Address&#34;:&#34;300.300.300.1&#34;},&#34;l&#34;:{&#34;Name&#34;:&#34;oldcp2&#34;,&#34;Hostname&#34;:&#34;oldcp2&#34;,&#34;Address&#34;:&#34;300.300.300.2&#34;},&#34;m&#34;:{&#34;Name&#34;:&#34;oldcp3&#34;,&#34;Hostname&#34;:&#34;oldcp3&#34;,&#34;Address&#34;:&#34;300.300.300.3&#34;},&#34;n&#34;:{&#34;Name&#34;:&#34;newcp1&#34;,&#34;Hostname&#34;:&#34;newcp1&#34;,&#34;Address&#34;:&#34;300.300.300.4&#34;},&#34;o&#34;:{&#34;Name&#34;:&#34;newcp2&#34;,&#34;Hostname&#34;:&#34;newcp1&#34;,&#34;Address&#34;:&#34;300.300.300.5&#34;}}}&#39;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">maxMonId</span>: <span style="color:#e6db74">&#34;12&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">outOfQuorum</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span></code></pre></div><p>This still had the new MONs in there, which did not work. After manually removing
the entries for MONs <code>n</code> and <code>o</code>, which were the new ones, and restarting the
operator, everything came up fine again with the original three MONs on the old
nodes.</p>
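<p>For the record, that manual cleanup looked roughly like this (the operator
namespace and Deployment name are the Rook defaults, assumed here - and whether
hand-editing this ConfigMap is ever a good idea is exactly what this section
calls into question):</p>
<pre tabindex="0"><code>kubectl -n rook-cluster edit configmap rook-ceph-mon-endpoints
# remove the n=... and o=... entries from .data.data and the matching
# &#34;n&#34; and &#34;o&#34; objects from .data.mapping, then restart the operator
kubectl -n rook-ceph rollout restart deployment rook-ceph-operator
</code></pre>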
<p>So, on to attempt number two. Here I decided to go all in and immediately add
all three new nodes, instead of just two. I had realized that I could replace
all three MON addresses in the hardcoded netboot configs right away with the
new MONs if I just went straight to six MONs, the three old and three new ones.
This would save me one reboot for all netbooting cluster nodes.</p>
<p>So then I configured six MONs, and instead of replacing MONs in the placement
config, I just added the three new ones, so it now looked like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">cephClusterSpec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">mon</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">count</span>: <span style="color:#ae81ff">3</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">placement</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mon</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">nodeAffinity</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">requiredDuringSchedulingIgnoredDuringExecution</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">nodeSelectorTerms</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>                - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/role&#34;</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;controller&#34;</span>
</span></span><span style="display:flex;"><span>                - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;kubernetes.io/hostname&#34;</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;oldcp1&#34;</span>
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;oldcp2&#34;</span>
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;oldcp3&#34;</span>
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;newcp1&#34;</span>
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;newcp2&#34;</span>
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;newcp3&#34;</span>
</span></span></code></pre></div><p>Applying this change worked without any issue whatsoever. The operator started
three new MON deployments, and they all came up without any problem.</p>
<p>I then changed the hardcoded MON IPs everywhere to the IPs of the three new
control plane nodes, and then rebooted the entire Homelab. Worked like a charm.</p>
<p>After that, the only thing remaining was to remove the old MONs. And here is
where the horror really started. I can&rsquo;t really put together what I did to create
this situation, so take the following paragraphs with a grain of salt. I hope
you can appreciate that I had other priorities than making good notes.</p>
<p>So I tried to get back to three MONs, now running on the Pi 4 controller
nodes, by setting the count back to three in the config and removing the three
<code>oldcp</code> nodes from the <code>nodeSelector</code>.</p>
<p>This seemed to lead the operator into an endless loop, because for some reason
it tried to stop one of the MON deployments on the new control plane nodes,
even though those were not supposed to be removed.</p>
<p>And here, I made my mistake. I got impatient and manually deleted the
k8s Deployments of the MONs I no longer needed. Or thought I no longer needed.
And when that did not really help, I edited the MON map ConfigMap again and
manually deleted the old MONs there as well.</p>
<p>The price for my impatience immediately showed up in the operator logs:</p>
<pre tabindex="0"><code>ceph-object-store-user-controller: CephCluster \&#34;k8s-rook\&#34; found but skipping reconcile since ceph health is &amp;{Health:HEALTH_ERR Details:map[error:{Severity:Urgent Message:failed to get status. . timed out: exit status 1}] [...]}
</code></pre><p>The operator&rsquo;s attempts to even just get the cluster status timed out. I
confirmed that by trying to run <code>ceph -s</code>, to no avail. There were still three
MONs running. But no quorum anymore. I had just nuked my storage cluster.</p>
<p>Or so I thought. Looking at the logs of the still running MONs, I saw this line:</p>
<pre tabindex="0"><code>e15 handle_auth_request failed to assign global_id
</code></pre><p>For some reason I can&rsquo;t explain, I thought this might have to do with the old
MONs still being configured somewhere. Searching the web did not really deliver
any results. But I ended up on <a href="https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/">this Ceph docs page</a>.
It showed me a way to get the MON map when the Ceph client doesn&rsquo;t work anymore:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl exec -it -n rook-cluster rook-ceph-mon-n-f498b8448-dw55m -- bash
</span></span><span style="display:flex;"><span>ceph-conf --name mon.n --show-config-value admin_socket
</span></span><span style="display:flex;"><span>/var/run/ceph/ceph-mon.n.asok
</span></span></code></pre></div><p>With that information, I could dump the MON status info:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph --admin-daemon /var/run/ceph/ceph-mon.k.asok mon_status
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span><span style="color:#e6db74">&#34;mons&#34;</span><span style="color:#960050;background-color:#1e0010">:</span> [
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;k&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;addr&#34;</span>: <span style="color:#e6db74">&#34;300.300.300.1:6789/0&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;public_addr&#34;</span>: <span style="color:#e6db74">&#34;300.300.300.1:6789/0&#34;</span>,
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;l&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;addr&#34;</span>: <span style="color:#e6db74">&#34;300.300.300.2:6789/0&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;public_addr&#34;</span>: <span style="color:#e6db74">&#34;300.300.2:6789/0&#34;</span>,
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;m&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;addr&#34;</span>: <span style="color:#e6db74">&#34;300.300.300.3:6789/0&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;public_addr&#34;</span>: <span style="color:#e6db74">&#34;300.300.300.3:6789/0&#34;</span>,
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;n&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;addr&#34;</span>: <span style="color:#e6db74">&#34;300.300.300.4:6789/0&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;public_addr&#34;</span>: <span style="color:#e6db74">&#34;300.300.300.4:6789/0&#34;</span>,
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;o&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;addr&#34;</span>: <span style="color:#e6db74">&#34;300.300.300.5:6789/0&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;public_addr&#34;</span>: <span style="color:#e6db74">&#34;300.300.300.5:6789/0&#34;</span>,
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;p&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;addr&#34;</span>: <span style="color:#e6db74">&#34;300.300.300.6:6789/0&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;public_addr&#34;</span>: <span style="color:#e6db74">&#34;300.300.300.6:6789/0&#34;</span>,
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>]
</span></span></code></pre></div><p>I&rsquo;ve removed a lot of additional information here, but the important part was:
Yes, the old MONs were still in the MON map. So how about updating the map so
it only contains the new MONs? Worth a try!</p>
<p>To do that, I needed to extract the actual MON map, in the correct format.
But that&rsquo;s, for some reason, only possible when the MON is stopped. And because
we&rsquo;re talking about Pods here, and not baremetal deployments, I couldn&rsquo;t just
stop the daemon and still have access to its container. So I ran the extraction
anyway and took a closer look at the error message:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph-mon -i n --extract-monmap /tmp/monmap
</span></span><span style="display:flex;"><span>2025-04-12T19:41:05.220+0000 ffffac4a5040 -1 rocksdb: IO error: While lock file: /var/lib/ceph/mon/ceph-n/store.db/LOCK: Resource temporarily unavailable
</span></span><span style="display:flex;"><span>2025-04-12T19:41:05.220+0000 ffffac4a5040 -1 error opening mon data directory at <span style="color:#e6db74">&#39;/var/lib/ceph/mon/ceph-n&#39;</span>: <span style="color:#f92672">(</span>22<span style="color:#f92672">)</span> Invalid argument
</span></span></code></pre></div><p>So, I thought: how much worse could it possibly get? I manually removed <code>/var/lib/ceph/mon/ceph-n/store.db/LOCK</code>
and ran the extraction again. The running MON didn&rsquo;t seem to care, and now I had the MON map.</p>
<p>As I noted above: This is a cautionary tale. Not a how-to.</p>
<p>After I had the MON map, I needed to remove the three old MONs. For that, I was
able to use the <a href="https://docs.ceph.com/en/quincy/man/8/monmaptool/">monmaptool</a>:</p>
<pre tabindex="0"><code>monmaptool --rm k /tmp/monmap
monmaptool --rm l /tmp/monmap
monmaptool --rm m /tmp/monmap
</code></pre><p>And then I just needed to inject the MON map again, which I could do while the
MON was running:</p>
<pre tabindex="0"><code>ceph-mon -i n --inject-monmap /tmp/monmap
</code></pre><p>And then, after a restart of this utterly frankenstein&rsquo;ed MON&hellip;it came back up.
And it didn&rsquo;t throw the auth error anymore. And then one of the other running
MONs also came up again. And then the state check errors stopped in the operator
logs. And my <code>ceph -s</code> worked again. Much rejoicing was had. So much rejoicing.</p>
<p>I deleted the deployment of the last MON, as it wasn&rsquo;t willing to come up again.
And then the operator redeployed it, and it came up fine again.</p>
<p>Then I retreated to my fainting couch and contemplated my own hubris, stupidity
and especially impatience.</p>
<p>But it was done. Ceph being the battle-tested piece of software it is, there were
zero issues afterwards. The OSDs stayed happy almost the entire time and didn&rsquo;t
even need a restart.</p>
<p>If we could bottle the elation and relief I felt when the first MON started
spewing its comfortably familiar log outputs again, we would have one hell of
a drug on our hands. Bottled euphoria, pretty much.</p>
<p>Restoring it all from a blank Ceph cluster would have been a hell of a lot of work.</p>
<h2 id="stability-problems">Stability problems</h2>
<p>So I now had my control plane running on three Raspberry Pi 4 4GB. I had tried
to make sure that those Pis would have enough resources by giving the VMs that
ran the control plane before only four cores and 4GB of RAM, to keep them at
least somewhat close to the Pis.</p>
<p>But that did not give me a realistic estimate of whether the Pis would be able
to run the control plane. On the morning after I had finished the migration,
I woke up to two of my Vault Pods requiring unsealing because they had been restarted.
After some searching, I thought I had found the culprit in these error messages:</p>
<pre tabindex="0"><code>2025-04-13 10:16:28.000 &#34;This node is becoming a follower within the cluster&#34;
2025-04-13 10:16:28.000 &#34;lost leadership, restarting kube-vip&#34;
2025-04-13 10:16:27.372 &#34;1 leaderelection.go:285] failed to renew lease kube-system/plndr-cp-lock: timed out waiting for the condition
2025-04-13 10:16:27.371 &#34;1 leaderelection.go:332] error retrieving resource lock kube-system/plndr-cp-lock: Get \&#34;https://kubernetes:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/plndr-cp-lock?timeout=10s\&#34;: context deadline exceeded (Client.Timeout exceeded while awaiting headers)&#34;
</code></pre><p>After some checking I realized that, for some reason, my Ansible playbook hadn&rsquo;t
deployed kube-vip to two of my three Pi control plane nodes.</p>
<p>But that, sadly, wasn&rsquo;t the real issue. In the following days, I regularly came
home to find one or more of my Vault Pods having been restarted during the night.</p>
<p>After some intense log reading, I think I identified the problem: regular
timeouts in etcd, Kubernetes&rsquo; distributed database, which holds the cluster&rsquo;s
state. I very regularly get spurious leader elections where the three nodes
can&rsquo;t even agree on what term it is. That ultimately leads to timed-out
requests from the kube-apiserver and restarts of kube-apiserver, kube-controller-manager
and kube-scheduler. This isn&rsquo;t really that bad - they seem perfectly able
to come up again. But still.</p>
<p>It&rsquo;s also not a permanent situation. Yesterday, for example, I didn&rsquo;t have any
spurious restarts on any host for over 31 hours.
Here&rsquo;s an example of kube-apiserver complaining:</p>
<pre tabindex="0"><code>2025-04-17 11:18:28.292 status.go:71] apiserver received an error that is not an metav1.Status: &amp;errors.errorString{s:\&#34;http: Handler timeout\&#34;}: http: Handler timeout
2025-04-17 11:18:28.292 writers.go:122] apiserver was unable to write a JSON response: http: Handler timeout
2025-04-17 11:18:28.291 status.go:71] apiserver received an error that is not an metav1.Status: context.deadlineExceededError{}: context deadline exceeded
</code></pre><p>And then there&rsquo;s etcd:</p>
<pre tabindex="0"><code>2025-04-17 11:18:30.903	slow fdatasync took=1.59687293s expected-duration=1s
2025-04-17 11:18:30.771	request stats start time=2025-04-17T09:18:28.771395Z time spent=2.000537375s remote=127.0.0.1:58000 response type=/etcdserverpb.KV/Range request count=0 request size=18 response count=0 response size=0 request content=&#34;key:\&#34;/registry/health\&#34;&#34;
2025-04-17 11:18:30.771	duration=2.000309284s start=2025-04-17T09:18:28.771470Z end=2025-04-17T09:18:30.771780Z steps=&#34;[\&#34;trace[1219274112] &#39;agreement among raft nodes before linearized reading&#39;  (duration: 2.00005686s)\&#34;]&#34; step_count=1
2025-04-17 11:18:30.771	apply request took too long took=2.000065119s expected-duration=100ms prefix=&#34;read-only range &#34; request=&#34;key:\&#34;/registry/health\&#34; &#34; response= error=&#34;context canceled&#34;
2025-04-17 11:18:29.154	timed out sending read state timeout=1s
2025-04-17 11:18:28.750	request stats start time=2025-04-17T09:18:26.749079Z time spent=2.000959149s remote=127.0.0.1:57984 response type=/etcdserverpb.KV/Range request count=0 request size=18 response count=0 response size=0 request content=&#34;key:\&#34;/registry/health\&#34; &#34;
2025-04-17 11:18:28.749	duration=2.000699837s start=2025-04-17T09:18:26.749169Z end=2025-04-17T09:18:28.749869Z steps=&#34;[\&#34;trace[1722676343] &#39;agreement among raft nodes before linearized reading&#39;  (duration: 2.00044495s)\&#34;]&#34; step_count=1
2025-04-17 11:18:28.749	apply request took too long took=2.000456339s expected-duration=100ms prefix=&#34;read-only range &#34; request=&#34;key:\&#34;/registry/health\&#34; &#34; response= error=&#34;context canceled&#34;
</code></pre><p>One really weird thing to note: The issues seem to always come at xx:18? The hour
at which they happen varies, but it&rsquo;s always around 18 minutes past the hour.</p>
<p>Just to illustrate how bad it sometimes gets, here are two attempts at getting
node agreement before linearized reads, taking 18 and 19 seconds:</p>
<pre tabindex="0"><code>2025-04-17 11:18:32.278 duration=19.273386011s start=2025-04-17T09:18:13.004696Z end=2025-04-17T09:18:32.278082Z steps=&#34;[\&#34;trace[416778461] &#39;agreement among raft nodes before linearized reading&#39;  (duration: 19.242077797s)\&#34;]&#34; step_count=1
2025-04-17 11:18:32.278 duration=18.583085717s start=2025-04-17T09:18:13.694856Z end=2025-04-17T09:18:32.277941Z steps=&#34;[\&#34;trace[434574392] &#39;agreement among raft nodes before linearized reading&#39;  (duration: 18.548892027s)\&#34;]&#34; step_count=1
</code></pre><p>On the positive side: The cluster itself didn&rsquo;t really seem fazed by the restarts.
It just trucked along. The only reason I had a problem with Vault was that I
like having the unseal key stored only in my password manager, and not
automatically accessible.</p>
<p>But this is obviously not a permanent state. I have already found that running
<code>journalctl -ef</code> on one of the controller nodes pretty reliably brings down
at least one of the kube components. Updating a more complex Helm chart like
<a href="https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack">kube-prometheus-stack</a>
also does the trick pretty reliably.</p>
<p>For once, I&rsquo;m decidedly not looking forward to the service updates I&rsquo;ve got
scheduled for tomorrow morning. Let&rsquo;s see how that goes.</p>
<p>But the remediation has already been put into action: I&rsquo;ve ordered three
Raspberry Pi 5 8GB plus 500 GB NVMe SSDs and NVMe HATs for the Pis. I&rsquo;m assuming
that those will cope a hell of a lot better with the I/O load and tight latency
tolerances of the Kubernetes control plane.</p>
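<p>One way to quantify the difference before and after the swap is the fio-based
fsync latency check recommended in the etcd documentation; a sketch with the
commonly recommended job parameters (the target directory is whatever disk etcd
writes to):</p>
<pre tabindex="0"><code># etcd wants the 99th percentile of fdatasync to stay below roughly 10ms
fio --rw=write --ioengine=sync --fdatasync=1 --directory=/var/lib/etcd \
    --size=22m --bs=2300 --name=etcd-disk-check
</code></pre>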
<h2 id="final-thoughts">Final thoughts</h2>
<p>On Saturday night, after I had taken down the server which provided me with
some additional capacity during the migration, I felt most excellent. I was
starting to consider which Homelab project I would tackle next. There are so
many to choose from. I was rather disappointed when I was greeted by the downed
Vault Pods on Sunday morning.</p>
<p>But not to whine too much - this gave me an excellent reason to get started on
the Pi 5. Plus, I also bought a 16 GB Pi 5 and a 1TB SSD. Those will serve me
well for some future ideas I&rsquo;ve got.</p>
<p>Plus, even though it&rsquo;s currently a bit unstable: I&rsquo;m done. &#x1f973;
The migration is done. I&rsquo;m now the proud owner of a Kubernetes cluster.</p>
<p>There is one more post to come in this series,
with my final thoughts and stats on the migration. But first, I want to migrate
the CP nodes to the Pi 5. But before that, I definitely want to upgrade the OS
in the Homelab from Ubuntu 22.04 to 24.04, because that&rsquo;s the first one with Pi 5
support, and I don&rsquo;t want to have multiple Ubuntu versions in the Homelab.</p>
<p>Now please excuse me while I go sharpen my Yak shaver.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Securing K8s Credentials</title>
      <link>https://blog.mei-home.net/posts/securing-k8s-credentials/</link>
      <pubDate>Mon, 07 Apr 2025 23:50:43 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/securing-k8s-credentials/</guid>
      <description>Using pass and gpg-agent to secure kubectl access credentials</description>
      <content:encoded><![CDATA[<p>Wherein I will explain how to use pass and GnuPG to secure k8s credentials.</p>
<p>Since I migrated my <a href="https://www.vaultproject.io/">HashiCorp Vault</a> instance
into my Kubernetes cluster, I started to feel a bit uncomfortable with the
Kubernetes access credentials just sitting in the <code>~/.kube/config</code> file in
plain text. Anyone who somehow gets access to my Command &amp; Control host would
be able to access them and do whatever they like with the Kubernetes cluster,
including the Vault deployment containing a lot of my secrets.</p>
<p>So I asked around on the Fediverse, and <a href="https://microblog.shivering-isles.com/@sheogorath">Sheogorath@shivering-isles.com</a>
came back with two interesting blog posts. <a href="https://shivering-isles.com/2024/11/kubernetes-oidc-keycloak">The first one</a>,
using OIDC, was interesting, but it would require some additional infrastructure
that would need to be up whenever I wanted to do something in Kubernetes. Which
would have also meant that I couldn&rsquo;t run that infrastructure in Kubernetes
itself.</p>
<p>But the <a href="https://shivering-isles.com/2022/03/store-kubernetes-credentials-pass">second post</a>
was very interesting, showing how to use <a href="https://www.passwordstore.org/">pass</a>
to store the k8s credentials.</p>
<p>I&rsquo;m already using pass as my password manager on my desktop and phone, so this
sounded like an excellent idea.</p>
<p>In short, pass is a pretty simple bash script which uses <a href="https://gnupg.org/">GnuPG</a>
to encrypt and decrypt files containing passwords, or really any data at all,
sitting in my home directory. The initial setup is a little bit
more involved due to needing GnuPG keys, but afterwards it&rsquo;s pretty easy to
use. Its main interface is a command line script for entering new passwords,
showing existing ones, and moving passwords around.
But there&rsquo;s also an Android app and a Firefox browser extension which both
work very nicely.</p>
<p>There was only one problem: I didn&rsquo;t want to set up a whole different set of
GnuPG keys to use on my Command &amp; Control host. After some searching, I figured
out that <a href="https://www.gnupg.org/documentation/manuals/gnupg/Invoking-GPG_002dAGENT.html">gpg-agent</a>
has some forwarding options, similar to ssh-agent. And I already had gpg-agent
running on my desktop.</p>
<p>Using a remote gpg-agent for access to the secret key also has an additional
advantage: Even if an attacker can get into my Command &amp; Control server, the
key necessary to decrypt the Kubernetes credentials is not physically present
on the machine. One more hurdle for an attacker to overcome.</p>
<h2 id="setting-up-gnupg-on-the-command--control-machine">Setting up GnuPG on the Command &amp; Control machine</h2>
<p>The first thing to do is to import the public half of the key that will
later be used by pass to encrypt the Kubernetes credentials.
Note that only the public key is needed here - the private key stays on the
original machine, in my case my desktop computer.</p>
<p>First, list the keys on the original host:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>gpg --list-public-keys
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>pub   rsa4096 2022-06-23 <span style="color:#f92672">[</span>SC<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>      3BBC8F8D9E7CB515338C6F0B34BBBD3D676F000F
</span></span><span style="display:flex;"><span>uid        <span style="color:#f92672">[</span> ultimativ <span style="color:#f92672">]</span> Foo Bar &lt;mail@example.com&gt;
</span></span><span style="display:flex;"><span>uid        <span style="color:#f92672">[</span> ultimativ <span style="color:#f92672">]</span> Baz Bar <span style="color:#f92672">(</span>Private<span style="color:#f92672">)</span> &lt;mail2@example.com&gt;
</span></span><span style="display:flex;"><span>sub   rsa4096 2022-06-23 <span style="color:#f92672">[</span>E<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">[</span>...<span style="color:#f92672">]</span>
</span></span></code></pre></div><p>In this output, the important part is the keyhash in the line after the <code>pub</code>
line: <code>3BBC8F8D9E7CB515338C6F0B34BBBD3D676F000F</code>.
That&rsquo;s the identifier for the key.</p>
<p>Next, I needed to transfer the public key over to my Command &amp; Control host:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>gpg --export 3BBC8F8D9E7CB515338C6F0B34BBBD3D676F000F | ssh myuser@candchost gpg --import
</span></span></code></pre></div><p>With that done, I could go ahead and set up the GnuPG agent forwarding. I followed
<a href="https://wiki.gnupg.org/AgentForwarding">this documentation</a> and did not have
any issues.</p>
<p>In short, I added these lines to the SSHD server configuration on the <code>candchost</code>:</p>
<pre tabindex="0"><code>Match User myuser
  StreamLocalBindUnlink yes
</code></pre><p>In addition, I also had to add these lines to my own SSH config for my user on
my desktop from where I&rsquo;m accessing the Command &amp; Control host, at <code>~/.ssh/config</code>:</p>
<pre tabindex="0"><code>Host candchost
  RemoteForward  /run/user/1000/gnupg/S.gpg-agent /run/user/1000/gnupg/S.gpg-agent.extra
</code></pre><p>As the documentation notes, the following commands can be used to determine
the two socket paths. For the second path in the <code>RemoteForward</code> option, which is
the local (on my desktop) gpg-agent &ldquo;extra&rdquo; socket:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>gpgconf --list-dir agent-extra-socket
</span></span></code></pre></div><p>And then to get the socket on the <code>candchost</code>, for the first argument of <code>RemoteForward</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>gpgconf --list-dir agent-socket
</span></span></code></pre></div><p>This is just the path of the standard GnuPG socket on that host.</p>
<p>And that&rsquo;s all there was to it. When I reconnected to the <code>candchost</code> via SSH, I
was able to use gpg-agent and got access to my remote agent on my desktop.</p>
<p>One last thing to do was to trust the public key transferred to the <code>candchost</code>.
This is only possible after the forwarding has been configured, because I didn&rsquo;t
have, and don&rsquo;t need, a private key to do any trusting with on the <code>candchost</code>.</p>
<p>Trusting a key works like this:</p>
<pre tabindex="0"><code>gpg --edit-key 3BBC8F8D9E7CB515338C6F0B34BBBD3D676F000F
Secret key is available.

[...]

gpg&gt; trust
[...]

Please decide how far you trust this user to correctly verify other users&#39; keys
(by looking at passports, checking fingerprints from different sources, etc.)

  1 = I don&#39;t know or won&#39;t say
  2 = I do NOT trust
  3 = I trust marginally
  4 = I trust fully
  5 = I trust ultimately
  m = back to the main menu

Your decision? 5
Do you really want to set this key to ultimate trust? (y/N) y

[...]
Please note that the shown key validity is not necessarily correct
unless you restart the program.

gpg&gt; q
</code></pre><p>This procedure uses the private key from the gpg-agent, meaning the key from
my desktop system, which was a nice confirmation that the forwarding setup
worked.</p>
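<p>A quick end-to-end check is a round trip through encryption and decryption
with the same key; the decryption should trigger a pinentry popup on the desktop,
via the forwarded agent:</p>
<pre tabindex="0"><code>echo test | gpg --encrypt --recipient 3BBC8F8D9E7CB515338C6F0B34BBBD3D676F000F | gpg --decrypt
</code></pre>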
<h2 id="setup-pass">Setup pass</h2>
<p>The next step is to set up pass. First, install it:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>apt install --no-install-recommends --no-install-suggests pass
</span></span></code></pre></div><p>The <code>--no-install-suggests</code> and <code>--no-install-recommends</code> flags are very much
required, otherwise you&rsquo;re going to get pieces of X11 installed on an Ubuntu
system.</p>
<p>To initialize pass, the <code>init</code> command is used, with the public key&rsquo;s keyhash
as input:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>pass init 3BBC8F8D9E7CB515338C6F0B34BBBD3D676F000F
</span></span></code></pre></div><p>This creates the password store in the default location at <code>~/.password-store</code>.</p>
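<p>A quick smoke test (the entry name is arbitrary) confirms that pass, GnuPG and
the agent forwarding play together:</p>
<pre tabindex="0"><code>pass generate test/smoke 20   # creates and encrypts a random 20 character password
pass show test/smoke          # decrypts it, triggering pinentry on the desktop
pass rm test/smoke
</code></pre>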
<h2 id="setup-kubernetes">Setup Kubernetes</h2>
<p>Following Sheogorath&rsquo;s <a href="https://shivering-isles.com/2022/03/store-kubernetes-credentials-pass">blog post</a>,
I first extracted the keys from the Kube config file with these commands:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl config view --minify --raw --output <span style="color:#e6db74">&#39;jsonpath={..user.client-certificate-data}&#39;</span> | base64 -d | sed -e <span style="color:#e6db74">&#39;s/$/\\n/g&#39;</span> | tr -d <span style="color:#e6db74">&#39;\n&#39;</span> &gt; client-cert
</span></span><span style="display:flex;"><span>kubectl config view --minify --raw --output <span style="color:#e6db74">&#39;jsonpath={..user.client-key-data}&#39;</span> | base64 -d | sed -e <span style="color:#e6db74">&#39;s/$/\\n/g&#39;</span> | tr -d <span style="color:#e6db74">&#39;\n&#39;</span> &gt; client-key
</span></span></code></pre></div><p>Then I added the values to an <a href="https://kubernetes.io/docs/reference/config-api/client-authentication.v1beta1/#client-authentication-k8s-io-v1beta1-ExecCredential">ExecCredential</a>
I stored in pass by running this command first:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>pass edit k8s/credentials
</span></span></code></pre></div><p>This will open the editor set in the <code>EDITOR</code> environment variable. Then I pasted
this into it:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;apiVersion&#34;</span>: <span style="color:#e6db74">&#34;client.authentication.k8s.io/v1&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;kind&#34;</span>: <span style="color:#e6db74">&#34;ExecCredential&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;status&#34;</span>: {
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;clientCertificateData&#34;</span>: <span style="color:#e6db74">&#34;-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;clientKeyData&#34;</span>: <span style="color:#e6db74">&#34;-----BEGIN RSA PRIVATE KEY-----\n...\n-----END RSA PRIVATE KEY-----&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>I replaced the <code>clientCertificateData</code> with the content of the <code>client-cert</code>
file extracted with the previous command and the <code>clientKeyData</code> with the
content of the <code>client-key</code> file. Finally, the entire file content should be
squashed into a single line of text, and then the editor can be closed.</p>
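<p>If hand-squashing JSON sounds tedious: jq can do the same thing. A small sketch,
assuming the ExecCredential was first saved to a scratch file named <code>credentials.json</code>:</p>
<pre tabindex="0"><code>jq -c . credentials.json | pass insert -m k8s/credentials
</code></pre>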
<p>If everything worked as expected, pass has now stored that file content at
<code>~/.password-store/k8s/credentials</code>, encrypted with the public key given in the
<code>pass init</code> command. Try it out by running this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>pass show k8s/credentials
</span></span></code></pre></div><p>If you haven&rsquo;t run any commands which require decryption up to now, a popup should
appear from your pinentry program asking you to unlock your GnuPG private key.
This will even appear when you&rsquo;ve previously unlocked that same private key
for local use on your desktop machine, as GnuPG treats the local and remote
machine as two different instances, for security reasons.</p>
<p>The final step is to adapt the <code>~/.kube/config</code> file to use the credentials from
pass. For that, I opened the file and edited it to look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">clusters</span>:
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">cluster</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">certificate-authority-data</span>: <span style="color:#ae81ff">&lt;Cluster CA CERT&gt;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">server</span>: <span style="color:#ae81ff">https://k8s.example.com:6443</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-kube-cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">contexts</span>:
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">context</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cluster</span>: <span style="color:#ae81ff">my-kube-cluster</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">user</span>: <span style="color:#ae81ff">my-kube-user</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-kube-user@my-kube-cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">current-context</span>: <span style="color:#ae81ff">my-kube-user@my-kube-cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Config</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">preferences</span>: {}
</span></span><span style="display:flex;"><span><span style="color:#f92672">users</span>:
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-kube-user</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">user</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">exec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">client.authentication.k8s.io/v1</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">command</span>: <span style="color:#ae81ff">pass</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">show</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">k8s/credentials</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">interactiveMode</span>: <span style="color:#ae81ff">IfAvailable</span>
</span></span></code></pre></div><p>The only change necessary is in the <code>users</code> array, where the <code>user:</code> entry for
your user should be changed to contain the <code>exec</code> section shown, instead of the
<code>client-certificate-data</code> and <code>client-key-data</code> entries.</p>
<p>And with that, kubectl will execute the command <code>pass show k8s/credentials</code>
to access the credentials. And this doesn&rsquo;t just work for kubectl, but I&rsquo;ve also
tested it with the <a href="https://docs.ansible.com/ansible/latest/collections/kubernetes/core/docsite/kubernetes_scenarios/k8s_intro.html">Ansible k8s modules</a>.</p>
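<p>Any read-only command works as a final end-to-end test; the first call should
trigger pinentry once and then behave as usual:</p>
<pre tabindex="0"><code>kubectl get nodes
</code></pre>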
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 24: Migrating Vault to Kubernetes</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-24-vault/</link>
      <pubDate>Mon, 07 Apr 2025 20:41:41 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-24-vault/</guid>
      <description>Migrating my baremetal Vault to the Kubernetes cluster.</description>
      <content:encoded><![CDATA[<p>Wherein I migrate my HashiCorp Vault instance to the Kubernetes cluster.</p>
<p>This is part 24 of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>Look at all this Yak wool. That&rsquo;s how much it takes to migrate <a href="https://www.vaultproject.io/">Vault</a> from
baremetal to a Kubernetes deployment. I&rsquo;ve been going back and forth for quite
a while, trying to decide what to do with my Vault instance. It&rsquo;s the one piece
of HashiCorp software I do not currently plan to get rid of. But there was a
problem: My Vault, or rather the High Availability nature of it, relied on
HashiCorp&rsquo;s <a href="https://www.consul.io/">Consul</a> and its DNS service discovery
functionality. And while I did want to keep Vault, I did not want to keep
Consul. And I also didn&rsquo;t really want to introduce some other sort of method,
like <a href="https://www.haproxy.org/">HAProxy</a>.</p>
<p>In the end, I sat down and thought quite hard for quite a while, mostly
thinking about potential reasons for why I should not move Vault to the Kubernetes
cluster. My main worry is bootstrapping - what happens if my entire Homelab goes
down, unplanned, and all at once? Be it because I stumble over the absolutely
wrong cable, or because my oven develops a short again and throws the main fuse.
Could I still get my Homelab back up and do any massaging it might need?</p>
<p>I ended up deciding that Vault on Kubernetes should be fine. All Kubernetes
Secrets are synced into the cluster anyway, and any other secrets I might need
also live in my password manager. It should be fine. Watch this space for the
day I find out what I overlooked. &#x1f605;</p>
<p>And thus began the Yak shaving.</p>
<h2 id="vault">Vault</h2>
<p>But before we start onto that mountain of wool, let&rsquo;s take a short detour and
look at what Vault is and what I use it for. Brought down to the simplest terms,
HashiCorp&rsquo;s Vault is an API server for secrets of many, many different kinds.
It supports everything from simple key-value secrets to PKI certificates.
It can also serve short-lived tokens, including for HashiCorp&rsquo;s other products
like Consul or Nomad. I used it for a number of things over the years.</p>
<p>For me, the most important part is the <a href="https://developer.hashicorp.com/vault/docs/secrets/kv">KV store</a>.
It stores all manner of passwords, keys and certificates, like my public
cert. And it makes all of those available, given proper authorization, over HTTP.
I use secrets from this store in my Ansible playbooks, for the Mastodon secrets via
<a href="https://external-secrets.io/latest/">external-secrets</a> in my Kubernetes cluster,
and in my image generation setup for new hosts.
Support for it is very widespread as well. In HashiCorp&rsquo;s own tools of course,
but also in other tools like Ansible, where you shouldn&rsquo;t confuse it with
Ansible&rsquo;s own Vault secret store.</p>
<p>In the past, I also used the <a href="https://developer.hashicorp.com/vault/docs/secrets/nomad">Nomad secrets engine</a>
to get a short-lived token for Nomad API access for my backup solution.</p>
<p>Another big use case for me is as an internal, self-signed CA. During my Nomad/Vault/Consul
cluster days, this was pretty important functionality, because those self-signed
certs were used by all three components of my Homelab to secure their HTTPS
communication. I&rsquo;ve even gone to the length of installing the CA on all of my
devices, so I don&rsquo;t get any untrusted certificate warnings when accessing
services secured with that CA.
Since the introduction of Kubernetes, I&rsquo;m not using the Homelab CA quite as
much, but there are still a few internal things secured with it.</p>
<p>For a short while, I even considered using Vault as my OIDC identity provider,
but in the end I decided against it. My main reason was that I would
have needed to expose my internal secret store to the public internet, because
I intended to use OIDC for some public sites. And even though I&rsquo;ve got no reason
to distrust HashiCorp&rsquo;s security practices, and I could have made only certain
paths publicly accessible, it was a risk I didn&rsquo;t want to take.</p>
<p>So what does working with Vault actually look like? The main interface is the
Vault CLI executable. You can control anything you need from the command line.
But it also provides a WebUI, if that&rsquo;s more your cup of tea. I never bothered
with it.</p>
<p>The first step of working with Vault is to obtain a token for all further tasks.
For this, Vault offers <a href="https://developer.hashicorp.com/vault/docs/auth">a plethora</a>
of auth methods, ranging from the good old username/password to OIDC or TLS certs.
I&rsquo;m using the <a href="https://developer.hashicorp.com/vault/docs/auth/userpass">userpass</a>
method, which is just good old username+password. It&rsquo;s comfortable for me, I can
use my password manager and just copy+paste the password in. It looks something
like this:</p>
<pre tabindex="0"><code class="language-shel" data-lang="shel">vault login -method=userpass username=myuser
Password (will be hidden):
Success! You are now authenticated. The token information displayed below
is already stored in the token helper. You do NOT need to run &#34;vault login&#34;
again. Future Vault requests will automatically use this token.

Key                    Value
---                    -----
token                  hvs.CAESII0RlV4BS_5_A2q8mIpzYxiye0XoE-_Vvlb0YIAYfl-6Gh4KHGh2cy5sSmpvZk5QMXN2QW0wZ0c0R1A3cXV3TkQ
token_accessor         5ofJhWq55yZGOk6CJVRyBacd
token_duration         4h
token_renewable        true
token_policies         [&#34;admin&#34; &#34;default&#34;]
identity_policies      []
policies               [&#34;admin&#34; &#34;default&#34;]
token_meta_username    myuser
</code></pre><p>Don&rsquo;t worry, this token has long since expired. &#x1f642;
When you use <code>vault login</code>, Vault automatically puts the received token into a
file at <code>~/.vault-token</code>. The <code>vault</code> CLI, as well as other tools with
Vault integration, checks that path too.</p>
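<p>For tools that don&rsquo;t use the token helper, you can also pass the token along
by hand. A minimal sketch, with the Vault address being a placeholder:</p>
<pre tabindex="0"><code class="language-shell" data-lang="shell"># Point the CLI at the Vault instance
export VAULT_ADDR=&#34;https://vault.example.com:8200&#34;

# Read the token the token helper stored during &#34;vault login&#34;
export VAULT_TOKEN=&#34;$(cat ~/.vault-token)&#34;

# Inspect the token currently in use
vault token lookup
</code></pre>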
<p>As you&rsquo;d expect from a properly secured application, the tokens you&rsquo;re getting
have a restricted TTL. How long a token is initially valid can be configured,
in addition to enabling token renewal and defining an upper bound on how long
a token can live under any circumstances.</p>
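<p>Renewal itself is a one-liner, run against the token the CLI is currently
using. A quick sketch:</p>
<pre tabindex="0"><code class="language-shell" data-lang="shell"># Renew the locally stored token; this only works while the token
# is still renewable and below its maximum TTL
vault token renew

# The remaining lifetime shows up in the ttl field
vault token lookup
</code></pre>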
<p>Then there are also the policies. Those define what the holder of a token can
actually do with it. In this case, I have the <code>default</code> and <code>admin</code> policies.
The <code>default</code> policy mostly allows the holder to access information about the
token they&rsquo;re using, while <code>admin</code> is my admin policy, allowing full access to
Vault. It looks something like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">path</span> <span style="color:#e6db74">&#34;sys/health&#34;</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  capabilities <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;read&#34;, &#34;sudo&#34;</span>]
</span></span><span style="display:flex;"><span>}<span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"># Create and manage ACL policies broadly across Vault
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"># List existing policies
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">path</span> <span style="color:#e6db74">&#34;sys/policies/acl&#34;</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  capabilities <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;list&#34;</span>]
</span></span><span style="display:flex;"><span>}<span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"># Create and manage ACL policies
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">path</span> <span style="color:#e6db74">&#34;sys/policies/acl/*&#34;</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  capabilities <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;create&#34;, &#34;read&#34;, &#34;update&#34;, &#34;delete&#34;, &#34;list&#34;, &#34;sudo&#34;</span>]
</span></span><span style="display:flex;"><span>}<span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"># Enable and manage authentication methods broadly across Vault
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"># Manage auth methods broadly across Vault
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">path</span> <span style="color:#e6db74">&#34;auth/*&#34;</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  capabilities <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;create&#34;, &#34;read&#34;, &#34;update&#34;, &#34;delete&#34;, &#34;list&#34;, &#34;sudo&#34;</span>]
</span></span><span style="display:flex;"><span>}<span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"># Create, update, and delete auth methods
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">path</span> <span style="color:#e6db74">&#34;sys/auth/*&#34;</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  capabilities <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;create&#34;, &#34;update&#34;, &#34;delete&#34;, &#34;sudo&#34;</span>]
</span></span><span style="display:flex;"><span>}<span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"># List auth methods
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">path</span> <span style="color:#e6db74">&#34;sys/auth&#34;</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  capabilities <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;read&#34;</span>]
</span></span><span style="display:flex;"><span>}<span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"># Enable and manage the key/value secrets engine at `secret/` path
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"># List, create, update, and delete key/value secrets
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">path</span> <span style="color:#e6db74">&#34;secret/*&#34;</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  capabilities <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;create&#34;, &#34;read&#34;, &#34;update&#34;, &#34;delete&#34;, &#34;list&#34;, &#34;sudo&#34;</span>]
</span></span><span style="display:flex;"><span>}<span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"># Manage secrets engines
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">path</span> <span style="color:#e6db74">&#34;sys/mounts/*&#34;</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  capabilities <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;create&#34;, &#34;read&#34;, &#34;update&#34;, &#34;delete&#34;, &#34;list&#34;, &#34;sudo&#34;</span>]
</span></span><span style="display:flex;"><span>}<span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"># Manage secrets engines
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">path</span> <span style="color:#e6db74">&#34;sys/remount&#34;</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  capabilities <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;create&#34;, &#34;read&#34;, &#34;update&#34;, &#34;delete&#34;, &#34;list&#34;, &#34;sudo&#34;</span>]
</span></span><span style="display:flex;"><span>}<span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"># List existing secrets engines.
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">path</span> <span style="color:#e6db74">&#34;sys/mounts&#34;</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  capabilities <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;read&#34;</span>]
</span></span><span style="display:flex;"><span>}<span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"># Homenet Root CA access
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">path</span> <span style="color:#e6db74">&#34;homenet-ca*&#34;</span> {
</span></span><span style="display:flex;"><span>  capabilities <span style="color:#f92672">=</span> [ <span style="color:#e6db74">&#34;create&#34;, &#34;read&#34;, &#34;update&#34;, &#34;delete&#34;, &#34;list&#34;, &#34;sudo&#34;</span> ]
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Armed with this token, I can then for example take a look at my secrets:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>vault read secret/s3_users/blog
</span></span><span style="display:flex;"><span>Key                 Value
</span></span><span style="display:flex;"><span>---                 -----
</span></span><span style="display:flex;"><span>refresh_interval    768h
</span></span><span style="display:flex;"><span>access              abcde
</span></span><span style="display:flex;"><span>custom_metadata     map<span style="color:#f92672">[</span>managed-by:external-secrets<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>secret              <span style="color:#ae81ff">12345</span>
</span></span></code></pre></div><p>This is a pretty nice example, in fact. It shows that the <code>blog</code> secret consists
of two entries, <code>access</code> and <code>secret</code>, containing the standard S3 credentials.
But it also has <code>custom_metadata</code> indicating that it wasn&rsquo;t actually created
by me by hand, but was pushed into Vault via an external-secrets <a href="https://external-secrets.io/latest/api/pushsecret/">PushSecret</a>.
I&rsquo;m doing this because I need the S3 credentials for my blog both in an Ansible
playbook I use to configure S3 buckets and in the K8s cluster, because that&rsquo;s
where the bucket and credentials are created by Rook Ceph.</p>
<p>To put that same secret into Vault, the following command line could be used:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>vault kv put secret/s3_users/blog access<span style="color:#f92672">=</span>abcde secret<span style="color:#f92672">=</span><span style="color:#ae81ff">12345</span>
</span></span></code></pre></div><p>This would of course have the downside of putting the secret into the shell
history, unless a space is added at the front and your shell is configured to
ignore commands starting with one.
If you&rsquo;d prefer having Vault take the secret from stdin, you can run the same
command like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>vault kv put secret/s3_users/blog access<span style="color:#f92672">=</span>abcde secret<span style="color:#f92672">=</span>-
</span></span></code></pre></div><p>This will take the <code>access</code> value from the parameter, but read the value for
<code>secret</code> from stdin, which keeps it out of the shell history.
This approach has a downside too: only one key can use the <code>-</code> as its input.
If you have more than one actually secret parameter, you can instead put all
of them into a JSON file. I will demonstrate that later on when I migrate my
Vault content from my baremetal instance to the Kubernetes deployment.</p>
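<p>In brief, it looks something like this, with the file name and values being
placeholders:</p>
<pre tabindex="0"><code class="language-shell" data-lang="shell"># blog.json contains the key/value pairs of the secret:
#   { &#34;access&#34;: &#34;abcde&#34;, &#34;secret&#34;: &#34;12345&#34; }
# The @ prefix tells the CLI to read the data from the file
vault kv put secret/s3_users/blog @blog.json

# Do not leave the plaintext file lying around afterwards
rm blog.json
</code></pre>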
<p>If you want to use Vault values from within Ansible, I&rsquo;ve found the <a href="https://docs.ansible.com/ansible/latest/collections/community/hashi_vault/hashi_vault_lookup.html">Vault lookup</a>
pretty nice to use. It can be used like this to set a variable in a playbook:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">hosts</span>: <span style="color:#ae81ff">all</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Demonstration</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">demo</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">vars</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3_access</span>: <span style="color:#e6db74">&#34;{{ lookup(&#39;hashi_vault&#39;, &#39;secret=secret/s3_users/blog:access token=&#39;+vault_token+&#39; url=&#39;+vault_url) }}&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3_secret</span>: <span style="color:#e6db74">&#34;{{ lookup(&#39;hashi_vault&#39;, &#39;secret=secret/s3_users/blog:secret token=&#39;+vault_token+&#39; url=&#39;+vault_url) }}&#34;</span>
</span></span></code></pre></div><p>I&rsquo;m setting the <code>vault_token</code> with Ansible&rsquo;s <a href="https://docs.ansible.com/ansible/latest/collections/ansible/builtin/file_lookup.html">file lookup</a> like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">vault_token</span>: <span style="color:#e6db74">&#34;{{ lookup(&#39;file&#39;, &#39;/home/my_user/.vault-token&#39;) }}&#34;</span>
</span></span></code></pre></div><p>And because that file is updated on every <code>vault login</code>, the playbook always
gets the current token.</p>
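<p>Put together, a run of such a playbook looks something like this, with the
inventory and playbook names being placeholders:</p>
<pre tabindex="0"><code class="language-shell" data-lang="shell"># Refresh ~/.vault-token; the file lookup picks it up from there
vault login -method=userpass username=myuser

# The hashi_vault lookups then fetch the S3 credentials at runtime
ansible-playbook -i inventory.yml s3-buckets.yml
</code></pre>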
<p>I will go into a bit more detail about generating certificates later as part
of the Vault k8s setup.</p>
<h2 id="setting-up-the-helm-chart">Setting up the Helm chart</h2>
<p>Alright. Let the Yak shaving finally commence. First of all, it&rsquo;s notable that there is
no official way to migrate the content of one Vault instance to another. So I
had to set up a completely new instance of Vault on k8s, instead of
doing some sort of migration.</p>
<p>So the first step was to configure and deploy the <a href="https://github.com/hashicorp/vault-helm">official Helm chart</a>,
following <a href="https://developer.hashicorp.com/vault/tutorials/kubernetes/kubernetes-raft-deployment-guide">this guide</a>.</p>
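<p>The deployment itself is then just the usual Helm invocation. A sketch, with
the release name and namespace being my choices:</p>
<pre tabindex="0"><code class="language-shell" data-lang="shell">helm repo add hashicorp https://helm.releases.hashicorp.com
helm repo update

# values.yaml is the file shown below
helm install vault hashicorp/vault \
  --namespace vault --create-namespace -f values.yaml
</code></pre>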
<p>And here is the resulting <code>values.yaml</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">global</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tlsDisable</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">openshift</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serverTelemetry</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">prometheusOperator</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">injector</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">server</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">logLevel</span>: <span style="color:#ae81ff">debug</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">logFormat</span>: <span style="color:#ae81ff">json</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">500Mi</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">500m</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">limits</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">500Mi</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">readinessProbe</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;/v1/sys/health?standbyok=true&amp;sealedcode=204&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">livenessProbe</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;/v1/sys/health?standbyok=true&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">initialDelaySeconds</span>: <span style="color:#ae81ff">600</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tolerations</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">effect</span>: <span style="color:#ae81ff">NoSchedule</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">key</span>: <span style="color:#ae81ff">node-role.kubernetes.io/control-plane</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">Exists</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">nodeSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/role</span>: <span style="color:#e6db74">&#34;controller&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">networkPolicy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">priorityClassName</span>: <span style="color:#e6db74">&#34;system-cluster-critical&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">extraLabels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">vault</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">vault</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">service</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">active</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">standby</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#e6db74">&#34;LoadBalancer&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">externalTrafficPolicy</span>: <span style="color:#e6db74">&#34;Local&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">external-dns.alpha.kubernetes.io/hostname</span>: <span style="color:#ae81ff">newvault.example.com</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">io.cilium/lb-ipam-ips</span>: <span style="color:#ae81ff">300.300.300.12</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">includeConfigAnnotation</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dataStorage</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">size</span>: <span style="color:#e6db74">&#34;1Gi&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storageClass</span>: <span style="color:#ae81ff">rbd-fast</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">auditStorage</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dev</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">extraVolumes</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">type</span>: <span style="color:#ae81ff">secret</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">vault-tls-certs</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">extraEnvironmentVars</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">VAULT_CACERT</span>: <span style="color:#e6db74">&#34;/vault/userconfig/vault-tls-certs/issuing_ca&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">standalone</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ha</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">raft</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">setNodeId</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">config</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        cluster_name = &#34;vault-k8s&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        ui = false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        disable_mlock = false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        listener &#34;tcp&#34; {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          address = &#34;[::]:8200&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          cluster_address = &#34;[::]:8201&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          tls_cert_file = &#34;/vault/userconfig/vault-tls-certs/certificate&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          tls_key_file  = &#34;/vault/userconfig/vault-tls-certs/private_key&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        storage &#34;raft&#34; {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          path = &#34;/vault/data&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          retry_join {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            leader_api_addr = &#34;https://vault-0.vault-internal:8200&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            leader_ca_cert_file = &#34;/vault/userconfig/vault-tls-certs/issuing_ca&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          retry_join {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            leader_api_addr = &#34;https://vault-1.vault-internal:8200&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            leader_ca_cert_file = &#34;/vault/userconfig/vault-tls-certs/issuing_ca&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          retry_join {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            leader_api_addr = &#34;https://vault-2.vault-internal:8200&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            leader_ca_cert_file = &#34;/vault/userconfig/vault-tls-certs/issuing_ca&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        service_registration &#34;kubernetes&#34; {}</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">3</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">ui</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">csi</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">serverTelemetry</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serviceMonitor</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><p>Let me explain. I&rsquo;m disabling the Ingress because I will make Vault accessible
via a LoadBalancer instead. There&rsquo;s no need to push it through Traefik, which
would just mean one more service that needs to be up and running before Vault
is accessible.</p>
<p>Next, I&rsquo;m configuring the liveness probe. It needs to be reconfigured to make
sure that Vault also returns a <code>200</code> result when the pod being probed is in
standby. See also <a href="https://developer.hashicorp.com/vault/api-docs/system/health">these docs</a>.
And while setting Vault up, do yourself the favor and completely disable the
probes, or at least increase the <code>initialDelaySeconds</code>, to prevent restarts
while you&rsquo;re still in the process of initializing the cluster.</p>
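<p>While fiddling with the probes, it helps to query the health endpoint by hand.
A sketch; the status codes are the documented defaults, and <code>standbyok=true</code>
turns the standby&rsquo;s 429 into a 200:</p>
<pre tabindex="0"><code class="language-shell" data-lang="shell"># 200: unsealed and active    429: unsealed standby
# 501: not initialized        503: sealed
curl -sk -o /dev/null -w &#34;%{http_code}\n&#34; \
  &#34;https://newvault.example.com:8200/v1/sys/health?standbyok=true&#34;
</code></pre>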
<p>Next, I&rsquo;m adding a toleration for <code>control-plane</code> nodes. This is mostly because
those are the only nodes with local storage, so they will be up first.</p>
<p>And then we come to the first problem with the chart, the Service setup. In my
k8s cluster, I&rsquo;m using Cilium&rsquo;s BGP-based LoadBalancer support. And that requires
the Services Cilium looks at to have a specific, configurable label. But the
Vault Helm chart does not allow setting labels for the Services it creates.
Perhaps my use case is just really niche?
Anyway, I&rsquo;m enabling only the generic <code>vault</code> Service, setting it to LoadBalancer
and, importantly, setting the <code>externalTrafficPolicy</code> to <code>Local</code>. This means that
packets arriving for Vault will directly reach the node where the active Vault
Pod is running, instead of getting forwarded there by other nodes.
This is particularly important for Vault, because Vault can configure tokens
to be valid only when they&rsquo;re coming from certain IPs. That won&rsquo;t work when
the source IP looks like it&rsquo;s coming from another k8s node instead of the actual
source host.
I&rsquo;m also setting a fixed IP to assign to the LoadBalancer, so I can easily set a
few choice firewall rules for access to it.</p>
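<p>Both settings can be checked on the live Service afterwards, for example
like this:</p>
<pre tabindex="0"><code class="language-shell" data-lang="shell"># Should print &#34;Local&#34; plus the pinned LoadBalancer IP
kubectl -n vault get svc vault \
  -o jsonpath=&#39;{.spec.externalTrafficPolicy}{&#34; &#34;}{.status.loadBalancer.ingress[0].ip}{&#34;\n&#34;}&#39;
</code></pre>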
<p>After the Service configs follows the HA configuration. In this setup, there can
be multiple Vault servers, continuously exchanging data. When the currently active
server goes down, another one can take over, and the previous leader joins
the standby pool once it&rsquo;s back.
Note that this is a high availability setup, not a load balancing setup. When
a request reaches a standby server, it is forwarded to the current active server.
The standby server never answers any requests itself, besides those to the health
endpoint, of course.
In my case, the HA config mostly consists of a config snippet for the Vault
config file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span>cluster_name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;vault-k8s&#34;</span>
</span></span><span style="display:flex;"><span>ui <span style="color:#f92672">=</span> <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>disable_mlock <span style="color:#f92672">=</span> <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">listener</span> <span style="color:#e6db74">&#34;tcp&#34;</span> {
</span></span><span style="display:flex;"><span>  address <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;[::]:8200&#34;</span>
</span></span><span style="display:flex;"><span>  cluster_address <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;[::]:8201&#34;</span>
</span></span><span style="display:flex;"><span>  tls_cert_file <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/vault/userconfig/vault-tls-certs/certificate&#34;</span>
</span></span><span style="display:flex;"><span>  tls_key_file  <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/vault/userconfig/vault-tls-certs/private_key&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">storage</span> <span style="color:#e6db74">&#34;raft&#34;</span> {
</span></span><span style="display:flex;"><span>  path <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/vault/data&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">retry_join</span> {
</span></span><span style="display:flex;"><span>    leader_api_addr <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https://vault-0.vault-internal:8200&#34;</span>
</span></span><span style="display:flex;"><span>    leader_ca_cert_file <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/vault/userconfig/vault-tls-certs/issuing_ca&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">retry_join</span> {
</span></span><span style="display:flex;"><span>    leader_api_addr <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https://vault-1.vault-internal:8200&#34;</span>
</span></span><span style="display:flex;"><span>    leader_ca_cert_file <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/vault/userconfig/vault-tls-certs/issuing_ca&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">retry_join</span> {
</span></span><span style="display:flex;"><span>    leader_api_addr <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https://vault-2.vault-internal:8200&#34;</span>
</span></span><span style="display:flex;"><span>    leader_ca_cert_file <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/vault/userconfig/vault-tls-certs/issuing_ca&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">service_registration</span> <span style="color:#e6db74">&#34;kubernetes&#34;</span> {}
</span></span></code></pre></div><p>Most interesting here is the <code>retry_join</code> configuration, which needs to contain
the CA used to sign the TLS cert used in the <code>listener</code> stanza. I will explain
this more deeply in the next section, where I set up the cert generation.</p>
<p>Once that Helm chart was deployed, a couple of things went wrong, leading to some
beautiful Yak shaving.</p>
<h2 id="setting-up-the-ciliumbgppeeringpolicy">Setting up the CiliumBGPPeeringPolicy</h2>
<p>As I&rsquo;ve noted above, the labels of the main Vault Service cannot be changed.
Interestingly, the two other Services, for active and standby servers, do have
the option of configuring their labels. But not the main Service.
Another issue: the type of the Services can only be set centrally, for all three
Services.</p>
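<p>This is easy to verify on the deployed objects:</p>
<pre tabindex="0"><code class="language-shell" data-lang="shell"># The main vault Service only carries the labels the chart sets itself;
# there is no values.yaml option to add custom ones
kubectl -n vault get svc --show-labels
</code></pre>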
<p>As you can read in a bit more detail <a href="https://blog.mei-home.net/posts/k8s-migration-2a-cilium-bgp/">here</a>,
I&rsquo;m using Cilium&rsquo;s BGP-based support for setting up LoadBalancer type services.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2alpha1&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumBGPPeeringPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">worker-node-bgp</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">kube-system</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">nodeSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/role&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;worker&#34;</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">virtualRouters</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">localASN</span>: <span style="color:#ae81ff">64555</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">exportPodCIDR</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">serviceSelector</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/public-service&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>              - <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">neighbors</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">peerAddress</span>: <span style="color:#e6db74">&#39;300.300.300.405/32&#39;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">peerASN</span>: <span style="color:#ae81ff">64555</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">eBGPMultihopTTL</span>: <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">connectRetryTimeSeconds</span>: <span style="color:#ae81ff">120</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">holdTimeSeconds</span>: <span style="color:#ae81ff">90</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">keepAliveTimeSeconds</span>: <span style="color:#ae81ff">30</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">gracefulRestart</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">restartTimeSeconds</span>: <span style="color:#ae81ff">120</span>
</span></span></code></pre></div><p>The main problem here is the <code>spec.virtualRouters[0].serviceSelector</code>, as it
only allows matching on labels, and I cannot influence the labels set on the
Vault Service. I then took a very close look at the Cilium docs and found
that the selector can also match on the Service name and namespace. So I
tried extending the above config like this, adding another entry to the
<code>virtualRouters</code> list:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2alpha1&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumBGPPeeringPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">worker-node-bgp</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">kube-system</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">nodeSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/role&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;worker&#34;</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">virtualRouters</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">localASN</span>: <span style="color:#ae81ff">64555</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">exportPodCIDR</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">serviceSelector</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/public-service&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>              - <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">neighbors</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">peerAddress</span>: <span style="color:#e6db74">&#39;300.300.300.405/32&#39;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">peerASN</span>: <span style="color:#ae81ff">64555</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">eBGPMultihopTTL</span>: <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">connectRetryTimeSeconds</span>: <span style="color:#ae81ff">120</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">holdTimeSeconds</span>: <span style="color:#ae81ff">90</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">keepAliveTimeSeconds</span>: <span style="color:#ae81ff">30</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">gracefulRestart</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">restartTimeSeconds</span>: <span style="color:#ae81ff">120</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">localASN</span>: <span style="color:#ae81ff">64555</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">exportPodCIDR</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">serviceSelector</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;io.kubernetes.service.name&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>              - <span style="color:#e6db74">&#34;vault&#34;</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;io.kubernetes.service.namespace&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>              - <span style="color:#e6db74">&#34;vault&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">neighbors</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">peerAddress</span>: <span style="color:#e6db74">&#39;300.300.300.405/32&#39;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">peerASN</span>: <span style="color:#ae81ff">64555</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">eBGPMultihopTTL</span>: <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">connectRetryTimeSeconds</span>: <span style="color:#ae81ff">120</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">holdTimeSeconds</span>: <span style="color:#ae81ff">90</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">keepAliveTimeSeconds</span>: <span style="color:#ae81ff">30</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">gracefulRestart</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">restartTimeSeconds</span>: <span style="color:#ae81ff">120</span>
</span></span></code></pre></div><p>But this did not work at all. Cilium announced either only the Services which
matched the first <code>serviceSelector</code> or the ones matching the second, but never
both. When debugging issues like this, you can use <code>cilium bgp routes</code> to show
which routes Cilium advertises to its neighbors.</p>
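<p>Depending on your Cilium CLI version, that looks something like this:</p>
<pre tabindex="0"><code class="language-shell" data-lang="shell"># List the BGP peering sessions and their state
cilium bgp peers

# Show the routes each node advertises to its peers
cilium bgp routes
</code></pre>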
<p>What did end up working was introducing two peering policies. The sets of hosts
they apply to also seem to need to be disjoint, or it again won&rsquo;t work.
I&rsquo;ve got it configured like this now:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2alpha1&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumBGPPeeringPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">worker-node-bgp</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">kube-system</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">nodeSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/role&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;worker&#34;</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">virtualRouters</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">localASN</span>: <span style="color:#ae81ff">64555</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">exportPodCIDR</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">serviceSelector</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/public-service&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>              - <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">neighbors</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">peerAddress</span>: <span style="color:#e6db74">&#39;300.300.300.405/32&#39;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">peerASN</span>: <span style="color:#ae81ff">64555</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">eBGPMultihopTTL</span>: <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">connectRetryTimeSeconds</span>: <span style="color:#ae81ff">120</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">holdTimeSeconds</span>: <span style="color:#ae81ff">90</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">keepAliveTimeSeconds</span>: <span style="color:#ae81ff">30</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">gracefulRestart</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">restartTimeSeconds</span>: <span style="color:#ae81ff">120</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2alpha1&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumBGPPeeringPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">controller-node-bgp</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">kube-system</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">nodeSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/role&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;controller&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">virtualRouters</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">localASN</span>: <span style="color:#ae81ff">64555</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">exportPodCIDR</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">serviceSelector</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;io.kubernetes.service.name&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>              - <span style="color:#e6db74">&#34;vault&#34;</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;io.kubernetes.service.namespace&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>              - <span style="color:#e6db74">&#34;vault&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">neighbors</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">peerAddress</span>: <span style="color:#e6db74">&#39;300.300.300.405/32&#39;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">peerASN</span>: <span style="color:#ae81ff">64555</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">eBGPMultihopTTL</span>: <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">connectRetryTimeSeconds</span>: <span style="color:#ae81ff">120</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">holdTimeSeconds</span>: <span style="color:#ae81ff">90</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">keepAliveTimeSeconds</span>: <span style="color:#ae81ff">30</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">gracefulRestart</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">restartTimeSeconds</span>: <span style="color:#ae81ff">120</span>
</span></span></code></pre></div><p>So the first policy does the normal thing for all of the Services where I can
control the labels, running on my Ceph and worker hosts. The second policy
only applies to the <code>vault</code> Service in the <code>vault</code>
namespace. With that configuration, I got Cilium to announce both of them.</p>
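<p>To verify that the sessions were established and both Services are actually
announced, the Cilium CLI can show the BGP state. The exact subcommands depend
on the CLI version, so treat this as a sketch:</p>
<pre tabindex="0"><code>cilium bgp peers
cilium bgp routes advertised ipv4 unicast
</code></pre>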
<h2 id="setting-up-the-certificates-for-vault">Setting up the certificates for Vault</h2>
<p>The next step, accounting for the majority of preparation work, was the certificate
setup. The Vault Pods need to be able to access each other, and they should do
it over HTTPS. So they need certificates. Initially, I did not realize that and
naively just told Vault to use my Let&rsquo;s Encrypt external cert. But of course
the Vault instances need to contact each other directly, not just the active
server available via the <code>newvault.example.com</code> address.</p>
<p>So I needed a specific certificate for Vault, with the following three SANs:</p>
<ul>
<li><code>vault-0.vault-internal</code></li>
<li><code>vault-1.vault-internal</code></li>
<li><code>vault-2.vault-internal</code></li>
</ul>
<p>And as I noted above, I&rsquo;ve already got an internal CA. And here is where I
knowingly committed a sin: That CA is provided by Vault, via the
<a href="https://developer.hashicorp.com/vault/docs/secrets/pki">PKI secrets engine</a>.
And I reused that CA for generating the Vault certificate, knowingly introducing a
dependency cycle into my setup. I feel a bit ashamed of it. But I also don&rsquo;t
want to introduce another complete PKI setup. And reusing the already existing
CA has the benefit that the CA cert is already widely deployed in my Homelab.</p>
<p>The first step is to set up a separate <a href="https://developer.hashicorp.com/vault/docs/secrets/pki/quick-start-intermediate-ca#configure-a-role">role</a>
for the certificate, so I can properly separate access to generating certificates
later.</p>
<p>I&rsquo;ve got all of my Vault configuration in Terraform, so I added the new role
there as well. Here is the full CA setup:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;vault_mount&#34; &#34;my-ca-mount&#34;</span> {
</span></span><span style="display:flex;"><span>  path <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;my-ca&#34;</span>
</span></span><span style="display:flex;"><span>  type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;pki&#34;</span>
</span></span><span style="display:flex;"><span>  default_lease_ttl_seconds <span style="color:#f92672">=</span> <span style="color:#ae81ff">157680000</span>
</span></span><span style="display:flex;"><span>  max_lease_ttl_seconds <span style="color:#f92672">=</span> <span style="color:#ae81ff">157680000</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;vault_pki_secret_backend_root_cert&#34; &#34;my-root-cert&#34;</span> {
</span></span><span style="display:flex;"><span>  backend <span style="color:#f92672">=</span> <span style="color:#66d9ef">vault_mount</span>.<span style="color:#66d9ef">my</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">ca</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">mount</span>.<span style="color:#66d9ef">path</span>
</span></span><span style="display:flex;"><span>  type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;internal&#34;</span>
</span></span><span style="display:flex;"><span>  common_name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;My Private Root CA&#34;</span>
</span></span><span style="display:flex;"><span>  ttl <span style="color:#f92672">=</span> <span style="color:#ae81ff">157680000</span>
</span></span><span style="display:flex;"><span>  format <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;pem&#34;</span>
</span></span><span style="display:flex;"><span>  private_key_format <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;der&#34;</span>
</span></span><span style="display:flex;"><span>  key_type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;rsa&#34;</span>
</span></span><span style="display:flex;"><span>  key_bits <span style="color:#f92672">=</span> <span style="color:#ae81ff">4096</span>
</span></span><span style="display:flex;"><span>  exclude_cn_from_sans <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  ou <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;Private&#34;</span>
</span></span><span style="display:flex;"><span>  organization <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;Private&#34;</span>
</span></span><span style="display:flex;"><span>  country <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;DE&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;vault_pki_secret_backend_config_urls&#34; &#34;my-root-urls&#34;</span> {
</span></span><span style="display:flex;"><span>  backend <span style="color:#f92672">=</span> <span style="color:#66d9ef">vault_mount</span>.<span style="color:#66d9ef">my</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">ca</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">mount</span>.<span style="color:#66d9ef">path</span>
</span></span><span style="display:flex;"><span>  issuing_certificates <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;https://vault.example.com/v1/my-ca/ca&#34;</span>
</span></span><span style="display:flex;"><span>  ]
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;vault_pki_secret_backend_role&#34; &#34;vault-certs&#34;</span> {
</span></span><span style="display:flex;"><span>  backend <span style="color:#f92672">=</span> <span style="color:#66d9ef">vault_mount</span>.<span style="color:#66d9ef">my</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">ca</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">mount</span>.<span style="color:#66d9ef">path</span>
</span></span><span style="display:flex;"><span>  name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;vault-certs&#34;</span>
</span></span><span style="display:flex;"><span>  ttl <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;15552000&#34;</span>
</span></span><span style="display:flex;"><span>  max_ttl <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;15552000&#34;</span>
</span></span><span style="display:flex;"><span>  allow_localhost <span style="color:#f92672">=</span> <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  allowed_domains <span style="color:#f92672">=</span> [ <span style="color:#e6db74">&#34;newvault.example.com&#34;, &#34;vault-internal&#34;, &#34;127.0.0.1&#34;</span> ]
</span></span><span style="display:flex;"><span>  allow_subdomains <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  allow_ip_sans <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  allow_wildcard_certificates <span style="color:#f92672">=</span> <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  allow_bare_domains <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  key_type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;rsa&#34;</span>
</span></span><span style="display:flex;"><span>  key_bits <span style="color:#f92672">=</span> <span style="color:#ae81ff">4096</span>
</span></span><span style="display:flex;"><span>  organization <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;My Homelab&#34;</span>]
</span></span><span style="display:flex;"><span>  country <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;DE&#34;</span>]
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>There&rsquo;s also another role for certs deployed for other purposes, but that&rsquo;s not
important here. I will not go over the base CA and mount configuration and
instead concentrate on the <code>vault-certs</code> role.</p>
<p>This role allows creating certificates for the <code>vault-internal</code> domain, covering
the three Pods&rsquo; internal DNS names, the externally visible address of the Vault
cluster that the LoadBalancer points to at <code>newvault.example.com</code>, and the
localhost address. The localhost IP is there because that&rsquo;s how the <code>vault</code> CLI
launched inside the Pods contacts the local Vault instance, which is
required for initialization and unsealing later.</p>
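<p>Once a certificate has been issued from this role, as I do further down, the
SANs can be double-checked with openssl. Assuming the cert was saved to a file
called <code>vault.crt</code>:</p>
<pre tabindex="0"><code>openssl x509 -in vault.crt -noout -ext subjectAltName
</code></pre>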
<p>This is the beauty of Terraform at work again. I could of course also do the
same using the Vault CLI, but then I would need to document the commands somewhere
and remember to update that documentation whenever something changes. With the
Infrastructure as Code approach, any change goes into this definition, so the
documentation is always up-to-date.</p>
<p>After a <code>terraform apply</code>, I could produce a certificate with this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>vault write -format<span style="color:#f92672">=</span>json my-ca/issue/vault-certs common_name<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;newvault.example.com&#34;</span> alt_names<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;vault-0.vault-internal,vault-1.vault-internal,vault-2.vault-internal&#34;</span> ttl<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;4000h&#34;</span> &gt; test.cert
</span></span></code></pre></div><p>The JSON file this command produces looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;request_id&#34;</span>: <span style="color:#e6db74">&#34;5d72d050-8f80-ba4f-1067-b4165cf2d0f5&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;lease_id&#34;</span>: <span style="color:#e6db74">&#34;&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;lease_duration&#34;</span>: <span style="color:#ae81ff">0</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;renewable&#34;</span>: <span style="color:#66d9ef">false</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;data&#34;</span>: {
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;ca_chain&#34;</span>: [
</span></span><span style="display:flex;"><span>      <span style="color:#e6db74">&#34;-----BEGIN CERTIFICATE-----\n[...]\n-----END CERTIFICATE-----&#34;</span>
</span></span><span style="display:flex;"><span>    ],
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;certificate&#34;</span>: <span style="color:#e6db74">&#34;-----BEGIN CERTIFICATE-----\n [...] \n-----END CERTIFICATE-----&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;expiration&#34;</span>: <span style="color:#ae81ff">1757892246</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;issuing_ca&#34;</span>: <span style="color:#e6db74">&#34;-----BEGIN CERTIFICATE-----\n [...] \n-----END CERTIFICATE-----&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;private_key&#34;</span>: <span style="color:#e6db74">&#34;-----BEGIN RSA PRIVATE KEY-----\n[...]\n-----END RSA PRIVATE KEY-----&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;private_key_type&#34;</span>: <span style="color:#e6db74">&#34;rsa&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;serial_number&#34;</span>: <span style="color:#e6db74">&#34;7d:fe:7e:97:c1:56:96:eb:3d:27:e8:ee:48:78:82:bd:ca:f8:0d:7e&#34;</span>
</span></span><span style="display:flex;"><span>  },
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;mount_type&#34;</span>: <span style="color:#e6db74">&#34;pki&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>And I then revoked this test cert using the <code>serial_number</code> with this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>vault write my-ca/revoke serial_number<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;7d:fe:7e:97:c1:56:96:eb:3d:27:e8:ee:48:78:82:bd:ca:f8:0d:7e&#34;</span>
</span></span></code></pre></div><h3 id="getting-the-certificate-into-kubernetes">Getting the certificate into Kubernetes</h3>
<p>I could then of course just upload the key and certificate into a k8s Secret
by hand, but that doesn&rsquo;t feel very Kubernetes-y, plus it would be a step I would
need to document for future renewals. Instead, I had another look at external-secrets
and found the <a href="https://external-secrets.io/latest/api/generator/vault/">VaultDynamicSecret</a>.</p>
<p>This is another nice feature for getting Vault outputs into k8s Secrets, only
this time it&rsquo;s not static credentials, but a certificate, complete with automatic
renewal. And the usage of the PKI secrets engine is even the example used in the
docs.</p>
<p>I initially deployed a manifest that looked like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">generators.external-secrets.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">VaultDynamicSecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;vault-certs-generator&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;my-ca/issue/vault-certs&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">method</span>: <span style="color:#e6db74">&#34;POST&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">common_name</span>: <span style="color:#e6db74">&#34;newvault.example.com&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">alt_names</span>: <span style="color:#e6db74">&#34;vault-0.vault-internal,vault-1.vault-internal,vault-2.vault-internal&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">ip_sans</span>: <span style="color:#e6db74">&#34;127.0.0.1&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resultType</span>: <span style="color:#e6db74">&#34;Data&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">provider</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">server</span>: <span style="color:#e6db74">&#34;https://vault.example.com:8200&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">caProvider</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">Secret</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">homelab-ca-cert</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">namespace</span>: {{ <span style="color:#ae81ff">.Release.Namespace }}</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">key</span>: <span style="color:#ae81ff">caCert</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">auth</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">appRole</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;approle&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">roleId</span>: {{ <span style="color:#ae81ff">.Values.approleId }}</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;external-secrets-approle&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">namespace</span>: {{ <span style="color:#ae81ff">.Release.Namespace }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;secretId&#34;</span>
</span></span></code></pre></div><p>I deployed this manifest in the external-secrets namespace, because that was
where the <a href="https://developer.hashicorp.com/vault/docs/auth/approle">AppRole</a>
auth secrets lived.</p>
<p>Then I created the following ExternalSecret in the <code>vault</code> namespace to generate
a certificate:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1beta1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ExternalSecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;vault-tls-certs&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">refreshInterval</span>: <span style="color:#e6db74">&#34;4000h&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">target</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">vault-tls-certs</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dataFrom</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">sourceRef</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">generatorRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">generators.external-secrets.io/v1alpha1</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">VaultDynamicSecret</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;vault-certs-generator&#34;</span>
</span></span></code></pre></div><p>This didn&rsquo;t work, and I got an error about the <code>vault-certs-generator</code> not
being found. This was because the non-Cluster variants of external-secrets
objects are generally only available in the namespace where they were created.
So my ExternalSecret in the <code>vault</code> namespace wasn&rsquo;t able to access the
VaultDynamicSecret in the external-secrets namespace.</p>
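<p>That these kinds are indeed namespace-scoped can be confirmed with kubectl:</p>
<pre tabindex="0"><code>kubectl api-resources --api-group=generators.external-secrets.io --namespaced=true
</code></pre>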
<p>So I ended up moving the ExternalSecret into the external-secrets namespace
as well, just to make sure that it worked at all. That introduced me to an
authorization error looking something like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{<span style="color:#f92672">&#34;level&#34;</span>:<span style="color:#e6db74">&#34;error&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;ts&#34;</span>:<span style="color:#ae81ff">1742423406.4829333</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;msg&#34;</span>:<span style="color:#e6db74">&#34;Reconciler error&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;controller&#34;</span>:<span style="color:#e6db74">&#34;externalsecret&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;controllerGroup&#34;</span>:<span style="color:#e6db74">&#34;external-secrets.io&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;controllerKind&#34;</span>:<span style="color:#e6db74">&#34;ExternalSecret&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;ExternalSecret&#34;</span>:{
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;name&#34;</span>:<span style="color:#e6db74">&#34;vault-tls-certs&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;namespace&#34;</span>:<span style="color:#e6db74">&#34;external-secrets&#34;</span>
</span></span><span style="display:flex;"><span>},
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;namespace&#34;</span>:<span style="color:#e6db74">&#34;external-secrets&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;name&#34;</span>:<span style="color:#e6db74">&#34;vault-tls-certs&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;reconcileID&#34;</span>:<span style="color:#e6db74">&#34;d6b0b369-f959-479a-8228-f9a8d6fbc5bd&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;error&#34;</span>:<span style="color:#e6db74">&#34;error processing spec.dataFrom[0].sourceRef.generatorRef,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">err: error using generator: Error making API request.\n\nURL:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">PUT https://vault.example.com:8200/v1/my-ca/issue/vault-certs
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">Code: 403. Errors:\n\n* 1 error occurred:\n\t* permission denied\n\n&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;stacktrace&#34;</span>:<span style="color:#e6db74">&#34;...&#34;</span>}
</span></span></code></pre></div><p>This was due to a mistake I had made in updating the policy for external-secrets
to allow it access to the <code>my-ca/issue/vault-certs</code> endpoint. The policy addition
I had made for that particular path looked like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">path</span> <span style="color:#e6db74">&#34;my-ca/issue/vault-certs&#34;</span> {
</span></span><span style="display:flex;"><span>  capabilities <span style="color:#f92672">=</span> [ <span style="color:#e6db74">&#34;create&#34;</span> ]
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>That&rsquo;s what all the examples I could find said. But it did not work. So I
granted all capabilities and then removed them again one by one, until I found
that the only thing missing from the above was the <code>update</code> capability.</p>
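<p>For completeness, here is the path entry that finally worked. Shown as a CLI
invocation with a hypothetical policy name, as my actual policy lives in Terraform:</p>
<pre tabindex="0"><code>vault policy write external-secrets - &lt;&lt;EOF
path &#34;my-ca/issue/vault-certs&#34; {
  capabilities = [ &#34;create&#34;, &#34;update&#34; ]
}
EOF
</code></pre>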
<p>After fixing that I finally got a certificate created. But I still had to do
something about the fact that the ExternalSecret lived in the external-secrets
namespace now, while it was needed in Vault&rsquo;s namespace.</p>
<p>One option I looked at to resolve this issue is <a href="https://external-secrets.io/latest/api/generator/cluster/">ClusterGenerators</a>.
These work similarly to namespaced generators like VaultDynamicSecret, but can be
used by ExternalSecrets throughout the cluster.
I ended up deciding against that, for simple &ldquo;doing things properly&rdquo; reasons:
The generator will only ever be needed in the Vault namespace, because it is not
a generic generator for TLS certs, but a specific generator restricted to creating
certs for Vault.</p>
<p>So I decided to stay with the namespaced VaultDynamicSecret, but change the
auth method to Kubernetes.</p>
<h3 id="setting-up-kubernetes-auth">Setting up Kubernetes auth</h3>
<p>Being the Swiss army knife that it is, Vault can also authenticate with
<a href="https://developer.hashicorp.com/vault/docs/auth/kubernetes">Kubernetes</a>.
The way this works is that you can create a role in Vault and assign policies
defining what that role can do. Then, certain Kubernetes ServiceAccounts can be
allowed to authenticate with that role. During login, Vault expects to receive a
Kubernetes JWT token, which it verifies against the Kubernetes API to ensure the
token is valid and belongs to one of the ServiceAccounts allowed
to use the Vault role.</p>
<p>One problem is that verifying Kubernetes tokens itself also requires
a Kubernetes token for the API server access. With a Vault deployed via the
Helm chart that&rsquo;s easy, the <code>vault</code> ServiceAccount created by the chart already
has the necessary permissions, and Vault can use that account&rsquo;s token.
Vault will also automatically reload the token periodically, as Kubernetes
tokens are generally short-lived.</p>
<p>But at least initially, I need to use my baremetal Vault for the certificate
generation, because those are the certs that the k8s Vault deployment will use
later. To work around this issue, one could still use long-lived tokens. But
another way would be to use the JWT of the process that&rsquo;s trying to use the
Vault auth method. This requires some changes in Kubernetes though. Namely, the
ServiceAccounts which should authenticate to Vault via Kubernetes auth need to have
the <code>system:auth-delegator</code> ClusterRole. This allows the ServiceAccount&rsquo;s token
to be used by other apps (here, Vault) for authentication. Vault
can use this to access the <a href="https://kubernetes.io/docs/reference/kubernetes-api/authentication-resources/token-review-v1/">TokenReview</a>
API to verify that the token is valid.
No change is necessary here, because the <code>vault</code> ServiceAccount I will be using
already has the <code>auth-delegator</code> role.</p>
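<p>Just for illustration: if a ServiceAccount did not have that role yet, the
binding could be created like this (names here match my setup):</p>
<pre tabindex="0"><code>kubectl create clusterrolebinding vault-auth-delegator \
  --clusterrole=system:auth-delegator \
  --serviceaccount=vault:vault
</code></pre>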
<p>So with that out of the way, here is the Vault Kubernetes auth setup:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;vault_auth_backend&#34; &#34;kubernetes&#34;</span> {
</span></span><span style="display:flex;"><span>  type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;kubernetes&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;vault_kubernetes_auth_backend_config&#34; &#34;kube-backend-config&#34;</span> {
</span></span><span style="display:flex;"><span>  backend                <span style="color:#f92672">=</span> <span style="color:#66d9ef">vault_auth_backend</span>.<span style="color:#66d9ef">kubernetes</span>.<span style="color:#66d9ef">path</span>
</span></span><span style="display:flex;"><span>  kubernetes_host        <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https://k8s.exmaple.com:6443&#34;</span>
</span></span><span style="display:flex;"><span>  issuer                 <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;api&#34;</span>
</span></span><span style="display:flex;"><span>  disable_iss_validation <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>  kubernetes_ca_cert               <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;-----BEGIN CERTIFICATE-----\n [...]&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;vault_kubernetes_auth_backend_role&#34; &#34;vault-certs&#34;</span> {
</span></span><span style="display:flex;"><span>  backend                          <span style="color:#f92672">=</span> <span style="color:#66d9ef">vault_auth_backend</span>.<span style="color:#66d9ef">kubernetes</span>.<span style="color:#66d9ef">path</span>
</span></span><span style="display:flex;"><span>  role_name                        <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;vault-certs&#34;</span>
</span></span><span style="display:flex;"><span>  bound_service_account_names      <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;vault&#34;</span>]
</span></span><span style="display:flex;"><span>  bound_service_account_namespaces <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;vault&#34;</span>]
</span></span><span style="display:flex;"><span>  token_ttl                        <span style="color:#f92672">=</span> <span style="color:#ae81ff">3600</span>
</span></span><span style="display:flex;"><span>  token_policies                   <span style="color:#f92672">=</span> [<span style="color:#66d9ef">vault_policy</span>.<span style="color:#66d9ef">vault</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">certs</span>.<span style="color:#66d9ef">name</span>]
</span></span><span style="display:flex;"><span>  token_bound_cidrs                <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;300.300.300.0/24&#34;</span>]
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>In this setup, the Kubernetes auth is created for the k8s API at <code>k8s.example.com:6443</code>.
The API server address can be found via this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl cluster-info
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>Kubernetes control plane is running at https://k8s.example.com:6443
</span></span><span style="display:flex;"><span>CoreDNS is running at https://k8s.example.com:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>To further debug and diagnose cluster problems, use <span style="color:#e6db74">&#39;kubectl cluster-info dump&#39;</span>.
</span></span></code></pre></div><p>To get the <code>kubernetes_ca_cert</code>, you can have a look at the <code>kube-root-ca.crt</code>
ConfigMap that should be available in all namespaces, for example like this:</p>
<pre tabindex="0"><code>kubectl get -n kube-system configmaps kube-root-ca.crt -o jsonpath=&#34;{[&#39;data&#39;][&#39;ca\.crt&#39;]}&#34;
</code></pre><p>Finally, I&rsquo;ve also restricted all tokens created by the <code>vault-certs</code> role so
that they&rsquo;re only valid coming from IPs in the Homelab. That&rsquo;s just a small
defense-in-depth measure I like to apply for any tokens in Vault where it&rsquo;s possible.</p>
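<p>To sanity-check the new auth method, the login can also be tried manually with
the ServiceAccount token from one of the Pods, with <code>VAULT_ADDR</code> pointing at the
baremetal instance. A sketch, assuming the default token mount path:</p>
<pre tabindex="0"><code>JWT=$(kubectl exec -n vault vault-0 -- cat /var/run/secrets/kubernetes.io/serviceaccount/token)
vault write auth/kubernetes/login role=vault-certs jwt=&#34;$JWT&#34;
</code></pre>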
<h3 id="finally-setting-up-the-certificate-generator">Finally setting up the certificate generator</h3>
<p>With the authentication now configured properly, the certificate generation
can be set up like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1beta1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ExternalSecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;vault-tls-certs&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">refreshInterval</span>: <span style="color:#e6db74">&#34;4000h&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">target</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">vault-tls-certs</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dataFrom</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">sourceRef</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">generatorRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">generators.external-secrets.io/v1alpha1</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">VaultDynamicSecret</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;vault-certs-generator&#34;</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">generators.external-secrets.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">VaultDynamicSecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;vault-certs-generator&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;my-ca/issue/vault-certs&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">method</span>: <span style="color:#e6db74">&#34;POST&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">common_name</span>: <span style="color:#e6db74">&#34;newvault.example.com&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">alt_names</span>: <span style="color:#e6db74">&#34;vault-0.vault-internal,vault-1.vault-internal,vault-2.vault-internal&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">ip_sans</span>: <span style="color:#e6db74">&#34;127.0.0.1&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resultType</span>: <span style="color:#e6db74">&#34;Data&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">provider</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">server</span>: <span style="color:#e6db74">&#34;https://vault.example.com:8200&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">caProvider</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">Secret</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">homelab-ca-cert</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">vault</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">key</span>: <span style="color:#ae81ff">caCert</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">auth</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">kubernetes</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">mountPath</span>: <span style="color:#e6db74">&#34;kubernetes&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">role</span>: <span style="color:#e6db74">&#34;vault-certs&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">serviceAccountRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;vault&#34;</span>
</span></span></code></pre></div><p>With this configuration, the Vault certs are collected (still from the old
baremetal Vault), with the Kubernetes authentication using the <code>vault</code> ServiceAccount.
This now works without issue, and a certificate usable by the k8s Vault instance
is generated.</p>
<p>I&rsquo;m also setting the refresh interval of the Secret containing the certificate to
4000 hours. This should lead to automatic renewal with quite some time to spare,
as the certificates are given a lifetime of 4320h, the role&rsquo;s TTL of 15552000 seconds.</p>
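<p>Whether a renewal happened in time can be checked by looking at the expiry of
the certificate in the generated Secret, assuming the key names from the Vault
response shown earlier:</p>
<pre tabindex="0"><code>kubectl get secret -n vault vault-tls-certs -o jsonpath=&#34;{.data.certificate}&#34; \
  | base64 -d | openssl x509 -noout -enddate
</code></pre>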
<p>One thing to note is that the VaultDynamicSecret also needs the CA certificate.
The way I&rsquo;m currently supplying that one is a bit hacky. I&rsquo;m deploying Vault
with a Helm chart, and I&rsquo;ve added this to the <code>values.yaml</code> file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">caBundle</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">  {{- exec &#34;curl&#34; (list &#34;https://vault.example.com:8200/v1/my-ca/ca/pem&#34;) | nindent 2 }}</span>
</span></span></code></pre></div><p>This is a special functionality of the tool I&rsquo;ve been using to manage all of
the Helm charts in my cluster, <a href="https://github.com/helmfile/helmfile">Helmfile</a>.
It can interpret Go templates in the <code>values.yaml</code> file. That line fetches the
CA certificate from the Vault endpoint and stores it in the <code>caBundle</code> variable.
That is then used to create a Secret with the CA like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Secret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">homelab-ca-cert</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">stringData</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">caCert</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    {{- .Values.caBundle | nindent 6 }}</span>
</span></span></code></pre></div><h2 id="initializing-vault">Initializing Vault</h2>
<p>With all those yaks safely shaved, I could finally go forward with initializing
the Kubernetes Vault cluster.</p>
<p>I used this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl exec -n vault vault-0 -- vault operator init -key-shares<span style="color:#f92672">=</span><span style="color:#ae81ff">1</span> -key-threshold<span style="color:#f92672">=</span><span style="color:#ae81ff">1</span> &gt; vault-init.txt
</span></span></code></pre></div><p>This initialization failed with a certificate error:</p>
<pre tabindex="0"><code>Get &#34;https://127.0.0.1:8200/v1/sys/seal-status&#34;: tls: failed to verify certificate: x509: cannot validate certificate for 127.0.0.1 because it doesn&#39;t contain any IP SANs
</code></pre><p>Even for local connections, Vault needs a cert. And that&rsquo;s why I&rsquo;ve got the
<code>127.0.0.1</code> IP SAN in the certificate used by Vault.</p>
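<p>A quick way to check that the local CLI can reach its instance at all is the
status command, which also works on a sealed and uninitialized Vault:</p>
<pre tabindex="0"><code>kubectl exec -n vault vault-0 -- vault status
</code></pre>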
<p>After I got that issue fixed, I finally successfully initialized the Vault
instance, resulting in this information:</p>
<pre tabindex="0"><code>Unseal Key 1: abcde123

Initial Root Token: hvs.foobar

Vault initialized with 1 key shares and a key threshold of 1. Please securely
distribute the key shares printed above. When the Vault is re-sealed,
restarted, or stopped, you must supply at least 1 of these keys to unseal it
before it can start servicing requests.

Vault does not store the generated root key. Without at least 1 keys to
reconstruct the root key, Vault will remain permanently sealed!

It is possible to generate new unseal keys, provided you have a quorum of
existing unseal keys shares. See &#34;vault operator rekey&#34; for more information.
</code></pre><p>In the initialization command, I told Vault that I only need one key share.
Normally, you would split the key into multiple shares so they can be distributed,
but that doesn&rsquo;t make any real sense for a small personal instance. If somebody
somehow gets one of the key shares, they would very likely be able to get the
others the same way.</p>
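<p>Should that assessment ever change, the key can be re-split into multiple shares
later without re-initializing, as the init output above already hints. Something
like this starts the process; the existing unseal key then has to be provided to
complete it:</p>
<pre tabindex="0"><code>kubectl exec -n vault vault-0 -- vault operator rekey -init -key-shares=5 -key-threshold=3
</code></pre>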
<p>It is very important to save the initial root token <code>hvs.foobar</code>. It is
needed for the initial setup, until policies and other auth
methods have been configured.</p>
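<p>For that initial setup session, the token can simply be exported in the shell,
pointed at the new instance:</p>
<pre tabindex="0"><code>export VAULT_ADDR=&#34;https://newvault.example.com:8200&#34;
export VAULT_TOKEN=&#34;hvs.foobar&#34;
vault token lookup
</code></pre>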
<p>The next step was then to unseal all three Vault instances with these commands
and the unseal key output by the <code>vault init</code> command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl exec -it -n vault vault-0 -- vault operator unseal
</span></span><span style="display:flex;"><span>kubectl exec -it -n vault vault-1 -- vault operator unseal
</span></span><span style="display:flex;"><span>kubectl exec -it -n vault vault-2 -- vault operator unseal
</span></span></code></pre></div><p>One interesting thing to note: all of the Vault Pods, as configured by the
Helm chart, run with the OnDelete update strategy. This has the effect that no
change to the configuration, including e.g. setting new environment variables,
will take effect on its own. The Pods always need to be deleted manually to apply a change.</p>
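<p>In practice, rolling out a config change hence means deleting the Pods one by
one and unsealing each of them again after restart. Roughly like this, with the
<code>sleep</code> being a crude stand-in for properly waiting on the new Pod:</p>
<pre tabindex="0"><code>for i in 0 1 2; do
  kubectl delete pod -n vault vault-$i
  sleep 30
  kubectl exec -it -n vault vault-$i -- vault operator unseal
done
</code></pre>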
<h2 id="configuring-vault-logging">Configuring Vault logging</h2>
<p>I like having my logs all in at least approximately the same format, and so
I&rsquo;ve got a log parsing section for most apps in my FluentD config. Normally I
don&rsquo;t mention this, but Vault is a little bit weird. Namely, it does output
its logs as JSON if so configured, which is good. It makes parsing a lot simpler.
But it also adds an <code>@</code> symbol to the names of <em>most</em> of the JSON keys:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{<span style="color:#f92672">&#34;@level&#34;</span>:<span style="color:#e6db74">&#34;info&#34;</span>,<span style="color:#f92672">&#34;@message&#34;</span>:<span style="color:#e6db74">&#34;compacting logs&#34;</span>,<span style="color:#f92672">&#34;@module&#34;</span>:<span style="color:#e6db74">&#34;storage.raft&#34;</span>,<span style="color:#f92672">&#34;@timestamp&#34;</span>:<span style="color:#e6db74">&#34;2025-04-06T18:11:31.197956Z&#34;</span>,<span style="color:#f92672">&#34;from&#34;</span>:<span style="color:#ae81ff">893122</span>,<span style="color:#f92672">&#34;to&#34;</span>:<span style="color:#ae81ff">901345</span>}
</span></span><span style="display:flex;"><span>{<span style="color:#f92672">&#34;@level&#34;</span>:<span style="color:#e6db74">&#34;info&#34;</span>,<span style="color:#f92672">&#34;@message&#34;</span>:<span style="color:#e6db74">&#34;snapshot complete up to&#34;</span>,<span style="color:#f92672">&#34;@module&#34;</span>:<span style="color:#e6db74">&#34;storage.raft&#34;</span>,<span style="color:#f92672">&#34;@timestamp&#34;</span>:<span style="color:#e6db74">&#34;2025-04-06T18:11:31.235460Z&#34;</span>,<span style="color:#f92672">&#34;index&#34;</span>:<span style="color:#ae81ff">911585</span>}
</span></span></code></pre></div><p>And I&rsquo;ve got no idea why. Or how it decides which keys get the <code>@</code> and which
do not. It made my log parsing a little bit more complicated. It now looks
like this:</p>
<pre tabindex="0"><code># Log config for the Vault deployment
&lt;filter services.vault.vault&gt;
  @type parser
  key_name log
  reserve_data true
  remove_key_name_field true
  &lt;parse&gt;
    @type multi_format
    &lt;pattern&gt;
      format json
      time_key &#34;@timestamp&#34;
      time_type string
      time_format %iso8601
      utc true
    &lt;/pattern&gt;
    &lt;pattern&gt;
      format regexp
      expression /^(?&lt;msg&gt;.*)$/
      time_key nil
    &lt;/pattern&gt;
  &lt;/parse&gt;
&lt;/filter&gt;

&lt;filter services.vault.vault&gt;
  @type record_modifier
  remove_keys _dummy_,@level
  &lt;record&gt;
    _dummy_ ${record[&#34;level&#34;] = record[&#34;@level&#34;] if record.key?(&#34;@level&#34;)}
  &lt;/record&gt;
&lt;/filter&gt;
</code></pre><p>The first filter does the main parsing, while the second one specifically
removes the <code>@</code> in front of the <code>level</code> entry in the log object, because
that&rsquo;s the key where my setup expects to see the log level.</p>
<p>Another weird thing, where Vault is by far not the biggest offender, is apps
which log in multiple different formats. That&rsquo;s why the first filter has a
<code>multi_format</code> parser. For reasons I&rsquo;m not sure about, Vault outputs some general
information at the beginning of the log, during startup, where it doesn&rsquo;t respect
the log format configuration:</p>
<pre tabindex="0"><code>==&gt; Vault server configuration:

Administrative Namespace:
             Api Address: https://10.8.1.61:8200
                     Cgo: disabled
         Cluster Address: https://vault-0.vault-internal:8201
   Environment Variables: HOME, HOSTNAME, HOST_IP, KUBERNETES_PORT, KUBERNETES_PORT_443_TCP, KUBERNETES_PORT_443_TCP_ADDR, KUBERNETES_PORT_443_TCP_PORT, KUBERNETES_PORT_443_TCP_PROTO, KUBERNETES_SERVICE_HOST, KUBERNETES_SERVICE_PORT, KUBERNETES_SERVICE_PORT_HTTPS, NAME, PATH, POD_IP, PWD, SHLVL, SKIP_CHOWN, SKIP_SETCAP, TERM, VAULT_ADDR, VAULT_API_ADDR, VAULT_CACERT, VAULT_CLUSTER_ADDR, VAULT_K8S_NAMESPACE, VAULT_K8S_POD_NAME, VAULT_LOG_FORMAT, VAULT_LOG_LEVEL, VAULT_PORT, VAULT_PORT_8200_TCP, VAULT_PORT_8200_TCP_ADDR, VAULT_PORT_8200_TCP_PORT, VAULT_PORT_8200_TCP_PROTO, VAULT_PORT_8201_TCP, VAULT_PORT_8201_TCP_ADDR, VAULT_PORT_8201_TCP_PORT, VAULT_PORT_8201_TCP_PROTO, VAULT_RAFT_NODE_ID, VAULT_SERVICE_HOST, VAULT_SERVICE_PORT, VAULT_SERVICE_PORT_HTTPS, VAULT_SERVICE_PORT_HTTPS_INTERNAL, VERSION
              Go Version: go1.23.6
              Listener 1: tcp (addr: &#34;[::]:8200&#34;, cluster address: &#34;[::]:8201&#34;, disable_request_limiter: &#34;false&#34;, max_request_duration: &#34;1m30s&#34;, max_request_size: &#34;33554432&#34;, tls: &#34;enabled&#34;)
               Log Level: debug
                   Mlock: supported: true, enabled: false
           Recovery Mode: false
                 Storage: raft (HA available)
                 Version: Vault v1.18.5, built 2025-02-24T09:40:28Z
             Version Sha: 2cb3755273dbd63f5b0f8ec50089b57ffd3fa330

==&gt; Vault server started! Log data will stream in below:
</code></pre><p>Why output that in plain text, instead of also putting it into JSON? It seems to
be a quirk of all of HashiCorp&rsquo;s tools; Nomad and Consul do the same thing,
if I remember correctly.</p>
<h2 id="migrating-to-the-new-vault-instance">Migrating to the new Vault instance</h2>
<p>With the instance on Kubernetes now configured, I need to migrate the data to
that instance. Sadly, there&rsquo;s not really a good way to migrate Vault data, especially
K/V store entries, from one Vault to another. So I just went with a manual migration.</p>
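<p>For a single K/V entry, the manual copy boils down to reading it from the old
instance and writing it to the new one. A minimal sketch, assuming a KV v2 mount
named <code>secret</code>, a hypothetical entry <code>myapp</code>, <code>jq</code>, and valid tokens for both instances:</p>
<pre tabindex="0"><code>VAULT_ADDR=&#34;https://vault.example.com:8200&#34; vault kv get -format=json secret/myapp \
  | jq &#39;.data.data&#39; &gt; entry.json
VAULT_ADDR=&#34;https://newvault.example.com:8200&#34; vault kv put secret/myapp @entry.json
</code></pre>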
<h3 id="running-terraform-against-the-new-instance">Running Terraform against the new instance</h3>
<p>As I&rsquo;ve mentioned before, I&rsquo;m using Terraform for a lot of the configuration for
Vault, because that is preferable to keeping a list of commands in my
docs.</p>
<p>But the issue was: I also needed to keep the configuration for the old Vault
instance, because I needed to keep that one running during the migration as well.</p>
<p>So I started out with just adding the second Vault as another provider to my
Terraform config, via <a href="https://developer.hashicorp.com/terraform/language/providers/configuration#alias-multiple-provider-configurations">provider aliases</a>.
It looked like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">provider</span> <span style="color:#e6db74">&#34;vault&#34;</span> {
</span></span><span style="display:flex;"><span>  address <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https://vault.example.com:8200&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">provider</span> <span style="color:#e6db74">&#34;vault&#34;</span> {
</span></span><span style="display:flex;"><span>  alias <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;k8s&#34;</span>
</span></span><span style="display:flex;"><span>  address <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https://newvault.example.com:8200&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>This allows me to keep configurations for two Vault instances in the same Terraform
state. I initially only created the <code>userpass</code> auth method for the new Vault,
to verify that the Terraform setup worked:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;vault_auth_backend&#34; &#34;userpass&#34;</span> {
</span></span><span style="display:flex;"><span>  type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;userpass&#34;</span>
</span></span><span style="display:flex;"><span>  path <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;userpass&#34;</span>
</span></span><span style="display:flex;"><span>  local <span style="color:#f92672">=</span> <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;vault_auth_backend&#34; &#34;userpass-k8s&#34;</span> {
</span></span><span style="display:flex;"><span>  provider <span style="color:#f92672">=</span> <span style="color:#66d9ef">vault</span>.<span style="color:#66d9ef">k8s</span>
</span></span><span style="display:flex;"><span>  type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;userpass&#34;</span>
</span></span><span style="display:flex;"><span>  path <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;userpass&#34;</span>
</span></span><span style="display:flex;"><span>  local <span style="color:#f92672">=</span> <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>With the <code>provider</code> setting, I could choose which Vault provider config I wanted
to use.</p>
<p>But trying a <code>terraform apply</code> with this configuration resulted in an error:</p>
<pre tabindex="0"><code>│ Error: failed to lookup token, err=Error making API request.
│
│ URL: GET https://vault.example.com:8200/v1/auth/token/lookup-self
│ Code: 403. Errors:
│
│ * 2 errors occurred:
│       * permission denied
│       * invalid token
</code></pre><p>This confused me - until I remembered that I had exported the root token of the
new k8s Vault in the terminal I was running the command in. Running my
customary <code>vault login -method=userpass username=myuser</code> in another shell and
executing the <code>terraform apply</code> there of course didn&rsquo;t work either, because that
shell only had the Vault token for the old Vault instance.</p>
<p>A quick look into the <a href="https://registry.terraform.io/providers/hashicorp/vault/latest/docs#vault-authentication-configuration-options">Vault Terraform provider documentation</a>
led to the solution. I could configure one provider <a href="https://registry.terraform.io/providers/hashicorp/vault/latest/docs#token-file">with a filepath</a>
to a token file. That would be the provider for the old Vault instance. Then
I would leave the provider for the new Vault unconfigured, which would mean that
it would continue to use the <code>VAULT_TOKEN</code> environment variable. The resulting
provider config looked like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">provider</span> <span style="color:#e6db74">&#34;vault&#34;</span> {
</span></span><span style="display:flex;"><span>  address <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https://vault.example.com:8200&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">auth_login_token_file</span> {
</span></span><span style="display:flex;"><span>    filename <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/home/myuser/.vault-token&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">provider</span> <span style="color:#e6db74">&#34;vault&#34;</span> {
</span></span><span style="display:flex;"><span>  alias <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;k8s&#34;</span>
</span></span><span style="display:flex;"><span>  address <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https://newvault.example.com:8200&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>I would then first run the login for the old provider:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>vault login -method<span style="color:#f92672">=</span>userpass username<span style="color:#f92672">=</span>myuser
</span></span></code></pre></div><p>Then, in the same terminal, I would set the <code>VAULT_TOKEN</code> variable to the root
token of the new Vault:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>export VAULT_TOKEN<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;hvs.foobar&#34;</span>
</span></span></code></pre></div><p>And with that, I was able to run <code>terraform apply</code> without issue, and both
Vault instances were configurable.</p>
<p>Next, I needed to stop using the root token for the new instance and
instead create a <code>userpass</code> login for it as well. This I had to do on
the command line, because the <a href="https://registry.terraform.io/providers/hashicorp/vault/latest/docs/resources/generic_endpoint">Terraform resource</a>
used to create a <code>userpass</code> user requires the password as part
of the Terraform config, and I really did not want that. So I created the user
like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>vault write auth/userpass/users/myuser password<span style="color:#f92672">=</span>- token_policies<span style="color:#f92672">=</span>admin token_ttl<span style="color:#f92672">=</span>4h token_max_ttl<span style="color:#f92672">=</span>4h token_bound_cidrs<span style="color:#f92672">=</span>300.300.300.12 token_type<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;default&#34;</span>
</span></span></code></pre></div><p>This will create the user <code>myuser</code> in the <code>userpass</code> backend and ask for
the password on the command line. Tokens issued by this auth method for the user
will be valid for four hours, and only when used from the <code>300.300.300.12</code>
source IP, which is my Command &amp; Control host.</p>
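<p>To double-check the result, the user definition can be read back; this returns
the token settings, but of course not the password:</p>
<pre tabindex="0"><code>vault read auth/userpass/users/myuser
</code></pre>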
<p>Now, instead of exporting the root token in the <code>VAULT_TOKEN</code> variable, I could
issue this command to get a token for <code>myuser</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>export VAULT_TOKEN<span style="color:#f92672">=</span><span style="color:#66d9ef">$(</span>vault login -method<span style="color:#f92672">=</span>userpass -token-only username<span style="color:#f92672">=</span>myuser<span style="color:#66d9ef">)</span>
</span></span></code></pre></div><h2 id="migrating-kv-secrets">Migrating K/V secrets</h2>
<p>After I had the <code>userpass</code> login configured, I could just copy+paste all of the
Terraform resources for my Vault setup, add the <code>provider = vault.k8s</code> option,
and one <code>terraform apply</code> later, most configuration was migrated to the new
Vault instance on Kubernetes.</p>
<p>The only problem was the K/V secrets. Those are not in Terraform, because that
would have required me to put my secrets into the Terraform config files and the
Terraform state. After searching around a little, it looked like there was no
official way to run a migration of K/V secrets, so I came up with my own.</p>
<p>First, I would export the <code>data</code> field from the old Vault. It contains the
actual secrets, as opposed to the surrounding metadata:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>vault kv get -field data -format json secret/topsecret/database-creds &gt; out.json
</span></span></code></pre></div><p>That would give me a JSON file like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;username&#34;</span>: <span style="color:#e6db74">&#34;foo&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;password&#34;</span>: <span style="color:#e6db74">&#34;bar&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>That could then be imported into the new Vault like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>vault kv put secret/topsecret/database-creds @out.json
</span></span></code></pre></div><p>I did that exactly 59 times, and all of my secrets were successfully migrated
over.</p>
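<p>In hindsight, a small loop over a list of secret paths would have saved some
typing. A rough sketch of what that could look like, assuming both Vaults use the
same mount and path layout and that <code>secret-paths.txt</code> contains one K/V
path per line:</p>
<pre tabindex="0"><code>#!/usr/bin/env bash
# Sketch only: copy K/V secrets path by path between two Vault instances.
# OLD_ADDR/OLD_TOKEN and NEW_ADDR/NEW_TOKEN are assumed to be set beforehand.
set -euo pipefail

while read -r path; do
  VAULT_ADDR=&#34;$OLD_ADDR&#34; VAULT_TOKEN=&#34;$OLD_TOKEN&#34; \
    vault kv get -field data -format json &#34;$path&#34; &gt; /tmp/secret.json
  VAULT_ADDR=&#34;$NEW_ADDR&#34; VAULT_TOKEN=&#34;$NEW_TOKEN&#34; \
    vault kv put &#34;$path&#34; @/tmp/secret.json
done &lt; secret-paths.txt
rm -f /tmp/secret.json
</code></pre>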
<h2 id="update-playbook-changes">Update playbook changes</h2>
<p>Another interesting piece of code I would like to talk about is my Homelab host
update Ansible playbook. This playbook runs updates of the host OS, Ubuntu
server in my case, including automatic reboots and k8s node drains. But I would
need to manually unseal the Vault Pods once their host was updated and rebooted.
For that, I simply have an Ansible task output the command I can copy+paste
into another terminal to do the unseal.</p>
<p>This was pretty simple up to now, with the baremetal Vault: I could
directly contact the host being updated, since the Vault instance on it
was the one needing the unseal.
But with Vault in k8s, there&rsquo;s no obvious way to determine which of the three
Vault Pods runs on the host currently being updated. I needed an approach to
find the right Pod.</p>
<p>The first step is to wait for the local Vault Pod on the rebooted machine to
come up again, so that it will accept the unseal command at all. I did that
with the following task:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wait for vault to be running</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">candchost</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">candcuser</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">kubernetes.core.k8s_info</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Pod</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">vault</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">label_selectors</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">app.kubernetes.io/name=vault</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">app.kubernetes.io/instance=vault</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">field_selectors</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#34;spec.nodeName={{ ansible_hostname }}&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">wait</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">wait_condition</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">status</span>: <span style="color:#e6db74">&#34;True&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">type</span>: <span style="color:#e6db74">&#34;Ready&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">wait_sleep</span>: <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">wait_timeout</span>: <span style="color:#ae81ff">300</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">register</span>: <span style="color:#ae81ff">vault_pod_list</span>
</span></span></code></pre></div><p>This task uses the <a href="https://docs.ansible.com/ansible/latest/collections/kubernetes/core/k8s_info_module.html#ansible-collections-kubernetes-core-k8s-info-module">Kubernetes Ansible collection</a>
to have Ansible wait for the Vault Pod to be in <code>Ready</code> state. I&rsquo;m also saving the list
of discovered Vault Pods in a variable for later use. Via the field selector, the
task only waits for the Vault Pod on the host currently being updated.</p>
<p>Short aside: This also taught me that I could do the following:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl get pods -A --field-selector <span style="color:#e6db74">&#34;spec.nodeName=mynode&#34;</span>
</span></span></code></pre></div><p>Instead of <code>kubectl get pods -A -o wide | grep mynode</code>. After over a year of
running Kubernetes in my Homelab. &#x1f926;</p>
<p>But let&rsquo;s move on. I now had the name of the Vault Pod on the rebooted host
in the <code>vault_pod_list</code> variable, which allowed me to output a command line
I could copy+paste to unseal the Vault instance:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">unseal vault prompt</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">vault</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">pause</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">echo</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">prompt</span>: <span style="color:#e6db74">&#34;Please unseal vault: kubectl exec -it -n vault {{ vault_pod_list.resources[0].metadata.name }} -- vault operator unseal&#34;</span>
</span></span></code></pre></div><p>This is a pretty convenient way to integrate manual operations into an Ansible
play and works quite well. I see this prompt, copy the line and unseal the Pod,
and then I just hit <code>&lt;Return&gt;</code> in the shell where Ansible is running and the
play will continue.</p>
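<p>And if I want to verify that the unseal actually worked before letting the play
continue, the seal state is part of the <code>vault status</code> output:</p>
<pre tabindex="0"><code>kubectl exec -n vault vault-0 -- vault status | grep Sealed
</code></pre>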
<h2 id="switching-the-certs-over-to-the-new-vaults-ca">Switching the certs over to the new Vault&rsquo;s CA</h2>
<p>If you remember from further up (and I won&rsquo;t be mad if you don&rsquo;t, looking at the
length of this post&hellip;), I was using the baremetal Vault instance to generate the
certificates for the new Vault instance. But this also meant that those certs
were relying on the old Vault&rsquo;s CA.</p>
<p>The first step was to update the CA cert in the k8s Secret used for the
VaultDynamicSecret for the Vault certificate, which I did by changing the
line in my <code>values.yaml</code> file fetching the CA:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>{{- <span style="color:#ae81ff">exec &#34;curl&#34; (list &#34;-k&#34; &#34;https://newvault:8200/v1/my-ca/ca/pem&#34;) | nindent 2 }}</span>
</span></span></code></pre></div><p>This did not have any direct effect on anything. The certificate Secret has a
TTL of 4000 hours, so the certs won&rsquo;t be recreated anytime soon. At the same
time, Vault won&rsquo;t automatically reload a new CA either, so everything was fine.</p>
<p>Then I went into the VaultDynamicSecret and updated the Vault URL from the old
to the new Vault. This regenerated the Vault certificates. But again, Vault
itself doesn&rsquo;t react to that, so Vault was still up and running without issue.</p>
<p>Then I sent <code>SIGHUP</code> to each Vault instance in turn, which triggered a configuration
reload, including fresh certificates.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl exec -it -n vault vault-0 -- sh
</span></span><span style="display:flex;"><span>kill -SIGHUP <span style="color:#66d9ef">$(</span>pidof vault<span style="color:#66d9ef">)</span>
</span></span></code></pre></div><p>And that&rsquo;s it. I added annotations to a couple of ExternalSecrets to trigger
refreshes to make sure it all worked, and it did: external-secrets successfully
got the secrets from the new Vault instance on k8s.</p>
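<p>The refresh trigger itself is just an annotation bump. If I remember the
external-secrets docs correctly, it looks something like this, with <code>my-secret</code>
standing in for the actual ExternalSecret name:</p>
<pre tabindex="0"><code>kubectl annotate es my-secret force-sync=$(date +%s) --overwrite
</code></pre>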
<p>This was quite a lot more work than I thought, but it was also the second-to-last
part of the migration.
Now, the only thing still missing is to migrate the control plane off of the
VMs on my extension host and onto the three Raspberry Pi 4s which previously served as
controller nodes and are now empty, thanks to the baremetal Vault having been shut down.</p>
<p>But it&rsquo;s Monday evening now, and the controller migration is more a weekend task,
because it also includes moving the MONs of the Rook Ceph cluster, and that
will need some full cluster restarts.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 23: Shutdown of the Baremetal Ceph Cluster</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-23-baremetal-ceph-shutdown/</link>
      <pubDate>Sat, 29 Mar 2025 15:10:33 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-23-baremetal-ceph-shutdown/</guid>
      <description>Migration of the last remaining data and shutdown of my baremetal Ceph cluster</description>
      <content:encoded><![CDATA[<p>Wherein I migrate the last remaining data off of my baremetal Ceph cluster and
shut it down.</p>
<p>This is part 23 of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>I set up my baremetal Ceph cluster back in March of 2021, driven by how much
I liked the idea of large pools of disks I could use to provide S3 storage, block
devices and a POSIX-compatible filesystem. Since then, it has served me rather
well, and I&rsquo;ve been using it to provide S3 buckets and volumes for my Nomad
cluster. Given how happy I was with it, I also wanted to continue using it for
my Kubernetes cluster.</p>
<p>To this end, I was quite happy to discover the <a href="https://rook.io/">Rook Ceph</a>
project, which at its core implements a Kubernetes operator capable of
orchestrating an entire Ceph cluster. I&rsquo;ve described my setup in far more
detail in <a href="https://blog.mei-home.net/posts/k8s-migration-4-ceph-rook/">this blog post</a>.</p>
<p>In the original baremetal cluster, I had three nodes, each with one HDD and one
SSD for storage, running all Ceph daemons besides the MONs, which ran on my
cluster controller Pi 4. I&rsquo;ve run the cluster with replicated pools, a
replication factor of two and a minimum size of one, so I could reboot a node,
e.g. during maintenance, without all writes to the cluster having to stop.
I was lucky in that all of my data also comfortably fit on only two hosts
with a 1 TB SSD and a 4 TB HDD. So when the time to start the migration came,
I took my emergency replacement HDD and SSD and put them into my old Homeserver.
A VM running on that server became the first OSD node in the k8s cluster. I also
drained the OSDs and other daemons from one of the original baremetal nodes and
moved that into the k8s cluster as well. So I still ended up with 2x replication,
just with two clusters of two storage nodes each.</p>
<p>After I was finally done migrating all of my services from Nomad to Kubernetes,
I still had the following datasets on the baremetal Ceph cluster:</p>
<ol>
<li>An NFS Ganesha cluster serving the boot partitions of all of my netbooting hosts</li>
<li>A data dump CephFS volume that contained just some random data, like old slides and
digital notes from my University days</li>
<li>The root disks of all of my netbooting nodes, in the form of 50 GB RBDs</li>
</ol>
<p>In the rest of this post, I will go over how I migrated all three of those, shut
down the old baremetal cluster and migrated its two physical nodes into the
Rook Ceph cluster.</p>
<h2 id="root-disk-migration">Root disk migration</h2>
<p>The first step was migrating the root disks of my netbooting hosts. Those hosts
are eight Raspberry Pi CM4 and an x86 SBC, all without any local storage.
Those hosts use a 50 GB RBD each as their root disk. Those needed to be migrated
over to the new Rook Ceph cluster and their configuration changed to contact the
Rook MON daemons.
If you&rsquo;re interested in the details of my netboot setup, have a look at <a href="https://blog.mei-home.net/tags/netboot/">this series of posts</a>.</p>
<p>As these RBDs were block devices, I was initially at a bit of a loss when
thinking about migrating them over. Sure, those nine netbooters were as cattle-ish
as it got, so I could just completely recreate them - but the setup of fresh hosts
is the weakest part of my Homelab setup. It would have taken me a couple of evenings.</p>
<p>Luckily, <a href="https://www.reddit.com/r/ceph/comments/gi7jlz/ceph_snapshot_transfer/">Reddit to the rescue</a>.
It turns out that the <a href="https://docs.ceph.com/en/reef/man/8/rbd/">rbd tool</a> can
both import and export RBD images, including via stdin/stdout.</p>
<p>I did the migration node by node, and because at this point all of the nodes
were in the k8s cluster, I had to start with draining them:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl drain --delete-emptydir-data<span style="color:#f92672">=</span>true --force<span style="color:#f92672">=</span>true --ignore-daemonsets<span style="color:#f92672">=</span>true examplehost
</span></span></code></pre></div><p>Then the node also needs to be shut down, because migrating the disk from one
Ceph cluster to another really isn&rsquo;t going to work online.
Once the host was safely shut down, I could do the actual copy operation:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>rbd --id admin export --no-progress hostdisks/examplehost - | rbd -c ceph-rook.conf -k client.admin.key --id admin import - hostdisks/examplehost
</span></span></code></pre></div><p>The first <code>rbd</code> invocation does not receive an explicit Ceph config file, so it
uses the default <code>/etc/ceph/ceph.conf</code>, which at this point was still the
config for the baremetal cluster. The second invocation gets the config and key
for the Rook cluster, with <code>hostdisks</code> as the destination pool.
One issue worth noting here is that the <code>rbd</code> tool as provided by the <a href="https://github.com/rook/kubectl-rook-ceph">Rook kubectl plugin</a>
did not work as the receiving command. I was immediately getting broken
pipe errors. Probably something to do with how it is implemented as a kubectl plugin.</p>
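<p>An alternative I didn&rsquo;t try: piping into a full <code>rbd</code> binary inside the
Rook toolbox Pod, which would sidestep the plugin entirely. Assuming the standard
<code>rook-ceph-tools</code> deployment, it might look like this:</p>
<pre tabindex="0"><code># Untested sketch: run the receiving rbd inside the Rook toolbox Pod
rbd --id admin export --no-progress hostdisks/examplehost - \
  | kubectl exec -i -n rook-ceph deploy/rook-ceph-tools -- rbd import - hostdisks/examplehost
</code></pre>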
<p>With the copy done, which took about ten minutes per disk, I then had to adapt
the configuration of the MON IPs in the host&rsquo;s kernel command line. For one of
my Pis, it looks something like this:</p>
<pre tabindex="0"><code>console=serial0,115200 dwc_otg.lpm_enable=0 console=tty1 root=LABEL=writable rootfstype=ext4 rootwait fixrtc  boot=rbd rbdroot=300.300.300.310,300.300.300.311,300.300.300.312:cephuser:pw:hostdisks:examplehost::_netdev,noatime hllogserver=logs.internal:12345
</code></pre><p>The list of three IPs after <code>rbdroot=</code> contains the MONs used. Then I also had
to change the Ceph key in the <code>pw</code> field.</p>
<p>And then I could reboot the host. And what should I say, all nine hosts went
through without a single issue. I had expected at least some sort of problem,
but I was seemingly pretty well prepared.</p>
<p>Before going to the next migration, let&rsquo;s have a look at some Ceph metrics for
this copy operation. First the throughput:</p>
<p><figure>
    <img loading="lazy" src="disks-throughput.png"
         alt="A screenshot of a Grafana time series plot. It shows roughly three hours worth of data, time on the X axis and throughput in MB on the Y axis. For the quiet periods, the throughput is around 1-2 MB/s. But there are eight &#39;hills&#39; in the plot, each about ten minutes long, which show a throughput between 30 and 50 MB/s."/> <figcaption>
            <p>Throughput graph for the receiving Ceph cluster for eight of the disk migrations.</p>
        </figcaption>
</figure>

Interesting things here are the approximately ten minutes of duration for each
of the disk migrations and the fact that the maximum throughput reached is around
50 MB/s. It&rsquo;s worth noting that, in contrast to a <a href="https://blog.mei-home.net/posts/ceph-copy-latency/">previous copy operation</a>,
the target disks were SSDs this time around. So 50 MB/s sounds a bit too little,
doesn&rsquo;t it? Well, yes and no. &#x1f642;
This time I made another little mistake I had discussed previously, namely I
ran the copy operation on my C&amp;C host. And that means that the data needs to
go through my router, because the Ceph cluster and the C&amp;C host live on different
VLANs and subnets.</p>
<p>This might already be part of the explanation. Let&rsquo;s look at the throughput
on my router next:
<figure>
    <img loading="lazy" src="network-router.png"
         alt="A screenshot of a Grafana time series plot. It shows the network utilization Mb/s for the network interface both the Ceph hosts and the C&amp;C host hang off of. Like the previous throughput plot, it shows 8 load phases. In each of them, about 700 - 800 Mb/s come in and go out again."/> <figcaption>
            <p>Network graphs for the NIC both the Ceph hosts and the C&amp;C host hang off of.</p>
        </figcaption>
</figure>

While the 1 GbE interface doesn&rsquo;t look saturated, there might be some other
kind of bottleneck - perhaps this is actually the maximum it can do for this
particular routing scenario? Then again, the CPU really should be capable
of routing 1 Gbps.</p>
<p>Next, let&rsquo;s have a quick look at the IO utilization on the two receiving hosts:
<figure>
    <img loading="lazy" src="io-hostdisks.png"
         alt="A screenshot of a Grafana time series plot. It shows the IO utilization in percent of two hosts, each represented by its own plot. Again, the eight copy operations are clearly visible as hills in the plots. One host goes up to 20%, while the other goes up to almost 60%. Neither of them come close to 100% IO utilization."/> <figcaption>
            <p>IO utilization of the two receiving hosts.</p>
        </figcaption>
</figure>
</p>
<p>So clearly, this time around the IO utilization is not the problem. Neither is
the CPU:
<figure>
    <img loading="lazy" src="hostdisks-cpu.png"
         alt="A screenshot of a Grafana time series plot. The two hosts shown have an idle CPU percentage of 95% and 90% respectively. The eight copy operations are again clearly visible as hills in the plots. For one of the hosts, the idle percentage doesn&#39;t move much, only from 95% to 90%. For the other host the impact is more visible, moving the idle percentage down to 65% to 70%. "/> <figcaption>
            <p>CPU idle percentage of the two receiving hosts.</p>
        </figcaption>
</figure>

Both hosts still have a lot of headroom here. But I did find this as well:
<figure>
    <img loading="lazy" src="candc-cpu.png"
         alt="A screenshot of a Grafana time series plot. This time only one host&#39;s CPU idle percentage is shown. It is 97% idle at rest, but deep troughs down to around 55% idle are visible for the eight copy operations. "/> <figcaption>
            <p>CPU idle percentage the C&amp;C host doing the rbd import/export.</p>
        </figcaption>
</figure>
</p>
<p>This <em>might</em> be the explanation for why I&rsquo;m reaching no more than 50 MB/s throughput
even though this is a copy from SSD to SSD. The C&amp;C host is a pretty weak one:
it has an AMD Embedded G-Series GX-412TC CPU - very low powered. Normally
that&rsquo;s more than enough, as it doesn&rsquo;t need to do compute-heavy stuff. But this
might be too much for it. I&rsquo;m not familiar with the <code>rbd</code> import/export implementation,
but looking at the plot, I could theorize: this looks like two of the four cores
being fully pegged, possibly one by the <code>rbd export</code> and one by the <code>rbd import</code>.
And the roughly 50 MB/s is simply all it can really do?</p>
<p>I think I need to dig deeper into this at some point, running some proper testing
of what I can really do when it comes to reads and writes in Ceph.</p>
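<p>When I get around to it, the built-in <code>rados bench</code> is probably the place
to start, along these lines, with a scratch pool so no real data is involved:</p>
<pre tabindex="0"><code># Rough benchmarking plan: write for 30s, keep the objects, then do sequential reads
ceph osd pool create testbench
rados bench -p testbench 30 write --no-cleanup
rados bench -p testbench 30 seq
rados -p testbench cleanup
ceph osd pool delete testbench testbench --yes-i-really-really-mean-it
</code></pre>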
<p>That&rsquo;s it for the disk copying. Let&rsquo;s move on to the second element of my netboot
setup, the boot partitions sitting on NFS.</p>
<h2 id="nfs-setup">NFS setup</h2>
<p>For the boot partitions, I needed to come up with something special, because those
need to be &ldquo;shared&rdquo; between the host they belong to and my cluster master, which
runs a TFTP server. That&rsquo;s because to mount the RBD root disks, I need a kernel
running, and that kernel needs to come from somewhere. Plus, the hosts should
all be able to independently run updates, or even different operating systems.
So I couldn&rsquo;t just share one boot partition between all of them.</p>
<p>For this, again, I&rsquo;m using my Ceph cluster and the integrated support for
<a href="https://github.com/nfs-ganesha/nfs-ganesha">NFS Ganesha</a>.</p>
<p>I configured the cluster with the <a href="https://rook.io/docs/rook/latest-release/CRDs/ceph-nfs-crd/">Rook NFS CRD</a>,
looking like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">ceph.rook.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CephNFS</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">hl-nfs</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#75715e"># Settings for the NFS server</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">server</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">active</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">placement</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">nodeAffinity</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">requiredDuringSchedulingIgnoredDuringExecution</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">nodeSelectorTerms</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>                - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/role&#34;</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tolerations</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/taint.role&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">operator</span>: <span style="color:#e6db74">&#34;Equal&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">effect</span>: <span style="color:#e6db74">&#34;NoSchedule&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">limits</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">memory</span>: <span style="color:#e6db74">&#34;1Gi&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">cpu</span>: <span style="color:#e6db74">&#34;250m&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">memory</span>: <span style="color:#e6db74">&#34;1Gi&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">priorityClassName</span>: <span style="color:#e6db74">&#34;system-cluster-critical&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">logLevel</span>: <span style="color:#ae81ff">NIV_INFO</span>
</span></span></code></pre></div><p>This creates a single NFS Pod in the cluster. If I read the docs right,
NFS doesn&rsquo;t do HA very well, so there&rsquo;s not much use in having more than one.
One of the things Ceph does when an NFS cluster is set up is to create the
<code>.nfs</code> pool as a location for some metadata. This in turn causes the Ceph
PG autoscaler to stop working, with this warning:</p>
<pre tabindex="0"><code>debug 2025-03-10T13:54:45.313+0000 7fa1ad391640  0 [pg_autoscaler WARNING root] pool 6 contains an overlapping root -3... skipping scaling
</code></pre><p>I&rsquo;ve written about the last time I encountered this issue <a href="https://blog.mei-home.net/posts/ceph-rook-crush-rules/">here</a>,
so suffice it to say that the root cause is that the new pool is created with a
generic CRUSH rule whose root overlaps with that of the other CRUSH rules. It&rsquo;s
fixed by applying a more specific rule to the pool.</p>
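<p>The fix boils down to two commands, shown here with a hypothetical rule name;
the device class obviously depends on where the metadata should live:</p>
<pre tabindex="0"><code># Create a dedicated replicated rule and assign it to the .nfs pool
ceph osd crush rule create-replicated replicated-ssd default host ssd
ceph osd pool set .nfs crush_rule replicated-ssd
</code></pre>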
<p>Because I wanted to use the NFS cluster outside k8s, I also introduced this
Service:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Service</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nfs-rook-external</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/public-service</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/hostname</span>: <span style="color:#ae81ff">nfs.example.com</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">io.cilium/lb-ipam-ips</span>: <span style="color:#e6db74">&#34;300.300.300.102&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">LoadBalancer</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">externalTrafficPolicy</span>: <span style="color:#ae81ff">Local</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">app</span>: <span style="color:#ae81ff">rook-ceph-nfs</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">ceph_nfs</span>: <span style="color:#ae81ff">hl-nfs</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">instance</span>: <span style="color:#ae81ff">a</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nfs</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">port</span>: <span style="color:#ae81ff">2049</span>
</span></span></code></pre></div><p>This means I don&rsquo;t have to hard-code a fixed host IP in the <code>/etc/fstab</code> of
my hosts; NFS is perfectly happy using DNS to get the IP of
the NFS server.</p>
<p>Next is the NFS share itself. These shares can be backed by either a CephFS
subvolume or an S3 bucket. But the S3 bucket backend has severe restrictions.
I tried it once, and found that e.g. Git repos on such an NFS share don&rsquo;t work,
with Git commands returning <code>Not Implemented</code> errors. So I created a CephFS
subvolume:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph fs subvolume create my-cephfs my-share
</span></span></code></pre></div><p>Then comes the creation of the NFS share:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph nfs export create cephfs --cluster-id hl-nfs --pseudo-path /my-share-path --fsname my-cephfs --path /volumes/_nogroup/my-share/UUID-HERE --client_addr 300.300.300.0/24 --client_addr 300.300.315.0/24
</span></span></code></pre></div><p>The <code>--path</code> parameter can be fetched via this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph fs subvolume getpath my-cephfs my-share
</span></span></code></pre></div><p>One thing I&rsquo;m a bit sad about is that I had to use the command line to create
those two objects, the subvolume and the NFS share, instead of being able to use
CRDs in the k8s cluster.</p>
<p>The resulting NFS share definition, as fetched with <code>ceph nfs export ls hl-nfs --detailed</code>,
looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>[
</span></span><span style="display:flex;"><span>  {
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;access_type&#34;</span>: <span style="color:#e6db74">&#34;none&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;clients&#34;</span>: [
</span></span><span style="display:flex;"><span>      {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;access_type&#34;</span>: <span style="color:#e6db74">&#34;rw&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;addresses&#34;</span>: [
</span></span><span style="display:flex;"><span>          <span style="color:#e6db74">&#34;300.300.300.0/24&#34;</span>,
</span></span><span style="display:flex;"><span>          <span style="color:#e6db74">&#34;300.300.315.0/24&#34;</span>
</span></span><span style="display:flex;"><span>        ],
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;squash&#34;</span>: <span style="color:#e6db74">&#34;None&#34;</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    ],
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;cluster_id&#34;</span>: <span style="color:#e6db74">&#34;hl-nfs&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;export_id&#34;</span>: <span style="color:#ae81ff">1</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;fsal&#34;</span>: {
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;fs_name&#34;</span>: <span style="color:#e6db74">&#34;my-cephfs&#34;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;CEPH&#34;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;user_id&#34;</span>: <span style="color:#e6db74">&#34;nfs.hl-nfs.1&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;path&#34;</span>: <span style="color:#e6db74">&#34;/volumes/_nogroup/my-share/UUID-HERE&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;protocols&#34;</span>: [
</span></span><span style="display:flex;"><span>      <span style="color:#ae81ff">4</span>
</span></span><span style="display:flex;"><span>    ],
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;pseudo&#34;</span>: <span style="color:#e6db74">&#34;/my-share-path&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;security_label&#34;</span>: <span style="color:#66d9ef">true</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;squash&#34;</span>: <span style="color:#e6db74">&#34;None&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;transports&#34;</span>: [
</span></span><span style="display:flex;"><span>      <span style="color:#e6db74">&#34;TCP&#34;</span>
</span></span><span style="display:flex;"><span>    ]
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>]
</span></span></code></pre></div><p>The end effect of all of this is an NFS share which can be mounted like this:</p>
<pre tabindex="0"><code>nfs.example.com:/my-share-path /mnt/example nfs defaults,timeo=900,_netdev 0 0
</code></pre><p>One small note on the migration: Ansible&rsquo;s <a href="https://docs.ansible.com/ansible/latest/collections/ansible/posix/mount_module.html">mount module</a>
does not seem to automatically remount when a mount definition changes. That is likely
a good idea, but it meant that I had to execute these commands on all of my
netbooters:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ansible <span style="color:#e6db74">&#34;host1:host2:host3&#34;</span> -a <span style="color:#e6db74">&#34;umount /boot/firmware&#34;</span>
</span></span><span style="display:flex;"><span>ansible <span style="color:#e6db74">&#34;host1:host2:host3&#34;</span> -a <span style="color:#e6db74">&#34;mount /boot/firmware&#34;</span>
</span></span></code></pre></div><p>After that, they all had the right NFS share mounted and I was one step closer
to shutting down the baremetal cluster.</p>
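<p>A quick way to verify that they all ended up on the new server is
<code>findmnt</code>, again via Ansible ad-hoc commands:</p>
<pre tabindex="0"><code># findmnt prints the mount source, so the new NFS server should show up here
ansible &#34;host1:host2:host3&#34; -a &#34;findmnt /boot/firmware&#34;
</code></pre>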
<h2 id="copying-my-warehouse-volume-over">Copying my warehouse volume over</h2>
<p>As I&rsquo;ve mentioned above, I&rsquo;ve got a &ldquo;random bunch of stuff&rdquo; CephFS subvolume
that is mounted on my desktop. It really contains exactly that: a random
assortment of data. Copies of old University slides and projects, backups for
my OpenWRT WiFi router&rsquo;s config and some old database dumps from services I&rsquo;m
no longer running. Overall, it&rsquo;s about 129 GB, so not too much data, in contrast
to my Linux ISO collection for example.</p>
<p>Here&rsquo;s the rsync command and its output:</p>
<pre tabindex="0"><code>rsync -av --info=progress2 --info=name0 /mnt/tempt1/* /mnt/temp2/
sending incremental file list
129,648,120,620  99%   41.86MB/s    0:49:14 (xfr#1415, to-chk=0/1537)

sent 129,679,927,316 bytes  received 27,564 bytes  43,892,352.30 bytes/sec
total size is 129,654,775,448  speedup is 1.00
</code></pre><p>Absolutely nothing interesting happened here; it took only about 49 minutes.
If you&rsquo;re interested in some metrics about a 1.7 TB copy operation from one
CephFS subvolume on one cluster to another subvolume on another cluster, have
a look at <a href="https://blog.mei-home.net/posts/ceph-copy-latency/">this recent post</a>.</p>
<h2 id="final-takedown-of-the-baremetal-cluster">Final takedown of the baremetal cluster</h2>
<p>So that&rsquo;s it. With the warehouse volume transferred, there was, supposedly, nothing
important on that cluster anymore.</p>
<p>But I wasn&rsquo;t about to trust that. Instead, I ran <code>ceph df</code> to confirm, and found
that there was exactly 348 MB of data left. Deciding that this couldn&rsquo;t be anything
important, I ran the cluster purge by executing this command on all the remaining
cluster hosts, meaning the two OSD nodes and the three cluster controllers hosting
the MONs:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>cephadm rm-cluster --force --zap-osds --fsid a84c7196-7ebf-11eb-b290-18c04d00217f
</span></span></code></pre></div><p>And just like that, the baremetal Ceph cluster was gone. It lived for almost
exactly four years, having been created on 2021-03-06, at 21:05.</p>
<h2 id="adding-the-two-baremetal-hosts-to-the-rook-cluster">Adding the two baremetal hosts to the Rook cluster</h2>
<p>After the old cluster had been removed, I needed to add the two OSD hosts to
the Rook cluster. I did so by first adding them to the k8s cluster and then
updating the Rook cluster&rsquo;s Helm values:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>      - <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;host1&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">devices</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;/dev/disk/by-id/wwn-0x5002538e90b5e22f&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">config</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">deviceClass</span>: <span style="color:#ae81ff">ssd</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;/dev/disk/by-id/wwn-0x50014ee2ba48465d&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">config</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">deviceClass</span>: <span style="color:#ae81ff">hdd</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;host2&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">devices</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;/dev/disk/by-id/wwn-0x5002538e90b68866&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">config</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">deviceClass</span>: <span style="color:#ae81ff">ssd</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;/dev/disk/by-id/wwn-0x50014ee20f9d1545&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">config</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">deviceClass</span>: <span style="color:#ae81ff">hdd</span>
</span></span></code></pre></div><p>Both hosts have one 1 TB SATA SSD and one 4 TB HDD. To be absolutely safe, I&rsquo;m
using the disks&rsquo; <a href="https://en.wikipedia.org/wiki/World_Wide_Name">WWNs</a> to identify
them.</p>
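<p>The WWNs themselves are easy to gather, for example with <code>lsblk</code>:</p>
<pre tabindex="0"><code>lsblk -d -o NAME,SIZE,MODEL,WWN
</code></pre>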
<p>After that, the rebalancing started:</p>
<p><figure>
    <img loading="lazy" src="pgs-host-addition.png"
         alt="A screenshot of a Grafana time series plot. It shows the state of the 265 Placement Groups in the cluster. At approximately 23:18, they go 130 PGs being remapped. Initially, the count of remapped PGs goes down relatively quickly, reaching only 67 remapped PGs around 01:02. But after that, the number of remapped PGs goes down only slowly, reaching zero around 19:44."/> <figcaption>
            <p>PG state during the rebalancing after adding the two additional hosts with four additional OSDs.</p>
        </figcaption>
</figure>

The initial, relatively rapid reduction in remapped PGs was probably due to the
PGs on the SSDs, while the rest sat on the HDD OSDs.</p>
<p>I would love to show you the overall throughput of the backfill operations, but
it looks like there are no metrics for those. The <code>ceph_osd_op_r_out_bytes</code> and
<code>ceph_osd_op_w_in_bytes</code> metrics I&rsquo;m using for the general cluster throughput
seem to only cover actual client operations. That throughput definitely did not show
the backfill load on the OSDs.</p>
<p>So let&rsquo;s instead have a look at the throughput of the eight disks in the cluster:
<figure>
    <img loading="lazy" src="host-additions-writes.png"
         alt="A screenshot of a Grafana time series plot. At the beginning, it hovers somewhere around 10 MB/s, until it goes up to 30 MB/s around 23:19. The next jump comes at 23:49, to 70 MB/s. It goes down a bit again at 01:00, to around 40 MB/s. After that, the plot hovers anywhere between 30 MB/s and 50 MB/s until about 09:20, where it goes up to 70 MB/s for another 45 minutes or so, before coming down to 30 - 40 MB/s at 10 and stays there until 12:00. Then it goes up yet again to 50 MB/s. After 14:00, the plot slowly goes down towards the initial 10 MB/s range, which it reaches around 19:40."/> <figcaption>
            <p>Accumulated bytes written per second on all disks, HDD and SSD, in the Rook cluster during the rebalancing.</p>
        </figcaption>
</figure>

I just created that graph by adding up the written bytes per second from the
node exporter data I&rsquo;m gathering, specifically for the eight disks which are
part of the Rook cluster at this point.</p>
<p>The graph has a couple of points worth discussing. The first one to note is that
there was not much client load on the cluster overall; it hovered around
1 - 2 MB/s, typical for my Homelab. And still, the rebalancing only used,
at maximum, 70 MB/s worth of writes. And remember, these are not just HDDs, but
also SSDs. I&rsquo;m pretty sure that this is entirely due to Ceph itself.
At the beginning of the plot, around 23:49, you can see a jump from around
30 MB/s to 70 MB/s. That happened after I entered the following two commands:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph config set osd osd_mclock_override_recovery_settings true
</span></span><span style="display:flex;"><span>ceph config set osd osd_max_backfills <span style="color:#ae81ff">2</span>
</span></span></code></pre></div><p>These instruct Ceph to use more than one backfill per OSD. Then you can also see
another jump at 09:20 the next morning, where the throughput suddenly goes from
around 40 MB/s to 70 MB/s again, at least for a short while. That was after
I entered this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph config set osd osd_mclock_profile high_recovery_ops
</span></span></code></pre></div><p>I will refrain from any expletives at this point, because I don&rsquo;t understand this
well enough to judge whether I&rsquo;m the problem here, or whether Ceph really works
this way.</p>
<p>So, a little while ago, Ceph introduced a new IO scheduler, <a href="https://docs.ceph.com/en/reef/rados/configuration/mclock-config-ref/">mclock</a>.
The config settings I showed above impact how that scheduler works.</p>
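<p>For the record, the currently active profile and the effective backfill limit
can be checked like this, before twiddling anything:</p>
<pre tabindex="0"><code>ceph config get osd osd_mclock_profile
ceph config get osd osd_max_backfills
</code></pre>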
<p>Why do I have to make these settings? Why, with barely 1 MB/s throughput, do I
actually have to tell Ceph to run more than one backfill per OSD? And why doesn&rsquo;t
Ceph actually use the OSDs full throughput for even that one default backfill?
Because that graph above, that&rsquo;s not a single disk. That&rsquo;s the sum of the throughput
on all of them. This kind of write throughput would be pathetic for a single HDD.
I really don&rsquo;t understand why my mixed HDD/SSD cluster shows it.
What does the scheduler actually do here? I mean, don&rsquo;t get me wrong - there is
likely a good reason, but I don&rsquo;t understand it. Why not use an OSD&rsquo;s full write
capacity for backfills <em>when there is nearly no other traffic happening</em>?</p>
<p>I was really stumped when I saw these numbers. And also, why even have a scheduler
when I still need to manually set the maximum number of backfills allowed?</p>
<p>Anyway, there&rsquo;s now a &ldquo;Learn Ceph&rdquo; task in my backlog. When the migration is done,
I will not put my old home server back into storage. Instead, I will buy a couple
more disks and use it as a Ceph playground. And if I have to read the entire
Ceph source code from <code>int main()</code> to the end, I will. Because I&rsquo;m now intensely
curious about why the backfill was so darned slow.</p>
<p>And now, let&rsquo;s come to the &ldquo;Michael utterly embarrasses himself&rdquo; part of this post.</p>
<h2 id="arrogance">Arrogance</h2>
<p>After the addition of the new hosts was done, I could shut down the Ceph VM running
on my extension host, as it was no longer required.</p>
<p>And I learned that I have an unhealthy amount of arrogance. I went into this
thinking &ldquo;Well, I sure know how to remove a host from a Ceph cluster, I don&rsquo;t
need any docs!&rdquo;.</p>
<p><em>Narrator:</em> He did need docs.</p>
<p>So let&rsquo;s start with what I should have done. I should have followed
<a href="https://rook.io/docs/rook/latest-release/Storage-Configuration/Advanced/ceph-osd-mgmt/#host-based-cluster">these Rook docs</a>.
They describe, in very nice detail, what to do to remove OSDs from a Rook Ceph
cluster.</p>
<p>But no. That was of course not what I did. What I did instead was just wing it.
So I started by removing the host from the <code>values.yaml</code> file. That had only one
effect, namely producing messages like this in the logs of the Rook operator:</p>
<pre tabindex="0"><code>2025-03-15 19:20:15.090341 W | op-osd: not updating OSD 0 on node &#34;oldhost&#34;. node no longer exists in the storage spec. if the user wishes to remove OSDs from the node, they must do so manually. Rook will not remove OSDs from nodes that are removed from the storage spec in order to prevent accidental data loss
</code></pre><p>After that slightly embarrassing failure, I deigned to actually skim the doc I
linked to above, and found this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>$ kubectl rook-ceph rook purge-osd 0,1 --force
</span></span><span style="display:flex;"><span>Info: Running purge osd command
</span></span><span style="display:flex;"><span>2025/03/15 19:48:59 maxprocs: Leaving GOMAXPROCS<span style="color:#f92672">=</span>8: CPU quota undefined
</span></span><span style="display:flex;"><span>2025-03-15 19:48:59.731856 W | cephcmd: loaded admin secret from env var ROOK_CEPH_SECRET instead of from file
</span></span><span style="display:flex;"><span>2025-03-15 19:48:59.731894 I | rookcmd: starting Rook v1.16.5 with arguments <span style="color:#e6db74">&#39;rook ceph osd remove --osd-ids=0,1 --force-osd-removal=true&#39;</span>
</span></span><span style="display:flex;"><span>2025-03-15 19:48:59.731897 I | rookcmd: flag values: --force-osd-removal<span style="color:#f92672">=</span>true, --help<span style="color:#f92672">=</span>false, --log-level<span style="color:#f92672">=</span>INFO, --osd-ids<span style="color:#f92672">=</span>0,1, --preserve-pvc<span style="color:#f92672">=</span>false
</span></span><span style="display:flex;"><span>2025-03-15 19:48:59.737462 I | cephclient: writing config file /var/lib/rook/rook-ceph/rook-ceph.config
</span></span><span style="display:flex;"><span>2025-03-15 19:48:59.737529 I | cephclient: generated admin config in /var/lib/rook/rook-ceph
</span></span><span style="display:flex;"><span>2025-03-15 19:48:59.921560 I | cephosd: validating status of osd.0
</span></span><span style="display:flex;"><span>2025-03-15 19:48:59.921571 I | cephosd: osd.0 is healthy. It cannot be removed unless it is <span style="color:#e6db74">&#39;down&#39;</span>
</span></span><span style="display:flex;"><span>2025-03-15 19:48:59.921573 I | cephosd: validating status of osd.1
</span></span><span style="display:flex;"><span>2025-03-15 19:48:59.921575 I | cephosd: osd.1 is healthy. It cannot be removed unless it is <span style="color:#e6db74">&#39;down&#39;</span>
</span></span></code></pre></div><p>That wasn&rsquo;t exactly a success either. So what did I do? Did I now properly
read the entire page? No, of course not.
Instead, I decided that the right way to do this was to scale down the two OSDs
I wanted to remove:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl -n rook-cluster scale deployment rook-ceph-osd-0 --replicas<span style="color:#f92672">=</span><span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>kubectl -n rook-cluster scale deployment rook-ceph-osd-1 --replicas<span style="color:#f92672">=</span><span style="color:#ae81ff">0</span>
</span></span></code></pre></div><p>And then I repeated the previous command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>$ kubectl rook-ceph rook purge-osd 0,1 --force
</span></span><span style="display:flex;"><span>Info: Running purge osd command
</span></span><span style="display:flex;"><span>2025/03/15 19:50:33 maxprocs: Leaving GOMAXPROCS<span style="color:#f92672">=</span>8: CPU quota undefined
</span></span><span style="display:flex;"><span>2025-03-15 19:50:33.244662 W | cephcmd: loaded admin secret from env var ROOK_CEPH_SECRET instead of from file
</span></span><span style="display:flex;"><span>2025-03-15 19:50:33.244700 I | rookcmd: starting Rook v1.16.5 with arguments <span style="color:#e6db74">&#39;rook ceph osd remove --osd-ids=0,1 --force-osd-removal=true&#39;</span>
</span></span><span style="display:flex;"><span>2025-03-15 19:50:33.244704 I | rookcmd: flag values: --force-osd-removal<span style="color:#f92672">=</span>true, --help<span style="color:#f92672">=</span>false, --log-level<span style="color:#f92672">=</span>INFO, --osd-ids<span style="color:#f92672">=</span>0,1, --preserve-pvc<span style="color:#f92672">=</span>false
</span></span><span style="display:flex;"><span>2025-03-15 19:50:33.250479 I | cephclient: writing config file /var/lib/rook/rook-ceph/rook-ceph.config
</span></span><span style="display:flex;"><span>2025-03-15 19:50:33.250539 I | cephclient: generated admin config in /var/lib/rook/rook-ceph
</span></span><span style="display:flex;"><span>2025-03-15 19:50:33.432040 I | cephosd: validating status of osd.0
</span></span><span style="display:flex;"><span>2025-03-15 19:50:33.432049 I | cephosd: osd.0 is marked <span style="color:#e6db74">&#39;DOWN&#39;</span>
</span></span><span style="display:flex;"><span>2025-03-15 19:50:33.622957 I | cephosd: marking osd.0 out
</span></span><span style="display:flex;"><span>2025-03-15 19:50:34.825971 I | cephosd: osd.0 is NOT ok to destroy but force removal is enabled so proceeding with removal
</span></span><span style="display:flex;"><span>2025-03-15 19:50:34.828262 E | cephosd: failed to fetch the deployment <span style="color:#e6db74">&#34;rook-ceph-osd-0&#34;</span>. deployments.apps <span style="color:#e6db74">&#34;rook-ceph-osd-0&#34;</span> not found
</span></span><span style="display:flex;"><span>2025-03-15 19:50:34.828271 I | cephosd: purging osd.0
</span></span><span style="display:flex;"><span>2025-03-15 19:50:35.055813 I | cephosd: attempting to remove host <span style="color:#e6db74">&#34;oldhost&#34;</span> from crush map <span style="color:#66d9ef">if</span> not in use
</span></span><span style="display:flex;"><span>2025-03-15 19:50:35.237143 I | cephosd: failed to remove CRUSH host <span style="color:#e6db74">&#34;oldhost&#34;</span>. exit status <span style="color:#ae81ff">39</span>
</span></span><span style="display:flex;"><span>2025-03-15 19:50:35.427664 I | cephosd: no ceph crash to silence
</span></span><span style="display:flex;"><span>2025-03-15 19:50:35.427677 I | cephosd: completed removal of OSD <span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>2025-03-15 19:50:35.427680 I | cephosd: validating status of osd.1
</span></span><span style="display:flex;"><span>2025-03-15 19:50:35.427683 I | cephosd: osd.1 is marked <span style="color:#e6db74">&#39;DOWN&#39;</span>
</span></span><span style="display:flex;"><span>2025-03-15 19:50:35.608670 I | cephosd: marking osd.1 out
</span></span><span style="display:flex;"><span>2025-03-15 19:50:36.329162 I | cephosd: osd.1 is NOT ok to destroy but force removal is enabled so proceeding with removal
</span></span><span style="display:flex;"><span>2025-03-15 19:50:36.331913 E | cephosd: failed to fetch the deployment <span style="color:#e6db74">&#34;rook-ceph-osd-1&#34;</span>. deployments.apps <span style="color:#e6db74">&#34;rook-ceph-osd-1&#34;</span> not found
</span></span><span style="display:flex;"><span>2025-03-15 19:50:36.331920 I | cephosd: purging osd.1
</span></span><span style="display:flex;"><span>2025-03-15 19:50:36.655373 I | cephosd: attempting to remove host <span style="color:#e6db74">&#34;oldhost&#34;</span> from crush map <span style="color:#66d9ef">if</span> not in use
</span></span><span style="display:flex;"><span>2025-03-15 19:50:37.663211 I | cephosd: removed CRUSH host <span style="color:#e6db74">&#34;oldhost&#34;</span>
</span></span><span style="display:flex;"><span>2025-03-15 19:50:37.930419 I | cephosd: no ceph crash to silence
</span></span><span style="display:flex;"><span>2025-03-15 19:50:37.930431 I | cephosd: completed removal of OSD <span style="color:#ae81ff">1</span>
</span></span></code></pre></div><p>Note especially these lines:</p>
<pre tabindex="0"><code>2025-03-15 19:50:34.825971 I | cephosd: osd.0 is NOT ok to destroy but force removal is enabled so proceeding with removal
2025-03-15 19:50:36.329162 I | cephosd: osd.1 is NOT ok to destroy but force removal is enabled so proceeding with removal
</code></pre><p>That&rsquo;s where my arrogance really bit me. I had just copy+pasted the <code>rook purge-osd</code>
command from the docs, including the <code>--force</code> at the end. Not a good idea.
Instead of taking the OSDs out and letting the cluster rebalance first, the OSDs were
removed outright, leaving me with a relatively long phase of reduced data redundancy.</p>
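<p>For the record, here is roughly what the sequence should have looked like, if I
understand the docs correctly - take the OSDs out, let the cluster rebalance, and only
purge once Ceph considers them safe to destroy:</p>
<pre tabindex="0"><code># mark the OSDs out and wait for the rebalance to finish
kubectl rook-ceph ceph osd out 0 1
# check whether Ceph considers them safe to remove
kubectl rook-ceph ceph osd safe-to-destroy 0 1
# only then purge, without --force
kubectl rook-ceph rook purge-osd 0,1
</code></pre>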
<p>Not my most stellar Homelab moment.</p>
<p>It took another 17 hours of rebalancing to recover the cluster. But I still wasn&rsquo;t
done yet, because now the Rook operator logs were showing these messages:</p>
<pre tabindex="0"><code>2025-03-20 20:31:32.598410 I | clusterdisruption-controller: osd &#34;rook-ceph-osd-0&#34; is down and a possible node drain is detected
2025-03-20 20:31:32.598473 I | clusterdisruption-controller: osd &#34;rook-ceph-osd-1&#34; is down and a possible node drain is detected
2025-03-20 20:31:32.814825 I | clusterdisruption-controller: osd is down in failure domain &#34;oldhost&#34;. pg health: &#34;all PGs in cluster are clean&#34;
</code></pre><p>And again, had I read the docs properly, this would not have happened. I fixed
the issue with the following commands:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas<span style="color:#f92672">=</span><span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>kubectl delete deployments.apps -n rook-cluster rook-ceph-osd-0
</span></span><span style="display:flex;"><span>kubectl delete deployments.apps -n rook-cluster rook-ceph-osd-1
</span></span><span style="display:flex;"><span>kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas<span style="color:#f92672">=</span><span style="color:#ae81ff">1</span>
</span></span></code></pre></div><p>With that, I had finally removed the old host cleanly from the Rook cluster. It
could all have been a lot smoother if I&rsquo;d just read the docs properly the first
time. No data loss, of course, but it could have gone better.</p>
<p>The last step in the Ceph baremetal to Rook saga was to remove the old host from
the Kubernetes cluster entirely:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl drain oldhost --ignore-daemonsets --delete-local-data
</span></span><span style="display:flex;"><span>kubectl delete node oldhost
</span></span></code></pre></div><p>And then resetting the Ceph scheduler options:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl rook-ceph ceph config rm osd osd_mclock_profile
</span></span><span style="display:flex;"><span>kubectl rook-ceph ceph config rm osd osd_mclock_override_recovery_settings
</span></span></code></pre></div><h2 id="conclusion">Conclusion</h2>
<p>And that was it. The entire migration from baremetal Ceph to Rook took quite a while,
but a lot of it was just waiting for copy and rebalancing operations to finish.
The effort I actually had to put in was relatively low. Even considering that somewhere
towards the end, I temporarily forgot the value of reading documentation from
beginning to end.</p>
<p>The fact that I apparently did not make use of my storage&rsquo;s full performance, especially
during the addition and removal of hosts, has reinforced my wish to do a real deep
dive into Ceph, how it&rsquo;s implemented and how it works. Luckily, it&rsquo;s written in
C++, which I also work with at my day job. But I&rsquo;m hoping I can find
some more high-level explanations of the algorithms used as well. I even plan to read
Weil&rsquo;s original thesis and papers on RADOS and the CRUSH algorithm.</p>
<p>While writing these lines, I&rsquo;m also working on the last step of the migration:
moving Vault into the cluster and then switching the cluster control plane
nodes from VMs to my three Pi 4s.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 22: The end of Nomad</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-22-end-of-nomad/</link>
      <pubDate>Sun, 23 Mar 2025 23:10:32 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-22-end-of-nomad/</guid>
      <description>The end of a Workload Scheduler era</description>
      <content:encoded><![CDATA[<p>Wherein I shut down my Nomad cluster for good.</p>
<p>This is part 23 of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>It is finally done: on the 13th of March, I shut down my <a href="https://www.nomadproject.io/">Nomad</a> cluster. I had
originally set it up sometime around 2021. The original trigger was that I had
started to separate the Docker containers running public-facing services and the
purely internal ones. Around that setup, I had constructed a bunch of bash
scripts and a couple of shared mounts. It wasn&rsquo;t pretty, plus the Homelab had
recently turned from a utility into a genuine hobby. In short, increased complexity
was actually welcomed. &#x1f601;</p>
<p>So when I started reading about workload schedulers, I naturally first looked at Kubernetes.
I bounced off of that when I came to the &ldquo;Now choose a Container Networking Plugin&rdquo;
stage of the install instructions. And I didn&rsquo;t just not know which CNI plugin
to choose - no, I didn&rsquo;t even know <em>how to make said choice</em>.</p>
<p>And that&rsquo;s how I came across Nomad. Together with <a href="https://www.consul.io/">Consul</a>
and <a href="https://www.vaultproject.io/">Vault</a>, it gave me a really enjoyable Homelab. Nomad,
Consul, and Vault are all absolutely excellent tools. Nomad has some really
great flexibility when it comes to the drivers it can use for its jobs. They
range from Docker containers to pure exec jobs run in a simple chroot. Networking can be
as simple or as complex as you like, and by default you don&rsquo;t need to worry
about any kind of separate network. If you like, you can run it all on the
network between your nodes without any complicated CNIs.</p>
<p>And that&rsquo;s what initially drew me to Nomad. Taken on its own, it doesn&rsquo;t do much more
than run workloads. For secrets management or service discovery
you can then add Vault and Consul, or you can just leave those things out.</p>
<p>Since I started with Nomad, some service discovery and secrets management
capabilities have been added to Nomad itself, but I never tried them because I already
had Vault and Consul set up to my liking.</p>
<p>So let&rsquo;s have a short look at an example job:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">job</span> <span style="color:#e6db74">&#34;prometheus&#34;</span> {
</span></span><span style="display:flex;"><span>  datacenters <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;homenet&#34;</span>]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">constraint</span> {
</span></span><span style="display:flex;"><span>    attribute <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;${node.class}&#34;</span>
</span></span><span style="display:flex;"><span>    value     <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;internal&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">group</span> <span style="color:#e6db74">&#34;prometheus&#34;</span> {
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">network</span> {
</span></span><span style="display:flex;"><span>      mode <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;bridge&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">port</span> <span style="color:#e6db74">&#34;health&#34;</span> {
</span></span><span style="display:flex;"><span>        host_network <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;local&#34;</span>
</span></span><span style="display:flex;"><span>        to           <span style="color:#f92672">=</span> <span style="color:#ae81ff">9090</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">service</span> {
</span></span><span style="display:flex;"><span>      name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;prometheus&#34;</span>
</span></span><span style="display:flex;"><span>      port <span style="color:#f92672">=</span> <span style="color:#ae81ff">9090</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">connect</span> {
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">sidecar_service</span> {
</span></span><span style="display:flex;"><span>          <span style="color:#66d9ef">proxy</span> {
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">upstreams</span> {
</span></span><span style="display:flex;"><span>              destination_name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;snmp-exporter&#34;</span>
</span></span><span style="display:flex;"><span>              local_bind_port <span style="color:#f92672">=</span> <span style="color:#ae81ff">9116</span>
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>          }
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">check</span> {
</span></span><span style="display:flex;"><span>        type     <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;http&#34;</span>
</span></span><span style="display:flex;"><span>        interval <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;30s&#34;</span>
</span></span><span style="display:flex;"><span>        path     <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/-/ready&#34;</span>
</span></span><span style="display:flex;"><span>        timeout  <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;2s&#34;</span>
</span></span><span style="display:flex;"><span>        port     <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;health&#34;</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">volume</span> <span style="color:#e6db74">&#34;vol-prometheus&#34;</span> {
</span></span><span style="display:flex;"><span>      type            <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;csi&#34;</span>
</span></span><span style="display:flex;"><span>      source          <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;vol-prometheus&#34;</span>
</span></span><span style="display:flex;"><span>      attachment_mode <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;file-system&#34;</span>
</span></span><span style="display:flex;"><span>      access_mode     <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;single-node-writer&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">task</span> <span style="color:#e6db74">&#34;prometheus&#34;</span> {
</span></span><span style="display:flex;"><span>      driver <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;docker&#34;</span>
</span></span><span style="display:flex;"><span>      user <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;962:962&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">config</span> {
</span></span><span style="display:flex;"><span>        image <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;prom/prometheus:v2.50.0&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">mount</span> {
</span></span><span style="display:flex;"><span>          type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;bind&#34;</span>
</span></span><span style="display:flex;"><span>          source <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;secrets/prometheus.yml&#34;</span>
</span></span><span style="display:flex;"><span>          target <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/etc/prometheus/prometheus.yml&#34;</span>
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        args <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>          <span style="color:#e6db74">&#34;--config.file=/etc/prometheus/prometheus.yml&#34;</span>,
</span></span><span style="display:flex;"><span>          <span style="color:#e6db74">&#34;--storage.tsdb.path=/prometheus&#34;</span>,
</span></span><span style="display:flex;"><span>          <span style="color:#e6db74">&#34;--web.console.libraries=/usr/share/prometheus/console_libraries&#34;</span>,
</span></span><span style="display:flex;"><span>          <span style="color:#e6db74">&#34;--web.console.templates=/usr/share/prometheus/consoles&#34;</span>,
</span></span><span style="display:flex;"><span>          <span style="color:#e6db74">&#34;--web.page-title=Homenet Prometheus&#34;</span>,
</span></span><span style="display:flex;"><span>          <span style="color:#e6db74">&#34;--storage.tsdb.retention.time=5y&#34;</span>,
</span></span><span style="display:flex;"><span>          <span style="color:#e6db74">&#34;--log.format=json&#34;</span>
</span></span><span style="display:flex;"><span>        ]
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">vault</span> {
</span></span><span style="display:flex;"><span>        policies <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;prometheus&#34;</span>]
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">volume_mount</span> {
</span></span><span style="display:flex;"><span>        volume      <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;vol-prometheus&#34;</span>
</span></span><span style="display:flex;"><span>        destination <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/prometheus&#34;</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">template</span> {
</span></span><span style="display:flex;"><span>        data <span style="color:#f92672">=</span> <span style="color:#66d9ef">file</span>(<span style="color:#e6db74">&#34;prometheus/templates/prometheus.yml.templ&#34;</span>)
</span></span><span style="display:flex;"><span>        destination <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;./secrets/prometheus.yml&#34;</span>
</span></span><span style="display:flex;"><span>        change_mode <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;restart&#34;</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">resources</span> {
</span></span><span style="display:flex;"><span>        cpu <span style="color:#f92672">=</span> <span style="color:#ae81ff">400</span>
</span></span><span style="display:flex;"><span>        memory <span style="color:#f92672">=</span> <span style="color:#ae81ff">400</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>This shows a pretty typical Nomad job setup in my Homelab. One of the interesting
things compared to Kubernetes is that most of the configuration is located in a single
file instead of a bunch of YAML files. Very roughly speaking, the <code>group</code> is
the equivalent of a Kubernetes Pod, in that it provides a common networking and
filesystem/volume namespace, and all <code>tasks</code> in a group get scheduled on
the same node.</p>
<p>The great integrations of Vault and Consul are both visible
here. First, there&rsquo;s the <code>service</code> stanza, which hooks Prometheus into Consul&rsquo;s Connect
mesh for service discovery. How that looks from the consumer side can be seen
in the <code>connect</code> stanza, which sets an upstream for an SNMP exporter running in
a separate job. Then there&rsquo;s the <code>vault</code> stanza, which requests a Vault token
with the given policy for the task. These policies can then be tuned to allow
access only to the secrets the specific job actually needs.</p>
<p>Also something I learned to appreciate was the <code>template</code> stanza. It internally
uses <a href="https://github.com/hashicorp/consul-template">consul-template</a> to template
configuration files, complete with Vault integration. This made running apps
which expect their secrets in their configuration files a lot more convenient.</p>
<p>But I don&rsquo;t want to go into too much detail here. I&rsquo;m planning to write a
series of Homelab history posts where I will go into a lot more detail on the
setup and dredge up all manner of old configurations and notes.</p>
<p>In the end, the trigger for my decision to migrate my well-functioning Homelab
to k8s was HashiCorp&rsquo;s decision to relicense their tools under the more restrictive BUSL.
But I could have survived that one as well. And then they went and changed the
ToS for the Terraform provider registry to exclude the FOSS fork of Terraform.
That looked very much like pure spite to me, and I no longer trusted HashiCorp
enough to build my Homelab on their tools. More details can be found in
<a href="https://blog.mei-home.net/posts/hashipocalypse/">this post</a>.</p>
<p>So even though I liked (and still like) the tools, I&rsquo;ve now moved away from them
for the most part. Here is a screenshot of the cluster when it was in full swing:</p>
<figure>
    <img loading="lazy" src="nomad-full.png"
         alt="A screenshot of Nomads topology Web UI. It shows that the cluster had 9 clients, running 56 allocations. It had 68.66 GiB of RAM, of which 41% was reserved by jobs. The cluster also had 58.24 GHz of compute, of which 59% was used. To the right, a list shows the nine hosts, running anywhere between 2 and 10 allocations. Most of the hosts are Raspberry Pi CM4, with 8 GiB of RAM and 6000 MHz of compute."/> <figcaption>
            <p>The Nomad cluster when it was in full use.</p>
        </figcaption>
</figure>

<p>And then, on March 13th, it looked like this:</p>
<figure>
    <img loading="lazy" src="empty-nomad.png"
         alt="A screenshot of Nomads topology Web UI. It shows the same cluster, but now with only five instead of nine clients. The list of allocations assigned to hosts now only shows &#39;Empty client&#39; for the remaining clients."/> <figcaption>
            <p>The Nomad cluster right before shutdown.</p>
        </figcaption>
</figure>

<p>At that point, a couple of hosts were already migrated over to the k8s cluster.</p>
<p>It all ended with this:</p>
<pre tabindex="0"><code>Mar 13 20:42:52 nomad[657]: ==&gt; Caught signal: interrupt
Mar 13 20:42:52 nomad[657]:     2025-03-13T20:42:52.837+0100 [INFO]  agent: requesting shutdown
Mar 13 20:42:52 nomad[657]:     2025-03-13T20:42:52.837+0100 [INFO]  nomad: shutting down server
Mar 13 20:42:52 systemd[1]: Stopping Nomad...
Mar 13 20:42:52 nomad[657]:     2025-03-13T20:42:52.837+0100 [WARN]  nomad: serf: Shutdown without a Leave
Mar 13 20:42:52 nomad[657]:     2025-03-13T20:42:52.861+0100 [ERROR] consul.sync: failed deregistering agent service: service_id=_nomad-server-znuisv3m75ywtkofhwsukx47zklaefe3 error=&#34;Unexpected response code: 403 (Permission denied: token with AccessorID &#39;eaab766d-7627-3cda-21fe-a3d5fb63dd7a&#39; lacks permission &#39;service:write&#39; on \&#34;nomad\&#34;)&#34;
Mar 13 20:42:52 nomad[657]:     2025-03-13T20:42:52.863+0100 [ERROR] consul.sync: failed deregistering agent service: service_id=_nomad-server-fi2peeufsfjc6po3r6v3vrhwg2pcyymo error=&#34;Unexpected response code: 403 (Permission denied: token with AccessorID &#39;eaab766d-7627-3cda-21fe-a3d5fb63dd7a&#39; lacks permission &#39;service:write&#39; on \&#34;nomad\&#34;)&#34;
Mar 13 20:42:52 nomad[657]:     2025-03-13T20:42:52.866+0100 [ERROR] consul.sync: failed deregistering agent service: service_id=_nomad-server-ppg65djoq2gktz3gnzojkqza4d4idkv4 error=&#34;Unexpected response code: 403 (Permission denied: token with AccessorID &#39;eaab766d-7627-3cda-21fe-a3d5fb63dd7a&#39; lacks permission &#39;service:write&#39; on \&#34;nomad\&#34;)&#34;
Mar 13 20:42:52 nomad[657]:     2025-03-13T20:42:52.869+0100 [INFO]  agent: shutdown complete
Mar 13 20:42:52 systemd[1]: nomad.service: Main process exited, code=exited, status=1/FAILURE
Mar 13 20:42:52 systemd[1]: nomad.service: Failed with result &#39;exit-code&#39;.
Mar 13 20:42:52 systemd[1]: Stopped Nomad.
Mar 13 20:42:52 systemd[1]: nomad.service: Consumed 16h 25min 36.860s CPU time.
</code></pre><p>And with that it&rsquo;s gone. &#x1f641;</p>
<p>You will note the errors complaining about Consul. I had completely forgotten about
the service registrations and removed the Consul tokens allowing Nomad to
handle its own services too early. This was fixable by deregistering the Nomad services manually:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>consul services deregister -id<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;_nomad-server-zytfvzuzuboej3ehgwdihrgykyfj46pp
</span></span></span></code></pre></div><p>This command needs to be run against the Consul agent where the service was
registered, it can&rsquo;t be executed against just any Consul agent.</p>
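<p>The agent can be selected with the CLI&rsquo;s global <code>-http-addr</code> flag, for
example (the host name here is just a placeholder):</p>
<pre tabindex="0"><code># run the deregistration against the agent that owns the registration
consul services deregister -http-addr=http://nomad-server-1:8500 \
  -id=&#34;_nomad-server-zytfvzuzuboej3ehgwdihrgykyfj46pp&#34;
</code></pre>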
<p>And with that, Nomad is gone. There&rsquo;s still a lot to do. I&rsquo;m already done shutting
down my Ceph cluster as well; that will likely be the next post.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 20: Migrating Mastodon</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-20-mastodon/</link>
      <pubDate>Thu, 06 Mar 2025 22:45:05 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-20-mastodon/</guid>
      <description>Migrating my Mastodon instance to k8s with the official Helm chart</description>
      <content:encoded><![CDATA[<p>Wherein I migrate my Mastodon instance to the k8s cluster.</p>
<p>This is part 21 of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p><a href="https://github.com/mastodon/mastodon">Mastodon</a> is currently serving as my
presence in the <a href="https://fediverse.party/en/fediverse/">Fediverse</a>. You can
find me <a href="https://social.mei-home.net/@mmeier">here</a>, although I&rsquo;m pretty sure
that most of my readers are coming from there already. &#x1f604;</p>
<p>If you&rsquo;re at all interested in a genuine community around Homelabbing,
I can only recommend joining the fun by following the HomeLab or SelfHosted
hashtags and wildly following everyone appearing on there. It&rsquo;s a great community
of rather friendly people enjoying everything from a lonely Pi to several
42U 19&quot; racks full of equipment. If you&rsquo;re interested in learning more about
my own experience with the Fediverse and hosting my own single-user instance,
have a look at <a href="https://blog.mei-home.net/tags/fediverse/">these older posts</a>.</p>
<h2 id="preparations">Preparations</h2>
<p>There were two things which needed to be migrated from my Nomad cluster to
the k8s deployment: The S3 bucket holding all of the media, and the database.</p>
<p>The database is, by a very large margin, the biggest in my Homelab, clocking in
at 2.5 GB. I think it could be a lot smaller, but I completely disabled cleanups
for remote posts a while ago. That was because the automated cleanup
also deletes posts I had bookmarked for reading later, and I&rsquo;m not very good
at actually keeping up with those. After a while I went through my bookmarks and
became pretty convinced that some I had saved a while ago were missing.
I will likely do some cleanups manually when the database really becomes too big to be
manageable.</p>
<p>I will not describe the entire migration process here, because it is similar to
previous migrations. If you&rsquo;re interested, have a look at <a href="https://blog.mei-home.net/posts/k8s-migration-16-gitea/#database-setup-and-migration">my post about the Gitea
migration</a>,
where I describe the database migration with <a href="https://cloudnative-pg.io/">CNPG</a>
in detail.
In short, it was very painless. I provided the database with a 15 GB volume,
which seems a bit overboard in hindsight. At some point in the future I will
have to figure out how to do database sizing and go through all of my CNPG
clusters, because I&rsquo;m pretty sure most of them are overprovisioned.</p>
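<p>When I get to that, the first step will probably be to just measure the actual
database sizes. Something like this should work against a CNPG instance pod (a sketch;
the namespace and pod name are placeholders for whatever CNPG created):</p>
<pre tabindex="0"><code>kubectl -n mastodon exec mastodon-pg-1 -- \
  psql -c &#34;SELECT pg_size_pretty(pg_database_size(&#39;mastodon&#39;));&#34;
</code></pre>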
<p>Next came the S3 bucket. The first mistake I made here was forgetting to
exclude the <code>cache/</code> prefix. So I copied all of the currently cached media over
instead of just letting Mastodon re-fetch whatever it actually needed. That
prefix currently holds 56 GB out of 61 GB total. Which reminds me that I need to
check whether the automatic cleanup is working on the k8s setup or not.
But yeah, if I had remembered to exclude that prefix, I could have saved a lot
of copy time. As it stands, these are the stats for the copy,
which I did with <a href="https://rclone.org/">rclone</a>:</p>
<pre tabindex="0"><code>Transferred:       61.786 GiB / 61.786 GiB, 100%, 6.279 MiB/s, ETA 0s
Transferred:       384921 / 384921, 100%
Elapsed time:    3h7m29.8s
</code></pre><p>Those 6.279 MiB/s are utterly abysmal. Those of you who read my previous post
on <a href="https://blog.mei-home.net/posts/ceph-copy-latency/">my media library copy operation</a>
probably already know the culprit: it was the 4 TB Seagate HDD, which was fully slammed again.
There&rsquo;s definitely something wrong with that disk.
But anyway, three hours later I was done and had everything copied over.</p>
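<p>In case anybody is about to do something similar: rclone&rsquo;s filtering can skip the
cache prefix during the copy. A sketch, with made-up remote names:</p>
<pre tabindex="0"><code># copy the bucket, but skip everything under cache/
rclone sync old-s3:masto-media new-s3:masto-media --exclude &#34;cache/**&#34; --progress
</code></pre>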
<p>Before I close the preparations, let&rsquo;s have some fun and look at the CPU usage
of the FluentD container in my k8s cluster:
<figure>
    <img loading="lazy" src="fluentd-usage.png"
         alt="A screenshot of a Grafana time series plot. It&#39;s showing the CPU usage, given on the Y axis in &#39;cores&#39;, of my FluentD instance over the three hours from 09:55 to 13:10 where the S3 bucket was copied. It hovers at 0.1 in the beginning and end, but goes up to 0.4, with spikes to 0.5 between 09:55 and 13:10, before then going down again."/> <figcaption>
            <p>CPU usage of my FluentD log aggregation container.</p>
        </figcaption>
</figure>

Not even the RGW or OSD containers were using more CPU during the copy. The reason
seems to be that I&rsquo;ve still got my ingress Traefik instance set to debug log level:
<figure>
    <img loading="lazy" src="traefik-log-rate.png"
         alt="A screenshot of a Grafana time series plot. It shows the log rate of my Traefik ingress container during the S3 bucket copy. The rate goes from about 1 log entry per second to over 70 per second, where it stays throughout the copy operation, before finally going back to about 1 per second."/> <figcaption>
            <p>Log rate of my Traefik ingress container.</p>
        </figcaption>
</figure>

I&rsquo;m now starting to wonder whether this might be part of the reason why the
copy was so slow - the disk might also have been loaded by Loki pushing all these
log lines to its own S3 bucket. &#x1f926;
Sadly, I don&rsquo;t have precise enough metrics for that, as I can only see the
throughput by pool in my Ceph stats, and both the Mastodon bucket and the Loki
bucket are in the same pool.
Something to try to dig into a little bit later.</p>
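<p>For reference, the per-pool view I mean is the one Ceph itself provides, which shows
client and recovery IO rates per pool (the pool name here is a placeholder for the RGW
data pool):</p>
<pre tabindex="0"><code>kubectl rook-ceph ceph osd pool stats rgw-data-pool
</code></pre>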
<h2 id="the-mastodon-setup">The Mastodon setup</h2>
<p>I deployed my Mastodon instance with the <a href="https://github.com/mastodon/chart">official Mastodon chart</a>.
One important note: this chart is, at some point in the future, going to be replaced
with <a href="https://github.com/mastodon/helm-charts">a new one</a>; see the <a href="https://github.com/mastodon/chart/issues/129">relevant issue</a>.</p>
<p>I won&rsquo;t go through every single option I set, but there were a couple of things
which tripped me up.</p>
<p>The first and perhaps most important one: The default <code>appVersion</code> of the current
chart is <code>4.2.17</code>, but I was already on <code>4.3.3</code>. The main issue caused
by this version discrepancy is that Mastodon has since been split
into two containers, one for the streaming component and one for everything
else. To fix this, I had to explicitly set the image in the <code>values.yaml</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">mastodon</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">streaming</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">repository</span>: <span style="color:#e6db74">&#34;ghcr.io/mastodon/mastodon-streaming&#34;</span>
</span></span></code></pre></div><p>With that, the chart seems to work for 4.3.3 and 4.3.4 without issues.</p>
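<p>For completeness&rsquo; sake: I apply these values the usual Helm way, from a local
checkout of the chart repository (release and namespace names here are placeholders):</p>
<pre tabindex="0"><code>helm upgrade --install mastodon . -n mastodon -f values.yaml
</code></pre>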
<p>Then there&rsquo;s the Redis configuration. I&rsquo;ve got a central Redis instance in my
cluster, instead of running one for every app. And the chart supports this, but
unless I&rsquo;ve overlooked something here, the chart requires the Redis instance
to have a password, which mine does not. The way this shows up is that the
<code>mastodon-redis</code> secret is unconditionally added to each container&rsquo;s env,
for example in the mastodon-web deployment from <a href="https://github.com/mastodon/chart/blob/main/templates/deployment-web.yaml">here</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;REDIS_PASSWORD&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: {{ <span style="color:#ae81ff">template &#34;mastodon.redis.secretName&#34; . }}</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">key</span>: <span style="color:#ae81ff">redis-password</span>
</span></span></code></pre></div><p>There&rsquo;s no condition around that which checks whether Redis is configured with a
password. I also tried just setting an empty password in <code>redis.auth.password</code>,
but in that case the Secret is not created by the chart at all, and my containers were
left in CreateContainerConfigError state because of the missing Secret.
The only way I found was to create a dummy Secret with an empty <code>data.redis-password</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Secret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">masto-redis-mock</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">mastodon</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">type</span>: <span style="color:#ae81ff">Opaque</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">redis-password</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span></code></pre></div><p>And then using that Secret in the Helm chart:</p>
<pre tabindex="0"><code>redis:
  auth:
    existingSecret: &#34;masto-redis-mock&#34;
</code></pre><p>With that, the Redis password env variable is set, but to an empty value, which
seems to make Mastodon use Redis properly, without adding a password of any
kind to the connection string.</p>
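<p>As an aside, the same dummy Secret can also be created imperatively instead of
applying the manifest above (the namespace being a placeholder):</p>
<pre tabindex="0"><code>kubectl -n mastodon create secret generic masto-redis-mock \
  --from-literal=redis-password=&#34;&#34;
</code></pre>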
<p>The next noteworthy configuration to be set was the <code>mastodon.trusted_proxy_ip</code>
variable. This one needed the source IP of my Traefik ingress, but that doesn&rsquo;t
have a fixed IP, so I needed to add the Pod CIDR:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">mastodon</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">trusted_proxy_ip</span>: <span style="color:#e6db74">&#34;300.300.300.1,127.0.0.1,10.8.0.0/16&#34;</span>
</span></span></code></pre></div><p>Without this setting, I got the following error in the mastodon-web logs:</p>
<pre tabindex="0"><code>[05332434-d3d6-40b1-950d-ae73da0d4967] ActionDispatch::RemoteIp::IpSpoofAttackError (IP spoofing attack?! client 10.8.4.103 is not a trusted proxy HTTP_CLIENT_IP=nil HTTP_X_FORWARDED_FOR=&#34;67.241.47.40, 10.86.10.10&#34;)
</code></pre><p>I also decided to switch off the CronJob for media removal:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">mastodon</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">cron</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">removeMedia</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><p>This is because I recently <a href="https://blog.mei-home.net/posts/mastodon-media-cache-cleanup-issue/">spent quite some time</a>
digging into Mastodon&rsquo;s internal media cleanup process. From what I can see, this CronJob
uses the <code>tootctl</code> CLI with the <a href="https://docs.joinmastodon.org/admin/tootctl/#media-remove">tootctl media remove</a>
command. I prefer that to the internal Mastodon process, because back
when I looked at it, <code>tootctl</code> handled deletions a lot better, making separate
DELETE requests. But the one thing which keeps me from using the CronJob is that
I can&rsquo;t configure the retention periods. I might still use it later and just live
with the defaults.</p>
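<p>For comparison, when running the removal by hand, the retention period can be passed
directly, for example:</p>
<pre tabindex="0"><code># run inside the Mastodon web container, e.g. via kubectl exec
# removes cached remote media older than 30 days
tootctl media remove --days=30
</code></pre>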
<p>And that&rsquo;s really all I have to say. For completeness&rsquo; sake, here is the full
<code>values.yaml</code> content:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">mastodon</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">mastodon</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">createAdmin</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">cron</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">removeMedia</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">local_domain</span>: <span style="color:#e6db74">&#34;social.mei-home.net&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">trusted_proxy_ip</span>: <span style="color:#e6db74">&#34;300.300.300.1,127.0.0.1,10.8.0.0/16&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">singleUserMode</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">autherizedFetch</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">limitedFederationMode</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">s3</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">existingSecret</span>: <span style="color:#e6db74">&#34;mastodon-bucket&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">bucket</span>: <span style="color:#ae81ff">masto-media</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">endpoint</span>: <span style="color:#e6db74">&#34;http://rook.service:80&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">alias_host</span>: <span style="color:#e6db74">&#34;s3-mastodon.mei-home.net&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">deepl</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">hcaptcha</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">secrets</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">existingSecret</span>: <span style="color:#e6db74">&#34;mastodon-secrets&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">sidekiq</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">limits</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">1024Mi</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">400m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">smtp</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">auth_method</span>: <span style="color:#e6db74">&#34;plain&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">from_address</span>: <span style="color:#e6db74">&#34;Meiers Mastodon &lt;mastodon@mei-home.net&gt;&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">openssl_verify_mode</span>: <span style="color:#e6db74">&#34;peer&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">port</span>: <span style="color:#e6db74">&#34;465&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">server</span>: <span style="color:#e6db74">&#34;mail.example.com&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">tls</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">existingSecret</span>: <span style="color:#e6db74">&#34;mastodon-mail&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">streaming</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">repository</span>: <span style="color:#e6db74">&#34;ghcr.io/mastodon/mastodon-streaming&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">500m</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">limits</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">2000Mi</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">web</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">500m</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">limits</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">1000Mi</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">cacheBuster</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">metrics</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">statsd</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">exporter</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">otel</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">extraEnvVars</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">SMTP_SSL</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">OIDC_CLIENT_ID</span>: <span style="color:#e6db74">&#34;mastodon&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">OIDC_DISPLAY_NAME</span>: <span style="color:#e6db74">&#34;Login with Keycloak&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">OIDC_ISSUER</span>: <span style="color:#e6db74">&#34;https://login.example.com/realms/example&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">OIDC_DISCOVERY</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">OIDC_SCOPE</span>: <span style="color:#e6db74">&#34;openid,profile,email&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">OIDC_UID_FIELD</span>: <span style="color:#e6db74">&#34;preferred_username&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">OIDC_REDIRECT_URI</span>: <span style="color:#e6db74">&#34;https://social.mei-home.net/auth/auth/openid_connect/callback&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">OIDC_SECURITY_ASSUME_EMAIL_IS_VERIFIED</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">OIDC_END_SESSION_ENDPOINT</span>: <span style="color:#e6db74">&#34;https://login.example.com/realms/example/protocol/openid-connect/logout&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">OIDC_ENABLED</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">OMNIAUTH_ONLY</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">RAILS_SERVE_STATIC_FILES</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">S3_BATCH_DELETE_LIMIT</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">S3_READ_TIMEOUT</span>: <span style="color:#ae81ff">60</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">S3_BATCH_DELETE_RETRY</span>: <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">ALLOWED_PRIVATE_ADDRESSES</span>: <span style="color:#e6db74">&#34;300.300.300.1&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/controller</span>: <span style="color:#e6db74">&#34;none&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">hosts</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">host</span>: <span style="color:#ae81ff">social.mei-home.net</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">paths</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;/&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tls</span>: <span style="color:#66d9ef">null</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">streaming</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">elasticsearch</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">postgresql</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">postgresqlHostname</span>: <span style="color:#e6db74">&#34;mastodon-pg-cluster-rw&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">postgresqlPort</span>: <span style="color:#e6db74">&#34;5432&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">auth</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">database</span>: <span style="color:#e6db74">&#34;mastodon&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">username</span>: <span style="color:#e6db74">&#34;mastodon&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">existingSecret</span>: <span style="color:#e6db74">&#34;mastodon-pg-cluster-app&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">redis</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">hostname</span>: <span style="color:#e6db74">&#34;redis.example&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">port</span>: <span style="color:#e6db74">&#34;6379&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">auth</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">existingSecret</span>: <span style="color:#e6db74">&#34;masto-redis-mock&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">sidekiq</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">cache</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><h2 id="conclusion">Conclusion</h2>
<p>To be honest, somewhere during that Sunday I began to think that starting the
Mastodon migration on a Sunday morning might have been a mistake, but in the end
it worked out well enough.</p>
<p>Now there are only a few services left to migrate, chief amongst them
my Keycloak instance. Let&rsquo;s see whether I might even be able to clean out the
entire old cluster this weekend. There&rsquo;s definitely a light at
the end of the migration tunnel. I guess the weekend will show whether it&rsquo;s
a freight train. &#x1f605;</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 19: Migrating Nextcloud</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-19-nextcloud/</link>
      <pubDate>Mon, 24 Feb 2025 21:25:54 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-19-nextcloud/</guid>
      <description>Migrating my Nextcloud instance to Kubernetes</description>
      <content:encoded><![CDATA[<p>Wherein I migrate my Nextcloud instance to the k8s cluster.</p>
<p>This is part 19 of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p><a href="https://nextcloud.com/">Nextcloud</a> is the oldest continuously running service
in my Homelab. It started out as an <a href="https://owncloud.com/">OwnCloud</a> deployment back
when I still called my Homelab my &ldquo;Heimserver&rdquo; (German for &ldquo;home server&rdquo;).
It has been running for more than ten years now, and I quite like it.</p>
<p>Initially I only used it for file sharing between my devices and as a better
alternative to a Samba share.
Over the years, I also started using it for contacts and calendar sharing
between my phone and desktop, as well as for sharing my Firefox bookmarks between
my laptop and desktop via <a href="https://floccus.org/">Floccus</a>.
One perhaps surprising use case is backing up OPNsense, which supports writing
its configuration backups to Nextcloud <a href="https://docs.opnsense.org/manual/how-tos/cloud_backup.html#setup-nextcloud-api-usage">out of the box</a>.</p>
<p>My most recent use case was for notes sharing. When I&rsquo;m researching something,
say a new app I&rsquo;d like to deploy in the Homelab, I like to plonk down in my
armchair with my tablet. For a long time, sharing notes between the tablet and
my desktop was a problem. After some searching, I found Nextcloud&rsquo;s
<a href="https://apps.nextcloud.com/apps/notes">Notes app</a>. It isn&rsquo;t the greatest
note-taking app, but it does the job adequately for what I need, allowing me
to paste some links and write some comments on them while lounging in my armchair.</p>
<h2 id="nextcloud-configuration">Nextcloud configuration</h2>
<p>I&rsquo;ve been using Nextcloud&rsquo;s community-led <a href="https://hub.docker.com/_/nextcloud/">FPM image</a>,
which only contains Nextcloud itself, but no web server or anything else.
For serving static assets and also just generally fronting Nextcloud, I&rsquo;m using
<a href="https://caddyserver.com/">Caddy</a>.
For improved performance (or rather, reduced load), I&rsquo;m also deploying the
Rust-based <a href="https://github.com/nextcloud/notify_push">notify_push app</a>.
It sends update notifications to connected clients, instead of needing the
clients to poll the server for changes.</p>
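<p>To give an idea of the overall shape, here is a minimal sketch of a Caddyfile
for fronting the FPM container. This is not my actual config, which lives in the
<code>caddy-config</code> ConfigMap referenced in the Deployment below; the listen
port and the FPM address are assumptions. Note that the shared volume is mounted
at different paths in the two containers, so <code>php_fastcgi</code> gets its own
root:</p>
<pre tabindex="0"><code># Hypothetical minimal Caddyfile for fronting Nextcloud FPM.
# Port 8080 and the FPM address 127.0.0.1:9000 are assumptions;
# both containers run in the same Pod, so localhost works.
:8080 {
    # Static assets are served from the Caddy container's mount path...
    root * /my-apps/nextcloud
    php_fastcgi 127.0.0.1:9000 {
        # ...while the FPM container sees the same volume here.
        root /var/www/html
    }
    file_server
}
</code></pre>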
<p>Finally, Nextcloud needs some regular cleanup tasks to be executed. Being
a PHP app without any scheduling capability of its own, the trigger for those
regular tasks has to come from outside the app. This can be configured in three
ways:</p>
<ol>
<li>Running a task or two for every page load by a user</li>
<li>Calling a dedicated URL regularly</li>
<li>Setting up a cron job to call a dedicated PHP file</li>
</ol>
<p>I&rsquo;ve opted for option 2), because running a cron job in a container still doesn&rsquo;t
seem to be a solved problem, and I&rsquo;ve found that option 1) wasn&rsquo;t enough,
because I don&rsquo;t actually visit the web interface very often. The regular calls
come from a small sidecar script, sketched below.</p>
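<p>As a rough sketch of the idea (the real script lives in the <code>cron-script</code>
ConfigMap described later), such a webcron sidecar only needs to wait for the app
to come up and then hit <code>cron.php</code> in a loop. The URL, the port and the
use of <code>curl</code> are assumptions; <code>SLEEPTIME</code> and
<code>INITIAL_WAIT</code> match the env vars set on the cron container in the
Deployment below:</p>
<pre tabindex="0"><code>#!/usr/bin/env bash
# Hypothetical webcron loop. Waits out the initial startup, then
# triggers Nextcloud&#39;s background jobs via its webcron URL.
# Assumes curl is available in the image; wget would do just as well.
set -eu

sleep &#34;${INITIAL_WAIT:-10m}&#34;
while true; do
  # cron.php is served by the Caddy container in the same Pod,
  # hence 127.0.0.1. Port 8080 is an assumption.
  curl --silent --show-error &#34;http://127.0.0.1:8080/cron.php&#34; || true
  sleep &#34;${SLEEPTIME:-5m}&#34;
done
</code></pre>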
<p>Then there&rsquo;s also the question of data storage. A couple of years back, after I
got my Ceph cluster up and running, I switched from a file-based backend to
S3. This allowed me to stop worrying about partition sizes at least. But this,
like all too many things in Nextcloud, has its quirks. Most importantly: not all
data gets stored in the S3 bucket. You still need to provide Nextcloud with a
persistent volume, but at least it&rsquo;s small: for my instance, which is well over
ten years old, it&rsquo;s only 29 MB worth of data. But still, it&rsquo;s there.</p>
<h2 id="preparations">Preparations</h2>
<p>Preparing for the move, I had to set up three volumes.</p>
<p>The first one is the
<em>webapp</em> volume. This volume will be mounted into all of the containers of the
Pod, and it will contain Nextcloud&rsquo;s <code>/var/www/html</code> directory, where the
Nextcloud code lives.
This needs to be an RWX volume, because it is accessed by the Nextcloud
FPM container, the Caddy container and the notify_push container. For this,
I created a 10 GB CephFS PersistentVolumeClaim, as that doesn&rsquo;t have any issues
with concurrent access.</p>
<p>The second volume is for the data. As noted above, this one shouldn&rsquo;t need much
space because I use S3 for storage, so it&rsquo;s only 1 GB. And finally there&rsquo;s
a scratch volume for Caddy, which also needs a bit of local storage; that one is
even smaller than the data volume, at only 500 MB.</p>
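<p>For illustration, the webapp volume&rsquo;s PersistentVolumeClaim looks roughly like
this. The storage class name is an assumption, as it depends on how the Rook CephFS
class is named in the cluster; the access mode and size follow from the requirements
above:</p>
<pre tabindex="0"><code>apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nextcloud-webapp
spec:
  # RWX, so the FPM, Caddy and notify_push containers can all
  # mount the same volume without concurrency issues.
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: rook-cephfs  # assumed class name
</code></pre>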
<p>Nextcloud also needs a database, which I&rsquo;m running on <a href="https://cloudnative-pg.io/">CloudNativePG</a>
again. I&rsquo;ve described how I&rsquo;m migrating databases in detail <a href="https://blog.mei-home.net/posts/k8s-migration-16-gitea/#database-setup-and-migration">here</a>.</p>
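<p>For reference, the CloudNativePG side boils down to a Cluster manifest along
these lines. The instance count and storage size are assumptions; what matters is
the name, since CNPG autogenerates the <code>nextcloud-pg-cluster-app</code> Secret
(whose keys include <code>host</code>, <code>port</code>, <code>dbname</code>,
<code>user</code>, <code>password</code> and <code>uri</code>) that the Deployment
below references:</p>
<pre tabindex="0"><code>apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: nextcloud-pg-cluster
spec:
  instances: 2          # assumed
  storage:
    size: 10Gi          # assumed
  bootstrap:
    initdb:
      database: nextcloud
      owner: nextcloud
</code></pre>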
<h2 id="nextclouds-deployment">Nextcloud&rsquo;s deployment</h2>
<p>The Nextcloud Deployment manifest is pretty long, due to the number of containers
I&rsquo;m running in the Pod. Here it is in its entirety; I will describe the individual
pieces in detail below:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">nextcloud</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#e6db74">&#34;Recreate&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">nextcloud</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">checksum/config-nc</span>: {{ <span style="color:#ae81ff">include (print $.Template.BasePath &#34;/nextcloud-config.yaml&#34;) . | sha256sum }}</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">checksum/config-caddy</span>: {{ <span style="color:#ae81ff">include (print $.Template.BasePath &#34;/caddy-config.yaml&#34;) . | sha256sum }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">automountServiceAccountToken</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">securityContext</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">fsGroup</span>: <span style="color:#ae81ff">33</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">runAsUser</span>: <span style="color:#ae81ff">33</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">runAsGroup</span>: <span style="color:#ae81ff">33</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">initContainers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-init</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">alpine:latest</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">webapp</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/data</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-config</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/config</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">command</span>: [<span style="color:#e6db74">&#34;cp&#34;</span>, <span style="color:#e6db74">&#34;/config/config.php&#34;</span>, <span style="color:#e6db74">&#34;/data/config/config.php&#34;</span>]
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">nextcloud:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">data</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/homenet-data/data</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">subPath</span>: <span style="color:#ae81ff">data</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">webapp</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/var/www/html</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">400m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">2048Mi</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">envFrom</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">secretRef</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-bucket</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">optional</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">secretRef</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-secrets</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">optional</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">configMapRef</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-bucket</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">optional</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">HL_REDIS_HOST</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;redis.redis.svc.cluster.local&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">HL_REDIS_PORT</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;6379&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">HL_DB_NAME</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">key</span>: <span style="color:#ae81ff">dbname</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">HL_DB_HOST</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">key</span>: <span style="color:#ae81ff">host</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">HL_DB_PORT</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">key</span>: <span style="color:#ae81ff">port</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">HL_DB_USER</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">key</span>: <span style="color:#ae81ff">user</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">HL_DB_PW</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">key</span>: <span style="color:#ae81ff">password</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-push</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">nextcloud:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">command</span>: [<span style="color:#e6db74">&#34;/usr/bin/bash&#34;</span>]
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-c&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;chmod u+x /var/www/html/custom_apps/notify_push/bin/$(uname -m)/notify_push; /var/www/html/custom_apps/notify_push/bin/$(uname -m)/notify_push&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">webapp</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/var/www/html</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">200m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">128Mi</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">NEXTCLOUD_URL</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;https://nc.example.com&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">REDIS_URL</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;redis://redis.redis.svc.cluster.local:6379&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">PORT</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;{{ .Values.ports.notifyPush }}&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">DATABASE_URL</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">key</span>: <span style="color:#ae81ff">uri</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-web</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">caddy:{{ .Values.caddyVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">webapp</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/my-apps/nextcloud</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">webscratch</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/data</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">caddy-config</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/etc/caddy</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">400m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">128Mi</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">livenessProbe</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">httpGet</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.ports.caddy }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;/&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">initialDelaySeconds</span>: <span style="color:#ae81ff">15</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">periodSeconds</span>: <span style="color:#ae81ff">30</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-http</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.ports.caddy }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-cron</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">nextcloud:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">command</span>: [<span style="color:#e6db74">&#34;/usr/bin/bash&#34;</span>]
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;/cron-scripts/webcron.sh&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cron-script</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/cron-scripts</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">50m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">50Mi</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">SLEEPTIME</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;5m&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">INITIAL_WAIT</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;10m&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">data</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">persistentVolumeClaim</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">claimName</span>: <span style="color:#ae81ff">nextcloud-data</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">webapp</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">persistentVolumeClaim</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">claimName</span>: <span style="color:#ae81ff">nextcloud-webapp</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">webscratch</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">persistentVolumeClaim</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">claimName</span>: <span style="color:#ae81ff">nextcloud-webscratch</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-config</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">configMap</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-config</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">caddy-config</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">configMap</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#ae81ff">caddy-config</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cron-script</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">configMap</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cron-script</span>
</span></span></code></pre></div><p>The first thing to discuss is the Nextcloud configuration file, which is just
a PHP file, the <code>config.php</code>. It can be split, but I&rsquo;ve always had it all in a
single file and decided not to change that. In addition, while looking at the
<a href="https://github.com/nextcloud/docker">README of the container GitHub repo</a>, I
found that the image can also do the entire configuration through
environment variables. That&rsquo;s something to look at later.
The configuration file has one big quirk: it needs to be writable by Nextcloud,
at least during updates, because it contains the Nextcloud version, which I find
an extremely weird design choice. This leaves me with a manual step: after every
update, the ConfigMap containing the <code>config.php</code> has to be brought up to date again.</p>
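<p>As an illustration of that step (a sketch of the idea, not necessarily my exact
workflow): after an upgrade, the new version can be read back out of the running
container with <code>occ</code> and then copied into the ConfigMap:</p>
<pre tabindex="0"><code># Read the post-upgrade version from the live config, to copy it
# into the ConfigMap&#39;s config.php. The Deployment and container
# names match the manifest above; the workflow itself is an assumption.
kubectl exec deploy/nextcloud -c nextcloud -- \
  php occ config:system:get version
</code></pre>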
<p>Before I continue, I&rsquo;d like to thank <a href="https://transitory.social/@rachel">@rachel</a>,
who was kind enough to provide me with her Nextcloud manifests and especially
her Nextcloud config file. The most important thing those taught me was the
use of the <code>getenv</code> PHP function, so that I could provide all of the secrets
as environment variables, instead of having to construct an elaborate
external-secrets template.</p>
<p>As a consequence, my <code>config.php</code> ConfigMap now looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ConfigMap</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-config</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">config.php</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    &lt;?php
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    $CONFIG = array (
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;apps_paths&#39; =&gt;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      array (
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        0 =&gt;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        array (
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          &#39;path&#39; =&gt; &#39;/var/www/html/apps&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          &#39;url&#39; =&gt; &#39;/apps&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          &#39;writable&#39; =&gt; false,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        ),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        1 =&gt;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        array (
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          &#39;path&#39; =&gt; &#39;/var/www/html/custom_apps&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          &#39;url&#39; =&gt; &#39;/custom_apps&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          &#39;writable&#39; =&gt; true,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        ),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      ),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;instanceid&#39; =&gt; &#39;ID&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;datadirectory&#39; =&gt; &#39;/homenet-data/data&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;objectstore&#39; =&gt; [
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              &#39;class&#39; =&gt; &#39;\\OC\\Files\\ObjectStore\\S3&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              &#39;arguments&#39; =&gt; [
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                      &#39;bucket&#39; =&gt; getenv(&#39;BUCKET_NAME&#39;),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                      &#39;autocreate&#39; =&gt; true,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                      &#39;key&#39;    =&gt; getenv(&#39;AWS_ACCESS_KEY_ID&#39;),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                      &#39;secret&#39; =&gt; getenv(&#39;AWS_SECRET_ACCESS_KEY&#39;),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                      &#39;hostname&#39; =&gt; getenv(&#39;BUCKET_HOST&#39;),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                      &#39;port&#39; =&gt; getenv(&#39;BUCKET_PORT&#39;),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                      &#39;use_ssl&#39; =&gt; false,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                      &#39;use_path_style&#39;=&gt;true
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              ],
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      ],
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;trusted_domains&#39; =&gt;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      array (
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        0 =&gt; &#39;nc.example.com&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        1 =&gt; &#39;127.0.0.1&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      ),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;trusted_proxies&#39; =&gt;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      array (
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        0 =&gt; &#39;127.0.0.1/32&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      ),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;memcache.local&#39; =&gt; &#39;\\OC\\Memcache\\Redis&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;redis&#39; =&gt;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      array (
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        &#39;host&#39; =&gt; getenv(&#39;HL_REDIS_HOST&#39;),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        &#39;port&#39; =&gt; getenv(&#39;HL_REDIS_PORT&#39;),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      ),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;memcache.locking&#39; =&gt; &#39;\\OC\\Memcache\\Redis&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;user_oidc&#39; =&gt; [
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        &#39;allow_multiple_user_backends&#39; =&gt; 0,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        &#39;auto_provision&#39; =&gt; false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      ],
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;allow_local_remote_servers&#39; =&gt; true,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;overwrite.cli.url&#39; =&gt; &#39;https://nc.example.com&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;overwriteprotocol&#39; =&gt; &#39;https&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;overwritewebroot&#39; =&gt; &#39;/&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;maintenance_window_start&#39; =&gt; 100,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;default_phone_region&#39; =&gt; &#39;DE&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;dbtype&#39; =&gt; &#39;pgsql&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;version&#39; =&gt; &#39;30.0.6.2&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;dbname&#39; =&gt; getenv(&#39;HL_DB_NAME&#39;),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;dbhost&#39; =&gt; getenv(&#39;HL_DB_HOST&#39;),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;dbport&#39; =&gt; getenv(&#39;HL_DB_PORT&#39;),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;dbuser&#39; =&gt; getenv(&#39;HL_DB_USER&#39;),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;dbpassword&#39; =&gt; getenv(&#39;HL_DB_PW&#39;),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;dbtableprefix&#39; =&gt; &#39;oc_&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;installed&#39; =&gt; true,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;maintenance&#39; =&gt; false,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;loglevel&#39; =&gt; 2,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;logfile&#39; =&gt; &#39;/dev/stdout&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;log_type&#39; =&gt; &#39;file&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;mail_domain&#39; =&gt; &#39;example.com&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;mail_from_address&#39; =&gt; &#39;nextcloud&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;mail_smtpmode&#39; =&gt; &#39;smtp&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;mail_smtphost&#39; =&gt; &#39;mail.example.com&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;mail_smtpport&#39; =&gt; &#39;465&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;mail_smtpsecure&#39; =&gt; &#39;ssl&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;mail_smtpauth&#39; =&gt; true,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;mail_smtpname&#39; =&gt; &#39;nc@example.com&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;mail_smtppassword&#39; =&gt; getenv(&#39;HL_MAIL_PW&#39;),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;passwordsalt&#39; =&gt; getenv(&#39;HL_PW_SALT&#39;),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;secret&#39; =&gt; getenv(&#39;HL_SECRET&#39;),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    );</span>
</span></span></code></pre></div><p>One noteworthy piece here is the <code>trusted_domains</code> setting, which contains
not only the actual domain Nextcloud is hosted on, but also <code>127.0.0.1</code>. This is
necessary because of the cron setup I will describe later.
I find this kind of configuration setup, with a config file plus
environment variables for the secrets, quite convenient. It lets me have an actual
config file, while still allowing me to extract the secrets without having to work
with some sort of templating.</p>
<p>Another advantage of this setup, where I get to choose the environment variable
names myself, is that I can use autogenerated Secrets directly, as you can see in
the S3 setup:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-php" data-lang="php"><span style="display:flex;"><span><span style="color:#e6db74">&#39;objectstore&#39;</span> <span style="color:#f92672">=&gt;</span> [
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#39;class&#39;</span> <span style="color:#f92672">=&gt;</span> <span style="color:#e6db74">&#39;\\OC\\Files\\ObjectStore\\S3&#39;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#39;arguments&#39;</span> <span style="color:#f92672">=&gt;</span> [
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#39;bucket&#39;</span> <span style="color:#f92672">=&gt;</span> <span style="color:#a6e22e">getenv</span>(<span style="color:#e6db74">&#39;BUCKET_NAME&#39;</span>),
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#39;autocreate&#39;</span> <span style="color:#f92672">=&gt;</span> <span style="color:#66d9ef">true</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#39;key&#39;</span>    <span style="color:#f92672">=&gt;</span> <span style="color:#a6e22e">getenv</span>(<span style="color:#e6db74">&#39;AWS_ACCESS_KEY_ID&#39;</span>),
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#39;secret&#39;</span> <span style="color:#f92672">=&gt;</span> <span style="color:#a6e22e">getenv</span>(<span style="color:#e6db74">&#39;AWS_SECRET_ACCESS_KEY&#39;</span>),
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#39;hostname&#39;</span> <span style="color:#f92672">=&gt;</span> <span style="color:#a6e22e">getenv</span>(<span style="color:#e6db74">&#39;BUCKET_HOST&#39;</span>),
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#39;port&#39;</span> <span style="color:#f92672">=&gt;</span> <span style="color:#a6e22e">getenv</span>(<span style="color:#e6db74">&#39;BUCKET_PORT&#39;</span>),
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#39;use_ssl&#39;</span> <span style="color:#f92672">=&gt;</span> <span style="color:#66d9ef">false</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#39;use_path_style&#39;</span><span style="color:#f92672">=&gt;</span><span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>        ],
</span></span><span style="display:flex;"><span>],
</span></span></code></pre></div><p>Here I was able to pick the env variable names in such a way that I could just
use the ConfigMap and Secret generated by Rook via <code>envFrom</code> in the Deployment,
instead of having to define every variable individually.</p>
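<p>Those two objects come out of a Rook <code>ObjectBucketClaim</code>. A claim
roughly like the following (the storage class name is an assumption) makes Rook
create a ConfigMap carrying <code>BUCKET_NAME</code>, <code>BUCKET_HOST</code> and
<code>BUCKET_PORT</code> and a Secret carrying <code>AWS_ACCESS_KEY_ID</code> and
<code>AWS_SECRET_ACCESS_KEY</code>, both named after the claim:</p>
<pre tabindex="0"><code>apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  # The name determines the names of the generated ConfigMap and
  # Secret, which the Deployment pulls in via envFrom.
  name: nextcloud-bucket
spec:
  # Assumed class name; it must point at Rook&#39;s bucket provisioner.
  storageClassName: rook-ceph-bucket
  generateBucketName: nextcloud
</code></pre>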
<p>But as I&rsquo;ve noted above, Nextcloud needs write access to the config file, so
just mounting the ConfigMap into the container is not an option, because ConfigMaps
are always mounted read-only.
So I had to reach for the typical init container trick used in these situations
and copy the ConfigMap&rsquo;s content into the webapp volume:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>      <span style="color:#f92672">initContainers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-init</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">alpine:3.21.2</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">webapp</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/data</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-config</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/config</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">command</span>: [<span style="color:#e6db74">&#34;cp&#34;</span>, <span style="color:#e6db74">&#34;/config/config.php&#34;</span>, <span style="color:#e6db74">&#34;/data/config/config.php&#34;</span>]
</span></span></code></pre></div><p>Next comes the Nextcloud container itself. The main thing I&rsquo;d like to point out
here is a gotcha that had me scratching my head for a little while. You can
see that I set two env variables for Redis, <code>HL_REDIS_HOST</code> and <code>HL_REDIS_PORT</code>.
When I first launched the Pod, those were called <code>REDIS_HOST</code> and <code>REDIS_PORT</code>,
which just so happen to be environment variables that the image itself interprets.
That resulted in this error message:</p>
<pre tabindex="0"><code>Configuring Redis as session handler
/entrypoint.sh: 111: cannot create /usr/local/etc/php/conf.d/redis-session.ini: Permission denied
</code></pre><p>It made me pretty suspicious, because the ownership of the <code>/usr</code> hierarchy
cannot have changed between Nomad and k8s, and the container was running with
the same UID/GID as it was in the Nomad cluster. So why was I suddenly seeing
this permission issue? I rummaged a bit through the <a href="https://github.com/nextcloud/docker/blob/master/docker-entrypoint.sh">Docker entrypoint</a>
of the image and found that the error message was coming from this piece of
code:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> <span style="color:#f92672">[</span> -n <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>REDIS_HOST+x<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        echo <span style="color:#e6db74">&#34;Configuring Redis as session handler&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>            file_env REDIS_HOST_PASSWORD
</span></span><span style="display:flex;"><span>            echo <span style="color:#e6db74">&#39;session.save_handler = redis&#39;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#75715e"># check if redis host is an unix socket path</span>
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">if</span> <span style="color:#f92672">[</span> <span style="color:#e6db74">&#34;</span><span style="color:#66d9ef">$(</span>echo <span style="color:#e6db74">&#34;</span>$REDIS_HOST<span style="color:#e6db74">&#34;</span> | cut -c1-1<span style="color:#66d9ef">)</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/&#34;</span> <span style="color:#f92672">]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>              <span style="color:#66d9ef">if</span> <span style="color:#f92672">[</span> -n <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>REDIS_HOST_PASSWORD+x<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>                echo <span style="color:#e6db74">&#34;session.save_path = \&#34;unix://</span><span style="color:#e6db74">${</span>REDIS_HOST<span style="color:#e6db74">}</span><span style="color:#e6db74">?auth=</span><span style="color:#e6db74">${</span>REDIS_HOST_PASSWORD<span style="color:#e6db74">}</span><span style="color:#e6db74">\&#34;&#34;</span>
</span></span><span style="display:flex;"><span>              <span style="color:#66d9ef">else</span>
</span></span><span style="display:flex;"><span>                echo <span style="color:#e6db74">&#34;session.save_path = \&#34;unix://</span><span style="color:#e6db74">${</span>REDIS_HOST<span style="color:#e6db74">}</span><span style="color:#e6db74">\&#34;&#34;</span>
</span></span><span style="display:flex;"><span>              <span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>            <span style="color:#75715e"># check if redis password has been set</span>
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">elif</span> <span style="color:#f92672">[</span> -n <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>REDIS_HOST_PASSWORD+x<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>                echo <span style="color:#e6db74">&#34;session.save_path = \&#34;tcp://</span><span style="color:#e6db74">${</span>REDIS_HOST<span style="color:#e6db74">}</span><span style="color:#e6db74">:</span><span style="color:#e6db74">${</span>REDIS_HOST_PORT:=6379<span style="color:#e6db74">}</span><span style="color:#e6db74">?auth=</span><span style="color:#e6db74">${</span>REDIS_HOST_PASSWORD<span style="color:#e6db74">}</span><span style="color:#e6db74">\&#34;&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">else</span>
</span></span><span style="display:flex;"><span>                echo <span style="color:#e6db74">&#34;session.save_path = \&#34;tcp://</span><span style="color:#e6db74">${</span>REDIS_HOST<span style="color:#e6db74">}</span><span style="color:#e6db74">:</span><span style="color:#e6db74">${</span>REDIS_HOST_PORT:=6379<span style="color:#e6db74">}</span><span style="color:#e6db74">\&#34;&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>            echo <span style="color:#e6db74">&#34;redis.session.locking_enabled = 1&#34;</span>
</span></span><span style="display:flex;"><span>            echo <span style="color:#e6db74">&#34;redis.session.lock_retries = -1&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#75715e"># redis.session.lock_wait_time is specified in microseconds.</span>
</span></span><span style="display:flex;"><span>            <span style="color:#75715e"># Wait 10ms before retrying the lock rather than the default 2ms.</span>
</span></span><span style="display:flex;"><span>            echo <span style="color:#e6db74">&#34;redis.session.lock_wait_time = 10000&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">}</span> &gt; /usr/local/etc/php/conf.d/redis-session.ini
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">fi</span>
</span></span></code></pre></div><p>That sets up the Redis session handler configuration, and I inevitably ran
into it because I had named my env variables the same as the image&rsquo;s.
The error went away when I renamed the env variables to have the <code>HL_</code> prefix,
so the entrypoint no longer hit the <code>if</code> above.</p>
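<p>For illustration, the renamed variables look roughly like this in the
container spec (just a sketch; the host and port match the Redis Service that
also shows up further down in the notify-push config):</p>
<pre tabindex="0"><code>          env:
            # Renamed with the HL_ prefix so the image&#39;s entrypoint no longer
            # matches its own REDIS_HOST check.
            - name: HL_REDIS_HOST
              value: &#34;redis.redis.svc.cluster.local&#34;
            - name: HL_REDIS_PORT
              value: &#34;6379&#34;
</code></pre>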
<p>Also noteworthy: the Nextcloud container itself doesn&rsquo;t expose any port.
Only the Caddy web server does, and it proxies all requests targeting PHP files
to the Nextcloud container.</p>
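<p>As a consequence, a Service for this Pod only needs to target the Caddy
port. A minimal sketch, with assumed Service name and selector label:</p>
<pre tabindex="0"><code>apiVersion: v1
kind: Service
metadata:
  name: nextcloud             # assumed name
spec:
  selector:
    app: nextcloud            # assumed Pod label
  ports:
    - name: http
      port: 80
      # targetPort references the named containerPort of the Caddy container
      targetPort: nextcloud-http
</code></pre>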
<p>That Caddy container looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-web</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">caddy:{{ .Values.caddyVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">webapp</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/my-apps/nextcloud</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">webscratch</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/data</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">caddy-config</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/etc/caddy</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">400m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">128Mi</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">livenessProbe</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">httpGet</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.ports.caddy }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;/&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">initialDelaySeconds</span>: <span style="color:#ae81ff">15</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">periodSeconds</span>: <span style="color:#ae81ff">30</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-http</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.ports.caddy }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span></code></pre></div><p>It doesn&rsquo;t need any of the Secrets and environment variables that the Nextcloud
container needs, and gets its configuration from a <code>Caddyfile</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ConfigMap</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">caddy-config</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">Caddyfile</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      admin off
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      auto_https off
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      log {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        output stdout
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        level INFO
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      servers {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        trusted_proxies static 127.0.0.1/32 300.300.300.1/32
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    :{{ .Values.ports.caddy }} {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      root * /my-apps/nextcloud
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      file_server
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      log {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        output stdout
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        format filter {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          wrap json
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          fields {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            request&gt;headers&gt;Authorization delete
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            request&gt;headers&gt;Cookie delete
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      route /push/* {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          uri strip_prefix /push
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          reverse_proxy http://localhost:{{ .Values.ports.notifyPush }}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      @provider-matcher {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        path_regexp ^\/(?:updater|oc[ms]-provider)(?:$|\/)
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      rewrite @provider-matcher {path}/index.php
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      @php-matcher {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        path_regexp ^\/(?:index|remote|public|cron|core\/ajax\/update|status|ocs\/v[12]|updater\/.+|oc[ms]-provider\/.+)\.php(?:$|\/)
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      php_fastcgi @php-matcher localhost:9000 {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        root /var/www/html
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      redir /.well-known/carddav /remote.php/dav/ 301
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      redir /.well-known/caldav /remote.php/dav/ 301
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      redir /.well-known/webfinger /index.php{uri} 301
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      redir /.well-known/nodeinfo /index.php{uri} 301
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      @forbidden {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              path    /.htaccess
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              path    /.user.ini
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              path    /3rdparty/*
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              path    /authors
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              path    /build/*
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              path    /config/*
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              path    /console*
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              path    /copying
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              path    /data/*
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              path    /db_structure
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              path    /lib/*
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              path    /occ
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              path    /README
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              path    /templates/*
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              path    /tests/*
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              path    /console.php
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      respond @forbidden 404
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    }</span>
</span></span></code></pre></div><p>This config does a couple of things. First, it defines the webapp volume as the
HTTP root and serves its content directly, so Caddy delivers Nextcloud&rsquo;s
static assets. An important piece is the log config, which strips secret data
like cookies and auth headers from the request log.
Then there are a number of routes: the first proxies requests for the
notify-push backend to that container&rsquo;s port. Next comes a rewrite of some
&ldquo;special&rdquo; paths to PHP, followed by the general PHP matcher, which forwards
all PHP file requests to the Nextcloud container. And finally, a couple of
paths containing files that shouldn&rsquo;t be externally accessible are explicitly
forbidden.</p>
<p>Then there&rsquo;s the nextcloud-push container:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-push</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">nextcloud:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">command</span>: [<span style="color:#e6db74">&#34;/usr/bin/bash&#34;</span>]
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-c&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;chmod u+x /var/www/html/custom_apps/notify_push/bin/$(uname -m)/notify_push; /var/www/html/custom_apps/notify_push/bin/$(uname -m)/notify_push&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">webapp</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/var/www/html</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">200m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">128Mi</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">NEXTCLOUD_URL</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;https://cloud.mei-home.net&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">REDIS_URL</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;redis://redis.redis.svc.cluster.local:6379&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">PORT</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;{{ .Values.ports.notifyPush }}&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">DATABASE_URL</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">key</span>: <span style="color:#ae81ff">uri</span>
</span></span></code></pre></div><p>The notify-push component, which is separate from Nextcloud&rsquo;s main codebase,
is their attempt to solve some performance issues. Normally, clients have to
proactively poll the server for changed files. That becomes inefficient fast,
even in a small deployment like mine with at most three connected clients. In
contrast to the majority of Nextcloud, this component was written in Rust for
performance reasons. I&rsquo;ve just checked and didn&rsquo;t see much change in the CPU
usage of my Nomad Nextcloud deployment after deploying notify-push for the first
time, but I still figure: why not.</p>
<p>There were a couple of problems with this deployment, though. The very first
one was the fact that the Rust binaries are located in per-arch directories.
In Nomad that wasn&rsquo;t a problem; I could define the command&rsquo;s path like this:</p>
<pre tabindex="0"><code>/var/www/html/custom_apps/notify_push/bin/${attr.kernel.arch}/notify_push
</code></pre><p>Nomad would replace <code>${attr.kernel.arch}</code> with the CPU architecture of the
node the job got scheduled on.
I was 100% sure that Kubernetes would have something similar. In fact, I knew
it did: the information is stored in the <code>kubernetes.io/arch</code> label. You can
get labels into env variables with the <a href="https://kubernetes.io/docs/concepts/workloads/pods/downward-api/">Downward API</a>,
and then you can use env variables in the <code>command</code> via the <code>$(ENV_VAR)</code> syntax.
The problem: the <code>arch</code> label is defined on nodes, not on Pods, and the Downward
API only allows access to Pod labels, not node labels. So I finally had to reach
for the <code>uname -m</code> you see above. I was really surprised that k8s doesn&rsquo;t have
the capability to inject the node&rsquo;s arch into a container&rsquo;s env.</p>
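<p>For reference, this is roughly what I would have liked to write. It shows the
<code>$(ENV_VAR)</code> expansion in <code>command</code>, but the <code>fieldRef</code> below can only resolve
the Pod&rsquo;s own labels, so it doesn&rsquo;t work for a node label like
<code>kubernetes.io/arch</code>:</p>
<pre tabindex="0"><code>          env:
            - name: NODE_ARCH
              valueFrom:
                fieldRef:
                  # Downward API: resolves Pod labels only. There is no
                  # fieldPath for the labels of the node running the Pod.
                  fieldPath: metadata.labels[&#39;kubernetes.io/arch&#39;]
          command: [&#34;/var/www/html/custom_apps/notify_push/bin/$(NODE_ARCH)/notify_push&#34;]
</code></pre>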
<p>But that wasn&rsquo;t the end of my notify-push problems. Now that the container
was finally able to execute the binary, it errored out with this message:</p>
<pre tabindex="0"><code>Error: php_literal_parser::unexpected_token

  × Error while parsing nextcloud config.php
  ╰─▶ Error while parsing &#39;/var/www/html/config/config.php&#39;:
      No valid token found, expected one of boolean literal, integer literal,
      float literal, string literal, &#39;null&#39;, &#39;array&#39; or &#39;[&#39;
    ╭─[22:31]
 21 │           &#39;arguments&#39; =&gt; [
 22 │                   &#39;bucket&#39; =&gt; getenv(&#39;BUCKET_NAME&#39;),
    ·                               ┬
    ·                               ╰── Expected boolean literal, integer literal, float literal, string literal, &#39;null&#39;, &#39;array&#39; or &#39;[&#39;
 23 │                   &#39;autocreate&#39; =&gt; true,
    ╰────
</code></pre><p>Before switching to environment variables to provide the Nextcloud configs
needed by the notify-push app, I was providing the <code>config.php</code> file directly,
which is supposed to work as well. I figured I had the file already anyway, so
why not use it?
But it turns out that the PHP parser used by notify-push is not capable of
actually executing PHP; it expects all config options to be set to static
values, and mine calls <code>getenv()</code>.
That&rsquo;s why I ended up using the environment variables supported by the notify-push
binary to set the necessary configuration options.</p>
<p>After all of that, the Pod finally fully started, and I was able to log
in and got all of my files, calendars, contacts and so on.
I also went through the warnings shown in the admin interface and had one
issue I&rsquo;d like to note here. The errors told me that my mail settings had not
been tested, so I went into them and clicked the &ldquo;send test mail&rdquo; button.
This showed an error immediately:</p>
<pre tabindex="0"><code>AxiosError: Request failed with status code 400
</code></pre><p>I had absolutely no idea what it meant, as I knew that my mail server was working
as intended. It turned out that the issue wasn&rsquo;t with the mail server or the
Nextcloud mail config, but just the fact that I had never set a mail address
for the admin account I was working in. &#x1f926;</p>
<p>The last piece of the puzzle is the cron container. As I&rsquo;ve described above,
Nextcloud needs some regularly executed tasks. I&rsquo;m not enough of a web developer
to really have any experience with PHP, but from what I understand, PHP is request-oriented,
so it doesn&rsquo;t have a convenient place to execute cron tasks.
Anyway, I needed some way to regularly call the <code>cron.php</code> file to trigger these
maintenance tasks. The <a href="https://docs.nextcloud.com/server/latest/admin_manual/configuration_server/background_jobs_configuration.html">Nextcloud docs</a>
recommend hitting <code>cron.php</code> every five minutes. For that, I re-used the
Nextcloud container, because it already has everything that&rsquo;s needed on board:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-cron</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">image</span>: <span style="color:#ae81ff">nextcloud:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">command</span>: [<span style="color:#e6db74">&#34;/usr/bin/bash&#34;</span>]
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#e6db74">&#34;/cron-scripts/webcron.sh&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cron-script</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/cron-scripts</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">50m</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">50Mi</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">SLEEPTIME</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;5m&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">INITIAL_WAIT</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;10m&#34;</span>
</span></span></code></pre></div><p>But instead of launching php-fpm, I run a simple bash script:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ConfigMap</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cron-script</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">webcron.sh</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    #!/bin/bash
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    echo &#34;$(date): Launched task, sleeping for ${INITIAL_WAIT}&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    sleep &#34;${INITIAL_WAIT}&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    while true; do
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      curl http://127.0.0.1/cron.php 2&gt;&amp;1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      echo &#34;&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      echo &#34;$(date): Sleeping for ${SLEEPTIME}&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      sleep &#34;${SLEEPTIME}&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    done</span>
</span></span></code></pre></div><p>This does the job nicely, while staying simple.</p>
<h2 id="conclusion">Conclusion</h2>
<p>This one went quite well. I was expecting more problems, especially considering
that it sometimes looks like mine is the only Nextcloud deployment in the Homelabbing
community that runs without any issues. &#x1f605; I intentionally didn&rsquo;t muck about
with the setup too much and instead copied my Nomad setup as closely as possible,
which made for a relatively smooth migration.
I rely on Nextcloud for a lot of my &ldquo;I would rather not be without this for
more than a weekend&rdquo; needs, so being a bit conservative with how much I changed
was in order.</p>
<p>I haven&rsquo;t decided what comes next yet - I might spend next week finishing some
blog post drafts instead of starting anything new, because at this point I&rsquo;ve mostly
got &ldquo;finish during the weekend because I need it during the week&rdquo; stuff left in
the migration.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 18: Migrating Jellyfin</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-18-jellyfin/</link>
      <pubDate>Thu, 20 Feb 2025 23:30:24 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-18-jellyfin/</guid>
      <description>Migrating my Jellyfin instance and media collection to the Kubernetes cluster</description>
      <content:encoded><![CDATA[<p>Wherein I migrate my Jellyfin instance to the k8s cluster.</p>
<p>This is part 18 of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>I&rsquo;m running a <a href="https://jellyfin.org/">Jellyfin</a> instance in my Homelab to play
movies and TV shows. I don&rsquo;t have a very fancy setup, no re-encoding or anything
like that. I&rsquo;m just using Direct Play, as I&rsquo;m only watching things on my desktop
computer.</p>
<p>Jellyfin doesn&rsquo;t have any external dependencies at all, so there&rsquo;s only the
Jellyfin Pod itself to be configured. It also doesn&rsquo;t have a proper configuration
file. Instead, it&rsquo;s configured through the web UI and a couple of command line
options. For that reason, I won&rsquo;t have any Secrets or ConfigMaps; I&rsquo;ve just
got a PVC with the config and some space for Jellyfin&rsquo;s cache, plus another
CephFS volume for the media collection.</p>
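<p>The config PVC is completely standard, something like this sketch (name and
size are assumptions):</p>
<pre tabindex="0"><code>apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jellyfin-config       # assumed name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi           # assumed size for config and cache
</code></pre>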
<p>Said media collection volume will be the main focus of this post, because
everything else about the setup follows my standard k8s app setup pretty closely.
I had originally planned to also dive a bit (okay, a lot &#x1f605;) into
the metrics of the copy operation, but that rather quickly turned into a
rabbit hole all its own, and so I decided to declare the beginning of operation
&ldquo;articles, not tomes&rdquo; and split it out into another post that will follow
shortly after this one.</p>
<h2 id="setting-up-the-media-volume">Setting up the media volume</h2>
<p>For my media volume, I had been using a CephFS volume in the Nomad job setup.
I had two reasons for this:</p>
<ol>
<li>I need to mount the volume twice and access it from two places: The Jellyfin
job, and my main desktop</li>
<li>Having &ldquo;unlimited&rdquo; space</li>
</ol>
<p>Ceph RBD volumes were out of the question, because those always need to have a
size set. They can&rsquo;t just grow to fill the entire space available in their Ceph
pool. CephFS volumes are different, though. By default, they don&rsquo;t have any
size restriction and can use the entire data pool of the CephFS they&rsquo;ve been
created on. This means I don&rsquo;t have to worry about whether I need to extend
the size at some point.
At the same time, I also regularly copy new files onto the volume when expanding
my media collection. This happens from my desktop. So I also need the ability
to mount the volume on two machines simultaneously, and to write to it from both.</p>
<p>These two points make CephFS the perfect fit for the media volume. But it left
me with a problem: I needed a k8s PVC to mount into the Jellyfin Pod, and PVCs
normally have to have a capacity set. In my initial tests, I tried just removing
the size in the manifest for a test PVC, but k8s rejected it when I tried to
apply it. The same thing happened when I instead set the size to 0.</p>
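<p>For illustration, the rejected experiment looked something like this sketch
(the name is made up); both dropping the storage request entirely and setting
it to 0 fail validation:</p>
<pre tabindex="0"><code>apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: media-test            # made-up test name
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 0              # rejected: a PVC must request a positive capacity
</code></pre>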
<p>So back to the drawing board it was. Luckily for me, <a href="https://beyondwatts.social/@beyondwatts">@beyondwatts</a>
pointed me to static PVCs, which can be used to make manually created CephFS
and RBD volumes available as PVCs in Kubernetes. This seems to be a feature of
the <a href="https://github.com/ceph/ceph-csi">Ceph CSI</a>. The documentation for the
feature can be found <a href="https://github.com/ceph/ceph-csi/blob/devel/docs/static-pvc.md">here</a>.</p>
<p>I created my new media volume (technically a <a href="https://docs.ceph.com/en/reef/cephfs/fs-volumes/">CephFS subvolume</a>)
with the following Ceph commands:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph fs subvolumegroup create homelab-fs static-pvcs
</span></span><span style="display:flex;"><span>ceph fs subvolume create homelab-fs media static-pvcs
</span></span></code></pre></div><p>After creation, the <code>ceph fs subvolume info homelab-fs media static-pvcs</code> output
looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;atime&#34;</span>: <span style="color:#e6db74">&#34;2025-02-11 22:46:35&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;bytes_pcent&#34;</span>: <span style="color:#e6db74">&#34;undefined&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;bytes_quota&#34;</span>: <span style="color:#e6db74">&#34;infinite&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;bytes_used&#34;</span>: <span style="color:#ae81ff">0</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;created_at&#34;</span>: <span style="color:#e6db74">&#34;2025-02-11 22:46:35&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;ctime&#34;</span>: <span style="color:#e6db74">&#34;2025-02-11 22:46:35&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;data_pool&#34;</span>: <span style="color:#e6db74">&#34;homelab-fs-bulk&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;features&#34;</span>: [
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;snapshot-clone&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;snapshot-autoprotect&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;snapshot-retention&#34;</span>
</span></span><span style="display:flex;"><span>    ],
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;flavor&#34;</span>: <span style="color:#ae81ff">2</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;gid&#34;</span>: <span style="color:#ae81ff">0</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;mode&#34;</span>: <span style="color:#ae81ff">16877</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;mon_addrs&#34;</span>: [
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;300.300.300.1:6789&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;300.300.300.2:6789&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;300.300.300.3:6789&#34;</span>
</span></span><span style="display:flex;"><span>    ],
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;mtime&#34;</span>: <span style="color:#e6db74">&#34;2025-02-11 22:46:35&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;path&#34;</span>: <span style="color:#e6db74">&#34;/volumes/static-pvcs/media/9a1f1581-6749-4146-a2aa-251fe2b58eca&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;pool_namespace&#34;</span>: <span style="color:#e6db74">&#34;&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;state&#34;</span>: <span style="color:#e6db74">&#34;complete&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;subvolume&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;uid&#34;</span>: <span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Note especially the <code>bytes_quota: infinite</code> part, which was what I was after.
Next, I created the PersistentVolume for it:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">PersistentVolume</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">jellyfin-media</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">accessModes</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">ReadWriteMany</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">capacity</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storage</span>: <span style="color:#ae81ff">1Gi</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">csi</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">driver</span>: <span style="color:#ae81ff">rook-ceph.cephfs.csi.ceph.com</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">controllerExpandSecretRef</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-csi-cephfs-provisioner</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">rook-cluster</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">nodeStageSecretRef</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-csi-cephfs-node</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">rook-cluster</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumeAttributes</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;fsName&#34;: </span><span style="color:#e6db74">&#34;homelab-fs&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;clusterID&#34;: </span><span style="color:#e6db74">&#34;rook-cluster&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;staticVolume&#34;: </span><span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;rootPath&#34;: </span><span style="color:#ae81ff">/volumes/static-pvcs/media/9a1f1581-6749-4146-a2aa-251fe2b58eca</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumeHandle</span>: <span style="color:#ae81ff">jellyfin-media</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">persistentVolumeReclaimPolicy</span>: <span style="color:#ae81ff">Retain</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">volumeMode</span>: <span style="color:#ae81ff">Filesystem</span>
</span></span></code></pre></div><p>I mostly copied this from another CephFS volume I already had as scratch space
for my backup setup. Important to note here is the <code>spec.csi.volumeAttributes.staticVolume: true</code>
entry as well as the <code>rootPath</code>.
The value for the root path can be found with the following command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph fs subvolume getpath homelab-fs media static-pvcs
</span></span></code></pre></div><p>The PersistentVolumeClaim then looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">PersistentVolumeClaim</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">jellyfin-media</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">accessModes</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">ReadWriteMany</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">storage</span>: <span style="color:#ae81ff">1Gi</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storageClassName</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">volumeMode</span>: <span style="color:#ae81ff">Filesystem</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">volumeName</span>: <span style="color:#ae81ff">jellyfin-media</span>
</span></span></code></pre></div><p>Because it&rsquo;s a CephFS subvolume, I could use the ReadWriteMany access mode.</p>
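<p>Consuming the claim in the Jellyfin Pod then works like for any other PVC.
A minimal sketch, assuming a <code>/media</code> mount path:</p>
<pre tabindex="0"><code>      containers:
        - name: jellyfin
          volumeMounts:
            - name: media
              mountPath: /media     # assumed mount path
      volumes:
        - name: media
          persistentVolumeClaim:
            claimName: jellyfin-media
</code></pre>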
<p>But when trying to launch a Pod using the PVC, I got this error message initially:</p>
<pre tabindex="0"><code>MountVolume.MountDevice failed for volume &#34;jellyfin-media&#34; : rpc error: code = Internal desc = failed to get user credentials from node stage secrets: missing ID field &#39;userID&#39; in secrets
</code></pre><p>This showed up in the Events of the Pod. The issue is mentioned in the <a href="https://rook.io/docs/rook/latest/Storage-Configuration/Shared-Filesystem-CephFS/filesystem-storage/#consume-the-shared-filesystem-across-namespaces">Rook Docs</a>,
and it has to be solved by manually creating another Secret. I&rsquo;m not sure why
the Ceph CSI driver doesn&rsquo;t create that Secret automatically, as it&rsquo;s just a
copy of the <code>rook-csi-cephfs-node</code> Secret with different names for the data keys.</p>
<p>I did the copy by first fetching the <code>rook-csi-cephfs-node</code> secret:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl get -n rook-cluster secrets rook-csi-cephfs-node -o yaml &gt; csi-secret.yaml
</span></span></code></pre></div><p>From that <code>csi-secret.yaml</code> I then removed all of the runtime information added
by Kubernetes and then renamed the keys like this:</p>
<ul>
<li><code>adminID</code> -&gt; <code>userID</code></li>
<li><code>adminKey</code> -&gt; <code>userKey</code></li>
</ul>
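<p>The end result is a Secret along these lines. This is just a sketch: the name
is my pick for this example, and the two values are the base64 data copied
verbatim from the original Secret:</p>
<pre tabindex="0"><code>apiVersion: v1
kind: Secret
metadata:
  name: rook-csi-cephfs-node-user   # example name
  namespace: rook-cluster
data:
  userID: &lt;base64 value copied from adminID&gt;
  userKey: &lt;base64 value copied from adminKey&gt;
</code></pre>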
<p>After that, I applied the new Secret to the cluster and then changed the
<code>spec.csi.nodeStageSecretRef.name</code> property of the PersistentVolume to the newly
created Secret. After that, the Pod was able to mount the CephFS static volume
without issue.
What I&rsquo;m still wondering about is why these static PVCs need this special
handling, even though CephFS PVCs created dynamically don&rsquo;t.</p>
<p>The last step of the preparation was to make sure that I could also mount the
CephFS subvolume on my desktop machine.
This, quite honestly, involved a bit of silliness. In my current configuration,
I just had the <code>name</code> option set for the mount, giving the Ceph user name to
use for authentication. This then automatically takes the <code>/etc/ceph/ceph.conf</code>
file to get the MON daemon IPs for initial cluster contact and the <code>ceph.client.&lt;username&gt;.keyring</code>
file from the same directory. I couldn&rsquo;t reuse the same approach, because I&rsquo;ve
got other mounts from the baremetal cluster I need to keep for now.</p>
<p>But as per the <a href="https://docs.ceph.com/en/reef/man/8/mount.ceph/">mount.ceph man page</a>,
there is a <code>secretfile</code> option. In my naivete, I thought that this option takes
the path to a keyring file, which would make sense, because keyring files are
the way Ceph credentials are provided everywhere else. But no: the <code>secretfile</code>
option expects a file which contains <em>only</em> the key, and nothing else.
If you provide it with a full keyring file, the mount command will output an
error like this:</p>
<pre tabindex="0"><code>secret is not valid base64: Invalid argument.
adding ceph secret key to kernel failed: Invalid argument
couldn&#39;t append secret option: -22
</code></pre><p>With that finally figured out, I created the Ceph config file for the Rook
cluster with this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph config generate-minimal-conf
</span></span></code></pre></div><p>Then I was able to mount the subvolume with this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>mount -t ceph :/volumes/static-pvcs/media/9a1f1581-6749-4146-a2aa-251fe2b58eca /mnt/temp -o name<span style="color:#f92672">=</span>myuser,secretfile<span style="color:#f92672">=</span>/etc/ceph/ceph-rook.secret,conf<span style="color:#f92672">=</span>/etc/ceph/ceph-rook.conf
</span></span></code></pre></div><p>What I really like about working with Rook instead of baremetal Ceph is that I
can create additional users with Kubernetes manifests so I can version control
them, instead of having to document long sequences of commands in a runbook:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">ceph.rook.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CephClient</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">myuser</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">caps</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mds</span>: <span style="color:#e6db74">&#39;allow rw path=/volumes/static-pvcs/media/9a1f1581-6749-4146-a2aa-251fe2b58eca&#39;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mon</span>: <span style="color:#e6db74">&#39;allow r&#39;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">osd</span>: <span style="color:#e6db74">&#39;allow rw tag cephfs data=homelab-fs&#39;</span>
</span></span></code></pre></div><p>This will allow the user to access only that specific static volume in the
cluster.</p>
<h2 id="copying-the-media-collection">Copying the media collection</h2>
<p>My media collection has a size of about 1.7 TiB. I knew that copying it over
would take quite a while, so I planned to do it from my Command&amp;Control host.
But then I got a weird feeling and decided to check the networking diagram.
It looks something like this:
<figure>
    <img loading="lazy" src="copy-routing.svg"
         alt="This is a network diagram. It shows several hosts: The first two are &#39;Baremetal Ceph Host&#39; and &#39;Rook Ceph Host&#39;. They&#39;re both in the same VLAN. Then there is &#39;Copy Host&#39;, which is connected to a different VLAN. All of them are connected to the same switch. Also connected to that switch is the &#39;Router routing VLANs&#39;. The diagram shows a network flow starting at &#39;Baremetal Ceph Host&#39; and going into the router via the switch. Then from the router it goes over back to the switch to end up in the &#39;Copy Host&#39;. From there, the flow goes back out to the switch and to the router again, to then go back to the switch and end up in the Rook Ceph Host."/> <figcaption>
            <p>Network diagram with the packet flow for the copy operation.</p>
        </figcaption>
</figure>
</p>
<p>The issue here is the fact that my C&amp;C host, called the <em>Copy Host</em> here, is in a
different VLAN than the baremetal and Rook Ceph hosts. This means that some
routing needs to happen for packets to get from the Ceph hosts to the copy host
and back. This in turn means that all packets need to pass through the router.
That would be fine if the packets only needed to pass through the router once.
But in truth, they need to pass through it twice, and they pass through
the same NIC on the router even four times.</p>
<p>The packets go from the source, the baremetal Ceph cluster, up to the router via
the link from the switch. Pass Nr. 1. Then they go down that same link again to
reach the C&amp;C host on its VLAN. Pass Nr. 2. The C&amp;C host then sends it to the
router again, now with the Rook Ceph host as the destination, pass Nr. 3.
And finally, the router then sends the packets back again down that link
between router and switch to finally arrive at the Ceph Rook host.</p>
<p>Because the stream crosses that link twice in each direction, my maximum copy
speed is cut in half to 500 Mbit/s, a mere 62 MByte/s and slower even than the
HDDs involved in this copy.</p>
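<p>As a quick back-of-the-envelope check, assuming the usual 1 Gbit/s link between
router and switch:</p>
<pre tabindex="0"><code>1 Gbit/s link, each direction carries the stream twice:
  1 Gbit/s / 2 = 500 Mbit/s effective throughput
  500 Mbit/s / 8 = 62.5 MByte/s
</code></pre>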
<p>I was contemplating which Homelab host to take out of service to install the
necessary tools on when <a href="https://hachyderm.io/@badnetmask">@badnetmask</a>, rightly, asked
why I don&rsquo;t just launch a Pod somewhere. That is what I finally went with.</p>
<p>I then remembered that there is a <a href="https://rook.io/docs/rook/latest-release/Troubleshooting/ceph-toolbox/">Rook Ceph Toolbox</a>
with all the necessary tools already installed, so I decided to try that.
After copying the credentials over, similarly to what I explained above for my desktop
mounts, I got an error:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>bash-5.1$ mount -t ceph :/volumes/static-pvcs/media/9a1f1581-6749-4146-a2aa-251fe2b58eca /mnt/rook -o name<span style="color:#f92672">=</span>admin
</span></span><span style="display:flex;"><span>mount: drop permissions failed.
</span></span></code></pre></div><p>I then changed <a href="https://github.com/rook/rook/blob/master/deploy/examples/toolbox.yaml">the Pod&rsquo;s Yaml</a>
a bit to run it as root. That gave me an error again, but at least a different one:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span><span style="color:#f92672">[</span>root@rook-ceph-tools-584df95dcb-vdwqc /<span style="color:#f92672">]</span><span style="color:#75715e"># mount -t ceph :/volumes/static-pvcs/media/9a1f1581-6749-4146-a2aa-251fe2b58eca /mnt/rook -o name=admin</span>
</span></span><span style="display:flex;"><span>Unable to apply new capability set.
</span></span><span style="display:flex;"><span>modprobe: FATAL: Module ceph not found in directory /lib/modules/5.15.0-131-generic
</span></span><span style="display:flex;"><span>failed to load ceph kernel module <span style="color:#f92672">(</span>1<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>Unable to apply new capability set.
</span></span><span style="display:flex;"><span>unable to determine mon addresses
</span></span></code></pre></div><p>To get rid of the failed attempt to load the Ceph kernel module, I then also
added the <code>/lib/modules</code> directory as a volume to the Pod. This got
rid of the fatal modprobe error, but still left me with the other errors.
So, throwing up my hands, I set <code>securityContext.privileged</code>. I&rsquo;m still a bit
surprised that Linux doesn&rsquo;t have a dedicated capability just for mounting.
Mounting filesystems is simply gated behind the catch-all <code>CAP_SYS_ADMIN</code>,
which grants far more than the ability to run mount.</p>
<p>The final Deployment I used:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-ceph-tools</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">rook-cluster</span> <span style="color:#75715e"># namespace:cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">app</span>: <span style="color:#ae81ff">rook-ceph-tools</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">app</span>: <span style="color:#ae81ff">rook-ceph-tools</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">dnsPolicy</span>: <span style="color:#ae81ff">ClusterFirstWithHostNet</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">serviceAccountName</span>: <span style="color:#ae81ff">rook-ceph-default</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-ceph-tools</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">quay.io/ceph/ceph:v18</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">command</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#ae81ff">/bin/bash</span>
</span></span><span style="display:flex;"><span>            - -<span style="color:#ae81ff">c</span>
</span></span><span style="display:flex;"><span>            - |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              CEPH_CONFIG=&#34;/etc/ceph/ceph.conf&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              MON_CONFIG=&#34;/etc/rook/mon-endpoints&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              KEYRING_FILE=&#34;/etc/ceph/keyring&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              write_endpoints() {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                endpoints=$(cat ${MON_CONFIG})
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                mon_endpoints=$(echo &#34;${endpoints}&#34;| sed &#39;s/[a-z0-9_-]\+=//g&#39;)
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                DATE=$(date)
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                echo &#34;$DATE writing mon endpoints to ${CEPH_CONFIG}: ${endpoints}&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                  cat &lt;&lt;EOF &gt; ${CEPH_CONFIG}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              [global]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              mon_host = ${mon_endpoints}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              [client.admin]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              keyring = ${KEYRING_FILE}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              EOF
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              watch_endpoints() {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                real_path=$(realpath ${MON_CONFIG})
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                initial_time=$(stat -c %Z &#34;${real_path}&#34;)
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                while true; do
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                  real_path=$(realpath ${MON_CONFIG})
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                  latest_time=$(stat -c %Z &#34;${real_path}&#34;)
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                  if [[ &#34;${latest_time}&#34; != &#34;${initial_time}&#34; ]]; then
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                    write_endpoints
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                    initial_time=${latest_time}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                  fi
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                  sleep 10
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                done
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              ceph_secret=${ROOK_CEPH_SECRET}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              if [[ &#34;$ceph_secret&#34; == &#34;&#34; ]]; then
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                ceph_secret=$(cat /var/lib/rook-ceph-mon/secret.keyring)
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              fi
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              cat &lt;&lt;EOF &gt; ${KEYRING_FILE}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              [${ROOK_CEPH_USERNAME}]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              key = ${ceph_secret}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              EOF
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              write_endpoints
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              watch_endpoints</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">imagePullPolicy</span>: <span style="color:#ae81ff">IfNotPresent</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">tty</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">securityContext</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">runAsNonRoot</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">privileged</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ROOK_CEPH_USERNAME</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-ceph-mon</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">key</span>: <span style="color:#ae81ff">ceph-username</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/etc/ceph</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ceph-config</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">mon-endpoint-volume</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/etc/rook</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ceph-admin-secret</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/var/lib/rook-ceph-mon</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">modules</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/lib/modules</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ceph-admin-secret</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">secret</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">secretName</span>: <span style="color:#ae81ff">rook-ceph-mon</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">optional</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">items</span>:
</span></span><span style="display:flex;"><span>              - <span style="color:#f92672">key</span>: <span style="color:#ae81ff">ceph-secret</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">path</span>: <span style="color:#ae81ff">secret.keyring</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">mon-endpoint-volume</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">configMap</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-ceph-mon-endpoints</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">items</span>:
</span></span><span style="display:flex;"><span>              - <span style="color:#f92672">key</span>: <span style="color:#ae81ff">data</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">path</span>: <span style="color:#ae81ff">mon-endpoints</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ceph-config</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">emptyDir</span>: {}
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">modules</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">hostPath</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/lib/modules</span> <span style="color:#75715e"># directory location on host</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tolerations</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;node.kubernetes.io/unreachable&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">operator</span>: <span style="color:#e6db74">&#34;Exists&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">effect</span>: <span style="color:#e6db74">&#34;NoExecute&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">tolerationSeconds</span>: <span style="color:#ae81ff">5</span>
</span></span></code></pre></div><p>Anyway, with the <code>privileged</code> option set, I was finally able to mount. Wanting
to use rsync, I installed it with <code>yum install rsync</code> and mounted the baremetal
and Rook CephFS subvolumes.</p>
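<p>For reference, the two mounts looked roughly like this. The baremetal mon
addresses, user and subvolume path are placeholders here, as they depend on the
cluster:</p>
<pre tabindex="0"><code>mkdir -p /mnt/baremetal /mnt/rook
# baremetal cluster: mon addresses given explicitly
mount -t ceph &lt;mon1&gt;,&lt;mon2&gt;:/&lt;media-path&gt; /mnt/baremetal -o name=&lt;user&gt;,secret=&lt;key&gt;
# Rook cluster: mon addresses come from /etc/ceph/ceph.conf in the toolbox
mount -t ceph :/volumes/static-pvcs/media/9a1f1581-6749-4146-a2aa-251fe2b58eca /mnt/rook -o name=admin
</code></pre>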
<p>I used this command to execute the copy operation:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>rsync -av --info<span style="color:#f92672">=</span>progress2 --info<span style="color:#f92672">=</span>name0 /mnt/baremetal/* /mnt/rook/
</span></span></code></pre></div><p>Here is the final output:</p>
<pre tabindex="0"><code>sent 1,748,055,479,314 bytes  received 155,334 bytes  54,890,039.24 bytes/sec
total size is 1,853,006,549,228  speedup is 1.06
</code></pre><p>The operation took a total of 9.5 h.</p>
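<p>The numbers check out: at the reported rate, the pure transfer works out to
just under nine hours, with the rest presumably spent scanning directories and
copying metadata:</p>
<pre tabindex="0"><code>1,748,055,479,314 bytes / 54,890,039 bytes/s &asymp; 31,847 s &asymp; 8.8 h
</code></pre>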
<h2 id="deploying-jellyfin">Deploying Jellyfin</h2>
<p>Just for completeness&rsquo; sake, here is the Jellyfin Deployment:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">jellyfin</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">jellyfin</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#e6db74">&#34;Recreate&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">jellyfin</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">securityContext</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">fsGroup</span>: <span style="color:#ae81ff">1006</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">runAsUser</span>: <span style="color:#ae81ff">1007</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">runAsGroup</span>: <span style="color:#ae81ff">1006</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">jellyfin</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">jellyfin/jellyfin:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">command</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;/jellyfin/jellyfin&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;--datadir&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;{{ .Values.mounts.cacheAndConf }}/data&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;--cachedir&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;{{ .Values.mounts.cacheAndConf }}/cache&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;--ffmpeg&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;/usr/lib/jellyfin-ffmpeg/ffmpeg&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cache-and-conf</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: {{ <span style="color:#ae81ff">.Values.mounts.cacheAndConf }}</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">media</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: {{ <span style="color:#ae81ff">.Values.mounts.media }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">1000m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">1000Mi</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">livenessProbe</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">httpGet</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.port }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;/health&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">initialDelaySeconds</span>: <span style="color:#ae81ff">15</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">periodSeconds</span>: <span style="color:#ae81ff">30</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">jellyfin-http</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.port }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cache-and-conf</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">persistentVolumeClaim</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">claimName</span>: <span style="color:#ae81ff">jellyfin-config-volume</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">media</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">persistentVolumeClaim</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">claimName</span>: <span style="color:#ae81ff">jellyfin-media</span>
</span></span></code></pre></div><p>The only things out of the ordinary here are the settings in the <code>spec.securityContext</code>.
They ensure that I get the right permissions on the files
produced on the media collection subvolume. All files on there have the GID
<code>1006</code>, which is historically my group on the first desktop connected to my
first Homeserver, and it&rsquo;s still serving as the shared group for my media
collection, because both Jellyfin and my desktop user need to access the
media files. With this configuration, Jellyfin writes new files with the
correct GID.</p>
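<p>The effect is easy to check on any file Jellyfin writes to the media volume.
An illustrative listing (the path is made up):</p>
<pre tabindex="0"><code>$ ls -ln /media/shows/new-episode.mkv
-rw-r--r-- 1 1007 1006 1234567 Feb  8 12:00 /media/shows/new-episode.mkv
</code></pre>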
<p>Another somewhat interesting point about Jellyfin: it does allow changing
the config and cache directories, as you can see in <code>containers[0].command</code>, but it
does not allow the same for the location of the media libraries. Those locations
are hardcoded.
I had pretty big problems with this back when I migrated from Docker Compose
to Nomad, but sadly that was before I took extensive notes or documented everything
in my internal wiki, so I can&rsquo;t repeat the manual steps I used to migrate
the data location back then. &#x1f614;</p>
<p>And that&rsquo;s it already for this one. As I noted above, I will pretty closely
follow this post with another one looking at Ceph during the large copy operation.</p>
<p>My next migration this coming weekend will be my Nextcloud instance. I&rsquo;ll need
to look at some Helm charts, but at this point I&rsquo;m pretty sure I will just write
my own.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 17: Migrating my IoT Services</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-17-iot/</link>
      <pubDate>Sat, 15 Feb 2025 12:09:12 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-17-iot/</guid>
      <description>Migrating Mosquitto, mqtt2prometheus and zigbee2mqtt</description>
      <content:encoded><![CDATA[<p>Wherein I migrate several IoT services over to Kubernetes.</p>
<p>This is part 17 of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>This is going to be a short one. This weekend, I finished the migration of
several IoT-related services to k8s.
<a href="https://mosquitto.org/">Mosquitto</a> is my MQTT broker, handling messages from
several sources. For me, it&rsquo;s only a listener - I do not have any actual home
automations.
Said mosquitto instance is subscribed to by <a href="https://github.com/hikhvar/mqtt2prometheus">mqtt2prometheus</a>,
which exposes the data my smart plugs and thermometers produce for my Prometheus
instance to scrape.
Finally, I also migrated my <a href="https://www.zigbee2mqtt.io/">Zigbee2MQTT</a> instance
over to the k8s cluster. It controls my Zigbee transceiver and sends the data
from my thermometers on to mosquitto.</p>
<p>If you&rsquo;d like some more details on the power plug data gathering setup, have
a look <a href="https://blog.mei-home.net/posts/power-measurement/">here</a>.
The post on my thermometer setup is still on the large pile of blog posts I&rsquo;d
like to write at some point.</p>
<p>I won&rsquo;t detail every single manifest here; instead, I only want to talk about a
couple of issues I encountered along the way.</p>
<h2 id="selfmade-helm-chart">Selfmade Helm chart</h2>
<p>I decided to write my own Helm chart for these tools and manage them all in the
same namespace. That just makes the setup a bit simpler, as they don&rsquo;t really need to
talk to many other services; none of the apps needs a database, for example.</p>
<p>So what does a Helm chart look like when I write it myself?
The <code>Chart.yaml</code> is kept extremely simple:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v2</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">name</span>: <span style="color:#ae81ff">iot</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">description</span>: <span style="color:#ae81ff">The Homelab IoT services</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">type</span>: <span style="color:#ae81ff">application</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">version</span>: <span style="color:#ae81ff">0.1.0</span>
</span></span></code></pre></div><p>I don&rsquo;t need anything more. I don&rsquo;t even bother to change the Chart&rsquo;s version
when I change things.</p>
<p>The <code>values.yaml</code> file is also pretty sparse. I mostly use it for cases where
I need a value in multiple places:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">commonLabels</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">iot</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">mosquitto</span>: <span style="color:#e6db74">&#34;1883&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">pwr</span>: <span style="color:#e6db74">&#34;9641&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">temp</span>: <span style="color:#e6db74">&#34;9642&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">z2m</span>: <span style="color:#e6db74">&#34;8080&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">mqttHost</span>: <span style="color:#ae81ff">mqtt.example.com</span>
</span></span></code></pre></div><p>And that&rsquo;s it already.</p>
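<p>With a chart this small, the standard Helm workflow is all that&rsquo;s needed; the
release and directory names below are illustrative:</p>
<pre tabindex="0"><code># render the templates locally to eyeball the generated manifests
helm template iot ./iot | less
# install or upgrade the release
helm upgrade --install iot ./iot --namespace iot --create-namespace
</code></pre>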
<h2 id="mosquitto">Mosquitto</h2>
<p>As I said, I won&rsquo;t detail every single manifest here. But one interesting part
was that MQTT isn&rsquo;t HTTP; it&rsquo;s a pure TCP-based protocol. I&rsquo;m still using
Ingress mechanisms though, because Traefik does support TCP routes. In k8s, these are
configured with the <a href="https://doc.traefik.io/traefik/routing/providers/kubernetes-crd/#kind-ingressroutetcp">IngressRouteTCP</a> CRD.
With such a router config, some things are not available. For example, if you don&rsquo;t
configure TLS, you cannot do host-based routing: a plain TCP connection simply
doesn&rsquo;t tell you which host the client wanted to reach. So when you want to use
unencrypted TCP (or UDP), you have to create a separate Traefik entrypoint with
its own port just for this route.
Here&rsquo;s the route manifest:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">traefik.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">IngressRouteTCP</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">mosquitto</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/hostname</span>: <span style="color:#e6db74">&#34;{{ .Values.mqttHost }}&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/target</span>: <span style="color:#e6db74">&#34;ingress.example.com&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">entryPoints</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">mqtt</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">routes</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">match</span>: <span style="color:#ae81ff">HostSNI(`*`)</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">services</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">mosquitto</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Service</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">port</span>: <span style="color:#ae81ff">1883</span>
</span></span></code></pre></div><p>This connects Traefik&rsquo;s <code>mqtt</code> entrypoint to mosquitto&rsquo;s Service on port 1883: all
connections arriving on that entrypoint are forwarded to mosquitto.</p>
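<p>The <code>mqtt</code> entrypoint referenced by the route is not created by it; it has to
already exist in Traefik&rsquo;s static configuration. Defined in the file-based
static config, it would be as minimal as this:</p>
<pre tabindex="0"><code>entryPoints:
  mqtt:
    address: &#34;:1883/tcp&#34;
</code></pre>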
<p>If you do require TLS, Traefik can make use of <a href="https://de.wikipedia.org/wiki/Server_Name_Indication">Server Name Indication</a>
via the <code>HostSNI</code> matcher. But SNI is an extension to TLS, so not all software
implementing TLS will support it.
With TLS enabled, you can even run pure TLS connections over the same port
Traefik is using for HTTPS. Such an IngressRouteTCP would look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">traefik.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">IngressRouteTCP</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">mosquitto-tls</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">entryPoints</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">websecure</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">routes</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">match</span>: <span style="color:#ae81ff">HostSNI(`{{ .Values.mqttHost }}`)</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">services</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">mosquitto</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Service</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">port</span>: <span style="color:#ae81ff">1883</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tls</span>: {}
</span></span></code></pre></div><p>Here, the <code>websecure</code> entrypoint is my standard HTTPS entrypoint. Sharing it
still works as expected, even for pure TLS connections: Traefik reads the SNI
and forwards connections arriving for mqtt.example.com to mosquitto.
The <code>tls</code> key at the end is important, even though it is empty. It tells
Traefik to enable TLS with its default configuration, which uses my wildcard
cert.</p>
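<p>A quick way to verify the SNI-based routing is to point an openssl client at the
HTTPS port with the MQTT hostname as the server name:</p>
<pre tabindex="0"><code>openssl s_client -connect ingress.example.com:443 -servername mqtt.example.com
</code></pre>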
<p>The most interesting part of the mosquitto setup was the creation of users.
Mosquitto uses a passwd-like file format, and I got &ldquo;creative&rdquo; when setting up the
Nomad job.
All of the users (admin user, scrapers, Zigbee2MQTT, my smart plugs) live in a
directory in Vault, looking like this:</p>
<pre tabindex="0"><code>my-secrets/iot/mqtt/users/username1
my-secrets/iot/mqtt/users/username2
[...]
</code></pre><p>Each of those has only a single key, <code>secret</code>, which contains the user&rsquo;s
password, already hashed with <a href="https://mosquitto.org/man/mosquitto_passwd-1.html">mosquitto_passwd</a>.
The problem now is: how do I get all of those into a single passwd file for
mosquitto to use?
The resulting file should look something like this:</p>
<pre tabindex="0"><code>user1:$7$foo_encrypted==

user2:$7$bar_encrypted==
</code></pre><p>It turns out that <a href="https://external-secrets.io/latest/">external-secrets</a> has a
pretty good <a href="https://external-secrets.io/latest/guides/templating/">templating engine</a>,
so I was actually able to do this. The finished ExternalSecret looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1beta1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ExternalSecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">mosquitto-users</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">refreshInterval</span>: <span style="color:#e6db74">&#34;1m&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">secretStoreRef</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-secrets</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ClusterSecretStore</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">target</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">passwd</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">passwd</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          {{ `{{ range $name, $pass := . }}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          {{ $name }}:{{ with $pass | fromJson }}{{ .secret }}{{ end }}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          {{ end }}` }}</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dataFrom</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">find</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">path</span>: <span style="color:#ae81ff">my-secrets/iot/mqtt/users</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">regexp</span>: <span style="color:#e6db74">&#34;.*&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">rewrite</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">regexp</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">source</span>: <span style="color:#e6db74">&#34;my-secrets/iot/mqtt/users/(.*)&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">target</span>: <span style="color:#e6db74">&#34;$1&#34;</span>
</span></span></code></pre></div><p>Let&rsquo;s start with the data fetching in <code>dataFrom</code>. It fetches all secrets
below the <code>users/</code> path and returns them in a map, akin to this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">resultMap</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">my-secrets/iot/mqtt/users/username1</span>: {<span style="color:#f92672">&#34;secret&#34;: </span><span style="color:#e6db74">&#34;foo&#34;</span>}
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">my-secrets/iot/mqtt/users/username2</span>: {<span style="color:#f92672">&#34;secret&#34;: </span><span style="color:#e6db74">&#34;bar&#34;</span>}
</span></span></code></pre></div><p>This is a bit unfortunate, because to get the right format, I need the plain
username as well. That&rsquo;s what the <code>rewrite:</code> object gives me: it does a regex
match on the whole path and hands me back only the last element, which is the
username. The template itself then just iterates over the map and writes out
the username and password in the right format.</p>
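<p>To verify that the template produced the right output, the rendered Secret can
be decoded directly:</p>
<pre tabindex="0"><code>kubectl get secret passwd -o jsonpath=&#39;{.data.passwd}&#39; | base64 -d
</code></pre>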
<p>I&rsquo;m repeatedly impressed by how many tight situations external-secrets has helped
me out of already. After some fiddling, this was a good enough result.</p>
<p>One thing I found rather unfortunate, though: there&rsquo;s no way to define the
owner of a Secret mounted into a Pod as a volume. This means that the passwd file
sits in the container world-readable. Not great. The only potential solution I
found was introducing an init container to run chmod on the file.
I skipped that for now, but I will have to take care of it at some point,
because mosquitto already complains that the passwd file is
world-readable, noting that such a setup will be rejected in the future.</p>
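<p>For reference, here is a minimal sketch of what that init container could look
like. Secret volumes are mounted read-only, so the file has to be copied into an
emptyDir before its permissions can be changed; the UID/GID of 1883 is an
assumption based on the official mosquitto image:</p>
<pre tabindex="0"><code>initContainers:
  - name: fix-passwd-perms
    image: busybox
    # copy the world-readable passwd file from the Secret volume into an
    # emptyDir and tighten ownership and permissions there
    command:
      - sh
      - -c
      - |
        cp /secret/passwd /passwd-fixed/passwd
        chown 1883:1883 /passwd-fixed/passwd
        chmod 0600 /passwd-fixed/passwd
    volumeMounts:
      - name: passwd       # the Secret created by the ExternalSecret above
        mountPath: /secret
      - name: passwd-fixed # emptyDir, also mounted by the mosquitto container
        mountPath: /passwd-fixed
</code></pre>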
<h2 id="scraping-mqtt-data-with-prometheus">Scraping MQTT data with Prometheus</h2>
<p>I greatly enjoy my Prometheus data and like looking at all of the plots
in Grafana; there&rsquo;s a reason it gets to occupy 200 GB of disk space.
So I need to get my MQTT data, meaning power consumption from the smart plugs
and thermal data from the thermometers, into Prometheus.
For this, I&rsquo;m using <a href="https://github.com/hikhvar/mqtt2prometheus">mqtt2prometheus</a>.
I&rsquo;ve currently got two instances running: one for my power plugs&rsquo; energy measurements
and one for my thermometers&rsquo; temperature and humidity. I put both of them into
one Pod, because having a separate Pod for each of them seemed unnecessary.</p>
<p>The configuration of the power measurements exporter looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ConfigMap</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pwr-exporter</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">config.yaml</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    mqtt:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      server: tcp://{{ .Values.mqttHost }}:{{ .Values.ports.mosquitto }}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      user: promexport
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      client_id: pwr-exporter
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      topic_path: &#34;plugs/tasmota/tele/#&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      device_id_regex: &#34;plugs/tasmota/tele/(?P&lt;deviceid&gt;.*)/.*&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    metrics:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - prom_name: mqtt_total_power_kwh
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        mqtt_name: ENERGY.Total
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        help: &#34;Total power consumption (kWh)&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        type: counter
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - prom_name: mqtt_power
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        mqtt_name: ENERGY.Power
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        help: &#34;Current consumption (W)&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        type: gauge
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - prom_name: mqtt_current
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        mqtt_name: ENERGY.ApparentPower
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        help: &#34;Current (A)&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        type: gauge
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - prom_name: mqtt_yesterday_pwr
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        mqtt_name: ENERGY.Yesterday
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        help: &#34;Yesterdays Total Power Consumption (kWh)&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        type: counter
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - prom_name: mqtt_today_pwr
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        mqtt_name: ENERGY.Today
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        help: &#34;Todays Total Power Consumption (kWh)&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        type: counter</span>
</span></span></code></pre></div><p>And the one for the thermometers looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ConfigMap</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">temp-exporter</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">config.yaml</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    mqtt:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      server: tcp://{{ .Values.mqttHost }}:{{ .Values.ports.mosquitto }}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      user: promexport
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      client_id: temp-exporter
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      topic_path: &#34;zigbee2mqtt/temp/sonoff/#&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      device_id_regex: &#34;zigbee2mqtt/temp/sonoff/(?P&lt;deviceid&gt;.*)&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    cache:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      timeout: 24h
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    metrics:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - prom_name: mqtt_temp_battery_percent
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        mqtt_name: battery
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        help: &#34;Current battery percentage (percent)&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        type: gauge
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        omit_timestamp: true
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - prom_name: mqtt_temp_humidity
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        mqtt_name: humidity
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        help: &#34;Current humidity (percent)&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        type: gauge
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        omit_timestamp: true
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - prom_name: mqtt_temp_temperature
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        mqtt_name: temperature
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        help: &#34;Current temperature (C)&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        type: gauge
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        omit_timestamp: true</span>
</span></span></code></pre></div><p>The configurations mostly consist of mappings from fields in the MQTT JSON
payloads to Prometheus metrics.</p>
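<p>On the Prometheus side, this results in one time series per metric and plug,
roughly like the following (label names from memory, they may differ slightly):</p>
<pre tabindex="0"><code>mqtt_power{sensor=&#34;plug1&#34;,topic=&#34;plugs/tasmota/tele/plug1/SENSOR&#34;} 57
mqtt_total_power_kwh{sensor=&#34;plug1&#34;,topic=&#34;plugs/tasmota/tele/plug1/SENSOR&#34;} 123.4
</code></pre>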
<p>Here&rsquo;s the Deployment for the Pod:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">exporters</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">exporters</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#e6db74">&#34;Recreate&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">exporters</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">checksum/pwr-config</span>: {{ <span style="color:#ae81ff">include (print $.Template.BasePath &#34;/pwr-exp-conf.yaml&#34;) . | sha256sum }}</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">checksum/temp-config</span>: {{ <span style="color:#ae81ff">include (print $.Template.BasePath &#34;/temp-exp-conf.yaml&#34;) . | sha256sum }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pwr-exporter</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">ghcr.io/hikhvar/mqtt2prometheus:{{ .Values.mqtt2promVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-config&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;/etc/mqtt2prom/config.yaml&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-listen-port&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;{{ .Values.ports.pwr }}&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-log-format&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;json&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">config-pwr</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/etc/mqtt2prom</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">MQTT2PROM_MQTT_USER</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;promexport&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">envFrom</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">secretRef</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">name</span>: <span style="color:#ae81ff">exporter-mosquitto-user</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">optional</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pwr-exporter</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.ports.pwr }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">temp-exporter</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">ghcr.io/hikhvar/mqtt2prometheus:{{ .Values.mqtt2promVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-config&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;/etc/mqtt2prom/config.yaml&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-listen-port&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;{{ .Values.ports.temp }}&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-log-format&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;json&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">config-temp</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/etc/mqtt2prom</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">MQTT2PROM_MQTT_USER</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;promexport&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">envFrom</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">secretRef</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">name</span>: <span style="color:#ae81ff">exporter-mosquitto-user</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">optional</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">temp-exporter</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.ports.temp }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">config-pwr</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">configMap</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pwr-exporter</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">config-temp</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">configMap</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#ae81ff">temp-exporter</span>
</span></span></code></pre></div><p>I have again cut out some unimportant pieces. Luckily, mqtt2prometheus supports
providing the credentials for MQTT access via environment variables, so I didn&rsquo;t
have to template the entire configuration file just to keep the credentials out
of git.</p>
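<p>For completeness, here&rsquo;s a sketch of what the referenced Secret might look like.
Note that the exact variable name is an assumption on my part, following
mqtt2prometheus&rsquo; <code>MQTT2PROM_</code> environment variable scheme:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml">apiVersion: v1
kind: Secret
metadata:
  name: exporter-mosquitto-user
stringData:
  # Assumed key name; picked up via envFrom in the Deployment above.
  MQTT2PROM_MQTT_PASSWORD: &#34;not-my-real-password&#34;
</code></pre></div>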
<p>Finally, I also had to set up a network policy to allow my Prometheus deployment
access to the Pod and its ports for scraping:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumNetworkPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;exporters&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">endpointSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">exporters</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEndpoints</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">io.kubernetes.pod.namespace</span>: <span style="color:#ae81ff">monitoring</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">app.kubernetes.io/name</span>: <span style="color:#ae81ff">prometheus</span>
</span></span></code></pre></div><h2 id="the-zigbee-manager">The Zigbee manager</h2>
<p>My thermometers are connected via Zigbee, so I needed some way to translate the
data to MQTT and send it to my mosquitto instance. I don&rsquo;t use HomeAssistant,
because it looks very much like overkill: I don&rsquo;t actually control anything,
I just want to gather a bit of data.
Instead, I&rsquo;m using <a href="https://www.zigbee2mqtt.io/">Zigbee2MQTT</a>, with a
Zigbee transceiver connected via LAN, so I didn&rsquo;t have to muck about with
mounting a USB device into the Pod.
Again, Zigbee2MQTT is a good piece of software in that it lets me set the
config keys containing secrets via environment variables, while the non-secret
config options go into the configuration file.
Zigbee2MQTT requires three secrets:</p>
<ol>
<li>The MQTT credentials for access to mosquitto</li>
<li>An auth token for access to the web UI</li>
<li>A network key</li>
</ol>
<p>I&rsquo;m providing all three from my Vault instance in an ExternalSecret again:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1beta1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ExternalSecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">zigbee2mqtt</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">refreshInterval</span>: <span style="color:#e6db74">&#34;1m&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">secretStoreRef</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-secrets</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ClusterSecretStore</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">target</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">zigbee2mqtt</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">ZIGBEE2MQTT_CONFIG_FRONTEND_AUTH_TOKEN</span>: <span style="color:#e6db74">&#34;{{ `{{ .auth }}` }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">ZIGBEE2MQTT_CONFIG_MQTT_PASSWORD</span>: <span style="color:#e6db74">&#34;{{ `{{ .mqtt }}` }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">ZIGBEE2MQTT_CONFIG_ADVANCED_NETWORK_KEY</span>: <span style="color:#e6db74">&#34;[{{ `{{ .network }}` }}]&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">auth</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">remoteRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">my-secrets/iot/zigbee2mqtt/auth</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">property</span>: <span style="color:#ae81ff">secret</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">mqtt</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">remoteRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">my-secrets/iot/zigbee2mqtt/mqtt</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">property</span>: <span style="color:#ae81ff">secret</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">network</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">remoteRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">my-secrets/iot/zigbee2mqtt/network-key</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">property</span>: <span style="color:#ae81ff">secret</span>
</span></span></code></pre></div><p>The complicated part of the Zigbee2MQTT deployment is the configuration file,
because sadly, Zigbee2MQTT is one of those applications that need write access
to their configuration file. That makes using a ConfigMap complicated, because
ConfigMaps are always mounted read-only. In the case of Zigbee2MQTT, I don&rsquo;t
really care about the content changes it makes, I could just deploy my original
file over it without an issue. But Zigbee2MQTT won&rsquo;t even start if it can&rsquo;t
write to the config file.</p>
<p>First, the config map itself:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ConfigMap</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">zigbee2mqtt</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">configuration.yaml</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    version: 4
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    homeassistant:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      enabled: false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    permit_join: false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    frontend:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      enabled: true
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    mqtt:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      base_topic: zigbee2mqtt
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      server: &#39;mqtts://{{ .Values.mqttHost }}:443&#39;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      user: foo
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      client_id: &#34;foobar&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    # Serial settings
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    serial:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      port: &#39;tcp://my-zigbee-bridge:1234&#39;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      adapter: zstack
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    advanced:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      channel: 23
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      log_output:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        - console
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    devices:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;0x123&#39;:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        friendly_name: &#39;temp/sonoff/thermo1&#39;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        icon: device_icons/bdc2692122548ad0f2b0fb6c9f10a93d.png
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;0x456&#39;:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        friendly_name: &#39;temp/sonoff/thermo2&#39;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        icon: device_icons/bdc2692122548ad0f2b0fb6c9f10a93d.png</span>
</span></span></code></pre></div><p>When new devices are connected, Zigbee2MQTT adds them to the <code>devices:</code> map, and
I then just add them to the ConfigMap manually.</p>
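<p>A new entry, with a hypothetical device address, might look like this before I
copy it back into the ConfigMap by hand:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"># Hypothetical entry Zigbee2MQTT appends to the devices: map after pairing.
&#39;0x789&#39;:
  friendly_name: &#39;temp/sonoff/thermo3&#39;
</code></pre></div>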
<p>But how to handle the fact that this config file needs to be writable?
Init containers. Up to now, I&rsquo;d been living in blissful ignorance of
such hacks, but that streak of good fortune had to end at some point. I just
find it so incredibly ugly. Look at it:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">zigbee2mqtt</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">zigbee2mqtt</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#e6db74">&#34;Recreate&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">zigbee2mqtt</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">checksum/config</span>: {{ <span style="color:#ae81ff">include (print $.Template.BasePath &#34;/z2m-config.yaml&#34;) . | sha256sum }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">securityContext</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">fsGroup</span>: <span style="color:#ae81ff">1000</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">initContainers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">zigbee2mqtt-init</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">alpine:3.21.2</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">data</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/data</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">config</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/config</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">command</span>: [<span style="color:#e6db74">&#34;cp&#34;</span>, <span style="color:#e6db74">&#34;/config/configuration.yaml&#34;</span>, <span style="color:#e6db74">&#34;/data/configuration.yaml&#34;</span>]
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">zigbee2mqtt</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">koenkk/zigbee2mqtt:{{ .Values.zigbee2mqttVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">data</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/app/data</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">200m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">200Mi</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">envFrom</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">secretRef</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">name</span>: <span style="color:#ae81ff">zigbee2mqtt</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">optional</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">web</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.ports.z2m }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">data</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">persistentVolumeClaim</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">claimName</span>: <span style="color:#ae81ff">z2m-volume</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">config</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">configMap</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#ae81ff">zigbee2mqtt</span>
</span></span></code></pre></div><p>I&rsquo;m launching an entirely separate container just to run a single <code>cp</code> command
that copies the mounted ConfigMap into the data volume. I wish we had a better
way to do something like this, but it seems we don&rsquo;t.</p>
<p>And that&rsquo;s it for this one. Wherever possible, I will keep future migration
posts in this format: not explaining every single line of every single YAML
file anymore, but only pointing out interesting things, like the issue with
the mosquitto credentials in this one. That&rsquo;s more interesting to write and,
I hope, more interesting to read than the umpteenth re-explanation of my CNPG
DB setup.</p>
<p>Next up will be my Jellyfin media server. The copying of my media collection
is already done, and hopefully I will get the actual migration completed today.
That one will contain a lot of Grafana plots and Ceph performance musings. &#x1f913;</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 16: Migrating Gitea</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-16-gitea/</link>
      <pubDate>Fri, 07 Feb 2025 22:50:37 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-16-gitea/</guid>
      <description>Migrating my Gitea instance from Nomad to Kubernetes</description>
      <content:encoded><![CDATA[<p>Wherein I migrate my Gitea instance from Nomad to k8s.</p>
<p>This is part 16 of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>I&rsquo;ve been using <a href="https://about.gitea.com/">Gitea</a> as my Git forge for a while
now. What&rsquo;s now the Gitea instance started life as a <a href="https://github.com/gogs/gogs">Gogs</a>
instance in 2016, when I had to downsize my Homelab to a Raspberry Pi 3B that
couldn&rsquo;t handle all the things I wanted to run on it. I decided to get rid of
my Gitlab instance and exchange it for Gogs. The switch to Gitea then
happened because Gitlab started eating 12% of my new home server&rsquo;s CPU even
when idle.</p>
<p>This is what the front page looks like when logged in:
<figure>
    <img loading="lazy" src="gitea_frontpage.png"
         alt="A screenshot of the Gitea home page for a logged in user. At the top is a heat map, similar to the one on GitHub&#39;s user profile page. It shows a brighter color for days with a lot of activity, and a lighter color for days with less activity. It shows a full year&#39;s worth of activity, showing one colored box per day, with columns for weeks and rows for days of the week. My activity shows almost all weekend days with activity, while the winter months also show lots of activity on workdays. Below the heat map is an activity feed, showing activities from the last couple of days, like pushes to different repositories. Most of them are to the adm/homelab and mmeier/blog repository. On the right side is a list of repositories, showing ones like &#39;adm/homenet-docs&#39;, &#39;mmeier/smokes.cli&#39; or &#39;learning/learning-go&#39;. Next to some of them is a green check mark or a red cross, indicating the state of the last CI pipeline."/> <figcaption>
            <p>Screenshot of Gitea&rsquo;s home page for my user.</p>
        </figcaption>
</figure>
</p>
<p>I quite like it and, in contrast to Gitlab, I&rsquo;ve never had any problems with it.
It&rsquo;s pretty snappy (again, especially in contrast to Gitlab) and relatively light
on resources. Most of the time I can&rsquo;t even tell whether it got assigned one
of my beefier x86 nodes or a Raspberry Pi.</p>
<p>I&rsquo;ve got 82 repositories stored in it, from relatively small dead projects which
never got much farther than a README to extremely large repos containing 3D
models and such for a <a href="https://www.sinsofasolarempire1.com/">Sins of a Solar Empire</a>
mod I was once involved in. Most repos don&rsquo;t see a lot of activity and I&rsquo;m the
only user at the moment. The instance is not publicly accessible, but I might
change that when the <a href="https://forgefed.org/">ForgeFed</a> project matures.</p>
<p>My way of working depends on the repository. For my Homelab, this blog and my
Homelab docs, for example, I just push to the master branch. (Which again
reminds me to finally get around to the <code>main</code> branch migration.)
In my development projects, though, I mostly work with Pull Requests. I find
Gitea&rsquo;s interface pretty convenient, and I like seeing all the information and
CI runs for a specific feature in one place.</p>
<p>So I&rsquo;m not using too many of Gitea&rsquo;s actual features; it&rsquo;s mainly a convenient UI
for my Git repos. But I must admit: I&rsquo;m rather fond of that activity heat map.
The only Gitlab feature I genuinely miss is the repository stats. If any
of you know a good web app, either dynamic or statically generated, that can
show stats on a Git repo, I&rsquo;d be very interested.</p>
<h2 id="database-setup-and-migration">Database setup and migration</h2>
<p>I promise, this is the last time one of my migration articles will have a
long-winded section on databases. &#x1f609;
But in this case, it&rsquo;s warranted, because this is the first time I&rsquo;m actually
migrating a database, instead of setting up a new one.</p>
<p>I&rsquo;m using <a href="https://cloudnative-pg.io/">CloudNativePG</a> to manage the Postgres
databases in my k8s cluster. More details can be found <a href="https://blog.mei-home.net/posts/k8s-migration-8-cloud-native-pg/">here</a>.
CNPG has a number of methods for seeding a new DB cluster with data, broadly
split into two approaches. The first involves another online cluster and full
replication, or the restoration of a backup.
The second is more suited to what I needed, namely a one-time import
from another cluster, using <code>initdb</code> to bootstrap the CNPG cluster and
<code>pg_dump</code>/<code>pg_restore</code> against the running source cluster. That suited
me well, because my Nomad Postgres setup is still up and
running. The docs for this method can be found <a href="https://cloudnative-pg.io/documentation/1.25/database_import/">here</a>.</p>
<p>There was just one problem: In Nomad, I&rsquo;m using <a href="https://developer.hashicorp.com/consul/docs/connect">Consul Connect Service Mesh</a>
to connect services and only allow access between specific services instead of
having open ports everywhere. This has been working pretty nicely for the past
several years. Remember, I&rsquo;m switching away from HashiCorp&rsquo;s stuff not because
their software is bad, but rather for ideological reasons.</p>
<p>But in this instance, I was stumped. To use <code>pg_dump</code>, CNPG needs access
to the other cluster. But of course no k8s service is currently inside the
Consul Mesh, so there&rsquo;s no way to access the Postgres DB. I thought: Well, I
can just open up a node port temporarily. And I failed. As in: I spent an entire
evening trying to figure this out and had to give up. For reference, the
network config for my Postgres Nomad job looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span>  <span style="color:#66d9ef">group</span> <span style="color:#e6db74">&#34;postgres&#34;</span> {
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">network</span> {
</span></span><span style="display:flex;"><span>      mode <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;bridge&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">service</span> {
</span></span><span style="display:flex;"><span>      name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;postgres&#34;</span>
</span></span><span style="display:flex;"><span>      port <span style="color:#f92672">=</span> <span style="color:#ae81ff">5432</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">connect</span> {
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">sidecar_service</span> {}
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">check</span> {
</span></span><span style="display:flex;"><span>        type     <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;script&#34;</span>
</span></span><span style="display:flex;"><span>        command  <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/usr/bin/pg_isready&#34;</span>
</span></span><span style="display:flex;"><span>        args     <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;-U&#34;, &#34;postgres&#34;</span>]
</span></span><span style="display:flex;"><span>        interval <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;30s&#34;</span>
</span></span><span style="display:flex;"><span>        timeout  <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;2s&#34;</span>
</span></span><span style="display:flex;"><span>        task     <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;postgres&#34;</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>[...]
</span></span></code></pre></div><p>With that config, Consul launches an <a href="https://www.envoyproxy.io/">Envoy</a> container
next to the Postgres container in the network namespace, by default on a random
port. Inside the network namespace, Postgres&rsquo; <code>5432</code> port is connected to Envoy.
Envoy then listens on a public port, but only lets through connections with the
right mTLS cert. Other services can then be allowed to access Postgres via
their own Envoy proxy. As best as I&rsquo;ve been able to figure out, there&rsquo;s no
way for a service outside the mesh to get through the Envoy proxy to the
Postgres port.</p>
<p>But opening another port also did not work. I&rsquo;m reasonably sure that&rsquo;s because
trying to connect Postgres&rsquo; socket to two other sockets (the temporary public one, and Envoy&rsquo;s)
is just not something that can ever work. I still tried though. Pretty hard
even.</p>
<p>But in the end I threw up my hands and had to admit that I was trying something
that&rsquo;s simply not possible. I could either have that port accessible on the node,
or via the Consul Mesh, but not both.</p>
<p>I also couldn&rsquo;t just temporarily switch off the Consul Mesh for Postgres, because
that would have impacted other workloads on my Nomad cluster. Took me quite a
while to come up with the solution: I remembered that, during my initial migration
to Nomad from my Docker Compose setup, I had set up an <a href="https://developer.hashicorp.com/consul/docs/connect/gateways/ingress-gateway">Ingress Gateway</a>
to provide access to the already migrated services from the apps still running
in Docker Compose.
That Ingress Gateway does pretty much what it says on the tin: It allows services
from outside the mesh access to services inside the mesh. It was of course not
as fine-grained as the service mesh itself. If a service could reach the gateway,
it could access all services inside the mesh that the gateway was allowed to
access.</p>
<p>Luckily, by the time I originally set up the Ingress Gateway, I had already
started to put my Homelab under version control, and I was still able to find
the old Ingress Gateway definition. I pared it down to only Postgres, and the
Nomad job ended up looking like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">job</span> <span style="color:#e6db74">&#34;ingress-gateways&#34;</span> {
</span></span><span style="display:flex;"><span>  datacenters <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;homenet&#34;</span>]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">group</span> <span style="color:#e6db74">&#34;internal&#34;</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">network</span> {
</span></span><span style="display:flex;"><span>      mode <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;bridge&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">port</span> <span style="color:#e6db74">&#34;postgres&#34;</span> {
</span></span><span style="display:flex;"><span>        static <span style="color:#f92672">=</span> <span style="color:#ae81ff">5577</span>
</span></span><span style="display:flex;"><span>        to <span style="color:#f92672">=</span> <span style="color:#ae81ff">5577</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">service</span> {
</span></span><span style="display:flex;"><span>      name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;ingress-internal&#34;</span>
</span></span><span style="display:flex;"><span>      port <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;8080&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">connect</span> {
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">gateway</span> {
</span></span><span style="display:flex;"><span>          <span style="color:#66d9ef">ingress</span> {
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">listener</span> {
</span></span><span style="display:flex;"><span>              port <span style="color:#f92672">=</span> <span style="color:#ae81ff">5577</span>
</span></span><span style="display:flex;"><span>              protocol <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;tcp&#34;</span>
</span></span><span style="display:flex;"><span>              <span style="color:#66d9ef">service</span> {
</span></span><span style="display:flex;"><span>                name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;postgres&#34;</span>
</span></span><span style="display:flex;"><span>              }
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>          }
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>This definition starts by setting up a bridged network namespace, meaning no
outside access by default. Then it creates a listener for Postgres. With that,
the Envoy proxy of the service would create a socket at port <code>5577</code> in the
namespace, connected to the Postgres service&rsquo;s Envoy proxy. The gateway would
also open a static port on <code>5577</code> on the node it is running on, which would
be connected to port <code>5577</code> inside the network namespace. And with that,
any service connecting to port <code>5577</code> on the host running the Ingress Gateway
would be connected to the Postgres database. A pretty neat and simple setup,
but it took me a while to remember.</p>
<p>I ran test connections with this command to confirm that I finally had
external access to the cluster:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>psql -U gogs -h ingress-internal.service.consul -p <span style="color:#ae81ff">5577</span> -d gitea
</span></span></code></pre></div><p>With that, I finally had access to the Postgres cluster from inside my k8s
cluster.</p>
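<p>The same check can also be run from inside the k8s cluster, for example with a
throwaway Pod. A quick sketch, reusing the CNPG Postgres image since it already
ships the client tools:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"># One-shot Pod to verify that the Ingress Gateway is reachable from inside k8s.
apiVersion: v1
kind: Pod
metadata:
  name: pg-conn-test
spec:
  restartPolicy: Never
  containers:
    - name: psql
      image: ghcr.io/cloudnative-pg/postgresql:17.2
      command: [&#34;pg_isready&#34;, &#34;-h&#34;, &#34;ingress-internal.service.consul&#34;, &#34;-p&#34;, &#34;5577&#34;]
</code></pre></div>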
<p>The CNPG Cluster manifest then looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">postgresql.cnpg.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">gitea-pg-cluster</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">gitea</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">instances</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">imageName</span>: <span style="color:#e6db74">&#34;ghcr.io/cloudnative-pg/postgresql:17.2&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">bootstrap</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">initdb</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">database</span>: <span style="color:#ae81ff">gitea</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">gitea</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">import</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">type</span>: <span style="color:#ae81ff">microservice</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">databases</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#ae81ff">gitea</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">source</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">externalCluster</span>: <span style="color:#ae81ff">nomad-pg</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">200M</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">150m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">postgresql</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">max_connections</span>: <span style="color:#e6db74">&#34;200&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">shared_buffers</span>: <span style="color:#e6db74">&#34;50MB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">effective_cache_size</span>: <span style="color:#e6db74">&#34;150MB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">maintenance_work_mem</span>: <span style="color:#e6db74">&#34;12800kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">checkpoint_completion_target</span>: <span style="color:#e6db74">&#34;0.9&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wal_buffers</span>: <span style="color:#e6db74">&#34;1536kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">default_statistics_target</span>: <span style="color:#e6db74">&#34;100&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">random_page_cost</span>: <span style="color:#e6db74">&#34;1.1&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">effective_io_concurrency</span>: <span style="color:#e6db74">&#34;300&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">work_mem</span>: <span style="color:#e6db74">&#34;128kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">huge_pages</span>: <span style="color:#e6db74">&#34;off&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">max_wal_size</span>: <span style="color:#e6db74">&#34;128MB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wal_keep_size</span>: <span style="color:#e6db74">&#34;512MB&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storage</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">size</span>: <span style="color:#ae81ff">1.</span><span style="color:#ae81ff">5G</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storageClass</span>: <span style="color:#ae81ff">rbd-fast</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">backup</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">barmanObjectStore</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">endpointURL</span>: <span style="color:#ae81ff">http://rook-ceph-rgw-rgw-bulk.rook-cluster.svc:80</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">destinationPath</span>: <span style="color:#e6db74">&#34;s3://backup-cnpg/&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">s3Credentials</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">accessKeyId</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-ceph-object-user-rgw-bulk-cnpg-backup-gitea</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">AccessKey</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretAccessKey</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-ceph-object-user-rgw-bulk-cnpg-backup-gitea</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">SecretKey</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">retentionPolicy</span>: <span style="color:#e6db74">&#34;30d&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">externalClusters</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nomad-pg</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">connectionParameters</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">host</span>: <span style="color:#ae81ff">ingress-internal.service.consul</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">port</span>: <span style="color:#e6db74">&#34;5577&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">user</span>: <span style="color:#ae81ff">gogs</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">dbname</span>: <span style="color:#ae81ff">gitea</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">password</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">olddb-secret</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">pw</span>
</span></span></code></pre></div><p>I&rsquo;ve omitted some standard things like the backup bucket setup here. The important
parts for the migration are the <code>spec.bootstrap.initdb.import</code> and <code>spec.externalClusters</code>
keys.</p>
<p>Let&rsquo;s start with the <code>externalClusters</code> definition. It&rsquo;s documented <a href="https://cloudnative-pg.io/documentation/1.25/bootstrap/#the-externalclusters-section">here</a> and describes the connection to another cluster. This doesn&rsquo;t need to be
a CNPG cluster. One problem was that seemingly, no documentation exists of the
<code>externalClusters.connectionParameters.port</code> option. I spend quite a while
trying to figure out whether the port was supposed to go on the end of the <code>host</code>
parameter, or whether it was a separate key.
I was finally saved by the fact that CNPG is open source, and so I could look
at the code - specifically a Yaml file from their test setup <a href="https://github.com/cloudnative-pg/cloudnative-pg/blob/62d48282bdd4c640d1af104b9cf637087148075e/tests/e2e/fixtures/replica_mode_cluster/cluster-replica-tls.yaml.template#L22">here</a>.
The password for the connection was coming from a Secret with the old database
credentials. As you can see in the <code>user</code> parameter, the Gitea database was
originally created during the Gogs phase of my Git hosting. &#x1f601;</p>
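<p>The Secret itself is nothing special; a sketch of what it looks like, with the
<code>pw</code> key matching the manifest above and a placeholder value:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml">apiVersion: v1
kind: Secret
metadata:
  name: olddb-secret
stringData:
  # Password of the gogs user in the old Nomad-hosted database.
  pw: &#34;the-old-db-password&#34;
</code></pre></div>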
<p>The second part of the config is in <code>spec.bootstrap.initdb.import</code>, which tells
CNPG what it should import from the external cluster.
The first choice to make here is the <code>type</code> of the import. This describes the
destination cluster, meaning the new CNPG cluster. The choices are <code>microservice</code>,
meaning that the cluster serves only one app with one user, or <code>monolith</code>,
meaning a cluster hosting the databases of multiple services.
Besides that, I just needed to provide the name of the database in the source
cluster and the name of said cluster in the <code>externalClusters</code> list.</p>
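<p>For contrast, a <code>monolith</code>-type import can pull several databases (and,
optionally, roles) into a single cluster. A rough sketch, with a second,
hypothetical database added:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"># Hypothetical monolith-type import of two databases plus the gogs role.
bootstrap:
  initdb:
    import:
      type: monolith
      databases:
        - gitea
        - some-other-db
      roles:
        - gogs
      source:
        externalCluster: nomad-pg
</code></pre></div>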
<p>This import, as configured above, worked immediately. I was very positively
surprised. All data was imported properly, and CNPG automatically created the
customary Secret with the connection details and credentials for accessing
the cluster.
After the initial import, I was able to remove the <code>spec.bootstrap.initdb.import</code>
and <code>externalClusters</code> keys completely from the manifest without any error.</p>
<h2 id="helm-chart">Helm Chart</h2>
<p>For the Gitea deployment itself, I made use of the <a href="https://gitea.com/gitea/helm-chart">official Helm chart</a>.
It is one of the better ones I&rsquo;ve encountered since starting the migration,
providing the ability to set config options in the <code>values.yaml</code> instead of
having to maintain a separate <code>app.ini</code> file. What I value extremely highly
is that they provide the ability to add environment variables via <code>env.ValueFrom</code>,
so I can directly use the automatically created Secrets from CNPG for the DB
and Rook Ceph for the S3 bucket. This saves me the roundabout setup I had to do
for other charts, where I had to use external-secrets to template the automatic
Secrets into new Secrets with a different format to conform to the chart&rsquo;s
expectations.</p>
<p>One downside at the time of writing is that the chart does not ship the newest
Gitea version, 1.23.1, by default. But I just changed the image tag, and chart
version 10.6.0 worked without issue.
Going by <a href="https://gitea.com/gitea/helm-chart/issues/783">this issue</a>, the
delay is simply due to some internal refactoring of the chart they want to finish
before the next release.</p>
<p>I will split the exploration of my <code>values.yaml</code> file into two parts, one with
the Gitea config under the <code>gitea</code> key and one for everything else.</p>
<h3 id="everything-besides-the-gitea-config">Everything besides the Gitea config</h3>
<p>Let&rsquo;s start with the &ldquo;everything else&rdquo; part:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">replicaCount</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">image</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">rootsless</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tag</span>: <span style="color:#e6db74">&#34;1.23.1&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">strategy</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">Recreate</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">containerSecurityContext</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">capabilities</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">add</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">SYS_CHROOT</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">service</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ssh</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#ae81ff">LoadBalancer</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">port</span>: <span style="color:#ae81ff">2222</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">externalTrafficPolicy</span>: <span style="color:#ae81ff">Local</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">external-dns.alpha.kubernetes.io/hostname</span>: <span style="color:#ae81ff">git.example.com</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/public-service</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">traefik.ingress.kubernetes.io/router.entrypoints</span>: <span style="color:#ae81ff">very-safe-entrypoint</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">hosts</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">host</span>: <span style="color:#ae81ff">gitea.example.com</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">paths</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">pathType</span>: <span style="color:#ae81ff">Prefix</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tls</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">hosts</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">gitea.example.com</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">limits</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">1500Mi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">persistence</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">create</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">mount</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">claimName</span>: <span style="color:#ae81ff">gitea-data-volume</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">signing</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">actions</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">redis-cluster</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">postgresql-ha</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">postgresql</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><p>As I noted above, I had to hardcode the Gitea tag for now, because the newest
chart version still defaults to 1.22.3. I also opted for the rootless image,
which I did not use in the Nomad job. It is just a little bit nicer than
having a root-capable image, although even with the root image, Gitea drops
root privileges at startup, so it doesn&rsquo;t actually run as root either way.</p>
<p>I also had to hardcode the update strategy to <code>Recreate</code>. I&rsquo;m not sure why
it&rsquo;s set to rolling updates by default. That doesn&rsquo;t really work here:
Gitea runs in a Deployment with a PVC mounted, and during a rolling update the
new Pod is started while the old one is still running, so it can never come up,
because the RWO volume can&rsquo;t be mounted in two Pods at the same time.</p>
<p>Then comes an important one, the <code>SYS_CHROOT</code> capability. The Helm chart&rsquo;s
extensive <a href="https://gitea.com/gitea/helm-chart/src/tag/v10.6.0/README.md">README.md</a> documents
it as required when using <a href="https://cri-o.io/">cri-o</a> as the container runtime.</p>
<p>External access is split between two different subdomains, one for Gitea&rsquo;s web
frontend going through my Traefik ingress and one for git access with SSH. I
like setting my LoadBalancer services, <a href="https://blog.mei-home.net/posts/k8s-migration-2a-cilium-bgp/">provided by Cilium</a>,
to the Local <code>externalTrafficPolicy</code>. This ensures that the client IPs those
services see are the actual client IPs, and not the IP of the node which
received the request and forwarded it to the service it was intended for.
The <code>homelab/public-service</code> label is simply a sign for Cilium that it should
handle the service.</p>
<p>The ingress config has one important point, the <code>tls</code> key. I initially did not
set that key, because I&rsquo;ve never set it before - my Traefik automatically
uses HTTPS and I&rsquo;m using a wildcard cert. But Gitea needs to generate publicly
addressable URLs for some things, e.g. when providing HTTPS clone URLs or
callback URLs in webhooks, say for Woodpecker.
The domain itself is fine, but the chart determines the protocol like this:</p>
<pre tabindex="0"><code>{{- define &#34;gitea.public_protocol&#34; -}}
{{- if and .Values.ingress.enabled (gt (len .Values.ingress.tls) 0) -}}
https
{{- else -}}
{{ .Values.gitea.config.server.PROTOCOL }}
{{- end -}}
{{- end -}}
</code></pre><p>In there, <code>https</code> is only set as the protocol if the <code>ingress.tls</code> list has at
least one entry. And setting the <code>server.PROTOCOL</code> config comes with its own
problems, so I decided to just add the <code>tls.hosts</code> setting, even if it means
that I have to repeat my Gitea domain a number of times in the <code>config.yaml</code>.</p>
<p>For the <code>persistence</code> setting I had to go with a pre-created volume, because
I needed to make the migrated Gitea data available. One thing to note here is
that you should delete any <code>app.ini</code> file your previous setup might have left
on the disk. The init container, which assembles the <code>app.ini</code> from the
values and env variables, doesn&rsquo;t handle it well when there&rsquo;s an
existing <code>app.ini</code> it didn&rsquo;t create itself.</p>
<p>I also disabled a number of features I didn&rsquo;t need, like signing or actions
or the Redis and Postgres instances the Helm chart can deploy, because I&rsquo;ve
already got my own deployments.</p>
<h3 id="the-gitea-config">The Gitea config</h3>
<p>Here&rsquo;s the full <code>gitea:</code> section of the <code>config.yaml</code>, just for reference. I
will post the relevant subsections as I go over them:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">gitea</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">admin</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">existingSecret</span>: <span style="color:#66d9ef">null</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">username</span>: <span style="color:#66d9ef">null</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">password</span>: <span style="color:#66d9ef">null</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">metrics</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">oauth</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;Keycloak&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">provider</span>: <span style="color:#e6db74">&#34;openidConnect&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">existingSecret</span>: <span style="color:#ae81ff">oidc-credentials</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">autoDiscoverUrl</span>: <span style="color:#e6db74">&#34;https://key.example.com/realms/homelab/.well-known/openid-configuration&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">config</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">APP_NAME</span>: <span style="color:#e6db74">&#34;My Gitea&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">RUN_MODE</span>: <span style="color:#e6db74">&#34;prod&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">server</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SSH_SERVER_HOST_KEYS</span>: <span style="color:#e6db74">&#34;ssh/gitea.ed25519&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">APP_DATA_PATH</span>: <span style="color:#e6db74">&#34;/data/gitea_data&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SSH_DOMAIN</span>: <span style="color:#e6db74">&#34;git.example.com&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SSH_PORT</span>: <span style="color:#ae81ff">2222</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">database</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DB_TYPE</span>: <span style="color:#e6db74">&#34;postgres&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">LOG_SQL</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">oauth2</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">service</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DISABLE_REGISTRATION</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">REQUIRE_SIGNIN_VIEW</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DEFAULT_KEEP_EMAIL_PRIVATE</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DEFAULT_ALLOW_CREATE_ORGANIZATION</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DEFAULT_ORG_VISIBILITY</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DEFAULT_ORG_MEMBER_VISIBLE</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DEFAULT_ENABLE_TIMETRACKING</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SHOW_REGISTRATION_BUTTON</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">repository</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ROOT</span>: <span style="color:#e6db74">&#34;/data/git-repos&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SCRIPT_TYPE</span>: <span style="color:#ae81ff">bash</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DEFAULT_PRIVATE</span>: <span style="color:#ae81ff">private</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DEFAULT_BRANCH</span>: <span style="color:#ae81ff">main</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">ui</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DEFAULT_THEME</span>: <span style="color:#ae81ff">gitea-auto</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">queue</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">TYPE</span>: <span style="color:#ae81ff">redis</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">CONN_STR</span>: <span style="color:#e6db74">&#34;addr=redis.redis.svc.cluster.local:6379&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">WORKERS</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">BOOST_WORKERS</span>: <span style="color:#ae81ff">5</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">admin</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DEFAULT_EMAIL_NOTIFICATIONS</span>: <span style="color:#ae81ff">disabled</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">openid</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLE_OPENID_SIGNIN</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">webhook</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ALLOWED_HOST_LIST</span>: <span style="color:#ae81ff">private</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mailer</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SUBJECT_PREFIX</span>: <span style="color:#e6db74">&#34;[Gitea]&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SMTP_ADDR</span>: <span style="color:#ae81ff">mail.example.com</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SMTP_PORT</span>: <span style="color:#e6db74">&#34;465&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">FROM</span>: <span style="color:#e6db74">&#34;gitea@example.com&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">USER</span>: <span style="color:#e6db74">&#34;gitea@example.com&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cache</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ADAPTER</span>: <span style="color:#e6db74">&#34;redis&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">INTERVAL</span>: <span style="color:#ae81ff">60</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">HOST</span>: <span style="color:#e6db74">&#34;network=tcp,addr=redis.redis.svc.cluster.local:6379,db=0,pool_size=100,idle_timeout=180&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ITEM_TTL</span>: <span style="color:#ae81ff">7d</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">session</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">PROVIDER</span>: <span style="color:#ae81ff">redis</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">PROVIDER_CONFIG</span>: <span style="color:#ae81ff">network=tcp,addr=redis.redis.svc.cluster.local:6379,db=0,pool_size=100,idle_timeout=180</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">time</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DEFAULT_UI_LOCATION</span>: <span style="color:#e6db74">&#34;Europe/Berlin&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cron</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">RUN_AT_START</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cron.archive_cleanup</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">RUN_AT_START</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SCHEDULE</span>: <span style="color:#e6db74">&#34;@every 24h&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cron.update_mirrors</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">RUN_AT_START</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cron.repo_health_check</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">RUN_AT_START</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SCHEDULE</span>: <span style="color:#e6db74">&#34;0 30 5 * * *&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">TIMEOUT</span>: <span style="color:#e6db74">&#34;5m&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cron.check_repo_stats</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">RUN_AT_START</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SCHEDULE</span>: <span style="color:#e6db74">&#34;0 0 5 * * *&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cron.update_migration_poster_id</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">RUN_AT_START</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SCHEDULE</span>: <span style="color:#e6db74">&#34;@every 24h&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cron.sync_external_users</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">RUN_AT_START</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SCHEDULE</span>: <span style="color:#e6db74">&#34;@every 24h&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">UPDATE_EXISTING</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cron.deleted_branches_cleanup</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">RUN_AT_START</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SCHEDULE</span>: <span style="color:#e6db74">&#34;@every 24h&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">migrations</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ALLOW_LOCALNETWORKS</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">packages</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storage</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">STORAGE_TYPE</span>: <span style="color:#ae81ff">minio</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">MINIO_ENDPOINT</span>: <span style="color:#ae81ff">rook-ceph-rgw-rgw-bulk.rook-cluster.svc:80</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">MINIO_LOCATION</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">MINIO_USE_SSL</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">additionalConfigFromEnvs</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">GITEA__DATABASE__HOST</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">gitea-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">host</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">GITEA__DATABASE__NAME</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">gitea-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">dbname</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">GITEA__DATABASE__USER</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">gitea-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">user</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">GITEA__DATABASE__PASSWD</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">gitea-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">password</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">GITEA__SECURITY__SECRET_KEY</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">secret-key</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">key</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">GITEA__OAUTH2__JWT_SECRET</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">jwt-secret</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">jwt</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">GITEA__MAILER__PASSWD</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">mail-pw</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">pw</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">GITEA__STORAGE__MINIO_BUCKET</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">configMapKeyRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">gitea-bucket</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">BUCKET_NAME</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">GITEA__STORAGE__MINIO_ACCESS_KEY_ID</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">gitea-bucket</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">AWS_ACCESS_KEY_ID</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">GITEA__STORAGE__MINIO_SECRET_ACCESS_KEY</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">gitea-bucket</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">AWS_SECRET_ACCESS_KEY</span>
</span></span></code></pre></div><p>Let&rsquo;s start with the <code>gitea.admin</code> config:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">gitea</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">admin</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">existingSecret</span>: <span style="color:#66d9ef">null</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">username</span>: <span style="color:#66d9ef">null</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">password</span>: <span style="color:#66d9ef">null</span>
</span></span></code></pre></div><p>I&rsquo;ve already got an admin account, so I didn&rsquo;t want the Helm chart to create a
new one. I thought I could achieve that by just setting <code>admin: {}</code>, but that of course
doesn&rsquo;t work: the Helm chart created an admin user with the chart&rsquo;s default
<code>gitea.admin.password</code>. I then figured out that setting all values to <code>null</code>
does work. It&rsquo;s important to note that the chart doesn&rsquo;t then remove the
already created admin user again. It needs to be deleted manually via the UI.</p>
<p>The <code>gitea.oauth</code> config is also worth a paragraph. First, it&rsquo;s important to note
that this is the config for Gitea as an OAuth2 <em>client</em>. The config for Gitea
as an identity provider has to be done elsewhere.
I&rsquo;m using <a href="https://www.keycloak.org/">Keycloak</a> as my identity provider in the
Homelab. For more details, see <a href="https://blog.mei-home.net/posts/sso/">this post</a>.
The issue is that Gitea&rsquo;s OAuth2 client config can only be done in the UI or
via the CLI, not via the config file. And I had already taken my Nomad instance
down at this point. I could get the client ID and secret from Keycloak, but not,
for example, the name under which the entry was saved in Gitea&rsquo;s database. It was also
pretty unclear which options should be set under the <code>gitea.oauth</code> key. I finally
ended up looking into the <a href="https://gitea.com/gitea/helm-chart/src/tag/v10.6.0/templates/gitea/init.yaml">init container script</a>,
which is a bash script using the Gitea CLI to create the OAuth2 entry:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>    <span style="color:#66d9ef">function</span> configure_oauth<span style="color:#f92672">()</span> <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">{{</span>- <span style="color:#66d9ef">if</span> .Values.gitea.oauth <span style="color:#f92672">}}</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">{{</span>- range $idx, $value :<span style="color:#f92672">=</span> .Values.gitea.oauth <span style="color:#f92672">}}</span>
</span></span><span style="display:flex;"><span>      local OAUTH_NAME<span style="color:#f92672">={{</span> <span style="color:#f92672">(</span>printf <span style="color:#e6db74">&#34;%s&#34;</span> $value.name<span style="color:#f92672">)</span> | squote <span style="color:#f92672">}}</span>
</span></span><span style="display:flex;"><span>      local full_auth_list<span style="color:#f92672">=</span><span style="color:#66d9ef">$(</span>gitea admin auth list --vertical-bars<span style="color:#66d9ef">)</span>
</span></span><span style="display:flex;"><span>      local actual_auth_table<span style="color:#f92672">=</span><span style="color:#e6db74">&#39;&#39;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#75715e"># We might have distorted output due to warning logs, so we have to detect the actual user table by its headline and trim output above that line</span>
</span></span><span style="display:flex;"><span>      local regex<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;(.*)(ID\s+\|Name\s+\|Type\s+\|Enabled.*)&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">if</span> <span style="color:#f92672">[[</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>full_auth_list<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">=</span>~ $regex <span style="color:#f92672">]]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>        actual_auth_table<span style="color:#f92672">=</span><span style="color:#66d9ef">$(</span>echo <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>BASH_REMATCH[2]<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> | tail -n+2<span style="color:#66d9ef">)</span> <span style="color:#75715e"># tail&#39;ing to drop the table headline</span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">else</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">[</span>...<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      local AUTH_ID<span style="color:#f92672">=</span><span style="color:#66d9ef">$(</span>echo <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>actual_auth_table<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> | grep -E <span style="color:#e6db74">&#34;\|</span><span style="color:#e6db74">${</span>OAUTH_NAME<span style="color:#e6db74">}</span><span style="color:#e6db74">\s+\|&#34;</span> | grep -iE <span style="color:#e6db74">&#39;\|OAuth2\s+\|&#39;</span> | awk -F <span style="color:#e6db74">&#34; &#34;</span>  <span style="color:#e6db74">&#34;{print \$1}&#34;</span><span style="color:#66d9ef">)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">if</span> <span style="color:#f92672">[[</span> -z <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>AUTH_ID<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>        echo <span style="color:#e6db74">&#34;No oauth configuration found with name &#39;</span><span style="color:#e6db74">${</span>OAUTH_NAME<span style="color:#e6db74">}</span><span style="color:#e6db74">&#39;. Installing it now...&#34;</span>
</span></span><span style="display:flex;"><span>        gitea admin auth add-oauth <span style="color:#f92672">{{</span>- include <span style="color:#e6db74">&#34;gitea.oauth_settings&#34;</span> <span style="color:#f92672">(</span>list $idx $value<span style="color:#f92672">)</span> | indent <span style="color:#ae81ff">1</span> <span style="color:#f92672">}}</span>
</span></span><span style="display:flex;"><span>        echo <span style="color:#e6db74">&#39;...installed.&#39;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">else</span>
</span></span><span style="display:flex;"><span>        echo <span style="color:#e6db74">&#34;Existing oauth configuration with name &#39;</span><span style="color:#e6db74">${</span>OAUTH_NAME<span style="color:#e6db74">}</span><span style="color:#e6db74">&#39;: &#39;</span><span style="color:#e6db74">${</span>AUTH_ID<span style="color:#e6db74">}</span><span style="color:#e6db74">&#39;. Running update to sync settings...&#34;</span>
</span></span><span style="display:flex;"><span>        gitea admin auth update-oauth --id <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>AUTH_ID<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">{{</span>- include <span style="color:#e6db74">&#34;gitea.oauth_settings&#34;</span> <span style="color:#f92672">(</span>list $idx $value<span style="color:#f92672">)</span> | indent <span style="color:#ae81ff">1</span> <span style="color:#f92672">}}</span>
</span></span><span style="display:flex;"><span>        echo <span style="color:#e6db74">&#39;...sync settings done.&#39;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">{{</span>- end <span style="color:#f92672">}}</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">{{</span>- <span style="color:#66d9ef">else</span> <span style="color:#f92672">}}</span>
</span></span><span style="display:flex;"><span>        echo <span style="color:#e6db74">&#39;no oauth configuration... skipping.&#39;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">{{</span>- end <span style="color:#f92672">}}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">}</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    configure_oauth
</span></span></code></pre></div><p>I&rsquo;ve removed some unimportant bits for the sake of brevity (heh, brevity &#x1f602;).
What we can see here is that the entries in the <code>gitea.oauth</code> section are converted
1:1 into CLI flags and their parameters.</p>
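<p>I haven&rsquo;t fished the final command out of a live init container, but going by the
template, my values from above should render to roughly the following call, with the
kebab-cased keys as flags and the client ID and secret injected via the
<code>GITEA_OAUTH_*</code> env variables:</p>
<pre tabindex="0"><code>gitea admin auth add-oauth \
  --auto-discover-url &#34;https://key.example.com/realms/homelab/.well-known/openid-configuration&#34; \
  --key &#34;${GITEA_OAUTH_KEY_0}&#34; \
  --name &#34;Keycloak&#34; \
  --provider &#34;openidConnect&#34; \
  --secret &#34;${GITEA_OAUTH_SECRET_0}&#34;
</code></pre>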
<p>To find the right options for my Keycloak setup, I ended up looking
into the database:</p>
<pre tabindex="0"><code>\c gitea
SELECT * FROM login_source;

id | type |   name   | is_sync_enabled |                                                                                                                                                                                               cfg                                                                                                                                                                                               | created_unix | updated_unix | is_active
----+------+----------+-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------+--------------+-----------
  1 |    6 | Keycloak | f               | {&#34;Provider&#34;:&#34;openidConnect&#34;,&#34;ClientID&#34;:&#34;bar&#34;,&#34;ClientSecret&#34;:&#34;foo&#34;,&#34;OpenIDConnectAutoDiscoveryURL&#34;:&#34;https://key.example.com/realms/homelab/.well-known/openid-configuration&#34;,&#34;CustomURLMapping&#34;:null,&#34;IconURL&#34;:&#34;&#34;,&#34;Scopes&#34;:null,&#34;RequiredClaimName&#34;:&#34;&#34;,&#34;RequiredClaimValue&#34;:&#34;&#34;,&#34;GroupClaimName&#34;:&#34;&#34;,&#34;AdminGroup&#34;:&#34;&#34;,&#34;RestrictedGroup&#34;:&#34;&#34;,&#34;SkipLocalTwoFA&#34;:true} |   1678573526 |   1678573526 | t
(1 row)
</code></pre><p>But this still left the question of how the <code>gitea.oauth.existingSecret</code> should
be formatted. Which keys did the chart expect the Secret to have?
I wasn&rsquo;t able to find any info, so I first looked for the place where
the <code>gitea.oauth_settings</code> template from the init script above is defined, which led
me to the chart&rsquo;s <a href="https://gitea.com/gitea/helm-chart/src/tag/v10.6.0/templates/_helpers.tpl">helpers</a>
again:</p>
<pre tabindex="0"><code>{{- define &#34;gitea.oauth_settings&#34; -}}
{{- $idx := index . 0 }}
{{- $values := index . 1 }}

{{- if not (hasKey $values &#34;key&#34;) -}}
{{- $_ := set $values &#34;key&#34; (printf &#34;${GITEA_OAUTH_KEY_%d}&#34; $idx) -}}
{{- end -}}

{{- if not (hasKey $values &#34;secret&#34;) -}}
{{- $_ := set $values &#34;secret&#34; (printf &#34;${GITEA_OAUTH_SECRET_%d}&#34; $idx) -}}
{{- end -}}

{{- range $key, $val := $values -}}
{{- if ne $key &#34;existingSecret&#34; -}}
{{- printf &#34;--%s %s &#34; ($key | kebabcase) ($val | quote) -}}
{{- end -}}
{{- end -}}
{{- end -}}
</code></pre><p>Here, the <code>key</code> and <code>secret</code> values, if not defined in the values, are set to
the <code>GITEA_OAUTH_KEY_$ID</code> and <code>GITEA_OAUTH_SECRET_$ID</code> env variables. Looking for those variables then led me
to the <a href="https://gitea.com/gitea/helm-chart/src/branch/main/templates/gitea/deployment.yaml">Deployment template</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">GITEA_OAUTH_KEY_{{ $idx }}</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">key</span>:  <span style="color:#ae81ff">key</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: {{ <span style="color:#ae81ff">$value.existingSecret }}</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">GITEA_OAUTH_SECRET_{{ $idx }}</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">key</span>:  <span style="color:#ae81ff">secret</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: {{ <span style="color:#ae81ff">$value.existingSecret }}</span>
</span></span></code></pre></div><p>And here I finally had my answer: the Secret needs a key named <code>key</code> holding
the OAuth2 client ID and a key named <code>secret</code> holding the client secret.</p>
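<p>So a minimal sketch of that Secret, with placeholders for the actual Keycloak client
credentials, looks like this:</p>
<pre tabindex="0"><code>apiVersion: v1
kind: Secret
metadata:
  name: oidc-credentials
stringData:
  key: my-client-id
  secret: my-client-secret
</code></pre><p>Armed with that info, I could finally define the OAuth2 options:</p>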
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">gitea</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">oauth</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;Keycloak&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">provider</span>: <span style="color:#e6db74">&#34;openidConnect&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">existingSecret</span>: <span style="color:#ae81ff">oidc-credentials</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">autoDiscoverUrl</span>: <span style="color:#e6db74">&#34;https://key.example.com/realms/homelab/.well-known/openid-configuration&#34;</span>
</span></span></code></pre></div><p>One thing which annoyed me is in Gitea&rsquo;s S3 config:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">gitea</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">config</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storage</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">STORAGE_TYPE</span>: <span style="color:#ae81ff">minio</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">MINIO_ENDPOINT</span>: <span style="color:#ae81ff">rook-ceph-rgw-rgw-bulk.rook-cluster.svc:80</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">MINIO_LOCATION</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">MINIO_USE_SSL</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><p>The <code>MINIO_ENDPOINT</code> needs to have the host and port in one value. But the
ConfigMap created by Rook for a new bucket contains them only in separate keys,
meaning I had to hardcode the value in the <code>values.yaml</code> instead of taking it
from the ConfigMap.</p>
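<p>From memory, the bucket ConfigMap looks roughly like this (the generated bucket name
is made up), which illustrates the problem:</p>
<pre tabindex="0"><code>apiVersion: v1
kind: ConfigMap
metadata:
  name: gitea-bucket
data:
  BUCKET_HOST: rook-ceph-rgw-rgw-bulk.rook-cluster.svc
  BUCKET_PORT: &#34;80&#34;
  BUCKET_NAME: ceph-bkt-gitea-12345
</code></pre>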
<p>But at least I could still use the Secret Rook creates to get the S3 credentials:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">gitea</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">additionalConfigFromEnvs</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">GITEA__STORAGE__MINIO_ACCESS_KEY_ID</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">gitea-bucket</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">AWS_ACCESS_KEY_ID</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">GITEA__STORAGE__MINIO_SECRET_ACCESS_KEY</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">gitea-bucket</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">AWS_SECRET_ACCESS_KEY</span>
</span></span></code></pre></div><p>An option like this is what more Helm charts should have: the ability to use
the <code>valueFrom</code> form of defining env variables. With this, I can easily use
autogenerated Secrets and ConfigMaps without having to jump through hoops.
For anyone wondering about the variable names: Gitea&rsquo;s environment-to-ini convention maps
<code>GITEA__SECTION__KEY</code> to the key <code>KEY</code> in the <code>[section]</code> block of the <code>app.ini</code>.</p>
<p>The next stumbling block was the Redis config:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">gitea</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">config</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cache</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ADAPTER</span>: <span style="color:#e6db74">&#34;redis&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">INTERVAL</span>: <span style="color:#ae81ff">60</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">HOST</span>: <span style="color:#e6db74">&#34;network=tcp,addr=redis.redis.svc.cluster.local:6379,db=0,pool_size=100,idle_timeout=180&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ITEM_TTL</span>: <span style="color:#ae81ff">7d</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">session</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">PROVIDER</span>: <span style="color:#ae81ff">redis</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">PROVIDER_CONFIG</span>: <span style="color:#ae81ff">network=tcp,addr=redis.redis.svc.cluster.local:6379,db=0,pool_size=100,idle_timeout=180</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">queue</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">TYPE</span>: <span style="color:#ae81ff">redis</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">CONN_STR</span>: <span style="color:#e6db74">&#34;addr=redis.redis.svc.cluster.local:6379&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">WORKERS</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">BOOST_WORKERS</span>: <span style="color:#ae81ff">5</span>
</span></span></code></pre></div><p>Here I wasn&rsquo;t aware that the connection string has to have a certain format
and isn&rsquo;t just <code>host:port</code>. It took me a while to figure out why I wasn&rsquo;t able to
establish a connection to Redis.</p>
<p>And finally, another word on YAML: Check your indentation! &#x1f605;
I had to make sure that the entire network could reach the SSH service so I
could actually use it for git operations. So I added <code>fromEntities:\n - world</code> to the
network policy:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumNetworkPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;gitea-access&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">endpointSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;app.kubernetes.io/name&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;gitea&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEndpoints</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">homelab/ingress</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">io.kubernetes.pod.namespace</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">fromEntities</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#ae81ff">world</span>
</span></span></code></pre></div><p>And when I still could not connect, I checked with Cilium&rsquo;s monitoring:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl -n kube-system exec -ti cilium-vh5jj -- cilium monitor --type drop
</span></span><span style="display:flex;"><span>xx drop <span style="color:#f92672">(</span>Policy denied<span style="color:#f92672">)</span> flow 0x0 to endpoint 896, ifindex 5, file bpf_lxc.c:2067, , identity world-&gt;63410: 300.300.300.1:59774 -&gt; 10.8.5.79:2222 tcp SYN
</span></span><span style="display:flex;"><span>xx drop <span style="color:#f92672">(</span>Policy denied<span style="color:#f92672">)</span> flow 0x0 to endpoint 896, ifindex 5, file bpf_lxc.c:2067, , identity world-&gt;63410: 300.300.300.1:59774 -&gt; 10.8.5.79:2222 tcp SYN
</span></span><span style="display:flex;"><span>xx drop <span style="color:#f92672">(</span>Policy denied<span style="color:#f92672">)</span> flow 0x0 to endpoint 896, ifindex 5, file bpf_lxc.c:2067, , identity world-&gt;63410: 300.300.300.1:59774 -&gt; 10.8.5.79:2222 tcp SYN
</span></span><span style="display:flex;"><span>xx drop <span style="color:#f92672">(</span>Policy denied<span style="color:#f92672">)</span> flow 0x0 to endpoint 896, ifindex 5, file bpf_lxc.c:2067, , identity world-&gt;63410: 300.300.300.1:59774 -&gt; 10.8.5.79:2222 tcp SYN
</span></span><span style="display:flex;"><span>xx drop <span style="color:#f92672">(</span>Policy denied<span style="color:#f92672">)</span> flow 0x0 to endpoint 896, ifindex 5, file bpf_lxc.c:2067, , identity world-&gt;63410: 300.300.300.1:59774 -&gt; 10.8.5.79:2222 tcp SYN
</span></span><span style="display:flex;"><span>xx drop <span style="color:#f92672">(</span>Policy denied<span style="color:#f92672">)</span> flow 0x0 to endpoint 896, ifindex 5, file bpf_lxc.c:2067, , identity world-&gt;63410: 300.300.300.1:59774 -&gt; 10.8.5.79:2222 tcp SYN
</span></span><span style="display:flex;"><span>xx drop <span style="color:#f92672">(</span>Policy denied<span style="color:#f92672">)</span> flow 0x0 to endpoint 896, ifindex 5, file bpf_lxc.c:2067, , identity world-&gt;63410: 300.300.300.1:59774 -&gt; 10.8.5.79:2222 tcp SYN
</span></span><span style="display:flex;"><span>xx drop <span style="color:#f92672">(</span>Policy denied<span style="color:#f92672">)</span> flow 0x0 to endpoint 896, ifindex 5, file bpf_lxc.c:2067, , identity world-&gt;63410: 300.300.300.1:59774 -&gt; 10.8.5.79:2222 tcp SYN
</span></span></code></pre></div><p>Fast forward through an hour of reading Cilium&rsquo;s network policy docs,
and I took another look at the policy - and realized that I had screwed up the
indentation. &#x1f926;
It should, of course, look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumNetworkPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;gitea-access&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">endpointSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;app.kubernetes.io/name&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;gitea&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEndpoints</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">homelab/ingress</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">io.kubernetes.pod.namespace</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEntities</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">world</span>
</span></span></code></pre></div><p>The difference: the <code>fromEntities</code> entry belongs in the <code>ingress:</code>
list, not in the <code>fromEndpoints:</code> list.
After that, everything was up and running. Woodpecker, my CI, did not need any
additional config to access Gitea; it worked out of the box, likely because
it uses HTTPS for Git access and goes through the standard Gitea URL. I
don&rsquo;t think I can change that to make it use the internal Service instead of
going through the ingress: Woodpecker also uses Gitea for auth, and I doubt
it would handle having two different URLs for Gitea very
well. But that still ended up on the rickety pile of Homelab tasks to look at
at some point.</p>
<p>Overall, it was a good migration, and it allowed me to figure out my DB migration
strategy on a service I could do without for a couple of days.
I also have to congratulate the Gitea community on their work on the Helm
chart. It was definitely one of the better ones I&rsquo;ve used.</p>
<p>And that&rsquo;s it for today. I can&rsquo;t say what&rsquo;s going to be next on the migration
list, as I haven&rsquo;t decided yet. I first thought to migrate my IoT services,
Mosquitto, zigbee2mqtt and friends, but I&rsquo;d also like to tackle some of the
bigger items, like Nextcloud. On the other hand, I&rsquo;m really not looking forward
to touching my Nextcloud deployment. It has been working so nicely.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 15: Migrating my CI</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-15-ci/</link>
      <pubDate>Sun, 26 Jan 2025 22:50:33 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-15-ci/</guid>
      <description>Migrating my Drone CI install on Nomad to a Woodpecker CI on Kubernetes</description>
      <content:encoded><![CDATA[<p>Wherein I migrate my Drone CI setup on Nomad to a Woodpecker CI setup on k8s.</p>
<p>This is part 15 of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>Finally, another migration blog post! I&rsquo;m still rather happy that I&rsquo;m getting
into it again.
For several years now, I&rsquo;ve been running a CI setup to automate a number of
tasks related to some personal projects. CI stands for <a href="https://en.wikipedia.org/wiki/Continuous_integration">Continuous Integration</a>,
and Wikipedia says this about it:</p>
<blockquote>
<p>Continuous integration (CI) is the practice of integrating source code changes frequently and ensuring that the integrated codebase is in a workable state.</p></blockquote>
<p>I&rsquo;m pretty intimately familiar with the concept on a rather large scale, as I&rsquo;m
working in a CI team at a large company.</p>
<p>In the Homelab, I&rsquo;m using CI for a variety of use cases, ranging from the
traditional automated test cases for software I&rsquo;ve written to just a convenient
automation for things like container image builds. I will go into details on a
few of those use cases later on, when I describe how I&rsquo;ve migrated some of my
projects.</p>
<p>The basic principle of CI for me is: you push a commit to a Git repository,
and a piece of software automatically launches a variety of test jobs. These
can range from unit test jobs and automated linter runs to automated deploys of
the updated software.</p>
<h2 id="from-drone-ci-to-woodpecker-ci">From Drone CI to Woodpecker CI</h2>
<p>Since I started running a CI, I&rsquo;ve been using <a href="https://www.drone.io/">Drone CI</a>.
It&rsquo;s a relatively simple CI system, compared to what one could build e.g. with
<a href="https://zuul-ci.org/">Zuul</a>, <a href="https://www.jenkins.io/">Jenkins</a> and <a href="https://www.gerritcodereview.com/">Gerrit</a>.</p>
<p>Drone CI consists of two components: the Drone CI server, which provides webhooks
for the Git forge to call and launches the jobs, and the agents, which take the
jobs and run them. In my deployment on Nomad, I was using the <a href="https://github.com/drone-runners/drone-runner-docker">drone-runner-docker</a>.
It mounts the host&rsquo;s Docker socket into the agent and uses it to launch Docker
containers for each step of the CI pipeline.</p>
<p>It has always worked well for me and mostly got out of my way. So I didn&rsquo;t switch
to <a href="https://woodpecker-ci.org/">Woodpecker CI</a> because of features. There aren&rsquo;t
that many different features anyway, because Woodpecker is a community fork of
Drone CI.
Rather, Drone CI started to have quite a bad smell. What bothered me the most
was that their release notes were basically empty and said things like
&ldquo;integrated UI updates&rdquo;.
Then there&rsquo;s whatever happened after they were bought by Harness. And then
there&rsquo;s the fact that the component which mounts your host&rsquo;s Docker socket
hasn&rsquo;t been updated in over a year.</p>
<p>In contrast, Woodpecker is a community project and had a far nicer smell, so I
decided that while I was at it, I would not just migrate Drone to k8s but also
switch to Woodpecker.</p>
<p>One of the things I genuinely looked forward to was the Kubernetes backend. With
the migration to k8s, I could finally make use of my entire cluster. With Drone&rsquo;s
Docker runner, I always had to reserve a lot of resources for CI job execution
on the nodes where the agents were launched.
Now, with the Kubernetes backend, it doesn&rsquo;t matter (much, more on that later)
where the agents are running - the only thing they do is launch Pods to run each
step of the pipeline, and where those Pods are scheduled is left to Kubernetes.</p>
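<p>For context, the execution backend is selected on the agents via an env variable.
A minimal sketch, assuming the Helm chart passes <code>agent.env</code> through to the
agent Pods (check the chart&rsquo;s <code>values.yaml</code>):</p>
<pre><code class="language-yaml">agent:
  env:
    # Run each pipeline step as its own Pod, scheduled by Kubernetes:
    WOODPECKER_BACKEND: kubernetes
</code></pre>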
<p>I will go into more detail later, when talking about my CI job migrations,
but let me still give a short example of what I&rsquo;m actually talking about.</p>
<p>Here&rsquo;s a slight variation of the example pipeline from the <a href="https://woodpecker-ci.org/docs/usage/intro">Woodpecker docs</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">when</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">event</span>: <span style="color:#ae81ff">push</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">branch</span>: <span style="color:#ae81ff">master</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">steps</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">build</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">debian</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">commands</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">echo &#34;This is the build step&#34;</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">echo &#34;binary-data-123&#34; &gt; executable</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">a-test-step</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">golang:1.16</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">commands</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">echo &#34;Testing ...&#34;</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">./executable</span>
</span></span></code></pre></div><p>This pipeline tells Woodpecker that it should only be run when a Git push is
done to the <code>master</code> branch of the repository. This file would be committed to
the repository it&rsquo;s used in, but there are also options to tell Woodpecker
to listen on events for other repositories. So you could theoretically even have
a separate &ldquo;CI&rdquo; repository with all the pipelines. But that&rsquo;s generally not a
good idea.</p>
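<p>As an aside, here&rsquo;s a sketch of where the pipeline file usually lives, to the
best of my knowledge (check the Woodpecker docs for the authoritative layout):</p>
<pre><code class="language-yaml"># Hypothetical repo layout:
#
#   my-project/
#   ├── .woodpecker.yaml    # a single pipeline, or
#   └── .woodpecker/        # a folder with one file per pipeline
#       ├── build.yaml
#       └── lint.yaml
</code></pre>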
<p>The pipeline itself will execute two separate steps, called &ldquo;build&rdquo; and &ldquo;a-test-step&rdquo;.
The <code>image:</code> parameter defines which container image each step runs in, in this
case the Debian and golang images, followed by the list of commands to be run.
Here, the commands are pretty nonsensical and will lead to failed pipelines,
but they&rsquo;re only here for demonstration purposes anyway. In the Woodpecker web UI,
this is what the pipeline looks like:</p>
<figure>
    <img loading="lazy" src="first_run.png"
         alt="A screenshot of the Woodpecker web UI. It is separated into two main areas. The left one shows an overview of the pipeline and its steps. At the top left, it shows that the pipeline was launched by a push from user mmeier. Below that follows the list of steps, showing in order: clone, build, a-test-step. Both clone and build have a green check mark next to them, while a-test-step has a red X. The a-test-step step is also highlighted. On the right side, a window header &#39;Step Logs&#39; shows the logs from the a-test-step execution. It starts out echoing the string &#39;Testing ...&#39;, followed by &#39;/bin/sh: 18: ./executable: Permission denied&#39;."/> <figcaption>
            <p>Screenshot of my first Woodpecker CI pipeline execution.</p>
        </figcaption>
</figure>

<h2 id="database-deployment">Database deployment</h2>
<p>To begin with, Woodpecker needs a bit of infrastructure set up, namely a
Postgres database. Smaller deployments can also run on SQLite; I&rsquo;m using
Postgres mostly out of habit.</p>
<p>As I&rsquo;ve <a href="https://blog.mei-home.net/posts/k8s-migration-8-cloud-native-pg/">written about before</a>,
I&rsquo;m using <a href="https://cloudnative-pg.io/">CloudNativePG</a> for my Postgres DB needs.
In the recent <a href="https://cloudnative-pg.io/documentation/1.25/release_notes/v1.25/">1.25 release</a>,
CNPG introduced support for creating multiple databases in a single Cluster.
But because I&rsquo;ve already started with &ldquo;one Cluster per app&rdquo;, I decided to stay
with that approach for the duration of the k8s migration and look into merging
it all into one Cluster later.</p>
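<p>For later reference, my understanding from the release notes is that the 1.25
feature works via a new <code>Database</code> CRD, roughly like this (a sketch, not what
I&rsquo;m running yet; the cluster name is made up):</p>
<pre><code class="language-yaml">apiVersion: postgresql.cnpg.io/v1
kind: Database
metadata:
  name: woodpecker-db
spec:
  name: woodpecker           # database to create
  owner: woodpecker          # role owning the database
  cluster:
    name: shared-pg-cluster  # hypothetical shared Cluster
</code></pre>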
<p>Because I&rsquo;ve written about it in detail before, here&rsquo;s just the basic options
for the CNPG Cluster CRD I&rsquo;m using:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">postgresql.cnpg.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">woodpecker-pg-cluster</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">instances</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">imageName</span>: <span style="color:#e6db74">&#34;ghcr.io/cloudnative-pg/postgresql:16.2-10&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">bootstrap</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">initdb</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">database</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">200M</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">150m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">postgresql</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">max_connections</span>: <span style="color:#e6db74">&#34;200&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">shared_buffers</span>: <span style="color:#e6db74">&#34;50MB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">effective_cache_size</span>: <span style="color:#e6db74">&#34;150MB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">maintenance_work_mem</span>: <span style="color:#e6db74">&#34;12800kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">checkpoint_completion_target</span>: <span style="color:#e6db74">&#34;0.9&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wal_buffers</span>: <span style="color:#e6db74">&#34;1536kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">default_statistics_target</span>: <span style="color:#e6db74">&#34;100&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">random_page_cost</span>: <span style="color:#e6db74">&#34;1.1&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">effective_io_concurrency</span>: <span style="color:#e6db74">&#34;300&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">work_mem</span>: <span style="color:#e6db74">&#34;128kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">huge_pages</span>: <span style="color:#e6db74">&#34;off&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">max_wal_size</span>: <span style="color:#e6db74">&#34;128MB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wal_keep_size</span>: <span style="color:#e6db74">&#34;512MB&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storage</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">size</span>: <span style="color:#ae81ff">1.</span><span style="color:#ae81ff">5G</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storageClass</span>: <span style="color:#ae81ff">rbd-fast</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">backup</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">barmanObjectStore</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">endpointURL</span>: <span style="color:#ae81ff">http://rook-ceph-rgw-rgw-bulk.rook-cluster.svc:80</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">destinationPath</span>: <span style="color:#e6db74">&#34;s3://backup-cnpg/&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">s3Credentials</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">accessKeyId</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-ceph-object-user-rgw-bulk-cnpg-backup-woodpecker</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">AccessKey</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretAccessKey</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-ceph-object-user-rgw-bulk-cnpg-backup-woodpecker</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">SecretKey</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">retentionPolicy</span>: <span style="color:#e6db74">&#34;30d&#34;</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">postgresql.cnpg.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ScheduledBackup</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">woodpecker-pg-backup</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">method</span>: <span style="color:#ae81ff">barmanObjectStore</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">immediate</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">schedule</span>: <span style="color:#e6db74">&#34;0 30 1 * * *&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">backupOwnerReference</span>: <span style="color:#ae81ff">self</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">cluster</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">woodpecker-pg-cluster</span>
</span></span></code></pre></div><p>As always, I&rsquo;m configuring backups right away.
For CNPG to work, the operator needs network access to the Postgres instance
started up in the Woodpecker namespace, so a network policy is also needed:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumNetworkPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;woodpecker-pg-cluster-allow-operator-ingress&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">endpointSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cnpg.io/cluster</span>: <span style="color:#ae81ff">woodpecker-pg-cluster</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEndpoints</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">io.kubernetes.pod.namespace</span>: <span style="color:#ae81ff">cnpg-operator</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">app.kubernetes.io/name</span>: <span style="color:#ae81ff">cloudnative-pg</span>
</span></span></code></pre></div><p>While we&rsquo;re on the topic of network policies, here&rsquo;s my generic deny-all
policy I&rsquo;m using in most namespaces:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumNetworkPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;woodpecker-deny-all-ingress&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">endpointSelector</span>: {}
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEndpoints</span>:
</span></span><span style="display:flex;"><span>      - {}
</span></span></code></pre></div><p>This allows all intra-namespace access between Pods, but no ingress from any
Pods in other namespaces.</p>
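<p>A quick way to check a policy like this (a hypothetical smoke test, with a made-up
Service name and port): curl the server Service once from inside the namespace and
once from another one:</p>
<pre><code class="language-shell"># From a Pod in the woodpecker namespace - should connect:
kubectl -n woodpecker run curl-test --rm -ti --image=curlimages/curl --restart=Never \
  -- curl -sS -m 5 http://woodpecker-server:80
# From another namespace - should time out due to the deny-all policy:
kubectl -n default run curl-test --rm -ti --image=curlimages/curl --restart=Never \
  -- curl -sS -m 5 http://woodpecker-server.woodpecker:80
</code></pre>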
<p>And because Woodpecker provides a web UI, I also need to provide access to the
<code>server</code> Pod from my Traefik ingress:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumNetworkPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;woodpecker-traefik-access&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">endpointSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;app.kubernetes.io/name&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;server&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEndpoints</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">homelab/ingress</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">io.kubernetes.pod.namespace</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span></code></pre></div><p>Hm, writing all of this up I&rsquo;m realizing that I completely forgot to write a
post about some &ldquo;standard things&rdquo; I will be doing for most apps. I had planned
to do that for the migration of my Audiobookshelf instance to k8s, but
completely forgot to write any post about it at all. Will put it on the pile. &#x1f604;</p>
<p>Before getting to the Woodpecker Helm chart, we also need to do a bit of
yak shaving with regards to the CNPG DB secrets. Helpfully, CNPG always
creates a secret with the necessary credentials to access the database,
in multiple formats. An example would look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dbname</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">host</span>: <span style="color:#ae81ff">woodpecker-pg-cluster-rw</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">jdbc-uri</span>: <span style="color:#ae81ff">jdbc:postgresql://woodpecker-pg-cluster-rw.woodpecker:5432/woodpecker?password=1234&amp;user=woodpecker</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">password</span>: <span style="color:#ae81ff">1234</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">pgpass</span>: <span style="color:#ae81ff">woodpecker-pg-cluster-rw:5432:woodpecker:woodpecker:1234</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">port</span>: <span style="color:#ae81ff">5432</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">uri</span>: <span style="color:#ae81ff">postgresql://woodpecker:1234@woodpecker-pg-cluster-rw.woodpecker:5432/woodpecker</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">user</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">username</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span></code></pre></div><p>I would love to be able to use the values from that Secret verbatim, specifically
the <code>uri</code> property, to set the <code>WOODPECKER_DATABASE_DATASOURCE</code> variable from
it. But sadly, the <a href="https://github.com/woodpecker-ci/helm">Woodpecker Helm chart</a>
is one of those which allow Secrets to be used to set environment variables -
but only via <code>envFrom.secretRef</code>. That feeds all of the Secret&rsquo;s keys in as env
variables, but doesn&rsquo;t allow assigning specific keys from the Secret to specific
env variables via <code>env.valueFrom.secretKeyRef</code>.</p>
<p>I think this should be a
functionality every Helm chart provides, specifically for cases like this. I&rsquo;ve
got two tools which automatically create Secrets in my cluster, CNPG for DB
credentials and configs, and Rook, which creates Secrets and ConfigMaps for
S3 buckets and Ceph users created through its CRDs.
But every tool/Helm chart seems to have its own ideas about which env variables
certain things should be stored in. The S3 credential env vars for Rook&rsquo;s S3
buckets should work in most cases because they&rsquo;re pretty standardized, but
everything else is pretty much hit-or-miss.</p>
<p>And, with the <code>env.valueFrom</code> functionality for both Secrets and ConfigMaps,
Kubernetes already provides the necessary utility to assign specific keys from
them to specific env vars. A number of Helm charts just need to allow me to
make use of that, instead of insisting on Secrets with a specific group of keys.</p>
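<p>For illustration, this is what that would look like on a plain Deployment -
standard Kubernetes, nothing chart-specific (the app name and image here are
made up):</p>
<pre><code class="language-yaml">apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app          # hypothetical app
spec:
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: app
          image: example.com/app:latest
          env:
            # Assign one specific key from the CNPG Secret to one
            # specific env variable - no reformatting needed.
            - name: WOODPECKER_DATABASE_DATASOURCE
              valueFrom:
                secretKeyRef:
                  name: woodpecker-pg-cluster-app
                  key: uri
</code></pre>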
<p>Anyway, in the case of Secrets, I&rsquo;ve found a pretty roundabout way to achieve
what I want, namely being able to use automatically created credentials.
And I&rsquo;m using my <a href="https://external-secrets.io/latest/">External Secrets</a>
deployment for this, more specifically the ability to configure a Kubernetes
namespace as a SecretStore:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1beta1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">SecretStore</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">secrets-store</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">provider</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kubernetes</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">remoteNamespace</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">auth</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">serviceAccount</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ext-secrets-woodpecker</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">server</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">caProvider</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">ConfigMap</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kube-root-ca.crt</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">ca.crt</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ServiceAccount</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ext-secrets-woodpecker</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">rbac.authorization.k8s.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Role</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ext-secrets-woodpecker-role</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">rules</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiGroups</span>: [<span style="color:#e6db74">&#34;&#34;</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">secrets</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">verbs</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">get</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">list</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">watch</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiGroups</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">authorization.k8s.io</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">selfsubjectrulesreviews</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">verbs</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">create</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">rbac.authorization.k8s.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">RoleBinding</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ext-secrets-woodpecker</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">roleRef</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">apiGroup</span>: <span style="color:#ae81ff">rbac.authorization.k8s.io</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Role</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ext-secrets-woodpecker-role</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">subjects</span>:
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ServiceAccount</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ext-secrets-woodpecker</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span></code></pre></div><p>This SecretStore then allows me to use External Secrets&rsquo; ExternalSecret
templating to take the automatically created CNPG Secret and bring it into a
format usable with the Woodpecker Helm chart. I decided to use the
<code>envFrom.secretRef</code> method and turn all of the Secret&rsquo;s keys into env
variables:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1beta1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ExternalSecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;woodpecker-db-secret&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">secretStoreRef</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">secrets-store</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">SecretStore</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">refreshInterval</span>: <span style="color:#e6db74">&#34;1h&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">target</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">creationPolicy</span>: <span style="color:#e6db74">&#39;Owner&#39;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">WOODPECKER_DATABASE_DATASOURCE</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">remoteRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">woodpecker-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">property</span>: <span style="color:#ae81ff">uri</span>
</span></span></code></pre></div><p>That ExternalSecret takes the <code>uri</code> key from the automatically created CNPG
Secret and writes its content into a new Secret&rsquo;s <code>WOODPECKER_DATABASE_DATASOURCE</code>
key.
And just like that, I have a Secret in the right format to use it with
Woodpecker&rsquo;s Helm chart.</p>
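<p>The resulting Secret should then look roughly like this (a sketch, not dumped
from my cluster; the URI matches the example Secret above):</p>
<pre><code class="language-yaml">apiVersion: v1
kind: Secret
metadata:
  name: woodpecker-db-secret
stringData:
  # The entire CNPG connection URI, stored under the env var name that
  # Woodpecker expects when fed in via envFrom.secretRef.
  WOODPECKER_DATABASE_DATASOURCE: postgresql://woodpecker:1234@woodpecker-pg-cluster-rw.woodpecker:5432/woodpecker
</code></pre>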
<p>After I implemented the above, I had another thought on how I could do the same
thing without taking the detour via an ExternalSecret. The Helm chart provides
options to add extra volume mounts. Furthermore, Woodpecker has the
<code>WOODPECKER_DATABASE_DATASOURCE_FILE</code> variable, which allows reading the
connection string from a file. So I could have mounted the CNPG DB Secret as a
volume and then provided the path to the file containing the <code>uri</code> key in this
variable. Sadly, I found this option a bit late, but I will keep it in
mind should I come across another Helm chart which lacks the ability
to assign arbitrary Secret keys to env variables.</p>
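<p>A minimal sketch of that alternative, assuming the chart exposes
<code>extraVolumes</code>/<code>extraVolumeMounts</code> options under the server subchart
(check the chart&rsquo;s <code>values.yaml</code> for the exact key names):</p>
<pre><code class="language-yaml">server:
  env:
    # Read the connection string from a mounted file instead of an env var:
    WOODPECKER_DATABASE_DATASOURCE_FILE: /secrets/db/uri
  extraVolumes:
    - name: db-secret
      secret:
        secretName: woodpecker-pg-cluster-app
  extraVolumeMounts:
    - name: db-secret
      mountPath: /secrets/db
      readOnly: true
</code></pre>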
<h2 id="temporary-storageclass">Temporary StorageClass</h2>
<p>Woodpecker needs some storage for every pipeline executed. That storage is
shared between all steps and is used to clone the repository and share
intermediate artifacts between steps.</p>
<p>With the Kubernetes backend, Woodpecker uses PersistentVolumeClaims, one per
pipeline run. It also automatically cleans those up after the pipeline has run
through.
The issue for me is that in my Rook Ceph setup, the StorageClasses all have their
reclaim policy set to <code>Retain</code>. This is mostly because I&rsquo;m not the smartest guy
under the sun, and there&rsquo;s a real chance that I might accidentally remove a
PVC with data I would really like to keep.
But that&rsquo;s a problem for these temporary PVCs, which are only relevant for the
duration of a single pipeline run. Using my standard StorageClasses would mean
ending up with a lot of unused PersistentVolumes.</p>
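<p>The effect is easy to spot (hypothetical check): with <code>Retain</code>, every deleted
PVC leaves its PersistentVolume behind in the <code>Released</code> state:</p>
<pre><code class="language-shell"># List PVs that outlived their PVCs:
kubectl get pv | grep Released
</code></pre>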
<p>So I had to create another StorageClass with the reclaim policy set to <code>Delete</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">storage.k8s.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">StorageClass</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">homelab-fs-temp</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">provisioner</span>: <span style="color:#ae81ff">rook-ceph.cephfs.csi.ceph.com</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">reclaimPolicy</span>: <span style="color:#ae81ff">Delete</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">clusterID</span>: <span style="color:#ae81ff">rook-cluster</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">fsName</span>: <span style="color:#ae81ff">homelab-fs</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">pool</span>: <span style="color:#ae81ff">homelab-fs-bulk</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">csi.storage.k8s.io/provisioner-secret-name</span>: <span style="color:#ae81ff">rook-csi-cephfs-provisioner</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">csi.storage.k8s.io/provisioner-secret-namespace</span>: <span style="color:#e6db74">&#34;{{ .Release.Namespace }}&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">csi.storage.k8s.io/controller-expand-secret-name</span>: <span style="color:#ae81ff">rook-csi-cephfs-provisioner</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">csi.storage.k8s.io/controller-expand-secret-namespace</span>: <span style="color:#e6db74">&#34;{{ .Release.Namespace }}&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">csi.storage.k8s.io/node-stage-secret-name</span>: <span style="color:#ae81ff">rook-csi-cephfs-node</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">csi.storage.k8s.io/node-stage-secret-namespace</span>: <span style="color:#e6db74">&#34;{{ .Release.Namespace }}&#34;</span>
</span></span></code></pre></div><p>This uses CephFS as the provider, because I like those volumes to be RWX capable,
which is not the case for RBD based volumes.</p>
<p>Using this StorageClass, the PersistentVolume is deleted when the PVC is
deleted, freeing the space for the next pipeline run.</p>
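<p>The agents can then be pointed at this StorageClass through their backend config.
A hedged sketch, with the env var names as I recall them from Woodpecker&rsquo;s
Kubernetes backend docs (verify before copying):</p>
<pre><code class="language-yaml">agent:
  env:
    WOODPECKER_BACKEND: kubernetes
    # Use the Delete-reclaim class for the per-pipeline PVCs:
    WOODPECKER_BACKEND_K8S_STORAGE_CLASS: homelab-fs-temp
    WOODPECKER_BACKEND_K8S_VOLUME_SIZE: 10G
    # CephFS supports RWX, so all steps of a pipeline can share the workspace:
    WOODPECKER_BACKEND_K8S_STORAGE_RWX: "true"
</code></pre>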
<h2 id="gitea-configuration">Gitea configuration</h2>
<p>Because Woodpecker needs access to Gitea, there&rsquo;s some configuration
necessary as well, mainly related to the fact that Woodpecker doesn&rsquo;t have its
own authentication and instead relies on the forge it&rsquo;s connected to.</p>
<p>To begin with, Woodpecker needs to be added as an OAuth2 application. This can
be done by any user, under the <code>https://gitea.example.com/user/settings/applications</code>
URL. The configuration is the same as for any other OAuth2 provider: Woodpecker
needs a client ID and a client secret.</p>
<p>The application can be given any name, and the redirect URL has to be
<code>https://&lt;your-woodpecker-url&gt;/authorize</code>:</p>
<figure>
    <img loading="lazy" src="gitea_add_app.png"
         alt="A screenshot of Gitea&#39;s OAuth2 client app creation form. In the &#39;Application Name&#39; field, it shows &#39;Woodpecker Blog Example&#39;, and in the &#39;Redirect URIs&#39; field, it shows &#39;https://ci.example.com/authorize&#39;. The &#39;Confidential Client&#39; option is enabled."/> <figcaption>
            <p>Gitea&rsquo;s OAuth2 creation form.</p>
        </figcaption>
</figure>

<p>After clicking <em>Create Application</em>, Gitea creates the app and shows the
necessary information:</p>
<figure>
    <img loading="lazy" src="gitea_add_info.png"
         alt="A screenshot of Gitea&#39;s OAuth2 app information screen. It shows the randomly generated &#39;Client ID&#39; and &#39;Client Secret&#39; and allows changing the &#39;Application Name&#39; and &#39;Redirect URIs&#39; fields."/> <figcaption>
            <p>Gitea&rsquo;s OAuth2 information page.</p>
        </figcaption>
</figure>

<p>I then copied the <em>Client ID</em> and <em>Client Secret</em> fields into my Vault instance
and provided them to Kubernetes with another ExternalSecret:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1beta1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ExternalSecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;gitea-secret&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">secretStoreRef</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">hashi-vault-store</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ClusterSecretStore</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">refreshInterval</span>: <span style="color:#e6db74">&#34;1h&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">target</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">creationPolicy</span>: <span style="color:#e6db74">&#39;Owner&#39;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">WOODPECKER_GITEA_CLIENT</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">remoteRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">secret/gitea-oauth</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">property</span>: <span style="color:#ae81ff">clientid</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">WOODPECKER_GITEA_SECRET</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">remoteRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">secret/gitea-oauth</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">property</span>: <span style="color:#ae81ff">clientSecret</span>
</span></span></code></pre></div><p>That was all the Gitea config necessary. There&rsquo;s going to be one more step
when accessing Woodpecker for the first time. Because it uses OAuth2, it will
redirect you to Gitea to log in, and Gitea will then need confirmation that
Woodpecker can access your account info and repositories.</p>
<h2 id="deploying-woodpecker">Deploying Woodpecker</h2>
<p>For deploying Woodpecker itself, I&rsquo;m using the <a href="https://github.com/woodpecker-ci/helm">official Helm chart</a>.
It&rsquo;s split into two subcharts, one for the agents which run the pipelines and
one for the server. Let&rsquo;s start with the server part of the <code>values.yaml</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">server</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">metrics</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">WOODPECKER_OPEN</span>: <span style="color:#e6db74">&#34;false&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">WOODPECKER_HOST</span>: <span style="color:#e6db74">&#39;https://ci.example.com&#39;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">WOODPECKER_DISABLE_USER_AGENT_REGISTRATION</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">WOODPECKER_DATABASE_DRIVER</span>: <span style="color:#e6db74">&#34;postgres&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">WOODPECKER_GITEA</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">WOODPECKER_GITEA_URL</span>: <span style="color:#e6db74">&#34;https://gitea.example.com&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">WOODPECKER_PLUGINS_PRIVILEGED</span>: <span style="color:#e6db74">&#34;woodpeckerci/plugin-docker-buildx:latest-insecure&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">extraSecretNamesForEnvFrom</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">gitea-secret</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">woodpecker-db-secret</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">persistentVolume</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storageClass</span>: <span style="color:#ae81ff">rbd-bulk</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">traefik.ingress.kubernetes.io/router.entrypoints</span>: <span style="color:#ae81ff">secureweb</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">hosts</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">host</span>: <span style="color:#ae81ff">ci.example.com</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">paths</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">100m</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">limits</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">128Mi</span>
</span></span></code></pre></div><p>As I do so often, I explicitly set <code>metrics.enabled</code> to <code>false</code>, so that later
I can go through my Homelab repo and slowly enable metrics for the apps I&rsquo;m
interested in, just by grepping for <code>metrics</code>.</p>
<p>Woodpecker is entirely configured through environment variables. I&rsquo;ve configured
those which don&rsquo;t contain secrets right in the <code>values.yaml</code>, and the secrets
are added via the <code>extraSecretNamesForEnvFrom</code> list. Those are the Gitea OAuth2
and CNPG DB Secrets. The server itself also needs some storage space, which I
put on my bulk storage pool with the <code>persistentVolume</code> option. I&rsquo;m also
configuring the Ingress and resources.</p>
<p>A short comment on the resources: Make sure that you know what you&rsquo;re doing. &#x1f605;
I initially had the <code>cpu: 100m</code> setting under <code>limits</code> by accident, and
then spent yesterday wondering why the Woodpecker server kept getting restarted
due to failed liveness probes. It turns out that <code>100m</code> is not enough CPU
when the Pod happens to run on a Pi 4 while I&rsquo;m also clicking around in the Web UI.
The liveness probe then doesn&rsquo;t get a timely answer and starts failing, ultimately
restarting the Pod.</p>
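<p>For illustration, this is roughly the footgun I had configured, with the CPU
value sitting under <code>limits</code> instead of <code>requests</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml">resources:
  limits:
    cpu: 100m     # hard cap: the server gets throttled and liveness probes time out
    memory: 128Mi
</code></pre></div>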
<p>The second part of a Woodpecker deployment is the agents. They are the part
of Woodpecker that runs the actual pipelines, launching the containers for each
step. Woodpecker supports multiple backends. The first one is the traditional
Docker backend, which needs the agent to have access to the Docker socket.
That&rsquo;s the config I&rsquo;ve been running up to now with my Drone setup.
It had two big downsides for me. The first was that a piece of software
explicitly intended to execute arbitrary code had full access to the host&rsquo;s
Docker daemon. The second was that the agent could only run pipelines on its own
host, so it couldn&rsquo;t distribute the different steps across my entire Nomad cluster.</p>
<p>Now, with Woodpecker, I&rsquo;m making use of the <a href="https://woodpecker-ci.org/docs/administration/backends/kubernetes">Kubernetes Backend</a>.
With this backend, the agents themselves only work as an interface to the
k8s API, launching one Pod for each step and creating the PVC used as shared
storage for all steps of a pipeline.</p>
<p>One quirk of the Kubernetes backend is that it adds a NodeSelector matching
the architecture of the agent which launched the pipeline. So when the agent
executing a pipeline happens to be an ARM64 machine, all Pods for that pipeline
will also run on ARM64 machines. But this can also be overridden for individual
steps, as sketched below.</p>
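<p>If I&rsquo;m reading the backend docs correctly, pinning an individual step to an
architecture looks roughly like this, via the step&rsquo;s <code>backend_options</code>
(a sketch, not taken from one of my pipelines):</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml">steps:
  - name: build-amd64
    image: golang:1.23
    commands:
      - go build ./...
    backend_options:
      kubernetes:
        nodeSelector:
          kubernetes.io/arch: amd64   # force this step onto AMD64 nodes
</code></pre></div>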
<p>Here is the agent portion of the Woodpecker Helm <code>values.yaml</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">agent</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicaCount</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">WOODPECKER_BACKEND</span>: <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">WOODPECKER_MAX_WORKFLOWS</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">WOODPECKER_BACKEND_K8S_NAMESPACE</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">WOODPECKER_BACKEND_K8S_VOLUME_SIZE</span>: <span style="color:#ae81ff">10G</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">WOODPECKER_BACKEND_K8S_STORAGE_CLASS</span>: <span style="color:#ae81ff">homelab-fs-temp</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">WOODPECKER_BACKEND_K8S_STORAGE_RWX</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">persistence</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storageClass</span>: <span style="color:#ae81ff">rbd-bulk</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">accessModes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">ReadWriteOnce</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serviceAccount</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">create</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">rbac</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">create</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">100m</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">limits</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">128Mi</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">topologySpreadConstraints</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">maxSkew</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">topologyKey</span>: <span style="color:#e6db74">&#34;kubernetes.io/arch&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">whenUnsatisfiable</span>: <span style="color:#e6db74">&#34;DoNotSchedule&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labelSelector</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">&#34;app.kubernetes.io/name&#34;: </span><span style="color:#ae81ff">agent</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">&#34;app.kubernetes.io/instance&#34;: </span><span style="color:#ae81ff">woodpecker</span>
</span></span></code></pre></div><p>Here I&rsquo;m configuring two agents, each running on a different
architecture. In my cluster, the <code>topologySpreadConstraints</code> lead to one
agent running on AMD64 and one on ARM64. I&rsquo;m also telling
the agents which StorageClass to use for the per-pipeline volumes; as explained
above, I had to create a new one with retention disabled. Finally, I&rsquo;m setting
a default size of 10 GB for those volumes.</p>
<p>Before continuing with some CI pipeline configs, let&rsquo;s have a short look at
the Pods Woodpecker launches. I&rsquo;ve captured the Pod for the following Woodpecker
CI step:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">build</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">debian</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">commands</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">echo &#34;This is the build step&#34;</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">echo &#34;binary-data-123&#34; &gt; executable</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">chmod u+x ./executable</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">sleep 120</span>
</span></span></code></pre></div><p>It looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Pod</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">step</span>: <span style="color:#ae81ff">build</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wp-01jhkac6pf4jyfywavjg6be5cq</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">command</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">/bin/sh</span>
</span></span><span style="display:flex;"><span>    - -<span style="color:#ae81ff">c</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">echo $CI_SCRIPT | base64 -d | /bin/sh -e</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_COMMIT_AUTHOR</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">mmeier</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_COMMIT_AUTHOR_AVATAR</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">https://gitea.example.com/avatars/d941e68cc8aa38efdee91c3e3c97159e</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_COMMIT_AUTHOR_EMAIL</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">mmeier@noreply.gitea.example.com</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_COMMIT_BRANCH</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">master</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_COMMIT_MESSAGE</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Add a sleep to inspect the Pod</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_COMMIT_REF</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">refs/heads/master</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_COMMIT_SHA</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">353b9f67102ba120ffe9284aa711eb87c2542573</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_COMMIT_URL</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">https://gitea.example.com/adm/ci-tests/commit/353b9f67102ba120ffe9284aa711eb87c2542573</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_FORGE_TYPE</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">gitea</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_FORGE_URL</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">https://gitea.example.com</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_MACHINE</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">woodpecker-agent-1</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PIPELINE_CREATED</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;1736888948&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PIPELINE_EVENT</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">push</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PIPELINE_FILES</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#39;[&#34;.woodpecker/my-first-workflow.yaml&#34;]&#39;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PIPELINE_FINISHED</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;1736888960&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PIPELINE_FORGE_URL</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">https://gitea.example.com/adm/ci-tests/commit/353b9f67102ba120ffe9284aa711eb87c2542573</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PIPELINE_NUMBER</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;3&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PIPELINE_PARENT</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;0&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PIPELINE_STARTED</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;1736888951&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PIPELINE_STATUS</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">success</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PIPELINE_URL</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">https://ci.example.com/repos/1/pipeline/3</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_COMMIT_AUTHOR</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">mmeier</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_COMMIT_AUTHOR_AVATAR</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">https://gitea.example.com/avatars/d941e68cc8aa38efdee91c3e3c97159e</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_COMMIT_AUTHOR_EMAIL</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">mmeier@noreply.gitea.example.com</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_COMMIT_BRANCH</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">master</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_COMMIT_MESSAGE</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Possibly fix permission error</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_COMMIT_REF</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">refs/heads/master</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_COMMIT_SHA</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">b680ab9b9a7aa300d80a43bd389de0e57f767e4f</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_COMMIT_URL</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">https://gitea.example.com/adm/ci-tests/commit/b680ab9b9a7aa300d80a43bd389de0e57f767e4f</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_PIPELINE_CREATED</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;1736800786&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_PIPELINE_EVENT</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">push</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_PIPELINE_FINISHED</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;1736800827&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_PIPELINE_FORGE_URL</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">https://gitea.example.com/adm/ci-tests/commit/b680ab9b9a7aa300d80a43bd389de0e57f767e4f</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_PIPELINE_NUMBER</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;2&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_PIPELINE_PARENT</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;0&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_PIPELINE_STARTED</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;1736800790&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_PIPELINE_STATUS</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">failure</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_PIPELINE_URL</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">https://ci.example.com/repos/1/pipeline/2</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_REPO</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">adm/ci-tests</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_REPO_CLONE_SSH_URL</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">ssh://gituser@git.example.com:1234/adm/ci-tests.git</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_REPO_CLONE_URL</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">https://gitea.example.com/adm/ci-tests.git</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_REPO_DEFAULT_BRANCH</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">master</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_REPO_NAME</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">ci-tests</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_REPO_OWNER</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">adm</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_REPO_PRIVATE</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_REPO_REMOTE_ID</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;94&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_REPO_SCM</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">git</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_REPO_TRUSTED</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;false&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_REPO_URL</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">https://gitea.example.com/adm/ci-tests</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_STEP_FINISHED</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;1736888960&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_STEP_NUMBER</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;0&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_STEP_STARTED</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;1736888951&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_STEP_STATUS</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">success</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_STEP_URL</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">https://ci.example.com/repos/1/pipeline/3</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_SYSTEM_HOST</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">ci.example.com</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_SYSTEM_NAME</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_SYSTEM_PLATFORM</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">linux/amd64</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_SYSTEM_URL</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">https://ci.example.com</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_SYSTEM_VERSION</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">2.8.1</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_WORKFLOW_NAME</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">my-first-workflow</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_WORKFLOW_NUMBER</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;1&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_WORKSPACE</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">/woodpecker/src/gitea.example.com/adm/ci-tests</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">HOME</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">/root</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_SCRIPT</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">CmlmIFsgLW4gIiRDSV9ORVRSQ19NQUNISU5FIiBdOyB0aGVuCmNhdCA8PEVPRiA+ICRIT01FLy5uZXRyYwptYWNoaW5lICRDSV9ORVRSQ19NQUNISU5FCmxvZ2luICRDSV9ORVRSQ19VU0VSTkFNRQpwYXNzd29yZCAkQ0lfTkVUUkNfUEFTU1dPUkQKRU9GCmNobW9kIDA2MDAgJEhPTUUvLm5ldHJjCmZpCnVuc2V0IENJX05FVFJDX1VTRVJOQU1FCnVuc2V0IENJX05FVFJDX1BBU1NXT1JECnVuc2V0IENJX1NDUklQVAoKZWNobyArICdlY2hvICJUaGlzIGlzIHRoZSBidWlsZCBzdGVwIicKZWNobyAiVGhpcyBpcyB0aGUgYnVpbGQgc3RlcCIKCmVjaG8gKyAnZWNobyAiYmluYXJ5LWRhdGEtMTIzIiA+IGV4ZWN1dGFibGUnCmVjaG8gImJpbmFyeS1kYXRhLTEyMyIgPiBleGVjdXRhYmxlCgplY2hvICsgJ2NobW9kIHUreCAuL2V4ZWN1dGFibGUnCmNobW9kIHUreCAuL2V4ZWN1dGFibGUKCmVjaG8gKyAnc2xlZXAgMTIwJwpzbGVlcCAxMjAK</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">SHELL</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">/bin/sh</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">debian</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">imagePullPolicy</span>: <span style="color:#ae81ff">Always</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wp-01jhkac6pf4jyfywavjg6be5cq</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>: {}
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">terminationMessagePath</span>: <span style="color:#ae81ff">/dev/termination-log</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">terminationMessagePolicy</span>: <span style="color:#ae81ff">File</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/woodpecker</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wp-01jhkac6pf4jyfywavjasgpcwn-0-default</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/var/run/secrets/kubernetes.io/serviceaccount</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kube-api-access-n75dj</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">workingDir</span>: <span style="color:#ae81ff">/woodpecker/src/gitea.example.com/adm/ci-tests</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dnsPolicy</span>: <span style="color:#ae81ff">ClusterFirst</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enableServiceLinks</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">imagePullSecrets</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">regcred</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">nodeName</span>: <span style="color:#ae81ff">sehith</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">nodeSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kubernetes.io/arch</span>: <span style="color:#ae81ff">amd64</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">preemptionPolicy</span>: <span style="color:#ae81ff">PreemptLowerPriority</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">priority</span>: <span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">restartPolicy</span>: <span style="color:#ae81ff">Never</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">schedulerName</span>: <span style="color:#ae81ff">default-scheduler</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">securityContext</span>: {}
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serviceAccount</span>: <span style="color:#ae81ff">default</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serviceAccountName</span>: <span style="color:#ae81ff">default</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">terminationGracePeriodSeconds</span>: <span style="color:#ae81ff">30</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tolerations</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">effect</span>: <span style="color:#ae81ff">NoExecute</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">key</span>: <span style="color:#ae81ff">node.kubernetes.io/not-ready</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">Exists</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">tolerationSeconds</span>: <span style="color:#ae81ff">300</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">effect</span>: <span style="color:#ae81ff">NoExecute</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">key</span>: <span style="color:#ae81ff">node.kubernetes.io/unreachable</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">Exists</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">tolerationSeconds</span>: <span style="color:#ae81ff">300</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wp-01jhkac6pf4jyfywavjasgpcwn-0-default</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">persistentVolumeClaim</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">claimName</span>: <span style="color:#ae81ff">wp-01jhkac6pf4jyfywavjasgpcwn-0-default</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kube-api-access-n75dj</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">projected</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">defaultMode</span>: <span style="color:#ae81ff">420</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">sources</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">serviceAccountToken</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">expirationSeconds</span>: <span style="color:#ae81ff">3607</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">path</span>: <span style="color:#ae81ff">token</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">configMap</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">items</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">key</span>: <span style="color:#ae81ff">ca.crt</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">path</span>: <span style="color:#ae81ff">ca.crt</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kube-root-ca.crt</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">downwardAPI</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">items</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">fieldRef</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">fieldPath</span>: <span style="color:#ae81ff">metadata.namespace</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">path</span>: <span style="color:#ae81ff">namespace</span>
</span></span></code></pre></div><p>There are a number of noteworthy things in here. First, perhaps, the handling of
the script to execute for the step:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  - <span style="color:#f92672">command</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">/bin/sh</span>
</span></span><span style="display:flex;"><span>    - -<span style="color:#ae81ff">c</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">echo $CI_SCRIPT | base64 -d | /bin/sh -e</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_SCRIPT</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">CmlmIFsgLW4gIiRDSV9ORVRSQ19NQUNISU5FIiBdOyB0aGVuCmNhdCA8PEVPRiA+ICRIT01FLy5uZXRyYwptYWNoaW5lICRDSV9ORVRSQ19NQUNISU5FCmxvZ2luICRDSV9ORVRSQ19VU0VSTkFNRQpwYXNzd29yZCAkQ0lfTkVUUkNfUEFTU1dPUkQKRU9GCmNobW9kIDA2MDAgJEhPTUUvLm5ldHJjCmZpCnVuc2V0IENJX05FVFJDX1VTRVJOQU1FCnVuc2V0IENJX05FVFJDX1BBU1NXT1JECnVuc2V0IENJX1NDUklQVAoKZWNobyArICdlY2hvICJUaGlzIGlzIHRoZSBidWlsZCBzdGVwIicKZWNobyAiVGhpcyBpcyB0aGUgYnVpbGQgc3RlcCIKCmVjaG8gKyAnZWNobyAiYmluYXJ5LWRhdGEtMTIzIiA+IGV4ZWN1dGFibGUnCmVjaG8gImJpbmFyeS1kYXRhLTEyMyIgPiBleGVjdXRhYmxlCgplY2hvICsgJ2NobW9kIHUreCAuL2V4ZWN1dGFibGUnCmNobW9kIHUreCAuL2V4ZWN1dGFibGUKCmVjaG8gKyAnc2xlZXAgMTIwJwpzbGVlcCAxMjAK</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">SHELL</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">/bin/sh</span>
</span></span></code></pre></div><p>Running the <code>CI_SCRIPT</code> content through <code>base64 -d</code> results in this shell script:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span><span style="color:#66d9ef">if</span> <span style="color:#f92672">[</span> -n <span style="color:#e6db74">&#34;</span>$CI_NETRC_MACHINE<span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>cat <span style="color:#e6db74">&lt;&lt;EOF &gt; $HOME/.netrc
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">machine $CI_NETRC_MACHINE
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">login $CI_NETRC_USERNAME
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">password $CI_NETRC_PASSWORD
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">EOF</span>
</span></span><span style="display:flex;"><span>chmod <span style="color:#ae81ff">0600</span> $HOME/.netrc
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>unset CI_NETRC_USERNAME
</span></span><span style="display:flex;"><span>unset CI_NETRC_PASSWORD
</span></span><span style="display:flex;"><span>unset CI_SCRIPT
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>echo + <span style="color:#e6db74">&#39;echo &#34;This is the build step&#34;&#39;</span>
</span></span><span style="display:flex;"><span>echo <span style="color:#e6db74">&#34;This is the build step&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>echo + <span style="color:#e6db74">&#39;echo &#34;binary-data-123&#34; &gt; executable&#39;</span>
</span></span><span style="display:flex;"><span>echo <span style="color:#e6db74">&#34;binary-data-123&#34;</span> &gt; executable
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>echo + <span style="color:#e6db74">&#39;chmod u+x ./executable&#39;</span>
</span></span><span style="display:flex;"><span>chmod u+x ./executable
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>echo + <span style="color:#e6db74">&#39;sleep 120&#39;</span>
</span></span><span style="display:flex;"><span>sleep <span style="color:#ae81ff">120</span>
</span></span></code></pre></div><p>This shows how the <code>commands:</code> list from the <code>step</code> object
in the Woodpecker file is converted into a shell script: each command is copied
into the script, preceded by an <code>echo</code> that prints it.</p>
<p>Looking at this, and thinking about my own work on a large CI, I sometimes
wonder what we&rsquo;d do without the <code>base64</code> command. &#x1f605;</p>
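<p>If you&rsquo;d like to reproduce the decoding, the script can be pulled out of a
running step Pod in one go, something like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"># grab the CI_SCRIPT env var from the step Pod and decode it
kubectl -n woodpecker get pod wp-01jhkac6pf4jyfywavjg6be5cq \
  -o jsonpath=&#39;{.spec.containers[0].env[?(@.name==&#34;CI_SCRIPT&#34;)].value}&#39; \
  | base64 -d
</code></pre></div>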
<p>Another noteworthy aspect is the sheer number of available environment variables,
supplying a lot of information not just on the commit currently being tested,
but also on the previous one. Most of the <code>CI_</code> variables also have
equivalents prefixed with <code>DRONE_</code> for backwards compatibility; I removed
those from the output above to keep the snippet from getting even longer.</p>
<p>Finally, there&rsquo;s proof of what I said above about the agent&rsquo;s architecture. This
pipeline was run by the agent on my AMD64 node, resulting in a NodeSelector for
AMD64 nodes:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  <span style="color:#f92672">nodeName</span>: <span style="color:#ae81ff">sehith</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">nodeSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kubernetes.io/arch</span>: <span style="color:#ae81ff">amd64</span>
</span></span></code></pre></div><p>It&rsquo;s also nice to see that the Pod ran on <code>sehith</code>, which isn&rsquo;t the node the
agent runs on. This shows that the Pods are simply submitted to the k8s scheduler
and can run on any (in this case AMD64) node.</p>
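<p>While the step&rsquo;s <code>sleep 120</code> was running, this was easy to verify with a
quick look at the Pods and their nodes:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"># -o wide adds the NODE column, showing where each step Pod was scheduled
kubectl -n woodpecker get pods -o wide
</code></pre></div>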
<p>Before ending the post, let&rsquo;s have a look at some example CI configurations.</p>
<h2 id="ci-configurations">CI configurations</h2>
<p>Each repository that should use Woodpecker needs to be enabled first. This is
done from Woodpecker&rsquo;s web UI:
<figure>
    <img loading="lazy" src="enable_repo.png"
         alt="A screenshot of Woodpecker&#39;s repo enabling UI. IT shows a search field at the top and a list of repositories at the bottom. Some of them have a label saying &#39;Already enabled&#39;, while others have an &#39;Enable&#39; button next to them."/> <figcaption>
            <p>Woodpecker&rsquo;s repo addition UI.</p>
        </figcaption>
</figure>

When clicking the <em>Enable</em> button, Woodpecker contacts Gitea and adds a
webhook configuration for the repository. From then on, Gitea will call the
webhook with information about the event which triggered it and the state of
the repository.</p>
<p>The Woodpecker configuration files for a specific repository are expected in
the <code>.woodpecker/</code> directory at the repository root by default.</p>
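<p>Each YAML file in that directory becomes its own workflow. As a minimal,
hypothetical sketch, a <code>.woodpecker/hello.yaml</code> could look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml">when:
  - event: push

steps:
  - name: hello
    image: alpine
    commands:
      - echo &#34;Hello from Woodpecker&#34;
</code></pre></div>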
<h3 id="blog-repo-example">Blog repo example</h3>
<p>Here&rsquo;s the configuration I&rsquo;m using to build and publish this blog:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">when</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">event</span>: <span style="color:#ae81ff">push</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">steps</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Hugo Site Build</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#e6db74">&#34;harbor.mei-home.net/homelab/hugo:0.125.4-r3&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">commands</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">hugo</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Missing alt text check</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">python:3</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">commands</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">pip install lxml beautifulsoup4</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">python3 scripts/alt_text.py ./public/posts/</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Hugo Site Upload</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#e6db74">&#34;harbor.mei-home.net/homelab/hugo:latest&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">environment</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">AWS_ACCESS_KEY_ID</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">from_secret</span>: <span style="color:#ae81ff">access-key</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">AWS_SECRET_ACCESS_KEY</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">from_secret</span>: <span style="color:#ae81ff">secret-key</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">commands</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">s3cmd -c /s3cmd.conf sync -r --delete-removed --delete-after --no-mime-magic ./public/ s3://blog/</span>
</span></span></code></pre></div><p>To start with, the page needs to be built, using Hugo in an image I build
myself, based on Alpine with a couple of tools installed. Then I&rsquo;m running
a short Python script which uses <a href="https://pypi.org/project/beautifulsoup4/">beautifulsoup4</a>
to scan through the generated HTML and make sure that each image has alt text,
and that there&rsquo;s actually something in that alt text. Finally, I push the
generated site up to an S3 bucket in my Ceph cluster from where it is served.</p>
<p>The <code>when:</code> at the beginning is important: it determines under which
conditions the pipeline is executed. This can be configured for specific
branches or certain events, like a push or an update of a pull request,
and the different conditions can also be combined. In addition to setting
conditions on the entire pipeline, they can also be configured for
individual steps, as we will see later.</p>
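<p>As a short sketch of how the conditions combine, if I&rsquo;m reading the docs
right: entries in the <code>when</code> list are alternatives, while all keys within a
single entry have to match:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml">when:
  - event: push
    branch: master      # push AND branch is master
  - event: pull_request # OR any pull request event
</code></pre></div>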
<p>One thing I find a little lacking at the moment, specifically for the
Kubernetes use case, is secrets management. Secrets can currently only be created
via the web UI or the CLI, and there&rsquo;s no way to provide specific Kubernetes Secrets to
certain steps in a certain pipeline. But there is an open issue to implement
support for Kubernetes Secrets <a href="https://github.com/woodpecker-ci/woodpecker/issues/3582">on Github</a>.
Until that is implemented, the UI or CLI needs to be used. The UI looks like this:
<figure>
    <img loading="lazy" src="secrets.png"
         alt="A screenshot of Woodpecker&#39;s secret configuration UI. It contains a field for a name for the secret and values. In addition, it can be made available only for certain images used in steps. Furthermore, the secret can be restricted to certain events triggering a pipeline run, e.g. only Pushes or Tags or Pull Requests."/> <figcaption>
            <p>Woodpecker&rsquo;s secret addition UI.</p>
        </figcaption>
</figure>

Secrets can be configured for specific repositories, for orgs (where the
forge supports them), and globally for all pipelines.</p>
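<p>The CLI route should look roughly like this; treat it as a sketch and check
<code>woodpecker-cli secret add --help</code> for the exact flags:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"># assumes WOODPECKER_SERVER and WOODPECKER_TOKEN are set in the environment
woodpecker-cli secret add \
  --repository adm/ci-tests \
  --name access-key \
  --value &#34;$ACCESS_KEY&#34; \
  --event push
</code></pre></div>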
<p>When looking at a specific repository, all of the pipelines which ran for it
are listed:
<figure>
    <img loading="lazy" src="pipeline_list.png"
         alt="A screenshot of the pipeline list for the mmeier/blog repository. It shows for pipelines. The first one is called &#39;CI: Migrate to Woodpecker&#39; and the most recent one &#39;Publish post on hl-backup-operator deployment&#39;. All of them show that they were pushed directly to the Master branch and took about 1 - 2 minutes each. Each pipeline also shows the Git SHA1 of the commit it tested."/> <figcaption>
            <p>Woodpecker&rsquo;s pipeline list for my blog repo.</p>
        </figcaption>
</figure>

This gives a nice overview of the pipelines which ran recently, here with the
example of my blog repository, including the most recent run for publishing
the post on the backup operator deployment.</p>
<p>Clicking on one of the pipeline runs then shows the overview of that pipeline&rsquo;s
steps and the step logs:
<figure>
    <img loading="lazy" src="pipeline_example.png"
         alt="A screenshot of the pipeline run publishing the hl-backup-operator blog article. At the top right is the subject line of the commit message triggering the pipeline again, &#39;Publish post on hl-backup-operator deployment&#39;. On the left is a list of the steps, showing &#39;clone&#39;, &#39;Hugo Site Build&#39;, &#39;Missing alt text check&#39;, &#39;Hugo Site Upload&#39;. The &#39;Hugo Site Build&#39; step is highlighted, and the logs for that step, showing Hugo&#39;s build output, are shown on the right side."/> <figcaption>
            <p>Woodpecker&rsquo;s pipeline view.</p>
        </figcaption>
</figure>
</p>
<p>This pipeline is not very complex and runs through in about two minutes. So
let&rsquo;s have a look at another pipeline with a bit more complexity.</p>
<h3 id="docker-repo-example">Docker repo example</h3>
<p>Another repository where I&rsquo;m making heavy use of CI is my Docker repository.
In that repo, I&rsquo;ve got a couple of Dockerfiles for cases where I&rsquo;m adding something
to upstream images, or building my own where no upstream container is available.</p>
<p>This repository&rsquo;s CI is a bit more complicated, mostly because it does the same
thing for multiple different Dockerfiles, and because it needs to do different
things for pull requests and for commits pushed to the Master branch.</p>
<p>And that&rsquo;s where the problems begin, at least to a certain extent. As I&rsquo;ve shown
above, you can provide a <code>when</code> config to tell Woodpecker under which conditions
to run the pipeline. And if you leave that out completely, you don&rsquo;t end up
with the pipeline being run for all commits. No. You end up with the pipeline being
run twice for some commits.</p>
<p>Consider, for example, this configuration:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">steps</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">build image</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">woodpeckerci/plugin-docker-buildx:latest-insecure</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">settings</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&lt;&lt;</span>: <span style="color:#75715e">*dockerx-config</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">dry-run</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">when</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">event</span>: <span style="color:#ae81ff">pull_request</span>
</span></span></code></pre></div><p>Ignore the config under <code>settings</code> and concentrate on the fact that there&rsquo;s no
<code>when</code> config on the pipeline level, only on the step level. And there&rsquo;s only
one step, which is supposed to run only on pull requests. The result of this config
is that two pipelines will be started - including Pod launches, PVC creation
and so on:
<figure>
    <img loading="lazy" src="doubled_pipelines.png"
         alt="A screenshot of Woodpecker showing two pipelines. One failed, one successful. Both show being run for the same commit. One shows that it was launched by a push event to the &#39;woodpecker-ci&#39; branch and the other that it was pushed to pull request 77."/> <figcaption>
            <p>The two pipelines started for the previous configuration, both for the same commit.</p>
        </figcaption>
</figure>

Pipeline <em>#1</em> was launched for the &ldquo;push&rdquo; event to the <em>woodpecker-ci</em> branch,
and the other for the update of the pull request that push belonged to. The push
event pipeline only ran the <em>clone</em> step, while the pull request pipeline
ran both the <em>clone</em> and the <em>build image</em> steps.</p>
<p>The root cause for this behavior is that Gitea fires the webhook once for
every matching event. And consequently, Woodpecker then launches one pipeline
per event.</p>
<p>A similar effect can be observed when combining both pull request and push
events in one <code>when</code> clause on the pipeline level.</p>
<p>Now, you might be saying: Okay, then just configure the triggers only on the
steps, not on the entire pipeline. But that doesn&rsquo;t really work either. Without
a pipeline-level <code>when</code> clause, as shown above, two pipelines are always started for commits
in pull requests. And even though one of the pipelines won&rsquo;t do much, it would
still do something. In my case, it would launch a Pod for the clone step and
also create a PVC and clone the repo - for nothing.</p>
<p>The next idea I came up with: Okay, then let&rsquo;s set the pipeline&rsquo;s <code>when</code> to trigger
on push events, because that would trigger the pipeline for both - pushes to
branches like master and pushes to pull requests. And then just add <code>when</code>
clauses to each step with either the pull request or the push event, depending on
when it is supposed to run.
But that won&rsquo;t work either - any given pipeline only ever sees one event. If I
trigger on push events on the pipeline level, the steps triggering on the
pull request event will never run.</p>
<p>I finally figured out a way to do this. I always trigger the pipeline on push
events, and then use Woodpecker&rsquo;s <a href="https://woodpecker-ci.org/docs/usage/workflow-syntax#evaluate">evaluate clause</a>
on the steps to run them only for certain branches.</p>
<p>With all of that said, this is what the config looks like for the pipeline
which builds my Hugo container:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">when</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">event</span>: <span style="color:#ae81ff">push</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">path</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#39;.woodpecker/hugo.yaml&#39;</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#39;hugo/*&#39;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">variables</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#75715e">&amp;alpine-version</span> <span style="color:#e6db74">&#39;3.21.2&#39;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#75715e">&amp;app-version</span> <span style="color:#e6db74">&#39;0.139.0-r0&#39;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#75715e">&amp;dockerx-config</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">debug</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">repo</span>: <span style="color:#ae81ff">harbor.example.com/homelab/hugo</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">registry</span>: <span style="color:#ae81ff">harbor.example.com</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">username</span>: <span style="color:#ae81ff">ci</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">password</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">from_secret</span>: <span style="color:#ae81ff">container-registry</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">dockerfile</span>: <span style="color:#ae81ff">hugo/Dockerfile</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">context</span>: <span style="color:#ae81ff">hugo/</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mirror</span>: <span style="color:#ae81ff">https://harbor-mirror.example.com</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">buildkit_config</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      debug = true
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      [registry.&#34;docker.io&#34;]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        mirrors = [&#34;harbor.example.com/dockerhub-cache&#34;]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      [registry.&#34;quay.io&#34;]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        mirrors = [&#34;harbor.example.com/quay.io-cache&#34;]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      [registry.&#34;ghcr.io&#34;]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        mirrors = [&#34;harbor.example.com/github-cache&#34;]</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">latest</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#75715e">*app-version</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">build_args</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">hugo_ver</span>: <span style="color:#75715e">*app-version</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">alpine_ver</span>: <span style="color:#75715e">*alpine-version</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">platforms</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#34;linux/amd64&#34;</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#34;linux/arm64&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">steps</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">build image</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">woodpeckerci/plugin-docker-buildx:latest-insecure</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">settings</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&lt;&lt;</span>: <span style="color:#75715e">*dockerx-config</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">dry-run</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">when</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">evaluate</span>: <span style="color:#e6db74">&#39;CI_COMMIT_BRANCH != CI_REPO_DEFAULT_BRANCH&#39;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">release image</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">woodpeckerci/plugin-docker-buildx:latest-insecure</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">settings</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&lt;&lt;</span>: <span style="color:#75715e">*dockerx-config</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">dry-run</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">when</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">evaluate</span>: <span style="color:#e6db74">&#39;CI_COMMIT_BRANCH == CI_REPO_DEFAULT_BRANCH&#39;</span>
</span></span></code></pre></div><p>First, what does this pipeline do for pull requests and main branch pushes?
For pull requests, it uses the <a href="https://woodpecker-ci.org/plugins/Docker%20Buildx">buildx plugin</a>
to build a Docker image from the Dockerfile at <code>hugo/Dockerfile</code> in the repository.
That&rsquo;s what happens in the <em>build image</em> step. Notably, no push to a registry
happens here.
For pushes to the repo&rsquo;s default branch, which is provided by Gitea
in the webhook call, the same plugin and build are used, but this time the
newly built images are pushed to my Harbor registry. For more details on that
setup, <a href="https://blog.mei-home.net/posts/k8s-migration-11-harbor/">see this post</a>.</p>
<p>In the <code>when</code> clause for the pipeline, as explained above, I&rsquo;m triggering
on the push event to circumvent the problem of multiple pipelines being
executed for commits in pull requests.
In addition, I&rsquo;m also making use of path-based triggers. Because I&rsquo;ve got multiple
container images defined in one repository, I&rsquo;d like to avoid unnecessarily running
builds for images which haven&rsquo;t changed. That&rsquo;s done by triggering the
pipeline only on changes to its own config file and to the <code>hugo/</code> directory.
So if neither the Hugo image definition nor the CI config has changed, the pipeline
won&rsquo;t be triggered.</p>
<p>As you can see, I&rsquo;m building images for both AMD64 and ARM64. And before I
close this section, I have to tell a slightly embarrassing story. I initially
tried to run two pipelines - one for each architecture - so that they could
run in parallel on different nodes matching their architecture. This would
avoid the cost of emulating a foreign architecture, making the builds faster
overall.
This seemed like an excellent idea. And it worked really, really well. The
pipelines got a couple of minutes faster. Until I had a look at my Harbor
instance. And as some of you might have already figured out, I found that of
course there was not one tag with images for both architectures.
Instead, the tag contained whatever the pipeline which finished last had pushed.
Because of course, two Docker pushes to the same tag overwrite each other
instead of being merged.
This is a problem I need to have another look at later. Someone on the Fediverse
already showed me that there is a multistep way to do this manually.</p>
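<p>For reference, that multistep approach presumably boils down to pushing
per-architecture tags from the two pipelines and then merging them into a single
multi-arch tag in a final step. A rough, untested sketch - the
<code>-amd64</code>/<code>-arm64</code> tag names and the <code>registry-password</code>
secret are made up for illustration:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"># Untested sketch: merge two per-arch tags into one multi-arch tag.
# Assumes the per-arch pipelines pushed :latest-amd64 and :latest-arm64.
- name: merge manifests
  image: docker:27-cli
  environment:
    REGISTRY_PASSWORD:
      from_secret: registry-password
  commands:
    - docker login -u ci -p &#34;$REGISTRY_PASSWORD&#34; harbor.example.com
    - &gt;-
      docker buildx imagetools create
      -t harbor.example.com/homelab/hugo:latest
      harbor.example.com/homelab/hugo:latest-amd64
      harbor.example.com/homelab/hugo:latest-arm64
</code></pre></div>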
<p>Another point that I still need to improve is image caching. I think that there&rsquo;s
still some potential for optimization in my setup. But that&rsquo;s also something for
after the k8s migration is done.</p>
<p>Before I close out this section, I would like to point out a pretty nice feature
Woodpecker has: a linter for the pipeline definitions. For example:</p>
<figure>
    <img loading="lazy" src="linter.png"
         alt="A screenshot of Woodpecker&#39;s linter output. It shows a number of issues with the pipeline config. For example that steps.1.environment and steps.2.environment are both of an invalid type. It expected an array, but got a null. And for all of the steps it outputs a &#39;bad_habit&#39; warning about the fact that neither the pipeline nor any of the steps have a &#39;when&#39; clause."/> <figcaption>
            <p>Example output of Woodpecker&rsquo;s config linter.</p>
        </figcaption>
</figure>

<p>The configuration spitting out those warnings is this one for my blog:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">steps</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">submodules</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">alpine/git</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">commands</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">git submodule update --init --recursive</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Hugo Site Build</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#e6db74">&#34;harbor.example.com/homelab/hugo:0.125.4-r3&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">environment</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">commands</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">hugo</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Missing alt text check</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">python:3</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">environment</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">commands</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">pip install lxml beautifulsoup4</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">python3 scripts/alt_text.py ./public/posts/</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Hugo Site Upload</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#e6db74">&#34;harbor.example.com/homelab/hugo:latest&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">environment</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">AWS_ACCESS_KEY_ID</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">from_secret</span>: <span style="color:#ae81ff">access-key</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">AWS_SECRET_ACCESS_KEY</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">from_secret</span>: <span style="color:#ae81ff">secret-key</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">commands</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">s3cmd -c /s3cmd.conf sync -r --delete-removed --delete-after --no-mime-magic ./public/ s3://blog/</span>
</span></span></code></pre></div><p>The main issues are the empty <code>environment</code> keys, as well as the fact that I did
not set any <code>when</code> clause.</p>
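<p>The fix is straightforward: drop the empty <code>environment:</code> keys and give
the pipeline a <code>when</code> clause. Sketched here for just the build step:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml">when:
  - event: push
    branch: master

steps:
  - name: Hugo Site Build
    image: &#34;harbor.example.com/homelab/hugo:0.125.4-r3&#34;
    # No empty environment key here.
    commands:
      - hugo
</code></pre></div>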
<h2 id="conclusion">Conclusion</h2>
<p>And that&rsquo;s it. Again a pretty long one, but I had never written about my CI setup
before and wanted to take this chance to do so, also because I had gotten some
questions on the Fediverse about what a CI actually does, and some interest
in what Woodpecker looks like.</p>
<p>Oh and also, I just have a propensity for long-winded writing. &#x1f605;</p>
<p>With this post, the Woodpecker/CI migration to k8s is done, and I&rsquo;m quite happy
with it. Especially the fact that my CI pipeline steps now get distributed over
the entire cluster instead of just running on the nodes with the agents.</p>
<p>For the next step I will likely take my Gitea instance and migrate it over, but
as this blog post took longer than I thought it would, it might have to wait
until next weekend.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 14: Deploying the Backups</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-14-backup-operator/</link>
      <pubDate>Thu, 23 Jan 2025 21:50:30 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-14-backup-operator/</guid>
      <description>Deploying my backup operator into the cluster</description>
      <content:encoded><![CDATA[<p>Wherein I&rsquo;m finally done with the backups in my Homelab&rsquo;s k8s cluster.</p>
<p>This is part 14 of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>Finally, I&rsquo;m done. After months of writing Python code for my backup operator.
Especially during the middle phase of the implementation, after the initial
planning and design, it felt like a slog. Dozens of tasks, many functions to
implement, and seemingly no end in sight. I&rsquo;m rather elated to finally be able
to write another post in the k8s migration series.</p>
<p>In this post, I will not write much about how the operator is implemented.
Instead, I will give it the same treatment I gave the other apps I&rsquo;ve deployed
into the cluster up to now, as if I hadn&rsquo;t written it myself.</p>
<p>If you&rsquo;re interested in the implementation, take a look at the
<a href="https://blog.mei-home.net/tags/hlbo/">series of posts</a> I wrote about it.</p>
<p>Suffice it to say for now that the operator reads a CRD defining some S3 buckets
and PersistentVolumeClaims to be backed up, and launches Pods which use rclone
and restic to do just that.</p>
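<p>To make that a bit more concrete, here is a purely hypothetical sketch of what
such a per-app object could look like - the actual field names are defined by the
CRD and documented in the implementation series, so treat this as illustration only:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"># Hypothetical shape of a per-app backup definition.
apiVersion: mei-home.net/v1alpha1
kind: HomelabServiceBackup
metadata:
  name: audiobookshelf-backup
  namespace: backups
spec:
  # S3 buckets to back up (names made up for illustration).
  buckets:
    - audiobookshelf
  # PVCs to back up, living in the app&#39;s own namespace.
  volumes:
    - namespace: audiobookshelf
      claim: vol-audiobookshelf
</code></pre></div>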
<h2 id="infrastructure-preparation">Infrastructure Preparation</h2>
<p>Besides the backup operator itself, I also need some additional infrastructure.
The backups themselves use restic with an S3 bucket as a target. I&rsquo;m going with
one bucket per app here. So before I can run the first backups, I need a couple
of S3 users and buckets.</p>
<p>If you would like to read a bit more about my S3 setup, have a look at
<a href="https://blog.mei-home.net/posts/k8s-migration-5-s3-buckets/">this post</a></p>
<p>The first two things needed are the <code>backups</code> and <code>service-backup-user</code> users.
The <code>backups</code> user is the owner of all of the backup buckets, while
<code>service-backup-user</code> is a reduced-permissions user for the actual backups:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">ceph.rook.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CephObjectStoreUser</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">backups</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">rook-cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">store</span>: <span style="color:#ae81ff">rgw-bulk</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">clusterNamespace</span>: <span style="color:#ae81ff">rook-cluster</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">displayName</span>: <span style="color:#e6db74">&#34;Common user for backup buckets&#34;</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">ceph.rook.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CephObjectStoreUser</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">service-backup-user</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">rook-cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">store</span>: <span style="color:#ae81ff">rgw-bulk</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">clusterNamespace</span>: <span style="color:#ae81ff">rook-cluster</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">displayName</span>: <span style="color:#e6db74">&#34;User for service backups&#34;</span>
</span></span></code></pre></div><p>With these two manifests, <a href="https://rook.io/">Rook Ceph</a> will create two S3
users in my bulk storage, which is the part of my Ceph cluster backed by HDDs.</p>
<p>Because I&rsquo;m doing the bucket management itself through Ansible,
I also need to push these credentials to my <a href="https://www.vaultproject.io/">Vault</a>
instance, to make them available during Ansible runs. Although, now that I&rsquo;m
writing this, I&rsquo;m wondering whether Ansible has a k8s Secrets lookup plugin?
Something to look into later.</p>
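<p>For future reference, a sketch of what that could look like with the
<code>kubernetes.core.k8s</code> lookup - untested in my setup, and note that the
returned Secret data would still be base64-encoded:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"># Untested sketch; requires the kubernetes.core collection and cluster
# access from the Ansible controller.
vars:
  backups_user_secret: &#34;{{ lookup(&#39;kubernetes.core.k8s&#39;,
      kind=&#39;Secret&#39;,
      namespace=&#39;rook-cluster&#39;,
      resource_name=&#39;rook-ceph-object-user-rgw-bulk-backups&#39;) }}&#34;
</code></pre></div>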
<p>For pushing Secrets to Vault (and creating Secrets from Vault data), I&rsquo;m using
<a href="https://external-secrets.io/latest/">external-secrets</a>.
Specifically, PushSecret in this case:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">PushSecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">s3-backupsuser</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">rook-cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">deletionPolicy</span>: <span style="color:#ae81ff">Delete</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">refreshInterval</span>: <span style="color:#ae81ff">30m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">secretStoreRefs</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">homelab-vault</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ClusterSecretStore</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">secret</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>:  <span style="color:#ae81ff">rook-ceph-object-user-rgw-bulk-backups</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">match</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">AccessKey</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">remoteRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">remoteKey</span>: <span style="color:#ae81ff">secrets/backups</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">property</span>: <span style="color:#ae81ff">access</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">match</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">SecretKey</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">remoteRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">remoteKey</span>: <span style="color:#ae81ff">secrets/backups</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">property</span>: <span style="color:#ae81ff">secret</span>
</span></span></code></pre></div><p>As always with secrets-related stuff, this is a bit obfuscated.
What this manifest does is take the Secret automatically created by Rook Ceph at
<code>rook-ceph-object-user-rgw-bulk-backups</code> and push the S3 access key
and secret key to the Vault KV store <code>secrets</code>, under the entry <code>backups</code>.</p>
<p>Then I&rsquo;m creating the S3 buckets themselves. I&rsquo;m doing this with the Ansible
<a href="https://docs.ansible.com/ansible/latest/collections/amazon/aws/s3_bucket_module.html">amazon.aws.s3_bucket</a>
module. The Ansible play looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">hosts</span>: <span style="color:#ae81ff">candc</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Play for creating the backup buckets</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">backup</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">vars</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3_access</span>: <span style="color:#e6db74">&#34;{{ lookup(&#39;hashi_vault&#39;, &#39;secret=secret/backups:access token=&#39;+vault_token+&#39; url=&#39;+vault_url) }}&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3_secret</span>: <span style="color:#e6db74">&#34;{{ lookup(&#39;hashi_vault&#39;, &#39;secret=secret/backups:secret token=&#39;+vault_token+&#39; url=&#39;+vault_url) }}&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Create service backup buckets</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">backup</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">buckets</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">amazon.aws.s3_bucket</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">backup-{{ item }}</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">access_key</span>: <span style="color:#e6db74">&#34;{{ s3_access }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secret_key</span>: <span style="color:#e6db74">&#34;{{ s3_secret }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">ceph</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">endpoint_url</span>: <span style="color:#ae81ff">https://s3.example.com</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">policy</span>: <span style="color:#e6db74">&#34;{{ lookup(&#39;ansible.builtin.template&#39;,&#39;bucket-policies/backup-services.json.template&#39;) }}&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">loop</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">audiobookshelf</span>
</span></span></code></pre></div><p>This play first fetches the credentials pushed into Vault with the PushSecret
above, using the Vault lookup plugin. Be cautious when looking for info on Vault in
Ansible: Ansible&rsquo;s own secret storage is unfortunately also called vault.
The lookup uses a Vault token I have to generate on my C&amp;C host before I can do
pretty much anything, which in turn needs credentials not stored on said host.</p>
<p>My backup buckets always follow the <code>backup-$APP</code> convention, and I&rsquo;m iterating
over the apps I need backup buckets for via a loop.
Also worth mentioning is the policy set here, which is the S3 bucket
policy for the new bucket.
It&rsquo;s created from this template:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;Version&#34;</span>: <span style="color:#e6db74">&#34;2012-10-17&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;Statement&#34;</span>: [
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Action&#34;</span>: [
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:GetObject&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:DeleteObject&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:PutObject&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:ListBucket&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:GetBucketLocation&#34;</span>
</span></span><span style="display:flex;"><span>      ],
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Effect&#34;</span>: <span style="color:#e6db74">&#34;Allow&#34;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Resource&#34;</span>: [
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;arn:aws:s3:::backup-{{ item }}/*&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;arn:aws:s3:::backup-{{ item }}&#34;</span>
</span></span><span style="display:flex;"><span>      ],
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Principal&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;AWS&#34;</span>: [
</span></span><span style="display:flex;"><span>          <span style="color:#e6db74">&#34;arn:aws:iam:::user/service-backup-user&#34;</span>
</span></span><span style="display:flex;"><span>        ]
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Action&#34;</span>: [
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:GetObject&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:ListBucket&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:GetBucketLocation&#34;</span>
</span></span><span style="display:flex;"><span>      ],
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Effect&#34;</span>: <span style="color:#e6db74">&#34;Allow&#34;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Resource&#34;</span>: [
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;arn:aws:s3:::backup-{{ item }}/*&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;arn:aws:s3:::backup-{{ item }}&#34;</span>
</span></span><span style="display:flex;"><span>      ],
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Principal&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;AWS&#34;</span>: [
</span></span><span style="display:flex;"><span>          <span style="color:#e6db74">&#34;arn:aws:iam:::user/external-backup-user&#34;</span>
</span></span><span style="display:flex;"><span>        ]
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  ]
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Through the magic of Jinja2 and some naming conventions, this policy template
allows my service backup user to access all of the S3 APIs needed by restic,
meaning read and write access. The second user, <code>external-backup-user</code>, is the
user I use to run backups to an external HDD. It is more restricted than
the service backup user because it only needs read access and never writes to
the backup buckets.</p>
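<p>For completeness: <code>external-backup-user</code> is created the same way as the
two users above. A sketch, mirroring those manifests:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"># Sketch, following the pattern of the two CephObjectStoreUsers above.
apiVersion: ceph.rook.io/v1
kind: CephObjectStoreUser
metadata:
  name: external-backup-user
  namespace: rook-cluster
spec:
  store: rgw-bulk
  clusterNamespace: rook-cluster
  displayName: &#34;User for backups to the external HDD&#34;
</code></pre></div>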
<p>Short aside: Why use Ansible for the bucket creation, instead of Rook&rsquo;s
<a href="https://rook.io/docs/rook/latest-release/Storage-Configuration/Object-Storage-RGW/ceph-object-bucket-claim/">ObjectBucketClaim</a>?
Simple answer: Because of policies. Until very recently, there was no way to
configure a bucket policy via an ObjectBucketClaim, so I would have needed to
reach for Ansible or something else anyway. That&rsquo;s why I decided to go ahead and do the
bucket creation with Ansible as well.</p>
<p>Just for completeness&rsquo; sake, I also created an ExternalSecret for my restic
backup password:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1beta1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ExternalSecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;restic&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">service-backups</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">secretStoreRef</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">homelab-vault</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ClusterSecretStore</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">refreshInterval</span>: <span style="color:#e6db74">&#34;1h&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">target</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">creationPolicy</span>: <span style="color:#e6db74">&#39;Owner&#39;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">pw</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">remoteRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">secret/restic</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">property</span>: <span style="color:#ae81ff">password</span>
</span></span></code></pre></div><p>Incidentally, looking at the SecretStore name: I really need to stop prefixing
everything with &ldquo;homelab&rdquo; or &ldquo;hl&rdquo;. &#x1f605;</p>
<p>Last but not least, I also need a sort of scratch volume, to which backed-up
S3 buckets are copied before being slurped up by restic.
It&rsquo;s a PVC that looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">PersistentVolumeClaim</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">vol-service-backup-scratch</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">service-backups</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storageClassName</span>: <span style="color:#ae81ff">homelab-fs</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">storage</span>: <span style="color:#ae81ff">50Gi</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">accessModes</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">ReadWriteMany</span>
</span></span></code></pre></div><p>It needs to be RWX because it&rsquo;s shared among the backups of all apps, rather
than there being one volume per app. So instead of my customary Ceph RBD volume,
it&rsquo;s a CephFS volume.
This is one part of my backup setup I still need to improve: at some point,
fully cloning an S3 bucket to a local disk and then feeding it into restic might
no longer be feasible.</p>
<p>Anyway, that&rsquo;s all the yak shaving necessary, so let&rsquo;s look at the backup operator
itself.</p>
<h2 id="the-operator-deployment">The operator deployment</h2>
<p>Because this is an operator, the first thing to consider is what access it
needs to the k8s API. For this, I defined one Role and one ClusterRole. The
ClusterRole is necessary so the operator can access a number of resources in
all namespaces, while the Role is for things where it only needs access in
its own namespace.</p>
<p>Let&rsquo;s begin with the ClusterRole:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">rbac.authorization.k8s.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ClusterRole</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">hlbo-cluster-role</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">rules</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#75715e"># Needed for Kopf Framework</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiGroups</span>: [<span style="color:#ae81ff">apiextensions.k8s.io]</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>: [<span style="color:#ae81ff">customresourcedefinitions]</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">verbs</span>: [<span style="color:#ae81ff">list, watch]</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiGroups</span>: [<span style="color:#e6db74">&#34;&#34;</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>: [<span style="color:#ae81ff">events]</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">verbs</span>: [<span style="color:#ae81ff">create]</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiGroups</span>: [<span style="color:#e6db74">&#34;&#34;</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">namespaces</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">verbs</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">list</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">watch</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiGroups</span>: [<span style="color:#e6db74">&#34;&#34;</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">persistentvolumes</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">persistentvolumeclaims</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">verbs</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">get</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiGroups</span>: [<span style="color:#e6db74">&#34;storage.k8s.io&#34;</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">volumeattachments</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">verbs</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">get</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">list</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiGroups</span>: [<span style="color:#e6db74">&#34;mei-home.net&#34;</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">homelabbackupconfigs</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">homelabservicebackups</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">verbs</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">get</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">watch</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">list</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">patch</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">update</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiGroups</span>: [<span style="color:#e6db74">&#34;batch&#34;</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">jobs</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">verbs</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">get</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">watch</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">list</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">create</span>
</span></span></code></pre></div><p>A number of things in here are requirements of the <a href="https://github.com/nolar/kopf">Kopf framework</a>
I used to implement the operator. It needs to be able to watch CRDs because
it handles them. The HomelabBackupConfigs and HomelabServiceBackups
are the two CRDs I introduced. PersistentVolumeClaims, PersistentVolumes and
VolumeAttachments are needed because those are what the operator backs up.
Both PersistentVolumes and VolumeAttachments are cluster-level resources,
and because PVCs generally live in the namespace of the app using them,
cluster-wide access is required for the operator.
Finally, the cluster-wide access to Jobs is due to a quirk of Kopf.
I really only need to access Jobs in the operator&rsquo;s own namespace, to launch
and monitor them. But I&rsquo;m using Kopf&rsquo;s event handler
mechanism to watch for Job events, so I can react when a Job finishes or fails,
and Kopf only has a global setting for whether it uses the cluster-scoped
or the namespaced APIs. This can&rsquo;t be configured
per handler, only for the entire instance.</p>
<p>So in the end, even though I only need control over Jobs in the operator&rsquo;s
own namespace, I still have to grant cluster-wide access.</p>
<p>Next, the Role:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">rbac.authorization.k8s.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Role</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">hlbo-role</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">backups</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">rules</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiGroups</span>: [<span style="color:#e6db74">&#34;&#34;</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">configmaps</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">verbs</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">get</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">watch</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">list</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">patch</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">update</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">create</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">delete</span>
</span></span></code></pre></div><p>I&rsquo;m creating ConfigMaps for the individual Jobs during the backup run, so I need
access. But in this case, I implemented all of the necessary access myself with
explicit API calls, without Kopf&rsquo;s involvement. This allowed me to scope the
access rights to a single namespace.</p>
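<p>Not shown in this post: both of these need to be bound to the operator&rsquo;s
ServiceAccount. With hypothetical binding and ServiceAccount names, the binding
for the ClusterRole looks along these lines:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"># Hypothetical names; the Role gets a matching RoleBinding in the
# backups namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: hlbo-cluster-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: hlbo-cluster-role
subjects:
  - kind: ServiceAccount
    name: hlbo
    namespace: backups
</code></pre></div>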
<p>Then there&rsquo;s the general backup configuration, which is set with the
HomelabBackupConfig CRD. These are configuration options which don&rsquo;t differ per
app, and so can be set centrally, instead of having a block of similar config
settings in every individual app&rsquo;s backup config.
For my deployment, it looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">mei-home.net/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">HomelabBackupConfig</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">backup-config</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">backups</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">hlbo</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serviceBackup</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">schedule</span>: <span style="color:#e6db74">&#34;30 1 * * *&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">scratchVol</span>: <span style="color:#ae81ff">vol-service-backup-scratch</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3BackupConfig</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">s3Host</span>: <span style="color:#ae81ff">s3.example.com:443</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">s3Credentials</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretName</span>: <span style="color:#ae81ff">s3-backup-buckets-cred</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">accessKeyIDProperty</span>: <span style="color:#ae81ff">AccessKey</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyProperty</span>: <span style="color:#ae81ff">SecretKey</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3ServiceConfig</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">s3Host</span>: <span style="color:#ae81ff">s3.example.com:443</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">s3Credentials</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretName</span>: <span style="color:#ae81ff">s3-backup-buckets-cred</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">accessKeyIDProperty</span>: <span style="color:#ae81ff">AccessKey</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyProperty</span>: <span style="color:#ae81ff">SecretKey</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resticPasswordSecret</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretName</span>: <span style="color:#ae81ff">restic</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">pw</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resticRetentionPolicy</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">daily</span>: <span style="color:#ae81ff">7</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">weekly</span>: <span style="color:#ae81ff">6</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">monthly</span>: <span style="color:#ae81ff">6</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">yearly</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">jobSpec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">jobNS</span>: <span style="color:#e6db74">&#34;backups&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">image</span>: <span style="color:#ae81ff">harbor.example.com/homelab/hn-backup:5.0.0</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">command</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#e6db74">&#34;hn-backup&#34;</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#e6db74">&#34;kube-services&#34;</span>
</span></span></code></pre></div><p>This configures service backups to run every night at 01:30. It also configures
the S3 servers and credentials for both the app&rsquo;s S3 buckets
and the backup buckets. These are currently the same, but if
I ever run two types of S3, e.g. because I decide to add a second Ceph
cluster or a MinIO instance, I can put the service and backup buckets on
different S3 servers.</p>
<p>Also of interest might be the retention policy. This keeps the backups for the
last 7 days, the backups for the Sundays of the last 6 weeks, the backups of the
last day of the month for the last 6 months and finally the backup from December
31st of the previous year.</p>
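<p>The semantics here are restic&rsquo;s; the four values presumably end up as the
matching <code>--keep-*</code> flags of a <code>restic forget</code> call, something like:</p>
<pre tabindex="0"><code>restic forget --keep-daily 7 --keep-weekly 6 --keep-monthly 6 --keep-yearly 1
</code></pre>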
<p>Finally, there&rsquo;s the definition of the container image and command to run during
individual backups, just in case I ever decide to change my setup for the individual
backups but want to keep the operator going.</p>
<p>And here, finally, the operator&rsquo;s deployment:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">hlbo</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">hlbo</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">hlbo</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">hlbo</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#e6db74">&#34;Recreate&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">hlbo</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">hlbo</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">serviceAccountName</span>: <span style="color:#ae81ff">hlbo-account</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">hl-backup-operator</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">harbor.example.com/homelab/hl-backup-operator:1.1.0</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-A&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-v&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-d&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">imagePullPolicy</span>: <span style="color:#ae81ff">Always</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">50m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">50Mi</span>
</span></span></code></pre></div><p>The <code>imagePullPolicy: Always</code> is mostly for the current, still somewhat &ldquo;beta&rdquo;
phase of use, so I can easily switch to using <code>:dev</code> images. The args are all
for Kopf: <code>-A</code> tells it to watch the whole cluster instead of a single
namespace, while <code>-v</code> and <code>-d</code> enable verbose and debug output.</p>
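<p>In other words, the effective command line inside the container is presumably
something like this (the module path is made up):</p>
<pre tabindex="0"><code>kopf run -A -v -d /app/operator.py
</code></pre>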
<p>That&rsquo;s it, operator deployed. Now onto configuring a backup.</p>
<h2 id="configuring-backups-for-my-audiobookshelf-instance">Configuring backups for my Audiobookshelf instance</h2>
<p><a href="https://www.audiobookshelf.org/">Audiobookshelf</a> was the first user-facing
workload I deployed in k8s after setting up all the monitoring and infrastructure.
It stores everything on a single PersistentVolume, including progress and
listened-to episodes for all of my podcasts. As such, I only need to back up
that single PVC, and I&rsquo;m good to go.</p>
<p>Backups are configured via the HomelabServiceBackup CRD. For my Audiobookshelf,
it looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">mei-home.net/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">HomelabServiceBackup</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">backup-audiobookshelf</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">backupBucketName</span>: <span style="color:#e6db74">&#34;backup-audiobookshelf&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">backups</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">type</span>: <span style="color:#ae81ff">pvc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">abs-data-volume</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">audiobookshelf</span>
</span></span></code></pre></div><p>The only configuration needed is the name of the backup bucket and a list of the S3
buckets and PVCs to be backed up.</p>
<p>In this case, my Audiobookshelf deployment only has a single PVC:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">PersistentVolumeClaim</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">abs-data-volume</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storageClassName</span>: <span style="color:#ae81ff">rbd-bulk</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">accessModes</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">ReadWriteOnce</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">storage</span>: <span style="color:#ae81ff">100Gi</span>
</span></span></code></pre></div><p>The operator will figure out where that volume is currently mounted and launch
a backup Job on that host.</p>
<p>And that&rsquo;s it! The backups are finally working, and by now several weeks&rsquo;
worth of backups have completed successfully. It was a pretty long detour, but I
did have at least some fun writing a small-ish project that I&rsquo;m actually using.</p>
<p>The next installment of this series will come pretty soon, because I&rsquo;m already
done migrating my Drone CI instance on Nomad to a Woodpecker CI instance
on k8s. The only thing left to do is to write the blog post.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Homelab Backup Operator Part III: Running Backups</title>
      <link>https://blog.mei-home.net/posts/backup-operator-3-running-backups/</link>
      <pubDate>Fri, 10 Jan 2025 22:10:52 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/backup-operator-3-running-backups/</guid>
      <description>Implementing the actual backups</description>
      <content:encoded><![CDATA[<p>In the last couple of months, I&rsquo;ve been working on a k8s operator for running
backups of persistent volumes and S3 buckets in my cluster.
Previous installments of the series can be found <a href="https://blog.mei-home.net/tags/hlbo/">here</a>.</p>
<p>And now, I&rsquo;m finally done with it, and over the weekend, I ran the first
successful backups. Time to describe what I&rsquo;ve implemented, why and how.</p>
<h2 id="recap">Recap</h2>
<p>Let&rsquo;s start with a recap. For a more detailed description of the problem,
have a look at <a href="https://blog.mei-home.net/posts/k8s-migration-12-backup-issues/">this post in my k8s migration series</a>.</p>
<p>In short, my previous backup implementation on my Nomad cluster runs a container
on each host in the cluster. This container then checks which jobs run on the
host and backs up the volumes noted in the config file for that job.
This approach would not work on Kubernetes, because k8s does not provide an API
similar to Nomad&rsquo;s <a href="https://developer.hashicorp.com/nomad/docs/schedulers#system-batch">Sysbatch jobs</a>.
Those types of jobs launch a given container on every host in the cluster, with
a run-to-completion setup.
Kubernetes, on the other hand, only knows Jobs, which cannot be run on every host
simultaneously, and DaemonSets, which don&rsquo;t have run-to-completion semantics.</p>
<p>There would have, of course, been the easy way out: Using an existing solution.
But where&rsquo;s the fun in that?</p>
<p>So I decided to take this chance to learn the Kubernetes API a bit better,
and write my own operator. Because I&rsquo;m relatively familiar with Python,
I decided to use the <a href="https://github.com/nolar/kopf">Kopf</a> framework.</p>
<p>The end goal was to have a per-app configuration in the form of a custom resource
definition which tells the operator which volumes and buckets need to be backed
up. Here is an example I used for my tests:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">mei-home.net/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">HomelabServiceBackup</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">test-service-backup</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">testing</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">testing</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">runNow</span>: <span style="color:#e6db74">&#34;12&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">backupBucketName</span>: <span style="color:#e6db74">&#34;backup-operator-testing&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">backups</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">type</span>: <span style="color:#ae81ff">pvc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">mysql-pv-claim</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">testing</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">type</span>: <span style="color:#ae81ff">pvc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wp-pv-claim</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">testing</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">type</span>: <span style="color:#ae81ff">s3</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">service-backup-test</span>
</span></span></code></pre></div><p>This object instructs my operator to back up the MySQL and WordPress volumes
of a WordPress deployment I launched just for testing purposes. It also contains
an S3 bucket that&rsquo;s not used by the deployment and just exists to test that part
of the operator.</p>
<h2 id="high-level-overview">High level overview</h2>
<p>Alright, let&rsquo;s assume that we&rsquo;ve got the above example HomelabServiceBackup (HLSB).
What do I want to happen when a backup is triggered?</p>
<p>On the most basic level, I want two things to happen:</p>
<ol>
<li>The two <code>pvc</code> type entries in the <code>spec.backups</code> list are run through restic
to back them up. This means the backup needs access to those volumes.</li>
<li>The <code>s3</code> type bucket is downloaded to a temporary location, and then restic
is run on that temporary location to make an incremental backup of the bucket.
A sketch of this follows right after the list.</li>
</ol>
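<p>Here&rsquo;s the promised sketch for the second point, assuming boto3 for the
download part. This is illustrative only, not the actual hn-backup code, and the
endpoint is a placeholder:</p>
<pre tabindex="0"><code>import os
import subprocess

import boto3

def backup_bucket(bucket, scratch_dir):
    # Sketch: mirror the bucket into a scratch directory, then run an
    # incremental restic backup over it. Assumes RESTIC_REPOSITORY and
    # RESTIC_PASSWORD are set in the environment.
    s3 = boto3.client(&#34;s3&#34;, endpoint_url=&#34;https://s3.example.com:443&#34;)
    for page in s3.get_paginator(&#34;list_objects_v2&#34;).paginate(Bucket=bucket):
        for obj in page.get(&#34;Contents&#34;, []):
            if obj[&#34;Key&#34;].endswith(&#34;/&#34;):
                continue  # skip directory marker objects
            target = os.path.join(scratch_dir, obj[&#34;Key&#34;])
            os.makedirs(os.path.dirname(target), exist_ok=True)
            s3.download_file(bucket, obj[&#34;Key&#34;], target)
    subprocess.run([&#34;restic&#34;, &#34;backup&#34;, scratch_dir], check=True)
</code></pre>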
<p><strong>BAD THINGS.</strong> This paragraph is the &ldquo;Do as I say, not as I do&rdquo; part of this
post. First of all, running backups on live data is generally a bad idea. You
might end up with inconsistent state in your backup.
Second, there are perfectly good block-level backup capabilities right in Ceph.
With consistency guarantees. But I don&rsquo;t like those. They basically require a
second Ceph cluster as a backup target.
<strong>To reiterate:</strong> What I&rsquo;m doing here is bad. And I know that what I&rsquo;m doing
here is bad. It&rsquo;s working for me, but I&rsquo;m really not advising you to do the same
thing. That&rsquo;s the main reason I will likely never publish the operator I wrote -
I just don&rsquo;t think it&rsquo;s a good idea.</p>
<p>With that out of the way, which steps need to be completed?</p>
<ol>
<li>Determine where each of the <code>pvc</code> type volumes is mounted</li>
<li>Split the volumes into groups by the host they&rsquo;re currently mounted on (sketched after this list)</li>
<li>For each of those groups:
<ul>
<li>Create a ConfigMap with the configuration for that particular group</li>
<li>Create a Job for each group and launch them, in sequence</li>
</ul>
</li>
<li>Determine whether all jobs were successful and update the HLSB object in the
k8s cluster</li>
</ol>
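<p>The grouping in step 2 is plain bookkeeping. A minimal sketch, assuming we
already have (PVC, node) pairs from the attachment lookup described further down:</p>
<pre tabindex="0"><code>from collections import defaultdict

def group_by_host(pvc_nodes):
    # Sketch: one volume group per host, so each host gets exactly one Job.
    groups = defaultdict(list)
    for pvc, node in pvc_nodes:
        groups[node].append(pvc)
    return groups
</code></pre>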
<p>The HLSB object has a <code>status.state</code> property, which can be one of:</p>
<ul>
<li><code>Running</code></li>
<li><code>Success</code></li>
<li><code>Failed</code></li>
</ul>
<p>These are then later used by a Grafana panel using Prometheus data from
kube-state-metrics to show whether all of the backups were successful.</p>
<p>Now let&rsquo;s have a closer look at the above steps.</p>
<h2 id="implementation-details">Implementation details</h2>
<h3 id="finding-volumes-and-hosts">Finding volumes and hosts</h3>
<p>Let&rsquo;s look at the backup list from the example HLSB above again:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">backups</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">type</span>: <span style="color:#ae81ff">pvc</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">mysql-pv-claim</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">testing</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">type</span>: <span style="color:#ae81ff">pvc</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wp-pv-claim</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">testing</span>
</span></span></code></pre></div><p>I&rsquo;m ignoring the <code>s3</code> type entry here, because quite frankly, it&rsquo;s not that
interesting.</p>
<p>For the <code>pvc</code> type entries, the very first step is to determine on which host
they&rsquo;re currently mounted. Because a PVC might be RWO, we cannot just mount
it into the backup Pod while the app using it is running. Instead,
I use a <a href="https://kubernetes.io/docs/concepts/storage/volumes/#hostpath">hostPath</a>
volume to mount the directory where the Ceph CSI driver mounts the volume
into the backup container.</p>
<p>For that to work, I need to know on which host the volume is actually mounted.
And for apps having multiple pods and associated volumes, these may be multiple
hosts. Which presents yet another challenge: Restic, when backing up to a
repository, locks that repository, so there can only ever be a single writer.
My backup buckets are separated by app, so even if an app has multiple volumes
defined, like the example above, I can only ever run one backup at a time.
If multiple volumes happen to be mounted on a single host, that&rsquo;s not a problem.
The backup Job for that host can back up all of them. But if they happen to be mounted
on separate hosts, there need to be multiple Jobs, running one after the other.</p>
<p>So how to get the volumes? With the Kubernetes API. As input for our journey,
we&rsquo;ve got the PVC defined, with its name and namespace, in the list of things
to back up.</p>
<p>So the first action is to fetch the PVC via the Kubernetes API. Because I&rsquo;m
writing async code in Kopf, I&rsquo;m using <a href="https://github.com/tomplus/kubernetes_asyncio">kubernetes_asyncio</a>
instead of the official Kubernetes Python lib.</p>
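<p>A minimal sketch of that first fetch, using in-cluster config and with all
error handling omitted:</p>
<pre tabindex="0"><code>from kubernetes_asyncio import client, config

config.load_incluster_config()

async def fetch_pvc(name, namespace):
    # Sketch: read the PVC named in the HLSB backup entry.
    async with client.ApiClient() as api:
        v1 = client.CoreV1Api(api)
        return await v1.read_namespaced_persistent_volume_claim(name, namespace)
</code></pre>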
<p>Here&rsquo;s what the PVC looks like, with the <code>wp-pv-claim</code> from the example:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;apiVersion&#34;</span>: <span style="color:#e6db74">&#34;v1&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;kind&#34;</span>: <span style="color:#e6db74">&#34;PersistentVolumeClaim&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;metadata&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;labels&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;app&#34;</span>: <span style="color:#e6db74">&#34;wordpress&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;app.kubernetes.io/managed-by&#34;</span>: <span style="color:#e6db74">&#34;Helm&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;homelab/part-of&#34;</span>: <span style="color:#e6db74">&#34;testing&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;wp-pv-claim&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;namespace&#34;</span>: <span style="color:#e6db74">&#34;testing&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;spec&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;accessModes&#34;</span>: [
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;ReadWriteOnce&#34;</span>
</span></span><span style="display:flex;"><span>        ],
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;resources&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;requests&#34;</span>: {
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;storage&#34;</span>: <span style="color:#e6db74">&#34;10Gi&#34;</span>
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;storageClassName&#34;</span>: <span style="color:#e6db74">&#34;rbd-bulk&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;volumeMode&#34;</span>: <span style="color:#e6db74">&#34;Filesystem&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;volumeName&#34;</span>: <span style="color:#e6db74">&#34;pvc-733b8bc9-0a44-446c-a736-3d97ba52f01f&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;status&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;accessModes&#34;</span>: [
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;ReadWriteOnce&#34;</span>
</span></span><span style="display:flex;"><span>        ],
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;capacity&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;storage&#34;</span>: <span style="color:#e6db74">&#34;10Gi&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;phase&#34;</span>: <span style="color:#e6db74">&#34;Bound&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>I removed a couple of pieces which aren&rsquo;t that interesting. With this info in
hand, we can go to the next step, fetching the PersistentVolume backing this
claim. This can also be done pretty easily with the <code>read_persistent_volume</code>
API, which only needs a name as input, because PersistentVolumes are cluster
level resources. The name of the volume backing the claim can be taken from
the <code>spec.volumeName</code> property.</p>
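<p>Continuing the sketch from above, that&rsquo;s a one-liner:</p>
<pre tabindex="0"><code>async def fetch_pv(v1, pvc):
    # Sketch: PersistentVolumes are cluster-scoped, so the name is enough.
    return await v1.read_persistent_volume(pvc.spec.volume_name)
</code></pre>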
<p>The result for the above PVC would look like this, again with unimportant bits
removed:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;apiVersion&#34;</span>: <span style="color:#e6db74">&#34;v1&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;kind&#34;</span>: <span style="color:#e6db74">&#34;PersistentVolume&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;metadata&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;pvc-733b8bc9-0a44-446c-a736-3d97ba52f01f&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;spec&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;accessModes&#34;</span>: [
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;ReadWriteOnce&#34;</span>
</span></span><span style="display:flex;"><span>        ],
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;capacity&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;storage&#34;</span>: <span style="color:#e6db74">&#34;10Gi&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;csi&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;controllerExpandSecretRef&#34;</span>: {
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;rook-csi-rbd-provisioner&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;namespace&#34;</span>: <span style="color:#e6db74">&#34;rook-cluster&#34;</span>
</span></span><span style="display:flex;"><span>            },
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;driver&#34;</span>: <span style="color:#e6db74">&#34;rook-ceph.rbd.csi.ceph.com&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;fsType&#34;</span>: <span style="color:#e6db74">&#34;ext4&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;nodeStageSecretRef&#34;</span>: {
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;rook-csi-rbd-node&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;namespace&#34;</span>: <span style="color:#e6db74">&#34;rook-cluster&#34;</span>
</span></span><span style="display:flex;"><span>            },
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;volumeAttributes&#34;</span>: {
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;clusterID&#34;</span>: <span style="color:#e6db74">&#34;rook-cluster&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;imageFeatures&#34;</span>: <span style="color:#e6db74">&#34;layering,exclusive-lock,object-map,fast-diff&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;imageName&#34;</span>: <span style="color:#e6db74">&#34;csi-vol-3361c6d5-4269-4ab2-bc14-771420b768a7&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;journalPool&#34;</span>: <span style="color:#e6db74">&#34;rbd-bulk&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;pool&#34;</span>: <span style="color:#e6db74">&#34;rbd-bulk&#34;</span>
</span></span><span style="display:flex;"><span>            },
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;volumeHandle&#34;</span>: <span style="color:#e6db74">&#34;0001-000c-rook-cluster-0000000000000003-3361c6d5-4269-4ab2-bc14-771420b768a7&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;persistentVolumeReclaimPolicy&#34;</span>: <span style="color:#e6db74">&#34;Retain&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;storageClassName&#34;</span>: <span style="color:#e6db74">&#34;rbd-bulk&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;volumeMode&#34;</span>: <span style="color:#e6db74">&#34;Filesystem&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;status&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;phase&#34;</span>: <span style="color:#e6db74">&#34;Bound&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>One potentially useful side-note: The <code>spec.csi.volumeAttributes.imageName</code>
property is the name of the backing RBD volume in Ceph.</p>
<p>The third thing we need is the <a href="https://kubernetes.io/docs/reference/kubernetes-api/config-and-storage-resources/volume-attachment-v1/">VolumeAttachment</a>
for the PersistentVolume, which tells us where it is currently mounted.
Sadly, there is no API to look up the attachment for a given
PersistentVolume (or the multiple attachments of the same volume, if it is RWX).
So instead, I&rsquo;m fetching all of the attachments with the <code>list_volume_attachment</code>
API. This one, again, is not namespaced.
Here is the current attachment for the above PersistentVolume:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;apiVersion&#34;</span>: <span style="color:#e6db74">&#34;storage.k8s.io/v1&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;kind&#34;</span>: <span style="color:#e6db74">&#34;VolumeAttachment&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;metadata&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;creationTimestamp&#34;</span>: <span style="color:#e6db74">&#34;2024-12-29T10:44:46Z&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;csi-8aee698fd97659b400535fa69969815fad87d2b761d69625d04afc95d53bf252&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;resourceVersion&#34;</span>: <span style="color:#e6db74">&#34;152545692&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;uid&#34;</span>: <span style="color:#e6db74">&#34;6cbe234b-e2c7-4596-a4b6-03d66eb45f5f&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;spec&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;attacher&#34;</span>: <span style="color:#e6db74">&#34;rook-ceph.rbd.csi.ceph.com&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;nodeName&#34;</span>: <span style="color:#e6db74">&#34;sehith&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;source&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;persistentVolumeName&#34;</span>: <span style="color:#e6db74">&#34;pvc-733b8bc9-0a44-446c-a736-3d97ba52f01f&#34;</span>
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;status&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;attached&#34;</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>The <code>spec.nodeName</code> provides us with what we need: The name of the host where
the volume is currently mounted.</p>
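<p>The lookup itself is then a simple scan over the listed attachments. Sketched:</p>
<pre tabindex="0"><code>async def nodes_for_pv(storage_v1, pv_name):
    # Sketch: no server-side filter exists, so list everything and match
    # on the PV name. RWX volumes may yield more than one node.
    attachments = await storage_v1.list_volume_attachment()
    return [
        att.spec.node_name
        for att in attachments.items
        if att.spec.source.persistent_volume_name == pv_name
    ]
</code></pre>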
<p>Next, how to figure out which <code>hostPath</code> to use to mount that volume into the
backup container? That&rsquo;s done with this small Python function:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">get_ceph_csi_host_path</span>(pv):
</span></span><span style="display:flex;"><span>    volume_handle <span style="color:#f92672">=</span> pv<span style="color:#f92672">.</span>spec<span style="color:#f92672">.</span>csi<span style="color:#f92672">.</span>volume_handle
</span></span><span style="display:flex;"><span>    driver <span style="color:#f92672">=</span> pv<span style="color:#f92672">.</span>spec<span style="color:#f92672">.</span>csi<span style="color:#f92672">.</span>driver
</span></span><span style="display:flex;"><span>    vol_id_digest <span style="color:#f92672">=</span> sha256(bytes(volume_handle, <span style="color:#e6db74">&#39;utf-8&#39;</span>))<span style="color:#f92672">.</span>hexdigest()
</span></span><span style="display:flex;"><span>    p <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/&#34;</span><span style="color:#f92672">.</span>join([
</span></span><span style="display:flex;"><span>        CSI_MOUNT_PREFIX,
</span></span><span style="display:flex;"><span>        driver,
</span></span><span style="display:flex;"><span>        vol_id_digest,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;globalmount&#34;</span>,
</span></span><span style="display:flex;"><span>        volume_handle
</span></span><span style="display:flex;"><span>    ])
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> p
</span></span></code></pre></div><p>It takes the PersistentVolume as input, as well as the <code>CSI_MOUNT_PREFIX</code>, which
is <code>/var/lib/kubelet/plugins/kubernetes.io/csi</code>. In addition, there is a hash of
the <code>spec.csi.volume_handle</code> in the path. The full mount path looks like this:</p>
<pre tabindex="0"><code>/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/fb3f47df032796f8ee3f021a858f09772c60bf6b30a75288a4887852a59b071f/globalmount/0001-000c-rook-cluster-0000000000000003-3361c6d5-4269-4ab2-bc14-771420b768a7
</code></pre><p>And yes, for some reason the path contains the volume&rsquo;s <code>volume_handle</code> once in
plain form and once in hashed form. I have no idea what the reason behind that is.
It&rsquo;s also worth noting that this path layout is specific to the Ceph CSI driver; the
paths for other drivers would look different.</p>
<h3 id="creating-the-configuration-file">Creating the configuration file</h3>
<p>Because we&rsquo;ve only got two volumes in our example HLSB, let&rsquo;s assume that both
of them are mounted on the same host. So this particular backup would only need
to run a single Job. That Job needs to be told what it&rsquo;s supposed to back up,
which I&rsquo;m doing by creating a fresh ConfigMap for the job. An example for the
two volumes in our example HLSB would look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ConfigMap</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">hlsb-conf.yaml</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    retention:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      daily: 7
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      monthly: 6
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      weekly: 6
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      yearly: 1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    volumes:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    - name: testing-mysql-pv-claim
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    - name: testing-wp-pv-claim</span>
</span></span></code></pre></div><p>This config describes the retention policy and the volumes for this backup.
The retention policy is one of the shortcuts I took. It&rsquo;s actually more of a
global config, which I would normally provide to the backup Job via environment
variables. But because the retention is not just a simple single value, I
decided that it&rsquo;s just easier to add it to the config file, even though it&rsquo;s not
specific to the currently executed backup Job.</p>
<p>The entries in the <code>volumes:</code> list are the PVC&rsquo;s namespace and name joined
with a dash, e.g. <code>testing-mysql-pv-claim</code>. These are also the names of the
directories under which the volumes are mounted into the backup container.</p>
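<p>Or, as a one-line sketch, given a PVC object from the Kubernetes client:</p>
<pre tabindex="0"><code>def volume_dir_name(pvc):
    # Sketch: matches the testing-mysql-pv-claim naming above.
    return f&#34;{pvc.metadata.namespace}-{pvc.metadata.name}&#34;
</code></pre>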
<h3 id="creating-the-job">Creating the Job</h3>
<p>As I&rsquo;ve noted above, each host where one of the app&rsquo;s volumes is mounted gets a
Job. These Jobs only have one Pod, running a relatively simple Python app that
reads the config file and runs <code>restic backup</code> on the mount directories of all
the volumes to be backed up.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;apiVersion&#34;</span>: <span style="color:#e6db74">&#34;batch/v1&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;kind&#34;</span>: <span style="color:#e6db74">&#34;Job&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;metadata&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;labels&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;hlsb&#34;</span>: <span style="color:#e6db74">&#34;audiobookshelf_backup-audiobookshelf&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;homelab/part-of&#34;</span>: <span style="color:#e6db74">&#34;hlsb&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;audiobookshelf-backup-audiobookshelf-5746d54b-3826-486d-b33f&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;namespace&#34;</span>: <span style="color:#e6db74">&#34;backups&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;spec&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;backoffLimit&#34;</span>: <span style="color:#ae81ff">1</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;completions&#34;</span>: <span style="color:#ae81ff">1</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;parallelism&#34;</span>: <span style="color:#ae81ff">1</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;template&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;spec&#34;</span>: {
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;affinity&#34;</span>: {
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;podAntiAffinity&#34;</span>: {
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">&#34;requiredDuringSchedulingIgnoredDuringExecution&#34;</span>: [
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;labelSelector&#34;</span>: {
</span></span><span style="display:flex;"><span>                                    <span style="color:#f92672">&#34;matchLabels&#34;</span>: {
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;homelab/part-of&#34;</span>: <span style="color:#e6db74">&#34;hlsb&#34;</span>
</span></span><span style="display:flex;"><span>                                    }
</span></span><span style="display:flex;"><span>                                },
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;topologyKey&#34;</span>: <span style="color:#e6db74">&#34;kubernetes.io/hostname&#34;</span>
</span></span><span style="display:flex;"><span>                            }
</span></span><span style="display:flex;"><span>                        ]
</span></span><span style="display:flex;"><span>                    }
</span></span><span style="display:flex;"><span>                },
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;containers&#34;</span>: [
</span></span><span style="display:flex;"><span>                    {
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">&#34;command&#34;</span>: [
</span></span><span style="display:flex;"><span>                            <span style="color:#e6db74">&#34;hn-backup&#34;</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#e6db74">&#34;kube-services&#34;</span>
</span></span><span style="display:flex;"><span>                        ],
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">&#34;env&#34;</span>: [
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_S3_BACKUP_HOST&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;s3-k8s.mei-home.net:443&#34;</span>
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_S3_SERVICE_HOST&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;s3-k8s.mei-home.net:443&#34;</span>
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_BACKUP_BUCKET&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;backup-audiobookshelf&#34;</span>
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_S3_SCRATCH_VOL_DIR&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;/hlsb-mounts/backup-s3-scratch&#34;</span>
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_VOL_MOUNT_DIR&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;/hlsb-mounts&#34;</span>
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_NAME&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;backup-audiobookshelf&#34;</span>
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_NS&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;audiobookshelf&#34;</span>
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_CONFIG&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;/hlsb-mounts/hlsb-conf.yaml&#34;</span>
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_S3_BACKUP_ACCESS_KEY_ID&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;valueFrom&#34;</span>: {
</span></span><span style="display:flex;"><span>                                    <span style="color:#f92672">&#34;secretKeyRef&#34;</span>: {
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;key&#34;</span>: <span style="color:#e6db74">&#34;AccessKey&#34;</span>,
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;s3-backup-buckets-cred&#34;</span>,
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;optional&#34;</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>                                    }
</span></span><span style="display:flex;"><span>                                }
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_S3_BACKUP_SECRET_KEY&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;valueFrom&#34;</span>: {
</span></span><span style="display:flex;"><span>                                    <span style="color:#f92672">&#34;secretKeyRef&#34;</span>: {
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;key&#34;</span>: <span style="color:#e6db74">&#34;SecretKey&#34;</span>,
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;s3-backup-buckets-cred&#34;</span>,
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;optional&#34;</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>                                    }
</span></span><span style="display:flex;"><span>                                }
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_S3_SERVICE_ACCESS_KEY_ID&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;valueFrom&#34;</span>: {
</span></span><span style="display:flex;"><span>                                    <span style="color:#f92672">&#34;secretKeyRef&#34;</span>: {
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;key&#34;</span>: <span style="color:#e6db74">&#34;AccessKey&#34;</span>,
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;s3-backup-buckets-cred&#34;</span>,
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;optional&#34;</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>                                    }
</span></span><span style="display:flex;"><span>                                }
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_S3_SERVICE_SECRET_KEY&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;valueFrom&#34;</span>: {
</span></span><span style="display:flex;"><span>                                    <span style="color:#f92672">&#34;secretKeyRef&#34;</span>: {
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;key&#34;</span>: <span style="color:#e6db74">&#34;SecretKey&#34;</span>,
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;s3-backup-buckets-cred&#34;</span>,
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;optional&#34;</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>                                    }
</span></span><span style="display:flex;"><span>                                }
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_RESTIC_PW&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;valueFrom&#34;</span>: {
</span></span><span style="display:flex;"><span>                                    <span style="color:#f92672">&#34;secretKeyRef&#34;</span>: {
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;key&#34;</span>: <span style="color:#e6db74">&#34;pw&#34;</span>,
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;restic-pw&#34;</span>,
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;optional&#34;</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>                                    }
</span></span><span style="display:flex;"><span>                                }
</span></span><span style="display:flex;"><span>                            }
</span></span><span style="display:flex;"><span>                        ],
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">&#34;image&#34;</span>: <span style="color:#e6db74">&#34;harbor.mei-home.net/homelab/hn-backup:5.0.0&#34;</span>,
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;hlsb&#34;</span>,
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">&#34;volumeMounts&#34;</span>: [
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;mountPath&#34;</span>: <span style="color:#e6db74">&#34;/hlsb-mounts/audiobookshelf-abs-data-volume&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;vol-backup-audiobookshelf-abs-data-volume&#34;</span>
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;mountPath&#34;</span>: <span style="color:#e6db74">&#34;/hlsb-mounts&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;vol-backup-confmap&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;readOnly&#34;</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>                            }
</span></span><span style="display:flex;"><span>                        ]
</span></span><span style="display:flex;"><span>                    }
</span></span><span style="display:flex;"><span>                ],
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;nodeSelector&#34;</span>: {
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;kubernetes.io/hostname&#34;</span>: <span style="color:#e6db74">&#34;khepri&#34;</span>
</span></span><span style="display:flex;"><span>                },
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;priorityClassName&#34;</span>: <span style="color:#e6db74">&#34;system-node-critical&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;restartPolicy&#34;</span>: <span style="color:#e6db74">&#34;Never&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;volumes&#34;</span>: [
</span></span><span style="display:flex;"><span>                    {
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">&#34;hostPath&#34;</span>: {
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;path&#34;</span>: <span style="color:#e6db74">&#34;/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/4e3bcff1fd37dd7554102fbe925eef191491c4f5fd7323a4564c4008d86ee967/globalmount/0001-000c-rook-cluster-0000000000000003-642bef40-20b8-4df0-ab2f-6190c6b78d74&#34;</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>                        },
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;vol-backup-audiobookshelf-abs-data-volume&#34;</span>
</span></span><span style="display:flex;"><span>                    },
</span></span><span style="display:flex;"><span>                    {
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">&#34;configMap&#34;</span>: {
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;defaultMode&#34;</span>: <span style="color:#ae81ff">420</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;backup-confmap-audiobookshelf-backup-audiobookshelf&#34;</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;optional&#34;</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>                        },
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;vol-backup-confmap&#34;</span>
</span></span><span style="display:flex;"><span>                    }
</span></span><span style="display:flex;"><span>                ]
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>This one doesn&rsquo;t match the HLSB I&rsquo;ve been using as an example, but I hope you
can forgive that oversight: I forgot to save the JSON for one of the jobs I ran
against my example HLSB.</p>
<p>Let&rsquo;s start with the <code>metadata</code> property:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span><span style="color:#e6db74">&#34;metadata&#34;</span><span style="color:#960050;background-color:#1e0010">:</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;labels&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;hlsb&#34;</span>: <span style="color:#e6db74">&#34;audiobookshelf_backup-audiobookshelf&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;homelab/part-of&#34;</span>: <span style="color:#e6db74">&#34;hlsb&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;audiobookshelf-backup-audiobookshelf-5746d54b-3826-486d-b33f&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;namespace&#34;</span>: <span style="color:#e6db74">&#34;backups&#34;</span>,
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>For identification purposes, all pieces belonging to a certain HLSB carry that
HLSB&rsquo;s namespace and name in the <code>hlsb</code> label. In addition, they&rsquo;re all marked as
<code>part-of</code> the Homelab service backup, following my general labeling scheme.
The name of the Job again contains the namespace and name of the HLSB and is
capped off by a random string. It is generated like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> uuid
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">get_new_job_name</span>(hlsb_name, hlsb_namespace):
</span></span><span style="display:flex;"><span>    name <span style="color:#f92672">=</span> <span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;</span><span style="color:#e6db74">{</span>hlsb_namespace<span style="color:#e6db74">}</span><span style="color:#e6db74">-</span><span style="color:#e6db74">{</span>hlsb_name<span style="color:#e6db74">}</span><span style="color:#e6db74">-</span><span style="color:#e6db74">{</span>uuid<span style="color:#f92672">.</span>uuid4()<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>    truncated_name <span style="color:#f92672">=</span> name[<span style="color:#ae81ff">0</span>:<span style="color:#ae81ff">61</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> truncated_name[<span style="color:#f92672">-</span><span style="color:#ae81ff">1</span>] <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;-&#34;</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> truncated_name[<span style="color:#ae81ff">0</span>:<span style="color:#f92672">-</span><span style="color:#ae81ff">1</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">else</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> truncated_name
</span></span></code></pre></div><p>Creating this name was a lot more complicated than I anticipated. Because I
don&rsquo;t currently have any integration tests against a real k8s cluster, this
function was a surprising source of issues. To begin with, the name of a Job
can be at most 63 characters long, so appending the full UUID led to errors
during initial testing. Then I thought I had it, with my test HLSB running
backups successfully. And then I implemented the above HLSB, for my
<a href="https://www.audiobookshelf.org/">Audiobookshelf</a> deployment, and found
that the cutoff at 61 chars left the name ending in a <code>-</code>, which
k8s doesn&rsquo;t allow either; hence the check whether the name ends in <code>-</code>. &#x1f926;</p>
<p>Another thing worth mentioning: The backup jobs run in my <code>backups</code> namespace,
not in the app&rsquo;s namespace. This is mostly so that I can comfortably keep all of
the necessary secrets in a separate namespace.</p>
<p>Then let&rsquo;s continue with the spec, more precisely the affinity I&rsquo;ve set up:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span><span style="color:#e6db74">&#34;affinity&#34;</span><span style="color:#960050;background-color:#1e0010">:</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;podAntiAffinity&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;requiredDuringSchedulingIgnoredDuringExecution&#34;</span>: [
</span></span><span style="display:flex;"><span>            {
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;labelSelector&#34;</span>: {
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;matchLabels&#34;</span>: {
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">&#34;homelab/part-of&#34;</span>: <span style="color:#e6db74">&#34;hlsb&#34;</span>
</span></span><span style="display:flex;"><span>                    }
</span></span><span style="display:flex;"><span>                },
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;topologyKey&#34;</span>: <span style="color:#e6db74">&#34;kubernetes.io/hostname&#34;</span>
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>        ]
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>}<span style="color:#960050;background-color:#1e0010">,</span>
</span></span></code></pre></div><p>This config prevents multiple backup Jobs from running on the same host. That is
necessary because, especially with larger S3 buckets to back up, the rclone
invocation in the backup container can consume quite a lot of resources.
Plus, I just generally didn&rsquo;t want to tax any single node too much.</p>
<p>Next, the node selector, which ensures that the Job runs on the host where
the required volumes are mounted:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span><span style="color:#e6db74">&#34;nodeSelector&#34;</span><span style="color:#960050;background-color:#1e0010">:</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;kubernetes.io/hostname&#34;</span>: <span style="color:#e6db74">&#34;khepri&#34;</span>
</span></span><span style="display:flex;"><span>}<span style="color:#960050;background-color:#1e0010">,</span>
</span></span></code></pre></div><p>This value is computed from the information gathered by the PVC probing
I&rsquo;ve described above. The volumes to be backed up get grouped by the hosts
they&rsquo;re mounted on, and then every resulting group/host gets one Job.</p>
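<p>To make this concrete, here&rsquo;s a minimal sketch of how such a grouping could
look. The function name and the exact shape of the probed volume records are my
illustration, not the operator&rsquo;s actual code:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">from collections import defaultdict

def group_volumes_by_host(probed_volumes):
    # probed_volumes is assumed to be a list of dicts, each with a
    # &#34;host&#34; key (the node the volume is mounted on) and a
    # &#34;host_path&#34; key (the CSI globalmount path).
    # Every resulting group becomes one backup Job, pinned to its
    # host via the nodeSelector shown above.
    groups = defaultdict(list)
    for vol in probed_volumes:
        groups[vol[&#34;host&#34;]].append(vol)
    return groups
</code></pre></div>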
<p>And then the more interesting part, the volumes:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span><span style="color:#e6db74">&#34;volumes&#34;</span><span style="color:#960050;background-color:#1e0010">:</span> [
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;hostPath&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;path&#34;</span>: <span style="color:#e6db74">&#34;/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/4e3bcff1fd37dd7554102fbe925eef191491c4f5fd7323a4564c4008d86ee967/globalmount/0001-000c-rook-cluster-0000000000000003-642bef40-20b8-4df0-ab2f-6190c6b78d74&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;vol-backup-audiobookshelf-abs-data-volume&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;configMap&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;defaultMode&#34;</span>: <span style="color:#ae81ff">420</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;backup-confmap-audiobookshelf-backup-audiobookshelf&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;optional&#34;</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;vol-backup-confmap&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>]
</span></span></code></pre></div><p>The <code>hostPath.path</code> is computed as described above, via the information from the
persistent volume. And the name for the volume is defined as <code>vol-backup-pvc_namespace-pvc_name</code>.
Additionally, the ConfigMap described in the previous section also gets
mounted.</p>
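<p>As a hypothetical sketch, such a volume definition could be assembled with the
Python client models like this (the function name is mine, not the operator&rsquo;s):</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">from kubernetes_asyncio.client import V1HostPathVolumeSource, V1Volume

def create_backup_volume(pvc_namespace, pvc_name, global_mount_path):
    # Wrap the computed CSI globalmount path in a hostPath volume,
    # named after the PVC&#39;s namespace and name.
    host_path = V1HostPathVolumeSource(path=global_mount_path, type=&#34;&#34;)
    return V1Volume(
        name=f&#34;vol-backup-{pvc_namespace}-{pvc_name}&#34;,
        host_path=host_path)
</code></pre></div>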
<p>And finally, the container itself. Let&rsquo;s start with the command and image:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span><span style="color:#e6db74">&#34;command&#34;</span><span style="color:#960050;background-color:#1e0010">:</span> [
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;hn-backup&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;kube-services&#34;</span>
</span></span><span style="display:flex;"><span>]<span style="color:#960050;background-color:#1e0010">,</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">&#34;image&#34;</span><span style="color:#960050;background-color:#1e0010">:</span> <span style="color:#e6db74">&#34;harbor.mei-home.net/homelab/hn-backup:5.0.0&#34;</span><span style="color:#960050;background-color:#1e0010">,</span>
</span></span></code></pre></div><p>I&rsquo;ve kept it pretty simple. And instead of mucking around with lots of command
line switches, the configuration is done via the config file and environment
variables.
I won&rsquo;t say much about the <code>hn-backup</code> program, as it&rsquo;s mainly just a wrapper
around <a href="https://rclone.org/">rclone</a> for fetching S3 buckets to be backed up
and <a href="https://restic.net/">restic</a> for the backups themselves.</p>
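<p>Just to sketch the idea: per service bucket, the core of such a wrapper boils
down to two subprocess calls. This is only an illustration under some assumptions
of mine, namely an rclone remote named <code>services3</code> and restic reading its
password from the environment; the real program naturally does more:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">import subprocess

def backup_s3_bucket(bucket, scratch_dir, restic_repo):
    # Download the service bucket into the scratch volume with rclone,
    # then have restic back up the local copy into the backup repo.
    local_copy = f&#34;{scratch_dir}/{bucket}&#34;
    subprocess.run(
        [&#34;rclone&#34;, &#34;sync&#34;, f&#34;services3:{bucket}&#34;, local_copy],
        check=True)
    subprocess.run(
        [&#34;restic&#34;, &#34;--repo&#34;, restic_repo, &#34;backup&#34;, local_copy],
        check=True)
</code></pre></div>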
<p>The volume mounts look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span><span style="color:#e6db74">&#34;volumeMounts&#34;</span><span style="color:#960050;background-color:#1e0010">:</span> [
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;mountPath&#34;</span>: <span style="color:#e6db74">&#34;/hlsb-mounts/audiobookshelf-abs-data-volume&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;vol-backup-audiobookshelf-abs-data-volume&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;mountPath&#34;</span>: <span style="color:#e6db74">&#34;/hlsb-mounts&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;vol-backup-confmap&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;readOnly&#34;</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>]
</span></span></code></pre></div><p>All mounts are done into the <code>/hlsb-mounts</code> directory in the container, which
is then used by hn-backup to construct the paths to be backed up.</p>
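<p>In other words, hn-backup can simply scan the mount root; a tiny, illustrative
sketch of the idea:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">import os

def paths_to_back_up(mount_dir=&#34;/hlsb-mounts&#34;):
    # Every mounted volume appears as a subdirectory of the mount
    # root; plain files (like the mounted config) are skipped.
    return [e.path for e in os.scandir(mount_dir) if e.is_dir()]
</code></pre></div>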
<p>Then there are the env variables, which I use for the common configuration:
while the ConfigMap contains options relevant only to the current Job, the
env variables carry the settings shared by all Jobs.
These options are defined in the HomelabBackupConfig CRD, an example of which
would look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">mei-home.net/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">HomelabBackupConfig</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">backup-config</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">backups</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">hlbo</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serviceBackup</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">schedule</span>: <span style="color:#e6db74">&#34;30 1 * * *&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">scratchVol</span>: <span style="color:#ae81ff">vol-service-backup-scratch</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3BackupConfig</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">s3Host</span>: <span style="color:#ae81ff">s3-k8s.mei-home.net:443</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">s3Credentials</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretName</span>: <span style="color:#ae81ff">s3-backup-buckets-cred</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">accessKeyIDProperty</span>: <span style="color:#ae81ff">AccessKey</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyProperty</span>: <span style="color:#ae81ff">SecretKey</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3ServiceConfig</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">s3Host</span>: <span style="color:#ae81ff">s3-k8s.mei-home.net:443</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">s3Credentials</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretName</span>: <span style="color:#ae81ff">s3-backup-buckets-cred</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">accessKeyIDProperty</span>: <span style="color:#ae81ff">AccessKey</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyProperty</span>: <span style="color:#ae81ff">SecretKey</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resticPasswordSecret</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretName</span>: <span style="color:#ae81ff">restic-pw</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">pw</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resticRetentionPolicy</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">daily</span>: <span style="color:#ae81ff">7</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">weekly</span>: <span style="color:#ae81ff">6</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">monthly</span>: <span style="color:#ae81ff">6</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">yearly</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">jobSpec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">jobNS</span>: <span style="color:#e6db74">&#34;backups&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">image</span>: <span style="color:#ae81ff">harbor.mei-home.net/homelab/hn-backup:5.0.0</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">command</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#e6db74">&#34;hn-backup&#34;</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#e6db74">&#34;kube-services&#34;</span>
</span></span></code></pre></div><p>This CRD describes options common to all backups, so they don&rsquo;t need to be
repeated in every HomelabServiceBackup manifest.
The most important parts here are the configs for S3 access.</p>
<p><code>s3BackupConfig</code> describes access to the backup buckets to which restic will
write the backup. It contains the host, optionally with port, and how to get
the S3 credentials. Very important to me here was being able to specify not just
the name of the Secret, but also the key inside the Secret to use for the
specific credential, because I&rsquo;ve been pretty annoyed by some Helm charts which
only allow specifying the Secret&rsquo;s name and then expect certain keys to exist.
That makes using generated Secrets, like those created by Ceph Rook for S3
buckets, a real pain.</p>
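<p>Translated into the Python client models, building such an env var entry could
look roughly like this (the helper is my illustration):</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">from kubernetes_asyncio.client import (V1EnvVar, V1EnvVarSource,
                                       V1SecretKeySelector)

def env_from_secret(env_name, secret_name, secret_key):
    # Reference a specific, configurable key inside a named Secret,
    # instead of assuming a fixed key layout.
    selector = V1SecretKeySelector(
        name=secret_name, key=secret_key, optional=False)
    return V1EnvVar(
        name=env_name,
        value_from=V1EnvVarSource(secret_key_ref=selector))
</code></pre></div>
<p>Called as, say, <code>env_from_secret(&#34;HLSB_S3_BACKUP_ACCESS_KEY_ID&#34;, &#34;s3-backup-buckets-cred&#34;, &#34;AccessKey&#34;)</code>,
this yields exactly the kind of env entry shown in the Job above.</p>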
<p>The <code>s3ServiceConfig</code> has exactly the same structure, but provides the
credentials for access to buckets used by services, which might also be backed
up, and which might live on a completely different system. This is the case for
my Nomad cluster apps right now, for example. Their S3 buckets still live on the
baremetal Ceph cluster, while the backup buckets have already been migrated to
the Ceph Rook cluster. And I decided to make such a setup possible here as well,
just in case I wanted to migrate to a different S3 setup at some point.</p>
<p>The <code>resticPasswordSecret</code> describes the encryption password for the restic
backup repos in the individual S3 buckets.</p>
<p>All of this information is put into environment variables on the Pod running
the backup. Let&rsquo;s start with the backup credentials:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_S3_BACKUP_HOST&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;s3-k8s.mei-home.net:443&#34;</span>
</span></span><span style="display:flex;"><span>}<span style="color:#960050;background-color:#1e0010">,</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_S3_BACKUP_ACCESS_KEY_ID&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;valueFrom&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;secretKeyRef&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;key&#34;</span>: <span style="color:#e6db74">&#34;AccessKey&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;s3-backup-buckets-cred&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;optional&#34;</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>}<span style="color:#960050;background-color:#1e0010">,</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_S3_BACKUP_SECRET_KEY&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;valueFrom&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;secretKeyRef&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;key&#34;</span>: <span style="color:#e6db74">&#34;SecretKey&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;s3-backup-buckets-cred&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;optional&#34;</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>}<span style="color:#960050;background-color:#1e0010">,</span>
</span></span></code></pre></div><p>The configs for the S3 service bucket credentials are very similar, so I won&rsquo;t
repeat them here.
One noteworthy thing about the above setup, especially for the Secrets: The
ServiceAccount for the operator does not require access to any Secrets in
its namespace. Of course, that&rsquo;s a bit cosmetic, because the operator is
allowed to launch Jobs, which in turn can access the Secrets. But still,
I found it nice that due to the way I&rsquo;d set things up, the operator itself
would not need to touch any Secrets.</p>
<p>More interesting might be some odds and ends I&rsquo;ve also defined in env variables,
just to make accessing them more convenient.
To my shame, I have to admit that I lied above, when I pretended that I had a
clean separation between generic config going into environment variables and
per-Job configs going into the config file. One piece of per-Job info did end
up in the environment variables, and I have absolutely no idea why I decided
to do that: The name of the backup bucket. No idea why I decided to go
inconsistent just with this one value.</p>
<p>Some other interesting variables:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_S3_SCRATCH_VOL_DIR&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;/hlsb-mounts/backup-s3-scratch&#34;</span>
</span></span><span style="display:flex;"><span>}<span style="color:#960050;background-color:#1e0010">,</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_VOL_MOUNT_DIR&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;/hlsb-mounts&#34;</span>
</span></span><span style="display:flex;"><span>}<span style="color:#960050;background-color:#1e0010">,</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_NAME&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;backup-audiobookshelf&#34;</span>
</span></span><span style="display:flex;"><span>}<span style="color:#960050;background-color:#1e0010">,</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_NS&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;audiobookshelf&#34;</span>
</span></span><span style="display:flex;"><span>}<span style="color:#960050;background-color:#1e0010">,</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_CONFIG&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;/hlsb-mounts/hlsb-conf.yaml&#34;</span>
</span></span><span style="display:flex;"><span>}<span style="color:#960050;background-color:#1e0010">,</span>
</span></span></code></pre></div><p>These provide convenient access to the S3 scratch volume, into which rclone
downloads an entire S3 bucket before restic backs it up.
The HLSB&rsquo;s name and namespace also ended up being convenient to have available
in the Pod, if only for some meaningful log outputs. And finally, it&rsquo;s nice to
have the path to the config file available as well.</p>
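<p>Inside the container, hn-backup can then pick these up with plain
<code>os.environ</code> lookups; a minimal sketch:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">import os

# The variable names come from the Job spec shown above; treating
# them as required makes a missing config fail fast.
SCRATCH_DIR = os.environ[&#34;HLSB_S3_SCRATCH_VOL_DIR&#34;]
MOUNT_DIR = os.environ[&#34;HLSB_VOL_MOUNT_DIR&#34;]
HLSB_NAME = os.environ[&#34;HLSB_NAME&#34;]
HLSB_NS = os.environ[&#34;HLSB_NS&#34;]
CONFIG_PATH = os.environ[&#34;HLSB_CONFIG&#34;]
</code></pre></div>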
<p>And that&rsquo;s it: the entire Job. I&rsquo;ve long thought about providing some
code snippets used for creating the <a href="https://kubernetes-asyncio.readthedocs.io/en/latest/kubernetes_asyncio.client.models.v1_job.html">V1Job</a>,
but honestly, it&rsquo;s just not very interesting. It took me a while to get right,
but in the end it was all just value assignments.
Here&rsquo;s an example, the function which creates the Pod Volume spec for the
scratch volume:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> logging
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> kubernetes_asyncio.client <span style="color:#f92672">import</span> V1PersistentVolumeClaimVolumeSource, V1Volume
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># S3_SCRATCH_VOL_NAME is a module-level constant defined elsewhere.</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">create_s3_scratch_volume</span>(backup_conf_spec):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> <span style="color:#e6db74">&#34;scratchVol&#34;</span> <span style="color:#f92672">not</span> <span style="color:#f92672">in</span> backup_conf_spec:
</span></span><span style="display:flex;"><span>        logging<span style="color:#f92672">.</span>error(<span style="color:#e6db74">&#34;Did not find scratchVol in backup config.&#34;</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> <span style="color:#66d9ef">None</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    pvc <span style="color:#f92672">=</span> V1PersistentVolumeClaimVolumeSource(
</span></span><span style="display:flex;"><span>        claim_name<span style="color:#f92672">=</span>backup_conf_spec[<span style="color:#e6db74">&#34;scratchVol&#34;</span>], read_only<span style="color:#f92672">=</span><span style="color:#66d9ef">False</span>)
</span></span><span style="display:flex;"><span>    volume <span style="color:#f92672">=</span> V1Volume(name<span style="color:#f92672">=</span>S3_SCRATCH_VOL_NAME,
</span></span><span style="display:flex;"><span>                      persistent_volume_claim<span style="color:#f92672">=</span>pvc)
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> volume
</span></span></code></pre></div><p>The <code>backup_conf_spec</code> here is the <code>spec.serviceBackup</code> object from the
HomelabBackupConfig I&rsquo;ve shown above. And the rest of the roughly 630 lines it
took me to create the V1Job programmatically look very similar, perhaps
with the occasional <code>if</code> thrown in, but mostly just value assignments and logs.</p>
<p>And because I&rsquo;m a kind man, I will spare you all of it.</p>
<p>But I still want to show you some code I think could be interesting, so let&rsquo;s
jump to the Job execution.</p>
<h3 id="job-execution">Job execution</h3>
<p>The Job itself will get submitted via the Python API again, nothing special
here. But what is special: the current daemon (Kopf&rsquo;s nomenclature for a
long-running change handler that doesn&rsquo;t just run to completion for a specific
event) needs to know when the current Job has finished, in whatever way. For this
I decided to make use of the fact that I was writing asynchronous code. So while the
daemon waited for the Job to finish, it should yield. And luckily, Kopf
already provides a way to watch events from any k8s object type you might
be interested in. So I set up a watcher for events from Jobs:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.on.event</span>(<span style="color:#e6db74">&#39;jobs&#39;</span>, labels<span style="color:#f92672">=</span>{<span style="color:#e6db74">&#39;homelab/part-of&#39;</span>: <span style="color:#e6db74">&#39;hlsb&#39;</span>})
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">job_event_handler</span>(type, status, labels, <span style="color:#f92672">**</span>kwargs):
</span></span><span style="display:flex;"><span>    jobs<span style="color:#f92672">.</span>handle_job_events(type, status, labels)
</span></span></code></pre></div><p>This filters for the events of all Jobs with the <code>homelab/part-of: hlsb</code> label.</p>
<p>The actual handling of events then happens in this function:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">handle_job_events</span>(type, status, labels):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> type <span style="color:#f92672">in</span> [<span style="color:#e6db74">&#34;None&#34;</span>, <span style="color:#e6db74">&#34;DELETED&#34;</span>, <span style="color:#66d9ef">None</span>]:
</span></span><span style="display:flex;"><span>        logging<span style="color:#f92672">.</span>debug(
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Ignored job event:</span><span style="color:#ae81ff">\n</span><span style="color:#e6db74">Status: </span><span style="color:#e6db74">{</span>status<span style="color:#e6db74">}</span><span style="color:#ae81ff">\n</span><span style="color:#e6db74">Labels: </span><span style="color:#e6db74">{</span>labels<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> <span style="color:#e6db74">&#34;hlsb&#34;</span> <span style="color:#f92672">not</span> <span style="color:#f92672">in</span> labels:
</span></span><span style="display:flex;"><span>        logging<span style="color:#f92672">.</span>error(
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;Got event without hlsb label:&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">+</span> <span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;</span><span style="color:#ae81ff">\n</span><span style="color:#e6db74">Status: </span><span style="color:#e6db74">{</span>status<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">+</span> <span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;</span><span style="color:#ae81ff">\n</span><span style="color:#e6db74">Labels: </span><span style="color:#e6db74">{</span>labels<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">else</span>:
</span></span><span style="display:flex;"><span>        ns, name <span style="color:#f92672">=</span> labels[<span style="color:#e6db74">&#34;hlsb&#34;</span>]<span style="color:#f92672">.</span>split(<span style="color:#e6db74">&#34;_&#34;</span>)
</span></span><span style="display:flex;"><span>        job_state <span style="color:#f92672">=</span> get_job_state(status)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">if</span> job_state <span style="color:#f92672">in</span> [JobState<span style="color:#f92672">.</span>COMPLETE, JobState<span style="color:#f92672">.</span>FAILED]:
</span></span><span style="display:flex;"><span>            logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Found finished job for </span><span style="color:#e6db74">{</span>ns<span style="color:#e6db74">}</span><span style="color:#e6db74">/</span><span style="color:#e6db74">{</span>name<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>            set_job_finished_event(ns, name)
</span></span></code></pre></div><p>This function only concerns itself with failed or completed jobs. And if it
finds such a job, it sets a &ldquo;Job finished&rdquo; event. These events are part of the
Python standard library&rsquo;s async synchronization primitives, see <a href="https://docs.python.org/3/library/asyncio-sync.html#asyncio.Event">here</a>.
They&rsquo;re awaitable objects, where the coroutine waiting on an event can be
woken up by executing the <code>event.set</code> method. And that&rsquo;s what happens in the
<code>set_job_finished_event</code> function called when the Job has been detected as
finished.</p>
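<p>As a small illustration of the mechanism, a registry of per-HLSB events could
look like the following. The surrounding names are mine;
<code>set_job_finished_event</code> is the function mentioned above, here in a
hypothetical minimal form:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">import asyncio

# One asyncio.Event per HLSB, keyed by (namespace, name).
_job_events = {}

def _get_event(ns, name):
    # Create the event lazily on first use.
    return _job_events.setdefault((ns, name), asyncio.Event())

def set_job_finished_event(ns, name):
    # Called from the Job event handler; wakes the waiting daemon.
    _get_event(ns, name).set()

async def wait_for_job(ns, name):
    # The daemon yields here until the handler sets the event.
    event = _get_event(ns, name)
    await event.wait()
    event.clear()  # re-arm for the next backup run
</code></pre></div>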
<p>So how to determine whether a k8s Job has finished, failed or is still running?
Took me a while to figure out, but the safest way seems to be to look at the
<code>Job.status.conditions</code> array. If the <code>status</code> doesn&rsquo;t have that member at all,
it&rsquo;s a pretty good bet that the Job is running or pending.
Then you can iterate over the conditions: if a condition has <code>type</code>
<code>Failed</code> and <code>status</code> <code>True</code>, the Job has failed; if <code>type</code> is <code>Complete</code>
with <code>status</code> <code>True</code>, it has finished successfully. Here&rsquo;s an example:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span><span style="color:#e6db74">&#34;conditions&#34;</span><span style="color:#960050;background-color:#1e0010">:</span> [
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;lastProbeTime&#34;</span>: <span style="color:#e6db74">&#34;2025-01-10T01:30:23Z&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;lastTransitionTime&#34;</span>: <span style="color:#e6db74">&#34;2025-01-10T01:30:23Z&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;status&#34;</span>: <span style="color:#e6db74">&#34;True&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;Complete&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>]
</span></span></code></pre></div><p>And here&rsquo;s how that looks in Python:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">get_job_state</span>(job_status):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> <span style="color:#e6db74">&#34;conditions&#34;</span> <span style="color:#f92672">not</span> <span style="color:#f92672">in</span> job_status <span style="color:#f92672">or</span> <span style="color:#f92672">not</span> job_status[<span style="color:#e6db74">&#34;conditions&#34;</span>]:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> JobState<span style="color:#f92672">.</span>RUNNING
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">for</span> cond <span style="color:#f92672">in</span> job_status[<span style="color:#e6db74">&#34;conditions&#34;</span>]:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">if</span> cond[<span style="color:#e6db74">&#34;type&#34;</span>] <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;Failed&#34;</span> <span style="color:#f92672">and</span> cond[<span style="color:#e6db74">&#34;status&#34;</span>] <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;True&#34;</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">return</span> JobState<span style="color:#f92672">.</span>FAILED
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">elif</span> cond[<span style="color:#e6db74">&#34;type&#34;</span>] <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;Complete&#34;</span> <span style="color:#f92672">and</span> cond[<span style="color:#e6db74">&#34;status&#34;</span>] <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;True&#34;</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">return</span> JobState<span style="color:#f92672">.</span>COMPLETE
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> JobState<span style="color:#f92672">.</span>RUNNING
</span></span></code></pre></div><h2 id="conclusion">Conclusion</h2>
<p>And that&rsquo;s it. To be completely honest, this is the third time I&rsquo;m typing this
conclusion, and I almost <code>rm -rf</code>&rsquo;d this post multiple times. I don&rsquo;t think
it&rsquo;s that good or engaging. It seems I&rsquo;m just not that good at writing programming
blog posts. I hope those of you who made it to this point still got something
out of it.</p>
<p>So, time to do a recap: What did this bring me? And was it a good idea?
It all started out with my burning wish to just copy+paste my backup mechanism
from Nomad to Kubernetes, more-or-less verbatim. Add to that the fact that I
don&rsquo;t get to do much programming at $dayjob, and I was just missing it a bit.
Honestly, if someone were to ask me &ldquo;What&rsquo;s your most-used programming language?&rdquo;,
my honest answer would need to be &ldquo;Whatever Atlassian calls JIRA&rsquo;s markup language.&rdquo;</p>
<p>But I also learned quite a bit. I had never really worked with the k8s API
before, and this was a good way to dive deeper into it. Although I&rsquo;m not really
convinced that knowing I can write small operators isn&rsquo;t just a tad bit
dangerous. &#x1f62c;</p>
<p>My first commit to the repo was on May 9th, 2024. Adding it all up, this took
me nine months to do. With rather long interruptions at times, but most of those
were more due to motivation than anything else. If I had just used something
existing, I would have the k8s migration done by now. But where&rsquo;s the fun in
that?</p>
<p>There&rsquo;s still a lot I would like to refactor in the implementation. For example,
those of you who know the k8s API probably wondered why I went with async events
instead of just creating a &ldquo;watch&rdquo; on the Jobs and waiting for them to finish that
way. I&rsquo;m honestly not sure, but I would like to dive into k8s API watches.
Then there&rsquo;s the UT code: there&rsquo;s so much repetition in those tests,
especially in the mocks. And there are still a lot of hardcoded constants in
the code I&rsquo;d like to make configurable via the HomelabBackupConfig or
HomelabServiceBackup.
And finally, there&rsquo;s also my wish to finally go and learn Golang. With this
operator, I&rsquo;ve got a really good-sized first project. And I would have the
advantage that it&rsquo;s not a greenfield project. Most of the design is already done,
so I would be able to concentrate on writing Go.</p>
<p>I will write one more post on the operator, as part of the Nomad to k8s series,
treating it as just another app and describing what the deployment looks like.</p>
<p>And finally, I&rsquo;m quite happy that I&rsquo;m done with this now. I&rsquo;ve been looking
forward to being able to continue the k8s migration for way too long.</p>
<p>My longing for continuing the migration has been getting so bad that I&rsquo;ve started
to miss YAML.</p>
<p>Almost.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 13: Almost one year</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-13-almost-one-year/</link>
      <pubDate>Thu, 15 Aug 2024 00:10:02 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-13-almost-one-year/</guid>
      <description>Motivation hole</description>
      <content:encoded><![CDATA[<p>Wherein I realize that I&rsquo;ve been at this for almost a year now.</p>
<p>This is part 13 of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>It has been quite a while since I last wrote a blog post about the migration.
And I&rsquo;ve realized today that it&rsquo;s been almost a year since I made the decision
to switch over to Kubernetes for my Homelab.
On the 17th of August 2023, I posted about <a href="https://blog.mei-home.net/posts/hashipocalypse/">the HashiPocalypse</a>.</p>
<p>Back then, I laid out my thoughts about HashiCorp&rsquo;s decision to switch all of
their tools, almost all of which I&rsquo;m currently using in my Homelab, to the
BSL license.
At the time, I only announced the switch as an experiment, but it has become a
migration at this point.</p>
<p>I really started the migration in mid-December 2023, so it hasn&rsquo;t really been
almost a year. The first months went pretty well and I got a lot of the initial
setup and infrastructure into place. At the end of April, I was finally done
with all of the infrastructure, from Ceph Rook for storage to kube-prometheus-stack
for metrics. On the 28th of April, I migrated the first piece of what I&rsquo;d call
&ldquo;real workload&rdquo; over, my <a href="https://www.audiobookshelf.org/">Audiobookshelf</a> server.
This then served me as a test bed, first of all for my workload template, but
also for my backups, which came next.</p>
<p>And that&rsquo;s where the problems started, when I finally realized that k8s doesn&rsquo;t
have a combination of <a href="https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/">CronJobs</a>
and <a href="https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/">DaemonSets</a>.
That&rsquo;s a problem because my current backup setup, on my Nomad cluster, uses such
a type of job to run my backups on every node.</p>
<p>But fear not, I thought: I know Python, I know how to access an API, we&rsquo;re just
going to write our own operator for backups!</p>
<p>And that was a mistake, in hindsight. Don&rsquo;t get me wrong, I will still continue
working on the operator, but starting to write it was a mistake. Because I know
how programming projects generally go when I&rsquo;m working on them alone.
My introduction of Nomad into my Homelab was delayed by about half a year because I
decided I didn&rsquo;t like the Nomad CLI and wanted something more like docker-compose.
So I got out the Python and wrote it. And it took ages.</p>
<p>This same thing has also happened here. It&rsquo;s not really a complicated implementation.
But it&rsquo;s about my backups, so I want to do it properly. But the project is getting
dragged out, and I&rsquo;m not sure why. The motivation isn&rsquo;t the same as it was while
doing all of the infra setups at the beginning of the year. These days, I average
about one commit per week, if that much.</p>
<p>The problem is: I really need to get going with the migration. Maintaining what
amounts to two different Homelabs, both hosting important things, is getting a
bit too much.
I mean, will I give up on the plan to implement my backup operator? No, of course
not. Sunk cost fallacy and all that.
But I definitely wish I hadn&rsquo;t started. I would be way farther ahead. I just
need to find where the heck my motivation has gotten to. Perhaps it&rsquo;s just the
summer? My motivation has generally been pretty low when the simple act of typing
a bit more vigorously would already make me break out in a sweat. I really
strongly dislike summer, and I&rsquo;m very much ready for this year&rsquo;s to be over
and done with.</p>
<p>And it has a little bit to do with the ridiculousness of how I write software.
First of all, the first couple of weeks are spent writing copious amounts
of notes. Making diagrams. Spending way more time on project/tooling setup than
is at all reasonable. And then the tests. The ratio of unit test code to production
code is ridiculous. I couldn&rsquo;t write a prototype or MVP if my life depended on it.</p>
<p>So if any one of you meets my motivation, please send it back to me! It will be
the middle-aged guy who looks like his beard really shouldn&rsquo;t be gray yet. You
will recognize it by the large amount of grumbling going on.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Homelab Backup Operator Part II: Basic Framework</title>
      <link>https://blog.mei-home.net/posts/backup-operator-2-basic-framework/</link>
      <pubDate>Sat, 25 May 2024 19:40:00 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/backup-operator-2-basic-framework/</guid>
      <description>My first steps in the operator implementation with kopf</description>
      <content:encoded><![CDATA[<p>In the <a href="https://blog.mei-home.net/posts/backup-operator-1-rbac-issues/">last post</a>
of my <a href="https://blog.mei-home.net/tags/hlbo/">Backup Operator series</a>, I lamented the state
of permissions in the <a href="https://github.com/nolar/kopf">kopf</a> Kubernetes Operator
framework. After some thinking, I decided to go ahead with kopf and just accept
the permission/RBAC ugliness.</p>
<p>I&rsquo;ve just finished implementing the first cluster state change in the operator,
so I thought this was a good point to write a post about my approach and setup.</p>
<p>The journey up to now has been pretty interesting. I learned a bit about the
Kubernetes API, and a lot about how cooperative multitasking with coroutines
works in Python.</p>
<h2 id="why-write-an-entire-operator">Why write an entire operator?</h2>
<p>I&rsquo;ve already written some things about my backup setup in
<a href="https://blog.mei-home.net/posts/k8s-migration-12-backup-issues/">the Kubernetes migration post</a>
which triggered this operator implementation.</p>
<p>Just to give a short refresher: I need to run daily backups on the persistent
volumes and S3 buckets of the services running in my Homelab. I&rsquo;m currently
doing that by launching a run-to-completion job on every one of my Nomad hosts,
which backs up all the volumes that happen to be mounted on that host at the
time.
I can&rsquo;t do that in k8s, because it seems to lack a run-to-completion,
run-on-every-host type of workload. <a href="https://kubernetes.io/docs/concepts/workloads/controllers/job/">Jobs</a>
can do the run-to-completion part, and <a href="https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/">DaemonSets</a>
can do the run-on-every-host part, but there doesn&rsquo;t seem to be a workload type
which can do both in one.
And that&rsquo;s why I&rsquo;ve decided to write my own operator. This approach has two
main benefits compared to my previous one. First, I will be able to explicitly
schedule the second stage of my backup, which copies selected backups onto an
external disk. Right now, I just schedule that phase an hour after the first one.
Second, I will be able to package the backup config for each individual service.
In my current approach, I have the definition of which volumes and buckets to
back up configured in the backup job&rsquo;s config. With the Kubernetes operator, I
will introduce a CRD that can be deployed together with each service, e.g. as
part of the Helm chart.</p>
<h2 id="overview-of-the-approach">Overview of the approach</h2>
<p>As I&rsquo;ve mentioned above, I will write the operator in Python and use the
<a href="https://github.com/nolar/kopf">kopf</a> framework to do it. This is simply
because I&rsquo;m currently familiar with three languages: C++, C and Python. And
Python is the most comfortable of the three.
Due to the RBAC problems I described <a href="https://blog.mei-home.net/posts/backup-operator-1-rbac-issues/">in my last post</a>, I briefly looked into other possibilities. But the Kubernetes ecosystem seems
to mostly live in Golang, which I haven&rsquo;t written anything in yet. And the main
goal currently is to get ahead with the Homelab migration to k8s, not to learn
yet another programming language. &#x1f642;</p>
<p>There will be a total of three custom resources the operator will look for.
The first one, HomelabBackupConfig, will be a one-per-cluster resource and
looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apiextensions.k8s.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CustomResourceDefinition</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">homelabbackupconfigs.mei-home.net</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">scope</span>: <span style="color:#ae81ff">Namespaced</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">group</span>: <span style="color:#ae81ff">mei-home.net</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">names</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">HomelabBackupConfig</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">plural</span>: <span style="color:#ae81ff">homelabbackupconfigs</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">singular</span>: <span style="color:#ae81ff">homelabbackupconfig</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">versions</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">v1alpha1</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">served</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">storage</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">schema</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">openAPIV3Schema</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;This object describes the general configuration of all backups created by the Homelab backup operator.&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">serviceBackup</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The configuration for all service level backups created by the operator instance.&#34;</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">schedule</span>:
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The schedule on which all service level backups will be executed.&#34;</span>
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">scratchVol</span>:
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the PVC for scratch space. Needs to be a RWX volume.&#34;</span>
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">s3BackupConfig</span>:
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;Configuration for S3 access to the backup buckets.&#34;</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">s3Host</span>:
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The S3 server hosting the backup buckets.&#34;</span>
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">s3Credentials</span>:
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The S3 credentials for the backup S3 user.&#34;</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">secretName</span>:
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the Secret containing the credentials.&#34;</span>
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">accessKeyIDProperty</span>:
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the property in the secretName secret with the AWS_ACCESS_KEY_ID&#34;</span>
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">secretKeyProperty</span>:
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the property in the secretName secret with the AWS_SECRET_ACCESS_KEY&#34;</span>
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">s3ServiceConfig</span>:
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;Configuration for S3 access to the service buckets which should be backed up.&#34;</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">s3Host</span>:
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The S3 server hosting the buckets which should be backed up.&#34;</span>
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">s3Credentials</span>:
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The S3 credentials for the service S3 user.&#34;</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">secretName</span>:
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the Secret containing the credentials.&#34;</span>
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">accessKeyIDProperty</span>:
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the property in the secretName secret with the AWS_ACCESS_KEY_ID&#34;</span>
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">secretKeyProperty</span>:
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the property in the secretName secret with the AWS_SECRET_ACCESS_KEY&#34;</span>
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">resticPasswordSecret</span>:
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The Secret with the Restic password for the backups.&#34;</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">secretName</span>:
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the Secret containing the password.&#34;</span>
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">secretKey</span>:
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the property in the secretName Secret which contains the Restic password.&#34;</span>
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">jobSpec</span>:
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;Configuration of the Job launched for each service backup.&#34;</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">image</span>:
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The container image to be used for all service Jobs.&#34;</span>
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">command</span>:
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">array</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The command handed to Job.spec.template.containers.command&#34;</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">items</span>:
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">array</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;Additional entries for the containers.env list. These entries can only be of the name,value variety. Other forms of env entries are not supported for now.&#34;</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">items</span>:
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">name</span>:
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the env variable to add.&#34;</span>
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">value</span>:
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The value of the env variable to add.&#34;</span>
</span></span></code></pre></div><p>This resource configures all of the common settings which will be shared by
all of the individual service backups I will describe next.</p>
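<p>Since the schema above is quite long and abstract, here is what an instantiation
might look like. This is a hypothetical example with made-up names and values, not
my actual config:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml">apiVersion: mei-home.net/v1alpha1
kind: HomelabBackupConfig
metadata:
  name: homelab-backup-config
  namespace: backup-operator
spec:
  serviceBackup:
    schedule: "30 18 * * *"
    scratchVol: backup-scratch
    s3BackupConfig:
      s3Host: s3.example.com
      s3Credentials:
        secretName: backup-s3-creds
        accessKeyIDProperty: accessKey
        secretKeyProperty: secretKey
    s3ServiceConfig:
      s3Host: s3.example.com
      s3Credentials:
        secretName: service-s3-creds
        accessKeyIDProperty: accessKey
        secretKeyProperty: secretKey
    resticPasswordSecret:
      secretName: restic-password
      secretKey: password
    jobSpec:
      image: example.com/homelab/backup-runner:latest
      command: ["/usr/local/bin/run-backup.sh"]
      env:
        - name: LOG_LEVEL
          value: info
</code></pre></div>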
<p>My backups will be running with <a href="https://restic.net/">restic</a>, backing up into
S3 buckets on my Ceph Rook cluster for each service.
As all service level backups will work like this, and back up to the same
S3 service, it makes sense to centralize the configuration, instead of copying
it into every service backup CRD.
This configuration happens in the <code>s3BackupConfig</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">s3BackupConfig</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;Configuration for S3 access to the backup buckets.&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3Host</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The S3 server hosting the backup buckets.&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3Credentials</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The S3 credentials for the backup S3 user.&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretName</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the Secret containing the credentials.&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">accessKeyIDProperty</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the property in the secretName secret with the AWS_ACCESS_KEY_ID&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyProperty</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the property in the secretName secret with the AWS_SECRET_ACCESS_KEY&#34;</span>
</span></span></code></pre></div><p>Flexibility in what the k8s Secrets have to look like is pretty important to me.
I&rsquo;ve been annoyed by some of the Helm charts I use prescribing exactly what
the properties in the Secret need to be named, so the config here lets me define
not only the Secret&rsquo;s name, but also the names of the properties holding the
access and secret keys for the S3 credentials.
The <code>s3ServiceConfig</code> has the same structure, but will be used for the
credentials for accessing the S3 buckets of services, instead of the S3 backup
buckets.</p>
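<p>To illustrate why the configurable property names matter, here is a short sketch
of how the credential lookup might work. This is a hypothetical helper, not the
operator&rsquo;s actual code, and it assumes the official Kubernetes Python client with
its config already loaded (e.g. via <code>config.load_incluster_config()</code>):</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">import base64

from kubernetes import client


def read_s3_credentials(creds_cfg, namespace):
    """Resolve S3 credentials from the Secret named in an s3Credentials block.

    The property names inside the Secret come from the config, so no
    particular naming convention is forced onto the Secret.
    """
    secret = client.CoreV1Api().read_namespaced_secret(
        creds_cfg["secretName"], namespace)

    def decode(prop):
        # Secret data values are base64-encoded by Kubernetes.
        return base64.b64decode(secret.data[prop]).decode()

    return (decode(creds_cfg["accessKeyIDProperty"]),
            decode(creds_cfg["secretKeyProperty"]))
</code></pre></div>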
<p>The <code>resticPasswordSecret</code> is the configuration of the restic password to
unlock the restic encryption keys.</p>
<p>Finally, there&rsquo;s the <code>jobSpec</code>. This will likely still change in the future,
as I have not yet implemented that part. This spec will be used to create the
<a href="https://kubernetes.io/docs/concepts/workloads/controllers/job/">Jobs</a> which
will run the actual backup. One will be created for each of the
HomelabServiceBackup instances I will describe next. I won&rsquo;t go into detail
on this part of the CRD today and will instead save it for when I&rsquo;ve actually
implemented the Job creation.</p>
<p>Then there&rsquo;s the HomelabServiceBackup CRD:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apiextensions.k8s.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CustomResourceDefinition</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">homelabservicebackups.mei-home.net</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">scope</span>: <span style="color:#ae81ff">Namespaced</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">group</span>: <span style="color:#ae81ff">mei-home.net</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">names</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">HomelabServiceBackup</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">plural</span>: <span style="color:#ae81ff">homelabservicebackups</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">singular</span>: <span style="color:#ae81ff">homelabservicebackup</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">versions</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">v1alpha1</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">served</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">storage</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">schema</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">openAPIV3Schema</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;This object describes the configuration of the backups for a specific service.&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">backupBucketName</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the S3 bucket to which the backup should be made.&#34;</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">backups</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">array</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The elements, like PVCs and S3 buckets to back up for this service.&#34;</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">items</span>:
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">type</span>:
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The Type of the element, either s3 or pvc.&#34;</span>
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">enum</span>:
</span></span><span style="display:flex;"><span>                          - <span style="color:#ae81ff">s3</span>
</span></span><span style="display:flex;"><span>                          - <span style="color:#ae81ff">pvc</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">name</span>:
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the element, either the name of an S3 bucket or a PVC&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">status</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;Status of this service backup&#34;</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">nextBackup</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;Date and time of the next backup run&#34;</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">lastBackup</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;Status of latest backup&#34;</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">state</span>:
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">integer</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;State of the last backup. 1: Failed, 0: Successful&#34;</span>
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">timestamp</span>:
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;Date and time the last backup run was executed&#34;</span>
</span></span></code></pre></div><p>This CRD describes the backups to be done for an individual service. It contains
two main parts, the spec and the status. In the spec, I&rsquo;m configuring the
S3 bucket to be used for the backup, and a list of things to back up. Right now,
I&rsquo;ve only got PersistentVolumeClaims and S3 buckets in mind. An instantiation
might look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">mei-home.net/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">HomelabServiceBackup</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">test-service-backup</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">backup-tests</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">hlbo</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">backupBucketName</span>: <span style="color:#e6db74">&#34;non-existant-bucket&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">backups</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">type</span>: <span style="color:#ae81ff">pvc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">non-existant-pvc</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">type</span>: <span style="color:#ae81ff">pvc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">another-non-existant-pvc</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">type</span>: <span style="color:#ae81ff">s3</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">non-existant-S3-bucket</span>
</span></span></code></pre></div><h2 id="kopf-overview">Kopf overview</h2>
<p>Kopf has a relatively nice approach to listening for changes to the resources
it is supposed to watch. It makes use of Kubernetes&rsquo; watch API and then
combines the raw Kubernetes events into a nicer interface than plain events
alone could provide.</p>
<p>The main mechanism is event handlers. These handlers
can be defined for each of four different event categories:</p>
<ol>
<li>Creation of a new resource</li>
<li>Resume of the handler for an already existing resource after an operator
restart</li>
<li>Deletion of a resource</li>
<li>Change of a resource</li>
</ol>
<p>In addition, there are daemons, which are long-running handlers. Instead of
running to completion for every event, they stay active from the moment a
resource is created to the moment it is deleted. They are also automatically
started up again after operator restarts.</p>
<p>Finally, there is a generic event handler, which gets the full firehose of raw
Kubernetes events, without the diffs and other niceties kopf provides
for its event category handlers.</p>
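<p>As a minimal sketch, such a low-level handler might look like this, using the
HomelabServiceBackup CRD from above. The handler body is made up purely for
illustration:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">import kopf


@kopf.on.event('homelabservicebackups')
async def raw_event_logger(event, **_):
    # Receives every raw watch event for the resource, with no diffing
    # or cause detection done by kopf. 'type' can be None for the
    # initial listing events.
    obj = event.get('object', {})
    print(f"{event.get('type')}: {obj.get('metadata', {}).get('name')}")
</code></pre></div>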
<p>The handlers are Python functions with a decorator which describes the
event category they should listen on and the CRD they should listen for. Those
handlers can also be combined, so the same Python function can handle both
creation of a new resource and resumption after an operator restart.</p>
<p>Handlers generally come in two flavors: threads or coroutines.
I spontaneously decided to go with the coroutine approach because I had never
used Python&rsquo;s <a href="https://docs.python.org/3/library/asyncio.html">asyncio</a>
before, though I was already familiar with coroutines from C and C++.</p>
<h2 id="handling-the-homelabbackupconfig-crd">Handling the HomelabBackupConfig CRD</h2>
<p>There isn&rsquo;t too much to do in the generic handling for this CRD. There is
only ever supposed to be one instance of it, and the only thing which needs to
happen is to store it in memory in the operator and make it available to the
handlers of the HomelabServiceBackup CRD, so they can use the config to launch
their Jobs.</p>
<p>The implementation of the handlers themselves I kept pretty simple:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> asyncio
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> kopf
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> hl_backup_operator.homelab_backup_config <span style="color:#66d9ef">as</span> backupconf
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.on.startup</span>()
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">create_backup_config_cond</span>(memo, <span style="color:#f92672">**</span>_):
</span></span><span style="display:flex;"><span>    memo<span style="color:#f92672">.</span>backup_conf_cond <span style="color:#f92672">=</span> asyncio<span style="color:#f92672">.</span>Condition()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.on.create</span>(<span style="color:#e6db74">&#39;homelabbackupconfigs&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.on.resume</span>(<span style="color:#e6db74">&#39;homelabbackupconfigs&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.on.update</span>(<span style="color:#e6db74">&#39;homelabbackupconfigs&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">create_resume_update_handler</span>(spec, meta, memo, <span style="color:#f92672">**</span>kwargs):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">await</span> backupconf<span style="color:#f92672">.</span>handle_creation_and_change(meta[<span style="color:#e6db74">&#34;name&#34;</span>],
</span></span><span style="display:flex;"><span>                                                memo<span style="color:#f92672">.</span>backup_conf_cond, spec)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.on.delete</span>(<span style="color:#e6db74">&#39;homelabbackupconfigs&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">delete_handler</span>(meta, <span style="color:#f92672">**</span>kwargs):
</span></span><span style="display:flex;"><span>    backupconf<span style="color:#f92672">.</span>handle_deletion(meta[<span style="color:#e6db74">&#34;name&#34;</span>])
</span></span></code></pre></div><p>This sets up a combined handler for creation, resumption and updates of the
CRD. It also creates a <a href="https://docs.python.org/3/library/asyncio-sync.html#condition">Condition</a>
which I will later use to notify the HomelabServiceBackup handlers
when the config changes.</p>
<p>The <code>homelab_backup_config</code> module looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> datetime
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> logging
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> croniter
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>__CONFIG <span style="color:#f92672">=</span> <span style="color:#66d9ef">None</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">handle_creation_and_change</span>(name, cond, spec):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">global</span> __CONFIG
</span></span><span style="display:flex;"><span>    __CONFIG <span style="color:#f92672">=</span> spec
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Set backup config from </span><span style="color:#e6db74">{</span>name<span style="color:#e6db74">}</span><span style="color:#e6db74"> to: </span><span style="color:#e6db74">{</span>spec<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">async</span> <span style="color:#66d9ef">with</span> cond:
</span></span><span style="display:flex;"><span>        cond<span style="color:#f92672">.</span>notify_all()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">handle_deletion</span>(name):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">global</span> __CONFIG
</span></span><span style="display:flex;"><span>    __CONFIG <span style="color:#f92672">=</span> <span style="color:#66d9ef">None</span>
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>warning(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Config </span><span style="color:#e6db74">{</span>name<span style="color:#e6db74">}</span><span style="color:#e6db74"> deleted. No backups will be scheduled!&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">get_config</span>():
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> __CONFIG
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">get_next_service_time</span>():
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> <span style="color:#f92672">not</span> __CONFIG:
</span></span><span style="display:flex;"><span>        logging<span style="color:#f92672">.</span>error(<span style="color:#e6db74">&#34;Service schedule time requested, but no config present.&#34;</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> <span style="color:#66d9ef">None</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> (<span style="color:#e6db74">&#34;serviceBackup&#34;</span> <span style="color:#f92672">not</span> <span style="color:#f92672">in</span> __CONFIG
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">or</span> <span style="color:#e6db74">&#34;schedule&#34;</span> <span style="color:#f92672">not</span> <span style="color:#f92672">in</span> __CONFIG[<span style="color:#e6db74">&#34;serviceBackup&#34;</span>]):
</span></span><span style="display:flex;"><span>        logging<span style="color:#f92672">.</span>error(<span style="color:#e6db74">&#34;Config serviceBackup.schedule is missing.&#34;</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> <span style="color:#66d9ef">None</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    now <span style="color:#f92672">=</span> datetime<span style="color:#f92672">.</span>datetime<span style="color:#f92672">.</span>now(datetime<span style="color:#f92672">.</span>timezone<span style="color:#f92672">.</span>utc)
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> croniter<span style="color:#f92672">.</span>croniter(__CONFIG[<span style="color:#e6db74">&#34;serviceBackup&#34;</span>][<span style="color:#e6db74">&#34;schedule&#34;</span>], now)<span style="color:#f92672">.</span>get_next(datetime<span style="color:#f92672">.</span>datetime)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">get_service_backup_spec</span>():
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> <span style="color:#f92672">not</span> __CONFIG <span style="color:#f92672">or</span> <span style="color:#e6db74">&#34;serviceBackup&#34;</span> <span style="color:#f92672">not</span> <span style="color:#f92672">in</span> __CONFIG:
</span></span><span style="display:flex;"><span>        logging<span style="color:#f92672">.</span>error(<span style="color:#e6db74">&#34;Config serviceBackup is missing.&#34;</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> <span style="color:#66d9ef">None</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">else</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> __CONFIG[<span style="color:#e6db74">&#34;serviceBackup&#34;</span>]
</span></span></code></pre></div><p>As I said, I kept it <em>really</em> simple.
This implementation stores the spec as received from the handler in a
module-level variable <code>__CONFIG</code> and then has a couple of functions to make it
available to the rest of the operator.
The only really interesting part is the <code>get_next_service_time</code> function. It
looks at the <code>spec.serviceBackup.schedule</code> value, which is a string in cron
format, for example:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serviceBackup</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">schedule</span>: <span style="color:#e6db74">&#34;30 18 * * *&#34;</span>
</span></span></code></pre></div><p>I decided to keep all times in UTC internally, just to prevent confusing myself.
Instead of writing my own cron parser, I used <a href="https://github.com/kiorky/croniter">croniter</a>.
It doesn&rsquo;t just provide a parser for the cron format, but also provides a helper
to get the time and date of the next scheduled execution, which I make use of
here.</p>
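<p>As a small standalone example of what croniter does for me, here is how the
next run and the resulting wait time can be computed. The schedule string is
made up for illustration:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">import datetime

import croniter

# "30 18 * * *" means: every day at 18:30 (UTC, by my convention).
now = datetime.datetime.now(datetime.timezone.utc)
next_run = croniter.croniter("30 18 * * *", now).get_next(datetime.datetime)
wait_seconds = (next_run - now).total_seconds()
print(f"Next backup at {next_run}, sleeping {wait_seconds:.0f}s")
</code></pre></div>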
<h2 id="implementing-the-homelabservicebackup-handling">Implementing the HomelabServiceBackup handling</h2>
<p>The HomelabServiceBackup resource describes the backup for an individual
service. In the operator, it will ultimately need to launch a Job to run the
backup of the configured PersistentVolumeClaims and S3 buckets belonging to the
service.</p>
<p>The first thing I implemented was the wait for the scheduled execution time
of the backup. For this, I initially thought to use kopf&rsquo;s <a href="https://kopf.readthedocs.io/en/stable/timers/">timers</a>,
but quickly realized that those only allow a fixed interval, while I needed an
adaptable wait, depending on the schedule configured in the HomelabBackupConfig.
For that reason, I reached for kopf&rsquo;s <a href="https://kopf.readthedocs.io/en/stable/daemons/">Daemons</a>.
These are long-running handlers; one is created for each instance of the watched
resource.</p>
<p>The handler function itself is again simple, as I just call a separate function
in a module:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> asyncio
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> kopf
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> hl_backup_operator.homelab_service_backup <span style="color:#66d9ef">as</span> servicebackup
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.on.startup</span>()
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">create_backup_config_cond</span>(memo, <span style="color:#f92672">**</span>_):
</span></span><span style="display:flex;"><span>    memo<span style="color:#f92672">.</span>backup_conf_cond <span style="color:#f92672">=</span> asyncio<span style="color:#f92672">.</span>Condition()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.daemon</span>(<span style="color:#e6db74">&#34;homelabservicebackups&#34;</span>, initial_delay<span style="color:#f92672">=</span><span style="color:#ae81ff">30</span>)
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">service_backup_daemon</span>(name, namespace, spec, memo, stopped, <span style="color:#f92672">**</span>_):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">await</span> servicebackup<span style="color:#f92672">.</span>homelab_service_daemon(name, namespace, spec, memo,
</span></span><span style="display:flex;"><span>                                               stopped)
</span></span></code></pre></div><p>The daemon will spend most of its time waiting, as it only needs to do something
in two cases:</p>
<ol>
<li>When the scheduled time for a backup has arrived</li>
<li>When the backup schedule changes</li>
</ol>
<p>Let&rsquo;s look at the second case first, as it is the reason for using the
memo. The <a href="https://kopf.readthedocs.io/en/stable/memos/">memo</a> is a generic
container handled by kopf and made available to all handlers. I&rsquo;m creating a
Condition in it during operator startup. Every daemon will wait on this condition,
and the handler for HomelabBackupConfig updates will notify all waiters on
that condition when the HomelabBackupConfig changes. This matters because
the schedule is configured in the HomelabBackupConfig, so daemons might need to
adjust their wait timer when it changes.</p>
<p>Here is what that waiting currently looks like:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">class</span> <span style="color:#a6e22e">WakeupReason</span>(Enum):
</span></span><span style="display:flex;"><span>    TIMER <span style="color:#f92672">=</span> auto()
</span></span><span style="display:flex;"><span>    SCHEDULE_UPDATE <span style="color:#f92672">=</span> auto()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">cond_waiter</span>(cond):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">async</span> <span style="color:#66d9ef">with</span> cond:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">await</span> cond<span style="color:#f92672">.</span>wait()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">wait_for</span>(waittime, update_condition):
</span></span><span style="display:flex;"><span>    cond_task <span style="color:#f92672">=</span> asyncio<span style="color:#f92672">.</span>create_task(cond_waiter(update_condition),
</span></span><span style="display:flex;"><span>                                    name<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;condwait&#34;</span>)
</span></span><span style="display:flex;"><span>    sleep_task <span style="color:#f92672">=</span> asyncio<span style="color:#f92672">.</span>create_task(asyncio<span style="color:#f92672">.</span>sleep(waittime), name<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;sleepwait&#34;</span>)
</span></span><span style="display:flex;"><span>    done, pending <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> asyncio<span style="color:#f92672">.</span>wait([cond_task, sleep_task],
</span></span><span style="display:flex;"><span>                                       return_when<span style="color:#f92672">=</span>asyncio<span style="color:#f92672">.</span>FIRST_COMPLETED)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">for</span> p <span style="color:#f92672">in</span> pending:
</span></span><span style="display:flex;"><span>        p<span style="color:#f92672">.</span>cancel()
</span></span><span style="display:flex;"><span>    wake_reasons <span style="color:#f92672">=</span> []
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">for</span> d <span style="color:#f92672">in</span> done:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">if</span> d<span style="color:#f92672">.</span>get_name() <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;condwait&#34;</span>:
</span></span><span style="display:flex;"><span>            wake_reasons<span style="color:#f92672">.</span>append(WakeupReason<span style="color:#f92672">.</span>SCHEDULE_UPDATE)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">elif</span> d<span style="color:#f92672">.</span>get_name() <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;sleepwait&#34;</span>:
</span></span><span style="display:flex;"><span>            wake_reasons<span style="color:#f92672">.</span>append(WakeupReason<span style="color:#f92672">.</span>TIMER)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> wake_reasons
</span></span></code></pre></div><p>As I&rsquo;ve noted before, I&rsquo;m using Python&rsquo;s asyncio module, so instead of threads,
I&rsquo;m using coroutines. Luckily, the Python standard library already provides the
means to wait for multiple tasks and even tell me which task is done waiting
when the function returns. So here, I&rsquo;m creating two tasks. One is waiting on
the given <code>waittime</code>. This is the difference between the current time and the
next scheduled backup, in seconds. The second one is waiting on the condition
I mentioned previously. This condition will be notified by the handler for the
HomelabBackupConfig when that resource changes. This is necessary because the
daemon might need to adjust its wait time if the schedule for backups has changed.</p>
<p>Finally, I check which tasks finished waiting and return a list of enum
values telling the caller why it woke up, so it can take the appropriate action.</p>
<p>Then there&rsquo;s the main loop of the daemon:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">homelab_service_daemon</span>(name, namespace, spec, memo, stopped):
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Launching daemon for </span><span style="color:#e6db74">{</span>namespace<span style="color:#e6db74">}</span><span style="color:#e6db74">/</span><span style="color:#e6db74">{</span>name<span style="color:#e6db74">}</span><span style="color:#e6db74">.&#34;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">while</span> <span style="color:#f92672">not</span> stopped:
</span></span><span style="display:flex;"><span>        logging<span style="color:#f92672">.</span>debug(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;In main loop of </span><span style="color:#e6db74">{</span>namespace<span style="color:#e6db74">}</span><span style="color:#e6db74">/</span><span style="color:#e6db74">{</span>name<span style="color:#e6db74">}</span><span style="color:#e6db74"> with spec: </span><span style="color:#e6db74">{</span>spec<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>        next_run <span style="color:#f92672">=</span> backupconfig<span style="color:#f92672">.</span>get_next_service_time()
</span></span><span style="display:flex;"><span>        wait_time <span style="color:#f92672">=</span> next_run <span style="color:#f92672">-</span> datetime<span style="color:#f92672">.</span>datetime<span style="color:#f92672">.</span>now(datetime<span style="color:#f92672">.</span>timezone<span style="color:#f92672">.</span>utc)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">await</span> wait_for(wait_time<span style="color:#f92672">.</span>total_seconds(), memo<span style="color:#f92672">.</span>backup_conf_cond)
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Finished daemon for </span><span style="color:#e6db74">{</span>namespace<span style="color:#e6db74">}</span><span style="color:#e6db74">/</span><span style="color:#e6db74">{</span>name<span style="color:#e6db74">}</span><span style="color:#e6db74">.&#34;</span>)
</span></span></code></pre></div><p>This doesn&rsquo;t do much at the moment, as I haven&rsquo;t implemented the backups
themselves yet. It runs in an endless loop, checking the <code>stopped</code> variable,
which kopf sets to <code>True</code> if the HomelabServiceBackup this daemon is
handling is deleted or the operator is shut down. Kopf will also throw a
<a href="https://docs.python.org/3/library/asyncio-exceptions.html#asyncio.CancelledError">CancelledError</a>
into the coroutine in those cases, so the daemon is stopped even while it
is waiting.</p>
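<p>The daemon doesn&rsquo;t currently need any cleanup, but if it ever does, that
cancellation could be caught around the wait. A minimal sketch, not part of the
actual code:</p>
<pre tabindex="0"><code>try:
    await wait_for(wait_time.total_seconds(), memo.backup_conf_cond)
except asyncio.CancelledError:
    logging.info(f&#34;Daemon for {namespace}/{name} cancelled while waiting.&#34;)
    raise  # let kopf finish tearing the daemon down
</code></pre>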
<p>The waiting time is computed with the <code>get_next_service_time</code> function I discussed
above.</p>
<h2 id="implementing-status-updates">Implementing status updates</h2>
<p>The goal which triggered this blog post was finally implementing the scheduled
triggering and the updates of the HomelabServiceBackup&rsquo;s status, which
was my first change of cluster state via the operator.</p>
<p>My goal was to have each daemon update a field in its HomelabServiceBackup
resource with the scheduled time of the next backup, which would ultimately
look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">status</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">nextBackup</span>: <span style="color:#e6db74">&#34;2024-05-25T18:30:00+00:00&#34;</span>
</span></span></code></pre></div><p>The <code>status.nextBackup</code> field is what I was interested in setting. I first
looked at the <a href="https://github.com/kubernetes-client/python">Kubernetes Python Client</a>,
but found that it did not support asyncio. But I quickly found
<a href="https://github.com/tomplus/kubernetes_asyncio">kubernetes_asyncio</a>.
An interesting thing I learned while looking at these two libraries is that they
were, for the most part, not hand-written. Instead, they use the <a href="https://github.com/openapitools/openapi-generator">openapi-generator</a>
to automatically generate the API code from the Kubernetes API definition. Which
is pretty cool to see, to be honest. It leads to boatloads of repeated code, but
the alternative of writing all that code by hand probably doesn&rsquo;t bear thinking
about.</p>
<p>Of course, one of the downsides of using the Python API client was that it would
not have API support for the CRDs I&rsquo;ve written for my own cluster. Instead,
I needed to use the generic <a href="https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/CustomObjectsApi.md">CustomObjectsAPI</a>.</p>
<p>Initially, because I wanted to specifically update the status of my resources,
I looked at the <a href="https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/CustomObjectsApi.md#patch_namespaced_custom_object_status">patch_namespaced_custom_object_status</a>
API. But running that API against a resource which did not have the status set
yet just returns a 404. It took me a <em>long while</em> to realize that the 404 was
not due to an error on my end, but simply because the resource needed to have
a status already for the status API to work.</p>
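<p>For reference, a sketch of that failing call, assuming a
<code>CustomObjectsApi</code> instance named <code>mine</code> as in the example
further down; the group, version, Namespace, and resource names are the same
as there:</p>
<pre tabindex="0"><code>res = await mine.patch_namespaced_custom_object_status(
    &#34;mei-home.net&#34;, &#34;v1alpha1&#34;, &#34;backups&#34;, &#34;homelabservicebackups&#34;,
    &#34;test-service-backup&#34;,
    body={&#34;status&#34;: {&#34;nextBackup&#34;: &#34;2024-05-25T18:30:00+00:00&#34;}})
# -&gt; returns a 404 whenever the resource does not have any status yet
</code></pre>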
<p>So instead, I reached for the <a href="https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/CustomObjectsApi.md#patch_namespaced_custom_object">patch_namespaced_custom_object</a>
API. That, too, had a lot of issues, to the point where I began to wonder
whether I was the first person to use the Python API package for accessing
custom objects. All the examples I could find stated that this should work:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> asyncio
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> kubernetes_asyncio <span style="color:#f92672">import</span> client, config
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> kubernetes_asyncio.client.api_client <span style="color:#f92672">import</span> ApiClient
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> pprint <span style="color:#f92672">import</span> pprint
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> json
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">main</span>():
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">await</span> config<span style="color:#f92672">.</span>load_kube_config()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">async</span> <span style="color:#66d9ef">with</span> ApiClient() <span style="color:#66d9ef">as</span> api:
</span></span><span style="display:flex;"><span>        mine <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>CustomObjectsApi(api)
</span></span><span style="display:flex;"><span>        res <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> mine<span style="color:#f92672">.</span>patch_namespaced_custom_object(<span style="color:#e6db74">&#34;mei-home.net&#34;</span>, <span style="color:#e6db74">&#34;v1alpha1&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;backups&#34;</span>, <span style="color:#e6db74">&#34;homelabservicebackups&#34;</span>, <span style="color:#e6db74">&#34;test-service-backup&#34;</span>,
</span></span><span style="display:flex;"><span>                body<span style="color:#f92672">=</span>{<span style="color:#e6db74">&#34;status&#34;</span>:{<span style="color:#e6db74">&#34;lastBackup&#34;</span>: {<span style="color:#e6db74">&#34;state&#34;</span>:<span style="color:#ae81ff">1</span>, <span style="color:#e6db74">&#34;timestamp&#34;</span>:<span style="color:#e6db74">&#34;foobar&#34;</span>}}}
</span></span><span style="display:flex;"><span>                )
</span></span><span style="display:flex;"><span>        pprint(res)
</span></span><span style="display:flex;"><span>asyncio<span style="color:#f92672">.</span>run(main())
</span></span></code></pre></div><p>But it did not. Instead, I kept getting errors like this back:</p>
<pre tabindex="0"><code>kubernetes_asyncio.client.exceptions.ApiException: (415)
Reason: Unsupported Media Type
HTTP response body: {&#34;kind&#34;:&#34;Status&#34;,&#34;apiVersion&#34;:&#34;v1&#34;,&#34;metadata&#34;:{},&#34;status&#34;:&#34;Failure&#34;,
&#34;message&#34;:&#34;the body of the request was in an unknown format - accepted media types
include: application/json-patch+json, application/merge-patch+json,
application/apply-patch+yaml&#34;,
&#34;reason&#34;:&#34;UnsupportedMediaType&#34;,
&#34;code&#34;:415}
</code></pre><p>I finally found <a href="https://github.com/tomplus/kubernetes_asyncio/issues/68">this bug</a>.
It indicates that the issue is a wrong media type being set in the
<code>content-type</code> header. This led me to the <a href="https://github.com/tomplus/kubernetes_asyncio/blob/master/examples/patch.py">examples</a>
file, which shows that a specific content type can be forced by adding
<code>_content_type='application/merge-patch+json'</code> as a parameter to the
<code>patch_namespaced_custom_object</code> call. With that addition, I was finally able
to properly update the time for the next backup in the status, by adding these
lines to the <code>homelab_service_daemon</code> function from before:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>status_body <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;status&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;nextBackup&#34;</span>: next_run<span style="color:#f92672">.</span>isoformat()
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">await</span> kubeapi<span style="color:#f92672">.</span>patch_mei_home_custom_object(
</span></span><span style="display:flex;"><span>    namespace, kubeapi<span style="color:#f92672">.</span>HOMELABSERVICEBACKUP_PLURAL, name, status_body)
</span></span></code></pre></div><p>The <code>patch_mei_home_custom_object</code> function is just a thin wrapper around
the <code>patch_namespaced_custom_object</code> function from above.</p>
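<p>A minimal sketch of what such a wrapper could look like; the function name,
the plural constant, and the forced content type come from the code above,
everything else is my assumption:</p>
<pre tabindex="0"><code>from kubernetes_asyncio import client
from kubernetes_asyncio.client.api_client import ApiClient

GROUP = &#34;mei-home.net&#34;
VERSION = &#34;v1alpha1&#34;
HOMELABSERVICEBACKUP_PLURAL = &#34;homelabservicebackups&#34;


async def patch_mei_home_custom_object(namespace, plural, name, body):
    # Assumes the kubeconfig / in-cluster config has already been loaded.
    async with ApiClient() as api:
        crd_api = client.CustomObjectsApi(api)
        return await crd_api.patch_namespaced_custom_object(
            GROUP, VERSION, namespace, plural, name, body,
            # Force merge-patch semantics; otherwise the client sends a
            # content type the API server rejects with a 415.
            _content_type=&#34;application/merge-patch+json&#34;)
</code></pre>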
<h2 id="some-notes-on-testing">Some notes on testing</h2>
<p>Writing UTs was not always simple here. First of all, I needed to employ a lot
of mocks to remove any attempted k8s cluster access. I&rsquo;m seriously considering
buying some additional Pis and setting up a test cluster. &#x1f601;</p>
<p>My first generic issue was: How do I even properly unit test asyncio code?
Luckily, that issue was easy to answer, at least in the abstract: I used
<a href="https://github.com/pytest-dev/pytest-asyncio">pytest-asyncio</a>. It allows me to
add <code>@pytest.mark.asyncio</code> at the top of my test function, or entire test classes,
and the pytest plugin will automatically setup the event loop infrastructure
and execute the test functions with it.</p>
<p>Still, I had a particular challenge with testing the waiting code, specifically
when it comes to testing whether the Condition properly fires. As a reminder,
here is what the code looks like:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">cond_waiter</span>(cond):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">async</span> <span style="color:#66d9ef">with</span> cond:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">await</span> cond<span style="color:#f92672">.</span>wait()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">wait_for</span>(waittime, update_condition):
</span></span><span style="display:flex;"><span>    cond_task <span style="color:#f92672">=</span> asyncio<span style="color:#f92672">.</span>create_task(cond_waiter(update_condition),
</span></span><span style="display:flex;"><span>                                    name<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;condwait&#34;</span>)
</span></span><span style="display:flex;"><span>    sleep_task <span style="color:#f92672">=</span> asyncio<span style="color:#f92672">.</span>create_task(asyncio<span style="color:#f92672">.</span>sleep(waittime), name<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;sleepwait&#34;</span>)
</span></span><span style="display:flex;"><span>    done, pending <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> asyncio<span style="color:#f92672">.</span>wait([cond_task, sleep_task],
</span></span><span style="display:flex;"><span>                                       return_when<span style="color:#f92672">=</span>asyncio<span style="color:#f92672">.</span>FIRST_COMPLETED)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">for</span> p <span style="color:#f92672">in</span> pending:
</span></span><span style="display:flex;"><span>        p<span style="color:#f92672">.</span>cancel()
</span></span><span style="display:flex;"><span>    wake_reasons <span style="color:#f92672">=</span> []
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">for</span> d <span style="color:#f92672">in</span> done:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">if</span> d<span style="color:#f92672">.</span>get_name() <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;condwait&#34;</span>:
</span></span><span style="display:flex;"><span>            wake_reasons<span style="color:#f92672">.</span>append(WakeupReason<span style="color:#f92672">.</span>SCHEDULE_UPDATE)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">elif</span> d<span style="color:#f92672">.</span>get_name() <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;sleepwait&#34;</span>:
</span></span><span style="display:flex;"><span>            wake_reasons<span style="color:#f92672">.</span>append(WakeupReason<span style="color:#f92672">.</span>TIMER)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> wake_reasons
</span></span></code></pre></div><p>And here is my initial attempt at the test code:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> asyncio
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> unittest.mock <span style="color:#f92672">import</span> AsyncMock, Mock
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> hl_backup_operator.homelab_service_backup <span style="color:#66d9ef">as</span> sut
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@pytest.mark.asyncio</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">class</span> <span style="color:#a6e22e">TestCondWait</span>:
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">test_cond_wait_works</span>(self):
</span></span><span style="display:flex;"><span>        cond <span style="color:#f92672">=</span> asyncio<span style="color:#f92672">.</span>Condition()
</span></span><span style="display:flex;"><span>        test_task <span style="color:#f92672">=</span> asyncio<span style="color:#f92672">.</span>create_task(sut<span style="color:#f92672">.</span>wait_for(<span style="color:#ae81ff">15</span>, cond))
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">async</span> <span style="color:#66d9ef">with</span> cond:
</span></span><span style="display:flex;"><span>            cond<span style="color:#f92672">.</span>notify_all()
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">await</span> test_task
</span></span><span style="display:flex;"><span>        res <span style="color:#f92672">=</span> test_task<span style="color:#f92672">.</span>result()
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">assert</span> res <span style="color:#f92672">==</span> [sut<span style="color:#f92672">.</span>WakeupReason<span style="color:#f92672">.</span>SCHEDULE_UPDATE]
</span></span></code></pre></div><p>I&rsquo;m trying to test whether the Condition works properly. My thinking is that
the code path goes like this:</p>
<ol>
<li>[testcode]: Creates an async task ready to run, executing the function under
test.</li>
<li>[appcode]: Runs until it hits the <code>asyncio.wait</code> line</li>
<li>[appcode]: Now waits for either the timer to expire or the Condition to be
triggered, hands back execution to the [testcode]</li>
<li>[testcode]: Executes the <code>cond.notify_all</code> function</li>
<li>[testcode]: Awaits the task, handing execution back to [appcode]</li>
<li>[appcode]: Gets notified in <code>cond_waiter</code> and runs to completion</li>
</ol>
<p>But that was not what happened. Sprinkling in some <code>print</code> statements, I found
that the test code continues running after the <code>create_task</code> call, straight
through the <code>notify_all</code> call. The first time <code>wait_for</code> gets to do anything
is when the test code hits the <code>await test_task</code> line. And only then does it
reach the <code>await cond.wait</code> line. But at this point, the test code has already
executed the <code>notify_all</code>, and the <code>wait_for</code> function does not return until
the timer of the <code>sleepwait</code> task expires, resulting in a failed UT.</p>
<p>The only way I found around this issue is to have the test code explicitly hand
off execution. I did this by introducing an <code>await asyncio.sleep(0.05)</code> before
the <code>async with cond:</code> line of the test function.
Then the <code>wait_for</code> function gets to run until it hits the <code>await cond.wait</code>,
gets properly notified, and the test reliably succeeds.</p>
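<p>For completeness, here is the adjusted test, exactly as just described; only
the added sleep differs from the version above:</p>
<pre tabindex="0"><code>async def test_cond_wait_works(self):
    cond = asyncio.Condition()
    test_task = asyncio.create_task(sut.wait_for(15, cond))
    # Yield to the event loop so wait_for can reach its cond.wait()
    # before we notify the condition.
    await asyncio.sleep(0.05)
    async with cond:
        cond.notify_all()
    await test_task
    res = test_task.result()
    assert res == [sut.WakeupReason.SCHEDULE_UPDATE]
</code></pre>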
<p>This was, yet again, a case where the UT ends up being more complicated than the
actual code.</p>
<p>One more issue I hit had to do with the merciless advance of time. Have another
look at the <code>homelab_service_daemon</code> function:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">homelab_service_daemon</span>(name, namespace, spec, memo, stopped):
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Launching daemon for </span><span style="color:#e6db74">{</span>namespace<span style="color:#e6db74">}</span><span style="color:#e6db74">/</span><span style="color:#e6db74">{</span>name<span style="color:#e6db74">}</span><span style="color:#e6db74">.&#34;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">while</span> <span style="color:#f92672">not</span> stopped:
</span></span><span style="display:flex;"><span>        logging<span style="color:#f92672">.</span>debug(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;In main loop of </span><span style="color:#e6db74">{</span>namespace<span style="color:#e6db74">}</span><span style="color:#e6db74">/</span><span style="color:#e6db74">{</span>name<span style="color:#e6db74">}</span><span style="color:#e6db74"> with spec: </span><span style="color:#e6db74">{</span>spec<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>        next_run <span style="color:#f92672">=</span> backupconfig<span style="color:#f92672">.</span>get_next_service_time()
</span></span><span style="display:flex;"><span>        wait_time <span style="color:#f92672">=</span> next_run <span style="color:#f92672">-</span> datetime<span style="color:#f92672">.</span>datetime<span style="color:#f92672">.</span>now(datetime<span style="color:#f92672">.</span>timezone<span style="color:#f92672">.</span>utc)
</span></span><span style="display:flex;"><span>        status_body <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;status&#34;</span>: {
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;nextBackup&#34;</span>: next_run<span style="color:#f92672">.</span>isoformat()
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">await</span> kubeapi<span style="color:#f92672">.</span>patch_mei_home_custom_object(
</span></span><span style="display:flex;"><span>            namespace, kubeapi<span style="color:#f92672">.</span>HOMELABSERVICEBACKUP_PLURAL, name, status_body)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">await</span> wait_for(wait_time<span style="color:#f92672">.</span>total_seconds(), memo<span style="color:#f92672">.</span>backup_conf_cond)
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Finished daemon for </span><span style="color:#e6db74">{</span>namespace<span style="color:#e6db74">}</span><span style="color:#e6db74">/</span><span style="color:#e6db74">{</span>name<span style="color:#e6db74">}</span><span style="color:#e6db74">.&#34;</span>)
</span></span></code></pre></div><p>It has to compute the waiting time as the difference between the current time
and the time of the next scheduled backup. But how to handle <code>datetime.now</code> in
UTs? I initially tried to do this with a bit of fuzziness when comparing the
arguments handed to the mocked <code>wait_for</code> with the expected wait time, but that
seemed a bit too brittle.</p>
<p><a href="https://github.com/spulec/freezegun">Freezegun</a> to the rescue. It provides a
nice API to patch <code>datetime.now</code> (and several other related functions) so that
it always returns a deterministic value.
Using it in a UT to verify that <code>homelab_service_daemon</code> calls <code>wait_for</code> as
expected could look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#a6e22e">@pytest.fixture</span>()
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">mock_wait_for</span>(self, mocker):
</span></span><span style="display:flex;"><span>    wait_for_mock <span style="color:#f92672">=</span> AsyncMock(spec<span style="color:#f92672">=</span>sut<span style="color:#f92672">.</span>wait_for)
</span></span><span style="display:flex;"><span>    mocker<span style="color:#f92672">.</span>patch(<span style="color:#e6db74">&#39;hl_backup_operator.homelab_service_backup.wait_for&#39;</span>,
</span></span><span style="display:flex;"><span>                  side_effect<span style="color:#f92672">=</span>wait_for_mock)
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> wait_for_mock
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">test_daemon_waits_correctly</span>(self, mocker, mock_wait_for):
</span></span><span style="display:flex;"><span>    mock_memo <span style="color:#f92672">=</span> Mock()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    mock_stopped <span style="color:#f92672">=</span> Mock()
</span></span><span style="display:flex;"><span>    mock_stopped_bool <span style="color:#f92672">=</span> Mock(side_effect<span style="color:#f92672">=</span>[<span style="color:#66d9ef">False</span>, <span style="color:#66d9ef">True</span>])
</span></span><span style="display:flex;"><span>    mock_stopped<span style="color:#f92672">.</span><span style="color:#a6e22e">__bool__</span> <span style="color:#f92672">=</span> mock_stopped_bool
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    time_now <span style="color:#f92672">=</span> datetime(year<span style="color:#f92672">=</span><span style="color:#ae81ff">2024</span>, month<span style="color:#f92672">=</span><span style="color:#ae81ff">5</span>, day<span style="color:#f92672">=</span><span style="color:#ae81ff">22</span>, hour<span style="color:#f92672">=</span><span style="color:#ae81ff">19</span>, minute<span style="color:#f92672">=</span><span style="color:#ae81ff">12</span>,
</span></span><span style="display:flex;"><span>                        second<span style="color:#f92672">=</span><span style="color:#ae81ff">10</span>, tzinfo<span style="color:#f92672">=</span>timezone<span style="color:#f92672">.</span>utc)
</span></span><span style="display:flex;"><span>    time_trigger <span style="color:#f92672">=</span> datetime(year<span style="color:#f92672">=</span><span style="color:#ae81ff">2024</span>, month<span style="color:#f92672">=</span><span style="color:#ae81ff">5</span>, day<span style="color:#f92672">=</span><span style="color:#ae81ff">22</span>, hour<span style="color:#f92672">=</span><span style="color:#ae81ff">19</span>, minute<span style="color:#f92672">=</span><span style="color:#ae81ff">12</span>,
</span></span><span style="display:flex;"><span>                            second<span style="color:#f92672">=</span><span style="color:#ae81ff">12</span>, tzinfo<span style="color:#f92672">=</span>timezone<span style="color:#f92672">.</span>utc)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    mock_next_service_time <span style="color:#f92672">=</span> Mock(return_value<span style="color:#f92672">=</span>time_trigger)
</span></span><span style="display:flex;"><span>    mocker<span style="color:#f92672">.</span>patch(
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#39;hl_backup_operator.homelab_backup_config.get_next_service_time&#39;</span>,
</span></span><span style="display:flex;"><span>        side_effect<span style="color:#f92672">=</span>mock_next_service_time)
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">with</span> freezegun<span style="color:#f92672">.</span>freeze_time(time_now):
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">await</span> sut<span style="color:#f92672">.</span>homelab_service_daemon(<span style="color:#e6db74">&#34;tests&#34;</span>, <span style="color:#e6db74">&#34;testns&#34;</span>, {}, mock_memo,
</span></span><span style="display:flex;"><span>                                          mock_stopped)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    mock_wait_for<span style="color:#f92672">.</span>assert_awaited_once_with(<span style="color:#ae81ff">2</span>, mock_memo<span style="color:#f92672">.</span>backup_conf_cond)
</span></span></code></pre></div><p>I&rsquo;m mocking away both the <code>wait_for</code> and <code>get_next_service_time</code> functions,
and I&rsquo;m also defining two fixed times: one &ldquo;current&rdquo; time and one trigger time.
In the <code>with freezegun.freeze_time(time_now)</code> context, <code>datetime.now</code> will now
reliably always return <code>time_now</code> instead of the actual current time. And with
that, I don&rsquo;t need to rely on any fuzziness when testing time-related
functionality.</p>
<h2 id="next-steps">Next steps</h2>
<p>Now that I&rsquo;m finally happy with the groundwork, I still need to implement a couple
of features before starting on the implementation of the backup Jobs
themselves.
The first one is proper handling of the case where there is no HomelabBackupConfig
configured. Currently, the <code>homelab_service_daemon</code> function would crash, because
<code>get_next_service_time</code> would return <code>None</code>, due to not having any configured
schedule. That is easily fixable by extending the waiting time to &ldquo;forever&rdquo;.
With the Condition mechanism already in place, the daemons will be woken up once
a HomelabBackupConfig appears and can then return to the right schedule.</p>
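<p>A sketch of how that &ldquo;forever&rdquo; wait could look inside the daemon loop; the
one-day fallback is an arbitrary upper bound I chose for illustration:</p>
<pre tabindex="0"><code>ONE_DAY = 24 * 60 * 60

next_run = backupconfig.get_next_service_time()
if next_run is None:
    # No HomelabBackupConfig exists yet; just wait. The Condition will
    # wake us as soon as one appears.
    wait_seconds = ONE_DAY
else:
    wait_seconds = (next_run
                    - datetime.datetime.now(datetime.timezone.utc)).total_seconds()
await wait_for(wait_seconds, memo.backup_conf_cond)
</code></pre>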
<p>The second feature currently missing is mostly for testing purposes. Right now,
I&rsquo;m only able to set the schedule centrally, where it applies to all
service daemons. This is bound to become cumbersome once I want to start testing
the Job creation and monitoring, so I will want the ability to trigger a
single service daemon&rsquo;s backup immediately. I will likely introduce another
parameter into the HomelabServiceBackup CRD which makes the daemon run its
backup right away.</p>
<p>Alright, that&rsquo;s all I have to say for now. This is my first &ldquo;programming&rdquo; post
on this blog, and I&rsquo;m honestly not sure how it came out. Were you actually
able to follow, or was it a confused mess? Was it actually interesting to read?
I&rsquo;d be glad for some feedback, e.g. via my <a href="https://social.mei-home.net/@mmeier">Fediverse account</a>.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Homelab Backup Operator Part I: RBAC permission issues</title>
      <link>https://blog.mei-home.net/posts/backup-operator-1-rbac-issues/</link>
      <pubDate>Sun, 12 May 2024 20:40:59 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/backup-operator-1-rbac-issues/</guid>
      <description>I ran into some issues with the RBAC permissions for my operator</description>
      <content:encoded><![CDATA[<p>As I&rsquo;ve mentioned in my <a href="https://blog.mei-home.net/posts/k8s-migration-12-backup-issues/">last k8s migration post</a>,
I&rsquo;m working on writing a Homelab backup operator for my Kubernetes cluster.
And I&rsquo;ve run into some RBAC/permission issues I can&rsquo;t quite figure out. So let&rsquo;s
see whether writing about it helps. &#x1f642;</p>
<p>First, a short overview of the plan. I&rsquo;m using the <a href="https://github.com/nolar/kopf">kopf</a>
framework to build a Kubernetes operator. This operator&rsquo;s main goal is to handle
HomelabServiceBackup resources. These will contain a list of PersistentVolumeClaims
and S3 buckets which need to be backed up. I intend for there to be one
HomelabServiceBackup object for every service, located in the service&rsquo;s Namespace.</p>
<p>At the same time, I started out with defining a HomelabBackupConfig resource.
This will contain some configs which will be common among all service backups,
things like the hostname of the S3 server to store the backups and the image to
be used for the backup jobs.
There will only ever be one instance of this custom resource, and it should
always reside in the Namespace of the operator itself. At the same time, there
should also only ever be one operator for the entire k8s cluster.</p>
<p>This all seemed sensible to me until this afternoon, by which point I had finally
done all the yak-shaving new projects need: creating the repo, configuring
the CI for image builds and UTs, and so on. I finally had a container
image I could run, with a very simple implementation:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> kopf
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> logging
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.on.create</span>(<span style="color:#e6db74">&#39;homelabbackupconfigs&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">create_handler</span>(spec, status, meta, <span style="color:#f92672">**</span>kwargs):
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Create handler called with meta: </span><span style="color:#e6db74">{</span>meta<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Create handler called with spec: </span><span style="color:#e6db74">{</span>spec<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Create handler called with status: </span><span style="color:#e6db74">{</span>status<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.on.resume</span>(<span style="color:#e6db74">&#39;homelabbackupconfigs&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">resume_handler</span>(spec, status, meta, <span style="color:#f92672">**</span>kwargs):
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Resume handler called with meta: </span><span style="color:#e6db74">{</span>meta<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Resume handler called with spec: </span><span style="color:#e6db74">{</span>spec<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Resume handler called with status: </span><span style="color:#e6db74">{</span>status<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.on.update</span>(<span style="color:#e6db74">&#39;homelabbackupconfigs&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">update_handler</span>(spec, status, meta, diff, <span style="color:#f92672">**</span>kwargs):
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Update handler called with meta: </span><span style="color:#e6db74">{</span>meta<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Update handler called with spec: </span><span style="color:#e6db74">{</span>spec<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Update handler called with status: </span><span style="color:#e6db74">{</span>status<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Update handler called with diff: </span><span style="color:#e6db74">{</span>diff<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.on.delete</span>(<span style="color:#e6db74">&#39;homelabbackupconfigs&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">delete_handler</span>(spec, status, meta, <span style="color:#f92672">**</span>kwargs):
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Delete handler called with meta: </span><span style="color:#e6db74">{</span>meta<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Delete handler called with spec: </span><span style="color:#e6db74">{</span>spec<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Delete handler called with status: </span><span style="color:#e6db74">{</span>status<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span></code></pre></div><p>The intention for this was merely to get a feeling for what I was actually
getting for each of the different events, and to play around with when each of
these handlers would be called.</p>
<p>For the first deployment, I launched kopf with the <code>-A</code> flag, which means it
will use the Kubernetes cluster APIs to watch every Namespace. As noted above,
I want every Namespace to be watched, as every one of them might contain a
HomelabServiceBackup object to take care of the backup for the service residing
in the Namespace.
But I started out with only the HomelabBackupConfig CRD defined, as that&rsquo;s the
first step in my implementation plan. The contents of the CRD are not important
for now; I will show them in a later post when I&rsquo;ve actually got the implementation
ready.</p>
<p>I also needed to provide proper RBAC for the deployment, as the operator needs
access to the API server.
My thoughts went like this: For now, I only need the HomelabBackupConfig, and
I only need that in the same Namespace the operator is running in. So I created
the following Role:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">rbac.authorization.k8s.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Role</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">hlbo-role</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">rules</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiGroups</span>: [<span style="color:#e6db74">&#34;&#34;</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>: [<span style="color:#ae81ff">events]</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">verbs</span>: [<span style="color:#ae81ff">create]</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiGroups</span>: [<span style="color:#e6db74">&#34;mei-home.net&#34;</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">homelabbackupconfigs</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">verbs</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">get</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">watch</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">list</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">patch</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">update</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">rbac.authorization.k8s.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">RoleBinding</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">hlbo-role</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">hlbo</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">roleRef</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">apiGroup</span>: <span style="color:#ae81ff">rbac.authorization.k8s.io</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Role</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">hlbo-role</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">subjects</span>:
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ServiceAccount</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">hlbo-account</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">backups</span>
</span></span></code></pre></div><p>This produced a number of errors when trying to launch my rudimentary operator:</p>
<pre tabindex="0"><code>[2024-05-12 14:19:55,454] kopf._core.reactor.o [ERROR   ]
Watcher for homelabbackupconfigs.v1alpha1.mei-home.net@none has failed:
&#39;homelabbackupconfigs.mei-home.net is forbidden: User &#34;system:serviceaccount:backups:hlbo-account&#34; cannot list resource &#34;homelabbackupconfigs&#34; in API group &#34;mei-home.net&#34; at the cluster scope&#39;
</code></pre><p>Okay, this seems reasonably clear to me. I&rsquo;ve only created a Role and done a
RoleBinding for the <code>backups</code> Namespace, where the operator resides.</p>
<p>I also tried another variant. Instead of using <code>-A</code> to have kopf use the cluster
API, one can provide <code>--namespace=*</code>. This tells kopf to use the namespaced API,
but list all Namespaces and watch them all. Then, I allowed kopf to list all
Namespaces, while still only granting it access to the HomelabBackupConfig in the
backups Namespace. This results in a lot of errors when it tries to
watch HomelabBackupConfigs in Namespaces other than backups, but the operator
keeps running. So this might be a &ldquo;solution&rdquo;.</p>
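<p>For reference, the two launch variants from above on the command line; the
file name is just a placeholder:</p>
<pre tabindex="0"><code>kopf run operator.py -A              # cluster-scoped API, watch everything
kopf run operator.py --namespace=&#39;*&#39; # namespaced API, watch all Namespaces
</code></pre>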
<p>I could also return to using <code>-A</code> and just configure everything in a ClusterRole.
But that would grant far more permissions than the operator needs. And I need
to grant it access to the Jobs API, and I don&rsquo;t want to do that cluster-wide
either.</p>
<p>And finally, the individual handlers don&rsquo;t allow defining a Namespace to watch
a specific resource in. The only config is the command line flag, and that
applies to all resources and their handlers.</p>
<p>So it looks like I have to search for another framework, as kopf doesn&rsquo;t seem to
allow me to do things in the least-privilege way I want them done. &#x1f614;</p>
<p>If you&rsquo;ve got a good idea or you think I&rsquo;ve overlooked something, please feel
free to ping me on the <a href="https://social.mei-home.net/@mmeier">Fediverse</a>.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 12: Backup Plan</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-12-backup-issues/</link>
      <pubDate>Sun, 05 May 2024 11:10:21 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-12-backup-issues/</guid>
      <description>It seems I need a new backup strategy</description>
      <content:encoded><![CDATA[<p>Wherein it seems I need a new backup strategy.</p>
<p>This is part 12 of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>During the last week, I&rsquo;ve started to work on implementing my backup strategy
for the new Kubernetes cluster. The original plan was to stay with what I&rsquo;m already
doing in my Nomad cluster. But it turns out I can&rsquo;t, so I need a new strategy.</p>
<p>If you&rsquo;re prone to suffering from IT-related nightmares, you might wish to skip
this one. The Nomad backup implementation ain&rsquo;t pretty, and my current plans for
the k8s backup implementation ain&rsquo;t going to make it any prettier. You&rsquo;ve been
warned.</p>
<h2 id="speaking-of-backups">Speaking of backups</h2>
<p>You should do backups. They don&rsquo;t have to be perfect. As you will see in the
next section, mine definitely aren&rsquo;t. But they&rsquo;re serving me well.</p>
<p>I&rsquo;ve only ever needed backups once, right after I left university and my entire
life was stored on a single laptop&rsquo;s internal HDD - and that HDD failed. But:
I was lucky, in that I had backups of my <code>/home</code> directory only about 24h old.</p>
<p>So all good, you might think. But not really. You see, my backups were encrypted.
And now guess where the only copy of that decryption key was stored. Exactly.
I got pretty lucky again, in that I was able to read the key from the broken
disk. But these days, I&rsquo;ve got my keys stored in several places.</p>
<p>In fact, I went one step further: there&rsquo;s an unencrypted USB stick with a copy of
my password manager and PGP keys in a bank vault.
So don&rsquo;t forget to <em>separately</em> back up your backup&rsquo;s encryption key!</p>
<p>Generally, backups are supposed to be 3-2-1:</p>
<ul>
<li>Three different copies of your data</li>
<li>On two different kinds of media</li>
<li>With one copy off-site</li>
</ul>
<p>I do not have an off-site copy anywhere yet, save for the aforementioned USB
stick with my unencrypted password manager.</p>
<p>And for me, the &ldquo;two different kinds of media&rdquo; isn&rsquo;t really two different kinds
of media. It&rsquo;s more like &ldquo;two independent systems&rdquo;, because even the data that&rsquo;s
really important to me is already too big to store on multiple DVDs, which
is the only medium I would call &ldquo;sensible&rdquo; for a consumer.</p>
<p>What I do find important is incremental backups. Don&rsquo;t just overwrite the previous
day&rsquo;s backup with the current one. Incremental backups aren&rsquo;t there to protect you
from faulty devices, but rather to protect you from yourself. This might be a
fat-fingered <code>rm -rf /</code>, or a ransomware infection. With incremental backups,
you can always go back.</p>
<p>My general strategy is:</p>
<ul>
<li>One yearly backup</li>
<li>One monthly backup for the past six months</li>
<li>One weekly backup for the past six weeks</li>
<li>A daily backup for the past seven days</li>
</ul>
<p>That should make sure that I can even recover from an accidental delete that I
only realize a couple of months later.</p>
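<p>As it happens, this retention scheme maps directly onto the <code>forget</code>
command of restic, the tool I introduce in the next section; a sketch:</p>
<pre tabindex="0"><code>restic forget --prune \
    --keep-yearly 1 \
    --keep-monthly 6 \
    --keep-weekly 6 \
    --keep-daily 7
</code></pre>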
<h2 id="the-current-state-of-backups-in-my-nomad-cluster">The current state of backups in my Nomad cluster</h2>
<p>The basis of my backups, both in my Homelab and for my other hosts, is
<a href="https://restic.net/">restic</a>. It&rsquo;s a CLI backup program which supports a wide
range of backup targets, encryption and incremental backups.</p>
<p>Restic then pushes all of those, both the data volumes from my Homelab services
and my <code>/home</code> dir from my workstation and laptop, into an S3 bucket on my Ceph
cluster, on a pool with two replicas. For the stuff coming from my workstation,
this is already an improvement, because the backup is stored on different disks
than the original data.
But it doesn&rsquo;t do very much for my Homelab data volumes, because those are all
located on that same Ceph cluster. The only advantage those backups bring is their
incremental nature, so if I accidentally delete a volume, I can still get the data
back.</p>
<p>The second part, which fulfills the &ldquo;two different types of media&rdquo; requirement
is a backup on an external HDD. This backup is a bit more selective than the
relatively broad restic backup, because that single external HDD isn&rsquo;t big
enough to hold all of my data. But it is easily big enough to hold all the data
I genuinely care about.</p>
<p>I&rsquo;m running both of those backup jobs through the Nomad cluster. The first
backup, dubbed my &ldquo;services backup&rdquo;, backs up the data volumes attached to my
Homelab services. The second one, called the &ldquo;external backup&rdquo;, takes a couple
of the S3 buckets used as targets in the services backup, and clones them onto
an external HDD.</p>
<h3 id="the-services-backup">The services backup</h3>
<p>The services backup is deployed in the Nomad cluster as a <a href="https://developer.hashicorp.com/nomad/docs/schedulers#system-batch">System Batch</a>
type job. These jobs are similar to Kubernetes&rsquo; DaemonSet, in that they run a
job instance on every host, but they are of the &ldquo;run-to-completion&rdquo; type, similar
to Kubernetes&rsquo; Job object, instead of starting a daemon which stays
active on each node.</p>
<p>This job needs to be run in <em>privileged</em> mode, because it mounts the directory
where CSI drivers mount CSI volumes on the host into the job&rsquo;s container.</p>
<p>Yes, you read that right: on all of my Nomad cluster hosts, every night, a
container runs which mounts the mount directories of all mounted CSI volumes.</p>
<p>Once that&rsquo;s done, the container runs a small Python program I&rsquo;ve written to do
the actual backup.
It does roughly the following:</p>
<ol>
<li>Check which jobs are running on the current node</li>
<li>Check which volumes from those jobs are noted as needing backups in a config file</li>
<li>Run restic against those directories and push them into an S3 bucket on Ceph (sketched below)</li>
</ol>
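<p>Step 3 is essentially just one restic invocation per volume. As a rough sketch,
with placeholder paths, bucket names and credentials:</p>
<pre tabindex="0"><code># S3 credentials for the Ceph RGW endpoint (placeholders)
export AWS_ACCESS_KEY_ID=backup-user
export AWS_SECRET_ACCESS_KEY=backup-key
export RESTIC_PASSWORD_FILE=/secrets/restic-password
# push the mounted CSI volume into a per-app bucket on Ceph
restic -r s3:http://rgw.example.com/backup-nextcloud backup /csi-mounts/nextcloud/mount
</code></pre>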
<p>In addition, I&rsquo;m also using <a href="https://rclone.org/">rclone</a> to back up S3 buckets
from those apps which use them for storage. This, again, does not make the data
more resilient, but it does protect against accidental bucket deletion
and similar mishaps.</p>
<p>This approach has a number of downsides. First, it is not 100% reliable. I&rsquo;m
backing up the data from volumes while those volumes are mounted and used by
their services. I don&rsquo;t have too much of a problem with that, simply because the
data on disk does not change too much during the times when I run the backups.
But it is still something to consider.
In addition, the overall backup job setup is not the safest from a security
point of view. I&rsquo;m taking the mounted data for all of my services and mounting
it into a single container. At least from a data access standpoint, that container
is basically root on my cluster nodes and can access the data for all services
in the Homelab.</p>
<h3 id="the-external-disk-backup">The external disk backup</h3>
<p>The second part of the backup strategy is to take the backup repositories created
by the service backups described above, as well as the ones created by my host
backups, and clone them onto an external HDD connected to one of my nodes. This
is also implemented as a Nomad job.</p>
<p>This job receives the <code>/dev</code> device for the external USB HDD, instead of receiving
an already mounted directory. This is another layer of defense in depth, as it allows me
to mount the backup disk only when it is actually needed, and not have it
mounted all the time. This is a small defense against both encryption ransomware
and accidental deletion.</p>
<p>But it also has one downside: Security, again. To be allowed to call the <code>mount</code>
command in the container, I have to run it in privileged mode.</p>
<p>The job itself does not have to do anything fancy. It
mounts the external HDD and then runs rclone on all of the backup S3 buckets
defined in the configuration file for the services backup, plus a couple of
additional buckets for e.g. my <code>/home</code> backups. All of those get cloned onto
the external HDD. Here, I&rsquo;m not using restic and incremental backups, simply
because the individual backup buckets already contain the incremental backups.</p>
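<p>Per bucket, that rclone run looks roughly like this, with a hypothetical
<code>ceph</code> remote configured and the HDD mounted at <code>/mnt/backup</code>:</p>
<pre tabindex="0"><code># make the copy on the external disk match the backup bucket exactly
rclone sync ceph:backup-nextcloud /mnt/backup/backup-nextcloud
</code></pre>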
<p>This part of the backup I had already transferred over to my Kubernetes cluster,
without much issue.</p>
<h2 id="the-issue-with-migrating-the-backups-to-the-k8s-cluster">The issue with migrating the backups to the k8s cluster</h2>
<p>My main issue came yesterday, when I started to plan the addition of the service
backup job to Kubernetes. The basic functionality seemed to be available. I
could just mount the <code>/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com</code>
directory into the container. That&rsquo;s the directory where the Ceph CSI plugin I
use to provide PersistentVolumes mounts the volumes.</p>
<p>But then I checked how to actually run the job. As noted above, I needed to
have a run-to-completion pod running on every host in the cluster. And it looks
like k8s just doesn&rsquo;t have anything equivalent to Nomad&rsquo;s System Batch type of
job.</p>
<p>So what to do instead? One option would be to change the small Python app I
wrote, so that it doesn&rsquo;t just run the current backup cycle and then exit,
but instead runs continuously. I could then put it into a <a href="https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/">DaemonSet</a>
on each k8s node. That would very likely have worked. But since my initial
tests with k8s back in August 2023, I had thought that I might implement a small
&ldquo;Homelab Backups&rdquo; operator.</p>
<h2 id="over-engineered-but-hopefully-fun">Over-engineered, but hopefully fun</h2>
<p>So if we take a hard look at my current setup for services and external backups,
there are a number of crutches in there. First of all, there&rsquo;s the sequencing
problem. The nightly external HDD backup job should only run when all of the
service backup jobs have done their work for the night. But I was never able to
come up with a good way to do that, so I settled for just launching the external
HDD backup an hour after the service backup job. Not very elegant, but it worked
well enough.</p>
<p>Then there was the issue of the &ldquo;run the service job on every host&rdquo; approach.
This is a shotgun approach, and not very explicit in its configuration. It&rsquo;s entirely
possible that no job with backed-up volumes was running on a given host, in which
case the service backup run on that host was a waste.</p>
<p>Finally, the backup configuration, namely which volumes and S3 buckets should
be backed up, lived in the configuration files of the two backup jobs, not in
the individual apps&rsquo; jobs. So whenever I removed or added an app, I had to
remember to also update the config of the backup jobs.</p>
<p>The idea I came up with, which solves all of the issues above, is to implement
a &ldquo;Homelab Backup&rdquo; Kubernetes Operator. That operator would handle &ldquo;HomelabBackup&rdquo;
objects, which I could individually configure for each app I&rsquo;m running and which
needs backups. When I then remove the app, that manifest would also be removed
and the backup for that particular app would be stopped.</p>
<p>It might look something like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">HomelabBackup</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">backupBucket</span>: <span style="color:#ae81ff">my-nextcloud-backup-bucket</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">schedule</span>: <span style="color:#e6db74">&#34;30 2 * * *&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">volumeClaims</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-nextcloud-pvc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">external</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">s3Buckets</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-nextcloud-data-bucket</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">external</span>: <span style="color:#66d9ef">true</span>
</span></span></code></pre></div><p>This would allow the definition of PersistentVolumeClaims and S3 buckets to back
up, and also where to back them up to.</p>
<p>The operator would then not start one backup job on every node, but instead
launch one k8s Job workload per HomelabBackup object. I would also be able to
watch those Jobs, and once they have all finished, one way or the other, launch
the external backup Job right away, instead of waiting for an arbitrary amount
of time. The dependency would finally be explicit.</p>
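<p>Done by hand, that dependency would look something like the following, with
hypothetical per-app Job and CronJob names. The operator would do the equivalent
through the API, watching the Jobs&rsquo; status conditions:</p>
<pre tabindex="0"><code># block until the service backup Job has run to completion
kubectl wait --for=condition=complete --timeout=2h job/backup-nextcloud
# then kick off the external HDD backup, here from a hypothetical CronJob
kubectl create job --from=cronjob/external-backup external-backup-manual
</code></pre>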
<p>With the operator launching the jobs, I will also be able to launch each job on
the node where the volume is currently mounted. This can be done by looking
at Kubernetes&rsquo; VolumeAttachment objects, which show on which node any given
volume is currently mounted.</p>
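<p>Those VolumeAttachment objects are cluster-scoped and easy to inspect manually
as well:</p>
<pre tabindex="0"><code># one line per attached PersistentVolume, including the node it lives on
kubectl get volumeattachments
</code></pre>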
<p>I&rsquo;m also considering some scheduling, to make sure that on any given node,
there&rsquo;s only ever going to be a single Job running, because anything else would
likely tax my 1 Gbps network.</p>
<p>Looking around, I found the <a href="https://github.com/nolar/kopf">kopf</a> framework
for Kubernetes Operators in Python. This looks pretty well suited for my needs,
and Python is currently among my most familiar languages anyway. It would be nice
to go for Go instead, but I would first have to familiarize myself with the
language before I could write the operator. And the main goal here is still to
move forward with the k8s migration.</p>
<p>Overall, I&rsquo;m not actually too mad about this detour. It looks like it&rsquo;s going
to be an interesting dive into Kubernetes&rsquo; API and operator implementations,
and it&rsquo;s going to fix a couple of problems with my old backup implementation.
In the end, stuff like this is why I set the migration up in such a way that
I could do it iteratively, while both the Nomad and k8s clusters run side by
side.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 11: Container Registry with Harbor</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-11-harbor/</link>
      <pubDate>Sat, 27 Apr 2024 21:20:46 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-11-harbor/</guid>
      <description>Running harbor for internal images and as a proxy for external registries</description>
      <content:encoded><![CDATA[<p>Wherein I migrate my internal container registry to Harbor.</p>
<p>This is part 11 of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>Let&rsquo;s start by answering the obvious question: Why even have an internal
container registry? For me, there are two reasons:</p>
<ol>
<li>Some place to put my own container images</li>
<li>A cache for external images</li>
</ol>
<p>Most of my internal images are slightly changed external images. A prime example
is my Fluentd image. I&rsquo;ve extended the official image with a couple of additional
plugins. And I needed some place to store them.</p>
<p>My main reason for point 2) is to avoid waste. Why reach out to the Internet and
put additional unnecessary load on somebody else&rsquo;s infrastructure by pulling the
same image 12 times? It makes a lot more sense to me to only do that once and
then use an internal cache.
A secondary reason was of course the introduction of the DockerHub rate limit.
I tended to hit that pretty regularly, especially when I was working on my CI.</p>
<p>A tertiary reason is Deutsche Telekom. My ISP. A couple of years ago, they
tended to regularly get into peering battles with their tier 1 peering partners,
and consequently, you had some days where the entire US was connected down a
512 Kbps pipe. Or at least that was what it felt like. Pulling an image from
DockerHub ran with, I kid you not, 5 Kbps. Those days seem to be over, but I
still like to at least be able to use previously pulled images.</p>
<p>Finally, there might also be a speed advantage when pulling from a local cache
instead of reaching out to the Internet. But for me, that was never really a
consideration. I&rsquo;ve got a 1 Gbps LAN, and most of my storage runs off of a Ceph
cluster, with the image cache running on my bulk storage HDDs. So there&rsquo;s really
not going to be that much gain.</p>
<p>In my Nomad cluster, I had set up two instances of Docker&rsquo;s official <a href="https://distribution.github.io/distribution/">registry</a>.
Hm, it is now called &ldquo;distribution&rdquo;? And seemingly under the CNCF?
Ah:</p>
<blockquote>
<p>Registry, the open source implementation for storing and distributing container images and other content, has been donated to the CNCF. Registry now goes under the name of Distribution, and the documentation has moved to&hellip;</p></blockquote>
<p>From the <a href="https://docs.docker.com/registry/">official docs on the Docker page</a>.</p>
<p>I chose registry back then because it looked like a pretty low powered solution.
For a GUI, I used <a href="https://github.com/Joxit/docker-registry-ui">docker-registry-ui</a>,
which I can warmly recommend.</p>
<p>But I also pretty much ran it as an open registry, which bothered me a bit. Plus,
I had looked a lot at <a href="https://goharbor.io/">Harbor</a>, but always found that it seemed a bit too
oriented towards deployment in Kubernetes. And now that I&rsquo;m finally running
my own Kubernetes cluster, I decided to replace my two registry instances with
a single Harbor instance.</p>
<p>Another reason for wanting to look at Harbor was that I think at some point,
registry could only serve as a pull-through cache for DockerHub, but not for
other registries, e.g. <a href="https://quay.io">Quay.io</a>. But if I read <a href="https://distribution.github.io/distribution/recipes/mirror/">the docs</a>
right, it&rsquo;s now possible to mirror other registries with it as well.</p>
<p>There are other alternatives as well. The first one, <a href="https://jfrog.com/artifactory/">Artifactory</a>,
is out, because while I know that it would fulfill my needs, it is also what we
use at work. And there is no great love lost between me and Artifactory. It will
only get deployed in my Homelab over my dead, cold, decomposing body.</p>
<p>Then there&rsquo;s <a href="https://www.sonatype.com/products/sonatype-nexus-oss">Sonatype Nexus</a>.
But quite frankly: That always gave off pretty strong &ldquo;We&rsquo;re going to go source
available within the week&rdquo; vibes.</p>
<p>Finally, there&rsquo;s Gitea and their relatively recently introduced <a href="https://docs.gitea.com/usage/packages/overview">package management feature</a>, which also includes a container registry.
The main reason I did not go with this one is that it currently doesn&rsquo;t support
pull-through caches, although <a href="https://github.com/go-gitea/gitea/issues/21223">there&rsquo;s a feature request</a>.
In addition, I&rsquo;m still a big fan of running apps which do one thing well, instead
of everything somewhat decently. (He says, looking at his Nextcloud file sharing/note taking/calendar/contacts/bookmarks moloch &#x1f605;)</p>
<p>So Harbor it is. Let&rsquo;s dig into it.</p>
<h2 id="harbor-setup">Harbor setup</h2>
<p>To set up Harbor, I used the <a href="https://github.com/goharbor/harbor-helm">official Helm chart</a>.
It is perfectly workable, but has some quirks when it comes to secrets handling,
which I will go into detail about later.</p>
<p>Here is my <code>values.yaml</code> file for the chart:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">expose</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">ingress</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tls</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">certSource</span>: <span style="color:#ae81ff">none</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">hosts</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">core</span>: <span style="color:#ae81ff">harbor.example.com</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">traefik.ingress.kubernetes.io/router.entrypoints</span>: <span style="color:#ae81ff">myentrypoint</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">harbor</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">externalURL</span>: <span style="color:#ae81ff">https://harbor.example.com</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">ipFamily</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ipv6</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">persistence</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">imageChartStorage</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">disableredirect</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#ae81ff">s3</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">existingSecret</span>: <span style="color:#ae81ff">my-harbor-rgw-secret</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">bucket</span>: <span style="color:#ae81ff">harbor-random-numbers-here</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">regionendpoint</span>: <span style="color:#ae81ff">http://rook-ceph-rgw-myobjectstorename.my-rook-cluster-namespace.svc:80</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">v4auth</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">rootdirectory</span>: <span style="color:#ae81ff">/harbor</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">encrypt</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secure</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">logLevel</span>: <span style="color:#ae81ff">info</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">existingSecretAdminPassword</span>: <span style="color:#ae81ff">my-admin-secret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">existingSecretAdminPasswordKey</span>: <span style="color:#ae81ff">mySecretsKey</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">existingSecretSecretKey</span>: <span style="color:#ae81ff">my-harbor-secret-key-secret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">portal</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">256Mi</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">100m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">podLabels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">core</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">256Mi</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">100m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">podLabels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">jobservice</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">jobLoggers</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">database</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">256Mi</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">100m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">podLabels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">registry</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">registry</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">256Mi</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">100m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">controller</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">256Mi</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">100m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">podLabels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">credentials</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">username</span>: <span style="color:#ae81ff">my-harbor-registry-user</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">existingSecret</span>: <span style="color:#ae81ff">my-harbor-registry-user-secret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">trivy</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">database</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">external</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">external</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">host</span>: <span style="color:#e6db74">&#34;harbor-pg-cluster-rw&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">port</span>: <span style="color:#ae81ff">5432</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">username</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">coreDatabase</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">existingSecret</span>: <span style="color:#ae81ff">harbor-pg-cluster-app</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">redis</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">external</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">external</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">addr</span>: <span style="color:#ae81ff">redis.redis.svc.cluster.local:6379</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metrics</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serviceMonitor</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><p>The above is only for completeness&rsquo; sake. Let&rsquo;s go through the config bit-by-bit.
The first part is the setup for external access:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">expose</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">ingress</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tls</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">certSource</span>: <span style="color:#ae81ff">none</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">hosts</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">core</span>: <span style="color:#ae81ff">harbor.example.com</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">traefik.ingress.kubernetes.io/router.entrypoints</span>: <span style="color:#ae81ff">myentrypoint</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">harbor</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">externalURL</span>: <span style="color:#ae81ff">https://harbor.example.com</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">ipFamily</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ipv6</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><p>This uses my <a href="https://blog.mei-home.net/posts/k8s-migration-3-traefik-ingress/">Traefik Ingress</a>
to provide external connectivity. I&rsquo;m disabling IPv6 because I don&rsquo;t have it
set up in my Homelab. Please note the (perfectly normal!) spelling of <code>externalURL</code>.
I initially spelled it wrong, and so all the pull commands which Harbor helpfully
shows in the web UI contained the default URL. One of those things which can really
only be solved by staring very intently at the YAML for an extended period of
time. &#x1f605;</p>
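<p>For what it&rsquo;s worth, rendering the chart locally and grepping the result would
have caught the typo a lot faster than staring. Assuming the chart repo was added
under a <code>goharbor</code> alias:</p>
<pre tabindex="0"><code># if the key is misspelled, my hostname never shows up in the rendered manifests
helm template harbor goharbor/harbor -f values.yaml | grep -i harbor.example.com
</code></pre>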
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">persistence</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">imageChartStorage</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">disableredirect</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#ae81ff">s3</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">existingSecret</span>: <span style="color:#ae81ff">my-harbor-rgw-secret</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">bucket</span>: <span style="color:#ae81ff">harbor-random-numbers-here</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">regionendpoint</span>: <span style="color:#ae81ff">http://rook-ceph-rgw-myobjectstorename.my-rook-cluster-namespace.svc:80</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">v4auth</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">rootdirectory</span>: <span style="color:#ae81ff">/harbor</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">encrypt</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secure</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><p>Next up is persistence. Harbor has two approaches here. The first one, which is
the default and which I&rsquo;m not using, stores the data, i.e. the container images,
on PersistentVolumeClaims. The second one, which I am using, stores it in S3.
I disable the registry&rsquo;s redirect feature. It would normally redirect
requests directly to the S3 storage. But access to my S3 storage is very limited
outside the cluster, and with my relatively low levels of activity, I don&rsquo;t need
to reduce the load on Harbor&rsquo;s registry by enabling it.
I&rsquo;m using my <a href="https://blog.mei-home.net/posts/k8s-migration-4-ceph-rook/">Ceph Rook based S3 setup</a>.
Again for completeness&rsquo; sake, here is the manifest for creating the bucket:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">objectbucket.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ObjectBucketClaim</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">generateBucketName</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storageClassName</span>: <span style="color:#ae81ff">rgw-bulk</span>
</span></span></code></pre></div><p>I will talk about the secrets setup later in a separate section.</p>
<p>Another important thing to configure when running without persistent
volumes is where to store the job logs from e.g. the automated
security scans Harbor can conduct on the images:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">jobservice</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">jobLoggers</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">database</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">256Mi</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">100m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">podLabels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">harbor</span>
</span></span></code></pre></div><p>The important part here is the <code>jobservice.jobLoggers[0]=database</code> setting,
which configures the job service to write logs to the Postgres DB.</p>
<p>I&rsquo;m also disabling the security scanning altogether by switching off <code>trivy.enabled</code>.</p>
<p>Next somewhat interesting thing is the database setup:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">database</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">external</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">external</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">host</span>: <span style="color:#e6db74">&#34;harbor-pg-cluster-rw&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">port</span>: <span style="color:#ae81ff">5432</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">username</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">coreDatabase</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">existingSecret</span>: <span style="color:#ae81ff">harbor-pg-cluster-app</span>
</span></span></code></pre></div><p>To manage the database, I&rsquo;m using <a href="https://blog.mei-home.net/posts/k8s-migration-8-cloud-native-pg/">my CloudNativePG setup</a>.
Here are some parts of the database config:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">200M</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">150m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">postgresql</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">max_connections</span>: <span style="color:#e6db74">&#34;200&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">shared_buffers</span>: <span style="color:#e6db74">&#34;50MB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">effective_cache_size</span>: <span style="color:#e6db74">&#34;150MB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">maintenance_work_mem</span>: <span style="color:#e6db74">&#34;12800kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">checkpoint_completion_target</span>: <span style="color:#e6db74">&#34;0.9&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wal_buffers</span>: <span style="color:#e6db74">&#34;1536kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">default_statistics_target</span>: <span style="color:#e6db74">&#34;100&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">random_page_cost</span>: <span style="color:#e6db74">&#34;1.1&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">effective_io_concurrency</span>: <span style="color:#e6db74">&#34;300&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">work_mem</span>: <span style="color:#e6db74">&#34;128kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">huge_pages</span>: <span style="color:#e6db74">&#34;off&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">max_wal_size</span>: <span style="color:#e6db74">&#34;128MB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wal_keep_size</span>: <span style="color:#e6db74">&#34;512MB&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storage</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">size</span>: <span style="color:#ae81ff">1.</span><span style="color:#ae81ff">5G</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storageClass</span>: <span style="color:#ae81ff">rbd-fast</span>
</span></span></code></pre></div><p>I hope this is a good compromise between dumping a long piece of YAML into
every post about an app which needs Postgres, and not showing the database
setup at all.</p>
<p>Finally, I&rsquo;m using my Redis instance for caching and disabling metrics
explicitly, so when I get around to gathering all the app level metrics and
making dashboards, I&rsquo;ve got something to grep for in the Homelab repo. &#x1f609;</p>
<h3 id="issues-with-secrets">Issues with secrets</h3>
<p>I had a couple of issues with the different secrets which Harbor needs.
First, let&rsquo;s start with the place where it&rsquo;s doing it right, the admin
credentials:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">existingSecretAdminPassword</span>: <span style="color:#ae81ff">my-admin-secret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">existingSecretAdminPasswordKey</span>: <span style="color:#ae81ff">mySecretsKey</span>
</span></span></code></pre></div><p>The Helm chart doesn&rsquo;t just allow setting the Secret to use, but also which key
in that Secret contains the password. That&rsquo;s how it should be done.</p>
<p>The credentials for the database were also okay, because the key the Helm chart
expects, <code>password</code>, happens to also be the key under which CloudNativePG stores
the user password in the Secret it creates with the credentials. What saddened
me a bit is that I couldn&rsquo;t take the host and port from that Secret as well,
even though CNPG puts those into it, too.</p>
<p>But a lot more annoying were the S3 credentials. Rook creates a secret for
every bucket, complete with the access key and the secret key, as well as the
name of the bucket, which is created semi-randomly. It also provides the correct
endpoint. It would have been nice if I could have handed the ConfigMap Rook
creates over to the Helm chart. Instead, I hardcoded the values in the <code>values.yaml</code>,
which means I would have to do some manual intervention if I ever have to recreate
it all.
For the credentials, I could at least provide the name of an existing Secret.
But as per the <code>values.yaml</code> comments, the access key and the secret key need
to be put into specific keys in the provided Secret. And those were not the
standard key names you would expect, e.g. <code>AccessKey</code> and <code>SecretKey</code>.
No, they have to be <code>REGISTRY_STORAGE_S3_ACCESSKEY</code> and <code>REGISTRY_STORAGE_S3_SECRETKEY</code>.</p>
<p>So what to do now? Manually extract the keys from Rook&rsquo;s secret and write a new
secret by hand? Luckily, no. The Fediverse came through, and somebody proposed
to use external-secrets&rsquo; <a href="https://external-secrets.io/latest/provider/kubernetes/">Kubernetes provider</a>.
This provider allows me to automatically take a Kubernetes Secret, and create
a new secret from it, with the same data in different keys. This is still a pretty
roundabout way, but I decided that this is preferable to the other options,
which would be writing a secret by hand or forking the Helm chart.</p>
<p>First, we need to define some RBAC objects for use by the SecretStore for the
Kubernetes provider.</p>
<p>Here is the ServiceAccount:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ServiceAccount</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ext-secrets-harbor</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">harbor</span>
</span></span></code></pre></div><p>Next, we need a Role for that ServiceAccount to use:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">rbac.authorization.k8s.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Role</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ext-secrets-harbor-role</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">rules</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiGroups</span>: [<span style="color:#e6db74">&#34;&#34;</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">secrets</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">verbs</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">get</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">list</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">watch</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiGroups</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">authorization.k8s.io</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">selfsubjectrulesreviews</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">verbs</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">create</span>
</span></span></code></pre></div><p>This allows all accounts using the Role to view Secrets in the Namespace the
Role is created in, which in this case is my Harbor Namespace.</p>
<p>Finally, we need a RoleBinding to bind the Role to the ServiceAccount:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">rbac.authorization.k8s.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">RoleBinding</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ext-secrets-harbor</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">roleRef</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">apiGroup</span>: <span style="color:#ae81ff">rbac.authorization.k8s.io</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Role</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ext-secrets-harbor-role</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">subjects</span>:
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ServiceAccount</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ext-secrets-harbor</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">harbor</span>
</span></span></code></pre></div><p>Once all of that has been created, we can define the SecretStore:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1beta1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">SecretStore</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">harbor-secrets-store</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">provider</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kubernetes</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">remoteNamespace</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">auth</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">serviceAccount</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ext-secrets-harbor</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">server</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">caProvider</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">ConfigMap</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kube-root-ca.crt</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">ca.crt</span>
</span></span></code></pre></div><p>One fascinating thing I learned is that Kubernetes puts the CA certs for the
kube-apiserver in every Namespace, under a ConfigMap called <code>kube-root-ca.crt</code>.</p>
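<p>That one is easy to verify:</p>
<pre tabindex="0"><code># the root CA ConfigMap shows up once per Namespace
kubectl get configmaps -A | grep kube-root-ca.crt
</code></pre>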
<p>This SecretStore can then be used to take the Secret created by Rook for the
S3 bucket and rewrite it to fit the expectations of the Harbor chart as follows:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1beta1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ExternalSecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;harbor-s3-secret&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">secretStoreRef</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">harbor-secrets-store</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">SecretStore</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">refreshInterval</span>: <span style="color:#e6db74">&#34;1h&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">target</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">creationPolicy</span>: <span style="color:#e6db74">&#39;Owner&#39;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">REGISTRY_STORAGE_S3_ACCESSKEY</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">remoteRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">property</span>: <span style="color:#ae81ff">AWS_ACCESS_KEY_ID</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">REGISTRY_STORAGE_S3_SECRETKEY</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">remoteRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">property</span>: <span style="color:#ae81ff">AWS_SECRET_ACCESS_KEY</span>
</span></span></code></pre></div><p>This will have external-secrets go to the kube-apiserver and get the
<code>AWS_SECRET_ACCESS_KEY</code> and <code>AWS_ACCESS_KEY_ID</code> keys from the <code>harbor</code> Secret,
which was previously created automatically by Rook through the <a href="https://rook.io/docs/rook/latest-release/Storage-Configuration/Object-Storage-RGW/ceph-object-bucket-claim/">ObjectBucketClaim</a>
I used to create the S3 bucket for Harbor.</p>
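<p>Whether the rewrite actually worked can be checked on the ExternalSecret itself,
which reports the sync status, and on the resulting Secret, which, absent an
explicit target name, gets the same name as the ExternalSecret:</p>
<pre tabindex="0"><code># should show the ExternalSecret as synced and ready
kubectl -n harbor get externalsecret harbor-s3-secret
# the generated Secret with the renamed keys
kubectl -n harbor get secret harbor-s3-secret -o yaml
</code></pre>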
<p>And with these five simple manifests, I could use the Rook S3 Secret with the
Harbor Helm chart. &#x1f605;</p>
<p>One last thing which tripped me up during setup was the registry credentials.
The <a href="https://github.com/goharbor/harbor-helm/blob/main/values.yaml">values.yaml</a>
contains these comments on how to set up the credentials:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">registry</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">credentials</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">username</span>: <span style="color:#e6db74">&#34;harbor_registry_user&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">password</span>: <span style="color:#e6db74">&#34;harbor_registry_password&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># If using existingSecret, the key must be REGISTRY_PASSWD and REGISTRY_HTPASSWD</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">existingSecret</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Login and password in htpasswd string format. Excludes `registry.credentials.username`  and `registry.credentials.password`. May come in handy when integrating with tools like argocd or flux. This allows the same line to be generated each time the template is rendered, instead of the `htpasswd` function from helm, which generates different lines each time because of the salt.</span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># htpasswdString: $apr1$XLefHzeG$Xl4.s00sMSCCcMyJljSZb0 # example string</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">htpasswdString</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span></code></pre></div><p>What I did not initially get from that comment was that when using an existing
Secret, both the clear-text password and the htpasswd string are required.
This put me into an amusing conundrum: I did not have a single host
where <code>htpasswd</code> was available. &#x1f602;
I ended up using the Apache container just to generate the htpasswd string:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>docker run -it httpd htpasswd -n -B my-harbor-registry-user
</span></span></code></pre></div><p>I then put that string into the Secret verbatim and was finally able to start
the Harbor instance.</p>
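<p>For reference, a Secret with the two required keys can be created along these
lines. This is just a sketch; the Secret name, namespace and values are examples,
only the <code>REGISTRY_PASSWD</code> and <code>REGISTRY_HTPASSWD</code> key names are mandated by the chart:</p>
<pre tabindex="0"><code># The htpasswd value is the verbatim output of the httpd container above
kubectl -n harbor create secret generic harbor-registry-creds \
  --from-literal=REGISTRY_PASSWD=&#39;my-clear-text-password&#39; \
  --from-literal=REGISTRY_HTPASSWD=&#39;my-harbor-registry-user:$2y$05$examplehash&#39;
</code></pre>
<p>The Secret name then goes into <code>registry.credentials.existingSecret</code> in the
values.yaml.</p>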
<h2 id="transferring-my-internal-images-to-harbor">Transferring my internal images to Harbor</h2>
<p>The first step I took was to transfer all of my internal images over to Harbor,
by adapting the CI jobs which create them and pointing them at the new registry.</p>
<p>I&rsquo;ve currently got five internal images, most of them just copies of official
images with some additions. I create them with <a href="https://www.drone.io/">Drone CI</a>,
which I will switch over to <a href="https://woodpecker-ci.org/">Woodpecker</a> later as
part of the migration.</p>
<p>The first step in transferring the images was to set up a user for the CI in
Harbor. This can be done with the Harbor Terraform provider, but I did it
manually for now. Then I created a &ldquo;homelab&rdquo; project for those Docker images.</p>
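<p>A quick way to verify the new CI user and project is a manual login and push.
A sketch, using the same placeholder names as the CI config below:</p>
<pre tabindex="0"><code># Log in as the CI user, then push a throwaway tag into the new project
docker login harbor.example.com -u myuser
docker tag alpine:3.19.1 harbor.example.com/homelab/alpine:test
docker push harbor.example.com/homelab/alpine:test
</code></pre>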
<p>For my image repository, which houses the Dockerfiles for most of my internal
images, I have a <code>.drone.jsonnet</code> file which looks like this:</p>
<pre tabindex="0"><code>local alpine_ver = &#34;3.19.1&#34;;

local Pipeline(img_name, version, pr, alpine=false, alpine_ver_int=alpine_ver) = {
  kind: &#34;pipeline&#34;,
  name:
      if pr then
        &#34;Build &#34;+img_name
      else
        &#34;Release &#34;+img_name,
  platform: {
    arch: &#34;arm64&#34;,
  },
  steps: [
    {
      name:
      if pr then
        &#34;Build Image&#34;
      else
        &#34;Release Image&#34;,
      image: &#34;thegeeklab/drone-docker-buildx&#34;,
      privileged: true,
      settings: {
        repo: &#34;harbor.example.com/homelab/&#34;+img_name,
        registry: &#34;harbor.example.com&#34;,
        username: &#34;myuser&#34;,
        password: {
          from_secret: &#34;harbor-secret&#34;,
        },
        dockerfile: img_name+&#34;/Dockerfile&#34;,
        context: img_name+&#34;/&#34;,
        mirror: &#34;https://harbor-mirror.example.com&#34;,
        debug: true,
        buildkit_config: &#39;debug = true\n[registry.&#34;docker.io&#34;]\n  mirrors = [&#34;harbor.example.com/dockerhub-cache&#34;]\n[registry.&#34;quay.io&#34;]\n  mirrors = [&#34;harbor.example.com/quay.io-cache&#34;]\n[registry.&#34;ghcr.io&#34;]\n  mirrors = [&#34;harbor.example.com/github-cache&#34;]&#39;,
        tags: [version, &#34;latest&#34;],
        custom_dns: [&#34;10.0.0.1&#34;],
        build_args: std.prune([
          img_name+&#34;_ver=&#34;+version,
          if alpine then
            &#34;alpine_ver=&#34;+alpine_ver_int
        ]),
        platforms: [
          &#34;linux/amd64&#34;,
          &#34;linux/arm64&#34;,
        ],
        dry_run:
        if pr then
          true
        else
          false
      },
    }
  ],
  trigger:
    if pr then
    {
      event: {
        include: [
          &#34;pull_request&#34;
        ]
      }
    }
    else
    {
      branch: {
        include: [
          &#34;master&#34;
        ]
      },
      event: {
        exclude: [
          &#34;pull_request&#34;
        ]
      }
    }
};

local Image(img_name, version, alpine=false, alpine_ver_int=alpine_ver) = [
  Pipeline(img_name, version, true, alpine, alpine_ver_int),
  Pipeline(img_name, version, false, alpine, alpine_ver_int)
];

Image(&#34;gitea&#34;, &#34;1.21.10&#34;)
</code></pre><p>This configuration uses <a href="https://docs.docker.com/build/buildkit/">buildkit</a> via
the <a href="https://github.com/thegeeklab/drone-docker-buildx">drone-docker-buildx</a>
plugin, which is no longer actively developed. That&rsquo;s one of the reasons I&rsquo;m
planning to migrate to Woodpecker.
I&rsquo;m creating images for both arm64 and amd64, as most of my Homelab consists
of Raspberry Pis.</p>
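<p>To check changes to such a file before pushing, the Jsonnet can be rendered
locally with the Drone CLI. A sketch; flag names may differ between CLI versions:</p>
<pre tabindex="0"><code># Render .drone.jsonnet into .drone.yml the way the server would see it
drone jsonnet --stream --format
</code></pre>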
<p>One snag I hit during this part of the setup came when I tried to switch the
Fluentd image in my logging setup, already running on Kubernetes, over to
Harbor. I got only pull failures, without any indication of what was going wrong.
It turned out that this was the first time my Kubernetes nodes were trying to
access something running in my cluster behind the Traefik ingress at
<code>example.com</code>, and I yet again had to adapt my NetworkPolicy for said Traefik
Ingress.
Looking at the Cilium monitoring, I saw the following whenever one of my k8s
hosts tried to pull the image:</p>
<pre tabindex="0"><code>xx drop (Policy denied) flow 0x0 to endpoint 1868, ifindex 6, file bpf_lxc.c:2069, , identity remote-node-&gt;39413: 10.8.5.218:55064 -&gt; 10.8.4.134:8000 tcp SYN
xx drop (Policy denied) flow 0x0 to endpoint 1868, ifindex 6, file bpf_lxc.c:2069, , identity remote-node-&gt;39413: 10.8.5.218:55064 -&gt; 10.8.4.134:8000 tcp SYN
xx drop (Policy denied) flow 0x0 to endpoint 1868, ifindex 6, file bpf_lxc.c:2069, , identity remote-node-&gt;39413: 10.8.5.218:55064 -&gt; 10.8.4.134:8000 tcp SYN
</code></pre><p>Here, endpoint <code>1868</code> is Traefik, and we can see that access
from a <code>remote-node</code> identity is being dropped. While I had
allowed access from <code>world</code> to Traefik, <code>world</code> in Cilium only covers hosts
outside the Kubernetes cluster. Cluster nodes, including the local host, need
to be explicitly allowed. So I had to add the following to my Traefik NetworkPolicy:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">fromEntities</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">cluster</span>
</span></span></code></pre></div><p><code>cluster</code> includes both the local host and all other nodes in the cluster.</p>
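<p>For debugging policies like this, watching drops directly on the Cilium agent
is handy. This is how output like the above can be captured, run from inside the
Cilium agent Pod on the affected node:</p>
<pre tabindex="0"><code># Show only packets dropped by the datapath, including policy denials
cilium monitor --type drop
</code></pre>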
<p>With that fixed, my homelab project was able to provide images to both my
Docker-based Nomad cluster and my cri-o-based Kubernetes cluster:</p>
<figure>
    <img loading="lazy" src="homelab-project.png"
         alt="A screenshot of the repositories in the &#39;homelab&#39; project. It shows five repositories: hn-backup, fluentd, hugo, gitea and taskd. They have from 2 to 5 artifacts and 2 to 19 pulls. Overall, the quota used is 1.43 GiB. The access level is shown as &#39;Public&#39;"/> <figcaption>
            <p>My homelab project with five images after a couple of days of usage.</p>
        </figcaption>
</figure>

<h2 id="setting-harbor-up-as-a-pull-through-cache">Setting Harbor up as a pull-through cache</h2>
<p>With the handling of my own images finished and working, the last step remaining
is the setup of pull-through caches for some public image registries. I wanted
to set up an internal mirror for the following registries:</p>
<ul>
<li><a href="https://hub.docker.com/">DockerHub</a></li>
<li><a href="https://quay.io">quay.io</a></li>
<li><a href="ghcr.io">GitHub Container Registry</a></li>
<li><a href="https://registry.k8s.io">The official k8s registry</a></li>
</ul>
<p>In Harbor, each mirror needs to be set up as a separate project, and it needs to
be accessed at &ldquo;harbor.example.com/project-name&rdquo;. This is an issue for Docker
daemons, which I will go into detail about later.</p>
<p>Here is an example of setting up the <code>quay.io</code> cache. First, an endpoint needs
to be defined:</p>
<figure>
    <img loading="lazy" src="new-mirror.png"
         alt="A screenshot of the &#39;New Registry Endpoint&#39; dialogue in Harbor. In the menu on the left, the entry &#39;Registries&#39; is chosen, and then the button &#39;NEW ENDPOINT&#39; was clicked. The dialogue itself has the &#39;Provider&#39; dropdown filled with &#39;Quay&#39;. The name is given as &#39;Quay.io Cache&#39; and the &#39;Endpoint URL&#39; as &#39;https://quay.io&#39;."/> <figcaption>
            <p>Setting up an endpoint for quay.io</p>
        </figcaption>
</figure>

<p>After the endpoint is defined, the project needs to be created:</p>
<figure>
    <img loading="lazy" src="new-mirror-project.png"
         alt="A screenshot of the &#39;New Project&#39; dialogue in Harbor. In the menu on the left, the entry &#39;Projects&#39; is chosen, and then the button &#39;NEW PROJECT was clicked. In the dialogue, the project-name is &#39;quay-cache&#39;, with the &#39;Access Level&#39; checkbox labeled &#39;Public&#39; being checked. The project quota is left at the default &#39;-1&#39;. The &#39;Proxy Cache&#39; toggle is enabled, and the previously shown &#39;Quay.io cache&#39; is chosen in the dropdown."/> <figcaption>
            <p>Setting up the mirror project for quay.io</p>
        </figcaption>
</figure>

<p>After these steps are done, a mirror for <a href="https://quay.io">quay.io</a> will be
available at <code>https://harbor.example.com/quay-cache</code>.</p>
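<p>Pulls can then reference the mirror project explicitly by prefixing the image
path. The image and tag here are just examples:</p>
<pre tabindex="0"><code># Pull a quay.io image through the Harbor proxy cache
docker pull harbor.example.com/quay-cache/coreos/etcd:v3.5.9
</code></pre>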
<p>Here is a table of the configs for my current mirrors:</p>
<table>
  <thead>
      <tr>
          <th>Name</th>
          <th>Endpoint URL</th>
          <th>Provider</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>dockerhub-cache</td>
          <td><a href="https://hub.docker.com">https://hub.docker.com</a></td>
          <td>Docker Hub</td>
      </tr>
      <tr>
          <td>github-cache</td>
          <td><a href="https://ghcr.io">https://ghcr.io</a></td>
          <td>Github GHCR</td>
      </tr>
      <tr>
          <td>k8s-cache</td>
          <td><a href="https://registry.k8s.io">https://registry.k8s.io</a></td>
          <td>Docker Registry</td>
      </tr>
      <tr>
          <td>quay.io-cache</td>
          <td><a href="https://quay.io">https://quay.io</a></td>
          <td>Quay</td>
      </tr>
  </tbody>
</table>
<p>But there is an issue with Harbor&rsquo;s subpath approach to projects/mirrors:
Docker only supports the <a href="https://docs.docker.com/docker-hub/mirror/#configure-the-docker-daemon">registry-mirror</a>
option, which is only used for DockerHub images, not for any other registry.
And the main issue: it does not support paths in the mirror URL. Docker
always expects the registry at <code>/</code>. This obviously doesn&rsquo;t work with Harbor&rsquo;s
<code>domain/projectName/</code> scheme.</p>
<p>At the same time, cri-o does not suffer from this issue at all. It follows the
<a href="https://github.com/containers/image/blob/main/docs/containers-registries.conf.5.md">OCI containers-registries spec</a>.
With this spec, and the <code>containers-registries.conf</code> file, it can be configured
to rewrite pulls to any registry URL you like.
I will explain this later, but let&rsquo;s start with the more complicated Docker
daemon case.</p>
<h3 id="what-does-docker-actually-do-when-pulling">What does Docker actually do when pulling?</h3>
<p>While trying to figure out how to solve the issue with Docker&rsquo;s <code>registry-mirror</code>
option, I found <a href="https://smile.eu/en/publications-and-events/how-configure-docker-hub-proxy-harbor">this blog post</a>,
which had an excellent idea: Just rewrite Docker&rsquo;s requests to point them to the
right Harbor URL. And it worked. &#x1f642;</p>
<p>Let&rsquo;s start by having a look at the HTTP requests Docker makes when issuing the
following command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>docker pull postgres:10
</span></span></code></pre></div><p>As the command does not have a registry domain defined, Docker defaults to
DockerHub.
Let&rsquo;s imagine the Docker daemon is configured with <code>--registry-mirror https://harbor.example.com</code>.</p>
<p>The first request Docker would try to make is this:</p>
<pre tabindex="0"><code>GET https://harbor.example.com/v2/
</code></pre><p>It would expect a 401 return code and a <code>www-authenticate</code> header.
In the case of Harbor, this header looks something like this:</p>
<pre tabindex="0"><code>www-authenticate: Bearer realm=&#34;https://harbor.example.com/service/token&#34;,service=&#34;harbor-registry&#34;
</code></pre><p>Next, Docker tries to request a token:</p>
<pre tabindex="0"><code>https://harbor.example.com/service/token?scope=repository:library/postgres/pull&amp;service=harbor-registry
</code></pre><p>Armed with that token, it would look for the manifest file for the postgres:10
image:</p>
<pre tabindex="0"><code>https://harbor.example.com/v2/library/postgres/manifests/10.0
</code></pre><p>This is where things start going wrong with Harbor, because this request, sent
to Harbor, would look for the <code>library</code> project, which does exist by default,
but is not a DockerHub mirror.</p>
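<p>The whole exchange is easy to replay by hand, which is useful for debugging.
A sketch with <code>curl</code>, assuming the example domain and a DockerHub image:</p>
<pre tabindex="0"><code># 1. Initial probe; returns 401 plus the www-authenticate header
curl -si https://harbor.example.com/v2/ | grep -i www-authenticate

# 2. Fetch a pull token for the repository
TOKEN=$(curl -s &#39;https://harbor.example.com/service/token?scope=repository:library/postgres:pull&amp;service=harbor-registry&#39; | jq -r .token)

# 3. Request the manifest with that token
curl -sI -H &#34;Authorization: Bearer ${TOKEN}&#34; \
  -H &#39;Accept: application/vnd.docker.distribution.manifest.v2+json&#39; \
  https://harbor.example.com/v2/library/postgres/manifests/10
</code></pre>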
<p>My first attempt to solve this issue was pretty simplistic: I configured an
additional route for the <code>harbor-core</code> service in my Traefik ingress, with a
path-rewrite middleware that turns requests like <code>/v2/library/postgres/manifests/10</code>
into <code>/v2/dockerhub-cache/library/postgres/manifests/10</code>. It looked like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">traefik.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">IngressRoute</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">harbor-docker-mirror</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/hostname</span>: <span style="color:#e6db74">&#34;harbor-mirror.example.com&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/target</span>: <span style="color:#e6db74">&#34;ingress-k8s.example.com&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">entryPoints</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">secureweb</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">routes</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Rule</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">match</span>: <span style="color:#ae81ff">Host(`harbor-mirror.example.com`)</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">middlewares</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">project-rewrite</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">services</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Service</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">harbor-core</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">port</span>: <span style="color:#ae81ff">http-web</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">scheme</span>: <span style="color:#ae81ff">http</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">traefik.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Middleware</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">project-rewrite</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replacePathRegex</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">^\/v2\/(.+)$</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">replacement</span>: <span style="color:#ae81ff">/v2/dockerhub-cache/${1}</span>
</span></span></code></pre></div><p>This worked somewhat. The initial request for <code>/v2/</code> was rewritten. But then
I did not see the <code>/service/token</code> request hit this new <code>harbor-mirror</code> domain
at all. It went to the <code>harbor</code> domain instead. That request succeeded, and
Docker got a token from that endpoint.
But: the token request was for access to the <code>/library/postgres</code>
repository.
The next request then went through <code>harbor-mirror</code> again, which meant it
was correctly rewritten:</p>
<pre tabindex="0"><code>/v2/dockerhub-cache/library/postgres/manifests/10.0
</code></pre><p>But Harbor would now return a 401, because the token fetched in the previous
step was for <code>/library/postgres/</code>, while the request was now for <code>/dockerhub-cache/library/postgres</code>.</p>
<p>To fix this issue, I did not just need to rewrite the query parameter of the
<code>/service/token</code> request, but also the response that precedes it. The domain to
contact for the <code>/service/token</code> request is taken from the <code>www-authenticate</code>
header of the response to the initial <code>/v2/</code> request. And Harbor would of
course always answer with a fixed domain, the one from the <code>externalURL</code>
parameter in the Helm chart, which is not the route with the rewrite.</p>
<p>So I had to do two additional things, in addition to rewriting paths accessing
<code>/v2/</code>:</p>
<ol>
<li>Rewrite the <code>www-authenticate</code> header from the response to the initial
<code>/v2/</code> request to make the Realm point to the special mirror domain, not
Harbor&rsquo;s domain</li>
<li>Rewrite the <code>scope=repository:</code> in the <code>/service/token</code> request to prefix it
with the name of the DockerHub mirror project in Harbor</li>
</ol>
<p>It turned out that Traefik wasn&rsquo;t really well equipped for that. It can of course
rewrite headers, but there&rsquo;s no facility to work with regexes: I could only have
replaced the entire <code>www-authenticate</code> header with a static value, and that seemed
a bit too inflexible to me.</p>
<p>So instead, I decided to set up another Pod running the <a href="https://github.com/caddyserver/caddy">Caddy webserver</a>
and use it to do the rewrites. I went with Caddy instead of Nginx, which the
blog post I linked above used, because I&rsquo;ve already got another Caddy serving
as a webserver for my Nextcloud setup, but currently don&rsquo;t have any Nginx in my
Homelab.</p>
<p>I kept the Caddy setup pretty simple. Here&rsquo;s the Deployment:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">caddy-dockerhub-mirror</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">app</span>: <span style="color:#ae81ff">caddy</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">app</span>: <span style="color:#ae81ff">caddy</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">automountServiceAccountToken</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">caddy</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">caddy:2.7.6</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">config</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/etc/caddy/</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">100m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">100Mi</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">caddy-http</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: <span style="color:#ae81ff">8080</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">config</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">configMap</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#ae81ff">caddy-mirror-conf</span>
</span></span></code></pre></div><p>Then a Service is also required:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Service</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">caddy-mirror</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">ClusterIP</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">app</span>: <span style="color:#ae81ff">caddy</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">caddy-http</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">port</span>: <span style="color:#ae81ff">8080</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targetPort</span>: <span style="color:#ae81ff">caddy-http</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span></code></pre></div><p>And finally an IngressRoute for my Traefik ingress:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">traefik.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">IngressRoute</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">harbor-docker-mirror</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/hostname</span>: <span style="color:#e6db74">&#34;harbor-mirror.example.com&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/target</span>: <span style="color:#e6db74">&#34;ingress-k8s.example.com&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">entryPoints</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">secureweb</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">routes</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Rule</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">match</span>: <span style="color:#ae81ff">Host(`harbor-mirror.example.com`)</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">services</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Service</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">caddy-mirror</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">port</span>: <span style="color:#ae81ff">caddy-http</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">scheme</span>: <span style="color:#ae81ff">http</span>
</span></span></code></pre></div><p>The really interesting part is the Caddy config:</p>
<pre tabindex="0"><code>apiVersion: v1
kind: ConfigMap
metadata:
  name: caddy-mirror-conf
data:
  Caddyfile: |
    {
      admin off
      auto_https off
      log {
        output stdout
        level INFO
      }
    }
    :8080 {
      log {
        output stdout
        format filter {
          wrap json
          fields {
            request&gt;headers&gt;Authorization delete
            request&gt;headers&gt;Cookie delete
          }
        }
      }
      @v2-subpath {
        path_regexp repo ^/v2/(.+)
      }

      map /service/token {query.scope} {new_scope} {
        ~(repository:)(.*) &#34;${1}dockerhub-cache/${2}&#34;
      }

      rewrite /service/token ?scope={new_scope}&amp;service={query.service}

      header &gt;Www-Authenticate harbor.example.com harbor-mirror.example.com

      rewrite @v2-subpath /v2/dockerhub-cache/{re.repo.1}

      reverse_proxy http://harbor-core.namespace-of-harbor.svc.cluster.local {
        header_up Host &#34;harbor.example.com&#34;
      }
    }
</code></pre><p>The first rewrite is for all requests which go to <code>/v2/</code>. Because I don&rsquo;t want
to append <code>dockerhub-cache/</code> to the URL of the Docker daemon&rsquo;s initial request
for <code>/v2/</code> itself, I went with the <code>^/v2/(.+)</code> regex for the matcher:</p>
<pre tabindex="0"><code>@v2-subpath {
  path_regexp repo ^/v2/(.+)
}

rewrite @v2-subpath /v2/dockerhub-cache/{re.repo.1}
</code></pre><p>These two lines define a rewrite for all paths <code>/v2/.+</code> to <code>/v2/dockerhub-cache/...</code>,
so that any request going over this mirror automatically accesses the DockerHub
mirror project on my Harbor instance.</p>
<p>The next line just replaces the canonical Harbor domain with the specific mirror
domain in the <code>www-authenticate</code> header, so that the subsequent request for the
token goes through the mirror as well, instead of directly going to Harbor:</p>
<pre tabindex="0"><code>header &gt;Www-Authenticate harbor.example.com harbor-mirror.example.com
</code></pre><p>With this, the <code>realm=&quot;https://harbor.example.com/service/token&quot;</code> part of the
header is rewritten to <code>realm=&quot;https://harbor-mirror.example.com/service/token&quot;</code>.</p>
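<p>This rewrite can be verified with a quick probe against the mirror domain,
which should now return the rewritten realm:</p>
<pre tabindex="0"><code>curl -si https://harbor-mirror.example.com/v2/ | grep -i www-authenticate
# www-authenticate: Bearer realm=&#34;https://harbor-mirror.example.com/service/token&#34;,service=&#34;harbor-registry&#34;
</code></pre>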
<p>Now, the request for the token also goes through the Caddy instance, and I can
rewrite the repository in the request&rsquo;s <code>scope</code> parameter:</p>
<pre tabindex="0"><code>map /service/token {query.scope} {new_scope} {
  ~(repository:)(.*) &#34;${1}dockerhub-cache/${2}&#34;
}
rewrite /service/token ?scope={new_scope}&amp;service={query.service}
</code></pre><p>The <code>map</code> instruction matches only requests to <code>/service/token</code> and maps the
<code>scope</code> query parameter to a Caddy-internal variable <code>new_scope</code>: it splits
the <code>scope=repository:library/postgres:pull</code> parameter and grafts the
necessary <code>dockerhub-cache/</code> prefix in front of the <code>library/postgres</code> repository
definition. With this, the token request is made for the correct repository, and
Harbor will accept requests for the image files accompanied by this token.</p>
<p>One note: I had also tried to rewrite the entire query part of the request in
one go, but I hit a weird issue. When operating on the whole query as one,
Caddy would URL-encode more parts of the query, in particular the <code>=</code> signs in the
<code>scope</code> and <code>service</code> parameters. And for some reason, Harbor did not like that.
It would only spit out a token when the <code>=</code> signs were left as-is.</p>
<p>And with all of this combined, I could now set the <code>registry-mirror</code> option for
my Docker agents to <code>https://harbor-mirror.example.com</code>, and Docker pulls worked
as intended and used the dockerhub-cache mirror on my Harbor instance without
issue. &#x1f389;</p>
<h2 id="configuring-docker-and-cri-o">Configuring Docker and cri-o</h2>
<p>Onto the last step: Configuring the Docker daemons in my Nomad cluster and the
cri-o daemons in my Kubernetes cluster to use the new Harbor mirrors.</p>
<p>As noted above, Docker only supports mirrors for DockerHub, nothing else.
So configuring those daemons is pretty simple and amounts to adding this to the
<code>/etc/docker/daemon.json</code> file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;registry-mirrors&#34;</span>: [
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;https://harbor-mirror.example.com&#34;</span>
</span></span><span style="display:flex;"><span>    ]
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Luckily, <code>registry-mirrors</code> is one of the Docker config options which can be live-reloaded,
so a <code>pkill --signal SIGHUP dockerd</code> is enough; no restart of the daemon or the
running containers is necessary.</p>
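<p>The reload can be verified right afterwards, since <code>docker info</code> lists the
active mirrors:</p>
<pre tabindex="0"><code>pkill --signal SIGHUP dockerd
docker info | grep -A 1 &#39;Registry Mirrors&#39;
</code></pre>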
<p>The cri-o config is a bit more involved, but it does have the benefit of supporting
mirrors for any external registry you like.
Cri-o implements the <a href="https://github.com/containers/image/blob/main/docs/containers-registries.conf.5.md">containers-registries</a>
config files. These can also be reloaded without a restart by sending the daemon a
SIGHUP via <code>pkill --signal SIGHUP crio</code>.</p>
<p>The mirror configs all have a similar format. As an example, the config for
<code>registry.k8s.io</code> looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-toml" data-lang="toml"><span style="display:flex;"><span>[[<span style="color:#a6e22e">registry</span>]]
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">prefix</span> = <span style="color:#e6db74">&#34;registry.k8s.io&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">insecure</span> = <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">blocked</span> = <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">location</span> = <span style="color:#e6db74">&#34;registry.k8s.io&#34;</span>
</span></span><span style="display:flex;"><span>[[<span style="color:#a6e22e">registry</span>.<span style="color:#a6e22e">mirror</span>]]
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">location</span> = <span style="color:#e6db74">&#34;harbor.example.com/k8s-cache&#34;</span>
</span></span></code></pre></div><p>I place that file into <code>/etc/containers/registries.conf.d/k8s-mirror.conf</code>,
issue a SIGHUP, and cri-o will happily start pulling from the Harbor mirror
whenever an image from the official k8s registry is required. And like Docker,
it will pull from the original registry if the mirror is down.</p>
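<p>A quick way to test the whole chain on a node is to pull an image via <code>crictl</code>
and then check the cache project in Harbor for the new artifact. The image is
just an example:</p>
<pre tabindex="0"><code># cri-o transparently redirects this pull to harbor.example.com/k8s-cache
sudo crictl pull registry.k8s.io/pause:3.9
</code></pre>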
<p>And with that, I&rsquo;ve got my container registry needs migrated fully to Kubernetes
with Harbor. Especially the request rewrites needed to make Harbor&rsquo;s DockerHub
mirror work for Docker daemons were interesting to figure out and very
satisfying to get working.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 10: Grafana</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-10-grafana/</link>
      <pubDate>Sat, 06 Apr 2024 21:10:25 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-10-grafana/</guid>
      <description>Running Grafana with the kube-prometheus-stack chart</description>
      <content:encoded><![CDATA[<p>Wherein I migrate my Grafana instance over to k8s.</p>
<p>This is part 10 of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>I already wrote about my love for metrics in the companion post about the
<a href="https://blog.mei-home.net/posts/k8s-migration-9-prometheus/">Prometheus setup</a>, so I will
spare you my excitement about pretty graphs this time. &#x1f609;</p>
<p>For the Grafana setup, I used the <a href="https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack">kube-prometheus-stack&rsquo;s</a>
integration of the <a href="https://github.com/grafana/helm-charts/tree/main/charts/grafana">Grafana Helm Chart</a>.</p>
<h2 id="database-setup">Database setup</h2>
<p>The first step is to set up the database for Grafana. You can also run Grafana
without an external database, in which case it uses a local SQLite DB. But a Postgres
database made more sense to me. This was my first deployment of a production
database with <a href="https://cloudnative-pg.io/">CloudNativePG</a>, and it looked like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">postgresql.cnpg.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">grafana-pg-cluster</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">grafana</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">instances</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">imageName</span>: <span style="color:#e6db74">&#34;ghcr.io/cloudnative-pg/postgresql:16.2-10&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">bootstrap</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">initdb</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">database</span>: <span style="color:#ae81ff">grafana</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">grafana</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">100M</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">100m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">postgresql</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">max_connections</span>: <span style="color:#e6db74">&#34;20&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">shared_buffers</span>: <span style="color:#e6db74">&#34;25MB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">effective_cache_size</span>: <span style="color:#e6db74">&#34;75MB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">maintenance_work_mem</span>: <span style="color:#e6db74">&#34;6400kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">checkpoint_completion_target</span>: <span style="color:#e6db74">&#34;0.9&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wal_buffers</span>: <span style="color:#e6db74">&#34;768kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">default_statistics_target</span>: <span style="color:#e6db74">&#34;100&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">random_page_cost</span>: <span style="color:#e6db74">&#34;1.1&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">effective_io_concurrency</span>: <span style="color:#e6db74">&#34;300&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">work_mem</span>: <span style="color:#e6db74">&#34;640kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">huge_pages</span>: <span style="color:#e6db74">&#34;off&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">max_wal_size</span>: <span style="color:#e6db74">&#34;128MB&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storage</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">size</span>: <span style="color:#ae81ff">1G</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storageClass</span>: <span style="color:#ae81ff">rbd-fast</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">backup</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">barmanObjectStore</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">endpointURL</span>: <span style="color:#ae81ff">http://rook-ceph-rgw-rgw-bulk.rook-cluster.svc:80</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">destinationPath</span>: <span style="color:#e6db74">&#34;s3://backup-cnpg/&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">s3Credentials</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">accessKeyId</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-ceph-object-user-rgw-bulk-backup-example-user</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">AccessKey</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretAccessKey</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-ceph-object-user-rgw-bulk-backup-example-user</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">SecretKey</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">retentionPolicy</span>: <span style="color:#e6db74">&#34;30d&#34;</span>
</span></span></code></pre></div><p>As before, I determined the <code>spec.postgresql.parameters</code> by plugging my requirements
into <a href="https://pgtune.leopard.in.ua/">PGtune</a>. One important piece is the
<code>storage.size</code> config. I got that value wrong in the beginning, setting it to
only 256 MB. More details can be found in <a href="https://blog.mei-home.net/posts/k8s-migration-8a-pg-problems/">this post</a>.</p>
<p>I also configured backups via my Rook Ceph cluster, for which I had to create an S3
user like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">ceph.rook.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CephObjectStoreUser</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">backup-example-user</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">store</span>: <span style="color:#ae81ff">rgw-bulk</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">clusterNamespace</span>: <span style="color:#ae81ff">rook-cluster</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">displayName</span>: <span style="color:#e6db74">&#34;Backup user for Grafana DB&#34;</span>
</span></span></code></pre></div><p>I also configured scheduled backups for the database:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">postgresql.cnpg.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ScheduledBackup</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">grafana-pg-backup</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">method</span>: <span style="color:#ae81ff">barmanObjectStore</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">immediate</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">schedule</span>: <span style="color:#e6db74">&#34;0 30 1 * * *&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">backupOwnerReference</span>: <span style="color:#ae81ff">self</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">cluster</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">grafana-pg-cluster</span>
</span></span></code></pre></div><p>And finally, the CloudNativePG operator needs access to the Postgres pods
when using NetworkPolicies:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumNetworkPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;grafana-pg-cluster-allow-operator-ingress&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">endpointSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cnpg.io/cluster</span>: <span style="color:#ae81ff">grafana-pg-cluster</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEndpoints</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">io.kubernetes.pod.namespace</span>: <span style="color:#ae81ff">cnpg-operator</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">app.kubernetes.io/name</span>: <span style="color:#ae81ff">cloudnative-pg</span>
</span></span></code></pre></div><p>With the database finally up and running, and all the kinks worked out, I could
deploy Grafana itself.</p>
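<p>As a sanity check before moving on, the cluster and its backup configuration can
be inspected with the <a href="https://cloudnative-pg.io/documentation/current/kubectl-plugin/">cnpg kubectl plugin</a>,
assuming it is installed:</p>
<pre tabindex="0"><code># Show instance roles, replication and backup status
kubectl cnpg status grafana-pg-cluster
# Trigger an on-demand backup to verify the object store credentials
kubectl cnpg backup grafana-pg-cluster
</code></pre>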
<h2 id="grafana-setup">Grafana setup</h2>
<p>Before beginning the Grafana setup itself, I had to go over to Keycloak to add
a new client, as I was changing the Grafana URL as part of the migration.
The Grafana docs have a good example of setting up OIDC <a href="https://grafana.com/docs/grafana/latest/setup-grafana/configure-security/configure-authentication/keycloak/">here</a>,
so I won&rsquo;t go into details.</p>
<p>To supply the OIDC secret and client name to the Grafana deployment, I stored
them in my HashiCorp Vault instance and grabbed them from there via
<a href="https://external-secrets.io/latest/">external-secrets</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1beta1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ExternalSecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;grafana-oauth2-keycloak&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">monitoring</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">secretStoreRef</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-vault-store</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ClusterSecretStore</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">refreshInterval</span>: <span style="color:#e6db74">&#34;1h&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">target</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">creationPolicy</span>: <span style="color:#e6db74">&#39;Owner&#39;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">monitoring</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secret</span>: <span style="color:#e6db74">&#34;{{ .secret }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">client</span>: <span style="color:#e6db74">&#34;{{ .client }}&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dataFrom</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">extract</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">secret/my_kubernetes_secrets/cluster/grafana-oauth2-secrets</span>
</span></span></code></pre></div><p>On to the main event. As noted above, I&rsquo;m deploying the Grafana Helm chart as
a subchart of the kube-prometheus-stack chart, which I already used to deploy
Prometheus:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">grafana</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">defaultDashboardsTimezone</span>: <span style="color:#ae81ff">Europe/Berlin</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">defaultDashboardsEditable</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">sidecar</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">datasources</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">alertmanager</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">testFramework</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">extraLabels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">grafana</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serviceMonitor</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">traefik.ingress.kubernetes.io/router.entrypoints</span>: <span style="color:#ae81ff">secureweb</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">hosts</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">grafana.example.com</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">250m</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">256M</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">persistence</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">admin</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">existingSecret</span>: <span style="color:#e6db74">&#34;admin-secret-name&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">userKey</span>: <span style="color:#ae81ff">user</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">passwordKey</span>: <span style="color:#ae81ff">password</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">extraSecretMounts</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">oidc-secret</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretName</span>: <span style="color:#ae81ff">grafana-oauth2-keycloak</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/secrets/oauth-keycloak</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">db-secret</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretName</span>: <span style="color:#ae81ff">grafana-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/secrets/my-db</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">datasources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">datasource.yaml</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">editable</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">datasources</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">loki-k8s</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">loki</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">access</span>: <span style="color:#ae81ff">proxy</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">url</span>: <span style="color:#ae81ff">http://loki.loki.svc.cluster.local:3100</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">isDefault</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">grafana.ini</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">analytics</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">check_for_updates</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">server</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">root_url</span>: <span style="color:#ae81ff">https://grafana.example.com</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">database</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">postgres</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">host</span>: <span style="color:#e6db74">&#34;$__file{/secrets/my-db/host}:$__file{/secrets/my-db/port}&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;$__file{/secrets/my-db/dbname}&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">user</span>: <span style="color:#e6db74">&#34;$__file{/secrets/my-db/user}&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">password</span>: <span style="color:#e6db74">&#34;$__file{/secrets/my-db/password}&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">users</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">allow_sign_up</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">log</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">level</span>: <span style="color:#ae81ff">info</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">log.console</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">format</span>: <span style="color:#ae81ff">json</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">alerting</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">auth.generic_oauth</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Keycloak</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">allow_sign_up</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">skip_org_role_sync</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">client_id</span>: <span style="color:#e6db74">&#34;$__file{/secrets/oauth-keycloak/client}&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">client_secret</span>: <span style="color:#e6db74">&#34;$__file{/secrets/oauth-keycloak/secret}&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">scopes</span>: <span style="color:#ae81ff">openid email profile offline_access roles</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">email_attribute_path</span>: <span style="color:#ae81ff">email</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">login_attribute_path</span>: <span style="color:#ae81ff">username</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name_attribute_path</span>: <span style="color:#ae81ff">full_name</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">auth_url</span>: <span style="color:#ae81ff">https://keycloak.example.com/realms/my-realm/protocol/openid-connect/auth</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">token_url</span>: <span style="color:#ae81ff">https://keycloak.example.com/realms/my-realm/protocol/openid-connect/token</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">api_url</span>: <span style="color:#ae81ff">https://keycloak.example.com/realms/my-realm/protocol/openid-connect/userinfo</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">role_attribute_path</span>: <span style="color:#ae81ff">contains(roles[*], &#39;admin&#39;) &amp;&amp; &#39;Admin&#39; || contains(roles[*], &#39;editor&#39;) &amp;&amp; &#39;Editor&#39; || &#39;Viewer&#39;</span>
</span></span></code></pre></div><p>Let&rsquo;s start with an obvious one: I&rsquo;m yet again disabling alerting and Grafana&rsquo;s
own ServiceMonitor, because I did not want to get bogged down even more in staring
at pretty plots all evening long. &#x1f605;
I&rsquo;ve also disabled persistence, because I&rsquo;m using a Postgres database instead. Be
cautious with this - if persistence is disabled and you don&rsquo;t configure an external
database, your Grafana config and dashboards won&rsquo;t survive a Pod restart!</p>
<p>Next, let&rsquo;s look at the <code>admin</code> config. I went with an existing secret here,
so I don&rsquo;t have to put a password into the Helm chart directly. This password is
important, because it&rsquo;s not just Grafana&rsquo;s initial admin password, but it&rsquo;s also
used by Grafana&rsquo;s
<a href="https://grafana.com/docs/grafana/latest/administration/provisioning/">Provisioning functionality</a>
for API access. There is also a formatting issue somewhere with the password: if
it contains certain special characters, the admin login breaks, and the dashboard
and data source provisioning containers fail as well, because they can&rsquo;t log in
either. I&rsquo;m not sure which particular special character Grafana did not like, but
logins failed consistently with my completely randomly generated 100-character
password. Switching to a purely alphanumeric one fixed the issue.</p>
<p>One would think we would have gotten past the &ldquo;Escaping strings is hard!!!&rdquo; phase
of computing by now. &#x1f644;</p>
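<p>For reference, a Secret matching the <code>admin</code> config above could look something
like this minimal sketch - the name and keys just have to line up with
<code>existingSecret</code>, <code>userKey</code> and <code>passwordKey</code>, and the password shown here is
of course just a placeholder:</p>
<pre tabindex="0"><code>apiVersion: v1
kind: Secret
metadata:
  name: admin-secret-name
stringData:
  user: admin
  # purely alphanumeric, to avoid the escaping issue described above
  password: f3VQ0Xk7pR2mN8sT1aB5cD9eG4hJ6kL0
</code></pre>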
<p>I will got into the <code>datasources</code> config a bit later when I talk about Grafan&rsquo;s
provisioning capability.</p>
<h3 id="grafana-config">Grafana config</h3>
<p>Now let&rsquo;s have a look at the Grafana config. The first thing to note is the
<code>$__file{&lt;FILEPATH&gt;}</code> syntax. This is a pretty nice Grafana feature: instead of
having to inject values via environment variables, Grafana can read config values
from files. I&rsquo;m using
that for the Postgres database config as well as the OIDC secrets from Keycloak.</p>
<p>When defining a secret to mount, Kubernetes will create one file per property
under the <code>data:</code> key in the secret. My database secret, automatically generated
by CloudNativePG, looks something like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dbname</span>: <span style="color:#ae81ff">grafana</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">host</span>: <span style="color:#ae81ff">grafana-pg-cluster-rw</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">jdbc-uri</span>: <span style="color:#ae81ff">jdbc:postgresql://grafana-pg-cluster-rw:5432/grafana?password=foo&amp;user=grafana</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">password</span>: <span style="color:#ae81ff">foo</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">pgpass</span>: [<span style="color:#ae81ff">...]</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">port</span>: <span style="color:#ae81ff">5432</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">uri</span>: <span style="color:#ae81ff">postgresql://grafana:foo@grafana-pg-cluster-rw:5432/grafana</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">user</span>: <span style="color:#ae81ff">grafana</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">username</span>: <span style="color:#ae81ff">grafana</span>
</span></span></code></pre></div><p>This means that under <code>/secrets/my-db/</code>, where I mounted the secret, I will have
files like <code>dbname</code>, <code>password</code> or <code>uri</code>, which I can then use with the <code>$__file</code>
syntax to pull them into Grafana&rsquo;s config file.</p>
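<p>Inside the Grafana container, that mount then looks roughly like this:</p>
<pre tabindex="0"><code>$ ls /secrets/my-db/
dbname  host  jdbc-uri  password  pgpass  port  uri  user  username
</code></pre>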
<p>One note on the <code>uri</code> key, which CloudNativePG provides and Grafana generally
supports in place of setting all the options separately: sadly, the URI as provided
by CloudNativePG gives the DB type as <code>postgresql</code>, while Grafana expects it to
be <code>postgres</code>, spitting out the following error message:</p>
<pre tabindex="0"><code>Error: ✗ failed to connect to database: unknown database type: postgresql
</code></pre><p>So I had to switch to providing the individual config options, which also worked
nicely.</p>
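<p>For completeness, the URI-based variant I tried first would look roughly like
this in the values file - it fails with the error above because CloudNativePG&rsquo;s
URI uses the <code>postgresql://</code> scheme:</p>
<pre tabindex="0"><code>database:
  # fails: Grafana does not accept the postgresql:// scheme in the URI
  url: &#34;$__file{/secrets/my-db/uri}&#34;
</code></pre>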
<p>The last interesting thing to note about the config is the <code>grafana.ini.auth.generic_oauth.allow_sign_up</code>
option. This needs to be set to <code>true</code> for your first login with your Keycloak
user, so that Grafana can create the user. After that, it can be disabled.</p>
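<p>In values terms, that first login therefore needs a temporary change like this,
which can be reverted once the user exists:</p>
<pre tabindex="0"><code>grafana.ini:
  auth.generic_oauth:
    # temporarily allow Grafana to create the Keycloak user on first login
    allow_sign_up: true
</code></pre>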
<h2 id="grafana-provisioning">Grafana provisioning</h2>
<p>Grafana&rsquo;s provisioning functionality was something I hadn&rsquo;t heard about at all
before this migration. In short, instead of defining data sources and dashboards
manually via the UI, you can provide YAML files in a specific format in a specific
directory or call the Grafana API to create them.</p>
<p>I&rsquo;m currently only making use of this in my own config to add my Loki data source:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  <span style="color:#f92672">datasources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">datasource.yaml</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">editable</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">datasources</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">loki-k8s</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">loki</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">access</span>: <span style="color:#ae81ff">proxy</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">url</span>: <span style="color:#ae81ff">http://loki.loki.svc.cluster.local:3100</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">isDefault</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><p>This will add the <code>loki-k8s</code> data source, with the given URL. The <code>access</code> setting
controls how the data is fetched: <code>proxy</code> means that Grafana fetches
the data itself, while the alternative <code>direct</code> has your browser fetch the data.</p>
<p>The same functionality is also used by the <code>kube-prometheus-stack</code> chart to add
the Prometheus instance as a source automatically.</p>
<p>Similarly, dashboards can also be defined this way. I initially thought that
I would add all my own dashboards like this as well, but then decided against it.
The reason lies in the nature of dashboards, especially when compared to data
sources: I change dashboards relatively often, including occasional, spur of the
moment edits. When a dashboard is supplied
via provisioning, Grafana will always override the version from the database with
the provisioned version. That means whenever I make a change, I would need to
export the dashboard and put it under version control. That seemed just a bit
too much hassle.</p>
<p>The difference from data sources, as I see it, is that I will only ever change
dashboards in the UI - editing the text version by hand just isn&rsquo;t an option.
Data sources, in contrast, I don&rsquo;t need to look at or extensively test:
I define them once, and then they remain untouched until the next
big Homelab migration.
But perhaps Grafana will come up with a good UX to push UI changes back to
provisioned dashboards. I would use it in a heartbeat.</p>
<p>One place where provisioned dashboards are pretty nice is when other Helm charts
bring their own dashboards out of the box, like kube-prometheus-stack does.</p>
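<p>For reference, such a file-based dashboard provider is defined with the same
kind of provisioning YAML; a minimal sketch, with the provider name and path
made up for illustration:</p>
<pre tabindex="0"><code>apiVersion: 1
providers:
  - name: my-dashboards
    type: file
    # do not delete dashboards from the DB when the files disappear
    disableDeletion: false
    options:
      path: /var/lib/grafana/dashboards
</code></pre>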
<h2 id="migrating-the-dashboards">Migrating the dashboards</h2>
<p>After the new Grafana instance was finally up and running, I started migrating
over my dashboards. The first one I did was the Ceph dashboard. At the same time,
I enabled metrics gathering for my Rook cluster. Enabling metrics was as simple
as adding the following to the <code>values.yaml</code> file for the Cluster chart (not the
operator chart!):</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">monitoring</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span></code></pre></div><p>This will enable the required MGR module and set up a Prometheus operator
ServiceMonitor. I initially had problems with actually getting Prometheus to
recognize the new ServiceMonitor, because of how the chart restricts which
ServiceMonitors Prometheus picks up by default. I fixed this by adding the following option
to the <code>prometheus.prometheusSpec</code> map in the <code>values.yaml</code> file for the
kube-prometheus-stack chart:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">serviceMonitorSelectorNilUsesHelmValues</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><p>By default, the chart configures Prometheus to only pick up ServiceMonitors
that carry the Helm release&rsquo;s label. Setting this option to <code>false</code> turns that
selector off, so Prometheus picks up ServiceMonitors across all Namespaces, unless
you have explicitly configured a selector in the chart.</p>
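<p>If you&rsquo;d rather not have Prometheus look everywhere, you could instead set an
explicit Namespace selector. A sketch of how that might look, with a made-up
label:</p>
<pre tabindex="0"><code>prometheus:
  prometheusSpec:
    serviceMonitorSelectorNilUsesHelmValues: false
    # only consider ServiceMonitors in Namespaces carrying this label
    serviceMonitorNamespaceSelector:
      matchLabels:
        homelab/monitoring: enabled
</code></pre>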
<p>The next issue I observed, still on my old instance, was that the Ceph dashboard
I was using, a fork of <a href="https://grafana.com/grafana/dashboards/2842-ceph-cluster/">this dashboard</a>,
was not handling multiple clusters well. This became an issue because I was now
gathering metrics from both my baremetal cluster and my Rook cluster.</p>
<p>I worked around this by making use of Grafana&rsquo;s <a href="https://grafana.com/docs/grafana/latest/dashboards/variables/">Variables</a>.
I chose the <code>Custom</code> type of variable, and added the following two values:</p>
<pre tabindex="0"><code>job=&#34;ceph-metrics&#34;,job!=&#34;ceph-metrics&#34;
</code></pre><p>My old baremetal cluster&rsquo;s scrape job was called <code>ceph-metrics</code>, and the Ceph
metrics themselves sadly don&rsquo;t come with per-cluster labels.</p>
<p>Let&rsquo;s take a simple Stat panel showing the health of the cluster with this
query:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-promql" data-lang="promql"><span style="display:flex;"><span><span style="color:#66d9ef">sum</span> <span style="color:#66d9ef">without</span> <span style="color:#f92672">(</span>instance<span style="color:#f92672">)</span> <span style="color:#f92672">(</span>ceph_health_status{<span style="color:#960050;background-color:#1e0010">${cluster</span>}<span style="color:#960050;background-color:#1e0010">}</span><span style="color:#f92672">)</span>
</span></span></code></pre></div><p>Now with my little workaround, the <code>${cluster}</code> variable will either contain
<code>job=&quot;ceph-metrics&quot;</code> or <code>job!=&quot;ceph-metrics&quot;</code>, cleanly separating the data for
my clusters.</p>
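<p>Concretely, the panel query from above then renders to one of these two,
depending on the selected variable value:</p>
<pre tabindex="0"><code>sum without (instance) (ceph_health_status{job=&#34;ceph-metrics&#34;})
sum without (instance) (ceph_health_status{job!=&#34;ceph-metrics&#34;})
</code></pre>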
<p>One further change I had to make was to the labels ignored in all the
aggregation queries, specifically for the Rook cluster&rsquo;s data: besides
the typical Ceph labels, Rook also adds the Pod name to some metrics, for example
the <code>ceph_osd_op_r_out_bytes</code> metric. So to get the current read rate for
my OSDs, I would then use this query:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-promql" data-lang="promql"><span style="display:flex;"><span><span style="color:#66d9ef">sum</span> <span style="color:#66d9ef">without</span> <span style="color:#f92672">(</span>ceph_daemon, instance, pod<span style="color:#f92672">)</span> <span style="color:#f92672">(</span><span style="color:#66d9ef">irate</span><span style="color:#f92672">(</span>ceph_osd_op_r_out_bytes{<span style="color:#960050;background-color:#1e0010">${cluster</span>}<span style="color:#960050;background-color:#1e0010">}</span>[<span style="color:#e6db74">5m</span>]<span style="color:#f92672">))</span>
</span></span></code></pre></div><p>The addition here was the <code>pod</code> in the <code>without</code> list.</p>
<p>With that bit of preface out of the way, let&rsquo;s look at the actual dashboard
migration. I opted to export each dashboard from the old Grafana to a local file
on my desktop and then import it into the new instance.</p>
<p>To export a dashboard, you can use the &ldquo;Share&rdquo; button in the upper right of each
dashboard, next to the &ldquo;Save&rdquo; and &ldquo;Config&rdquo; buttons:</p>
<figure>
    <img loading="lazy" src="export-dashboard.png"
         alt="A screenshot of Grafana&#39;s dashboard export UI. It shows the top part of the dashboard UI in the background, with the button labeled &#39;Share&#39; marked in red. In the foreground is the dashboard share modal, with the &#39;Export&#39; tab selected. On this tab, the &#39;Export for sharing externally&#39; button is checked."/> <figcaption>
            <p>Grafana&rsquo;s dashboard export UI.</p>
        </figcaption>
</figure>

<p>When exporting dashboards for use in another Grafana instance, it is important
to check the &ldquo;Export for sharing externally&rdquo; button. With that, library panels
used in the dashboard are also exported as part of the dashboard.</p>
<p>After being stored in a file, the import is similarly simple. Loading the
previously exported JSON file via Grafana&rsquo;s dashboard import, which is shown as
an option when adding a new dashboard, presents you with this form:
<figure>
    <img loading="lazy" src="import-dashboard.png"
         alt="A screenshot of Grafana&#39;s dashboard import UI. It contains the name of the dashboard, the unique identifier and dropdowns for choosing the folder to put the imported dashboard and selecting the Prometheus data source which should be used. In the lower part, it shows the heading &#39;Existing library panels&#39;, with two panels named &#39;Ceph Health Status&#39; and &#39;OSDs DOWN&#39;."/> <figcaption>
            <p>Grafana&rsquo;s dashboard import UI.</p>
        </figcaption>
</figure>
</p>
<p>The above import form allows you to set the name, the UID and the folder where
an imported dashboard is placed. Because I chose the &ldquo;Export for sharing externally&rdquo;
option, the import also contains the two library panels I have in the dashboard.
Finally, you also get to choose the Prometheus data source to be used, as the
exported dashboard contains placeholders instead of actual IDs for the data
source.</p>
<p>This worked pretty well, including the import of the library panels, but I
still hit an error, specifically with those library panels. For some reason, the
data source placeholder was not properly replaced during the import, and I got
the following error message on the two library panels:</p>
<figure>
    <img loading="lazy" src="datasource-error.png"
         alt="A screenshot of Grafana&#39;s inspect panel, showing the &#39;Error&#39; tab. The error reads &#39;Datasource ${DS_PROMETHEUS-FOR-LIBRARY-PANEL} was not found.&#39;"/> <figcaption>
            <p>Error on imported library panels.</p>
        </figcaption>
</figure>

<p>I was not able to figure out why I was seeing this error. All of the non-library
panels in this dashboard, as well as all the other dashboards I imported, worked
fine, while all the library panels showed this same error.</p>
<p>I ended up fixing it by going to the &ldquo;JSON&rdquo; tab and manually replacing the
<code>${DS_PROMETHEUS-FOR-LIBRARY-PANEL}</code> placeholder with the name of my Prometheus
data source.</p>
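<p>In the dashboard JSON, the change amounts to something like the following, where
<code>Prometheus</code> stands in for whatever your data source is called:</p>
<pre tabindex="0"><code>&#34;datasource&#34;: &#34;${DS_PROMETHEUS-FOR-LIBRARY-PANEL}&#34;   &lt;-- before
&#34;datasource&#34;: &#34;Prometheus&#34;                           &lt;-- after
</code></pre>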
<h2 id="conclusion">Conclusion</h2>
<p>This post finally concludes the migration of my metrics stack over to my Kubernetes
cluster. Besides that, I now also have proper data gathering for the Kubernetes
and Rook clusters. More pretty graphs for me. &#x1f913;</p>
<p>This part of the migration took way longer than previous parts. I blame my return
to gaming, and specifically to Stellaris, at least partially for
that. &#x1f601;</p>
<p>The next step should hopefully go a bit faster: I will have a look at <a href="https://goharbor.io/">Harbor</a>,
both for my own image storage and as a pull-through cache.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Bite Sized: Some K8s Logging Changes</title>
      <link>https://blog.mei-home.net/posts/some-k8s-logging-changes/</link>
      <pubDate>Thu, 04 Apr 2024 00:17:53 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/some-k8s-logging-changes/</guid>
      <description>Because somehow, logging is complicated?</description>
      <content:encoded><![CDATA[<p>While working on the logging setup for my Grafana, Loki and CloudNativePG
deployments, I found that there were some things I disliked about my original k8s
logging setup, which I described <a href="https://blog.mei-home.net/posts/k8s-migration-6-logging/">here</a>.</p>
<p>This is the start of a new kind of post where I try to keep the reading time
reasonably short.
Whenever I prefix a post with &ldquo;Bite Sized:&rdquo;, you can expect a short one. I&rsquo;m
trying to wean myself off of the incredibly long, meandering posts I seem to
keep putting out.</p>
<p><em>Hindsight:</em> Well, at least I stayed under 10 minutes.</p>
<h2 id="the-trigger-cloudnativepg-logs">The trigger: CloudNativePG logs</h2>
<p>I&rsquo;m running <a href="https://blog.mei-home.net/posts/k8s-migration-8-cloud-native-pg/">CloudNativePG</a>
for my database needs in the new k8s cluster. Configuring Grafana for my metrics
stack was the first time I deployed a production database with it. So obviously,
I needed to get the logs properly ingested. To see what we&rsquo;re dealing with,
here are the first couple of log lines from the Postgres Pod in a CloudNativePG
deployment:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{<span style="color:#f92672">&#34;level&#34;</span>:<span style="color:#e6db74">&#34;info&#34;</span>,<span style="color:#f92672">&#34;ts&#34;</span>:<span style="color:#e6db74">&#34;2024-03-29T10:15:37Z&#34;</span>,<span style="color:#f92672">&#34;msg&#34;</span>:<span style="color:#e6db74">&#34;Starting workers&#34;</span>,<span style="color:#f92672">&#34;controller&#34;</span>:<span style="color:#e6db74">&#34;cluster&#34;</span>,<span style="color:#f92672">&#34;controllerGroup&#34;</span>:<span style="color:#e6db74">&#34;postgresql.cnpg.io&#34;</span>,<span style="color:#f92672">&#34;controllerKind&#34;</span>:<span style="color:#e6db74">&#34;Cluster&#34;</span>,<span style="color:#f92672">&#34;worker count&#34;</span>:<span style="color:#ae81ff">1</span>}
</span></span><span style="display:flex;"><span>{<span style="color:#f92672">&#34;level&#34;</span>:<span style="color:#e6db74">&#34;info&#34;</span>,<span style="color:#f92672">&#34;ts&#34;</span>:<span style="color:#e6db74">&#34;2024-03-29T10:15:37Z&#34;</span>,<span style="color:#f92672">&#34;logger&#34;</span>:<span style="color:#e6db74">&#34;postgres&#34;</span>,<span style="color:#f92672">&#34;msg&#34;</span>:<span style="color:#e6db74">&#34;2024-03-29 10:15:37.836 UTC [23] LOG:  redirecting log output to logging collector process&#34;</span>,<span style="color:#f92672">&#34;pipe&#34;</span>:<span style="color:#e6db74">&#34;stderr&#34;</span>,<span style="color:#f92672">&#34;logging_pod&#34;</span>:<span style="color:#e6db74">&#34;grafana-pg-cluster-1&#34;</span>}
</span></span><span style="display:flex;"><span>{<span style="color:#f92672">&#34;level&#34;</span>:<span style="color:#e6db74">&#34;info&#34;</span>,<span style="color:#f92672">&#34;ts&#34;</span>:<span style="color:#e6db74">&#34;2024-03-29T10:15:37Z&#34;</span>,<span style="color:#f92672">&#34;logger&#34;</span>:<span style="color:#e6db74">&#34;postgres&#34;</span>,<span style="color:#f92672">&#34;msg&#34;</span>:<span style="color:#e6db74">&#34;record&#34;</span>,<span style="color:#f92672">&#34;logging_pod&#34;</span>:<span style="color:#e6db74">&#34;grafana-pg-cluster-1&#34;</span>,<span style="color:#f92672">&#34;record&#34;</span>:{<span style="color:#f92672">&#34;log_time&#34;</span>:<span style="color:#e6db74">&#34;2024-03-29 10:15:37.836 UTC&#34;</span>,<span style="color:#f92672">&#34;process_id&#34;</span>:<span style="color:#e6db74">&#34;23&#34;</span>,<span style="color:#f92672">&#34;session_id&#34;</span>:<span style="color:#e6db74">&#34;660694c9.17&#34;</span>,<span style="color:#f92672">&#34;session_line_num&#34;</span>:<span style="color:#e6db74">&#34;3&#34;</span>,<span style="color:#f92672">&#34;session_start_time&#34;</span>:<span style="color:#e6db74">&#34;2024-03-29 10:15:37 UTC&#34;</span>,<span style="color:#f92672">&#34;transaction_id&#34;</span>:<span style="color:#e6db74">&#34;0&#34;</span>,<span style="color:#f92672">&#34;error_severity&#34;</span>:<span style="color:#e6db74">&#34;LOG&#34;</span>,<span style="color:#f92672">&#34;sql_state_code&#34;</span>:<span style="color:#e6db74">&#34;00000&#34;</span>,<span style="color:#f92672">&#34;message&#34;</span>:<span style="color:#e6db74">&#34;listening on IPv4 address \&#34;0.0.0.0\&#34;, port 5432&#34;</span>,<span style="color:#f92672">&#34;backend_type&#34;</span>:<span style="color:#e6db74">&#34;postmaster&#34;</span>,<span style="color:#f92672">&#34;query_id&#34;</span>:<span style="color:#e6db74">&#34;0&#34;</span>}}
</span></span></code></pre></div><p>All three are JSON, which is nice already, meaning I won&rsquo;t need any regexes.</p>
<p>The first line is the easiest to handle, produced by the management part of the
CloudNativePG Pod. The second is already a bit more complicated, because it
contains a full Postgres-formatted log message. From that line, I would
at least like to extract the log level (<code>LOG</code> in this case) as well as the timestamp.</p>
<p>The third line is then the most complicated, because it contains the Postgres
log in the <code>record</code> key, now in JSON format as well.</p>
<p>But the above is not actually how those logs arrive in my FluentD instance.
First of all, because I use <a href="https://cri-o.io/">cri-o</a>, the first log line
would look like this in the actual log file:</p>
<pre tabindex="0"><code>2024-03-29T11:15:37.627916025+01:00 stderr F {&#34;level&#34;:&#34;info&#34;,&#34;ts&#34;:&#34;2024-03-29T10:15:37Z&#34;,&#34;msg&#34;:&#34;Starting workers&#34;,&#34;controller&#34;:&#34;cluster&#34;,&#34;controllerGroup&#34;:&#34;postgresql.cnpg.io&#34;,&#34;controllerKind&#34;:&#34;Cluster&#34;,&#34;worker count&#34;:1}
</code></pre><p>The prefix <code>&lt;time&gt; &lt;stream&gt; &lt;_p&gt; &lt;log&gt;</code> is added by cri-o to each log
line a container produces, and I think this might be a standardized format
mandated for CRI implementations?
To ingest these logs and send them on to FluentD, I&rsquo;m using Fluentbit, with its
<a href="https://docs.fluentbit.io/manual/pipeline/filters/kubernetes">Kubernetes plugin</a>.
My mistake here was to leave the option <code>Merge_Log</code> enabled. If the content of
<code>&lt;log&gt;</code> is a JSON string, the Kubernetes plugin takes that JSON object and adds
its keys to the record directly. This means that by the time one of the log lines
above arrives at my FluentD instance, it would look like this:</p>
<pre tabindex="0"><code>{[...]
 record=&#34;{\&#34;log_time\&#34;=&gt;\&#34;2024-03-29 22:55:27.814\&#34; [...]
}
</code></pre><p>This is not JSON, instead it&rsquo;s FluentD&rsquo;s nested keys format. The issue
with this is that FluentD does not seem to have any good tools for working
with an entire subkey. So I can&rsquo;t do something with the <code>record</code> key as a whole
here - I can only access specific subkeys, e.g. <code>record['record']['log_time']</code>,
at least as long as I don&rsquo;t want to go to Ruby. And I really, really don&rsquo;t want
to.</p>
<h2 id="time">Time!</h2>
<p>This led me to another issue: while scrolling through some logs in my Grafana
dashboard, I saw that pretty much all of them were prefixed with a <code>time</code> key.
Which looks odd, because <code>time</code> should not appear in a log record&rsquo;s keys.
It turns out that this is the record key into which Fluentbit&rsquo;s CRI parser
puts the <code>&lt;time&gt;</code> field from the cri-o log line! And because I generally don&rsquo;t
do anything with that key, preferring to use the timestamp from the actual
application log, I had just left it in accidentally, polluting the log lines with
unnecessary information.</p>
<p>But, and here comes the real issue: I couldn&rsquo;t just change this behavior. The
cri-o multiline parser is embedded into Fluentbit and not configurable. So
even though Fluentbit generally allows configuring whether a parser keeps the time
field of the parsed log, it did not allow me to do so here.</p>
<p>Next possibility: just drop the <code>time</code> key in Fluentbit, before even sending
the log on to FluentD. But this also wasn&rsquo;t possible, because of the <code>Merge_Log</code>
config of the Fluentbit Kubernetes plugin. Some apps, e.g. Traefik
in my setup, produce JSON log lines which themselves contain a field called <code>time</code>.
Consequently, when a Traefik log line runs through the Kubernetes plugin,
its own <code>time</code> field, not the one from the cri-o parser, ends up
at the top level of the log record. Removing it would mean deleting
Traefik&rsquo;s own timestamp, which I did not want.</p>
<p>Instead of investigating further, I decided to switch <code>Merge_Log</code> off. This way,
Fluentbit never touches the actual log line produced by the app. It only parses
the CRI parts and slaps on some Kubernetes Pod labels, then sends the entire
enchilada on to FluentD for proper massaging.</p>
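<p>In the Fluentbit config, this boils down to a Kubernetes filter section roughly
like this sketch (the match tag is an assumption from my setup):</p>
<pre tabindex="0"><code>[FILTER]
    Name        kubernetes
    Match       kube.*
    # leave the app&#39;s log line untouched in the log key
    Merge_Log   Off
</code></pre>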
<p>This has the downside that now, I don&rsquo;t get &ldquo;parsed&rdquo; JSON logs for free,
but quite honestly, it isn&rsquo;t that much work to add something like this for the
apps which produce JSON formatted logs:</p>
<pre tabindex="0"><code>&lt;filter services.monitoring.grafana&gt;
    @type parser
    key_name log
    reserve_data true
    remove_key_name_field true
    &lt;parse&gt;
        @type json
        time_key t
        time_type string
        time_format %iso8601
        utc true
    &lt;/parse&gt;
&lt;/filter&gt;
</code></pre><p>This FluentD config only touches the <code>log</code> key of the incoming record: it tries
to parse it as JSON, adds all JSON object keys to the record, and parses
the time from the <code>t</code> key, dropping that key afterwards.</p>
<p>I find this somehow more comforting, because I can be sure that I will always
find the app&rsquo;s original log output in the <code>log</code> key and can then do whatever
parsing I want in FluentD, leaving Fluentbit mostly as a log forwarder/shipper.</p>
<p>I still had to drop the <code>time</code> key from all incoming logs, but now I could be
sure that I was only removing the <code>&lt;time&gt;</code> part from the CRI log file entry,
keeping the actual app log&rsquo;s <code>time</code> key.</p>
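<p>Dropping that key is a one-liner in FluentD, sketched here with a catch-all
match pattern:</p>
<pre tabindex="0"><code>&lt;filter **&gt;
    @type record_transformer
    # remove the CRI &lt;time&gt; field parsed by Fluentbit
    remove_keys time
&lt;/filter&gt;
</code></pre>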
<h2 id="parsing-cloudnativepg-logs">Parsing CloudNativePG logs</h2>
<p>Finally back to the CloudNativePG logs. The first step of the parsing process
is to parse the JSON log from the <code>log</code> key:</p>
<pre tabindex="0"><code>&lt;filter services.*.postgres&gt;
    @type parser
    key_name log
    reserve_data true
    remove_key_name_field true
    &lt;parse&gt;
        @type json
        time_key nil
    &lt;/parse&gt;
&lt;/filter&gt;
</code></pre><p>I&rsquo;m ignoring the time from that log, for now.</p>
<p>Then I have to broadly split the records into two categories: the logs coming
from the cnpg management plane, and the actual Postgres logs. That&rsquo;s done
as follows:</p>
<pre tabindex="0"><code>&lt;match services.*.postgres&gt;
  @type rewrite_tag_filter
  &lt;rule&gt;
    key record
    pattern /^.+$/
    tag pg-record.${tag}
    label @PGRECORD
  &lt;/rule&gt;
  &lt;rule&gt;
    key record
    pattern /^.+$/
    tag pg-no-record.${tag}
    invert true
    label @PGNORECORD
  &lt;/rule&gt;
&lt;/match&gt;
</code></pre><p>This checks whether the record (after parsing of <code>log</code>) has a <code>record</code> key,
and sends the records to different <a href="https://docs.fluentd.org/quickstart/life-of-a-fluentd-event#labels">labels</a>.</p>
<p>First comes the <code>PGRECORD</code> label, which handles log lines which look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{<span style="color:#f92672">&#34;level&#34;</span>:<span style="color:#e6db74">&#34;info&#34;</span>,<span style="color:#f92672">&#34;ts&#34;</span>:<span style="color:#e6db74">&#34;2024-03-29T10:15:37Z&#34;</span>,<span style="color:#f92672">&#34;logger&#34;</span>:<span style="color:#e6db74">&#34;postgres&#34;</span>,<span style="color:#f92672">&#34;msg&#34;</span>:<span style="color:#e6db74">&#34;record&#34;</span>,<span style="color:#f92672">&#34;logging_pod&#34;</span>:<span style="color:#e6db74">&#34;grafana-pg-cluster-1&#34;</span>,<span style="color:#f92672">&#34;record&#34;</span>:{<span style="color:#f92672">&#34;log_time&#34;</span>:<span style="color:#e6db74">&#34;2024-03-29 10:15:37.836 UTC&#34;</span>,<span style="color:#f92672">&#34;process_id&#34;</span>:<span style="color:#e6db74">&#34;23&#34;</span>,<span style="color:#f92672">&#34;session_id&#34;</span>:<span style="color:#e6db74">&#34;660694c9.17&#34;</span>,<span style="color:#f92672">&#34;session_line_num&#34;</span>:<span style="color:#e6db74">&#34;3&#34;</span>,<span style="color:#f92672">&#34;session_start_time&#34;</span>:<span style="color:#e6db74">&#34;2024-03-29 10:15:37 UTC&#34;</span>,<span style="color:#f92672">&#34;transaction_id&#34;</span>:<span style="color:#e6db74">&#34;0&#34;</span>,<span style="color:#f92672">&#34;error_severity&#34;</span>:<span style="color:#e6db74">&#34;LOG&#34;</span>,<span style="color:#f92672">&#34;sql_state_code&#34;</span>:<span style="color:#e6db74">&#34;00000&#34;</span>,<span style="color:#f92672">&#34;message&#34;</span>:<span style="color:#e6db74">&#34;listening on IPv4 address \&#34;0.0.0.0\&#34;, port 5432&#34;</span>,<span style="color:#f92672">&#34;backend_type&#34;</span>:<span style="color:#e6db74">&#34;postmaster&#34;</span>,<span style="color:#f92672">&#34;query_id&#34;</span>:<span style="color:#e6db74">&#34;0&#34;</span>}}
</span></span></code></pre></div><p>They contain the JSON formatted Postgres log in the <code>record</code> key.</p>
<pre tabindex="0"><code>&lt;label @PGRECORD&gt;
  &lt;filter **&gt;
    @type record_transformer
    remove_keys record
    enable_ruby true
    &lt;record&gt;
      json_record ${record[&#34;record&#34;].to_json}
    &lt;/record&gt;
  &lt;/filter&gt;
  &lt;filter **&gt;
    @type parser
    key_name json_record
    reserve_data true
    remove_key_name_field true
    &lt;parse&gt;
      @type json
      time_key nil
    &lt;/parse&gt;
  &lt;/filter&gt;
  &lt;filter **&gt;
    @type record_transformer
    remove_keys log_time,message,error_severity
    &lt;record&gt;
      ts ${record[&#34;log_time&#34;]}
      level ${record[&#34;error_severity&#34;]}
      msg ${record[&#34;message&#34;]}
    &lt;/record&gt;
  &lt;/filter&gt;
  &lt;match pg-record.**&gt;
    @type relabel
    @label @PGNORECORD
  &lt;/match&gt;
&lt;/label&gt;
</code></pre><p>Okay, so one after the other. First, I need to transform the content of the
<code>record</code> key back into JSON - because in the initial parsing of the <code>log</code> key,
it would have been transformed into a FluentD nested key structure. Then I&rsquo;m
parsing that JSON again. The effect is that all the
keys from the nested <code>record</code> object are now keys of the record itself, and the
<code>record</code> field has been removed.</p>
<p>Then I&rsquo;m rewriting the names of a couple of keys. I do this so they&rsquo;re
similar to the log lines produced by the cnpg management app. After the Postgres
log records pass through this pipeline, all the logs coming from the Postgres
Pod, regardless of whether they come from Postgres itself or the management plane,
look approximately the same.</p>
<p>The unified logs are then further massaged by this pipeline:</p>
<pre tabindex="0"><code>&lt;label @PGNORECORD&gt;
  &lt;filter pg-no-record.**&gt;
    @type parser
    key_name msg
    reserve_data true
    remove_key_name_field false
    &lt;parse&gt;
      @type multi_format
      &lt;pattern&gt;
        format regexp
        expression /^(?&lt;ts&gt;[0-9\-]* [0-9\:\.]* [^\ ]+) [^\ ]* (?&lt;pglvl&gt;[^\ ]*) (?&lt;msg&gt;.*)$/
        time_key nil
      &lt;/pattern&gt;
      &lt;pattern&gt;
        format regexp
        expression /^(?&lt;msg&gt;.*)$/
        time_key nil
      &lt;/pattern&gt;
    &lt;/parse&gt;
  &lt;/filter&gt;
  &lt;filter **&gt;
    @type parser
    key_name ts
    reserve_data true
    remove_key_name_field true
    &lt;parse&gt;
      @type multi_format
      &lt;pattern&gt;
        format regexp
        expression /^(?&lt;logtime&gt;[0-9\-]* [0-9\:\.]* [^\ ]+)$/
        time_key logtime
        time_type string
        time_format %F %T.%N %Z
      &lt;/pattern&gt;
      &lt;pattern&gt;
        format regexp
        expression /^(?&lt;logtime&gt;[0-9]{4}-[01][0-9]-[0-3][0-9]T[0-2][0-9]:[0-6][0-9]:[0-6][0-9].*)$/
        time_key logtime
        time_type string
        time_format %iso8601
        utc true
      &lt;/pattern&gt;
    &lt;/parse&gt;
  &lt;/filter&gt;
  &lt;match **&gt;
    @type rewrite_tag_filter
    remove_tag_regexp /^pg-(no-)?record\./
    &lt;rule&gt;
      key msg
      pattern /^.+$/
      tag parsed.${tag}
      label @K8S
    &lt;/rule&gt;
  &lt;/match&gt;
&lt;/label&gt;
</code></pre><p>Again, going step by step. The first <code>filter</code> config takes the <code>msg</code> key
and parses it, trying two regex parsers in turn. The first one
recognizes Postgres log lines in string format, like this one:</p>
<pre tabindex="0"><code>2024-03-29 10:15:37.836 UTC [23] LOG:  redirecting log output to logging collector process
</code></pre><p>It properly parses the timestamp and level, which are the two most important
parts for me. All other content of the <code>msg</code> key is left unparsed and is just
written back to the <code>msg</code> key as-is.</p>
<p>Next, I&rsquo;m parsing the timestamp <code>ts</code>, which looks different depending on
whether the log line came from Postgres or from cnpg. For Postgres, the timestamp
looks like this: <code>2024-03-29 10:15:37.836 UTC</code>, while for cnpg it is a properly
formatted <a href="https://en.wikipedia.org/wiki/ISO_8601">ISO8601</a> date/time.</p>
<p>And finally I rewrite the tag to start with <code>parsed</code>, which indicates in my
config that the record should skip all the parsing pipelines.</p>
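<p>On the receiving side, that convention amounts to a match on the <code>parsed.</code>
prefix which skips straight to the output. Purely schematically, assuming a Loki
output plugin and omitting its configuration:</p>
<pre tabindex="0"><code>&lt;label @K8S&gt;
  &lt;match parsed.**&gt;
    @type loki
    # output configuration omitted
  &lt;/match&gt;
&lt;/label&gt;
</code></pre>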
<h2 id="endless-loops---fluentbit-edition">Endless loops - Fluentbit edition</h2>
<p>Before ending for today, one thing needs an honorary mention: I built another
endless loop! &#x1f389;</p>
<p>This time, just by enabling Fluentbit&rsquo;s debug log verbosity. Because then, it
produces a log entry for every message forwarded to FluentD - including its own:</p>
<pre tabindex="0"><code>component=output:forward:forward.0 msg=&#34;send options records=0 chunk=&#39;&#39;
</code></pre><p>The one advantage of having my rack right next to my desk: I heard the fans
crank up the RPMs right away. &#x1f605;</p>
<p>Now you know why I was talking about Eldritch Horrors and the log pipeline being
the most complicated part of my Homelab. I had actually planned to just tidy up the
monitoring stack migration a bit over the long Easter weekend and write the blog
post, so I could get started with the next k8s migration step this
week. Instead, this is what I decided to spend my long weekend on. &#x1f926;</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 8a: CloudNativePG Disk Size Problems</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-8a-pg-problems/</link>
      <pubDate>Fri, 29 Mar 2024 16:22:04 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-8a-pg-problems/</guid>
      <description>Unhelpful error messages with a side of user error</description>
      <content:encoded><![CDATA[<p>I recently started migrating my <a href="https://grafana.com/">Grafana</a> instance from
Nomad to k8s and hit some very weird errors in the CloudNativePG DB after letting
it run for a short while.</p>
<p>This is an addendum to my <a href="https://blog.mei-home.net/posts/k8s-migration-8-cloud-native-pg/">previous post</a>
on <a href="https://cloudnative-pg.io/">CloudNativePG</a>.</p>
<h2 id="the-initial-issue">The initial issue</h2>
<p>The first issue came up during the initial setup of Grafana. A couple of minutes
after Grafana started running and writing data to the DB, the two database Pods (primary and replica)
suddenly stopped working and just threw this error:</p>
<pre tabindex="0"><code>msg=&#34;DB not available, will retry&#34;
err=&#34;failed to connect to `host=/controller/run user=postgres database=postgres`: dial error (dial unix /controller/run/.s.PGSQL.5432: connect: no such file or directory)&#34;
</code></pre><p>Initially, I thought I had somehow screwed up my NetworkPolicy setup. But after
re-creating the CloudNativePG Cluster CR, it all worked again. I thought it was
a hiccup and returned to working on Grafana, but a couple of minutes into the
next Grafana deployment, the same issue happened again. And then again, after
another deletion and re-creation of the Cluster CR. The error was always the same.</p>
<p>What saved me in the end was a random look at my metrics dashboard, where I&rsquo;m
showing the following plot:</p>
<figure>
    <img loading="lazy" src="disk_full.png"
         alt="A screenshot of a gauge style Grafana panel. It shows the CSI volume utilization of the grafana-pg-cluster-1 storage volume. At 97.9%."/> <figcaption>
            <p>Yupp, it&rsquo;s full.</p>
        </figcaption>
</figure>

<p>So there I had it. I had simply made the volume for the Postgres DB storage too
small. Way too small, as it turns out. My Cluster manifest looked like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">postgresql.cnpg.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">grafana-pg-cluster</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">grafana</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">instances</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">imageName</span>: <span style="color:#e6db74">&#34;ghcr.io/cloudnative-pg/postgresql:16.2-10&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">bootstrap</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">initdb</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">database</span>: <span style="color:#ae81ff">grafana</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">grafana</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">100M</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">100m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">postgresql</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">max_connections</span>: <span style="color:#e6db74">&#34;20&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">shared_buffers</span>: <span style="color:#e6db74">&#34;25MB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">effective_cache_size</span>: <span style="color:#e6db74">&#34;75MB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">maintenance_work_mem</span>: <span style="color:#e6db74">&#34;6400kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">checkpoint_completion_target</span>: <span style="color:#e6db74">&#34;0.9&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wal_buffers</span>: <span style="color:#e6db74">&#34;768kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">default_statistics_target</span>: <span style="color:#e6db74">&#34;100&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">random_page_cost</span>: <span style="color:#e6db74">&#34;1.1&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">effective_io_concurrency</span>: <span style="color:#e6db74">&#34;300&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">work_mem</span>: <span style="color:#e6db74">&#34;640kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">huge_pages</span>: <span style="color:#e6db74">&#34;off&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">max_wal_size</span>: <span style="color:#e6db74">&#34;128MB&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storage</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">size</span>: <span style="color:#ae81ff">256MB</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storageClass</span>: <span style="color:#ae81ff">rbd-fast</span>
</span></span></code></pre></div><p>While creating this config, I had looked at the size of the Grafana DB on my
Nomad instance, and it clocked in at around 35MB. So I wasn&rsquo;t really that worried.
But it seems I misunderstood some things. After changing the <code>storage.size</code> option
to <code>1GB</code>, everything was fine and the DB no longer crashed.</p>
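<p>The fix in the manifest was just:</p>
<pre tabindex="0"><code>storage:
  size: 1GB
  storageClass: rbd-fast
</code></pre>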
<p>But that wasn&rsquo;t the end of my confusion when it came to the storage consumption.</p>
<h2 id="unbounded-growth-over-time">Unbounded growth over time?</h2>
<p>With the initial issue fixed, I set myself a task to check the disk usage of
the DB after a couple of days. This was during a week where I didn&rsquo;t have much
time to spend on the Homelab, so I expected the database size to not change
very much.</p>
<p>The result was this, after a week of not touching the Grafana instance, which
is the only user of the DB:</p>
<figure>
    <img loading="lazy" src="disk-growth.png"
         alt="A screenshot of a time series plot. It starts out a bit over 100MB usage and goes up to 300MB quickly. After that, it grows in steps approximately every six hours by about 100MB. It tops out at over 600 MB."/> <figcaption>
            <p>DB disk volume utilization growth.</p>
        </figcaption>
</figure>

<p>I couldn&rsquo;t understand what was going on here. It wasn&rsquo;t the database itself, which
was just 12MB in size throughout the entire time. Then I looked at the disk
and saw this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>postgres@grafana-pg-cluster-1:/$ ls -lh /var/lib/postgresql/data/pgdata/pg_wal/
</span></span><span style="display:flex;"><span>total 561M
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  <span style="color:#ae81ff">338</span> Mar <span style="color:#ae81ff">17</span> 19:13 000000010000000000000003.00000028.backup
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">20</span> 12:09 <span style="color:#ae81ff">000000010000000000000036</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">20</span> 12:14 <span style="color:#ae81ff">000000010000000000000037</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">20</span> 20:09 <span style="color:#ae81ff">000000010000000000000038</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">20</span> 20:14 <span style="color:#ae81ff">000000010000000000000039</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">20</span> 21:09 00000001000000000000003A
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">20</span> 21:14 00000001000000000000003B
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">20</span> 22:09 00000001000000000000003C
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">20</span> 22:14 00000001000000000000003D
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">21</span> 06:09 00000001000000000000003E
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">21</span> 06:14 00000001000000000000003F
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">21</span> 08:09 <span style="color:#ae81ff">000000010000000000000040</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">21</span> 08:14 <span style="color:#ae81ff">000000010000000000000041</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">21</span> 09:09 <span style="color:#ae81ff">000000010000000000000042</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">21</span> 16:09 <span style="color:#ae81ff">000000010000000000000043</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">21</span> 16:14 <span style="color:#ae81ff">000000010000000000000044</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">21</span> 18:09 <span style="color:#ae81ff">000000010000000000000045</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">21</span> 18:14 <span style="color:#ae81ff">000000010000000000000046</span>
</span></span><span style="display:flex;"><span>-rw-rw---- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">21</span> 20:09 <span style="color:#ae81ff">000000010000000000000047</span>
</span></span><span style="display:flex;"><span>-rw-rw---- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">21</span> 20:14 <span style="color:#ae81ff">000000010000000000000048</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">21</span> 21:09 <span style="color:#ae81ff">000000010000000000000049</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">21</span> 21:14 00000001000000000000004A
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">22</span> 02:10 00000001000000000000004B
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">22</span> 02:15 00000001000000000000004C
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">22</span> 04:10 00000001000000000000004D
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">22</span> 08:10 00000001000000000000004E
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">22</span> 08:15 00000001000000000000004F
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">22</span> 09:10 <span style="color:#ae81ff">000000010000000000000050</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">22</span> 12:10 <span style="color:#ae81ff">000000010000000000000051</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">22</span> 12:15 <span style="color:#ae81ff">000000010000000000000052</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">22</span> 14:10 <span style="color:#ae81ff">000000010000000000000053</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">22</span> 14:15 <span style="color:#ae81ff">000000010000000000000054</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">22</span> 19:05 <span style="color:#ae81ff">000000010000000000000055</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">22</span> 19:10 <span style="color:#ae81ff">000000010000000000000056</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">20</span> 10:09 <span style="color:#ae81ff">000000010000000000000057</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">20</span> 10:14 <span style="color:#ae81ff">000000010000000000000058</span>
</span></span><span style="display:flex;"><span>drwxrws--- <span style="color:#ae81ff">2</span> postgres tape 4.0K Mar <span style="color:#ae81ff">22</span> 19:10 archive_status
</span></span></code></pre></div><p>That at least explained where the space was going. Digging a bit into the
Postgres docs, I found the <a href="https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-WAL-KEEP-SIZE">wal_keep_size</a> config option, which determines how much WAL is kept
around. Since I certainly hadn&rsquo;t set that option myself, I looked further and finally
came across CloudNativePG&rsquo;s default config: there, <code>wal_keep_size</code> is set
to 512MB, which happens to match the point where the DB volume stopped growing.
See the CloudNativePG docs <a href="https://cloudnative-pg.io/documentation/current/postgresql_conf/#the-postgresql-section">here</a>.</p>
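<p>To double-check what the instance is actually running with, you can ask Postgres
directly. Something like this should work, using the pod name from the listing above
(the namespace here is made up):</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell">$ kubectl exec -n grafana grafana-pg-cluster-1 -- psql -c &#34;SHOW wal_keep_size;&#34;
 wal_keep_size
---------------
 512MB
(1 row)
</code></pre></div>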
<p>Still, this seems a bit excessive to me, considering that the database itself
is only 12MB. But at least I now know to add 512MB to the storage volume size
to account for the Write-Ahead Log.</p>
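<p>In theory, this default should also be overridable in the Cluster manifest, in
the same <code>parameters</code> block as the settings shown at the top of this post.
A sketch I haven&rsquo;t tested, assuming CloudNativePG allows changing this particular
parameter:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml">postgresql:
  parameters:
    # untested assumption: lower the WAL retention below the 512MB default
    wal_keep_size: &#34;64MB&#34;
storage:
  size: 1GB
</code></pre></div>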
<p>I&rsquo;m still surprised how much WAL is produced here, even though I&rsquo;m pretty sure
that there isn&rsquo;t actually that much going on in the database.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 9: Prometheus</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-9-prometheus/</link>
      <pubDate>Fri, 15 Mar 2024 00:30:16 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-9-prometheus/</guid>
      <description>Setting up Prometheus with kube-prometheus-stack</description>
      <content:encoded><![CDATA[<p>Wherein I set up Prometheus for metrics gathering in the k8s cluster.</p>
<p>This is part 9 of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>Let me tell you something about me: I love metrics. And pretty charts. The more
the better. Back in 2020, setting up Prometheus+Grafana was what brought me
to Homelabbing as a hobby, instead of just a means to an end, running some
services I wanted to use.
I had just gotten an updated ISP connection and found my old FritzBox not working
anymore. Instead of just buying a newer one, I decided to try out OPNsense.
This meant that I now had two hosts in the Homelab. My old home server running
a handful of services, and the new OPNsense box. And I wanted metrics, especially
about CPU and network usage.</p>
<p>Today, my Prometheus database takes about 50 GB on disk and I&rsquo;ve got a retention
period of five years. &#x1f605;
It&rsquo;s not just host metrics anymore. I&rsquo;m also scraping thermometers and smart
plugs to monitor my home a bit.</p>
<p>One of the main things I&rsquo;m using my monitoring stack for is to analyze changes
in the noise level. My desk is right next to my server rack. And sometimes,
fans suddenly ramp up, or hard disks start seeking without me doing anything.
Then I like the ability to go to my Homelab dashboard and immediately identify
which service is likely responsible for that increase in the noise level.</p>
<p>In this post, I will go over the migration of my Prometheus setup to k8s. I
will also migrate Grafana, but to keep the post relatively short, I decided to
split the Prometheus and Grafana posts instead of handling the entire monitoring
migration in one go.</p>
<h2 id="setup">Setup</h2>
<p>For the monitoring deployment, I decided to use <a href="https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack">kube-prometheus-stack</a>, which I was pointed to
by multiple Homelabbers on the Fediverse while working on my first test cluster.
It is a Helm chart which is able to deploy a full monitoring stack for k8s and
contains multiple components.</p>
<p>The first component deployed is an instance of <a href="https://github.com/prometheus-operator/prometheus-operator">prometheus-operator</a>.
This operator&rsquo;s main task is to deploy one (or several) Prometheus instances.
It also supplies a number of CRDs to configure scraping for those Prometheus
instances with Kubernetes manifests, instead of manipulating the Prometheus
config file directly. The two main CRDs are <a href="https://prometheus-operator.dev/docs/operator/design/#servicemonitor">ServiceMonitors</a> and <a href="https://prometheus-operator.dev/docs/user-guides/scrapeconfig/">ScrapeConfigs</a>.
ServiceMonitors in particular seem to be widely adopted for configuring service
scraping. Several Helm charts I have deployed, e.g. Ceph Rook&rsquo;s, already offer
the option to create ServiceMonitors. This way, you don&rsquo;t have to create scrape
configs manually.</p>
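<p>To give an idea of the shape of such a resource, here is a minimal ServiceMonitor
sketch. The app name and port name are made up; the selector matches labels on a
Service, and <code>port</code> refers to a named port of that Service:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml">apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics
      interval: 30s
</code></pre></div>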
<p>The next component deployed by the Helm chart is <a href="https://github.com/kubernetes/kube-state-metrics">kube-state-metrics</a>.
It watches the Kubernetes apiserver and exposes a lot of additional information
about the state of the cluster, for example detailed info about Pods or
Deployments.</p>
<p>Finally, the Helm chart can also deploy and properly configure Grafana, and
comes with a number of pre-defined dashboards for the data scraped from the
cluster. I will skip this component for now and take it up in the next post in
this series, when I&rsquo;m migrating my Grafana instance.</p>
<p>Before going on to the <code>values.yaml</code>, I also need to talk about the necessary
config for Kubernetes components. By default, most components in a <em>kubeadm</em>
cluster only listen on localhost. Most components&rsquo; metrics were not too
interesting, with the Kube Scheduler as an exception. To make it available
for Prometheus scraping, I added the following to my <a href="https://kubernetes.io/docs/reference/config-api/kubeadm-config.v1beta3/#kubeadm-k8s-io-v1beta3-ClusterConfiguration">ClusterConfiguration</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">scheduler</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">extraArgs</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">bind-address</span>: <span style="color:#ae81ff">0.0.0.0</span>
</span></span></code></pre></div><p>I disabled the scraping for the Kube Controller Manager completely, because the
metrics from it didn&rsquo;t look interesting enough to bother changing the config
after I had already set up the cluster.</p>
<p>Here is the full <code>values.yaml</code> file for the kube-prometheus-stack chart:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">crds</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">commonLabels</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">monitoring</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">defaultRules</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">create</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">windowsMonitoring</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">alertmanager</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">grafana</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kubeProxy</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kubeEtcd</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kubeControllerManager</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">nodeExporter</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">prometheusOperator</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">networkPolicy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">flavor</span>: <span style="color:#ae81ff">cilium</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">logFormat</span>: <span style="color:#ae81ff">json</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">prometheus</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">networkPolicy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">flavor</span>: <span style="color:#ae81ff">cilium</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cilium</span>: {}
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">traefik.ingress.kubernetes.io/router.entrypoints</span>: <span style="color:#ae81ff">secureweb</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">hosts</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">prometheus.example.com</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serviceMonitor</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">selfMonitor</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">prometheusSpec</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">scrapeInterval</span>: <span style="color:#ae81ff">30s</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">retention</span>: <span style="color:#ae81ff">5y</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">logLevel</span>: <span style="color:#ae81ff">debug</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">logFormat</span>: <span style="color:#ae81ff">json</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">scrapeConfigSelectorNilUsesHelmValues</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">150m</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">700Mi</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">limits</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">900Mi</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storageSpec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumeClaimTemplate</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">storageClassName</span>: <span style="color:#ae81ff">rbd-bulk</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">accessModes</span>: [<span style="color:#e6db74">&#34;ReadWriteOnce&#34;</span>]
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">storage</span>: <span style="color:#ae81ff">100Gi</span>
</span></span></code></pre></div><p>The first thing to note here is that I&rsquo;m disabling a lot of functionality.
The first one is all alerting features. I don&rsquo;t have alerting set up on my Nomad
cluster either, but I&rsquo;ve had it on my list for a long time. I will get to it
at some point. &#x1f642;</p>
<p>Next is that I&rsquo;m keeping Grafana disabled as well. For now I will use my Nomad
Grafana instance. The migration of Grafana will come in the next step.</p>
<p>Then I&rsquo;m disabling scraping of the Kubernetes proxy, because I&rsquo;ve got that
component disabled, using Cilium instead. Further, I&rsquo;ve got etcd and the Kube
Controller Manager also disabled, because I did not see anything interesting in
their metrics.
Finally, the node-exporter functionality of the chart is also disabled, because
I&rsquo;m already deploying it via Ansible on all of my nodes. And because I&rsquo;ve got
nodes, like my OPNsense box, which don&rsquo;t run Kubernetes, I decided to keep
the node-exporter config in Ansible, on the node level. This way, all hosts in
the Homelab share a common config, instead of some hosts being configured via
Ansible and some via this Helm chart.</p>
<p>I&rsquo;ve currently got an Ingress configured for Prometheus. This is only temporary,
while I&rsquo;m still using the Grafana deployment on Nomad. After that&rsquo;s migrated to
Kubernetes as well, there will no longer be any need for it.</p>
<p>One important thing to point out is the <code>prometheus.prometheusSpec.scrapeConfigSelectorNilUsesHelmValues</code>
option. Left at its default, the Prometheus operator will only pick up
ScrapeConfig resources deployed by the Helm chart itself. Setting it to
<code>false</code> makes it pick up ScrapeConfigs from anywhere in the cluster,
which is what I wanted, since I create mine separately (see the next section).</p>
<p>Finally, I would like to leave a couple of lines here about the data migration.
As I noted above, I like pretty charts. And I like to have access to older data
as well, which is why I&rsquo;ve got a retention period of five years for Prometheus.
I was a bit apprehensive about how well the migration would go. But it turns
out that Prometheus is absolutely fine when just copying over the files from
another Prometheus instance.</p>
<p>I just copied the data files from my old Prometheus volume over to the new one
with a <code>rsync -avP /mnt/old-volume/* /mnt/new-volume/prometheus-db/</code>. The
permissions/ownership of the files don&rsquo;t seem to matter much; the new
Prometheus instance was fine with the old ownership of the files upon
restart.</p>
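<p>The same copy could presumably also be done in-cluster, with a throwaway Pod
mounting both PVCs. A rough sketch, with made-up claim names:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml">apiVersion: v1
kind: Pod
metadata:
  name: prometheus-data-copy
spec:
  restartPolicy: Never
  containers:
    - name: copy
      image: alpine:3.19
      # any image shipping rsync would do; here it is installed on the fly
      command: [&#34;sh&#34;, &#34;-c&#34;, &#34;apk add --no-cache rsync &amp;&amp; rsync -avP /old/ /new/prometheus-db/&#34;]
      volumeMounts:
        - name: old
          mountPath: /old
        - name: new
          mountPath: /new
  volumes:
    - name: old
      persistentVolumeClaim:
        claimName: old-prometheus-data
    - name: new
      persistentVolumeClaim:
        claimName: new-prometheus-data
</code></pre></div>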
<h2 id="scraping">Scraping</h2>
<p>And now onto the scrape configs. In good Kubernetes style, you don&rsquo;t just create
a Prometheus config file. Instead, scrapes are configured via ServiceMonitors
and ScrapeConfigs. I won&rsquo;t go into detail on the ServiceMonitor here, as I don&rsquo;t
directly use any of it yet - it is only used behind the scenes to configure
scraping for the Kubernetes components.</p>
<p>But I did need to introduce some ScrapeConfigs to configure the k8s Prometheus
instance so that it would scrape all the targets the old instance was scraping.</p>
<p>As an example of what this looks like, here is a ScrapeConfig for the node-exporter
running on all of my hosts:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">monitoring.coreos.com/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ScrapeConfig</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">scraping-hosts</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">prometheus</span>: <span style="color:#ae81ff">scrape-hosts</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">staticConfigs</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">job</span>: <span style="color:#ae81ff">hostmetrics</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targets</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#e6db74">&#34;my.host:9100&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">relabelings</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">sourceLabels</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#e6db74">&#34;__address__&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">regex</span>: <span style="color:#e6db74">&#39;([^\:]+)\:[0-9]+&#39;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targetLabel</span>: <span style="color:#ae81ff">instance</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">metricRelabelings</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">sourceLabels</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#e6db74">&#34;__name__&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">go_.*</span>
</span></span></code></pre></div><p>The config under <code>spec:</code> is very similar to what would be put into the
Prometheus config in a baremetal deployment. The same config would look something
like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">scrape_configs</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">job_name</span>: <span style="color:#ae81ff">hostmetrics</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">static_configs</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">targets</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;my.host:9100&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">relabel_configs</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">source_labels</span>: [<span style="color:#ae81ff">__address__]</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">([^\:]+)\:[0-9]+</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">target_label</span>: <span style="color:#ae81ff">instance</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metric_relabel_configs</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">source_labels</span>: [<span style="color:#ae81ff">__name__]</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">go_.*</span>
</span></span></code></pre></div><p>On the one hand this similarity is pretty nice. But on the other hand, the
differences, especially the switch from snake_case to camelCase, threw me off
several times.</p>
<p>Here is a more involved example with more configs, for my <a href="https://github.com/prometheus/snmp_exporter">snmp-exporter</a>,
which uses SNMP to gather metrics from my VDSL modem:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">monitoring.coreos.com/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ScrapeConfig</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">scraping-modem</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">prometheus</span>: <span style="color:#ae81ff">scrape-modems</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">staticConfigs</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">job</span>: <span style="color:#ae81ff">modemmetrics</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targets</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">300.300.300.1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">metricsPath</span>: <span style="color:#ae81ff">/snmp</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">scrapeInterval</span>: <span style="color:#ae81ff">1m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">params</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">module</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">routernameHere</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">relabelings</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">sourceLabels</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#e6db74">&#34;__address__&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targetLabel</span>: <span style="color:#ae81ff">__param_target</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">targetLabel</span>: <span style="color:#ae81ff">instance</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">replacement</span>: <span style="color:#ae81ff">routerHostnameHere</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">targetLabel</span>: <span style="color:#ae81ff">__address__</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">replacement</span>: <span style="color:#ae81ff">snmp-exporter.example.com</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">metricRelabelings</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">sourceLabels</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#e6db74">&#34;__name__&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">snmp_scrape_.*</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">sourceLabels</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#e6db74">&#34;__name__&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">sysName</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">sourceLabels</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#e6db74">&#34;adslAturCurrStatus&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">([A-Z]*).*</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targetLabel</span>: <span style="color:#ae81ff">adslAturCurrStatus</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">replacement</span>: <span style="color:#ae81ff">${1}</span>
</span></span></code></pre></div><p>This is just to demonstrate that the ScrapeConfig supports most of the options
which are supported in the Prometheus config file. The operator docs hedge their
bets a bit by claiming that &ldquo;most&rdquo; options are supported, but in my relatively
large scrape configs I didn&rsquo;t find a single option which wasn&rsquo;t supported in
ScrapeConfig.</p>
<p>One somewhat interesting case I would like to bring up was the scrape config
for Uptime Kuma. This had the special requirement of basic auth credentials for
the scrape.
My config looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">monitoring.coreos.com/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ScrapeConfig</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">scraping-kuma</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">prometheus</span>: <span style="color:#ae81ff">scrape-kuma</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">staticConfigs</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">job</span>: <span style="color:#ae81ff">uptime</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targets</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#e6db74">&#34;kuma.example.com&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">scheme</span>: <span style="color:#ae81ff">HTTPS</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">basicAuth</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">password</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">key</span>: <span style="color:#ae81ff">password</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kuma-basic-auth</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">username</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">key</span>: <span style="color:#ae81ff">username</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kuma-basic-auth</span>
</span></span></code></pre></div><p>Here is where I found a difference between the Prometheus config file and the
ScrapeConfig. In my previous <code>prometheus.yaml</code>, I had the following:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  - <span style="color:#f92672">job_name</span>: <span style="color:#e6db74">&#39;uptime&#39;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">scrape_interval</span>: <span style="color:#ae81ff">30s</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">scheme</span>: <span style="color:#ae81ff">https</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">static_configs</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">targets</span>: [<span style="color:#e6db74">&#39;kuma.example.com&#39;</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">basic_auth</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">password</span>: {{ <span style="color:#ae81ff">with secret &#34;mysecret/foo/kuma&#34; }}{{ .Data.secret }}{{end}}</span>
</span></span></code></pre></div><p>This file would be templated in Nomad with the <code>mysecret/foo/kuma</code> secret
from Vault. Note that I&rsquo;m not giving a <code>username</code> here. Kuma simply ignores
the username in the BasicAuth.
But in the Secret I prepared for the k8s Prometheus, <code>kuma-basic-auth</code>, I had
to add a <code>data.username</code> field, as this field is required, and the admission
webhook of the Prometheus operator would throw an error if it is not supplied.
So now I&rsquo;ve got the username <code>foo</code> in the Secret to work around this issue.</p>
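<p>For reference, the Secret then looks something like this, with the actual
credential elided:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml">apiVersion: v1
kind: Secret
metadata:
  name: kuma-basic-auth
type: Opaque
stringData:
  # Kuma ignores the username, but the admission webhook requires it
  username: foo
  password: ...
</code></pre></div>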
<p>Besides the Uptime Kuma config issue, the migration went very smoothly. The
biggest mishap was when I was done copying the old Prometheus data and then
accidentally deleted the volume I had just copied that data to, instead of the
old volume. &#x1f926;</p>
<h2 id="the-k8s-dashboard">The k8s dashboard</h2>
<p>Now onto the result of all this data gathering. &#x1f913;</p>
<p>The first thing I would like to ask, if you&rsquo;re also using Prometheus data
scraping for your Kubernetes cluster: How do you identify &ldquo;problematic&rdquo; Pods?
E.g. Pods which are in a CrashLoop, or which are just pending because there&rsquo;s
no space left? I didn&rsquo;t really find anything good, and my googling skills
deserted me. If you&rsquo;ve got a couple good stats, hit me up on the Fediverse,
please!</p>
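<p>Queries along these lines could serve as crude starting points; the restart
threshold is entirely arbitrary:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-promql" data-lang="promql"># Pods currently stuck in Pending:
kube_pod_status_phase{phase=&#34;Pending&#34;} == 1

# Containers restarting repeatedly within the last hour, a rough CrashLoop signal:
increase(kube_pod_container_status_restarts_total[1h]) &gt; 3
</code></pre></div>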
<p>Okay, first two plots are overall resource utilization in the cluster:</p>
<figure>
    <img loading="lazy" src="utilization.png"
         alt="A screenshot of two Grafana panels. Both show three gauges each, which are labeled &#39;Control Plane&#39;, &#39;Ceph&#39; and &#39;Workers&#39;. The first panel shows CPU resource usage, where the Control Plane has 48%, Ceph 92% and Workers 19%. The other panel show Memory resource usage, with the control plane at 37%, Ceph at 73% and Workers at 24%."/> <figcaption>
            <p>My resource utilization panels.</p>
        </figcaption>
</figure>

<p>This shows the three groups of hosts in my cluster, which are separated by different
taints. The control plane are the three control plane nodes, 4 cores and 4 GB
of RAM each, simulating the three Raspberry Pi 4 4GB which will ultimately house
the control plane. Ceph is currently two machines hosting the OSDs for the storage
and some other Ceph pieces. They&rsquo;re so full because Ceph needs a lot of pods
for basic functionality, and I gave them all an affinity for the Ceph hosts.
The workers are currently two VMs and a lone Raspberry Pi CM4 8GB. As you can
see from the utilization, most of what&rsquo;s running on the k8s cluster at the moment
is still infrastructure.</p>
<p>These two panels were not as easy to create as I had thought. I just couldn&rsquo;t
get the values to line up with the results of <a href="https://github.com/robscott/kube-capacity">kube-capacity</a>.</p>
<p>But before I get too far into the weeds, let&rsquo;s have a look at the PromQL query
for the CPU utilization panel. The memory panel is almost exactly the same:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-promql" data-lang="promql"><span style="display:flex;"><span><span style="color:#f92672">(</span><span style="color:#66d9ef">sum</span><span style="color:#f92672">(</span>kube_pod_container_resource_requests{resource<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">cpu</span>&#34;} <span style="color:#f92672">and</span> <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>pod<span style="color:#f92672">)</span> <span style="color:#f92672">(</span>kube_pod_status_phase{phase<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">Running</span>&#34;} <span style="color:#f92672">==</span> <span style="color:#ae81ff">1</span><span style="color:#f92672">)</span> <span style="color:#f92672">and</span> <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>node<span style="color:#f92672">)</span> <span style="color:#f92672">(</span>kube_node_spec_taint{value<span style="color:#f92672">!=</span>&#34;<span style="color:#e6db74">ceph</span>&#34;}<span style="color:#f92672">)))</span> <span style="color:#f92672">/</span> <span style="color:#66d9ef">sum</span><span style="color:#f92672">((</span>kube_node_status_capacity{resource<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">cpu</span>&#34;} <span style="color:#f92672">and</span> <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>node<span style="color:#f92672">)</span>kube_node_spec_taint{value<span style="color:#f92672">!=</span>&#34;<span style="color:#e6db74">ceph</span>&#34;}<span style="color:#f92672">))</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">(</span><span style="color:#66d9ef">sum</span><span style="color:#f92672">(</span>kube_pod_container_resource_requests{resource<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">cpu</span>&#34;} <span style="color:#f92672">and</span> <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>pod<span style="color:#f92672">)</span> <span style="color:#f92672">(</span>kube_pod_status_phase{phase<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">Running</span>&#34;} <span style="color:#f92672">==</span> <span style="color:#ae81ff">1</span><span style="color:#f92672">)</span> <span style="color:#f92672">and</span> <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>node<span style="color:#f92672">)</span> <span style="color:#f92672">(</span>kube_node_spec_taint{value<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">ceph</span>&#34;}<span style="color:#f92672">)))</span> <span style="color:#f92672">/</span> <span style="color:#66d9ef">sum</span><span style="color:#f92672">((</span>kube_node_status_capacity{resource<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">cpu</span>&#34;} <span style="color:#f92672">and</span> <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>node<span style="color:#f92672">)</span>kube_node_spec_taint{value<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">ceph</span>&#34;}<span style="color:#f92672">))</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">(</span><span style="color:#66d9ef">sum</span><span style="color:#f92672">(</span>kube_pod_container_resource_requests{resource<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">cpu</span>&#34;} <span style="color:#f92672">and</span> <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>pod<span style="color:#f92672">)</span> <span style="color:#f92672">(</span>kube_pod_status_phase{phase<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">Running</span>&#34;} <span style="color:#f92672">==</span> <span style="color:#ae81ff">1</span><span style="color:#f92672">)</span> <span style="color:#f92672">unless</span> <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>node<span style="color:#f92672">)</span> <span style="color:#f92672">(</span>kube_node_spec_taint{}<span style="color:#f92672">)))</span> <span style="color:#f92672">/</span> <span style="color:#66d9ef">sum</span><span style="color:#f92672">((</span>kube_node_status_capacity{resource<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">cpu</span>&#34;} <span style="color:#f92672">unless</span> <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>node<span style="color:#f92672">)</span> kube_node_spec_taint{}<span style="color:#f92672">))</span>
</span></span></code></pre></div><p>(One of those things which just make me quietly happy: Hugo&rsquo;s syntax highlighter
supports PromQL. &#x1f642;)</p>
<p>Each of these has a numerator, which is the sum of the requests of all of the
pods running on the given group of hosts. The denominator then is the total
resources available to the given host group. Getting there involved some complexity.</p>
<p>Let&rsquo;s just start with the very first part, which is similar for all of the
host groups:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-promql" data-lang="promql"><span style="display:flex;"><span><span style="color:#66d9ef">sum</span><span style="color:#f92672">(</span>kube_pod_container_resource_requests{resource<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">cpu</span>&#34;} <span style="color:#f92672">and</span> <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>pod<span style="color:#f92672">)</span> <span style="color:#f92672">(</span>kube_pod_status_phase{phase<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">Running</span>&#34;} <span style="color:#f92672">==</span> <span style="color:#ae81ff">1</span><span style="color:#f92672">)</span>
</span></span></code></pre></div><p>This part contains the base metric, <code>kube_pod_container_resource_requests</code>.
This metric reflects the resources requested by each container in each pod.
The first thing I found was that this covers not just the currently running
Pods, but really all Pods kube-state-metrics knows about. So the first thing to
do was a &ldquo;filter&rdquo; on the containers of running Pods. This is what the second
part does:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-promql" data-lang="promql"><span style="display:flex;"><span><span style="color:#f92672">and</span> <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>pod<span style="color:#f92672">)</span> <span style="color:#f92672">(</span>kube_pod_status_phase{phase<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">Running</span>&#34;} <span style="color:#f92672">==</span> <span style="color:#ae81ff">1</span><span style="color:#f92672">)</span>
</span></span></code></pre></div><p>Let&rsquo;s start with this one at the back: <code>kube_pod_status_phase</code> is a really
useful metric when you want to find things out about Pods. In this case, I
wanted all <code>Running</code> Pods. But it&rsquo;s not enough to just select the series with
the <code>Running</code> phase: the metric emits one series per Pod and phase, and only
the current phase carries the value <code>1</code>. Filtering on the label alone would
therefore match too much, including Pods which are already gone. Checking whether
the actual value of the metric equals <code>1</code> does the trick.</p>
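<p>To illustrate, querying the metric for a single (made-up) Pod returns one
series per phase, with only the current phase carrying a <code>1</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-promql" data-lang="promql">kube_pod_status_phase{pod=&#34;grafana-abc123&#34;}

# =&gt; {phase=&#34;Pending&#34;}   0
#    {phase=&#34;Running&#34;}   1
#    {phase=&#34;Succeeded&#34;} 0
#    {phase=&#34;Failed&#34;}    0
#    {phase=&#34;Unknown&#34;}   0
</code></pre></div>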
<p>So here is what happens: with <code>and</code> as the operator, Prometheus outputs a new
vector which contains all the labels of <code>kube_pod_container_resource_requests</code>.
Then it filters out all values from that vector where no entry in the
<code>kube_pod_status_phase{phase=&quot;Running&quot;} == 1</code> vector has the same value for the
<code>pod</code> label. In short, the result of this entire first part is the resource
requests of all currently running containers in the entire cluster.</p>
<p>But this wasn&rsquo;t exactly what I wanted. I need to know whether any of my groups
of hosts is getting full. Which leads me to the second part of the numerator.
This provides the filter for a specific group of hosts. The first two filter
for the two groups of hosts which have a taint, the control plane and Ceph nodes:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-promql" data-lang="promql"><span style="display:flex;"><span><span style="color:#f92672">and</span> <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>node<span style="color:#f92672">)</span> <span style="color:#f92672">(</span>kube_node_spec_taint{value<span style="color:#f92672">!=</span>&#34;<span style="color:#e6db74">ceph</span>&#34;}<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">and</span> <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>node<span style="color:#f92672">)</span> <span style="color:#f92672">(</span>kube_node_spec_taint{value<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">ceph</span>&#34;}<span style="color:#f92672">)</span>
</span></span></code></pre></div><p>The third one filters for my worker nodes, which don&rsquo;t have a taint:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-promql" data-lang="promql"><span style="display:flex;"><span><span style="color:#f92672">unless</span> <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>node<span style="color:#f92672">)</span> <span style="color:#f92672">(</span>kube_node_spec_taint{}<span style="color:#f92672">)</span>
</span></span></code></pre></div><p>Here I&rsquo;m constructing a vector of all nodes with taints and then filtering the
Kubernetes pod requests for all elements which are <em>not</em> in that vector.</p>
<p>And with that, I&rsquo;ve finally got three sets of resource requests, one for each
group of hosts.</p>
<p>The denominator then has to be the total resources of the three node groups.
This works the same as the numerator query. Here, the data I want is in the
<code>kube_node_status_capacity{resource=&quot;cpu&quot;}</code> metric, and I again filter by
the taint to get the total resources per group.</p>
<p>Before moving on to the next chart, it&rsquo;s important to note that the <code>kube_pod_container_resource_requests</code>
metric is not 100% accurate. For most of my nodes, summing up all of the
requests results in a slightly too low value, when compared to the output of
kube-capacity. This happens because requests can be put on the container and on
the Pod. One example in my cluster is the Cilium Pods: going by my Prometheus
metrics, they don&rsquo;t have any CPU requests, but in reality they do request
50m CPU, just on the Pod, not the container.
There is a better metric for this, discussed in <a href="https://github.com/kubernetes/kube-state-metrics/issues/1095">this GitHub issue</a>.
This metric, <code>kube_pod_resource_requests</code>, is more precise, but it is not enabled
by default. See also the <a href="https://kubernetes.io/docs/concepts/cluster-administration/system-metrics/#kube-scheduler-metrics">Kubernetes docs</a>.
I will likely enable it later, but haven&rsquo;t bothered yet.</p>
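<p>Going by the linked docs, that metric is served by the kube-scheduler on a
separate <code>/metrics/resources</code> endpoint, so &ldquo;enabling&rdquo; it should mostly mean
pointing another scrape at that path. An untested sketch, with a made-up target
and the TLS/auth setup against the scheduler omitted:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml">apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: scraping-scheduler-resources
spec:
  staticConfigs:
    - labels:
        job: scheduler-resources
      targets:
        - &#34;cp1.example.com:10259&#34;
  metricsPath: /metrics/resources
  scheme: HTTPS
  # TLS and bearer token auth against the scheduler would still need to be configured
</code></pre></div>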
<p>So now onto the next set of metrics, which are the per-container metrics. The
first one is the CPU usage:</p>
<figure>
    <img loading="lazy" src="cpu.png"
         alt="A screenshot of a Grafana time series panel. It shows 24h worth of CPU utilization. It is titled &#39;CPU usage by Container&#39;. There are a number of curves between 0 and 0.2 on the Y axis, with relatively little fluctuation overall."/> <figcaption>
            <p>My CPU utilization, in total CPU usage seconds.</p>
        </figcaption>
</figure>

<p>I&rsquo;ve got two of these, both using the <code>container_cpu_usage_seconds_total</code> metric.
One of them is aggregated per container, and one per Namespace. I did not go
with aggregation by Pod, because the pod names are not stable.
The query for this plot is pretty simple:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-promql" data-lang="promql"><span style="display:flex;"><span><span style="color:#66d9ef">sum</span><span style="color:#f92672">(</span><span style="color:#66d9ef">rate</span><span style="color:#f92672">(</span>container_cpu_usage_seconds_total{container<span style="color:#f92672">!~</span>&#34;<span style="color:#e6db74">POD|</span>&#34;}[<span style="color:#960050;background-color:#1e0010">$__rate_interval</span>]<span style="color:#f92672">))</span> <span style="color:#66d9ef">by</span> <span style="color:#f92672">(</span>container<span style="color:#f92672">)</span>
</span></span></code></pre></div><p>The only &ldquo;special&rdquo; thing I needed to do here was to exclude the <code>POD</code> &ldquo;container&rdquo;
and the series with an empty container name. The value with the <code>POD</code> container is
actually the &ldquo;Pause&rdquo; container of the Pod, while the empty container name denotes
the Pod-level cgroup as a whole, and I&rsquo;m interested in neither.</p>
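<p>The per-Namespace variant mentioned above only needs a different aggregation
label, along these lines:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-promql" data-lang="promql">sum(rate(container_cpu_usage_seconds_total{container!~&#34;POD|&#34;}[$__rate_interval])) by (namespace)
</code></pre></div>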
<p>This plot shows one of those small, yet interesting things which make me look
at my metrics:</p>
<p><figure>
    <img loading="lazy" src="prom_cpu.png"
         alt="A screenshot of a Grafana time series panel. It shows the CPU usage of the Prometheus container from 15:40 to 17:40. In the interval from the start to 16:30, the CPU usage hovers at around 0.0125. Then there is a short break in the graph, and it continues around 16:32, but with a higher baseline, hovering around 0.0906 instead after a short period of usage up to 0.25."/> <figcaption>
            <p>CPU usage of Prometheus after a redeployment.</p>
        </figcaption>
</figure>

This plot shows the CPU utilization of the Prometheus Pod during a re-schedule,
which happens around 16:30. Prometheus wasn&rsquo;t actually using more CPU because it
was doing more - it just got moved from my x86 server to a Pi. And on that Pi,
Prometheus needs to spend more CPU time to do the same thing.
Simple and logical, of course, but I love seeing simple principles spelled out
and measured like this. &#x1f642;</p>
<p>Let&rsquo;s finish by looking at the CPU usage values again. They reflect the
current state of the cluster pretty well - there&rsquo;s not that much running on it
yet. The majority of my workloads are still running on my Nomad/Baremetal Ceph
cluster. The three top users by container are the cilium-agent, kube-apiserver
and etcd. Even the Ceph OSDs come only after that. This will change in the future
of course, e.g. the OSDs will become more loaded once the root disks of my worker
nodes start running off of the Ceph Rook cluster. But for now, it&rsquo;s mostly
just the infrastructure.</p>
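<p>To pull out just the top consumers, instead of eyeballing the panel, <code>topk</code>
works nicely, something like:</p>
<pre tabindex="0"><code># the three containers using the most CPU
topk(3, sum(rate(container_cpu_usage_seconds_total{container!~&#34;POD|&#34;}[$__rate_interval])) by (container))
</code></pre>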
<figure>
    <img loading="lazy" src="memory.png"
         alt="Another Grafana screenshot, this time of a panel for the Memory consumption, by namespace. It shows a 24h interval. There are two distinct curves at 5.5 GB and 8.3 GB. The 5.5 GB curve only slightly throughout the day, while the 8.3 GB curve shows a max of 8.5 GB and a minimum of 7.9 GB."/> <figcaption>
            <p>Memory usage by Namespace.</p>
        </figcaption>
</figure>

<p>The memory usage shows a similar pattern, although here, the top curve, around
8.3 GB, is the <code>rook-cluster</code> namespace, housing all the main Ceph components.
The next lower curve, around 5.5 GB, is the <code>kube-system</code> namespace, again showing
that for now, infrastructure dominates my cluster&rsquo;s memory consumption.</p>
<p>Memory usage, overall, is not an easy thing to measure in Linux. I&rsquo;ve found
<a href="https://itnext.io/from-rss-to-wss-navigating-the-depths-of-kubernetes-memory-metrics-4d7d77d8fdcb">this article</a> quite useful. cAdvisor, which provides the per-container metrics, has several
choices for memory usage:</p>
<ul>
<li><code>container_memory_rss</code></li>
<li><code>container_memory_usage_bytes</code></li>
<li><code>container_memory_working_set_bytes</code></li>
</ul>
<p>Here are these three metrics, shown as a sum for my Monitoring namespace, running
Prometheus itself and the Prometheus operator as well as kube-state-metrics:</p>
<figure>
    <img loading="lazy" src="three_mem_types.png"
         alt="Another Grafana screenshot, this time of a single panel with three curves, showing different memory metrics. All three curves follow roughly the same behavior. Initially, all three show a slow increase, until they shortly raise up, then go down, just to then raise up even higher, before going down again, all in unison and by approximately the same value. The lowest of the three, titled &#39;container_memory_rss&#39;, starts at around 570 MB. The middle curve, showing &#39;container_memory_working_set_bytes&#39;, starts at about 600 MB, while the highest of the three, &#39;container_memory_usage_bytes&#39;, starts at 680 MB."/> <figcaption>
            <p>The three types of memory metrics as sums over the monitoring Namespace.</p>
        </figcaption>
</figure>
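<p>The queries for this panel look roughly like this, one per metric - a sketch,
assuming my monitoring Namespace is called <code>monitoring</code>, and with the
<code>POD</code>/empty containers excluded to avoid double counting:</p>
<pre tabindex="0"><code>sum(container_memory_rss{namespace=&#34;monitoring&#34;, container!~&#34;POD|&#34;})
sum(container_memory_usage_bytes{namespace=&#34;monitoring&#34;, container!~&#34;POD|&#34;})
sum(container_memory_working_set_bytes{namespace=&#34;monitoring&#34;, container!~&#34;POD|&#34;})
</code></pre>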

<p>This shows the differences. The <a href="https://en.wikipedia.org/wiki/Resident_set_size">Resident Set Size</a>,
being heap and stack in memory, is the lowest of the three curves. Next, the
Working Set Size is the amount of memory which was recently touched by the
process. It is generally described as the data which the process needs in the next
couple of seconds to do its work. I found <a href="https://www.brendangregg.com/blog/2018-01-17/measure-working-set-size.html">this post</a> an interesting read. The final and highest curve is the Memory Usage.
This is so high because it also counts files mapped into memory, even when those
files haven&rsquo;t been touched in a while. No surprise there: Prometheus is, all said
and done, mostly a time series DB, so it has a pretty hefty memory mapping
footprint for its DB files.
What annoyed me a bit is that none of these values actually corresponds to the
<code>RES</code> values I&rsquo;m seeing in htop. But the working set came closest, and its
definition made the most sense to me, so I&rsquo;ve been going with that one for now.</p>
<p>Next up is networking:
<figure>
    <img loading="lazy" src="networking_tx.png"
         alt="Another Grafana panel, this time for the network transmission, again by Namespace. All of the curves are below the 200 Mbps threshold for the majority of the time. But there are two significant spikes. The first one at 02:30 going up to 1.85 Gbps, and the second one at 03:30, going up to 2.48 Gbps."/> <figcaption>
            <p>Network transmissions by namespace over 24h. The spikes are dominated by the rook-cluster namespace, in orange.</p>
        </figcaption>
</figure>
</p>
<p>First important thing to note about this chart: It is an aggregation over all Pods
in a namespace. And those Pods might be running on different hosts. That&rsquo;s how
I&rsquo;m getting a max throughput of 2.48 Gbps, even though I&rsquo;ve only got a 1 Gbps
LAN here. Another factor might be loopback interface traffic, which of course can
also be faster. The two spikes at 02:30 and 03:30 are my backups. The first, lower
spike, up to 1.85 Gbps, is my volume backups. The Ceph Rook cluster already
hosts the S3 backup buckets, while the baremetal cluster still hosts the CSI
volumes. So I expect these spikes to increase in the future, when the Ceph Rook
cluster needs to provide both the data being backed up and the backup target.
The higher spike at 03:30 is my backup of many of those backup buckets to
an external HDD. I&rsquo;m currently not 100% sure why that one produces more network
traffic.
What I&rsquo;m also wondering about is the blue curve, which follows the two spikes
but doesn&rsquo;t go quite as high. That&rsquo;s the rook-ceph namespace, which only
contains the Rook operator and the CSI plugin pods. None of those should be in
the data path during a transmission. Not sure what&rsquo;s going on here.</p>
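<p>For reference, this panel is built from cAdvisor&rsquo;s network metrics, roughly
like this (a sketch; the <code>* 8</code> converts bytes to bits to match the Gbps axis):</p>
<pre tabindex="0"><code>sum(rate(container_network_transmit_bytes_total[$__rate_interval])) by (namespace) * 8
</code></pre>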
<p>Then let me finish with my favorite plot:
<figure>
    <img loading="lazy" src="csi_volumes.png"
         alt="A Grafana plot showing four gauges. They are titled &#39;prometheus-monitoring-kube-prometheus-&#39; and then the label is cut off. The rest are titled &#39;scratch-volume&#39;, twice, and finally &#39;scratch-redis-0&#39;. The three scratch volumes show very &lt; 1% usage, while the prometheus gauge shows 56%."/> <figcaption>
            <p>CSI storage volume utilization.</p>
        </figcaption>
</figure>

This plot shows the space utilization of all currently mounted CSI volumes. Which
is absolutely great, because for Nomad, I had to create my own scripting, which
basically ran <code>df -h</code> on the nodes and filtered for Nomad&rsquo;s CSI volume mount
dir to find the right values. Those were then written to a file in Prometheus
format, to be picked up by the local node-exporter&rsquo;s <code>textfile</code> collector. But
here, I&rsquo;m getting those directly from Kubernetes, which is pretty nice.
But there&rsquo;s one disadvantage with this plot: If a volume has been remounted
during the metric interval, it will show up twice, because one of the labels on the
<code>kubelet_volume_stats_*</code> metrics is the node where the volume was mounted.</p>
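<p>These gauges boil down to a simple ratio of the kubelet&rsquo;s volume stats
metrics, roughly like this:</p>
<pre tabindex="0"><code># used space per PersistentVolumeClaim, in percent
100 * kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes
</code></pre>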
<p>And that&rsquo;s it! &#x1f389;
I&rsquo;ve finally got metrics for my k8s cluster. Next step will be migrating my
Grafana instance to k8s as well.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 8: Setting up CloudNativePG for Postgres DB Support</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-8-cloud-native-pg/</link>
      <pubDate>Thu, 29 Feb 2024 00:13:05 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-8-cloud-native-pg/</guid>
      <description>Setting up CloudNativePG with a full Keycloak example</description>
      <content:encoded><![CDATA[<p>Wherein I set up cloud-native-pg to supply Postgres clusters in my k8s cluster.</p>
<p>This is part nine of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p><a href="https://www.postgresql.org/">PostgreSQL</a> is currently the only DBMS in my
Homelab. My initial plan was to just copy and paste the deployment over from
my Nomad setup. But then I was pointed towards <a href="https://cloudnative-pg.io/">CloudNativePG</a>,
which is an operator for managing Postgres deployments in Kubernetes.</p>
<p>But before I go into details on CloudNativePG, a short overview of my current setup in
Nomad. I&rsquo;ve got only a single Postgres instance, hosting several databases for
a variety of apps. By far the largest DB at the moment is for my Mastodon
instance, with something over 1 GB in size. It runs on a CSI volume provided
by my Ceph cluster, located on a couple of SSDs. All apps use this one Postgres
instance, and there&rsquo;s no High Availability or failover.</p>
<p>For backups, I&rsquo;m doing a full <code>pg_dumpall</code> of all the databases, which I pipe
into Restic and back up to an S3 bucket. For new apps, I&rsquo;m following a simple
playbook of manually creating the database and the DB user from the command line
with <code>psql</code>.</p>
<p>This approach is okayish, but CloudNativePG has one big advantage: It allows
declarative creation of Postgres instances. So I won&rsquo;t have to follow a playbook
anymore.</p>
<h1 id="overview-of-cloudnativepg">Overview of CloudNativePG</h1>
<figure>
    <img loading="lazy" src="cnpg-archi.png"
         alt="An architecture diagram for CloudNativePG. The central part is the cnpg-operator. Three pieces, each titled &#39;kind: Cluster&#39; and with &#39;dbname: foo/bar/baz&#39; point into the operator. The operator itself then points to three pairs of databases. Each of them called &#39;foo/bar/baz&#39; and having a partner called &#39;foo-replica/bar-replica/baz-replica&#39;. Into each of those pairs point three apps, called &#39;foo-app/bar-app/baz-app&#39;."/> <figcaption>
            <p>Architecture of CloudNativePG.</p>
        </figcaption>
</figure>

<p>The CloudNativePG architecture is centered around the operator. This operator
is responsible for creating the database clusters themselves. And these are
full Postgres clusters. CloudNativePG only has minimal direct support for multiple
databases in a single cluster.</p>
<p>When the operator sees a new <a href="https://cloudnative-pg.io/documentation/current/cloudnative-pg.v1/#postgresql-cnpg-io-v1-Cluster">Cluster</a>
resource, it creates the given number of Postgres Pods. This can range from a
single pod, without any High Availability, to a cluster with a single primary
and a number of replicas. The replication is entirely build upon Postgres&rsquo; own
replication features. CloudNativePG only provides the correct configuration,
but doesn&rsquo;t put anything on top of it.</p>
<p>Each new cluster is created with a single database and two users, the
<code>postgres</code> superuser and an application user which only has permissions for the
application database.</p>
<p>When a new cluster is created, the operator also provides a Kubernetes Secret,
with the username and password of the application DB user, the name of the
Service for that particular cluster as well as a full JDBC string. Applications
wanting to use the cluster only need to consume the appropriate keys from the
Secret.</p>
<p>In addition to providing the database cluster itself, CloudNativePG also provides
a pretty nice backup system, and makes it easy to create fresh clusters from those
backups when recovery from an incident is necessary.</p>
<p>Those backups can be based on writing to an S3 bucket, or creating volume snapshots
from within Kubernetes. In this post, I will concentrate on the S3 method, as
that&rsquo;s what I decided on. No specific reason, besides the fact that I&rsquo;ve got
my other backups in S3 as well, and I already have infrastructure to back those
buckets up on separate media.</p>
<p>In the S3 backups, the Write Ahead Logs are constantly streamed to S3. At the
same time, regular backups of the full <code>PGDATA</code> directory are created and
also pushed to the bucket.</p>
<p>But all of this also comes with some downsides, at least from my PoV. The first and
foremost one is that CloudNativePG only fully supports one database per Postgres
cluster. This is contrary to my current setup with one cluster and many databases.
On a certain level, the setup with one database cluster per database/app doesn&rsquo;t
make much sense to me. Postgres is made to support a number of different databases
per cluster, not just one. But this &ldquo;one DB per cluster&rdquo; has grown out of the
Microservice architecture paradigm. And it does have its advantages. E.g. when
different apps support different Postgres versions.</p>
<p>But it also comes with some overhead. With a single cluster, I can just throw
some HW at it and check occasionally whether I need to add some more. Be this
CPU, memory or storage. But with multiple clusters, I need to make that decision
for each app. And I need to do it up front, where I don&rsquo;t have any data to base
those decisions on. And there&rsquo;s a pretty wide gap just in the apps I&rsquo;m already
running. Both Nextcloud and Mastodon put quite some demands on the databases, with
the Mastodon DB at over 1 GB in size even for my single-user instance. At the
same time, those two apps also put consistent load on the DBs, even when I don&rsquo;t
directly use them. On the other side are things like Keycloak, where the DB comes
in at 12 MB and access only happens when I actually log in somewhere.
Making all of these decisions, and making them up front, isn&rsquo;t that nice compared
to just having a single DB instance where I just throw some HW at it occasionally.
Now, I have to do that for eight different instances.</p>
<p>Next is the backups. Streaming the WAL to S3 will mean at least some more strain
on my S3 infrastructure. And those WALs are not optional. When using an S3 bucket
for backups, the WALs are mandatory. But it doesn&rsquo;t feel like they bring me
any benefit. I mean sure, when I actually have an issue and need to restore,
it&rsquo;s going to be nice to be able to do Point-in-Time restores, instead of having
to fall back to the last full backup, which might be up to 24h old.
There&rsquo;s also an issue with retention. It&rsquo;s one-dimensional. I can provide one
time window, say 30 days, for which I can restore, but there&rsquo;s no concept of
saying &ldquo;I want a backup for the last 14 days, each one of the last 6 months, and
one year ago&rdquo;.</p>
<p>But with all of this whining, I&rsquo;m still a sucker for declarative definitions of
my database, so let&rsquo;s quit the complaining and get to some YAML files. &#x1f913;</p>
<h1 id="detour-looking-at-k8s-priority-classes">Detour: Looking at K8s priority classes</h1>
<p>Ah, but before we get to the YAML, I would like to take a rather short detour
to Kubernetes&rsquo; <a href="https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#priorityclass">PriorityClass</a>.
These priority classes are basically complex wrappers around a number between
<code>-2147483648</code> and <code>1000000000</code>. They are priorities used for scheduling by
Kubernetes. If there&rsquo;s no space left on the cluster, and there&rsquo;s a Pod with a
higher priority than an already running one, the running Pod will be evicted
and the higher-priority one will be scheduled in its place.</p>
<p>Kubernetes comes with two of those classes by default, <code>system-cluster-critical</code> and
<code>system-node-critical</code>. As an example, most of my Ceph Rook pods have one or
the other of those. My MON Pods, for example, have the node critical class, as
it is not just important that they run somewhere, but it is important that they
run on certain nodes. The same is true, for example, for my Fluentbit log shippers:
they should also be running on all nodes before absolutely anything else.
For cluster level critical, my example would again be Ceph Rook Pods, namely
the CSI providers. They don&rsquo;t have to run on specific nodes, but they definitely
have to be running somewhere.</p>
<p>But in my Homelab, there are some additional things which should have priority.
The first is critical apps. This would be things like my databases: so many other
services depend on them, which is why I&rsquo;m bringing up priorities here.
The second special class of services is going to be externally visible services.
So for example, it is way more important to me that my Mastodon instance stays
up than that my Gitea instance stays up.</p>
<p>As an example, the <code>hl-critical</code> PriorityClass would look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">scheduling.k8s.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">PriorityClass</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">hl-critical</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">value</span>: <span style="color:#ae81ff">500000000</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">globalDefault</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;Priority class for critical Homelab pods&#34;</span>
</span></span></code></pre></div><p>And now finally to the main event. &#x1f642;</p>
<h1 id="operator-setup">Operator setup</h1>
<p>The first part to set up for CloudNativePG is the operator. I&rsquo;m using
<a href="https://github.com/cloudnative-pg/charts/tree/main/charts/cloudnative-pg">the Helm chart</a>
for this. There aren&rsquo;t really that many config options for the Operator
itself, so here it is:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">config</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">INHERITED_LABELS</span>: <span style="color:#ae81ff">homelab/*</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">podLabels</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">cloud-native-pg</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">priorityClassName</span>: <span style="color:#e6db74">&#34;hl-critical&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">50m</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">100Mi</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">monitoring</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">podMonitorEnabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">grafanaDashboard</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">create</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">monitoringQueriesConfigMap</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">queries</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span></code></pre></div><p>Two interesting things here. The first one is the <code>config.data.INHERITED_LABELS</code>
setting. This defines labels which should be taken from the Cluster manifest
and applied to all resources, like Pods, Secrets and so on, created for that
same cluster. It&rsquo;s neat to have even the auto-generated resources properly labeled.</p>
<p>The second noteworthy config is <code>monitoringQueriesConfigMap.queries</code>. In the
default values of the chart, there are a lot of queries pre-defined. But as
I don&rsquo;t have any monitoring yet, I disabled them for now.</p>
<p>And that&rsquo;s it already. Deploying this Helm chart will create a single Pod with
the operator, ready to receive Cluster resources for the actual Postgres deployments.</p>
<h1 id="setting-up-a-cloudnativepg-cluster-for-keycloak">Setting up a CloudNativePG cluster for Keycloak</h1>
<p>Before I get to the cluster setups, I would like to rant for a paragraph. I
don&rsquo;t actually have any app on the cluster which needs Postgres yet. But I still
wanted to test CloudNativePG before moving on to that first app. I wanted something
really simple, perhaps even something which produces some test data on a button
press, with a small web frontend to read the data again, to verify that DB
restores worked properly. So I googled, with something like &ldquo;kubernetes postgres
simple app&rdquo; or &ldquo;simple postgres test app&rdquo;. And I got zero results. None. All I
got were boatloads of articles on how to setup Postgres on Kubernetes, or how
to write a simple app using Postgres with language/framework X. And I&rsquo;m pretty
sure that something like what I want exists. Probably in dozens of varieties,
even.
But Google would not surface those apps. I tried a lot of permutations of the
above queries. Nothing.</p>
<p>But I got lucky, and <a href="https://transitory.social/@rachel">Rachel</a> pointed me
towards <a href="https://www.keycloak.org/">Keycloak</a>. It has the advantage that it
only needs Postgres as a dependency, so it was relatively easy to set up.
Plus, I&rsquo;m already running a Keycloak instance on Nomad, so I&rsquo;m familiar with
it already. And creation of new users is sufficiently close to &ldquo;create database
records on the press of a button&rdquo; for my needs. Thanks Rachel. &#x1f642;</p>
<h2 id="basic-cluster-and-keycloak-setup">Basic cluster and Keycloak setup</h2>
<p>Yes, this is where we finally get to see a Postgres Pod enter the story. &#x1f605;</p>
<p>So as I&rsquo;ve mentioned multiple times before, the Operator is fed with Cluster
type resources and then spawns the appropriate number of Postgres Pods, configures
replication and generates a secret for use by the app.</p>
<p>My Keycloak test cluster looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">postgresql.cnpg.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">keycloak-pg-cluster</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">keycloak-test</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">instances</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">bootstrap</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">initdb</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">database</span>: <span style="color:#ae81ff">keycloak</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">keycloak</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">128Mi</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">100m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">postgresql</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">max_connections</span>: <span style="color:#e6db74">&#34;200&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">shared_buffers</span>: <span style="color:#e6db74">&#34;32MB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">effective_cache_size</span>: <span style="color:#e6db74">&#34;96MB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">maintenance_work_mem</span>: <span style="color:#e6db74">&#34;8MB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">checkpoint_completion_target</span>: <span style="color:#e6db74">&#34;0.9&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wal_buffers</span>: <span style="color:#e6db74">&#34;983kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">default_statistics_target</span>: <span style="color:#e6db74">&#34;100&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">random_page_cost</span>: <span style="color:#e6db74">&#34;1.1&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">effective_io_concurrency</span>: <span style="color:#e6db74">&#34;300&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">work_mem</span>: <span style="color:#e6db74">&#34;81kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">huge_pages</span>: <span style="color:#e6db74">&#34;off&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">max_wal_size</span>: <span style="color:#e6db74">&#34;1GB&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storage</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">size</span>: <span style="color:#ae81ff">2Gi</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storageClass</span>: <span style="color:#ae81ff">rbd-fast</span>
</span></span></code></pre></div><p>Because I set the <code>INHERITED_LABELS</code> config in the operator to <code>homelab/*</code>,
all resources created for this cluster will get the label <code>homelab/part-of: keycloak-test</code>.
The <code>metadata.name</code> is significant here, as it will become part of the Pod names
as well as the path for backups, once we get to them. The <code>instances</code> config
determines the replication. There needs to be at least one instance, the primary.
All additional instances are replicas. CloudNativePG also supports more involved
configs, but I&rsquo;m keeping it simple here, with a single primary and a single
replica.</p>
<p>The <code>bootstrap</code> section defines how the database is initially created. The
<code>initdb</code> method I&rsquo;m using here creates an empty database. You can also create
the cluster from another cluster, either from a running cluster, where the new
cluster will use the streaming replication protocol, or from a backup, when the
previous cluster doesn&rsquo;t exist anymore. I intend to give the streaming replication
approach a try when I start migrating services using Postgres from my Nomad cluster.
Perhaps I can skip manual <code>pg_dump</code> backups and restores this way.
I will show restoration of a cluster from another cluster&rsquo;s backups in a later
section.</p>
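<p>For completeness, such a streaming replication bootstrap would look roughly like
this - a sketch only, as I haven&rsquo;t tried it yet, and the host and Secret names
are made up:</p>
<pre tabindex="0"><code>spec:
  instances: 2
  bootstrap:
    pg_basebackup:
      source: nomad-postgres   # references the externalClusters entry below
  externalClusters:
    - name: nomad-postgres
      connectionParameters:
        host: postgres.nomad.example.com  # hypothetical address of the old instance
        user: streaming_replica           # a user with the REPLICATION privilege
      password:
        name: nomad-postgres-credentials  # hypothetical Secret holding that password
        key: password
</code></pre>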
<p>The <code>postgresql.parameters</code> were initialized via <a href="https://pgtune.leopard.in.ua/">pgtune</a>.
I never tuned my databases before, and I&rsquo;m curious what impact this will have
on DBs with higher load, like my Mastodon DB.</p>
<p>Last but not least, I&rsquo;m telling CloudNativePG to use my SSD-backed <code>rbd-fast</code>
StorageClass and provide a 2GB volume.</p>
<p>Once I deploy this manifest, the Operator gets to work. It will first create
a Postgres Pod for the primary. These Pods use special CloudNativePG images,
not the default Postgres ones. Once that&rsquo;s set up, it will create a second pod,
as a replica.</p>
<p>In the Postgres instances, multiple users will be created. First, the <code>postgres</code>
superuser. This user will not be made available anywhere and is only for internal
use. But another user, in this case called <code>keycloak</code> as configured in
<code>spec.bootstrap.initdb.owner</code>, will be created. This user is intended for use
by the app. Its credentials will be put into a Secret called <code>$CLUSTERNAME-app</code>,
in this particular example <code>keycloak-pg-cluster-app</code>. This secret has the
following content:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dbname</span>: <span style="color:#ae81ff">keycloak</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">host</span>: <span style="color:#ae81ff">keycloak-pg-cluster-rw</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">jdbc-uri</span>: <span style="color:#ae81ff">jdbc:postgresql://keycloak-pg-cluster-rw:5432/keycloak?password=6yitavmmX1OP5lDuRC1iL3epmujnWczqKNnnS7lM7Ez4CLGqzqYb1ikTmWGo5EyJ&amp;user=keycloak</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">password</span>: <span style="color:#ae81ff">6yitavmmX1OP5lDuRC1iL3epmujnWczqKNnnS7lM7Ez4CLGqzqYb1ikTmWGo5EyJ</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">pgpass</span>: <span style="color:#ae81ff">keycloak-pg-cluster-rw:5432:keycloak:keycloak:6yitavmmX1OP5lDuRC1iL3epmujnWczqKNnnS7lM7Ez4CLGqzqYb1ikTmWGo5EyJ\n</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">port</span>: <span style="color:#ae81ff">5432</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">uri</span>: <span style="color:#ae81ff">postgresql://keycloak:6yitavmmX1OP5lDuRC1iL3epmujnWczqKNnnS7lM7Ez4CLGqzqYb1ikTmWGo5EyJ@keycloak-pg-cluster-rw:5432/keycloak</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">user</span>: <span style="color:#ae81ff">keycloak</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">username</span>: <span style="color:#ae81ff">keycloak</span>
</span></span></code></pre></div><p>This secret contains all the information a client needs to connect to the DB
cluster (values are shown base64-decoded here; in the actual Secret, everything
under <code>data</code> is base64-encoded). The given <code>host: keycloak-pg-cluster-rw</code> is a service CloudNativePG
creates, and which points to the primary of the cluster. In addition to this
service, CloudNativePG also creates <code>keyloak-pg-cluster-r</code> and <code>keycloak-pg-cluster-ro</code>
services, which point to both the primary and replicas or the replica only,
respectively. This can be used when there are some read-only apps using the
database.</p>
<p>Let me show you a quick example of how to connect to the Cluster, using <a href="https://www.keycloak.org/">Keycloak</a>.</p>
<p><strong>DO NOT USE THIS IN PROD!</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">app</span>: <span style="color:#ae81ff">keycloak</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">keycloak-test</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">image</span>: <span style="color:#ae81ff">quay.io/keycloak/keycloak:23.0.7</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">args</span>: [<span style="color:#e6db74">&#34;start-dev&#34;</span>]
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">keycloak</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">KC_DB</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;postgres&#34;</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">KC_DB_URL_HOST</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">name</span>: <span style="color:#ae81ff">keycloak-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">key</span>: <span style="color:#ae81ff">host</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">KC_DB_URL_PORT</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">name</span>: <span style="color:#ae81ff">keycloak-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">key</span>: <span style="color:#ae81ff">port</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">KC_DB_URL_DATABASE</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">name</span>: <span style="color:#ae81ff">keycloak-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">key</span>: <span style="color:#ae81ff">dbname</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">KC_DB_USERNAME</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">name</span>: <span style="color:#ae81ff">keycloak-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">key</span>: <span style="color:#ae81ff">user</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">KC_DB_PASSWORD</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">name</span>: <span style="color:#ae81ff">keycloak-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">key</span>: <span style="color:#ae81ff">password</span>
</span></span></code></pre></div><p>This shows how to use <code>valueFrom.secretKeyRef</code> to get the database connection
details from the Secret which was created by CloudNativePG.</p>
<p>There&rsquo;s also one important configuration needed when you&rsquo;re using NetworkPolicy
to secure the Namespace where the cluster is created. This NetworkPolicy needs
to allow the CloudNativePG operator to access the cluster pods. In a CiliumNetworkPolicy,
it looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumNetworkPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;keycloak-pg-cluster-allow-operator-ingress&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">endpointSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cnpg.io/cluster</span>: <span style="color:#ae81ff">keycloak-pg-cluster</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEndpoints</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">io.kubernetes.pod.namespace</span>: <span style="color:#ae81ff">cnpg-operator</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">app.kubernetes.io/name</span>: <span style="color:#ae81ff">cloudnative-pg</span>
</span></span></code></pre></div><p>Here, I&rsquo;m using the fact that CloudNativePG adds a label with the Cluster name
to each pod to allow access only to the DB pods.
An example for a Kubernetes NetworkPolicy can be found <a href="https://github.com/cloudnative-pg/cloudnative-pg/blob/main/docs/src/samples/networkpolicy-example.yaml">here</a>.</p>
<p>Before continuing to the backup configuration, here is a warning which worried
me, after creating my first cluster:</p>
<pre tabindex="0"><code>&#34;Warning: mismatch architecture between controller and instances. This is an unsupported configuration.&#34;
</code></pre><p>That warning got my attention - I&rsquo;m running most of my workloads on Raspberry
Pi 4, but I also have some x86 machines, just in case I end up with a workload
that doesn&rsquo;t support AArch64.
The really frustrating thing at this point was that, yet again, Google utterly
deserted me. It looked like there were zero hits for the warning message.
Luckily, after some searching of the CloudNativePG repo on GitHub, I found
<a href="https://github.com/cloudnative-pg/cloudnative-pg/issues/3868">this issue</a>. That
then brought me to the realization that multi-arch clusters are currently only
a problem when in-place updates for the manager running in the Postgres containers
are needed. But I did not enable those anyway.</p>
<h2 id="adding-backups">Adding backups</h2>
<p>Next up: Preventing disaster. For backups, I went with the S3 bucket based
backups, instead of the volume snapshots method. This backup method has two
pieces. The first one is continuous backups of the WAL, and the second is a
regular full backup of the <code>PGDATA</code> directory. More info on the backup methods
can be found in the <a href="https://cloudnative-pg.io/documentation/current/backup/">CloudNativePG docs</a>.</p>
<p>But I hit a snag with my overall setup here. As described in a
<a href="https://blog.mei-home.net/posts/k8s-migration-5-s3-buckets/#migrating-backup-buckets">previous article</a>,
I&rsquo;ve got a second stage in my backup where I download all of the backup buckets
onto an external HDD. So the S3 user used for those external HDD backups needs
access to all backup buckets, including the Postgres backup
bucket. But sadly, Ceph Rook&rsquo;s <a href="https://rook.io/docs/rook/latest-release/Storage-Configuration/Object-Storage-RGW/ceph-object-bucket-claim/">Object Bucket Claim</a>
does not support setting a policy on the new bucket. So instead of using OBCs,
I created a single bucket in Ansible. Then I will use Rook&rsquo;s <a href="https://rook.io/docs/rook/latest-release/CRDs/Object-Storage/ceph-object-store-user-crd/">CephObjectStoreUser</a>
to create the S3 user, separately for each Postgres cluster/Namespace. This will
generate a Secret with the necessary credentials to access the bucket, which I
can then use to configure the CloudNativePG backup.</p>
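<p>Such a user is declared roughly like this - a sketch, with the Namespace being
the one my Rook cluster lives in:</p>
<pre tabindex="0"><code>apiVersion: ceph.rook.io/v1
kind: CephObjectStoreUser
metadata:
  name: cnpg-backup-keycloak
  namespace: rook-cluster
spec:
  store: rgw-bulk  # the object store which serves the backup bucket
  displayName: CNPG backup user for the Keycloak cluster
</code></pre>
<p>Rook then writes the generated credentials into a Secret named
<code>rook-ceph-object-user-&lt;store&gt;-&lt;user&gt;</code>, which is exactly the Secret
referenced in the backup config below.</p>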
<p>Here again, I&rsquo;m pretty happy with what I was able to do in the Ansible playbook.
Here is the play which creates my backup buckets, together with the task for
the Postgres backup bucket:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">hosts</span>: <span style="color:#ae81ff">candc</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Play for creating the backup buckets</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">backup</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">vars</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3_access</span>: <span style="color:#e6db74">&#34;S3 access key id here&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3_secret</span>: <span style="color:#e6db74">&#34;S3 secret access key here&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cnpg_backup_users</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">keycloak</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Create cnpg backup bucket</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">backup</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">cnpg</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">amazon.aws.s3_bucket</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">backup-cnpg</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">access_key</span>: <span style="color:#e6db74">&#34;{{ s3_access }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secret_key</span>: <span style="color:#e6db74">&#34;{{ s3_secret }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">ceph</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">endpoint_url</span>: <span style="color:#ae81ff">https://s3.example.com</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">policy</span>: <span style="color:#e6db74">&#34;{{ lookup(&#39;ansible.builtin.template&#39;,&#39;bucket-policies/backup-cnpg.json.template&#39;) }}&#34;</span>
</span></span></code></pre></div><p>Important to note here is the <code>cnpg_backup_users</code> list, which contains all the
users for the CloudNativePG clusters to be backed up. Right now, only the Keycloak
test setup. Here is the bucket policy referenced in the <code>policy</code> key:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;Version&#34;</span>: <span style="color:#e6db74">&#34;2012-10-17&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;Statement&#34;</span>: [
</span></span><span style="display:flex;"><span>{<span style="color:#960050;background-color:#1e0010">%</span> <span style="color:#960050;background-color:#1e0010">for</span> <span style="color:#960050;background-color:#1e0010">user</span> <span style="color:#960050;background-color:#1e0010">in</span> <span style="color:#960050;background-color:#1e0010">cnpg_backup_users</span> <span style="color:#960050;background-color:#1e0010">%</span>}
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;Action&#34;</span>: [
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;s3:GetObject&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;s3:DeleteObject&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;s3:PutObject&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;s3:ListBucket&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;s3:GetBucketLocation&#34;</span>
</span></span><span style="display:flex;"><span>        ],
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;Effect&#34;</span>: <span style="color:#e6db74">&#34;Allow&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;Resource&#34;</span>: [
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;arn:aws:s3:::backup-cnpg/{{ user }}-pg-cluster/*&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;arn:aws:s3:::backup-cnpg/{{ user }}-pg-cluster&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;arn:aws:s3:::backup-cnpg&#34;</span>
</span></span><span style="display:flex;"><span>        ],
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;Principal&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;AWS&#34;</span>: [
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;arn:aws:iam:::user/cnpg-backup-{{ user }}&#34;</span>
</span></span><span style="display:flex;"><span>            ]
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>{<span style="color:#960050;background-color:#1e0010">%</span> <span style="color:#960050;background-color:#1e0010">endfor</span> <span style="color:#960050;background-color:#1e0010">%</span>}
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Action&#34;</span>: [
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:GetObject&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:ListBucket&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:GetBucketLocation&#34;</span>
</span></span><span style="display:flex;"><span>      ],
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Effect&#34;</span>: <span style="color:#e6db74">&#34;Allow&#34;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Resource&#34;</span>: [
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;arn:aws:s3:::backup-cnpg/*&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;arn:aws:s3:::backup-cnpg&#34;</span>
</span></span><span style="display:flex;"><span>      ],
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Principal&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;AWS&#34;</span>: [
</span></span><span style="display:flex;"><span>          <span style="color:#e6db74">&#34;arn:aws:iam:::user/extern-backups-s3&#34;</span>
</span></span><span style="display:flex;"><span>        ]
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  ]
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>This policy grants each DB backup user access only to certain subdirectories,
namely <code>$USERNAME-pg-cluster/</code>, as by default, CloudNativePG puts the backups
of different clusters into subdirectories named after each Cluster&rsquo;s <code>metadata.name</code>.</p>
<p>The CloudNativePG backup config itself then looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  <span style="color:#f92672">backup</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">barmanObjectStore</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">endpointURL</span>: <span style="color:#ae81ff">http://rook-ceph-rgw-rgw-bulk.rook-cluster.svc:80</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">destinationPath</span>: <span style="color:#e6db74">&#34;s3://backup-cnpg/&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">s3Credentials</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">accessKeyId</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-ceph-object-user-rgw-bulk-cnpg-backup-keycloak</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">AccessKey</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretAccessKey</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-ceph-object-user-rgw-bulk-cnpg-backup-keycloak</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">SecretKey</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">retentionPolicy</span>: <span style="color:#e6db74">&#34;90d&#34;</span>
</span></span></code></pre></div><p>This is put under <code>spec</code> in the <code>Cluster</code> manifest. This config tells
CloudNativePG how to access the S3 backup bucket, and that all data should be
retained for 90 days. This is going to be true for both WALs and full <code>PGDATA</code>
backups.</p>
<p>After this update is made to the Cluster manifest, you might see this error
message in the logs:</p>
<pre tabindex="0"><code>&#34;error&#34;:&#34;while getting secret rook-ceph-object-user-rgw-bulk-cnpg-backup-keycloak: secrets \&#34;rook-ceph-object-use
r-rgw-bulk-cnpg-backup-keycloak\&#34; is forbidden: User \&#34;system:serviceaccount:testsetup:keycloak-pg-cluster\&#34; cannot get resource \&#34;secrets\&#34; in API group \&#34;\&#34; in the namespace \&#34;testsetup\&#34;&#34;
</code></pre><p>Don&rsquo;t be alarmed by it; it just seemed to be a transient error which went away
on its own.</p>
<p>After a couple of moments, CloudNativePG should start streaming the WALs to
the S3 bucket already. For this example, the path in the bucket looks like this:</p>
<pre tabindex="0"><code>s3://backup-cnpg/keycloak-pg-cluster/wals/
</code></pre><p>I like the fact that CloudNativePG doesn&rsquo;t just assume that the cluster has the
entire bucket to itself, but instead puts the data into a directory under the
root, allowing me to put the backups of all the clusters into the same bucket.</p>
<p>But the WALs are only part of the backup. The second part is the full <code>PGDATA</code>
backup, and that&rsquo;s done via a <a href="https://cloudnative-pg.io/documentation/current/cloudnative-pg.v1/#postgresql-cnpg-io-v1-ScheduledBackup">ScheduledBackup</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">postgresql.cnpg.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ScheduledBackup</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">test-backup</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">method</span>: <span style="color:#ae81ff">barmanObjectStore</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">immediate</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">schedule</span>: <span style="color:#e6db74">&#34;0 0 0 * * *&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">backupOwnerReference</span>: <span style="color:#ae81ff">self</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">cluster</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">keycloak-pg-cluster</span>
</span></span></code></pre></div><p>This runs a backup every day at exactly midnight. Note that CloudNativePG uses a
six-field cron format, with a leading seconds field. The <code>immediate: true</code> setting
tells CloudNativePG to take a first backup as soon as the ScheduledBackup is
created. This is another one of those nice little features, avoiding the typical
futzing with the schedule to make it start a backup in five minutes for testing.</p>
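<p>As an aside, a one-off backup outside the schedule can also be requested with
the <code>Backup</code> resource from the same API group. A minimal sketch, reusing the
cluster name from above:</p>
<pre tabindex="0"><code>apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: manual-backup
spec:
  method: barmanObjectStore
  cluster:
    name: keycloak-pg-cluster
</code></pre>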
<h2 id="recovery">Recovery</h2>
<p>And finally, let&rsquo;s have a look at how to use the previously created backups
to create a new cluster, as an example of a post-incident recovery operation.</p>
<p>It&rsquo;s important here that while I deleted the old Cluster completely, I left the
S3 bucket user and its associated Secret in place. Those credentials are needed
so CloudNativePG can get at the data from the old cluster.</p>
<p>Recovery itself is pretty straightforward. Instead of adding a <code>bootstrap.initdb</code>
key, the <code>bootstrap.recovery</code> key is used, like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">instances</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">bootstrap</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">recovery</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">database</span>: <span style="color:#ae81ff">keycloak</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">keycloak</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">source</span>: <span style="color:#ae81ff">keycloak-pg-cluster</span>
</span></span></code></pre></div><p>The <code>database</code> and <code>owner</code> keys need to be set. Without them, CloudNativePG will
still read in the old database from the backups, but it will also create the default
<code>app</code> database and generate new Secrets for that DB, instead of for the restored
<code>keycloak</code> DB.</p>
<p>The <code>source: keycloak-pg-cluster</code> references an entry in the <code>externalClusters</code>
section, which looks almost like the <code>backup:</code> section of the original cluster&rsquo;s
config:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  <span style="color:#f92672">externalClusters</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">keycloak-pg-cluster</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">barmanObjectStore</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">endpointURL</span>: <span style="color:#ae81ff">http://rook-ceph-rgw-rgw-bulk.rook-cluster.svc:80</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">destinationPath</span>: <span style="color:#e6db74">&#34;s3://backup-cnpg/&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">s3Credentials</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">accessKeyId</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-ceph-object-user-rgw-bulk-cnpg-backup-keycloak</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">key</span>: <span style="color:#ae81ff">AccessKey</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">secretAccessKey</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-ceph-object-user-rgw-bulk-cnpg-backup-keycloak</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">key</span>: <span style="color:#ae81ff">SecretKey</span>
</span></span></code></pre></div><p>Here again, CloudNativePG will assume that the backups are stored under the
<code>name</code> of the cluster in the bucket. The restore with this config worked without
issue, and after it was done, I saw my previously created test realm and users
in Keycloak again.</p>
<h1 id="conclusion">Conclusion</h1>
<p>Phew. Okay. This setup took way longer than I had initially thought, mostly
because I almost jumped ship after the initial research, when I found out that
it doesn&rsquo;t really support multi-DB clusters, and also because I ended up deciding
that I would like to get hands-on with the recovery procedure before I actually
need it.</p>
<p>One of the potential downsides is that, just by virtue of running multiple
clusters, this will very likely need more resources than my current single-cluster
setup.
I&rsquo;m also still not too happy with the backup; I would have preferred something
closer to restic, or at least something with deduplication. I will probably at
least enable compression by default at some point, to save on storage space on
the S3 bucket.</p>
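<p>For reference, enabling compression should just be a matter of extending the
<code>barmanObjectStore</code> section. A minimal sketch, assuming gzip as the algorithm
(bzip2 and snappy should also be supported):</p>
<pre tabindex="0"><code>backup:
  barmanObjectStore:
    # ... endpointURL, destinationPath and s3Credentials as before ...
    wal:
      compression: gzip
    data:
      compression: gzip
</code></pre>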
<p>Then again, let&rsquo;s be honest here: Complexity is sort of the goal. &#x1f913;</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 7: Ansible Plays for Host Updates</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-7-node-updates/</link>
      <pubDate>Sat, 17 Feb 2024 22:45:46 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-7-node-updates/</guid>
      <description>It really needs to be automated</description>
      <content:encoded><![CDATA[<p>Wherein I add the Kubernetes nodes to my host update Ansible playbook.</p>
<p>This is part eight of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>With the number of hosts I&rsquo;ve now got in my Homelab, I definitely need a better
way to update them than manually SSH&rsquo;ing into each. So a while ago, I created
an Ansible playbook to update all hosts in my Homelab. These updates are also
one of the reasons I keep so many physical hosts, even if they&rsquo;re individually
relatively small: I want an environment where I can take down any given host
for updates without anything at all breaking, and especially without having to
take the entire lab down for a regular host update.</p>
<p>My node updates need to execute the following sequence:</p>
<ol>
<li>Drain all Pods from the node</li>
<li>Run <code>apt update</code></li>
<li>Run <code>apt upgrade</code></li>
<li>Reboot the machine</li>
<li>Uncordon the machine</li>
<li>Run <code>apt autoremove</code></li>
</ol>
<p>I&rsquo;ve got a couple of different classes of nodes in my Homelab, but I will
concentrate only on those related to k8s in this post:</p>
<ol>
<li>Control plane nodes. These run the kubeadm control plane Pods and Ceph MONs.</li>
<li>Ceph nodes. These run the Ceph OSDs providing storage to the Homelab and some
other Ceph services.</li>
<li>Worker nodes. These run my Kubernetes workloads.</li>
</ol>
<p>All three apply some variations to the above sequence of steps. Each class of
node has its own play, and the plays all run in sequence, not in parallel to
each other, to ensure stability of the overall cluster. I&rsquo;m reasonably sure
that with some fancy footwork, I could probably run them in parallel as well.
But the main goal of this setup is that I enter a single command, and then I
can do something completely different, without having to babysit the update.
If it takes an hour longer but I can just go and read something while it&rsquo;s
running, that&rsquo;s an okay trade-off for me. Those of you following me on Mastodon
can probably tell when my update Fridays are just by the volume of posts I make
on those evenings. &#x1f605;</p>
<p>The first difference between the plays for each class of node is their degree
of parallelism. For this, I&rsquo;m using Ansible&rsquo;s <a href="https://docs.ansible.com/ansible/latest/collections/ansible/builtin/linear_strategy.html">Linear Strategy</a>
together with the <code>serial</code> keyword.
Both the control plane and Ceph nodes run with <code>serial: 1</code>, to make sure there
are always enough nodes up to keep the Homelab chugging along.
The worker nodes, on the other hand, are allowed to run with <code>serial: 2</code>, updating
two hosts in parallel, as I should have enough slack in the cluster to keep at
least most things running even with two fewer nodes.</p>
<p>For draining the nodes, I initially used the <a href="https://docs.ansible.com/ansible/latest/collections/kubernetes/core/k8s_drain_module.html">k8s_drain_module</a>.
But I had a problem with that one, namely getting <code>too many requests</code> errors:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>fatal: <span style="color:#f92672">[</span>node1 -&gt; cnc<span style="color:#f92672">]</span>: FAILED! <span style="color:#f92672">=</span>&gt; <span style="color:#f92672">{</span><span style="color:#e6db74">&#34;changed&#34;</span>: false, <span style="color:#e6db74">&#34;msg&#34;</span>: <span style="color:#e6db74">&#34;Failed to delete pod rook-cluster/rook-ceph-osd-1-7977658495-nt6ps due to: Too Many Requests&#34;</span><span style="color:#f92672">}</span>
</span></span></code></pre></div><p>I didn&rsquo;t always get the error; sometimes it just worked. Even after spending
quite a while googling, I&rsquo;m still not sure where those errors come from: whether
it&rsquo;s the kube-apiserver which returns them, or whether it has something to do
with Pod disruption budgets, since the eviction API does answer with
<code>429 Too Many Requests</code> when an eviction would currently violate a
PodDisruptionBudget. I then switched to executing <code>kubectl</code> via the <a href="https://docs.ansible.com/ansible/latest/collections/ansible/builtin/command_module.html">command module</a>.
This worked without issue. The task for draining a node looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">drain node</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">cnc</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">my_user</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">command</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">argv</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/home/my_user/.local/bin/kubectl</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">drain</span>
</span></span><span style="display:flex;"><span>      - --<span style="color:#ae81ff">delete-emptydir-data=true</span>
</span></span><span style="display:flex;"><span>      - --<span style="color:#ae81ff">force=true</span>
</span></span><span style="display:flex;"><span>      - --<span style="color:#ae81ff">ignore-daemonsets=true</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#34;{{ ansible_hostname }}&#34;</span>
</span></span></code></pre></div><p>You need to supply the absolute path to the <code>kubectl</code> binary, as this runs
the command directly, not inside a shell, so there are no PATH extensions and the
like. I&rsquo;m also delegating this task to my Command &amp; Control host, which is the
only machine with Kubernetes certs. The <code>--delete-emptydir-data=true</code> flag is needed
because Cilium uses <a href="https://kubernetes.io/docs/concepts/storage/volumes/#emptydir">emptyDir</a>
for some temporary storage, and without it, the drain fails.
<code>--force</code> allows the drain to proceed even when a node runs standalone Pods
that are not managed by a controller such as a Deployment or StatefulSet. Finally,
<code>--ignore-daemonsets</code> is necessary on nodes running DaemonSet Pods, which
cannot be evicted and are skipped instead; in my case, that&rsquo;s for example
the Fluentbit log shipper.</p>
<p>The rest of the play for my worker nodes looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">hosts</span>: <span style="color:#ae81ff">kube_workers</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Update kubernetes worker nodes</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">k8s-workers</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serial</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>: <span style="color:#ae81ff">linear</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">pre_tasks</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">drain node</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">cnc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">my_user</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">command</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">argv</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#ae81ff">/home/my_user/.local/bin/kubectl</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#ae81ff">drain</span>
</span></span><span style="display:flex;"><span>          - --<span style="color:#ae81ff">delete-emptydir-data=true</span>
</span></span><span style="display:flex;"><span>          - --<span style="color:#ae81ff">force=true</span>
</span></span><span style="display:flex;"><span>          - --<span style="color:#ae81ff">ignore-daemonsets=true</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;{{ ansible_hostname }}&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">run apt upgrade</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">apt</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">apt</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">install_recommends</span>: <span style="color:#66d9ef">no</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">update_cache</span>: <span style="color:#66d9ef">yes</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">upgrade</span>: <span style="color:#66d9ef">yes</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">reboot machine</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">reboot</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">reboot</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wait for the machine to accept ansible commands again</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">reboot</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wait_for_connection</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">clear OSD blocklist</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">my_user</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">cnc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">ceph</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">command</span>: <span style="color:#ae81ff">/home/my_user/.krew/bin/kubectl-rook_ceph --operator-namespace rook-ceph -n rook-cluster ceph osd blocklist clear</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">uncordon node</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">cnc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">my_user</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">kubernetes.core.k8s_drain</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;{{ ansible_hostname }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">uncordon</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">run autoremove</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">apt</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">apt</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">autoremove</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pause for one minute</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">pause</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">minutes</span>: <span style="color:#ae81ff">1</span>
</span></span></code></pre></div><p>The <code>clear OSD blocklist</code> task clears Ceph&rsquo;s client blocklist. Most of my worker
nodes don&rsquo;t have any storage of their own, and instead use netboot and a Ceph
RBD volume for their root FS. And sometimes, Ceph puts such clients on a blocklist,
as I&rsquo;ve explained in more detail <a href="https://blog.mei-home.net/posts/netboot-prob-virtualbox/">here</a>;
this task removes those entries again. I&rsquo;m also giving all my plays a pause at the
end, to afford the cluster some time to settle again before the next batch of
workers is taken down.</p>
<p>For the Ceph nodes, I need to get a little bit more involved. I start out with
pre tasks and post tasks, which set and later unset the Ceph <code>noout</code> flag. This flag tells Ceph
not to mark OSDs as <code>out</code> when they go down. With the flag unset, which is
the default, Ceph would start re-balancing data between the still-available OSDs
once an OSD has been out of the cluster for some time. That&rsquo;s useful for genuine
failures, but for a planned reboot, the <code>noout</code> flag can be used to tell
Ceph that it&rsquo;s going to be okay.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">set osd noout</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">cnc</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">my_user</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">command</span>: <span style="color:#ae81ff">/home/my_user/.krew/bin/kubectl-rook_ceph --operator-namespace rook-ceph -n rook-cluster ceph osd set noout</span>
</span></span></code></pre></div><p>As you can see here, I&rsquo;m again delegating execution of the command to my C&amp;C host,
as no other host has the necessary k8s certs. This command also needs an absolute
path, though this time not to <code>kubectl</code> itself, but to the binary
of the <a href="https://github.com/rook/kubectl-rook-ceph">rook-ceph plugin</a>. Normally
I would invoke it as <code>kubectl rook-ceph ...</code>, but that does not work with the
<code>command</code> module, so I call the plugin binary directly.</p>
<p>The next extra, compared to the worker node play, is that I&rsquo;m actively waiting
for the Ceph OSDs to come back. This is important to make sure that I don&rsquo;t
start the updates of the next OSD node before the previous one is back up and
running, because with the OSDs of two hosts down at the same time, placement
groups could become unavailable. For one thing, I&rsquo;ve got workloads already using
the Ceph storage. For another, most of my worker nodes use storage from Ceph for
their root disks.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wait for OSDs to start</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">cnc</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">my_user</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">ceph</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">command</span>: <span style="color:#ae81ff">/home/my_user/.krew/bin/kubectl-rook_ceph --operator-namespace rook-ceph -n rook-cluster ceph osd status &#34;{{ ansible_hostname }}&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">register</span>: <span style="color:#ae81ff">ceph_end</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">until</span>: <span style="color:#e6db74">&#39;(ceph_end.stdout | regex_findall(&#34;.*,up.*&#34;, multiline=True) | list | length) == (ceph_end.stdout_lines | length - 1)&#39;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">retries</span>: <span style="color:#ae81ff">12</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">delay</span>: <span style="color:#ae81ff">10</span>
</span></span></code></pre></div><p>This task waits for a maximum of 120 seconds for the node&rsquo;s OSDs to come up.
The output of <code>ceph osd status</code> looks like this:</p>
<pre tabindex="0"><code>ID  HOST     USED  AVAIL  WR OPS  WR DATA  RD OPS  RD DATA  STATE      
 0  nakith   191G  7260G      0        0       0        0   exists,up  
 1  nakith   809M  1862G      4     11.1k      5       26   exists,up  
 2  neper    391M   931G      0        0       1        0   exists,up  
 3  neper    191G  3534G      0        0       0     14.2k  exists,up  
</code></pre><p>That output is then parsed with a regex, and the number of lines with <code>up</code> in the
state column is compared to the total number of output lines minus one, which
accounts for the header line.
Just for completeness&rsquo; sake, here is the full Ceph play:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">hosts</span>: <span style="color:#ae81ff">kube_ceph</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Update kubernetes Ceph nodes</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">k8s-ceph</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serial</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>: <span style="color:#ae81ff">linear</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">pre_tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">set osd noout</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">cnc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">my_user</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">command</span>: <span style="color:#ae81ff">/home/my_user/.krew/bin/kubectl-rook_ceph --operator-namespace rook-ceph -n rook-cluster ceph osd set noout</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">drain node</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">ceph</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">cnc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">my_user</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">command</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">argv</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#ae81ff">/home/my_user/.local/bin/kubectl</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#ae81ff">drain</span>
</span></span><span style="display:flex;"><span>          - --<span style="color:#ae81ff">delete-emptydir-data=true</span>
</span></span><span style="display:flex;"><span>          - --<span style="color:#ae81ff">force=true</span>
</span></span><span style="display:flex;"><span>          - --<span style="color:#ae81ff">ignore-daemonsets=true</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;{{ ansible_hostname }}&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">run apt upgrade</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">apt</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">apt</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">install_recommends</span>: <span style="color:#66d9ef">no</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">update_cache</span>: <span style="color:#66d9ef">yes</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">upgrade</span>: <span style="color:#66d9ef">yes</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">reboot machine</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">reboot</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">reboot</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wait for the machine to accept ansible commands again</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">reboot</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wait_for_connection</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">uncordon node</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">cnc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">my_user</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">kubernetes.core.k8s_drain</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;{{ ansible_hostname }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">uncordon</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wait for OSDs to start</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">cnc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">my_user</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">ceph</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">command</span>: <span style="color:#ae81ff">/home/my_user/.krew/bin/kubectl-rook_ceph --operator-namespace rook-ceph -n rook-cluster ceph osd status &#34;{{ ansible_hostname }}&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">register</span>: <span style="color:#ae81ff">ceph_end</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">until</span>: <span style="color:#e6db74">&#39;(ceph_end.stdout | regex_findall(&#34;.*,up.*&#34;, multiline=True) | list | length) == (ceph_end.stdout_lines | length - 1)&#39;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">retries</span>: <span style="color:#ae81ff">12</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">delay</span>: <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">run autoremove</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">apt</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">apt</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">autoremove</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pause for two minutes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">ceph</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">pause</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">minutes</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">post_tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">unset osd noout</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">cnc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">my_user</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">command</span>: <span style="color:#ae81ff">/home/my_user/.krew/bin/kubectl-rook_ceph --operator-namespace rook-ceph -n rook-cluster ceph osd unset noout</span>
</span></span></code></pre></div><p>And finally, the control plane nodes. The main addition here is that I&rsquo;m using
Ansible&rsquo;s <a href="https://docs.ansible.com/ansible/latest/collections/ansible/builtin/wait_for_module.html">wait_for module</a>
to wait until the CP components are up again. Or, to be more precise, to wait
until their ports are open, as I&rsquo;m not really doing a readiness check.
Here is the play:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">hosts</span>: <span style="color:#ae81ff">kube_controllers</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Update k8s controller hosts</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">k8s-controller</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serial</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>: <span style="color:#ae81ff">linear</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">drain node</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">cnc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">my_user</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">command</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">argv</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#ae81ff">/home/my_user/.local/bin/kubectl</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#ae81ff">drain</span>
</span></span><span style="display:flex;"><span>          - --<span style="color:#ae81ff">delete-emptydir-data=true</span>
</span></span><span style="display:flex;"><span>          - --<span style="color:#ae81ff">force=true</span>
</span></span><span style="display:flex;"><span>          - --<span style="color:#ae81ff">ignore-daemonsets=true</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;{{ ansible_hostname }}&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">run apt upgrade</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">apt</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">apt</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">install_recommends</span>: <span style="color:#66d9ef">no</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">update_cache</span>: <span style="color:#66d9ef">yes</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">upgrade</span>: <span style="color:#66d9ef">yes</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">reboot machine</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">reboot</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">reboot</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wait for the machine to accept ansible commands again</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">reboot</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wait_for_connection</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">uncordon node</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">cnc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">my_user</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">kubernetes.core.k8s_drain</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;{{ ansible_hostname }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">uncordon</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wait for kubelet to start</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wait_for</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">host</span>: <span style="color:#e6db74">&#34;{{ ansible_default_ipv4.address }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">port</span>: <span style="color:#ae81ff">10250</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">sleep</span>: <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">started</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wait for kube-apiserver to start</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wait_for</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">host</span>: <span style="color:#e6db74">&#34;{{ ansible_default_ipv4.address }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">port</span>: <span style="color:#ae81ff">6443</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">sleep</span>: <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">started</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wait for kube-vip to start</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wait_for</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">host</span>: <span style="color:#e6db74">&#34;{{ ansible_default_ipv4.address }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">port</span>: <span style="color:#ae81ff">2112</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">sleep</span>: <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">started</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wait for etcd to start</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wait_for</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">host</span>: <span style="color:#e6db74">&#34;{{ ansible_default_ipv4.address }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">port</span>: <span style="color:#ae81ff">2379</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">sleep</span>: <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">started</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wait for ceph mon to start</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">ceph</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wait_for</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">host</span>: <span style="color:#e6db74">&#34;{{ ansible_default_ipv4.address }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">port</span>: <span style="color:#ae81ff">6789</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">sleep</span>: <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">started</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">run autoremove</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">apt</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">apt</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">autoremove</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pause for one minute after controller update</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">pause</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">minutes</span>: <span style="color:#ae81ff">1</span>
</span></span></code></pre></div><p>The additional waits for the ports of all the CP components to accept
connections are a bit of insurance, to make sure the node is fully up again. This
could certainly be improved by checking the Pod status via <code>kubectl</code> instead,
but this approach has served me well for about a year now in my Nomad cluster,
so it should be fine here as well.</p>
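<p>If I ever get around to that improvement, such a check might look roughly like
the following sketch. Note that the field selector and the <code>until</code> condition are
untested guesses at a workable approach, not something I run in production:</p>
<pre tabindex="0"><code>- name: wait for control plane pods on this node to be Running
  delegate_to: cnc
  become_user: my_user
  command:
    argv:
      - /home/my_user/.local/bin/kubectl
      - get
      - pods
      - --namespace=kube-system
      - --field-selector=spec.nodeName={{ ansible_hostname }}
      - --no-headers
  register: cp_pods
  until: &#34;cp_pods.stdout_lines | select(&#39;search&#39;, &#39;Running&#39;) | list | length == cp_pods.stdout_lines | length&#34;
  retries: 12
  delay: 10
</code></pre>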
<p>And with that, I&rsquo;ve finally got my Kubernetes nodes in the regular updates as
well. It was really high time: I set the nodes up back on the 20th of December
and hadn&rsquo;t updated them since. &#x1f62c;</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 6: Logging with FluentD, Fluentbit and Loki</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-6-logging/</link>
      <pubDate>Tue, 13 Feb 2024 00:20:32 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-6-logging/</guid>
      <description>How I&amp;#39;m getting logs from /var/log/containers to Grafana</description>
      <content:encoded><![CDATA[<p>Wherein I document how I migrated my logging setup from Nomad to k8s.</p>
<p>This is part seven of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<h1 id="setup-overview">Setup overview</h1>
<p>Let&rsquo;s start with an overview of the setup.</p>
<figure>
    <img loading="lazy" src="logging-structure.png"
         alt="A diagram of the different logging stages. The top box shows redis logs, with the following content: &#39;1:M 28 Jan 2024 20:31:08.432 * Background saving terminated with success&#39;. This is a standard redis log line. The next box down shows the CRI-O logs, stored in /var/log/containers. It has the same log line as before, but now prepended with a timestamp, the output stream, &#39;stdout&#39; in this case, and the letter &#39;F&#39;. Next comes the Fluentbit log. This shows the original log line in a variable called &#39;log&#39;. There are additional variables, &#39;namespace_name&#39;, &#39;container_name&#39; and &#39;labels&#39;. Another box with an arrow going to the Fluentbit box indicates that the log was enhanced with the help of data from the kube-apiserver. Next come the Fluentd logs. The original log line, minus the timestamp and &#39;*&#39;, is now in a variable called &#39;msg&#39;, with the timestamp now in a variable called &#39;time&#39;. In addition, a new variable &#39;level&#39; with value &#39;info&#39; as been added. From FluentD, the next station is &#39;Loki&#39;, which stores the data in &#39;S3&#39; and &#39;Grafana&#39;, which takes input from Loki to display the logs."/> <figcaption>
            <p>Overview of my logging pipeline.</p>
        </figcaption>
</figure>

<p>It all starts out with the app&rsquo;s logs, which are written to stdout in the container.
Then, my container runtime, <a href="https://cri-o.io/">cri-o</a>, takes that output and writes
it to files in <code>/var/log/containers/</code> by default. It also prefixes each log line
with the timestamp, the stream it came from, and <code>F</code> for full log lines
or <code>P</code> for partial log lines. If I understand correctly, that&rsquo;s just the
standard log format Kubernetes and/or the <a href="https://github.com/kubernetes/cri-api/tree/master">CRI spec</a>
expect.</p>
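<p>To illustrate, the redis line from the diagram above would end up on disk roughly
like this (the exact timestamp formatting is an assumption on my part):</p>
<pre tabindex="0"><code>2024-01-28T20:31:08.432812345+01:00 stdout F 1:M 28 Jan 2024 20:31:08.432 * Background saving terminated with success
</code></pre>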
<p>To collect all of these logs, I have a <a href="https://fluentbit.io/">Fluentbit</a>
DaemonSet deployed on all nodes. Mostly, this serves as a pretty dumb log shipper,
tailing all files under <code>/var/log/containers</code>. But before sending the logs
onward, I&rsquo;m enriching them with some k8s information, like labels, via the
<a href="https://docs.fluentbit.io/manual/pipeline/filters/kubernetes">Kubernetes filter</a>.</p>
<p>To gather the logs from all containers in the cluster, as well as the logs from
each host&rsquo;s JournalD instance, I&rsquo;m using <a href="https://www.fluentd.org/">FluentD</a>,
to which Fluentbit sends the logs via Fluent&rsquo;s own <a href="https://docs.fluentd.org/input/forward">Forward protocol</a>.
Here, the main task is parsing the log lines themselves and bringing them all
into a coherent format. This includes normalizing the many different ways apps
express log levels, so I end up with only a few distinct levels.</p>
<p>FluentD then sends those logs to <a href="https://grafana.com/oss/loki/">Loki</a> for
long-term storage. There I then access them either via Grafana&rsquo;s explore
feature when I&rsquo;m actively looking for something, or via the <a href="https://grafana.com/docs/grafana/latest/panels-visualizations/visualizations/logs/">Log panel</a>:
<figure>
    <img loading="lazy" src="grafana-log-panel.png"
         alt="A screenshot of a Grafana log panel. At the top are two drop-downs. The first one is labeled &#39;Namespace&#39; and shows the value &#39;redis&#43;external-dns&#39;. The second one is labeled &#39;Container&#39; and has the value &#39;All&#39;. Below is a list of log lines. On the left side of each line is a colored indicator showing the line&#39;s severity. Next comes the timestamp, followed by the log line&#39;s labels. These are either &#39;external-dns&#39;/&#39;external-dns or &#39;redis&#39;/&#39;redis&#39; in this screenshot. After that come the log entries themselves, showing individual log elements with a &#39;key=value&#39; syntax. The actual log line content is not important here, as this screenshot is only intended to illustrate the final state of the logging setup."/> <figcaption>
            <p>Example of what the final stage of the setup will look like.</p>
        </figcaption>
</figure>
</p>
<p>I&rsquo;ve got separate dashboards for syslogs from the cluster hosts and the running
apps.</p>
<p>One question you might ask is: Why not use Fluentbit for host logs as well?
The reason, as weird as it might sound, is unification: not all of my hosts run
containers, and some never will. Examples are my OpenWRT WiFi access point
and my OPNsense router. They both speak the <a href="https://de.wikipedia.org/wiki/Syslog">Syslog protocol</a>,
but don&rsquo;t run containers and aren&rsquo;t part of the k8s cluster. The same goes for my
desktop as well as several other hosts in the Homelab. So I&rsquo;ve found it makes
more sense to standardize on <a href="https://www.syslog-ng.com/">syslog-ng</a> and the
syslog protocol than to run Fluentbit everywhere.</p>
<h2 id="differences-to-nomad-setup">Differences to Nomad setup</h2>
<p>A short note on the differences to what I had before, in Nomad. The basic setup
was the same, with the main difference being that my Nomad cluster runs on
Docker, and I&rsquo;ve been using Docker&rsquo;s <a href="https://docs.docker.com/config/containers/logging/fluentd/">FluentD logging driver</a>.
Instead of writing the logs to a file, this driver speaks FluentD&rsquo;s Forward
protocol and can send directly to the Fluentbit instance, without the detour
via a file on disk.
In the beginning, I had Nomad/Docker configured in such a way that the first
time a log line touched a disk was when it was written to S3 after being
delivered to Loki. But this had the downside that when Loki, FluentD, Fluentbit
or Grafana were down, I didn&rsquo;t have a convenient way to get at my logs. So I
ended up enabling Nomad&rsquo;s log writing anyway.</p>
<p>I didn&rsquo;t follow the same approach for k8s simply because it seemed that k8s
requires the logs on disk anyway, for e.g. <code>kubectl logs</code> access.</p>
<h1 id="loki-setup">Loki setup</h1>
<p>So let&rsquo;s start with the first component. I began at the top of the stack with
<a href="https://grafana.com/oss/loki/">Loki</a> and worked my way down to
Fluentbit, mostly because this way, I could simply disable the Nomad deployments
of Loki and FluentD, instead of reconfiguring them to also
accept logs from k8s just to then switch the deployments off a couple of days
later.</p>
<p>For Loki, I wrote my own Helm chart. Loki <a href="https://grafana.com/docs/loki/latest/setup/install/helm/">does provide Helm charts</a>.
But they have one glaring downside:</p>
<blockquote>
<p>If you set the singleBinary.replicas value to 1, this chart configures Loki to run the all target in a monolithic mode, designed to work with a filesystem storage.</p></blockquote>
<p>See <a href="https://grafana.com/docs/loki/latest/setup/install/helm/install-monolithic/">here</a>.
The Helm chart does not allow any form of storage besides the local
file system when using a single binary. And I have absolutely no reason to use
the clustered deployment of Loki. But I do want to use S3 for storage, as I find
the &ldquo;just a big lake of data&rdquo; approach of S3 pretty nice, especially for logs.</p>
<p>To begin with, let&rsquo;s look at the values file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">commonLabels</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">loki</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">port</span>: <span style="color:#ae81ff">3100</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">lokiPath</span>: <span style="color:#e6db74">&#34;/hl-loki&#34;</span>
</span></span></code></pre></div><p>I&rsquo;ve kept this pretty simple. As it&rsquo;s my own chart, I&rsquo;m only using Helm values
for things which I need to reference in multiple different manifests. So this
only contains the port Loki will listen on, my common <code>homelab/part-of</code> label
and the path where some scratch storage will be mounted as a working directory.</p>
<p>Instead of starting with the Deployment and going from there, this time I will
start with all the manifests used in it. Let&rsquo;s see whether that improves the
reading flow.</p>
<p>First, the S3 bucket. This bucket will be used as storage for the logs and
the index. I will create it with Rook Ceph&rsquo;s <a href="https://rook.io/docs/rook/latest-release/Storage-Configuration/Object-Storage-RGW/ceph-object-bucket-claim/">Object Bucket Claim</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">objectbucket.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ObjectBucketClaim</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">logs</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">generateBucketName</span>: <span style="color:#ae81ff">logs</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storageClassName</span>: <span style="color:#ae81ff">rgw-bulk</span>
</span></span></code></pre></div><p>This claim will create a bucket in Ceph&rsquo;s RGW S3 implementation and will
generate a set of credentials to access that bucket, in the form of both a
Secret and a ConfigMap. The bucket&rsquo;s name is partially randomly generated, and
in my case, the bucket is called <code>logs-4138cb40-b96c-4526-b47e-f474a4978775</code>.
The Secret will be named after the <code>generateBucketName</code>, so in this case it
will just be called <code>logs</code>. It contains the values <code>AWS_ACCESS_KEY_ID</code> and
<code>AWS_SECRET_ACCESS_KEY</code>. This way, when used via the <code>envFrom</code> functionality
in a Pod spec, it automatically exposes the S3 credentials in the standard ENV
variables.</p>
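<p>For illustration, here is roughly what that generated Secret looks like; this is a sketch with the actual key values redacted, not a verbatim dump from my cluster:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml">apiVersion: v1
kind: Secret
metadata:
  name: logs
type: Opaque
stringData:
  # Values redacted; the OBC controller fills these in
  AWS_ACCESS_KEY_ID: &#34;...&#34;
  AWS_SECRET_ACCESS_KEY: &#34;...&#34;
</code></pre></div>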
<p>The ConfigMap generated by the OBC is also called <code>logs</code>. It contains the
values necessary to configure an application to access the bucket:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">BUCKET_HOST</span>: <span style="color:#ae81ff">rgw-service.ceph-cluster-namespace.svc</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">BUCKET_NAME</span>: <span style="color:#ae81ff">logs-4138cb40-b96c-4526-b47e-f474a4978775</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">BUCKET_PORT</span>: <span style="color:#e6db74">&#34;80&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">BUCKET_REGION</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">BUCKET_SUBREGION</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ConfigMap</span>
</span></span></code></pre></div><p>This configuration is then consumed via environment variables, which are
referenced in the Loki configuration file that I provide via a ConfigMap:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ConfigMap</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">loki-conf</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">loki.yaml</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    target: &#34;all&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    auth_enabled: false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    server:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      http_listen_port: {{ .Values.port }}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      grpc_server_max_recv_msg_size: 8000000
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      log_format: &#34;json&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    common:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      path_prefix: {{ .Values.lokiPath }}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      instance_addr: 127.0.0.1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      storage:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        s3:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          s3forcepathstyle: true
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          bucketnames: ${BUCKET_NAME}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          endpoint: ${BUCKET_HOST}:${BUCKET_PORT}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          access_key_id: &#34;${AWS_ACCESS_KEY_ID}&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          secret_access_key: &#34;${AWS_SECRET_ACCESS_KEY}&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          insecure: true
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      ring:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        kvstore:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          store: &#34;inmemory&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    query_range:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      cache_results: true
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      results_cache:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        cache:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          redis:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            endpoint: redis.redis.svc.cluster.local:6379
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          embedded_cache:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            enabled: false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    ingester:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      wal:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        enabled: false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      lifecycler:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        ring:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          replication_factor: 1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          kvstore:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            store: &#34;inmemory&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    storage_config:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      boltdb_shipper:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        active_index_directory: {{ .Values.lokiPath }}/active_index
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        shared_store: &#34;s3&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        cache_location: {{ .Values.lokiPath }}/shipper_cache
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    chunk_store_config:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      chunk_cache_config:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        redis:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          endpoint: redis.redis.svc.cluster.local:6379
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    schema_config:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      configs:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        - from: 2024-01-01
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          store: boltdb-shipper
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          object_store: s3
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          schema: v12
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          index:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            prefix: index_
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            period: 24h
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    compactor:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      working_directory: {{ .Values.lokiPath }}/compactor/
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      shared_store: s3
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      retention_enabled: true
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    limits_config:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      reject_old_samples: false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      max_entries_limit_per_query: 10000
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      deletion_mode: &#34;filter-and-delete&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      retention_period: 1y</span>
</span></span></code></pre></div><p>Note that the <code>${ENV_VAR_NAME}</code> syntax is a feature of Loki&rsquo;s config
file parsing; it doesn&rsquo;t have anything to do with k8s directly.
I will show the CLI option which needs to be handed to Loki later.</p>
<p>I kept this config relatively simple. The <code>common:</code> config defines a couple
of shared components, most importantly the S3 storage. Further, I&rsquo;m also
configuring Redis for caching.</p>
<p>Last, before coming to the Deployment, here&rsquo;s the scratch volume, on my Ceph
SSD pool:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">PersistentVolumeClaim</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">scratch-volume</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storageClassName</span>: <span style="color:#ae81ff">rbd-fast</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">accessModes</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">ReadWriteOnce</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">storage</span>: <span style="color:#ae81ff">2Gi</span>
</span></span></code></pre></div><p>This is merely a small workspace for Loki. As long as it gets enough time during
shutdown, it will upload all relevant data to S3.</p>
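<p>To give it that time, the Pod&rsquo;s termination grace period can be raised. A minimal sketch; the 60-second value is an assumption for illustration, not necessarily what I run:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"># In the Deployment Pod template spec:
spec:
  # k8s waits this long after SIGTERM before sending SIGKILL,
  # giving Loki time to flush its working data to S3 (default: 30s)
  terminationGracePeriodSeconds: 60
</code></pre></div>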
<p>And now, the Deployment:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">loki</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">app</span>: <span style="color:#ae81ff">loki</span>
</span></span><span style="display:flex;"><span>      {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>      {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>      {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#e6db74">&#34;Recreate&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">app</span>: <span style="color:#ae81ff">loki</span>
</span></span><span style="display:flex;"><span>        {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>        {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>        {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">checksum/config</span>: {{ <span style="color:#ae81ff">include (print $.Template.BasePath &#34;/config.yaml&#34;) . | sha256sum }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">automountServiceAccountToken</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">securityContext</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">fsGroup</span>: <span style="color:#ae81ff">1000</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">loki</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">grafana/loki:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-config.expand-env=true&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-config.file={{ .Values.lokiPath }}/conf/loki.yaml&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">envFrom</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">configMapRef</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">name</span>: <span style="color:#ae81ff">logs</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">secretRef</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">name</span>: <span style="color:#ae81ff">logs</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">scratch</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: {{ <span style="color:#ae81ff">.Values.lokiPath }}</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">config</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: {{ <span style="color:#ae81ff">.Values.lokiPath }}/conf/</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">100m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">400Mi</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">livenessProbe</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">httpGet</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.port }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;/ready&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">initialDelaySeconds</span>: <span style="color:#ae81ff">15</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">periodSeconds</span>: <span style="color:#ae81ff">30</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">loki-http</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.port }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">scratch</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">persistentVolumeClaim</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">claimName</span>: <span style="color:#ae81ff">scratch-volume</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">config</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">configMap</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#ae81ff">loki-conf</span>
</span></span></code></pre></div><p>The first config to look at is the <code>Recreate</code> update strategy, which is needed
because I&rsquo;m mounting a PVC. With the default <code>RollingUpdate</code> strategy, the
fresh Pods won&rsquo;t be able to start, because the volume will still be mounted
by the old Pod.</p>
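<p>For contrast, here is a sketch of what the default strategy expands to. The percentages are the k8s defaults; with <code>replicas: 1</code>, they mean the new Pod is started while the old one still holds the ReadWriteOnce volume:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml">strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 25%        # rounds up to 1 extra Pod when replicas is 1
    maxUnavailable: 25%  # rounds down to 0, so the old Pod keeps running
</code></pre></div>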
<p>I&rsquo;m also applying an interesting trick in the Pod&rsquo;s annotations:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">checksum/config</span>: {{ <span style="color:#ae81ff">include (print $.Template.BasePath &#34;/config.yaml&#34;) . | sha256sum }}</span>
</span></span></code></pre></div><p>This takes the ConfigMap I showed before, templates it, and then computes the
SHA256 hash of the resulting string. Because the annotation is part of the Pod
template, any change to it counts as a template change, so k8s rolls the
Deployment whenever the hash changes. This way, the Pod is automatically
recreated when the Loki config changes.</p>
<p>I also mentioned above that a CLI flag needs to be set to have Loki insert
environment variables into the configuration file. This is the <code>-config.expand-env=true</code>
flag.</p>
<p>Finally, I&rsquo;d like to point out the <code>securityContext.fsGroup</code> setting. I did not
have this setting in the beginning, which had the consequence that Loki threw
<code>Permission denied</code> errors when trying to create a couple of directories during
startup. In my experience, this setting is required whenever mounting PVCs, at
least Ceph ones, into containers that don&rsquo;t run as root.</p>
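<p>To illustrate how the pieces relate, here is a sketch of a fuller security context. The UID/GID values are assumptions; the important part is that <code>fsGroup</code> matches a group of the user the container runs as:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml">securityContext:
  runAsUser: 1000   # UID the container process runs as (assumed value)
  runAsGroup: 1000
  fsGroup: 1000     # k8s makes mounted volumes group-owned by this GID
</code></pre></div>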
<p>I also had to set up a Service and an Ingress, as my Nomad Grafana instance will
need to access Loki until I&rsquo;ve moved it over to k8s as well. I only
show the manifests here for completeness&rsquo; sake:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Service</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">loki</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">ClusterIP</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">app</span>: <span style="color:#ae81ff">loki</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">loki-http</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.port }}</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targetPort</span>: <span style="color:#ae81ff">loki-http</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">traefik.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">IngressRoute</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">loki</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/hostname</span>: <span style="color:#e6db74">&#34;logs.example.com&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/target</span>: <span style="color:#e6db74">&#34;ingress.example.com&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">entryPoints</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">secureweb</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">routes</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Rule</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">match</span>: <span style="color:#ae81ff">Host(`logs.example.com`)</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">services</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Service</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">loki</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">loki</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">port</span>: <span style="color:#ae81ff">loki-http</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">scheme</span>: <span style="color:#ae81ff">http</span>
</span></span></code></pre></div><p>I&rsquo;ve also configured a CiliumNetworkPolicy for the Namespace, which looks like
this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumNetworkPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;loki&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">endpointSelector</span>: {}
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEndpoints</span>:
</span></span><span style="display:flex;"><span>      - {}
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">homelab/ingress</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">io.kubernetes.pod.namespace</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">io.kubernetes.pod.namespace</span>: <span style="color:#ae81ff">fluentd</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">app</span>: <span style="color:#ae81ff">fluentd</span>
</span></span></code></pre></div><p>This already contains the rule for the FluentD access that will be configured later.</p>
<h1 id="fluentd-setup">FluentD setup</h1>
<p>The next step down the logging stack is my <a href="https://www.fluentd.org/">FluentD</a>
instance. I&rsquo;m using it in the role of a log aggregator. All logs produced in
my Homelab ultimately end up here. I&rsquo;ve kept all per-host log shippers as dumb
as possible and do all log massaging and aggregation in FluentD. It has served
me well ever since the days when I briefly used Influx&rsquo;s <a href="https://www.influxdata.com/time-series-platform/telegraf/">Telegraf</a>.
It is highly efficient, serving all the syslogs and application
logs from my entire setup with 105 MB of memory and 1.7% of the CPU. The only
slightly weird thing is the configuration language, which has a decidedly XML
character, showing the time it was initially implemented. &#x1f609;</p>
<p>I&rsquo;m running FluentD as a singular instance, made available to my entire Homelab
via a LoadBalancer type Service. It has to listen on a number of ports:
syslog in both standard formats (<a href="https://datatracker.ietf.org/doc/html/rfc3164">RFC3164</a>
and <a href="https://datatracker.ietf.org/doc/html/rfc5424">RFC5424</a>) over both TCP and UDP,
as well as forwarding from the Nomad and k8s clusters.</p>
<p>I&rsquo;m again writing my own Helm chart here, mostly because the image used
in the official chart does not support all the plugins I need, so I&rsquo;m
building my own image as well.
I&rsquo;m currently using the following plugins in my configs:</p>
<ul>
<li><a href="https://rubygems.org/gems/fluent-plugin-grafana-loki/">fluent-plugin-grafana-loki</a></li>
<li><a href="https://github.com/repeatedly/fluent-plugin-record-modifier">fluent-plugin-record-modifier</a></li>
<li><a href="https://github.com/repeatedly/fluent-plugin-multi-format-parser">fluent-plugin-multi-format-parser</a></li>
<li><a href="https://github.com/fluent/fluent-plugin-rewrite-tag-filter">fluent-plugin-rewrite-tag-filter</a></li>
<li><a href="https://github.com/tagomoris/fluent-plugin-route">fluent-plugin-route</a></li>
<li><a href="https://github.com/k63207/fluent-plugin-http-healthcheck">fluent-plugin-http-healthcheck</a></li>
<li><a href="https://github.com/fluent-plugins-nursery/fluent-plugin-kv-parser">fluent-plugin-kv-parser</a></li>
</ul>
<p>I will start with the Kubernetes setup here and go into details about the
actual FluentD config later.</p>
<p>One nice thing I was able to do in my Helm chart is the way the config files
are delivered. They are all put into a single ConfigMap, automatically, without
me having to adapt the Kubernetes manifests when adding a new config file.</p>
<p>In my chart root directory, I&rsquo;ve got a subdirectory <code>configs/</code>, where I store
all of the FluentD config files. Then I&rsquo;ve got the following ConfigMap:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ConfigMap</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">fluentd-conf</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>{{ <span style="color:#ae81ff">(tpl (.Files.Glob &#34;configs/*&#34;).AsConfig .) | indent 2 }}</span>
</span></span></code></pre></div><p>The magic is all in the last line. First, the <code>Files.Glob</code> function gathers
all files matching the <code>configs/*</code> path. The <code>AsConfig</code> method then turns
each of those files into the proper format to work in a ConfigMap&rsquo;s <code>data:</code>
key. For a file at <code>configs/redis-k8s.conf</code>, the result looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">redis-k8s.conf</span>: <span style="color:#ae81ff">|                                                                                                  </span>
</span></span><span style="display:flex;"><span>  <span style="color:#75715e"># Log configuration for Redis on k8s                                                                                                                                                                                                    </span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#ae81ff">&lt;filter services.redis.redis&gt;                                                                                    </span>
</span></span><span style="display:flex;"><span>    @<span style="color:#ae81ff">type parser                                                                                                   </span>
</span></span><span style="display:flex;"><span>    <span style="color:#ae81ff">key_name log                                                                                                   </span>
</span></span><span style="display:flex;"><span>    <span style="color:#ae81ff">reserve_data true                                                                                              </span>
</span></span><span style="display:flex;"><span>    <span style="color:#ae81ff">remove_key_name_field true                                                                                     </span>
</span></span><span style="display:flex;"><span>    <span style="color:#ae81ff">&lt;parse&gt;                                                                                                        </span>
</span></span><span style="display:flex;"><span>      @<span style="color:#ae81ff">type regexp                                                                                                 </span>
</span></span><span style="display:flex;"><span>      <span style="color:#ae81ff">expression /^[0-9]+:[XCSM] (?&lt;logtime&gt;[0-9]{2} [A-Za-z]{3} [0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}.[0-9]{3}) (?&lt;level&gt;[\.\-\*\#]) [^$]+$/                                                                                               </span>
</span></span><span style="display:flex;"><span>      <span style="color:#ae81ff">time_key logtime                                                                                             </span>
</span></span><span style="display:flex;"><span>      <span style="color:#ae81ff">time_type string                                                                                             </span>
</span></span><span style="display:flex;"><span>      <span style="color:#ae81ff">utc true                                                                                                     </span>
</span></span><span style="display:flex;"><span>      <span style="color:#ae81ff">time_format %d %b %Y %H:%M:%S.%N                                                                             </span>
</span></span><span style="display:flex;"><span>    <span style="color:#ae81ff">&lt;/parse&gt;                                                                                                       </span>
</span></span><span style="display:flex;"><span>  <span style="color:#ae81ff">&lt;/filter&gt;                      </span>
</span></span></code></pre></div><p>Finally, <code>tpl</code> is called on the result, as I&rsquo;m referencing Helm chart values in
some of the config files. The only thing I&rsquo;m a bit worried about is the maximum
size of a ConfigMap, which is capped at 1 MiB and may be hit at some point with
this approach.</p>
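<p>To illustrate why the <code>tpl</code> call is needed: files read via <code>Files.Glob</code> are not run through Helm&rsquo;s templating on their own, so value references inside them would otherwise end up in the ConfigMap verbatim. A hypothetical sketch; the file name and its contents are my invention for illustration, not from my actual config:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"># Hypothetical entry for a file configs/k8s-input.conf; the tpl call
# is what expands the value reference below.
k8s-input.conf: |
  &lt;source&gt;
    @type forward
    port {{ .Values.ports.k8sLogs }}
  &lt;/source&gt;
</code></pre></div>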
<p>Speaking of the Helm chart values, here they are:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">commonLabels</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">fluentd</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">syslog</span>: <span style="color:#ae81ff">5144</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">syslogTls</span>: <span style="color:#ae81ff">6514</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">logShippers</span>: <span style="color:#ae81ff">24225</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">netconsole</span>: <span style="color:#ae81ff">6666</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">health</span>: <span style="color:#ae81ff">8888</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">k8sLogs</span>: <span style="color:#ae81ff">24230</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">mountDir</span>: <span style="color:#ae81ff">/fluentd/log</span>
</span></span></code></pre></div><p>Again relatively simple, as I can configure most things by just changing the
templates directly. The <code>netconsole</code> entry is a port for the kernel&rsquo;s netconsole,
which I&rsquo;m using for my netbooting hosts to see the early boot process without
having to connect them. A post about that setup has been sitting in my repo as a
draft since I started the k8s migration plan back in December. &#x1f605;
The <code>logShippers</code> port is used by the Fluentbit instances deployed on Nomad,
while the k8s logs are sent to the <code>k8sLogs</code> port.</p>
<p>There&rsquo;s again a scratch volume involved, which is used for on-disk buffers
before logs are sent over to Loki, so that nothing is lost when the FluentD
container is suddenly killed:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">PersistentVolumeClaim</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">scratch-volume</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storageClassName</span>: <span style="color:#ae81ff">rbd-fast</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">accessModes</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">ReadWriteOnce</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">storage</span>: <span style="color:#ae81ff">4Gi</span>
</span></span></code></pre></div><p>And here is the Deployment for FluentD:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">fluentd</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">app</span>: <span style="color:#ae81ff">fluentd</span>
</span></span><span style="display:flex;"><span>      {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>      {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>      {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#e6db74">&#34;Recreate&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">app</span>: <span style="color:#ae81ff">fluentd</span>
</span></span><span style="display:flex;"><span>        {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>        {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>        {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">checksum/config</span>: {{ <span style="color:#ae81ff">include (print $.Template.BasePath &#34;/config.yaml&#34;) . | sha256sum }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">automountServiceAccountToken</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">securityContext</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">fsGroup</span>: <span style="color:#ae81ff">1000</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">fluentd</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">registry.mei-home.net/homenet/fluentd:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">scratch</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: {{ <span style="color:#ae81ff">.Values.mountDir }}</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">config</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/fluentd/etc</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">200m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">500Mi</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">HL_NODE_NAME</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">fieldRef</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">fieldPath</span>: <span style="color:#ae81ff">spec.nodeName</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">livenessProbe</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">httpGet</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.ports.health }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;/&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">initialDelaySeconds</span>: <span style="color:#ae81ff">15</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">periodSeconds</span>: <span style="color:#ae81ff">30</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">syslog-tcp</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.ports.syslog }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">syslog-udp</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.ports.syslog }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">UDP</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">syslog-tls</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.ports.syslogTls }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">log-shippers</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.ports.logShippers }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">k8s</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.ports.k8sLogs }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">netconsole</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.ports.netconsole }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">UDP</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">scratch</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">persistentVolumeClaim</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">claimName</span>: <span style="color:#ae81ff">scratch-volume</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">config</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">configMap</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#ae81ff">fluentd-conf</span>
</span></span></code></pre></div><p>The only new thing I learned here is how to access fields from the Pod spec
via environment variables, which I use to put the node name into the <code>HL_NODE_NAME</code>
env variable. I later use this to add the node name to FluentD&rsquo;s own logs.</p>
<p>Next, the service. This is of type LoadBalancer, as FluentD handles not only
the internal cluster logs, but also the logs from all of my hosts.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Service</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">fluentd</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/public-service</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/hostname</span>: <span style="color:#ae81ff">log-aggregator.example.com</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">io.cilium/lb-ipam-ips</span>: <span style="color:#e6db74">&#34;10.86.55.66&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">LoadBalancer</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">externalTrafficPolicy</span>: <span style="color:#ae81ff">Local</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">app</span>: <span style="color:#ae81ff">fluentd</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">syslog-tcp</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">port</span>: <span style="color:#ae81ff">514</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targetPort</span>: <span style="color:#ae81ff">syslog-tcp</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">syslog-udp</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">port</span>: <span style="color:#ae81ff">514</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targetPort</span>: <span style="color:#ae81ff">syslog-udp</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">UDP</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">syslog-tls</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.ports.syslogTls }}</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targetPort</span>: <span style="color:#ae81ff">syslog-tls</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">log-shippers</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.ports.logShippers }}</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targetPort</span>: <span style="color:#ae81ff">log-shippers</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">k8s</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.ports.k8sLogs }}</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targetPort</span>: <span style="color:#ae81ff">k8s</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">netconsole</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.ports.netconsole }}</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targetPort</span>: <span style="color:#ae81ff">netconsole</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">UDP</span>
</span></span></code></pre></div><p>While setting up the service, I realized that I had forgotten the <code>syslog-udp</code>
entry in the port list. I added it, but the service did not show any Endpoints
for that particular port:</p>
<pre tabindex="0"><code>Port:                     syslog-udp  514/UDP
TargetPort:               syslog-udp/UDP
NodePort:                 syslog-udp  31268/UDP
Endpoints:
</code></pre><p>I tried to fix this by recreating the Pod, but to no avail. I finally found
<a href="https://ben-lab.github.io/kubernetes-UDP-TCP-bug-same-port/">this blog post</a>,
which proposed not just deleting the Pod, but also the Deployment - and that
did the trick.</p>
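<p>In my case, that meant something along these lines (assuming the Deployment is
simply named <code>fluentd</code> and lives in the <code>fluentd</code> namespace, like the rest of my
resources):</p>
<pre tabindex="0"><code># Deleting only the Pod did not help; the Deployment itself has to go:
kubectl --namespace fluentd delete deployment fluentd
# ...then re-create it, e.g. by re-applying the Helm chart.
</code></pre>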
<p>Another issue I hit came after I switched the syslog-ng configs for all of my
hosts over to the new FluentD on k8s. All hosts worked well - save for all of
my k8s nodes! None of them was able to connect to the FluentD service.
After quadruple-checking my firewall config, I finally had another facepalm
moment. My CiliumNetworkPolicy looked like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumNetworkPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;fluentd&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">endpointSelector</span>: {}
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEndpoints</span>:
</span></span><span style="display:flex;"><span>      - {}
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">io.kubernetes.pod.namespace</span>: <span style="color:#ae81ff">fluentbit</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">fluentbit</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEntities</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">world</span>
</span></span></code></pre></div><p>The issue with this config is: <code>world</code> doesn&rsquo;t mean &ldquo;absolutely everybody&rdquo;. It
only means &ldquo;everybody outside the k8s cluster&rdquo;. So my desktop, all
of my Nomad cluster nodes and my baremetal Ceph nodes were allowed - but the
k8s nodes were not! I&rsquo;m wondering how often I will read through <a href="https://docs.cilium.io/en/latest/security/policy/language/#entities-based">the Cilium docs</a>
before this migration is over. &#x1f605;
The solution is to add <code>remote-node</code> <em>and</em> <code>host</code> to the <code>fromEntities</code> list,
like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">fromEntities</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#ae81ff">world</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#ae81ff">remote-node</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#ae81ff">host</span>
</span></span></code></pre></div><p><code>remote-node</code> grants access to all cluster nodes which are not running the
FluentD Pod, while <code>host</code> covers specifically the node currently running the Pod.</p>
<p>But that still wasn&rsquo;t the end of my problems with the syslog setup. In my old
config, I had configured FluentD to take the IP address of the sender and look
it up in DNS to set the <code>host</code> field of the log message. For the life of me,
I cannot remember why I would do such a thing.
The problem was, again, with the k8s hosts. They were showing up with IPs, not
their proper hostnames. I recognized the IPs as the ones which Cilium assigns
to the <code>cilium_host</code> interface it creates. Those IPs are not routable in my
wider network and there are no corresponding PTR records in my DNS.</p>
<p>To realize that this was the problem, I had to break out Wireshark yet again.
This is becoming a theme in this migration. Looking closer at the messages,
I was able to see that the host field was set properly. An example message
looked like this:</p>
<pre tabindex="0"><code>&lt;30&gt;1 2024-01-29T22:16:56+01:00 khepri.home kubelet 19562 - - W0129 22:16:56.971140   19562 machine.go:65] Cannot read vendor id correctly, set empty.
</code></pre><p>And this host field, <code>khepri.home</code> here, should be used by default. But it wasn&rsquo;t,
due to having this option set:</p>
<pre tabindex="0"><code>source_hostname_key host
</code></pre><p>This option makes FluentD go out to DNS instead of just using the host field from
the message, so removing it fixed the hostnames. Ah well, it had been almost two
weeks since I last had to get out Wireshark anyway. &#x1f612;</p>
<p>Last but most certainly not least was a problem with OPNsense. I had reconfigured
the logging target to the new domain, but nothing happened. No proper log entries,
nothing.
Turns out, after an hour or so of debugging: OPNsense&rsquo;s syslog-ng needed a full
restart, not just a config reload, to start using the new values. &#x1f926;</p>
<h2 id="fluentd-configs">FluentD configs</h2>
<p>Before moving on to Fluentbit and the k8s logs, I want to at least have a short
look at my FluentD base config and the syslog configs.</p>
<p>My main config file for FluentD looks like this:</p>
<pre tabindex="0"><code>&lt;system&gt;
  log_level info
&lt;/system&gt;
# Fluentd&#39;s own logs
# This is only intended to prepare the logs for forwarding
# to the services loki instance!
&lt;label @FLUENT_LOG&gt;
  &lt;match fluent.**&gt;
    @type record_modifier
    @label @K8S
    tag services.fluentd.fluentd
    &lt;record&gt;
      namespace_name fluentd
      container_name fluentd
      host &#34;#{ENV[&#39;HL_NODE_NAME&#39;]}&#34;
      level ${tag_parts[1]}
    &lt;/record&gt;
  &lt;/match&gt;
&lt;/label&gt;

# Healthcheck endpoint
&lt;source&gt;
  @type http_healthcheck
  port {{ .Values.ports.health }}
  bind 0.0.0.0
&lt;/source&gt;

# Syslog Handling
@include syslogs.conf

# Service logs coming from the Nomad jobs
@include nomad-srv-logs.conf

# k8s logs
@include k8s.conf
</code></pre><p>The main part is the configuration of FluentD&rsquo;s own logs. With this config,
they&rsquo;re still written to <code>stdout</code>, but they&rsquo;re also forwarded to my <code>K8S</code> label,
which I will show later. Now that I look at the config, it might be interesting
to see whether I could also provide the namespace and container name as ENV
variables and set the record content from those, instead of hardcoding them
here.</p>
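<p>A minimal sketch of what that could look like, with hypothetical variable names,
mirroring the <code>HL_NODE_NAME</code> pattern from the Deployment:</p>
<pre tabindex="0"><code># In the Deployment (hypothetical names; the namespace comes from the
# Downward API, while the container name is set by hand once, since the
# Pod spec does not expose it as a field):
env:
  - name: HL_NAMESPACE_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
  - name: HL_CONTAINER_NAME
    value: fluentd

# In the &lt;record&gt; section of the FluentD config:
namespace_name &#34;#{ENV[&#39;HL_NAMESPACE_NAME&#39;]}&#34;
container_name &#34;#{ENV[&#39;HL_CONTAINER_NAME&#39;]}&#34;
</code></pre>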
<p>In the <code>syslogs.conf</code> file, I&rsquo;m setting up the syslog handling:</p>
<pre tabindex="0"><code>&lt;source&gt;
  @type syslog
  port {{ .Values.ports.syslogTls }}
  bind 0.0.0.0
  tag syslogs.new
  severity_key level
  frame_type octet_count
  &lt;transport tcp&gt;
  &lt;/transport&gt;
  &lt;parse&gt;
    message_format auto
  &lt;/parse&gt;
&lt;/source&gt;

&lt;filter syslogs.new.**&gt;
  @type grep
  &lt;exclude&gt;
    key message
    pattern /^.* too long to fit into unit name, ignoring mount point\.$/
  &lt;/exclude&gt;
&lt;/filter&gt;

&lt;match syslogs.**&gt;
  @type route
  &lt;route syslogs.**&gt;
    @label @SYSLOGS
    copy
  &lt;/route&gt;
&lt;/match&gt;

&lt;label @SYSLOGS&gt;
  &lt;match syslogs.**&gt;
    @type loki
    url &#34;http://loki.loki.svc.cluster.local:3100&#34;
    extra_labels {&#34;job&#34;:&#34;syslogs&#34;}
    &lt;label&gt;
      host $.host
    &lt;/label&gt;
    &lt;buffer host&gt;
      path /fluentd/log/buffers/loki.syslog.*.buffer
      @type file
      total_limit_size 5GB
      flush_at_shutdown true
      flush_mode interval
      flush_interval 5s
      chunk_limit_size 5MB
    &lt;/buffer&gt;
  &lt;/match&gt;
&lt;/label&gt;
</code></pre><p>I have removed some content here, as a lot of it is repetitive code for
the different ports I need to listen on for the different syslog formats and
transports.</p>
<p>The <code>&lt;source&gt;</code> will listen on the <code>syslogTls</code> port for incoming syslog messages
and tag them properly. The parsing, luckily, can be automated, as FluentD has
the ability to automatically detect whether it&rsquo;s an RFC3164 or an RFC5424 formatted
message.</p>
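<p>For illustration, the two formats look like this (slightly abridged examples
taken from the respective RFCs):</p>
<pre tabindex="0"><code># RFC3164 (traditional BSD syslog):
&lt;34&gt;Oct 11 22:14:15 mymachine su: &#39;su root&#39; failed for lonvick on /dev/pts/8

# RFC5424:
&lt;34&gt;1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - &#39;su root&#39; failed
</code></pre>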
<p>In the <code>@type grep</code> filter, I&rsquo;m dropping one specific log line, looking like this:</p>
<pre tabindex="0"><code>Mount point path &#39;/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/7fa64edb6fced4ed6d5acd3643506bc6a3ea6bea547f6a6c2997653e203f4857/globalmount/0001-...too long to fit into unit name, ignoring mount point
</code></pre><p>These incredibly annoying lines are sent by systemd, with <code>warning</code> severity.
See e.g. <a href="https://github.com/docker/for-linux/issues/679">this docker bug report</a>.
These messages also come up on my Nomad cluster, for the same reason: CSI mounts.
From the GitHub issue it seems like there&rsquo;s now a way in systemd to suppress
them? Anyway, for now I&rsquo;m filtering them out in FluentD, so that the endless repeats
of that line every 10 seconds don&rsquo;t pollute my logs.</p>
<p>Finally, I&rsquo;m using the <code>SYSLOGS</code> label. Labels in FluentD are something like
subtrees of the log pipeline. By default, every log record flows down the config
from its <code>&lt;source&gt;</code> until it hits a <code>&lt;match&gt;</code> section which fits its tag.
With labels, subsections combining several <code>&lt;filter&gt;</code> and <code>&lt;match&gt;</code> sections
can be created.</p>
<p>In this particular example, nothing special is done, just the
match section for the Loki output plugin. I&rsquo;m setting the URL of the internal
Loki service. The <code>&lt;label&gt;</code> section defines which Loki labels are set for the
log entry. Labels are Loki&rsquo;s index. Loki can search over the entirety of a log
entry, but queries over labels are more efficient, especially with separate log sources
like I have here, with container logs and host logs. For this I&rsquo;m setting the
<code>job</code> label to <code>syslogs</code> here. In addition, any given syslog stream is also
identified uniquely by the host which emitted the log, which is why I&rsquo;m adding
the host field as another label. All other fields in the record, like the syslog
ident, will of course also be stored in Loki, but they will simply be part of
the log line, and will not be indexed.</p>
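<p>A typical query in Grafana then leans on exactly these labels. It might look
something like this in LogQL:</p>
<pre tabindex="0"><code>{job=&#34;syslogs&#34;, host=&#34;khepri&#34;} |= &#34;error&#34;
</code></pre>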
<p>I&rsquo;m also setting up some local buffering, again with the host key as the buffer
key. I&rsquo;m giving the buffer a rather generous 5GB of max size, just to make sure
I don&rsquo;t run out of space for logs if something goes down during the night and
subsequent workday.</p>
<h1 id="fluentbit-setup">Fluentbit setup</h1>
<p>Final component of the logging setup: <a href="https://fluentbit.io/">Fluentbit</a>.
There&rsquo;s quite a large choice of log shippers. I ended up deciding on Fluentbit
because I was already running FluentD at the point where I wanted per-host
log shippers.</p>
<p>As I had mentioned in the overview, Fluentbit is mostly a dumb log shipper for me;
its config is kept pretty simple, letting FluentD do the heavy lifting. For
that reason, I was finally able to use an official Helm chart in this migration.
Fluentbit&rsquo;s chart can be found <a href="https://github.com/fluent/helm-charts/tree/main/charts/fluent-bit">here</a>.</p>
<p>The general part of the <code>values.yaml</code> looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">testFramework</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">serviceMonitor</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">tolerations</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/taint.role&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">operator</span>: <span style="color:#e6db74">&#34;Equal&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">effect</span>: <span style="color:#e6db74">&#34;NoSchedule&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">effect</span>: <span style="color:#ae81ff">NoSchedule</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">key</span>: <span style="color:#ae81ff">node-role.kubernetes.io/control-plane</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">Exists</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">fluentbit</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">podLabels</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">fluentbit</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">priorityClassName</span>: <span style="color:#e6db74">&#34;system-node-critical&#34;</span>
</span></span></code></pre></div><p>The only noteworthy part is setting the tolerations, so that Fluentbit is also
deployed on control plane nodes and, in my case, Ceph nodes. I&rsquo;ve also decided
to declare it node-critical, to prevent eviction.</p>
<p>More interesting than that is the <code>config:</code> section:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">config</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">inputs</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    [INPUT]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Name tail
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Path /var/log/containers/*.log
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        multiline.parser cri
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Tag kube.*
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Mem_Buf_Limit 5MB</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">filters</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    [FILTER]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Name kubernetes
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Match kube.*
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Merge_Log On
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Keep_Log Off
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        K8S-Logging.Parser On
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        K8S-Logging.Exclude On
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Annotations off
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    [FILTER]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Name nest
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Match kube.*
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Operation lift
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Nested_under kubernetes
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    [FILTER]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Name    modify
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Match   kube.*
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Remove pod_id
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Remove docker_id
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Remove container_hash
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Remove container_image
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    [FILTER]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Name    grep
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Match   kube.*
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Exclude namespace_name fluentd</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">outputs</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    [OUTPUT]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Name          forward
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Match         *
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Tag           services.$namespace_name.$container_name
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Host          fluentd.fluentd.svc.cluster.local
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Port          24230
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Retry_Limit   50</span>
</span></span></code></pre></div><p>At the top is the <a href="https://docs.fluentbit.io/manual/pipeline/inputs/tail">tail input</a>.
It reads all the log files in <code>/var/log/containers</code>, which is mounted into the
Fluentbit container from the host. I&rsquo;ve set the parser to <code>cri</code>, as that&rsquo;s the
format cri-o writes.</p>
<p>The first and most interesting filter is <a href="https://docs.fluentbit.io/manual/pipeline/filters/kubernetes">Kubernetes</a>.
This plugin first extracts some data from the tag, namely the namespace, Pod name,
container name and container ID, which the tail plugin in turn takes from the
filename. A log file for a Cilium pod would look like this:
<code>cilium-tmlqp_kube-system_cilium-agent-eb472a8bae2b95836acc51f70986c7bd2f659f0d69e4aff8b9f9fa80fef5d565.log</code>.
Here, <code>cilium-tmlqp</code> is the Pod name, <code>kube-system</code> the Namespace, <code>cilium-agent</code>
the container name, and the trailing hash the container ID.</p>
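<p>The tail input&rsquo;s <code>Tag kube.*</code> expands the wildcard to the file path, with slashes
replaced by dots, so the tag the Kubernetes filter gets to work with looks roughly
like this (container hash shortened to a placeholder):</p>
<pre tabindex="0"><code>kube.var.log.containers.cilium-tmlqp_kube-system_cilium-agent-&lt;hash&gt;.log
#   pod name:       cilium-tmlqp
#   namespace:      kube-system
#   container name: cilium-agent
#   container ID:   &lt;hash&gt;
</code></pre>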
<p>But the Kubernetes filter can extract additional data, by contacting the
kube-apiserver. These are the Pod ID, labels and annotations. In my config, I&rsquo;ve
disabled the annotations, because I don&rsquo;t find them too interesting in logging.
The labels, on the other hand, might come in handy.
One important thing to note: Because the filter fetches this
additional data from the kube-apiserver, it adds load there. There&rsquo;s some caching, and I&rsquo;ve not found
any increased load on my kube-apiserver. But depending on how busy a cluster is,
meaning how many new pods appear in any given timeframe, the load on the kube-apiserver
might get pretty high.</p>
<p>The Kubernetes filter puts all of the keys it adds under a <code>kubernetes</code> key. I
don&rsquo;t find that particularly useful. That&rsquo;s why I raise all keys under <code>kubernetes</code>
by one level with the <code>nest</code> filter.</p>
<p>With the <code>modify</code> filter I&rsquo;m also removing some pieces of the log info I deem
superfluous in my log records.</p>
<p>The final filter, <code>grep</code>, is immensely important for a stable and useful
FluentD+Fluentbit setup. Imagine the following scenario: For debugging purposes,
you use FluentD&rsquo;s <a href="https://docs.fluentd.org/filter/stdout">stdout filter</a>, which
does nothing but write the entire log record to the stdout of the FluentD
container. Now, if we were collecting the FluentD log file and feeding it back
into FluentD, it would then be output on stdout again - ad infinitum.
This is the end result:
<figure>
    <img loading="lazy" src="fluentd-log-loop.png"
         alt="A screenshot of a terminal, completely filled with &#39;/&#39;. Nothing else. No, really. I&#39;m not being lazy here. Just an entire terminal, filled with &#39;/&#39;."/> <figcaption>
            <p>An endless loop.</p>
        </figcaption>
</figure>

And of course, I&rsquo;ve got a screenshot of it. This has happened to me every single
time I have set up FluentD.
But this doesn&rsquo;t mean that we can&rsquo;t have FluentD logs. I will explain how that
works without risking an endless loop in the next section.</p>
<p>The OUTPUT part then transfers the logs to FluentD. I&rsquo;m building the tag from the
Namespace name and the container name. These make the most sense as identifiers,
given the data I have. I would have loved to also add the Pod name as another
key, but this has a problem: The Pod name as provided might have some hashes
attached. E.g. for the Cilium pod example from before, the Pod name would be
<code>cilium-tmlqp</code>, and that hash suffix changes whenever the Pod is recreated. While this is a
good way to uniquely identify Pods, it isn&rsquo;t too useful for logs. Because
generally speaking, when looking at logs, I&rsquo;m interested in the logs of a
specific app, not (necessarily?) of a specific Pod from that app.
In addition, these unique identifiers also later form the labels for Loki, and
labels with too high cardinality are bad for Loki&rsquo;s performance.</p>
<h1 id="setting-up-k8s-component-logs">Setting up k8s component logs</h1>
<p>Before finishing up, I would also like to show what my k8s log handling looks
like in FluentD. As described above, the logs from the k8s containers arrive
in FluentD via the Forward protocol. The tag defined in the Fluentbit OUTPUT
section arrives untouched in the following <code>&lt;source&gt;</code>:</p>
<pre tabindex="0"><code>&lt;source&gt;
  @type forward
  port {{ .Values.ports.k8sLogs }}
  bind 0.0.0.0
  @label @K8S
&lt;/source&gt;
</code></pre><p>This source just ensures that all log records are routed to the <code>K8S</code> label.
A log, when it arrives at FluentD, looks like this:</p>
<pre tabindex="0"><code>time=&gt;&#34;2024-02-12T18:44:35.624186596+01:00&#34;,
stream=&gt;&#34;stderr&#34;,
_p=&gt;&#34;F&#34;,
log=&gt;&#34;2024-02-12 17:44:35.624124 I | sys: Device found - mmcblk0boot1&#34;,
pod_name=&gt;&#34;rook-discover-d9tcf&#34;,
namespace_name=&gt;&#34;rook-ceph&#34;,
labels=&gt;{
  app=&gt;&#34;rook-discover&#34;,
  app.kubernetes.io/component=&gt;&#34;rook-discover&#34;,
  app.kubernetes.io/created-by=&gt;&#34;rook-ceph-operator&#34;,
  app.kubernetes.io/instance=&gt;&#34;rook-discover&#34;,
  app.kubernetes.io/managed-by=&gt;&#34;rook-ceph-operator&#34;,
  app.kubernetes.io/name=&gt;&#34;rook-discover&#34;,
  app.kubernetes.io/part-of=&gt;&#34;rook-ceph-operator&#34;,
  controller-revision-hash=&gt;&#34;799f867d7&#34;,
  pod-template-generation=&gt;&#34;3&#34;,
  rook.io/operator-namespace=&gt;&#34;rook-ceph&#34;
},
host=&gt;&#34;khepri&#34;,
container_name=&gt;&#34;rook-discover&#34;
</code></pre><p>The only thing I&rsquo;m touching in FluentD is the <code>log</code> key. My goal is to extract
two pieces of information from it: first and foremost, the severity level of the
log, and then the actual log message. And let me tell you: This is <em>wild</em>. Because not only can the tech industry
not agree on a log format - no, it can&rsquo;t even agree on universal identifiers
for different log levels!</p>
<p>To extract the proper log line, I have separate <a href="https://docs.fluentd.org/filter/parser">Parser filters</a>
in my config, for each individual app. Because yes, it&rsquo;s the Wild West out there
when it comes to logging formats. Even <em>within the same app</em>! &#x1f620;</p>
<p>The above example is from a Ceph container. It would be parsed by this
config:</p>
<pre tabindex="0"><code>&lt;filter services.{rook-cluster,rook-ceph}.{watch-active,provision,rook-ceph-operator,rook-discover}&gt;
  @type parser
  key_name log
  reserve_data true
  remove_key_name_field true
  &lt;parse&gt;
    @type multi_format
    &lt;pattern&gt;
      format regexp
      expression /^(?&lt;logtime&gt;[0-9\-]+ [0-9\:\.]+) (?&lt;level&gt;[^ ]+) \| (?&lt;cmd&gt;[^:]+)\: (?&lt;msg&gt;.*)$/
      time_key logtime
      time_format %F %T.%N
      utc true
    &lt;/pattern&gt;
    &lt;pattern&gt;
      format regexp
      expression /^(?&lt;msg&gt;.*)$/
      time_key nil
    &lt;/pattern&gt;
  &lt;/parse&gt;
&lt;/filter&gt;
</code></pre><p>As you can see, this format is used by a number of other Ceph containers as well.
But you can also see: Besides these well-formatted lines, there are also others
which defy any formatting and hence just get a generic <code>.*</code> regex. And don&rsquo;t
be fooled: There are five other sections just like this one, just to parse logs
from all the Ceph Rook containers running in my cluster.</p>
<p>So what&rsquo;s happening here, exactly? First of all, there&rsquo;s the tag pattern at the
top. This determines which logs are handled by this filter. In my logs, the
tags have the format <code>services.NAMESPACE.CONTAINER</code>. The <code>key_name</code> provides
the record key the parser should look at. <code>reserve_data true</code> tells the parser
to leave all other fields in the record untouched. <code>remove_key_name_field true</code>
says that the parser should remove the <code>log</code> field from the record when parsing
was successful. If parsing fails, the record is left completely untouched.
I&rsquo;m using a multi_format parser here, as the containers spit out logs in
multiple formats. Then, I choose the <code>regexp</code> parser and provide a regex to
parse the <code>log</code> content.
I can very warmly recommend <a href="https://regex101.com">https://regex101.com</a>. It has
served me very well, both during this migration, and ever since I&rsquo;ve started
using FluentD in the Homelab.
One note on the named capture groups in the regexes: Those are transformed into
additional keys in the record.
Then there&rsquo;s the time handling. As whatever log lib the Ceph containers are using
here is too cool to use <a href="https://en.wikipedia.org/wiki/ISO_8601">ISO8601</a>, I
need to not only specify in which key the time can be found, but also need to
define a format.
This can also be skipped, as seen in the second <code>pattern</code> section. In this case,
the time the record was originally received by Fluentbit is used.</p>
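<p>To make that concrete, this is what the first pattern should produce for the
Ceph example line shown earlier:</p>
<pre tabindex="0"><code># Input value of the log key:
2024-02-12 17:44:35.624124 I | sys: Device found - mmcblk0boot1

# Resulting record keys (the log key itself is removed):
level: I
cmd:   sys
msg:   Device found - mmcblk0boot1
# The event time is taken from the logtime capture group, interpreted as UTC.
</code></pre>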
<p>Now, back to defining log levels. Here&rsquo;s the config I use to align to a single
identifier for each log level:</p>
<pre tabindex="0"><code>&lt;filter parsed.services.**&gt;
  @type record_modifier

  &lt;replace&gt;
      key level
      expression /^(crit|CRIT|ALERT|alert|4)$/
      replace critical
  &lt;/replace&gt;
  &lt;replace&gt;
      key level
      expression /^(ERROR|eror|ERR|err|E|3)$/
      replace error
  &lt;/replace&gt;
  &lt;replace&gt;
      key level
      expression /^(DEBUG|dbg|DBG|0|\.|\-)$/
      replace debug
  &lt;/replace&gt;
  &lt;replace&gt;
      key level
      expression /^(INFO|Info|INF|inf|I|NOTICE|1|\*)$/
      replace info
  &lt;/replace&gt;
  &lt;replace&gt;
      key level
      expression /^(WARN|warn|WRN|wrn|W|WARNING|2|\#)$/
      replace warning
  &lt;/replace&gt;
&lt;/filter&gt;
</code></pre><p>Note, in particular, the <code>./-/*/#</code>. That&rsquo;s Redis&rsquo; wild way of defining log
severity. See also <a href="https://build47.com/redis-log-format-levels/">this blog post</a>.</p>
<p>Why? Why can&rsquo;t we define one format? I&rsquo;m not asking for a unified log format
at this point. But perhaps - perhaps we can at least have unified identifiers
for the levels?
And also, please: Every single log line should have a severity attached. That&rsquo;s
just basic, good engineering.</p>
<p>One last thing to consider: How to find out whether a log line slipped through
unparsed? For this, I&rsquo;m employing a <a href="https://docs.fluentd.org/output/rewrite_tag_filter">rewrite_tag_filter</a>:</p>
<pre tabindex="0"><code>&lt;match services.**&gt;
  @type rewrite_tag_filter
  &lt;rule&gt;
    key log
    pattern /^.+$/
    tag unparsed.${tag}
  &lt;/rule&gt;
  &lt;rule&gt;
    key log
    pattern /^.+$/
    tag parsed.${tag}
    invert true
  &lt;/rule&gt;
&lt;/match&gt;
</code></pre><p>This filter comes after all the different <code>parser</code> filters. So when, at this
point, a log record still has the <code>log</code> key, it means none of the <code>parser</code>
filters matched it. These records then get <code>unparsed</code> prepended to their tag.
There is one important point about <code>rewrite_tag_filter</code>: It re-emits the log
record. So the record runs through the entire chain of filters again. This
can easily lead to an endless loop. Imagine I only added the <code>unparsed</code> tag
to unparsed logs, but left parsed logs completely untouched, instead of
prepending them with the <code>parsed</code> tag. Then, the parsed records would enter the
filter chain again, completely unchanged - and run into this <code>rewrite_tag_filter</code>
again! And then again. And again. Producing a nice endless loop. So when using
a <code>rewrite_tag_filter</code>, always make sure that you change the tags
on <em>all</em> log lines which might hit it.</p>
<p>These <code>unparsed</code> records then go into a <a href="https://docs.fluentd.org/filter/record_transformer">record_transformer</a>
filter:</p>
<pre tabindex="0"><code>&lt;filter unparsed.**&gt;
  @type record_transformer
  enable_ruby true
  renew_record true
  &lt;record&gt;
    unparsed-log ${record}
    namespace_name hl-dummy
    container_name hl-unparsed-logs
    fluentd-tag ${tag}
    level warn
  &lt;/record&gt;
&lt;/filter&gt;
</code></pre><p>Here I&rsquo;m putting the entire unparsed log record into a subkey <code>unparsed-log</code>. This
also shows off the flexibility of FluentD in manipulating log records. I&rsquo;m also
adding a hardcoded namespace and container name, as those two keys are later
used by Loki to index logs. I&rsquo;m also using them in Grafana for filtering.
And so with this setup, I just need to look at the <code>hl-dummy/hl-unparsed-logs</code>
logs to see whether new unparsed logs have shown up, or whether perhaps an older regex
needs adapting to a newer format.</p>
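<p>Once these records land in Loki, via the output shown below, that check boils
down to a simple label query in Grafana, something like:</p>
<pre tabindex="0"><code>{job=&#34;k8s-logs&#34;, namespace=&#34;hl-dummy&#34;, container=&#34;hl-unparsed-logs&#34;}
</code></pre>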
<p>And finally, just for completeness&rsquo; sake, the output to Loki:</p>
<pre tabindex="0"><code>&lt;match {parsed,unparsed}.services.**&gt;
    @type loki
    url &#34;http://loki.loki.svc.cluster.local:3100&#34;
    extra_labels {&#34;job&#34;:&#34;k8s-logs&#34;}
    &lt;label&gt;
        namespace $[&#34;namespace_name&#34;]
        container $[&#34;container_name&#34;]
    &lt;/label&gt;
    &lt;buffer namespace_name,container_name&gt;
        path /fluentd/log/buffers/loki.k8s.*.buffer
        @include loki-buffers.conf
    &lt;/buffer&gt;
&lt;/match&gt;
</code></pre><p>The <code>loki-buffers.conf</code> file has the same content as in the syslog example
above.</p>
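<p>That is, based on the buffer section from the syslog output shown earlier, it
would contain:</p>
<pre tabindex="0"><code>@type file
total_limit_size 5GB
flush_at_shutdown true
flush_mode interval
flush_interval 5s
chunk_limit_size 5MB
</code></pre>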
<h1 id="conclusion">Conclusion</h1>
<p>And that&rsquo;s it! My logging stack, completely migrated to k8s. This was the first
service I actually migrated, and the first where I could remove jobs from Nomad, namely
the FluentD and Loki jobs. The Nomad cluster now only runs its own Fluentbit job, which
transfers the logs to the FluentD instance running on k8s.</p>
<p>One thing I&rsquo;ve noted for the future: Start actively monitoring the logs.
I had a slightly panicky Sunday morning due to some malicious bash code showing
up in my logs when I went to check whether everything had worked.
<a href="https://blog.mei-home.net/posts/sunday-morning-panic/">Read all about it.</a>
I just so happened to see that log because it happened to be at the very top
when I opened my logging dashboard. Otherwise, I would have completely overlooked
it.</p>
<p>The last thing to do for this migration is to have a look at the unparsed logs
in a week or so and make sure I haven&rsquo;t forgotten anything.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 2b: Asymmetric Routing</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-2b-asymmetric-routing/</link>
      <pubDate>Sun, 04 Feb 2024 23:00:00 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-2b-asymmetric-routing/</guid>
      <description>Problems with Cilium BGP and the OPNsense firewall</description>
      <content:encoded><![CDATA[<p>Wherein I ran into some problems with the Cilium BGP routing and firewalls on
my OPNsense box.</p>
<p>This is the second addendum for Cilium load balancing in my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>While working on my <a href="https://blog.mei-home.net/posts/k8s-migration-5-s3-buckets/">S3 bucket migration</a>,
I ran into several rather weird problems. After switching my internal wiki over to
using the Ceph RGW S3 from my k8s Ceph Rook cluster, I found that the final
upload of the generated site to the S3 bucket from which it was served did not work, even though I had all the necessary
firewall rules configured. The output I was getting looked like this:</p>
<pre tabindex="0"><code>WARNING: Retrying failed request: / ([Errno 110] Operation timed out)
WARNING: Waiting 3 sec...
WARNING: Retrying failed request: / ([Errno 110] Operation timed out)
WARNING: Waiting 6 sec...
WARNING: Retrying failed request: / ([Errno 110] Operation timed out)
WARNING: Waiting 9 sec...
WARNING: Retrying failed request: / ([Errno 110] Operation timed out)
WARNING: Waiting 12 sec...
WARNING: Retrying failed request: / ([Errno 110] Operation timed out)
WARNING: Waiting 15 sec...
ERROR: S3 Temporary Error: Request failed for: /.  Please try again later.
</code></pre><p>I initially thought that something was wrong with the Rook setup here, but this
didn&rsquo;t seem to be the case - uploading something to a test bucket from my C&amp;C
host worked fine. Same for uploads from my workstation.
Before going on, let me show you a small networking diagram:
<figure>
    <img loading="lazy" src="packet-flow.svg"
         alt="A network diagram. It shows three host. The first one has the name &#39;k8s host 1&#39;. This host has an internal interface labeled with &#39;Ceph S3 address: 10.86.55.100&#39;, with an external interface &#39;Host IP: 10.86.5.xx&#39;. The second host is labeled &#39;Nomad Host: Runs CI&#39;, with a single interface labeled &#39;Host IP: 10.86.5.xx&#39;, same as the previous host. Finally, there is the &#39;OPNsense Firewall&#39; host. It shows a single interface labeled &#39;Homelab VLAN interface&#39;. There are arrows going from the &#39;CI host&#39; to the firewall, on to the k8s host&#39;s host interface and finally into the S3 interface. In the other direction goes a pair of red arrows, out of the S3 interface, into the hosts external interface and from there directly into the Nomad host&#39;s external interface - without a detour via the router."/> <figcaption>
            <p>Perfect example of asymmetric routing.</p>
        </figcaption>
</figure>
</p>
<p>This is a picture-perfect example of asymmetric routing. The S3 service is
announced via a LoadBalancer service and Cilium&rsquo;s BGP functionality. All my
LoadBalancer services are in a separate subnet from the hosts themselves.
So to reach the S3 service, all packets need to go through the OPNsense box.</p>
<p>This is obviously not ideal, as the uplink to the router&rsquo;s interface now becomes
a bottleneck. This could in theory be fixed by using L2 announcements instead,
but those put a pretty high load on the k8s control plane nodes, through their use of
k8s leases. And the load scales with the number of hosts in the k8s cluster.</p>
<p>But in this particular case, the problem is asymmetric routing. The Nomad host
running the CI jobs trying to access the S3 buckets will use the LoadBalancer
IP, accessing the Ceph RGW through my Traefik ingress. This IP is in a different
subnet than the hosts, and hence the packets go through the default gateway,
which is my OPNsense box. There, they are routed to the next hop, which is the
k8s node currently running my ingress. From there, they&rsquo;re finally routed
internally to the Ceph RGW pod.</p>
<p>But on the return path for the response packets, they go directly from the
host running the RGW pod to the host running the CI job. This is due to the fact
that both hosts are in the same subnet.</p>
<p>The first consequence of this is the need to change the firewall rules for
accessing the Traefik ingress LoadBalancer service IP from the Homelab.
Initially, my rule used the default state tracking setting. But in this case,
that does not work. The firewall will see the initial TCP SYN packet coming
from the CI job host, but it won&rsquo;t see the SYN and ACK from the ingress,
because those are sent directly from host to host, not via the router.
Seeing only one side of the connection, the firewall still blocks subsequent
packets.</p>
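<p>Sketched as a (simplified) packet trace, this is what the firewall sees of the
TCP handshake:</p>
<pre tabindex="0"><code>CI host  --&gt; OPNsense --&gt; k8s node : SYN      (seen by the firewall)
k8s node -----direct----&gt; CI host  : SYN/ACK  (never passes the firewall)
CI host  --&gt; OPNsense --&gt; k8s node : ACK      (blocked: incomplete state)
</code></pre>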
<p>The solution to this is to change the way OPNsense tracks connections for
the specific rule allowing access to the ingress from the Homelab VLAN. This can
be done in the rule&rsquo;s options, under &ldquo;Advanced features&rdquo;:
<figure>
    <img loading="lazy" src="sloppy.png"
         alt="A screenshot of part of the OPNsense firewall rule configuration. A dropdown besides the label &#39;State Type&#39; is visible, with the entry &#39;sloppy state&#39; selected."/> <figcaption>
            <p>The state type for rules which concern asymmetric routing needs to be set to &lsquo;sloppy state&rsquo;</p>
        </figcaption>
</figure>
</p>
<p>That fixed this particular problem, and the upload started working - but incredibly
slowly. I got long phases with no transmission at all, and some retries. It
looked like this:</p>
<pre tabindex="0"><code>upload: &#39;./public/404.html&#39; -&gt; &#39;s3://wiki/404.html&#39;  [1 of 97]


 9287 of 9287   100% in    0s     3.35 MB/s
 9287 of 9287   100% in    0s    33.80 KB/s  done

upload: &#39;./public/categories/index.html&#39; -&gt; &#39;s3://wiki/categories/index.html&#39;  [2 of 97]


 34120 of 34120   100% in    0s     6.75 MB/s
 34120 of 34120   100% in    0s   304.53 KB/s  done

upload: &#39;./public/ceph/index.html&#39; -&gt; &#39;s3://wiki/ceph/index.html&#39;  [3 of 97]


 34195 of 34195   100% in    0s     5.34 MB/s
 34195 of 34195   100% in    0s     5.34 MB/s  failed

WARNING: Upload failed: /ceph/index.html (The read operation timed out)

WARNING: Waiting 3 sec...

upload: &#39;./public/ceph/index.html&#39; -&gt; &#39;s3://wiki/ceph/index.html&#39;  [3 of 97]


 34195 of 34195   100% in    0s   769.35 KB/s
 34195 of 34195   100% in    0s   101.93 KB/s  done

upload: &#39;./public/ceph/index.xml&#39; -&gt; &#39;s3://wiki/ceph/index.xml&#39;  [4 of 97]
</code></pre><p>The pattern here seemed to be: Initially, the uploads work for a very short while,
and then they stop working. And at some later point, the transmission works
again.</p>
<h2 id="setting-up-an-iperf3-pod">Setting up an iperf3 Pod</h2>
<p>I wasn&rsquo;t able to make anything of the log output, so I built myself a test setup with
an iperf3 pod in the k8s cluster, made available via a LoadBalancer service
similar to how my ingress is made available.</p>
<p>As the basis, I&rsquo;m using the <a href="https://github.com/wbitt/Network-MultiTool">network-multitool</a>
container, in the <code>:extra</code> variant. I&rsquo;m launching the Pod via this Deployment:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">network-multitool</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">app</span>: <span style="color:#ae81ff">network-multitool</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">app</span>: <span style="color:#ae81ff">network-multitool</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">network-multitool</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">wbitt/network-multitool:extra</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">command</span>: [<span style="color:#e6db74">&#34;iperf3&#34;</span>]
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-p&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;55343&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-s&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">http-port</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: <span style="color:#ae81ff">8080</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">iperf-port</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: <span style="color:#ae81ff">55343</span>
</span></span></code></pre></div><p>By default, the container runs a simple webserver. I&rsquo;m changing that here to
running an iperf3 instance in server mode.
In addition, I&rsquo;ve created the following Service to make the iperf3 server
externally available:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Service</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">iperf</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">app</span>: <span style="color:#ae81ff">network-multitool</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/public-service</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/hostname</span>: <span style="color:#ae81ff">iperf-k8s.example.com</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">io.cilium/lb-ipam-ips</span>: <span style="color:#ae81ff">10.86.55.12</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">LoadBalancer</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">externalTrafficPolicy</span>: <span style="color:#ae81ff">Local</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">app</span>: <span style="color:#ae81ff">network-multitool</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">iperf-port</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">port</span>: <span style="color:#ae81ff">55343</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targetPort</span>: <span style="color:#ae81ff">55343</span>
</span></span></code></pre></div><p>Once that service is created, Cilium will announce a route to the IP <code>10.86.55.12</code>
with the k8s node currently running the iperf3 Pod as the next hop. This
route will be used by my OPNsense box. As the <code>10.86.55.0/24</code> subnet is not the
same as my Homelab VLAN&rsquo;s subnet, all traffic to the iperf3 instance will go
through the OPNsense box.</p>
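<p>For reference, the tests below can be reproduced from any client with an
iperf3 invocation along these lines. The hostname is the one from the
external-dns annotation above, and <code>-t 60</code> sets the 60 second test
duration:</p>
<pre tabindex="0"><code>iperf3 -c iperf-k8s.example.com -p 55343 -t 60
</code></pre>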
<p>I started with a simple test, running from a different host inside the Homelab
VLAN.
<figure>
    <img loading="lazy" src="iperf3-homelab-net.png"
         alt="A screenshot of an iperf3 session. The session ran for a total of 60 seconds. During the first 5 seconds, there was a bit of traffic, with a bitrate of 519 kbit/s and a total of 317 kB transferred. Then, for the time from 5s to 50s, absolutely no traffic happened. Both transfers and bitrates are zero. At 50 - 55 seconds, a transfer of 91 MB at a bitrate of 153 Mbit/s is registered. In the final interval, 55 seconds to 60 seconds, 538 MB are transferred with a bitrate of 903 Mbit/s. The final tally over the whole 60 seconds is a bitrate of 88 Mbit/s and a total amount transferred of 639 MB."/> <figcaption>
            <p>Transfer from a Homelab host as the iperf3 client.</p>
        </figcaption>
</figure>
</p>
<p>This test showed a similar behavior. Initially, the transmission works,
but then it just stops, for about 45 seconds in this case. Then, rather
suddenly, the transmission picks up again. At least I was now able to
reproduce the problem at will.
I then ran a second test, this time from my workstation, which sits in a
different VLAN:
<figure>
    <img loading="lazy" src="iperf3-desktop.png"
         alt="A screenshot of an iperf3 session. The session ran for a total of 60 seconds. This one shows relatively consistent bitrates of about 920 Mbit/s and around 540 MB transferred per 5 second internal. In total, 6.36 GB were transferred, at an average rate of 911 Mbit/s."/> <figcaption>
            <p>Same iperf3 server in a k8s Pod, but with my workstation, from my management VLAN, showing the expected (almost) line speed.</p>
        </figcaption>
</figure>
</p>
<p>So, from the management VLAN, I get full line speed and no weird gaps in the
transmission. There are two differences here: First, the Homelab test never left
that VLAN, with the node hosting the iperf3 Pod being in the same subnet as the
client. The second difference, and as it turns out the more relevant one, is
that the management VLAN has very few firewall rules, while the Homelab is
nailed pretty much shut.</p>
<p>So I went investigating some more. As seems to be the case way too often in this
Kubernetes migration, it was Wireshark o&rsquo;clock again.</p>
<p>The initial packet captures, from both the iperf3 Pod and the client on another
host, showed exactly what I was expecting: a big hole of about 45 seconds in
the traffic that I couldn&rsquo;t explain, before the traffic started up again.</p>
<p>I also gathered some data on my router, specifically on the interface
which leads to my Homelab. And it showed something interesting.</p>
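<p>One way to gather such a capture is running tcpdump on the router itself.
A minimal sketch, with <code>vlan0_homelab</code> being a stand-in for the
actual Homelab-facing interface name:</p>
<pre tabindex="0"><code>tcpdump -i vlan0_homelab -w homelab-iperf.pcap host 10.86.55.12
</code></pre>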
<p>First, here is the initial, successful transmission:</p>
<pre tabindex="0"><code>113	0.058354	10.86.5.125	10.86.55.12	TCP	1434	55643 → 55343 [ACK] Seq=62966 Ack=1 Win=64256 Len=1368 TSval=115511759 TSecr=1303863917
</code></pre><p>Then, and this is the important point, comes a second transmission, seemingly
of the same packet, which is marked by Wireshark as a TCP re-transmission:</p>
<pre tabindex="0"><code>114	0.058357	10.86.5.125	10.86.55.12	TCP	1434	[TCP Retransmission] 55643 → 55343 [ACK] Seq=62966 Ack=1 Win=64256 Len=1368 TSval=115511759 TSecr=1303863917
</code></pre><p>What this actually is becomes obvious when looking at the L2 source MAC address.
The first packet is arriving from the MAC that belongs to the Pi I was running
the iperf3 client on, with the target MAC being the interface for Homelab VLAN
traffic on my OPNsense box.
The second packet, though, was sent out with the OPNsense box&rsquo;s MAC as the source,
and the MAC of the next-hop host for the LoadBalancer IP as the target.</p>
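<p>Comparing the L2 addresses over many packets is easier outside the Wireshark
GUI. A small tshark sketch that pulls the relevant fields out of a capture file
like the hypothetical one above, filtered on the iperf3 server port:</p>
<pre tabindex="0"><code>tshark -r homelab-iperf.pcap -Y &#34;tcp.port == 55343&#34; \
  -T fields -e frame.number -e eth.src -e eth.dst -e ip.src -e ip.dst
</code></pre>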
<p>And now comes the interesting part. After some successful transmissions, the
following happens:</p>
<pre tabindex="0"><code>158	0.059077	10.86.5.125	10.86.55.12	TCP	1434	55643 → 55343 [ACK] Seq=121790 Ack=1 Win=64256 Len=1368 TSval=115511759 TSecr=1303863918
[...]
165	0.059217	10.86.5.125	10.86.55.12	TCP	1434	55643 → 55343 [ACK] Seq=131366 Ack=1 Win=64256 Len=1368 TSval=115511759 TSecr=1303863918
[...]
175	54.149140	10.86.5.125	10.86.55.12	TCP	1434	[TCP Retransmission] 55643 → 55343 [ACK] Seq=65702 Ack=1 Win=64256 Len=1368 TSval=115565850 TSecr=1303863918
</code></pre><p>All of these packets, right up to the last one in frame 175, are coming
from the Raspberry Pi serving as the client. At the same time, I&rsquo;m not seeing any
packets at all coming out of the OPNsense box, like the second packet in the
previous capture. It looks like the firewall just blackholes the packets,
or as if routing temporarily fails.
And then it starts working again, without me actually doing anything:</p>
<pre tabindex="0"><code>176	54.149156	10.86.5.125	10.86.55.12	TCP	1434	[TCP Retransmission] 55643 → 55343 [ACK] Seq=65702 Ack=1 Win=64256 Len=1368 TSval=115565850 TSecr=1303863918
177	54.150365	10.86.5.125	10.86.55.12	TCP	1434	55643 → 55343 [ACK] Seq=135470 Ack=1 Win=64256 Len=1368 TSval=115565851 TSecr=1303918009
178	54.150400	10.86.5.125	10.86.55.12	TCP	1434	[TCP Retransmission] 55643 → 55343 [ACK] Seq=135470 Ack=1 Win=64256 Len=1368 TSval=115565851 TSecr=1303918009
</code></pre><p>Here the behavior is the same as in the beginning: The packet arrives with the Pi
as the source MAC and then leaves again with the firewall&rsquo;s MAC as the source.</p>
<p>And I&rsquo;m still not sure what this is all about - the suspicious, about 45 second
interval where no packets are routed.
So I will call my solution a workaround, and not a fix - because I might have
just fought a symptom, instead of the root problem.</p>
<p>The workaround was to create another firewall rule, allowing access from the
Homelab VLAN to the IP of the iperf LoadBalancer. That must sound weird, but in
my initial configuration I had only allowed the Homelab access to specific other
machines on specific ports, and I normally only have inbound rules for most
VLANs besides the IoT VLAN.</p>
<p>What I did in OPNsense was to create an OUT rule, allowing access from the
Homelab VLAN to the IP of the iperf LoadBalancer service. And all of a sudden,
it all started working.</p>
<p>What&rsquo;s annoying me is that I have no explanation at all for this kind of behavior.
I mean sure, I think I understand why the firewall would block the packet when
it tries to leave the OPNsense box in the direction of my Homelab. But - why
does the iperf transmission start to work, all of a sudden? And why does it
work at the very beginning of the transmission? That&rsquo;s what I don&rsquo;t get. If the
missing firewall rule was the root cause, shouldn&rsquo;t it not work at all - instead
of just not work for 45 seconds in the middle of a connection?</p>
<p>And I&rsquo;ve also tried longer transmissions, e.g. 2 minutes instead of one. And
here I saw the same pattern. First the couple of successful packets, then a
45 second hole, and then it worked for the entire remaining 1 minute of the
test.</p>
<p>If any of my readers has any idea what&rsquo;s going on here, why I need the
firewall rule, and why only some part of the iperf transmission was blocked,
I would be very happy to hear about it on <a href="https://social.mei-home.net/@mmeier">Mastodon</a>.</p>
<h2 id="conclusion">Conclusion</h2>
<p>To summarize, when setting up Cilium BGP within a pretty restricted OPNsense
firewall environment, check whether you&rsquo;ve got asymmetric routing going on.
If so, set the state tracking for the rule allowing access to the LoadBalancer
IP to <code>sloppy state</code>. In addition, add an outgoing rule on the VLAN of the
next hop advertised in the route to the LoadBalancer IP to make sure packets
don&rsquo;t get randomly dropped.</p>
<p>Finally, I&rsquo;m sadly still not sure what exactly is going on here.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 5: Non-service S3 Buckets</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-5-s3-buckets/</link>
      <pubDate>Thu, 25 Jan 2024 20:50:41 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-5-s3-buckets/</guid>
      <description>Migrating non-service S3 buckets over to the Ceph Rook cluster</description>
      <content:encoded><![CDATA[<p>Wherein I document how I migrated some S3 buckets over to the Ceph Rook cluster
and with that, made it load-bearing.</p>
<p>This is part five of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>So why write a post about migrating S3 buckets, and why do it at this point of
the Nomad -&gt; k8s migration? In short, it just fit in here very well. I already
planned to make Ceph Rook one of the first services to set up anyway. And then
the logical next step is to have a look at what I can then migrate over without
any other dependencies. And the answer to that was: Some <em>non-service</em> S3 buckets.
With &ldquo;non-service&rdquo; I mean those buckets which are not directly tied to specific
services running on the cluster, like Mastodon&rsquo;s media files bucket or Loki&rsquo;s
log storage bucket. Those I will migrate over with their respective services.</p>
<p>Instead, the buckets I&rsquo;m migrating over are things like my blog and wiki buckets.
Those run on Hugo and have been served directly by my Traefik proxy from S3
buckets. So with the <a href="https://blog.mei-home.net/posts/k8s-migration-3-traefik-ingress/">previous Traefik ingress setup</a>
and Ceph Rook being set up, I had all the dependencies in place.</p>
<p>The final reason to do it right now is that I wanted to make the cluster
<em>load-bearing</em> ASAP. A little bit of that was to prevent myself from getting
into <em>too much</em> experimentation. Let&rsquo;s see whether that is going to pan out. &#x1f605;</p>
<h1 id="previous-setup-and-advantages-of-the-new-one">Previous setup and advantages of the new one</h1>
<p>Before getting into the S3 bucket setup with Ceph Rook and Ansible, let me
talk briefly about how the current setup on my baremetal Ceph cluster worked.</p>
<p>In one word: <em>Manually</em></p>
<p>So what&rsquo;s needed to create an S3 bucket and a new user and to configure that
bucket, manually?</p>
<p>Let&rsquo;s start with the user creation:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>radosgw-admin user create --uid<span style="color:#f92672">=</span>USERNAME --display-name <span style="color:#e6db74">&#34;Description here&#34;</span>
</span></span></code></pre></div><p>This will output the new user&rsquo;s access ID and secret key. To make the credentials
usable by Nomad jobs, they also need to be written into Vault:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span> vault kv put secret/some/path id<span style="color:#f92672">=</span>ID key<span style="color:#f92672">=</span>-
</span></span></code></pre></div><p>The <code>key=-</code> makes Vault read the secret key from stdin, instead of taking it on the command line. As my internal docs say:</p>
<blockquote>
<p>NOTE THE SPACE AT THE BEGINNING OF THE LINE!
That&rsquo;s to prevent even the access ID from finding its way into your history.</p></blockquote>
<p>I&rsquo;d then use <a href="https://min.io/docs/minio/linux/reference/minio-mc.html">MinIO&rsquo;s S3 client</a>
to create the bucket:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>mc alias set s3-SERVICENAME https://s3.example.com
</span></span><span style="display:flex;"><span>./mc mb s3-alias/BUCKETNAME
</span></span></code></pre></div><p>I&rsquo;m using the MinIO client mostly because I like the interface, although I don&rsquo;t
use MinIO itself.</p>
<p>That creates a bucket which can only be accessed with the previously created
credentials.
To provide a full bucket policy, I&rsquo;ve got to switch to a different tool,
namely <a href="https://s3tools.org/s3cmd">s3cmd</a>, as the MinIO client does not
support setting full custom bucket policies.</p>
<p>So I&rsquo;m then putting the credentials in a second place, for use with <code>s3cmd</code>,
and then create a JSON file for the policy to finally upload it with a command
like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>s3cmd -c .s3cmd.conf setpolicy policy.json s3://BUCKETNAME
</span></span></code></pre></div><p>All of the previous commands need to be entered in the right order and the right
format, and for the right bucket with the right credentials. Lots of places for
user error.</p>
<p>And that&rsquo;s the main thing I&rsquo;m gaining from the new approach with Ceph Rook and
Ansible: Declarative creation of users, buckets and policies. This has the added
bonus of finally being able to version-control the S3 bucket setup.</p>
<h1 id="creating-users-buckets-and-policies-declaratively">Creating users, buckets and policies declaratively</h1>
<p>There are broadly three pieces to creating a bucket with my new approach:</p>
<ol>
<li>Create the S3 user in Ceph</li>
<li>Write the credentials into Vault</li>
<li>Use those credentials in Ansible to create the bucket and set policies</li>
</ol>
<p>Before I continue, there&rsquo;s one important note: Rook has an <a href="https://rook.io/docs/rook/latest-release/Storage-Configuration/Object-Storage-RGW/ceph-object-bucket-claim/">Object Bucket Claim</a>.
This CRD can be used to create buckets together with S3 credentials for that
bucket in the form of a Secret. I will use this CRD later on, when I&rsquo;m migrating
actual services, to create their individual S3 buckets. And this is exactly
what those bucket claims are intended for. But for the buckets I&rsquo;m migrating here,
I need access to them outside Kubernetes, and I need to do things like
setting bucket policies to allow access for multiple users. The Object Bucket
Claim can do neither of those things, so I would have had to fall back to manual
steps anyway, defeating the purpose of creating everything declaratively.</p>
<p>Also worth mentioning is <a href="https://container-object-storage-interface.github.io/">COSI</a>,
the <em>Container Object Storage Interface</em>. This is similar to the CSI, a
provider-agnostic way to provide object storage buckets. But it&rsquo;s currently still
experimental, both in Kubernetes and in Rook.</p>
<p>With that out of the way, let&rsquo;s create an S3 user in Rook. This is done with
the <a href="https://rook.io/docs/rook/latest-release/CRDs/Object-Storage/ceph-object-store-user-crd/">CephObjectStoreUser</a> CRD. It might look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">ceph.rook.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CephObjectStoreUser</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-user</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">rook-cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">store</span>: <span style="color:#ae81ff">rgw-bulk</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">clusterNamespace</span>: <span style="color:#ae81ff">rook-cluster</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">displayName</span>: <span style="color:#e6db74">&#34;A user for demo purposes&#34;</span>
</span></span></code></pre></div><p>When applying this manifest, Rook will create a user named <code>my-user</code> and
automatically create a Secret with the user&rsquo;s credentials. This secret will
be stored in the given namespace, <code>rook-cluster</code> in this case. Note that by
default, Rook only allows creation of <code>CephObjectStoreUser</code> objects in the
namespace of the Rook operator itself. This can be overridden during creation
of the <code>CephObjectStore</code> in the cluster Helm chart, but it seems to be a prudent
measure to only allow those who can write into the cluster namespace to
actually create users.</p>
<p>The name of the secret for the example above will be <code>rook-ceph-object-user-rgw-bulk-my-user</code>.
The first part is the Rook operator namespace name (note that this is <em>not</em> the
cluster namespace necessarily, but the <em>operator&rsquo;s</em> NS). Then follows the string
<code>ceph-object-user</code> followed by the name of the <code>CephObjectStore</code> the user is
going to be created in. The last part is the username itself.</p>
<p>The secret will have the following <code>data:</code> section:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">AccessKey</span>: <span style="color:#ae81ff">ABCDE</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">Endpoint</span>: <span style="color:#ae81ff">s3.example.com:4711</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">SecretKey</span>: <span style="color:#ae81ff">FGHIJ</span>
</span></span></code></pre></div><p>So it will contain all the info necessary. Also always remember that data is
encoded in base64. So when extracting the credentials for use in other apps,
always push the strings through <code>base64 --decode</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl get -n rook-cluster secrets rook-ceph-object-user-rgw-bulk-my-user --template<span style="color:#f92672">={{</span>.data.AccessKey<span style="color:#f92672">}}</span> | base64 -d
</span></span><span style="display:flex;"><span>kubectl get -n rook-cluster secrets rook-ceph-object-user-rgw-bulk-my-user --template<span style="color:#f92672">={{</span>.data.SecretKey<span style="color:#f92672">}}</span> | base64 -d
</span></span></code></pre></div><p>But of course, a declarative setup isn&rsquo;t worth very much if I then have to
manually push the credentials to Vault as in my old workflow. Instead, I will
be using external-secrets&rsquo; <a href="https://external-secrets.io/latest/api/pushsecret/">PushSecret</a>.
PushSecrets allow me to push Secrets from Kubernetes to a provider, in a reversal
of the ExternalSecret. In this instance, I&rsquo;m using them to push the S3 credentials
created by Rook to my Vault instance, for use in Ansible for the bucket creation.</p>
<p>The first step is to update the Vault policy of the AppRole used by
external-secrets, allowing it to not only read, but also write secrets:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">path</span> <span style="color:#e6db74">&#34;secret/my_kubernetes_secrets/cluster/s3/users/*&#34;</span> {
</span></span><span style="display:flex;"><span>  capabilities <span style="color:#f92672">=</span> [ <span style="color:#e6db74">&#34;read&#34;, &#34;create&#34;, &#34;update&#34;</span> ]
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>This allows the AppRole to push secrets, but only to a specific path.</p>
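<p>For completeness: assuming the HCL above is saved as
<code>external-secrets-policy.hcl</code> and the policy is called
<code>external-secrets</code> (both names are just examples), updating it is a
single command:</p>
<pre tabindex="0"><code>vault policy write external-secrets external-secrets-policy.hcl
</code></pre>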
<p>The PushSecret itself then looks like this, again using the credentials of the
previously created user as an example:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">PushSecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">s3-my-user</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">rook-cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">deletionPolicy</span>: <span style="color:#ae81ff">Delete</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">refreshInterval</span>: <span style="color:#ae81ff">30m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">secretStoreRefs</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-vault-store</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ClusterSecretStore</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">secret</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>:  <span style="color:#ae81ff">rook-ceph-object-user-rgw-bulk-my-user</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">match</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">AccessKey</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">remoteRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">remoteKey</span>: <span style="color:#ae81ff">secret/my_kubernetes_secrets/cluster/s3/users/my-user</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">property</span>: <span style="color:#ae81ff">access</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">match</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">SecretKey</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">remoteRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">remoteKey</span>: <span style="color:#ae81ff">secret/my_kubernetes_secrets/cluster/s3/users/my-user</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">property</span>: <span style="color:#ae81ff">secret</span>
</span></span></code></pre></div><p>Here again, for security reasons, the PushSecret needs to be in the same
namespace as the Secret it is pushing out to the provider. The <code>deletionPolicy</code>
defines what happens when the PushSecret is deleted. With <code>Delete</code>, the secret
in the secret store will also be removed. With <code>Retain</code>, the secret will be kept.</p>
<p>The <code>selector</code> selects the secret to be pushed, while <code>data:</code> defines what
actually gets pushed. With the config here and considering the Secret format
for S3 credentials created by Rook I showed above, the secret in Vault would
have the following format:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;request_id&#34;</span>: <span style="color:#e6db74">&#34;foo&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;lease_id&#34;</span>: <span style="color:#e6db74">&#34;&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;lease_duration&#34;</span>: <span style="color:#ae81ff">2764800</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;renewable&#34;</span>: <span style="color:#66d9ef">false</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;data&#34;</span>: {
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;access&#34;</span>: <span style="color:#e6db74">&#34;ABCDE&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;custom_metadata&#34;</span>: {
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;managed-by&#34;</span>: <span style="color:#e6db74">&#34;external-secrets&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;secret&#34;</span>: <span style="color:#e6db74">&#34;FGHIJ&#34;</span>
</span></span><span style="display:flex;"><span>  },
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;warnings&#34;</span>: <span style="color:#66d9ef">null</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>I&rsquo;m not pushing the <code>Endpoint</code> from the original secret, as that&rsquo;s not going to
change.</p>
<p>And this is part two done: the S3 credentials are now available to Ansible via
Vault. Now the final part, actually creating the buckets.</p>
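<p>A quick way to verify that the PushSecret actually did its job is to read the
path back with the Vault CLI, which should show the <code>access</code> and
<code>secret</code> properties from above:</p>
<pre tabindex="0"><code>vault kv get secret/my_kubernetes_secrets/cluster/s3/users/my-user
</code></pre>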
<p>I&rsquo;m using Ansible&rsquo;s <a href="https://docs.ansible.com/ansible/latest/collections/amazon/aws/s3_bucket_module.html">s3_bucket</a>
module to create my buckets. Compared to using Rook&rsquo;s OBC, this also allows me
to add a policy. Here is an example play:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">hosts</span>: <span style="color:#ae81ff">command_and_control_host</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Play for creating the my-bucket bucket</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">example</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">vars</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3_access</span>: <span style="color:#e6db74">&#34;{{ lookup(&#39;hashi_vault&#39;, &#39;secret=secret/my_kubernetes_secrets/cluster/s3/users/my-user:access token=&#39;+vault_token+&#39; url=&#39;+vault_url) }}&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3_secret</span>: <span style="color:#e6db74">&#34;{{ lookup(&#39;hashi_vault&#39;, &#39;secret=secret/my_kubernetes_secrets/cluster/s3/users/my-user:secret token=&#39;+vault_token+&#39; url=&#39;+vault_url) }}&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Create my-bucket bucket</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">example</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">amazon.aws.s3_bucket</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-bucket</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">access_key</span>: <span style="color:#e6db74">&#34;{{ s3_access }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secret_key</span>: <span style="color:#e6db74">&#34;{{ s3_secret }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">ceph</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">endpoint_url</span>: <span style="color:#ae81ff">https://s3.example.com</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">policy</span>: <span style="color:#e6db74">&#34;{{ lookup(&#39;file&#39;,&#39;bucket-policies/my-bucket.json&#39;) }}&#34;</span>
</span></span></code></pre></div><p>I&rsquo;m reading the access ID and secret key for S3 access from Vault into Ansible
variables because I&rsquo;ve got a single &ldquo;s3-buckets&rdquo; playbook, creating different
buckets with different users, so using the <code>AWS_*</code> env variables doesn&rsquo;t work.
The example will create a bucket with the credentials of the <code>my-user</code> user,
called <code>my-bucket</code> on the Ceph S3 server reachable via <code>s3.example.com</code>.
The <code>policy</code> option only accepts a JSON string, not a filename, hence the use
of the <a href="https://docs.ansible.com/ansible/latest/collections/ansible/builtin/file_lookup.html">file lookup</a>.
A policy for a bucket with public read, like the ones I&rsquo;m using for my docs,
would look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;Version&#34;</span>: <span style="color:#e6db74">&#34;2012-10-17&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;Statement&#34;</span>: [
</span></span><span style="display:flex;"><span>        {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;Action&#34;</span>: [
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;s3:ListBucket&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;s3:GetBucketLocation&#34;</span>
</span></span><span style="display:flex;"><span>            ],
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;Effect&#34;</span>: <span style="color:#e6db74">&#34;Allow&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;Resource&#34;</span>: [
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;arn:aws:s3:::my-bucket&#34;</span>
</span></span><span style="display:flex;"><span>            ],
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;Principal&#34;</span>: {
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;AWS&#34;</span>: [
</span></span><span style="display:flex;"><span>                    <span style="color:#e6db74">&#34;*&#34;</span>
</span></span><span style="display:flex;"><span>                ]
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;Action&#34;</span>: [
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;s3:GetObject&#34;</span>
</span></span><span style="display:flex;"><span>            ],
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;Effect&#34;</span>: <span style="color:#e6db74">&#34;Allow&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;Resource&#34;</span>: [
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;arn:aws:s3:::my-bucket/*&#34;</span>
</span></span><span style="display:flex;"><span>            ],
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;Principal&#34;</span>: {
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;AWS&#34;</span>: [
</span></span><span style="display:flex;"><span>                    <span style="color:#e6db74">&#34;*&#34;</span>
</span></span><span style="display:flex;"><span>                ]
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>    ]
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>So that&rsquo;s it. With three steps, I&rsquo;ve created a bucket with a policy, and all of
it is under version control. I only need to remember the following three
commands:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span><span style="color:#75715e"># Deploy the User manifest to k8s</span>
</span></span><span style="display:flex;"><span>kubectl apply -f my-user-manifest.yaml
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Deploy the PushSecret manifest to k8s</span>
</span></span><span style="display:flex;"><span>kubectl apply -f my-push-secret-manifest.yaml
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Run the Ansible playbook</span>
</span></span><span style="display:flex;"><span>ansible-playbook s3-buckets.yaml
</span></span></code></pre></div><p>Besides some filenames, they&rsquo;re going to be the same regardless of which bucket
I&rsquo;m creating. Way nicer than having to remember the <code>radosgw-admin</code>, <code>mc</code> MinIO
client, <code>vault</code> and <code>s3cmd</code> incantations I showed in the previous section.</p>
<h1 id="migrating-backup-buckets">Migrating backup buckets</h1>
<p>So let&rsquo;s get to actually migrating some buckets. The first set I worked on were
the S3 buckets for my backups. I will keep the description of the actual backup
procedure short - first, this isn&rsquo;t an article about backups, and second, mine
has so many warts that I&rsquo;m a bit embarrassed. &#x1f609;</p>
<p>My backups currently have two stages. First, I&rsquo;m using <a href="https://restic.net/">restic</a>
to back up the volumes of all of my services. There&rsquo;s one bucket per service,
and the backup runs nightly. In addition, I&rsquo;m backing up my <code>/home</code> on my desktop
and laptop. The second stage is backing up all of those buckets onto an external
HDD connected to one of my nodes, using <a href="https://rclone.org/">rclone</a>.</p>
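<p>To make the two stages a bit more concrete, here is a rough sketch of both.
The bucket name matches the examples below, but the local paths are placeholders,
<code>cephs3</code> stands in for the rclone remote pointing at the cluster, and
the restic repository password and S3 credentials are assumed to come from the
environment:</p>
<pre tabindex="0"><code># Stage 1: nightly restic backup of a service volume into its bucket
restic -r s3:https://s3.example.com/backup-mail backup /path/to/service-volume
# Stage 2: mirror the bucket onto the external HDD
rclone sync cephs3:backup-mail /hn-data/usb-mount/buckets/backup-mail
</code></pre>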
<p>The only &ldquo;special&rdquo; thing about these backup buckets is that they need access for
more than one user. There&rsquo;s the restic backup user running the per-service
backups. This user needs read and write access to every bucket. Then there&rsquo;s the
external backup user, which only needs read access to the backups.
The S3 bucket policy for those buckets looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;Version&#34;</span>: <span style="color:#e6db74">&#34;2012-10-17&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;Statement&#34;</span>: [
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Action&#34;</span>: [
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:GetObject&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:DeleteObject&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:PutObject&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:ListBucket&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:GetBucketLocation&#34;</span>
</span></span><span style="display:flex;"><span>      ],
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Effect&#34;</span>: <span style="color:#e6db74">&#34;Allow&#34;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Resource&#34;</span>: [
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;arn:aws:s3:::srv-name/*&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;arn:aws:s3:::srv-name&#34;</span>
</span></span><span style="display:flex;"><span>      ],
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Principal&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;AWS&#34;</span>: [
</span></span><span style="display:flex;"><span>          <span style="color:#e6db74">&#34;arn:aws:iam:::user/service-backup-user&#34;</span>
</span></span><span style="display:flex;"><span>        ]
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Action&#34;</span>: [
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:GetObject&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:ListBucket&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:GetBucketLocation&#34;</span>
</span></span><span style="display:flex;"><span>      ],
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Effect&#34;</span>: <span style="color:#e6db74">&#34;Allow&#34;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Resource&#34;</span>: [
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;arn:aws:s3:::srv-name/*&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;arn:aws:s3:::srv-name&#34;</span>
</span></span><span style="display:flex;"><span>      ],
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Principal&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;AWS&#34;</span>: [
</span></span><span style="display:flex;"><span>          <span style="color:#e6db74">&#34;arn:aws:iam:::user/external-backup-user&#34;</span>
</span></span><span style="display:flex;"><span>        ]
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  ]
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>This policy is of course specific to my setup with restic and rclone. Other
S3-capable backup tools might need additional or fewer permissions on the
buckets.</p>
<p>I then just copied the buckets from the old cluster to the new cluster:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>mc cp -a --recursive old-cluster-alias/my-bucket/ new-cluster-alias/my-bucket/
</span></span></code></pre></div><p>I will show a couple of metrics on the transfer speeds and so on in the later
<a href="#metrics">Metrics section</a>.</p>
<h2 id="problems-with-mismatches-between-files-and-their-hashes">Problems with mismatches between files and their hashes</h2>
<p>During the migration of my backup buckets, I hit a pretty frustrating problem
which cost me a lot of time to analyze. During the copying of the buckets with
<code>mc</code> as well as during the initial services backup run with restic, everything
looked fine.
Then I migrated over the external disk backup, and rclone suddenly started
throwing errors like this:</p>
<pre tabindex="0"><code>&#34;Failed to sync with 4 errors: last error was: corrupted on transfer: md5 hash differ \&#34;95110ecafcd3c5f37c29fa9dd8157cce\&#34; vs \&#34;fe25b63800d5d9cab3174297fc8480ce\&#34;&#34;
&#34;ERROR : Attempt 3/3 failed with 4 errors and: corrupted on transfer: md5 hash differ \&#34;95110ecafcd3c5f37c29fa9dd8157cce\&#34; vs \&#34;fe25b63800d5d9cab3174297fc8480ce\&#34;&#34;
&#34;ERROR : Local file system at /hn-data/usb-mount/buckets/backup-mail: not deleting directories as there were IO errors&#34;
&#34;ERROR : Local file system at /hn-data/usb-mount/buckets/backup-mail: not deleting files as there were IO errors&#34;
&#34;ERROR : data/9e/9ea8a2f41ef73cb02ea0c4076c907210f814c26d92d22a5e59fafa1821c1f356.xabetij7.partial: corrupted on transfer: md5 hash differ \&#34;95110ecafcd3c5f37c29fa9dd8157cce\&#34; vs \&#34;fe25b63800d5d9cab3174297fc8480ce\&#34;&#34;
&#34;ERROR : data/51/519f791addd43bbb94b9edc9b0bf1bb7608a0736fd76b97fa126867b7aa5acc2.homotep1.partial: corrupted on transfer: md5 hash differ \&#34;0508dafb993525a6579d84cb8172c954\&#34; vs \&#34;24533a2effbf7d84b799d811f14e1dd3\&#34;&#34;
&#34;ERROR : data/4a/4afea197e24fb5136beae05e5f86003cebf37e9b0d8cc020248307727c9fef93.gusidof8.partial: corrupted on transfer: md5 hash differ \&#34;647135d3de83dd64e398026c6cc8a1dd\&#34; vs \&#34;9eb1ded016386b727c35b29df58afe80\&#34;&#34;
&#34;ERROR : data/dd/dd6c50ae7e5b26aede0726128dc2d0f113dc896ffcabadd78b0e44bdd48226f8.hagoqic7.partial: corrupted on transfer: md5 hash differ \&#34;778f53d5a2987ebc060a9fce0b476613\&#34; vs \&#34;a7b482f2a64b634558587dc3f3518c39\&#34;&#34;
&#34;ERROR : Attempt 2/3 failed with 4 errors and: corrupted on transfer: md5 hash differ \&#34;95110ecafcd3c5f37c29fa9dd8157cce\&#34; vs \&#34;fe25b63800d5d9cab3174297fc8480ce\&#34;&#34;
&#34;ERROR : Local file system at /hn-data/usb-mount/buckets/backup-mail: not deleting directories as there were IO errors&#34;
&#34;ERROR : Local file system at /hn-data/usb-mount/buckets/backup-mail: not deleting files as there were IO errors&#34;
&#34;ERROR : data/9e/9ea8a2f41ef73cb02ea0c4076c907210f814c26d92d22a5e59fafa1821c1f356.rewijes9.partial: corrupted on transfer: md5 hash differ \&#34;95110ecafcd3c5f37c29fa9dd8157cce\&#34; vs \&#34;fe25b63800d5d9cab3174297fc8480ce\&#34;&#34;
&#34;ERROR : data/dd/dd6c50ae7e5b26aede0726128dc2d0f113dc896ffcabadd78b0e44bdd48226f8.sixuwuf0.partial: corrupted on transfer: md5 hash differ \&#34;778f53d5a2987ebc060a9fce0b476613\&#34; vs \&#34;a7b482f2a64b634558587dc3f3518c39\&#34;&#34;
&#34;ERROR : data/4a/4afea197e24fb5136beae05e5f86003cebf37e9b0d8cc020248307727c9fef93.filorut1.partial: corrupted on transfer: md5 hash differ \&#34;647135d3de83dd64e398026c6cc8a1dd\&#34; vs \&#34;9eb1ded016386b727c35b29df58afe80\&#34;&#34;
&#34;ERROR : data/51/519f791addd43bbb94b9edc9b0bf1bb7608a0736fd76b97fa126867b7aa5acc2.doyahiy9.partial: corrupted on transfer: md5 hash differ \&#34;0508dafb993525a6579d84cb8172c954\&#34; vs \&#34;24533a2effbf7d84b799d811f14e1dd3\&#34;&#34;
&#34;ERROR : Attempt 1/3 failed with 4 errors and: corrupted on transfer: md5 hash differ \&#34;95110ecafcd3c5f37c29fa9dd8157cce\&#34; vs \&#34;fe25b63800d5d9cab3174297fc8480ce\&#34;&#34;
&#34;ERROR : Local file system at /hn-data/usb-mount/buckets/backup-mail: not deleting directories as there were IO errors&#34;
&#34;ERROR : Local file system at /hn-data/usb-mount/buckets/backup-mail: not deleting files as there were IO errors&#34;
&#34;ERROR : data/9e/9ea8a2f41ef73cb02ea0c4076c907210f814c26d92d22a5e59fafa1821c1f356.sosepip8.partial: corrupted on transfer: md5 hash differ \&#34;95110ecafcd3c5f37c29fa9dd8157cce\&#34; vs \&#34;fe25b63800d5d9cab3174297fc8480ce\&#34;&#34;
&#34;ERROR : data/dd/dd6c50ae7e5b26aede0726128dc2d0f113dc896ffcabadd78b0e44bdd48226f8.dalatiy1.partial: corrupted on transfer: md5 hash differ \&#34;778f53d5a2987ebc060a9fce0b476613\&#34; vs \&#34;a7b482f2a64b634558587dc3f3518c39\&#34;&#34;
&#34;ERROR : data/4a/4afea197e24fb5136beae05e5f86003cebf37e9b0d8cc020248307727c9fef93.midizaw6.partial: corrupted on transfer: md5 hash differ \&#34;647135d3de83dd64e398026c6cc8a1dd\&#34; vs \&#34;9eb1ded016386b727c35b29df58afe80\&#34;&#34;
&#34;ERROR : data/51/519f791addd43bbb94b9edc9b0bf1bb7608a0736fd76b97fa126867b7aa5acc2.vonupuf0.partial: corrupted on transfer: md5 hash differ \&#34;0508dafb993525a6579d84cb8172c954\&#34; vs \&#34;24533a2effbf7d84b799d811f14e1dd3\&#34;&#34;
&#34;NOTICE: data/4a/4afea197e24fb5136beae05e5f86003cebf37e9b0d8cc020248307727c9fef93: Not decompressing &#39;Content-Encoding: gzip&#39; compressed file. Use --s3-decompress to override&#34;
</code></pre><p>The first thing to note here is that the error did not appear for every file, nor
did these errors show up for every bucket. The above example comes from a very
small 350KB bucket with 15 files total. I never saw this same error for my 50GB
<code>/home</code> backup bucket.</p>
<p>After some false starts, I was at least able to verify that the error was right:
the MD5 sum (also called the &ldquo;ETag&rdquo;, e.g. in the <code>mc stat</code> output) did not match the
file. I had no idea what was going wrong. My next test was to create a completely
new copy of one of the buckets, without running a service backup job against it,
to see whether it was restic that corrupted the bucket. But the errors showed
up immediately after syncing. I was also able to reproduce them by doing a local
<code>rclone sync new-ceph-alias:backup-mail</code> on my desktop, so it wasn&rsquo;t some weird
quirk of my backup jobs either.</p>
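<p>That local reproduction was as simple as syncing one of the affected buckets
to an arbitrary scratch directory, roughly like this:</p>
<pre tabindex="0"><code>rclone sync new-ceph-alias:backup-mail /tmp/backup-mail-check
</code></pre>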
<p>For checking the checksum of a file in an S3 bucket, I used s3cmd like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>s3cmd -c ~/.s3cmd-conf-file info s3://backup-mail/filename
</span></span></code></pre></div><p>The output might look like this:</p>
<pre tabindex="0"><code>s3://backup-mail/filename (object):
   File size: 2512
   Last mod:  Wed, 17 Jan 2024 22:31:15 GMT
   MIME type: application/octet-stream
   Storage:   STANDARD
   MD5 sum:   9eacc1551b0e80f38f77443aa33dc0d1
   SSE:       none
</code></pre><p>That was about the point where I got really nervous - were my backups corrupted
without me noticing? So I ran restic&rsquo;s <code>check</code> command on both, the new and
the old buckets:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>restic repair index -r s3:s3.example.com/backup-mail
</span></span></code></pre></div><p>This command came back with &ldquo;no errors&rdquo; on both the old and the new buckets.</p>
<p>I also got pointed in the completely wrong direction once, because I called this
command on one of the problematic files:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>rclone check --download rooks3:backup-mail/my-dir/ cephs3:backup-mail/data/dd/
</span></span><span style="display:flex;"><span>2024/01/18 22:40:53 NOTICE: S3 bucket backup-mail path my-dir: <span style="color:#ae81ff">0</span> differences found
</span></span><span style="display:flex;"><span>2024/01/18 22:40:53 NOTICE: S3 bucket backup-mail path my-dir: <span style="color:#ae81ff">1</span> matching files
</span></span></code></pre></div><p>So, the files were supposedly matching. For now, I was
convinced that the files themselves were perfectly fine, and there was just
something wrong with the MD5 sums.
After some further digging, I found out that restic uses the MinIO client lib
as its S3 backend. And I had also used MinIO&rsquo;s <code>mc</code> client to do the
bucket-to-bucket copying. So I thought: Okay, there&rsquo;s definitely a bug in
MinIO&rsquo;s client lib! Hurray, progress! I was able to confirm this by using
<code>rclone sync</code> to do the bucket-to-bucket copy, and subsequent <code>rclone sync</code> to
local did not fail. But then I got the same <code>rclone sync</code> errors again after
I had run the first restic backup against the new buckets.</p>
<p>This in turn lead me to believe that there was something wrong with restic. But
I couldn&rsquo;t find anything at all on the Internet. It seemed I was the only person
seeing this error. I then updated all my restics, rclones and <code>mc</code>s to the newest
versions.</p>
<p>No dice, still the same error. That was when I started doubting the Ceph Rook
setup and questioning the entire Kubernetes migration.</p>
<p>And then, in a state of utter frustration, I ran the <code>rclone sync</code> again. And
this time, I looked at the actual errors more closely, and for the first time
in this multi-day investigation really took in this line:</p>
<pre tabindex="0"><code>my-file: Not decompressing &#39;Content-Encoding: gzip&#39; compressed file. Use --s3-decompress to override
</code></pre><p>And it hit me like a brick. I&rsquo;m pretty sure I woke up my neighbors with the
facepalm I did. It was my Traefik config. I had enabled the <a href="https://doc.traefik.io/traefik/middlewares/http/compress/">Compression Middleware</a>.
And in my Ceph Rook setup, in contrast to my baremetal setup, Ceph S3 was only
reachable through my Traefik ingress.
After disabling the compression middleware, no MD5 sum problems occurred.</p>
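<p>In hindsight, there is a quick way to check whether a proxy compresses S3
responses behind your back. A sketch with curl against a hypothetical object
URL; the <code>Accept-Encoding</code> header matters, as Traefik only compresses
when the client advertises gzip support:</p>
<pre tabindex="0"><code>curl -s -o /dev/null -D - -H &#34;Accept-Encoding: gzip&#34; \
  https://s3.example.com/backup-mail/config | grep -i content-encoding
</code></pre>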
<p>I have to admit that at this point, the story pretty much ends. I have no complete
explanation of what might be going on here. The error message above suggests
that a compressed file reached the S3 bucket and got stored there - but that
doesn&rsquo;t really make much sense, because from what I see in the docs, the
compression middleware only handles responses and doesn&rsquo;t touch the requests.
One theory that would at least fit the symptoms: the middleware gzips the
<em>download</em> responses on the fly, so the bytes the client receives and
hashes no longer match the MD5/ETag that Ceph stored for the original object.</p>
<p>If anybody has a theory or even better, an actual explanation, I would very
much love to hear it, e.g. via <a href="https://social.mei-home.net/@mmeier">the Fediverse</a>.</p>
<h1 id="migrating-my-hugo-blog-and-wiki">Migrating my Hugo blog and wiki</h1>
<p>Another pair of migrations which might be interesting to some of you were my
blog and my internal docs. Both run on <a href="https://gohugo.io/">Hugo</a>. One of these
days I will actually get around to writing the obligatory &ldquo;How I&rsquo;m running this
blog&rdquo; post, but that day is not (really) today. &#x1f601;</p>
<p>In short, Hugo is a static site generator, fed with Markdown files. I generate
the files in my CI and then push them into an S3 bucket. That bucket is then
directly served via my Traefik proxy.</p>
<p>I&rsquo;m running both, the blog and wiki, via my Traefik ingress in the new k8s
setup. The IngressRoute manifest for the blog is the more interesting one:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">traefik.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">IngressRoute</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">blog</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/hostname</span>: <span style="color:#e6db74">&#34;blog.mei-home.net&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/target</span>: <span style="color:#e6db74">&#34;some-host.none-of-your-bussiness&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">entryPoints</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">dmz</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">routes</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Rule</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">match</span>: <span style="color:#ae81ff">Host(`blog.mei-home.net`)</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">middlewares</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">blog-index</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">blog</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-blogs-bucket</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">blog</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">blog-amz-headers</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">blog</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">services</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Service</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-ceph-rgw-rgw-bulk</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">rook-cluster</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">port</span>: <span style="color:#ae81ff">http</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">scheme</span>: <span style="color:#ae81ff">http</span>
</span></span></code></pre></div><p>At the top, I&rsquo;m setting up DNS for the blog via the external-dns annotations.
This entry is only used by my internal DNS. The target is my internal fortress
host, the only one reachable externally.</p>
<p>Then I specify the entry point as my DMZ entry point, the only port that the
DMZ can reach on the inside.</p>
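<p>For context, such an entry point is defined in Traefik&rsquo;s static
configuration (or the equivalent Helm chart values). Roughly like this, with
the port being an assumption for the example:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"># Static Traefik configuration: a dedicated entry point which is the
# only port the DMZ may reach. The port number is made up.
entryPoints:
  dmz:
    address: ":8443"
</code></pre></div>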
<p>The rule itself is not too interesting, as the meat of the setup is found in
the middlewares. They look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">traefik.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Middleware</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-blogs-bucket</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">addPrefix</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">prefix</span>: <span style="color:#ae81ff">/the-blogs-bucket</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">traefik.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Middleware</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">blog-index</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replacePathRegex</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">(.*)(?:\/$|(\/[^\.\/]*)$)</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">replacement</span>: <span style="color:#ae81ff">${1}${2}/index.html</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">traefik.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Middleware</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">blog-amz-headers</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">headers</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">customResponseHeaders</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">x-amz-meta-s3cmd-attrs</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">x-amz-request-id</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">x-amz-storage-class</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">x-rgw-object-type</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span></code></pre></div><p>The first one, <code>my-blogs-bucket</code>, is a simple rewrite rule, which adds the
bucket&rsquo;s name to the URL right after the root. This turns a URL like
<code>/posts/k8s-migration-5-s3-buckets/</code> into <code>/the-blogs-bucket/posts/k8s-migration-5-s3-buckets/</code>.
But with that, there&rsquo;s still no HTML file. And Traefik, being mainly a proxy,
not a webserver, doesn&rsquo;t have any tricks to automatically add an index file.</p>
<p>This problem is solved by the second Middleware, <code>blog-index</code>. It appends
<code>/index.html</code> to the path, but only if the path ends in a <code>/</code>. Even that is
not enough, because some browsers seem to actively strip the trailing <code>/</code> from
a URL. That&rsquo;s what the second alternative in the regex handles: any path whose
last segment doesn&rsquo;t look like a file, i.e. contains no dot, also gets
<code>/index.html</code> appended, even without the trailing <code>/</code>. So both
<code>/posts/foo/</code> and <code>/posts/foo</code> are rewritten to <code>/posts/foo/index.html</code>,
while a request for e.g. <code>/css/style.css</code> passes through untouched.</p>
<p>The last middleware, <code>blog-amz-headers</code>, just removes some S3 headers which
Ceph&rsquo;s RGW tacks on by default, and which really don&rsquo;t need to leave my network.</p>
<p>And that&rsquo;s it for the migrations. There were a couple of other utility buckets,
but they really aren&rsquo;t that interesting. Instead, let&rsquo;s go for some pretty
plots. &#x1f913;</p>
<h1 id="metrics">Metrics</h1>
<p>The first thing to note is that my transfer speed from a bucket on the old cluster
to the new cluster with <code>mc cp -a --recursive</code> capped out at around 50MiB/s.
This is way below the line speed of my 1 Gbit/s network. Disk IO utilization on
the receiving Ceph hosts was around 80%, with about 55MB/s of writes.</p>
<p>At first, I wasn&rsquo;t able to find the bottleneck. My command and control host,
where I was running the <code>mc cp</code> command, only showed about 400Mbit/s worth of transfers
in either direction. But then I recalled the network path, which looks something
like this:
<figure>
    <img loading="lazy" src="s3-network.svg"
         alt="A network diagram. It shows a switch in the middle. The switch is connected via a dotted orange line and a solid blue line to the Router box. Connected to the switch via another orange dotted line is a box called &#39;C&amp;C Host&#39;. Connected via solid blue lines to the switch are also boxes labeled &#39;Ceph Host A&#39; and &#39;Ceph Host B&#39;."/> <figcaption>
            <p>Simplified view of the network.</p>
        </figcaption>
</figure>
</p>
<p>All of the involved hosts - the Ceph hosts from the two Ceph clusters and the
C&amp;C host running <code>mc cp</code> - are connected to the same switch, but in different
VLANs. So to get from the C&amp;C host to the Ceph hosts, the data needs to go
through the router, an OPNsense box in my case. The problem is the connection
from the router to the switch: it needs to carry the same traffic twice.
First, the traffic from the source cluster goes through the LAN NIC to the router,
and then out the same NIC but on a different VLAN to the C&amp;C host. Then the C&amp;C
host sends that data right back to the router&rsquo;s LAN NIC, where it leaves again
through the same NIC on my Homelab VLAN and finally reaches the Rook Ceph host.</p>
<p>Here is an example plot of the network traffic on one of the Ceph hosts involved:
<figure>
    <img loading="lazy" src="traffic-ceph.png"
         alt="A screenshot of a Grafana visualization. On the x axis is time, and on the y axis is traffic in Mbit/s. For about 15 minutes, rx traffic of about 450Mbit/s can be seen."/> <figcaption>
            <p>Network traffic on one of my Ceph hosts during one of the transfers.</p>
        </figcaption>
</figure>

It looks similar on all other involved hosts, save for one: the OPNsense box.
Here is the traffic on the NIC which almost everything in my home hangs off of.
<figure>
    <img loading="lazy" src="traffic-router.png"
         alt="A screenshot of a Grafana visualization. On the x axis is time, and on the y axis is traffic in Mbit/s. For about 15 minutes, rx and tx traffic of about 940 Mbit/s can be seen."/> <figcaption>
            <p>Network traffic on the LAN interface of my router.</p>
        </figcaption>
</figure>

It shows the likely bottleneck. The LAN interface on my router has about 940Mbit/s
worth of traffic in both directions. Time for some network upgrades, it seems. &#x1f913;</p>
<p>Next, let&rsquo;s look at the power consumption of it all. Due to running more HW than
normal, supporting both my Nomad and k8s clusters in parallel, the power usage
of my Homelab already grew from an average of 150W to about 200W. But these
S3 transfers tacked on another 130W:
<figure>
    <img loading="lazy" src="power.png"
         alt="A screenshot of a Grafana visualization. On the x axis is time, and on the y axis is power usage in Watts. At the beginning and end of the graph, the consumption is around 190W. In the middle, it suddenly first goes up to about 300W and then, 12 minutes later, reaches the peak of almost 330W before going down to 190W again."/> <figcaption>
            <p>Overall Homelab power consumption during one of the bucket transfers.</p>
        </figcaption>
</figure>
</p>
<p>What I&rsquo;m actually a little bit curious about: How much of that increase comes
from the switch?</p>
<p>Finally, let&rsquo;s have a short look at disk usage. On both Ceph clusters, the S3
buckets reside on HDDs, while their indexes reside on SSDs.
First, the view of one of the source cluster&rsquo;s machines:
<figure>
    <img loading="lazy" src="hdd-src-1.png"
         alt="A screenshot of a Grafana visualization. On the x axis is time, and on the y axis is disk IO utilization in percent. The interesting part here is the curve labeled &#39;sdc&#39;. It goes from 6% to around 60% and stays there for around 15 minutes before going back to 6%."/> <figcaption>
            <p>IO utilization of one of the source Ceph hosts in the transfer.</p>
        </figcaption>
</figure>

And here is the same graph for the second host in the source Ceph cluster.
<figure>
    <img loading="lazy" src="hdd-src-2.png"
         alt="A screenshot of a Grafana visualization. On the x axis is time, and on the y axis is disk IO utilization in percent. The interesting part here is the curve labeled &#39;sdb&#39;. It goes from 6% to around 50% and stays there for around 15 minutes before going back to 6%."/> <figcaption>
            <p>IO utilization of the other source Ceph host in the transfer.</p>
        </figcaption>
</figure>

This shows pretty nicely that reads are distributed by Ceph. Combined, both hosts
together show a read rate of about 55MB/s.</p>
<p>Finally, let&rsquo;s have a look at one of the receiving hosts in the Ceph Rook
cluster. I will only show the metrics of one of them here, because the other one
is a VM, and the IO values don&rsquo;t make too much sense.
<figure>
    <img loading="lazy" src="hdd-dest.png"
         alt="A screenshot of a Grafana visualization. On the x axis is time, and on the y axis is disk IO utilization in percent. The interesting part here are two curves, one labeled &#39;sda&#39; and one labeled &#39;sdb&#39;. Both curves increase together from almost zero. The sdb curve goes up to over 80%, while the sda curve goes up to about 20%. Both curves stay there for around 15 minutes before returning to their initial values."/> <figcaption>
            <p>IO utilization on one of the Ceph Rook destination hosts in the transfer.</p>
        </figcaption>
</figure>
</p>
<p>Here we can see that the raw data is not the only thing which needs to be written
during S3 operations. The higher curve, going up to around 80%, is the host&rsquo;s
HDD, where the actual S3 data is stored. The 20% curve is the SATA SSD in the
host, which holds the index of the S3 buckets. The writes come out to about 55MB/s
on the HDD, as expected. Surprisingly, the read and write throughput on the SSD
is almost zero, so I&rsquo;m wondering what&rsquo;s producing the IO utilization here.</p>
<p>And that concludes the &ldquo;pretty plots&rdquo; section of this post. &#x1f913;</p>
<h1 id="conclusion">Conclusion</h1>
<p>This was supposed to be a mostly mechanical task to get done during the work week,
with not much thinking required. Instead, it turned into a really frustrating affair,
thanks to the difficult-to-debug Traefik compression issue.
And this was only half of the issues I saw. The other half was caused by sudden
connection loss during bucket copies. That one was solved by adding an outgoing
firewall rule, but I decided to add that to the
<a href="https://blog.mei-home.net/posts/k8s-migration-2a-cilium-bgp/">Cilium Load Balancer post</a>
as an update, as that&rsquo;s going to be easier on future readers.</p>
<p>But still, I&rsquo;m done now. The k8s cluster is officially load-bearing. What could
possibly go wrong, running two very different workload orchestrators, both critical
to the Homelab&rsquo;s function? &#x1f605;</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 4: Storage with Ceph Rook</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-4-ceph-rook/</link>
      <pubDate>Thu, 11 Jan 2024 00:15:01 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-4-ceph-rook/</guid>
      <description>Setting up a Ceph cluster on Kubernetes with Rook</description>
      <content:encoded><![CDATA[<p>Wherein I talk about the setup of Ceph Rook on my k8s cluster.</p>
<p>This is part five of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<h1 id="the-current-setup">The current setup</h1>
<p><a href="https://blog.mei-home.net/posts/homelab-2022/storage/">I&rsquo;ve been running Ceph as my storage layer for quite a while now</a>.
In my current Nomad setup, it provides volumes for my jobs as well as S3 for those
apps which support it.
In addition, most of my Raspberry Pis are diskless, netbooting off of Ceph&rsquo;s
RBD block devices as their root.
At first glance, <a href="https://ceph.io/en/">Ceph</a> might look like you&rsquo;d need an Ops
team of at least three people to run it. But after the initial setup, I&rsquo;ve found
it to be very low maintenance. Adding more disks or even entire additional
hosts is very low effort.
I went through the following stages, with the exact same cluster, without any
outages or cluster recreation:</p>
<ol>
<li>Baremetal on my single homelab server - bad idea, as that server also needed
to mount Ceph disks</li>
<li>On a single VM on that same server</li>
<li>Spread over four LXD VMs on that server</li>
<li>Spread over three LXD VMs on that server and a Raspberry Pi CM4 in an IO board
with a SATA card attached</li>
<li>My current config, with three dedicated baremetal machines</li>
</ol>
<p>Before I started my migration, I had a setup with three x86 hosts, each with
one 1TB SATA SSD and one 4TB HDD. Overall, I was using only about 40% of that
storage. I then had several pools configured, each with a replication factor of
&ldquo;2&rdquo;. This works pretty much like a RAID1 setup, where all data is mirrored on
two disks. Or, in my case, on two different hosts even. This allows me to reboot
any of my hosts without any outages, as I&rsquo;ve configured Ceph to be okay with
only one replica when a host is down.</p>
<p>For the migration, I took out my emergency replacement HDD and SSD and put them
into my LXD VM host to create another VM with those two drives. I then also took
one of my baremetal hosts out of my baremetal cluster for later addition to the
Kubernetes cluster.</p>
<p>All of this allowed me to keep my baremetal cluster running without interruption,
continuing to supply my Nomad cluster with storage, while I have a whole separate,
<a href="https://rook.io/docs/rook/latest-release/">Ceph Rook</a> based cluster for the
migration. Once I&rsquo;m done with the migration, I will add the two other baremetal
hosts to the Rook cluster and remove the Ceph VM.</p>
<p>I got pretty lucky that I happen to have enough HW and disks lying
around to be able to afford two clusters. It&rsquo;s what allows me to do this migration
at my own pace, iteratively. I&rsquo;m really enjoying that there&rsquo;s no pressure to
finish the migration and I can go on detours like my recent implementation of
<code>LoadBalancer</code> services with Cilium.</p>
<h1 id="why-rook">Why Rook</h1>
<p>So, considering how happy I am with baremetal Ceph, why switch to Rook? First, there
is of course some part of it that&rsquo;s just wanting to try something new. Then there&rsquo;s the
idea of having what&rsquo;s called a <em>hyperconverged</em> infrastructure. By running
Ceph on Kubernetes instead of stand-alone, I can also run other workloads on
those hosts, which are idling a lot of the time in the current setup. That lets
me use my resources more efficiently. I&rsquo;m not implementing this right now, having added
a <code>NoSchedule</code> taint to my Ceph hosts. This is mostly because I&rsquo;m still unsure
of the behavior when I have to take the entire cluster down. Most of my services
will need Ceph, but they might get scheduled before the Ceph pods, on the Ceph
hosts.
I understand that I can work with <code>PriorityClass</code> here, but I haven&rsquo;t gotten
around to wrapping my head around that yet.</p>
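<p>For reference, such a <code>PriorityClass</code> could look roughly like the
following sketch. The name and value are assumptions, as I haven&rsquo;t actually
deployed one; pods would then reference it via <code>priorityClassName</code> in
their spec:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"># Sketch of a PriorityClass for storage pods; name and value are made
# up. Pods with higher values are scheduled (and preempt) first.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: storage-critical
value: 1000000
globalDefault: false
description: "Schedule storage pods ahead of ordinary workloads."
</code></pre></div>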
<p>Another really important point: Ceph Rook is very declarative, as you&rsquo;ll see
shortly. Baremetal Ceph with Cephadm, on the other hand, is mostly invoking
commands by hand. There are no good ways to version control my Ceph cluster
setup at the moment. I&rsquo;ve had to keep meticulous notes about the commands I
need to execute.</p>
<p>But Rook is not all milk and honey. Rook cannot, for example, use the same
pool for multiple storage types. A pool is either for RBD, CephFS or RGW usage. No
mixing. So I will have to create multiple pools. In my current setup, I just
have a generic <code>bulk</code> and <code>fast</code> pool, with HDDs and SSDs respectively. Those
are then used for all three storage applications.
Furthermore, Ceph&rsquo;s dashboard does not support Rook as an orchestrator very well
right now. Several pieces of information I can see in my baremetal dashboard
are &ldquo;N/A&rdquo; on the Rook Ceph dashboard.</p>
<p>Here is an example:</p>
<figure>
    <img loading="lazy" src="host-list.png"
         alt="A screenshot of the Ceph dashboard&#39;s UI. The selected tab is &#39;Hosts List&#39;. In the table, the colums &#39;Hostname&#39; and &#39;Service Instances&#39; are blurred. The only column with actual content, &#39;Status&#39;, shows &#39;Available&#39; for all lines. There are additional columns, all showing &#39;N/A&#39; for all lines: &#39;Model&#39;, &#39;CPUs&#39;, &#39;Cores&#39;, &#39;Total Memory&#39;, &#39;Raw Capacity&#39;, &#39;HDDs&#39;, &#39;Flash&#39;, &#39;NICs&#39;."/> <figcaption>
            <p>The Host List of the Ceph Dashboard. No information at all is shown about any of the nodes.</p>
        </figcaption>
</figure>

<p>The dashboard data is pretty useless, not showing any HW data about the hosts.
It looks similar in the <code>Services</code> overview, which shows completely wrong
service counts. At least the <code>OSD</code> overview is entirely correct. The same is
true for the <code>Pools</code>, <code>Block</code> and <code>File Systems</code> UIs. But the Ceph RGW/S3 UI
is just not supported at all. Even with RGWs deployed and working, the dashboard
shows me that RGW access failed.</p>
<p>The problems with the dashboard are known and are currently being worked on by
the Rook maintainers. It also isn&rsquo;t a pure dashboard problem, but the same problem
occurs for the data visible via <code>ceph orch ps</code>, where the columns <code>Ports</code>, <code>Mem Use</code>
and <code>Mem Lim</code> are empty, and the <code>Version</code> column just shows <code>&lt;unknown&gt;</code>.
Not great to be honest, but I assume this kind of info just didn&rsquo;t seem too
urgent, because it is also available directly via Kubernetes.</p>
<p>Finally, one rather sad point: There&rsquo;s no official migration path between Ceph
baremetal and Ceph Rook. Yes, there are some wild guides floating around, but
I don&rsquo;t trust any of them with my data, as they all invariably contain some
variation of &ldquo;after you&rsquo;ve finished fuzzing around with the cluster ID&hellip;&rdquo;.</p>
<p>So what I&rsquo;ll do instead is manual migrations. I still have to look into Ceph RBD
import/export. I also thought about using Ceph&rsquo;s mirroring features, but the
setup of that between the baremetal and Rook clusters doesn&rsquo;t really look worth
it. So it will likely just come down to mounting both volumes on one host and
using trusty old <code>rsync</code>.</p>
<h1 id="ceph-rook-setup">Ceph Rook setup</h1>
<p>To set up Rook, I&rsquo;m using the <a href="https://rook.io/docs/rook/latest-release/Helm-Charts/helm-charts/">Helm charts</a>.
There are two of them, one for the Rook operator and one for the cluster. With
this separation, multiple clusters can be controlled by the same operator. I&rsquo;m
not using that capability here, though.</p>
<p>The initial node setup isn&rsquo;t too interesting, save for one thing, namely
the taints. As I noted above, while I would like to share the resources of the
Ceph hosts with other deployments, for now I&rsquo;ve added a <code>NoSchedule</code> taint.
I&rsquo;m adding this taint via the <a href="https://kubernetes.io/docs/reference/config-api/kubeadm-config.v1beta3/#kubeadm-k8s-io-v1beta3-JoinConfiguration">kubeadm join config</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">kubeadm.k8s.io/v1beta3</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">JoinConfiguration</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">nodeRegistration</span>:
</span></span><span style="display:flex;"><span>{<span style="color:#ae81ff">% if &#39;kube_ceph&#39; in group_names %}</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">taints</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/taint.role&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">operator</span>: <span style="color:#e6db74">&#34;Equal&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">effect</span>: <span style="color:#e6db74">&#34;NoSchedule&#34;</span>
</span></span><span style="display:flex;"><span>{<span style="color:#ae81ff">% endif %}</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">kubeletExtraArgs</span>:
</span></span><span style="display:flex;"><span>{<span style="color:#ae81ff">% if &#39;kube_ceph&#39; in group_names %}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">node-labels</span>: <span style="color:#e6db74">&#34;homelab/role=ceph&#34;</span>
</span></span><span style="display:flex;"><span>{<span style="color:#ae81ff">% elif &#39;kube_controllers&#39; in group_names %}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">node-labels</span>: <span style="color:#e6db74">&#34;homelab/role=controller&#34;</span>
</span></span><span style="display:flex;"><span>{<span style="color:#ae81ff">% elif &#39;kube_workers&#39; in group_names %}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">node-labels</span>: <span style="color:#e6db74">&#34;homelab/role=worker&#34;</span>
</span></span><span style="display:flex;"><span>{<span style="color:#ae81ff">% endif %}</span>
</span></span></code></pre></div><p>This file is put through Ansible&rsquo;s <a href="https://docs.ansible.com/ansible/latest/collections/ansible/builtin/template_module.html">Template module</a>
during initial host setup, and I&rsquo;m then just joining the node to the cluster
via <code>kubeadm join --config /path/to/join.yaml</code>. Here, the <code>homelab/taint.role=ceph</code>
taint is added to all Ceph hosts, which are in my <code>kube_ceph</code> Ansible group.</p>
<p>Depending on which parts of Ceph you would like to use, you will also need to
make sure that all nodes (not just the Ceph nodes) have the <code>rbd</code> and <code>ceph</code>
kernel modules.</p>
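<p>Since the nodes are managed with Ansible anyway, a task along these lines
could take care of that. Just a sketch, assuming the
<code>community.general</code> collection is available:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"># Load the kernel modules Ceph clients need. This loads them at
# runtime; for persistence across reboots they would additionally go
# into a modules-load.d drop-in.
- name: Load Ceph kernel modules
  community.general.modprobe:
    name: "{{ item }}"
    state: present
  loop:
    - rbd
    - ceph
</code></pre></div>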
<h1 id="rook-operator-setup">Rook operator setup</h1>
<p>As always, I&rsquo;m constructing my setup by reading through the default
<a href="https://github.com/rook/rook/blob/master/deploy/charts/rook-ceph/values.yaml">values.yaml file</a>.</p>
<p>But before looking at the config, let&rsquo;s look at what the operator actually
does: It takes the Ceph Rook CRDs and creates a full Ceph cluster from
them, including all the daemons necessary to run it. It also
watches for changes in the CRDs, starting new daemons and re-configuring
or deleting existing ones as needed.</p>
<p>My configuration is rather simple:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">tolerations</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/taint.role&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">operator</span>: <span style="color:#e6db74">&#34;Equal&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">effect</span>: <span style="color:#e6db74">&#34;NoSchedule&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">nodeSelector</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">homelab/role</span>: <span style="color:#ae81ff">ceph</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">priorityClassName</span>: <span style="color:#ae81ff">system-cluster-critical</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">csi</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enableMetadata</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">clusterName</span>: <span style="color:#ae81ff">k8s-rook</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">provisionerTolerations</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/taint.role&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">operator</span>: <span style="color:#e6db74">&#34;Equal&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">effect</span>: <span style="color:#e6db74">&#34;NoSchedule&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serviceMonitor</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">nfs</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">enableDiscoveryDaemon</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">discoveryDaemonInterval</span>: <span style="color:#ae81ff">6h</span>
</span></span></code></pre></div><p>There&rsquo;s not actually that much to configure in the operator itself. Most of
the Ceph config is done in the next chart, which defines the cluster proper.</p>
<p>First of all, I&rsquo;m defining some things for the deployment of the operator pod.
It should tolerate the previously mentioned <code>NoSchedule</code> taint on Ceph nodes
and also gets a <code>nodeSelector</code> for those nodes. Furthermore, I&rsquo;m assigning it
the highest scheduling <a href="https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#priorityclass">PriorityClass</a>.
I&rsquo;m not entirely sure this is really necessary, though. I&rsquo;ve just done it out of
reflex, because Ceph is pretty important in my Homelab stack. But the operator
itself is not that important once the cluster has been initialized. It only
becomes relevant again when the cluster CRDs change and it needs to apply
those changes.</p>
<p>Then there are the CSI configs. In short, CSI is the <a href="https://github.com/container-storage-interface/spec">Container Storage Interface</a>, a standard for supplying storage to workloads. It&rsquo;s not
Kubernetes-specific, although most implementations have been developed for k8s.
I&rsquo;m using <a href="https://github.com/ceph/ceph-csi">Ceph&rsquo;s CSI driver</a> on my Nomad
cluster, for example.
It consists of two parts. One is the provisioner, which manages the volumes and
talks to the Ceph and k8s clusters. The second part is a pod on each node,
which is mostly concerned with mounting the actual volumes when a pod running
on that node needs it.</p>
<p>I&rsquo;m defining the Ceph <code>NoSchedule</code> toleration here again, because this is
another piece of the infrastructure I want to allow to run on the Ceph nodes. But
I&rsquo;m not defining a <code>nodeSelector</code> on the Ceph nodes, because I don&rsquo;t want to
completely overload them, and it doesn&rsquo;t matter much where the provisioners are
running. While initially writing this, I had the following additional line
under the <code>csi</code> key in the <code>values.yaml</code> file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">provisionerNodeAffinity</span>: <span style="color:#e6db74">&#34;homelab/role=ceph;&#34;</span>
</span></span></code></pre></div><p>But then, while writing a couple of things about which pods are deployed by
the operator, I realized that one of my MGR pods was in the pending state -
no space left on the two Ceph nodes. So I ended up removing the node affinity
for the provisioner pods, which balanced everything out again.</p>
<p>As in previous charts, I&rsquo;m explicitly disabling the metrics gathering, which I
will look at again later.</p>
<p>Finally, the <code>enableDiscoveryDaemon</code> and <code>discoveryDaemonInterval</code> configs
are related to dashboard functionality. As I&rsquo;ve shown above, the dashboard does
not show all disks at the moment. But without these options, the entire
<code>Physical Disks</code> page would be empty.</p>
<p>Okay. While looking at the pods in the operator namespace to verify which were
launched by the operator before any cluster was defined, I realized why I&rsquo;m not
getting any data on disks from my Ceph hosts: I did not set the Ceph <code>NoSchedule</code>
toleration on the discovery daemon containers. &#x1f926;</p>
<p>Here is what the &ldquo;Physical Disks&rdquo; page of the dashboard looked like up to now:</p>
<figure>
    <img loading="lazy" src="physical-disks.png"
         alt="A screenshot of the Ceph dashboard&#39;s UI. The table is headed &#39;Physical Disks&#39;. The &#39;Hostname&#39; column is blurred out. There are nine lines overall. The &#39;Size&#39; column contains only entries ranging for 4MB to 50 GB."/> <figcaption>
            <p>The physical disks list of the Ceph Dashboard. Note here that this list doesn&rsquo;t actually contain any of my Ceph hosts, and consequently none of the disks which are actually used as OSD disks at the time this screenshot was taken are actually shown.</p>
        </figcaption>
</figure>

<p>After the above realization, I&rsquo;ve added the following into the operator
<code>values.yaml</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">discover</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tolerations</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/taint.role&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">operator</span>: <span style="color:#e6db74">&#34;Equal&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">effect</span>: <span style="color:#e6db74">&#34;NoSchedule&#34;</span>
</span></span></code></pre></div><p>This makes the disks visible:</p>
<figure>
    <img loading="lazy" src="appearing-disks.png"
         alt="A screenshot of the Ceph dashboard&#39;s UI. It shows a table of four disks. Two of them have &#39;QEMU&#39; as the vendor and &#39;QEMU HARDDISK&#39; as the Model. One is 8 TB in size, and one 2 TB. The third line has the model &#39;Samsung SSD 870&#39; and is 1 TB in size. The last line has a string of alphanumerics as the &#39;Model&#39; and a size of 4 TB."/> <figcaption>
            <p>The physical disks in my Ceph nodes finally appeared.</p>
        </figcaption>
</figure>

<p>Once the Helm chart for the operator is deployed, it will create a pod of the
operator itself as well as pods for the enabled CSI provisioners. In my case,
these are the RBD and CephFS provisioners.
Note that at this point, the actual Ceph cluster is not yet defined. None of
the Ceph daemons have been created, as the cluster is only defined in the next
chart.</p>
<h1 id="rook-cluster-setup">Rook cluster setup</h1>
<p>Now that the operator is running, I could create the actual Ceph cluster. This
is done via a separate Helm chart which can be found <a href="https://github.com/rook/rook/tree/master/deploy/charts/rook-ceph-cluster">here</a>.</p>
<p>This chart sets up the cluster with all its daemons, like the MONs, OSDs and
so forth. It can also be used to create pools for RBD, CephFS and RGW usage and
storage classes to make those available in the k8s cluster.</p>
<p>I will discuss my <code>values.yaml</code> file in pieces, to make it more manageable.</p>
<p>Let&rsquo;s start with some housekeeping:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">clusterName</span>: <span style="color:#ae81ff">k8s-rook</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">operatorNamespace</span>: <span style="color:#ae81ff">rook-ceph</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">toolbox</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">monitoring</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><p>Here I&rsquo;m supplying a cluster name (you know, just in case I end up running
multiple clusters one day &#x1f609;) and I set the namespace into which I deployed
the operator.</p>
<p>In addition, I&rsquo;m disabling the <code>toolbox</code>. This is a pod which can be used to
run Ceph commands against the cluster. But I don&rsquo;t need it, as I&rsquo;m using the
rook-ceph <code>kubectl</code> plugin, which I will show later.
I&rsquo;m also disabling monitoring here, as I haven&rsquo;t deployed Prometheus yet.</p>
<p>Now for the cluster spec itself:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">cephClusterSpec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">mgr</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">modules</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pg_autoscaler</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">devicehealth</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">diskprediction_local</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dashboard</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">ssl</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">network</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">provider</span>: <span style="color:#ae81ff">host</span>
</span></span></code></pre></div><p>Here I&rsquo;m enabling a couple of modules. The <code>pg_autoscaler</code> can automatically
scale up and down the <a href="https://docs.ceph.com/en/reef/rados/operations/placement-groups/">placement groups</a>
in the cluster. <code>devicehealth</code> is pretty much what it says on the tin - it
enables gathering SMART data. The <code>diskprediction_local</code> module is a related
tool, which uses the SMART data to make some guesses on how long your disks
still have to live at current usage. Docs can be found <a href="https://docs.ceph.com/en/quincy/mgr/diskprediction/">here</a>.
Finally, the <code>rook</code> module hooks into Ceph&rsquo;s orchestrator functionality, but as
I&rsquo;ve noted <a href="#why-rook">above</a>, it doesn&rsquo;t yet implement all the
functionality of the official <a href="https://docs.ceph.com/en/latest/cephadm/">cephadm</a>
deployment tool.</p>
<p>I&rsquo;m also disabling SSL for the dashboard. Simple reason is that it&rsquo;s not going
to be reachable directly from the outside, and all the cluster internal traffic
is secured by Cilium&rsquo;s encrypted WireGuard tunnels.</p>
<p>The <code>network.provider: host</code> config option is important. This config makes it
so that the Ceph daemons use the node&rsquo;s network, not the k8s cluster network.
So they will be reachable by all hosts in my Homelab subnet, without any routing
and without needing <code>LoadBalancer</code> services or ingresses. This is important for
me, because I don&rsquo;t just use my Ceph cluster for supplying volumes to k8s services, but also
for other things, like a mounted CephFS on my workstation and the root disks
for my Pis.</p>
<p>Next come the placements:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  <span style="color:#f92672">placement</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">all</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">nodeAffinity</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">requiredDuringSchedulingIgnoredDuringExecution</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">nodeSelectorTerms</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>                - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/role&#34;</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tolerations</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/taint.role&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">operator</span>: <span style="color:#e6db74">&#34;Equal&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">effect</span>: <span style="color:#e6db74">&#34;NoSchedule&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mon</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">nodeAffinity</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">requiredDuringSchedulingIgnoredDuringExecution</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">nodeSelectorTerms</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>                - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/role&#34;</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;controller&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tolerations</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">effect</span>: <span style="color:#ae81ff">NoSchedule</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">node-role.kubernetes.io/control-plane</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">Exists</span>
</span></span></code></pre></div><p>Nothing too exciting. Worth noting perhaps is that I force my MON daemons to be
put on the controller nodes. This is a similar setup to what I&rsquo;ve got in my
baremetal/Nomad setup right now as well. The basic thought is to put all the
<code>server</code> components of all of my infrastructure on the same group of three hosts.
In addition, I&rsquo;m forcing all other Ceph daemons onto the Ceph nodes. That&rsquo;s just
so I know that as long as those nodes are up, I have a fully functional Ceph
cluster, e.g. for booting all of the other nodes in the cluster, which don&rsquo;t
have any attached storage.
I&rsquo;m still not sure how well this idea is really going to work out, and I will
have to do a full cluster shutdown test soonish, to make sure that the Ceph
cluster is able to come up fully and start serving requests without any other
worker node being online.</p>
<p>I&rsquo;m a bit worried how Kubernetes is going to behave in situations like this. I hope
it will just start scheduling the Ceph pods, and then once the Ceph cluster is
healthy I will be able to boot the diskless worker nodes.</p>
<p>And now for the storage definitions and dashboard ingress:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  <span style="color:#f92672">storage</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">useAllNodes</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">useAllDevices</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">nodes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;node1&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">devices</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_lxd_cephssd&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">config</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">deviceClass</span>: <span style="color:#ae81ff">ssd</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_lxd_cephhdd&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">config</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">deviceClass</span>: <span style="color:#ae81ff">hdd</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;node2&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">devices</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;/dev/disk/by-id/wwn-1234&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">config</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">deviceClass</span>: <span style="color:#ae81ff">ssd</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;/dev/disk/by-id/wwn-5678&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">config</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">deviceClass</span>: <span style="color:#ae81ff">hdd</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dashboard</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">traefik.ingress.kubernetes.io/router.entrypoints</span>: <span style="color:#ae81ff">secureweb</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">host</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ceph-k8s.mei-home.net</span>
</span></span></code></pre></div><p>There are two basic options for configuring the storage. First, one can tell
Rook to use all nodes and all devices in those nodes. For me, that&rsquo;s not what
I want. I&rsquo;ve got a couple of dedicated Ceph nodes and dedicated storage inside
them. So I&rsquo;m using the setup shown here. It tells Rook to use the disks on
the two nodes <code>node1</code> and <code>node2</code>, and only those disks given here.</p>
<p>I&rsquo;m using the stable names here, instead of <code>/dev/sd*</code>. I&rsquo;m specifically using
the <code>by-id/wwn</code> numbers for the actual hard disks in my baremetal node to
ensure that I always get the correct disks. These <code>wwn-*</code> numbers are based on
the <a href="https://en.wikipedia.org/wiki/World_Wide_Name">World Wide Name</a> as provided
by the drives themselves, so they should be completely unique.
For some more details on the different ways of addressing disks, I found
<a href="https://wiki.archlinux.org/title/persistent_block_device_naming">the Arch wiki</a>
pretty useful.</p>
<p>And then there&rsquo;s also the ingress for the Ceph dashboard. Not much more than
defining which entrypoint from my Traefik Ingress it is supposed to use, and
under which domain it should be reachable.</p>
<p>One final note: You can do a lot of this setup piecemeal. Even without any nodes
in the list, you can already deploy the Helm chart, and then add the nodes
to the list as you add them to the k8s cluster. The same is true for the
next section, about the cluster pools and storage classes. I deployed the
cluster chart first without any pools defined, verified that all the base
daemons work and only then added the pools.</p>
<h1 id="setting-up-ceph-pools-and-storageclasses">Setting up Ceph pools and StorageClasses</h1>
<p>In this section, two concepts are combined. The first one is <a href="https://docs.ceph.com/en/latest/rados/operations/pools/">Ceph Pools</a>.
Pools are logical pieces of your underlying disks. The same
disks can be used by multiple pools; the partitioning is purely logical. Pools are
also one of the units which can be used when setting permissions for Ceph auth.</p>
<p>One of the downsides of using Rook is that pools created through Rook cannot
be used for multiple applications at once. As noted before, Ceph can provide storage in
three forms: Block devices, CephFS as a POSIX compatible filesystem and S3
storage. In baremetal Ceph clusters, you can run multiple apps on the same pool.
But in Rook, each pool can only be used for one of the three storage applications.
Of course you can still create pools manually via the Ceph CLI and assign them
to multiple apps. But then you lose one of Rook&rsquo;s biggest advantages: the declarative
definition of Ceph.</p>
<p>The second concept is <a href="https://kubernetes.io/docs/concepts/storage/storage-classes/">Kubernetes StorageClasses</a>.
Those, similar to e.g. <a href="https://kubernetes.io/docs/concepts/services-networking/ingress/#ingress-class">IngressClasses</a>, describe a specific form of storage that a user can request for their
pod/volume.</p>
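<p>To make that concrete: once the classes below exist, a workload requests a
volume through a plain <code>PersistentVolumeClaim</code>. A minimal sketch with
a made-up name and size, using the <code>rbd-fast</code> class defined further
down:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"># Minimal PVC against one of the Rook-provided storage classes.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: rbd-fast
  resources:
    requests:
      storage: 10Gi
</code></pre></div>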
<p>I&rsquo;m using all three storage types Ceph supplies, so I will define pools and
storage classes for all of them.</p>
<h2 id="rbds">RBDs</h2>
<p>Let&rsquo;s start with the RBD pool(s). RBDs are block devices as per the Linux
definition. They&rsquo;re raw buckets of bytes and look the same to Linux as your
physical disks. Mounting them, in contrast to e.g. NFS exports, needs an extra
step though. This step is <code>mapping</code> them, which creates the <code>/dev/foo</code> device.
This can then be mounted, formatted with an FS and so on, similar to a normal
disk.
RBDs can be pretty efficient, especially with the <code>exclusive-lock</code> feature
enabled, which allows the client to assume that it will never have to give up
the lock on the RBD volume once acquired, until the device is <code>unmapped</code> again.
For that reason, they&rsquo;re my default volume. In my Rook setup, I have two pools,
with two different storage classes. One based on my SSDs, and one based on my
HDDs. Most volumes for my services will end up on the HDD pool, with a couple
of exceptions like databases. And yes, I&rsquo;ve been running a Postgres DB off of a
Ceph RBD volume for almost three years now, without any issues, including
power outages and the like.</p>
<p>Here, I will only show the SSD pool and storage class, as the HDD variant only
has a different name and a different <code>spec.deviceClass</code>.
So here we go:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">cephBlockPools</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rbd-fast</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">replicated</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">size</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">failureDomain</span>: <span style="color:#ae81ff">host</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">deviceClass</span>: <span style="color:#ae81ff">ssd</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">min_size</span>: <span style="color:#e6db74">&#34;1&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storageClass</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rbd-fast</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">isDefault</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">reclaimPolicy</span>: <span style="color:#ae81ff">Retain</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">allowVolumeExpansion</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumeBindingMode</span>: <span style="color:#e6db74">&#34;Immediate&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">imageFeatures</span>: <span style="color:#e6db74">&#34;layering,exclusive-lock,object-map,fast-diff&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">csi.storage.k8s.io/fstype</span>: <span style="color:#ae81ff">ext4</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">csi.storage.k8s.io/provisioner-secret-name</span>: <span style="color:#ae81ff">rook-csi-rbd-provisioner</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">csi.storage.k8s.io/provisioner-secret-namespace</span>: <span style="color:#e6db74">&#34;{{ .Release.Namespace }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">csi.storage.k8s.io/controller-expand-secret-name</span>: <span style="color:#ae81ff">rook-csi-rbd-provisioner</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">csi.storage.k8s.io/controller-expand-secret-namespace</span>: <span style="color:#e6db74">&#34;{{ .Release.Namespace }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">csi.storage.k8s.io/node-stage-secret-name</span>: <span style="color:#ae81ff">rook-csi-rbd-node</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">csi.storage.k8s.io/node-stage-secret-namespace</span>: <span style="color:#e6db74">&#34;{{ .Release.Namespace }}&#34;</span>
</span></span></code></pre></div><p>Let&rsquo;s start with the pool. I&rsquo;m running all of my pools as replicated pools, with
size &ldquo;2&rdquo;. This way I have redundancy, including for whole-host failures, via the
<code>failureDomain: host</code> setting. This forces Ceph to store the two replicas of an
object on two different hosts. I&rsquo;m also setting <code>min_size: &quot;1&quot;</code>. This tells
Ceph to continue operating as normal even when one of the two replicas is gone.
That&rsquo;s a little bit unsafe, but it ensures that my systems continue running, even
when the host with the other replica goes down, for maintenance or through
sheer stupidity on my side. &#x1f605;</p>
<p>Then there&rsquo;s the storage class. I&rsquo;ve got the <code>isDefault</code> option disabled for
all of my storage classes, so that I always have to explicitly choose one. I&rsquo;m
just wired in such a way that I prefer explicit values to defaults. &#x1f937;
The <code>reclaimPolicy</code> is a <strong>very</strong> important option. It determines what happens
when a Kubernetes <a href="https://kubernetes.io/docs/concepts/storage/persistent-volumes/">persistent volume</a>
is removed. With the setting <code>Retain</code>, the underlying Ceph volume is retained
and needs to be removed separately. As I&rsquo;m still a bit unfamiliar with k8s,
I find it prudent to set <code>Retain</code> on all of my storage classes. &#x1f605;</p>
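<p>To make the consequence of <code>Retain</code> concrete, here&rsquo;s a sketch of the cleanup it requires, with made-up names:</p>
<pre tabindex="0"><code># Deleting the claim does not delete the volume
kubectl delete -n tests pvc my-claim
# The PV merely switches to the Released state
kubectl get pv
# Both the PV and the underlying Ceph image need manual removal
kubectl delete pv pvc-12345678-aaaa-bbbb-cccc-000000000000
rbd rm rbd-fast/csi-vol-12345678-aaaa-bbbb-cccc-000000000000
</code></pre>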
<p>Next, the <code>allowVolumeExpansion</code> option does what it says: It allows existing
volumes to be enlarged. The <code>volumeBindingMode</code> option defines when the volume
is created. With the <code>Immediate</code> value, the Ceph volume will be created when
the PVC is deployed in the cluster. With the <code>WaitForFirstConsumer</code> value,
the Ceph volume would only be created once the first pod using it is created.</p>
<p>Then there are the parameters. Most of them are simply defaults I copied over from
the example <code>values.yaml</code> file. The important one is the <code>imageFeatures</code> setting.
It contains a list of features to enable on newly created Ceph RBDs.
Details on these options can be found <a href="https://docs.ceph.com/en/latest/man/8/rbd/#cmdoption-rbd-image-feature">here</a>.</p>
<h2 id="cephfs">CephFS</h2>
<p>Next comes the CephFS pool and storage class. <a href="https://docs.ceph.com/en/latest/cephfs/">CephFS</a>
is a POSIX-compliant file system. It can be mounted on any Linux machine and
supports permissions and ACLs. As a consequence, it is a bit more complex than RBDs,
because all the file metadata needs to be handled. For that reason, it needs an
additional set of daemons, the MDS (Metadata Server). It allows concurrent access,
and that&rsquo;s what I&rsquo;m mostly using it for in my setup. For example, I&rsquo;m using it to share the
volume with my Linux ISOs between my Linux ISO media server and my desktop.
I&rsquo;m also using it for cases where multiple Nomad jobs need to access the same
volume, RWX volumes in CSI parlance. It will serve the same purpose in my
k8s cluster.</p>
<p>As it allows concurrent access, and hence pays the cost of networked
coordination, I only have a single pool for it, located on my HDDs; I don&rsquo;t
expect access to it to need to be fast anyway.</p>
<p>Here&rsquo;s the definition:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">cephFileSystems</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">homelab-fs</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">metadataPool</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">failureDomain</span>: <span style="color:#ae81ff">host</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">replicated</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">size</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">deviceClass</span>: <span style="color:#ae81ff">ssd</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">min_size</span>: <span style="color:#e6db74">&#34;1&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">dataPools</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">failureDomain</span>: <span style="color:#ae81ff">host</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">replicated</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">size</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">deviceClass</span>: <span style="color:#ae81ff">hdd</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">min_size</span>: <span style="color:#e6db74">&#34;1&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bulk</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">preserveFilesystemOnDelete</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">metadataServer</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">activeCount</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">activeStandby</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">cpu</span>: <span style="color:#e6db74">&#34;250m&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">memory</span>: <span style="color:#e6db74">&#34;1Gi&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">priorityClassName</span>: <span style="color:#ae81ff">system-cluster-critical</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">placement</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">nodeAffinity</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requiredDuringSchedulingIgnoredDuringExecution</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">nodeSelectorTerms</span>:
</span></span><span style="display:flex;"><span>                - <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>                    - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/role&#34;</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>                        - <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">tolerations</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/taint.role&#34;</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">operator</span>: <span style="color:#e6db74">&#34;Equal&#34;</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">effect</span>: <span style="color:#e6db74">&#34;NoSchedule&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storageClass</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">homelab-fs</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">isDefault</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">pool</span>: <span style="color:#ae81ff">bulk</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">reclaimPolicy</span>: <span style="color:#ae81ff">Retain</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">allowVolumeExpansion</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumeBindingMode</span>: <span style="color:#e6db74">&#34;Immediate&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">csi.storage.k8s.io/fstype</span>: <span style="color:#ae81ff">ext4</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">csi.storage.k8s.io/provisioner-secret-name</span>: <span style="color:#ae81ff">rook-csi-cephfs-provisioner</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">csi.storage.k8s.io/provisioner-secret-namespace</span>: <span style="color:#e6db74">&#34;{{ .Release.Namespace }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">csi.storage.k8s.io/controller-expand-secret-name</span>: <span style="color:#ae81ff">rook-csi-cephfs-provisioner</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">csi.storage.k8s.io/controller-expand-secret-namespace</span>: <span style="color:#e6db74">&#34;{{ .Release.Namespace }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">csi.storage.k8s.io/node-stage-secret-name</span>: <span style="color:#ae81ff">rook-csi-cephfs-node</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">csi.storage.k8s.io/node-stage-secret-namespace</span>: <span style="color:#e6db74">&#34;{{ .Release.Namespace }}&#34;</span>
</span></span></code></pre></div><p>In contrast to RBDs, I need two pools here, one for metadata and one for the
actual data. The setup of the two pools is similar to the RBD pool: replication
of &ldquo;2&rdquo;, with <code>host</code> as the failure domain. Via the
<code>preserveFilesystemOnDelete</code> option, I&rsquo;m also again setting it up so
that a deletion of the file system CRD does not lead to a deletion of the
file system in Ceph, just to protect myself from my own stupidity.
I also need to set up the MDS deployment here, which I do the same way as for the
other Ceph daemons, meaning with a toleration of the Ceph taint and an affinity
for my Ceph nodes.</p>
<p>The storage class is pretty much the same as before, with the addition of the
<code>pool</code> option, which denotes the data pool to be used.</p>
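<p>As a quick sketch of later use: requesting an RWX volume from this class might look like this, with a made-up claim name and size:</p>
<pre tabindex="0"><code>apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  # CephFS allows concurrent access, so ReadWriteMany works here
  accessModes:
    - ReadWriteMany
  storageClassName: homelab-fs
  resources:
    requests:
      storage: 10Gi
</code></pre>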
<h2 id="rgwss3">RGWs/S3</h2>
<p>Finally, S3. I&rsquo;m using it wherever I need a &ldquo;data lake&rdquo; type of storage. It doesn&rsquo;t
have volumes, it doesn&rsquo;t need to be mounted, and I can just push data into it with
a variety of tools until my physical disks are full. I found Ceph&rsquo;s S3 to be
pretty well supported and have yet to meet any application that wants S3 storage
but won&rsquo;t work with Ceph. The only thing one might have to do is to make sure
that path-style access is enabled in the client, as by default, S3 buckets are
addressed via a bucket-specific subdomain, not via the URL path.
I&rsquo;m using S3 for a variety of applications, ranging from Nextcloud to restic
backups.</p>
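<p>The difference between the two addressing styles, with a made-up bucket and endpoint:</p>
<pre tabindex="0"><code># Virtual-hosted style (the S3 default): the bucket is part of the hostname
https://my-bucket.s3.example.com/some/object
# Path style: the bucket is part of the URL path
https://s3.example.com/my-bucket/some/object
</code></pre>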
<p>Similar to CephFS, S3 requires additional daemons, the Rados Gateway (RGW).
Here, I have diverged a bit from my current setup. In my baremetal cluster,
Consul agents are running on the Ceph hosts, announcing a service for my S3
storage. The RGWs then receive my Let&rsquo;s Encrypt cert and are accessed directly
from the outside, as well as via the Consul Connect mesh network from services in my
Nomad cluster.
For now, I decided to keep the Rook RGWs cluster-internal and only accessible through
my ingress, to simplify my setup a bit. I will have to see how well that
performs, for example during backups.</p>
<p>Here is the definition:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">cephObjectStores</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rgw-bulk</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">metadataPool</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">failureDomain</span>: <span style="color:#ae81ff">host</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">replicated</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">size</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">deviceClass</span>: <span style="color:#ae81ff">ssd</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">min_size</span>: <span style="color:#e6db74">&#34;1&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">dataPool</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">failureDomain</span>: <span style="color:#ae81ff">host</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">replicated</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">size</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">deviceClass</span>: <span style="color:#ae81ff">hdd</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">min_size</span>: <span style="color:#e6db74">&#34;1&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">preservePoolsOnDelete</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">gateway</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">port</span>: <span style="color:#ae81ff">80</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">instances</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">hostNetwork</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">placement</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">nodeAffinity</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">preferredDuringSchedulingIgnoredDuringExecution</span>:
</span></span><span style="display:flex;"><span>              - <span style="color:#f92672">weight</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">preference</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>                    - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/role&#34;</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>                        - <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">tolerations</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/taint.role&#34;</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">operator</span>: <span style="color:#e6db74">&#34;Equal&#34;</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">effect</span>: <span style="color:#e6db74">&#34;NoSchedule&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">cpu</span>: <span style="color:#e6db74">&#34;500m&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">memory</span>: <span style="color:#e6db74">&#34;512Mi&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storageClass</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rgw-bulk</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">reclaimPolicy</span>: <span style="color:#ae81ff">Retain</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumeBindingMode</span>: <span style="color:#e6db74">&#34;Immediate&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">traefik.ingress.kubernetes.io/router.entrypoints</span>: <span style="color:#ae81ff">secureweb</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">host</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">s3.example.com</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/</span>
</span></span></code></pre></div><p>First on the agenda is, again, the pool setup. Like CephFS, RGWs need metadata
and data pools. They will even create additional pools for indexes and such.
It&rsquo;s the same strategy as before: metadata on SSDs, data on HDDs, and a replication factor
of &ldquo;2&rdquo;.
Also similar to the other two storage types, I&rsquo;m preserving pools on deletion,
meaning that when the CephObjectStore manifest is removed, the Ceph pools are kept
until they are deleted manually.</p>
<p>Then the setup for the RGW daemons themselves. Again, very similar to before.
Two instances, and tolerations and affinity for my Ceph nodes to allow them to
run there.</p>
<p>The storage class is again nothing special. The ingress is also pretty standard:
going through my proxy via the HTTPS entrypoint, and being hosted at <code>s3.example.com</code>.
I might add a <code>LoadBalancer</code> service here if I find that I don&rsquo;t like the performance
of putting all cluster-external S3 traffic through the ingress.</p>
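<p>For completeness, a sketch of how a bucket would later be requested from this object store: Rook handles this via <code>ObjectBucketClaim</code> manifests, which also produce a Secret and ConfigMap holding credentials and the endpoint. The claim name here is made up:</p>
<pre tabindex="0"><code>apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: my-bucket
spec:
  # Let Rook generate a unique bucket name with this prefix
  generateBucketName: my-bucket
  storageClassName: rgw-bulk
</code></pre>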
<p>And that&rsquo;s it. With all of these deployed, the operator will create the necessary
pods and pools as well as storage classes, and we&rsquo;re ready to make use of it.
At this point I haven&rsquo;t actually used the cluster yet, but I just figured that
I should provide at least some examples.</p>
<p>So let&rsquo;s see whether this entire setup actually works. &#x1f605;</p>
<h1 id="examples">Examples</h1>
<p>If you&rsquo;ve made it through the 22 minutes of reading time Hugo currently shows,
you probably already know how storage in k8s works, so I will be brief. For
storage management, k8s has the <a href="https://github.com/container-storage-interface/spec/blob/master/spec.md">CSI</a>,
a spec for how a workload scheduler communicates with a storage provider to
get volumes. In k8s, the story of a storage volume begins with a persistent
volume claim (<a href="https://kubernetes.io/docs/concepts/storage/persistent-volumes/">PVC</a>).</p>
<p>For my specific Ceph Rook setup, that might look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">PersistentVolumeClaim</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">test-claim</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">tests</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/usage</span>: <span style="color:#ae81ff">testing</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">accessModes</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">ReadWriteOnce</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storageClassName</span>: <span style="color:#ae81ff">rbd-fast</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">storage</span>: <span style="color:#ae81ff">2Gi</span>
</span></span></code></pre></div><p>This will create a persistent volume of size 2 GiB, with the <code>rbd-fast</code> class,
meaning Ceph will create the volume on my SSD pool.</p>
<p>This took quite a while to provision, about 7 minutes. I&rsquo;m pretty sure that&rsquo;s
not normal, but this post is already long enough, so I will investigate that
later. &#x1f609;</p>
<p>In the end, this produces a persistent volume claim like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl get -n tests pvc
</span></span><span style="display:flex;"><span>NAME         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
</span></span><span style="display:flex;"><span>test-claim   Bound    pvc-cb5e3aaa-1292-4bbc-9a9e-309be90ee30d   2Gi        RWO            rbd-fast       7m49s
</span></span></code></pre></div><p>The volume itself can also be shown:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl get pv
</span></span><span style="display:flex;"><span>NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM              STORAGECLASS   REASON   AGE
</span></span><span style="display:flex;"><span>pvc-cb5e3aaa-1292-4bbc-9a9e-309be90ee30d   2Gi        RWO            Retain           Bound    tests/test-claim   rbd-fast                7m49s
</span></span></code></pre></div><p>An important piece of information is also shown in the volume&rsquo;s <code>describe</code> output:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span><span style="color:#f92672">[</span>...<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>Source:
</span></span><span style="display:flex;"><span>    Type:              CSI <span style="color:#f92672">(</span>a Container Storage Interface <span style="color:#f92672">(</span>CSI<span style="color:#f92672">)</span> volume source<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>    Driver:            rook-ceph.rbd.csi.ceph.com
</span></span><span style="display:flex;"><span>    FSType:            ext4
</span></span><span style="display:flex;"><span>    VolumeHandle:      0001-000c-rook-cluster-0000000000000002-b4e37061-2df1-4a15-8a8d-93f3854cecbb
</span></span><span style="display:flex;"><span>    ReadOnly:          false
</span></span><span style="display:flex;"><span>    VolumeAttributes:      clusterID<span style="color:#f92672">=</span>rook-cluster
</span></span><span style="display:flex;"><span>                           imageFeatures<span style="color:#f92672">=</span>layering,exclusive-lock,object-map,fast-diff
</span></span><span style="display:flex;"><span>                           imageName<span style="color:#f92672">=</span>csi-vol-b4e37061-2df1-4a15-8a8d-93f3854cecbb
</span></span><span style="display:flex;"><span><span style="color:#f92672">[</span>...<span style="color:#f92672">]</span>
</span></span></code></pre></div><p>The interesting thing is the <code>imageName</code>, because that&rsquo;s the name of the RBD
volume in the Ceph cluster:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl rook-ceph rbd ls --long --pool rbd-fast
</span></span><span style="display:flex;"><span>NAME                                          SIZE   PARENT  FMT  PROT  LOCK
</span></span><span style="display:flex;"><span>csi-vol-b4e37061-2df1-4a15-8a8d-93f3854cecbb  <span style="color:#ae81ff">2</span> GiB            <span style="color:#ae81ff">2</span>
</span></span></code></pre></div><p>I will leave it at this for the time being. The CephFS-based volumes will be
pretty similar, and the next migration step will be moving my S3 buckets over
from the baremetal cluster to Rook, so S3 buckets will get their
very own article.</p>
<h1 id="kubectl-plugin">kubectl plugin</h1>
<p>Before I finish this, I would like to point you to the excellent <code>kubectl</code> plugin
for Rook. It can be found <a href="https://github.com/rook/kubectl-rook-ceph">here</a>.</p>
<p>It provides an easy-to-use interface to the Rook Ceph cluster, without having
to configure Ceph admin credentials on the machine you would like to use it from. Your
kubectl credentials are enough.
With it, I can access the full <code>ceph</code> CLI, as well as other tools like <code>rbd</code>.</p>
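<p>Installation can go through <a href="https://krew.sigs.k8s.io/">krew</a>, and usage then looks like this:</p>
<pre tabindex="0"><code>kubectl krew install rook-ceph
# Arbitrary Ceph commands, executed against the Rook cluster
kubectl rook-ceph ceph status
kubectl rook-ceph ceph osd df
# The rbd tool works the same way
kubectl rook-ceph rbd ls --pool rbd-fast
</code></pre>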
<p>The main advantage for me is that I won&rsquo;t have to set up aliases for the
two Ceph clusters I&rsquo;m managing from the same machine. No danger of fat-fingering
the wrong cluster. &#x1f389;</p>
<h1 id="conclusion">Conclusion</h1>
<p>Phew. Congrats to the both of us, dear reader, for getting through this. &#x1f605;
Overall, I liked the experience of setting up Rook, especially the fact that
I can now configure at least some aspects of my Ceph cluster declaratively,
putting them under version control.</p>
<p>One question I&rsquo;ve been wondering about: How would this have gone if this Rook
setup had been my first contact with Ceph? Would it have been equally easy and clear?</p>
<p>I&rsquo;m honestly not sure. As I&rsquo;ve noted in the intro, I find my Ceph cluster
surprisingly low maintenance, but this setup has again shown me how many moving
parts there are. At least from my PoV, Rook does a good job abstracting a lot of
that.</p>
<p>But I&rsquo;m not sure how comfortable I would have been with these abstractions if
I hadn&rsquo;t run a Ceph cluster for almost three years now.</p>
<h1 id="update-on-2024-01-21">Update on 2024-01-21</h1>
<p>A short update on this article: The setup as described here leads to the
PG autoscaler not working. If that is something you need, I&rsquo;ve written a
follow-up article with a fix <a href="https://blog.mei-home.net/posts/ceph-rook-crush-rules/">here</a>.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 3: Ingress with Traefik</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-3-traefik-ingress/</link>
      <pubDate>Sat, 06 Jan 2024 02:00:43 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-3-traefik-ingress/</guid>
      <description>Setting up my k8s cluster ingress with Traefik</description>
      <content:encoded><![CDATA[<p>Wherein I talk about the Ingress setup for my Homelab&rsquo;s k8s cluster with Traefik.</p>
<p>This is part four of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>After the initial setup of some infrastructure like <code>external-dns</code> and <code>external-secrets</code>,
I went to work on the <a href="https://kubernetes.io/docs/concepts/services-networking/ingress/">Ingress</a>
implementation for my cluster.</p>
<p>I chose <a href="https://traefik.io/traefik/">Traefik</a> as my Ingress controller. This
was mostly driven by the fact that I&rsquo;m already using Traefik as the proxy in
front of my current Nomad cluster, and I&rsquo;ve become quite familiar with it.</p>
<p>One big advantage of Traefik is the extensive support for a wide array of
what they call <a href="https://doc.traefik.io/traefik/providers/overview/">Configuration Providers</a>.
In my current Nomad setup, I&rsquo;m making use of the Consul provider. In contrast
to software like Nginx or Apache, I can put all proxy-related config
into the <a href="https://developer.hashicorp.com/nomad/docs/job-specification/service">service block</a>
of my Nomad jobs, as tags on the Consul service definition. This allows for
centralizing the entire config related to a specific service, instead of
having it in two places: the config for the service&rsquo;s deployment, and the proxy
config. A sketch of what that looks like follows below.</p>
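<p>Sketched out, such a service block might look like this. The service name, rule and tags are made up for illustration:</p>
<pre tabindex="0"><code>service {
  name = "whoami"
  port = "http"

  tags = [
    # Traefik's Consul Catalog provider picks these up as router config
    "traefik.enable=true",
    "traefik.http.routers.whoami.rule=Host(`whoami.example.com`)",
    "traefik.http.routers.whoami.entrypoints=secureweb",
  ]
}
</code></pre>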
<h1 id="networking-options">Networking options</h1>
<p>While planning my k8s cluster, I considered two different ways of doing
networking for the Ingress. The first one is to simply have the proxy use the
host&rsquo;s networking. This is the setup that I&rsquo;m currently working with in my
Nomad cluster: I&rsquo;ve got the Traefik job pegged to a single host, and then I&rsquo;ve got
a hard-coded <code>A</code> entry in my DNS pointing to that machine. Traefik then listens on
port 443 and so forth. Then I&rsquo;m adding <code>CNAME</code> entries to DNS for other services
running through that proxy.</p>
<p>I set Traefik up the same way during my k8s experiments. But this has one large
downside: a lack of high availability. If the ingress host goes down, not only is Traefik
down, but so are all services served through it. That doesn&rsquo;t bother me too much,
but with k8s, I had a different option: <a href="https://blog.mei-home.net/posts/k8s-migration-2a-cilium-bgp/">Services of type LoadBalancer</a>.
This has the advantage that I no longer have to restrict Traefik to a specific
host to get a stable IP to point all the DNS entries at. Instead, the stable
IP is now supplied by Cilium, which also announces routes to those IPs to my
router.</p>
<p>The one downside of the <code>LoadBalancer</code> approach is that source IPs are not
necessarily preserved. This makes functionality like IP allow lists in Traefik
pretty useless.
The fix for this is to use <code>externalTrafficPolicy: Local</code> on the Service. This
config ensures that Cilium announces only the IPs of the hosts which currently
run a Traefik pod, and then the cluster-internal, source NAT&rsquo;ed routing does not
apply, and source IPs are preserved.</p>
<h1 id="deployment">Deployment</h1>
<p>I&rsquo;m using the <a href="https://github.com/traefik/traefik-helm-chart">official Helm chart</a>
for my deployment. Currently, I&rsquo;m only running a single replica, but that might
change in the future.</p>
<p>I will go through my <code>values.yaml</code> file piece-by-piece, to make the explanation
a bit more manageable.</p>
<p>Let&rsquo;s start with the values for the Deployment itself:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">deployment</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">healthchecksPort</span>: <span style="color:#ae81ff">4435</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">podLabels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/ingress</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">additionalArguments</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#e6db74">&#34;--ping.entryPoint=health&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#e6db74">&#34;--providers.kubernetesingress.ingressendpoint.hostname=ingress-k8s.mei-home.net&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">commonLabels</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">logs</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">general</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">level</span>: <span style="color:#ae81ff">DEBUG</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">format</span>: <span style="color:#ae81ff">json</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">access</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">format</span>: <span style="color:#ae81ff">json</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metrics</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">prometheus</span>: <span style="color:#66d9ef">null</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cpu</span>: <span style="color:#e6db74">&#34;250m&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">memory</span>: <span style="color:#e6db74">&#34;250M&#34;</span>
</span></span></code></pre></div><p>I&rsquo;m not changing very much about the deployment, save for setting a specific
health port. This is there just because it&rsquo;s the same for my Nomad Traefik.
The <code>homelab/ingress</code> label is there to be used in <code>NetworkPolicy</code> manifests
to allow Traefik access to services proxied through it.</p>
<p>The <code>ingressendpoint</code> option ensures that <code>external-dns</code>
later just creates a <code>CNAME</code> entry for each Ingress resource, pointing to the
given DNS name, which in turn will point to the Traefik <code>LoadBalancer</code> Service IP.</p>
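<p>In DNS terms, the intended result is records along these lines (hostnames other than the ingress one are made up):</p>
<pre tabindex="0"><code>; the stable entry for the LoadBalancer Service IP
ingress-k8s.mei-home.net.    A      10.86.55.22
; per-Ingress entries created by external-dns
myapp.mei-home.net.          CNAME  ingress-k8s.mei-home.net.
</code></pre>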
<p>I&rsquo;m disabling metrics here because I have not yet set up Prometheus. The
resources assignments are simply coming from the metrics I&rsquo;ve gathered from my
Nomad Traefik deployment over the years.</p>
<p>Next, let&rsquo;s define Traefik&rsquo;s ports. I&rsquo;m staying with the ports for HTTP and HTTPS
here. There are a couple more, like the health port, but I&rsquo;m leaving them out
for the sake of brevity (yes, you are allowed to chuckle dryly now &#x1f609;).</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">traefik</span>: <span style="color:#66d9ef">null</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">websecure</span>: <span style="color:#66d9ef">null</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">metrics</span>: <span style="color:#66d9ef">null</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">secureweb</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">port</span>: <span style="color:#ae81ff">8000</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">exposedPort</span>: <span style="color:#ae81ff">443</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">expose</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">tls</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">middlewares</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">traefik-ingress-compression@kubernetescrd</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">traefik-ingress-headers-security@kubernetescrd</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">traefik-ingress-local-net@kubernetescrd</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">web</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">port</span>: <span style="color:#ae81ff">8081</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">exposedPort</span>: <span style="color:#ae81ff">80</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">expose</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">redirectTo</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">port</span>: <span style="color:#ae81ff">secureweb</span>
</span></span></code></pre></div><p>The <code>traefik</code>, <code>websecure</code> and <code>metrics</code> ports are enabled in the default
<code>values.yaml</code> file of the chart, but I&rsquo;m using my own nomenclature. I will also
show the manifests for the middlewares later.</p>
<p>The port options impact two manifests generated by the chart. First, the
<a href="https://github.com/traefik/traefik-helm-chart/blob/master/traefik/templates/_podtemplate.tpl">pod template</a>,
which defines the entrypoints for all of them via CLI arguments for the Traefik
pod:</p>
<pre tabindex="0"><code>[...]
--entrypoints.secureweb.address=:8000/tcp
--entrypoints.web.address=:8081/tcp
--entrypoints.secureweb.http.middlewares=traefik-ingress-compression@kubernetescrd,traefik-ingress-headers-security@kubernetescrd,traefik-ingress-local-net@kubernetescrd
--entrypoints.secureweb.http.tls=true
--entrypoints.web.http.redirections.entryPoint.to=:443
--entrypoints.web.http.redirections.entryPoint.scheme=https
[...]
</code></pre><p>Those ports are also used in the definition of the Service:</p>
<pre tabindex="0"><code>Port:                     secureweb  443/TCP
TargetPort:               secureweb/TCP
NodePort:                 secureweb  31512/TCP
Endpoints:                10.8.4.116:8000
Port:                     web  80/TCP
TargetPort:               web/TCP
NodePort:                 web  30208/TCP
Endpoints:                10.8.4.116:8081
</code></pre><p>Traefik also provides a nice read-only dashboard to see all the configured
routes, services and so forth. The chart can expose it via an IngressRoute:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">ingressRoute</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dashboard</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">entryPoints</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">admin</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">middlewares</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">admin-basic-auth</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">healthcheck</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">entryPoints</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">health</span>
</span></span></code></pre></div><p>As you can see, this is not a default Kubernetes Ingress, but instead Traefik&rsquo;s
own Ingress definition, the <a href="https://doc.traefik.io/traefik/routing/providers/kubernetes-crd/">IngressRoute</a>.
Normal Kubernetes Ingress manifests <a href="https://doc.traefik.io/traefik/routing/providers/kubernetes-ingress/">also work fine</a>,
but they then need to supply Traefik-specific options via annotations.</p>
<p>Next comes the Service definition:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">service</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">single</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">LoadBalancer</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/hostname</span>: <span style="color:#ae81ff">ingress-k8s.mei-home.net</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">io.cilium/lb-ipam-ips</span>: <span style="color:#e6db74">&#34;10.86.55.22&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/public-service</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">externalTrafficPolicy</span>: <span style="color:#ae81ff">Local</span>
</span></span></code></pre></div><p>With the <code>single</code> option, you can configure whether Traefik creates a single
Service for both TCP and UDP or a separate Service for each.
The <code>external-dns.alpha.kubernetes.io/hostname</code> annotation sets the DNS name
automatically configured by external-dns. I&rsquo;m also setting a fixed IP instead
of letting Cilium assign one from the pool, so I can properly configure firewall
rules.
The <code>homelab/public-service</code> label is significant, because it denotes the services
which Cilium announces. See <a href="https://blog.mei-home.net/posts/k8s-migration-2a-cilium-bgp/">my post</a>
on using the Cilium BGP load balancer.
As noted above, <code>externalTrafficPolicy: Local</code> gives me source IP preservation.</p>
<p>The last base configuration options are for TLS, but I will go into more details
about how I manage the TLS cert later on.</p>
<h1 id="middlewares">Middlewares</h1>
<p>In Traefik, Middlewares are part of the request handling pipeline. A request
enters Traefik via any of the <a href="https://doc.traefik.io/traefik/routing/entrypoints/">EntryPoints</a>.
Then, the configured Middlewares are applied. These range from IP allow listing to URL
rewriting. They can be assigned to EntryPoints, which means they are applied
to every request, or to specific routes via Ingress or IngressRoute
configs.</p>
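<p>Attaching one to a specific route from a plain Ingress would look roughly like this, referencing the Middleware as <code>namespace-name@kubernetescrd</code> in an annotation. The Ingress itself is hypothetical:</p>
<pre tabindex="0"><code>apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: some-app
  annotations:
    # namespace-name of the Middleware, plus the kubernetescrd provider suffix
    traefik.ingress.kubernetes.io/router.middlewares: traefik-ingress-local-net@kubernetescrd
spec:
  rules:
    - host: some-app.mei-home.net
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: some-app
                port:
                  name: http
</code></pre>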
<p>I&rsquo;m using a couple of them, which I supply via the Helm chart&rsquo;s <code>extraObjects</code>
value:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">extraObjects</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">traefik.io/v1alpha1</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Middleware</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">compression</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">compress</span>: {}
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">traefik.io/v1alpha1</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Middleware</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">headers-security</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">headers</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">stsSeconds</span>: <span style="color:#ae81ff">63072000</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">stsIncludeSubdomains</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">customFrameOptionsValue</span>: <span style="color:#e6db74">&#34;sameorigin&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">contentTypeNosniff</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">referrerPolicy</span>: <span style="color:#e6db74">&#34;same-origin&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">browserXssFilter</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">traefik.io/v1alpha1</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Middleware</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">local-net</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ipWhiteList</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">sourceRange</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;10.1.1.0/24&#34;</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;192.168.1.0/24&#34;</span>
</span></span></code></pre></div><p>The first one, <code>compression</code>, just enables the <a href="https://doc.traefik.io/traefik/middlewares/http/compress/">compression</a> middleware.</p>
<p><code>headers-security</code> adds a couple of best-practice security headers to all requests.
The last one, <code>local-net</code>, is an IP allow list for some of my Homelab
subnets.</p>
<h1 id="securing-the-dashboard">Securing the dashboard</h1>
<p>Let&rsquo;s look at the IngressRoute for the dashboard a second time, specifically
its <code>middlewares</code> option:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">middlewares</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">admin-basic-auth</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span></code></pre></div><p>This option enables the following middleware:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">traefik.io/v1alpha1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Middleware</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">admin-basic-auth</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">basicAuth</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secret</span>: <span style="color:#ae81ff">basic-auth-users</span>
</span></span></code></pre></div><p>This is a <a href="https://doc.traefik.io/traefik/middlewares/http/basicauth/">BasicAuth</a>
middleware, adding HTTP basic auth to my dashboard, just as another layer of
security.</p>
<p>This middleware expects the secret <code>basic-auth-users</code> to contain a key
<code>users</code>, where the users are listed in the following format:</p>
<pre tabindex="0"><code class="language-none" data-lang="none">username:hashedpassword
myuser:$apr1$wpjd1k59$B5E9r2e8DUgmGWubIb/Bk/
</code></pre><p>The entries can for example be created with <a href="https://httpd.apache.org/docs/2.4/misc/password_encryptions.html">htpasswd</a>.</p>
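<p>For illustration, generating such an entry and creating the secret by hand could
look roughly like this; <code>users.txt</code> is a hypothetical scratch file:</p>
<pre tabindex="0"><code class="language-shell" data-lang="shell"># Generate a bcrypt-hashed entry for &#34;myuser&#34;; htpasswd prompts for the password
htpasswd -nB myuser &gt; users.txt

# Create the secret the BasicAuth middleware reads from
kubectl create secret generic basic-auth-users \
  --namespace traefik-ingress \
  --from-file=users=users.txt
</code></pre>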
<p>In my setup, I&rsquo;m handling secrets via my <a href="https://www.vaultproject.io/">Vault instance</a>
with <a href="https://external-secrets.io/latest/">external-secrets</a>. I&rsquo;ve described
the setup <a href="https://blog.mei-home.net/posts/k8s-migration-1-external-secrets/">here</a>.
The secret definition for the basic auth secret looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1beta1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ExternalSecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;basic-auth-users&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">secretStoreRef</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-vault-store</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ClusterSecretStore</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">refreshInterval</span>: <span style="color:#e6db74">&#34;15m&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">target</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">creationPolicy</span>: <span style="color:#e6db74">&#39;Owner&#39;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">users</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          {{ printf &#34;{{ `{{ .user1 }}` }}&#34; }}</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">user1</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">remoteRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;secret/my_kubernetes_secrets/cluster/ingress/auth/user1&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">property</span>: <span style="color:#ae81ff">val</span>
</span></span></code></pre></div><p>What happens here is that external-secrets takes the JSON object returned by
Vault for the <code>secret/my_kubernetes_secrets/cluster/ingress/auth/user1</code> path,
and then takes the <code>val</code> key in that object, putting it into <code>user1</code>. That&rsquo;s
then accessed in the template for the Kubernetes Secret.</p>
<p>The weird <code>{{ printf &quot;{{ `{{ .user1 }}` }}&quot; }}</code> syntax comes from the fact
that I&rsquo;m using Helmfile to manage my Helm charts, and Helmfile puts value
files through a round of Go templating. The outer <code>printf</code> escapes that first
round. The value file then goes through Helm&rsquo;s own templating, which is escaped
by the <code>{{  }}</code> pair and the backticks. What remains, <code>{{ .user1 }}</code>, is the
template that&rsquo;s finally evaluated by external-secrets.</p>
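<p>Spelled out, the nested template goes through three stages, roughly like this:</p>
<pre tabindex="0"><code class="language-none" data-lang="none"># As written in the value file:
{{ printf &#34;{{ `{{ .user1 }}` }}&#34; }}
# After Helmfile&#39;s templating pass (printf just emits its argument):
{{ `{{ .user1 }}` }}
# After Helm&#39;s templating pass (the backticks form a raw string):
{{ .user1 }}
# ...which external-secrets finally evaluates with the value from Vault.
</code></pre>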
<h1 id="the-tls-certificate">The TLS certificate</h1>
<p>My TLS certificate is a wildcard certificate from Let&rsquo;s Encrypt. Sadly, my
domain registrar does not offer an API for managing DNS entries, so for now I have
to solve the DNS challenge manually.
I&rsquo;m using the LE cert for both internal and external services, mostly so that
I don&rsquo;t have to muck around with distributing a self-signed CA cert to all my
end-user devices.
After I&rsquo;ve renewed the cert, I push it to Vault and use it from there.</p>
<p>The <code>ExternalSecret</code> for getting the certs into Kubernetes looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1beta1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ExternalSecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;le-cert&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">secretStoreRef</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-vault-store</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ClusterSecretStore</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">refreshInterval</span>: <span style="color:#e6db74">&#34;15m&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">target</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">creationPolicy</span>: <span style="color:#e6db74">&#39;Owner&#39;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">kubernetes.io/tls</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">tls.key</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          {{ printf &#34;{{ `{{ .privkey }}` }}&#34; }}</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">tls.crt</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          {{ printf &#34;{{ `{{ .fullchain }}` }}&#34; }}</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dataFrom</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">extract</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">secret/my_kubernetes_cluster/cluster/ingress/le-cert</span>
</span></span></code></pre></div><p>The two-level escape of the <code>{{ .privkey }}</code> and <code>{{ .fullchain }}</code> templates
is again to make sure that neither Helmfile nor Helm itself tries to interpret the
templates.</p>
<p>Here, I&rsquo;m using a slightly different format for fetching the secret. With
<code>dataFrom</code> instead of the <code>data</code> used in the basic auth secret, I get the entire
JSON object from that path instead of a specific key from that object.
When I push my cert to Vault, I store four keys: the private key, the cert
itself, the cert chain and the full chain. Here, I only need the private key
and the full chain.</p>
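<p>For reference, pushing the renewed cert to Vault could look roughly like this.
The path and the <code>privkey</code>/<code>fullchain</code> key names are the ones referenced above;
the PEM file names and the exact <code>cert</code>/<code>chain</code> key names are assumptions for
illustration:</p>
<pre tabindex="0"><code class="language-shell" data-lang="shell">vault kv put secret/my_kubernetes_cluster/cluster/ingress/le-cert \
  privkey=@privkey.pem \
  cert=@cert.pem \
  chain=@chain.pem \
  fullchain=@fullchain.pem
</code></pre>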
<p>This secret is then used in Traefik&rsquo;s <a href="https://doc.traefik.io/traefik/https/tls/#certificates-stores">TLSStore</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">tlsStore</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">default</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">defaultCertificate</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretName</span>: <span style="color:#ae81ff">le-cert</span>
</span></span></code></pre></div><h1 id="network-policies">Network policies</h1>
<p>Before coming to an example, I also want to show the <code>NetworkPolicy</code> I&rsquo;m using:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumNetworkPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;traefik-allow-world-only&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">endpointSelector</span>: {}
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEndpoints</span>:
</span></span><span style="display:flex;"><span>      - {}
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEntities</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">world</span>
</span></span></code></pre></div><p>With the <code>{}</code> <code>endpointSelector</code>, the policy is applied to all pods in the
namespace the policy resides in. In this particular case, that&rsquo;s only the Traefik
pod.
The <code>fromEndpoints</code> setting in turn says that ingress should be allowed from
all pods within the same namespace. Finally, the only really interesting setting
here is <code>fromEntities: [world]</code>. This setting allows external traffic
from all sources which are not managed by Cilium, meaning the rest of my Homelab and
especially my end-user devices.</p>
<h1 id="example-ingress">Example Ingress</h1>
<p>Last but not least, let&rsquo;s have a look at a quick example. In my
<a href="https://blog.mei-home.net/posts/k8s-migration-2-cilium-lb/#example">post about load balancers</a>,
I introduced a simple echo server and made it available via a <code>LoadBalancer</code> type
Service. With Traefik up and running, I can now switch that service to <code>ClusterIP</code>
and introduce the following <code>Ingress</code> manifest:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">networking.k8s.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Ingress</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">testsetup-ingress</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">testsetup</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">traefik.ingress.kubernetes.io/router.entrypoints</span>: <span style="color:#ae81ff">secureweb</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">testsetup</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">rules</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">host</span>: <span style="color:#ae81ff">testsetup.mei-home.net</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">http</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">paths</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">pathType</span>: <span style="color:#ae81ff">Prefix</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">backend</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">service</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">name</span>: <span style="color:#ae81ff">testsetup-service</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">port</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">http-port</span>
</span></span></code></pre></div><p>The only Traefik-specific config here is the <code>entrypoints</code> annotation, telling
Traefik to accept connections to the service on the <code>secureweb</code> entrypoint.</p>
<p>One nice thing about external-dns is that I don&rsquo;t have to provide an extra
annotation to create a DNS entry. It is automatically created from the
<code>host:</code> value.</p>
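<p>That also makes verifying the new entry easy, for example with <code>dig</code>:</p>
<pre tabindex="0"><code class="language-shell" data-lang="shell"># Should return the Traefik LoadBalancer IP once external-dns has synced
dig +short testsetup.mei-home.net
</code></pre>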
<p>Traefik will parse the Ingress and create a router which matches requests by
the domain they are addressed to.
It then forwards those requests via the Kubernetes Service
and automatically executes all the Middlewares configured for the <code>secureweb</code> entrypoint.</p>
<p>To ensure that Traefik can access the echo pod, I also needed another
network policy:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumNetworkPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;testsetup-deny-all-ingress&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">endpointSelector</span>: {}
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEndpoints</span>:
</span></span><span style="display:flex;"><span>      - {}
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">homelab/ingress</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">io.kubernetes.pod.namespace</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span></code></pre></div><p>Again, this policy is applied to all pods in the namespace for my <code>testsetup</code>
pod, and it allows ingress from all pods in that namespace.
But the Traefik pod lives in another namespace, and so access needs to be
explicitly granted. That&rsquo;s what the <code>matchLabels</code> key is about, where I provide
my ingress label and, importantly, also the namespace, as that&rsquo;s part of Cilium&rsquo;s
secure identity.</p>
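<p>With both policies in place, a quick smoke test from a machine in one of the
allowed subnets should get an answer from the echo server:</p>
<pre tabindex="0"><code class="language-shell" data-lang="shell">curl https://testsetup.mei-home.net/
</code></pre>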
<p>And with that, another piece of important cluster infrastructure is up. &#x1f642;</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 2a: Switching the LoadBalancer to BGP</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-2a-cilium-bgp/</link>
      <pubDate>Tue, 02 Jan 2024 23:00:26 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-2a-cilium-bgp/</guid>
      <description>I decided to switch from Cilium&amp;#39;s L2 announcements to BGP</description>
      <content:encoded><![CDATA[<p>Wherein I talk about migrating from Cilium&rsquo;s L2 announcements for <code>LoadBalancer</code>
services to BGP.</p>
<p>This is an addendum to the <a href="https://blog.mei-home.net/posts/k8s-migration-2-cilium-lb/">third part</a>
of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<h1 id="bgp-instead-of-l2-announcements">BGP instead of L2 announcements?</h1>
<p>In the last post, I described my setup to make <code>LoadBalancer</code> type services
functional in my k8s Homelab with Cilium&rsquo;s <a href="https://docs.cilium.io/en/stable/network/l2-announcements/">L2 Announcements</a>
feature. While working on the next part of my Homelab, introducing Ingress with
Traefik, I ran into the issue that the source IP is not necessarily preserved
during in-cluster routing.</p>
<p>By default, packets which arrive on a node which doesn&rsquo;t have a pod of the
target service are forwarded to a node which has such a pod. During
that forwarding, source NAT is applied to the packet, overwriting the source IP
with the IP of the node where it originally arrived.
This is also described in the <a href="https://kubernetes.io/docs/tutorials/services/source-ip/">Kubernetes docs</a>.</p>
<p>This is true for both <code>NodePort</code> and <code>LoadBalancer</code> services. I see this as a
problem specifically for Ingress proxies, as it prevents stuff like IP allow lists
and any other IP-dependent functionality in the proxy. All packets would look
like they&rsquo;re coming from a cluster node. With Cilium&rsquo;s L2 announcements, they
would all have the source IP of the node which is currently announcing the
service.</p>
<p>This can be fixed with a config option on Kubernetes services, namely
<code>externalTrafficPolicy: Local</code>. This has the effect that packets are not
forwarded to another node if the one they arrive on doesn&rsquo;t have a pod of the
target service. The default mode is <code>Cluster</code>, where packets are forwarded to
other nodes, but with the downside of SNAT.</p>
<p>Now, at some point, while reading into L2 announcements and the <code>externalTrafficPolicy</code>
option, I read that the <code>Local</code> setting doesn&rsquo;t work properly with the ARP based
L2 announcements.
But now, I can&rsquo;t find that anywhere anymore. &#x1f614;</p>
<p>This was my main trigger, but there are a couple of additional downsides
of the L2 announcements feature. First, it produces a lot of load on the
kube-apiserver. I went into a bit of detail
<a href="https://blog.mei-home.net/posts/k8s-migration-2-cilium-lb/#performance">in my previous post</a>.</p>
<p>Then there&rsquo;s the fact that with the L2 announcements feature, there&rsquo;s also no
real load balancing. Due to how ARP works, there can only ever be one node which
announces the service IP, and so only that node will ever receive traffic for
that service.
Combined with what I previously wrote, this also means that if you want to have
a service with preserved source IPs and multiple pods, you&rsquo;re out of luck. With
<code>externalTrafficPolicy: Local</code>, packets will never be forwarded to another node&rsquo;s
pod, regardless of how many there are. The current announcer will have to carry
all of the load, and any other pods on other nodes will only ever be idle.</p>
<p>To be entirely honest, that&rsquo;s not going to be too much of a problem in my
Homelab. I&rsquo;m currently running exactly no jobs with more than one replica.
But hey, who knows? At some point, my writing might really take off and I might
need three instances serving my blog. &#x1f609;</p>
<h1 id="bgp">BGP</h1>
<p>So instead of the ARP based L2 announcements, it&rsquo;s now going to be Cilium&rsquo;s beta
<a href="https://docs.cilium.io/en/stable/network/bgp-control-plane/">BGP control plane</a>
feature.</p>
<p>I really don&rsquo;t know enough about the protocol, so I&rsquo;m not going to annoy you with
my one-day-old half-knowledge here.</p>
<p>Suffice it to say that with BGP, routers can exchange routes, mostly telling
their peers which networks they can reach.</p>
<p>Applied to Kubernetes <code>LoadBalancer</code> services, Cilium will announce routes to
the individual service IPs through a group of cluster nodes.
A route announcement could look like this:</p>
<pre tabindex="0"><code>10.86.55.1/32 via 10.86.5.206
</code></pre><p>That would tell the peer that it can reach the service IP <code>10.86.55.1/32</code> via
the Kubernetes host <code>10.86.5.206</code>. Here, the <code>10.86.5.206</code> host is hanging off
of a switch directly connected to my router, so the router already knows how
to reach it. With the above announcement, it now also knows to forward packets
targeted at <code>10.86.55.1</code> to <code>10.86.5.206</code>, where Cilium will then forward it to
a pod of the target service.</p>
<p>One of the advantages over the Layer 2 ARP protocol used by L2 announcements
is that a completely different, non-routable subnet can be used for the service
IPs.</p>
<p>There are two parts to the setup, one is configuring the router and the other is
configuring Cilium.</p>
<p>One thing to decide on before continuing is the <em>Autonomous System Number</em>.
This number is an identifier for autonomous networks. Similar to IPs, there is
a range of ASNs for private usage which will never be handed out to the public
Internet. It is the range <code>64512–65534</code>. For more info, have a look at the ASN
table on <a href="https://en.wikipedia.org/wiki/Autonomous_system_(Internet)#ASN_table">Wikipedia</a>.
While you can use different ASNs for the router and Cilium, it is not necessary,
and I will continue with the same ASN, <code>64555</code>, for both.</p>
<h1 id="router-setup">Router setup</h1>
<p>The first step to using BGP is setting it up on the router. I&rsquo;m using OPNsense
here and will describe the setup. If you&rsquo;re using a different router,
you can adapt the instructions.</p>
<h2 id="generic-instructions">Generic instructions</h2>
<p>To set up BGP on the router, you need a piece of software which listens on port
<code>179</code> by default, receiving route announcements from peers and sending its own
announcements to them.
OPNsense uses a plugin which installs <a href="https://frrouting.org/">FRRouting</a>,
which can also be used standalone if you are, for example, running a Linux host
as a router.</p>
<p>Once you&rsquo;ve enabled BGP, you will need to add all the k8s nodes you would like
to participate in BGP as peers to the router. At least in OPNsense, this means
simply adding the node&rsquo;s routable IP and the Cilium ASN as the node&rsquo;s ASN.</p>
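<p>For the standalone FRRouting case, a minimal sketch of the equivalent
<code>frr.conf</code> could look like this, reusing my ASN and one of my node IPs as an
example:</p>
<pre tabindex="0"><code class="language-none" data-lang="none">router bgp 64555
 neighbor 10.86.5.206 remote-as 64555
 address-family ipv4 unicast
  neighbor 10.86.5.206 activate
 exit-address-family
</code></pre>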
<p>One very important point that cost me quite some time: Don&rsquo;t forget to make sure
that the Kubernetes cluster nodes participating in BGP can actually reach port
<code>179/TCP</code> on your router. I spent quite a while trying to figure out why my
router and Cilium wouldn&rsquo;t peer. &#x1f611;</p>
<h2 id="opnsense-configuration">OPNsense configuration</h2>
<p>For OPNsense, the first step is to go to <code>System</code> -&gt; <code>Firmware</code> -&gt; <code>Plugins</code>
and install the <code>os-frr</code> plugin, which is OPNsense&rsquo;s way to install <a href="https://frrouting.org/">FRRouting</a>.
Once that&rsquo;s done, a new top level menu entry called <code>Routing</code> will appear.</p>
<p><strong>Note:</strong> This is not the <code>System</code> -&gt; <code>Routes</code> menu!</p>
<p>Then, enable the general routing functionality, which starts the necessary
daemons:</p>
<figure>
    <img loading="lazy" src="routing-general.png"
         alt="A screenshot of the OPNsense UI for routing. In the menu on the left, the menu item &#39;General&#39; under the top-level entry &#39;Routing&#39; is chosen. In the configs, the &#39;Enable&#39; checkbox is checked. So is the &#39;Enable logging&#39; checkbox."/> <figcaption>
            <p>Screenshot of the Routing -&gt; General UI.</p>
        </figcaption>
</figure>

<p>Hit <code>Save</code> after you&rsquo;ve checked <code>Enable</code>.</p>
<p>Next, go to <code>BGP</code> and also check <code>enable</code>. Under <code>BGP AS Number</code>, enter the ASN
you chose from the private range.
As I don&rsquo;t need OPNsense redistributing any routes, I&rsquo;ve left the <code>Route Redistribution</code>
drop-down at <code>Nothing selected</code>. I&rsquo;ve left the <code>Network</code> field empty for the
same reason.</p>
<figure>
    <img loading="lazy" src="bgp-general.png"
         alt="A screenshot of the OPNsense UI. In the menu on the left, the sub-item &#39;GP&#39; is chosen under &#39;Routing&#39;. The active tab is &#39;General&#39;. The checkboxes labeled &#39;enable&#39; and &#39;Log Neighbour Changes&#39; are checked. The field &#39;BGP AS Number&#39; has the value 64555. The field &#39;Network&#39; is empty, while the drop-down &#39;Route Redistribution&#39; contains the value &#39;Nothing selected&#39;."/> <figcaption>
            <p>My config for the BGP -&gt; General config.</p>
        </figcaption>
</figure>

<p>The next step is adding the neighbors. For each of the Kubernetes hosts which
should announce routes, click on the <code>+</code> in the bottom right corner of the
<code>BGP</code> -&gt; <code>Neighbors</code> tab and enter the following information:</p>
<ul>
<li>A description so you know which host it is. I&rsquo;m just using the hostname</li>
<li>Under <code>Peer-IP</code>, add the IP of the Kubernetes host</li>
<li>Under <code>Remote AS</code>, enter the ASN you chose from the private range</li>
<li>Under <code>Update-Source Interface</code>, set the interface from which the Kubernetes host
is reachable</li>
</ul>
<p>I left all the checkboxes unchecked, and did not set anything in the
<code>Prefix-List</code> or <code>Route-Map</code> fields:</p>
<figure>
    <img loading="lazy" src="bgp-neighbor.png"
         alt="Another screenshot of the OPNsense UI. It is headed &#39;Edit Neighbor&#39;. The &#39;Description&#39; field has the value &#39;My new shiny Raspberry Pi 5&#39;. The &#39;Peer-IP&#39; field is set to &#39;10.86.5.512&#39;, while &#39;Remote AS&#39; is set to &#39;64555&#39;. The &#39;Update-Source Interface&#39; is set to &#39;VLANHomelab&#39;. All checkboxes besides &#39;Enabled&#39; are unchecked. All &#39;Prefix-List&#39; and &#39;Route-Map&#39; drop-downs are set to &#39;None&#39;."/> <figcaption>
            <p>Example entry for a new neighbor.</p>
        </figcaption>
</figure>

<p>Here I&rsquo;ve got a question to my readers: Isn&rsquo;t there a better way than adding
every single Kubernetes worker host as a peer here? It just feels like unnecessary
manual work, but I didn&rsquo;t find any other info on it.</p>
<p>With all of that done, the router config is complete.</p>
<p><strong>As noted above, don&rsquo;t forget to open port <code>179/TCP</code> on your firewall!</strong></p>
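<p>Once Cilium is peering (see the setup below), the session state and the learned
routes can be inspected from an OPNsense shell via FRR&rsquo;s <code>vtysh</code>:</p>
<pre tabindex="0"><code class="language-shell" data-lang="shell">vtysh -c &#39;show ip bgp summary&#39;
vtysh -c &#39;show ip bgp&#39;
</code></pre>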
<h3 id="addendum-2024-02-04">Addendum 2024-02-04</h3>
<p>I encountered an error later, when I really started using the Cilium LB. I&rsquo;ve
described it in <a href="https://blog.mei-home.net/posts/k8s-migration-2b-asymmetric-routing/">this post</a>.</p>
<p>In short, if you have a situation like this:</p>
<ul>
<li>A LoadBalancer service set up as described in this post</li>
<li>A host in the same subnet as your Kubernetes nodes trying to use the LoadBalancer service</li>
<li>LoadBalancer IPs assigned from a different subnet than those hosts</li>
</ul>
<p>You will end up with asymmetric routing. Your packets from the host accessing
the service will go through OPNsense, as the packets need to be routed. But the
return path of the packets will be direct, as the k8s nodes and the host using
the service are in the same subnet.</p>
<p>You will then need to do the following:</p>
<ol>
<li>Switch the &ldquo;State Type&rdquo; for all rules allowing access from the subnet to
the LoadBalancer IPs to &ldquo;sloppy state&rdquo;, as OPNsense will only ever see one
side of a connection attempt and consequently block the connection</li>
<li>Create an OUTGOING firewall rule which allows the k8s subnet to access the
LoadBalancer IP as well as an INCOMING rule. I&rsquo;m not sure why this works
right now, but it seems to be necessary, at least in my setup.</li>
</ol>
<h1 id="cilium-setup">Cilium Setup</h1>
<p>The documentation for the Cilium BGP feature can be found <a href="https://docs.cilium.io/en/stable/network/bgp-control-plane/">here</a>.</p>
<p>The first step of the setup is enabling the BGP functionality. As I&rsquo;m using
Helm to deploy Cilium, I&rsquo;m adding this option to my <code>values.yaml</code> file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">bgpControlPlane</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span></code></pre></div><p>Similar to the L2 announcement, the BGP functionality needs something which
hands out IP addresses to the <code>LoadBalancer</code> services. This can be done with
Cilium&rsquo;s <a href="https://docs.cilium.io/en/stable/network/lb-ipam/">Load Balancer IPAM</a>.</p>
<p>As I&rsquo;ve noted above, because BGP, in contrast to L2 ARP, announces routes, it is
easier to choose a CIDR which does not overlap with the subnet the Kubernetes nodes
are located in. In my case, the <code>CiliumLoadBalancerIPPool</code> looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2alpha1&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumLoadBalancerIPPool</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cilium-lb-ipam</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">kube-system</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">cidrs</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">cidr</span>: <span style="color:#e6db74">&#34;10.86.55.0/24&#34;</span>
</span></span></code></pre></div><p>I&rsquo;ve chosen only a single <code>/24</code>, as I don&rsquo;t expect to ever reach 254 LoadBalancer
services. Most of my services will run through my Traefik Ingress instead of being
directly exposed.</p>
<p>The second part of the Cilium config is the BGP peering policy. It sets up the
details of how to peer, what to announce and with whom the peering should happen.</p>
<p>For me, it looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2alpha1&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumBGPPeeringPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">worker-node-bgp</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">kube-system</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">nodeSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/role</span>: <span style="color:#ae81ff">worker</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">virtualRouters</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">localASN</span>: <span style="color:#ae81ff">64555</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">exportPodCIDR</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">serviceSelector</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">homelab/public-service</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">neighbors</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">peerAddress</span>: <span style="color:#e6db74">&#39;10.86.5.254/32&#39;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">peerASN</span>: <span style="color:#ae81ff">64555</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">eBGPMultihopTTL</span>: <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">connectRetryTimeSeconds</span>: <span style="color:#ae81ff">120</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">holdTimeSeconds</span>: <span style="color:#ae81ff">90</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">keepAliveTimeSeconds</span>: <span style="color:#ae81ff">30</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">gracefulRestart</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">restartTimeSeconds</span>: <span style="color:#ae81ff">120</span>
</span></span></code></pre></div><p>A couple of things to note: There can be multiple neighbors that Cilium peers
with. In my case though, I&rsquo;ve only got the one OPNsense router, which is
reachable under <code>10.86.5.254</code> from the Kubernetes nodes. I&rsquo;m using the same ASN
as I used for the router&rsquo;s BGP setup, <code>64555</code>. I didn&rsquo;t see any reason why
I should use different ASNs.</p>
<p>The <code>nodeSelector</code> ensures that only my worker nodes announce routes.</p>
<p>Also important to note is the <code>serviceSelector</code>. A missing <code>serviceSelector</code> is
notably not an error. It just means that Cilium won&rsquo;t announce any routes
for <code>LoadBalancer</code> services.</p>
<p>If you&rsquo;d like to, you can also have Cilium announce routes to the actual pods,
by setting <code>exportPodCIDR</code> to <code>true</code>.</p>
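<p>As a quick check from the cluster side, the Cilium CLI can show the peering
status; the exact subcommand may differ between CLI versions:</p>
<pre tabindex="0"><code class="language-shell" data-lang="shell">cilium bgp peers
</code></pre>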
<h1 id="running-example">Running Example</h1>
<p>With my current k8s Homelab, I have configured my three worker nodes as
neighbors in OPNsense. I&rsquo;ve also got the following service running for my Ingress:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Service</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/hostname</span>: <span style="color:#ae81ff">ingress-k8s.mei-home.net</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/public-service</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">externalTrafficPolicy</span>: <span style="color:#ae81ff">Local</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">secureweb</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">nodePort</span>: <span style="color:#ae81ff">31512</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">port</span>: <span style="color:#ae81ff">443</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">targetPort</span>: <span style="color:#ae81ff">secureweb</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">LoadBalancer</span>
</span></span></code></pre></div><p>This is a simplified version of the service the Traefik Helm chart automatically
creates for me.
Important here are the <code>type: LoadBalancer</code> and the <code>externalTrafficPolicy: Local</code>
settings.</p>
<p>It currently has the following IP:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl get -n traefik-ingress service
</span></span><span style="display:flex;"><span>NAME              TYPE           CLUSTER-IP     EXTERNAL-IP   PORT<span style="color:#f92672">(</span>S<span style="color:#f92672">)</span>        AGE
</span></span><span style="display:flex;"><span>traefik-ingress   LoadBalancer   10.7.122.207   10.86.55.5    443:31512/TCP  32h
</span></span></code></pre></div><p>And here is the culmination of this entire article:</p>
<figure>
    <img loading="lazy" src="routing-result.png"
         alt="The final OPNsense screenshot. It shows a table with one row. In the &#39;Network&#39; column, it has the value &#39;10.86.55.5/32&#39;. The &#39;Next Hop&#39; col has the value &#39;10.86.5.206&#39;. The &#39;Path&#39; column says &#39;Internal&#39;, with the &#39;Origin&#39; saying &#39;IGP&#39;."/> <figcaption>
            <p>Example routing table</p>
        </figcaption>
</figure>

<p>So there we are. There&rsquo;s only one Traefik pod running for the moment. And it&rsquo;s
running on the node with the IP <code>10.86.5.206</code>. As I said in the beginning,
with <code>externalTrafficPolicy: Local</code>, only the nodes which host pods of a given
service announce routes to themselves. This prevents intra-cluster routing and
preserves the source IP.</p>
<p>I also had a trial with <code>externalTrafficPolicy: Cluster</code>, and in that case all
three of my current cluster nodes announce the service IP to OPNsense.</p>
<p>Finally, another request to my readers: Do you have a favorite book about
networking? I was initially completely lost (and as you see from my explanation
of BGP, still mostly am) reading about BGP and even ARP when I was working on
the L2 announcements. It&rsquo;s the one big glaring hole in my Homelab knowledge.
Took me ages to get started on VLANs as well, for example.</p>
<p>So if you&rsquo;ve got a favorite book about current important networking tech and
protocols, drop me a note <a href="https://social.mei-home.net/@mmeier">on the Fediverse</a>.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 2: Setting up Cilium as the Load Balancer</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-2-cilium-lb/</link>
      <pubDate>Sat, 30 Dec 2023 10:22:34 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-2-cilium-lb/</guid>
      <description>Fun with ARP</description>
      <content:encoded><![CDATA[<p>This is the third part of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>This time, I will be talking about using <a href="https://cilium.io/">Cilium</a> as the load
balancer for my Kubernetes cluster with L2 announcements.</p>
<h1 id="but-why">But Why?</h1>
<p>A couple of days ago, I was working on setting up my Traefik ingress for the
cluster. While doing so, I yet again had to do a couple of things that just felt
weird and hacky. The most prominent of those was using <code>hostPort</code> a lot when
setting up the pod.</p>
<p>In addition, I would also pin the Traefik pod to a specific host and provide
a DNS entry for that host, all hardcoded.</p>
<p>All of this has a couple of downsides. First, if that ingress host running
Traefik is down, so is my entire cluster, at least as seen from the outside.
Furthermore, using <code>hostPort</code> and a fixed host also has a problem with the
<code>RollingUpdate</code> strategy. Because the ports and the host are fixed, Kubernetes
cannot start a fresh pod before the old pod has been killed.</p>
<p>More generally speaking, there&rsquo;s also the fact that most examples and tutorials,
as well as most Helm chart defaults, assume that <code>LoadBalancer</code> type services
work.</p>
<h1 id="and-with-what">And with what?</h1>
<p>Initially, I looked at two potential load balancer implementations. These were
<a href="https://kube-vip.io/">kube-vip</a> and <a href="https://metallb.universe.tf/">MetalLB</a>.
I was initially leaning towards kube-vip, if for no other reason than that I
had kube-vip already running on my control plane nodes, providing the VIP for
the k8s API endpoint.</p>
<p>But while researching, I found out that newer versions of Cilium also had
load balancer functionality. Reading through it, it sounded like it had all
the features I wanted. Its biggest advantage is the simple fact that it doesn&rsquo;t
need me to install any additional components into the Kubernetes cluster. It&rsquo;s
just a couple of configuration changes in Cilium, plus two more manifests.</p>
<h1 id="interlude-migrating-the-cilium-install-to-helm">Interlude: Migrating the Cilium install to Helm</h1>
<p>Before I started, I decided to change my Cilium install approach. Up to now,
I had Cilium installed via the Cilium CLI, as described in their <a href="https://docs.cilium.io/en/stable/gettingstarted/k8s-install-default/">Quick Start Guide</a>.</p>
<p>There is one pretty big downside to this approach in my mind: it&rsquo;s manual
invocations of a tool, with a specific set of parameters. It&rsquo;s also not simple
to put under version control properly. Sure, I could always create a bash script
which contains the entire invocation with the right parameters, but that&rsquo;s
just not too nice.</p>
<p>So instead of having to document somewhere with which command line parameters
I needed to invoke the Cilium CLI, I switched it all over to Helm and Helmfile,
so now it&rsquo;s treated like everything else in the cluster.</p>
<p>The migration was pretty painless, because in the background, the Cilium CLI
already just calls Helm.</p>
<p>So for the migration, I first needed to get the translation of the command line
parameters into the Helm values for my running install. That can be done with
Helm like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>helm get values cilium -n kube-system -o yaml
</span></span></code></pre></div><p>I then put those values into a <code>values.yaml</code> file for use with Helmfile.
The Helmfile config looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">repositories</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cilium</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">url</span>: <span style="color:#ae81ff">https://helm.cilium.io/</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">releases</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cilium</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">chart</span>: <span style="color:#ae81ff">cilium/cilium</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">version</span>: <span style="color:#ae81ff">v1.14.5</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">kube-system</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">./value-files/cilium.yaml</span>
</span></span></code></pre></div><p>The <code>cilium.yaml</code> values file looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">cluster</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">encryption</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">wireguard</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">ipam</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">mode</span>: <span style="color:#ae81ff">cluster-pool</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">operator</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">clusterPoolIPv4PodCIDRList</span>: <span style="color:#ae81ff">10.8.0.0</span><span style="color:#ae81ff">/16</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">k8sServiceHost</span>: <span style="color:#ae81ff">api.k8s.example.com</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">k8sServicePort</span>: <span style="color:#ae81ff">6443</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kubeProxyReplacement</span>: <span style="color:#ae81ff">strict</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">operator</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">serviceAccounts</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">cilium</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cilium</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">operator</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cilium-operator</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">tunnel</span>: <span style="color:#ae81ff">vxlan</span>
</span></span></code></pre></div><p>With this config, there&rsquo;s no redeployment necessary; it is equivalent to
what the Cilium CLI does.</p>
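<p>From then on, updates are just the usual Helmfile workflow:</p>
<pre tabindex="0"><code class="language-shell" data-lang="shell"># Show what would change, then apply it
helmfile diff
helmfile apply
</code></pre>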
<h1 id="cilium-l2-announcements-setup">Cilium L2 announcements setup</h1>
<p>Cilium (and load balancers in general, it seems) has two modes for announcing
the IPs of services. The more complex one is the <a href="https://docs.cilium.io/en/stable/network/bgp-control-plane/">BGP mode</a>.
In this mode, Cilium would announce routes to the exposed services. This needs
an environment where BGP is configured. I decided to skip this approach, as
my network knowledge in general isn&rsquo;t that great. I&rsquo;ve only got a relatively
hazy idea of what the BGP protocol even does.</p>
<p>So I settled on the simpler approach, <a href="https://docs.cilium.io/en/latest/network/l2-announcements/">L2 Announcements</a>.
In this approach, all Cilium nodes in the cluster take part in a leader election
for each of the services which should be exposed and receive a virtual IP. The
node which wins the election then answers ARP requests for the service&rsquo;s
virtual IP with its own MAC address.
The node then regularly renews a lease in Kubernetes to signal to all other
nodes in the cluster that it&rsquo;s still there. If a lease isn&rsquo;t renewed in a certain
time frame, another node takes over the ARP announcements.</p>
<p>One consequence of this approach is the fact that this is not true load balancing.
All traffic for a given service will always arrive at one specific node. From
the documentation, this is different when using the BGP approach, as that approach
does provide true load balancing.
But what the L2 announcements approach does provide is failover, and that is
all I really care about for my setup, at least for now.</p>
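<p>Once everything below is set up, the election is visible as ordinary Kubernetes
Lease objects, so you can check which node currently answers ARP for a given
service. The exact lease names may vary between Cilium versions:</p>
<pre tabindex="0"><code class="language-shell" data-lang="shell">kubectl -n kube-system get lease | grep cilium-l2announce
</code></pre>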
<h2 id="cilium-config">Cilium config</h2>
<p>The first step in enabling L2 announcements is to enable the Helm option:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">l2announcements</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span></code></pre></div><p>Once that was done, I had the problem that nothing seemed to happen at all.
It turns out that the Helm options are written into a <code>ConfigMap</code> in the Cilium
Helm chart, which is then read by the Cilium pods. And the pods are not
restarted automatically. So to get the option to take any effect, I had to
run the following two commands after deploying the updated Helm chart:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl rollout restart -n kube-system deployment cilium-operator
</span></span><span style="display:flex;"><span>kubectl rollout restart daemonset -n kube-system cilium
</span></span></code></pre></div><p>Then the option was active. You can see the active options in the log output
of the <code>cilium-operator</code> and <code>cilium</code> pods if you ever want to check what the
pods are actually running with.</p>
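<p>For a quick check without digging through the logs, the rendered options can
also be inspected in the ConfigMap itself. A small sketch, assuming the chart&rsquo;s
default ConfigMap name <code>cilium-config</code>:</p>
<pre tabindex="0"><code># Check whether the l2announcements flag made it into the Cilium config
kubectl get -n kube-system configmap cilium-config -o yaml | grep -i l2

# Check which options the agent pods picked up after the restart
kubectl logs -n kube-system daemonset/cilium --tail=200 | grep -i l2
</code></pre>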
<p>If anybody out there has any idea what I might have done wrong that made those
manual <code>rollout restart</code> calls necessary, please ping me on <a href="https://social.mei-home.net/@mmeier">Mastodon</a>.</p>
<p>But still, nothing happens just from enabling the option. There are two manifests
which need to be deployed.</p>
<h2 id="load-balancer-ip-pools">Load balancer IP pools</h2>
<p>First, a <code>CiliumLoadBalancerIPPool</code> manifest needs to be deployed. This manifest
controls the pools of IPs which are handed out to <code>LoadBalancer</code> type services.
In my setup, the manifest looks something like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2alpha1&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumLoadBalancerIPPool</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cilium-lb-ipam</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">kube-system</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">cidrs</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">cidr</span>: <span style="color:#e6db74">&#34;10.86.5.80/28&#34;</span>
</span></span></code></pre></div><p>It defines a relatively small IP range, as I don&rsquo;t expect to expose too many
services. Most of what I will expose will run through the ingress service.
Documentation on the pools and additional options can be found <a href="https://docs.cilium.io/en/stable/network/lb-ipam/">here</a>.</p>
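<p>After deploying the pool, it is worth verifying that Cilium accepted it. A
sketch; the exact output columns depend on the Cilium version:</p>
<pre tabindex="0"><code># List the configured pools, including conflict status and available IPs
kubectl get ciliumloadbalancerippools.cilium.io

# Show details, e.g. why a pool might be marked as conflicting
kubectl describe ciliumloadbalancerippools.cilium.io cilium-lb-ipam
</code></pre>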
<h2 id="l2-announcement-policies">L2 announcement policies</h2>
<p>The second piece of config is the configuration for which services should get
an IP and which nodes should do the L2 announcements. This is done via a
<code>CiliumL2AnnouncementPolicy</code> manifest, which is documented <a href="https://docs.cilium.io/en/latest/network/l2-announcements/">here</a>.</p>
<p>For me, the config looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2alpha1&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumL2AnnouncementPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cilium-lb-all-services</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">kube-system</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">nodeSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/role</span>: <span style="color:#ae81ff">worker</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serviceSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/public-service</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">loadBalancerIPs</span>: <span style="color:#66d9ef">true</span>
</span></span></code></pre></div><p>This restricts the announcements to only happen from my worker nodes, not from
the control plane or Ceph nodes.</p>
<p>In addition, I&rsquo;m adding a <code>serviceSelector</code> here, so that only certain services
get an IP and are announced. This is necessary due to <a href="https://github.com/cilium/cilium/issues/28752">this bug</a>,
which leads to all services being considered for L2 announcements, regardless
of whether they are of type <code>LoadBalancer</code> or not. This doesn&rsquo;t make much
sense, and also costs performance, which I will get into in a later section.</p>
<h1 id="example">Example</h1>
<p>With all of that config done, let&rsquo;s have a look at an example. I used the
following deployment:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">testsetup</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">app</span>: <span style="color:#ae81ff">testsetup</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">app</span>: <span style="color:#ae81ff">testsetup</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">echo-server</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">jmalloc/echo-server</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">http-port</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: <span style="color:#ae81ff">8080</span>
</span></span></code></pre></div><p>This is just a simple echo server which returns a bit of information on the
HTTP request it received.
Then this is the service for exposing that pod:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Service</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">testsetup-service</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/public-service</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/hostname</span>: <span style="color:#ae81ff">testsetup.example.com</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">LoadBalancer</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">app</span>: <span style="color:#ae81ff">testsetup</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">http-port</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">port</span>: <span style="color:#ae81ff">80</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">targetPort</span>: <span style="color:#ae81ff">http-port</span>
</span></span></code></pre></div><p>As noted above, only services with the <code>homelab/public-service=&quot;true&quot;</code> label
are handled by the Cilium L2 announcements. In addition, I&rsquo;m supplying the
service with an external-dns hostname to get an automated DNS entry.
In short, any requests which reach the service IP on port <code>80</code> are forwarded
to port <code>8080</code> in the pod, which is where the <code>echo-server</code> is listening.</p>
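<p>With the service deployed and an external IP assigned (<code>10.86.5.93</code> in my
case, see the <code>kubectl</code> output below), a quick test looks something like this
(response abbreviated):</p>
<pre tabindex="0"><code>curl http://10.86.5.93/

Request served by testsetup-[...]

GET / HTTP/1.1
[...]
</code></pre>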
<p>One very important thing to note: Use <strong>curl</strong> for testing! Ping won&rsquo;t work,
as the service IP does not answer to <code>ping</code>.
When starting to debug, first check whether the service got an IP assigned:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl get -n testsetup service
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>NAME                TYPE           CLUSTER-IP     EXTERNAL-IP   PORT<span style="color:#f92672">(</span>S<span style="color:#f92672">)</span>        AGE
</span></span><span style="display:flex;"><span>testsetup-service   LoadBalancer   10.7.174.128   10.86.5.93    80:32206/TCP   14h
</span></span></code></pre></div><p>The important part here is the <code>EXTERNAL-IP</code>.
Next, check whether there is a Kubernetes lease created by any node, signaling
that that node is announcing the service:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl get -n kube-system leases.coordination.k8s.io
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>NAME                                            HOLDER       AGE
</span></span><span style="display:flex;"><span>cilium-l2announce-testsetup-testsetup-service   sehith       13h
</span></span></code></pre></div><p>You can also use <code>arping</code> to check whether there&rsquo;s anyone announcing the IP:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>arping 10.86.5.93
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ae81ff">58</span> bytes from 00:16:3e:17:a4:31 <span style="color:#f92672">(</span>10.86.5.93<span style="color:#f92672">)</span>: index<span style="color:#f92672">=</span><span style="color:#ae81ff">0</span> time<span style="color:#f92672">=</span>253.747 usec
</span></span></code></pre></div><p>Important to note: <code>arping</code> will only work from within the same subnet, as ARP
is a layer 2 protocol. Ask me how much time I spent trying to figure out why
I didn&rsquo;t get an answer to an <code>arping</code> from a separate subnet. &#x1f609;</p>
<h1 id="performance">Performance</h1>
<p>One last point I&rsquo;ve got to bring up is the efficiency of Cilium&rsquo;s L2 load
balancer approach.</p>
<p>As noted, <a href="https://github.com/cilium/cilium/issues/28752">this bug</a> made
Cilium announce every service in my cluster initially, <code>type=LoadBalancer</code> or
not.</p>
<p>This produced quite a high load increase on one of my control plane nodes:</p>
<figure>
    <img loading="lazy" src="control_load.png"
         alt="A screenshot of a Grafana plot, titled &#39;CPU Utilization&#39;. The y axis shows CPU utilization in percent, going from 82% at the bottom to 100% at the top. The x axis shows time, from 21:05 to 23:10. At the beginning, the plot shows about 89% idle load for the CPU. At around 21:20, the idle load is reduced to about 87% idle. This level of load is held until about 23:03, when the idle load increases back to about 88%."/> <figcaption>
            <p>L2 announcements were enabled for the first time around 21:20. At around 23:03, I reduced the L2 announcements to a single service, instead of 5.</p>
        </figcaption>
</figure>

<p>The CPU load on this 4-core control plane node increased by about 2% during the time
in which Cilium had to announce the 5 services defined in my cluster.
This is most likely all API server/etcd load, as Cilium uses Kubernetes'
<a href="https://kubernetes.io/docs/concepts/architecture/leases/">leases</a> functionality.
For every announced service, all nodes continuously check whether the current lease
holder is still holding the lease, so that another node can take over if the
node currently doing the announcements for the service fails for some reason.</p>
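<p>This lease churn is easy to observe directly; the leases get renewed every few
seconds per announced service:</p>
<pre tabindex="0"><code># Watch the L2 announcement leases being renewed continuously
kubectl get -n kube-system leases.coordination.k8s.io -w | grep cilium-l2announce
</code></pre>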
<p>This 2% load increase was from only five services with three nodes in the cluster.
My cluster will very likely end up with 9 worker nodes in the end, and possibly
more than 5 services. I really don&rsquo;t like where that might lead.</p>
<p>I will have to keep an eye on this while I migrate more hosts and services over
from Nomad. If it gets too bad, I will have to return to this topic and try out
MetalLB, or potentially go ahead and have a look at BGP after all.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 1: Setting up external-secrets</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-1-external-secrets/</link>
      <pubDate>Tue, 26 Dec 2023 17:29:59 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-1-external-secrets/</guid>
      <description>My setup for external-secrets with HashiCorp Vault</description>
      <content:encoded><![CDATA[<p>This is the second post in my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>I will skip the cluster setup itself in this series, as I did not make many
changes compared to my <a href="https://blog.mei-home.net/posts/kubernetes-cluster-setup/">experimental setup</a>.</p>
<p>Instead I will start with my very first deployed service, <a href="https://external-secrets.io/latest/">external-secrets</a>.</p>
<h1 id="motivation">Motivation</h1>
<p>In my initial experimentation, I decided to not go with any secrets management
and instead use <a href="https://helmfile.readthedocs.io/en/latest/remote-secrets/#remote-secrets">Helmfile&rsquo;s secret handling</a>.
But I&rsquo;ve come around to the fact that some sort of service which can
automatically take in secrets from my <a href="https://www.vaultproject.io/">Vault</a>
instance would be pretty nice to have.
One trigger was the fact that while setting up a number of services, I found
that Helmfile&rsquo;s approach for getting secrets was not actually that great.</p>
<p>So what does external-secrets do? It is a connector between Kubernetes Secrets
and an external secrets provider. In my case, that&rsquo;s HashiCorp&rsquo;s Vault.
With external-secrets, an operator is set up. This operator watches for new
objects of type <code>ExternalSecret</code>. When one of those appears, it reads the
object&rsquo;s values and contacts Vault to fetch the referenced secrets. Then, external-secrets
creates a new Kubernetes Secret with the secret material collected from the
external secrets provider, for use in the Kubernetes cluster.</p>
<h1 id="vault-setup">Vault setup</h1>
<p>Before I could deploy external-secrets, I had to do some reconfiguration of my
Vault setup. I&rsquo;m managing all of the setup for Vault in Terraform.</p>
<p>The first step was creating a rather restrictive policy for the external-secrets
access, as my Vault doesn&rsquo;t just provide secrets for my workloads, but also for
my Ansible playbooks and image generation setup. For now, I&rsquo;m planning to
restrict access to just the <a href="https://developer.hashicorp.com/vault/docs/secrets/kv/kv-v1">Vault kv secrets store</a>,
and only particular paths therein.
A policy for that might look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">path</span> <span style="color:#e6db74">&#34;secret/my_kubernetes_secrets/cluster/*&#34;</span> {
</span></span><span style="display:flex;"><span>  capabilities <span style="color:#f92672">=</span> [ <span style="color:#e6db74">&#34;read&#34;</span> ]
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>With that, if my k8s cluster ever gets breached, the attacker will at most have
access to the Kubernetes-specific secrets.
This policy is then added to Vault via Terraform like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;vault_policy&#34; &#34;external-secrets&#34;</span> {
</span></span><span style="display:flex;"><span>  name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;external-secrets&#34;</span>
</span></span><span style="display:flex;"><span>  policy <span style="color:#f92672">=</span> <span style="color:#66d9ef">file</span>(<span style="color:#e6db74">&#34;path-to-file.hcl&#34;</span>)
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Policies as short as this could also be added verbatim instead of having a
separate file and loading that, but I like it better like this.</p>
<p>The second part of the Vault setup is the authentication. For this I chose
Vault&rsquo;s <a href="https://developer.hashicorp.com/vault/docs/auth/approle">AppRole</a>,
which is intended for use cases exactly like this. I did not actually have that
auth backend configured yet, so I added it like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;vault_auth_backend&#34; &#34;approle&#34;</span> {
</span></span><span style="display:flex;"><span>  type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;approle&#34;</span>
</span></span><span style="display:flex;"><span>  path <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;approle&#34;</span>
</span></span><span style="display:flex;"><span>  local <span style="color:#f92672">=</span> <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>I just kept the default mount path. In addition to mounting the backend, I also
needed to create a role for external-secrets. For my setup, it looked like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;vault_approle_auth_backend_role&#34; &#34;external-secrets&#34;</span> {
</span></span><span style="display:flex;"><span>  backend <span style="color:#f92672">=</span> <span style="color:#66d9ef">vault_auth_backend</span>.<span style="color:#66d9ef">approle</span>.<span style="color:#66d9ef">path</span>
</span></span><span style="display:flex;"><span>  role_name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;external-secrets&#34;</span>
</span></span><span style="display:flex;"><span>  token_policies <span style="color:#f92672">=</span> [<span style="color:#66d9ef">vault_policy</span>.<span style="color:#66d9ef">external</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">secrets</span>.<span style="color:#66d9ef">name</span>]
</span></span><span style="display:flex;"><span>  secret_id_bound_cidrs <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;10.1.1.0/24&#34;</span>]
</span></span><span style="display:flex;"><span>  token_bound_cidrs <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;10.1.1.0/24&#34;</span>]
</span></span><span style="display:flex;"><span>  token_explicit_max_ttl <span style="color:#f92672">=</span> <span style="color:#ae81ff">86400</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>This creates an application role with the previously created access policy and
the default policy attached. The default policy just allows things like looking
up your own token but doesn&rsquo;t grant any secret access.
For additional security, I also configured restricted CIDRs for both the
<code>secret-id</code>, which is used to log in, and the tokens produced for the role after
login. This restricts the IPs from which logins can happen, and subsequently
the IPs from which the generated tokens can be used.
For purely best-practice reasons, I also restricted the max TTL for tokens
created for this role to 24h.</p>
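<p>The resulting role configuration can be read back from Vault to double-check
the CIDR restrictions and TTLs:</p>
<pre tabindex="0"><code># Show the effective AppRole configuration, including bound CIDRs and TTLs
vault read auth/approle/role/external-secrets
</code></pre>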
<p>What I did decide to not do here was to also set a TTL for the <code>secret_id</code>. This
is due to the fact that while external-secrets can renew tokens if they become
invalid, it cannot automatically get a new <code>secret_id</code>. So I&rsquo;ve added the
<code>secret_id</code> to my regular manual secrets rotation plan. I definitely need to
write a playbook or script to do all of those rotations at some point. &#x1f62c;</p>
<p>Once all of the above has been configured and Terraform has been executed,
there are two pieces of information needed to configure external-secrets.
The first one is the AppRole <code>role-id</code>. It can be collected via this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>vault read auth/approle/role/external-secrets/role-id
</span></span></code></pre></div><p>The second piece is the <code>secret_id</code>. A fresh one is generated and shown every
time the following Vault command is executed:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>vault write -force auth/approle/role/external-secrets/secret-id
</span></span></code></pre></div><p>The <code>-force</code> is required here because normally Vault needs at least some input
parameters, but in this case I didn&rsquo;t need any.</p>
<p>Finally, I stored the <code>secret_id</code> in the Vault KV store for later access by
my external-secrets deployment:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>vault kv put secret/my_kubernetes_secrets/role-secret secret-id<span style="color:#f92672">=</span>-
</span></span></code></pre></div><p>In theory, I could also have gotten the <code>secret_id</code> via Terraform and then
written it to the KV store also via Terraform. But that would have meant that
the <code>secret_id</code> would have ended up in the Terraform state. Not optimal.</p>
<h1 id="kubernetes-deployment">Kubernetes deployment</h1>
<p>With all of the Vault config now prepared, the next step is to actually deploy
external-secrets. And this went relatively well. I used the
<a href="https://github.com/external-secrets/external-secrets/tree/main/deploy/charts/external-secrets">official Helm chart</a>.</p>
<p>I&rsquo;m using <a href="https://github.com/helmfile/helmfile">Helmfile</a> for managing the
deployments in my Kubernetes cluster. I will not go into the details here, but
I&rsquo;ve got a draft for a post on my deployment setup almost done and will finish
it after this post.</p>
<p>My <code>values.yaml</code> file for the Helm chart looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">approleSecretId</span>: {{ <span style="color:#e6db74">&#34;ref+vault://secret/my_kubernetes_secrets/role-secret#/secret-id&#34;</span> <span style="color:#ae81ff">| fetchSecretValue }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">approleId</span>: {{ <span style="color:#e6db74">&#34;ref+vault://auth/approle/role/external-secrets/role-id#/role_id&#34;</span> <span style="color:#ae81ff">| fetchSecretValue }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">caBundle</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">  {{- exec &#34;curl&#34; (list &#34;https://vault.example.com:/v1/my-ca/ca/pem&#34;) | nindent 2 }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">external-secrets</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">commonLabels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">external-secrets</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serviceMonitor</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">webhook</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">certManager</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><p>The <code>ref+vault</code> syntax uses <a href="https://helmfile.readthedocs.io/en/latest/remote-secrets/">Helmfile&rsquo;s secret management</a>
to get the AppRole credentials from my Vault instance during deployment.
The <code>caBundle</code> value will later be used to supply the SecretStore with my
internal CA so external-secrets can validate the TLS cert coming from my Vault
instance. I will go over this in detail later.</p>
<p>The values under <code>external-secrets</code> are the actual values for the external-secrets
Helm chart, as that chart is managed as a dependency.
I&rsquo;m not doing anything special here, just explicitly disabling the <code>serviceMonitor</code>.
This is mostly so that I can later grep over my Homelab repo and find all apps
providing service monitors once I&rsquo;ve deployed Prometheus.</p>
<h1 id="enabling-the-vault-secrets-store">Enabling the Vault secrets store</h1>
<p>In external-secrets, the different supported secrets providers can be enabled
separately via <code>SecretStore</code> or <code>ClusterSecretStore</code> manifests. I decided to
use the <code>ClusterSecretStore</code>, as per-namespace stores didn&rsquo;t look like they
would make much sense for me.
My thinking here is that yes, I could provide one store per namespace, which
would mean one store per deployed app. I could then create different roles for
each of these stores in Vault and give them highly restrictive policies to only
access what they really need.</p>
<p>But in Kubernetes, it&rsquo;s not the pods themselves which have access to the
Secrets and the secret stores. It&rsquo;s the admins and operators who create and
write manifests. In the case of this cluster, that&rsquo;s only me, and I&rsquo;ve already
got all the permissions there are. If somebody were to get into my Kubernetes
account, they would have access to everything anyway. So it didn&rsquo;t make much
sense to me to work with different secret stores.</p>
<p>Without further delay, here is my <code>ClusterSecretStore</code> manifest:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1beta1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ClusterSecretStore</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-vault-store</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">provider</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">vault</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">server</span>: <span style="color:#e6db74">&#34;https://vault.example.com&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">caProvider</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">type</span>: <span style="color:#ae81ff">Secret</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-internal-ca</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">external-secrets</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">caCert</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;secret&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">version</span>: <span style="color:#e6db74">&#34;v1&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">auth</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">appRole</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;approle&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#75715e"># RoleID configured in the App Role authentication backend</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">roleId</span>: {{ <span style="color:#ae81ff">.Values.approleId }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#75715e"># Reference to a key in a K8 Secret that contains the App Role SecretId</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">secretRef</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;my-approle-secret&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">namespace</span>: {{ <span style="color:#ae81ff">.Release.Namespace }}</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;secretId&#34;</span>
</span></span></code></pre></div><p>In addition to this, I&rsquo;m also deploying two more secrets, one for my internal
CA and one with the AppRole <code>secret_id</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Secret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-approle-secret</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">external-secrets</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">secretId</span>: {{ <span style="color:#ae81ff">.Values.approleSecretId | b64enc }}</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Secret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-internal-ca</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">stringData</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">caCert</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    {{- .Values.caBundle | nindent 6 }}</span>
</span></span></code></pre></div><p>These are getting their values from the following lines in the <code>values.yaml.gotmpl</code>
file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">approleSecretId</span>: {{ <span style="color:#e6db74">&#34;ref+vault://secret/my_kubernetes_secrets/role-secret#/secret-id&#34;</span> <span style="color:#ae81ff">| fetchSecretValue }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">caBundle</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">  {{- exec &#34;curl&#34; (list &#34;https://vault.example.com:/v1/my-ca/ca/pem&#34;) | nindent 2 }}</span>
</span></span></code></pre></div><p>As mentioned before, I&rsquo;m using Helmfile&rsquo;s templating capabilities here. If
you&rsquo;re not using Helmfile, you will have to create these secrets in a different
way.
For me, this approach has the advantage of keeping absolutely everything under
version control while not exposing any secrets.</p>
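<p>As a sketch of such a different way, the two Secrets from above could also be
created imperatively, at the cost of losing the version control property. The
values and file names here are placeholders:</p>
<pre tabindex="0"><code># Create the AppRole secret-id Secret by hand
kubectl create secret generic my-approle-secret -n external-secrets \
    --from-literal=secretId=&lt;your-secret-id&gt;

# Create the CA cert Secret from a local PEM file
kubectl create secret generic my-internal-ca -n external-secrets \
    --from-file=caCert=ca.pem
</code></pre>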
<p>While trying to deploy the <code>ClusterSecretStore</code>, I hit two problems I will
describe in detail in a later section.</p>
<p>For now, the above config works.</p>
<h1 id="deploying-a-secret">Deploying a secret</h1>
<p>To test the setup, I created a fresh dummy secret. First, I pushed
the secret to Vault:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>vault kv put secret/my_kubernetes_secrets/cluster/testsecret secret<span style="color:#f92672">=</span>supersecretpw
</span></span></code></pre></div><p>Then, an <code>ExternalSecret</code> manifest using the previously created <code>my-vault-store</code>
secret store can be created:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1beta1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ExternalSecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">testsecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">refreshInterval</span>: <span style="color:#e6db74">&#34;1m&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">secretStoreRef</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-vault-store</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ClusterSecretStore</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">target</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">testsecret</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">external-secrets</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">mysecret</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">remoteRef</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">key</span>: <span style="color:#ae81ff">secret/my_kubernetes_secrets/cluster/testsecret</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">property</span>: <span style="color:#ae81ff">secret</span>
</span></span></code></pre></div><p>Once that manifest has been deployed, external-secrets will create a Kubernetes
Secret called <code>testsecret</code> in the namespace <code>external-secrets</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl get -n external-secrets secrets testsecret -o yaml
</span></span><span style="display:flex;"><span>apiVersion: v1
</span></span><span style="display:flex;"><span>data:
</span></span><span style="display:flex;"><span>  mysecret: c3VwZXJzZWNyZXRwdw<span style="color:#f92672">==</span>
</span></span><span style="display:flex;"><span>immutable: false
</span></span><span style="display:flex;"><span>kind: Secret
</span></span><span style="display:flex;"><span>metadata:
</span></span><span style="display:flex;"><span>  annotations:
</span></span><span style="display:flex;"><span>    meta.helm.sh/release-name: external-secrets
</span></span><span style="display:flex;"><span>    meta.helm.sh/release-namespace: external-secrets
</span></span><span style="display:flex;"><span>    reconcile.external-secrets.io/data-hash: <span style="color:#ae81ff">12345</span>
</span></span><span style="display:flex;"><span>  creationTimestamp: <span style="color:#e6db74">&#34;2023-12-26T12:01:30Z&#34;</span>
</span></span><span style="display:flex;"><span>  labels:
</span></span><span style="display:flex;"><span>    app.kubernetes.io/managed-by: Helm
</span></span><span style="display:flex;"><span>    reconcile.external-secrets.io/created-by: <span style="color:#ae81ff">1235</span>
</span></span><span style="display:flex;"><span>  name: testsecret
</span></span><span style="display:flex;"><span>  namespace: external-secrets
</span></span><span style="display:flex;"><span>  ownerReferences:
</span></span><span style="display:flex;"><span>  - apiVersion: external-secrets.io/v1beta1
</span></span><span style="display:flex;"><span>    blockOwnerDeletion: true
</span></span><span style="display:flex;"><span>    controller: true
</span></span><span style="display:flex;"><span>    kind: ExternalSecret
</span></span><span style="display:flex;"><span>    name: testsecret
</span></span><span style="display:flex;"><span>    uid: <span style="color:#ae81ff">12345</span>
</span></span><span style="display:flex;"><span>  resourceVersion: <span style="color:#e6db74">&#34;1839454&#34;</span>
</span></span><span style="display:flex;"><span>  uid: <span style="color:#ae81ff">12345</span>
</span></span><span style="display:flex;"><span>type: Opaque
</span></span></code></pre></div><p>Here, <code>target.name</code> is the name of the secret to be created, with <code>target.namespace</code>
being the namespace to deploy to.
Under <code>data</code>, the <code>secretKey</code> is the key under which the secret data will be
stored in the newly created Secret, and <code>remoteRef.key</code> is the path to the
secret in Vault, with <code>remoteRef.property</code> being the property of the resulting
JSON object at that path which contains the value to be stored in <code>secretKey</code>.</p>
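<p>The mapping can be verified against Vault directly, as the <code>remoteRef.property</code>
corresponds to a key of the JSON object stored at that path:</p>
<pre tabindex="0"><code># The secret property of this object is what ends up under mysecret
vault kv get -format=json secret/my_kubernetes_secrets/cluster/testsecret
</code></pre>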
<h1 id="network-policy-problems">Network policy problems</h1>
<p>While deploying the Vault <code>ClusterSecretStore</code> specifically, I
hit multiple errors. The first one was this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>Error: UPGRADE FAILED: cannot patch <span style="color:#e6db74">&#34;vault-backend&#34;</span> with kind ClusterSecretStore: Internal error occurred: failed calling webhook <span style="color:#e6db74">&#34;validate.clustersecretstore.external-secrets.io&#34;</span>: failed to call webhook: Post <span style="color:#e6db74">&#34;https://external-secrets-webhook.external-secrets.svc:443/validate-external-secrets-io-v1beta1-clustersecretstore?timeout=5s&#34;</span>: context deadline exceeded
</span></span></code></pre></div><p>It appeared whenever I tried to deploy the new <code>ClusterSecretStore</code>. I finally
caught on to the fact that this was likely due to my <code>CiliumNetworkPolicy</code>,
which at that point looked like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumNetworkPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;external-secrets-deny-all-ingress&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: {{ <span style="color:#ae81ff">.Release.Namespace }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">endpointSelector</span>: {}
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEndpoints</span>:
</span></span><span style="display:flex;"><span>      - {}
</span></span></code></pre></div><p>This is the canonical network policy for allowing all egress, while blocking
all ingress to all pods inside the namespace, save for traffic from pods in the same
namespace.
So I was extremely confused when I saw that network requests were getting
blocked. I removed the policy, and the deployment of the secret store worked
fine.</p>
<p>To start, I confirmed that the policy was actually applied correctly. This can
be done with <a href="https://cilium.io/">Cilium</a>, the CNI plugin I&rsquo;m using, as follows.
Some documentation on troubleshooting can be found <a href="https://docs.cilium.io/en/stable/operations/troubleshooting/">here</a>.</p>
<p>First, get the host where the pod which blocks access is running:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl get pods -n external-secrets -o wide
</span></span><span style="display:flex;"><span>NAME                                               READY   STATUS    RESTARTS   AGE   IP           NODE     NOMINATED NODE   READINESS GATES
</span></span><span style="display:flex;"><span>external-secrets-7fcd5969c8-sltbl                  1/1     Running   <span style="color:#ae81ff">0</span>          25h   10.8.4.173   sait     &lt;none&gt;           &lt;none&gt;
</span></span><span style="display:flex;"><span>external-secrets-cert-controller-fc578ccdd-mcksx   1/1     Running   <span style="color:#ae81ff">0</span>          25h   10.8.4.168   sait     &lt;none&gt;           &lt;none&gt;
</span></span><span style="display:flex;"><span>external-secrets-webhook-68c99c7557-nrqpz          1/1     Running   <span style="color:#ae81ff">0</span>          25h   10.8.5.60    sehith   &lt;none&gt;           &lt;none&gt;
</span></span></code></pre></div><p>Next, check which Cilium pod runs on that specific host:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl get pods -n kube-system -o wide
</span></span><span style="display:flex;"><span><span style="color:#f92672">[</span>...<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>cilium-wcffs                       1/1     Running   <span style="color:#ae81ff">0</span>               5d22h   10.86.5.205   sehith   &lt;none&gt;           &lt;none&gt;
</span></span></code></pre></div><p>Then, I needed the correct Cilium endpoint for the pod I was interested in:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl -n kube-system exec -ti cilium-wcffs -- cilium endpoint list
</span></span><span style="display:flex;"><span>ENDPOINT   POLICY <span style="color:#f92672">(</span>ingress<span style="color:#f92672">)</span>   POLICY <span style="color:#f92672">(</span>egress<span style="color:#f92672">)</span>   IDENTITY   LABELS <span style="color:#f92672">(</span>source:key<span style="color:#f92672">[=</span>value<span style="color:#f92672">])</span>                                                       IPv6   IPv4         STATUS   
</span></span><span style="display:flex;"><span>           ENFORCEMENT        ENFORCEMENT                                                                                                                        
</span></span><span style="display:flex;"><span><span style="color:#ae81ff">1101</span>       Enabled            Disabled          <span style="color:#ae81ff">29452</span>      k8s:app.kubernetes.io/instance<span style="color:#f92672">=</span>external-secrets                                          10.8.5.60    ready   
</span></span><span style="display:flex;"><span>                                                           k8s:app.kubernetes.io/name<span style="color:#f92672">=</span>external-secrets-webhook                                                           
</span></span></code></pre></div><p>Finally armed with the <code>ENDPOINT</code> identifier, <code>1101</code> here, we can display the
policy rules applied to it:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl -n kube-system exec -ti cilium-wcffs -- cilium endpoint get -o yaml <span style="color:#ae81ff">1101</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">[</span>...<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>rule: <span style="color:#e6db74">&#39;{&#34;port&#34;:0,&#34;protocol&#34;:&#34;ANY&#34;,&#34;l7-rules&#34;:[{&#34;\u0026LabelSelector{MatchLabels:map[string]string{k8s.io.kubernetes.pod.namespace: external-secrets,},MatchExpressions:[]LabelSelectorRequirement{},}&#34;:null},]}&#39;</span>
</span></span></code></pre></div><p>This was exactly the rule I was expecting: allowing all traffic from the
<code>external-secrets</code> namespace.
While looking all this up, I also checked the drop monitoring for the endpoint,
which can be done like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl -n kube-system exec -ti cilium-wcffs -- cilium monitor --type drop
</span></span></code></pre></div><p>It spat out lines like this whenever I tried to deploy the secret store:</p>
<pre tabindex="0"><code>xx drop (Policy denied) flow 0x0 to endpoint 1101, ifindex 6, file bpf_lxc.c:1968, , identity remote-node-&gt;29452: 10.8.0.17:59258 -&gt; 10.8.5.60:10250 tcp SYN
</code></pre><p>What I did not realize for way too long: The source IP, <code>10.8.0.17</code>, wasn&rsquo;t
coming from any pod in my entire cluster. I just couldn&rsquo;t figure out what that
IP was. It&rsquo;s in the CIDR for my cluster pods, but it doesn&rsquo;t show up in the
<code>kubectl get -A pods -o wide</code> output.</p>
<p>After an exceedingly long time spent searching for the root cause, I finally
found it, through sheer dumb luck. I had switched into the terminal of my VM
host, where the output of a previous <code>lxc ls</code> command was still visible.
And lo and behold, there was the IP, as the <code>cilium_host</code> network interface of
one of my control plane nodes.</p>
<p>Some digging later, I found out that this is a network interface created by
Cilium, and that it carries the traffic of all static pods and other
host-networking pods on a host.
This also explained why I never saw any error in the logs of any of the
external-secrets pods. The request wasn&rsquo;t made by any of them. The <code>webhook</code>
pod runs a webhook which is called when deploying new secret stores, to verify
them before the Kubernetes objects are created.
This means the hook isn&rsquo;t triggered by any external-secrets pod, but by the
kube-apiserver.</p>
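<p>In hindsight, the mystery IP could have been identified a lot faster by listing
the addresses Cilium knows for each node. A sketch, assuming the <code>CiliumNode</code>
objects of recent Cilium versions:</p>
<pre tabindex="0"><code># Each CiliumNode lists its addresses, including the cilium_host IP
# (the CiliumInternalIP), which is where such requests appear to come from
kubectl get ciliumnodes.cilium.io -o wide
</code></pre>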
<p>Going a bit further, I just wanted to allow the kube-apiserver ingress into
the webhook pod. This also did not work, because there wasn&rsquo;t actually any
identity for it, as the pod&rsquo;s networking is not controlled by Cilium.</p>
<p>After a while, I looked back at the Cilium monitoring line:</p>
<pre tabindex="0"><code>xx drop (Policy denied) flow 0x0 to endpoint 1101, ifindex 6, file bpf_lxc.c:1968, , identity remote-node-&gt;29452: 10.8.0.17:59258 -&gt; 10.8.5.60:10250 tcp SYN
</code></pre><p>Note the <code>identity remote-node</code>. Luckily, Cilium defines an entity for that,
see the docs <a href="https://docs.cilium.io/en/latest/security/policy/language/#entities-based">here</a>.
So what finally solved my problem was to add the following network policy:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumNetworkPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;external-secrets-allow-webhook-all&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: {{ <span style="color:#ae81ff">.Release.Namespace }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">endpointSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">app.kubernetes.io/name</span>: <span style="color:#ae81ff">external-secrets-webhook</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEntities</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">remote-node</span>
</span></span></code></pre></div><p>This allows ingress to the webhook pod from any remote node. These remote nodes
are all nodes in the Kubernetes cluster other than the local node where
the pod is running. It isn&rsquo;t quite as secure as explicitly defining that
only the kube-apiserver pods can access the webhook pod, but it will have
to do for now: the kube-apiserver is not under Cilium control and hence
cannot be selected by, for example, its labels. I will have to return
to this issue at a later point and see whether I can do better.</p>
<h2 id="the-ca-cert-formatting-problem">The CA cert formatting problem</h2>
<p>After I had finally fixed the networking issue, I got another error, this time from
the external-secrets pod itself. It was not able to connect to Vault:</p>
<pre tabindex="0"><code>[...]&#34;error&#34;:&#34;could not get provider client: unable to log in to auth met hod: unable to log in with app role auth: Put \&#34;https://vault.example.com/v1/auth/approle/login\&#34;: tls: failed to verify certificate: x509: certificate signed by unknown authority&#34;[...]
</code></pre><p>This was somewhat expected, because my Vault access does not go through my
proxy, and uses my Homelab internal CA.</p>
<p>I thought this problem would be easily fixable, as external-secrets provides
settings in the <code>ClusterSecretStore</code> for supplying a CA cert for server cert
validation. See the docs <a href="https://external-secrets.io/latest/api/spec/#external-secrets.io/v1beta1.VaultProvider">here</a>.
But I had really rotten luck with the <code>caBundle</code> config. I&rsquo;m getting the
PEM-formatted CA cert directly from Vault, which runs my internal CA. But I couldn&rsquo;t
get it into a format which external-secrets would accept. Whatever I tried, from
introducing newlines to putting the cert through <code>b64enc</code>, nothing worked. I was
just getting CA cert parsing errors from external-secrets.</p>
<p>What finally worked was to use the <code>caProvider</code> option instead. For this,
I created an additional secret (even though the CA cert isn&rsquo;t exactly a secret):</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Secret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-internal-ca</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">stringData</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">caCert</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    {{- .Values.caBundle | nindent 6 }}</span>
</span></span></code></pre></div><p>And then I used that secret in the <code>caProvider</code> section as seen above. This was
an extremely frustrating journey. With the networking problem, I at least
learned something about Cilium networking and how to debug it, and I finally found
the root cause and an acceptable fix.
But in this case, the only thing I got out of it was a high dose of frustration
and a workaround switching to a completely different approach.</p>
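<p>For reference, the provider side of this ends up looking roughly like the
following sketch. The field names come from the external-secrets <code>VaultProvider</code>
docs linked above; the namespace value is an assumption about where the secret lives:</p>
<pre tabindex="0"><code>spec:
  provider:
    vault:
      server: "https://vault.example.com"
      caProvider:
        type: Secret
        name: my-internal-ca
        key: caCert
        namespace: external-secrets   # assumed: wherever the Secret was created
</code></pre>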
<h1 id="conclusion">Conclusion</h1>
<p>First service set up successfully on the production k8s cluster. &#x1f389;
But also lots of frustration. Finding the fix for the networking problem took
a while, but at least I learned a bit about Cilium debugging along the way.
The CA cert formatting problem, on the other hand, just got me riled up.</p>
<p>If anyone reading this has any good ideas about how to produce a Cilium network
policy which only allows access from the kube-apiserver instead of the
&ldquo;allow all cluster nodes&rdquo; setup I&rsquo;ve got now, hit me up on the <a href="https://social.mei-home.net/@mmeier">Fediverse</a>.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 0: The Plan</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-0-plan/</link>
      <pubDate>Mon, 18 Dec 2023 21:53:39 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-0-plan/</guid>
      <description>My plan for migrating my Nomad cluster to Kubernetes</description>
      <content:encoded><![CDATA[<p>In a <a href="https://blog.mei-home.net/posts/hashipocalypse/">previous post</a>, I had noted that
due to HashiCorp&rsquo;s recent decisions about the licensing for their tools, I
was thinking about switching away from Nomad as my workload scheduler.</p>
<p>Since then, HashiCorp made a change to the Terraform registry&rsquo;s Terms of Service
which only allowed usage with HashiCorp Terraform. This was obviously an action
against <a href="https://opentofu.org/">OpenTofu</a>, and it reeked of pure spite. That
turned my musings about the future of my Homelab from &ldquo;okay, this leaves a bad taste&rdquo;
to &ldquo;okay, I just lost all trust in HashiCorp&rdquo;. So Kubernetes it is.</p>
<p>Just to make one thing clear: Both Nomad and Consul, which I will be replacing
here, worked great for me. They provided everything I could have wished for, in
a rather lightweight package. And the integration was excellent. I also think
the documentation for all HashiCorp tools deserves a lot of praise.
There&rsquo;s no technical reason to replace Nomad and Consul. It&rsquo;s purely due to the
license change, and even more so due to the ToS change which followed.</p>
<p>After <a href="https://blog.mei-home.net/tags/kubeexp/">some experimentation</a> with Kubernetes, I&rsquo;m
satisfied that it&rsquo;s going to work for everything I&rsquo;m currently doing with
Nomad, and I&rsquo;ve spent the last few weeks making a plan for the migration.</p>
<p>My one main goal here is to make the migration as incremental as possible.
To me, this has the advantage of reducing the pressure, because I can just
migrate service-by-service, slowly, at any pace which fits the rest of my life.</p>
<p>To this end, I intend to run my Nomad and Kubernetes clusters in parallel. The
one big problem with this: Depending on time and motivation, this might draw
out the migration quite a bit. I might still be running two workload schedulers
come spring 2024. &#x1f605;</p>
<h1 id="the-current-situation">The current situation</h1>
<p>Let&rsquo;s start with the current state.
<figure>
    <img loading="lazy" src="homelab-before.svg"
         alt="A stylized graphic of my Homelab setup. In the middle are eight boxes containing the Raspberry Pi, Nomad and Consul logos. They are all labeled &#39;Raspberry Pi CM4 8G Worker&#39;. Above them are three further boxes, with all the previous logos plus the Ceph and Vault logo. They are labeled Raspberry Pi 4 4GB Controller. To the side are three more boxes, only containing the Ceph logo labeled &#39;Ceph Storage Host&#39;"/> <figcaption>
            <p>The current state of my Homelab.</p>
        </figcaption>
</figure>
</p>
<p>I&rsquo;m running most of the Homelab on Raspberry Pi 4s. Three of them with 4GB RAM serve as controllers,
hosting one Vault, Nomad and Consul server as well as Ceph MON daemons each,
for high availability purposes. My main workhorses are the eight Pi CM4 with 8GB,
each hosting a Consul and Nomad client running my workloads. Storage is provided
by three x86 machines, each with one HDD and one SSD.</p>
<p>Nomad is the main workload scheduler. Consul provides both service discovery
and authenticated, encrypted connections between different Nomad jobs.
Vault is used for secrets, not only within Nomad jobs but also for example by
my Ansible playbooks.</p>
<p>Ceph provides storage to Nomad jobs via CSI as well as the root disks for the
eight worker Pis, which <a href="https://blog.mei-home.net/posts/rpi-netboot/intro/">netboot</a> and
are completely diskless.</p>
<p>At the time of writing, 70% of the cluster&rsquo;s CPU and 46% of the RAM are assigned
to jobs. But in reality, the cluster overall is about 90% CPU idle. All of this
together currently eats about 150W.</p>
<h1 id="the-plan">The Plan</h1>
<p>As noted, I would like to do the migration incrementally, keeping everything up
as much as possible. The first challenge in that was getting enough hardware to run the
two clusters in parallel. Luckily, I&rsquo;ve got my old x86 machine from before I
ventured into multi-host territory. It has an Intel 8C/16T CPU, 64 GB of RAM
and a couple of 500 GB SSDs. That&rsquo;s more than enough power to run my entire
Homelab, if necessary.
In addition, I&rsquo;ve got my spare disks for when one of the prod disks fails, a
2TB SSD and a 6TB HDD.</p>
<p>I already used the x86 machine as the host for my Kubernetes experiments and
will now use it in a similar way, with LXD VMs running three Kubernetes controllers,
one Ceph VM making use of the spare disks, and a couple of worker VMs.</p>
<h2 id="preparation">Preparation</h2>
<p>To begin with, I will create the aforementioned VMs and init the cluster itself.
After that, I will migrate the first host. This will also double as a test to
see whether everything works fine on Raspberry Pis, and I will also be writing
an Ansible playbook to remove all the Nomad cluster&rsquo;s tools from a host.</p>
<p>Once that&rsquo;s done, the first couple of services will be foundational stuff, like
<code>external-dns</code> and <code>external-secrets</code>. Then the first migrated Pi will become
the Ingress host with a <code>Traefik</code> deployment.</p>
<h2 id="ceph">Ceph</h2>
<p>I will continue using Ceph. It has served me very well in the past two years and
I know my way around it by now. But instead of continuing with the current
baremetal cluster, I will go with <a href="https://rook.io/">Ceph Rook</a>, a Ceph cluster
deployed in Kubernetes. This approach has the advantage that I will be
able to use the Ceph hosts for workloads other than Ceph as well.</p>
<p>Sadly, Ceph Rook does not support any kind of import from a baremetal cluster.
There is no way to create daemons in Rook and join them into a baremetal cluster.
As a consequence, I will be setting up a fresh cluster in Kubernetes
and then slowly migrating the data over as I migrate hosts and services from Nomad
to Kubernetes. Luckily, my cluster is still empty enough that I can take one
host out of the baremetal cluster and add it, along with its disks, to the Rook
cluster. The Rook cluster will then consist of one VM using my spare disks plus
that baremetal host, while the other two baremetal hosts stay in the
baremetal cluster.</p>
<p>For the data transfer, I will very likely just use <code>rsync</code>. A volume-level
export/import doesn&rsquo;t make much sense, especially for CSI volumes: they will be
created and maintained by Rook/Kubernetes, so importing them as whole volumes
would need even more config to make sure each volume request ends up with the
right existing volume.</p>
<p>For the setup itself, I will need to create a number of StorageClasses. There
will be two for RBD volumes, the main volume type for my CSI volumes: one backed
by SSDs, one by HDDs, depending on which kind of performance a given service
needs. Then there will also
be a CephFS class, for those few cases where I need multiple-writer capabilities,
and an S3 StorageClass. These two only get HDD variants, as I
don&rsquo;t expect high throughput requirements there anyway.</p>
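<p>To make this a bit more concrete, here is a rough sketch of what one of the RBD
classes might look like, based on the examples in the Rook docs. The class and
pool names are my placeholders, and the CSI secret parameters assume Rook&rsquo;s
default secret names in the <code>rook-ceph</code> namespace:</p>
<pre tabindex="0"><code>apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rbd-ssd                  # placeholder class name
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph           # assumed Rook cluster namespace
  pool: rbd-ssd-pool             # placeholder SSD-backed pool
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/fstype: ext4
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete
</code></pre>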
<h2 id="s3-content">S3 content</h2>
<p>After the Ceph Rook cluster is set up, the first data to be migrated will be
all the S3 buckets which are not directly related to a specific service. These
are mostly my <code>restic</code> backups and some misc stuff, like the Terraform bucket.</p>
<h2 id="migrating-the-logging-setup">Migrating the Logging setup</h2>
<p>This is going to be the first actual migration. Because I don&rsquo;t care too much
about my previous logs, I will simply create a completely new setup and not
bother to transfer the S3 bucket with my logs.</p>
<p>The setup will be similar to my current Nomad setup. Loki will do log storage,
which will be accessed via Grafana. Then comes my FluentD instance, which
aggregates the logs and unifies them, e.g. making sure there is only one level
for &ldquo;info&rdquo;, instead of <code>INF</code>, <code>info</code> and <code>I</code>. That instance will push logs to
Loki. I will also redirect all my logs, meaning syslogs from hosts and service
logs from Fluentbit, to this k8s instance and then retire Loki/FluentD from my
Nomad setup.</p>
<p>As said, should be relatively simple because I don&rsquo;t care about preserving past
logs.</p>
<p>At this point, the k8s cluster will be running the logging setup for the entire
Homelab. So it will have become load-bearing.</p>
<h2 id="setting-up-metrics-gathering">Setting up metrics gathering</h2>
<p>This part is a bit more complicated because I won&rsquo;t be migrating my old metrics
stack with Prometheus and Grafana over 1:1. Instead, I will start using the
<a href="https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack">kube-prometheus-stack</a>.
Here I do want to preserve old data, as I like looking at older metrics as well
as current ones.
This showed the first challenge during planning: In Nomad, volumes are created
separately from the main job. For Kubernetes, I will be using Helm as my &ldquo;job&rdquo;
management tool. My current idea for cases where I want to migrate data over
is to do the first deployment of the Helm chart with zero replicas for the pod,
thus just creating everything else including volumes.</p>
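<p>A small sketch of the idea, assuming the chart exposes a hypothetical
<code>replicaCount</code> value which the Deployment template consumes:</p>
<pre tabindex="0"><code># values.yaml -- first deployment, pods kept at zero
replicaCount: 0

# templates/deployment.yaml -- fragment consuming the value
spec:
  replicas: {{ .Values.replicaCount }}
</code></pre>
<p>The initial install with <code>replicaCount: 0</code> creates the PersistentVolumeClaims
and everything else in the chart without starting any pods. After copying the old
data into the fresh volumes, an upgrade with <code>replicaCount: 1</code> brings the
workload up.</p>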
<p>Another interesting difference is going to be Grafana. From everything I understand
now, Grafana&rsquo;s Helm chart relies on <a href="https://grafana.com/docs/grafana/latest/administration/provisioning/">provisioning</a>
for things like data sources, dashboards and the like. And in principle, I like
the idea of having my dashboards in Git. But it remains to be seen how much
exporting them to Git after every change starts annoying me.</p>
<p>On the positive side: Lots more data and pretty graphs about Kubernetes to look
at. &#x1f604;</p>
<p>The idea here is similar to the logging section: I will retire my Nomad setup
and let the k8s Prometheus instance do all the scraping for the Homelab.</p>
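<p>For the parts of the Homelab outside Kubernetes, the kube-prometheus-stack
values expose an <code>additionalScrapeConfigs</code> list for handing extra scrape jobs
to Prometheus. A sketch with placeholder job and target names:</p>
<pre tabindex="0"><code>prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: homelab-hosts          # placeholder job for the existing node-exporters
        static_configs:
          - targets:
              - "host1.example.com:9100" # placeholder targets
              - "host2.example.com:9100"
</code></pre>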
<p>One open question is going to be about the CSI volume utilization data. At the
moment, I&rsquo;m running a cronjob on all workers which regularly reports the
results of a filtered <code>df -h</code> via the local <code>node-exporter</code>&rsquo;s <code>textfile</code> feature.</p>
<h2 id="setting-up-a-docker-registry">Setting up a docker registry</h2>
<p>At the moment, I run two Docker registries. One for my own Docker images, and
one as a pull-through cache for DockerHub. I will be trying out Harbor to see
how I like it.</p>
<h2 id="backups">Backups</h2>
<p>My backup setup currently consists of some simple Python scripting driving
<code>restic</code>, which does incremental backups of all locally mounted volumes every
night to my Ceph S3. This doesn&rsquo;t get me more redundancy, as the S3 is stored
on the same disks as the (mostly) Ceph RBD volumes used with CSI. But it does
protect me from fat-fingered <code>rm -rf /</code> commands. I will go into more detail
about what I&rsquo;m doing exactly in a separate post when I find the time.</p>
<p>In addition, I&rsquo;ve got a second job which downloads all of the backup S3 buckets
onto an external HDD via <code>rclone</code>.</p>
<p>No off-site backup yet. &#x1f62c;</p>
<p>This part will likely require at least a limited rewrite of my Python scripting.
Because the backups run per node and back up whatever happens to
run there at the time the backup job runs, I will be able to continue running
the per-node backup job on both clusters in parallel; they will be backing
up different services&rsquo; data to different backup S3 buckets.</p>
<h2 id="service-migration">Service migration</h2>
<p>With all of the previous sections done, the infrastructure is in place
and I can begin migrating the services.
Here is an overview:
<figure>
    <img loading="lazy" src="service_deps.png"
         alt="A dependency diagram of the services in my Homelab. It shows 27 different services, ranging from Audiobookshelf to zigbee2mqtt. The largest number of connections go into Traefik, my Ingress proxy, and into CephRBD, which provides storage for services. There are two clusters. On the one side is Prometheus, with dependencies onto a number of smaller services like Mosquitto for MQTT or UptimeKuma for monitoring and service availability. On the other side is a service clustered around Postgres and Redis. Here are the heavier services, like Mastodon, Wallabag, Keycloak, Jellyfin and so forth."/> <figcaption>
            <p>An overview of my services and their dependencies.</p>
        </figcaption>
</figure>
</p>
<p>The first service to be created will be Postgres. Here, I decided to go with
a proposal from <a href="https://transitory.social/@rachel">Rachel</a>: <a href="https://cloudnative-pg.io/">cloudnative-pg</a>.
I will then migrate each database when I migrate the service using it, via
export/import.</p>
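<p>Judging by the cloudnative-pg docs, a minimal cluster definition is pleasantly
short. A sketch, with the name, size and storage class as placeholders:</p>
<pre tabindex="0"><code>apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: homelab-pg                # placeholder name
spec:
  instances: 2
  storage:
    size: 20Gi                    # placeholder size
    storageClass: rbd-ssd         # placeholder Rook RBD class
</code></pre>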
<p>After that will come <code>Audiobookshelf</code>. This service will serve as a testbed for
service migration, and I will write up some documentation on service migration
and create a Helm chart template for the rest of the migrations.</p>
<p>After that, I don&rsquo;t expect many surprises. Where available, I will use the official
Helm chart for a service. Otherwise, I will write my own. Each individual
migration will consist of deploying the Helm chart first with zero replicas,
to create e.g. S3 buckets or CSI volumes. Then I will migrate databases, volumes
and S3 data, and finally start up the k8s instance with Ingress via my Traefik.</p>
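<p>The Ingress part of each such chart will likely be a stock
<code>networking.k8s.io</code> object along these lines. Host, service name and port are
placeholders, and the class name assumes my Traefik deployment registers an
IngressClass called <code>traefik</code>:</p>
<pre tabindex="0"><code>apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: audiobookshelf
spec:
  ingressClassName: traefik        # assumed IngressClass name
  rules:
    - host: abs.example.com        # placeholder hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: audiobookshelf
                port:
                  number: 80       # placeholder service port
</code></pre>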
<p>Somewhere in the middle of all this, I will also have to update my host update
Ansible playbook, to properly fence off the Kubernetes hosts before rebooting
them.</p>
<p>The one service which I thought might be problematic is <code>Drone CI</code>. By default,
its runner runs CI pipelines in Docker. From what I&rsquo;ve read, I might be able to
set up Docker-in-Docker pods and run pipelines there. But quite honestly, the Docker
runner requires mounting in the Docker socket, giving the runner root access, and I
had planned to migrate to <code>Woodpecker</code> anyway. So I will just do this as part
of the Kubernetes migration, as Drone CI&rsquo;s Kubernetes runner is still marked
experimental, while Woodpecker&rsquo;s isn&rsquo;t.</p>
<h2 id="cleanup">Cleanup</h2>
<p>The final step, after all workers are in Kubernetes, will be to migrate the
three Raspberry Pi 4 controller nodes over to serve the Kubernetes cluster. This
will be a bit complicated. I can shut down the Nomad cluster completely once the
last job is done, but the Consul cluster is different.</p>
<p>There are two things which rely on it: First, proper scraping of the Ceph cluster
MGR daemon for metrics. Here, Consul&rsquo;s health-check-backed DNS is currently
used to find the active MGR instance.
Second, access to my three Vault servers requires Consul for high availability.
I&rsquo;m still not sure how I will solve this; I might just migrate
the Vault cluster into Kubernetes as well.</p>
<p>Once the last few bits of data are cleared from the baremetal Ceph cluster,
I can finally migrate the two remaining baremetal servers over to the Ceph Rook
cluster.
To begin with, I will have them restricted to Ceph pods, but I will also test
what happens when I remove the &ldquo;Ceph&rdquo; taint I currently plan to put on them.
But to make that decision, I will have to look more deeply into how Kubernetes
scheduling, and especially preemption, works.</p>
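<p>Mechanically, the taint plan amounts to something like the following sketch,
with the taint key and value being my own placeholders: the taint goes onto the
node, and the Ceph pods get a matching toleration.</p>
<pre tabindex="0"><code># Taint on the node, equivalent to:
#   kubectl taint nodes ceph-host-1 storage-node=ceph:NoSchedule
apiVersion: v1
kind: Node
metadata:
  name: ceph-host-1               # placeholder node name
spec:
  taints:
    - key: storage-node           # placeholder taint key
      value: ceph
      effect: NoSchedule
---
# Matching toleration in the pod spec of the Ceph workloads:
tolerations:
  - key: storage-node
    operator: Equal
    value: ceph
    effect: NoSchedule
</code></pre>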
<p>The final act of the migration will be updating all docs (&#x1f91e;),
removing Nomad/Consul setups from my Ansible playbooks and finally shutting down
the VMs and retiring the x86 host again.</p>
<p>For this entire migration, to make sure I do not forget anything, I have
also created no less than 698 tasks in my favorite task management software,
<a href="https://taskwarrior.org/">Taskwarrior</a>.
I&rsquo;m accepting bets on how many tasks
in I will get before having to nuke the plan and start fresh. &#x1f609;</p>
]]></content:encoded>
    </item>
    <item>
      <title>KubeExp: Day 1 operations</title>
      <link>https://blog.mei-home.net/posts/kubernetes-day-1/</link>
      <pubDate>Thu, 19 Oct 2023 15:49:58 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/kubernetes-day-1/</guid>
      <description>The first couple of steps with my new Kubernetes cluster</description>
      <content:encoded><![CDATA[<p>In the <a href="https://blog.mei-home.net/posts/kubernetes-cluster-setup/">last post</a> of
<a href="https://blog.mei-home.net/tags/kubeexp/">the series on my Kubernetes experiments</a>, I
described how to initialize the cluster. In this post, I will go into a bit more
detail on what I did once I finally had a cluster set up.</p>
<h1 id="tutorials">Tutorials</h1>
<p>Never having done anything with Kubernetes before, I started out with a couple
of tutorials.</p>
<p>The first one was <a href="https://kubernetes.io/docs/tutorials/configuration/configure-redis-using-configmap/">this one</a>.
It uses Redis as an example deployment to demonstrate how to use ConfigMaps.
This is an interesting topic for me, because one of the things I liked a lot
about Nomad was the tight integration with <a href="https://github.com/hashicorp/consul-template">consul-template</a>
for config files and environment variables via the <a href="https://developer.hashicorp.com/nomad/docs/job-specification/template">template stanza</a>.
This stanza allows the user to template config files with inputs taken from other
tools. My main use case at the moment is taking secrets from Vault and injecting
them into configuration files.
Kubernetes does not have this capability out of the box, but I will get into
how I handle it further down in this post.</p>
<p>The one important piece of knowledge I gained from this tutorial was that when
a ConfigMap is used by the pod spec in a deployment manifest, the deployment&rsquo;s
pods are not automatically restarted to take the new configuration into account.
This is a bit annoying, to be honest, because it&rsquo;s something which Nomad does
out of the box, at least for certain ways of writing job files.
The solution I found (at least while working with pure <code>kubectl</code>;
with Helm the problem can be solved more elegantly) was to just run <code>kubectl rollout restart deployment &lt;NAME&gt;</code>.</p>
<p>Next up was a small tutorial setting up a Service for the first time <a href="https://kubernetes.io/docs/tutorials/services/connect-applications-service/">with Nginx</a>.
At first I had a little problem with this one, because I had written the
ConfigMap for it like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ConfigMap</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nginxconfigmap</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/name</span>: <span style="color:#ae81ff">nginx</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/component</span>: <span style="color:#ae81ff">webserver</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">connecting-apps</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">usage</span>: <span style="color:#ae81ff">tutorials</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">default</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    server {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            listen 80 default_server;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            listen [::]:80 default_server ipv6only=on;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            listen 443 ssl;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            root /usr/share/nginx/html;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            index index.html;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            server_name localhost;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            ssl_certificate /etc/nginx/ssl/tls.crt;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            ssl_certificate_key /etc/nginx/ssl/tls.key;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            location / {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                    try_files $uri $uri/ =404;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    }</span>
</span></span></code></pre></div><p>As a consequence, nothing came up in the Nginx container, but I also wasn&rsquo;t
getting any error messages in the logs. So I first assumed that something was
wrong with the Service setup, because I was getting &ldquo;Connection refused&rdquo; errors.
But it turns out I just hadn&rsquo;t understood the ConfigMap semantics correctly.
The keys under the <code>data:</code> key are actual filenames. So in the setup above,
I was adding a file just called <code>default</code> and mounting it into the Nginx conf
directory. The main Nginx config, however, only automatically includes files with
the <code>.conf</code> extension from the config snippet dir. And because there wasn&rsquo;t
anything malformed about the config, I was simply getting an Nginx instance
without a server block, instead of some sort of error message. Just changing
that <code>default:</code> key to <code>default.conf:</code> fixed the issue.
This was also the first service I made available outside the cluster, using a
<code>NodePort</code> type service. It looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Service</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">connecting-apps-nginx</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/name</span>: <span style="color:#ae81ff">nginx</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/component</span>: <span style="color:#ae81ff">webserver</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">connecting-apps</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">usage</span>: <span style="color:#ae81ff">tutorials</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">NodePort</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">port</span>: <span style="color:#ae81ff">8080</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">targetPort</span>: <span style="color:#ae81ff">80</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">http</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">port</span>: <span style="color:#ae81ff">443</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">https</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/name</span>: <span style="color:#ae81ff">nginx</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/component</span>: <span style="color:#ae81ff">webserver</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">connecting-apps</span>
</span></span></code></pre></div><p>This service listens on two random ports on every single Kubernetes node. If
packets arrive on those ports, they are forwarded to the Pod running
Nginx.
At first, I thought this would be the way I would run my Traefik ingress
later on, but then I realized that while you can configure an explicit port for
NodePort services, it has to come from the NodePort range, which defaults to
30000-32767.</p>
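<p>For completeness, pinning the port explicitly is just one more field in the
port definition. A fragment of the spec above, with an arbitrary port from the
allowed range:</p>
<pre tabindex="0"><code>spec:
  type: NodePort
  ports:
  - port: 8080
    targetPort: 80
    nodePort: 30080   # must fall within the NodePort range, 30000-32767 by default
</code></pre>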
<p>Next, I had a look at an example <a href="https://docs.cilium.io/en/stable/gettingstarted/demo/">from Cilium</a>,
to get more comfortable with Network Policies. In this example, you launch a
number of Star Wars themed services, deciding docking permissions for the Death
Star. This is simulated with CiliumNetworkPolicy objects and was pretty good
at teaching me the basics. I&rsquo;m especially interested in NetworkPolicy as
a network connection permission mechanism: in my Nomad cluster, I&rsquo;m using
Consul Connect to control the connections between different services, deciding
who can connect to whom, and I wanted something similar in Kubernetes.
<a href="https://kubernetes.io/docs/concepts/services-networking/network-policies/">NetworkPolicies</a>
do exactly that, and the Star Wars demo from Cilium demonstrated it nicely.</p>
<p><a href="https://kubernetes.io/docs/tutorials/stateless-application/guestbook/">The last tutorial</a>
was more of an &ldquo;all-in-one&rdquo; deal with a lot more complexity, connecting several
services to each other. I made it even more interesting by instituting a
deny-all network policy on the &ldquo;default&rdquo; namespace. As a consequence, I needed to
make sure both that the Redis pods could talk to each other for replication
and that the PHP frontend could talk to the Redis pods. After having just finished
the Cilium demo, that part was pretty simple. Restricting Redis access to the
frontend, for example, looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumNetworkPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;guestbook-redis-allow&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/name</span>: <span style="color:#ae81ff">guestbook-redis-allow</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">guestbook</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">usage</span>: <span style="color:#ae81ff">tutorial</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;L3-L4 policy to restrict redis access to frontend only&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">endpointSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/name</span>: <span style="color:#ae81ff">redis</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/component</span>: <span style="color:#ae81ff">key-value-store</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">guestbook</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">fromEndpoints</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/name</span>: <span style="color:#ae81ff">guestbook</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/component</span>: <span style="color:#ae81ff">frontend</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">guestbook</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">toPorts</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">port</span>: <span style="color:#e6db74">&#34;6379&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span></code></pre></div><p>What I overlooked completely: I also had to explicitly allow traffic from
outside the cluster to reach the frontend. Without such a rule, Cilium dutifully
blocks all external traffic coming in via the NodePort service I had configured
for the tutorial.</p>
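<p>A policy admitting that external traffic could look like the following sketch.
This is my reconstruction rather than the tutorial&rsquo;s solution: it uses Cilium&rsquo;s
<code>world</code> entity and assumes the frontend pods carry the labels used above:</p>
<pre tabindex="0"><code>apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "guestbook-frontend-allow-world"
spec:
  endpointSelector:
    matchLabels:
      homelab/name: guestbook
      homelab/component: frontend
      homelab/part-of: guestbook
  ingress:
    - fromEntities:
        - world
      toPorts:
        - ports:
            - port: "80"      # assumed container port of the frontend
              protocol: TCP
</code></pre>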
<h1 id="how-to-handle-all-those-manifests">How to handle all those manifests?</h1>
<p>While doing the tutorials, I started wondering how to handle all of those
YAML files in some better way than running <code>kubectl apply</code> for every one of
them. At first I looked at Helm, which already looked more like what I wanted.
But then somebody on Mastodon mentioned <a href="https://helmfile.readthedocs.io/en/latest/">Helmfile</a>.
This also uses Helm in the background, but has a central config file to combine
all the things I&rsquo;ve currently got deployed in my cluster, and it allows
deployment of all of it with a single command. Exactly what I was looking for.</p>
<p>Currently, with Traefik Ingress I set up myself, and using the <a href="https://rook.io/">Ceph Rook</a>
Helm chart, my Helmfile looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">repositories</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-release</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">url</span>: <span style="color:#ae81ff">https://charts.rook.io/release</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">releases</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ceph-rook-operator</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">chart</span>: <span style="color:#ae81ff">rook-release/rook-ceph</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">version</span>: <span style="color:#ae81ff">v1.12.5</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">rook-ceph</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">./value-files/rook-operator.yaml</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ceph-rook-cluster-internal</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">chart</span>: <span style="color:#ae81ff">rook-release/rook-ceph-cluster</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">version</span>: <span style="color:#ae81ff">v1.12.5</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">rook-ceph-cluster-internal</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">./value-files/rook-ceph-cluster-internal.yaml</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">traefik</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">chart</span>: <span style="color:#ae81ff">./traefik</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">appVersion</span>: <span style="color:#e6db74">&#34;v2.10.4&#34;</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">meiHomeNetCert</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">chain</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            {{- &#34;ref+vault://secret/cert#/foo&#34; | fetchSecretValue | nindent 12 }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            {{- &#34;ref+vault://secret/cert#/bar&#34; | fetchSecretValue | nindent 12 }}</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">basicAuthAdminPw</span>: {{ <span style="color:#e6db74">&#34;ref+vault://secret/traefik/auth/baz#/pw&#34;</span> <span style="color:#ae81ff">| fetchSecretValue }}</span>
</span></span></code></pre></div><p>This format has several nice features. Each entry in the <code>releases</code> array is a
different Helm chart. As you can see, for the Ceph Rook charts I use the official
sources, while the Traefik chart comes from a local directory as I wrote it
myself. I will write separate blog posts about both Traefik as Ingress and
Ceph Rook.
Besides defining which charts to apply, you can also centrally define the
namespaces and values. Here I&rsquo;m using two different ways of defining values
for the Helm charts. For the Ceph Rook deployments, I&rsquo;m using separate value
files, because they need a lot of config. But Traefik&rsquo;s values I just define
directly inside the Helmfile.</p>
<p>Another big plus is Helmfile&rsquo;s ability to get secrets from Vault, which
I use here to get at my Let&rsquo;s Encrypt certs.</p>
<p>Levels of templating: Two. Helmfile&rsquo;s own, and then Helm&rsquo;s to generate the
actual manifests.</p>
<p>While working on this and complaining a bit on Mastodon about the fact that Pods
are not restarted when the config file changes, I was pointed towards
a neat trick which can be applied when deploying with Helm. This trick uses the
fact that when an annotation of a Pod changes, the Pod is automatically redeployed.</p>
<p>Let&rsquo;s say we have a Deployment with a <code>spec.template</code> like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/name</span>: <span style="color:#e6db74">&#34;traefik&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">checksum/static-conf</span>: {{ <span style="color:#ae81ff">include (print $.Template.BasePath &#34;/static-conf.yml&#34;) . | sha256sum }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">traefik</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">traefik:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">static-conf</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#e6db74">&#34;/etc/traefik&#34;</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">static-conf</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">configMap</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#ae81ff">traefik-static-conf</span>
</span></span></code></pre></div><p>And then we have a ConfigMap like this at <code>templates/static-conf.yml</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ConfigMap</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">traefik-static-conf</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/name</span>: <span style="color:#e6db74">&#34;traefik&#34;</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">traefik.yml</span>: <span style="color:#ae81ff">|</span>
</span></span><span style="display:flex;"><span>{{ <span style="color:#ae81ff">tpl (.Files.Get &#34;configs/static.yml&#34;) . | indent 4 }}</span>
</span></span></code></pre></div><p>Then, whenever this ConfigMap, or the <code>configs/static.yml</code> referenced in the
<code>tpl</code> function, changes content, the annotation on the Deployment&rsquo;s template
also changes, and the Pod is redeployed. This way, the Pod is automatically
restarted whenever the config file changes.</p>
<p>This also shows another nice point about using Helmfile, at least with local
Helm charts you create yourself: I can define central, common labels to be
applied to all resources.</p>
<h1 id="secrets">Secrets</h1>
<p>One little story about Secrets I need to tell here: I got myself utterly
confused about how Secrets work. For some reason, I got it into my head for
several days that with Secrets being stored just in plain text (okay, base64
encoded), they were a security risk. Going from that, I also felt that things like
<a href="https://external-secrets.io/latest/">external-secrets</a> wouldn&rsquo;t add anything -
yes, it takes the secrets from e.g. Vault, but then they are again stored
unencrypted in the cluster.</p>
<p>But of course that&rsquo;s a misconception. Secrets cannot just randomly be accessed
by anything running in a Kubernetes cluster, which was my initial impression. They
require access via the Kubernetes API server, which can be controlled via
<a href="https://kubernetes.io/docs/reference/access-authn-authz/rbac/">RBAC</a>. So for
now at least, I decided to rely on Helmfile&rsquo;s <a href="https://github.com/helmfile/vals">Vals</a>
to extract the secrets from Vault at deployment time. This just looks simpler
than setting up e.g. external-secrets.
I also see an, albeit small, security advantage here, because I don&rsquo;t need to
configure anything with broad access to my Vault instance in the cluster. Instead,
I can rely on Vault&rsquo;s login mechanisms on my Command and Control host, which
uses time limited tokens and such.</p>
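<p>As an illustration of that RBAC control: limiting read access to a single
Secret takes only a small Role plus a binding. A sketch with made-up names:</p>
<pre tabindex="0"><code>apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-app-secret           # placeholder names throughout
  namespace: my-app
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    resourceNames: ["my-app-config"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-app-secret
  namespace: my-app
subjects:
  - kind: ServiceAccount
    name: my-app
    namespace: my-app
roleRef:
  kind: Role
  name: read-app-secret
  apiGroup: rbac.authorization.k8s.io
</code></pre>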
<h1 id="showing-cluster-resource-usage">Showing cluster resource usage</h1>
<p>All this time, I was also looking for some place where I could see how much capacity
I still had free in my experimental cluster. This is something Nomad has
baked into its web UI, but Kubernetes does not offer anything like it out of the
box.</p>
<p>I was finally pointed towards <a href="https://github.com/robscott/kube-capacity">kube-capacity</a>,
a kubectl plugin. It does exactly what I wanted, telling me how much free capacity
I still have left on the cluster and individual node level. The output looks
something like this at the moment:</p>
<pre tabindex="0"><code>kubectl resource-capacity
NODE     CPU REQUESTS   CPU LIMITS     MEMORY REQUESTS   MEMORY LIMITS
*        17600m (44%)   31100m (77%)   27551Mi (58%)     46408Mi (98%)
mehen    2150m (53%)    3000m (75%)    1524Mi (43%)      3472Mi (98%)
mesta    1950m (48%)    3000m (75%)    1384Mi (39%)      3132Mi (88%)
min      1950m (48%)    3000m (75%)    1384Mi (39%)      3132Mi (88%)
nakith   4450m (74%)    9300m (155%)   9660Mi (87%)      15008Mi (136%)
naunet   5350m (89%)    9900m (165%)   10472Mi (95%)     16032Mi (145%)
sait     950m (11%)     1200m (15%)    1619Mi (22%)      2560Mi (35%)
sehith   800m (10%)     1700m (21%)    1508Mi (20%)      3072Mi (42%)
</code></pre><p>This shows me that all of my current pods&rsquo; requests together use 44% of the
available CPU capacity and 58% of the memory capacity. It has already been
pretty useful, for example for figuring out why I wasn&rsquo;t able to deploy all of
my Ceph Rook pods.</p>
<h1 id="thanks-to-the-homelabbers-in-the-fediverse">Thanks to the Homelabbers in the Fediverse</h1>
<p>Kubernetes is a pretty complex topic, as I have now found out. There are a lot
of pitfalls, and great tools to avoid them which I might never have found just
from Googling. The Fediverse homelabbing community has been extremely helpful
in pointing me in the right direction multiple times, e.g. by recommending
Helmfile and kube-capacity to me.</p>
<p>Thanks everyone! &#x1f642;</p>
]]></content:encoded>
    </item>
    <item>
      <title>KubeExp: Setting up the cluster</title>
      <link>https://blog.mei-home.net/posts/kubernetes-cluster-setup/</link>
      <pubDate>Sat, 07 Oct 2023 00:05:34 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/kubernetes-cluster-setup/</guid>
      <description>Setting up a bare-metal cluster with Cilium and kubeadm</description>
      <content:encoded><![CDATA[<p>After setting up my lab environment in the <a href="https://blog.mei-home.net/posts/kubernetes-lab-setup/">previous article</a>,
I&rsquo;ve now also set up the Kubernetes cluster itself, with <a href="https://kubernetes.io/docs/reference/setup-tools/kubeadm/">kubeadm</a>
as the setup tool and <a href="https://cilium.io/">Cilium</a> as the CNI plugin for
networking.</p>
<p>Here, I will describe why I chose the tools I did, and how I initialized the
cluster, as well as how to remove the cluster when necessary.</p>
<h1 id="tools-choice">Tools choice</h1>
<p>Before setting up a cluster, several choices need to be made. The first one in
the case of Kubernetes is which distribution to use.</p>
<p>The first option, and the one I chose, is &ldquo;vanilla&rdquo; k8s. This is the default distribution,
with full support for all the related standards and functionality.</p>
<p>Another well-liked one is <a href="https://k3s.io/">k3s</a>, which bills itself as a
lightweight distribution. Its most distinguishing feature seems to be that
its control plane ships as a single binary, instead of an entire
set of them, as in the case of vanilla k8s.
Also in contrast to k8s, it uses a simple SQLite database as a storage backend
for cluster data, instead of a full <code>etcd</code> cluster.
It also falls into the &ldquo;opinionated&rdquo; part of the tech spectrum. Instead of
making you choose things like the CNI and CRI yourself, it already comes
with some options out of the box: Flannel is pre-chosen as the CNI plugin,
and e.g. Traefik is already set up as an <a href="https://kubernetes.io/docs/concepts/services-networking/ingress-controllers/">Ingress Controller</a>.</p>
<p>If you want to go even further from vanilla, there are also things like
<a href="https://www.talos.dev/">Talos Linux</a>. It&rsquo;s an entire Linux distro made with
only one goal: running a Kubernetes cluster. It doesn&rsquo;t even allow you to
SSH into it.</p>
<p>For now, I will stay with vanilla k8s, which I will install with <code>kubeadm</code>, simply
because I like making the &ldquo;vanilla&rdquo; experience my first contact with a piece of tech.
I also prefer being forced to make my own decisions on tools, so that I am forced
to inform myself about the alternatives. Once I&rsquo;ve completed my current experimentation,
I will likely at least take another look at Talos Linux. Its premise sounds quite
interesting, especially with the declarative config files, but the &ldquo;no SSH&rdquo; part
honestly feels somewhat weird to me.</p>
<p>The next choice to be made is the <a href="https://kubernetes.io/docs/concepts/architecture/cri/">CRI</a>,
the container runtime. The only thing I knew going into this was that I did not
want to go with <em>Docker</em>: too many bad experiences with memory leaks and other
shenanigans with their daemon. After some research, my choice fell on
<a href="https://cri-o.io/">CRI-O</a>. To be honest, mostly because it bills itself as
a container engine focused on use with Kubernetes.</p>
<p>Next is the <a href="https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/">CNI plugin</a>.
This is the piece of the Kubernetes stack which controls networking, most
importantly inter-Pod networking. This was the choice I struggled with the most.
The websites of all of them are chock-full of buzzwords. eBPF! It&rsquo;s better than
sliced bread! &#x1f612; In the end, my decision was between <a href="https://cilium.io/">Cilium</a>
and <a href="https://www.tigera.io/project-calico/">Calico</a>. The one thing I was really
interested in and definitely wanted was <a href="https://kubernetes.io/docs/concepts/services-networking/network-policies/">Network Policies</a>. Those define rules for inter-Pod connectivity,
letting me control which pods can talk with each other. I like having this
for the sake of security, so that e.g. only the apps which actually need a DB
can talk to the Postgres pod.
In my current HashiCorp Nomad based cluster, I&rsquo;ve got something similar using
Consul&rsquo;s service mesh.
One more thing I find pretty nice is that both Calico and Cilium support
encryption. This was another reason for why I started using Consul: It provides
me with encrypted network traffic, without me having to setup TLS certs for
each individual service.
In the end, even after reading through most of the docs for both Calico and Cilium,
I didn&rsquo;t know which one to choose. So I did the obvious thing:</p>
<figure>
    <img loading="lazy" src="dice.png"
         alt="A picture of a 20 sided dice with the number 16 on the upper face."/> <figcaption>
            <p>When in doubt, just ask Principal Lead Architect Dice for their opinion.</p>
        </figcaption>
</figure>

<p>And that&rsquo;s how I came to use Cilium as the CNI plugin in my cluster. &#x1f605;</p>
<p>Without further ado, let&rsquo;s conjure ourselves a Kubernetes cluster. &#x1f913;</p>
<h1 id="preparing-the-machines">Preparing the machines</h1>
<p>Before we can actually call <code>kubeadm init</code>, we need to install the tools on the
machines and do some additional config. For most of the setup, I followed the
<a href="https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/">official Kubernetes kubeadm docs</a>.</p>
<p>Before installing the tools, a couple of config options need to be set on the
machines, defined <a href="https://kubernetes.io/docs/setup/production-environment/container-runtimes/#forwarding-ipv4-and-letting-iptables-see-bridged-traffic">here</a>.</p>
<p>I configured the options using Ansible:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">load overlay kernel module</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kernel</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">community.general.modprobe</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">overlay</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">persistent</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">load br_netfilter kernel module</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kernel</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">community.general.modprobe</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">br_netfilter</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">persistent</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">enable ipv4 netfilter on bridge interfaces</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kernel</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.posix.sysctl</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">net.bridge.bridge-nf-call-iptables</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">enable ipv6 netfilter on bridge interfaces</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kernel</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.posix.sysctl</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">net.bridge.bridge-nf-call-ip6tables</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">enable ipv4 forwarding</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kernel</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.posix.sysctl</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">net.ipv4.ip_forward</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span></code></pre></div><p>This takes care of the pre-config. But if you&rsquo;re using the Ubuntu cloud image,
for example with LXD VMs, you also need to switch to a different kernel so that
Cilium can later be used as the CNI plugin.</p>
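<p>Before moving on: to double-check that the modules are loaded and the sysctls
actually took effect, something like this can be run on each node:</p>
<pre tabindex="0"><code># Both modules should show up in the module list.
lsmod | grep -E &#39;overlay|br_netfilter&#39;
# All three values should be reported as 1.
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward
</code></pre>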
<h2 id="fun-with-ubuntus-cloud-images">Fun with Ubuntu&rsquo;s cloud images</h2>
<p>When I started up the cluster and went to install Cilium, its containers went
into a crash loop. The cause: the Ubuntu cloud image kernel, which is specialized
for use with e.g. KVM, does not have all kernel modules available. In my case,
the installed kernel was <code>linux-image-kvm</code>. The Cilium docs have a <a href="https://docs.cilium.io/en/stable/operations/system_requirements/#base-requirements">page detailing
the required kernel config</a>.
I initially thought those requirements would be fulfilled by any reasonably
current kernel. But I was wrong: the <code>-kvm</code> variant of Ubuntu&rsquo;s kernel
is lacking some of the required options.</p>
<p>To fix this, I needed to switch to the <code>-generic</code> kernel. Naively, I
again thought: How difficult could it possibly be? And I just ran
<code>apt remove linux-image-5.15.0-1039-kvm</code>. That did not have the hoped-for effect.
Instead, it tried to remove that image and then install <code>linux-image-unsigned-5.15.0-1039-kvm</code>,
which would not have been too useful. Finally, I followed <a href="https://discuss.linuxcontainers.org/t/usb-passthrough-on-ubuntu-based-vms/12170">this tutorial</a>, but decided to install the <code>-generic</code>
kernel instead of the <code>-virtual</code> one.</p>
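<p>For reference, the gist of that approach boils down to something like the
following. The versioned package names are just examples; they depend on the
kernel actually installed on your machine:</p>
<pre tabindex="0"><code># Install the generic kernel first, so the machine stays bootable.
apt install linux-image-generic
# Then remove the kvm-specific kernel packages.
apt remove linux-image-kvm linux-image-5.15.0-1039-kvm linux-modules-5.15.0-1039-kvm
update-grub
reboot
</code></pre>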
<h2 id="installing-and-setting-up-cri-o">Installing and setting up CRI-O</h2>
<p>As noted above, <a href="https://cri-o.io/">cri-o</a> is my container runtime of choice.
To install it, several additional APT repos need to be added. I will only show
the Ansible setup for one of them here. First, we need the public keys for the
repos, which I normally just download and then store in my Homelab repo. Before
storing the keys, you should pipe them through <code>gpg --dearmor</code>.</p>
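<p>For completeness, fetching and dearmoring such a key looks roughly like this,
assuming the usual <code>Release.key</code> location of openSUSE Build Service
repositories like the one used below:</p>
<pre tabindex="0"><code>curl -fsSL https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/1.27/xUbuntu_22.04/Release.key \
  | gpg --dearmor &gt; libcontainers-crio-keyring.gpg
</code></pre>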
<p>Setting up the keys then simply means copying them into the right directory,
where apt can find them:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">add libcontainers cri-o repo key</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">crio</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">copy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">src</span>: <span style="color:#ae81ff">libcontainers-crio-keyring.gpg</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">dest</span>: <span style="color:#ae81ff">/usr/share/keyrings/libcontainers-crio-keyring.gpg</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mode</span>: <span style="color:#ae81ff">0644</span>
</span></span></code></pre></div><p>Once that&rsquo;s done, we can set up the actual repo:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">add libcontainers cri-o ubuntu repo</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">crio</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">apt_repository</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">repo</span>: &gt;<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      deb [signed-by=/usr/share/keyrings/libcontainers-crio-keyring.gpg]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/1.27/xUbuntu_22.04/ /</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">filename</span>: <span style="color:#ae81ff">libcontainers-crio</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">register</span>: <span style="color:#ae81ff">libcontainers_ubuntu_repo</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">ansible_facts[&#39;distribution&#39;] == &#39;Ubuntu&#39;</span>
</span></span></code></pre></div><p>As you can see, I&rsquo;m only adding the specific Ubuntu repo if the distribution
actually is Ubuntu.
Please note that I&rsquo;ve only shown one additional repo, but there&rsquo;s another one
which needs to be added in a similar way.</p>
<p>Finally, we can install cri-o:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">create cri-o config dir</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">crio</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.file</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/etc/crio</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">directory</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mode</span>: <span style="color:#e6db74">&#39;755&#39;</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">install cri-o config file</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">crio</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.copy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">dest</span>: <span style="color:#ae81ff">/etc/crio/crio.conf</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">src</span>: <span style="color:#ae81ff">crio.conf</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mode</span>: <span style="color:#e6db74">&#39;644&#39;</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">install cri-o</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">crio</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.apt</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">cri-o</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">cri-o-runc</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">install_recommends</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">autostart cri-o</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">crio</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.systemd_service</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">crio</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">started</span>
</span></span></code></pre></div><p>One weird thing which tripped me up in the beginning was that cri-o needs <code>runc</code>,
but the <code>cri-o</code> package doesn&rsquo;t declare a dependency on <code>cri-o-runc</code>,
so it has to be installed explicitly.</p>
<p>The config file I&rsquo;m using for cri-o is pretty simple, as <a href="https://github.com/cri-o/cri-o/blob/main/docs/crio.conf.5.md">the defaults</a> were mostly fine for me.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-toml" data-lang="toml"><span style="display:flex;"><span>[<span style="color:#a6e22e">crio</span>.<span style="color:#a6e22e">runtime</span>]
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">cgroup_manager</span> = <span style="color:#e6db74">&#34;systemd&#34;</span>
</span></span></code></pre></div><p>So at least for now, I just make sure that <code>systemd</code> is set as the cgroup
manager, as the container runtime&rsquo;s and the kubelet&rsquo;s cgroup drivers need to match.</p>
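<p>With the service up, a quick sanity check can be done directly on a node. Note
that <code>crictl</code> is not installed by the playbook above; it comes from the
separate <code>cri-tools</code> package:</p>
<pre tabindex="0"><code># Should print &#34;active&#34;.
systemctl is-active crio
# Should report both the client and the cri-o runtime version.
crictl --runtime-endpoint unix:///var/run/crio/crio.sock version
</code></pre>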
<h2 id="installing-the-kubernetes-tools">Installing the Kubernetes tools</h2>
<p>Finally, we need to install a couple of Kubernetes tools. First, similar to above,
we need to add the Kubernetes APT repo at <code>pkgs.k8s.io</code>. Then we can install
the three necessary Kubernetes tools:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">install kubernetes tools</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.apt</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">kubelet</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">kubeadm</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">kubectl</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">install_recommends</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><p>In addition to installing the tools, they should also be pinned to their
current versions, as the Kubernetes tools should not be upgraded as part of a
routine system package update; cluster upgrades follow their own procedure:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pin kubelet version</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dpkg_selections</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kubelet</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">selection</span>: <span style="color:#ae81ff">hold</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pin kubeadm version</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dpkg_selections</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kubeadm</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">selection</span>: <span style="color:#ae81ff">hold</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pin kubectl version</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dpkg_selections</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kubectl</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">selection</span>: <span style="color:#ae81ff">hold</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">autostart kubelet</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubelet</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.systemd_service</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kubelet</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span></code></pre></div><p>At the end, I&rsquo;m also making sure that the kubelet is auto-started.</p>
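<p>The <code>dpkg_selections</code> tasks above are the Ansible equivalent of the
<code>apt-mark hold</code> command the official docs suggest. Done by hand, it would
be:</p>
<pre tabindex="0"><code>apt-mark hold kubelet kubeadm kubectl
</code></pre>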
<h2 id="setting-up-kube-vip-as-a-load-balancer-for-the-control-plane">Setting up kube-vip as a load balancer for the control plane</h2>
<p>If you want an HA control plane with Kubernetes, you need a load balancer to
distribute requests for the Kubernetes API endpoint across the three control
plane instances.</p>
<p>Luckily, you don&rsquo;t need to migrate your Homelab into a cloud to make this work.
Through some helpful comments on the Fediverse, I was pointed towards the
<a href="https://kube-vip.io/">kube-vip</a> app. It provides a virtual IP for the control
plane, notably the Kubernetes API server. In my setup, I ran it as a static
pod, as I liked the idea of tying it to the kubelet, instead of running it
standalone.</p>
<p>To do so, I put the following static pod config file into <code>/etc/kubernetes/manifests/kube-vip.yaml</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Pod</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">creationTimestamp</span>: <span style="color:#66d9ef">null</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kube-vip</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">kube-system</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">manager</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">vip_arp</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bgp_enable</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;false&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">port</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;6443&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">vip_cidr</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;32&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cp_enable</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">svc_enable</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;false&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cp_namespace</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">kube-system</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">vip_ddns</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;false&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">vip_leaderelection</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">vip_leasename</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">plndr-cp-lock</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">vip_leaseduration</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;5&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">vip_renewdeadline</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;3&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">vip_retryperiod</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;1&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">address</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">10.12.0.100</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">prometheus_server</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: :<span style="color:#ae81ff">2112</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">ghcr.io/kube-vip/kube-vip:v0.6.2</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">imagePullPolicy</span>: <span style="color:#ae81ff">Always</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kube-vip</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>: {}
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">securityContext</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">capabilities</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">add</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">NET_ADMIN</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">NET_RAW</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/etc/kubernetes/admin.conf</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kubeconfig</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">hostAliases</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">hostnames</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">ip</span>: <span style="color:#ae81ff">127.0.0.1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">hostNetwork</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">hostPath</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/etc/kubernetes/admin.conf</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kubeconfig</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">status</span>: {}
</span></span></code></pre></div><p>Most of these options should be relatively clear. More information can be found
in the <a href="https://kube-vip.io/docs/installation/flags/">docs</a>.
Important to note are the <code>vip_arp</code> and <code>bgp_enable</code> options.
These configure how the virtual address is advertised. Because I&rsquo;m definitely
not a networking wizard, I went with the simpler ARP-based approach.</p>
<p>Also worth noting is that I disabled the <code>svc_enable</code> option, which can be
switched on to let kube-vip act as a <code>LoadBalancer</code> implementation for
Kubernetes Services of that type. To reduce initial complexity, I will be working
with ClusterIP and NodePort services for now and revisit LoadBalancer type
services later, including things like MetalLB.</p>
<p>The final and most important config is <code>address</code>. It determines which virtual IP
address kube-vip will advertise. In my case, I also added a DNS name for that IP
into my authoritative DNS server for easier access.</p>
<p>Kube-vip needs to run as a static pod, so it can
work (more or less) outside Kubernetes. In my setup, this is necessary because
I will point <code>kubeadm</code> towards the virtual IP during the setup of the actual
cluster, so kube-vip needs to work before the cluster is up and
running.</p>
<h1 id="initializing-the-cluster">Initializing the cluster</h1>
<p>All preparations finally complete, it&rsquo;s time to get ourselves a Kubernetes
cluster. As I&rsquo;ve noted above, I&rsquo;m using vanilla k8s with <strong>kubeadm</strong>. There
are two ways to initialize the cluster and add additional nodes: via command
line flags, or via a kubeadm init config file.
I will be going with the config file approach, to be able to put the
initialization under version control.</p>
<p>Note that command line flags and config files cannot be mixed at
the moment.</p>
<p>The documentation for the init config file can be found <a href="https://kubernetes.io/docs/reference/config-api/kubeadm-config.v1beta3/#kubeadm-k8s-io-v1beta3-InitConfiguration">here</a>.</p>
<p>There is no default location for the config file, so I just put it alongside
all of the other Kubernetes configs under <code>/etc/kubernetes</code>.</p>
<p>And here is my init config:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">kubeadm.k8s.io/v1beta3</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">InitConfiguration</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">skipPhases</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#e6db74">&#34;addon/kube-proxy&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">nodeRegistration</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">kubeletExtraArgs</span>:
</span></span><span style="display:flex;"><span>{<span style="color:#ae81ff">% if &#39;kube_ceph&#39; in group_names %}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">node-labels</span>: <span style="color:#e6db74">&#34;homelab.role=ceph&#34;</span>
</span></span><span style="display:flex;"><span>{<span style="color:#ae81ff">% elif &#39;kube_controllers&#39; in group_names %}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">node-labels</span>: <span style="color:#e6db74">&#34;homelab.role=controller&#34;</span>
</span></span><span style="display:flex;"><span>{<span style="color:#ae81ff">% elif &#39;kube_workers&#39; in group_names %}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">node-labels</span>: <span style="color:#e6db74">&#34;homelab.role=worker&#34;</span>
</span></span><span style="display:flex;"><span>{<span style="color:#ae81ff">% endif %}</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">kubeadm.k8s.io/v1beta3</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ClusterConfiguration</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">networking</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">podSubnet</span>: <span style="color:#e6db74">&#34;10.20.0.0/16&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serviceSubnet</span>: <span style="color:#e6db74">&#34;10.21.0.0/16&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">controlPlaneEndpoint</span>: <span style="color:#e6db74">&#34;api.k8s.example.com:6443&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiServer</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">timeoutForControlPlane</span>: <span style="color:#ae81ff">4m0s</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">extraArgs</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">authorization-mode</span>: <span style="color:#e6db74">&#34;Node,RBAC&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">controllerManager</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">extraArgs</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">allocate-node-cidrs</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">clusterName</span>: <span style="color:#e6db74">&#34;exp-cluster&#34;</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">kubelet.config.k8s.io/v1beta1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">KubeletConfiguration</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">cgroupDriver</span>: <span style="color:#e6db74">&#34;systemd&#34;</span>
</span></span></code></pre></div><p>First, you can see the defaults used without any flags or config files by running
<code>kubeadm config print init-defaults</code>.</p>
<p>I&rsquo;m actually not diverging very much from the defaults here. As you can see from
the <code>if</code>s in the <code>kubeletExtraArgs</code>, I&rsquo;m using Ansible&rsquo;s templating engine
to assign roles to nodes via Kubernetes node labels.
Furthermore, I&rsquo;m disabling the <code>kube-proxy</code> initialization phase. This is
because I will be using Cilium as my CNI plugin,
and it can already provide the proxy functionality, which is mostly concerned
with Kubernetes Service handling. So I don&rsquo;t want kubeadm to install <code>kube-proxy</code> on
the nodes.</p>
<p>For the cluster itself, I&rsquo;m also setting the service and pod CIDRs.</p>
<p><strong>Important note:</strong> I&rsquo;m using example values here, not my actual configs. If
you see any weird inconsistencies between IP addresses or DNS names, please yell
at me on <a href="https://social.mei-home.net/@mmeier">Mastodon</a>. &#x1f609;</p>
<p>The <code>allocate-node-cidrs</code> option for the controllerManager is recommended by
Cilium.</p>
<p>Last but not least, the Kubernetes docs recommend setting the cgroupDriver
explicitly, which I do in the <code>KubeletConfiguration</code>.</p>
<p>After this file has been defined, we can run the command to init the cluster on
the first control plane node:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>kubeadm init --upload-certs --config /etc/kubernetes/kube-init-config.yaml
</span></span></code></pre></div><p>Noteworthy here is the <code>--upload-certs</code> flag. Without it, the certs
generated for the cluster will not be stored inside the cluster. As a consequence,
they won&rsquo;t be usable when adding more control plane nodes, and you would have
to run another command first to generate and upload a fresh set. Note that the
uploaded certs are short-lived anyway: they expire after two hours by default,
while the bootstrap token lasts 24 hours. So if you plan to add the remaining
control plane nodes only past that point, you can skip the flag for now and
re-upload the certs when needed, as shown further down.</p>
<p>After this first node has been initialized, the next step is to copy the admin
kubeconfig to your workstation for use with <code>kubectl</code>. You can find it under
<code>/etc/kubernetes/admin.conf</code>.
To use this file, copy it to <code>~/.kube/config</code>.
Now you should be able to run your first command against your newly inaugurated
Kubernetes cluster, e.g. <code>kubectl get all -n kube-system</code>.</p>
<p><strong>Security note:</strong> This file contains the private key which gives you full
access to the new Kubernetes cluster. Secure it appropriately. I will probably
do another post once I&rsquo;ve figured out what to do with it.</p>
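<p>Fetching the file could look like this; the <code>k8s-controller1</code> hostname
is made up here, and root SSH access to the node is assumed:</p>
<pre tabindex="0"><code>mkdir -p ~/.kube
scp root@k8s-controller1:/etc/kubernetes/admin.conf ~/.kube/config
kubectl get all -n kube-system
</code></pre>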
<p>You will see a number of pods, mostly the Kubernetes control plane elements,
namely <code>etcd</code>, <code>kube-apiserver</code>, <code>kube-controller-manager</code> and <code>kube-scheduler</code>.
In addition, there should be a <code>kube-vip</code> instance. If the <code>kubectl get</code> command
fails, first check whether <code>kube-vip</code> starts up correctly.
The kubectl config file we copied from the initial cluster node to the
workstation contains the address entered under the <code>controlPlaneEndpoint</code> in
the kubeadm init config above. In my setup, that&rsquo;s a DNS entry which points to
the virtual IP managed by kube-vip.</p>
<p>You will also see that the <code>coredns</code> pods are currently still in <code>Pending</code> state.
That&rsquo;s because CoreDNS, the default Kubernetes internal DNS server, only starts
up once a Container Networking Interface plugin has been installed. In our case
that&rsquo;s Cilium.</p>
<h2 id="installing-cilium">Installing Cilium</h2>
<p>As I&rsquo;ve <a href="#preparing-the-machines">detailed the necessary preparations above</a>, the only thing left to
install Cilium is to run the install command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>cilium install --set ipam.mode<span style="color:#f92672">=</span>cluster-pool --set ipam.operator.clusterPoolIPv4PodCIDRList<span style="color:#f92672">=</span>10.20.0.0/16 --set kubeProxyReplacement<span style="color:#f92672">=</span>true --version 1.14.1 --set encryption.enabled<span style="color:#f92672">=</span>true --set encryption.type<span style="color:#f92672">=</span>wireguard
</span></span></code></pre></div><p>This command should be run on your workstation. The <code>cilium</code> CLI will automatically
use the kubectl config file in <code>~/.kube/config</code> to contact the cluster and
install itself.</p>
<p>The <code>clusterPoolIPv4PodCIDRList</code> is important here: while we already set
the pod CIDR in the kubeadm init config file above, Cilium does not seem
to have access to that value and would instead use its internal default.
In addition, I&rsquo;m telling Cilium here that it should act as a replacement for
<code>kube-proxy</code>. Finally, I&rsquo;m enabling pod-to-pod encryption with WireGuard. This
way, I don&rsquo;t have to take care of encrypting traffic between pods myself, e.g.
by configuring all my services to use TLS.</p>
<p>If the install command fails and the Cilium pods do not come up, check to make
sure that the preconditions I noted above are all fulfilled.
You should now see a single <code>cilium-operator</code> and a <code>cilium</code> pod in Running
state when you execute <code>kubectl get pods -n kube-system</code>. Furthermore, the
CoreDNS pod should now also be in the Running state.</p>
<p>You can check whether everything went alright by executing <code>cilium status</code> on
your workstation.</p>
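<p>For a more thorough check, the Cilium CLI also ships a connectivity test. Be
aware that it deploys a number of test workloads into the cluster:</p>
<pre tabindex="0"><code># Waits until Cilium reports a healthy state.
cilium status --wait
cilium connectivity test
</code></pre>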
<h1 id="joining-remaining-nodes-to-the-cluster">Joining remaining nodes to the cluster</h1>
<p>For joining additional nodes, I went with a similar approach as for the cluster
init, using a <a href="https://kubernetes.io/docs/reference/config-api/kubeadm-config.v1beta3/#kubeadm-k8s-io-v1beta3-JoinConfiguration">join configuration file</a>.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">kubeadm.k8s.io/v1beta3</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">JoinConfiguration</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">nodeRegistration</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">kubeletExtraArgs</span>:
</span></span><span style="display:flex;"><span>{<span style="color:#ae81ff">% if &#39;kube_ceph&#39; in group_names %}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">node-labels</span>: <span style="color:#e6db74">&#34;homelab.role=ceph&#34;</span>
</span></span><span style="display:flex;"><span>{<span style="color:#ae81ff">% elif &#39;kube_controllers&#39; in group_names %}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">node-labels</span>: <span style="color:#e6db74">&#34;homelab.role=controller&#34;</span>
</span></span><span style="display:flex;"><span>{<span style="color:#ae81ff">% elif &#39;kube_workers&#39; in group_names %}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">node-labels</span>: <span style="color:#e6db74">&#34;homelab.role=worker&#34;</span>
</span></span><span style="display:flex;"><span>{<span style="color:#ae81ff">% endif %}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">discovery</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">bootstrapToken</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">token</span>: <span style="color:#ae81ff">Token here</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">apiServerEndpoint</span>: <span style="color:#ae81ff">api.k8s.example.com:6443</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">caCertHashes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#34;Cert Hash here&#34;</span>
</span></span><span style="display:flex;"><span>{<span style="color:#ae81ff">% if &#39;kube_controllers&#39; in group_names %}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">controlPlane</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">certificateKey</span>: <span style="color:#e6db74">&#34;Cert key here&#34;</span>
</span></span><span style="display:flex;"><span>{<span style="color:#ae81ff">% endif %}</span>
</span></span></code></pre></div><p>This file is a bit more unwieldy than the init config, because it also needs
to contain some secrets. This wouldn&rsquo;t be a problem if those secrets were
permanent; I could just store them in Vault. But they are pretty short-lived,
so storing them and templating them into the file during Ansible deployments
doesn&rsquo;t really work. So I just paste them into the file manually, without
committing the result to git.</p>
<p>When you run the <code>kubeadm init</code> command, the output will look something like
this, provided you supply the <code>--upload-certs</code> flag:</p>
<pre tabindex="0"><code>You can now join any number of control-plane node by running the following command on each as a root:
    kubeadm join 192.168.0.200:6443 --token 9vr73a.a8uxyaju799qwdjv --discovery-token-ca-cert-hash sha256:7c2e69131a36ae2a042a339b33381c6d0d43887e2de83720eff5359e26aec866 --control-plane --certificate-key f8902e114ef118304e561c3ecd4d0b543adc226b7a07f675f56564185ffe0c07
</code></pre><p>The important part you don&rsquo;t get without the <code>--upload-certs</code> flag is the
<code>--certificate-key</code>. This is required for new control plane nodes.
The values in this message fit into the <code>JoinConfiguration</code> as follows:</p>
<ul>
<li><strong>discovery.bootstrapToken.token:</strong> <code>--token</code> value</li>
<li><strong>discovery.bootstrapToken.caCertHashes:</strong> <code>--discovery-token-ca-cert-hash</code> value</li>
<li><strong>controlPlane.certificateKey:</strong> <code>--certificate-key</code> value</li>
</ul>
<p>A fully rendered version of the <code>JoinConfiguration</code> file above would look like
this, using the values from the <code>kubeadm init</code> example output:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">kubeadm.k8s.io/v1beta3</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">JoinConfiguration</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">nodeRegistration</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">kubeletExtraArgs</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">node-labels</span>: <span style="color:#e6db74">&#34;homelab.role=controller&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">discovery</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">bootstrapToken</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">token</span>: <span style="color:#e6db74">&#34;9vr73a.a8uxyaju799qwdjv&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">apiServerEndpoint</span>: <span style="color:#ae81ff">192.168.0.200</span>:<span style="color:#ae81ff">6443</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">caCertHashes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#34;sha256:7c2e69131a36ae2a042a339b33381c6d0d43887e2de83720eff5359e26aec866&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">controlPlane</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">certificateKey</span>: <span style="color:#e6db74">&#34;f8902e114ef118304e561c3ecd4d0b543adc226b7a07f675f56564185ffe0c07&#34;</span>
</span></span></code></pre></div><p>If some time has passed since running the <code>kubeadm init</code> command, the bootstrap
token and cert will have expired. You can recreate them by running the following
command on a control plane node which has already been initialized:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>kubeadm init phase upload-certs --upload-certs
</span></span></code></pre></div><p>The output created will be similar to this:</p>
<pre tabindex="0"><code>[upload-certs] Storing the certificates in Secret &#34;kubeadm-certs&#34; in the &#34;kube-system&#34; Namespace
[upload-certs] Using certificate key:
supersecretcertkey
</code></pre><p>The last line is the new value for <code>certificateKey</code>. The next step is generating
a fresh bootstrap token, as that is invalidated after 24 hours:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>kubeadm token create --certificate-key supersecretcertkey --print-join-command
</span></span></code></pre></div><p>This will create a fresh join command you can use to join additional control
plane nodes to the cluster, or whose values you can enter into your <code>JoinConfiguration</code> file.</p>
<p>Finally, additional worker nodes can be joined in a similar manner. Simply remove
the following lines from the <code>JoinConfiguration</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">controlPlane</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">certificateKey</span>: <span style="color:#e6db74">&#34;Cert key here&#34;</span>
</span></span></code></pre></div><h1 id="deleting-a-cluster">Deleting a cluster</h1>
<p>As I had to kill the cluster multiple times but did not want to completely
reinstall the nodes every time, I also researched how to remove a cluster.
The steps are as follows:</p>
<ul>
<li>First remove the CNI plugin, with <code>cilium uninstall</code> in my case</li>
<li>Starting with the worker nodes, execute the following commands on each node:
<ol>
<li><code>kubeadm reset</code></li>
<li><code>rm -fr /etc/cni</code></li>
<li>Reboot the machine (this is for undoing the networking changes of the CNI plugin)</li>
</ol>
</li>
</ul>
<p>It is important to note the order here: always start with the worker nodes before
resetting the control plane nodes.</p>
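<p>Condensed into commands, and assuming Cilium as the CNI plugin, the teardown
looks like this:</p>
<pre tabindex="0"><code># Once, from the workstation:
cilium uninstall
# Then on each node, worker nodes first, control plane nodes last:
kubeadm reset
rm -fr /etc/cni
reboot
</code></pre>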
<h1 id="final-thoughts">Final thoughts</h1>
<p>First of all: Yay, Kubernetes Cluster! &#x1f973;</p>
<p>This was a pretty vexing process. The research phase, before I set up a cluster
for the first time, was considerably longer than for my current Nomad/Consul/Vault
cluster. And I feel that that&rsquo;s mostly due to the differences in the documentation.
HashiCorp&rsquo;s docs, especially their tutorials, are top notch for all three tools.</p>
<p>Sure, if you follow the instructions in the docs for Kubernetes and Cilium, you
will relatively reliably end up with a working cluster. But it just feels like
there are a lot more moving parts. And there are some decisions you need to make
up front, like choosing a CNI plugin and a container engine.</p>
<p>Don&rsquo;t misunderstand me:
having that choice is great. As I mentioned above, I&rsquo;m a fan of apps that don&rsquo;t
have opinions on everything, so I can make choices for myself.
But in HashiCorp&rsquo;s Nomad, I can do that too. I even have greater choice there,
because I can decide per workload which container engine and which networking
plugin I want to use.</p>
<p>On the bright side, at least for now I have not seen anything I would consider
a show stopper for my migration to Kubernetes. As this article was a bit longer
in the making, I&rsquo;ve just finished setting up Traefik as ingress a couple of days
ago, and I&rsquo;m now working on setting up a Rook Ceph cluster. Let&rsquo;s see how this
continues. &#x1f642;</p>
<p>Last but not least, a comment mostly to myself: Write setup articles closer
to when the actual setup happens. I&rsquo;m writing a lot of this over a month after I
issued the (currently &#x1f607;) final <code>kubeadm init</code>. I&rsquo;d made some notes on
important things, but I had not thought of copying the outputs of the
<code>kubeadm init</code> or <code>kubeadm join</code> commands to show what they&rsquo;re supposed to look
like. I also did not think of taking a couple of notes on the initial output of
some <code>kubectl get</code> commands during the setup phase to show what to expect,
which I think would have been nice.</p>
<p>The next article in the series will be about day 1 operations: how
I plan to handle Kubernetes manifests for actual workloads in my setup.</p>
]]></content:encoded>
    </item>
    <item>
      <title>KubeExp: Putting the &#39;lab&#39; back in &#39;Homelab&#39;</title>
      <link>https://blog.mei-home.net/posts/kubernetes-lab-setup/</link>
      <pubDate>Sun, 27 Aug 2023 16:50:42 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/kubernetes-lab-setup/</guid>
      <description>How I set up my lab environment for my Kubernetes experiment</description>
      <content:encoded><![CDATA[<p>So, as I mentioned in my <a href="https://blog.mei-home.net/posts/hashipocalypse/">last article</a>,
I want to give Kubernetes another try after HashiCorp&rsquo;s recent license change.</p>
<p>This also gives me a chance to put the <em>lab</em> back in Home<em>lab</em>, as it has mostly
been a Home<em>prod</em> environment - not much experimentation going on there, just
slow, intentional incremental changes here and there. But my Homeprod env is not
really suited for housing a Kubernetes cluster. It mostly consists of Raspberry
Pis. Don&rsquo;t get me wrong, it is serving me well - but running two parallel clusters
with two different orchestrators on the same HW is probably not a good idea. &#x1f609;</p>
<p>So I decided to dig out my old Homeserver from the days when my Homelab was only
a single machine. It is an x86 machine with an Intel i7 10th gen CPU, two 500 GB
SSDs and 64 GB of RAM that had been gathering dust in storage since I decommissioned
it sometime in the spring of this year. It still had all its innards from when I
decommissioned it, and after a quick once-over to make sure I hadn&rsquo;t unplugged
any important cables, I was able to boot it right up.</p>
<p>The first thing to go was the old Arch Linux install, as it&rsquo;s wholly unsuited
to what I needed to be doing.
Instead, the machine got an Ubuntu install. Which was the first hurdle. Not the
actual install, but rather the setup of the damn install stick for it. Because I didn&rsquo;t
want to connect a monitor and keyboard, I wanted to do a headless install.</p>
<p>And the Ubuntu installer even sets up an SSH server. But the password to log in
is randomized and - you guessed it - only shown on-screen after bootup.
Which I find an interesting decision. Of course, that&rsquo;s done for security
reasons - having default installer passwords is frowned upon these days.</p>
<p>Next, I thought I could just unpack the ISO, set a password for the <code>installer</code>
user and repackage it. I even found a couple of guides to do so, but I was not
able to properly repackage the changed ISO to be bootable. So I finally gave up
and just connected a monitor and keyboard.</p>
<h2 id="baremetal-host-setup">Baremetal host setup</h2>
<p>For the actual cluster machines, I went with LXD, for the simple reason that I
had used it before I moved to a fleet of baremetal Raspberry Pis and had a good
experience with it.</p>
<p>For storage, a local 70 GB partition serves as the root disk. The remaining
400 GB of that SSD became an LVM volume group to serve as an LXD storage pool:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">create LXD storage volume group</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">storage</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">community.general.lvg</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">vg</span>: <span style="color:#ae81ff">vg-lxd</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">pvs</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#ae81ff">/dev/sdb3</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span></code></pre></div><p>One more important change needed for the VM host is the creation of a bridge
interface so that the VMs can communicate with the host and with my wider network.
A Linux bridge interface is rather similar to a software network switch.
I set it up with Ubuntu&rsquo;s netplan file. One very important point: You need to
disable DHCP on your main interface if it is part of the bridge, and instead
enable DHCP for the bridge interface. Otherwise, if you disable it for the
bridge but leave it enabled for the physical interface, the host still sends
DHCP DISCOVER requests, but it won&rsquo;t actually accept the DHCP OFFER sent back
by your networking infrastructure. It says so, plain and clear, in the <a href="https://netplan.readthedocs.io/en/stable/examples/#how-to-configure-network-bridges">netplan docs</a>
(why exactly does a Google search for &ldquo;netplan bridge&rdquo; not show the netplan
docs on the very first page?!). So guess who had to pick the server up from its
corner and connect a keyboard and mouse again because he thought he was smarter
than the docs? &#x1f612;</p>
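<p>For illustration, here is a minimal sketch of what the relevant part of such a
netplan config can look like. The interface names <code>enp3s0</code> and <code>br0</code>
are placeholders I picked for this example - substitute your own:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml">network:
  version: 2
  ethernets:
    enp3s0:
      # No DHCP on the physical interface - it is part of the bridge.
      dhcp4: false
  bridges:
    br0:
      interfaces:
        - enp3s0
      # DHCP is enabled on the bridge instead.
      dhcp4: true
</code></pre></div>
<p>After a <code>netplan apply</code>, the host gets its address on <code>br0</code>, and
the VMs can attach their NICs to the same bridge.</p>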
<h2 id="the-vm-os">The VM OS</h2>
<p>All of the setup finally done, the next decision was which OS to use for the
VMs. At first, I was a bit fascinated with <a href="https://www.talos.dev/">Talos Linux</a>,
which bills itself as a Linux for Kubernetes. It follows the new &ldquo;immutable distro&rdquo;
paradigm, and I had not yet dipped my toes into that particular topic - so time
to make this a double experiment? Alas, no. It looks like
Talos isn&rsquo;t just a distro that is &ldquo;good for Kubernetes&rdquo;, but one that
also believes it knows better than I do. Namely, it disables SSH access. Completely.
You don&rsquo;t really need shell access, you know? In fact, it&rsquo;s bad for you.</p>
<p>Let&rsquo;s clear up one myth right away: This is certainly not for security reasons,
because they still have an API with which you can supposedly do everything. So
we have replaced a decades-old project, OpenSSH, which has had audits up the wazoo,
with a hip new API. Yeah. Sure. I definitely trust your API way more than OpenSSH&hellip;</p>
<p>Another argument I heard, and found more believable than security, is saving Ops
teams from themselves by removing the temptation to SSH into a machine to fix a
problem, instead of, say, going through the GitOps process, including code
reviews and everything. That one I buy a lot more readily.
Although I will say: I&rsquo;ve been working in ops for a while now, and I have been
very happy to have access to the actual machines for debugging purposes. Because
sometimes, you just need to attach strace to random processes.</p>
<p>Apart from that particular piece of opinionated design, it also has an admittedly
bigger problem when it comes to my goal of experimenting with Kubernetes: It
provides its own ability to set up a Kubernetes cluster, and automates a bit
too much, at least for my initial, experimental cluster. So I&rsquo;ve put it off for
now, and might set up another experiment once I&rsquo;ve become a bit more familiar
with Kubernetes.</p>
<p>On the positive side, it has support for Raspberry Pis, so at least that&rsquo;s
not a blocker.</p>
<p>I ended up going with what I already knew: Ubuntu, which I also run on all the
other machines in my Homelab.</p>
<h2 id="lxd-setup">LXD setup</h2>
<p>To set up the VMs, I decided to go with Terraform, because it allows me to store
the setup in config files instead of having a playbook with a series of LXD
commands. I am using the <a href="https://registry.terraform.io/providers/terraform-lxd/lxd/latest">terraform-lxd provider</a>.</p>
<p>To initialize the provider, I first had to introduce it to my Terraform main
config:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-Terraform" data-lang="Terraform"><span style="display:flex;"><span><span style="color:#a6e22e">terraform</span> {
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">required_providers</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">lxd</span> <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">source</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;terraform-lxd/lxd&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">version</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;~&gt; 1.10.1&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">data</span> <span style="color:#e6db74">&#34;vault_generic_secret&#34;</span> <span style="color:#e6db74">&#34;lxd-pw&#34;</span> {
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">path</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;secret/lxd-pw&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">provider</span> <span style="color:#e6db74">&#34;lxd&#34;</span> {
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">generate_client_certificates</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">accept_remote_certificate</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">lxd_remote</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">name</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;server-name-here&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">scheme</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">address</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;server-fqdn-here&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">default</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">password</span> <span style="color:#f92672">=</span> data.<span style="color:#a6e22e">vault_generic_secret</span>.<span style="color:#a6e22e">lxd</span><span style="color:#f92672">-</span><span style="color:#a6e22e">pw</span>.<span style="color:#a6e22e">data</span>[<span style="color:#e6db74">&#34;pw&#34;</span>]
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>To get at the password, I&rsquo;m using my Vault instance again, where I pushed the
secret with <code>vault kv put secret/lxd-pw pw=-</code>. This is a bit of an anti-pattern,
as it ends up storing the password in the Terraform state. But I&rsquo;ve come to
accept that sometimes, this happens. My state is pretty well secured. But keep
this in mind when following along - your Terraform state should be kept secure!</p>
<p>The next step is configuring the LVM-based LXD storage pool I mentioned above.
This is also done in Terraform:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-Terraform" data-lang="Terraform"><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;lxd_storage_pool&#34;</span> <span style="color:#e6db74">&#34;lvm-pool&#34;</span> {
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">remote</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;server-name-here&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">name</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;lvm-pool&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">driver</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;lvm&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">config</span> <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">source</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;vg-lxd&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;lvm.thinpool_name&#34;</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;LXDThinPool&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;lvm.vg_name&#34;</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;vg-lxd&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Next come a couple of profiles: one for my controller nodes, with 4 CPUs and
4 GB of RAM, roughly matching the 4 GB Raspberry Pi 4s which will ultimately run
my control plane; one for my Ceph nodes, with a bit more RAM; a base profile for
networking, which adds a NIC attached to the bridge interface created previously;
and another profile for VMs with a local root disk from the LVM pool.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-Terraform" data-lang="Terraform"><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;lxd_profile&#34;</span> <span style="color:#e6db74">&#34;profile-base&#34;</span> {
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">name</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;base&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">remote</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;server-name-here&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">config</span> <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;boot.autostart&#34;</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;cloud-init.vendor-data&#34;</span> <span style="color:#f92672">=</span> file(<span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span><span style="color:#a6e22e">path</span>.module<span style="color:#e6db74">}</span><span style="color:#e6db74">/lxd/vendor-data.yaml&#34;</span>)
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">device</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">name</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;network&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">type</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;nic&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">properties</span> <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">nictype</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;bridged&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">parent</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;your-bridge-interface-name&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;lxd_profile&#34;</span> <span style="color:#e6db74">&#34;profile-localdisk&#34;</span> {
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">name</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;localdisk&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">remote</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;server-name-here&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">device</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">name</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;root&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">type</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;disk&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">properties</span> <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">pool</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span><span style="color:#a6e22e">lxd_storage_pool</span>.<span style="color:#a6e22e">lvm</span><span style="color:#f92672">-</span><span style="color:#a6e22e">pool</span>.<span style="color:#a6e22e">name</span><span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">size</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;50GB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">path</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;lxd_profile&#34;</span> <span style="color:#e6db74">&#34;profile-controller&#34;</span> {
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">name</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;controller&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">remote</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;server-name-here&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">config</span> <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;limits.cpu&#34;</span> <span style="color:#f92672">=</span> <span style="color:#ae81ff">4</span>
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;limits.memory&#34;</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;4GB&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;lxd_profile&#34;</span> <span style="color:#e6db74">&#34;profile-ceph&#34;</span> {
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">name</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">remote</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;server-name-here&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">config</span> <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;limits.cpu&#34;</span> <span style="color:#f92672">=</span> <span style="color:#ae81ff">4</span>
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;limits.memory&#34;</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;8GB&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>With all of that created, I only needed the VMs themselves. But there was one
problem: In the rest of my (baremetal) Homelab, I&rsquo;m producing disk images with
HashiCorp&rsquo;s Packer, with my Ansible user and some other bits and pieces already
baked in. But now, I needed another way to bake in the Ansible user, as the goal
here is to learn Kubernetes - not LXD image creation. I didn&rsquo;t really want yet
another yak to shave.</p>
<h2 id="vm-images">VM images</h2>
<p>As noted above, I had already decided to go with Ubuntu as the base OS for the VMs.
The Ubuntu LXD images support <a href="https://cloudinit.readthedocs.io/en/latest/">cloud-init</a>,
and so does <a href="https://documentation.ubuntu.com/lxd/en/latest/cloud-init/">LXD</a> itself.</p>
<p>After some digging, I found that I could relatively easily create my Ansible
user and provide the SSH key for it. I could also adapt the sudoers files as
I needed to make it all work.</p>
<p>But there was one problem remaining: I want my Ansible user to require a password
for sudo. But I did not want that password in the Terraform state, let alone
written out in plain text in the Terraform config file. So what to do?
In the end, the only thing I could come up with was to set a temporary
password for my Ansible user and run a short bootstrapping playbook to change
it to the actual password afterwards. It does not feel very elegant, but it
keeps my user&rsquo;s sudo password out of the Terraform state and configs.</p>
<p>This can all be achieved with cloud-init. My <code>profile-base</code> LXD profile adds
the required cloud-config file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-Terraform" data-lang="Terraform"><span style="display:flex;"><span>  <span style="color:#a6e22e">config</span> <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;cloud-init.vendor-data&#34;</span> <span style="color:#f92672">=</span> file(<span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span><span style="color:#a6e22e">path</span>.module<span style="color:#e6db74">}</span><span style="color:#e6db74">/lxd/vendor-data.yaml&#34;</span>)
</span></span><span style="display:flex;"><span>  }
</span></span></code></pre></div><p>LXD&rsquo;s <a href="https://documentation.ubuntu.com/lxd/en/latest/cloud-init/#vendor-data-and-user-data">cloud-init.vendor-data</a>
config option is used here. The <code>cloud-init</code> config file looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#75715e">#cloud-config</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">users</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">your-ansible-user</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">sudo</span>: <span style="color:#ae81ff">ALL=(ALL:ALL) ALL</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">ssh_authorized_keys</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">from=&#34;1.2.3.4&#34; ssh-ed25519 abcdef12345 ssh-identifier</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">shell</span>: <span style="color:#ae81ff">/bin/bash</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">packages</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#ae81ff">sudo</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#ae81ff">python3</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">chpasswd</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">expire</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">users</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">your-ansible-user</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">password</span>: <span style="color:#ae81ff">your-temporary-password</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">text</span>
</span></span></code></pre></div><p>This first creates the <code>your-ansible-user</code> user, with an appropriate <code>sudoers</code>
entry. It also adds an SSH key, allowing access only from a single machine,
which in my case is a dedicated Command &amp; Control host. I also add the <code>python3</code> and <code>sudo</code>
packages, which are required by Ansible.
Finally, I set the password for <code>your-ansible-user</code> to a pretty simple value
which I had no problem committing to git.</p>
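<p>The bootstrapping playbook which then replaces that temporary password can be
quite short. Here is a minimal sketch of what it might look like - the host group,
the user name and the <code>ansible_user_password</code> variable are placeholders for
this example, and the real password should come from an encrypted source such as
ansible-vault:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"># bootstrap.yaml: replace the temporary cloud-init password.
- hosts: fresh_vms
  become: true
  tasks:
    - name: set the real password for the Ansible user
      ansible.builtin.user:
        name: your-ansible-user
        # The user module expects a pre-hashed password.
        password: "{{ ansible_user_password | password_hash('sha512') }}"
</code></pre></div>
<p>It is run once against each fresh VM, with the temporary password supplied as
the become password. After that, the temporary password is gone for good.</p>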
<p>The experience with how well this worked also has me thinking about revamping
my Netbooting setup. At the moment, I&rsquo;m generating one image per host, even
though most things are the same among all hosts. I could instead have two base
images (one amd64, one arm64) and then do the necessary per-host adaptations by
running a cloud-init server in my network.</p>
<h2 id="creating-the-vms">Creating the VMs</h2>
<p>The last part of the setup is creating the VMs themselves:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-Terraform" data-lang="Terraform"><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;lxd_container&#34;</span> <span style="color:#e6db74">&#34;ceph-vm-1&#34;</span> {
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">name</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;vm-name&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">remote</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;server-name-here&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">type</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;virtual-machine&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">image</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;ubuntu:22.04&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">profiles</span> <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span><span style="color:#a6e22e">lxd_profile</span>.<span style="color:#a6e22e">profile</span><span style="color:#f92672">-</span><span style="color:#a6e22e">base</span>.<span style="color:#a6e22e">name</span><span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span><span style="color:#a6e22e">lxd_profile</span>.<span style="color:#a6e22e">profile</span><span style="color:#f92672">-</span><span style="color:#a6e22e">localdisk</span>.<span style="color:#a6e22e">name</span><span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span><span style="color:#a6e22e">lxd_profile</span>.<span style="color:#a6e22e">profile</span><span style="color:#f92672">-</span><span style="color:#a6e22e">ceph</span>.<span style="color:#a6e22e">name</span><span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>  ]
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">start_container</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">device</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">name</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;cephdisk&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">type</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;disk&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">properties</span> <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">source</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/dev/sda2&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">path</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/dev/cephdisk&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">config</span> <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;cloud-init.user-data&#34;</span> <span style="color:#f92672">=</span> <span style="color:#f92672">&lt;&lt;-EOT</span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      #cloud-config
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      hostname: vm-name
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    </span><span style="color:#f92672">EOT</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>This resource remotely contacts the LXD server and creates a new VM. Don&rsquo;t get
confused by the <code>lxd_container</code> resource type: it is simply the resource type
shared between LXD containers and VMs, and the <code>type</code> field determines what&rsquo;s
really created.
In the <code>config</code> section, I&rsquo;m explicitly setting the <code>hostname</code> of the new machine
with the cloud-init user-data config option.
By default, the hostname is the same as the LXD VM name, which would be the <code>name</code>
field in Terraform. But as I sometimes have the habit of naming my VMs something
other than their actual hostname, I provided it explicitly here.</p>
<p><em>One very important point:</em> The <code>#cloud-config</code> at the top of the file is <strong>not</strong>
a comment - it is part of the cloud-init spec. It has to be there. Took me a
while to realize that&hellip;</p>
<p>The above example is for one of the two VMs which will end up serving as Rook
Ceph hosts, so it also gets handed another disk for later use by Ceph.</p>
<p>And that&rsquo;s it. After a final <code>terraform apply</code>, I had a Home<em>lab</em> again.</p>
<p>Over the last week, I have been researching Kubernetes and cluster setups. I&rsquo;ve
got a couple of notes on the topic and will likely write another blog post with
all the prep work rather soon. If I&rsquo;m really lucky, I might finally be ready
to issue the <code>kubeadm init</code> command later today. &#x1f609;</p>
]]></content:encoded>
    </item>
  </channel>
</rss>
